[BlueOnyx:15150] Re: YUM updates for 5106R/5107R/5108R

Hisao SHIBUYA shibuya at alpha.or.jp
Wed Apr 9 06:56:44 -05 2014


Hi Michael,

Sorry for the slow response.

>> I understood the situation after your update.
>> There is no translation function between the GUI and CODB, so after the update
>> two encodings are in use once users are added.
>> If the charset is EUC-JP, the object is stored as EUC-JP.
>> If the charset is UTF-8, the object is stored as UTF-8.
>> 
>> Before : GUI(EUC-JP) -> CODB(EUC-JP)
>> After  : GUI(UTF-8)  -> CODB(UTF-8)
> 
> Yeah, but non-ASCII text is stored in octal in CODB. That is what I
> meant with "translation". The same applies if the text contains umlauts
> ("äöüßÖÄÜ") or accents or acutes. Without such "special" non-ASCII
> characters CODB stores the text as ASCII. With them it "encodes" it in
> octals. Which octals it uses seems to depend on the used charset (EUC-JP
> or UTF-8).
> 
> But it's good. We gained a slightly better understanding of these
> mechanisms now. It's not yet perfect, but we might get there.

Oh, exactly, we need to support other languages as well.
I wasn’t aware of the non-ASCII handling you described.
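
To make the octal point concrete, here is a minimal sketch. The helper name
octalEscape() is hypothetical, and the exact escaping rule CODB uses is an
assumption; the sketch only shows that the same character produces different
octal sequences depending on the charset it arrives in.

```php
<?php
// Hypothetical helper (not part of BlueOnyx): write every non-ASCII byte
// as a backslash followed by its three-digit octal value.
function octalEscape($bytes) {
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= ($code < 0x80) ? $byte : sprintf('\\%03o', $code);
    }
    return $out;
}

// The character U+65E5, given as raw bytes in each charset.
$utf8  = "\xE6\x97\xA5"; // UTF-8 encoding
$eucjp = "\xC6\xFC";     // EUC-JP encoding

echo octalEscape($utf8), "\n";  // \346\227\245
echo octalEscape($eucjp), "\n"; // \306\374
```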


>> I believe this will be the way to resolve it, but we need to check that it
>> doesn’t affect other languages.
> 
> Yeah, it would need to go into get(), getObject() and for good measure
> also into set() and setObject().
> 
> Generally the new function I18n::Utf8Encode() can do this nicely:
> 
>  function Utf8Encode($text) {
>      if (mb_detect_encoding($text, "JIS, EUC-JP, ISO-8859-1, ISO-8859-15, windows-1252, UTF-8") == "EUC-JP") {
>        $text = mb_convert_encoding($text, "UTF-8", "EUC-JP");
>      }
>      if (detectUTF8($text) == "1" ) {
>          return $text;
>      }
>      return BXEncoding::toUTF8($text);
>  }
> 
> It checks whether the text is detected as EUC-JP. If so, it will be
> turned into UTF-8. Unless the result already passes detectUTF8(), it is
> then cleaned with BXEncoding::toUTF8() to make sure the UTF-8 is good.
> 
> The only problem is that the function I18n::Utf8Encode() only deals with
> strings. And for the get(), getObject(), set() and setObject() we would
> need to run this over all values within the array of the CODB object(s).
> 
> This will affect performance.

I checked the latest updates and looked at your code.
I agree with your update in this version: you added the conversion code only
for the fields that are allowed to contain Japanese.
I also think that calling the multibyte functions affects performance.
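
For comparison, running a converter over every value of an object could look
roughly like the sketch below. convertObjectValues() and the sample field
names are hypothetical, and the closure is only a stand-in for
I18n::Utf8Encode(). Skipping pure-ASCII values is one assumed way to limit
the cost of the multibyte calls.

```php
<?php
// Hypothetical helper (not part of BlueOnyx): apply a string converter to
// every string value of a possibly nested array, such as a CODB object.
function convertObjectValues(array $object, $converter) {
    array_walk_recursive($object, function (&$value) use ($converter) {
        // Pure ASCII needs no conversion, so only values containing a
        // byte >= 0x80 are passed to the converter.
        if (is_string($value) && preg_match('/[\x80-\xff]/', $value)) {
            $value = call_user_func($converter, $value);
        }
    });
    return $object;
}

// Stand-in converter for this sketch: treat the input as EUC-JP.
$eucToUtf8 = function ($text) {
    return mb_convert_encoding($text, 'UTF-8', 'EUC-JP');
};

// Assumed sample object; "fullName" holds two kanji as EUC-JP bytes.
$user = array('name' => 'admin', 'fullName' => "\xC6\xFC\xCB\xDC");
$converted = convertObjectValues($user, $eucToUtf8);
echo bin2hex($converted['fullName']), "\n";
```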

I found some corruption with specific characters, as in the following screenshot:
http://www.alpha.or.jp/~shibuya/BlueOnyx/UserList-currupt.png

I tested some characters and string patterns, and the text is sometimes corrupted.
Sorry, I used to work on the QA team for the Japanese Qube3 and RaQ550 ;-p

Some strings aren't detected with the right encoding. Generally, mb_detect_encoding
picks the wrong encoding for short strings, and the result depends on the order of
the encoding list.
So could you change the encoding order for detection, if that causes no problem for
other languages, such as non-ASCII text with umlauts?

-----
--- I18n.php.orig       2014-04-10 09:01:24.000000000 +0900
+++ I18n.php    2014-04-10 09:31:47.000000000 +0900
@@ -83,7 +83,7 @@
   }
 
   function Utf8Encode($text) {
-      if (mb_detect_encoding($text, "JIS, EUC-JP, ISO-8859-1, ISO-8859-15, windows-1252, UTF-8") == "EUC-JP") {
+      if (mb_detect_encoding($text, "JIS, UTF-8, EUC-JP, ISO-8859-1, ISO-8859-15, windows-1252") == "EUC-JP") {
         $text = mb_convert_encoding($text, "UTF-8", "EUC-JP");
       }
       if (I18n::detectUTF8($text) == "1" ) {
-----
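
As a possible alternative to relying on the detection order, a sketch that
checks validity explicitly might look like this. It is a different technique
from the patch above, not what I18n.php does, and both function names are
hypothetical. The idea is to treat text as EUC-JP only when it is not valid
UTF-8 but is valid EUC-JP.

```php
<?php
// Hypothetical helper (not part of BlueOnyx): true only when the bytes are
// not valid UTF-8 (plain ASCII counts as valid UTF-8) but are valid EUC-JP.
function looksLikeEucJp($text) {
    if (mb_check_encoding($text, 'UTF-8')) {
        return false;
    }
    return mb_check_encoding($text, 'EUC-JP');
}

// Hypothetical wrapper: convert only text that looks like EUC-JP and leave
// everything else untouched for later cleanup.
function eucJpToUtf8IfNeeded($text) {
    if (looksLikeEucJp($text)) {
        return mb_convert_encoding($text, 'UTF-8', 'EUC-JP');
    }
    return $text;
}

var_dump(looksLikeEucJp("\xC6\xFC\xCB\xDC")); // EUC-JP bytes
var_dump(looksLikeEucJp("\xE6\x97\xA5"));     // UTF-8 bytes
```

This would still need checking against Latin-1 text with umlauts, which is
the same open question as for the reordered list.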

Thanks,
Hisao




