[BlueOnyx:15087] Re: YUM updates for 5106R/5107R/5108R

Fri Apr 4 15:43:55 -05 2014

Hi Hisao,

> I understood the situation after your update.
> There is no translation function between the GUI and CODB, so after the update
> there are two encodings in CODB once users are added.
> If the charset is EUC-JP, the object is stored as EUC-JP.
> If the charset is UTF-8, the object is stored as UTF-8.
> 
> Before : GUI(EUC-JP) -> CODB(EUC-JP)
> After  : GUI(UTF-8)  -> CODB(UTF-8)

Yeah, but non-ASCII text is stored in octal in CODB. That is what I
meant by "translation". The same applies if the text contains umlauts
("äöüßÖÄÜ") or accented characters. Text without such "special"
non-ASCII characters is stored by CODB as plain ASCII. Text with them
gets "encoded" in octal, and which octal values are used seems to depend
on the charset in use (EUC-JP or UTF-8).
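
To illustrate why the octal values differ per charset, here is a minimal sketch. The helper octalEscape() is hypothetical and only prints each non-ASCII byte as a backslash-octal escape; the exact escape format CODB uses internally is an assumption and may differ. The byte sequences are the standard UTF-8 and EUC-JP encodings of the hiragana "あ".

```php
<?php
// Hypothetical helper (not part of CODB or BlueOnyx): render every
// non-ASCII byte of a string as a backslash-octal escape.
function octalEscape(string $bytes): string {
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        // ASCII bytes pass through unchanged; others become \NNN.
        $out .= ($code < 0x80) ? $byte : sprintf('\\%03o', $code);
    }
    return $out;
}

$utf8  = "\xE3\x81\x82"; // "あ" encoded as UTF-8 (three bytes)
$eucjp = "\xA4\xA2";     // "あ" encoded as EUC-JP (two bytes)

echo octalEscape($utf8), "\n";  // prints \343\201\202
echo octalEscape($eucjp), "\n"; // prints \244\242
echo octalEscape("admin"), "\n"; // prints admin (pure ASCII is untouched)
```

So the same character ends up as different octal sequences depending on which charset the GUI was using when the object was written.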

But that's good. We now have a slightly better understanding of these
mechanisms. It's not yet perfect, but we might get there.

> This is one way to resolve it, but we would need to add translation code to
> all modules, because Japanese can also be entered in fields like Description
> on other GUI pages.

Yeah, that is a good point. I'm looking for the other places where this
needs to be done.

> The get() function in /usr/sausalito/ui/libPhp/CceClient.php is called by PHP
> to get CODB data.
> So adding the translation code to the get() function should resolve this
> issue, I think.
> The result of ccephp_get($this->handle, $oid, $namespace); is a
> multi-dimensional array, so we need to translate all of its values to UTF-8.

Ah, thank you very much. That's exactly what I've been looking for. Yes,
that will certainly work nicely. I'll see to it.

> I believe this will be the way to resolve it, but we need to check that this
> doesn't affect other languages.

Yeah, it would need to go into get(), getObject() and for good measure
also into set() and setObject().

Generally the new function I18n::Utf8Encode() can do this nicely:

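  function Utf8Encode($text) {
      // If the text is detected as EUC-JP, convert it to UTF-8 first.
      if (mb_detect_encoding($text, "JIS, EUC-JP, ISO-8859-1, ISO-8859-15, windows-1252, UTF-8") == "EUC-JP") {
          $text = mb_convert_encoding($text, "UTF-8", "EUC-JP");
      }
      // If detectUTF8() reports UTF-8, return the text unchanged.
      if (detectUTF8($text) == "1") {
          return $text;
      }
      // Otherwise let BXEncoding::toUTF8() turn it into clean UTF-8.
      return BXEncoding::toUTF8($text);
  }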

It checks whether the text is encoded in EUC-JP. If so, it is converted
to UTF-8. If the result is detected as UTF-8 it is returned as is.
Otherwise it is cleaned with BXEncoding::toUTF8() to make sure the UTF-8
is good.

The only problem is that the function I18n::Utf8Encode() only deals with
strings. For get(), getObject(), set() and setObject() we would need to
run it over all values within the array of the CODB object(s).

This will affect performance.

How much I don't know yet. I'll run some tests to check this.
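
Running a string converter over every value of a nested array could look roughly like the sketch below. It is only an illustration under assumptions: utf8EncodeSketch() is a hypothetical, simplified stand-in for I18n::Utf8Encode() (it skips BXEncoding::toUTF8() and simply treats anything that is not valid UTF-8 as EUC-JP), utf8EncodeArray() is a made-up name, and the field names in the sample object are invented. It needs the mbstring extension.

```php
<?php
// Hypothetical stand-in for I18n::Utf8Encode(). Assumption: any string
// that is not already valid UTF-8 is EUC-JP.
function utf8EncodeSketch(string $text): string {
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($text, 'UTF-8', 'EUC-JP');
}

// Convert every string value in a possibly nested array.
// Keys and non-string values are left untouched.
function utf8EncodeArray(array $data): array {
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = utf8EncodeSketch($value);
        }
    });
    return $data;
}

// Invented sample object with mixed encodings.
$object = [
    'name'     => 'admin',                           // ASCII
    'fullName' => "\xA4\xA2",                        // "あ" in EUC-JP
    'nested'   => ['description' => "\xE3\x81\x82"], // "あ" in UTF-8
];

$converted = utf8EncodeArray($object);
```

Every string value gets inspected on every call, which is where the performance cost would come from.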

> What do you think, Michael?

Yes, that's really good information, Hisao. Thank you very much!

-- 
With best regards

Michael Stauber