[BlueOnyx:15086] Re: [SPAM] Re: YUM updates for 5106R/5107R/5108R
Hisao SHIBUYA
shibuya at alpha.or.jp
Fri Apr 4 15:32:42 -05 2014
Follow up,
> The get() function at /usr/sausalito/ui/libPhp/CceClient.php is called by php
> to get CODB data.
> So adding the translation code to the get() function would resolve this
> issue, I think.
> The result of ccephp_get($this->handle, $oid, $namespace); is a
> multidimensional array, so we need to translate all of its values to UTF-8.
We need to add the same function to the Perl module used by the handlers.
fullName is written to /etc/passwd.
It is not critical if it is written there in a different encoding, such as
EUC-JP or UTF-8.
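
A minimal sketch of that idea, written in Python only for illustration (the
real change would go into the PHP get() method and the Perl module). It walks
a nested structure and converts every byte-string value to UTF-8 text. The
names to_utf8_text and translate_values, the sample data, and the "try UTF-8
first, then EUC-JP" order are assumptions of this sketch, not existing
BlueOnyx code.

```python
def to_utf8_text(raw: bytes) -> str:
    """Decode raw bytes, trying UTF-8 first and EUC-JP second (assumed order)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is legacy EUC-JP.
        return raw.decode("euc_jp", errors="replace")


def translate_values(value):
    """Recursively convert every bytes value in nested dicts/lists to str."""
    if isinstance(value, bytes):
        return to_utf8_text(value)
    if isinstance(value, dict):
        return {key: translate_values(item) for key, item in value.items()}
    if isinstance(value, list):
        return [translate_values(item) for item in value]
    return value  # numbers, None, already-decoded text: leave untouched


# Hypothetical data shaped like a multidimensional result from get().
sample = {
    # Stored before the update: EUC-JP bytes
    "fullName": b"\xa5\xd9\xa5\xeb\xa5\xbf",
    # Stored after the update: UTF-8 bytes
    "Email": {"vacationMsg": b"\xe3\x83\x99\xe3\x83\xab\xe3\x82\xbf"},
    "aliases": [b"abc"],
}
clean = translate_values(sample)
print(clean)
```

Trying UTF-8 first means values that were already saved as UTF-8 pass through
unchanged, so running the conversion repeatedly is harmless. Other languages
would still need to be checked, because a non-UTF-8 value that is not EUC-JP
would be decoded wrongly by this fallback.
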
Thanks,
Hisao
On Apr 5, 2014, at 5:22 AM, Hisao SHIBUYA <shibuya at alpha.or.jp> wrote:
> Hi Michael,
>
> I understand the situation after your update.
> There is no translation function between the GUI and CODB, so after the
> update CODB holds two encodings once new users are added.
> If the charset is EUC-JP, the object is stored as EUC-JP.
> If the charset is UTF-8, the object is stored as UTF-8.
>
> Before : GUI(EUC-JP) -> CODB(EUC-JP)
> After : GUI(UTF-8) -> CODB(UTF-8)
>
> So we need to handle objects in both EUC-JP and UTF-8 and translate them to
> UTF-8 for display.
>
>> 2.) base-user.mod:
>>
>> The pages where we might run into these problems are just a few. Namely:
>> The user-list, the page where users are edited, personal profile and
>> personal email.
>>
>> The affected input fields are:
>>
>> - Username
>> - Comments
>> - Vacation message text
>>
>> I modified the GUI pages for these to pass the above CODB data through
>> I18n::Utf8Encode() for cleaning. If the text *was* EUC-JP, it will be
>> shown as correct UTF-8. Upon saving it will be stored in CODB as UTF-8.
>> On subsequent usage of the same pages no further EUC-JP to UTF-8
>> transformation will be required, as the text is shown (and saved)
>> correctly by then.
>>
>> SVN: http://devel.blueonyx.it/trac/changeset/1400/BlueOnyx/ui/base-user.mod
>>
>> There *might* be other fields in the GUI where this may also be needed,
>> but right now I can't think of any.
>
> This is one way to resolve it, but we would need to add translation code to
> all modules, because Japanese can also be entered in fields such as
> Description on other GUI pages.
>
> The get() function at /usr/sausalito/ui/libPhp/CceClient.php is called by php
> to get CODB data.
> So adding the translation code to the get() function would resolve this
> issue, I think.
> The result of ccephp_get($this->handle, $oid, $namespace); is a
> multidimensional array, so we need to translate all of its values to UTF-8.
>
> I haven’t written the code to translate all values yet.
> I believe this is the way to resolve it, but we need to check that it
> doesn’t affect other languages.
>
> What do you think, Michael?
>
> Thanks,
> Hisao
>
>
> On Apr 5, 2014, at 3:57 AM, Michael Stauber <mstauber at blueonyx.it> wrote:
>
>> Hi Hisao,
>>
>>> I checked on my BlueQuartz 5200R, and the result is the same as in your test.
>>> Also, this file is written in UTF-8.
>>> The vacationMsg result looks corrupted because cceclient doesn’t support
>>> displaying multibyte characters.
>>
>> Yeah, it is doing some encoding and that also depends on the charset
>> that the submitted text was initially in.
>>
>> I did some tests, too. I installed an old 5108R from two years ago and
>> did not YUM update it. I changed the language to Japanese and created a
>> site with a dozen users that had Japanese names, Japanese comments and
>> Japanese vacation messages.
>>
>> Then I fully YUM updated it and the display was indeed as broken as in
>> the screenshots that Eiji posted in [BlueOnyx:15081].
>>
>> I then examined the CODB object of one of the Users. In the GUI I had
>> entered his name as this:
>>
>> ベルタ
>>
>> In CODB it looked like this:
>>
>> 102 DATA fullName = "\245\331\245\353\245\277"
>>
>> With the GUI now being in UTF-8 (even for Japanese), I saved this user
>> again after changing his "fullname" back to "ベルタ". It got stored in
>> CODB as this:
>>
>> 102 DATA fullName = "\343\203\231\343\203\253\343\202\277"
>>
>> So we can assume this:
>>
>> CODB data when submitted as EUC-JP:
>> 102 DATA fullName = "\245\331\245\353\245\277"
>>
>> Same CODB data when submitted as UTF-8:
>> 102 DATA fullName = "\343\203\231\343\203\253\343\202\277"
>>
>> To me it isn't entirely clear where, why or how CODB does the
>> transformation. I don't understand the C code well enough. But when I
>> look at the Perl client module CCE.pm
>> (http://devel.blueonyx.it/trac/browser/BlueOnyx/utils/cce/client/perl/CCE.pm)
>> it appears that the sub _escape does the encoding and the sub unescape
>> does the decoding. We can assume that the Perl module does the same
>> procedure as the PHP library of the same purpose.
>>
>> If that's the case, then the encoded values appear to be stored in octal
>> format.
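
The octal reading can be checked with a short script. This is a Python sketch,
not the CCE.pm or PHP client code; the helper name unescape_octal is
hypothetical, and it only handles the three-digit \ooo escapes seen in the
CODB dumps quoted in this thread.

```python
import re


def unescape_octal(escaped: str) -> bytes:
    """Turn a string of \\ooo octal escapes (plus plain ASCII) into raw bytes."""
    out = bytearray()
    pos = 0
    # [0-3][0-7]{2} limits each escape to the byte range 0..255.
    for match in re.finditer(r"\\([0-3][0-7]{2})", escaped):
        out += escaped[pos:match.start()].encode("ascii")  # text between escapes
        out.append(int(match.group(1), 8))                 # one escape = one byte
        pos = match.end()
    out += escaped[pos:].encode("ascii")
    return bytes(out)


old_value = unescape_octal(r"\245\331\245\353\245\277")
new_value = unescape_octal(r"\343\203\231\343\203\253\343\202\277")

print(old_value.hex(" "), "->", old_value.decode("euc_jp"))
print(new_value.hex(" "), "->", new_value.decode("utf-8"))
```

The first value is 6 bytes (a5 d9 a5 eb a5 bf) and decodes as EUC-JP; the
second is 9 bytes (e3 83 99 e3 83 ab e3 82 bf) and decodes as UTF-8. Both
yield "ベルタ".
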
>>
>> As you can see, in UTF-8 the same Japanese text is also longer. It is
>> almost twice as long, but not quite: 6 groups for EUC-JP, 9 for UTF-8.
>>
>> This is explained by multibyte encoding. Each of these three characters
>> takes two bytes in EUC-JP and three bytes in UTF-8.
>>
>> I just did some math and it looks like this:
>>
>> Char: タ (Katakana)
>> = \343\202\277 (octal)
>> = E3 82 BF (hex)
>> = 11100011 10000010 10111111 (binary)
>>
>> And that explains it. The Katakana character "タ" is Unicode code point
>> U+30BF, and E3 82 BF is its UTF-8 encoding:
>>
>> http://www.eva.hi-ho.ne.jp/cgi-bin/user/zxcv/decodeUTF8.cgi?req=url&url=%E3%83%9F%E3%83%A4%E3%83%95%E3%82%B8%E3%83%AA%E3%83%A8%E3%82%A6%E3%82%BF
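
The relationship between the code point and the three bytes can be worked
through by hand. The following Python sketch applies the standard three-byte
UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx) to U+30BF; the function name
encode_three_byte is made up for this example.

```python
def encode_three_byte(code_point: int) -> bytes:
    """Encode a code point from U+0800..U+FFFF with the 3-byte UTF-8 layout."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers three-byte sequences")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    first = 0b11100000 | (code_point >> 12)            # 1110 + top 4 bits
    second = 0b10000000 | ((code_point >> 6) & 0x3F)   # 10 + middle 6 bits
    third = 0b10000000 | (code_point & 0x3F)           # 10 + low 6 bits
    return bytes([first, second, third])


encoded = encode_three_byte(0x30BF)  # U+30BF is "タ"
binary_form = " ".join(f"{b:08b}" for b in encoded)
hex_form = " ".join(f"{b:02X}" for b in encoded)
octal_form = "".join(f"\\{b:03o}" for b in encoded)

print(binary_form)
print(hex_form)
print(octal_form)
```

This prints 11100011 10000010 10111111, then E3 82 BF, then \343\202\277,
which is the last third of the UTF-8 value stored in CODB.
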
>>
>> In the EUC-JP table "タ" = A5 BF (hex), which is \245\277 in octal. That
>> matches the last two groups in "\245\331\245\353\245\277".
>>
>> See: http://fcd3.org/nihongo/euc-jp/index.html
>>
>> In the Shift-JIS table "タ" = 83 5E (hex), which does not appear in the
>> CODB data. (The decimal value 12479 is the Unicode code point U+30BF.)
>>
>> See:
>> http://www.kreativekorp.com/charset/encoding.php?file=shift-jis.kte&char=835E
>>
>> So text submitted as EUC-JP appears to be stored in CODB as EUC-JP, not
>> as Shift-JIS.
>>
>> Anyway: To fix this issue for the moment (pending a more thorough
>> solution) I did this two part update, which is now available via YUM:
>>
>> 1.) sausalito-i18n-*:
>>
>> The Class I18n.php got modified again. The function I18n::Utf8Encode()
>> now checks if the input string is in EUC-JP. If so, it is converted to
>> UTF-8. After that check the result is passed through I18n::detectUTF8(),
>> which might (or might not) run the string through BXEncoding::toUTF8(),
>> which fixes damaged UTF-8 text.
>>
>> SVN: http://devel.blueonyx.it/trac/changeset/1397/BlueOnyx/5107R/i18n
>>
>> 2.) base-user.mod:
>>
>> The pages where we might run into these problems are just a few. Namely:
>> The user-list, the page where users are edited, personal profile and
>> personal email.
>>
>> The affected input fields are:
>>
>> - Username
>> - Comments
>> - Vacation message text
>>
>> I modified the GUI pages for these to pass the above CODB data through
>> I18n::Utf8Encode() for cleaning. If the text *was* EUC-JP, it will be
>> shown as correct UTF-8. Upon saving it will be stored in CODB as UTF-8.
>> On subsequent usage of the same pages no further EUC-JP to UTF-8
>> transformation will be required, as the text is shown (and saved)
>> correctly by then.
>>
>> SVN: http://devel.blueonyx.it/trac/changeset/1400/BlueOnyx/ui/base-user.mod
>>
>> There *might* be other fields in the GUI where this may also be needed,
>> but right now I can't think of any.
>>
>> So I think that might do it for now.
>>
>> --
>> With best regards
>>
>> Michael Stauber
>> _______________________________________________
>> Blueonyx mailing list
>> Blueonyx at mail.blueonyx.it
>> http://mail.blueonyx.it/mailman/listinfo/blueonyx