[BlueOnyx:15086] Re: [SPAM] Re: YUM updates for 5106R/5107R/5108R

Hisao SHIBUYA shibuya at alpha.or.jp
Fri Apr 4 15:32:42 -05 2014


Follow up,

> The get() function in /usr/sausalito/ui/libPhp/CceClient.php is called by PHP
> to get CODB data.
> So adding the translation code to the get() function should resolve this
> issue, I think.
> The result of ccephp_get($this->handle, $oid, $namespace); is a
> multidimensional array, so we need to translate all of its values to UTF-8.

We need to add the same function to the Perl module used by the handlers.
FullName is written to /etc/passwd.
It isn't critical if it is written in a different encoding, such as EUC-JP or UTF-8.

Thanks,
Hisao


On Apr 5, 2014, at 5:22 AM, Hisao SHIBUYA <shibuya at alpha.or.jp> wrote:

> Hi Michael,
> 
> I understand the situation now, after your update.
> There is no translation step from the GUI to CODB, so after the update CODB
> holds two different encodings once new users are added.
> If the charset is EUC-JP, the object is stored as EUC-JP.
> If the charset is UTF-8, the object is stored as UTF-8.
> 
> Before : GUI(EUC-JP) -> CODB(EUC-JP)
> After  : GUI(UTF-8)  -> CODB(UTF-8)
> 
> So we need to handle objects in both EUC-JP and UTF-8 and translate them to
> UTF-8 for display.
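Since CODB can now hold values in either encoding, display code has to cope with both. A minimal Python sketch of one way to do that, assuming the raw value is available as bytes; `decode_for_display` is a hypothetical helper, not part of BlueOnyx, and trying strict UTF-8 before EUC-JP is a design assumption:

```python
def decode_for_display(raw: bytes) -> str:
    """Return display text for a CODB value stored as either UTF-8 or EUC-JP.

    Strict UTF-8 is tried first; if the bytes are not valid UTF-8 they are
    assumed to be legacy EUC-JP (undecodable bytes become U+FFFD).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("euc_jp", errors="replace")


if __name__ == "__main__":
    old = b"\xa5\xd9\xa5\xeb\xa5\xbf"              # stored before the update (EUC-JP)
    new = b"\xe3\x83\x99\xe3\x83\xab\xe3\x82\xbf"  # stored after the update (UTF-8)
    print(decode_for_display(old), decode_for_display(new))
```

This is a heuristic, not a guarantee: a short EUC-JP string can occasionally also be valid UTF-8 (for example a kanji whose two bytes happen to form a valid UTF-8 sequence), in which case it would be shown wrongly.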
> 
>> 2.) base-user.mod:
>> 
>> Only a few pages are affected by these problems, namely the user list,
>> the page where users are edited, the personal profile and personal email.
>> 
>> The affected input fields are:
>> 
>> - Username
>> - Comments
>> - Vacation message text
>> 
>> I modified the GUI pages for these to pass the above CODB data through
>> I18n::Utf8Encode() for cleaning. If the text *was* EUC-JP, it will be
>> shown as correct UTF-8. Upon saving it will be stored in CODB as UTF-8.
>> On subsequent usage of the same pages no further EUC-JP to UTF-8
>> transformation will be required, as the text is shown (and saved)
>> correctly by then.
>> 
>> SVN: http://devel.blueonyx.it/trac/changeset/1400/BlueOnyx/ui/base-user.mod
>> 
>> There *might* be other fields in the GUI where this may also be needed,
>> but right now I can't think of any.
> 
> This is one way to resolve it, but we would need to add translation code to
> all modules, because Japanese can be entered in fields like Description in
> other parts of the GUI.
> 
> The get() function in /usr/sausalito/ui/libPhp/CceClient.php is called by PHP
> to get CODB data.
> So adding the translation code to the get() function should resolve this
> issue, I think.
> The result of ccephp_get($this->handle, $oid, $namespace); is a
> multidimensional array, so we need to translate all of its values to UTF-8.
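Translating every value of a multidimensional result can be done with a recursive walk. This is a Python sketch, not the PHP in CceClient.php: it assumes the result arrives as nested dicts/lists whose leaf strings are raw bytes, and `to_utf8` and `translate_all` are hypothetical names.

```python
from typing import Any


def to_utf8(raw: bytes) -> str:
    """Decode one value: keep valid UTF-8, otherwise assume legacy EUC-JP."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("euc_jp", errors="replace")


def translate_all(value: Any) -> Any:
    """Recursively decode every byte string in a nested dict/list structure.

    Dict keys are decoded too; lists and tuples come back as lists; numbers,
    None and already-decoded str values are returned unchanged.
    """
    if isinstance(value, bytes):
        return to_utf8(value)
    if isinstance(value, dict):
        return {translate_all(k): translate_all(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [translate_all(v) for v in value]
    return value


if __name__ == "__main__":
    result = {  # hypothetical shape of a get() result
        b"fullName": b"\xa5\xd9\xa5\xeb\xa5\xbf",
        b"aliases": [b"\xe3\x83\x99\xe3\x83\xab\xe3\x82\xbf", b"bertha"],
        b"uid": 1001,
    }
    print(translate_all(result))
```

Because the fallback hard-codes EUC-JP, any other legacy encoding that is not valid UTF-8 (an ISO-8859-1 value, for example) would be misread as EUC-JP, which is why such a change would need checking against the other languages.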
> 
> I haven't written the code to translate all values.
> I believe this is the way to resolve it, but we need to check that it doesn't
> affect other languages.
> 
> What do you think, Michael?
> 
> Thanks,
> Hisao
> 
> 
> On Apr 5, 2014, at 3:57 AM, Michael Stauber <mstauber at blueonyx.it> wrote:
> 
>> Hi Hisao,
>> 
>>> I checked on my BlueQuartz 5200R and the result is the same as in your test.
>>> This file is written in UTF-8.
>>> The vacationMsg output looks corrupted because cceclient doesn't support
>>> displaying multibyte characters.
>> 
>> Yeah, it is doing some encoding and that also depends on the charset
>> that the submitted text was initially in.
>> 
>> I did some tests, too. I installed an old 5108R from two years ago and
>> did not YUM update it. I changed the language to Japanese and created a
>> site with a dozen users that had Japanese names, Japanese comments and
>> Japanese vacation messages.
>> 
>> Then I fully YUM updated it and the display was indeed as broken as in
>> the screenshots that Eiji posted in [BlueOnyx:15081].
>> 
>> I then examined the CODB object of one of the Users. In the GUI I had
>> entered his name as this:
>> 
>> ベルタ
>> 
>> In CODB it looked like this:
>> 
>> 102 DATA fullName = "\245\331\245\353\245\277"
>> 
>> With the GUI now being in UTF-8 (even for Japanese), I saved this user
>> again after changing his "fullName" back to "ベルタ". It got stored in
>> CODB as this:
>> 
>> 102 DATA fullName = "\343\203\231\343\203\253\343\202\277"
>> 
>> So we can assume this:
>> 
>> CODB data when submitted as EUC-JP:
>> 102 DATA fullName = "\245\331\245\353\245\277"
>> 
>> Same CODB data when submitted as UTF-8:
>> 102 DATA fullName = "\343\203\231\343\203\253\343\202\277"
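The two CODB values above can be checked by turning the octal escapes back into bytes and decoding them. Python sketch; it assumes each backslash followed by three octal digits stands for one byte and everything else is literal ASCII; `unescape_octal` is a hypothetical helper, not the `unescape` sub from CCE.pm.

```python
import re

# A literal backslash followed by exactly three octal digits.
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def unescape_octal(escaped: str) -> bytes:
    """Turn CODB-style octal escapes into the raw bytes they denote.

    Only three-digit octal escapes are handled; any other character is
    kept as a literal ASCII byte (an assumption about the CODB format).
    """
    return OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]),
                            escaped.encode("ascii"))


if __name__ == "__main__":
    euc = unescape_octal(r"\245\331\245\353\245\277")
    utf = unescape_octal(r"\343\203\231\343\203\253\343\202\277")
    print(euc.hex(" "), "->", euc.decode("euc_jp"))
    print(utf.hex(" "), "->", utf.decode("utf-8"))
```

Both values decode to ベルタ, which confirms they are the same name stored in two different encodings.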
>> 
>> To me it isn't entirely clear where, why or how CODB does the
>> transformation. I don't understand the C code well enough. But when I
>> look at the Perl client module CCE.pm
>> (http://devel.blueonyx.it/trac/browser/BlueOnyx/utils/cce/client/perl/CCE.pm)
>> it appears that the sub _escape does the encoding and the sub unescape
>> does the decoding. We can assume that the Perl module follows the same
>> procedure as the PHP library that serves the same purpose.
>> 
>> If that's the case, then the non-ASCII bytes appear to be stored as octal
>> escape sequences, one \NNN per byte.
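The reverse direction, producing the octal form from text, can be sketched as follows. Python sketch; which bytes get escaped (everything outside printable ASCII, plus backslash and double quote) is an assumption for illustration and is not taken from `_escape` in CCE.pm; `escape_octal` is a hypothetical name.

```python
def escape_octal(text: str, encoding: str) -> str:
    """Encode text, then write each byte literally or as backslash + 3 octal digits.

    Assumption: printable ASCII is kept as-is except backslash (0x5C) and
    double quote (0x22); every other byte is escaped.
    """
    parts = []
    for byte in text.encode(encoding):
        if 0x20 <= byte < 0x7F and byte not in (0x5C, 0x22):
            parts.append(chr(byte))
        else:
            parts.append("\\%03o" % byte)
    return "".join(parts)


if __name__ == "__main__":
    print(escape_octal("ベルタ", "euc_jp"))  # the pre-update form
    print(escape_octal("ベルタ", "utf-8"))   # the post-update form
```

Feeding the same name through the two encodings reproduces both fullName values seen in CODB.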
>> 
>> As you can see, in UTF-8 the same Japanese text is also longer: 6 octal
>> groups (bytes) for EUC-JP versus 9 for UTF-8, i.e. one and a half times
>> as long.
>> 
>> This is explained by the multibyte encodings: each of these three Katakana
>> characters takes two bytes in EUC-JP but three bytes in UTF-8. Other
>> characters can need a different number of bytes (ASCII is one byte in both).
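The byte counts can be confirmed directly with the standard codecs. Python sketch; `byte_profile` is a hypothetical helper.

```python
from typing import List, Tuple


def byte_profile(text: str, encoding: str) -> Tuple[int, List[int]]:
    """Return (total bytes, bytes per character) for text in the given encoding."""
    per_char = [len(ch.encode(encoding)) for ch in text]
    return len(text.encode(encoding)), per_char


if __name__ == "__main__":
    for enc in ("euc_jp", "utf-8"):
        total, per_char = byte_profile("ベルタ", enc)
        print(f"{enc:7} total={total} per character={per_char}")
```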
>> 
>> I just did some math and it looks like this:
>> 
>> Char:	タ					(Katakana)
>> =	\343\202\277				(octal, one escape per byte)
>> =	E3 82 BF				(hex)
>> =	11100011 10000010 10111111		(binary)
>> 
>> And that explains it. The Katakana character "タ" is U+30BF in Unicode,
>> and the UTF-8 encoding of U+30BF is exactly these three bytes:
>> 
>> http://www.eva.hi-ho.ne.jp/cgi-bin/user/zxcv/decodeUTF8.cgi?req=url&url=%E3%83%9F%E3%83%A4%E3%83%95%E3%82%B8%E3%83%AA%E3%83%A8%E3%82%A6%E3%82%BF
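The UTF-8 arithmetic can be done by hand: a code point from U+0800 to U+FFFF (excluding the surrogates U+D800 to U+DFFF) is packed into the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx. Python sketch; `utf8_encode_bmp` is a hypothetical helper limited to that range and compared against Python's built-in encoder.

```python
def utf8_encode_bmp(code_point: int) -> bytes:
    """Hand-rolled UTF-8 for the three-byte range U+0800..U+FFFF."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("only non-surrogate U+0800..U+FFFF is handled here")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


if __name__ == "__main__":
    encoded = utf8_encode_bmp(ord("タ"))
    print("hex:   ", encoded.hex(" "))
    print("octal: ", "".join(f"\\{b:03o}" for b in encoded))
    print("binary:", " ".join(f"{b:08b}" for b in encoded))
    print("matches built-in encoder:", encoded == "タ".encode("utf-8"))
```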
>> 
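>> In the EUC-JP table "タ" = A5 BF (hex), which in octal is \245\277: exactly
>> the last two octal numbers in "\245\331\245\353\245\277". Likewise ベ = A5 D9
>> (\245\331) and ル = A5 EB (\245\353), so the old value is plain EUC-JP.
>> 
>> See: http://fcd3.org/nihongo/euc-jp/index.html
>> 
>> In the Shift-JIS table "タ" = 83 5E (hex), which is 33630 in decimal. (The
>> decimal 12479 is 0x30BF, i.e. the Unicode code point, not the Shift-JIS code.)
>> 
>> See:
>> http://www.kreativekorp.com/charset/encoding.php?file=shift-jis.kte&char=835E
>> 
>> So the old data does not appear to be stored in Shift-JIS: the EUC-JP bytes
>> submitted by the old GUI seem to have been stored in CODB unchanged.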
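The table lookups can be cross-checked with Python's standard codecs, printing each encoding of "タ" in hex and in CODB-style octal, together with the decimal value of its Unicode code point. `encodings_of` is a hypothetical helper.

```python
def encodings_of(char: str) -> dict:
    """Map each codec name to the bytes of char in that encoding."""
    return {enc: char.encode(enc) for enc in ("euc_jp", "shift_jis", "utf-8")}


if __name__ == "__main__":
    char = "タ"
    print(f"code point: U+{ord(char):04X} (decimal {ord(char)})")
    for enc, raw in encodings_of(char).items():
        octal = "".join(f"\\{b:03o}" for b in raw)
        print(f"{enc:9} hex={raw.hex(' ')} octal={octal} "
              f"decimal={int.from_bytes(raw, 'big')}")
```

Note that 12479 is 0x30BF, the Unicode code point, while the Shift-JIS code 0x835E is 33630 in decimal.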
>> 
>> Anyway: To fix this issue for the moment (pending a more thorough
>> solution) I made this two-part update, which is now available via YUM:
>> 
>> 1.) sausalito-i18n-*:
>> 
>> The class I18n.php was modified again. The function I18n::Utf8Encode()
>> now checks if the input string is in EUC-JP. If so, it is converted to
>> UTF-8. After that check the result is passed through I18n::detectUTF8(),
>> which might (or might not) run the string through BXEncoding::toUTF8(),
>> which fixes damaged UTF-8 text.
>> 
>> SVN: http://devel.blueonyx.it/trac/changeset/1397/BlueOnyx/5107R/i18n
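A check like the one described for I18n::Utf8Encode() could test whether the bytes are structurally valid EUC-JP before converting them. This is a Python sketch, not the PHP in I18n.php: `looks_like_euc_jp` and `utf8_encode` are hypothetical names, the byte-range test is a heuristic, and the second step through I18n::detectUTF8() and BXEncoding::toUTF8() is not reproduced.

```python
def looks_like_euc_jp(raw: bytes) -> bool:
    """Heuristic byte-structure check for EUC-JP.

    Accepts ASCII, two-byte JIS X 0208 characters (both bytes 0xA1-0xFE),
    half-width katakana (0x8E + 0xA1-0xDF) and JIS X 0212 (0x8F + two bytes
    0xA1-0xFE). Pure ASCII passes too, which is harmless because ASCII is
    identical in EUC-JP and UTF-8.
    """
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        if b < 0x80:
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < n and 0xA1 <= raw[i + 1] <= 0xFE:
            i += 2
        elif b == 0x8E and i + 1 < n and 0xA1 <= raw[i + 1] <= 0xDF:
            i += 2
        elif (b == 0x8F and i + 2 < n
              and 0xA1 <= raw[i + 1] <= 0xFE and 0xA1 <= raw[i + 2] <= 0xFE):
            i += 3
        else:
            return False
    return True


def utf8_encode(raw: bytes) -> str:
    """Hypothetical stand-in for the EUC-JP step: convert if it looks like EUC-JP."""
    if looks_like_euc_jp(raw):
        return raw.decode("euc_jp", errors="replace")
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(utf8_encode(b"\xa5\xd9\xa5\xeb\xa5\xbf"))
    print(utf8_encode(b"\xe3\x83\x99\xe3\x83\xab\xe3\x82\xbf"))
```

The check has a blind spot worth testing: UTF-8 Latin text such as "é" (bytes C3 A9) also passes it, so checking for EUC-JP first could misconvert UTF-8 text in other languages.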
>> 
>> 2.) base-user.mod:
>> 
>> Only a few pages are affected by these problems, namely the user list,
>> the page where users are edited, the personal profile and personal email.
>> 
>> The affected input fields are:
>> 
>> - Username
>> - Comments
>> - Vacation message text
>> 
>> I modified the GUI pages for these to pass the above CODB data through
>> I18n::Utf8Encode() for cleaning. If the text *was* EUC-JP, it will be
>> shown as correct UTF-8. Upon saving it will be stored in CODB as UTF-8.
>> On subsequent usage of the same pages no further EUC-JP to UTF-8
>> transformation will be required, as the text is shown (and saved)
>> correctly by then.
>> 
>> SVN: http://devel.blueonyx.it/trac/changeset/1400/BlueOnyx/ui/base-user.mod
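Converting on first save makes the migration self-healing: once a record has been saved through an updated page, a second pass finds nothing left to convert. Python sketch with an in-memory dict standing in for the CODB user object; `clean`, `edit_and_save` and the property name "comments" are hypothetical, and the UTF-8-first check is an assumption rather than the actual I18n::Utf8Encode() logic.

```python
def clean(raw: bytes) -> bytes:
    """Return the value as UTF-8 bytes, converting legacy EUC-JP if needed."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8: leave it alone
    except UnicodeDecodeError:
        return raw.decode("euc_jp", errors="replace").encode("utf-8")


def edit_and_save(record: dict, fields: tuple) -> bool:
    """Simulate opening a user page and saving it; return True if anything changed."""
    changed = False
    for name in fields:
        if name in record:
            new = clean(record[name])
            if new != record[name]:
                record[name] = new
                changed = True
    return changed


if __name__ == "__main__":
    fields = ("fullName", "comments", "vacationMsg")
    user = {"fullName": b"\xa5\xd9\xa5\xeb\xa5\xbf", "comments": b"hello"}
    print(edit_and_save(user, fields))  # first save converts the EUC-JP name
    print(edit_and_save(user, fields))  # second save finds nothing to convert
```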
>> 
>> There *might* be other fields in the GUI where this may also be needed,
>> but right now I can't think of any.
>> 
>> So I think that might do it for now.
>> 
>> -- 
>> With best regards
>> 
>> Michael Stauber
>> _______________________________________________
>> Blueonyx mailing list
>> Blueonyx at mail.blueonyx.it
>> http://mail.blueonyx.it/mailman/listinfo/blueonyx