[BlueOnyx:25087] Re: 5210R i18n Bug??

Michael Stauber mstauber at blueonyx.it
Thu Sep 9 00:54:25 -05 2021


Hi Sasaki,

Nice to hear from you and many thanks for reporting the problem with
Kanji characters on 5210R.

> There is no problem with Japanese processing on the 5209R, but when
> registering as a user on the 5210R, I confirmed that the Japanese
> processing was not performed correctly.

Yes, the i18n functionality is handled differently on 5209R than it is
on 5210R.

Up to and including 5209R we were using a Zend API PHP module (i18n.so)
for displaying internationalized texts. That i18n.so had extensive and
well-working functions to deal with special characters such as Umlauts,
Accents and even the Japanese Kanji characters.

However, due to massive changes between the PHP version used on 5209R
and the one used on 5210R we were unable to port either of our PHP Zend
API modules (cce.so and i18n.so) to the newer PHP version. We therefore
replaced them with native PHP Classes which reproduce and/or approximate
the functionality of the previously used Zend API modules.

This hasn't been without hiccups and problems. The PHP class that
replaced cce.so has been working without issues for quite some time now,
but the i18n replacement indeed still seems to have some issues with
Japanese.

> There is no problem if it is in romaji like "test002" like the
> attachment, but you can register with "test" and i18n, but you can not
> get the details in UserMod.php and it is not displayed, so save button A
> warning message will be displayed if there are required items

I just tested it on a 5210R and I can indeed reproduce the issue. I
switched the GUI to Japanese and created a user with the username
"test001" and the Full Name テスト ("test" in Japanese).

I then saved the page and the user was created successfully. However,
upon editing the user the GUI page had the Full Name field empty, which
would (of course) create an error message when saving the page again.

So I did a preliminary check of the root cause of the issue.

Upon user creation /var/log/messages recorded the following CCE transaction:

Sep  9 00:11:14 5210r cced(smd)[3891151]: client 19:[49:3249755]: SET
128 fullName = "テスト" description = "" capLevels = "" emailDisabled = "0"
ui_enabled = "1" ftpDisabled = "0" password = xxx

We can see: The full name was stored in Kanji:

fullName = "テスト"

When I examine the CODB object of that User via "cceclient", I can see
that the "fullName" data field is populated and not empty:

[root@5210r ~]# /usr/sausalito/bin/cceclient
100 CSCP/0.99
200 READY
get 128
102 DATA NAMESPACE = ""
102 DATA fullName = "\343\203\206\343\202\271\343\203\210"
102 DATA capabilities = ""
102 DATA ftpDisabled = "0"
102 DATA capLevels = ""
[...]

What we got there is this:

fullName = "\343\203\206\343\202\271\343\203\210"

Which is expected. The same happens to any other non-ASCII text that is
stored in CODB: it is stored as octal escape sequences. I don't recall
exactly whether these are characters from the extended ASCII table or
the octal representation of UTF-8 bytes. It's been a while and I need to
look this up in my notes and in the code.

Either way:

When the GUI needs to display text from CODB, that text is passed
through our i18n routines. On 5209R the actual heavy lifting is done by
the flawlessly working i18n.so and on 5210R we use the I18n_native Class
we created as a substitute. That Class should realize that anything
represented as a backslash followed by a 3-digit number is Octal and
needs to be converted back into UTF-8 or JIS. For Umlauts and Accents
it seems to work fine and it also seems to work for *some* Kanji text.

But evidently not for everything.
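
The conversion described above can be sketched in a few lines. This is
only an illustration in Python, not the actual I18n_native code (which
is PHP). The function name decode_codb_octal is made up for this
example, and it assumes that an escape is always a backslash followed by
exactly three octal digits in the byte range. The important detail is
that all escapes are turned into raw bytes first and the byte string is
decoded as a whole afterwards, because a single Japanese character spans
three escapes. Decoding the fullName value shown above as UTF-8 yields
テスト again, which suggests these are octal-escaped UTF-8 bytes.

```python
import re

# Backslash followed by three octal digits, limited to \000-\377 (one byte).
# Assumption: this is the only escape form that needs handling here.
OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_codb_octal(value: str) -> str:
    """Turn backslash-octal escapes into bytes, then decode as UTF-8.

    Hypothetical helper for illustration; not part of BlueOnyx.
    """
    raw = value.encode("utf-8")
    # Replace every escape with the single byte it stands for.
    data = OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    # Decode the complete byte string at once so that multi-byte
    # characters are reassembled correctly.
    return data.decode("utf-8")


stored = r"\343\203\206\343\202\271\343\203\210"
print(decode_codb_octal(stored))  # テスト
```

Plain ASCII such as "test001" passes through unchanged, since it
contains no escapes.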

So I did a further test. On the "Edit User" GUI page for this same user
I re-entered the Full Name of the User in Kanji as "テスト" and in the
"description" field I wrote "これはテストです。 無視して下さい。" (This is a
test. Please ignore.)

The resulting SET transaction to CODB was this:

Sep  9 00:39:16 5210r cced(smd)[3902607]: client 19:[49:3891316]: SET
128 fullName = "テスト" description = "これはテストです。 無視して下さい。 "
capLevels = "" emailDisabled = "0" ui_enabled = "1" ftpDisabled = "0"
password = xxx

Both the "fullName" and the "description" were set in Kanji.

In CODB this looks like this:

[root@5210r ~]# /usr/sausalito/bin/cceclient
100 CSCP/0.99
200 READY
get 128
102 DATA fullName = "\343\203\206\343\202\271\343\203\210"
[...]
102 DATA description =
"\343\201\223\343\202\214\343\201\257\343\203\206\343\202\271\343\203\210\343\201\247\343\201\231\343\200\202
\347\204\241\350\246\226\343\201\227\343\201\246\344\270\213\343\201\225\343\201\204\343\200\202
"
[...]
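
To cross-check a stored value against what was typed into the GUI, the
escaping can also be reproduced in the other direction. Again this is
just a Python sketch with a made-up function name (encode_codb_octal).
It assumes that plain ASCII bytes are left untouched and that every
other byte of the UTF-8 text is written as a backslash plus three octal
digits. How cced really treats quotes, backslashes or control characters
is not covered here.

```python
def encode_codb_octal(text: str) -> str:
    """Write non-ASCII UTF-8 bytes as backslash-octal escapes.

    Hypothetical helper for illustration; not part of BlueOnyx.
    Assumption: ASCII bytes are passed through unchanged.
    """
    parts = []
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            parts.append(chr(byte))         # plain ASCII stays as-is
        else:
            parts.append(f"\\{byte:03o}")   # e.g. 0xE3 -> \343
    return "".join(parts)


print(encode_codb_octal("テスト"))
# \343\203\206\343\202\271\343\203\210
```

The result for "テスト" is the same string that cceclient reports for
"fullName", so the data stored in CODB matches what was entered.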

When the GUI page reloaded the Full Name ("fullName"-field) was empty
again, but the "Description" ("description"-field in CODB) was still
showing "これはテストです。 無視して下さい。" as I had entered it.

So the code that refused to work for the "fullName"-field did work
flawlessly for the "description"-field. Or perhaps both fields use
slightly different methods and that explains the discrepancy. I'll need
to check that.

Out of curiosity I saved "テスト" into both "fullName" and "description"
and got the same results: "fullName" was empty afterwards and
"description" still had "テスト" in it.

I'll take a closer look at it tomorrow and will try to find out what's
causing the issue.

Many thanks for reporting this problem and I'll keep you posted on what
I find out.

-- 
With best regards

Michael Stauber


