Thread: problems displaying Displaying thai characters


This question is answered.


Replies: 10 - Last Post: Nov 14, 2017 12:13 AM Last Post By: Jeff Karingada
Jeff Karingada

Posts: 10
Registered: 2/27/17
problems displaying Displaying thai characters  
  Posted: Nov 9, 2017 2:24 PM
I have migrated a web application from IntraWeb 4/Delphi 6 to IntraWeb 14/Delphi Berlin, and I am still using classic AnsiString because of other dependencies. However, I am having issues displaying Thai characters. I have set <META http-equiv="Content-Type" content="text/html" charset="TIS-620"> in the header of my forms. However, it seems the webpages are being displayed as UTF-8 because the <meta charset="utf-8"> is taking precedence over the content type I have set.

The webpage displays unrecognizable text; however, when I right-click the webpage and change the encoding to Thai (Windows), it displays the Thai characters correctly.
My question is: is it possible to override the default charset of the application (UTF-8) and use a different charset?

Kind Regards

Edited by: Jeff Karingada on Nov 9, 2017 2:24 PM
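A minimal sketch of one way single-byte Thai text becomes unreadable, written in Python only to show the byte-level behaviour (it is not IntraWeb or Delphi code). It assumes the data is in the Thai Windows code page 874 (Python's cp874 codec, Windows' extension of TIS-620); the sample word is made up for illustration. Those bytes are not valid UTF-8, so a browser that trusts a UTF-8 declaration cannot show them, while decoding the same bytes as cp874, which is roughly what switching the browser to Thai (Windows) does, gives the text back.

```python
# Thai text encoded in the Thai Windows code page (cp874), one byte per character.
thai = "ภาษาไทย"          # sample text ("Thai language"), not taken from the thread
raw = thai.encode("cp874")

# A browser that assumes UTF-8 cannot decode these bytes and shows U+FFFD instead.
as_utf8 = raw.decode("utf-8", errors="replace")

# Switching the browser to "Thai (Windows)" amounts to decoding the bytes as cp874.
as_thai = raw.decode("cp874")

print(len(raw))   # 7
print(as_utf8)    # replacement characters, no Thai
print(as_thai)    # ภาษาไทย
```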
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: problems displaying Displaying thai characters
  Posted: Nov 9, 2017 4:17 PM   in response to: Jeff Karingada
Jeff Karingada wrote:

I have migrated a web application from intraweb 4/ Delphi 6 to
intraweb 14/ Delphi Berlin and I am still using classic ansistring
because of other dependencies.

Just because you use ANSI strings in memory doesn't mean you can't use
UTF-8 in the HTML. They are two different and independent things.
Just make sure you convert your ANSI strings to Unicode properly (i.e.,
using a Thai charset) before giving them to IntraWeb, if it is
expecting Unicode strings. I would find it very unlikely that a modern
IntraWeb version would accept ANSI strings in a modern Delphi version.
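A minimal Python sketch of the conversion described above (Python stands in for the Delphi side; this is not IntraWeb code): decode the ANSI bytes with the Thai code page 874 (Python's cp874), then encode the resulting Unicode text as UTF-8 for a page declared with <meta charset="utf-8">. In Delphi the decode step corresponds to a conversion with code page 874, for example via TEncoding.GetEncoding(874). The function name and sample text are illustrative assumptions.

```python
def ansi_thai_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: single-byte Thai (code page 874) -> UTF-8 page bytes."""
    text = raw.decode("cp874")    # ANSI bytes -> Unicode, using the Thai code page
    return text.encode("utf-8")   # Unicode -> UTF-8, which can represent Thai

raw = "ไทย".encode("cp874")       # 3 bytes in code page 874
html_bytes = ansi_thai_to_utf8(raw)

print(len(raw), len(html_bytes))  # 3 9  (each Thai character takes 3 bytes in UTF-8)
print(html_bytes.decode("utf-8")) # ไทย
```

ASCII content such as "9/99" is unchanged by the conversion, so only the Thai characters grow in size.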

I have set <META http-equiv="Content-Type" content="text/html"
charset="TIS-620"> in the header of my forms. However it seems the
webpages are being displayed as UTF-8 because of the <meta
charset="utf-8"> taking precedence over the content type I have set.

Why are you mixing <meta http-equiv> with <meta charset> at all? The
former is the HTML4-style declaration, the latter is the HTML5-style
one, and a page must not declare its charset both ways. Pick one or the
other, not both. Sounds like IntraWeb uses HTML5, so stick with that.

the webpage displays unrecognizable text, however when I right click
the webpage and change the encoding the thai(windows) it displays the
Thai characters correctly.

I find that hard to believe. Considering that Delphi 2009+ uses
Unicode strings, I would expect IntraWeb to do so as well, and I
wouldn't expect it to send Unicode characters in a Thai charset unless
you explicitly tell it to do so, which would contradict the above.
Unless it has a bug that doesn't set <meta charset> correctly.

How EXACTLY are you giving your Thai strings to IntraWeb and/or sending
them to the client? Are you doing something manually to send the raw
bytes yourself?

the question I had, is it possible to override the default charset of
the application(UTF-8) and use a different charset

I'm not an IntraWeb user, so I couldn't answer that.

--
Remy Lebeau (TeamB)
Jeff Karingada

Re: problems displaying Displaying thai characters
  Posted: Nov 10, 2017 3:25 AM   in response to: Remy Lebeau (Te...
Remy Lebeau (TeamB) wrote:
Why are you mixing <meta http-equiv> with <meta charset> at all? The
former is the HTML4-style declaration, the latter is the HTML5-style
one, and a page must not declare its charset both ways. Pick one or the
other, not both. Sounds like IntraWeb uses HTML5, so stick with that.

IntraWeb sets UTF-8 for each of the web pages. The <meta http-equiv> tag was ported across from the old application. I thought I could override the charset by inserting it into the header.

How EXACTLY are you giving your Thai strings to IntraWeb and/or sending
them to the client? Are you doing something manually to send the raw
bytes yourself?

This application talks to a mainframe backend, and the Thai characters are stored as single-byte EBCDIC with CCSID 838. There is a conversion process from EBCDIC to ASCII on the mainframe side before the message is sent to the IntraWeb application.
I have tried various options on the connector side of the client application. I have used UnicodeString and the correct encoding (874 for Thai):
buffStr2 := BytesToString(inBuff, IndyTextEncoding(874));
When I debug, I can see buffStr2 contains the actual Thai characters, but when it is displayed on the webpage, it shows as ?????. I believe this is because there is data loss when moving it through intermediate data stores within the application.
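The ????? symptom can be reproduced in a few lines of Python (an illustration, not the application's code). It assumes the intermediate single-byte stores convert with Windows-1252, a typical Western system code page. Thai has no representation there, so with Python's "replace" error handler each character becomes '?', and converting back cannot recover it.

```python
thai = "ไทย"  # correctly decoded text, like buffStr2 after BytesToString(..., IndyTextEncoding(874))

# Assumption: the intermediate single-byte store converts with Windows-1252.
stored = thai.encode("cp1252", errors="replace")
print(stored)                   # b'???'

# The loss is permanent: reading it back gives question marks, not Thai.
print(stored.decode("cp1252"))  # ???

# A store that uses code page 874 instead round-trips without loss.
print(thai.encode("cp874").decode("cp874") == thai)  # True
```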

I also tried converting the data into a raw string, so that a TIS-620 charset would display the Thai characters:

buffStr2 := ansistring(BytesToStringRaw(inBuff));
Characters that are displayed:

9/99åä»ñÆí¢«­å¶´Âòݤ¦

However, when I change the encoding in the browser, it actually displays the Thai characters:
9/99รTMรฆรรฑรกรยขยซร,รTM
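The Latin-looking characters above are what single-byte Thai text looks like when its bytes are read with a Western code page. A small Python illustration, assuming code page 874 for the data and Windows-1252 as the wrong guess; the sample word is made up and is not the thread's data.

```python
raw = "ไทย".encode("cp874")         # b'\xe4\xb7\xc2'

# Reading the Thai bytes with a Western code page gives Latin look-alikes.
garbled = raw.decode("cp1252")
print(garbled)                      # ä·Â

# Choosing the Thai encoding in the browser amounts to decoding the same bytes as cp874.
print(raw.decode("cp874"))          # ไทย

# Text that was already mis-decoded can be repaired here, because Windows-1252
# mapped each of these bytes to exactly one character.
repaired = garbled.encode("cp1252").decode("cp874")
print(repaired)                     # ไทย
```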

Thanks

Chad Hower

Posts: 613
Registered: 3/2/07
Re: problems displaying Displaying thai characters
  Posted: Nov 10, 2017 7:05 AM   in response to: Jeff Karingada
On 11/10/2017 7:25 AM, Jeff Karingada wrote:
this application talks to a mainframe backend and the cthai haracters are stored as single byte EBCDIC with CCSID of

The issue is that in Delphi everything is pretty much Unicode now. Sure,
there are still AnsiStrings there, but many methods etc. do not support them.

I've asked Alexandre to follow up as he is more authoritative in this
area currently, but I would like to suggest that your best path is to
convert the data into a normal Unicode string as soon as you get it from
the DB and then let IW handle it from there. UTF-8 can handle Thai just fine,
and many browsers etc. are moving towards Unicode only as well, with
specific encodings becoming legacy and deprecated.
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: problems displaying Displaying thai characters
  Posted: Nov 10, 2017 9:11 AM   in response to: Chad Hower
Chad Hower wrote:
On 11/10/2017 7:25 AM, Jeff Karingada wrote:
this application talks to a mainframe backend and the cthai haracters are stored as single byte EBCDIC with CCSID of

The issue is that in Delphi everything is pretty much Unicode now. Sure
there are still AnsiStrings there, but many methods etc do not support them.

True there. Delphi handled the transition very badly, taking the (fairly arrogant) approach that "you shouldn't use strings for anything but text". When you count on maintaining bit level consistency there are landmines. Ansistrings have historically been a very good container for those (particularly if you need multidimensional arrays of them), with a lot of effective support functions. Unfortunately, they completely discounted existing developer use of that functionality.

One "poster child" example is their handling of the AnsiXxxxStr() functions. Even if you declare all your variables as AnsiString, and in spite of the function name containing "Ansi", they convert your parameter to Unicode, perform the function, then return a Unicode string and convert that again to assign it to your AnsiString. That is:

MyAnsi:=AnsiLeftStr(MyAnsi,2); ... will convert your MyAnsi to Unicode, perform the LeftStr function in Unicode, then convert it back to Ansi for the return assignment. These (almost) always convert correctly but that "almost" can create bugs that are difficult to track down.
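The "almost" can be illustrated with a hypothetical Python model of a Unicode-based Ansi helper. The function below is not the Delphi RTL; it only mimics the decode, operate, re-encode sequence described above. Python's cp874 table is assumed as the conversion code page, and Windows' own conversion may treat unassigned bytes differently, so this shows the class of bug rather than Delphi's exact behaviour.

```python
def ansi_left_str(raw: bytes, count: int, code_page: str) -> bytes:
    """Hypothetical model of a Unicode-based 'Ansi' helper: decode, operate, re-encode."""
    text = raw.decode(code_page, errors="replace")      # ANSI -> Unicode
    result = text[:count]                               # the actual operation, done in Unicode
    return result.encode(code_page, errors="replace")   # Unicode -> ANSI

# Ordinary Thai bytes survive the round trip unchanged.
ok = ansi_left_str(b"\xe4\xb7\xc2", 2, "cp874")
print(ok)    # b'\xe4\xb7'

# 0xDB is unassigned in Python's cp874 table, so a string used as a raw byte
# container (the "bit level consistency" case) silently gets '?' instead.
bad = ansi_left_str(b"\xdb\xe4", 2, "cp874")
print(bad)   # b'?\xe4'
```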

You can (eventually) find these and correct them, but you have to explicitly reference an older unit containing the actual Ansi versions. Why they didn't provide overloaded functions is beyond comprehension. In fact, the revelation that they don't seem to give a damn about forward migration of existing code that doesn't meet their "idealistic" standards has scared me away from migrating past XE2 for some time.

You can track down the issues, but it takes vigilance and time on code that was already debugged and working. They seem to forget "there is no more rapid development technology than reusing existing, working, code" and, yes, you can quote me on that.

Sorry for the rant, but as you can tell I've been through this more than once.

Dan
Alexandre Machado

Posts: 1,754
Registered: 8/10/13
Re: problems displaying Displaying thai characters
  Posted: Nov 12, 2017 1:00 AM   in response to: Jeff Karingada
Are you able to recreate it in a simple IW application?

I added your example Thai text to one IW test case and this is the result: http://downloads.atozed.com/intraweb/images/unicode.png

As you can see, the UTF-8 charset works correctly in a standard IW application (the characters look identical to the ones displayed here in the web interface, which you say are correct).

Jeff Karingada

Re: problems displaying Displaying thai characters
  Posted: Nov 13, 2017 12:16 AM   in response to: Alexandre Machado

Alexandre Machado wrote:
Are you able to recreate it in a simple IW application?

I have recreated it in a demo IW app. I found that the Thai characters display correctly if the data type used is Unicode string, but not if the data type is AnsiString. The Thai characters also display correctly if I set the code page of the AnsiString to 874 (Thai encoding), i.e.:
type AnsiString(874);

If the data type of the storage containing the Thai characters is a plain AnsiString, then there will be data loss and you get question marks, depending on your system locale.

However, in my migrated application there is a lot of usage of ShortStrings, AnsiStrings and AnsiChars, which are all single-byte characters. Even though my connector to the mainframe makes use of strings and the Thai characters are being encoded correctly, there is data loss from the connector module all the way to the display module, because the intermediate storage uses single-byte characters. I think I will need to use 2-byte data types and the correct encoding in order for the display to show the Thai characters properly.
Before the days of UTF-8, you could override a webpage's encoding by changing the charset on the page.

I think I will have to use Unicode strings throughout my code in order to support Asian languages.

Chad Hower

Re: problems displaying Displaying thai characters
  Posted: Nov 13, 2017 5:43 AM   in response to: Jeff Karingada
On 11/13/2017 4:16 AM, Jeff Karingada wrote:
I think I will have to make use of Unicode strings all through my code in order to support Asian languages

The whole world is going Unicode and for a good reason. This really will
be your best path forward.
Jeff Karingada

Re: problems displaying Displaying thai characters
  Posted: Nov 13, 2017 11:36 PM   in response to: Chad Hower
Chad Hower wrote:
On 11/13/2017 4:16 AM, Jeff Karingada wrote:
I think I will have to make use of Unicode strings all through my code in order to support Asian languages

The whole world is going Unicode and for a good reason. This really will
be your best path forward.
Yeah, I am going to do that. I think it is the best option, especially for compliance purposes. Thank you for your input.
Remy Lebeau (Te...


Re: problems displaying Displaying thai characters
  Posted: Nov 13, 2017 10:22 AM   in response to: Jeff Karingada
Jeff Karingada wrote:

I have re created it in a demo IW app. I found that the Thai
character display works if the data type used is Unicode string but
not if the data type is ansistring.

IntraWeb uses Unicode, so if you store Thai characters in an
AnsiString, you have to make sure to use codepage 874 for any
conversions to Unicode.

The Thai characters displays correctly if I set the code page of the
ansistring to 874 (thai encoding) ie: type AnsiString(874);

As you should be.

if the data type of the storage containing the Thai characters is
AnsiString, then there will be data loss and get question marks
depending on your system locale.

If you use a plain ordinary non-codepaged AnsiString, then the
conversion from ANSI to Unicode will use the system default ANSI
codepage (see the global DefaultSystemCodePage variable in Delphi RTL's
System unit), which is likely NOT 874 in your situation. You can set a
custom codepage using the System.SetMultiByteConversionCodePage()
function.
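A Python model of the behaviour described above. The names default_system_code_page and set_multi_byte_conversion_code_page are hypothetical stand-ins for Delphi's DefaultSystemCodePage and System.SetMultiByteConversionCodePage, passing an explicit code page stands in for a code-paged type such as AnsiString(874), and Windows-1252 is assumed as the machine's default ANSI code page.

```python
from typing import Optional

# Hypothetical stand-in for DefaultSystemCodePage (assumed to be Windows-1252 here).
default_system_code_page = "cp1252"

def set_multi_byte_conversion_code_page(code_page: str) -> None:
    """Hypothetical analogue of System.SetMultiByteConversionCodePage."""
    global default_system_code_page
    default_system_code_page = code_page

def ansi_to_unicode(raw: bytes, code_page: Optional[str] = None) -> str:
    """Convert ANSI bytes; a string without its own code page uses the default."""
    return raw.decode(code_page or default_system_code_page)

raw = b"\xe4\xb7\xc2"                    # "ไทย" in code page 874

print(ansi_to_unicode(raw))             # plain AnsiString: ä·Â (wrong code page)
print(ansi_to_unicode(raw, "cp874"))    # code-paged, like AnsiString(874): ไทย
set_multi_byte_conversion_code_page("cp874")
print(ansi_to_unicode(raw))             # plain AnsiString after the call: ไทย
```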

However in my Migrated application, there is a lot of usage of
shortstrings, ansistring, ansichars which are all single byte
characters. Even though my connector to the Mainframe is making use
of strings and the thai characters are being encoded correctly, there
is data loss from the connector module all the way to the display
module. The intermediately storage makes use of single byte
characters. I think I will need to make use of 2 byte data types and
use the correct encoding in order for the display to show the Thai
characters properly.

You need to stop using ANSI types. As soon as the data arrives from the
mainframe, convert it to Unicode right away, and then pass it up the
rest of the chain as Unicode.
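A sketch of the convert-at-the-boundary approach, in Python for illustration only. It assumes, as in this thread, that the mainframe side delivers Thai as code page 874 bytes after its EBCDIC conversion; from_mainframe and to_mainframe are hypothetical names for the connector's inbound and outbound steps. Encoding strictly on the way out makes unrepresentable text fail loudly instead of silently becoming '?'.

```python
MAINFRAME_CODE_PAGE = "cp874"  # assumption: the gateway delivers and expects code page 874 bytes

def from_mainframe(raw: bytes) -> str:
    """Inbound boundary: decode once, then pass Unicode strings up the chain."""
    return raw.decode(MAINFRAME_CODE_PAGE)

def to_mainframe(text: str) -> bytes:
    """Outbound boundary: encode once; raises UnicodeEncodeError for unrepresentable text."""
    return text.encode(MAINFRAME_CODE_PAGE)

name = from_mainframe(b"\xe4\xb7\xc2")   # everything after this point works on str
reply = to_mainframe(name + " 9/99")

try:
    to_mainframe("日本")                  # not representable in code page 874
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

Decoding exactly once on the way in and encoding exactly once on the way out means no intermediate store ever holds bytes whose code page has to be guessed.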

I think I will have to make use of Unicode strings all through my
code in order to support Asian languages

Yes, you should.

--
Remy Lebeau (TeamB)
Jeff Karingada

Re: problems displaying Displaying thai characters
  Posted: Nov 14, 2017 12:13 AM   in response to: Remy Lebeau (Te...
Remy Lebeau (TeamB) wrote:
Jeff Karingada wrote:

I have re created it in a demo IW app. I found that the Thai
character display works if the data type used is Unicode string but
not if the data type is ansistring.

IntraWeb uses Unicode, so if you store Thai characters in an
AnsiString, you have to make sure to use codepage 874 for any
conversions to Unicode.

I did a test of this, and it works as long as the codepage for the AnsiString is 874.

If you use a plain ordinary non-codepaged AnsiString, then the
conversion from ANSI to Unicode will use the system default ANSI
codepage (see the global DefaultSystemCodePage variable in Delphi RTL's
System unit), which is likely NOT 874 in your situation. You can set a
custom codepage using the System.SetMultiByteConversionCodePage()
function.
Thank you for this suggestion. By calling System.SetMultiByteConversionCodePage(874), the application uses code page 874 throughout, and the Thai characters display correctly. I also had to ensure that the data was in the right encoding as soon as it came from the mainframe, and that the right encoding was used on the sending channel to the mainframe (writing data to the mainframe). I think this would be a solution for applications that are still using classic AnsiStrings and are not using Unicode.

You need to stop using ANSI types. As soon as the data arrives from the
mainframe, convert it to Unicode right away, and then pass it up the
rest of the chain as Unicode.

I have decided to go the Unicode route: convert data from the mainframe to Unicode and use Unicode strings. I did a proof of concept for this and it works as expected; it displays Thai characters and allows them to be entered.

I think I will have to make use of Unicode strings all through my
code in order to support Asian languages

Yes, you should.

I will be going the Unicode route. I think it will be easier to maintain later on and will be compliant. I thought it would be easier to keep using AnsiStrings when I ported across from Delphi 6 to the current version, because the application talks to a mainframe that uses 8-bit characters. I learnt the hard way, but I think if you are supporting multi-language platforms, Unicode is the way to go when migrating from pre-Delphi 2009 versions, provided you have the resources and time.

Thank you for your help and input. I have learnt quite a lot over the past few days. Much appreciated.
