Thread: Unicode strings in structured data


This question is answered.


Replies: 83 - Last Post: Jun 23, 2015 5:30 AM - Last Post By: Rudy Velthuis (...
Jens Munk

Posts: 11
Registered: 9/12/01
Unicode strings in structured data  
  Posted: Jun 2, 2015 8:47 AM
I have a packed array of packed records, and the record holds regular numbers of known sizes like integers and singles and then some strings. If there were no strings, this would be easy to store and retrieve or memory map to a file, and individual records could be updated without accessing the entire file. However, the strings (which are Unicode) ruin this.

In my existing application, which dates way back in time, the equivalent strings of ASCII (byte) characters are just stored as fixed-length arrays of AnsiChar, but I must switch to Unicode to support Asian text in this new version. Any particular tricks for this?

What if I use PChar or PWideChar in a union with a packed array of byte of sufficient size? Would this work?

Thanks,

Jens.
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: Unicode strings in structured data
Correct
  Posted: Jun 2, 2015 10:04 AM   in response to: Jens Munk
Jens Munk wrote:

I have a packed array of packed records, and the record holds regular
numbers of known sizes like integers and singles and then some
strings. If there were no strings, this would be easy to store and
retrieve or memory map to a file, and individual records could be
updated without accessing the entire file. However, the strings
(which are unicode) ruins this.

In my existing application, which dates way back in time, the
equivalent strings of ascii(byte) characters are just stored as
arrays of fixed lengths of AnsiChar, but I must switch to unicode to
support Asian text in this new version. Any particular tricks for
this?

Just use arrays of WideChar for storage. This way you can define a
fixed-size array as part of the records and get direct indexed record
access in the disk file, like in your old application. The alternative
is to go to a real database engine for storage.
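A minimal sketch of that layout, assuming a Delphi 2009+ desktop compiler. The record `TItem`, its fields and the 32-character capacity are hypothetical. Because no field is a managed type (no string, no dynamic array), `SizeOf(TItem)` is constant, so record N starts at byte offset N * SizeOf(TItem) and can be read or rewritten on its own. A `TMemoryStream` stands in for the disk file here; a `TFileStream` would be used the same way.

```pascal
program FixedRecordFile;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  NameLen = 32; // hypothetical capacity in WideChars, including the #0 terminator

type
  // No managed fields, so the record can be written byte for byte
  // and every record in the file has the same size.
  TItem = packed record
    Id: Integer;
    Weight: Single;
    Name: array[0..NameLen - 1] of WideChar;
  end;

// Record N lives at byte offset N * SizeOf(TItem).
procedure WriteItem(Stream: TStream; Index: Integer; const Item: TItem);
begin
  Stream.Position := Int64(Index) * SizeOf(TItem);
  Stream.WriteBuffer(Item, SizeOf(TItem));
end;

procedure ReadItem(Stream: TStream; Index: Integer; out Item: TItem);
begin
  Stream.Position := Int64(Index) * SizeOf(TItem);
  Stream.ReadBuffer(Item, SizeOf(TItem));
end;

var
  Stream: TMemoryStream;
  Item: TItem;
  I: Integer;
begin
  Stream := TMemoryStream.Create;
  try
    for I := 0 to 2 do
    begin
      FillChar(Item, SizeOf(Item), 0);
      Item.Id := I;
      Item.Weight := I * 1.5;
      StrPLCopy(PWideChar(@Item.Name[0]), 'Item ' + IntToStr(I), NameLen - 1);
      WriteItem(Stream, I, Item);
    end;

    // Update record 1 in place without touching records 0 and 2.
    ReadItem(Stream, 1, Item);
    StrPLCopy(PWideChar(@Item.Name[0]), #$6771#$4EAC, NameLen - 1); // U+6771 U+4EAC
    WriteItem(Stream, 1, Item);

    ReadItem(Stream, 1, Item);
    Writeln(Item.Id, ' ', Ord(Item.Name[0]), ' ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

The same record type also works with a classic `file of TItem` and `Seek`, since it contains no managed types.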


What if I use PChar or PWideChar in a union with a packed array of
byte of sufficient size? Would this work?

No, not the way you are probably thinking. But you can take the address
of the first element of such an array and cast it to PWideChar, if
you make sure your array is always zero-terminated. For read access
that works directly. For write access you first need to make sure
that your UnicodeString does not contain more characters than can fit
into the array (including the terminating #0000) and then copy the
characters using System.SysUtils.StrLCopy, which has an overloaded
version for PWideChar.
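A minimal sketch of the read/write pattern described above, assuming a Delphi 2009+ desktop compiler. `TNameField`, `NameLen`, `GetName` and `TrySetName` are hypothetical names; `StrLCopy` is the PWideChar overload from System.SysUtils, which copies at most MaxLen characters and then appends #0, so the array needs MaxLen + 1 slots.

```pascal
program NameField;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  NameLen = 16; // hypothetical capacity in WideChars, including the #0 terminator

type
  TNameField = array[0..NameLen - 1] of WideChar;

// Read access: the field is always zero-terminated, so the address of its
// first element is a valid PWideChar; assigning it to a string copies up to #0.
function GetName(const Field: TNameField): string;
begin
  Result := PWideChar(@Field[0]);
end;

// Write access: reject values that would leave no room for the #0, then let
// StrLCopy copy at most NameLen - 1 characters and append the terminator.
function TrySetName(var Field: TNameField; const Value: string): Boolean;
begin
  Result := Length(Value) <= NameLen - 1;
  if Result then
    StrLCopy(PWideChar(@Field[0]), PWideChar(Value), NameLen - 1);
end;

var
  Field: TNameField;
begin
  FillChar(Field, SizeOf(Field), 0);
  if TrySetName(Field, 'Hello') then
    Writeln(GetName(Field));  // Hello
  if not TrySetName(Field, StringOfChar('x', NameLen)) then
    Writeln('Too long, field unchanged: ', GetName(Field));
end.
```

Rejecting an over-long value, rather than silently truncating it, is a design choice of this sketch; truncation is also possible but should not split a surrogate pair.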

Do you need to deal with surrogate pairs, i.e. code points outside the
basic 16-bit range of Unicode?
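For context (general Unicode facts, not from the thread): everyday Chinese, Japanese (kana and common kanji) and Korean (Hangul) text, as well as Cyrillic, Arabic and Hebrew, lies in the Basic Multilingual Plane, so each character fits in one WideChar. Rarer CJK ideographs (Extension B and later, from U+20000) and many emoji lie outside it and take a surrogate pair of two WideChars. For a fixed-size field the practical risk is cutting such a pair in half when truncating. A minimal sketch of a truncation helper; `TruncateUtf16` is a hypothetical name and the code assumes the desktop compilers' 1-based strings.

```pascal
program SurrogateSafeTruncate;

{$APPTYPE CONSOLE}

// True for the first (high) half of a UTF-16 surrogate pair.
function IsHighSurrogateUnit(C: WideChar): Boolean;
begin
  Result := (Ord(C) >= $D800) and (Ord(C) <= $DBFF);
end;

// Returns at most MaxUnits UTF-16 code units of S without splitting a
// surrogate pair: if the cut would fall between the two halves, the
// high half is dropped as well. Assumes 1-based strings.
function TruncateUtf16(const S: string; MaxUnits: Integer): string;
var
  N: Integer;
begin
  if Length(S) <= MaxUnits then
    Exit(S);
  N := MaxUnits;
  if (N > 0) and IsHighSurrogateUnit(S[N]) then
    Dec(N);
  Result := Copy(S, 1, N);
end;

var
  S: string;
begin
  // 'ab' followed by U+20000 (a CJK Extension B ideograph) as a surrogate pair.
  S := 'ab' + #$D840#$DC00;
  Writeln(Length(S));                    // 4 code units for 3 characters
  Writeln(Length(TruncateUtf16(S, 3)));  // 2: the pair is not split
  Writeln(Length(TruncateUtf16(S, 4)));  // 4: everything fits
end.
```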

The alternative would be to store the actual length of the string
contained in the array in the record as well (a bit like the old
ShortString type is implemented) and then copy the content of the array
to a String (UnicodeString) using SetString. This way you do not need a
#0000 terminator.
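A minimal sketch of the length-prefixed variant, assuming a Delphi 2009+ compiler. `TTextField`, `TextCap`, `StoreText` and `LoadText` are hypothetical names; `SetString` is the System routine that builds a string from a character buffer and an explicit length.

```pascal
program LengthPrefixedField;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  TextCap = 20; // hypothetical capacity in WideChars

type
  // Like a ShortString, but with UTF-16 characters: an explicit length
  // followed by a fixed-size buffer, so no #0 terminator is needed.
  TTextField = packed record
    Len: Word;
    Chars: array[0..TextCap - 1] of WideChar;
  end;

procedure StoreText(var F: TTextField; const S: string);
begin
  if Length(S) > TextCap then
    raise EArgumentException.CreateFmt('Text exceeds %d characters', [TextCap]);
  FillChar(F.Chars, SizeOf(F.Chars), 0); // keep unused slots deterministic on disk
  F.Len := Length(S);
  if F.Len > 0 then
    Move(PChar(S)^, F.Chars[0], F.Len * SizeOf(WideChar));
end;

function LoadText(const F: TTextField): string;
begin
  // Copies exactly F.Len characters; the buffer need not be terminated.
  SetString(Result, PWideChar(@F.Chars[0]), F.Len);
end;

var
  F: TTextField;
begin
  StoreText(F, 'Hello');
  Writeln(F.Len, ' ', LoadText(F));  // 5 Hello
  Writeln(SizeOf(TTextField));       // 42 = 2 + 20 * 2
end.
```

Compared with the zero-terminated version, all TextCap slots are usable and reading does not have to scan for #0.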

--
Peter Below (TeamB)

Jens Munk

Posts: 11
Registered: 9/12/01
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 12:58 AM   in response to: Peter Below
Thanks Peter,

I can get this to work. One clarifying question, though: do I need surrogate pairs or not? I don't know. Besides the obvious Latin-script languages, the languages I will have to support are Chinese, Korean and perhaps Japanese. Do they require surrogate pairs?

Not that I think it will ever happen, but what about Cyrillic, Arabic and Hebrew?


Peter Below wrote:
Jens Munk wrote:

Do you need to deal with surrogate pairs, code points outside the basic
16 bit Unicode encoding?

--
Peter Below (TeamB)

Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 2, 2015 10:07 AM   in response to: Jens Munk
On Tue, 2 Jun 2015 08:47:28 -0700, Jens Munk <> wrote:

I have a packed array of packed records, and the record holds
regular numbers of known sizes like integers and singles and
then some strings. If there were no strings, this would be
easy to store and retrieve or memory map to a file, and individual
records could be updated without accessing the entire file. However,
the strings (which are unicode) ruins this.

In my existing application, which dates way back in time, the
equivalent strings of ascii(byte) characters are just stored as
arrays of fixed lengths of AnsiChar, but I must switch to unicode
to support Asian text in this new version. Any particular tricks
for this?

What if I use PChar or PWideChar in a union with a packed array of
byte of sufficient size? Would this work?
AFAIK the change to unicode in Delphi was done by redefining the
string type from AnsiString to WideString, i.e. the character codes
are 2 bytes instead of 1 byte in size.
So all of your (fixed length?) strings in the record will double in
size and this makes the new files of this record type incompatible
with old files.
I do not know what will happen for a packed record where some fields
are fixed length strings, but I suspect that the zero bytes will still
persist.

This is not the full story, I am sure, but you will have to wait for
someone with more insight (like Remy) to fill in the voids and
probably correct what I wrote too... ;)

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: Unicode strings in structured data  
  Posted: Jun 2, 2015 10:46 AM   in response to: Bo Berglund
Bo wrote:

AFAIK the change to unicode in Delphi was done by redefining the
string type from AnsiString to WideString, i.e. the character codes
are 2 bytes instead of 1 byte in size.

Also redefining the (P)Char type from (P)AnsiChar to (P)WideChar.

So all of your (fixed length?) strings in the record will double in
size

Only if the record was using the generic Char type and not AnsiChar directly.

this makes the new files of this record type incompatible with old files.

For backwards compatibility with existing data, you would have to define
the original record as using AnsiChar explicitly, and then define a separate
record that uses WideChar/UnicodeString instead, converting between the two
records when loading/saving data.
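A minimal sketch of that two-record approach for the loading direction, assuming a Delphi 2009+ compiler. `TOldRec`, `TNewRec` and `OldToNew` are hypothetical names, and the 32-character name field is an assumed legacy layout. Saving back to the old format would need the reverse conversion, which loses any characters the ANSI code page cannot represent.

```pascal
program LegacyRecordConversion;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Legacy on-disk layout with AnsiChar stated explicitly, so its size is
  // the same whether the generic Char means AnsiChar or WideChar.
  TOldRec = packed record
    Id: Integer;
    Name: array[0..31] of AnsiChar;
  end;

  // New layout for the Unicode version of the file format.
  TNewRec = packed record
    Id: Integer;
    Name: array[0..31] of WideChar;
  end;

function OldToNew(const Old: TOldRec): TNewRec;
var
  Len: Integer;
  A: AnsiString;
begin
  FillChar(Result, SizeOf(Result), 0);
  Result.Id := Old.Id;
  // The old field may be completely full, so do not rely on a #0 being present.
  Len := 0;
  while (Len < Length(Old.Name)) and (Old.Name[Len] <> #0) do
    Inc(Len);
  SetString(A, PAnsiChar(@Old.Name[0]), Len);
  // AnsiString -> string converts through the system ANSI code page.
  StrPLCopy(PWideChar(@Result.Name[0]), string(A), Length(Result.Name) - 1);
end;

var
  OldRec: TOldRec;
  NewRec: TNewRec;
begin
  Writeln(SizeOf(TOldRec), ' ', SizeOf(TNewRec));  // 36 68
  FillChar(OldRec, SizeOf(OldRec), 0);
  OldRec.Id := 7;
  Move(PAnsiChar(AnsiString('Widget'))^, OldRec.Name[0], 6);
  NewRec := OldToNew(OldRec);
  Writeln(NewRec.Id, ' ', string(PWideChar(@NewRec.Name[0])));  // 7 Widget
end.
```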

--
Remy Lebeau (TeamB)
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: Unicode strings in structured data  
  Posted: Jun 2, 2015 11:07 AM   in response to: Jens Munk
Jens Munk wrote:

I have a packed array of packed records, and the record holds regular
numbers of known sizes like integers and singles and then some
strings. If there were no strings, this would be easy to store and
retrieve or memory map to a file, and individual records could be
updated without accessing the entire file. However, the strings
(which are unicode) ruins this.

In my existing application, which dates way back in time, the
equivalent strings of ascii(byte) characters are just stored as
arrays of fixed lengths of AnsiChar, but I must switch to unicode to
support Asian text in this new version. Any particular tricks for
this?

What if I use PChar or PWideChar in a union with a packed array of
byte of sufficient size? Would this work?

They did a very poor job of dealing with this. Byte-wide strings (and
arrays of them!) have been an excellent tool for handling binary data
for a very long time. They could have added Unicode without such
turmoil. Unfortunately, we got what we got.

Also, beware there are zealots who just conclude we are stupid for
using a tool so well suited for the task because of the name "string"
in the datatype.

That said, for the variables that need binary compatibility you should
declare them explicitly as AnsiString. That will take care of 99% of the
problems.

Unfortunately, failing (refusing!) to recognize this use leads to
problems you'll have to guard against. Not least: even functions with
Ansi in their name (e.g. AnsiRightStr() and friends) are NOT Ansi by
default; they have coerced the data types into text strings. Maddening.

So, declare them as AnsiStrings, then follow them through the compile
process, watching for conversion warnings. There are versions of the
Ansi*() functions in a special Ansi unit, which you can add to your
uses clause, that actually support AnsiString, but don't assume that a
function's name has any relevance.
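A sketch of letting the compiler do that "following through", assuming Delphi 2009 or later. To my knowledge `IMPLICIT_STRING_CAST` and `IMPLICIT_STRING_CAST_LOSS` are the `$WARN` identifiers for implicit AnsiString/UnicodeString conversions, and `ERROR` promotes a warning to a compile error; check the warning-directive documentation for your version before relying on this.

```pascal
program StringCastChecks;

{$APPTYPE CONSOLE}

// Report every implicit AnsiString <-> UnicodeString conversion, and fail
// the build on implicit conversions that may lose characters.
{$WARN IMPLICIT_STRING_CAST ON}
{$WARN IMPLICIT_STRING_CAST_LOSS ERROR}

var
  Legacy: AnsiString;
  Text: string;
begin
  Text := 'abc';
  // Legacy := Text;           // implicit and possibly lossy: would now stop the build
  Legacy := AnsiString(Text);  // explicit casts keep each conversion visible in the code
  Text := string(Legacy);
  Writeln(Length(Legacy), ' ', Text);
end.
```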

Having to deal with these changes at the level required is fairly
stupid. Worse, failing (again, it's actually a refusal) to recognize
and extend such a great tool for dealing with binary data is beyond
stupid.

Now they point to some magic beans in the form of array functionality
in XE7/8 as if it's a help.

They don't "get it", and you can count on that to continue. Stand by
for the zealots to start their lecture on how stupid it is to use
ansistring for binary data...

in 3...2...1...

Dan
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 12:05 AM   in response to: Dan Barclay
Dan

Also, beware there are zealots who just conclude we are stupid for
using a tool so well suited for the task because of the name "string"
in the datatype.

I don't know the maximum upvotes allowed but +MaxInt should work.

They don't "get it", and you can count on that to continue. Standby
for the zealots to start their lecture on how stupid it is to use
ansistring for binary data...

in 3...2...1...

Bowl of popcorn at the ready :)

Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 12:41 AM   in response to: Dan Barclay
Dan Barclay wrote:

Jens Munk wrote:

I have a packed array of packed records, and the record holds
regular numbers of known sizes like integers and singles and then
some strings. If there were no strings, this would be easy to store
and retrieve or memory map to a file, and individual records could
be updated without accessing the entire file. However, the strings
(which are unicode) ruins this.

In my existing application, which dates way back in time, the
equivalent strings of ascii(byte) characters are just stored as
arrays of fixed lengths of AnsiChar, but I must switch to unicode to
support Asian text in this new version. Any particular tricks for
this?

What if I use PChar or PWideChar in a union with a packed array of
byte of sufficient size? Would this work?

They did a very poor job of dealing with this. Byte wide strings (and
arrays of them!) have been an excellent tool for handling binary data
for a very long time. They could have added unicode without such
turmoil. Unfortunataly, we got what we got.

Also, beware there are zealots who just conclude we are stupid for
using a tool so well suited for the task because of the name "string"
in the datatype.

Yes, I am such a "zealot". Strings are meant to contain text, nothing
else. Using them for something else has already proven to be a bad
choice, when strings got a different internal format. They can still
contain text, and do so even better now. But that made them unsuitable
for binary data.
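A minimal sketch of the alternative argued for here, assuming Delphi 2009 or later: keep binary data in a byte array (`TBytes` from System.SysUtils), whose element size did not change with the Unicode switch, and turn text into bytes only through an explicit `TEncoding`. The four bytes reuse the Turbo Vision color-scheme example mentioned later in this post; choosing UTF-8 is an assumption.

```pascal
program BytesForBinary;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Scheme, Encoded: TBytes;
  Text: string;
  B: Byte;
begin
  // Binary data: one byte per element, regardless of what Char means.
  Scheme := TBytes.Create($01, $04, $07, $00);
  for B in Scheme do
    Write(B, ' ');
  Writeln;

  // Text that must become bytes goes through an explicit encoding.
  Text := 'Gr' + #$00FC + 'n';  // "Grün"
  Encoded := TEncoding.UTF8.GetBytes(Text);
  Writeln(Length(Text), ' chars -> ', Length(Encoded), ' bytes');  // 4 chars -> 5 bytes
  Writeln(TEncoding.UTF8.GetString(Encoded) = Text);               // TRUE
end.
```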

You can conclude that they should not have changed the internal format,
but hey, if nothing ever changed, we would still have short strings.

So, yes, there are a few obstinate unteachables who keep on using the
old Turbo Vision style hacks, just because they never grew up and
realized that hacks should only be used if nothing else works. And
these people blame Embarcadero for changing the data type, instead of
looking into themselves and thinking: was my solution future proof?
Obviously not.

I don't think I ever called anyone an idiot. But it was indeed pretty
foolish (and stubborn) to keep on using that hack without thinking of
removing it and doing the right thing.

Call me a zealot. I never had any problems converting from short to
long strings and I never had any problems converting to UnicodeString,
because I did not use such hacks. I think that is a pretty practical
approach. The last time I used them was for Turbo Vision (color
schemes were passed as literals like #01#04#07#00), some 20 or more
years ago.

So don't blame Embarcadero for changing strings. Blame yourself for not
thinking about the consequences of using a hack, in all those many
years you had the opportunity to change things.

--
Rudy Velthuis http://www.rvelthuis.de

Murphy's Fourth Law: If there is a possibility of several things
going wrong, the one that will cause the most damage will be the
one to go wrong.
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 7:17 AM   in response to: Rudy Velthuis (...
Rudy

Strings are meant to contain text, nothing
else. Using them for something else has already shown to be a bad
choice, when strings got a different internal format. They can still
contain text, and even much better now. But that made them unsuitable
for binary data.

A string is still just a load of bytes strung together. The principal change from my viewpoint is that Delphi decided that rather than one byte meaning one character, it's now two. I also seem to remember people storing Unicode text in strings even in the old days. Should they be smacked for being naughty?

You can conclude that they should not have changed the internal format,
but hey, if nothing ever changed, we would still have short strings.

Change = good idea
Change for the sake of change = bad idea
Change implemented badly = worst idea

So don't blame Embarcadero for changing strings.

They made their life easier at the expense of making others more difficult. I feel that merits some blame.

Blame yourself for not
thinking about the consequences of using a hack, in all those many
years you had the opportunity to change things.

You keep using the pejorative term hack. How about think of it in terms of making sensible use of facilities available and not expecting someone to take it away for no good reason?

Murphy's Fourth Law: If there is a possibility of several things
going wrong, the one that will cause the most damage will be the
one to go wrong.

I do like this one. :)

Roy
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:30 AM   in response to: Roy Lambert
Roy Lambert wrote:

Rudy

Strings are meant to contain text, nothing
else. Using them for something else has already shown to be a bad
choice, when strings got a different internal format. They can still
contain text, and even much better now. But that made them
unsuitable for binary data.

A string is still just a load of bytes strung together.

Yeah, and a keyboard is just a number of atoms strung together.

You could use a keyboard as a plate and eat your food from it, but most
people are well advised to only use it for the purpose of entering text
into a computer. The same can be said about strings.

Strings are vehicles to contain text. If you do it right, the internals,
and any changes to these internals, should hardly matter. They are NOT
vehicles to contain binary data, even if AnsiStrings have been abused
for that purpose. I mean, you could just as well use an array of
longints to contain bytes, but that would be just as silly and just as
wrong.

But hey, just continue to use strings as byte containers. You'll see
where that will get you. Fact is that those who do that should not
complain that THEIR hack doesn't work anymore. It is not Embarcadero's
job to forgo improvements (and Unicode is a vast improvement over Ansi
with its codepages) in order to sustain such hacks.

A hack is a hack is a hack. So stop complaining, start doing things
properly and you won't have any problems if such things are changed.

--
Rudy Velthuis http://www.rvelthuis.de

"If people are good only because they fear punishment, and hope for
reward, then we are a sorry lot indeed." -- Albert Einstein
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 4, 2015 12:27 AM   in response to: Rudy Velthuis (...
Rudy

Yeah, and a keyboard is just a number of atoms strung together.

You could use a keyboard as a plate and eat your food from it, but most
people are well advised to only use if for the purpose of entering text
into a computer. The same can be said about strings.

The same argument can be used about anything that is not being used for what it was expressly designed for. However, to refute your argument totally:

http://www.theregister.co.uk/2015/05/20/kfc_germany_bakes_bluetooth_keyboard_into_meal_trays/

String are vehicles to contain text. If you do it right, the internals,
and any changes to these internals, should hardly matter. They are NOT
vehicles to contain binary data, even if AnsiStrings have been abused
for that purpose. I mean, you could just as well use an array of
longints to contain bytes, but that would be just as silly and just as
wrong.

You mean even sillier than using multiple integers to represent one character?

But hey, just continue to use strings as byte containers. You'll see
where that will get you. Fact is that those who do that should not
complain that THEIR hack doesn't work anymore. It is not Embarcadero's
job to forego improvements (and Unicode is a vast improvement over Ansi
with its codepages) in order to sustain such hacks.

A hack is a hack is a hack. So stop complaining, start doing things
properly and you won't have any problems if such things are changed.

At what point does good and sensible practice become a hack?

Roy Lambert

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 12:29 AM   in response to: Roy Lambert
Roy Lambert wrote:

The same argument can be used about anything that is not being used
for what it was expressly designed for.

Indeed.

Strings were not meant to be carrying binary data. Doing so is foolish,
as time has told. What else can I say?

--
Rudy Velthuis http://www.rvelthuis.de

"The only function of economic forecasting is to make astrology
look respectable." -- John Kenneth Galbraith
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 6:06 AM   in response to: Rudy Velthuis (...
Rudy

The same argument can be used about anything that is not being used
for what it was expressly designed for.

Indeed.

Strings were not meant to be carrying binary data. Doing so is foolish,
as time has told. What else can I say?

1. please give the timepoint at which your assertion became truth.

2. Aren't you going to carry on the analogies? I did so enjoy the eating of a keyboard one <G>

Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:37 AM   in response to: Roy Lambert
Roy Lambert wrote:

Change = good idea
Change for the sake of change = bad idea

Bullshit.

Unicode was not just introduced for the sake of change. It was one of
the most favourite requests by users having to write international
software. It was an excellent thing that they changed the default from
Ansi to Unicode.

--
Rudy Velthuis http://www.rvelthuis.de

"It is practically imposible to teach good programming to
students that have had a prior exposure to BASIC: as potential
programmers they are mentally mutilated beyond hope of
regeneration." -- Edsger Dijkstra
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 11:35 AM   in response to: Rudy Velthuis (...
Rudy Velthuis (TeamB) wrote:

Roy Lambert wrote:

Change = good idea
Change for the sake of change = bad idea

Bullshit.

Implementation: correct assessment

Unicode was not just introduced for the sake of change. It was one of
the most favourite requests by users having to write international
software. It was an excellent thing that they changed the default from
Ansi to Unicode.

I agree.

The problem was the way in which they did it, and the failure to
maintain this great tool for binary data handling in the process.

Dan
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 12:26 AM   in response to: Dan Barclay
Dan Barclay wrote:

Unicode was not just introduced for the sake of change. It was one
of the most favourite requests by users having to write
international software. It was an excellent thing that they changed
the default from Ansi to Unicode.

I agree.

The problem was the way in which they did it

They did it in a way that those who used strings for text would not
have any problems at all.

--
Rudy Velthuis http://www.rvelthuis.de

"My opinions might have changed, but not the fact that I am
right."

Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 6:01 AM   in response to: Rudy Velthuis (...
Rudy

They did it in a way that those who used strings for text would not
have any problems at all.

I do believe your sig makes my point for me, well that coupled with quite a few posts in these ngs

Rudy Velthuis http://www.rvelthuis.de

"My opinions might have changed, but not the fact that I am
right."
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 7:46 AM   in response to: Roy Lambert
Roy Lambert wrote:

They did it in a way that those who used strings for text would not
have any problems at all.

I do believe your sig makes my point for me,

<sigh>

No, it doesn't. I did not change my opinion. That has always been my
opinion, but indeed, the fact I am right has not changed.

--
Rudy Velthuis http://www.rvelthuis.de

"You must ask your neighbor if you shall live in peace."
-- John Clark
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 9:56 AM   in response to: Rudy Velthuis (...
Rudy

Whilst I was out walking the dog I thought "shall I let Rudy have the last word as he so loves" - then I thought "nah"

Was it an exasperated sigh, an exhausted one or something else?

Your belief, unless I misunderstand, is that Embarcadero altered strings in the best possible fashion, causing as little disruption as possible to anyone.

Mine is that there were alternative strategies such as introducing a UnicodeString type, or a compiler switch, which would have been less disruptive of existing code, left a useful artefact in place, allowed future unicode development but have been more expensive for Embarcadero than the one they took.

I have yet to see any reasoned refutation of my viewpoint - simply chanting "you're wrong and I'm right" just doesn't seem to work.

Roy Lambert

"Absolute certainty is a sign of a rigid and inflexible mind"
unattributed

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 2:12 PM   in response to: Roy Lambert
Roy Lambert wrote:

Your belief, unless I misunderstand, is that Embarcadero altered
strings in the best possible fashion, causing as little disruption as
possible to anyone.

That is not a belief, that is an established fact (see later in text).
They caused as little disruption as possible for those who use and used
strings for what strings are meant: to contain text.

It has turned out (and I predicted this well before the actual switch)
that the only people who had considerable problems with the switch were
those who prematurely ansified their programs (i.e. changed every
occurrence of the generic "string" by a specified "AnsiString") and
those who abused (Ansi)strings to contain binary data. And perhaps some
ASM routines.

So those who simply used strings to contain text and did not do
anything special (like ansifying or storing binary data) had hardly any
problems. Usually a full rebuild and a few corrections were enough.

That does show that using strings as carrier of binary data was a hack,
that may have made sense in the early days. People have had 15 years to
remove their hacks, and whatever the motives were of those who didn't
doesn't matter. Fact is that the hack caused them big problems when the
switch was made. <shrug>

So some may call me a zealot for telling people it was wrong to keep on
using strings for binary data (and I have said that for many years
already, well before the Unicode switch was made), but ISTM that was
not zealotry, it was pure pragmatism. It is never a good idea to
leave a hack in place longer than absolutely necessary. A hack is a
hack is a hack, after all.

--
Rudy Velthuis http://www.rvelthuis.de

"There is a tragic clash between truth and the world. Pure
undistorted truth burns up the world."
-- Nikolay Berdyayev
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 12:30 AM   in response to: Rudy Velthuis (...
Rudy

Your belief, unless I misunderstand, is that Embarcadero altered
strings in the best possible fashion, causing as little disruption as
possible to anyone.

That is not a belief, that is an established fact (see later in text).
They caused as little disruption as possible for those who use and used
strings for what strings are meant: to contain text.

I saw later in the text, unfortunately I did not see any facts. Again I see belief or, if you prefer the word, opinion.

It has turned out (and I predicited this well before the actual switch)
that the only people who had considerable problems with thr switch were
those who prematurely ansified their programs (i.e. changed every
occurrence of the generic "string" by a specified "AnsiString") and
those who abused (Ansi)strings to contain binary data. And perhaps some
ASM routines.

Congratulations, I'm not sure where your evidence comes from though. I must have been reading the wrong newsgroups - I don't monitor all of the Embarcadero ones.

So those who simply used strings to contain text and did not do
anything special (like ansifying or storing binary data) had hardly any
problems. Usually a full rebuild and a few corrections were enough.

I neither doubt nor dispute that. After all, if you don't do much there's not a lot to go wrong.

That does show that using strings as carrier of binary data was a hack,
that may have made sense in the early days.

That assertion is rubbish. What is shown is that trying to use UnicodeString in the same way as the old string is difficult at best and at times foolhardy.

People have had 15 years to
remove their hacks, and whatever the motives were of those who didn't
doesn't matter. Fact is that the hack caused them big problems when the
switch was made. <shrug>

The timescale you quote rather surprises me.

So some may call me a zealot fot telling people it was wrong to keep on
using strings for binary data (and I have said that for many years
already, well before the Unicode switch was made), but ISTM that was
not zealotry, it was pure practicism. It is never a good idea to
leave a hack in place longer than absolutely necessary. A hack is a
hack is a hack, after all.

I wouldn't call you a zealot, I think you have strong opinions, are willing to stand up for them, but are not willing to listen to others views when they contradict your own.

Your persistent reference to the use of strings to carry other than printable characters (or as you call it binary data) as a hack is a fine example of this. At one point it was good programming practice (probably because it was the only vehicle available). At that point it wasn't a hack.
Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:32 AM   in response to: Roy Lambert
Roy Lambert wrote:

I saw later in the text, unfortunately I did not see any facts.

Then you don't. I don't really mind if you see it or not.

--
Rudy Velthuis http://www.rvelthuis.de

"Why do we kill people who are killing people to show that
killing people is wrong?"
-- Holly Near
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 2:16 PM   in response to: Roy Lambert
Roy Lambert wrote:

Mine is that there were alternative strategies such as introducing a
UnicodeString type, or a compiler switch

No. A switch was never a viable option (just like an ARC/no ARC switch
is not a viable option). This has been discussed ad nauseam, so try to
find those discussions, I won't repeat them anymore.

So the facts contradict your belief. This is not a matter of belief or
opinion, it is a matter of fact.

You can, of course, keep on believing what you want, but ISTM that
ignoring the facts won't do you any good.

--
Rudy Velthuis http://www.rvelthuis.de

"To understand a man you should walk a mile in his shoes. If what
he says still bothers you that's ok because you'll be a mile away
from him and you'll have his shoes." -- Unknown
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 12:31 AM   in response to: Rudy Velthuis (...
Rudy

So the facts contradict your belief. This is not a matter of belief or
opinion, it is a matter of fact.

Facts are things that can be scientifically proved or disproved; anything else is at best a hypothesis, or more probably an opinion or belief.

Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:46 AM   in response to: Roy Lambert
Roy Lambert wrote:

Rudy

So the facts contradict your belief. This is not a matter of belief
or opinion, it is a matter of fact.

Facts are things that can be scientifically proved

Bullshit.

Not every fact can be proven. It is a fact I am thinking of a glass of
cool water now, but it can't be proven. It is a fact I picked up a
screwdriver a few seconds ago, but it can't be proven. Etc. etc.

But it is a matter of fact, and it can be proven, that a switch was not
a viable option. It has been proven many times already and I will not
repeat it, especially since it won't change your fixed notions anyway.

--
Rudy Velthuis http://www.rvelthuis.de

"In this war - as in others - I am less interested in honoring
the dead than in preventing the dead." -- Butler Shaffer
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:52 AM   in response to: Rudy Velthuis (...
Rudy

But it is a matter of fact, and it can be proven, that a switch was not
a viable option. It has been proven many times already and I will not
repeat it, especially since it won't change your fixed notions anyway.

It can only be proven if it was attempted. It was not. You have a strange definition of proof.

Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:57 AM   in response to: Roy Lambert
Roy Lambert wrote:

Rudy

But it is a matter of fact, and it can be proven, that a switch was
not a viable option. It has been proven many times already and I
will not repeat it, especially since it won't change your fixed
notions anyway.

It can only be proven if it was attempted.

Bullshit again. No attempt is needed to tell that it is not a good idea to
jump off the Empire State Building without a parachute or something similar.

It has been shown, many many times, why a switch is a bad idea.

--
Rudy Velthuis http://www.rvelthuis.de

"It was, of course, a lie what you read about my religious
convictions, a lie which is being systematically repeated. I do
not believe in a personal god and I have never denied this but
have expressed it clearly. If something is in me which can be
called religious, then it is the unbounded admiration for the
structure of the world so far as our science can reveal it."
-- Albert Einstein
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 8:23 AM   in response to: Rudy Velthuis (...
Rudy

Bullshit again. No attempt is needed to tell that it is not a good idea to
jump off the Empire State Building without a parachute or something similar.

It is if you want to commit suicide <vbg>

It has been shown, many many times, why a switch is a bad idea.

Interesting: so we've moved from "not viable" to "bad idea". Some progress may be being made.

Roy Lambert
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 8:22 AM   in response to: Rudy Velthuis (...
Rudy

Not every fact can be proven. It is a fact I am thinking of a glass of
cool water now, but it can't be proven.

I agree with the second part of the statement, and since that is so, maybe the first part should be considered null, i.e. unknown.

It is a fact I picked up a
screwdriver a few seconds ago, but it can't be proven. Etc. etc.

Possibly, possibly not. Was there a witness? Did you record it on video?

But it is a matter of fact, and it can be proven, that a switch was not
a viable option. It has been proven many times already and I will not
repeat it, especially since it won't change your fixed notions anyway.

It has not been proven; it has been stated. Simply continuing to state something does not make it true.

Roy Lambert
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 2:26 PM   in response to: Roy Lambert
Roy Lambert wrote:

I have yet to see any reasoned refutation of my viewpoint

Actually, your point of view is totally irrelevant to me.

I will just correct your saying that the Unicode switch was implemented
badly. The switch was done very well and in the only viable way.

I have explained how and why. Your views about a switch, etc. don't
make sense. Just ponder it a little harder and longer, and perhaps
you'll see why. Good luck with that.

And I will correct any "accusations" of zealotry. Time has shown that
storing binary data in strings was and is a bad idea. If you did it,
you knew (but perhaps forgot) that it might one day break terribly,
especially if you did it consistently, like some here. If, by chance,
you did not know this, then well, too bad. If you use a hack, you
should be aware of the dangers and you should keep the use of the hack
to a minimum. Forget those principles and one day you may be in big
trouble.

--
Rudy Velthuis http://www.rvelthuis.de

"If I could find a way to get [Saddam Hussein] out of there, even
putting a contract out on him, ... ahh ... if the CIA still did
that sort of thing, . . . ahh . . . assuming it ever did . . . .
. . . I would be for it." -- Richard Nixon
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 12:35 AM   in response to: Rudy Velthuis (...
Rudy

I have yet to see any reasoned refutation of my viewpoint

Actually, your point of view is totally irrelevant to me.

I know <g>

I will just correct your saying that the Unicode switch was implemented
badly. The switch was done very well and in the only viable way.

As you may have guessed I disagree.

I have explained how and why. Your views about a switch, etc. don't
make sense. Just ponder it a little harder and longer, and perhaps
you'll see why. Good luck with that.

Must have missed that.

And I will correct any "accusations" of zealotry. Time has shown that
storing binary data in strings was and is a bad idea. If you did it,
you knew (but perhaps forgot) that it might one day break terribly,
especially if you did it consistently, like some here. If, by chance,
you did not know this, then well, too bad. If you use a hack, you
should be aware of the dangers and you should keep the use of the hack
to a minimum. Forget those principles and one day you may be in big
trouble.

Time has certainly shown that if someone comes along and changes the fundamental definition of an object it screws up what's gone before.

Roy Lambert
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 1:24 AM   in response to: Roy Lambert
On Sun, 7 Jun 2015 00:35:43 -0700, Roy Lambert <roy at lybster dot me dot uk>
wrote:

Time has certainly shown that if someone comes along and changes
the fundamental definition of an object it screws up what's gone
before.

Interesting discussion/flaming...

There is another similar case concerning Indy evolution.
When Indy went from 9 to 10 I had to hack my Delphi installation
by using environment variables to re-point the search paths, etc.,
to the Indy version pertinent to the application in question.
We had many applications built with Indy9 which broke in Indy10.
The reason is that the Indy team removed/renamed/changed a number of
public methods and properties in a way that caused a LOT of work to
modify.

And since Delphi switched from Indy 9 to Indy 10 at some point, when we
upgraded we could not continue development in the new Delphi version
because of this, until I realized I could "hack" my way into the Delphi
installation directory and move all of the Indy source and dcu
files to a directory not in the common Delphi search path. Then I
could add the env vars to connect to the correct version and was able
to continue.

Of course new applications were using Indy10....
And Indy is not nearly as much in use "everywhere" as string is.

Side note concerning string:
My group was developing industrial automation applications in my
workplace (I am now retired) and we started using Delphi back in 1995.
In this we had to interface to machine tools for control purposes and
these used ASCII character control over RS232 and later TCP/IP (where
Indy came into use). Basically all machine tool makers used different
control syntax and transmission packet formats.

Needless to say we composed the control sentences in strings, even
those that had packet systems with embedded control characters and
other binary data. The reason: Ease of programming (RAD!) using all
the different string manipulation functions in Delphi.

I do not consider that a hack!
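For comparison, a control sentence with embedded control characters can also be assembled in a byte buffer instead of a string, so its contents never pass through a character encoding. The sketch below is in Python for brevity; the packet layout (STX, ASCII payload, ETX, then an XOR checksum byte) is a hypothetical example, not Bo's actual machine-tool protocol.

```python
STX = 0x02  # start-of-text control character
ETX = 0x03  # end-of-text control character


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes (a hypothetical checksum scheme)."""
    result = 0
    for b in data:
        result ^= b
    return result


def build_packet(command: str) -> bytes:
    """Frame an ASCII command as STX + payload + ETX + checksum."""
    payload = command.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    return bytes([STX]) + payload + bytes([ETX, xor_checksum(payload + bytes([ETX]))])


def parse_packet(packet: bytes) -> str:
    """Check framing and checksum, then return the ASCII command."""
    if len(packet) < 3 or packet[0] != STX or packet[-2] != ETX:
        raise ValueError("bad framing")
    if xor_checksum(packet[1:-1]) != packet[-1]:
        raise ValueError("bad checksum")
    return packet[1:-2].decode("ascii")


packet = build_packet("G01")
print(packet)                # b'\x02G01\x03E'
print(parse_packet(packet))  # G01
```

In Delphi the same idea would use TBytes as the container, with the text part converted explicitly (for example via TEncoding.ASCII.GetBytes), so the convenience of string functions is kept only for the text portion.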

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572

Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:27 AM   in response to: Bo Berglund
Bo

I do not consider that a hack!

I'm certain Rudy will <g>

Roy

ps

From your posts it doesn't seem as if you're retired
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:49 AM   in response to: Roy Lambert
On Sun, 7 Jun 2015 07:27:43 -0700, Roy Lambert <roy at lybster dot me dot uk>
wrote:

From your posts it doesn't seem as if you're retired

Well, for many years I had a day job in Sweden, which I have retired
from.
But I also have a share in a small business in Austin Texas into which
I provide some development support.
Used to be basically electronics development but now I have been
thrown into maintenance of a software suite developed by someone else
who has quit. Hence the many posts about things I have encountered
when:
1) Migrating the 3 applications to the Unicode enabled RAD studio from
BDS2006.

2) Adding new functionality to one of the applications, which was
written in C++ (a language I have never programmed in myself).

One has to have something to do even when no longer working, for
instance work....

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572

Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 8:23 AM   in response to: Bo Berglund
Bo

One has to have something to do even when no longer working, for
instance work....

Yup. My body is breaking down, I'm trying to keep my brain functioning.

Roy
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:39 AM   in response to: Bo Berglund
Bo Berglund wrote:

Needless to say we composed the control sentences in strings, even
those that had packet systems with embedded control characters and
other binary data. The reason: Ease of programming (RAD!) using all
the different string manipulation functions in Delphi.

I do not consider that a hack!

Even so, it actually was a hack. I bet it broke, or will break, badly one
day.

--
Rudy Velthuis http://www.rvelthuis.de

"It is not easy to find happiness in ourselves; it is not
possible to find it elsewhere."
-- Agnes Repplier

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 7:38 AM   in response to: Roy Lambert
Roy Lambert wrote:

Rudy

I have yet to see any reasoned refutation of my viewpoint

Actually, your point of view is totally irrelevant to me.

I know <g>

Just like whatever I write will not change your mind anyway, so I won't
bother.

--
Rudy Velthuis http://www.rvelthuis.de

"'Everything you say is boring and incomprehensible', she said,
'but that alone doesn't make it true.'" -- Franz Kafka
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 8:02 AM   in response to: Rudy Velthuis (...
Rudy

Just like whatever I write will not change your mind anyway,

Well, not when your contribution consists of "you're wrong" and "it's a hack".

so I won't
bother.

OK

Roy Lambert
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: Unicode strings in structured data  
  Posted: Jun 10, 2015 6:05 PM   in response to: Rudy Velthuis (...
Rudy Velthuis (TeamB) wrote:

Roy Lambert wrote:

I have yet to see any reasoned refutation of my viewpoint

Actually, your point of view is totally irrelevant to me.

I will just correct your saying that the Unicode switch was
implemented badly. The switch was done very well and in the only
viable way.

I have explained how and why. Your views about a switch, etc. don't
make sense. Just ponder it a little harder and longer, and perhaps
you'll see why. Good luck with that.

And I will correct any "accusations" of zealotry. Time has shown that
storing binary data in strings was and is a bad idea. If you did it,
you knew (but perhaps forgot) that it might one day break terribly,
especially if you did it consistently, like some here. If, by chance,
you did not know this, then well, too bad. If you use a hack, you
should be aware of the dangers and you should keep the use of the hack
to a minimum. Forget those principles and one day you may be in big
trouble.

There seems to be confusion here, maybe I can clear some of it up.

Actually, what some of us did was to translate our binary into text
that worked well even in narrow (one byte) Ansistring text variables.
I can't speak for others, but I used a simple substitution table. Yup,
they were text and every text character was important. The text
wasn't "English" or "German", but it was one byte ansi text. A
straight up translation.

Where EMB went wrong was, in fact, in their implementation. Had they
implemented the unicode "automagic" type conversions correctly there
would have been no corruption of the text. In fact, EMB went out of
their way to put conversions where none were needed (ref AnsiLeftStr()
and friends).

Ansistring to Unicode, back to Ansistring, should be clean. As it
happens, there is occasionally corruption of the text. The text itself
is corrupted. As a result, our translation back to binary is also
corrupted, since you can't fix corruption once it has happened. This
happens rarely, but it happens and that breaks things.
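One way such corruption can arise is sketched below in Python, under the assumption that the characters pass through Windows code page 1252: a byte with no mapping in the code page cannot survive a trip to Unicode and back. Python's `cp1252` codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; Delphi's RTL relies on the Windows conversion APIs, which may treat these bytes differently, so this only illustrates the failure mode, not Delphi's exact behaviour. The helper name `ansi_round_trip` is hypothetical.

```python
def ansi_round_trip(data: bytes, codepage: str = "cp1252") -> bytes:
    """Simulate AnsiString -> UnicodeString -> AnsiString through a code page.

    Unmappable bytes become U+FFFD on the way in and '?' on the way out.
    """
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


lossless = bytes([0x41, 0x42, 0xE9])  # 'A', 'B', 'é' are all defined in cp1252
lossy = bytes([0x41, 0x81, 0x42])     # 0x81 is undefined in Python's cp1252 table

print(ansi_round_trip(lossless) == lossless)  # True
print(ansi_round_trip(lossy))                 # b'A?B'
```

Once the 0x81 has become '?', nothing downstream can tell what the original byte was, which matches the "you can't fix corruption once it has happened" observation.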

Now, does that make you feel better that we don't store nasty binary
data in your precious text strings? I apologize if my earlier
description was confusing to you.

Funny how the end result is almost identical. Almost? Hmmm...

I'm curious... you're a TeamB guy huh? Is your "bullshit" approach to
things representative of that moniker? It really doesn't seem as
helpful as the approach other team members seem to use. You might want
to use something more like their style. They cut the bullshit and
treat others with respect. I have no illusions that you will do that,
but I thought I'd offer the suggestion.

For whatever that is worth.

Dan
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 15, 2015 1:34 AM   in response to: Dan Barclay
Dan Barclay wrote:

Actually, what some of us did was to translate our binary into text
that worked well even in narrow (one byte) Ansistring text variables.
I can't speak for others, but I used a simple substitution table.
Yup, they were text and every text character was important. The text
wasn't "English" or "German", but it was one byte ansi text. A
straight up translation.

No problem with that. If the end result of the "translation" was indeed
text, it should not cause a problem with encodings like UTF-16 or UTF-8.
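This can be checked with a small sketch. Assuming the substitution table maps every byte to printable ASCII (hex digits stand in here for Dan's table, which is not shown), the resulting text is identical whether it is stored as UTF-16, UTF-8 or a single-byte code page, so a conversion between string types has nothing to corrupt. The helper names are hypothetical.

```python
def to_text(data: bytes) -> str:
    """Map each byte to two hex digits; the output is pure ASCII."""
    return data.hex()


def from_text(text: str) -> bytes:
    """Inverse of to_text."""
    return bytes.fromhex(text)


raw = bytes(range(256))  # every possible byte value
text = to_text(raw)

# Push the text through the encodings a string might be stored in, and back.
for enc in ("utf-16-le", "utf-8", "cp1252"):
    assert text.encode(enc).decode(enc) == text

assert from_text(text) == raw
print(len(raw), len(text))  # 256 512
```

Hex doubles the size; a Base64-style table is more compact, but the principle is the same: as long as the table only produces ASCII, every encoding round-trips it.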

Where EMB went wrong was, in fact, in their implementation. Had they
implemented the unicode "automagic" type conversions correctly there
would have been no corruption of the text.

If you encoded it as you claim, there was still no corruption, no
matter if it was encoded in UTF-16, UTF-8 or even plain ASCII. If not,
then you did not do what you claim above.

--
Rudy Velthuis http://www.rvelthuis.de

"Programmers are in a race with the Universe to create bigger and
better idiot-proof programs, while the Universe is trying to
create bigger and better idiots. So far the Universe is winning."
-- Rich Cook
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 4, 2015 12:12 AM   in response to: Rudy Velthuis (...
Rudy

Why did you decide to cut my list before

Change implemented badly = worst idea

Roy Lambert

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 12:27 AM   in response to: Roy Lambert
Roy Lambert wrote:

Why did you decide to cut my list before

Change implemented badly = worst idea

Because it was irrelevant. The changes were not implemented badly at
all. They were done in the best possible way, so that those who use
strings for the purpose they are meant for would not have any problems
at all.

--
Rudy Velthuis http://www.rvelthuis.de

"Religion is excellent stuff for keeping common people quiet."
-- Napoleon

Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 6, 2015 6:01 AM   in response to: Rudy Velthuis (...
Rudy

I'm so glad it was irrelevant and not you trying to either ignore something or twist to your viewpoint.

Roy Lambert

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 12:48 AM   in response to: Dan Barclay
Dan Barclay wrote:

They did a very poor job of dealing with this. Byte wide strings (and
arrays of them!) have been an excellent tool for handling binary data
for a very long time. They could have added unicode without such
turmoil. Unfortunately, we got what we got.

Also, beware there are zealots who just conclude we are stupid for
using a tool so well suited for the task because of the name "string"
in the datatype.

Yes, I am such a "zealot". Strings are meant to contain text, nothing
else. Using them for something else has already been shown to be a bad
choice, once strings got a different internal format. They can still
contain text, and do so even better now. But that made them unsuitable
for binary data.
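A short illustration (in Python, with an arbitrary sample value) of why code that treated a string as a byte buffer broke when the character size changed: the same four characters occupy four bytes in a one-byte encoding but eight in UTF-16, so any code that assumed the character count equals the byte count, for example when writing a string to a stream, silently becomes wrong.

```python
s = "AB\x00\x01"  # two letters plus two control/binary values

one_byte = s.encode("latin-1")  # like a one-byte AnsiString: 1 byte per char
utf16 = s.encode("utf-16-le")   # like UnicodeString: 2 bytes per UTF-16 code unit

print(len(s), len(one_byte), len(utf16))  # 4 4 8

# Binary data belongs in a byte container, whose size never depends on an encoding.
buffer = bytes([0x41, 0x42, 0x00, 0x01])
assert buffer == one_byte
```

In Delphi 2009 and later, SizeOf(Char) is 2, and the corresponding byte container is TBytes.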

You can conclude that they should not have changed the internal format,
but hey, if nothing ever changed, we would still have short strings.

So, yes, there are a few obstinate unteachables who keep on using the
old Turbo Vision style hacks, just because they never grew up and
realized that hacks should only be used if nothing else works. And
these people blame Embarcadero for changing the data type, instead of
looking into themselves and thinking: was my solution future proof?
Obviously not.

I don't think I ever called anyone an idiot. But it was indeed pretty
foolish (and stubborn) to keep on using that hack without thinking of
removing it and doing the right thing.

Call me a zealot. I never had any problems converting from short to
long strings and I never had any problems converting to UnicodeString,
because I did not use such hacks. I think that is a pretty practical
approach. The last time I used them was for Turbo Vision (color
schemes were passed as literals like #01#04#07#00), some 20 or more
years ago.

So don't blame Embarcadero for changing strings. They are not the ones
who did a poor job. They actually did an excellent job.

Blame yourself for not thinking about the consequences of using a hack,
in all those many years you had the opportunity to change things.

--
Rudy Velthuis http://www.rvelthuis.de

Murphy's Fourth Law: If there is a possibility of several things
going wrong, the one that will cause the most damage will be the
one to go wrong.
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 6:57 AM   in response to: Rudy Velthuis (...
I am so glad I'm not the only one with hiccoughs

Roy Lambert

Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:02 AM   in response to: Roy Lambert
On Wed, 3 Jun 2015 06:57:45 -0700, Roy Lambert <roy at lybster dot me dot uk>
wrote:

I am so glad I'm not the only one with hiccoughs

Roy Lambert


If the posting does not succeed, do not retry it!

I have found that this forum/news server is very often acting up by
being extremely slow to the extent that my newsreader gives up and
tells me the post failed.
But it did not fail; instead, the acknowledgement message apparently
got delayed too long for my newsreader.
Hitting send again just duplicates the post (as we have seen).

Instead when this happens I first change forum/newsgroup and refresh
until I get a response, then head back to the original ng and refresh
again. Mostly my failed post is actually there!

One should not have to go through such loops in a discussion
forum/newsgroup hosted by a software development tool company!

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572

Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:32 AM   in response to: Bo Berglund
Bo Berglund wrote:

If the posting does not succeed, do not retry it!

Sometimes, the newsreader doesn't get a clue that the posting did
succeed, so it tries until it gets such a clue.

--
Rudy Velthuis http://www.rvelthuis.de

"The Bible was a consolation to a fellow alone in the old cell.
The lovely thin paper with a bit of matress stuffing in it, if
you could get a match, was as good a smoke as I ever tasted."
-- Brendan Behan.
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:45 AM   in response to: Rudy Velthuis (...
On Wed, 3 Jun 2015 08:32:40 -0700, Rudy Velthuis (TeamB)
<newsgroups at rvelthuis dot de> wrote:

Bo Berglund wrote:

If the posting does not succeed, do not retry it!

Sometimes, the newsreader doesn't get a clue that the posting did
succeed, so it tries until it gets such a clue.

Oh, I see. So the newsreader continues posting all by itself then?
Mine does not; it shows an error message when it fails (I really do not
know what determines that, but I suspect a return message that went
astray).

So I have control over the amount of posting. I just recently discovered
that postings which seem to fail actually succeeded, unbeknownst to
the newsreader...

Your mileage may vary, of course.

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 3, 2015 8:52 AM   in response to: Bo Berglund
Bo Berglund wrote:

Sometimes, the newsreader doesn't get a clue that the posting did
succeed, so it tries until it gets such a clue.

Oh, I see. So the newsreader continues posting all by itself then?

It tries to post until it gets a clue it succeeded. Then it removes the
post from its outbox. If it already got through, but the newsreader
doesn't know this, it keeps on trying.

--
Rudy Velthuis http://www.rvelthuis.de

"The compulsion to do good is an innate American trait. Only
North Americans seem to believe that they always should, may,
and actually can choose somebody with whom to share their
blessings. Ultimately this attitude leads to bombing people
into the acceptance of gifts."
-- Ivan Illich
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 4, 2015 12:12 AM   in response to: Bo Berglund
Bo

I think on this occasion you're right, but all too often the attempted post hasn't turned up after several days. I've just been conditioned to expect that posting failed :(

Roy Lambert

Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 4, 2015 1:54 AM   in response to: Roy Lambert
On Thu, 4 Jun 2015 00:12:09 -0700, Roy Lambert <roy at lybster dot me dot uk>
wrote:

Bo

I think on this occasion you're right, but all too often the attempted
post hasn't turned up after several days. I've just been conditioned
to expect that posting failed :(


Yeah,
and I was accustomed to discussion forums/newsservers "just working"
until about a year ago.
At that time this one broke badly and has not recovered since (for
example everything in the past history is lost in all forums/ngs).

But a little more than a year ago another forum I use heavily also
broke, this is the Microchip forum for embedded controller design (the
PIC line of controllers).
Here it is: http://www.microchip.com/forums/default.aspx
But contrary to the Embarcadero way Microchip has at least spent some
effort returning it to operational status and without losing past
history!

I do not get why EBT disregards their user base this way.....

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 4, 2015 4:37 AM   in response to: Bo Berglund
Bo

I do not get why EBT disregards their user base this way.....

They either believe, or have proven to themselves, that it is more profitable to do it this way.

Roy
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data  
  Posted: Jun 8, 2015 7:09 PM   in response to: Rudy Velthuis (...
Yes, I am such a "zealot". Strings are meant to contain text, nothing
else.

Text can be encoded in many ways; Unicode is just one encoding. Unicode is not the same thing as text.
Can you distinguish between the two concepts?

we would still have short strings.

What's the problem with short strings? If we had "String[n]" where each character takes two bytes, and the whole short string could
hold 65535 characters, that would be very nice. Of course, we should then also have "AnsiString[n]".

This is the current situation: String uses two bytes per character, while String[n] uses one byte per character. It's a complete joke. For a beginner, this is a serious ambiguity,
and they may conclude that the language is poorly designed.
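A fixed-capacity wide-character field, which is close to the two-byte String[n] described here and to the fixed AnsiChar arrays of the original question, can be sketched with Python's struct module. The record layout (an int32, a float32 and a 20-code-unit UTF-16LE name) and the helper names are hypothetical; in Delphi this would roughly correspond to a packed record with an array[0..19] of WideChar field. Because every record has the same size, individual records can still be located and rewritten in a file.

```python
import struct

NAME_UNITS = 20  # capacity in UTF-16 code units
RECORD = struct.Struct(f"<if{NAME_UNITS * 2}s")  # little-endian, no padding: 4 + 4 + 40 bytes


def pack_record(ident: int, value: float, name: str) -> bytes:
    encoded = name.encode("utf-16-le")
    if len(encoded) > NAME_UNITS * 2:
        raise ValueError("name does not fit in the fixed-size field")
    # struct pads the 's' field with zero bytes, i.e. #0 WideChars.
    return RECORD.pack(ident, value, encoded)


def unpack_record(data: bytes):
    ident, value, raw_name = RECORD.unpack(data)
    name = raw_name.decode("utf-16-le").split("\x00", 1)[0]  # stop at the first #0
    return ident, value, name


rec = pack_record(7, 1.5, "東京 Tokyo")
print(RECORD.size, unpack_record(rec))  # 48 (7, 1.5, '東京 Tokyo')
```

Rejecting oversized names, rather than truncating them, avoids cutting a UTF-16 surrogate pair in half.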

Blame yourself for not thinking about the consequences of using a hack,
in all those many years you had the opportunity to change things.

Hack? Suppose one day the world replaced Unicode with another encoding method.
Then it would be hard to make Delphi use the new encoding without a hack.
So, please do not confuse encoding with text. You should take a moment to think about the opinions of others.
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data  
  Posted: Jun 8, 2015 11:48 PM   in response to: wenjie zhou
wenjie zhou wrote:

Yes, I am such a "zealot". Strings are meant to contain text,
nothing else.

Text can be encoded in many ways; Unicode is just one encoding.

Sure. Not sure how that matters, though. Strings contain text, not
binary data. The fact that strings contain an encoding is exactly why
they should not be used for binary.

--
Rudy Velthuis http://www.rvelthuis.de

"Roses are #FF0000
Violets are #0000FF
All my base are belong to you!"
-- Geek Valentine T-shirt at ThinkGeek
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data  
  Posted: Jun 9, 2015 10:58 PM   in response to: wenjie zhou
wenjie zhou wrote:

Text can be encoded in many ways; Unicode is just one encoding.
Unicode is not the same thing as text. Can you distinguish between
the two concepts?

Finally you wrote something correctly. You wrote text and not binary
data. Yes, strings are meant to contain text data and not binary.
Embarcadero did a great job of supporting it.

What's the problem with short strings? If we had "String[n]" where
each character takes two bytes, and the whole short string could hold
65535 characters, that would be very nice. Of course, we should then
also have "AnsiString[n]".

I agree they should deprecate short strings. Unfortunately, short
strings are widely used, and they made a choice to leave them in the
language. I can't blame them. Fortunately, short strings can usually
be replaced without a headache (except when used in packed records).

This is the current situation: String uses two bytes per character,
while String[n] uses one byte per character. It's a complete joke. For
a beginner, this is a serious ambiguity, and they may conclude that the
language is poorly designed.

Embarcadero has addressed this in the mobile compiler. You should know
that most of the forum users didn't celebrate that decision. So now you
are advocating removing ANSI strings from the language completely? No,
it's not a joke; you should read some of the threads in the
non-technical forum on this topic regarding the mobile compiler.
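The String versus String[n] difference described above can be confirmed in a few lines. This is a minimal sketch, assuming a Delphi 2009 or later Windows desktop compiler, where Char is WideChar and ShortString is still supported. The program and type names are illustrative only:

```delphi
program CharSizes;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  TShort10 = string[10]; // ShortString: 1 length byte + 10 AnsiChar bytes

begin
  Assert(SizeOf(Char) = 2);      // Char = WideChar (one UTF-16 code unit)
  Assert(SizeOf(AnsiChar) = 1);
  Assert(SizeOf(TShort10) = 11);
  Writeln('Char=', SizeOf(Char), ' AnsiChar=', SizeOf(AnsiChar),
    ' string[10]=', SizeOf(TShort10));
end.
```

Compiled that way, it should print `Char=2 AnsiChar=1 string[10]=11`. The extra byte in `string[10]` is the length prefix.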

Hack? If one day the world replaces Unicode with another encoding,
it will be hard for Delphi to hack its way to the new encoding. So
please do not confuse encoding with text. You should take a second
and think about the opinions of others.

Please don't forget that there were changes. In my case, I first had
to store all text data, regardless of the code page, in (ansi)strings
(due to the lack of Unicode support for Informix databases). Of course,
with the old ansi string it was not easy, as you were in charge of the
code page conversions. There was no built-in support to easily
convert the text to Unicode and back. The next step was to move the
text to Unicode using WideStrings. Now, with the new (Unicode)string
type, life is much easier. Of course, these shiny new features
required some code rewrites, mostly for the dot-matrix printer support,
as the data you send must now be treated as binary data.
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data  
  Posted: Jun 10, 2015 8:30 PM   in response to: Lajos Juhasz in response to: Lajos Juhasz
Finally you wrote something correct. You wrote text, not binary
data. Yes, strings are meant to contain text data, not binary.
Embarcadero did a great job of supporting that.

First, what is text? And what is binary data?
Before UTF-8, ANSI was text and UTF-8 was binary data, wasn't it? What is binary data today may be text tomorrow.
In the old days we could use AnsiString to store UTF-8 encoded strings, ANSI encoded strings, even GB2312 and so on.
Perhaps my application needs a custom text encoding. You may call it binary, but I think it is text data.
Embarcadero did a bad job by preventing this kind of diversity.
I know that Americans mostly pursue freedom. Why add so many restrictions? Where is the freedom?

I agree they should deprecate short strings. Unfortunately short
strings are widely used and they made a choice to leave them in the
language. I can't blame them. Fortunately short strings are not widely
used and can be replaced without a headache (except when used in packed
records).

I can't understand why you think they should deprecate short strings.

Short strings are very useful for network applications. They're perfect. I don't know what could replace short strings in records.
Do not tell me it is an array of bytes.

So now you are advocating removing ANSI strings from the language completely?

No, I do not advocate removing ANSI strings from the language completely.
I want to have AnsiString[n] and UnicodeString[n] at the same time. And in XE, String[n] = UnicodeString[n].
UnicodeString[n] could hold 65535 (Word) characters, and each character has two bytes. That's it.

Now with the new (Unicode)string type the life is much easier.

We can have many ways to make life much easier. But (Unicode)string is a bad way.
If you are interested, we can discuss this topic further.
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data  
  Posted: Jun 11, 2015 7:33 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

First, what is text? And what is binary data?

Text is a sequence of characters written in some language. Now text can
be represented using ANSI code pages or using Unicode.

Before UTF-8, ANSI was text and UTF-8 was binary data, wasn't it?
What is binary data today may be text tomorrow. In the old days we
could use AnsiString to store UTF-8 encoded strings, ANSI encoded
strings, even GB2312 and so on. Perhaps my application needs a custom
text encoding. You may call it binary, but I think it is text data.
Embarcadero did a bad job by preventing this kind of diversity. I know
that Americans mostly pursue freedom. Why add so many restrictions?
Where is the freedom?

By making AnsiString code-page aware, Embarcadero tried to make it even
simpler to assign AnsiStrings in different code pages to UnicodeString
and back. This works perfectly when you know which code page the input
data uses. If you have to handle multiple code pages, however, the job
can be harder than before. If your input data can be in multiple code
pages, then yes, you have to treat it as binary data: parse every part
of it, detect which code page it was written in, and convert it to
Unicode using the TEncoding class. This can be more

I can't understand why you think they should deprecate short strings.

ShortString is a leftover from the old days (Pascal or Delphi 1),
when strings were more binary containers than real strings. It's not
code-page aware and just brings confusion to the table.

Short strings are very useful for network applications. They're
perfect. I don't know what could replace short strings in records. Do
not tell me it is an array of bytes.

If the application stores text in a single language, then yes,
ShortString is useful. However, nowadays we live in a Unicode world.
Everyone would like to use the full alphabet. How could you handle
Chinese, Russian, etc. characters in a ShortString? You can't; you
would have to add an extra field describing the code page in which the
data was written. And once you have a code page in addition to the
string, you are using the ShortString as an array of bytes that must be
converted to Unicode in order to display it.
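For the packed records this thread started with, the "extra field describing the code page" could look like the following. This is a minimal sketch, not a recommended file format. It assumes a 40-byte text field and converts with TEncoding.GetEncoding. The names TTaggedText, SetTaggedText and GetTaggedText are hypothetical. It also assumes code page 936 (GBK) is available, as it normally is on Windows.

```delphi
program TaggedFixedText;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils; // TEncoding, TBytes, Exception

type
  // Hypothetical fixed-size text field: raw bytes plus the code page they use
  TTaggedText = packed record
    CodePage: Word;
    Len: Byte;                   // number of bytes of Data in use
    Data: array[0..39] of Byte;
  end;

procedure SetTaggedText(var T: TTaggedText; const S: string; ACodePage: Word);
var
  Enc: TEncoding;
  Bytes: TBytes;
begin
  Enc := TEncoding.GetEncoding(ACodePage); // new instance, freed below
  try
    Bytes := Enc.GetBytes(S);
  finally
    Enc.Free;
  end;
  if Length(Bytes) > Length(T.Data) then
    raise Exception.Create('Text does not fit in the fixed-size field');
  FillChar(T.Data, SizeOf(T.Data), 0);
  if Length(Bytes) > 0 then
    Move(Bytes[0], T.Data[0], Length(Bytes));
  T.Len := Byte(Length(Bytes));
  T.CodePage := ACodePage;
end;

function GetTaggedText(const T: TTaggedText): string;
var
  Enc: TEncoding;
  Bytes: TBytes;
begin
  SetLength(Bytes, T.Len);
  if T.Len > 0 then
    Move(T.Data[0], Bytes[0], T.Len);
  Enc := TEncoding.GetEncoding(T.CodePage); // decode with the stored code page
  try
    Result := Enc.GetString(Bytes);
  finally
    Enc.Free;
  end;
end;

var
  T: TTaggedText;
begin
  SetTaggedText(T, #$4E2D#$6587, 936);   // two Chinese characters in GBK
  Assert(SizeOf(TTaggedText) = 43);      // 2 + 1 + 40, no padding
  Assert(T.Len = 4);                     // two bytes per character in GBK
  Assert(GetTaggedText(T) = #$4E2D#$6587);
  Writeln('Stored ', T.Len, ' bytes in code page ', T.CodePage);
end.
```

Every field has a fixed size, so records like this can still be memory-mapped or updated in place. The cost is that the reader must convert the bytes back to a string using the stored code page.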

UnicodeString[n] could hold 65535 (Word) characters, and each character
has two bytes. That's it.

Please don't forget that some characters can be represented only by
surrogate pairs. So how much memory should be allocated for a
UnicodeString[n]? Also note that a record containing a
UnicodeString[n] would not be binary compatible with the old version of
the record. A truly fixed-length short string could be achieved only by
encoding the content in UTF-32.
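Both points can be illustrated in a short program. This sketch assumes a Delphi 2009 or later desktop compiler. Length counts UTF-16 code units, so a single character outside the Basic Multilingual Plane counts as two. A UTF-32 field, by contrast, gives every code point exactly four bytes. The record TUtf32Field and its 20-character capacity are hypothetical.

```delphi
program SurrogatePairs;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical fixed-width field: up to 20 code points stored as UTF-32,
  // so every character occupies exactly 4 bytes
  TUtf32Field = packed record
    Count: Word;
    Data: array[0..19] of UCS4Char;
  end;

var
  S: string;
  F: TUtf32Field;
begin
  S := #$D83D#$DE00;              // U+1F600 as a UTF-16 surrogate pair
  Assert(Length(S) = 2);          // Length counts UTF-16 code units

  // Combine the high and low surrogate into a single code point
  F.Count := 1;
  F.Data[0] := ((Ord(S[1]) - $D800) shl 10) + (Ord(S[2]) - $DC00) + $10000;

  Assert(F.Data[0] = $1F600);
  Assert(SizeOf(TUtf32Field) = 82); // 2 + 20 * 4 bytes
  Writeln(Length(S), ' code units -> code point ', F.Data[0]);
end.
```

It should print `2 code units -> code point 128512`, since 128512 is $1F600.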


Now with the new (Unicode)string type the life is much easier.

We can have many ways to make life much easier. But (Unicode)
string is a bad way. If you are interested, we can discuss this topic
further.

Every string representation has its positive and negative sides. None
of the representations is perfect. While UTF-8 tends to be the most
compact, UTF-32 on the other hand would be a fixed-length encoding.
Roy Lambert

Posts: 1,063
Registered: 8/7/01
Re: Unicode strings in structured data  
  Posted: Jun 11, 2015 8:18 AM   in response to: Lajos Juhasz in response to: Lajos Juhasz
Lajos

First, what is text? And what is binary data?

Text is a sequence of characters written in some language. Now text can
be represented using ANSI code pages or using Unicode.

You started out well here, but only really succeeded in making things less clear. You chose to use the word "characters", which implies an alphabet and excludes languages such as Chinese which do not use an alphabet. There are also a number of languages (admittedly the ones I know of are dead) which do use characters and have an alphabet but aren't covered by current Unicode (e.g. Ugarit, which used cuneiform and was possibly the first alphabetically based system). Then we can add in smileys, which only by the deranged standards of today's youf can be considered a character or part of a language.

ShortString is a leftover from the old days (Pascal or Delphi 1),
when strings were more binary containers than real strings. It's not
code-page aware and just brings confusion to the table.

So at that point it was alright to store binary data in a string?

Everyone would like to use the full alphabet.

Please don't assume everyone shares your opinion. I may be in a minority; however, the only "full" alphabet I'm interested in is the English one.

Roy Lambert
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data  
  Posted: Jun 11, 2015 9:33 PM   in response to: Lajos Juhasz in response to: Lajos Juhasz
A character is represented by binary data, no matter what language you use. We can say that a character is binary data inside a computer. Do we know all the languages in the universe? Obviously not.
So languages we don't know yet may need arbitrary binary data to be expressed. That is why I like STRING in old Delphi.
We must leave enough space for the unknown world.
Once again, I want to emphasize that Unicode is one encoding for strings, not the only one in the world. Text and Unicode are
not the same thing. Forcing them together will create huge obstacles to future expansion.

By making AnsiString code-page aware, Embarcadero tried to make it even
simpler to assign AnsiStrings in different code pages to UnicodeString and back.

If the RTL supplied some code, things could be simpler too. But a built-in type should not do such a thing.
For example:
<code>
type
  UTF8String = record
  private
    FCodePage: Integer;
    FByteString: AnsiString;
  public
    class operator Implicit(Value: AnsiString): UTF8String;
    class operator Implicit(Value: WideString): UTF8String;
    class operator Implicit(Value: UTF8String): AnsiString;
    class operator Implicit(Value: UTF8String): WideString;
  end;
</code>
You see, UTF8String can do the conversion itself, and AnsiString and WideString would not need to care about code pages.

And I am Chinese. I do not like the full alphabet either, and neither does Roy Lambert.
It is like an automatic camera versus an SLR camera. I do not need an automatic camera. I want to produce a high-quality product.
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 15, 2015 9:21 AM   in response to: wenjie zhou in response to: wenjie zhou
A character is represented by binary data, no matter what language
you use. We can say that a character is binary data inside a computer.

The characters are binary data if you use assembler and some other low
level languages. However, modern Unicode-enabled languages understand
text really well. Unfortunately, Unicode doesn't make your job
easier. Even in a Unicode-enabled application it's not an easy task
to support every language. For example, finding the upper or lower case
of a Unicode code point can be ambiguous (it is language dependent).

Do we know all the languages in the universe? Obviously not. So
languages we don't know yet may need arbitrary binary data to be
expressed. That is why I like STRING in old Delphi. We must leave
enough space for the unknown world.

This is already done. There is quite enough space in the Unicode table
for future languages.

If the RTL supplied some code, things could be simpler too. But a
built-in type should not do such a thing. For example:
<code>
type
  UTF8String = record
  private
    FCodePage: Integer;
    FByteString: AnsiString;
  public
    class operator Implicit(Value: AnsiString): UTF8String;
    class operator Implicit(Value: WideString): UTF8String;
    class operator Implicit(Value: UTF8String): AnsiString;
    class operator Implicit(Value: UTF8String): WideString;
  end;
</code>
You see, UTF8String can do the conversion itself, and AnsiString and
WideString would not need to care about code pages.

Not really. How would you handle concatenating a Chinese ANSI
string with a Greek one without writing additional code?

And I am Chinese. I do not like the full alphabet either, and neither
does Roy Lambert. It is like an automatic camera versus an SLR camera. I
do not need an automatic camera. I want to produce a high-quality product.

Maybe you don't, but your program certainly does. Windows controls are
Unicode, so Delphi must communicate with the Windows API using PChar. I
know that most Windows API functions still have an ANSI version
(unfortunately most of them just convert the input data to Unicode and
execute the Unicode version of the function). Code can still use ANSI
strings for input data from a device, file or database and send it
back to any device, file or database. However, ANSI input data is less
common nowadays.
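For the Chinese/Greek concatenation question above, one approach in current Delphi is to let each code-page-aware AnsiString widen itself to UnicodeString, using its own code page, before joining. This is a minimal sketch. It assumes Windows with code pages 936 (Simplified Chinese) and 1253 (Greek) installed. GbkString and GreekString are hypothetical type names, declared the same way the RTL declares UTF8String (`type AnsiString(CodePage)`).

```delphi
program MixedCodePages;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  GbkString = type AnsiString(936);    // Simplified Chinese (GBK)
  GreekString = type AnsiString(1253); // Windows Greek

var
  Chinese, Greek, Combined: string;
  A: GbkString;
  G: GreekString;
begin
  Chinese := #$4E2D#$6587;             // two CJK characters
  Greek := #$03B1#$03B2;               // alpha, beta
  A := GbkString(Chinese);             // converted with code page 936
  G := GreekString(Greek);             // converted with code page 1253
  Assert(Length(A) = 4);               // Length counts bytes here
  Assert(Length(G) = 2);

  // Widen each string with its own code page, then join as UTF-16
  Combined := string(A) + string(G);
  Assert(Combined = Chinese + Greek);
  Writeln(Length(Combined), ' UTF-16 code units');
end.
```

Joining A and G directly as AnsiStrings would force a single code page on the result. Widening first keeps both scripts intact.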
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 15, 2015 9:11 PM   in response to: Lajos Juhasz in response to: Lajos Juhasz
The characters are binary data if you use assembler and some other low
level languages.
Unfortunately, Delphi includes Object Pascal and BASM, and we can use assembler in Delphi. I think this is the biggest difference between us.
I will not look down on assembler. I think assembler is important. And you want to discard it.

Now think about the current solution. Whenever one string is assigned to another, the compiled code has to compare the code pages to decide how to convert them.
There is no doubt that in 90% of cases we do not need such a comparison. And this is not a rigorous scientific attitude.

Think about RTTI. Likewise, in 90% of cases we do not need RTTI, but we have to include the useless information, and the .exe has become so bloated.
Think about lockable objects. In 90% of cases we do not need to lock the object, but we have to include the hidden field in every object.

This attitude is very harmful.

This is already done. There is quite enough space in the Unicode table
for future languages.
People always believe that they have found all the solutions, but the truth is always the opposite.
For example, suppose we find an alien civilization that uses another string encoding.
They have a lot of applications and refuse to use Unicode.
How do we communicate with them? Convert the current UnicodeString to a byte array?

Not really. How would you handle concatenating a Chinese ANSI
string with a Greek one without writing additional code?

I don't know what you mean. UTF8String or UTF16String would be the same as UnicodeString (in the current solution), so I do not think this is a problem.

ANSI input data is less common.
Less? A lot of equipment still uses ANSI and does not need Unicode.
Not all devices need to support a variety of languages, e.g. routers, hubs, SCMs. And in network communications, in many cases we do not need Unicode either.
Please do not think Unicode is a silver bullet.
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 15, 2015 11:26 PM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

The characters are binary data if you use assembler and some other
low level languages.
Unfortunately, Delphi includes Object Pascal and BASM.

Unfortunately? Huh?

--
Rudy Velthuis http://www.rvelthuis.de

"In all affairs, it's a healthy thing now and then to hang a
question mark on the things you have long taken for granted."
-- Bertrand Russell
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 1:10 AM   in response to: Rudy Velthuis (... in response to: Rudy Velthuis (...

Unfortunately? Huh?

I mean Delphi has BASM and can use assembler. Unfortunately, that makes Delphi one of what he called "some other low level languages".
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 7:12 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:


Unfortunately? Huh?

I mean Delphi has BASM and can use assembler. Unfortunately, that
makes Delphi one of what he called "some other low level languages".

No, I meant assembler and other low-level languages. Yes, assembler is
low level.
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 1:03 AM   in response to: wenjie zhou in response to: wenjie zhou
On Mon, 15 Jun 2015 21:11:07 -0700, wenjie zhou <> wrote:

For example, suppose we find an alien civilization that uses another string encoding.
They have a lot of applications and refuse to use Unicode.
How do we communicate with them? Convert the current UnicodeString to a byte array?

They probably would not use 8 bit bytes anyway. A byte is just a
randomly selected size of a bit array to represent one unit of word
organization...

A civilization on a planet 100 light years away would with almost
certainty use something completely different if they at all store data
in any way resembling what we do. Maybe they even use a trinary
concept?

Aliens are not a good argument..

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 1:23 AM   in response to: Bo Berglund in response to: Bo Berglund
Aliens are not a good argument..

Yes, I know aliens are not a good argument. I just want to explain that Unicode is not a panacea,
and we should not treat it as a panacea for expressing text.
Just think about this:

Suppose we had ByteString and WordString types that hold only a reference count and binary data, with no code page.
They would be primitive types.
Furthermore, we would have some record types:

[code]
UTF16_String = record
private
FCodePage: Integer;
FStringData: WordString;
public
// ... many implicit conversions can be defined here ...
end;

UTF8_String = record
end;

Ansi_String = record
end;

//You can even define this
UTF32_String = record
end;

[/code]

In most cases we could then use UTF16_String instead of String.

For example:

[code]
var
S: UTF16_String;
begin
ShowMessage(S); // It's OK, because UTF16_String has a conversion to UnicodeString.
end;
[/code]
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 17, 2015 1:25 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

Aliens are not a good argument..

Yes, I know aliens are not a good argument. I just want to explain
that Unicode is not a panacea, and we should not treat it as a
panacea for expressing text. Just think about this:

Suppose we had ByteString and WordString types that hold only a
reference count and binary data, with no code page. They would be primitive types.

You already have those. They are called dynamic arrays. And exactly
those are the best way to manage binary data.
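When dynamic arrays are used for binary data, one difference from strings matters. Both are reference counted, but dynamic arrays are not copy-on-write, so a write through one reference is visible through every other. This minimal sketch uses TBytes, plus TArray<Word> standing in for the proposed "WordString". It assumes Delphi 2009 or later, with namespaced unit names (XE2 or later).

```delphi
program DynArraySharing;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

uses
  System.SysUtils; // TBytes

var
  A, B, C: TBytes;
  Words: TArray<Word>;
begin
  SetLength(A, 4);
  FillChar(A[0], Length(A), 0);
  A[0] := $FF;

  B := A;           // reference counted: A and B share one buffer
  C := Copy(A);     // Copy makes an independent array
  B[1] := $7F;      // no copy-on-write: the change is visible through A
  C[2] := $01;      // does not affect A

  Assert(A[1] = $7F);
  Assert(A[2] = 0);
  Assert(Length(C) = 4);

  SetLength(Words, 3); // a code-page-free "word string"
  Words[0] := $4E2D;
  Assert(Length(Words) = 3);
  Writeln('A[1]=', A[1], ' A[2]=', A[2], ' Words[0]=', Words[0]);
end.
```

It should print `A[1]=127 A[2]=0 Words[0]=20013`. Use Copy whenever an independent buffer is needed.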

--
Rudy Velthuis http://www.rvelthuis.de

"The internet is not something you just dump something on. It's
not a truck. It's a series of tubes!"
-- Sen. Ted Stevens, chairman of the United States Senate
Committee on Commerce, Science and Transportation

wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 17, 2015 7:34 PM   in response to: Rudy Velthuis (... in response to: Rudy Velthuis (...
You already have those. They are called dynamic arrays. And exactly
those are the best way to manage binary data.

Dynamic arrays can represent anything. They can even be used to express integers, floats, Int64, Int128 and so on. Dynamic arrays can even represent a file or a memory stream.
Should we use dynamic arrays to replace Integer, Float and so on? Obviously not.

Old-style AnsiString is a wonderful thing. It only cares about a few basic things:

(1) Each element is one byte ==> SomeText[x] is one byte
(2) Element count ==> SomeText[0]
(3) Reference count
(4) Scanning memory byte by byte ==> Pos('abc', SomeText);
(5) Easy string joining ==> copy memory and change SomeText[0]

It does not care about encoding. That's a very simple principle, very elegant and clear.

We've had this problem: we cannot easily get the character count of a UTF-8 string.
Length only returns bytes, not characters, for a UTF-8 string.
If a UTF-8 string is really text, then Length should return the character count, not the byte count.
The same problem exists for UTF-16 (UnicodeString). You see, the semantics here are very confusing.
Instead of complicating the problem, it would be better not to give UnicodeString a code page
and just express a two-byte string. That is simple and easy to understand.
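The byte-versus-character distinction discussed above is easy to see with two CJK characters. Each takes one UTF-16 code unit but three bytes in UTF-8. This is a minimal sketch, assuming Delphi 2009 or later, where UTF8String is a code-page-aware AnsiString.

```delphi
program LengthUnits;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  U: string;
  U8: UTF8String;
begin
  U := #$4E2D#$6587;        // two CJK characters
  U8 := UTF8String(U);      // explicit conversion to UTF-8
  Assert(Length(U) = 2);    // UTF-16 code units
  Assert(Length(U8) = 6);   // bytes: 3 per character in UTF-8
  Writeln(Length(U), ' ', Length(U8));
end.
```

It should print `2 6`. In both cases Length counts storage units (code units or bytes), not user-visible characters.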
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data [Edit] [Edit]  
  Posted: Jun 18, 2015 12:52 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

You already have those. They are called dynamic arrays. And exactly
those are the best way to manage binary data.

Dynamic arrays can represent anything. They can even be used to express
integers, floats, Int64, Int128 and so on. Dynamic arrays can even represent
a file or a memory stream. Should we use dynamic arrays to replace Integer,
Float and so on? Obviously not.

That's idiotic and doesn't make any sense.

A dynamic array is, however, very well suited to hold multiples of
the same kind, like your byte or word strings. They are reference
counted too, and you can dynamically change their sizes.

--
Rudy Velthuis http://www.rvelthuis.de

"The artist is nothing without the gift, but the gift is nothing
without work." -- Emile Zola (1840-1902)
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit] [Edit]  
  Posted: Jun 18, 2015 7:08 PM   in response to: Rudy Velthuis (... in response to: Rudy Velthuis (...
Yes. They are reference counted and can change their size.
But what about the string routines? Pos(), LeftStr(), Trim() and so on. A dynamic array is only a workaround. It is not text or a string.

In addition,
I would like to quote Lajos's words again:
"Text is a sequence of characters written in some language. Now text can be represented using ANSI code pages or using Unicode."
A code page applies to characters, and in theory a TEXT (string) can contain characters from a variety of code pages.
Based on this analysis, every character should have a code page, and the type WideChar should have a code page too.

All the reasons above for why String should have a code page can also prove that WideChar should have one.
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data [Edit] [Edit]  
  Posted: Jun 19, 2015 2:02 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

Yes. They are reference counted and can change their size.
But what about the string routines? Pos(), LeftStr(), Trim() and so on.
A dynamic array is only a workaround. It is not text or a string.

No, text is text and best stored in a string. But XE8 does have some of
the functions you mention.

What should Trim remove on a word array? What is LeftStr on binary data?

--
Rudy Velthuis http://www.rvelthuis.de

Cann's (or Allen's) Axiom: When all else fails, read the
instructions.
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data [Edit] [Edit]  
  Posted: Jun 23, 2015 1:47 AM   in response to: Rudy Velthuis (... in response to: Rudy Velthuis (...
6 duplicates....

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572
Rudy Velthuis (...


Posts: 7,731
Registered: 9/22/99
Re: Unicode strings in structured data [Edit] [Edit]  
  Posted: Jun 23, 2015 5:30 AM   in response to: Bo Berglund in response to: Bo Berglund
Bo Berglund wrote:

6 duplicates....

Nope. Yesterday, I cancelled some of them.

--
Rudy Velthuis http://www.rvelthuis.de

"I'm so poor I can't even pay attention." -- Unknown
Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 7:23 AM   in response to: wenjie zhou in response to: wenjie zhou
wenjie zhou wrote:

assembler is important. And you want to discard it.

No, I never wrote that assembler should be removed from the language. I
just wrote that there are low-level languages without a string data
type. In those languages you handle strings as binary data (for
example, an array of bytes).


Now think about the current solution. Whenever one string is assigned
to another, the compiled code has to compare the code pages to decide how
to convert them. There is no doubt that in 90% of cases we do not
need such a comparison. And this is not a rigorous scientific attitude.

This really depends on your code. You wrote that you're using only
ASCII, which means that in your code it doesn't matter. I almost never
use ANSI strings, but when I need them they must be code-page aware in
order to be converted to Unicode. I know I could go back, handle it
the old-fashioned way and have another variable to hold the code
page of the string. However, having two variables for one string can
introduce errors.
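The RTL routines StringCodePage and SetCodePage show the point about not needing a second variable: a code-page-aware string carries its own code page. This is a minimal sketch, assuming Delphi 2009 or later.

```delphi
program CodePageTag;

{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

var
  S: string;
  U8: UTF8String;
  R: RawByteString;
begin
  S := 'abc';
  U8 := UTF8String(S);               // payload is tagged with code page 65001
  Assert(StringCodePage(U8) = 65001);

  R := U8;                           // RawByteString keeps the incoming code page
  Assert(StringCodePage(R) = 65001);

  SetCodePage(R, 1252, False);       // relabel only; the bytes are not converted
  Assert(StringCodePage(R) = 1252);
  Assert(Length(R) = 3);
  Writeln(StringCodePage(R));
end.
```

SetCodePage with Convert = False only changes the label. It is safe here only because plain ASCII bytes are identical in UTF-8 and code page 1252.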

Think about RTTI. Likewise, in 90% of cases we do
not need RTTI, but we have to include the useless information, and
the .exe has become so bloated.

I agree that most of the time the enhanced RTTI bloats the exe.
Unfortunately, using the enhanced RTTI is still on my to-do list.
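For reference, the amount of extended RTTI the compiler emits can be reduced with compiler directives. This is a sketch, assuming Delphi 2010 or later. These directives only affect code compiled while they are in effect, so a common approach is an include file used by every unit. Precompiled RTL/VCL units keep their own RTTI, so check the documentation for your version before relying on any size savings.

```delphi
// Placed before the uses clause of a unit or the project (.dpr) file
{$WEAKLINKRTTI ON}
{$RTTI EXPLICIT METHODS([]) PROPERTIES([]) FIELDS([])}
```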

Think about lockable objects. In 90%
of cases we do not need to lock the object, but we have to include
the hidden field in every object.

This attitude is very harmful.

I disagree. It's a good thing that some objects are ready for multiple
scenarios.
wenjie zhou

Posts: 424
Registered: 6/28/02
Re: Unicode strings in structured data [Edit]  
  Posted: Jun 16, 2015 6:56 PM   in response to: Lajos Juhasz in response to: Lajos Juhasz
However, having two variables for one string can introduce errors.
UTF8_String, UTF16_String, etc. would be supplied by the VCL, not by yourself.
The two variables are private. How could you introduce errors?

I agree that most of the time the enhanced RTTI bloats the exe.
Unfortunately, using the enhanced RTTI is still on my to-do list.

Nobody thinks RTTI is bad. The main issue is giving developers a choice.
If RTTI were stored in a separate file, users could choose whether to link it. Wouldn't that be better?

It is the same with lockable objects.
Think about this:

TSomeObject = class lock
end;

We could supply the keyword lock and let developers judge which classes should have the hidden field, instead of every class having it.
That's not difficult. It would not waste memory, and it would not break memory-layout compatibility with C++ objects, would it?

I have done some work exporting C++ objects from a DLL so that Delphi can use them.

Lajos Juhasz

Posts: 801
Registered: 3/14/14
Re: Unicode strings in structured data
  Posted: Jun 17, 2015 10:19 PM   in response to: wenjie zhou
wenjie zhou wrote:

Assembler is important, and you want to discard it.

No, I never wrote that assembler should be removed from the language. I
just wrote that there are low-level languages without a built-in
string data type. In those languages you handle strings
as binary data (for example, an array of bytes).
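Applied to the original question (fixed-size records that can be updated in place), treating Unicode text as binary data might look like the following Delphi sketch. It assumes UTF-16 storage in a zero-padded `WideChar` array of fixed capacity. `NameLen`, `TItem`, `SetName` and `GetName` are hypothetical names, and a `TMemoryStream` stands in for the file.

```pascal
program FixedRecordDemo;

{$APPTYPE CONSOLE}

uses
  System.Classes;

const
  NameLen = 32;  // hypothetical capacity, in UTF-16 code units

type
  // Fixed size: record I lives at offset I * SizeOf(TItem) in the stream
  TItem = packed record
    Id: Integer;
    Value: Single;
    Name: array[0..NameLen - 1] of WideChar;  // UTF-16 text, zero-padded
  end;

// Copies S into the fixed buffer, truncating if too long
// (truncation may split a surrogate pair; real code should check for that)
procedure SetName(var Item: TItem; const S: string);
var
  N: Integer;
begin
  FillChar(Item.Name, SizeOf(Item.Name), 0);
  N := Length(S);
  if N > NameLen then
    N := NameLen;
  if N > 0 then
    Move(S[1], Item.Name[0], N * SizeOf(WideChar));
end;

// Reads the text back, stopping at the first #0 or the end of the buffer
function GetName(const Item: TItem): string;
var
  N: Integer;
begin
  N := 0;
  while (N < NameLen) and (Item.Name[N] <> #0) do
    Inc(N);
  SetString(Result, PWideChar(@Item.Name[0]), N);
end;

var
  Items: array[0..2] of TItem;
  Item: TItem;
  Stream: TMemoryStream;
  I: Integer;
begin
  Stream := TMemoryStream.Create;
  try
    for I := 0 to 2 do
    begin
      FillChar(Items[I], SizeOf(TItem), 0);
      Items[I].Id := I;
      SetName(Items[I], 'Item');
      Stream.WriteBuffer(Items[I], SizeOf(TItem));
    end;

    // Update record 1 in place without touching records 0 and 2
    Item := Items[1];
    SetName(Item, 'Tokyo ' + #$6771#$4EAC);  // "Tokyo" plus two CJK characters
    Stream.Position := 1 * SizeOf(TItem);
    Stream.WriteBuffer(Item, SizeOf(TItem));

    // Read record 1 back from its fixed offset
    Stream.Position := 1 * SizeOf(TItem);
    Stream.ReadBuffer(Item, SizeOf(TItem));
    Writeln(SizeOf(TItem), ' ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

The trade-off is the usual one for fixed-length fields: every record reserves the full capacity, and text longer than `NameLen` code units is cut off.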


Now, just think about the current solution. Whenever one string is assigned to
another, the compiled code has to compare the code pages to decide how
to convert them. There is no doubt that in 90% of cases we do not
need such a comparison. This is not a rigorous, scientific attitude.

This really depends on your code. You wrote that you're using only
ASCII, which means that in your code it doesn't matter. I almost never use
the ANSI strings, but when I need them they must be code-page aware in
order to be convertible to Unicode. I know I could go back and
handle it the old-fashioned way, with another variable to hold the code
page for the string. However, having two variables for one string can
introduce errors.
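Both posters are describing the same mechanism. In Delphi 2009 and later, an AnsiString carries its code page inside the string itself, so no second variable is needed. When the declared code pages of two AnsiString types differ, an assignment between them goes through a conversion. The following is a minimal sketch. `Win1252String` is a hypothetical type name chosen for illustration.

```pascal
program CodePageDemo;

{$APPTYPE CONSOLE}

type
  // An AnsiString type whose declared code page is Windows-1252
  Win1252String = type AnsiString(1252);

var
  U: string;         // UnicodeString, UTF-16
  W: Win1252String;
  U8: UTF8String;    // AnsiString with code page 65001 (CP_UTF8)
  R: RawByteString;  // accepts any code page without converting
begin
  U := 'Caf' + #$00E9;   // "Café", built without relying on the source file encoding
  W := U;                // UTF-16 -> Windows-1252
  U8 := W;               // code pages differ (1252 vs 65001), so the RTL converts
  R := W;                // RawByteString keeps the incoming code page

  // The code page travels with each string value
  Writeln(StringCodePage(W), ' ', StringCodePage(U8), ' ', StringCodePage(R));
  // Length counts bytes here: the accented e is one byte in 1252 but two in UTF-8
  Writeln(Length(W), ' ', Length(U8));
end.
```

The compiler will warn about the implicit string casts. Those warnings mark exactly the places where the code-page comparison and conversion happen.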

Think about RTTI. The same thing happens there: in 90% of cases we do
not need RTTI, but we have to include the useless information, and
the .exe has become so bloated.

I agree that most of the time the enhanced RTTI bloats the exe.
Unfortunately, using the enhanced RTTI is still on my to-do list.

Think about lockable objects. In 90%
of cases we do not need to lock the object, but we have to include
the hidden field in every object.

The attitude is very harmful.

I disagree. It's a good thing that some objects are ready for multiple
scenarios.
Bo Berglund

Posts: 757
Registered: 10/23/02
Re: Unicode strings in structured data  
  Posted: Jun 7, 2015 9:37 AM   in response to: Jens Munk
This thread now contains 54 messages, not all of which are really
addressing the issue...

---
Bo Berglund
Sweden & Texas
Newsreader: Forte Free Agent 1.92/32.572