Thread: Best Practices for Data Structures


This question is answered. Helpful answers available: 0. Correct answers available: 1.


Replies: 8 - Last Post: Feb 29, 2016 3:59 PM Last Post By: Dan Barclay
Richard Zarr

Posts: 74
Registered: 7/1/98
Best Practices for Data Structures
  Posted: Feb 15, 2016 1:42 PM
We are migrating from BDS2006 to RAD Studio 10 Seattle. We are looking at our original data structures, and the biggest issue we are having is with Unicode. Our packet class uses a string type as the payload... that simply will not work now that string is Unicode. So does anyone have any ideas on what we could use to replace the original string type? See the example below:

TMyPacket  = record
 ID : int64;
 PackType : MyPacketType;
 OtherStuff : boolean;
 Payload : string;
 CRC : integer;
end;
Peter Guth

Posts: 28
Registered: 2/11/05
Re: Best Practices for Data Structures
Helpful
  Posted: Feb 15, 2016 3:16 PM   in response to: Richard Zarr
Try AnsiString.
Richard Zarr

Posts: 74
Registered: 7/1/98
Re: Best Practices for Data Structures
  Posted: Feb 15, 2016 5:12 PM   in response to: Peter Guth
Peter Guth wrote:
Try AnsiString.

This didn't seem to work... but it will take some more investigation. We used code like this to serialize the packet:

 Len := length(FPayload);
 Stream.Write(Len, SizeOf(Len));
 if (Len > 0) then Stream.Write(FPayload[1], Len);


So there is a one-to-one correspondence of a single byte per character, and some of those bytes are binary. We relied on the string type's built-in dynamic memory allocation, which we can no longer use if we move to TBytes. So I'm thinking we need to completely rewrite the class that represents a packet and use TBytes for the actual data. We could do a synonym of payload, but I'm not sure how we would represent it. We made a bad decision based on the convenience of the original string type... agh! Now we're stuck with our mess...

Update: We actually missed several AnsiString and AnsiChar definitions... upon fixing those, everything magically started working. This is the short-term solution... thanks for the idea!

Edited by: Richard Zarr on Feb 15, 2016 5:27 PM

Edited by: Richard Zarr on Feb 15, 2016 5:28 PM
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: Best Practices for Data Structures
  Posted: Feb 16, 2016 8:23 AM   in response to: Richard Zarr
They did a very bad job of moving to Unicode. Previously the (ansi)string was a great tool for handling byte/binary data. In fact, there wasn't any other good way to do it if you wanted arrays of those.

In the move to Unicode they "helped" way too much and caused tons of problems. There are automatic conversions between Ansi and Unicode. Some functions had their parameters changed without even being overloaded. Look into AnsiLeftStr() and friends: you'll find that their parameters are NOT AnsiString, so passing an AnsiString causes it to be converted to Unicode and then converted back to Ansi. You have to explicitly specify another unit to get the actual Ansi versions.

You can get there, but it will require some diligence. I had hundreds of thousands of lines of code, much of it handling data as bytes in ansistring. As you can tell, I'm still pretty ticked about the whole mess. That is especially true since I went through it with VB years ago and warned Embarcadero of it ahead of time. There are compiler warnings for most, but you still have to beware.

They didn't "get it", and still don't. In the latest release they just now are offering new options, but those are sadly inadequate.

Bottom line: AnsiString is likely still your best choice, unless you want to rewrite to byte arrays. Track down your corruption points (which will likely be at automagic type conversions) until you kill them.

Dan
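
One way to look for the conversion points described above is to check which byte values survive an AnsiString -> string -> AnsiString round trip on a given machine. The following is only a sketch: the function name CountChangedBytes is made up for illustration, and the result depends on the system's ANSI code page, so no particular count is claimed here.

```pascal
program AnsiRoundTrip;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts the byte values (0..255) that do not survive
// an AnsiString -> string -> AnsiString conversion on this machine.
function CountChangedBytes: Integer;
var
  B: Integer;
  Raw, Back: AnsiString;
  Wide: string;
begin
  Result := 0;
  for B := 0 to 255 do
  begin
    Raw := AnsiChar(B);        // one-byte "binary" payload
    Wide := string(Raw);       // Ansi -> UTF-16 via the string's code page
    Back := AnsiString(Wide);  // UTF-16 -> Ansi
    if Back <> Raw then
      Inc(Result);
  end;
end;

begin
  Writeln('Byte values changed by the round trip: ', CountChangedBytes);
end.
```

The explicit casts here perform the same conversion the compiler inserts silently at an implicit cast, so a non-zero count suggests that binary data held in an AnsiString is at risk wherever such a cast happens.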
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: Best Practices for Data Structures
  Posted: Feb 16, 2016 10:50 AM   in response to: Richard Zarr
The problem is not in the packet definition per se, but in the way you
send and receive it. This kind of record cannot be transferred en bloc,
since the Payload field just contains a pointer to the original data.
Even if you change it to AnsiString you still have this problem. So,
how does the code sending or receiving such packets handle the Payload
field?


--
Peter Below
TeamB
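
To illustrate the point about the Payload field holding only a pointer, here is a minimal sketch that compares SizeOf(TMyPacket) with the size of a field-by-field serialization. TMyPacketType and its values, WritePacket and SerializedSize are assumptions made up for this example, and the payload is declared as AnsiString as suggested earlier in the thread.

```pascal
program PacketFieldByField;
{$APPTYPE CONSOLE}

uses
  System.Classes;

type
  TMyPacketType = (ptData, ptControl); // hypothetical stand-in for MyPacketType

  TMyPacket = record
    ID: Int64;
    PackType: TMyPacketType;
    OtherStuff: Boolean;
    Payload: AnsiString; // managed type: the record holds only a pointer
    CRC: Integer;
  end;

// Writes the packet field by field; the payload is length-prefixed.
procedure WritePacket(Stream: TStream; const P: TMyPacket);
var
  Len: Integer;
begin
  Stream.WriteBuffer(P.ID, SizeOf(P.ID));
  Stream.WriteBuffer(P.PackType, SizeOf(P.PackType));
  Stream.WriteBuffer(P.OtherStuff, SizeOf(P.OtherStuff));
  Len := Length(P.Payload);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(P.Payload[1], Len);
  Stream.WriteBuffer(P.CRC, SizeOf(P.CRC));
end;

// Returns the number of bytes WritePacket produces for P.
function SerializedSize(const P: TMyPacket): Int64;
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    WritePacket(MS, P);
    Result := MS.Size;
  finally
    MS.Free;
  end;
end;

var
  P: TMyPacket;
begin
  P.ID := 1;
  P.PackType := ptData;
  P.OtherStuff := False;
  P.Payload := 'a payload that is longer than the record itself';
  P.CRC := 0; // computing a real CRC is outside this sketch
  // SizeOf stays the same no matter how long Payload is.
  Writeln('SizeOf(TMyPacket) = ', SizeOf(TMyPacket));
  Writeln('Serialized size   = ', SerializedSize(P));
end.
```

Writing the record with a single Stream.Write(P, SizeOf(P)) would store the pointer value rather than the payload bytes, which is why each field is written separately.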

Richard Zarr

Posts: 74
Registered: 7/1/98
Re: Best Practices for Data Structures
  Posted: Feb 18, 2016 12:28 PM   in response to: Peter Below
The above example doesn't completely reflect our structure... it is not a record, but a class. The class has a SaveToStream method that writes each of the structures to a stream... the one at issue is the Payload field, which was originally defined as a String... see below:
 Len := length(FPayload);
 Stream.Write(Len, SizeOf(Len));
 if (Len > 0) then
  Stream.Write(FPayload[1], Len);
Jeff Overcash (...

Posts: 1,529
Registered: 9/23/99
Re: Best Practices for Data Structures
  Posted: Feb 18, 2016 12:47 PM   in response to: Richard Zarr
Length is the number of characters, not the number of bytes. You are only
writing half of your string, since each Char is now two bytes.

If Len is the number of bytes in the payload then change the Len := line to

   Len := Length(FPayload) * SizeOf(Char);


On the reading side remember that the Len passed is the # of bytes, so calls to
things like SetString should use Len div SizeOf(Char).

--
Jeff Overcash (TeamB)
(Please do not email me directly unless asked. Thank You)
Learning is finding out what you already know. Doing is demonstrating that you
know it. Teaching is reminding others that they know it as well as you. We are
all leaners, doers, teachers. (R Bach)
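
The byte-count advice above can be sketched as a write/read pair for a Unicode string. WriteText, ReadText and RoundTrip are hypothetical names chosen for this example, and the [1] indexing assumes the desktop compilers, where strings are 1-based.

```pascal
program UnicodeStringStream;
{$APPTYPE CONSOLE}

uses
  System.Classes;

// Writes S with a byte-length prefix.
procedure WriteText(Stream: TStream; const S: string);
var
  ByteLen: Integer;
begin
  ByteLen := Length(S) * SizeOf(Char); // bytes, not characters
  Stream.WriteBuffer(ByteLen, SizeOf(ByteLen));
  if ByteLen > 0 then
    Stream.WriteBuffer(S[1], ByteLen);
end;

// Reads a string written by WriteText.
function ReadText(Stream: TStream): string;
var
  ByteLen: Integer;
begin
  Stream.ReadBuffer(ByteLen, SizeOf(ByteLen));
  SetLength(Result, ByteLen div SizeOf(Char)); // back to a character count
  if ByteLen > 0 then
    Stream.ReadBuffer(Result[1], ByteLen);
end;

// Writes S to a memory stream and reads it back.
function RoundTrip(const S: string): string;
var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    WriteText(MS, S);
    MS.Position := 0;
    Result := ReadText(MS);
  finally
    MS.Free;
  end;
end;

begin
  Writeln('Round trip matches: ', RoundTrip('payload') = 'payload');
end.
```

Note the integer division (div) on the reading side: the / operator would produce a floating-point value, which SetLength does not accept.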

Richard Zarr

Posts: 74
Registered: 7/1/98
Re: Best Practices for Data Structures
  Posted: Feb 23, 2016 11:12 AM   in response to: Jeff Overcash (...
Agreed... Once we have done most of the refactoring for other issues, we're going to change this structure to replace the String with TBytes. Then we have a 1-1 byte correlation. The real problem is we have used the 'String' type interchangeably as buffers all over the code. In the old days before Unicode, this was a normal practice and was actually encouraged (Pre Delphi 7). But with the advent of Unicode, all of that changed and now we have some work to do. So here's the plan:

1) Leave any 'strings' that are not passed to DLLs as type 'String' which is now Unicode
2) Leave all DLL passing structures that contain strings as 'ShortStrings' (what we already have) and convert to 'String' types on each end. We get tons of Implicit String Cast warnings, but we do not allow long strings (>255 chars) in our DLL functions. This should be fine.
3) Modify any buffer related functions to remove 'String' types (note: we only have a few places where this happens) and replace with TBytes.

Overall this strategy might take a bit of work, but we're modifying a great deal to bring us into the 21st century... thoughts?
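
As a rough illustration of step 3, a payload held in TBytes could be saved and loaded like this. TBytesPacket and RoundTripBytes are invented for this sketch and cover only the payload field; the real packet class has more fields, which are not shown in the thread.

```pascal
program BytesPacket;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical packet class; only the payload handling is sketched.
  TBytesPacket = class
  private
    FPayload: TBytes;
  public
    procedure SaveToStream(Stream: TStream);
    procedure LoadFromStream(Stream: TStream);
    property Payload: TBytes read FPayload write FPayload;
  end;

procedure TBytesPacket.SaveToStream(Stream: TStream);
var
  Len: Integer;
begin
  Len := Length(FPayload); // for TBytes, Length is already a byte count
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(FPayload[0], Len);
end;

procedure TBytesPacket.LoadFromStream(Stream: TStream);
var
  Len: Integer;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  if Len < 0 then
    raise EStreamError.Create('Negative payload length');
  SetLength(FPayload, Len);
  if Len > 0 then
    Stream.ReadBuffer(FPayload[0], Len);
end;

// Saves Data through one packet and loads it into another.
function RoundTripBytes(const Data: TBytes): TBytes;
var
  Src, Dst: TBytesPacket;
  MS: TMemoryStream;
begin
  Src := TBytesPacket.Create;
  Dst := TBytesPacket.Create;
  MS := TMemoryStream.Create;
  try
    Src.Payload := Data;
    Src.SaveToStream(MS);
    MS.Position := 0;
    Dst.LoadFromStream(MS);
    Result := Dst.Payload;
  finally
    MS.Free;
    Dst.Free;
    Src.Free;
  end;
end;

begin
  Writeln('Bytes read back: ', Length(RoundTripBytes(TBytes.Create($00, $81, $FF))));
end.
```

Because TBytes is zero-based and never goes through a code page conversion, the serialization code keeps the same shape as the original string version while avoiding the implicit casts.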
Dan Barclay

Posts: 889
Registered: 11/9/03
Re: Best Practices for Data Structures
Helpful
  Posted: Feb 29, 2016 3:59 PM   in response to: Richard Zarr
Richard Zarr wrote:
Agreed... Once we have done most of the refactoring for other issues, we're going to change this structure to replace the String with TBytes. Then we have a 1-1 byte correlation. The real problem is we have used the 'String' type interchangeably as buffers all over the code. In the old days before Unicode, this was a normal practice and was actually encouraged (Pre Delphi 7).

There are people here who will argue with you about that, forever. You are right, they are wrong.

But with the advent of Unicode, all of that changed and now we have some work to do. So here's the plan:

1) Leave any 'strings' that are not passed to DLLs as type 'String' which is now Unicode

If they are text, yes.

2) Leave all DLL passing structures that contain strings as 'ShortStrings' (what we already have) and convert to 'String' types on each end. We get tons of Implicit String Cast warnings, but we do not allow long strings (>255 chars) in our DLL functions. This should be fine.

Maybe, maybe not. You have to be watchful if the data are not text. As I mentioned above, some of those functions do implicit conversions when you do not expect them. There are a lot of warnings in a large app... almost too many to be useful. You just have to watch your data for corruption. The process you describe should work fine, just don't underestimate the effort of finding some of the needles in the haystack.

3) Modify any buffer related functions to remove 'String' types (note: we only have a few places where this happens) and replace with TBytes.

Should be good to go.

Overall this strategy might take a bit of work, but we're modifying a great deal to bring us into the 21st century... thoughts?

It can be a complex task, if you've used strings (and particularly arrays of strings) heavily. TBytes don't replace old AnsiStrings very well in the real world. That's a shame, because they could have done this so easily.

Sounds like the right track, just watch for the landmines.

Dan