Thread: Sorting unicode strings



Replies: 7 - Last Post: Mar 25, 2015 2:38 PM - Last Post By: Ciarán Ó Duibhín
Ciarán Ó Duibhín

Posts: 13
Registered: 4/1/09
Sorting unicode strings
  Posted: Feb 23, 2015 1:31 PM
I would appreciate some general advice on string ordering with Delphi. I'm
looking for an alternative to s1 > s2 or CompareC that orders Unicode
strings according to the Unicode Collation Algorithm (
http://www.unicode.org/reports/tr10/ ), which means taking comprehensive
account of case, accents, specials, etc.

I'll try to give an example from (colloquial) English. I make a list of the
words in an English text, and I want to sort them. One of the words is "
'bout ", a shortened form of " about ". In my alphabetic list I want "
'bout " to be placed immediately after " bout ". In the jargon, I want the
apostrophe to be treated as a special, which means its presence or absence
is less significant than (say) an accent.
I can get the online ICU collation demo (see below) to do this by making one
change to the default parameters: setting "alternate" to "shifted". But I
don't know any way to do it in Delphi, where (afaik) an apostrophe will
always sort between an ampersand and a left round bracket, like in ASCII.

When I worked with CP1252, I wrote my own routine for string ordering
according to these principles, which was feasible because I only had to
cover the character repertoire of CP1252. But now I'm moving to Unicode,
where that is no longer feasible, and besides I'd be surprised if there isn't
centralized provision for it already. (Perhaps there is by now; I'm still
using Delphi XE Update 1.)

I'm aware that the ICU library ( http://site.icu-project.org/ ) supports
what I want, though I don't know how to call it. There are more explanations
of ICU at
http://userguide.icu-project.org/collation/architecture and
http://userguide.icu-project.org/collation/concepts and online demos at
http://demo.icu-project.org/icu-bin/collation.html and
http://demo.icu-project.org/icu-bin/scompare . Officially, however, ICU only
exists for use with C/C++ and Java. I'm also aware of a Pascal port,
ICU4PAS (
http://quia.cf/orange/pooxy4/nph-poxy.pl/es/20/http/www.crossgl.com/icu4pas/
), but its status is unclear to me. It's dated 2007, but that might well
be good enough for me.

Is anyone doing this in Delphi? If so, can you offer any guidance?
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: Sorting unicode strings
  Posted: Feb 24, 2015 10:47 AM   in response to: Ciarán Ó Duibhín
Take a look at the CompareStringEx API function
(https://msdn.microsoft.com/en-us/library/windows/desktop/dd317761%28v=vs.85%29.aspx).
This function gives you a bit more control over the
comparison. The Delphi AnsiCompareStr and AnsiCompareText functions
from SysUtils use the CompareString API function with the default
locale and default flags, which may not fit your needs.

If you want to write your own function, take a look at the Character
unit, which has support for classifying Unicode code points.

--
Peter Below (TeamB)
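
The CompareStringEx suggestion above could be tried along the following lines. This is a minimal sketch, not code from the thread: the wrapper name CompareUser and the test words are made up for illustration, and the API is declared locally on the assumption that the Windows unit shipped with Delphi XE may not export it. The constant values are those of the Windows SDK headers. CompareStringEx requires Windows Vista or later. According to the Windows documentation, the default "word sort" gives the hyphen and the apostrophe reduced weight so that pairs like "coop" and "co-op" stay together, while SORT_STRINGSORT treats them as ordinary symbols; whether that matches the UCA "shifted" behaviour closely enough would need checking against real data.

```pascal
program cmpex;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

const
  // Values as defined in the Windows SDK (winnls.h); redeclared here so the
  // sketch does not depend on which constants a given Delphi version exports.
  SORT_STRINGSORT = $00001000;
  CSTR_EQUAL      = 2;

// Local import of the API (assumption: not declared by the RTL in use).
function CompareStringEx(lpLocaleName: PWideChar; dwCmpFlags: DWORD;
  lpString1: PWideChar; cchCount1: Integer;
  lpString2: PWideChar; cchCount2: Integer;
  lpVersionInformation: Pointer; lpReserved: Pointer;
  lParam: LPARAM): Integer; stdcall;
  external 'kernel32.dll' name 'CompareStringEx';

// Hypothetical helper: compares A and B under the user's default locale and
// returns -1, 0 or +1 in the usual Delphi comparison convention.
function CompareUser(const A, B: UnicodeString; Flags: DWORD): Integer;
var
  R: Integer;
begin
  // A nil locale name selects the user default locale.
  R := CompareStringEx(nil, Flags,
         PWideChar(A), Length(A),
         PWideChar(B), Length(B),
         nil, nil, 0);
  if R = 0 then
    RaiseLastOSError;        // 0 signals failure
  Result := R - CSTR_EQUAL;  // API returns 1, 2 or 3
end;

begin
  // Default flags: word sort.
  WriteLn('word sort,   bout vs ''bout : ', CompareUser('bout', '''bout', 0));
  WriteLn('word sort,   ''bout vs bow  : ', CompareUser('''bout', 'bow', 0));
  // String sort: the apostrophe is compared like any other symbol.
  WriteLn('string sort, ''bout vs bow  : ', CompareUser('''bout', 'bow', SORT_STRINGSORT));
end.
```

Comparing the word-sort and string-sort results for the same pair shows quickly whether the Windows ordering is close enough to avoid pulling in ICU.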
Ciarán Ó Duibhín

Posts: 13
Registered: 4/1/09
Re: Sorting unicode strings
  Posted: Feb 26, 2015 5:06 PM   in response to: Peter Below
Thanks for those very useful ideas, but I really think my best bet is to try
to adapt ICU4PAS (or rather just the bit of it I need) to work with the
Unicode versions of Delphi. Unless I am prepared to adapt ICU4PAS further
to post-2007 versions of ICU, I will have to make do with ICU 3.6, but that
may be good enough.

Thanks again for your help.
OUEDRAOGO Inoussa

Posts: 2
Registered: 3/7/01
Re: Sorting unicode strings
  Posted: Mar 4, 2015 1:07 AM   in response to: Ciarán Ó Duibhín
Ciarán Ó Duibhín wrote:
I would appreciate some general advice on string ordering with Delphi. I'm
looking for an alternative to s1 > s2 or CompareC, but which orders unicode
strings according to the Unicode Collation Algorithm (
http://www.unicode.org/reports/tr10/ ), which means taking comprehensive
account of case, accents, specials, etc.

This* is an Object Pascal implementation of the Unicode Collation Algorithm that does not depend on ICU or any other external library. It is the one used by FPC's native Unicode string manager. It has not been tested with Delphi.
It supports:
- incremental comparison of two strings (without computing the full sort keys),
- computation of sort keys, plus functions for comparing those keys.


(*) : http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/rtl/objpas/unicodedata.pas?view=markup

Ciarán Ó Duibhín

Posts: 13
Registered: 4/1/09
Re: Sorting unicode strings
  Posted: Mar 24, 2015 9:40 AM   in response to: OUEDRAOGO Inoussa
Thank you. I didn't see this message for some time. I'll certainly be
looking at this.
Ciarán Ó Duibhín.
Ciarán Ó Duibhín

Posts: 13
Registered: 4/1/09
Re: Sorting unicode strings
  Posted: Mar 24, 2015 11:21 AM   in response to: OUEDRAOGO Inoussa
Hi again,
Is there a sample Pascal program showing how to compare two strings with
your code — what to call, how to call it, etc.?
Many thanks,
Ciarán Ó Duibhín.
OUEDRAOGO Inoussa

Posts: 2
Registered: 3/7/01
Re: Sorting unicode strings
  Posted: Mar 25, 2015 11:07 AM   in response to: Ciarán Ó Duibhín
Ciarán Ó Duibhín wrote:
Hi again,
Is there a sample Pascal program showing how to compare two strings with
your code — what to call, how to call it, etc.?
Many thanks,
Ciarán Ó Duibhín.

Here is some sample code. It uses the Unicode DUCET collation and demonstrates both the incremental comparison of Unicode strings and the comparison of computed sort keys.

program ucasample;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}  // FPC needs objfpc (or delphi) mode for Result
uses
  sysutils,
  unicodedata, unicodeducet;  // unicodeducet supplies the DUCET collation data

// Renders a sort key as a space-separated list of its numeric weights.
function DumpKey(AKey : TUCASortKey) : string;
var
  i : Integer;
begin
  Result := '';
  for i := Low(AKey) to High(AKey) do
    Result := Result + ' ' + IntToStr(AKey[i]);
end;

var
  locCollation : PUCA_DataBook;
  a, b : UnicodeString;
  ak, bk : TUCASortKey;
begin
  WriteLn('Unicode Collation Algorithm Sample',sLineBreak);
  // Fetch the collation at index 0 (DUCET in this sample).
  locCollation := FindCollation(0);
  if (locCollation = nil) then begin
    WriteLn('No Collation Found.');
    Halt(1);
  end;

  a := 'ABC';
  b := 'AYZ';
  // Direct comparison, without building the full sort keys.
  Writeln(Format('  IncrementalCompareString("%s", "%s") = %d',[a,b,IncrementalCompareString(a,b,locCollation)]));
  WriteLn;

  // Key-based comparison: compute each key once, then compare the keys.
  ak := ComputeSortKey(a,locCollation);
  bk := ComputeSortKey(b,locCollation);
  Writeln(Format('  ComputeSortKey("%s") = %s',[a,DumpKey(ak)]));
  Writeln(Format('  ComputeSortKey("%s") = %s',[b,DumpKey(bk)]));
  WriteLn(Format('  CompareSortKey("%s", "%s") = %d',[a,b,CompareSortKey(ak,bk)]));
end.
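
Since the original question was about sorting a word list, the sample above could be extended roughly as follows. This is a sketch under assumptions, not part of the posted code: it uses only the calls shown in the sample (FindCollation, ComputeSortKey, CompareSortKey), the record type TKeyedWord and the word list are invented for illustration, and it assumes CompareSortKey returns a positive value when its first key sorts after the second. Each key is computed once and the keys are then compared repeatedly, which is the usual reason for preferring sort keys over pairwise string comparison when sorting.

```pascal
program ucasort;
{$ifdef FPC}{$mode objfpc}{$H+}{$endif}
uses
  sysutils,
  unicodedata, unicodeducet;

type
  // Hypothetical helper type: a word together with its precomputed sort key.
  TKeyedWord = record
    Text : UnicodeString;
    Key  : TUCASortKey;
  end;

var
  locCollation : PUCA_DataBook;
  words : array[0..3] of TKeyedWord;
  tmp : TKeyedWord;
  i, j : Integer;
begin
  locCollation := FindCollation(0);
  if (locCollation = nil) then begin
    WriteLn('No Collation Found.');
    Halt(1);
  end;

  words[0].Text := 'bow';
  words[1].Text := '''bout';   // apostrophe + bout
  words[2].Text := 'about';
  words[3].Text := 'bout';

  // Compute every sort key exactly once.
  for i := Low(words) to High(words) do
    words[i].Key := ComputeSortKey(words[i].Text, locCollation);

  // Plain insertion sort driven by key comparison; adequate for a short list.
  for i := Low(words) + 1 to High(words) do begin
    tmp := words[i];
    j := i - 1;
    while (j >= Low(words)) and (CompareSortKey(words[j].Key, tmp.Key) > 0) do begin
      words[j + 1] := words[j];
      Dec(j);
    end;
    words[j + 1] := tmp;
  end;

  for i := Low(words) to High(words) do
    WriteLn(words[i].Text);
end.
```

Whether 'bout ends up immediately after bout depends on how the collation in use weights variable characters such as the apostrophe (the "alternate = shifted" setting mentioned in the first post), which this sketch does not configure.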
Ciarán Ó Duibhín

Posts: 13
Registered: 4/1/09
Re: Sorting unicode strings
  Posted: Mar 25, 2015 2:38 PM   in response to: OUEDRAOGO Inoussa
Thank you very much,
Ciarán.