Watch, Follow, &
Connect with Us

For forums, blogs and more please visit our
Developer Tools Community.


Welcome, Guest
Guest Settings
Help

Thread: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary


This question is not answered. Helpful answers available: 2. Correct answers available: 1.


Permlink Replies: 8 - Last Post: Mar 29, 2016 9:12 AM Last Post By: Markus Humm
Simon Horup

Posts: 18
Registered: 10/5/01
[ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 27, 2016 3:35 AM
I'm trying to load two lists, each of 500.000 unique integer numbers, into a dictionary ( tDictionary ). The dictionary gets cleared between the list loads, so it never contains more than 500.000 numbers.

For some reason, it takes ~8 seconds, on my computer, to load the first list of 500.000 numbers into the dictionary, whereas it only takes 0.1 second to load the second list of 500.000 numbers into the dictionary.

Anyone know why that is ?

These are the lists I use :

https://www.newsleecher.com/1.txt
and
https://www.newsleecher.com/2.txt

Both lists compressed :
https://www.newsleecher.com/lists.rar


This is the code I use to load the lists :

procedure TForm1.Button1Click(Sender: TObject);
var i   : integer;
    i64 : int64;
    s   : string;
    fi  : textFile;
    ca  : cardinal;
    dic : tDictionary<int64, pointer>;
    lst : tList<int64>;
begin
 
  dic := tDictionary<int64, pointer>.create;
    lst := tList<int64>.create;
 
      for i := 1 to 2 do begin
 
        lst.clear;
        assignFile(fi, i.toString + '.txt');
          reset(fi);
          while not eof(fi) do begin
            readLn(fi, s);
            lst.add(strToInt64(s));
          end;
        closeFile(fi);
 
        dic.clear;
        ca := getTickCount; // getTickCount is accurate enough for this test
          for i64 in lst do
            dic.addOrSetValue(i64, nil);
        ca := getTickCount - ca;
        showmessage(dic.count.toString + #13#10#13#10 + ' ~' + ca.toString + 'ms.');
 
      end;
 
    lst.free;
  dic.free;
end;


Regards,
Simon
Markus Humm

Posts: 5,113
Registered: 11/9/03
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 27, 2016 9:03 AM   in response to: Simon Horup in response to: Simon Horup
Am 27.03.2016 um 12:35 schrieb simon horup:
I'm trying to load two lists, each of 500.000 unique integer numbers, into a dictionary ( tDictionary ). The dictionary gets cleared between the list loads, so it never contains more than 500.000 numbers.

For some reason, it takes ~8 seconds, on my computer, to load the first list of 500.000 numbers into the dictionary, whereas it only takes 0.1 second to load the second list of 500.000 numbers into the dictionary.

Anyone know why that is ?

These are the lists I use :

https://www.newsleecher.com/1.txt
and
https://www.newsleecher.com/2.txt

Both lists compressed :
https://www.newsleecher.com/lists.rar

Hello,

I guess in the first run the dictionary has to allocate all that ememory
and since your code didn't looked like you told it upfront how many
entries you're going to have it is allocating it bit by bit.

Afaik the TDictionary should have some capacity property or even a
capacity parameter in one of the constructor variants. Set it to 500000
and check speed again.

I guess you'll be amazed then ;-)
(at least for TStringList, TObjectList<T> etc. that exists)

Greetings

Markus
Simon Horup

Posts: 18
Registered: 10/5/01
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 27, 2016 9:37 AM   in response to: Markus Humm in response to: Markus Humm
Hi Mark,

Thanks for the response.

I tried replacing
dic := tDictionary<int64, pointer>.create;
with
dic := tDictionary<int64, pointer>.create(500000);


No significant difference in performance.

I also tried to switch the load order of the lists, but that didn't change anything either. The first list ( 1.txt ) still takes around 8s to load, even though it gets loaded after the second list.

Markus Humm wrote:
Am 27.03.2016 um 12:35 schrieb simon horup:
I'm trying to load two lists, each of 500.000 unique integer numbers, into a dictionary ( tDictionary ). The dictionary gets cleared between the list loads, so it never contains more than 500.000 numbers.

For some reason, it takes ~8 seconds, on my computer, to load the first list of 500.000 numbers into the dictionary, whereas it only takes 0.1 second to load the second list of 500.000 numbers into the dictionary.

Anyone know why that is ?

These are the lists I use :

https://www.newsleecher.com/1.txt
and
https://www.newsleecher.com/2.txt

Both lists compressed :
https://www.newsleecher.com/lists.rar

Hello,

I guess in the first run the dictionary has to allocate all that ememory
and since your code didn't looked like you told it upfront how many
entries you're going to have it is allocating it bit by bit.

Afaik the TDictionary should have some capacity property or even a
capacity parameter in one of the constructor variants. Set it to 500000
and check speed again.

I guess you'll be amazed then ;-)
(at least for TStringList, TObjectList<T> etc. that exists)

Greetings

Markus
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 28, 2016 3:31 AM   in response to: Simon Horup in response to: Simon Horup
simon horup wrote:

Hi Mark,

Thanks for the response.

I tried replacing
dic := tDictionary<int64,
pointer>.create;
with
dic := tDictionary<int64,
pointer>.create(500000);


No significant difference in performance.

I also tried to switch the load order of the lists, but that didn't
change anything either. The first list ( 1.txt ) still takes around
8s to load, even though it gets loaded after the second list.

Your first list may just be a pathological case where practically every
number you add causes a hash collision, which requires a lot more work
for the code to find a free slot in the internal list of items. If this
is the case looking up keys from the first list should also be
significantly slower than looking up keys from the second list.

The performance of hash tables depends a lot on the distribution of the
keys you add to it, on the hash table implementation, and the hashing
algorithm used.

--
Peter Below
TeamB
Simon Horup

Posts: 18
Registered: 10/5/01
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 28, 2016 5:07 AM   in response to: Peter Below in response to: Peter Below
Yes, that's probably it then. Do you know if there's a way to change the hashing algorithm used by "tDictionary<int64, >" ?

Since all the numbers in the list are 32 bit integers, I tried to replace the tDictionary<int64, pointer> with a tDictionary<integer, pointer> object. That resulted in a 100x (!!) performance improvement when handling the 1.txt list. Instead of 8s, it only took .08s to load the numbers into the dictionary. I can't be sure that all numbers stay within the 32 bit boundary though, so it's not a working solution for me unfortunately.
Olivier Sannier

Posts: 424
Registered: 8/26/01
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 29, 2016 12:32 AM   in response to: Simon Horup in response to: Simon Horup
simon horup wrote:
Yes, that's probably it then. Do you know if there's a way to change the hashing algorithm used by "tDictionary<int64, >" ?

I believe there is a virtual function just for that.
Peter Below

Posts: 1,227
Registered: 12/16/99
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 29, 2016 2:15 AM   in response to: Simon Horup in response to: Simon Horup
simon horup wrote:

Yes, that's probably it then. Do you know if there's a way to change
the hashing algorithm used by "tDictionary<int64, >" ?

Since all the numbers in the list are 32 bit integers, I tried to
replace the tDictionary<int64, pointer> with a
tDictionary<integer, pointer> object. That resulted in a 100x (!!)
performance improvement when handling the 1.txt list. Instead of 8s,
it only took .08s to load the numbers into the dictionary. I can't be
sure that all numbers stay within the 32 bit boundary though, so it's
not a working solution for me unfortunately.

You have to create the dictionary with a IEqualityComparer<int64>
implementation instead of relying on the default comparer the class
will use.

If you look at the source for TDictionary<TKey,TValue>.Hash (which is
not virtual) it relies on the comparer to get the hashed key:

function TDictionary<TKey,TValue>.Hash(const Key: TKey): Integer;
const
PositiveMask = not Integer($80000000);
begin
// Double-Abs to avoid -MaxInt and MinInt problems.
// Not using compiler-Abs because we must get a positive integer;
// for compiler, Abs(Low(Integer)) is a null op.
Result := PositiveMask and ((PositiveMask and
FComparer.GetHashCode(Key)) + 1);
end;

The construction of the dictionary would then look something like this:

FDictionary := TDictionary<int64, pointer>.Create(
750000,
TEqualityComparer<int64>.Construct(
function (const L, R: int64): boolean
begin
result := L = R;
end;
function (const Value: int64): integer
begin
result := value
// truncate to 32 bits, switch off range and overflow checks!
end
));

The default comparer uses this

function GetHashCode_I8(Inst: Pointer; const Value: Int64): Integer;
begin
Result := THashBobJenkins.GetHashValue(Value, SizeOf(Value), 0);
end;

THashBobJenkins can be found in the system.hash unit. It implements a
generic hash function with quite a bit of computation going on. The
distribution it creates seems to be suboptimal for your data.

--
Peter Below
TeamB

Olivier Sannier

Posts: 424
Registered: 8/26/01
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 29, 2016 4:20 AM   in response to: Peter Below in response to: Peter Below
Peter Below wrote:

THashBobJenkins can be found in the system.hash unit. It implements a
generic hash function with quite a bit of computation going on. The
distribution it creates seems to be suboptimal for your data.

Well, it is suboptimal for quite a lot of data then ;-)
And it is very slow as well.
We switched over to MurmurHash2 and got very good results.
Markus Humm

Posts: 5,113
Registered: 11/9/03
Re: [ Delphi XE7 - Update 1 ] - Slow to load numbers into a tDictionary [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 29, 2016 9:12 AM   in response to: Olivier Sannier in response to: Olivier Sannier
Am 29.03.2016 um 13:20 schrieb Olivier Sannier:
Peter Below wrote:

THashBobJenkins can be found in the system.hash unit. It implements a
generic hash function with quite a bit of computation going on. The
distribution it creates seems to be suboptimal for your data.

Well, it is suboptimal for quite a lot of data then ;-)
And it is very slow as well.
We switched over to MurmurHash2 and got very good results.

Hello,

if this is a general thing it might warrant for a feature reques QP
issue to be created. Maybe EMBT could provide several default hash
algorithms which could be selected via some optional constructor
parameter and if not the current default selection will be used.

Greetings

Markus
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02