Thread: Probable issue in TThreadedQueue (due to TMonitor) under Delphi Tokyo




Replies: 1 - Last Post: Mar 14, 2018 12:15 AM by David Alcelay
David Alcelay

Posts: 11
Registered: 11/13/01
Probable issue in TThreadedQueue (due to TMonitor) under Delphi Tokyo  
  Posted: Feb 25, 2018 2:58 PM
Hi,
We have some new applications that make heavy use of the TThreadedQueue class, with many threads pushing data and many other threads popping it. We configure one timeout for the push and another for the pop.
When we run long tests with these applications, we notice that their CPU usage climbs over time. There is no recognizable pattern: after some time 'x' a few of the threads detect the problem, after another interval 'y' more threads detect it, and so on...
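
For context, the two timeouts are the ones passed to the TThreadedQueue<T> constructor (queue depth, push timeout, pop timeout). A minimal sketch of the setup, where the element type, depth and timeout values are placeholders rather than our real configuration:

program QueueSetupSketch;
{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

var
  Queue: TThreadedQueue<Integer>;

begin
  // Queue depth 100, push timeout 50 ms, pop timeout 10 ms (placeholder values)
  Queue := TThreadedQueue<Integer>.Create(100, 50, 10);
  try
    // Producer threads would call Queue.PushItem(...) here;
    // consumer threads would call Queue.PopItem(...).
  finally
    Queue.Free;
  end;
end.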

After debugging the issue, we saw that the threads doing the pops are responsible (later we saw that the same thing also happens with the push timeout). Even when there is no event on the push side, PopItem returns immediately without applying the timeout: the result is wrTimeout, but the time spent in the call is 0 milliseconds, even though the parameter passed to the function is correct (10 milliseconds for the pop, for example). That timeout is what normally makes the thread 'sleep' when there is no work to do, so without that sleep the thread spins like crazy :(
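
To make the symptom visible in code, we simply time each PopItem call: when the bug kicks in, PopItem returns wrTimeout but the elapsed time is about 0 ms instead of the configured 10 ms, and the loop spins. A minimal consumer sketch along those lines (the 0 ms check and the defensive TThread.Sleep are only an illustration of how to confirm the spin, not a real fix):

uses
  System.Classes, System.Diagnostics, System.SyncObjs, System.Generics.Collections;

procedure ConsumerLoop(Queue: TThreadedQueue<Integer>);
var
  Item: Integer;
  SW: TStopwatch;
begin
  while True do
  begin
    SW := TStopwatch.StartNew;
    if Queue.PopItem(Item) = wrTimeout then
    begin
      // A healthy timeout should block for roughly the configured 10 ms;
      // when the issue appears, ElapsedMilliseconds is 0 and the loop spins.
      if SW.ElapsedMilliseconds = 0 then
        TThread.Sleep(1); // crude mitigation for the busy spin; illustration only
    end
    else
    begin
      // wrSignaled: process Item here
    end;
  end;
end;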

Debugging inside the TThreadedQueue class, we see that TMonitor is used intensively; in PopItem it provides the blocking call that waits either for an incoming push event or for the timeout:

if not TMonitor.Wait(FQueueNotEmpty, FQueueLock, FPopTimeout) then...


Looking inside the TMonitor.Wait method, we see this code (this is the final overload that actually gets called):

function TMonitor.Wait(ALock: PMonitor; Timeout: Cardinal): Boolean;
var
  RecursionCount: Integer;
  WaitingThread: TWaitingThread;
begin
  WaitingThread.Next := nil;
  WaitingThread.Thread := ALock.CheckOwningThread;
  // This event should probably be cached someplace.
  // Probably not on the instance since this is a per-thread-per-instance resource
  WaitingThread.WaitEvent := MonitorSupport.NewWaitObject;
  try
    // Save the current recursion count for later
    RecursionCount := ALock.FRecursionCount;
    // Add the current thread to the waiting queue
    QueueWaiter(WaitingThread);
    // Set it back to almost released so the next Exit call actually drops the lock
    ALock.FRecursionCount := 1;
    // Now complete the exit and signal any waiters
    ALock.Exit;
    // Get in line for someone to do a Pulse or PulseAll
    Result := MonitorSupport.WaitOrSignalObject(nil, WaitingThread.WaitEvent, Timeout) = WAIT_OBJECT_0;
    // Got to get the lock back and block waiting for it.
    ALock.Enter(INFINITE);
    // Remove any dangling waiters from the list
    RemoveWaiter(WaitingThread);
    // Lets restore the recursion to return to the proper nesting level
    ALock.FRecursionCount := RecursionCount;
  finally
    MonitorSupport.FreeWaitObject(WaitingThread.WaitEvent);
  end;
end;


Debugging the code, the best I have been able to determine is that the problem starts (or rather surfaces there; it is probably a consequence of something else not being released correctly?) at this line:
  WaitingThread.WaitEvent := MonitorSupport.NewWaitObject;


The problem is that this handle is nil when the issue starts happening, and that smells rotten in Denmark, because it is the handle used for the timed wait in the following lines of code.

Then I prepared a small console project, because first we had to be sure that we were not introducing some kind of noise of our own :). The test app performs more or less the same tasks (regarding the use of the TThreadedQueue) as the real application, but it only has threads that push at a certain rate and threads that pop. We configure a timeout for the push and another for the pop.
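
The pushing side of the test harness is roughly like the sketch below (the push rate, payload and endless loop are placeholders; the real project is attached to the Quality Portal report linked further down):

uses
  System.Classes, System.SyncObjs, System.Generics.Collections;

procedure ProducerLoop(Queue: TThreadedQueue<Integer>);
var
  Value: Integer;
begin
  Value := 0;
  while True do
  begin
    // PushItem blocks for up to the configured push timeout (50 ms in our tests)
    // when the queue is full; wrTimeout means the item was not queued.
    if Queue.PushItem(Value) = wrSignaled then
      Inc(Value);
    TThread.Sleep(5); // fixed push rate; the 5 ms value is an arbitrary placeholder
  end;
end;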

The issue appears; you have to wait a while, but there it is.
It seems to hit threads in groups: suddenly 4 threads suffer the issue, time goes by and more threads suffer it... I can't find a pattern.

The source for the test application has a 'define' that lets you use either the TThreadedQueue that comes with the compiler or a class we created that inherits from it. Our class patches the Grow method (we also think there is a bug in Grow when the queue grows dynamically, and we patched it as best we could :); I reported that probable bug to Embarcadero at https://quality.embarcadero.com/browse/RSP-19874) and adds two methods that allow dynamic timeouts for PopItem and PushItem. The test does not use any of the new functions.

I also reported this probable TMonitor/TThreadedQueue bug to Embarcadero. There is a console app project attached there that shows the problem (be patient, it takes time to start happening, and more time until all the threads are affected): https://quality.embarcadero.com/browse/RSP-19993

In my last test, where I recorded the times and the threads affected, the issue first arose after almost 3 hours, when 5 threads detected the problem at the same time; the rest continued, and after almost an hour and a half the next occurrence hit 3 threads at the same time... and so on until all the threads were affected.

Other curiosities detected in the tests:
1) We configure the pop timeout as 10 milliseconds, but if you take the time before and after the call to PopItem, the measured time when the timeout occurs is often 9 milliseconds.
2) We configure the push timeout as 50 milliseconds, but if you take the time before and after the call to PushItem, the measured time when the timeout occurs is often 49 milliseconds.

Has anyone else seen these behaviours with the TThreadedQueue class?

Edited by: David Alcelay on Feb 25, 2018 3:14 PM
David Alcelay

Posts: 11
Registered: 11/13/01
Re: Probable issue in TThreadedQueue (due to TMonitor) under Delphi Tokyo  
  Posted: Mar 14, 2018 12:15 AM   in response to: David Alcelay

Under Tokyo 10.2.3 the issue happens even faster :(