Watch, Follow, &
Connect with Us

Welcome, Guest
Guest Settings
Help

Thread: Overwriting the heap?



Permlink Replies: 13 - Last Post: May 2, 2017 7:02 AM Last Post By: Olivier Sannier Threads: [ Previous | Next ]
Mike Meier

Posts: 12
Registered: 5/25/05
Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 9, 2017 9:12 AM
I've recently started encountering problems where my program generates errors at seemingly random times. It may run for an hour or 100 hours, completing 10's of thousands and often more loops through the main part of the program. This is a desktop app that is not receiving input from external sources. It's running a loop simulating a laboratory test. Threads are involved.

The program runs on an OnTimer loop, checking this and that and when it is time starting a number of threads. I suspected the threads, that I might be starting one again before it had finished, but I set a flag when they start and clear it when they end and use this info to determine if it is safe to start the thread again.

I've peppered my code with messages to help locate the source of the problem but it seems that source keeps moving. The best I can figure is I have overwritten memory that is already being used. I have EurekaLog running but it has not been able to help.

So far the errors do not make sense. Here's an example.

In the main loop I check for and process errors.

// Handle other errors.
ProcMsg:='Checking for errors.';
if ErrorEvents.Count > 0 then ErrorHandleEvents;

In my error handler routine I create a form to report the errors.

// Create and display the error message dialog form.
if FormExists(frmError_Dialog) then
begin
frmError_Dialog.BringToFront;
frmError_Dialog.RefreshErrors;
end
else
begin
frmError_Dialog:=TksError_DialogFrm.Create(self);
frmError_Dialog.Show;
end;

Normally this works fine, but now, at random, when the cascade starts, somehow I get an error saying this form already exists.

When I enable memory leak checking in EurekaLog I get a lot of multi free thread leak errors, but these seem to be occurring after the start of the problem. These leaks are being reported in places where I am neither creating nor freeing objects. I have gone through my code to make sure every Free call is in either a try/finally/end block or in a form or component's Destroy procedure.

Again, the best I can figure is I have overwritten memory that is already being used. Is there a tool or technique folks here know of that can help?

Thanks,
Mike
Alex Belo

Posts: 526
Registered: 10/8/06
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 10, 2017 10:32 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

Again, the best I can figure is I have overwritten memory that is
already being used. Is there a tool or technique folks here know of
that can help?

madExcept has option "instant crash on buffer overrun/underrun" but it
requires more memory: special memory manager places requested memory
blocks at the end/begin of memory pages and makes next/previous page
unavailable for writing. As a result any attempt to write too many
bytes will raise instant AV.

--
Alex
Mike Meier

Posts: 12
Registered: 5/25/05
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 12, 2017 9:21 AM   in response to: Alex Belo in response to: Alex Belo
Alex Belo wrote:
Mike Meier wrote:

Again, the best I can figure is I have overwritten memory that is
already being used. Is there a tool or technique folks here know of
that can help?

madExcept has option "instant crash on buffer overrun/underrun" but it
requires more memory: special memory manager places requested memory
blocks at the end/begin of memory pages and makes next/previous page
unavailable for writing. As a result any attempt to write too many
bytes will raise instant AV.

--
Alex

Thanks, Alex. I'll look into it.

So far no progress. It takes hours for an error to occur. They are always access violations and in two builds the AV address points to the Windows procedure CreateEvent (Matching AV address to those in my map file). Odd.
Alex Belo

Posts: 526
Registered: 10/8/06
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 12, 2017 10:11 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

AV address points to the Windows procedure CreateEvent (Matching AV
address to those in my map file). Odd.

BTW madExcept has good stack tracer (much better than in IDE).

What version do you use? In old versions there was bug in TEvent
conctrustor; it was fixed at one time (I don't know in what version).
The same bug still presents in JCL (TJclEvent class).

--
Alex
Roy Lambert

Posts: 931
Registered: 10/21/99
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 13, 2017 4:10 AM   in response to: Mike Meier in response to: Mike Meier
Mike

I hate these sort of things. Whilst its a wild chance I thought I'd share this.

I had a similar problem - an application that just ran on my PC was fine for ages but if left on overnight would throw an AV when I moved the mouse cursor and the screen powered back on. The PC was a laptop. I never did understand just what the root cause was but I ended up setting a flag to determine that the screen was off and adjust actions accordingly

If you think it may be any help I'm happy to to posr code.

Roy Lambert

Olivier Sannier

Posts: 412
Registered: 8/26/01
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 13, 2017 4:55 AM   in response to: Mike Meier in response to: Mike Meier
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.
Mike Meier

Posts: 12
Registered: 5/25/05
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 19, 2017 5:05 PM   in response to: Olivier Sannier in response to: Olivier Sannier
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

I've run this program in a number of configurations, with and without certain options turned on and off. The threads handle I/O with sensors and external controllers. Meanwhile the main program does basically three things, execute the test procedure, log the data, and plot the data. I've eliminated plotting as a problem. I strongly suspect data logging at this point.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these cases?

Here's the abbreviated code from where I've gotten errors recently.

function TMainForm.Process_DataLog(var P:TTest_Project):boolean;
const ProcName='Process_DataLog';
var ProcMsg: string;
var i,Daq_Size:integer;
 
  function GetLogValue(Idx:integer):single;
  // Get the value for channel Idx from the appropriate data source.
  var N:integer;
 
    function IndexInRange(Min,Max:integer):boolean;
    begin
      Result:= (Idx > Min) and (Idx <= Max);
    end;
 
  begin
    case DaqProject.Logging.Channel[Idx].Source of
      dsMFC : ...
      dsMFM : ...
      dsCNi16 : ...
      dsHumidifier : ...
      dsThermocouple : ...
    end;
  end;
 
  procedure LogTheTimeAndTemperature;
  begin
    ...
  end;
 
  function CountStr:string;
  begin
    Result:='('+inttostr(P.Logging.Count)+' of '+inttostr(Daq_Size)+').'
   end;
 
begin
 
  Result:=true;
 
  try
    // Are the daq time and values arrays filling up?  If so, redimension them.
    ProcMsg:='Checking data array size.';
    Daq_Size:=  High(P.ProcessData.Data[0].Value);
    
    if P.Logging.Count >= Round((1-Daq_SizeMargin) * Daq_Size) then
    begin
      // New array size.
      Daq_Size:= Daq_Size+ Daq_SizeIncrement;
      ProcMsg:='Changing the data array size to '+inttostr(Daq_Size)+'.';
      Process_UpdateMemo(ProcName+': storage increased to '+inttostr(Daq_Size));
 
      // Time Data.
      ProcMsg:='Changing the time-data array size to '+inttostr(Daq_Size)+'.';
      SetLength(P.ProcessData.TimeData,Daq_Size);
 
      // Logged channel data.
      ProcMsg:='Changing the process_values data array sizes to '+inttostr(Daq_Size)+'.';
      for i:=0 to P.Logging.Channels do
      begin
        ProcMsg:='Changing process-values data array '+inttostr(i)+' size to '+inttostr(Daq_Size)+'.';
        SetLength(P.ProcessData.Data[i].Value,Daq_Size);
      end;
    end;
 
    // Increment the number of daq data points logged
    ProcMsg:='Incrementing the logging count to '+CountStr;
    inc(P.Logging.Count);
 
    // Note the time and temperature
    ProcMsg:='Logging the time and temperature '+CountStr;
    LogTheTimeAndTemperature;
 
    // Note the daq time in the data array
    ProcMsg:='Logging the logging elapsed time '+CountStr;
    P.ProcessData.Data[0].Value[P.Logging.Count]:=P.Logging.ElapsedTimeSec;
 
    // Log all the other channels. Note that the Channel array contains entries
    // for all possible data sources, even those not being logged. The
    // ProcessData array will store only the readings from logged channels.
    ProcMsg:='Logging the channel readings '+CountStr;
    for i:=1 to DaqChannelCountMax do
      if P.Logging.Channel[i].Log then
        P.ProcessData.Data[P.Logging.Channel[i].Ch].Value[P.Logging.Count]:=GetLogValue(i);
 
    if FormExists(ksData_View) then ksData_View.Data_Update(P.Logging.Count,0,0);
 
  except
    on E:Exception do
    begin
      Result:=false;
      inc(ErrorEvents.Count);
      ErrorEvents.Text.Add(ProcName+': '+E.Message);
      ErrorEvents.Text.Add(ProcName+': '+ProcMsg);
      ErrorEvents.Text.Add(ProcName+': An error occurred while logging a new process data point.');
    end;
  end;
end;


It may take 2 to 15 hours for this error to occur but the last couple of times it involved the setlength operations.

Mike

Edited by: Mike Meier on Apr 19, 2017 5:06 PM
Olivier Sannier

Posts: 412
Registered: 8/26/01
Re: Overwriting the heap? [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 12:30 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

Fair enough.

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

Well, they are, but they are not threadsafe. This means that if you
add/remove items in it from different threads, weird things can happen.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these c
ases?

SetLength definitely isn't threadsafe especially when moving the array.
I mean, one thread gets the "pointer" to the array, then another decides
to make it grow and does it, then the first thread writes to the memory
which is now invalid.

Why don't you use TList<T> or TObjectList<T> instead of direct arrays?
This would alleviate you from handling the size issue.
It is not thread safe though, so you would still need to protect
accesses to it.
Generally, here is what I do:

TMyList = class(TList<TMyRecord>)
public
procedure Lock;
procedure Unlock;
end;

procedure TMyList.Lock;
begin
TMonitor.Enter(Self);
end;

procedure TMyList.Unlock;
begin
TMonitor.Exit(Self);
end;

Then I call it this way:

MyList.Lock;
try
// work with the list
finally
MyList.Unlock;
end;

If every access is protected as such, you can't have memory overwrites
from multiple threads.
However, you may get delays if your "display thread" is locking the list
for too long.
Mike Meier

Posts: 12
Registered: 5/25/05
Re: Overwriting the heap? [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 8:10 AM   in response to: Olivier Sannier in response to: Olivier Sannier
Olivier Sannier wrote:
Mike Meier wrote:
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

Fair enough.

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

Well, they are, but they are not threadsafe. This means that if you
add/remove items in it from different threads, weird things can happen.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these c
ases?

SetLength definitely isn't threadsafe especially when moving the array.
I mean, one thread gets the "pointer" to the array, then another decides
to make it grow and does it, then the first thread writes to the memory
which is now invalid.

Why don't you use TList<T> or TObjectList<T> instead of direct arrays?
This would alleviate you from handling the size issue.
It is not thread safe though, so you would still need to protect
accesses to it.
Generally, here is what I do:

TMyList = class(TList<TMyRecord>)
public
procedure Lock;
procedure Unlock;
end;

procedure TMyList.Lock;
begin
TMonitor.Enter(Self);
end;

procedure TMyList.Unlock;
begin
TMonitor.Exit(Self);
end;

Then I call it this way:

MyList.Lock;
try
// work with the list
finally
MyList.Unlock;
end;

If every access is protected as such, you can't have memory overwrites
from multiple threads.
However, you may get delays if your "display thread" is locking the list
for too long.

Thanks. I've read about this method. I hope I don't have to do that. So many arrays...

My threads handle I/O with external hardware and don't access the arrays in question. When the threads are done they update variables indicating the current temperature, etc. The main program, when logging and plotting the data, simply grabs those values. Basically, the program has two data structures, one for the hardware and feature settings, and a separate one for the experiments. Other than grabbing those temperatures, etc, they don't interact.

Last night I got an AV error logging the time/temperature. It doesn't make sense. All that is happening there is time and temperature are being stored in an array. And before that I got an AV error in this code:

function TMainForm.IsProcessTest(P:TTest_Project):boolean;
begin
  Result:=
      (P.Test.MethodNr=MethodDOC) or
      (P.Test.MethodNr=MethodLST) or
      (P.Test.MethodNr=MethodGST) or
      (P.Test.MethodNr=MethodOSC) or
      (P.Test.MethodNr=MethodGPT);
end;


All this is doing is comparing the value of a variable to several constants. IsProcessTest is not called in the threads.

"P" is the data structure for the experiment I mentioned. "P" must be getting corrupted.

Mike

Edited by: Mike Meier on Apr 20, 2017 8:11 AM
Olivier Sannier

Posts: 412
Registered: 8/26/01
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 8:47 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

My threads handle I/O with external hardware and don't access the arrays in question. When the threads are done they update variables indicating the current temperature, etc. The main program, when logging and plotting the data, simply grabs those values.

How are those values stored? Are they placed in properties of an item of
one the arrays and the item type is record instead of TObject?

Basically, the program has two data structures, one for the hardware and feature settings, and a separate one for the experiments. Other than grabbing those temperatures, etc, they don't interact.

Last night I got an AV error logging the time/temperature. It doesn't make sense. All that is happening there is time and temperature are being stored in an array. And before that I got an AV error in this code:

function TMainForm.IsProcessTest(P:TTest_Project):boolean;
begin
  Result:=
      (P.Test.MethodNr=MethodDOC) or
      (P.Test.MethodNr=MethodLST) or
      (P.Test.MethodNr=MethodGST) or
      (P.Test.MethodNr=MethodOSC) or
      (P.Test.MethodNr=MethodGPT);
end;


All this is doing is comparing the value of a variable to several constants. IsProcessTest is not called in the threads.

Here, I'd use an enumeration and the "in" operator, much easier to read
and safer to use.

"P" is the data structure for the experiment I mentioned. "P" must be getting corrupted.

What is the type of P? I mean, what is TTest_Project?

If need be, I'm sure some of us would gladly (ie, for a fee...) have a
look at your entire source code and give you a complete review of the
situation.
Brian Hamilton ...

Posts: 513
Registered: 10/14/04
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 29, 2017 10:37 PM   in response to: Olivier Sannier in response to: Olivier Sannier
you mention lots of code
are any procedures very long now?
if so try splitting them into smaller chunks of code?
Mike Meier

Posts: 12
Registered: 5/25/05
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: May 1, 2017 10:12 AM   in response to: Brian Hamilton ... in response to: Brian Hamilton ...
Brian Hamilton Hamilton wrote:
you mention lots of code
are any procedures very long now?
if so try splitting them into smaller chunks of code?

Thanks. I've tried keeping my functions and procedures small enough to manage, following the rule of thumb that a procedure longer than a page is too long.
Mike Meier

Posts: 12
Registered: 5/25/05
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: May 1, 2017 10:29 AM   in response to: Olivier Sannier in response to: Olivier Sannier
Olivier Sannier wrote:
Mike Meier wrote:

My threads handle I/O with external hardware and don't access the arrays in question. When the threads are done they update variables indicating the current temperature, etc. The main program, when logging and plotting the data, simply grabs those values.

What is the type of P? I mean, what is TTest_Project?

If need be, I'm sure some of us would gladly (ie, for a fee...) have a
look at your entire source code and give you a complete review of the
situation.


Thanks. I was thinking I might need to have fresh eye look this over, but given how much code there is, and not thinking my client would like that idea...

TTest_Project is a record that contains records that define every aspect of the test procedure. It also includes a number of dynamic arrays.

Test can run from a few hours to well over 100 hours. During the longer test the program will log close to 200,000 readings from 18 to 25 channels. I had never considered using TLists, figuring it would require more resources. Some of the dynamic arrays are for records and these arrays are small, usually under 100. TList might work there.

I finally did solve this and thought I should close out this thread with the solution. I ran tests where no data were logged and that worked so I grew more and more concerned that passing P as an argument was causing my trouble, even though this had been working for years, so I took that out of the procedures where I could and which were giving me problems. That did not help. I still suspected SetLength() was at the root of this so I set the initial array sizes large enough so they would never be resized. That worked, but it had to be something else because I had been doing this for years in three versions of this software, all which are currently running. So I did something of an audit to make sure TTest_Project was not referenced in the threads and I found one, so I moved it, and the problem seems to have been solved. Testing on my PC for over 200 hours produced no errors. The software is now running live experiments without issues.

Sorry for all the trouble. I broke a simple rule and had a devil of a time finding that needle in this 300k line haystack.

Mike
Olivier Sannier

Posts: 412
Registered: 8/26/01
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: May 2, 2017 7:02 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

That worked, but it had to be something else because I had been doing this for years in three versions of this software, all which are currently running. So I did something of an audit to make sure TTest_Project was not referenced in the threads and I found one, so I moved it, and the problem seems to have been solved. Testing on my PC for over 200 hours produced no errors. The software is now running live experiments without issues.

So basically it was indeed what we described, where a thread accesses an
element that has already been freed by another one.

Sorry for all the trouble. I broke a simple rule and had a devil of a time finding that needle in this 300k line haystack.

No worries, it's not always easy to "dive" into such things.
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02