Welcome, Guest
Guest Settings
Help

Thread: Overwriting the heap?



Permlink Replies: 9 - Last Post: Apr 20, 2017 8:47 AM Last Post By: Olivier Sannier Threads: [ Previous | Next ]
Mike Meier

Posts: 10
Registered: 5/25/05
Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 9, 2017 9:12 AM
I've recently started encountering problems where my program generates errors at seemingly random times. It may run for an hour or 100 hours, completing 10's of thousands and often more loops through the main part of the program. This is a desktop app that is not receiving input from external sources. It's running a loop simulating a laboratory test. Threads are involved.

The program runs on an OnTimer loop, checking this and that and when it is time starting a number of threads. I suspected the threads, that I might be starting one again before it had finished, but I set a flag when they start and clear it when they end and use this info to determine if it is safe to start the thread again.

I've peppered my code with messages to help locate the source of the problem but it seems that source keeps moving. The best I can figure is I have overwritten memory that is already being used. I have EurekaLog running but it has not been able to help.

So far the errors do not make sense. Here's an example.

In the main loop I check for and process errors.

// Handle other errors.
ProcMsg:='Checking for errors.';
if ErrorEvents.Count > 0 then ErrorHandleEvents;

In my error handler routine I create a form to report the errors.

// Create and display the error message dialog form.
if FormExists(frmError_Dialog) then
begin
frmError_Dialog.BringToFront;
frmError_Dialog.RefreshErrors;
end
else
begin
frmError_Dialog:=TksError_DialogFrm.Create(self);
frmError_Dialog.Show;
end;

Normally this works fine, but now, at random, when the cascade starts, somehow I get an error saying this form already exists.

When I enable memory leak checking in EurekaLog I get a lot of multi free thread leak errors, but these seem to be occurring after the start of the problem. These leaks are being reported in places where I am neither creating nor freeing objects. I have gone through my code to make sure every Free call is in either a try/finally/end block or in a form or component's Destroy procedure.

Again, the best I can figure is I have overwritten memory that is already being used. Is there a tool or technique folks here know of that can help?

Thanks,
Mike
Alex Belo

Posts: 471
Registered: 10/8/06
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 10, 2017 10:32 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

Again, the best I can figure is I have overwritten memory that is
already being used. Is there a tool or technique folks here know of
that can help?

madExcept has option "instant crash on buffer overrun/underrun" but it
requires more memory: special memory manager places requested memory
blocks at the end/begin of memory pages and makes next/previous page
unavailable for writing. As a result any attempt to write too many
bytes will raise instant AV.

--
Alex
Mike Meier

Posts: 10
Registered: 5/25/05
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 12, 2017 9:21 AM   in response to: Alex Belo in response to: Alex Belo
Alex Belo wrote:
Mike Meier wrote:

Again, the best I can figure is I have overwritten memory that is
already being used. Is there a tool or technique folks here know of
that can help?

madExcept has option "instant crash on buffer overrun/underrun" but it
requires more memory: special memory manager places requested memory
blocks at the end/begin of memory pages and makes next/previous page
unavailable for writing. As a result any attempt to write too many
bytes will raise instant AV.

--
Alex

Thanks, Alex. I'll look into it.

So far no progress. It takes hours for an error to occur. They are always access violations and in two builds the AV address points to the Windows procedure CreateEvent (Matching AV address to those in my map file). Odd.
Alex Belo

Posts: 471
Registered: 10/8/06
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 12, 2017 10:11 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

AV address points to the Windows procedure CreateEvent (Matching AV
address to those in my map file). Odd.

BTW madExcept has good stack tracer (much better than in IDE).

What version do you use? In old versions there was bug in TEvent
conctrustor; it was fixed at one time (I don't know in what version).
The same bug still presents in JCL (TJclEvent class).

--
Alex
Roy Lambert

Posts: 445
Registered: 10/21/99
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 13, 2017 4:10 AM   in response to: Mike Meier in response to: Mike Meier
Mike

I hate these sort of things. Whilst its a wild chance I thought I'd share this.

I had a similar problem - an application that just ran on my PC was fine for ages but if left on overnight would throw an AV when I moved the mouse cursor and the screen powered back on. The PC was a laptop. I never did understand just what the root cause was but I ended up setting a flag to determine that the screen was off and adjust actions accordingly

If you think it may be any help I'm happy to to posr code.

Roy Lambert

Olivier Sannier

Posts: 396
Registered: 8/26/01
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 13, 2017 4:55 AM   in response to: Mike Meier in response to: Mike Meier
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.
Mike Meier

Posts: 10
Registered: 5/25/05
Re: Overwriting the heap?
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 19, 2017 5:05 PM   in response to: Olivier Sannier in response to: Olivier Sannier
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

I've run this program in a number of configurations, with and without certain options turned on and off. The threads handle I/O with sensors and external controllers. Meanwhile the main program does basically three things, execute the test procedure, log the data, and plot the data. I've eliminated plotting as a problem. I strongly suspect data logging at this point.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these cases?

Here's the abbreviated code from where I've gotten errors recently.

function TMainForm.Process_DataLog(var P:TTest_Project):boolean;
const ProcName='Process_DataLog';
var ProcMsg: string;
var i,Daq_Size:integer;
 
  function GetLogValue(Idx:integer):single;
  // Get the value for channel Idx from the appropriate data source.
  var N:integer;
 
    function IndexInRange(Min,Max:integer):boolean;
    begin
      Result:= (Idx > Min) and (Idx <= Max);
    end;
 
  begin
    case DaqProject.Logging.Channel[Idx].Source of
      dsMFC : ...
      dsMFM : ...
      dsCNi16 : ...
      dsHumidifier : ...
      dsThermocouple : ...
    end;
  end;
 
  procedure LogTheTimeAndTemperature;
  begin
    ...
  end;
 
  function CountStr:string;
  begin
    Result:='('+inttostr(P.Logging.Count)+' of '+inttostr(Daq_Size)+').'
   end;
 
begin
 
  Result:=true;
 
  try
    // Are the daq time and values arrays filling up?  If so, redimension them.
    ProcMsg:='Checking data array size.';
    Daq_Size:=  High(P.ProcessData.Data[0].Value);
    
    if P.Logging.Count >= Round((1-Daq_SizeMargin) * Daq_Size) then
    begin
      // New array size.
      Daq_Size:= Daq_Size+ Daq_SizeIncrement;
      ProcMsg:='Changing the data array size to '+inttostr(Daq_Size)+'.';
      Process_UpdateMemo(ProcName+': storage increased to '+inttostr(Daq_Size));
 
      // Time Data.
      ProcMsg:='Changing the time-data array size to '+inttostr(Daq_Size)+'.';
      SetLength(P.ProcessData.TimeData,Daq_Size);
 
      // Logged channel data.
      ProcMsg:='Changing the process_values data array sizes to '+inttostr(Daq_Size)+'.';
      for i:=0 to P.Logging.Channels do
      begin
        ProcMsg:='Changing process-values data array '+inttostr(i)+' size to '+inttostr(Daq_Size)+'.';
        SetLength(P.ProcessData.Data[i].Value,Daq_Size);
      end;
    end;
 
    // Increment the number of daq data points logged
    ProcMsg:='Incrementing the logging count to '+CountStr;
    inc(P.Logging.Count);
 
    // Note the time and temperature
    ProcMsg:='Logging the time and temperature '+CountStr;
    LogTheTimeAndTemperature;
 
    // Note the daq time in the data array
    ProcMsg:='Logging the logging elapsed time '+CountStr;
    P.ProcessData.Data[0].Value[P.Logging.Count]:=P.Logging.ElapsedTimeSec;
 
    // Log all the other channels. Note that the Channel array contains entries
    // for all possible data sources, even those not being logged. The
    // ProcessData array will store only the readings from logged channels.
    ProcMsg:='Logging the channel readings '+CountStr;
    for i:=1 to DaqChannelCountMax do
      if P.Logging.Channel[i].Log then
        P.ProcessData.Data[P.Logging.Channel[i].Ch].Value[P.Logging.Count]:=GetLogValue(i);
 
    if FormExists(ksData_View) then ksData_View.Data_Update(P.Logging.Count,0,0);
 
  except
    on E:Exception do
    begin
      Result:=false;
      inc(ErrorEvents.Count);
      ErrorEvents.Text.Add(ProcName+': '+E.Message);
      ErrorEvents.Text.Add(ProcName+': '+ProcMsg);
      ErrorEvents.Text.Add(ProcName+': An error occurred while logging a new process data point.');
    end;
  end;
end;


It may take 2 to 15 hours for this error to occur but the last couple of times it involved the setlength operations.

Mike

Edited by: Mike Meier on Apr 19, 2017 5:06 PM
Olivier Sannier

Posts: 396
Registered: 8/26/01
Re: Overwriting the heap? [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 12:30 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

Fair enough.

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

Well, they are, but they are not threadsafe. This means that if you
add/remove items in it from different threads, weird things can happen.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these c
ases?

SetLength definitely isn't threadsafe especially when moving the array.
I mean, one thread gets the "pointer" to the array, then another decides
to make it grow and does it, then the first thread writes to the memory
which is now invalid.

Why don't you use TList<T> or TObjectList<T> instead of direct arrays?
This would alleviate you from handling the size issue.
It is not thread safe though, so you would still need to protect
accesses to it.
Generally, here is what I do:

TMyList = class(TList<TMyRecord>)
public
procedure Lock;
procedure Unlock;
end;

procedure TMyList.Lock;
begin
TMonitor.Enter(Self);
end;

procedure TMyList.Unlock;
begin
TMonitor.Exit(Self);
end;

Then I call it this way:

MyList.Lock;
try
// work with the list
finally
MyList.Unlock;
end;

If every access is protected as such, you can't have memory overwrites
from multiple threads.
However, you may get delays if your "display thread" is locking the list
for too long.
Mike Meier

Posts: 10
Registered: 5/25/05
Re: Overwriting the heap? [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 8:10 AM   in response to: Olivier Sannier in response to: Olivier Sannier
Olivier Sannier wrote:
Mike Meier wrote:
Olivier Sannier wrote:
You mention threads in your explanation.
Are any of those threads accessing GUI elements?
If yes, are these access "synchronized" to the main thread?

If not, then you have a valid suspect.

None of my threads, and I can have as many as 14 running at the same time, do this. Unless I missed something. ;)

Fair enough.

I racked down the multi free memory leaks to how I used a TStringList in my error reporting routine. It seems TStringLists are not subject to range checking.

Well, they are, but they are not threadsafe. This means that if you
add/remove items in it from different threads, weird things can happen.

I'm using dynamic arrays and I resize them (setlength) once they are 90% full. I think that may be where the problem is, even though I've been doing this exact same thing in this evolving program for years without any problem. The program is pushing 300k lines now, so maybe the overall size of the code is becoming an issue. I understand that dynamic arrays can be moved if the block of memory it is currently using is not large enough to grow more. Does it sound right that an AV error might occur in these c
ases?

SetLength definitely isn't threadsafe especially when moving the array.
I mean, one thread gets the "pointer" to the array, then another decides
to make it grow and does it, then the first thread writes to the memory
which is now invalid.

Why don't you use TList<T> or TObjectList<T> instead of direct arrays?
This would alleviate you from handling the size issue.
It is not thread safe though, so you would still need to protect
accesses to it.
Generally, here is what I do:

TMyList = class(TList<TMyRecord>)
public
procedure Lock;
procedure Unlock;
end;

procedure TMyList.Lock;
begin
TMonitor.Enter(Self);
end;

procedure TMyList.Unlock;
begin
TMonitor.Exit(Self);
end;

Then I call it this way:

MyList.Lock;
try
// work with the list
finally
MyList.Unlock;
end;

If every access is protected as such, you can't have memory overwrites
from multiple threads.
However, you may get delays if your "display thread" is locking the list
for too long.

Thanks. I've read about this method. I hope I don't have to do that. So many arrays...

My threads handle I/O with external hardware and don't access the arrays in question. When the threads are done they update variables indicating the current temperature, etc. The main program, when logging and plotting the data, simply grabs those values. Basically, the program has two data structures, one for the hardware and feature settings, and a separate one for the experiments. Other than grabbing those temperatures, etc, they don't interact.

Last night I got an AV error logging the time/temperature. It doesn't make sense. All that is happening there is time and temperature are being stored in an array. And before that I got an AV error in this code:

function TMainForm.IsProcessTest(P:TTest_Project):boolean;
begin
  Result:=
      (P.Test.MethodNr=MethodDOC) or
      (P.Test.MethodNr=MethodLST) or
      (P.Test.MethodNr=MethodGST) or
      (P.Test.MethodNr=MethodOSC) or
      (P.Test.MethodNr=MethodGPT);
end;


All this is doing is comparing the value of a variable to several constants. IsProcessTest is not called in the threads.

"P" is the data structure for the experiment I mentioned. "P" must be getting corrupted.

Mike

Edited by: Mike Meier on Apr 20, 2017 8:11 AM
Olivier Sannier

Posts: 396
Registered: 8/26/01
Re: Overwriting the heap? [Edit] [Edit]
Click to report abuse...   Click to reply to this thread Reply
  Posted: Apr 20, 2017 8:47 AM   in response to: Mike Meier in response to: Mike Meier
Mike Meier wrote:

My threads handle I/O with external hardware and don't access the arrays in question. When the threads are done they update variables indicating the current temperature, etc. The main program, when logging and plotting the data, simply grabs those values.

How are those values stored? Are they placed in properties of an item of
one the arrays and the item type is record instead of TObject?

Basically, the program has two data structures, one for the hardware and feature settings, and a separate one for the experiments. Other than grabbing those temperatures, etc, they don't interact.

Last night I got an AV error logging the time/temperature. It doesn't make sense. All that is happening there is time and temperature are being stored in an array. And before that I got an AV error in this code:

function TMainForm.IsProcessTest(P:TTest_Project):boolean;
begin
  Result:=
      (P.Test.MethodNr=MethodDOC) or
      (P.Test.MethodNr=MethodLST) or
      (P.Test.MethodNr=MethodGST) or
      (P.Test.MethodNr=MethodOSC) or
      (P.Test.MethodNr=MethodGPT);
end;


All this is doing is comparing the value of a variable to several constants. IsProcessTest is not called in the threads.

Here, I'd use an enumeration and the "in" operator, much easier to read
and safer to use.

"P" is the data structure for the experiment I mentioned. "P" must be getting corrupted.

What is the type of P? I mean, what is TTest_Project?

If need be, I'm sure some of us would gladly (ie, for a fee...) have a
look at your entire source code and give you a complete review of the
situation.
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02