Watch, Follow, &
Connect with Us

For forums, blogs and more please visit our
Developer Tools Community.


Welcome, Guest
Guest Settings
Help

Thread: TIdTCPClient Transmit Timeout Not Working


This question is answered. Helpful answers available: 0. Correct answers available: 1.


Permlink Replies: 7 - Last Post: Dec 4, 2017 10:01 AM Last Post By: Remy Lebeau (Te...
Richard Zarr

Posts: 74
Registered: 7/1/98
TIdTCPClient Transmit Timeout Not Working  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Nov 30, 2017 12:01 PM
We have a communications thread that uses TIdTCPClient blocking sockets. The protocol (Modbus TCP) is command oriented so it never sends asynchronous data to the client. Here's the initial code for our execute method:

procedure TMyCommThread.Execute;
var
 buffer : TIdBytes;
 buffer_length : integer;
begin
 while not Terminated do
  begin
   // get a packet
   FPacket := FTCPQueue.GetNextPacket;
   // see if we have something to do
   if not Terminated then
    begin
     if Assigned(FPacket) then
      begin
       // load packet into buffer
       buffer_length := CopyPacketToBuffer(buffer, FPacket);
       // if the TCP connection is active, process the command
       if Assigned(FTCP) then
        begin
         try
          // make sure we are still connected
          FTCP.CheckForGracefulDisconnect;
          // send it
          FTCP.IOHandler.WriteDirect(buffer);
          // clear the buffer
          Finalize(buffer);
          // now wait for response
          FTCP.IOHandler.ReadBytes(buffer, -1);
          // check if successful
          if (Length(buffer) > 0) then
           ProcessData(buffer)
          else
           if Assigned(FOnRxTimeout) then FOnRxTimeout(self);
         except
          if Assigned(FOnTxTimeout) then FOnTxTimeout(self);
         end;
        end;
       // clean up
       FreeAndNil(FPacket);
      end
     else
      begin
       // nothing more to do, wait for a command
       ResetEvent(FRun);
       WaitForSingleObject(FRun, INFINITE);
      end;
    end;
  end;
end;


We are polling very fast - roughly once every 20 ms - to load the system (typically, polling would occur every 100 ms). Randomly, between a few minutes and a few hours FTCP.IOHandler.WriteDirect(buffer) hangs. The thread is create when the OnConnected event handler fires once the connection with the PLC (Modbus TCP server) is established:

procedure TMyClient.ConnectEvent(Sender: TObject);
begin
 // create a working thread
 FTCPThread := TMyCommThread.Create(FTCP);
 FTCPThread.OnPacket := ModbusRx_PacketReadyEvent;
 FTCPThread.OnSend := ModbusTx_SendEvent; // used for low level monitoring
 FTCPThread.OnTxTimeout := ModbusTx_TimeoutEvent;
 FTCPThread.OnRxTimeout := ModbusRx_TimeoutEvent;
 // update socket settings
 FTCP.Socket.Binding.SetKeepAliveValues(true, FKeepAlivePeriod * 1000, 1000);
 FTCP.Socket.Binding.SetSockOpt(Id_SOL_SOCKET, Id_SO_SNDTIMEO, 500);
 // disable reconnect process
 FReconnect := false;
 // update properties
 UpdateProperties(Connected);
 // update the nodes
 EnumNodes;
 // enable the polling thread
 FPollingThread.Enabled := true;
 // alert the system we are connected
 if Assigned(FOnConnect) then FOnConnect(self);
end;


This is running as a service with no UI... (and no windows message queue since it is in a DLL). Note the 500 ms setting for the SO_SNDTIMEO socket option. I checked this value by reading it back so it is being properly set. However, the write hangs for roughly 30 seconds maybe more before raising an exception. This is WAY to long for real time control... any ideas for:

1) why the socket hangs?
2) why the timeout is not working?

Regards,

RZ

Update: We have captured and isolated the incident with Wireshark - the file is available [here|https://www.dropbox.com/sh/60et40fmw5o49hi/AAAw0CwIXUpstxcg6EInCJLOa?dl=0]. Note that when it hangs, there is a TCP ACK (item 4) that never occurs anywhere else in the communications. It originates from the TIdTCPClient socket to the server.... this must be related to the issue. It also illustrates that the Tx Timeout is taking 60 seconds (not 500 ms as programmed) - this could be something other than a Tx timeout. Traces of the code seem to indicate a disconnect, but Wireshark shows it is still connected. The only thing we can think of is that the handle of the socket is getting whacked - once the TIdTCPClient recovers from where ever it goes, we disconnect the client and reconnect and all restarts as designed (you can see that in the Wireshark file).

Edited by: Richard Zarr on Nov 30, 2017 12:39 PM
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: TIdTCPClient Transmit Timeout Not Working
Helpful
Click to report abuse...   Click to reply to this thread Reply
  Posted: Nov 30, 2017 2:40 PM   in response to: Richard Zarr in response to: Richard Zarr
Richard Zarr wrote:
We have a communications thread that uses TIdTCPClient blocking sockets. The protocol (Modbus TCP) is command oriented so it never sends asynchronous data to the client. Here's the initial code for our execute method

Modbus TCP using a framing structure, but I don't see you coding for that framing.

http://www.modbus.org/docs/Modbus_Messaging_Implementation_Guide_V1_0b.pdf

Every Modbus TCP message has a 7-byte header. And you can clearly see this in your Wireshark capture, as Wireshark recognizes Modbus TCP and thus parses the header accordingly.

Is CopyPacketToBuffer() including that header in the output buffer? Perhaps so, since the header exists in your outbound packets in the Wireshark capture.

But you are certainly NOT handling that header in your reading code. ReadBytes(-1) is the wrong thing to use in this situation (especially considering that you are not setting a reading timeout of any kind). You need to read the 7-byte header, then read the number of bytes the header says.

And don't use TIdIOHandler.WriteDirect() directly., use TIdIOHandler.Write(TIdBytes) instead.

Randomly, between a few minutes and a few hours FTCP.IOHandler.WriteDirect(buffer) hangs.

Hanging indefinitely is not possible with a non-infinite SO_SNDTIMEO timeout.

The only way a send can hang at all is if the receiver's inbound socket buffer is full up. Your client socket would be blocked waiting for the receiver to read data, clearing space in its receive buffer for new data, until the timeout elapses, at which time a socket error would occur and your thread would get an exception (that it is not handling correctly, FYI).

// create a working thread

Are you creating the thread suspended? I don't see you calling Resume()/Start() on it, so I assume you are not. Which means you are configuring your thread and socket after the thread may already be running. You should create the thread suspended and then resume it only after you are finished preparing it. Otherwise, you risk race conditions.

Also, your thread is assuming ANY exception is a timeout error, but that is the not the case. When you catch an exception, make sure it is EIdReadTimeout, or EIdSocketError with ErrorCode=WSAETIMEOUT. Any other exception is an actual error, so you should close the socket and stop the thread.

This is running as a service with no UI...

That is irrelevant to your issue.

Note the 500 ms setting for the SO_SNDTIMEO socket option. I checked this value by reading it back so it is being properly set. However, the write hangs for roughly 30 seconds maybe more before raising an exception.

The only way that can happen is if the receiver is reading data VERY slowly, but is nonetheless reading at least 1 byte at a time before 500ms elapses, except the last time when the timeout does elapse and raise the exception.

Update: We have captured and isolated the incident with Wireshark - the file is available [here|https://www.dropbox.com/sh/60et40fmw5o49hi/AAAw0CwIXUpstxcg6EInCJLOa?dl=0]. Note that when it hangs, there is a TCP ACK (item 4) that never occurs anywhere else in the communications. It originates from the TIdTCPClient socket to the server.... this must be related to the issue.

That item is simply your client ACKing the data it has read from the server. Then the Wireshark capture shows you not sending any more Modbus commands at all, only TCP keep-alives (items 5-12), until your client disconnects (item 13-16).

It also illustrates that the Tx Timeout is taking 60 seconds (not 500 ms as programmed)

That is not what it shows. It shows the server responding to your 1st command in 1ms, which your client ACKs 191ms after it was received, then your client doesn't send any commands for 14.8s, at which time TCP keep-alives kick in at 15s intervals, and after 4 keep-alives (15s*4=60s) your client explicitly disconnects.

this could be something other than a Tx timeout.

It is. You are not handling the Modbus TCP protocol correctly.

Traces of the code seem to indicate a disconnect, but Wireshark shows it is still connected.

Not once 60s of keep-alives have elapsed. There is clearly a disconnect shown in the capture.

The only thing we can think of is that the handle of the socket is getting whacked

It is not.

The correct thread code should look something more like this:

FTCP.IOHandler.ReadTimeout := ...;
...
procedure TMyCommThread.Execute;
var
  buffer : TIdBytes;
  buffer_length : integer;
  len: Word;
begin
  while not Terminated do
  begin
    // get a packet
    FPacket := FTCPQueue.GetNextPacket;
    // see if we have something to do
    if not Assigned(FPacket) then
    begin
      // nothing more to do, wait for a command
      ResetEvent(FRun);
      WaitForSingleObject(FRun, INFINITE);
      Continue;
    end;
    try
      if not Terminated then
      begin
        // load packet into buffer
        buffer_length := CopyPacketToBuffer(buffer, FPacket);
        // send it
        try
          FTCP.IOHandler.Write(buffer);
        except
          on E: EIdSocketError do begin
            if E.LastError = WSAETIMEDOUT then begin
              if Assigned(FOnTxTimeout) then FOnTxTimeout(Self);
            end;
            // no way to know how much data was actually sent, so the
            // only sane thing to do is disconnect and reconnect...
            raise;
          end;
        end;
        // now wait for response
        try
          buffer := nil;
          FTCP.IOHandler.ReadBytes(buffer, 7);
          len := GStack.NetworkToHost(BytesToUInt16(buffer, 4));
          if len > 1 then FTCP.IOHandler.ReadBytes(buffer, len-1, True);
        except
          on E: EIdReadTimeout do begin
            if Assigned(FOnRxTimeout) then FOnRxTimeout(Self);
            // no way to know how much data was actually received, so the
            // only sane thing to do is disconnect and reconnect...
            raise;
          end;
        end;
        ProcessData(buffer);
        buffer := nil;
      end;
    finally
      // clean up
      FPacket.Free;
    end;
  end;
end;
 
procedure TMyCommThread.DoTeminate;
begin
  if FatalException <> nil then
  begin
    // signal a reconnect if needed...
  end;
  inherited;
end;


--
Remy Lebeau (TeamB)
Richard Zarr

Posts: 74
Registered: 7/1/98
Re: TIdTCPClient Transmit Timeout Not Working  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 1, 2017 7:55 AM   in response to: Remy Lebeau (Te... in response to: Remy Lebeau (Te...
Remy Lebeau (TeamB) wrote:
Richard Zarr wrote:
We have a communications thread that uses TIdTCPClient blocking sockets. The protocol (Modbus TCP) is command oriented so it never sends asynchronous data to the client. Here's the initial code for our execute method

Modbus TCP using a framing structure, but I don't see you coding for that framing.

http://www.modbus.org/docs/Modbus_Messaging_Implementation_Guide_V1_0b.pdf

Every Modbus TCP message has a 7-byte header. And you can clearly see this in your Wireshark capture, as Wireshark recognizes Modbus TCP and thus parses the header accordingly.

Is CopyPacketToBuffer() including that header in the output buffer? Perhaps so, since the header exists in your outbound packets in the Wireshark capture.

But you are certainly NOT handling that header in your reading code. ReadBytes(-1) is the wrong thing to use in this situation (especially considering that you are not setting a reading timeout of any kind). You need to read the 7-byte header, then read the number of bytes the header says.

And don't use TIdIOHandler.WriteDirect() directly., use TIdIOHandler.Write(TIdBytes) instead.

Randomly, between a few minutes and a few hours FTCP.IOHandler.WriteDirect(buffer) hangs.

Hanging indefinitely is not possible with a non-infinite SO_SNDTIMEO timeout.

The only way a send can hang at all is if the receiver's inbound socket buffer is full up. Your client socket would be blocked waiting for the receiver to read data, clearing space in its receive buffer for new data, until the timeout elapses, at which time a socket error would occur and your thread would get an exception (that it is not handling correctly, FYI).

// create a working thread

Are you creating the thread suspended? I don't see you calling Resume()/Start() on it, so I assume you are not. Which means you are configuring your thread and socket after the thread may already be running. You should create the thread suspended and then resume it only after you are finished preparing it. Otherwise, you risk race conditions.

Also, your thread is assuming ANY exception is a timeout error, but that is the not the case. When you catch an exception, make sure it is EIdReadTimeout, or EIdSocketError with ErrorCode=WSAETIMEOUT. Any other exception is an actual error, so you should close the socket and stop the thread.

This is running as a service with no UI...

That is irrelevant to your issue.

Note the 500 ms setting for the SO_SNDTIMEO socket option. I checked this value by reading it back so it is being properly set. However, the write hangs for roughly 30 seconds maybe more before raising an exception.

The only way that can happen is if the receiver is reading data VERY slowly, but is nonetheless reading at least 1 byte at a time before 500ms elapses, except the last time when the timeout does elapse and raise the exception.

Update: We have captured and isolated the incident with Wireshark - the file is available [here|https://www.dropbox.com/sh/60et40fmw5o49hi/AAAw0CwIXUpstxcg6EInCJLOa?dl=0]. Note that when it hangs, there is a TCP ACK (item 4) that never occurs anywhere else in the communications. It originates from the TIdTCPClient socket to the server.... this must be related to the issue.

That item is simply your client ACKing the data it has read from the server. Then the Wireshark capture shows you not sending any more Modbus commands at all, only TCP keep-alives (items 5-12), until your client disconnects (item 13-16).

It also illustrates that the Tx Timeout is taking 60 seconds (not 500 ms as programmed)

That is not what it shows. It shows the server responding to your 1st command in 1ms, which your client ACKs 191ms after it was received, then your client doesn't send any commands for 14.8s, at which time TCP keep-alives kick in at 15s intervals, and after 4 keep-alives (15s*4=60s) your client explicitly disconnects.

this could be something other than a Tx timeout.

It is. You are not handling the Modbus TCP protocol correctly.

Traces of the code seem to indicate a disconnect, but Wireshark shows it is still connected.

Not once 60s of keep-alives have elapsed. There is clearly a disconnect shown in the capture.

The only thing we can think of is that the handle of the socket is getting whacked

It is not.

The correct thread code should look something more like this:

FTCP.IOHandler.ReadTimeout := ...;
...
procedure TMyCommThread.Execute;
var
  buffer : TIdBytes;
  buffer_length : integer;
  len: Word;
begin
  while not Terminated do
  begin
    // get a packet
    FPacket := FTCPQueue.GetNextPacket;
    // see if we have something to do
    if not Assigned(FPacket) then
    begin
      // nothing more to do, wait for a command
      ResetEvent(FRun);
      WaitForSingleObject(FRun, INFINITE);
      Continue;
    end;
    try
      if not Terminated then
      begin
        // load packet into buffer
        buffer_length := CopyPacketToBuffer(buffer, FPacket);
        // send it
        try
          FTCP.IOHandler.Write(buffer);
        except
          on E: EIdSocketError do begin
            if E.LastError = WSAETIMEDOUT then begin
              if Assigned(FOnTxTimeout) then FOnTxTimeout(Self);
            end;
            // no way to know how much data was actually sent, so the
            // only sane thing to do is disconnect and reconnect...
            raise;
          end;
        end;
        // now wait for response
        try
          buffer := nil;
          FTCP.IOHandler.ReadBytes(buffer, 7);
          len := GStack.NetworkToHost(BytesToUInt16(buffer, 4));
          if len > 1 then FTCP.IOHandler.ReadBytes(buffer, len-1, True);
        except
          on E: EIdReadTimeout do begin
            if Assigned(FOnRxTimeout) then FOnRxTimeout(Self);
            // no way to know how much data was actually received, so the
            // only sane thing to do is disconnect and reconnect...
            raise;
          end;
        end;
        ProcessData(buffer);
        buffer := nil;
      end;
    finally
      // clean up
      FPacket.Free;
    end;
  end;
end;
 
procedure TMyCommThread.DoTeminate;
begin
  if FatalException <> nil then
  begin
    // signal a reconnect if needed...
  end;
  inherited;
end;


--
Remy Lebeau (TeamB)

Hi Remy,

Thanks for the detailed look at this and the coding suggestions. We'll implement these changes today and do some testing. Per your questions:

Modbus TCP using a framing structure, but I don't see you coding for that framing.

We do that here:

function TMyCommThread.CopyPacketToBuffer(
 var buffer: TIdBytes;          // buffer (destination)
 Packet: TModbus_PacketTCP      // packet (source)
 ) : integer;                   // length of buffer
begin
 // set the length (header = 7, packet payload length = n, -1 to get the length
 Result := MODBUS_HEADER_LENGTH + Packet.PacketLength - 1;
 SetLength(buffer, Result);
 // clear the buffer
 FillChar(buffer[0], Result, $00);
 // now copy everything to the correct location
 // transaction ID - create one for outgoing...
 buffer[0] := byte((Packet.FTransactionID SHR 8) AND $FF);
 buffer[1] := byte(Packet.FTransactionID AND $FF);
 // protocol
 buffer[2] := byte((Packet.ProtocolID SHR 8) AND $FF);
 buffer[3] := byte(Packet.ProtocolID AND $FF);
 // packet length
 buffer[4] := byte((Packet.PacketLength SHR 8) AND $FF);
 buffer[5] := byte(Packet.PacketLength AND $FF);
 // unit ID
 buffer[6] := Packet.UnitID;
 // function code
 buffer[7] := Packet.FunctionCode;
 // payload
 Move(Packet.Payload[0], buffer[8], Packet.PacketLength-2);
end;


But you are certainly NOT handling that header in your reading code. ReadBytes(-1) is the wrong thing to use in this situation (especially considering that you are not setting a reading timeout of any kind). You need to read the 7-byte header, then read the number of bytes the header says.

We were assuming the response would fit into one MDU, but that may not be the case - so we'll fix this. The header is processed in the ProcessData() method where the transaction ID is checked against what was sent along with errors (which is in bit 7 of the function code). We then dispatch the final packet to a higher level process to update the individual objects and alert connected clients of the change. It doesn't return until all of that is done, so the next command in the queue waits until all of the additional processing is complete.

Are you creating the thread suspended? I don't see you calling Resume()/Start() on it, so I assume you are not. Which means you are configuring your thread and socket after the thread may already be running. You should create the thread suspended and then resume it only after you are finished preparing it. Otherwise, you risk race conditions.

We'll take a look at this... along with the other coding suggestions. Again, many thanks as always for your insight.

Regards,

RZ

UPDATE: OK... we implemented the changes, cleaned up some of our code and started testing. I believe this is a significantly more stable way to process the communications... however, we are still not out of the woods. We are now getting an EMonitorLockException with message 'Object lock not owned" and it completely dies. We believe this was the underlying issue with the original thread execute code and now is being properly caught. We believe it has something to do with our priority queue which has two TThreadList objects - normal (used for reads, etc) and high priority (used for writes to modbus registers and coils). We haven't run sufficient tests yet to determine if the new execute code is completely working, but if we start to manually inject high priority packets into the queue, we get this error. This occurs when a user on a console changes a setting. It is being injected along with polling packets. It is rare, but happens when the system is running at full polling speed (20 ms intervals).

Here is the send code. This is called by multiple threads - one is the polling thread and the other is the client service thread.
procedure TModbus_TCPThread.SendPacket(Packet: TModbus_PacketTCP;
  HighPriority: boolean);
var
 packet_entry : TModbus_PacketTCP;
begin
 if FTimeoutError then exit;
 // lock this routine to prevent multiple sources from disrupting the packet
 FLock.Enter;
 try
  // create a copy of the packet for the queue
  packet_entry := TModbus_PacketTCP.Create;
  packet_entry.Clone(Packet);
  // assign a transaction ID
  inc(FTransactionID);
  packet_entry.FTransactionID := FTransactionID;
  // add it to the queue
  if HighPriority then
   FTCPQueue.AddHP(packet_entry)
  else
   FTCPQueue.Add(packet_entry);
  // start up the thread...
  SetEvent(FRun);
 finally
  FLock.Leave;
 end;
end;


This is the add functions for FTCPQueue:

 
{*****************************************************************************}
{*                                                                           *}
{*  T M o d b u s _ T C P Q u e u e                                          *}
{*                                                                           *}
{*****************************************************************************}
procedure TModbus_TCPQueue.Add(Packet: TModbus_PacketTCP);
begin
 FQueue.Add(Packet);
end;
 
procedure TModbus_TCPQueue.AddHP(Packet: TModbus_PacketTCP);
begin
 FQueueHP.Add(Packet);
end;
 
constructor TModbus_TCPQueue.Create;
begin
 FQueue := TModbus_PacketList.Create;
 FQueueHP := TModbus_PacketList.Create;
end;
 
destructor TModbus_TCPQueue.Destroy;
begin
 Flush;
 FQueue.Free;
 FQueueHP.Free;
 inherited;
end;
 
procedure TModbus_TCPQueue.Flush;
begin
 FQueue.Clear;
 FQueueHP.Clear;
end;
 
function TModbus_TCPQueue.GetNextPacket: TModbus_PacketTCP;
begin
 // try the high priority queue first
 Result := FQueueHP.GetNextPacket;
 // if nothing, try the normal queue next
 if (Result = nil) then
  Result := FQueue.GetNextPacket;
end;


And here is the GetNextPacket method for FQueueHP and FQueue.
function TModbus_PacketList.GetNextPacket: TModbus_PacketTCP;
var
 L : TList<TModbus_PacketTCP>;
begin
 Result := nil;
 L := LockList;
 try
  if (L.Count > 0) then
   begin
    Result := L.First;
    L.Remove(Result);
   end;
 finally
  UnlockList;
 end;
end;


Any idea what might be throwing the error? We'll continue to trace and see what we can find.

Regards,

RZ

Edited by: Richard Zarr on Dec 1, 2017 9:36 AM

Edited by: Richard Zarr on Dec 1, 2017 9:50 AM
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: TIdTCPClient Transmit Timeout Not Working [Edit]
Helpful
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 1, 2017 10:44 AM   in response to: Richard Zarr in response to: Richard Zarr
Richard Zarr wrote:

// set the length (header = 7, packet payload length = n, -1 to get
the length Result := MODBUS_HEADER_LENGTH + Packet.PacketLength - 1;
SetLength(buffer, Result);
// clear the buffer
FillChar(buffer[0], Result, $00);
// now copy everything to the correct location
// transaction ID - create one for outgoing...
buffer[0] := byte((Packet.FTransactionID SHR 8) AND $FF);
buffer[1] := byte(Packet.FTransactionID AND $FF);
// protocol
buffer[2] := byte((Packet.ProtocolID SHR 8) AND $FF);
buffer[3] := byte(Packet.ProtocolID AND $FF);
// packet length
buffer[4] := byte((Packet.PacketLength SHR 8) AND $FF);
buffer[5] := byte(Packet.PacketLength AND $FF);
// unit ID
buffer[6] := Packet.UnitID;
// function code
buffer[7] := Packet.FunctionCode;
// payload
Move(Packet.Payload[0], buffer[8], Packet.PacketLength-2);

This looks a little suspicious to me. Why does Packet carry its own
PacketLength field? Why does Packet care about the framing at all?
You should be calculating that dynamically at the socket layer when
preparing the packet for transmission.

Is Packet.Payload a dynamic array or a fixed array? The allocated
buffer size should be MODBUS_HEADER_LENGTH + 1 +
NumberOfBytesInPayload, and the header's packet length field should be
2 + NumberOfBytesInPayload.

We were assuming the response would fit into one MDU

You can't make that assumption in TCP, especially with variable-length
messages. You must read the fixed-sized header first, then read the
payload if any.

We are now getting an EMonitorLockException with message 'Object lock
not owned" and it completely dies.

That exception means a thread tried to unlock a TMonitor that it did
not own the lock to. But nothing in the code you have shown deals with
TMonitor locks, so that error is happening in other code that is
irrelevant to your Modbus communications.

Locks should be short-lived, and locked code should be wrapped in
try/finally blocks to ensure proper unlocking.

But this particular error seems to be common when using the TTask
framework. Are you using TTask?

Here is the send code. This is called by multiple threads - one is
the polling thread and the other is the client service thread.

TThreadList has its own lock, so why do you have another lock around it
when adding an object to the list?

procedure TModbus_TCPQueue.Flush;
begin
FQueue.Clear;
FQueueHP.Clear;
end;

If there are objects in those queues, you are leaking them. They need
to be destroyed when removed from the lists.

--
Remy Lebeau (TeamB)
Richard Zarr

Posts: 74
Registered: 7/1/98
Re: TIdTCPClient Transmit Timeout Not Working [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 1, 2017 11:23 AM   in response to: Remy Lebeau (Te... in response to: Remy Lebeau (Te...
Remy Lebeau (TeamB) wrote:
Richard Zarr wrote:

// set the length (header = 7, packet payload length = n, -1 to get
the length Result := MODBUS_HEADER_LENGTH + Packet.PacketLength - 1;
SetLength(buffer, Result);
// clear the buffer
FillChar(buffer[0], Result, $00);
// now copy everything to the correct location
// transaction ID - create one for outgoing...
buffer[0] := byte((Packet.FTransactionID SHR 8) AND $FF);
buffer[1] := byte(Packet.FTransactionID AND $FF);
// protocol
buffer[2] := byte((Packet.ProtocolID SHR 8) AND $FF);
buffer[3] := byte(Packet.ProtocolID AND $FF);
// packet length
buffer[4] := byte((Packet.PacketLength SHR 8) AND $FF);
buffer[5] := byte(Packet.PacketLength AND $FF);
// unit ID
buffer[6] := Packet.UnitID;
// function code
buffer[7] := Packet.FunctionCode;
// payload
Move(Packet.Payload[0], buffer[8], Packet.PacketLength-2);

This looks a little suspicious to me. Why does Packet carry its own
PacketLength field? Why does Packet care about the framing at all?
You should be calculating that dynamically at the socket layer when
preparing the packet for transmission.

Is Packet.Payload a dynamic array or a fixed array? The allocated
buffer size should be MODBUS_HEADER_LENGTH + 1 +
NumberOfBytesInPayload, and the header's packet length field should be
2 + NumberOfBytesInPayload.

We were assuming the response would fit into one MDU

You can't make that assumption in TCP, especially with variable-length
messages. You must read the fixed-sized header first, then read the
payload if any.

We are now getting an EMonitorLockException with message 'Object lock
not owned" and it completely dies.

That exception means a thread tried to unlock a TMonitor that it did
not own the lock to. But nothing in the code you have shown deals with
TMonitor locks, so that error is happening in other code that is
irrelevant to your Modbus communications.

Locks should be short-lived, and locked code should be wrapped in
try/finally blocks to ensure proper unlocking.

But this particular error seems to be common when using the TTask
framework. Are you using TTask?

Here is the send code. This is called by multiple threads - one is
the polling thread and the other is the client service thread.

TThreadList has its own lock, so why do you have another lock around it
when adding an object to the list?

procedure TModbus_TCPQueue.Flush;
begin
FQueue.Clear;
FQueueHP.Clear;
end;

If there are objects in those queues, you are leaking them. They need
to be destroyed when removed from the lists.

--
Remy Lebeau (TeamB)

Hi Remy,

Here's the update... we are still hanging somewhere in the system for roughly 60 seconds and then it throws the lock error. See the Wireshark capture file (Hang Issue 2) at https://www.dropbox.com/sh/60et40fmw5o49hi/AAAw0CwIXUpstxcg6EInCJLOa?dl=0. Note that between lines 14 and 23 there is a 60 second gap. The lock error is thrown right at line 23 when everything starts up again - and then goes into the weeds and never recovers. We are not using TTask.

Here's the stack dump:

:74f0845d KERNELBASE.RaiseException + 0x54
System.TMonitor.CheckOwningThread
System.ErrorAt(25,$5F7919C)
System.Error(reMonitorNotLocked)
System.TMonitor.CheckOwningThread
System.TMonitor.Exit
System.TMonitor.Exit($6341250)
classes_modbus.{System.Generics.Collections}TThreadList<classes_modbus.TModbus_PacketTCP>.UnlockList
classes_modbus.{System.Generics.Collections}TThreadList<classes_modbus.TModbus_PacketTCP>.Add($6294650)
classes_modbus.TModbus_TCPQueue.Add($6294650)
classes_modbus.TModbus_TCPThread.SendPacket($6294620,False)
classes_modbus.TModbus_Client.ReadCoils((3682717653, 16, 2, (0, 1, 0, 0, 0, 0, 0, 1)),0,16)
classes_modbus.TModbus_Client.Modbus_PollingEvent($6381D20)
classes_modbus.TModbus_Polling.Execute
System.Classes.ThreadProc($6381D20)
System.ThreadWrapper($63411E0)
:754fef8c kernel32.BaseThreadInitThunk + 0x12
:76fa367a ntdll.RtlInitializeExceptionChain + 0xef
:76fa364d ntdll.RtlInitializeExceptionChain + 0xc2

Per your questions:

This looks a little suspicious to me. Why does Packet carry its own
PacketLength field? Why does Packet care about the framing at all?
You should be calculating that dynamically at the socket layer when
preparing the packet for transmission.

This is a hold over when we actually had a Tx and Rx thread. The way we wrote this driver is that every type of modbus thing (a coil, holding register, etc.) is an object with all of the functionality and abstraction required by our application. They all descend from a TModbus_Object class that encapsulates all the generic modbus behavior. So if the system wants to read a coil, it simply calls the TModbus_Coil.Read method and that creates the correct packet (which includes a GUID of the object) and places it into the queue - see below:

procedure TModbus_TCPThread.SendPacket(Packet: TModbus_PacketTCP;
  HighPriority: boolean);
var
 packet_entry : TModbus_PacketTCP;
begin
 if FTimeoutError then exit;
 // lock this routine to prevent multiple sources from disrupting the packet
 FLock.Enter;
 try
  // create a copy of the packet for the queue
  packet_entry := TModbus_PacketTCP.Create;
  packet_entry.Clone(Packet);
  // assign a transaction ID
  inc(FTransactionID);
  packet_entry.FTransactionID := FTransactionID;
  // add it to the queue
  if HighPriority then
   FTCPQueue.AddHP(packet_entry)
  else
   FTCPQueue.Add(packet_entry);
  // start up the thread...
  SetEvent(FRun);
 finally
  FLock.Leave;
 end;
end;


Originally we were not using blocking sockets, so we needed a way to connect the transmitted packet from what was received... thus some of the overhead you are now seeing which is somewhat a legacy to our original design.

TThreadList has its own lock, so why do you have another lock around it
when adding an object to the list?

The sending code updates the packet with the transaction ID and some other information, so it is not thread safe. We lock the entire send routine to prevent two threads from corrupting the new packet being added to the queue.

procedure TModbus_TCPThread.SendPacket(Packet: TModbus_PacketTCP;
  HighPriority: boolean);
var
 packet_entry : TModbus_PacketTCP;
begin
 if FTimeoutError then exit;
 // lock this routine to prevent multiple sources from disrupting the packet
 FLock.Enter;
 try
  // create a copy of the packet for the queue
  packet_entry := TModbus_PacketTCP.Create; 
  packet_entry.Clone(Packet); // This operation is not thread safe...
  // assign a transaction ID
  inc(FTransactionID); 
  packet_entry.FTransactionID := FTransactionID; // and this is not thread safe 
  // add it to the queue
  if HighPriority then
   FTCPQueue.AddHP(packet_entry)  // these are thread safe
  else
   FTCPQueue.Add(packet_entry); // ditto...
  // start up the thread...
  SetEvent(FRun);
 finally
  FLock.Leave;
 end;
end;
 



procedure TModbus_TCPQueue.Flush;
begin
FQueue.Clear;
FQueueHP.Clear;
end;

If there are objects in those queues, you are leaking them. They need
to be destroyed when removed from the lists.

No leaking here - the clear routine removes and frees all list members...

procedure TModbus_PacketList.Clear;
var
 L : TList<TModbus_PacketTCP>;
 P : TModbus_PacketTCP;
begin
 L := LockList;
 try
  while (L.Count > 0) do
   begin
    P := L.Last;
    L.Remove(P);
    P.Free;
   end;
 finally
  UnlockList;
 end;
end;


Regards,

RZ

Edited by: Richard Zarr on Dec 1, 2017 11:25 AM
Richard Zarr

Posts: 74
Registered: 7/1/98
Re: TIdTCPClient Transmit Timeout Not Working [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 1, 2017 12:18 PM   in response to: Richard Zarr in response to: Richard Zarr
Richard Zarr wrote:
Remy Lebeau (TeamB) wrote:
Richard Zarr wrote:

// set the length (header = 7, packet payload length = n, -1 to get
the length Result := MODBUS_HEADER_LENGTH + Packet.PacketLength - 1;
SetLength(buffer, Result);
// clear the buffer
FillChar(buffer[0], Result, $00);
// now copy everything to the correct location
// transaction ID - create one for outgoing...
buffer[0] := byte((Packet.FTransactionID SHR 8) AND $FF);
buffer[1] := byte(Packet.FTransactionID AND $FF);
// protocol
buffer[2] := byte((Packet.ProtocolID SHR 8) AND $FF);
buffer[3] := byte(Packet.ProtocolID AND $FF);
// packet length
buffer[4] := byte((Packet.PacketLength SHR 8) AND $FF);
buffer[5] := byte(Packet.PacketLength AND $FF);
// unit ID
buffer[6] := Packet.UnitID;
// function code
buffer[7] := Packet.FunctionCode;
// payload
Move(Packet.Payload[0], buffer[8], Packet.PacketLength-2);

This looks a little suspicious to me. Why does Packet carry its own
PacketLength field? Why does Packet care about the framing at all?
You should be calculating that dynamically at the socket layer when
preparing the packet for transmission.

Is Packet.Payload a dynamic array or a fixed array? The allocated
buffer size should be MODBUS_HEADER_LENGTH + 1 +
NumberOfBytesInPayload, and the header's packet length field should be
2 + NumberOfBytesInPayload.

We were assuming the response would fit into one MDU

You can't make that assumption in TCP, especially with variable-length
messages. You must read the fixed-sized header first, then read the
payload if any.

We are now getting an EMonitorLockException with message 'Object lock
not owned" and it completely dies.

That exception means a thread tried to unlock a TMonitor that it did
not own the lock to. But nothing in the code you have shown deals with
TMonitor locks, so that error is happening in other code that is
irrelevant to your Modbus communications.

Locks should be short-lived, and locked code should be wrapped in
try/finally blocks to ensure proper unlocking.

But this particular error seems to be common when using the TTask
framework. Are you using TTask?

Here is the send code. This is called by multiple threads - one is
the polling thread and the other is the client service thread.

TThreadList has its own lock, so why do you have another lock around it
when adding an object to the list?

procedure TModbus_TCPQueue.Flush;
begin
FQueue.Clear;
FQueueHP.Clear;
end;

If there are objects in those queues, you are leaking them. They need
to be destroyed when removed from the lists.

--
Remy Lebeau (TeamB)

Hi Remy,

Here's the update... we are still hanging somewhere in the system for roughly 60 seconds and then it throws the lock error. See the Wireshark capture file (Hang Issue 2) at https://www.dropbox.com/sh/60et40fmw5o49hi/AAAw0CwIXUpstxcg6EInCJLOa?dl=0. Note that between lines 14 and 23 there is a 60 second gap. The lock error is thrown right at line 23 when everything starts up again - and then goes into the weeds and never recovers. We are not using TTask.

Here's the stack dump:

:74f0845d KERNELBASE.RaiseException + 0x54
System.TMonitor.CheckOwningThread
System.ErrorAt(25,$5F7919C)
System.Error(reMonitorNotLocked)
System.TMonitor.CheckOwningThread
System.TMonitor.Exit
System.TMonitor.Exit($6341250)
classes_modbus.{System.Generics.Collections}TThreadList<classes_modbus.TModbus_PacketTCP>.UnlockList
classes_modbus.{System.Generics.Collections}TThreadList<classes_modbus.TModbus_PacketTCP>.Add($6294650)
classes_modbus.TModbus_TCPQueue.Add($6294650)
classes_modbus.TModbus_TCPThread.SendPacket($6294620,False)
classes_modbus.TModbus_Client.ReadCoils((3682717653, 16, 2, (0, 1, 0, 0, 0, 0, 0, 1)),0,16)
classes_modbus.TModbus_Client.Modbus_PollingEvent($6381D20)
classes_modbus.TModbus_Polling.Execute
System.Classes.ThreadProc($6381D20)
System.ThreadWrapper($63411E0)
:754fef8c kernel32.BaseThreadInitThunk + 0x12
:76fa367a ntdll.RtlInitializeExceptionChain + 0xef
:76fa364d ntdll.RtlInitializeExceptionChain + 0xc2

Per your questions:

This looks a little suspicious to me. Why does Packet carry its own
PacketLength field? Why does Packet care about the framing at all?
You should be calculating that dynamically at the socket layer when
preparing the packet for transmission.

This is a hold over when we actually had a Tx and Rx thread. The way we wrote this driver is that every type of modbus thing (a coil, holding register, etc.) is an object with all of the functionality and abstraction required by our application. They all descend from a TModbus_Object class that encapsulates all the generic modbus behavior. So if the system wants to read a coil, it simply calls the TModbus_Coil.Read method and that creates the correct packet (which includes a GUID of the object) and places it into the queue - see below:

procedure TModbus_TCPThread.SendPacket(Packet: TModbus_PacketTCP;
  HighPriority: boolean);
var
 packet_entry : TModbus_PacketTCP;
begin
 if FTimeoutError then exit;
 // lock this routine to prevent multiple sources from disrupting the packet
 FLock.Enter;
 try
  // create a copy of the packet for the queue
  packet_entry := TModbus_PacketTCP.Create;
  packet_entry.Clone(Packet);
  // assign a transaction ID
  inc(FTransactionID);
  packet_entry.FTransactionID := FTransactionID;
  // add it to the queue
  if HighPriority then
   FTCPQueue.AddHP(packet_entry)
  else
   FTCPQueue.Add(packet_entry);
  // start up the thread...
  SetEvent(FRun);
 finally
  FLock.Leave;
 end;
end;


Originally we were not using blocking sockets, so we needed a way to connect the transmitted packet from what was received... thus some of the overhead you are now seeing which is somewhat a legacy to our original design.

TThreadList has its own lock, so why do you have another lock around it
when adding an object to the list?

The sending code updates the packet with the transaction ID and some other information, so it is not thread safe. We lock the entire send routine to prevent two threads from corrupting the new packet being added to the queue.

procedure TModbus_TCPThread.SendPacket(Packet: TModbus_PacketTCP;
  HighPriority: boolean);
var
 packet_entry : TModbus_PacketTCP;
begin
 if FTimeoutError then exit;
 // lock this routine to prevent multiple sources from disrupting the packet
 FLock.Enter;
 try
  // create a copy of the packet for the queue
  packet_entry := TModbus_PacketTCP.Create; 
  packet_entry.Clone(Packet); // This operation is not thread safe...
  // assign a transaction ID
  inc(FTransactionID); 
  packet_entry.FTransactionID := FTransactionID; // and this is not thread safe 
  // add it to the queue
  if HighPriority then
   FTCPQueue.AddHP(packet_entry)  // these are thread safe
  else
   FTCPQueue.Add(packet_entry); // ditto...
  // start up the thread...
  SetEvent(FRun);
 finally
  FLock.Leave;
 end;
end;
 



procedure TModbus_TCPQueue.Flush;
begin
FQueue.Clear;
FQueueHP.Clear;
end;

If there are objects in those queues, you are leaking them. They need
to be destroyed when removed from the lists.

No leaking here - the clear routine removes and frees all list members...

procedure TModbus_PacketList.Clear;
var
 L : TList<TModbus_PacketTCP>;
 P : TModbus_PacketTCP;
begin
 L := LockList;
 try
  while (L.Count > 0) do
   begin
    P := L.Last;
    L.Remove(P);
    P.Free;
   end;
 finally
  UnlockList;
 end;
end;


Regards,

RZ

Edited by: Richard Zarr on Dec 1, 2017 11:25 AM

UPDATE!! I think we located the problem... so for now, more testing to verify this was what was hanging up. In every modbus command routine, we checked for connectivity with the server. this was a call to the TIdTCPClient.Connected function. In looking at this code, I believe we had a dead lock somewhere in the request for the connect status and the write or read command. We now use a private property called FConnected and update it in the TIdTCP.OnStatus event which happens even if a disconnect does not in place of the call to TIdTCPClient.Connected. I think this is where it went wrong... very weird and maybe you can shed some light on why using the TIdTCPClient.Connected function is a bad idea.

Regards,

RZ

Edited by: Richard Zarr on Dec 1, 2017 1:38 PM

Edited by: Richard Zarr on Dec 1, 2017 1:39 PM

UPDATE!! After 12 hours of continuous full rate testing... not one error. So our conclusion is that the polling and user client threads would call the TIdTCPClient.Connected function at or near the time the TIdTCPClient.IOHandler.Write() or FTCP.IOHandler.ReadBytes() was called by the communications thread somehow dead-locking it. So... the moral of this long story... do NOT call the TIdTCPClient.Connected function in any other thread other than the one using Write() or ReadBytes(). Maybe Remy can shed some light on why... but I'm sure it has to do with the way the TIdTCPClient figures out if it is still connected to the server.

Many thanks to Remy Lebeau (TeamB) for his suggestions (we implemented much of the code he suggested for a more stable and correct Modbus protocol).

Regards,

RZ (Strasis Systems, LLC).

Edited by: Richard Zarr on Dec 2, 2017 5:41 AM
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: TIdTCPClient Transmit Timeout Not Working [Edit] [Edit]  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 4, 2017 10:01 AM   in response to: Richard Zarr in response to: Richard Zarr
Richard Zarr wrote:

In every modbus command routine, we checked for connectivity with the
server. this was a call to the TIdTCPClient.Connected function.

Definately DO NOT do that.

Connected() is not thread-safe, as it performs a read operation and
will put any data it reads into the InputBuffer. If another thread
also reads at the same time, you can corrupt the InputBuffer.

Besides, it is not necessary to call Connected() anyway. If the
connection is lost, the IOHandler will (eventually) raise an exception
during a send/read operation. Let it do that, and then handle the
exception accordingly.

UPDATE!! After 12 hours of continuous full rate testing... not one
error. So our conclusion is that the polling and user client threads
would call the TIdTCPClient.Connected function at or near the time
the TIdTCPClient.IOHandler.Write() or FTCP.IOHandler.ReadBytes() was
called by the communications thread somehow dead-locking it.

You are lucky that is all it was doing. Calling Connected() in one
thread while another thread calls ReadBytes() could go very very bad...

So... the moral of this long story... do NOT call the
TIdTCPClient.Connected function in any other thread other than the
one using Write() or ReadBytes().

A read and a write can be done safely in separate threads at the same
time. Just don't don't multiple reads in separate threads at the same,
or multiple writes in separate threads at the same time.

Though, you really shouldn't be calling Connected() AT ALL.

--
Remy Lebeau (TeamB)
Richard Zarr

Posts: 74
Registered: 7/1/98
Re: TIdTCPClient Transmit Timeout Not Working  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Dec 2, 2017 5:53 AM   in response to: Richard Zarr in response to: Richard Zarr
Remy Lebeau (TeamB) was extremely helpful in tracking down where the code was derailing. Our many thanks!
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02