Thread: Possible bug in TIdMessage


This question is answered.


Replies: 4 - Last Post: Oct 7, 2016 8:22 AM - Last Post By: John May
John May

Posts: 81
Registered: 6/25/10
Possible bug in TIdMessage  
  Posted: Oct 6, 2016 7:34 AM
I have a message which looks like this:

To: aaa@bbb.ccc
Subject: aaa
From: xxx <xxx@yyy.com>
MIME-Version: 1.0
Content-type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit;
Message-Id: <20161003154352.5AEF63EE39@yyy.com>
Date: Mon,  3 Oct 2016 11:43:52 -0400 (EDT)
 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test title</title>
</head>
<body>
<br>
<br>Test Message
</body>
</html>


When I scan through IdMsg->MessageParts, the part that TIdMessage created inherits from __classid(TIdAttachment). That might be the safe default when Content-Disposition is not specified; I don't know. But it does not seem logical, since the Content-Type is text/html, so the part should be a TIdText.

The code to scan is something like this:
for (int i = 0; i < IdMsg->MessageParts->Count; i++)
	{
	if (IdMsg->MessageParts->Items[i]->InheritsFrom(__classid(TIdAttachment)))
		{
		// Show as attachment here
		}
	else if (IdMsg->MessageParts->Items[i]->InheritsFrom(__classid(TIdText)))
		{
		// Show as text or HTML here
		}
	else
		{
		throw Exception("Unrecognized part type");		// This should never happen, failsafe only
		}
	}


I have also examined this example - http://stackoverflow.com/questions/14671010/evaluate-email-with-indy-10-and-delphi

But if Content-Disposition is explicitly set:

To: aaa@bbb.ccc
Subject: aaa
From: xxx <xxx@yyy.com>
MIME-Version: 1.0
Content-type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit;
Content-Disposition: inline
Message-Id: <20161003154352.5AEF63EE39@yyy.com>
Date: Mon,  3 Oct 2016 11:43:52 -0400 (EDT)
 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test title</title>
</head>
<body>
<br>
<br>Test Message
</body>
</html>


The part still inherits from __classid(TIdAttachment) and not from what it is probably supposed to be, __classid(TIdText), since text/html should be TIdText?

Is this a bug in TIdMessage, or did I misunderstand something? Or is Content-Disposition completely irrelevant here? I have Indy version 5369 at the moment.
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: Possible bug in TIdMessage [Edit]
Helpful
  Posted: Oct 6, 2016 9:45 AM   in response to: John May
John wrote:

> Content-type: text/html; charset=iso-8859-1
> Content-Transfer-Encoding: 7bit;

ISO-8859-1 is not a 7bit encoding, so using "7bit" is risking data loss.
8-bit data needs to use "8bit", "quoted-printable", or "base64" instead.
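
To illustrate the point about 7bit, here is a minimal standalone sketch in standard C++ (not Indy code; `isSevenBitClean` is a hypothetical name). It only checks whether any byte has the high bit set, which is the condition under which labeling a body "7bit" risks data loss. It does not check the other 7bit rules such as line length.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of Indy: returns true when every byte of the
// body fits in 7 bits (values 0..127). An ISO-8859-1 body containing accented
// characters would fail this check and should not be labeled "7bit".
bool isSevenBitClean(const std::string& body) {
    return std::all_of(body.begin(), body.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

The sample HTML in the original post happens to be plain ASCII, so it would pass; the risk only shows up once the body contains characters above 127.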

> When I scan through IdMsg->MessageParts, the part that TIdMessage created
> inherits from __classid(TIdAttachment).

The problem is the trailing semicolon after the "7bit" value. The "Content-Transfer-Encoding"
header does not allow attributes, so ";" is not an allowed character in the
header value (see RFC 2045 section 6.1). The semicolon is not stripped off
when Indy reads the "Content-Transfer-Encoding" value, so "7bit;" gets treated
as an unknown encoding, causing the HTML to be saved in a TIdAttachment instead
of a TIdText.
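
The effect described above can be sketched in standalone C++. This is an assumption-laden illustration, not Indy's actual implementation: `toLowerAscii` and `isKnownTransferEncoding` are hypothetical names, and the list holds the five standard mechanisms named in RFC 2045 section 6.1. With a whole-value, case-insensitive comparison, "7bit;" simply does not match anything.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case an ASCII header value for comparison.
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical check: the whole value must equal one of the standard
// Content-Transfer-Encoding mechanisms (compared case-insensitively).
bool isKnownTransferEncoding(const std::string& value) {
    static const std::array<const char*, 5> known = {
        "7bit", "8bit", "binary", "quoted-printable", "base64"};
    const std::string v = toLowerAscii(value);
    return std::any_of(known.begin(), known.end(),
                       [&v](const char* k) { return v == k; });
}
```

Under this sketch, `isKnownTransferEncoding("7bit")` is true while `isKnownTransferEncoding("7bit;")` is false, which corresponds to the "unknown encoding" path described above.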

> But it is not logical since the Content Type is text/html so it should
> be TIdText

And under normal conditions, it would be.

> I have also examined this example -
> http://stackoverflow.com/questions/14671010/evaluate-email-with-indy-10-and-delphi
>
> But if Content-Disposition is explicitly set:
> <snip>
>
> It still inherits from __classid(TIdAttachment) and not from what
> is probably supposed to be - __classid(TIdText) as text/html should
> be TIdText ?

The issue is with the malformed "Content-Transfer-Encoding" header, not with
the "Content-Disposition" header.

> Is this a bug with TIdMessage or did I misunderstand something?

Your email has a malformed "Content-Transfer-Encoding" header.

> Or is Content Disposition completely irrelevant here?

Yes.

--
Remy Lebeau (TeamB)
John May

Posts: 81
Registered: 6/25/10
Re: Possible bug in TIdMessage  
  Posted: Oct 6, 2016 12:20 PM   in response to: Remy Lebeau (Te...
Thank you for explaining this. However, this poses a new problem:

> header value (see RFC 2045 section 6.1). The semicolon is not stripped off
> when Indy reads the "Content-Transfer-Encoding" value, so "7bit;" gets treated
> as an unknown encoding, causing the HTML to be saved in a TIdAttachment instead
> of a TIdText.

The message is cut from a larger "newsletter" message, I just made it shorter for this example.
I tested this in most major email clients: Thunderbird, Outlook, WLM, OE6, Gmail webmail, iPhone (Apple Mail app) and the Gmail phone app. All of them display the message fine except my message decoder. This makes the Indy decoder look inferior to all of these programs, at least in the eyes of the user.

Another problem is that most of these messages are in fact tested in these programs. If they behave well there, they are considered valid, even if they are not. People don't have time to check all the RFCs when quickly writing a script for sending emails, so they check in the above programs, like I did, and conclude the message is good. The problems appear only in Indy.

I could make the decoder more tolerant by ignoring the semicolon like all the above programs do. But it seems there is no way to provide a default value or workaround at the user level without modifying the source code of TIdMessage.

In fact, in general, most problems I have found with the Indy decoder come down to the inability to provide default values for:

- encoding (if encoding in message is wrong, user could set own encoding to show it properly)
- content-type (I have no example for this ATM)
- content-transfer-encoding (in this case)

With a "forced" encoding or content-transfer-encoding, this would be a solvable problem for a programmer using TIdMessage. I would then set the default Content-Transfer-Encoding to "7bit" like Exchange does, and the problem would be solved.

Or maybe replace AnsiCompareText with StartsText when checking the content-transfer-encoding? There is no need to strip anything; just check only the beginning of the value. If it ends with CRLF, the result is the same; if it ends with anything else, it will at least be recognized. This should not break the existing functionality of the component either, but would make it tolerant of this case (I have encountered a few messages of this type).

Any workarounds? Anything that can be done to make it more tolerant?
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: Possible bug in TIdMessage [Edit]
Correct
  Posted: Oct 6, 2016 4:48 PM   in response to: John May
John wrote:

> The message is cut from a larger "newsletter" message, I just made
> it shorter for this example.

Doesn't matter. What I said still applies. ANY SYSTEM that is putting a
semicolon inside of a Content-Transfer-Encoding header has a logic bug.

> I tested this in most major email clients - Thunderbird, Outlook, WLM,
> OE6, Gmail webmail, iPhone (Apple mail app) and Gmail phone app. All
> of them display the message fine except my message decoder.

Good for them. It means they are ignoring established standards that the
rest of the world has to abide by.

> I could make the decoder more tolerant by ignoring the semicolon
> like all the above programs do.

I will consider adding that to Indy's decoder.

> But as it seems there is no way to provide a default value or workaround
> on the user-level without modifying the source-code of TIdMessage.

Correct.

> In fact, in general - most problems with Indy decoder I found are
> inability to provide default values for:
>
> - encoding (if encoding in message is wrong, user could set own
> encoding to show it properly)
>
> - content-type (I have no example for this ATM)
>
> - content-transfer-encoding (in this case)

That would require re-parsing the complete email from scratch, ignoring every
type/encoding the email claims. That would require significant changes to
Indy's interfaces to pass that encoding information around through the various
layers that the email data passes through while being decoded. And that
is a level of effort I'm not willing (or able) to make in the current version
of Indy. Maybe for Indy 11, but I'm not making any promises.

> Or maybe replacing AnsiCompareText with StartsText when checking
> for content-transfer-encoding?

No. That provides too much freedom for senders to send completely invalid
data that acts like valid data. The semicolon has special meaning in MIME, so
I would be willing to make an exception to handle it accordingly, but that
is as far as I would take it.

> There is no need to strip anything, just check only the part from beginning,
> if it ends with CRLF it is the same, if it ends with anything else it will
> be at least recognized.
>
> This should not break the existing functionality of the component
> either but would make it tolerant to this case (I encountered a few
> messages of this type).

Actually, it would break something. MIME has particular rules when encountering
unrecognized encodings, and there is logic in TIdMessage to account for that
(which is why your example text ended up in TIdAttachment in the first place).
Treating "7bit;" the same as "7bit" is very different than treating "7bitMyA$$"
as "7bit", for example.
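
The difference between the two approaches can be shown with a small standalone C++ sketch. It is only an illustration under assumptions and does not show the fix that was checked in to Indy; `stripMimeParameters` and `startsWith` are hypothetical names. Cutting the value at the first semicolon repairs "7bit;" but leaves "7bitMyA$$" unrecognized, whereas a prefix test accepts both.

```cpp
#include <string>

// Hypothetical helper: keep only the text before the first ';' and trim
// surrounding spaces and tabs, so "7bit;" becomes "7bit".
std::string stripMimeParameters(const std::string& value) {
    const std::string ws = " \t";
    const std::string token = value.substr(0, value.find(';'));
    const auto first = token.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = token.find_last_not_of(ws);
    return token.substr(first, last - first + 1);
}

// Hypothetical prefix test, standing in for a StartsText-style comparison.
bool startsWith(const std::string& value, const std::string& prefix) {
    return value.compare(0, prefix.size(), prefix) == 0;
}
```

With these helpers, `stripMimeParameters("7bit;")` yields `"7bit"` and `stripMimeParameters("7bitMyA$$")` is returned unchanged, so an exact comparison afterwards still rejects it. `startsWith("7bitMyA$$", "7bit")` returns true, which is the over-permissive behavior objected to above.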

> Any workarounds? Anything that can be done to make it more tolerant?

I have checked in a fix to make it more tolerant.

--
Remy Lebeau (TeamB)
John May

Posts: 81
Registered: 6/25/10
Re: Possible bug in TIdMessage [Edit]  
  Posted: Oct 7, 2016 8:11 AM   in response to: Remy Lebeau (Te...
Thank you for your reply and the checked-in workaround!

>> I tested this in most major email clients - Thunderbird, Outlook, WLM,
>> OE6, Gmail webmail, iPhone (Apple mail app) and Gmail phone app. All
>> of them display the message fine except my message decoder.
>
> Good for them. It means they are ignoring established standards that the
> rest of the world has to abide by.

I agree that the standards should be followed by encoders. I am with you 100% on that. But it is unrealistic to expect that everyone will send 100% compliant MIME messages and test them on Indy when testing their own code. If you take "the rest of the world" as the argument for what everyone does, it would seem to be the other way around: the error is ignored by the rest of the world (decoders only). But please know that I highly appreciate your argument and the work you do; it is a great service to Delphi/C++ developers, no doubt! This fix will lead to fewer Indy decoder questions, that is for sure.