Thread: mailto URI parsing



Replies: 4 - Last Post: Aug 27, 2016 12:10 AM Last Post By: Marco Rocci
Marco Rocci

Posts: 17
Registered: 8/13/13
mailto URI parsing
  Posted: Aug 24, 2016 3:37 AM
It seems that both System.Net.URLClient.TURI and IdURI.TIdURI are incapable of parsing URIs that do not have a double slash following the scheme. This includes mailto, news and other schemes that typically do not have the double slash. These are valid URIs and are often listed as examples of URIs.

I was porting to 10.1 Berlin some old D5 code that handled Dicom attributes... and one such attribute handles URIs. In D5 I had written my own class to encode and decode URIs. I had set up some tests with example URIs (taken from online reference documentation) and it all worked nicely. But I do not want to port my class to Berlin, as I see that in newer Delphi versions there are already 2 (or more maybe).

Is there any native Delphi way to correctly parse these URIs without using custom code? I am also curious as to why these URIs are not parsed correctly by TURI and TIdURI.

As reference... these are the URIs that I am using in my tests and which are not parsed correctly by either class:
   mailto:John.Doe@example.com
   news:comp.infosystems.www.servers.unix
   tel:+1-816-555-1212
   urn:oasis:names:specification:docbook:dtd:xml:4.1.2

TIA,
Marco Rocci

Edited by: Marco Rocci on Aug 24, 2016 3:41 AM
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: mailto URI parsing
  Posted: Aug 24, 2016 11:21 AM   in response to: Marco Rocci
Marco wrote:

It seems that both System.Net.URLClient.TURI and IdURI.TIdURI
are incapable of parsing URIs that do not have a double slash following
the scheme. This includes mailto, news and other schemes that
typically do not have the double slash.

I can't comment on TURI, but this is a known limitation of TIdURI. It uses
a very basic parser that requires an authority component in non-filesystem
URLs. It is primarily intended for HTTP URIs.

That being said, I do have a new parser in development (http://indy.codeplex.com/workitem/15328)
that will be based on the URI and IRI specs (RFCs 3986 and 3987), and will
include specialized derived classes for many common IRIs/URIs, including
"mailto", "news", etc.

I was porting to 10.1 Berlin some old D5 code that handled Dicom
attributes... and one such attribute handles URIs. In D5 I had written
my own class to encode and decode URIs. I had set up some tests with
example URIs (taken from online reference documentation) and it all
worked nicely. But I do not want to port my class to Berlin, as I see
that in newer Delphi versions there are already 2 (or more maybe).

Sounds like you will have to port your old code.

I am also curious as to why these URIs are not parsed correctly
by TURI and TIdURI.

Because TIdURI is very old and very limited in its scope. The new parser
will be much more flexible.
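
Until such a parser is available, the scheme of a double-slash-less URI can at least be separated from the rest by hand. The following is only a minimal sketch of that idea, based on the RFC 3986 rule that a scheme is a letter followed by letters, digits, "+", "-" or ".", terminated by the first colon. SplitScheme is a hypothetical helper name, not part of Indy or the RTL, and it makes no attempt to interpret the scheme-specific remainder:

```pascal
program SplitSchemeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not part of Indy or the RTL): splits a URI into its
// scheme and the scheme-specific remainder, following RFC 3986:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
function SplitScheme(const AURI: string; out AScheme, ARest: string): Boolean;
var
  P: Integer;
  C: Char;
  First: Boolean;
begin
  Result := False;
  AScheme := '';
  ARest := '';
  P := Pos(':', AURI);
  if P < 2 then
    Exit; // no colon at all, or an empty scheme
  First := True;
  for C in Copy(AURI, 1, P - 1) do
  begin
    if First then
    begin
      if not CharInSet(C, ['A'..'Z', 'a'..'z']) then
        Exit; // a scheme must start with a letter
      First := False;
    end
    else if not CharInSet(C, ['A'..'Z', 'a'..'z', '0'..'9', '+', '-', '.']) then
      Exit; // invalid character, so this is not a scheme
  end;
  AScheme := LowerCase(Copy(AURI, 1, P - 1)); // schemes are case-insensitive
  ARest := Copy(AURI, P + 1, MaxInt);
  Result := True;
end;

const
  Samples: array[0..3] of string = (
    'mailto:John.Doe@example.com',
    'news:comp.infosystems.www.servers.unix',
    'tel:+1-816-555-1212',
    'urn:oasis:names:specification:docbook:dtd:xml:4.1.2');
var
  S, Scheme, Rest: string;
begin
  for S in Samples do
    if SplitScheme(S, Scheme, Rest) then
      Writeln(Scheme, ' | ', Rest)
    else
      Writeln('not a URI: ', S);
end.
```

A caller could use the returned scheme to decide whether to hand the URI to TIdURI (for "http" and similar) or to scheme-specific code of its own.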

--
Remy Lebeau (TeamB)
Marco Rocci

Posts: 17
Registered: 8/13/13
Re: mailto URI parsing
  Posted: Aug 25, 2016 6:45 AM   in response to: Remy Lebeau (Te...
Thanks Remy,

Actually, for now, I have preferred commenting out those tests... and I opted for using a TIdURI. I trust that your code is more tested than mine in many other contexts. Until the need arises to handle those special case URIs outside of the unit test suite, I can live with that.

But... since you are rethinking/rewriting TIdURI, I found another problem, which you also probably already know of:

In Dicom tags, URIs need to be URL-encoded, so I have some of that covered by unit tests as well... and I found another glitch. I use this code, where FData is also a TIdURI:
  With TIdURI.Create(FData.uri) Do
    Try
      Result := RawByteString(URLEncode(uri));
    Finally
      Free;
    End;

But this doesn't URL encode the Authority part of the URI and broke some tests. After some tries I found this, still imperfect, version that worked out:
  With TIdURI.Create(FData.uri) Do
    Try
      Username := TNetEncoding.URL.EncodeAuth(Username);
      Password := TNetEncoding.URL.EncodeAuth(Password);
      If IPVersion = Id_IPv4 Then
        Host := TNetEncoding.URL.EncodeAuth(FData.Host);
      Result := RawByteString(URLEncode(uri));
    Finally
      Free;
    End;

Notice that I've had to avoid URL encoding the IPv6 Hosts as square brackets are not handled correctly by TNetEncoding.

I'm not really sure about all this, but from what I've read in the reference documentation, I think that all parts of a URI except the Scheme and the Port could contain spaces and characters that need URL encoding. But I may be wrong. Hope this can be of use.

TIA and regards,
Marco Rocci
Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: mailto URI parsing
  Posted: Aug 25, 2016 10:37 AM   in response to: Marco Rocci
Marco wrote:

  With TIdURI.Create(FData.uri) Do
    Try
      Result := RawByteString(URLEncode(uri));
    Finally
      Free;
    End;

The encoding/decoding methods of TIdURI are static class methods, so you
don't need to construct a TIdURI object:

Result := RawByteString(TIdURI.URLEncode(FData.URI));


However, since FData is a TIdURI, I wonder why your URI is not already url-encoded
before it is parsed into FData. If it were, you could just use the TIdURI.URI
property as-is instead:

Result := RawByteString(FData.URI);


But this doesn't URL encode the Authority part of the URI
and broke some tests.

Only the Path, Document, and Params values are currently encoded by TIdURI.URLEncode().
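
To see this for yourself, a small console program along the following lines can be used. It is only a sketch: it assumes Indy's IdURI unit is on the search path, the sample strings are made up for illustration, and the exact output is deliberately not shown here because it depends on the Indy version in use:

```pascal
program IdURIEncodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, IdURI;

begin
  // These are class methods, so no TIdURI instance is constructed.
  Writeln(TIdURI.PathEncode('/my docs/'));
  Writeln(TIdURI.ParamsEncode('name=John Doe'));
  // URLEncode parses the whole URI string and encodes its Path, Document
  // and Params parts before reassembling it.
  Writeln(TIdURI.URLEncode('http://example.com/my docs/a file.txt?name=John Doe'));
end.
```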

RFC 3986 allows the userinfo and host subcomponents of the Authority component
to be url-encoded. The TIdURI.URI property getter does not output the userinfo
subcomponent, you would have to call the TIdURI.GetFullURI() method with
the ofAuthInfo flag in the AOptionalFields parameter. But neither the userinfo
nor host subcomponents are currently url-encoded, no. You would have to
do it manually (url-encoding the host subcomponent has an added requirement
that the host must be encoded to UTF-8 before url-encoding). However,
if a URI host is intended to be resolved with DNS, the host should be encoded
using IDNA instead of url-encoding (per the RFC).
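
As an illustration of the "UTF-8 first, then url-encode" requirement, here is a minimal sketch that uses only the RTL. PctEncodeUtf8 is a hypothetical helper name, not an Indy or RTL function. It keeps only the RFC 3986 "unreserved" characters literal, which is stricter than the RFC requires for userinfo and host (sub-delimiters could also be left as-is), and it does not perform IDNA encoding:

```pascal
program PctEncodeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: converts S to UTF-8, then percent-encodes every byte
// that is not an RFC 3986 "unreserved" character (ALPHA, DIGIT, - . _ ~).
function PctEncodeUtf8(const S: string): string;
var
  B: Byte;
begin
  Result := '';
  for B in TEncoding.UTF8.GetBytes(S) do
    if B in [Ord('A')..Ord('Z'), Ord('a')..Ord('z'), Ord('0')..Ord('9'),
             Ord('-'), Ord('.'), Ord('_'), Ord('~')] then
      Result := Result + Char(B)             // unreserved: copy as-is
    else
      Result := Result + '%' + IntToHex(B, 2); // everything else: %XX
end;

begin
  Writeln(PctEncodeUtf8('user name'));                    // user%20name
  // U+00FC takes two bytes in UTF-8, so it becomes %C3%BC
  Writeln(PctEncodeUtf8('b' + #$00FC + 'cher.example'));  // b%C3%BCcher.example
end.
```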

After some tries I found this, still imperfect, version that worked out:

Alternatively (granted, TIdURI does not have an encoding method specifically
for the Authority subcomponents):

with TIdURI.Create(FData.URI) do
try
  Username := ParamsEncode(Username);
  Password := ParamsEncode(Password);
  Host := ParamsEncode(Host, IndyTextEncoding_UTF8);
  Path := PathEncode(Path);
  Document := PathEncode(Document);
  Params := ParamsEncode(Params);
  Result := RawByteString(URI);
finally
  Free;
end;


Notice that I've had to avoid URL encoding the IPv6 Hosts as square
brackets are not handled correctly by TNetEncoding.

The TIdURI.Host property should not contain any brackets, they get stripped
off when TIdURI parses a URI.

--
Remy Lebeau (TeamB)
Marco Rocci

Posts: 17
Registered: 8/13/13
Re: mailto URI parsing
  Posted: Aug 27, 2016 12:10 AM   in response to: Remy Lebeau (Te...
Hi Remy,

Thanks for all the useful info. In the end I used this variation of the code you suggested:
  With TIdURI.Create(FData.GetFullURI) Do
    Try
      UserName := ParamsEncode(UserName);
      Password := ParamsEncode(Password);
      Host     := ParamsEncode(Host, IndyTextEncoding_UTF8);
      Path     := PathEncode(Path);
      Document := PathEncode(Document);
      Params   := ParamsEncode(Params);
      Result   := RawByteString(GetFullURI);
    Finally
      Free;
    End;

Now everything other than the double-slash-less URIs works. You are right about the URL encoding of IPv6 hosts... that must have been left over from my trials with the RTL TURI... sorry.

Thanks for everything,
Marco Rocci