Thread: UTF-8 without BOM



Replies: 3 - Last Post: Jan 26, 2015 1:13 PM Last Post By: Andrej Mrvar
Andrej Mrvar

Posts: 99
Registered: 10/20/10
UTF-8 without BOM
  Posted: Jan 26, 2015 11:47 AM
Hello!

If I write

Writer:=TStreamWriter.Create(TFileStream.Create(dat,fmCreate),TEncoding.UTF8);

the generated file will be UTF-8, with the first three bytes forming a BOM (Byte Order Mark).

What should I write to get a UTF-8 file without a BOM?

Thanks in advance.

Andrej

Remy Lebeau (Te...


Posts: 9,447
Registered: 12/23/01
Re: UTF-8 without BOM
  Posted: Jan 26, 2015 12:08 PM   in response to: Andrej Mrvar
Andrej wrote:

> Writer:=TStreamWriter.Create(TFileStream.Create(dat,fmCreate),TEncoding.UTF8);

That is a memory leak on desktop platforms, as TStreamWriter does not take
ownership of your TFileStream. You have to either save the object pointer
into a variable so you can free it yourself when done using it:

FileStrm := TFileStream.Create(dat,fmCreate);
Writer := TStreamWriter.Create(FileStrm,TEncoding.UTF8);
...
Writer.Free;
FileStrm.Free;


Or else use the TStreamWriter constructor that takes a filename as input:

Writer := TStreamWriter.Create(dat, False, TEncoding.UTF8);
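
For reference, here is a minimal sketch of both options wrapped in try..finally, so the objects are released even if a write raises an exception. The procedure names, the output filenames and the sample text are made up for illustration. The unit names assume that unscoped unit names resolve in your project; in XE2 and later you may need System.SysUtils and System.Classes instead.

```pascal
program StreamWriterOwnership;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;  // System.SysUtils, System.Classes if unit scope names are not configured

// Option 1: the caller creates the stream and therefore has to free it.
procedure WriteWithOwnStream(const dat: string);
var
  FileStrm: TFileStream;
  Writer: TStreamWriter;
begin
  FileStrm := TFileStream.Create(dat, fmCreate);
  try
    Writer := TStreamWriter.Create(FileStrm, TEncoding.UTF8);
    try
      Writer.WriteLine('hello');
    finally
      Writer.Free;    // flushes any buffered text into FileStrm
    end;
  finally
    FileStrm.Free;    // the writer did not take ownership of the stream
  end;
end;

// Option 2: the writer opens the file itself and owns the internal stream.
procedure WriteWithFilename(const dat: string);
var
  Writer: TStreamWriter;
begin
  Writer := TStreamWriter.Create(dat, False, TEncoding.UTF8);  // False = do not append
  try
    Writer.WriteLine('hello');
  finally
    Writer.Free;      // also releases the file stream the writer created
  end;
end;

begin
  WriteWithOwnStream('own.txt');      // hypothetical filenames
  WriteWithFilename('byname.txt');
end.
```

Note that the writer is freed before the stream in the first variant, so buffered text is flushed while the stream is still alive.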


> What should I write to get a UTF-8 file without BOM?

In D2009-XE, the TMBCSEncoding.GetPreamble() method never returned a BOM,
so you can use TEncoding.GetEncoding(65001), and then free the TEncoding
object when done using it:

Enc := TEncoding.GetEncoding(65001);
Writer := TStreamWriter.Create(..., Enc);
...
Writer.Free;
Enc.Free;


However, that loophole was fixed in XE2. For XE2 onwards, you will have
to derive a new class from SysUtils.TUTF8Encoding and override its GetPreamble()
method (you can do the same in earlier versions as well):

type
  TUTF8NoBOMEncoding = class(TUTF8Encoding)
  public
    function GetPreamble: TBytes; override;
  end;
 
function TUTF8NoBOMEncoding.GetPreamble: TBytes;
begin
  SetLength(Result, 0);
end;
 
...
 
Enc := TUTF8NoBOMEncoding.Create;
Writer := TStreamWriter.Create(..., Enc);
...
Writer.Free;
Enc.Free;
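
Putting the pieces together, a self-contained console sketch might look like the following. It writes a file with the BOM-less encoding class from above and then reads the first three bytes back to see whether EF BB BF is present. The program name, the StartsWithUTF8BOM helper, the output filename and the sample text are assumptions added for illustration; adjust the unit names to System.SysUtils and System.Classes if your project needs scoped names.

```pascal
program Utf8NoBom;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // UTF-8 encoding whose preamble (the BOM) is empty.
  TUTF8NoBOMEncoding = class(TUTF8Encoding)
  public
    function GetPreamble: TBytes; override;
  end;

function TUTF8NoBOMEncoding.GetPreamble: TBytes;
begin
  SetLength(Result, 0);
end;

// Hypothetical helper: True if the file begins with the UTF-8 BOM (EF BB BF).
function StartsWithUTF8BOM(const FileName: string): Boolean;
var
  Strm: TFileStream;
  Head: array[0..2] of Byte;
begin
  Strm := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := (Strm.Read(Head, SizeOf(Head)) = SizeOf(Head)) and
      (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF);
  finally
    Strm.Free;
  end;
end;

var
  Enc: TEncoding;
  Writer: TStreamWriter;
begin
  Enc := TUTF8NoBOMEncoding.Create;
  try
    // The filename constructor lets the writer own its file stream.
    Writer := TStreamWriter.Create('nobom.txt', False, Enc);
    try
      Writer.WriteLine('hello');
    finally
      Writer.Free;
    end;
  finally
    Enc.Free;  // our own instance, not one of the shared TEncoding singletons
  end;

  Writeln('BOM present: ', StartsWithUTF8BOM('nobom.txt'));
end.
```

Swapping TUTF8NoBOMEncoding.Create for TEncoding.UTF8 (and dropping the Enc.Free, since that instance is shared) should make the same check report a BOM, which is a quick way to compare the two behaviours.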


--
Remy Lebeau (TeamB)
Andrej Mrvar

Posts: 99
Registered: 10/20/10
Re: UTF-8 without BOM
  Posted: Jan 26, 2015 1:13 PM   in response to: Remy Lebeau (Te...
Thanks Remy, your suggestions were exactly what I needed.
As always ;)
Andrej