Welcome, Guest
Guest Settings
Help

Thread: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database


This question is answered. Helpful answers available: 2. Correct answers available: 1.


Permlink Replies: 5 - Last Post: Mar 8, 2017 11:28 AM Last Post By: Jeff Overcash (... Threads: [ Previous | Next ]
Michael Schewe

Posts: 3
Registered: 2/20/04
IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 2, 2017 2:21 AM
I need to convert an Interbase database from default-codepage to UTF8.
I use IB-XE7 and IbConsole v12.1.0.142 that came with Delphi Berlin

Creating a new UTF8-database via the ibconsole menu causes ibconsole to crash.
It will not crash when default character-set is set to "none".

So instead I use this query:
CREATE DATABASE 'new-utf8.ib' USER 'SYSDBA' PASSWORD 'masterkey' PAGE_SIZE 8192 DEFAULT CHARACTER SET UTF8;

Then I open the source database
and use "copy database" to copy the content to new-utf8.ib

All works fine.
But when opening new-utf8.ib ibconsole shows all umlauts as 2 garbled characters.

Can't ibconsole display utf8 strings?

In Delphi Seattle:
TIBDataBase is set to
lc_ctype=UTF8

TIbQuery
select * from myTable

causes the exception
"arithmetic exception, numeric overflow, or string truncation
Cannot transliterate character between character sets."

Q:
doesn't "copy database" convert strings to UTF8 properly even though the database was created as UTF8?

Jeff Overcash (...

Posts: 1,311
Registered: 9/23/99
Re: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 2, 2017 9:41 AM   in response to: Michael Schewe in response to: Michael Schewe
Michael Schewe wrote:
I need to convert an Interbase database from default-codepage to UTF8.
I use IB-XE7 and IbConsole v12.1.0.142 that came with Delphi Berlin

Creating a new UTF8-database via the ibconsole menu causes ibconsole to crash.
It will not crash when default character-set is set to "none".

So instead I use this query:
CREATE DATABASE 'new-utf8.ib' USER 'SYSDBA' PASSWORD 'masterkey' PAGE_SIZE 8192 DEFAULT CHARACTER SET UTF8;

Then I open the source database
and use "copy database" to copy the content to new-utf8.ib

All works fine.
But when opening new-utf8.ib ibconsole shows all umlauts as 2 garbled characters.

Can't ibconsole display utf8 strings?

In Delphi Seattle:
TIBDataBase is set to
lc_ctype=UTF8

TIbQuery
select * from myTable

causes the exception
"arithmetic exception, numeric overflow, or string truncation
Cannot transliterate character between character sets."

Q:
doesn't "copy database" convert strings to UTF8 properly even though the database was created as UTF8?


Make sure you are setting the connection's Display character Set to UTF8 (this
is the equivalent to TIBDatabase's lc_ctype).

--
Jeff Overcash (TeamB)
(Please do not email me directly unless asked. Thank You)
Learning is finding out what you already know. Doing is demonstrating that you
know it. Teaching is reminding others that they know it as well as you. We are
all leaners, doers, teachers. (R Bach)
Michael Schewe

Posts: 3
Registered: 2/20/04
Re: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 5, 2017 10:11 AM   in response to: Michael Schewe in response to: Michael Schewe
Thanks Jeff,

I had overlooked that settings. Now it works.

However there remains one problem:
Textblob content is not converted to UTF8.
Accessing a Textblob containing umlauts with IbQuery throws this exception:
"No mapping for the Unicode character exists in the target multi-byte code page."

IbConsole also shows either a single character or nothing in case of umlauts.

Is there any way to force IbConsole to translate textblobs also?

Is there a way in Delphi to access that textblob as bytes so that no exception occurs?
Then I could convert it myself. Casting to a different type seems not possible in Interbase?

BTW:
creating a new UTF8 database throws tis exception:

"Access violation at address 00000001 in module 'IBConsole.exe'. Read of address 00000001."

The same also in menu "Database Copy"

Jeff Overcash (...

Posts: 1,311
Registered: 9/23/99
Re: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 6, 2017 10:45 AM   in response to: Michael Schewe in response to: Michael Schewe
Michael Schewe wrote:
Thanks Jeff,

I had overlooked that settings. Now it works.

However there remains one problem:
Textblob content is not converted to UTF8.
Accessing a Textblob containing umlauts with IbQuery throws this exception:
"No mapping for the Unicode character exists in the target multi-byte code page."

IbConsole also shows either a single character or nothing in case of umlauts.

Is there any way to force IbConsole to translate textblobs also?

Is there a way in Delphi to access that textblob as bytes so that no exception occurs?
Then I could convert it myself. Casting to a different type seems not possible in Interbase?

BTW:
creating a new UTF8 database throws tis exception:

"Access violation at address 00000001 in module 'IBConsole.exe'. Read of address 00000001."

The same also in menu "Database Copy"


Unofrtunately no. Blobs are moved in binary format and the lc_ctype has no
impact on text blobs. For the most part the TEXT subtype is only a hint to the
clients what type of data is held by the blob. IB itself doesn't do anything
different with TEXT blobs than non-text ones.

You can fix up the data by setting two connections, one without the lc_ctype and
the other with it set to UTF8. Read from the non lc_ctype. Then for the update
one that uses the lc_ctype you will want to create. The update one can not just
be an update statement. For this to work you have to go through
TWideMemoField's to get the conversion.

Here is the key code that does the conversion

function TIBDSBlobStream.Write(const Buffer; Count: Longint): Longint;
 
   function DecodeLength : LongInt;
   var
     bt : TBytes;
   begin
     SetLength(bt, Count);
     Move(Buffer, bt[0], Count);
     bt := FBlobStream.Database.Encoding.Convert(TEncoding.Unicode, 
FBlobStream.Database.Encoding, bt);
     FBlobStream.Write(bt[0], Length(bt));
     Result := Count;
   end;
 
begin
   FModified := true;
   if not (FField.DataSet.State in [dsEdit, dsInsert]) then
     IBError(ibxeNotEditing, [nil]);
   TIBCustomDataSet(FField.DataSet).RecordModified(True);
   if FField is TWideMemoField then
     Result := DecodeLength
   else
     result := FBlobStream.Write(Buffer, Count);
end;


notice that the Encoding depends on the Database's Encoding (which is the
lc_ctype set when the connection is made). These streams are only created
through CreateBlobStream which only happens for TBlobFields and their
descendants, TParams so not create these.

You can might be able to do it in a single pass using TIBDataset. Select the
primary key + blob field, set the lc_ctype to UTF8 and open the result set.

Then using the current property of IBDataset get at the bytes of the blob and
convert it on hte fly to UTF16 and pass that back to the blob. something like

procedure TForm1.btn1Click(Sender: TObject);
begin
   trn1.StartTransaction;
   try
     dts1.Open;
     while not dts1.Eof do
     begin
       dts1.Edit;
       dts1.FieldByName('proj_desc').AsString := 
TEncoding.Default.GetString(dts1.Current.ByName('proj_desc').AsBytes);
       dts1.Post;
       dts1.Next;
     end;
     dts1.Close;
   finally
     trn1.Rollback;
   end;
end;


This would force the encoding to be UFT8.


--
Jeff Overcash (TeamB)
(Please do not email me directly unless asked. Thank You)
Learning is finding out what you already know. Doing is demonstrating that you
know it. Teaching is reminding others that they know it as well as you. We are
all leaners, doers, teachers. (R Bach)

Michael Schewe

Posts: 3
Registered: 2/20/04
Re: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 8, 2017 10:17 AM   in response to: Michael Schewe in response to: Michael Schewe
Thanks again Jeff !
I got it working.

There remains one little problem:

IBConsole does not show the field content properly, just one character.
When entereing a string in IBConsole then DBMemo connected to IBDataSet displays chinese characters.

Is IBConsole treating the string differently?

When using a default characterset database, then the strings look identical.

I don't want to loose the option of viewing the database with IBConsole, that's why I ask.
Jeff Overcash (...

Posts: 1,311
Registered: 9/23/99
Re: IbConsole: Converting to UTF8 problem? Crashes when creating UTF8 database  
Click to report abuse...   Click to reply to this thread Reply
  Posted: Mar 8, 2017 11:28 AM   in response to: Michael Schewe in response to: Michael Schewe
Michael Schewe wrote:
Thanks again Jeff !
I got it working.

There remains one little problem:

IBConsole does not show the field content properly, just one character.
When entereing a string in IBConsole then DBMemo connected to IBDataSet displays chinese characters.

Is IBConsole treating the string differently?

When using a default characterset database, then the strings look identical.

I don't want to loose the option of viewing the database with IBConsole, that's why I ask.

The problem tends to be that with normal string data IB does conversions on the
fly For instance if the data is in Win1250, the lc_ctype is UTF8 (and Delphi
now is UTF16) it goes

1 read record from disk (Win1250)
2 convert to UTF8
3 put onto the wire
4 Client receives
5 IBX then converts from UTF8 to UTF16 for local storage and use.

With blobs though step 2 does not happen because IB just treats it as binary
data not string data. So for text blobs it is important that the encoding of
the text blob and the lc_ctype match.

For Ansi data this is not needed unless you are trying to share across different
client machines with different default codepages. Unfortunately by the time
that stream I was talking about earlier in this thread is built and used the
information about the blob itself is not accessible which is why I had to have
the rule that the lc_ctype would determine the encoding to use (if no lc_ctype
then it would use your default codepage).

--
Jeff Overcash (TeamB)
(Please do not email me directly unless asked. Thank You)
Learning is finding out what you already know. Doing is demonstrating that you
know it. Teaching is reminding others that they know it as well as you. We are
all leaners, doers, teachers. (R Bach)
Legend
Helpful Answer (5 pts)
Correct Answer (10 pts)

Server Response from: ETNAJIVE02