Subject | UDFs and alternative character sets |
---|---|
Author | Ray Holme |
Post date | 2014-01-01T13:43:52Z |
I have noticed that a "cstring" must be shorter than 32K (divided by the
character set width). This makes sense, as the DB cannot import or show
longer strings.
However, I am still confused. I would think that a "cstring" is always
a C string (an ASCII string of characters terminated by a null byte).
If so, the DB engine must convert its internal strings to cstrings
for the UDF, and back again if the UDF returns them.
If not, two important things need to be well defined:
a) there must be a call in the DB library that tells us which character
set we have, so the UDF can handle multiple sets,
OR
you must write character-set-dependent routines;
AND
b) there must be some documentation somewhere describing what these
strings look like. I see material on the net, but it is rather hard to
get a clear understanding.
I am thinking of UTF8 (from the net it looks like 16 bits per character,
but I have also heard 24).
If a true cstring is still ASCII with a null terminator, the UDF
writer's job is much easier (but it means some UDF functions cannot
work for many languages).
Can anyone give me a clear answer on this and/or point me to a clear
page describing UTF8, in case I need to deal with it in a UDF?
Happy New Year and thanks for any help you can provide.