Discussion:
CSets 2.1 released
Mark Leisher
2007-06-14 20:28:16 UTC
http://crl.nmsu.edu/~mleisher/csets.html

It has been almost 2 years, so I thought I should get something out just
to keep it alive :-)

This release provides three new mapping tables:

1. A common Guarani encoding called Times Guarani (GN-TIMESG.TXT).
2. A Middle Eastern transliteration encoding from Knut Vikør's Jaghbub
font (JAGHBUB.TXT).
3. The Kazakh STRK1048-2002 national standard encoding (KZ1048.TXT).

For those of you not familiar with this package:

"The CSets collection is a set of mapping tables between various
character sets and Unicode, and is intended to provide mappings not
typically found in character set conversion tools available today."

As always, I am happy to accept mapping tables/conversion program source
code for any other obscure or under-represented encodings.
--
Mark Leisher
Aiet Kolkhi
2007-06-14 23:17:30 UTC
Hello Mark,

CSets is a wonderful package.

Taking a look at the list, I see it includes two 8-bit Georgian character
sets (GEO-ITA – aka Georgian-BPG and GEO-PS – aka GESCII).

In fact, there are two more character sets used in Georgian. One of
them is the 7-bit charset Georgian Latin (or Georgian Transliterated),
which is used in the majority of Georgian written correspondence and
typography. About 90% of Georgian computer typefaces are set in this
charset.

Fortunately this is changing, as most Georgian sites seem to be using
UTF-8 (about 90% UTF-8, 8% GeoLat & 2% GEO-PS). Of course, nearly 100%
of Georgian software localization is accomplished in Unicode on all
major platforms.

The continued use of GeoLat is partly due to missing or incorrect
Unicode Georgian keyboard input layouts in the default installations of
popular operating systems. This issue has been resolved on Microsoft
Windows Vista and Fontconfig (thus all major GNU/Linux distributions)
and remains to be solved on Apple OS X.

Do you think it would be wise to add GeoLat character set to CSets?
GeoLat is a very aggressive 7-bit character set, placing all
contemporary Georgian (Mkhedruli) characters over Latin characters
within the ASCII range 0040 - 007A.
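
To illustrate what "placing Georgian characters over Latin characters"
means for a converter, here is a minimal Python sketch. It is not the
actual GeoLat table: SAMPLE_TABLE and decode_7bit are hypothetical
names, and the two pairs in the table are illustrative placeholders
only. Any slot without an entry passes through as plain ASCII.

```python
# Hypothetical excerpt of a GeoLat-style table: ASCII code -> Unicode code point.
# These two pairs are illustrative assumptions, not the verified GeoLat layout.
SAMPLE_TABLE = {
    0x61: 0x10D0,  # slot of 'a' -> GEORGIAN LETTER AN (assumed pairing)
    0x62: 0x10D1,  # slot of 'b' -> GEORGIAN LETTER BAN (assumed pairing)
}


def decode_7bit(data, table):
    """Decode 7-bit bytes, replacing mapped slots and passing the rest through."""
    out = []
    for byte in data:
        if byte > 0x7F:
            # A 7-bit charset has no defined meaning for high bytes.
            raise ValueError("not a 7-bit byte: 0x%02X" % byte)
        out.append(chr(table.get(byte, byte)))
    return "".join(out)


if __name__ == "__main__":
    print(decode_7bit(b"ab c", SAMPLE_TABLE))
```

Because the Georgian letters occupy ASCII slots, a decoder like this
cannot tell genuine Latin text from GeoLat text; the caller has to know
which one the input is.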

For comparison, over 90% of conversions on the Georgian encoding
conversion site [1] are done from GeoLat (called _eng_ on the site) to
UTF-8.

Regards,

Noshre Chkhaidze

--
Aiet Kolkhi
http://www.Gakartuleba.org
http://www.284bc.com

[1] http://convert.ge/en/text.shtml
