Post by Russell Shaw
Hi,
I was thinking of making a multilingual text editor.
I don't get how glyphs are done outside of English.
I've read the Unicode Standard book.
When a paragraph of Unicode characters is processed, the glyphs
are laid out according to the state contained in the Unicode
character sequence.
Depending on this state, the same Unicode characters can map to
different glyphs.
If multiple fonts exist for a language, then for all of these font
files to work with an editor, all of their glyphs must be indexed
the same way.
Where can I find the standard that specifies what glyphs are indexed
by what number? Or are these glyphs created on the fly by the Unicode
paragraph layout processor?
The relevant standard is OpenType fonts, which contain the necessary
tables for mapping sequences of characters to glyphs. The glyph
indexing is specific to the particular font being used; there is no
standard across fonts, and in fact some fonts will use precomposed
glyphs while others will use constituent glyphs with positioning
information to achieve the same result.
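To make the point about font-specific indexing concrete, here is a minimal toy sketch in Python. It is not real OpenType parsing: the `ToyFont` class, its `cmap` and `compose` fields, and all glyph numbers are made up for illustration. It only shows that two fonts can number their glyphs differently, and that one may use a precomposed glyph where the other keeps the constituent glyphs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFont:
    """Hypothetical stand-in for a font; not a real OpenType structure."""
    cmap: dict                                   # code point -> font-private glyph index
    compose: dict = field(default_factory=dict)  # (glyph, glyph) -> precomposed glyph

    def shape(self, text):
        # Step 1: map each character to this font's own glyph index.
        glyphs = [self.cmap[ord(ch)] for ch in text]
        # Step 2: replace pairs the font has a precomposed glyph for.
        out, i = [], 0
        while i < len(glyphs):
            pair = tuple(glyphs[i:i + 2])
            if pair in self.compose:
                out.append(self.compose[pair])
                i += 2
            else:
                out.append(glyphs[i])
                i += 1
        return out


# Font A has a precomposed "e with acute" glyph (index 203, invented).
font_a = ToyFont(cmap={0x65: 12, 0x301: 87}, compose={(12, 87): 203})
# Font B keeps "e" and the combining acute as separate glyphs; a real font
# would position the accent over the base using positioning data.
font_b = ToyFont(cmap={0x65: 5, 0x301: 140})

text = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
ids_a = font_a.shape(text)
ids_b = font_b.shape(text)
print(ids_a, ids_b)
```

The same two characters come out as one glyph in the first toy font and two in the second, with unrelated index numbers, which is why an editor has to keep glyph indices paired with the font that produced them.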
OpenType was designed by Microsoft (later jointly with Adobe) as an
abstraction over TrueType and Type 1 fonts, with the features needed
for proper Unicode rendering.
On Windows, Uniscribe/USP10.DLL is the code responsible for processing
these tables. Correctly internationalized applications call its
functions for text rendering (the standard Windows controls already
do that on the application’s behalf).
The situation on Linux and *nix is a bit more diverse. Both GTK+ and
Qt widgets provide semi-correct OpenType handling, but with lots of
mistakes in handling scripts/languages their developers are not very
familiar with. Qt uses its own code for this, while GTK+ uses the
Pango library, an extremely slow “complex text layout” library which
does a lot more than is needed for most uses. Pango also duplicates
most of the font-specific logic in code, causing lots of headaches in
addition to bloat and bad performance. Firefox with Pango enabled is
many times slower than without; this is why many distributions still
have Pango support disabled by default, causing many languages not to
work...
I’m very much hoping for a future direction of proper OpenType
rendering support without the need for Pango, but it requires someone
spending some time to understand the problem domain. Basically it’s
just a matter of applying substitution tables, and hard-coding lists
of which tables are needed for which scripts in Unicode and the order
in which they should be applied. (Originally they were intended to be
applied in the order they appear in the font files, but then MS went
and made their implementation hard-code the order, so other
implementations need to follow that in order to handle fonts properly,
or at least that’s my understanding.)
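As a rough illustration of what “applying substitution tables in a fixed order” means, here is a small Python sketch. It is a toy under stated assumptions: the lookups are plain dictionaries keyed by feature tag, the glyph names (`f_i`, `f.sc`, `i.sc`) are invented, and the two orderings are hypothetical rather than anything a real shaper specifies. `liga` and `smcp` are real OpenType feature tags, used here only as labels.

```python
def apply_lookup(glyphs, rules):
    """Apply one substitution lookup left to right, longest match first.

    rules maps a tuple of input glyph names to a list of output glyph names.
    """
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(glyphs):
        for n in range(max_len, 0, -1):
            key = tuple(glyphs[i:i + n])
            if len(key) == n and key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, feature_order, font_lookups):
    """Run the font's lookups in the order the shaper dictates."""
    for tag in feature_order:
        rules = font_lookups.get(tag)
        if not rules:
            continue  # this toy font has no lookup for the feature
        glyphs = apply_lookup(glyphs, rules)
    return glyphs


# Toy lookups, invented for this example.
toy_font = {
    "liga": {("f", "i"): ["f_i"]},                    # ligature substitution
    "smcp": {("f",): ["f.sc"], ("i",): ["i.sc"]},     # small-caps substitution
}

shaped_liga_first = shape(["f", "i"], ["liga", "smcp"], toy_font)
shaped_smcp_first = shape(["f", "i"], ["smcp", "liga"], toy_font)
print(shaped_liga_first, shaped_smcp_first)
```

The two orderings give different glyph runs from the same input, which is why an implementation that applies features in a different order from the one font designers tested against can render a font incorrectly.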
The OpenType specs themselves are available at Microsoft’s website,
but they’re very poorly written. Reading them alone is insufficient
to make an implementation unless you already know basically what the
implementation must do, IMO. They’re something like RFC 1459 in
quality...
There’s a (semi-)new library called HarfBuzz which, as I understand
it, is purely the OpenType logic, without all the bloat of Pango. I’m
not sure what stage it’s at these days, but it might be a good place
to begin your search. Of course if your app depends on GTK+ or Qt you
can just use their widgets and forget about the whole issue, but I
hope someone will move things forward for OpenType font support
without the need for these toolkits.
Rich