i18n fonts

Russell Shaw

2007-12-03 03:16:00 UTC

Hi,
I was thinking of making a multilingual text editor.

I don't get how glyphs are handled outside of English.

I've read the Unicode Standard book.

When a paragraph of Unicode characters is processed, the glyphs
are laid out according to the state contained in the Unicode
character sequence.

Depending on this state, the same Unicode characters can map to
different glyphs.

If multiple fonts exist for a language, then for all these font
files to work with an editor, all these glyphs must be indexed
the same.

Where can I find the standard that specifies which glyphs are indexed
by which number? Or are these glyphs created on the fly by the Unicode
paragraph layout processor?

Christopher Fynn

2007-12-03 04:27:18 UTC

Hi Russell

Post by Russell Shaw
If multiple fonts exist for a language, then for all these font
files to work with an editor, all these glyphs must be indexed
the same.

For complex scripts, the glyphs for the combinations you need to display will
rarely be indexed the same from font to font. You really need to access the
glyphs via OpenType, ATSUI/AAT or Graphite lookups.

For complex scripts (Indic, Arabic etc.) you probably want to use something
which maps characters to glyphs, handles the OpenType (or other) lookups in the
font and returns a string of properly substituted and positioned glyphs. To do
all this you will probably want to use something like Pango:
<http://www.pango.org/>

You will probably also want to look at the OpenType spec and related documents:
<http://www.microsoft.com/typography/SpecificationsOverview.mspx>
<http://www.adobe.com/devnet/opentype/>
<http://partners.adobe.com/public/developer/opentype/index_spec.html>

- Chris

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/

Rich Felker

2007-12-03 04:37:44 UTC

The relevant standard is OpenType fonts, which contain the necessary
tables for mapping sequences of characters to glyphs. The glyph
indexing is specific to the particular font being used; there is no
standard across fonts, and in fact some fonts will use precomposed
glyphs while others will use constituent glyphs with positioning
information to achieve the same result.
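
To make the point about font-specific indexing concrete, here is a toy
Python sketch. The two "fonts" and all of their glyph numbers are invented
for illustration; in a real font the character-to-glyph mapping lives in the
cmap table and the glyph IDs are whatever the font's build tools assigned.

```python
# Two hypothetical fonts; every glyph ID here is made up for illustration.
# Real fonts keep this character-to-glyph mapping in their 'cmap' table.
FONT_A_CMAP = {0x0065: 72, 0x0301: 310, 0x00E9: 141}  # has a precomposed e-acute
FONT_B_CMAP = {0x0065: 40, 0x0301: 95}                # base letter + combining mark only


def glyphs_for_e_acute(cmap):
    """Return the glyph IDs a renderer might use for e-acute in this font."""
    if 0x00E9 in cmap:
        # Precomposed glyph: a single glyph ID.
        return [cmap[0x00E9]]
    # Otherwise use base + combining mark; in a real font the mark's
    # position would come from positioning data such as GPOS.
    return [cmap[0x0065], cmap[0x0301]]
```

The same text yields `[141]` with the first font and `[40, 95]` with the
second, so glyph numbers only mean something relative to one font.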

OpenType was designed by Microsoft (together with Adobe) as an abstraction
of TrueType and Type 1 fonts with the necessary features for proper Unicode
rendering.
On Windows, Uniscribe/USP10.DLL is the code responsible for processing
these tables. Correctly multilingualized applications will use its
functions for text rendering (but all the standard Windows controls
will do that for apps).

The situation on Linux and *nix is a bit more diverse. Both GTK+ and
Qt widgets provide semi-correct OpenType handling, but with lots of
mistakes in handling scripts/languages their developers are not very
familiar with. Qt uses its own code for this, while GTK+ uses the
Pango library, an extremely slow “complex text layout” library which
does a lot more than is needed for most uses, and which duplicates
most of the font-specific logic in code, causing lots of headaches in
addition to bloat and bad performance (Firefox with Pango enabled is
many times slower than without; this is why many distributions still
have Pango support disabled by default, causing many languages not to
work...).

I’m very much hoping for a future direction of proper OpenType
rendering support without the need for Pango, but it requires someone
spending some time to understand the problem domain. Basically it’s
just a matter of applying substitution tables, and hard-coding lists
of which tables are needed for which scripts in Unicode and the order
in which they should be applied. (Originally they were intended to be
applied in the order they appear in the font files, but then MS went
and made their implementation hard-code the order, so other
implementations need to follow that in order to handle fonts properly
— or at least that’s my understanding.)
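
As a rough illustration of "apply substitutions in a hard-coded order per
script", here is a toy Python sketch. The script name "demo", the glyph
names, the rules and the feature order are all assumptions made up for the
example, not data from any real font or from Uniscribe. A real
implementation would read its lookups from the font's GSUB table and would
also handle contextual rules, which this sketch ignores.

```python
# Hypothetical per-script feature order, hard-coded by the shaper.
FEATURE_ORDER = {"demo": ["ccmp", "liga"]}

# Hypothetical font data: feature tag -> list of
# (input glyph sequence, replacement glyph sequence).
# Note the font lists 'liga' first; the shaper's order wins anyway.
FONT_GSUB = {
    "liga": [(("f", "i"), ("f_i",))],
    "ccmp": [(("i", "acutecomb"), ("dotlessi", "acutecomb"))],
}


def apply_rules(glyphs, rules):
    """Scan left to right, applying the first rule that matches at each position."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, repl in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.extend(repl)
                i += len(seq)
                break
        else:
            # No rule matched here: copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, script, font_gsub):
    """Apply each feature's rules in the shaper's fixed order, not the font's."""
    for tag in FEATURE_ORDER[script]:
        glyphs = apply_rules(glyphs, font_gsub.get(tag, []))
    return glyphs
```

With this data, `shape(["f", "i", "acutecomb"], "demo", FONT_GSUB)` gives
`["f", "dotlessi", "acutecomb"]`, because "ccmp" runs first and the "fi"
ligature no longer matches. Running "liga" first would give
`["f_i", "acutecomb"]` instead, which is why implementations have to agree
on the order.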

The OpenType specs themselves are available at Microsoft’s website,
but they’re very poorly documented. Reading them alone is insufficient
to make an implementation unless you already know basically what the
implementation must do, IMO — something like RFC 1459 in quality...

There’s a (semi-)new library called HarfBuzz which, as I understand
it, is purely the OpenType logic, without all the bloat of Pango. I’m
not sure what stage it’s at these days, but it might be a good place
to begin your search. Of course if your app depends on GTK+ or Qt you
can just use their widgets and forget about the whole issue, but I
hope someone will move things forward for OpenType font support
without the need for these toolkits.

Rich

Russell Shaw

2007-12-18 16:05:06 UTC

Post by Rich Felker
...

Hi,
I can parse the GSUB tables. I was trying to do the GPOS tables,
but the OpenType spec doesn't define "ValueRecord" in
"Single Adjustment Positioning: Format 1":

http://www.microsoft.com/typography/otspec/gpos.htm

Russell Shaw

2007-12-19 03:01:26 UTC

...

Post by Russell Shaw
Hi,
I can parse the GSUB tables. I was trying to do the GPOS tables,
but the OpenType spec doesn't define "ValueRecord" in
http://www.microsoft.com/typography/otspec/gpos.htm

I found it in there. For some reason, Ctrl-F "valuerecord" doesn't
find it in Firefox.
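
For anyone else who trips over this: a ValueRecord is variable-length, and
the ValueFormat field that precedes it says which 16-bit fields are present.
Below is a minimal Python sketch of reading one from a Single Adjustment
Positioning Format 1 subtable. The function names are my own, the field
names follow the spec as I read it, and the sketch assumes the caller has
already located the subtable bytes and does not follow the Device table
offsets.

```python
import struct

# ValueFormat bit flags, in the order the fields appear in a ValueRecord.
VALUE_FIELDS = [
    (0x0001, "xPlacement", "h"),        # int16
    (0x0002, "yPlacement", "h"),        # int16
    (0x0004, "xAdvance", "h"),          # int16
    (0x0008, "yAdvance", "h"),          # int16
    (0x0010, "xPlaDeviceOffset", "H"),  # offset to a Device table
    (0x0020, "yPlaDeviceOffset", "H"),
    (0x0040, "xAdvDeviceOffset", "H"),
    (0x0080, "yAdvDeviceOffset", "H"),
]


def parse_value_record(data, offset, value_format):
    """Read only the fields whose bit is set; return (record, new_offset)."""
    record = {}
    for bit, name, code in VALUE_FIELDS:
        if value_format & bit:
            # OpenType data is big-endian; every field is 2 bytes wide.
            (record[name],) = struct.unpack_from(">" + code, data, offset)
            offset += 2
    return record, offset


def parse_single_pos_format1(data, offset=0):
    """SinglePos Format 1: posFormat, coverageOffset, valueFormat, one ValueRecord."""
    pos_format, coverage_offset, value_format = struct.unpack_from(">HHH", data, offset)
    if pos_format != 1:
        raise ValueError("not a SinglePos Format 1 subtable")
    record, _ = parse_value_record(data, offset + 6, value_format)
    return {
        "coverageOffset": coverage_offset,
        "valueFormat": value_format,
        "value": record,
    }
```

For example, a subtable built with
`struct.pack(">HHHhh", 1, 8, 0x0005, -50, 120)` has ValueFormat 0x0005, so
its ValueRecord holds just xPlacement (-50) and xAdvance (120).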

Rich Felker

2007-12-19 03:52:46 UTC

Post by Russell Shaw
....

I found it in there. For some reason, Ctrl-F "valuerecord" doesn't
find it in firefox.

Props for researching this stuff. If there are ever going to be good
implementations, a lot more people need to know how it works,
exchange ideas, and challenge and debate how to make it best.

Rich