Discussion:
[Paps-discuss] Combining characters not rendered properly
Jan Willem Stumpel
2006-12-08 12:03:42 UTC
[copied to linux-***@nl.linux.org from a discussion which started
at paps-***@lists.sourceforge.net]

The "combining accents" problem seems really complicated.
Depending on the font and the rendering engine, combining accents
are sometimes displayed/printed correctly, or sometimes not.

Here are the results of some experiments, trying to display the
following combining (not pre-combined) accents:

à made by a + U+300
á made by a + U+301
ã made by a + U+303
ả made by a + U+309
ạ made by a + U+323
a with comma below made by a + U+326

Now sometimes this works (the accents are displayed above or below
the base letters) and sometimes it does not (the accents are
displayed after the base letters).
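For reference, Python's standard `unicodedata` module can check the six test sequences. This is a small sketch added for illustration, not part of the original experiment. It prints each mark's canonical combining class (230 means "above", 220 means "below") and whether NFC normalization finds a single precomposed character. Only a + U+0326 has no precomposed form, so that case can only be rendered through the combining mechanism.

```python
import unicodedata
from typing import List, Tuple

# Combining marks from the experiment, each appended to the base letter "a".
TEST_MARKS: List[int] = [0x0300, 0x0301, 0x0303, 0x0309, 0x0323, 0x0326]


def analyse(mark: int) -> Tuple[str, int, bool]:
    """Return the mark's Unicode name, its canonical combining class, and
    whether "a" + mark composes to a single precomposed character (NFC)."""
    seq = "a" + chr(mark)
    nfc = unicodedata.normalize("NFC", seq)
    return unicodedata.name(chr(mark)), unicodedata.combining(chr(mark)), len(nfc) == 1


if __name__ == "__main__":
    for m in TEST_MARKS:
        name, ccc, precomposed = analyse(m)
        print(f"U+{m:04X}  {name:<24} ccc={ccc:<3} precomposed={precomposed}")
```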

Legend:

OO : rendering of combining accents in Openoffice
GE : rendering of combining accents in Pango (gedit/paps)
TW : rendering in paps by Thomas Wolff's experiments
CA : "combining accents" present in font file
CL : "OT Glyph Class" (whatever that is) of combining accents
(as shown by fontforge)
GS : font file has a GSUB table

- : not at all
-- : font does not have code position for combining accents
(Type1 font file)
+ : some accents (or "yes")
++ : all accents (especially including U+323, U+326)
A : OT Glyph Class is "Automatic"
M : OT Glyph Class is "Mark"


Font OO GE TW CA CL GS
==== == == == == == ==

Andale Mono - - - A -

Arial ++ ++ + A +

AR PL KaitiM GB ++ ++ - A

Baekmuk Dotum ++ ++ - A

Bitstream Vera Serif ++ ++ - A

Bitstream Vera Sans Mono - - - - A

Code2000 ++ ++ ++ M

Comic Sans MS ++ ++ - A -

Courier (10 pitch) - ++ - --

Courier New - - + + A +

FreeMono + + ++ M

FreeSans ++ ++ ++ M

Free Serif ++ ++ ++ M

Lucida Bright + + - A

Luxi Mono - - - A

Luxi Sans + + - A

Luxi Serif + + - A

Times New Roman ++ ++ + A +

Trebuchet MS ++ ++ - A -

URW Bookman L ++ ++ --

Verdana - - + A -


Well, I cannot make heads or tails of this. In general, the
behaviour of pango seems to be the same as that of Openoffice,
apart from the case of Courier 10 Pitch (which is a Type 1 font). But
why the combining accents work in some fonts and not in others, I
have no clue. Are there bugs in the rendering engines? Or (more
likely) in the fonts? But what are these bugs exactly?

Thomas Wolff also showed paps results with "vera sans mono" which
shows the accents *before* the base letters. Which font is this
exactly? It is obviously not the same as "Bitstream Vera Sans Mono".

I include (for members of the linux-utf-8 list) a test file made
by Thomas Wolff, containing "combining accents".

Regards, Jan
Arne Götje (高盛華)
2006-12-08 13:40:07 UTC
Post by Jan Willem Stumpel
The "combining accents" problem seems really complicated.
Depending on the font and the rendering engine, combining accents
are sometimes displayed/printed correctly, or sometimes not.
Correct.
Post by Jan Willem Stumpel
Here are the results of some experiments, trying to display the
[...]
Post by Jan Willem Stumpel
Font OO GE TW CA CL GS
==== == == == == == ==
Andale Mono - - - A -
Arial ++ ++ + A +
AR PL KaitiM GB ++ ++ - A
I highly doubt that the Arphic fonts can do combining accents. The
accents are not present in those fonts, which simply means that your
font rendering engine replaced the glyphs with some from other fonts.
Post by Jan Willem Stumpel
Well, I cannot make heads or tails of this. In general, the
behaviour of pango seems to be the same as that of Openoffice,
apart from the case of Courier 10 Pitch (which is a Type 1 font). But
why the combining accents work in some fonts and not in others, I
have no clue. Are there bugs in the rendering engines? Or (more
likely) in the fonts? But what are these bugs exactly?
Both.

The fonts need to contain
a) the base glyphs
b) the combining accents
c) "anchors" in the GPOS table to tell the rendering engine where
exactly to place the accents. In fontforge this is done easily. :)

The rendering engine needs to support the GPOS table and render the
glyphs according to it. AFAIK only pango can do this currently.
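A minimal sketch of what "supporting the GPOS table" means for a single accent, modelled loosely on mark-to-base attachment: the engine moves the mark so that the mark's anchor point lands on the base glyph's anchor point of the same class. The glyph names, anchor classes and coordinates below are hypothetical, not taken from any real font, and the GPOS binary format itself is not modelled.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[int, int]

# Hypothetical font data in font units (not taken from any real font).
# Base glyphs: anchor class -> anchor point on the base glyph.
BASE_ANCHORS: Dict[str, Dict[str, Point]] = {
    "a": {"top": (250, 520), "bottom": (250, 0)},
    "o": {"top": (280, 520), "bottom": (280, 0)},
}
# Mark glyphs: (anchor class, anchor point on the mark glyph).
MARK_ANCHORS: Dict[str, Tuple[str, Point]] = {
    "acutecomb": ("top", (0, 500)),       # stands in for U+0301
    "dotbelowcomb": ("bottom", (0, 0)),   # stands in for U+0323
}


def attach(base: str, base_origin: Point, mark: str) -> Optional[Point]:
    """Return where to draw `mark` so that its anchor coincides with the
    same-class anchor on `base`, or None if the base has no such anchor."""
    anchor_class, (mx, my) = MARK_ANCHORS[mark]
    base_point = BASE_ANCHORS.get(base, {}).get(anchor_class)
    if base_point is None:
        return None  # no anchor: the engine has to guess or give up
    bx, by = base_point
    ox, oy = base_origin
    return (ox + bx - mx, oy + by - my)
```

For instance, `attach("a", (0, 0), "acutecomb")` places the acute at (250, 20), so its anchor at (0, 500) ends up on the base anchor at (250, 520).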

M$ has cheated in its Times New Roman font. They use the normal accents,
not the combining ones, and give them a negative position. Some
rendering engines will then appear to render them "correctly", although
they actually don't. This might also be the answer for the next
question...
Post by Jan Willem Stumpel
Thomas Wolff also showed paps results with "vera sans mono" which
shows the accents *before* the base letters. Which font is this
exactly? It is obviously not the same as "Bitstream Vera Sans Mono".
HTH
Arne
--
Arne Götje (高盛華) <***@linux.org.tw>
PGP/GnuPG key: 1024D/685D1E8C
Fingerprint: 2056 F6B7 DEA8 B478 311F 1C34 6E9F D06E 685D 1E8C
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.
Jan Willem Stumpel
2006-12-10 15:11:26 UTC
I highly doubt that the Arphic fonts can do combining accents.
The accents are not present in those fonts, which simply
means that your font rendering engine replaced the glyphs with
some from other fonts.
Your explanation is enlightening. Thanks! I haven't yet found out
the exact mechanism by which some fonts that do not contain
combining accents nevertheless manage to display them. It must have
something to do with the order of preference in fontconfig's
config files.
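One plausible mechanism, sketched here under explicitly hypothetical assumptions: the engine picks, for each character, the first font in a preference list that covers it, so a missing combining mark is silently taken from a different font. The font names and coverage sets are invented. On a real system the ordering comes from fontconfig (roughly what `fc-match -s` lists), and engines such as pango may try harder to keep a base letter and its marks in one font.

```python
from typing import Dict, List, Set

# Hypothetical preference order and coverage; on a real system both come
# from fontconfig and the installed fonts, not from a table like this.
FALLBACK_ORDER: List[str] = ["Primary", "Fallback-1", "Fallback-2"]
COVERAGE: Dict[str, Set[int]] = {
    "Primary": set(range(0x20, 0x7F)),                                # ASCII only
    "Fallback-1": set(range(0x20, 0x7F)) | {0x0300, 0x0301, 0x0303},
    "Fallback-2": {0x0309, 0x0323, 0x0326},
}


def pick_fonts(text: str, order: List[str]) -> List[str]:
    """Naive per-character fallback: for each code point, the first font in
    `order` that covers it, or "?" if none does."""
    return [next((f for f in order if ord(ch) in COVERAGE[f]), "?") for ch in text]
```

With this model the base letter and its accent come from different fonts, which is exactly the situation in which the base has no anchor matching the borrowed mark.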
Post by Jan Willem Stumpel
But what are these bugs exactly?

Both.

The fonts need to contain a) the base glyphs b) the combining
accents c) "anchors" in the GPOS table to tell the rendering
engine where exactly to place the accents. In fontforge this is
done easily. :)
I am sure I detect some irony here... Anyway, it is true that in all
cases where a), b) and c) are fulfilled (but there are not many of
those), combining accents work (at least for *single* accents).
And they also work (more or less by luck) in some other cases.

I suppose for the Unicode "*multiple* combining accents" mechanism
to work, more complicated anchor classes ought to be defined
(allowing, e.g., "accent on top of accent"). But it seems no
actual font does this at the moment. In the case of the multiple
accents of Classical Greek, Openoffice seems to do OK. But I think
Openoffice uses another mechanism.

So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.

Thanks again for your explanation.

Regards, Jan
Rich Felker
2006-12-10 21:07:48 UTC
Post by Jan Willem Stumpel
So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.
Well there are two flip sides to this situation. On the one hand
you're right, but on the other hand if no one tries to use the
combining characters, applications and fonts will remain broken. :(
Since I need to use scripts where combining is essential, I'm somewhat
inclined to highlight the brokenness in apps (by using combining chars
in more situations) in hopes that the authors will fix them...

Rich
Andries Brouwer
2006-12-10 21:33:30 UTC
Post by Jan Willem Stumpel
So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.
It won't, except in lucky cases. True.

Now that this is being discussed, recently I needed to use
vocalized Hebrew, and found that all that I tried was broken.
Is there a version of Java, or a Java function perhaps other than
Graphics.drawString(), that correctly handles vocalized Hebrew?
Is there a version of Mozilla / Firefox that correctly handles
vocalized Hebrew? With what font?

Andries
Arne Götje (高盛華)
2006-12-11 02:15:11 UTC
Post by Andries Brouwer
Post by Jan Willem Stumpel
So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.
It won't, except in lucky cases. True.
Now that this is being discussed, recently I needed to use
vocalized Hebrew, and found that all that I tried was broken.
Is there a version of Java, or a Java function perhaps other than
Graphics.drawString(), that correctly handles vocalized Hebrew?
Is there a version of Mozilla / Firefox that correctly handles
vocalized Hebrew? With what font?
I suppose "vocalized Hebrew" also uses accent composition and there are
no pre-composed glyphs available?

Firefox can use the pango library, which should be able to read the GPOS
definitions in the font if they are present... if not, prod the font
maintainer to include GPOS information for those glyphs.

But please, if you do so, provide the necessary information (which
combinations are possible and how they should display) to the font
maintainer... not all font maintainers are aware of all details of a
script... many only implement the glyphs (or a basic set of them) which
are available in Unicode and think their job is done.

Cheers
Arne
Christopher Fynn
2006-12-12 16:21:40 UTC
Andries Brouwer wrote:

...
Post by Andries Brouwer
Now that this is being discussed, recently I needed to use
vocalized Hebrew, and found that all that I tried was broken.
Is there a version of Java, or a Java function perhaps other than
Graphics.drawString(), that correctly handles vocalized Hebrew?
Is there a version of Mozilla / Firefox that correctly handles
vocalized Hebrew? With what font?
Java is still pretty lame as far as complex script support goes.

For Mozilla / Firefox, make sure you are using a binary compiled
with Pango enabled. Several Linux distros ship with versions of
Mozilla/Firefox compiled with Pango disabled.


- Chris
Arne Götje (高盛華)
2006-12-11 02:08:23 UTC
Rich Felker
2006-12-11 03:28:09 UTC
Post by Rich Felker
Post by Jan Willem Stumpel
So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.
Well there are two flip sides to this situation. On the one hand
you're right, but on the other hand if no one tries to use the
combining characters, applications and fonts will remain broken. :(
Since I need to use scripts where combining is essential, I'm somewhat
inclined to highlight the brokenness in apps (by using combining chars
in more situations) in hopes that the authors will fix them...
In this case, *you* need to *define* which combinations you need and
*how* they should be displayed. If they exist already as pre-composed
glyphs in Unicode, then it's no problem to implement them... but if not,
it's up to you to do the definition first and then either to modify
existing fonts by yourself (e.g. by using fontforge), or ask the
upstream author of the fonts to do it for you.
The case I need is just for Tibetan, and all the existing fonts
(except the broken stuff in GNU Unifont, which I'm hoping to
eventually get replaced with glyphs that halfway-work with dumb
overstrike) have working GPOS/GSUB data.

The issue I was talking about was more application support than font
support. Even if a font does not have glyphs for the combining
characters, a good rendering engine should be able to use the
precomposed glyphs in cases where they're encoded in Unicode. However
many apps don't work even with the combining characters in the font.
On Windows, whether it works is often dependent on the order the
characters are stored; I'm told that Hebrew specifically does not work
when stored in the canonical order, at least with certain fonts, but
does work when the combining marks are reordered. Firefox doesn't
support combining correctly at all unless you patch it and compile it
yourself with pango (yeah right, good luck...) on *nix or on Windows.
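The canonical-order issue with Hebrew comes from Unicode giving Hebrew points fixed combining classes, so normalization can reorder marks that were typed in a different sequence. A minimal sketch with Python's `unicodedata`: dagesh (class 21) typed before qamats (class 18) ends up after it in canonical order. It only shows how the stored order can change, not how any particular renderer reacts to it.

```python
import unicodedata
from typing import List

BET, DAGESH, QAMATS = "\u05d1", "\u05bc", "\u05b8"


def ccc_list(s: str) -> List[int]:
    """Canonical combining class of every code point in s."""
    return [unicodedata.combining(c) for c in s]


typed = BET + DAGESH + QAMATS                    # dagesh typed first
canonical = unicodedata.normalize("NFD", typed)  # canonical ordering applied

if __name__ == "__main__":
    print(ccc_list(typed))      # [0, 21, 18]
    print(ccc_list(canonical))  # [0, 18, 21]: qamats now precedes dagesh
```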
If the application of your choice then still cannot display the
combinations correctly, prod the maintainers of that application to
either use a rendering engine which can interpret the GPOS table of TTF
fonts, or hack the support in by themselves, or (the best solution)
forward the request to the upstream maintainer of the rendering engine
they use...
*nod* these are all good suggestions.

Rich
Jan Willem Stumpel
2006-12-11 16:06:23 UTC
Arne Götje (高盛華) wrote (in another thread, replying to Rich):
In this case, *you* need to *define* which combinations you
need and *how* they should be displayed.
But please, if you do so, provide the necessary information
(which combinations are possible and how they should display)
to the font maintainer...
Do you really mean that in order to make fonts suitable for
rendering strings containing "combining accents", all possible
combinations have to be known *in advance*? This seems a tall order.

Are you absolutely sure this is correct? Apart from the difficulty
of doing this, in my view this would defeat the whole purpose of
having "combining accents" (which should be usable when combined
with just about any character, in order to create "new" characters
which Unicode does not specify).

I am beginning to think that the responsibility for correct
"combining accents" behaviour rests primarily with the rendering
engine, rather than with the fonts. The fonts must, of course,
include the combining accents, otherwise the accents will be
borrowed from other fonts; but I doubt that they really need
anchors or GPOS.

E.g. say I am a rendering engine; I see a character which, from
its Unicode range, is either

-- a "top" accent
-- a "bottom" accent
[-- a left accent if such things exist, a right accent, etc.,]

Then I can place it to the top, bottom, etc., of the previous
character; based on the "top", "bottom", etc., coordinates of the
previous base character (which I know, or at least can calculate).
So I do not need anchors!

After, e.g., placing a "top" accent on a base character, I could
increment the "top" coordinate by a certain amount, so a following
"stacked" character can also be placed correctly (but it seems not
even pango does this).
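Jan's proposal can be sketched as follows. Instead of code ranges it uses each mark's canonical combining class (230 = above, 220 = below), which Unicode already records. The metrics, mark height and gap are hypothetical, bases are reported by their left edge, and marks are assumed to be drawn centred on the returned x coordinate. Nothing here knows where a mark *should* sit on a particular glyph, which is the weak point of any anchor-less scheme.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical base-glyph metrics in font units: (advance, top, bottom).
BASE_BOX = {"a": (500, 520, 0), "o": (560, 520, 0)}
MARK_HEIGHT = 150  # assumed height of every mark glyph
GAP = 30           # assumed clearance between stacked elements


def place(text: str) -> List[Tuple[str, int, int]]:
    """Anchor-less placement: class-230 marks go above the current cluster,
    class-220 marks below; each placed mark pushes that edge outward so a
    following mark stacks instead of overlapping. Returns (char, x, y)."""
    out: List[Tuple[str, int, int]] = []
    pen = 0
    centre = top = bottom = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc == 0:                      # base character
            advance, top, bottom = BASE_BOX[ch]
            centre = pen + advance // 2
            out.append((ch, pen, 0))
            pen += advance
        elif ccc == 230:                  # mark above
            y = top + GAP
            out.append((ch, centre, y))
            top = y + MARK_HEIGHT         # raise the "top" for stacking
        elif ccc == 220:                  # mark below
            y = bottom - GAP - MARK_HEIGHT
            out.append((ch, centre, y))
            bottom = y                    # lower the "bottom" for stacking
        else:
            raise ValueError(f"combining class {ccc} not handled in this sketch")
    return out
```

For a + U+0302 + U+0301 this stacks the acute above the circumflex, and the "top" bookkeeping is exactly the increment step described above.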

Couldn't this work? Perhaps it really works like that in practice
(I also hope to see some comment by the "pango guys"!) It would at
least explain some of the puzzling "luck" we now see when trying
to display combining accents using anchor-less fonts.

Regards, Jan
Andries Brouwer
2006-12-11 16:49:22 UTC
Post by Jan Willem Stumpel
I am beginning to think that the responsibility for correct
"combining accents" behaviour rests primarily with the rendering
engine, rather than with the fonts. The fonts must, of course,
include the combining accents, otherwise the accents will be
borrowed from other fonts; but I doubt that they really need
anchors or GPOS.
E.g. say I am a rendering engine; I see a character which, from
its Unicode range, is either
-- a "top" accent
-- a "bottom" accent
[-- a left accent if such things exist, a right accent, etc.,]
In Hebrew, a dagesh is a dot centered in the glyph to double
the consonant or change the pronunciation.

The precise place where it should go must be indicated by the font.
If one just centers a dot in the same area, it may well be
(and in practice, in my Java experiments, is) invisible
because it overlaps part of the glyph.

Andries
Rich Felker
2006-12-11 18:13:25 UTC
Post by Andries Brouwer
Post by Jan Willem Stumpel
I am beginning to think that the responsibility for correct
"combining accents" behaviour rests primarily with the rendering
engine, rather than with the fonts. The fonts must, of course,
include the combining accents, otherwise the accents will be
borrowed from other fonts; but I doubt that they really need
anchors or GPOS.
E.g. say I am a rendering engine; I see a character which, from
its Unicode range, is either
-- a "top" accent
-- a "bottom" accent
[-- a left accent if such things exist, a right accent, etc.,]
In Hebrew, a dagesh is a dot centered in the glyph to double
the consonant or change the pronunciation.
The precise place where it should go must be indicated by the font.
If one just centers a dot in the same area, it may well be
(and in practice, in my Java experiments, is) invisible
because it overlaps part of the glyph.
Then in principle you just need a 'center point' anchor for Hebrew
consonants. The point is that rendering combining marks should require
roughly O(nk) information (where n is the number of characters and k
is a small number of classes) as opposed to O(nm) or even O(nm^j)
(where m is the number of combining characters and j is the maximum
combining stack length).
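To make the size comparison concrete, a few lines of Python with made-up sizes (100 base characters, 20 combining marks, 3 mark classes, stacks of at most 2 marks). The numbers are purely illustrative and not measured from any font.

```python
from typing import Dict


def data_size(n: int, m: int, k: int, j: int) -> Dict[str, int]:
    """Positioning entries needed under three schemes (illustrative model)."""
    return {
        "O(nk): anchors per base and mark class": n * k,
        "O(nm): one entry per base/mark pair": n * m,
        "O(nm^j): one entry per base/mark stack": n * m ** j,
    }


if __name__ == "__main__":
    # Hypothetical sizes: 100 bases, 20 marks, 3 mark classes, stacks of 2.
    for scheme, count in data_size(100, 20, 3, 2).items():
        print(f"{count:>6}  {scheme}")
```

With these numbers the class-based scheme needs 300 entries, the pairwise one 2000, and the per-stack one 40000.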

Whether it's possible to support all combinations efficiently, I don't
know. The OpenType system is very poorly designed from what I can
tell. In the Tibetan fonts I've examined, rather than just saying
"character U+0F62 needs to use an alternate glyph when followed by any
of {list here} combining characters", there are individual ligature
combination tables for each pairing. Whether this is just lack of
understanding on the font designer's part or fundamental limitations
of OpenType, I'm not sure.

On the other hand, I've successfully implemented an O(nk) system with
UCF/uuterm, so I know it's possible. From what I've read, Apple's AAT
tables also sound like they're O(nk) and don't suffer from the
horrible "leave it to the rendering engine to decide what to do, and
decide incorrectly" syndrome of OpenType/Uniscribe/pango/etc.

Rich
Werner LEMBERG
2006-12-11 22:19:06 UTC
The OpenType system is very poorly designed from what I can tell. In
the Tibetan fonts I've examined, rather than just saying "character
U+0F62 needs to use an alternate glyph when followed by any of {list
here} combining characters", there are individual ligature
combination tables for each pairing. Whether this is just lack of
understanding on the font designer's part or fundamental limitations
of OpenType, I'm not sure.
Mhmm, perhaps the font designer decided to use the simplest solution
for him, not the generic one...

Anyway, your Tibetan example can be implemented as you suggest within
the GSUB table: Just use a Lookup Type 5 (Contextual Substitution
Subtable), Format 2, which is based on glyph classes. If this doesn't
suffice, use Lookup Type 6 (Chaining Contextual Substitution Subtable)
to have the possibility to look back and look ahead also. You might
also have more than a single Lookup for a given `feature' (which are
executed sequentially on the whole input string) so that you can
construct even more complicated patterns.
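The class-based approach can be modelled in a few lines: one rule written over glyph classes replaces per-pair tables. This is a simplified model of what a class-based contextual substitution (GSUB Lookup Type 5, Format 2) expresses, not its binary layout. The glyph names, class numbers and the `ra.short` alternate are hypothetical; as in OpenType class definitions, glyphs not listed fall into class 0.

```python
from typing import Dict, List, Tuple

# Hypothetical glyph names; glyphs absent from the class table are class 0.
GLYPH_CLASS: Dict[str, int] = {"ra": 1, "ka.sub": 2, "ga.sub": 2, "ta.sub": 2}
# One rule per class sequence: (input classes, index of glyph to substitute).
RULES: List[Tuple[List[int], int]] = [([1, 2], 0)]
# The single substitution the rule invokes on the matched glyph.
ALTERNATE: Dict[str, str] = {"ra": "ra.short"}


def apply_contextual(glyphs: List[str]) -> List[str]:
    """Scan left to right; where a glyph-class sequence from RULES matches,
    substitute the target glyph and continue after the matched input."""
    out = list(glyphs)
    i = 0
    while i < len(out):
        step = 1
        for classes, target in RULES:
            window = out[i:i + len(classes)]
            if len(window) == len(classes) and all(
                GLYPH_CLASS.get(g, 0) == c for g, c in zip(window, classes)
            ):
                out[i + target] = ALTERNATE.get(out[i + target], out[i + target])
                step = len(classes)
                break
        i += step
    return out
```

Adding another subjoined letter to the trigger set only means adding one entry to `GLYPH_CLASS`; the rule itself does not change.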


Werner
Christopher Fynn
2006-12-12 14:56:06 UTC
Post by Rich Felker
Whether it's possible to support all combinations efficiently, I don't
know. The OpenType system is very poorly designed from what I can
tell. In the Tibetan fonts I've examined, rather than just saying
"character U+0F62 needs to use an alternate glyph when followed by any
of {list here} combining characters", there are individual ligature
combination tables for each pairing. Whether this is just lack of
understanding on the font designer's part or fundamental limitations
of OpenType, I'm not sure.
Although you can build Tibetan stacks using contextual substitutions
I've found through trial and error that it is generally much more
efficient to have pre-composed consonant stacks and simple
(non-contextual) GSUB lookups. You will probably still need some
contextual lookups for vowel marks and for a few variant forms of stacks
- especially in cursive style Tibetan - but having a lot of contextual
substitution lookups in a Tibetan font seems to slow everything to a
crawl especially with long documents.
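The precomposed-stack approach relies on plain ligature substitution (GSUB Lookup Type 4), which needs no context at all: each input sequence maps directly to one glyph. A minimal sketch with hypothetical glyph names; it tries the longest match first at each position.

```python
from typing import Dict, List, Tuple

# Hypothetical ligature table: input glyph sequence -> precomposed stack glyph.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("ra", "ka.sub"): "rka",
    ("ra", "ka.sub", "ya.sub"): "rkya",
    ("sa", "ta.sub"): "sta",
}
MAX_LEN = max(len(k) for k in LIGATURES)


def apply_ligatures(glyphs: List[str]) -> List[str]:
    """Non-contextual ligature substitution: at each position replace the
    longest matching sequence with its single precomposed glyph."""
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        for length in range(min(MAX_LEN, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + length])
            if key in LIGATURES:
                out.append(LIGATURES[key])
                i += length
                break
        else:  # no ligature starts here: keep the glyph unchanged
            out.append(glyphs[i])
            i += 1
    return out
```

Each lookup is a plain table hit, which is cheap; the cost moves into the font, which must contain a glyph for every stack.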

Since existing elements in a Tibetan stack usually need to get smaller
as additional elements are added to the stack, building stacks at run
time (rather than having pre-composed ligatures) inevitably involves a
lot of complex contextual substitution. IMO these are best used
sparingly with existing OT rendering engines.

A downside of this is that a font needs to contain a lot of glyphs for
comprehensive support of Tibetan script.

best regards

- Chris
Rich Felker
2006-12-12 16:29:08 UTC
Post by Christopher Fynn
Post by Rich Felker
Whether it's possible to support all combinations efficiently, I don't
know. The OpenType system is very poorly designed from what I can
tell. In the Tibetan fonts I've examined, rather than just saying
"character U+0F62 needs to use an alternate glyph when followed by any
of {list here} combining characters", there are individual ligature
combination tables for each pairing. Whether this is just lack of
understanding on the font designer's part or fundamental limitations
of OpenType, I'm not sure.
Although you can build Tibetan stacks using contextual substitutions
I've found through trial and error that it is generally much more
efficient to have pre-composed consonant stacks and simple
(non-contextual) GSUB lookups. You will probably still need some
contextual lookups for vowel marks and for a few variant forms of stacks
- especially in cursive style Tibetan - but having a lot of contextual
substitution lookups in a Tibetan font seems to slow everything to a
crawl especially with long documents.
OK, so basically it's a workaround for poor OpenType implementations.
Got it. Thanks for the explanation.

BTW, I heard there was a mailing list specifically for Tibetan
font/script issues. Is that still active, and if so, how can I
subscribe?

Rich
Christopher Fynn
2006-12-13 13:46:56 UTC
Post by Rich Felker
OK, so basically it's a workaround for poor OpenType implementations.
Got it. Thanks for the explanation.
Well I don't know whether or not the implementations are poor but if you
use a lot of reverse chaining contextual lookups for Tibetan isn't there
bound to be a lot more processing overhead? When you can get pages and
pages of Tibetan text without a paragraph break, entering a single
character in a block of Tibetan text may mean the whole thing has to be
re-rendered. A good implementation is going to do things much more
efficiently than a bad one - but in either case, the more complex the
lookups, the more processing overhead there is going to be.

In Microsoft's earlier implementations of Tibetan shaping, positioning
lookups were *very*, *very* slow - in experiments with large documents
characters sometimes didn't appear on screen for a second or two after
you typed them - so I always tried to avoid these lookups. In that
regard things seem to have improved considerably in latest versions of
Uniscribe. I also understand that every lookup in a font used to result
in a separate call to their shaping engine - now I think, for Tibetan,
everything is applied in one pass.

It seems Pango currently doesn't apply all the OT features required for
Tibetan script in its implementation ("ccmp" & "kern" are missing) -
but a bunch of OT features not used in any Tibetan font ("pref", "pres",
"blwf", "abvf" & "pstf") are present.

It would probably also be worthwhile checking if all the types of
lookups that should be supported under blwm & abvm (GPOS lookups type 4
& 5) are working properly in Pango. For the above stated reasons I've
tended to avoid these GPOS lookups in Tibetan fonts so wouldn't know
whether or not they are working properly in Pango.
Post by Rich Felker
BTW, I heard there was a mailing list specifically for Tibetan
font/script issues. Is that still active, and if so, how can I
subscribe?
There are two:

Tibex at Unicode dot org - mostly for Tibetan character encoding issues
not font issues. You can subscribe to that list by sending an email
to ecartis at unicode dot org with subscribe Tibex in the body of the mail.

Tibetscript is a list for Tibetan script issues in general. Again not
font specific. See:
<http://list.mail.virginia.edu/mailman/listinfo/TibetScript>

For OpenType font issues related to Tibetan script you might have better
luck on the OpenType list. People involved with writing all the main OT
layout engines, and many people making fonts for complex scripts are
subscribed to that list. Microsoft's VOLT list
MicrosoftVOLTuserscommunity at groups dot msn dot com can also be useful.

- Chris

Christopher Fynn
2006-12-12 16:05:34 UTC
Post by Jan Willem Stumpel
After, e.g., placing a "top" accent on a base character, I could
increment the "top" coordinate by a certain amount, so a following
"stacked" character can also be placed correctly (but it seems not
even pango does this).
You can do this with mark-to-base and mark-to-mark GPOS lookups
under the appropriate OpenType feature for the script concerned.
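In OpenType terms these are mark-to-base (GPOS Lookup Type 4) and mark-to-mark (GPOS Lookup Type 6) attachment. A minimal sketch of how the two combine for a stacked pair, modelled on Vietnamese ấ (a + U+0302 + U+0301): the first mark attaches to the base, the second to an extra anchor on the first mark. All glyph names and coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]

# Hypothetical anchors in font units (not from a real font).
BASE_TOP: Dict[str, Point] = {"a": (250, 520)}               # mark-to-base anchor on the base
MARK_TOP: Dict[str, Point] = {"circumflexcomb": (0, 500),    # where each mark attaches
                              "acutecomb": (0, 500)}
MARK2_TOP: Dict[str, Point] = {"circumflexcomb": (60, 700)}  # where a further mark may attach


def position_stack(base: str, marks: List[str]) -> Optional[List[Point]]:
    """Mark-to-base for marks[0], then mark-to-mark for each later mark.
    Returns every mark's origin relative to a base drawn at (0, 0), or
    None when a needed anchor is missing."""
    if base not in BASE_TOP:
        return None
    ax, ay = BASE_TOP[base]               # current attachment point
    origins: List[Point] = []
    for i, mark in enumerate(marks):
        mx, my = MARK_TOP[mark]
        ox, oy = ax - mx, ay - my         # align the mark's anchor with it
        origins.append((ox, oy))
        if i + 1 < len(marks):
            if mark not in MARK2_TOP:
                return None               # font cannot stack anything on this mark
            tx, ty = MARK2_TOP[mark]
            ax, ay = ox + tx, oy + ty     # next mark attaches to this one
    return origins
```

The horizontal offset in the circumflex's second anchor is what lets a font put the Vietnamese acute off to the side, something a simple "raise the top" rule cannot express.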

- Chris
Arne Götje (高盛華)
2006-12-11 01:49:04 UTC
Post by Jan Willem Stumpel
The fonts need to contain a) the base glyphs b) the combining
accents c) "anchors" in the GPOS table to tell the rendering
engine where exactly to place the accents. In fontforge this is
done easily. :)
I am sure I detect some irony here.. Anyway it is true that in all
cases when a), b), c) are fulfilled (but there are not many of
those), combining accents work (at least for *single* accents).
And they work (more or less by luck) also in some other cases.
I suppose for the Unicode "*multiple* combining accents" mechanism
to work, more complicated anchor classes ought to be defined
(allowing, e.g., "accent on top of accent"). But it seems no
actual font does this at the moment. In the case of the multiple
accents of Classical Greek, Openoffice seems to do OK. But I think
Openoffice uses another mechanism.
So the advice to users has to remain: use pre-combined accents
whenever possible. Don't count on the "combining accents"
mechanism to work. It won't, except from some lucky cases.
yes, it depends totally on the font to define the *position* of the
accents (and whether or not they can be stacked). But it depends on the
rendering engine to *interpret* the information the font gives about the
accents.

BTW: there was no irony in my statement. In Fontforge it is really easy
to define the "anchors". For all pre-composed Latin based combinations,
you can get it done with around 10 anchor classes, including the stacked
ones for Vietnamese.

The difficulty is to decide how the combinations should be displayed,
i.e. which "standard" to follow and which scripts to support.
You need to define the anchor points, both for the base glyphs and the
combining accents, separately and for each possible combination. This is
quite some work...

In my CJK-Unifont (http://www.cjkunifonts.info) project, I have used
"anchors" to display compositions, which are used in the Minnan language
(or rather its romanization) and which are not available as
pre-composed glyphs in Unicode.

Example: o <U+0301> <U+0358> -> ó͘
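Python's `unicodedata` confirms why this needs anchors: NFC can fold o + U+0301 into the precomposed ó (U+00F3), but nothing in Unicode absorbs U+0358, so the dot must be positioned by the font and the rendering engine.

```python
import unicodedata

seq = "o\u0301\u0358"   # o + COMBINING ACUTE ACCENT + COMBINING DOT ABOVE RIGHT
nfc = unicodedata.normalize("NFC", seq)

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in nfc])   # ['U+00F3', 'U+0358']
```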

However, I made no attempts yet to implement this for the already
existing pre-composed glyphs, due to lack of time and low priority for
my project.
Post by Jan Willem Stumpel
Thanks again for your explanation.
You are welcome. :)

Cheers
Arne
Jan Willem Stumpel
2006-12-11 09:19:07 UTC
[..] yes, it depends totally on the font to define the
*position* of the accents (and whether or not they can be
stacked). But it depends on the rendering engine to *interpret*
the information the font gives about the accents.

BTW: there was no irony in my statement. In Fontforge it is
really easy to define the "anchors". For all pre-composed Latin
based combinations, you can get it done with around 10 anchor
classes, including the stacked ones for Vietnamese.
Is it also easy to create a GPOS table for a font which does not
have one? My experience with Fontforge is very limited!

In the meantime I found that most fonts on my system do *not* have
this table (including the Bitstream Vera fonts and the MS "core"
fonts). It seems that including such a table is one of the things
that we must "badger" upstream font developers about.
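Checking for these tables does not require FontForge: a TrueType/OpenType file starts with a table directory listing every table tag. A minimal stdlib-only sketch; it does not handle TrueType collections, and Type 1 fonts such as Courier 10 Pitch have no such directory at all. The path in the usage note is hypothetical.

```python
import struct
from typing import Dict, Set


def table_tags(data: bytes) -> Set[str]:
    """Return the table tags in a single TrueType/OpenType font's table
    directory: a 12-byte header (numTables at offset 4) followed by
    16-byte records whose first 4 bytes are the tag."""
    if data[:4] == b"ttcf":
        raise ValueError("font collections are not handled in this sketch")
    (num_tables,) = struct.unpack(">H", data[4:6])
    tags = set()
    for i in range(num_tables):
        record = data[12 + 16 * i: 12 + 16 * (i + 1)]
        tags.add(record[:4].decode("latin-1"))
    return tags


def layout_tables(path: str) -> Dict[str, bool]:
    """Report which OpenType layout tables a font file contains."""
    with open(path, "rb") as f:
        tags = table_tags(f.read())
    return {t: t in tags for t in ("GDEF", "GSUB", "GPOS")}
```

Usage: `layout_tables("/path/to/font.ttf")` returns a dict such as `{"GDEF": False, "GSUB": True, "GPOS": False}` for a font with substitution but no positioning data.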
[..] Example: o <U+0301> <U+0358> -> ó͘
This example displays OK on my system with your uming.ttf font
(naturally), but also (by "luck") with, for instance, Bitstream
Vera Serif (which does not have the combining accent characters,
nor a GPOS table). I suppose the rendering engine (pango) borrows
the accents from another font (yours, probably). But how can it
know where to place them? The base characters in Bitstream Vera
Serif do not have anchors.

Regards, Jan
Arne Götje (高盛華)
2006-12-11 10:09:28 UTC
Post by Jan Willem Stumpel
[..] yes, it depends totally on the font to define the
*position* of the accents (and whether or not they can be
stacked). But it depends on the rendering engine to *interpret*
the information the font gives about the accents.
BTW: there was no irony in my statement. In Fontforge it is
really easy to define the "anchors". For all pre-composed Latin
based combinations, you can get it done with around 10 anchor
classes, including the stacked ones for Vietnamese.
Is it also easy to create a GPOS table for a font which does not
have one? My experience with Fontforge is very limited!
As soon as any GPOS feature ("anchors" is one of the features) is
present, the table is created automatically.
Post by Jan Willem Stumpel
In the meantime I found that most fonts on my system do *not* have
this table (including the Bitstream Vera fonts and the MS "core"
fonts). It seems that including such a table is one of the things
that we must "badger" upstream font developers about.
Exactly.
Post by Jan Willem Stumpel
[..] Example: o <U+0301> <U+0358> -> ó͘
This example displays OK on my system with your uming.ttf font
(naturally), but also (by "luck") with, for instance, Bitstream
Vera Serif (which does not have the combining accent characters,
nor a GPOS table). I suppose the rendering engine (pango) borrows
the accents from another font (yours, probably). But how can it
know where to place them? The base characters in Bitstream Vera
Serif do not have anchors.
This also puzzles me, but the glyphs do not come from my font; they look
different... I'd have to take a closer look at the fonts if I find some
time... I assume that if no GPOS information is present, pango
and other rendering engines classify the combining accents by Unicode
code range and just center them over the preceding (Latin) character...
(maybe one of the pango guys can answer this...?)

Cheers
Arne