Post by Abel Cheung
Post by Rich Felker
This is only an issue on character-cell devices which use wcwidth.
I'm exactly talking about those apps, like terminals.
Given how utterly abysmal current terminals' Unicode support is, this
seems like a relatively minor issue. I don't want to disparage concern
about getting it right, but rather investigate where we're at now and
what needs to be done. Along those lines, I recently evaluated some
terminals with the following results:
Konsole and Xfce terminal: no support for nonspacing characters;
unsure whether CJK wide characters are right.
Gnome Terminal: I assume it's the same since Xfce uses the same
widget. Please correct me if I'm mistaken since I didn't try it.
urxvt and xterm: CJK and nonspacing character widths are correct, but
rendering is minimal overstrike for nonspacing characters. No bidi or
complex script support. xterm's default of only 1 combining character
per cell is horribly deficient for any language that doesn't just use
precomposed characters anyway.
aterm/rxvt/Eterm/etc.: unmaintained; no UTF-8 support at all.
mlterm: CJK and nonspacing character widths are correct, bidi is
available (not sure how well it works) with correct Arabic shaping,
and Indic reordering/shaping is available but as a special case (not
sure how well it works either). Also, cursor position becomes
nonsensical (font-dependent too) with Indic shaping, making
screen-mode (my terminology, as opposed to line-mode) apps difficult
to use.
uuterm (experimental; by me): CJK and nonspacing character widths are
correct. Shaping/ligatures are supported and sufficient for all
scripts afaik, but using a nonstandard font system (ucf). Bidi and
reordering (for Indic vowel marks on left) are not available.
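For anyone wanting to reproduce the width checks above, here is a rough
sketch of the three width classes involved (0 for nonspacing, 2 for CJK
wide, 1 otherwise). It is only an approximation built on Python's
unicodedata tables, not the libc wcwidth() a terminal actually calls;
the names approx_width and approx_string_width are made up for this
example, and control characters are not treated specially.

```python
import unicodedata

def approx_width(ch: str) -> int:
    """Approximate column width of one code point (NOT libc wcwidth)."""
    # Nonspacing (Mn) and enclosing (Me) marks overlay the previous cell.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else is assumed to take one cell (a simplification).
    return 1

def approx_string_width(s: str) -> int:
    """Sum of the approximate widths of all code points in s."""
    return sum(approx_width(ch) for ch in s)

# Latin letter, combining acute accent, CJK ideograph.
for ch in ("a", "\u0301", "\u4e2d"):
    print(f"U+{ord(ch):04X} -> {approx_width(ch)}")
```

A terminal that gets these right still only has the cell layout correct;
how the nonspacing character is drawn in the cell is a separate matter.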
So as of now, here is the status of support for particular languages
I'm aware of:
European-script languages using precomposed forms only: any terminal
except legacy stuff lacking UTF-8 support should be fine.
European-script languages with multiple decomposed accents: uuterm is
probably the only one that works.
Languages of India: mlterm and some old, unmaintained Indic-specific
terminals (pre-Unicode I think) are the only ones that work.
CJK, Thai, Lao: urxvt, xterm, mlterm, and uuterm all work. uuterm is
the only one that supports decomposed Korean (Hangul Jamo) though.
Tibetan: uuterm is the only terminal that works correctly, but a
minimal degree of legibility can be obtained with an ugly tailored
font that does not require shaping, so that urxvt, xterm, and mlterm
are usable.
Burmese: not supported by anything.
Arabic and Hebrew: mlterm and perhaps some RTL-specific terminal
emulators I'm not aware of?
Mongolian: unknown; probably only mlterm and I'm unsure whether it
even works acceptably well.
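To see why a limit of one combining character per cell matters for
decomposed text, the sketch below counts how many nonspacing marks land
on each base character after NFD normalization. This is an illustration
only: cells and max_marks_per_cell are hypothetical helper names, and
treating every Mn/Me character as attaching to the preceding base is a
simplification of what a real terminal does.

```python
import unicodedata

def cells(s: str) -> list:
    """Split NFD-normalized text into [base, mark, mark, ...] groups."""
    groups = []
    for ch in unicodedata.normalize("NFD", s):
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if is_mark and groups:
            groups[-1].append(ch)   # attach mark to the previous base
        else:
            groups.append([ch])     # start a new cell
    return groups

def max_marks_per_cell(s: str) -> int:
    """Largest number of marks stacked on any single base character."""
    return max((len(g) - 1 for g in cells(s)), default=0)

# U+1EBF (e with circumflex and acute) decomposes to e + U+0302 + U+0301,
# so it needs two combining characters in one cell.
print(max_marks_per_cell("\u1ebf"))

# NFD of a precomposed Hangul syllable yields conjoining jamo.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\ud55c")])
```

Any string for which max_marks_per_cell returns more than 1 would exceed
a one-combining-character-per-cell limit.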
One additional issue I have not tested is support for characters
outside the BMP. I know GNU screen totally lacks support for these,
and I suspect many terminal emulators have the same problem.
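A quick way to produce test input for this is sketched below: it builds
a string containing one character outside the BMP and shows its UTF-8
bytes, which can then be written to the terminal under test. The choice
of U+1D11E and the helper name non_bmp are arbitrary picks for this
example, not anything a particular terminal requires.

```python
def non_bmp(s: str) -> list:
    """Return the characters in s that lie outside the BMP (> U+FFFF)."""
    return [ch for ch in s if ord(ch) > 0xFFFF]

# "A", U+1D11E MUSICAL SYMBOL G CLEF, "B"
sample = "A\U0001D11EB"
encoded = sample.encode("utf-8")

# A non-BMP character takes four bytes in UTF-8 and a surrogate pair
# (two 16-bit units) in UTF-16, which is where 16-bit-only code breaks.
print(encoded.hex(" "))
print([f"U+{ord(ch):04X}" for ch in non_bmp(sample)])
```

If the terminal or multiplexer stores cells as 16-bit values, the middle
character will typically come out truncated or replaced.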
~Rich