Rich Felker
2007-02-23 20:35:48 UTC
These days we have at least xterm, urxvt, mlterm, gnome-terminal, and
konsole, which support UTF-8 fairly well, but on the flip side there's
still a huge number of terminal emulators that do not respect the
user's encoding at all and always behave in a legacy 8-bit-codepage
way.
Trying to help users in #irssi, etc. with charset issues, I've come to
believe that it's a fairly significant problem: users get frustrated
with UTF-8 because the terminal emulator they want to use (which might
be chosen based on anti-bloat sentiment or, quite the opposite, on a
desire for specialized eye candy only available in one or two
programs) forces their system into a mixed-encoding scenario where
they have both UTF-8 and non-UTF-8 data in the filesystem and text
files.
How hard would it be to go through the available terminal emulators,
evaluate which ones lack UTF-8 support, and provide at least minimal
fixes? In particular, are there any volunteers?
What I'm thinking of as a minimal fix is just putting UTF-8 conversion
into the input and output layers. It would still be fine for most
users of these apps if the terminal were limited to a 256-character
subset of UCS and didn't support combining characters, CJK, etc., as
long as the data sent and received over the PTY device is valid UTF-8.
That satisfies the (valid and correct) assumption of applications
running on the terminal that characters are encoded in the locale's
encoding.
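
To make the minimal fix concrete, here is a rough sketch in Python of
what the two conversion layers might look like. Everything named here
(Utf8Shim, from_pty, to_pty, LEGACY_CODEPAGE) is hypothetical and not
taken from any existing terminal emulator, and ISO-8859-1 is only an
assumed stand-in for whatever 8-bit codepage the terminal really uses.
Characters outside that codepage are simply shown as "?".

```python
import codecs

# Assumption: the terminal's internal 8-bit codepage. Illustrative only.
LEGACY_CODEPAGE = "iso-8859-1"


class Utf8Shim:
    """Hypothetical conversion layer between a UTF-8 PTY and an 8-bit terminal."""

    def __init__(self, codepage=LEGACY_CODEPAGE):
        self.codepage = codepage
        # Incremental decoder: a multibyte sequence may be split across reads,
        # so incomplete trailing bytes are buffered until the rest arrives.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def from_pty(self, data):
        """Output layer: UTF-8 bytes from the application -> codepage bytes."""
        text = self._decoder.decode(data)
        # Anything the codepage cannot represent becomes "?".
        return text.encode(self.codepage, errors="replace")

    def to_pty(self, keys):
        """Input layer: codepage bytes from the keyboard -> UTF-8 bytes."""
        return keys.decode(self.codepage, errors="replace").encode("utf-8")
```

Note that this does nothing about character width: a double-width CJK
character is replaced by a single "?", so cursor positioning in
full-screen applications can still be off. It only guarantees that the
byte stream on the PTY is valid UTF-8.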
Perhaps this could be done via a "reverse luit" -- that is, a program
like luit, or an extension to luit, that assumes the physical terminal
is using an 8-bit legacy codepage rather than UTF-8. Then these
terminals could simply be patched to run it whenever the locale's
encoding is not single-byte.
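
As a rough illustration of the "reverse luit" idea, the following
Python sketch runs a command on a new PTY and converts in both
directions. It is not luit and does not use luit's code; it relies on
Python's standard pty.spawn, and the names reverse_luit,
make_child_reader and make_terminal_reader are made up for this
example. ISO-8859-1 is again just an assumed codepage.

```python
import codecs
import os
import pty

# Assumption: the physical terminal speaks this 8-bit codepage.
LEGACY_CODEPAGE = "iso-8859-1"


def make_child_reader(codepage=LEGACY_CODEPAGE):
    """Return a reader that turns the child's UTF-8 output into codepage bytes."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def read_child(fd):
        while True:
            data = os.read(fd, 1024)
            if not data:
                return data  # real EOF
            out = decoder.decode(data).encode(codepage, errors="replace")
            # pty.spawn treats an empty result as EOF, so if only part of a
            # multibyte sequence arrived, keep reading until there is output.
            if out:
                return out

    return read_child


def make_terminal_reader(codepage=LEGACY_CODEPAGE):
    """Return a reader that turns codepage keyboard input into UTF-8."""

    def read_terminal(fd):
        data = os.read(fd, 1024)
        return data.decode(codepage, errors="replace").encode("utf-8")

    return read_terminal


def reverse_luit(argv):
    """Run argv on a new PTY with UTF-8 on the child side (hypothetical helper)."""
    return pty.spawn(argv, make_child_reader(), make_terminal_reader())
```

A patched terminal would then start something like
reverse_luit(["/bin/sh"]) in place of the shell itself, with the
child's locale set to a UTF-8 one.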
Rich