Rich> On Sat, Mar 17, 2007 at 07:05:01AM +0000, Colin Paul Adams wrote:
I can't find this in the GNOME help, so I thought I'd try
asking here.
I want to rename a file so it has an a-umlaut (lower case)
in the name.
My LANG is en_GB.UTF-8.
I don't know how to type the accented character.
Rich> One sure way is to copy-and-paste it from a file already
Rich> containing the character. I keep around a copy of
Rich> UnicodeData.txt with the literal UTF-8 character added to
Rich> each line for exactly this purpose.
Rich> Another method that might work is the ISO 14755 entry
Rich> method, holding control and shift and typing the character
Rich> number in hex. Not sure if GNOME terminal supports this.
Trial says it does.
Thank you.
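
For the hex entry method, the number to type is the character's Unicode code point. A small sketch of how one might look it up (the variable names are only illustrative):

```python
import unicodedata

# Lower-case a-umlaut, written as an escape so the snippet is encoding-proof.
character = "\u00e4"

# The code point in hex is what the ISO 14755 style entry method expects.
code_point_hex = f"{ord(character):04X}"
character_name = unicodedata.name(character)

print(code_point_hex, character_name)
```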
Now my real problem is somewhat more interesting, and relevant to this
list, I think:
I am the author of an XSLT 2.0 interpreter, and a member of the W3C
XSLT WG. As such, I have access to the XSLT 2.0 test suite
(unfortunately not publicly distributed now).
One of the tests involves evaluation of the following expression:
document('xgespr%C3%A4ch.xml')
According to the rules of the language, the argument to document() is
of type xs:anyURI. The percent-encoding must be interpreted as a UTF-8
byte sequence representing the Unicode characters.
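
To illustrate that rule, here is a minimal Python sketch (not the interpreter's own code) that decodes the percent-escapes into bytes and then reads those bytes as UTF-8. It uses only the standard urllib.parse module; the variable names are made up for the example.

```python
from urllib.parse import unquote, unquote_to_bytes

uri_argument = "xgespr%C3%A4ch.xml"

# Step 1: percent-escapes become raw bytes (0xC3 0xA4 here).
raw_bytes = unquote_to_bytes(uri_argument)

# Step 2: those bytes are interpreted as UTF-8, giving U+00E4 (a-umlaut).
# errors="strict" makes an invalid sequence raise instead of being replaced.
file_name = unquote(uri_argument, encoding="utf-8", errors="strict")

print(raw_bytes)
print(file_name)
```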
Now this is where it gets interesting.
My URI resolver translates the file name (the URI is relative to a
base file: URI) into a UTF-8 byte sequence which gets passed to the
fopen call (the program is supposed to work on other O/Ses too, not
just Linux, but I'll worry about that later).
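
Roughly, the resolution step looks like the following Python sketch. It is only an approximation of what my resolver does: the base URI is a made-up example, and uri_to_path_bytes is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlsplit, unquote_to_bytes


def uri_to_path_bytes(base_uri: str, relative_uri: str) -> bytes:
    """Resolve relative_uri against a file: base and return the path as bytes."""
    absolute_uri = urljoin(base_uri, relative_uri)
    parts = urlsplit(absolute_uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file: URI: {absolute_uri}")
    # Percent-escapes are turned back into the raw (UTF-8) bytes; this byte
    # string is what would be handed to fopen() in C or open() in Python.
    return unquote_to_bytes(parts.path)


# Hypothetical base URI, for illustration only.
BASE_URI = "file:///home/user/tests/base.xsl"
path_bytes = uri_to_path_bytes(BASE_URI, "xgespr%C3%A4ch.xml")
print(path_bytes)
```

On Linux the kernel compares file names byte for byte, so the open succeeds only if the name on disk consists of exactly these bytes.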
The test suite is currently distributed as a zip file. It so happens
that the file concerned is named using ISO-8859-1 on the distributor's
system. On my system, doing ls from the GNOME console shows the name
as xgespr?ch.xml, whereas Emacs dired shows the name as
xgespräch.xml.
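
The mismatch can be seen by comparing the two encodings of the same name. This is a small Python sketch under the assumption that the zip entry really is plain ISO-8859-1; is_valid_utf8 is a helper invented for the example.

```python
name = "xgespr\u00e4ch.xml"

utf8_bytes = name.encode("utf-8")         # what the URI resolver asks for
latin1_bytes = name.encode("iso-8859-1")  # what the zip file put on disk


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes)    # a-umlaut is the two bytes C3 A4
print(latin1_bytes)  # a-umlaut is the single byte E4
print(is_valid_utf8(latin1_bytes))
```

In a UTF-8 locale the lone E4 byte is not a valid sequence, which would explain why ls prints a question mark, while Emacs presumably guesses Latin-1 and shows the umlaut.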
I'm not sure exactly how fopen is supposed to handle the situation.
Anyway, the test failed - not surprisingly.
I looked at the unzip man page, to see if there was any filename
translation option. I couldn't find one.
So I tried unzipping the distribution afresh, but this time with
LANG=en_GB.
Emacs still showed the same name; ls, however, showed a completely
different character (it looked like it might be Arabic to me - I don't
know).
The test still failed.
So I went back to LANG=en_GB.UTF-8, unzipped the distribution again,
and re-named the file, thanks to your help.
ls now shows the correct file name. Emacs shows
xgespräch.xml. And the test works.
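
The manual rename could in principle be automated after unzipping. The following Python sketch assumes that any name which is not valid UTF-8 is ISO-8859-1, which holds for this distribution but not in general; both function names are hypothetical.

```python
import os


def to_utf8_name(raw_name: bytes, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Return raw_name unchanged if it is valid UTF-8, else re-encode it."""
    try:
        raw_name.decode("utf-8")
        return raw_name
    except UnicodeDecodeError:
        # Assumption: every non-UTF-8 name is in legacy_encoding.
        return raw_name.decode(legacy_encoding).encode("utf-8")


def rename_legacy_names(directory: bytes) -> None:
    """Rename every entry of directory whose name is not valid UTF-8."""
    # Passing bytes to os.listdir makes it return the raw names as bytes.
    for raw_name in os.listdir(directory):
        fixed_name = to_utf8_name(raw_name)
        if fixed_name != raw_name:
            os.rename(os.path.join(directory, raw_name),
                      os.path.join(directory, fixed_name))


print(to_utf8_name(b"xgespr\xe4ch.xml"))
```

Names that are already UTF-8 pass through untouched, so running it twice is harmless.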
Has anyone any illuminating comments to make? I'm particularly
interested in the distribution problem.
--
Colin Adams
Preston Lancashire