SrinTuar
2008-06-13 15:32:23 UTC
Using some fairly recent operating systems, such as Fedora Core 8 and
Windows XP, I seem to have no way to move a bunch of files from one to the
other while preserving the nice Unicode filenames I have.
Specifically, the files (a few thousand of them) were created on the FC8
system.
Putting them together in a zip file works fine FC8 -> FC8, but fails
miserably when trying to unzip on Windows.
A bit of searching shows this:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
PKWARE has apparently declared a flag bit (bit 11 of the general purpose
bit flag) to mean that an entry's filename is UTF-8.
But at the same time, the developers of Info-ZIP say this:
http://www.info-zip.org/FAQ.html
Basically, that UTF-8 support is nowhere on their radar.
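
For anyone who wants to check whether a given archive actually carries that
flag, here is a minimal sketch using Python's standard zipfile module. It
assumes Python 3, where (as far as I understand it) zipfile stores non-ASCII
names as UTF-8 and sets the bit on its own. The helper names build_archive
and utf8_flags are made up for this example. It says nothing about whether a
particular unzip tool on the receiving side honors the flag.

```python
import io
import zipfile

# Bit 11 of the general purpose bit flag, per APPNOTE.TXT.
UTF8_FLAG = 0x800


def build_archive(names):
    """Return the bytes of a zip archive with one empty member per name.

    Hypothetical helper: it exists only to give the check below some input.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()


def utf8_flags(data):
    """Map each member name to True if its UTF-8 flag bit is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


flags = utf8_flags(build_archive(["plain.txt", "日本語.txt"]))
for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

To inspect a real archive, pass its path to zipfile.ZipFile directly and
look at flag_bits on each entry in the same way.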
Things work poorly in the opposite direction as well, for zip files created
on Windows:
sometimes I can guess the original encoding and reverse the damage, other
times I cannot; perhaps the software that made the archive has already
trashed the filenames.
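
The "guess the encoding and reverse the damage" step can be sketched roughly
as follows, again in Python. This assumes the damage is a plain mis-decode
(bytes in one encoding read as another) with no characters lost. The
function name redecode and the cp437/UTF-8 pairing are illustrative guesses,
not something any tool prescribes.

```python
def redecode(garbled, misread_as, actual):
    """Try to undo a filename that was decoded with the wrong encoding.

    Re-encodes the garbled text with the encoding it was wrongly read as,
    then decodes the resulting bytes with the encoding we guess it really
    was. Returns None when the guess does not round-trip.
    """
    try:
        return garbled.encode(misread_as).decode(actual)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Simulate the damage: UTF-8 bytes read as if they were cp437.
original = "naïve.txt"
garbled = original.encode("utf-8").decode("cp437")

repaired = redecode(garbled, misread_as="cp437", actual="utf-8")
print(repaired == original)
```

When the archiver has already replaced characters (with "?" or "_", say),
the original bytes are gone and no re-decoding will bring them back, which
matches the cases where the damage cannot be reversed.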
I've also given tarballs a shot for this task, but sadly Cygwin is
ASCII-only.
Because it works Linux to Linux, or at least Fedora to Fedora, and that is
really good enough for me, it's not a major issue. But I'm curious to know
if others have run into this cross-platform problem, and how they resolved
it for themselves. That is, if anyone still reads this list.
How do you go about making a basic archive containing non-ASCII filenames
that you can be confident will unpack well on most operating systems?