Character Encoding in file names and folder names
There are some known limitations (discussed in many messages on the unison-users mailing list) in the way that Unison handles non-ASCII characters in the names of files and folders. So if you use diacritics / accent marks, umlauts or anything more exotic in your file and folder names, you'll have to take special care. ("Exotic" here is relative to a 7-bit ASCII environment ... you get the point!)
On 6/1/06, Benjamin Pierce <bcpierce@cis.upenn.edu> wrote:
[snip]
> Unfortunately, I'm less sanguine about the ease of fixing things.
> There are lots of ways in which Unison doesn't deal well with Unicode
> and other character encoding issues -- basically, Unison itself just
> ignores all such issues and takes whatever it gets from the
> lower-level OCaml / Posix filesystem libraries, string libraries, etc.
> Doing all of this right would be very valuable, but at the moment no
> one is signed up to do it. (Volunteers welcome, of course! :-)
The best way to avoid problems is to avoid non-ASCII characters in file and folder names altogether.
Known status
NTFS Windows - Unison 2.17.1 local usage - SMB network share:
- Local use of Unison has been seen to work fine with extended ASCII characters (ISO 8859-1).
- It doesn't work with Unicode characters.
NTFS Windows - Unison 2.17.1 through ssh - Unison on Mac OS 10.4 x86:
- With this setup only 7-bit ASCII characters work fine.
- Extended ASCII or Unicode characters appear to turn into some sort of combined characters on the other platform, which causes Unison to fail.
Help on renaming files and folders
There are probably many ways to use a small script to walk through your folders and help you work around this Unison limitation. One script could log suspicious names to a text file; another could automatically rename them so that only 7-bit ASCII characters are used. If you have any pointers, please add links to this document.
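As a starting point, here is a minimal sketch of such a script in Python. It is not part of Unison; the function names (is_ascii, ascii_suggestion, find_suspicious) are made up for this example. It only lists names that contain non-ASCII characters, together with a suggested 7-bit ASCII replacement, and renames nothing. The suggestion rule (drop accents, replace everything else with an underscore) is just one possible choice and can produce name collisions, so review the output before renaming anything.

```python
import os
import sys
import unicodedata


def is_ascii(name):
    """Return True if every character in name is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in name)


def ascii_suggestion(name):
    """Suggest a 7-bit ASCII replacement for name (illustrative rule only).

    Accented letters are decomposed (NFKD) and their combining marks are
    dropped; any character that is still not ASCII becomes an underscore.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if ord(ch) < 128 else "_" for ch in stripped)


def find_suspicious(root):
    """Yield (path, suggested_name) for every entry under root whose name is not pure ASCII."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_ascii(name):
                yield os.path.join(dirpath, name), ascii_suggestion(name)


if __name__ == "__main__":
    # Usage: python find_non_ascii.py [folder]  (the script name is arbitrary)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, suggestion in find_suspicious(root):
        # ascii() escapes non-ASCII characters, so the log shows the exact
        # code points and printing cannot fail on odd terminal encodings.
        print(f"{ascii(path)}\t{suggestion}")
```

Redirecting the output to a text file gives a log of the names to look at before the next Unison run.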
Possible Workaround for syncing with Windows
Here is a guide on how to use Unison to synchronize files between a Linux computer and a Windows computer. The requirements are that the non-Windows computer uses a UTF-8 filesystem, and that the Windows computer is running Cygwin and the Cygwin version of Unison. http://jan.essert.name/2008/11/on-using-unison-to-synchronise-files-efficiently-between-windows-and-linux-machines/
You will need the Cygwin UTF8 patch from here: http://www.okisoft.co.jp/esc/utf8-cygwin/
However, this will NOT work for synchronizing with OS X, because OS X stores UTF-8 file names differently from Windows/Cygwin. The first sync will seem to work, with all files transferred successfully and with correct names on both systems. On the second run, however, Unison will report all previously transferred files whose names contain non-ASCII characters (like ÅÄÖ) as NEW and try to transfer them again, only to fail with "Failed: Destination updated during synchronization" (since the file is already on the other system). Some characters can be encoded in (at least) two different ways in UTF-8, for example as one precomposed character or as a base letter followed by a combining mark, and unfortunately OS X and Cygwin use different forms.
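The "two different ways" can be illustrated with a short Python sketch. It assumes that the difference in question is the one between the precomposed (NFC) and decomposed (NFD) Unicode normalization forms; this page does not name the forms, and OS X is generally described as storing names in a variant of the decomposed form. The helper same_name is hypothetical and only shows how a comparison could be made independent of the form; it is not something Unison does.

```python
import unicodedata

name = "\u00c5\u00c4\u00d6.txt"  # "ÅÄÖ.txt" written with precomposed characters

composed = unicodedata.normalize("NFC", name)    # one code point per accented letter
decomposed = unicodedata.normalize("NFD", name)  # base letter + combining mark


def same_name(a, b):
    """Compare two file names after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # False: the strings differ code point by code point
print(len(composed), len(decomposed))    # 7 10
print(composed.encode("utf-8"))          # UTF-8 bytes of the precomposed form
print(decomposed.encode("utf-8"))        # UTF-8 bytes of the decomposed form
print(same_name(composed, decomposed))   # True: equal once both are normalized
```

Both strings look identical on screen, yet a plain string comparison treats them as different names, which matches the symptom described above: the file appears to be new on every run.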
