
UnisonFAQCharacterEncoding

Character Encoding in file names and folder names

  • 2010-01-19 Jérôme Vouillon’s page would seem to indicate that there are Windows binaries which support Unicode as of v2.37.4, with Unicode support enabled by default as of 2.39. I have not tested any of these versions and have no idea how this idea may interact with the information below. Perhaps someone with better knowledge could update this section. (bill+unisonATblunn,org)
  • 2010-07-27 I have been using Unison 2.40.16 and 2.40.1 for a while now to synchronise folders between a Windows (Vista) machine and a Linux (Ubuntu 10.04 Lucid) machine. This setup seems to support Unicode/UTF-8 filenames fine. So far it seems to be working OK. (bill+unisonATblunn,org)

There are some known limitations (discussed in many messages on the unison-users mailing list) in the way that Unison handles non-ASCII characters in the names of files and folders. If you use diacritics, accent marks, umlauts or anything more exotic in your file and folder names, you will have to take special care. ("Exotic" here is relative to a 7-bit ASCII environment ... you get the point!)


On 6/1/06, Benjamin Pierce <bcpierce@cis.upenn.edu> wrote:
[snip] 
> Unfortunately, I'm less sanguine about the ease of fixing things.
> There are lots of ways in which Unison doesn't deal well with Unicode
> and other character encoding issues -- basically, Unison itself just
> ignores all such issues and takes whatever it gets from the lower-
> level OCaml / Posix filesystem libraries, string libraries, etc.
> Doing all of this right would be very valuable, but at the moment no
> one is signed up to do it.  (Volunteers welcome, of course! :-)

The best way to avoid problems is to avoid non-ASCII characters in file and folder names altogether.

Known status

NTFS Windows - unison 2.17.1 local usage - SMB network share:

NTFS Windows - unison 2.17.1 through ssh - unison on Mac OS 10.4 x86

  • With this setup only the 7-bit ASCII characters work fine.
  • Extended ASCII or Unicode characters appear to turn into some sort of combined characters on the other platform, which causes Unison to fail.

Help on renaming files and folders

There are probably many ways to use a little script that walks through your folders and helps you work around this Unison limitation. One script could log "exotic" filenames to a text file. Another could automatically rename them so that only 7-bit ASCII characters are used. If you have any pointers, please put links in this document.
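The two script ideas above can be sketched in Python. This is only an illustration, not something Unison ships: the function names (is_ascii, ascii_fallback, find_non_ascii, report) are made up for this page, and the replacement policy (strip accents, turn everything else into "_") is an assumption you may want to change. The sketch only lists names and suggestions; it does not rename anything.

```python
import os
import unicodedata


def is_ascii(name):
    """Return True if every character in name is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in name)


def ascii_fallback(name):
    """Suggest an ASCII-only replacement for name (hypothetical policy)."""
    # NFKD splits accented letters into base letter + combining mark,
    # so dropping the marks turns "é" into "e".
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. "ß" or CJK) becomes an underscore.
    return "".join(ch if ord(ch) < 128 else "_" for ch in stripped)


def find_non_ascii(root):
    """Yield (directory, name, suggestion) for each non-ASCII entry under root."""
    # topdown=False visits children before their parent directory, which is
    # the safe order if a caller later renames the entries it is given.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if not is_ascii(name):
                yield dirpath, name, ascii_fallback(name)


def report(root):
    """Print one line per non-ASCII name; nothing is renamed."""
    for dirpath, name, suggestion in find_non_ascii(root):
        print(f"{os.path.join(dirpath, name)!r} -> {suggestion!r}")
```

Calling report(".") prints the offending paths next to a suggested name. Before turning the suggestions into real renames, check for collisions: two different names can map to the same ASCII fallback.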

Possible Workaround for syncing with Windows

  • 2010-01-19 Cygwin 1.7.1 (announced 2009-12-23) appears to support UTF-8 (Note: UTF-8 is spelled "UTF-8" NOT "UTF8"!!!) out of the box (needs LANG to be set appropriately; new installations seem to be gifted with LANG=C.UTF-8 by default and this seems to make UTF-8 work, for example on filenames). I have not used Unison under Cygwin and have no idea how this idea may interact with the information below. Perhaps someone with better knowledge could update this section. (bill+unisonATblunn,org)
  • You are right, just install the newest Cygwin with Unison (tested with version 2.27.157). It works natively with UTF-8 and filenames are "translated". No special DLL is needed any more. (cyber1000)
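To check which locale setting is actually in effect, something like the following Python sketch may help. It relies on the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG). The helper names are invented for this page, and the strict "UTF-8" spelling check simply follows the note above; some systems may accept other spellings, which this sketch does not try to decide.

```python
import os


def effective_ctype(env=os.environ):
    """Return the locale string that governs character handling."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return ""


def codeset_of(locale_name):
    """Extract the codeset from language_TERRITORY.codeset@modifier."""
    return locale_name.partition(".")[2].partition("@")[0]


def uses_utf8_spelling(locale_name):
    """True only for the exact spelling "UTF-8" (assumption: see note above)."""
    return codeset_of(locale_name) == "UTF-8"


if __name__ == "__main__":
    current = effective_ctype()
    print(f"effective locale: {current!r}")
    print(f"codeset spelled UTF-8: {uses_utf8_spelling(current)}")
```

With LANG=C.UTF-8 and no LC_ALL or LC_CTYPE set, the check reports True; with LANG=C it reports False.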

Here is a guide to using Unison to synchronize files between a Linux and a Windows computer. The requirements are that the non-Windows computer uses a UTF-8 filesystem, and that the Windows computer is running Cygwin and the Cygwin version of Unison. http://jan.essert.name/posts/2008/11/on-using-unison-to-synchronise-files-efficiently-between-windows-and-linux-machines/

With older Cygwin versions (see the notes above on Cygwin 1.7.1 and later) you will need the Cygwin UTF-8 patch from here: http://www.oki-osk.jp/esc/utf8-cygwin/

However, this will NOT work for synchronizing with OSX, because OSX stores UTF-8 filenames differently from Windows/Cygwin. The first sync will seem to work, with all files transferred successfully and with correct names on both systems. On the second run, however, Unison will report all previously transferred files whose names contain non-ASCII characters (like ÅÄÖ) as NEW and try to transfer them again, only to fail with "Failed: Destination updated during synchronization" (since the file is already on the other system). Some characters can be stored in (at least) two different ways in UTF-8, for example as a single precomposed character or as a base letter followed by a combining mark, and unfortunately OSX and Cygwin use different forms.
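The "two different ways" are Unicode normalization forms. A short Python sketch shows the effect with the letter Å. Here same_name is a hypothetical helper for this page, not part of Unison, and it is only the commonly reported behaviour that OSX file systems use the decomposed form while Windows/Cygwin keep the precomposed one.

```python
import unicodedata

composed = "\u00c5"      # "Å" as one precomposed code point (NFC form)
decomposed = "A\u030a"   # "A" followed by COMBINING RING ABOVE (NFD form)


def same_name(a, b):
    """Compare two file names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # False: different code points
print(composed.encode("utf-8"))          # b'\xc3\x85'
print(decomposed.encode("utf-8"))        # b'A\xcc\x8a'
print(same_name(composed, decomposed))   # True: same character to a reader
```

Both strings display as the same letter, but a byte-for-byte comparison treats them as two different names, which matches the "NEW file" reports described above.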


2011-01-06, christian.lehmann@uni-erfurt.de

As a newcomer to Unison (2.32.52), I find much of the above too abstract to be helpful. Please allow me to describe my case:

I want to synchronize a directory on an Ubuntu ext4 partition with a directory on a USB stick. The latter is formatted with FAT32 and used under both Windows (2000 and XP) and Ubuntu 10.10. Its entry in the fstab is currently as follows:

/dev/sdc1 /media/USB_16GB vfat uid=1000,nodiratime,gid=100,users,noexec,noauto,noatime,nodev,utf8 0 0

(The problem described below, however, is the same if there is no entry for the USB stick at all, or if I delete the utf8 option.)

I had used Unison before to copy the USB files to the ext4 partition. Thus, both directories contain the same set of identical files. Upon starting the synchronization, Unison now lists all files that contain special characters and proposes for each of them, in the second column, which is the one for the USB stick, "<- deleted" and then "<- new file".

Those files are many. 1) If there were a comfortable way of renaming all of them in a consistent fashion, I would do so. 2) Failing #1: Suppose I delete one of the identical copies: Would there be a way of copying the files again, this time getting the coding right, so that Unison no longer notices a difference? 3) Failing #2, or in addition, is there an option to be set in the fstab that would fix the problem?

Grateful for any help.

Page last modified on August 29, 2013, at 06:03 PM