Thursday, December 11, 2008

Mac/Linux/Windows file name friction

In 1993, Microsoft added long file name support to Windows NT 3.1, allowing more descriptive names than the limited 8.3 DOS format. Mac users scoffed: they had had long file names for nearly a decade, and Windows still stored a DOS file name in the background. Linux was born with long file names a couple of years before they showed up in Windows. Today, long file names are well supported by all three operating systems, though key differences remain.

Linux is the most sensitive

One of the first culture shocks for people moving from Windows to Linux is the case sensitivity of file names. The names "filename", "Filename", and "FileName" refer to the same file in Windows but to three distinct files in Linux.

The Mac OS X HFS+ and Windows NTFS file systems are case preserving, but not case sensitive. This means they store and keep track of the case in a file name, but ignore case when comparing file names. So, you can't have "filename" and "Filename" in the same directory. (Note: an optional file system called HFSX, available in OS X 10.3+, is case sensitive.)
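To see which names would clash when files move from Linux to a case-insensitive file system, a short Python sketch can group names that differ only by case. The function name find_case_collisions is made up for illustration, and str.casefold() is only an approximation of the comparison rules HFS+ and NTFS actually apply.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by case.

    Assumption: str.casefold() stands in for the file system's own
    case-insensitive comparison, which may differ for some characters.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one spelling maps to the same key.
    return [group for group in groups.values() if len(group) > 1]


print(find_case_collisions(["filename", "Filename", "FileName", "notes.txt"]))
# [['filename', 'Filename', 'FileName']]
```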

Linux and OS X files have more character(s)

Linux file names can be up to 255 characters long and can be made up of any characters except forward slash (/) and NUL. Using special characters like those recognized by the shell takes extra effort. You can trick the shell by either enclosing the name in single quotes ('icky?filename') or escaping the special character with a backslash (icky\?filename). You can also use non-printable characters in a file name, but I can't think of a good reason to do so. To avoid confusion, most people stick to alphanumerics, periods, spaces, underscores, and hyphens.
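If you build shell commands from a script, you don't have to do the quoting by hand. As a small sketch, Python's standard shlex.quote() wraps a name in single quotes whenever it contains characters the shell would treat specially, and leaves plain names alone. The sample names are just examples.

```python
import shlex

# Names with shell-special characters get single quotes; safe names pass through.
for name in ["icky?filename", "plain-name.txt", "two words.txt"]:
    print(shlex.quote(name))
# 'icky?filename'
# plain-name.txt
# 'two words.txt'
```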

OS X supports up to 255 characters and can use the same characters as Linux, except for a colon (:). However, the Finder may have trouble with bizarre file names that can be created in the shell.

Windows file names can be up to 255 characters, but the full path counts against a limit of about the same size (260 characters in the traditional Windows API). A lot of characters are wasted if the default storage location is used: "C:\Documents and Settings\USER\My Documents". Windows does not allow names to contain any of these characters:
* | \ / : " < > ?

Avoid using these characters for maximum portability:
Asterisk *
Colon :
Backslash \
Forward slash /
Less than <
Greater than >
Pipe |
Quote "
Question mark ?
All non-printable characters

Portability

Even though you can use odd characters in a Linux or OS X file name, it's not a great idea. Using sensible names allows a smooth exchange of files with other operating systems where an odd character may be invalid. For maximum portability, avoid using characters that are illegal in any of the operating systems.
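A quick way to apply this advice is to scan names for the characters in the avoid list above. The following Python sketch does that; the names FORBIDDEN and portability_problems are my own, and str.isprintable() is used as a rough stand-in for "non-printable", so treat it as a starting point rather than a complete validator.

```python
# Characters from the avoid list: * : \ / < > | " ?
FORBIDDEN = set('*:\\/<>|"?')


def portability_problems(name):
    """Return a sorted list of the characters in name that hurt portability."""
    # Flag listed characters plus anything Python considers non-printable.
    bad = {ch for ch in name if ch in FORBIDDEN or not ch.isprintable()}
    return sorted(bad)


print(portability_problems("report: draft?.txt"))           # [':', '?']
print(portability_problems("this-is-a-long-filename.txt"))  # []
```

A space counts as printable here, so names with spaces pass the check even though they take extra care in the shell.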

The lowercase-hyphen rule

My preference is to always use lowercase letters and to replace spaces with hyphens. For example, I would write "this-is-a-long-filename.txt" instead of "This is a long filename.txt".
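The rule is simple enough to automate. Here is a minimal Python sketch, assuming that runs of whitespace should collapse to a single hyphen; the function name lowercase_hyphen is just for illustration, and it only computes the new name without renaming anything on disk.

```python
import re


def lowercase_hyphen(name):
    """Lowercase a file name and replace each run of whitespace with a hyphen."""
    return re.sub(r"\s+", "-", name.strip().lower())


print(lowercase_hyphen("This is a long filename.txt"))
# this-is-a-long-filename.txt
```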

I'm not sure where or when I picked up the lowercase habit, but it has been with me for some time. Since I spend a lot of quality time with web files, it relieves me of thinking about the case of file names. I like hyphens over underscores because underscores don't always show up clearly in anchor tag hyperlinks. Another web bias. Even though it is not required in Linux or OS X, I add a file type extension (three or four characters), mostly for my own benefit. I use the same naming convention regardless of operating system.

I make two exceptions to the lowercase-hyphen rule. The first is audio files, since ripping software tends to default file names to the track title with mixed case and spaces. The other exception is when I exchange documents with someone who has already named a file. The tab completion feature of Bash takes the edge off working with whatever is thrown at me.

Taming file names

Modern operating systems provide a lot of flexibility in naming files. I probably expend too many brain cycles thinking about file names, but I suspect most people have a preference.