Saturday, December 13, 2008

30 days with JFS

The Journaled File System (JFS) is a little-known file system that IBM open-sourced in 1999 and that has been available in the official kernel sources since 2002. It originated inside IBM as the standard file system on the AIX line of UNIX servers, and was later used on OS/2. IBM ships JFS2 with AIX today. Despite its pedigree, JFS has not received the publicity or widespread use of other Linux file systems like EXT2/3 and ReiserFS. To learn more about JFS, I installed it as my root file system.

note: originally published September 14, 2007 on linux.com

Features

I installed Slackware 12 on a laptop, choosing JFS as the file system during installation. I did no special partitioning and created a single JFS file system to hold everything. Installation was uneventful, and the system booted normally from GRUB. Not all distributions offer JFS as an install option, and some may not have JFS compiled into their default kernels. While Fedora and SUSE can use JFS, they both default to EXT3. Slackware, Debian, Ubuntu, and their derivatives are good choices if you want to try JFS.

One of the first things I noticed was the absence of a lost+found directory, a relic of lesser file systems.

JFS is a fully 64-bit file system. With the default block size of 4 KB, it supports a maximum file system size of 4 petabytes (less if smaller block sizes are used). The minimum file system size supported is 16 MB. The JFS transaction log has a default size of 0.4% of the aggregate size, rounded up to a megabyte boundary. The maximum size of the log is 32 MB. One interesting aspect of the layout on disk is the fsck working space, a small area allocated within the file system for keeping track of block allocation if there is not enough RAM to track a large file system at boot time.
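The following small Python sketch applies the log sizing rule exactly as stated above. It assumes that "megabyte" means 2^20 bytes and that the 0.4% figure is computed with integer arithmetic. jfs_mkfs may round differently, so treat this as an approximation, not the tool's own code.

```python
# Sketch of the default JFS log-size rule as described in the text:
# 0.4% of the aggregate, rounded up to a megabyte boundary, capped at 32 MB.
# Assumption: "megabyte" means MiB (2**20 bytes); jfs_mkfs may differ in detail.

MIB = 1024 * 1024
MAX_LOG_BYTES = 32 * MIB


def default_log_size(aggregate_bytes: int) -> int:
    """Return the default inline log size in bytes for an aggregate."""
    raw = aggregate_bytes * 4 // 1000       # 0.4% of the aggregate
    rounded = -(-raw // MIB) * MIB          # round up to a MiB boundary
    return min(rounded, MAX_LOG_BYTES)      # never exceed the 32 MB maximum


if __name__ == "__main__":
    # The laptop partition from the superblock dump: 12239720 blocks of 4096 bytes.
    aggregate = 12239720 * 4096
    print(f"{aggregate / 2**30:.1f} GiB")                 # 46.7 GiB
    print(default_log_size(aggregate) // MIB, "MiB")      # 32 MiB, the cap applies
    # A 1 GiB file system: 0.4% is about 4.1 MiB, rounded up to 5 MiB.
    print(default_log_size(2**30) // MIB, "MiB")
```

For the 46.7 GiB partition in the superblock dump below, 0.4% works out to roughly 191 MiB, so the 32 MB cap applies. Only file systems smaller than about 8 GiB get a log below the maximum.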

JFS dynamically allocates space for disk inodes, freeing the space when it is no longer required. This eliminates the possibility of running out of inodes due to a large number of small files. As far as I can tell, JFS is the only file system in the kernel with this feature. For performance and efficiency, the contents of small directories are stored within the directory's inode. Up to eight entries are stored in-line within the inode, excluding the self [.] and parent [..] entries. Larger directories use a B+tree keyed on name for faster retrieval. Internally, extents are used to allocate blocks to files, leading to compact, efficient use of space even as files grow in size. This is also available in XFS, and is a major new feature in EXT4.
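A toy block map shows why extents stay compact. The Python sketch below stores a file as runs of contiguous blocks. When a new block is physically adjacent to the last run, it extends that run instead of adding an entry. The Extent type and the append_block function are hypothetical names introduced here. This is a generic illustration, not JFS's actual on-disk structure.

```python
# Generic illustration of extent-based allocation: a file's block map stored as
# (file_offset, disk_start, length) runs instead of one entry per block.
# Hypothetical names; this is not JFS's on-disk format.
from typing import List, NamedTuple


class Extent(NamedTuple):
    offset: int   # first logical block in the file
    start: int    # first physical block on disk
    length: int   # number of contiguous blocks


def append_block(extents: List[Extent], disk_block: int) -> None:
    """Append the next logical block of a file, merging it into the last
    extent when the new disk block is physically contiguous with it."""
    if extents:
        last = extents[-1]
        if last.start + last.length == disk_block:
            extents[-1] = last._replace(length=last.length + 1)
            return
        next_offset = last.offset + last.length
    else:
        next_offset = 0
    extents.append(Extent(next_offset, disk_block, 1))


if __name__ == "__main__":
    file_map: List[Extent] = []
    # A file grows by 1000 contiguous blocks, then 3 more land elsewhere on disk.
    for blk in list(range(5000, 6000)) + [9000, 9001, 9002]:
        append_block(file_map, blk)
    print(file_map)
    # [Extent(offset=0, start=5000, length=1000), Extent(offset=1000, start=9000, length=3)]
```

In this example, a thousand contiguous blocks cost one map entry instead of a thousand. That is the space saving the paragraph above refers to as files grow.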

JFS supports both sparse and dense files. Sparse files allow data to be written to arbitrary locations within a file without writing the intervening blocks. JFS reports the file size up to the last block written, while allocating only the blocks actually used. Sparse files are useful for applications that require a large logical space but use only a portion of it. With dense files, blocks are allocated to fill the entire file size, whether data is written to them or not.
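You can observe sparse files from user space. This Python sketch creates one sparse and one dense 16 MiB file, then reports each file's logical size next to the space actually allocated. It uses only standard POSIX calls, so it is not specific to JFS. It assumes Linux, where st_blocks is counted in 512-byte units. The allocated figures depend on the file system that holds the temporary directory.

```python
# Demonstrates sparse vs dense files with ordinary POSIX calls. Assumes Linux,
# where st_blocks counts 512-byte units; results depend on the file system
# holding the temporary directory.
import os
import tempfile
from typing import Tuple

SIZE = 16 * 1024 * 1024  # 16 MiB logical size


def make_sparse(path: str) -> None:
    # Seek past the end and write one byte; the skipped range becomes a hole.
    with open(path, "wb") as f:
        f.seek(SIZE - 1)
        f.write(b"\0")


def make_dense(path: str) -> None:
    # Write every byte explicitly so that every block gets allocated.
    chunk = b"\0" * (1024 * 1024)
    with open(path, "wb") as f:
        for _ in range(SIZE // len(chunk)):
            f.write(chunk)


def sizes(path: str) -> Tuple[int, int]:
    """Return (logical size, bytes actually allocated) for a file."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, make in (("sparse", make_sparse), ("dense", make_dense)):
            p = os.path.join(d, name)
            make(p)
            print(name, sizes(p))
```

On a file system that supports holes, the sparse file should report the full 16 MiB size with only a block or so allocated. The dense file should allocate roughly all of it.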

In addition to the standard permissions, JFS supports file attributes such as immutable (i) and append-only (a). I set and tested both successfully with the chattr and lsattr programs. I could not find definitive information on access control list support under Linux.

Logging

The main design goal of JFS was to provide fast crash recovery for large file systems, avoiding the long file system check (fsck) times of older UNIX file systems. That was also the primary goal of file systems like EXT3 and ReiserFS. Unlike EXT3, JFS did not get journaling as an add-on; it was baked into the design from the start. For high-performance applications, the JFS transaction log can be placed on an external volume, if this is specified when the file system is created.

JFS logs only operations on metadata, maintaining the consistency of the file system structure but not necessarily of the data. A crash might result in stale data, but the files should remain consistent and usable.

Here is a list of the file system operations logged by JFS:

  • File creation (create)
  • Linking (link)
  • Making directory (mkdir)
  • Making node (mknod)
  • Removing file (unlink)
  • Rename (rename)
  • Removing directory (rmdir)
  • Symbolic link (symlink)
  • Truncating a regular file (truncate)
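Replaying a metadata log keeps recovery fast, because the work is proportional to the length of the log, not the size of the file system. The Python sketch below shows the general redo-log idea, using a subset of the operation names from the list above (create, mkdir, unlink, rmdir). The record layout, the "commit" marker, and the replay function are simplifications introduced here; they are not JFS's actual log format. No file contents appear in the log, which is why a crash can leave stale data inside structurally sound files.

```python
# Generic sketch of metadata-only redo logging. Each transaction records
# metadata operations followed by a "commit" record; recovery replays the
# committed transactions and discards the rest. Hypothetical format, not JFS's.
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (transaction id, operation, path)


def replay(log: List[Record], namespace: Dict[str, str]) -> Dict[str, str]:
    """Apply committed transactions from `log` to a toy namespace that maps
    path -> type ("file" or "dir"). Handles create, mkdir, unlink, rmdir."""
    committed = {tid for tid, op, _ in log if op == "commit"}
    for tid, op, path in log:
        if tid not in committed:
            continue  # transaction interrupted by the crash: ignore it
        if op == "create":
            namespace[path] = "file"
        elif op == "mkdir":
            namespace[path] = "dir"
        elif op in ("unlink", "rmdir"):
            namespace.pop(path, None)
    return namespace


if __name__ == "__main__":
    log: List[Record] = [
        (1, "mkdir", "/home/tmp"),
        (1, "create", "/home/tmp/notes.txt"),
        (1, "commit", ""),
        (2, "create", "/home/tmp/draft.txt"),  # crash before this commits
    ]
    print(replay(log, {}))
    # {'/home/tmp': 'dir', '/home/tmp/notes.txt': 'file'}
```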

Utilities

A suite of utilities is provided to manage JFS file systems. You must be the root user to use them.

  • jfs_debugfs: a shell-based JFS file system editor that allows changes to the ACL, uid/gid, mode, time, etc. You can also alter data on disk, but only by entering hex strings, which is not the most efficient way to edit a file.
  • jfs_fsck: replays the JFS transaction log, then checks and repairs a JFS device. It should be run only on an unmounted or read-only file system, and it runs automatically at boot.
  • jfs_fscklog: extracts a JFS fsck service log into a file. jfs_fscklog -e /dev/hda6 extracts the binary log to the file fscklog.new; to view it, use jfs_fscklog -d fscklog.new.
  • jfs_logdump: dumps the journal log to a plain text file that shows data on each transaction in the log.
  • jfs_mkfs: creates a JFS-formatted partition. Use the -j journal_device option to create an external journal (1.0.18 or later).
  • jfs_tune: adjusts tunable file system parameters on JFS. I didn't find any options that looked like they might improve performance; the -l option lists the superblock information.

Here is what a dump of the superblock information looks like:

root@slackt41:~# jfs_tune -l /dev/hda6
jfs_tune version 1.1.11, 05-Jun-2006

JFS filesystem superblock:

JFS magic number: 'JFS1'
JFS version: 1
JFS state: mounted
JFS flags: JFS_LINUX JFS_COMMIT JFS_GROUPCOMMIT JFS_INLINELOG
Aggregate block size: 4096 bytes
Aggregate size: 12239720 blocks
Physical block size: 512 bytes
Allocation group size: 16384 aggregate blocks
Log device number: 0x306
Filesystem creation: Wed Jul 11 01:52:42 2007
Volume label: ''
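Some of those values are easier to read with a little arithmetic. This Python sketch parses the output above and derives the aggregate size in bytes, the allocation group size, and the log device. The parse_superblock and summarize functions are hypothetical helpers, not part of the JFS utilities. The field names come from this jfs_tune 1.1.11 output, and other versions may print them differently. Decoding the log device assumes the classic 8-bit major/8-bit minor layout, which is enough for a small number like 0x306.

```python
# Parse the "key: value" lines printed by `jfs_tune -l` (sample copied from the
# dump above) and derive a few numbers. Hypothetical helpers; field names are
# taken from this one sample.
from typing import Dict

SAMPLE = """\
JFS magic number: 'JFS1'
JFS version: 1
JFS state: mounted
JFS flags: JFS_LINUX JFS_COMMIT JFS_GROUPCOMMIT JFS_INLINELOG
Aggregate block size: 4096 bytes
Aggregate size: 12239720 blocks
Physical block size: 512 bytes
Allocation group size: 16384 aggregate blocks
Log device number: 0x306
Filesystem creation: Wed Jul 11 01:52:42 2007
Volume label: ''
"""


def parse_superblock(text: str) -> Dict[str, str]:
    """Split each "key: value" line at its first colon; skip lines without a value."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def first_int(value: str) -> int:
    # "4096 bytes" -> 4096
    return int(value.split()[0])


def summarize(fields: Dict[str, str]) -> Dict[str, object]:
    block = first_int(fields["Aggregate block size"])
    dev = int(fields["Log device number"], 16)  # "0x306" -> 774
    return {
        "aggregate_bytes": first_int(fields["Aggregate size"]) * block,
        "ag_bytes": first_int(fields["Allocation group size"]) * block,
        "log_major": dev >> 8,      # classic 8:8 device-number encoding
        "log_minor": dev & 0xFF,
        "inline_log": "JFS_INLINELOG" in fields["JFS flags"].split(),
    }


if __name__ == "__main__":
    for name, value in summarize(parse_superblock(SAMPLE)).items():
        print(name, value)
```

For this partition, the result is about 46.7 GiB split into 64 MiB allocation groups. The log device decodes to major 3, minor 6, which is /dev/hda6 itself. That matches the JFS_INLINELOG flag.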

Crash Testing

White papers and man pages are no substitute for the harsh reality of a server room. To test the recovery capabilities of JFS, I started crashing my system (forced power off) with increasing workloads. I repeated each crash twice to see if my results were consistent.

  • Console (no X) running a text editor with one open file. Recovery: about 2 seconds to replay the journal log; changes I had not saved in the editor were missing, but the file was intact.
  • X Window System with KDE, GIMP, Nvu, and a text editor in xterm, all with open files. Recovery: about 2 seconds to replay the journal log; all open files were intact, and unsaved changes were missing.
  • X Window System with KDE, GIMP, Nvu, and a text editor, all with open files, plus a shell script that inserted records into a MySQL (ISAM) table. The script I wrote was an infinite loop, and I let it run for a couple of minutes to make sure some records were flushed to disk. Recovery: about 3 seconds to replay the journal log; all open files were intact, and the database was intact with a few thousand records inserted, but the timestamp on the table file had been rolled back one minute.

In all cases, these boot messages appeared...

**Phase 0 - Replay Journal Log
-|----- (spinner appeared for a couple of seconds, then went away)
Filesystem is clean

Throughout the crash testing, no file system corruption occurred and the longest log replay time I experienced was about 3 seconds.
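Repeating these tests requires a workload whose on-disk state can be checked afterward. The Python code below is a hypothetical stand-in for the MySQL insert loop described above, not the original script. It appends numbered records to a plain file and calls fsync() every `batch` records. The run_workload and count_records names, the record format, and the batch size are illustrative assumptions. After a power cut and reboot, count_records() shows how many records reached the disk. Every record up to the last completed fsync() should be present, assuming the drive honors cache flushes.

```python
# Hypothetical crash-test workload: append numbered records to a file and
# force them to disk every `batch` records, so that everything up to the last
# completed fsync() should survive a power cut.
import os
import time


def run_workload(path: str, records: int, batch: int = 100) -> int:
    """Append `records` tab-separated lines to `path`, syncing every `batch`
    records; returns the number of records written."""
    with open(path, "a", encoding="utf-8") as f:
        for i in range(1, records + 1):
            f.write(f"{i}\t{time.time():.3f}\n")
            if i % batch == 0:
                f.flush()             # move Python's buffer into the kernel
                os.fsync(f.fileno())  # ask the kernel to write it to disk
        f.flush()
        os.fsync(f.fileno())
    return records


def count_records(path: str) -> int:
    # After a crash, count only complete lines that made it to disk.
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.endswith("\n"))
```

To crash-test with it, call run_workload() with a very large record count and cut power while it runs. After the reboot, compare count_records() with the last multiple of `batch` that was printed or logged.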

Conclusion

While my improvised crash tests were not a good simulation of a busy server, JFS held up well, and recovery time was fast. All the file-level applications I tested, like tar and rsync, worked flawlessly, and lower-level programs like TrueCrypt also worked as expected. After 30 days of kicking and prodding, I have a high level of confidence in JFS and am content trusting my data to it. JFS may not have been marketed as effectively as others, but it is a solid choice in the long list of quality Linux file systems.