Emerging Linux filesystems - Part 2

As a comparison, I ran my test script against a set of standard filesystems in the Linux kernel that most users will be familiar with.




As well as being of general interest, this gives a good range of baseline values against which to compare the emerging filesystems.

ext2

The ext2 filesystem was introduced into the Linux kernel in January 1993 and was the principal filesystem until the introduction of ext3 in 2001. It is a fixed-block-size filesystem and has no journalling capabilities.
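For reference, creating and mounting an ext2 filesystem on the test array is as simple as the following (the 4KB block size shown here is purely illustrative, since ext2's block size is fixed when the filesystem is created):

# mke2fs -b 4096 /dev/md0

# mount -t ext2 /dev/md0 /mnt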

Timing Results

Test                          Time (secs)
Total                         3232.9
Extract kernel sources        2.7
Extract GCC sources           4.0
Recursive random file         22.7
Configure GCC                 2.0
Kernbench                     824.5
GCC make -j16 bootstrap       1288.3
Remove kernel source          0.3
Bonnie++ file operations      403.3
Remove GCC tree               0.9
tiobench threaded I/O         54.9
Bonnie++ intelligent I/O      629.1

Bonnie++ Results

Test                          Result
Sequential create/sec         651
Sequential stat/sec           +++++
Sequential delete/sec         204531
Random create/sec             639
Random stat/sec               +++++
Random delete/sec             1204
Block writes KB/sec           648084
Block rewrites KB/sec         123908
Block read KB/sec             294471
Random seeks/sec              1007

(A result of +++++ means the operation completed too quickly for Bonnie++ to report a meaningful figure.)

ext3

The third extended filesystem (ext3) was introduced into the mainline Linux kernel in 2001 to provide a filesystem that was backwards- and forwards-compatible with ext2 but which provided journalling of both metadata and (optionally) data.

In its default mode of data=ordered, it also provides a higher guarantee of filesystem consistency by ensuring that file data is flushed to disk before the corresponding metadata.
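As a rough illustration (not necessarily the exact commands used for these tests), an existing ext2 filesystem can be upgraded in place by adding a journal with tune2fs, and the journalling mode can be chosen at mount time; data=ordered is the default, while data=journal additionally journals file data:

# tune2fs -j /dev/md0

# mount -t ext3 -o data=journal /dev/md0 /mnt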

Timing Results

Test                          Time (secs)
Total                         2509.9
Extract kernel sources        4.0
Extract GCC sources           5.4
Recursive random file         22.5
Configure GCC                 2.1
Kernbench                     828.1
GCC make -j16 bootstrap       1290.4
Remove kernel source          0.7
Bonnie++ file operations      7.9
Remove GCC tree               1.8
tiobench threaded I/O         59.9
Bonnie++ intelligent I/O      286.6

Bonnie++ Results

Test                          Result
Sequential create/sec         53412
Sequential stat/sec           +++++
Sequential delete/sec         60123
Random create/sec             52744
Random stat/sec               +++++
Random delete/sec             59555
Block writes KB/sec           275239
Block rewrites KB/sec         115008
Block read KB/sec             309794
Random seeks/sec              991.9

XFS

SGI's XFS began life in the mid '90s in Irix, their Unix variant, but in 1999 the company announced it was going to contribute it to Linux. It finally arrived in Linux in the 2.5.36 kernel on Sept. 17, 2002, and then in the 2.4.24 kernel on Feb. 5, 2004.

XFS is a 64-bit extents-based filesystem capable of scaling up to 9 exabytes, though on 32-bit Linux systems there are kernel constraints that limit it to 16TB for both filesystems and individual files.
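For comparison with the commands shown for the emerging filesystems below, creating and mounting an XFS filesystem on the test array looks something like this (-f tells mkfs.xfs to overwrite any existing filesystem signature on the device):

# mkfs.xfs -f /dev/md0

# mount -t xfs /dev/md0 /mnt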

Timing Results

Test                          Time (secs)
Total                         2782.4
Extract kernel sources        8.1
Extract GCC sources           13.6
Recursive random file         22.7
Configure GCC                 2.0
Kernbench                     832.2
GCC make -j16 bootstrap       1307.3
Remove kernel source          6.6
Bonnie++ file operations      145.6
Remove GCC tree               7.4
tiobench threaded I/O         51.1
Bonnie++ intelligent I/O      385.4

Bonnie++ Results

Test                          Result
Sequential create/sec         2894
Sequential stat/sec           +++++
Sequential delete/sec         4602
Random create/sec             2643
Random stat/sec               +++++
Random delete/sec             2109
Block writes KB/sec           617869
Block rewrites KB/sec         128171
Block read KB/sec             246910
Random seeks/sec              1404

JFS

JFS is a filesystem developed by IBM. It first appeared in the 2.5.6 development kernel on March 8, 2002, and was then backported to 2.4.20, which was released on Nov. 28, 2002.

JFS is an extents-based filesystem and can extend to 4 petabytes with 4KB block sizes.
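Again for reference, a JFS filesystem is created with the jfsutils tools and mounted in the usual way (the -q option skips the confirmation prompt):

# jfs_mkfs -q /dev/md0

# mount -t jfs /dev/md0 /mnt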

Timing Results

Test                          Time (secs)
Total                         3064.5
Extract kernel sources        10.5
Extract GCC sources           18.7
Recursive random file         22.1
Configure GCC                 1.9
Kernbench                     847.6
GCC make -j16 bootstrap       1387.9
Remove kernel source          12.1
Bonnie++ file operations      193.4
Remove GCC tree               21.5
tiobench threaded I/O         54.9
Bonnie++ intelligent I/O      443.8

Bonnie++ Results

Test                          Result
Sequential create/sec         5562
Sequential stat/sec           +++++
Sequential delete/sec         2761
Random create/sec             1556
Random stat/sec               +++++
Random delete/sec             1432
Block writes KB/sec           327055
Block rewrites KB/sec         128943
Block read KB/sec             279747
Random seeks/sec              1060

Reiserfs

Reiserfs (actually Reiserfs Version 3) was the first journalling filesystem to be included into the mainline Linux kernel, arriving in the 2.4.1 release on Jan. 29, 2001.

It uses a novel tree structure to store both files and directories, and claims space efficiency through its “tail-packing” of small files, though this feature can have a performance cost and can be disabled if necessary.
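For instance, creating a Reiserfs filesystem and mounting it with tail-packing disabled looks like this; notail is the mount option that turns the feature off:

# mkreiserfs /dev/md0

# mount -t reiserfs -o notail /dev/md0 /mnt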

Timing Results

Test                          Time (secs)
Total                         2531.8
Extract kernel sources        3.1
Extract GCC sources           5.0
Recursive random file         25.0
Configure GCC                 1.5
Kernbench                     831.4
GCC make -j16 bootstrap       1273.9
Remove kernel source          1.2
Bonnie++ file operations      18.1
Remove GCC tree               2.8
tiobench threaded I/O         66.2
Bonnie++ intelligent I/O      303.3

Bonnie++ Results

Test                          Result
Sequential create/sec         29107
Sequential stat/sec           +++++
Sequential delete/sec         24549
Random create/sec             28179
Random stat/sec               +++++
Random delete/sec             16623
Block writes KB/sec           359405
Block rewrites KB/sec         116784
Block read KB/sec             215436
Random seeks/sec              989.1

ChunkFS

*     Authors: Amit Gud, Val Henson, et al.

*     Website(s): http://linuxfs.pbwiki.com/chunkfs

Background

ChunkFS is based on ideas from Arjan van de Ven and Val Henson to counter the “fsck problem” caused by seek times not keeping up with disk sizes and bandwidth. It was discussed at the 2006 Linux Filesystems Workshop and again at the 2007 Workshop.

In their Usenix paper they describe the filesystem, saying:

"Our proposed solution, chunkfs, divides up the on-disk file system format into individually repairable chunks with strong fault isolation boundaries. Each chunk can be individually checked and repaired with only occasional, limited references to data outside of itself."

There are two early implementations of ChunkFS at present; one is a straight kernel filesystem and the second is implemented as a user space filesystem using FUSE. Both use the ext2 filesystem code underneath the covers.

Installation

Both the FUSE and kernel versions, as well as their associated tool sets, are available via git from http://git.kernel.org/.

FUSE version

To retrieve the FUSE version of ChunkFS, I installed the cogito package, which provides a higher-level interface to Git, and then cloned the repository thus:

# apt-get install cogito

# cg-clone git://git.kernel.org/pub/scm/linux/kernel/git/gud/chunkfs.git

Because it is a FUSE filesystem, it requires some dependencies to be pulled in, which are documented in the INSTALL files:

# apt-get install e2fslibs e2fslibs-dev fuse-utils libfuse-dev libfuse2

There is one undocumented dependency too:

# apt-get install pkg-config

To build it the usual routine is followed:

# ./configure --prefix=/usr/local/chunkfs-fuse-trunk

# make

# make install

It appears the FUSE version lags somewhat behind the kernel version, though it is easier to debug because it can simply be run under GDB.

Kernel module

I had expected the straight kernel filesystem version of ChunkFS to be simply a kernel module, so on trying to clone it I was surprised to find out it was its own self-contained kernel tree! Thus began a rather painful experience.

I began by importing my existing kernel config file from the 2.6.22.1 build I was already using, and enabled ChunkFS as a module. I also took note of the text on the ChunkFS PBwiki page, which says:

Compile with CONFIG_CHUNKFS_FS set and CONFIG_BLK_DEV_LOOP to "y". NOTE: No xattrs and xips yet, CONFIG_EXT2_FS_XATTR and CONFIG_EXT2_FS_XIP should be "no" for clean compile.

Unfortunately I was then greeted with this error during the build:

ERROR: "shrink_dcache_for_umount" [fs/chunkfs/chunkfs.ko] undefined!

ERROR: "super_blocks" [fs/chunkfs/chunkfs.ko] undefined!

ERROR: "sb_lock" [fs/chunkfs/chunkfs.ko] undefined!

Telling it to build directly into the kernel worked, however, so it appears that (for the moment) ChunkFS will not build as a module.

However, this kernel would panic on boot. I turned to Amit Gud, one of the developers, who supplied me with the extremely cut-down kernel .config he was using. Starting from that and carefully enabling only what was essential (PCI device support, SCSI support, the Adaptec AACRAID driver, networking, and so forth), I eventually got the system to boot.
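For reference, the ChunkFS-specific part of that configuration ended up looking something like the .config fragment below (a sketch only, using the option names from the wiki text quoted above; everything else was pared back to the drivers the test machine actually needed):

CONFIG_CHUNKFS_FS=y
CONFIG_BLK_DEV_LOOP=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set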

However, this certainly is not yet recommended for beginners!

Configuration

To create a ChunkFS filesystem for either variant, you need the ChunkFS version of mkfs, which is again available through git:

# cg-clone git://git.kernel.org/pub/scm/linux/kernel/git/gud/chunkfs-tools.git

# cd chunkfs-tools

# ./configure --prefix=/usr/local/chunkfs-fuse-trunk

# make

# make install

As I had seven drives in the RAID-0 stripe, I decided to set that as the number of chunks to create within the ChunkFS filesystem, like this:

# /usr/local/chunkfs-fuse-trunk/sbin/mkfs -C 7 /dev/md0

However, it rapidly became apparent that this was incredibly slow and was generating rather a lot of I/O for each chunk. A quick look at the code showed that a missing break in the argument-parsing code was causing the -C option to fall through into the -c option, setting the variable that enables bad-block checking. Fortunately, it was an easy fix.

NOTE: The fix has been submitted upstream, but it has not yet appeared in the kernel.org git repository, so you may want to check for yourself if you wish to experiment.

FUSE

Mounting the FUSE filesystem is the same as running any other user-space process. Here the -o option specifies the chunks created previously with the custom mkfs command; notice that the same device is referenced seven times, once for each chunk specified with the -C option. Finally, we specify the mount point.

# /usr/local/chunkfs-fuse-trunk/sbin/chunkfs -o chunks=/dev/md0:/dev/md0:/dev/md0:/dev/md0:/dev/md0:/dev/md0:/dev/md0 /mnt

Kernel

The kernel filesystem is much simpler: even though we created seven chunks, we don't need to tell mount about them.

# mount -t chunkfs /dev/md0 /mnt

Tests

Timing Results

Here we look at the FUSE and kernel versions separately.

FUSE

Test                          Time (secs)
Total                         Invalid due to the failures below
Extract kernel sources        47.5
Extract GCC sources           116.2
Recursive random file         26.2
Configure GCC                 Failed: ../configure: Permission denied
Kernbench                     Failed: returned 0 seconds for each run
GCC make -j16 bootstrap       Failed: configure didn't complete
Remove kernel source          chunkfs-fuse process crashed during this test
Bonnie++ file operations      N/A
Remove GCC tree               N/A
tiobench threaded I/O         N/A
Bonnie++ intelligent I/O      N/A

The FUSE variant seemed rather fragile. A reproducible crash, hit before the results above could be completed, was tracked down to a buffer overrun when the name of a file to be unlinked was being passed through. This code exists only in the FUSE variant; the kernel version doesn't need this glue layer.

Bonnie++ Results

As can be seen above, the FUSE version of ChunkFS was not robust enough to survive the testing, so no Bonnie++ results were available.

Kernel

Sadly, the kernel version of ChunkFS locked the machine up hard while trying to extract the kernel source tree, and it was not possible to track down where this was happening in the time available.

NILFS

*     Authors: The NILFS Development Team, NTT Laboratories

*     Website(s): http://www.nilfs.org/en/

Background

NILFS is a log-structured filesystem developed in Japan by NTT Laboratories. It is designed to take continuous “checkpoints” (as well as on-demand ones), which can later be converted into snapshots (persistent checkpoints) before the checkpoint expires and is cleaned up by the garbage collector. These snapshots can be mounted separately as read-only filesystems and can be converted back into checkpoints (making them eligible for garbage collection) at a later date.

It has what appears to be a rather nicely thought out set of user commands to create (mkcp), list (lscp), change (chcp) and remove (rmcp) checkpoints.
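For example (checkpoint number 5 and the /mnt/snapshot mount point are purely illustrative), a checkpoint can be listed, promoted to a snapshot, mounted read-only alongside the live filesystem, and later demoted again so the cleaner can reclaim it:

# lscp /dev/md0

# chcp ss /dev/md0 5

# mount -t nilfs2 -r -o cp=5 /dev/md0 /mnt/snapshot

# chcp cp /dev/md0 5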

Installation

Installation of NILFS is reasonably straightforward. I grabbed the 2.0.0-testing-3 versions of the nilfs kernel module and the nilfs-utils package and extracted them into their own directories.

The kernel module builds as an out-of-tree module, so it is just a matter of:

# cd nilfs-2.0.0-testing-3

# make

# make install

to get the necessary kernel module installed into /lib/modules/2.6.22.1+ext4/kernel/fs/nilfs2/.

The utilities package uses the standard autoconf tools; I built them with:

# cd nilfs-utils-2.0.0-testing-3

# ./configure --prefix=/usr/local/nilfs-utils-2.0.0-testing-3

# make -j4

# make install

Then I found that it didn't completely honour the --prefix option I had passed, as it did this:

/usr/bin/install -c .libs/nilfs_cleanerd /sbin/nilfs_cleanerd

/usr/bin/install -c mkfs.nilfs2 /sbin/mkfs.nilfs2

/usr/bin/install -c mount.nilfs2 /sbin/mount.nilfs2

/usr/bin/install -c umount.nilfs2 /sbin/umount.nilfs2

This is presumably to allow for the way that mount, mkfs, and so on look for their helpers in /sbin when passed the -t <fstype> option.

Configuration

Creating a NILFS filesystem is very easy:

# mkfs.nilfs2 /dev/md0

Mounting is again very simple:

# mount -t nilfs2 /dev/md0 /mnt

Tests

Timing Results

Test                          Time (secs)
Total                         3870.5
Extract kernel sources        5.5
Extract GCC sources           8.2
Recursive random file         22.4
Configure GCC                 1.9
Kernbench                     827.0
GCC make -j16 bootstrap       1293.6
Remove kernel source          0.7
Bonnie++ file operations      517.6
Remove GCC tree               2.7
tiobench threaded I/O         106.5
Bonnie++ intelligent I/O      1084.4

Bonnie++ Results

Test                          Result
Sequential create/sec         495
Sequential stat/sec           +++++
Sequential delete/sec         118726
Random create/sec             495
Random stat/sec               +++++
Random delete/sec             993
Block writes KB/sec           102669
Block rewrites KB/sec         60190
Block read KB/sec             177609
Random seeks/sec              519.6

btrfs

*     Authors: Chris Mason, Oracle
