Peering into inodes

It doesn't take very much time working on a Unix system before your attention is drawn to the mysteries of the inode, especially these days with the concept of metadata getting so much public attention in the ongoing debates over privacy. What exactly is metadata and how is it used on Unix systems?

Metadata is simply the collection of data that describes data. The prefix "meta" means something like "along with" or "behind". And just as 1) the phone numbers involved in a phone call and the date and time that a call was made describe a phone call and 2) your name, cell number, driver's license, home address, etc. identify you, metadata describes the files on your system by recording all of the important information that goes beyond the file's contents. In short, metadata describes data.

Inodes are the data structures that store all the metadata for your files -- all there is to know about your files except for what they contain and their names. The only reason that inodes don't contain the file names is that this convention allows you to reference the same files by multiple names or from different locations in a file system. File names are only stored inside directory files that represent the directories that appear to contain them. And I say "appear to" only because directories are actually themselves just a special kind of file.
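One way to see that a name lives in the directory rather than in the inode is to rename a file and check that its inode number stays the same. The following is a minimal sketch; the file names first-name and second-name and the variable names are made up for illustration, and it assumes mktemp and awk are available.

```shell
# Sketch: renaming a file rewrites a directory entry; the inode is untouched.
# The names "first-name" and "second-name" are hypothetical.
workdir=$(mktemp -d)
echo "some content" > "$workdir/first-name"

# ls -i prints the inode number followed by the name
inode_before=$(ls -i "$workdir/first-name" | awk '{print $1}')

mv "$workdir/first-name" "$workdir/second-name"
inode_after=$(ls -i "$workdir/second-name" | awk '{print $1}')

echo "inode before rename: $inode_before"
echo "inode after rename:  $inode_after"

rm -rf "$workdir"
```

The two numbers should match because the rename happens within a single file system.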

In general, Unix commands don't allow you to look at the contents of directory files except in a manner consistent with their use as "folders" that contain other files. So, if you try to look at directory files as if they were ordinary files, you will get some pushback from the OS. Let's try:

$ cat mydir
cat: mydir: Is a directory
$ more mydir

*** mydir: directory ***
$ od -bc mydir
od: mydir: read error: Is a directory

Ah, yes, we can see that directories are directories, but we know that they're also still files. And if we deliberately or inadvertently try to edit a directory file, we will end up seeing something like this:

$ vi mydir
" ============================================================================
" Netrw Directory Listing                                        (netrw v149)
"   /home/shs/mydir
"   Sorted by      name
"   Sort sequence: [\/]$,\<core\%(\.\d\+\)\=\>,\.h$,\.c$,\.cpp$,\~\=\*$,*,\.o$,\
"   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
" ============================================================================

What vi presents here is a listing generated by its netrw plugin rather than the raw directory file. The entries that follow this header (e.g., junk and msg) are the names of the files stored "inside" that directory.

Inodes record a lot of useful information about your files, including where the content is stored on your disk, who owns the files, and when the files were last updated (sometimes along with when they were first created and most recently accessed). Aside from the content itself, the file names are the only exception to what the inodes store because that information, instead of being stored in the inode, is stored in the directory files -- as we saw in the example above.

The fact that inodes are data structures that are stored within and created when each file system is built leads to some interesting characteristics:

  1. Within a file system, inode numbers are always unique. You can have duplicate inode numbers on a system if and only if the inodes are stored in different file systems.
  2. It's possible to fill up a file system by using all of the space that it provides for inodes without using all of the space that it provides for file contents. In general, however, file systems are set up such that you're far more likely to run out of file space than inode space.

To see what percentage of the available inodes is in use on each file system, you can use the df -i command. In the example below, only 8% of the inodes on the root file system are being used.

$ df -i
Filesystem     Inodes IUsed  IFree IUse% Mounted on
/dev/xvda1     524288 37616 486672    8% /
devtmpfs       125224   427 124797    1% /dev
tmpfs          127523     1 127522    1% /dev/shm
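If you want to keep an eye on inode usage, the df -i output can be filtered with awk. This is only a sketch: the function name inode_report and its threshold argument are hypothetical, and it assumes the six-column layout shown above and mount points without spaces.

```shell
# Sketch: list file systems whose inode usage is at or above a threshold.
# inode_report is a made-up helper; it reads "df -i" style output on stdin.
inode_report() {
    awk -v limit="$1" '
        NR > 1 {                          # skip the header line
            use = $5
            sub(/%$/, "", use)            # "8%" becomes "8"
            # Skip entries such as "-" that report no percentage
            if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0)
                print $6 ": " $5
        }'
}
```

Running df -i | inode_report 80 would then print only the mount points that have used 80% or more of their inodes.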

The data stored in an inode includes:

  • File type
  • File permissions for owner, group, and other
  • Owner
  • Group
  • File size
  • File access, change and modification times
  • File deletion time (if the file has been deleted and the inode not yet reused)
  • The number of hard links that refer to the inode
  • The file's extended attributes
  • Access control lists (ACLs)

You can best view the contents of an inode using the stat command. Here's an example.

$ stat maybe
  File: ‘maybe’
  Size: 153             Blocks: 8          IO Block: 4096   regular file
Device: ca01h/51713d    Inode: 412714      Links: 2
Access: (0740/-rwxr-----)  Uid: (  500/ec2-user)   Gid: (  500/ec2-user)
Access: 2015-08-22 18:23:38.085900612 +0000
Modify: 2015-08-22 19:14:07.514542177 +0000
Change: 2015-12-09 00:19:29.763283761 +0000
 Birth: -

When you look at that output, it's easy to imagine that modify and change mean the same thing. However, they're used for very different things in an inode. The modify field records the last time that the content of the file was modified while the change field represents the last time fields within the inode were changed. Edit the file and the modify field changes (and, because the edit also updates the inode, so does the change field); use the setfacl command to give access to another user and only the change field reflects your changes.
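The difference between the two timestamps can be observed with a short experiment. The sketch below assumes GNU stat, where -c %Y prints the modify time and -c %Z the change time in seconds since the epoch; it uses chmod rather than setfacl as the metadata-only change, and the variable names are made up.

```shell
# Sketch: a metadata-only change moves "Change"; an edit moves both fields.
# Assumes GNU stat (-c %Y = modify time, -c %Z = change time).
f=$(mktemp)
echo "first draft" > "$f"
mtime_0=$(stat -c %Y "$f"); ctime_0=$(stat -c %Z "$f")

sleep 1                        # timestamps are compared at one-second resolution
chmod 640 "$f"                 # touches only the inode
mtime_1=$(stat -c %Y "$f"); ctime_1=$(stat -c %Z "$f")

sleep 1
echo "second draft" >> "$f"    # touches the content, and so the inode too
mtime_2=$(stat -c %Y "$f"); ctime_2=$(stat -c %Z "$f")

echo "after chmod: modify $mtime_0 -> $mtime_1, change $ctime_0 -> $ctime_1"
echo "after edit:  modify $mtime_1 -> $mtime_2, change $ctime_1 -> $ctime_2"
rm -f "$f"
```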

The Links: 2 field shows you that there are two links to this file -- two references to this file in the file system. In this case, someone ran the command ln maybe maybe2 as we'll see in just a moment.

Notice that the inode displayed above also displays a "birth" field. On some systems, this field might capture the time and date when the file was first created, but on most systems today, this field is empty or missing altogether.
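When only a few of these fields are needed, for example in a script, stat can print them in a custom format. This sketch assumes GNU stat and its -c format sequences (%F type, %A permissions, %U owner, %G group, %s size, %h hard links, %i inode number); the variable names are made up.

```shell
# Sketch: pull selected inode fields with a GNU stat format string.
f=$(mktemp)
echo "hello" > "$f"            # 6 bytes: five letters plus a newline

summary=$(stat -c 'type=%F perms=%A owner=%U group=%G size=%s links=%h inode=%i' "$f")
echo "$summary"

rm -f "$f"
```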

To display a file's inode number, you can simply use the ls -i command.

$ ls -i maybe*
412714 maybe  412714 maybe2  412716 maybe.awk  413245 maybe-not

In the output above, we can see the inode numbers for the first two files are the same. This means they're both really the same file (contents and metadata) with two file system references. In other words, they're hard links created by a command such as ln maybe maybe2.
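The same situation is easy to reproduce in a scratch directory. The sketch below reuses the maybe and maybe2 names from the example; the directory and variable names are made up, and the link count is read from the second column of ls -l.

```shell
# Sketch: a hard link adds a second name for the same inode.
workdir=$(mktemp -d)
echo "maybe" > "$workdir/maybe"
ln "$workdir/maybe" "$workdir/maybe2"

inode_a=$(ls -i "$workdir/maybe"  | awk '{print $1}')
inode_b=$(ls -i "$workdir/maybe2" | awk '{print $1}')
links=$(ls -l "$workdir/maybe" | awk '{print $2}')

echo "maybe:  inode $inode_a"
echo "maybe2: inode $inode_b"
echo "link count: $links"

rm -rf "$workdir"
```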

You can also use a file's inode number to locate it. In this case, find reports the file under two names, maybe and maybe2, because a hard link exists.

$ find . -inum 412714

If we look at a long file listing, we can see that one of the maybe* files (maybe-not) is a symbolic link that points to the maybe file. We can also see that the maybe/maybe2 file has an access control list (ACL), indicated by the + sign at the end of the permissions shown at the beginning of its lines. The ACL entries were added to the file's permissions with the setfacl command.

$ ls -l maybe*
-rwxr-----+ 2 ec2-user ec2-user 153 Aug 22 19:14 maybe
-rwxr-----+ 2 ec2-user ec2-user 153 Aug 22 19:14 maybe2
-rwx------  1 ec2-user ec2-user  44 Aug 22 19:13 maybe.awk
lrwxrwxrwx  1 ec2-user ec2-user   5 Dec  9 00:15 maybe-not -> maybe

To remove a file using its inode number, you can use a find command like this one:

$ find . -inum 412714 -exec rm {} \;
$ find . -inum 412714

Notice that, when we try the find command a second time, the file can no longer be found; rm ran once for each matching name, so both maybe and maybe2 are gone. The inode might still exist, but will be marked as deleted. I used to use this method to remove files with names like -ouch before I discovered that commands like rm -- -ouch worked even more easily.
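Here is a sketch of the same trick applied to an awkwardly named file in a scratch directory. The directory and variable names are made up. Because find hands rm a path that begins with the directory name, the leading dash in -ouch is never mistaken for an option.

```shell
# Sketch: remove a file named "-ouch" by looking it up by inode number.
workdir=$(mktemp -d)
touch "$workdir/-ouch"

inum=$(ls -i "$workdir/-ouch" | awk '{print $1}')

before=$(find "$workdir" -inum "$inum")     # path of the file, found by inode
find "$workdir" -inum "$inum" -exec rm {} \;
after=$(find "$workdir" -inum "$inum")      # empty once the file is gone

echo "before: $before"
echo "after:  ${after:-nothing found}"

rm -rf "$workdir"
```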

What you don't see using the stat command is a clear indication of where the file's content is stored, but the Device field points the way. What you need to keep in mind is that a file can occupy an extremely large number of blocks within a file system and that a series of pointers, direct and indirect, to blocks of data allow an inode to reference a huge amount of storage. Although you won't often see them, single files can now reach sizes in the terabytes.
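While stat doesn't show where the blocks are, it does report how many are allocated, and that number can differ sharply from the file's size. The sketch below creates a sparse file to make the point. It assumes GNU stat (%s size in bytes, %b allocated blocks, %B bytes per reported block), the coreutils truncate command, and a file system that supports sparse files; the variable names are made up.

```shell
# Sketch: compare a file's apparent size with the space its blocks occupy.
f=$(mktemp)
truncate -s 100M "$f"          # sets the size without writing any data

apparent=$(stat -c %s "$f")    # size recorded in the inode, in bytes
blocks=$(stat -c %b "$f")      # number of blocks actually allocated
unit=$(stat -c %B "$f")        # size in bytes of each block counted by %b
allocated=$((blocks * unit))

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated bytes ($blocks blocks of $unit bytes)"

rm -f "$f"
```

The same arithmetic explains the earlier stat output, where a 153-byte file was reported as occupying 8 blocks.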

While inodes have been around since Unix was a bouncing baby of an operating system, they have changed in sophistication over time. Still, they represent a fundamental component of all Unix file systems.

Copyright © 2015 IDG Communications, Inc.