Unix Dweeb

Comparing files and directories with the diff and comm Linux commands

Jun 07, 2018 | 6 mins
Linux, Red Hat, Ubuntu

The Linux comm command makes it easy to compare files or the contents of directories with its columnar output.

There are a number of ways to compare files and directories on Linux systems. The diff, colordiff, and wdiff commands are just a sampling of commands that you’re likely to run into. Another is comm. The comm command (think “common”) lets you compare the contents of individual files in side-by-side columns.

Where diff gives you a display showing the lines that are different and the location of the differences, comm offers some different options with a focus on common content. Let’s look at the default output and then some other features.

Here’s some diff output — displaying the lines that are different in the two files and using signs to indicate which file each line came from.

$ diff whoison whoison-again
7c7
< who | awk '{print $1}' | sort | uniq
---
> who | awk '{print $1}' | sort | wc -l

If you’ve used the diff command a lot, you probably know that it can also display file content side by side. In the example below, we see that the one line that is different between the two files is marked with a vertical bar preceding the line that is different.

$ diff -y whoison whoison-again
#!/bin/bash                                      #!/bin/bash
# show unique logins                             # show unique logins

echo hello, $USER                                echo hello, $USER
echo Look who is logged in!                      echo Look who is logged in!
echo ===========================                 echo ===========================
who | awk '{print $1}' | sort | uniq           | who | awk '{print $1}' | sort | wc -l
echo ===========================                 echo ===========================

The comm command displays the differences in columns by default, but we’ve got a little problem here:

$ comm whoison whoison-again
                # show unique logins

                echo hello, $USER
                echo Look who is logged in!
                echo ===========================
who | awk '{print $1}' | sort | uniq
comm: file 1 is not in sorted order             

The errors indicated in this output confirm one important restriction with comm: it requires that the files being compared are in sorted order.

The output is very different from diff, but let's review what we're seeing. In the diff output, we are looking at the lines that are different in the two files. All the other lines in the two files are the same.

In the comm output, we also see the content of both files in columns, but the key is the indentation. The rightmost column displays the content that is the same in both files, up to a point. The other two columns show (leftmost) the content that is unique to the first file and (middle) the content that is unique to the second file. But we also see another line (shown twice) in the first two columns and a couple of complaints that the data being compared is not in sorted order. This tells us something about the way that comm works. It expects to be working with files that are in sorted order. You're better off using diff when you want to compare scripts or other non-sorted data.

unique to     unique to     common to
file 1        file 2        both files
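
Because comm expects sorted input, one way to compare unsorted files is to sort copies of them first. The following is a minimal sketch, assuming two small made-up files (list1 and list2 are illustrative names, not files from this article) created in a temporary directory. Setting LC_ALL=C is an assumption made here so that sort and comm agree on the ordering.

```shell
# Hypothetical sample data; list1 and list2 are illustrative names.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmpdir/list1"
printf '%s\n' fig banana apple > "$tmpdir/list2"

# Use one locale for both sort and comm so they agree on ordering.
export LC_ALL=C

# Sort each file into a new file, then compare the sorted copies.
sort "$tmpdir/list1" > "$tmpdir/list1.sorted"
sort "$tmpdir/list2" > "$tmpdir/list2.sorted"
result=$(comm "$tmpdir/list1.sorted" "$tmpdir/list2.sorted")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With this data, pear lands in the left column, banana in the middle column, and apple and fig in the rightmost column.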

Say we're comparing lists of the states in which two individuals have lived. In this example, the states in which Eric lived (and Sandra did not) are shown in the left column, while the states in which Sandra lived (and Eric did not) appear in the middle column. The states in which they've both lived are in the rightmost column. In this case, both lists of states are in alphabetic order, so the comm command works as expected.

$ comm eric sandra
        New Jersey
        New York

Now let's assume that you only want to see the states where Eric and Sandra both lived. That's easy for the comm command. You just need to use the -12 option, which tells comm to not display what you would normally see in columns 1 and 2.

$ comm -12 eric sandra
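
The same column-suppressing idea works for the other columns too. Here is a small sketch using made-up lists of states (the files first and second are hypothetical stand-ins, not the eric and sandra files shown above). Each digit in the option names a column to hide.

```shell
# Made-up lists for illustration only; not the article's eric and sandra files.
tmpdir=$(mktemp -d)
export LC_ALL=C   # keep the sort order predictable for this demo
printf '%s\n' 'Maine' 'New Jersey' 'Ohio' > "$tmpdir/first"
printf '%s\n' 'New Jersey' 'Ohio' 'Texas' > "$tmpdir/second"

# -12 hides columns 1 and 2, leaving lines common to both files.
both=$(comm -12 "$tmpdir/first" "$tmpdir/second")
# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_first=$(comm -23 "$tmpdir/first" "$tmpdir/second")
# -13 hides columns 1 and 3, leaving lines found only in the second file.
only_second=$(comm -13 "$tmpdir/first" "$tmpdir/second")

printf 'both:\n%s\n' "$both"
printf 'only in first:\n%s\n' "$only_first"
printf 'only in second:\n%s\n' "$only_second"

rm -rf "$tmpdir"
```

When only one column is left, comm prints it without any leading tabs, which makes the output easy to feed into other commands.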

Comparing directories

The comm command can also be easily coerced into showing you the differences between the contents of two similar directories. After all, directory listings are by nature in alphabetical order. In the following example, we see that the dir1 and dir2 directories have 3 files in common and that each has a single unique file.

$ comm <(ls dir1) <(ls dir2)

When you use comm to compare directory listings, all you're doing is comparing the file names, not the content of the files. If you're comparing the contents of recently configured home directories, you can add the -a option to the ls commands to include "dot files."

$ comm <(ls -a mjw) 
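
To try the directory comparison without touching real home directories, a sketch like the following might help. It assumes bash (for the <( ) syntax) and builds two throwaway directories whose names and files are invented for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical directories and file names, created only for this demo.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/dir1" "$tmpdir/dir2"
touch "$tmpdir/dir1/notes" "$tmpdir/dir1/report" "$tmpdir/dir1/.profile"
touch "$tmpdir/dir2/report" "$tmpdir/dir2/todo" "$tmpdir/dir2/.profile"

# One locale for ls and comm so both use the same ordering.
export LC_ALL=C

# Compare visible file names only.
visible=$(comm <(ls "$tmpdir/dir1") <(ls "$tmpdir/dir2"))
# Add -a so that dot files (and the . and .. entries) are compared too.
with_dots=$(comm <(ls -a "$tmpdir/dir1") <(ls -a "$tmpdir/dir2"))

printf 'visible files:\n%s\n' "$visible"
printf 'including dot files:\n%s\n' "$with_dots"

rm -rf "$tmpdir"
```

Only the names are compared here. Two files called report would show up as common even if their contents differed.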

The same thing can be done with diff, but the output is a little different — with signs identifying the differences and no sign of the common files.

$ diff  .cshrc
> .history
> .login
> .logout

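Did you notice what the comm command is doing when we use the <(command) syntax? It is comparing the output of two commands rather than the contents of two files.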

$ comm 

That might not be the most insightful command you might run, but you can see how comm is informing us that the pwd command and the echo command have the same output. The diff command would make the same kind of comparison, but give you no output by default — an indication that the output from both commands is the same.

$ diff 

The comm command can provide a way to compare the output of two commands as easily as it can compare two files. Just be sure the data you're comparing is in alphabetical order if more than one line of output is expected.
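
When the commands you're comparing produce more than one line of unsorted output, one option is to sort inside each <( ) construct. This is only a sketch, assuming bash; list_a and list_b are hypothetical stand-ins for whatever two commands you want to compare.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # same ordering rules for sort and comm

# Two stand-in commands whose output is not sorted (hypothetical data).
list_a() { printf '%s\n' carol alice bob; }
list_b() { printf '%s\n' bob dave alice; }

# Sort each command's output inside the process substitution,
# then show only the lines that both commands produced.
common=$(comm -12 <(list_a | sort) <(list_b | sort))
printf '%s\n' "$common"
```

Leaving out the sort steps here would likely trigger the same "not in sorted order" complaints seen earlier.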


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
