Comparing files and directories with the diff and comm Linux commands

The Linux comm command makes it easy to compare files or the contents of directories with its columnar output.


There are a number of ways to compare files and directories on Linux systems. The diff, colordiff, and wdiff commands are just a sampling of the commands that you're likely to run into. Another is comm. This command (think "common") lets you compare the contents of two files in side-by-side columns.

Where diff shows you the lines that differ and where the differences are located, comm offers some different options with a focus on common content. Let's look at the default output and then some other features.

Here's some diff output — displaying the lines that are different in the two files and using < and > signs to indicate which file each line came from.

$ diff whoison whoison-again
7c7
< who | awk '{print $1}' | sort | uniq
---
> who | awk '{print $1}' | sort | wc -l

If you've used the diff command a lot, you probably know that it can also display file content side by side. In the example below, we see that the one line that is different between the two files is marked with a vertical bar preceding the line that is different.

$ diff -y whoison whoison-again
#!/bin/bash                                      #!/bin/bash
# show unique logins                             # show unique logins

echo hello, $USER                                echo hello, $USER
echo Look who is logged in!                      echo Look who is logged in!
echo ===========================                 echo ===========================
who | awk '{print $1}' | sort | uniq           | who | awk '{print $1}' | sort | wc -l
echo ===========================                 echo ===========================

The comm command displays the differences in columns by default, but we've got a little problem here:

$ comm whoison whoison-again
                #!/bin/bash
                # show unique logins

                echo hello, $USER
                echo Look who is logged in!
                echo ===========================
who | awk '{print $1}' | sort | uniq
comm: file 1 is not in sorted order             <=== Oops! The comm command expects
echo ===========================                     sorted data
        who | awk '{print $1}' | sort | wc -l
comm: file 2 is not in sorted order
        echo ===========================

The errors in this output point to one important restriction of comm: it requires that the files being compared be in sorted order.

The output is very different from diff, but let's review what we're seeing. In the diff output, we are looking at the lines that are different in the two files. All the other lines in the two files are the same.

In the comm output, we also see the content of both files in columns, but the key is the indentation. The rightmost column displays the content that is the same in both files, up to a point. The other two columns show (leftmost) the content that is unique to the first file and (middle) the content that is unique to the second file. But we also see one line, the final echo, shown twice (once in each of the first two columns) even though it is identical in both files, along with a couple of complaints that the data being compared is not in sorted order. This tells us something about the way that comm works. It expects to be working with files that are in sorted order. You're better off using diff when you want to compare scripts or other non-sorted data.

unique to     unique to     common to
file 1        file 2        both files

Say we're comparing lists of the states in which two individuals have lived. In this example, the states in which Eric lived (and Sandra did not) are shown in the left column, while the states in which Sandra lived (and Eric did not) appear in the middle column. The states in which they've both lived are in the rightmost column. In this case, both lists of states are in alphabetic order, so the comm command works as expected.

$ comm eric sandra
                California
        Connecticut
        Hawaii
                Maryland
Michigan
        New Jersey
        New York
                Pennsylvania
        Texas
                Virginia

Now let's assume that you only want to see the states where Eric and Sandra both lived. That's easy for the comm command. You just need to use the -12 option, which tells comm to not display what you would normally see in columns 1 and 2.

$ comm -12 eric sandra
California
Maryland
Pennsylvania
Virginia
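The same digits can be combined in other ways: -23 leaves only column 1 (lines unique to the first file) and -13 leaves only column 2 (lines unique to the second). Here is a small sketch, assuming eric and sandra files whose contents are reconstructed from the three-column listing shown earlier:

```shell
#!/bin/sh
# Scratch directory for the demo files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Contents reconstructed from the three-column comm listing in the article.
printf '%s\n' California Maryland Michigan Pennsylvania Virginia > eric
printf '%s\n' California Connecticut Hawaii Maryland "New Jersey" "New York" \
    Pennsylvania Texas Virginia > sandra

# Suppress columns 2 and 3: states only Eric lived in.
only_eric=$(comm -23 eric sandra)
# Suppress columns 1 and 3: states only Sandra lived in.
only_sandra=$(comm -13 eric sandra)

echo "Only Eric:"
echo "$only_eric"
echo "Only Sandra:"
echo "$only_sandra"
```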

Comparing directories

The comm command can also be easily coerced into showing you the differences between the contents of two similar directories. After all, ls sorts its listings by name by default. In the following example, we see that the dir1 and dir2 directories have three files in common and that each has a single unique file.

$ comm <(ls dir1) <(ls dir2)
        infile
                junk.pl
                runme
                test
xfile

When you use comm to compare directory listings, all you're doing is comparing the file names, not the content of the files. If you're comparing the contents of recently configured home directories, you can add the -a option to ls to include "dot files."

$ comm <(ls -a mjw) <(ls -a pxg)
                .
                ..
.bash_history
.bashrc
.bashrc.orig
bin
        .cshrc
        .history
        .login
        .logout
mbox
                .profile
                public_html
.ssh
                .viminfo
.vimrc
.Xauthority

The same thing can be done with diff, but the output is a little different — with < and > signs identifying the differences and no sign of the common files.

$ diff <(ls -a mjw) <(ls -a pxg)
3,7c3,6
< .bash_history
< .bashrc
< .bashrc.orig
< bin
< mbox
---
> .cshrc
> .history
> .login
> .logout
10d8
< .ssh
12,13d9
< .vimrc
< .Xauthority
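Because both listings compare names only, a file that appears in both directories could still differ in content. One way to check, sketched here with two hypothetical directories named dir_a and dir_b, is to let diff compare the directories themselves using the -r (recursive) and -q (report only whether files differ) options, as found in GNU diff:

```shell
#!/bin/sh
# Scratch directory holding two made-up directories for the demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir dir_a dir_b

# "notes" exists in both directories but with different content.
echo "version 1" > dir_a/notes
echo "version 2" > dir_b/notes
# "readme" is identical in both.
echo "same" > dir_a/readme
echo "same" > dir_b/readme

# diff exits with status 1 when it finds differences, so "|| true"
# keeps that from being treated as a failure.
report=$(diff -rq dir_a dir_b || true)
echo "$report"
```

With these sample files, only notes is reported; the identical readme files are not mentioned.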

Did you notice what happens when we use the <() arguments? The shell runs the commands between the parentheses and hands their output to comm as if it were a file. This bash feature is called process substitution. Other command output can be compared in the same manner. Here's an example:

$ comm <(pwd) <(echo /home/justme)
                /home/justme

That might not be the most insightful command you could run, but you can see how comm is informing us that the pwd command and the echo command have the same output. The diff command would make the same kind of comparison but give you no output by default, an indication that the output from both commands is the same.

$ diff <(pwd) <(echo /home/justme)
$
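In a script, that silence can be tested through diff's exit status, which is 0 when the inputs match and 1 when they differ. A minimal sketch, using two stand-in printf commands in place of real commands (process substitution requires bash or a similar shell, not plain sh):

```shell
#!/usr/bin/env bash
# Compare the output of two stand-in commands; discard diff's own output
# and rely only on its exit status.
if diff <(printf 'one\ntwo\n') <(printf 'one\ntwo\n') > /dev/null; then
    same="yes"
else
    same="no"
fi
echo "Outputs identical: $same"
```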

The comm command can provide a way to compare the output of two commands as easily as it can compare two files. Just be sure the data you're comparing is in sorted order if more than one line of output is expected.
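When the output of a command isn't sorted, piping it through sort inside the parentheses is one way to satisfy comm. This is only a sketch: the printf commands stand in for whatever real commands you want to compare, and LC_ALL=C is exported so that sort and comm use the same ordering.

```shell
#!/usr/bin/env bash
# Process substitution, <( ... ), needs bash (or zsh/ksh), not plain sh.
export LC_ALL=C

# Each stand-in command emits unsorted lines, so sort them before comm sees them.
# -12 suppresses columns 1 and 2, leaving only the lines common to both.
both=$(comm -12 <(printf '%s\n' tom ann sue | sort) \
                <(printf '%s\n' sue bob ann | sort))
echo "$both"
```

Here the lines common to both outputs are ann and sue.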
