The Unix diff command is very handy, but it can do a lot more than just tell you whether two files are the same or different. It can find and show you the differences, in any of several formats, and it can generate output that other tools can use to make the differing files identical. Let’s take a look at the most common uses of diff and then see what else it can do to make your work easier.
One of the most common uses of the diff command is to tell you whether two files that might appear to be the same (based on size and other characteristics such as permissions, ownership and name) actually are the same. The diff command can do a byte-by-byte comparison lickety-split (that’s 19th century for “very fast”) even if the files are very large.
$ diff /usr/bin/time /bin/date
Binary files /usr/bin/time and /bin/date differ
OK, so these files are different. The executable for the time command and the one for the date command are not the same. This is no surprise, since the two commands do such different things. You wouldn’t expect them to share code, so why would their executables be identical? If the files you’re comparing have the same content, on the other hand, you’ll see no output from the diff command at all.
$ diff /usr/bin/zcmp /usr/bin/zdiff
$
This lack of output shouldn’t surprise us at all once we notice that these two files are actually hard links, as they are on the system I’m working on:
$ ls -i /usr/bin/zcmp /usr/bin/zdiff
3257719 /usr/bin/zcmp
3257719 /usr/bin/zdiff
Notice that they have the same inode number, so they’re really the same file.
When you use diff to detect and report on differences between text files, on the other hand, you can expect to see a lot more interesting results. If the files contain exactly the same text, we’ll again see no output, but if we compare two very simple but different text files, we’ll see something like this:
$ diff one two
1d0
< one
3d1
< three
5d2
< five
7c4,6
< seven
---
> eight
> ten
> twelve
Obviously, the files are different, but this output tells us much more than that. What do sequences like 3d1 actually mean? To understand what is happening here, let’s first look at the contents of the two files, one and two. They look like this:
one      two
===      ===
one      two
two      four
three    six
four     eight
five     ten
six      twelve
seven
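If you’d like to follow along, here’s a minimal shell sketch that recreates the two sample files and saves diff’s normal-format output. The output file name one-vs-two.txt is just an illustrative choice, and the script assumes GNU diff.

```shell
#!/bin/sh
# Recreate the article's sample files, one and two.
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# Save diff's normal-format output. The exit status is 1 because the
# files differ (0 would mean identical, 2 would mean trouble).
diff one two > one-vs-two.txt
echo "diff exit status: $?"

# Each hunk: a line range in one, a command letter (a, c or d), a line
# range in two, then "<" lines from one and ">" lines from two.
cat one-vs-two.txt
```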
Now look back at the diff output. In each command string (e.g., 3d1), the part before the letter (e.g., 3) is a line number or a range of lines in the first file. The “3” means “line 3”. If it were, instead, “3,5”, it would mean lines 3 through 5.
$ diff one two
1d0
< one
3d1
The letter represents a command: a (append), c (change) or d (delete). These are the same commands the ed editor uses, and they describe how to resolve the differences between the two files. In other words, the commands could make the files identical. The numbers after the letter represent the corresponding line or line range in the second file. Taken together, the commands form a script that the patch command can use to change file “one” into a copy of file “two”. It would add the missing lines (those in file two but not in file one) and remove the excess lines (those in file one but not in file two). While this might not seem like an exciting thing to do, imagine being able to replicate the changes to a configuration file using a set of ed-style commands that capture just the changes, without any other effects. If we save the diff output to a file named fixit (diff one two > fixit), patch can apply it:
$ patch one -i fixit -o one1
patching file one
$ cat one1
two
four
six
eight
ten
twelve
File one1 now contains the contents of one with the changes applied, so it looks just like two. In the real world, you might be changing a dozen lines and adding four to a file that’s several hundred lines long. And you might be doing this on several hundred systems. You could trust that process to someone’s “hand editing”, or you could send out a “patch file” and maybe even a script to run it. Or you could, of course, copy the changed file around. But making the changes as easy and as schedulable as possible might just be the best approach. You might, after all, be sending the fixes to your customers or to staff at remote locations.
$ cat runme
patch one -i fixit -o one1
mv one1 one
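Here’s a sketch of a somewhat more defensive version of runme, assuming GNU diff and GNU patch. So that it runs on its own, it recreates the sample files and the fixit patch first; the names one.new (patched copy) and one.orig (backup) are hypothetical. It patches into a new file, keeps a backup and checks the result before declaring success.

```shell
#!/bin/sh
# A more defensive take on "runme" (a sketch, not the article's script).
# one, two and fixit follow the article; one.new and one.orig are
# hypothetical names for the patched copy and the backup.
set -eu

# Demo setup so the script runs on its own: recreate the sample files
# and a normal-format patch. diff exits 1 when the files differ, so only
# a status of 2 (trouble) is treated as an error here.
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two
diff one two > fixit || [ $? -eq 1 ]

# Write the patched result to a new file so a failed patch can't leave
# the original half-modified (set -e stops the script on failure).
patch one -i fixit -o one.new

# Keep a backup of the original, then swap in the patched copy.
cp -p one one.orig
mv one.new one

# Confirm the result before declaring success.
if cmp -s one two; then
    echo "one now matches two (original saved as one.orig)"
else
    echo "patched file does not match two" >&2
    exit 1
fi
```

On real systems you’d drop the demo setup and ship only the patch file and the steps after it.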
You can also save the output from diff in ed format using the -e option, like this:
$ diff -e one two > fixit
In this case, the fixit file will look like this:
$ cat fixit
7c
eight
ten
twelve
.
5d
3d
1d
If you wanted, you could open the file to be changed in ed, type the commands (7c, eight, and so on) as shown, and the changes would be made. Then just save the file with the w (write) command and quit with q. To apply the changes from the command line, you would do something like this:
$ (cat fixit && echo w) | ed - one
$ cat one
two
four
six
eight
ten
twelve
The parenthesized commands send the contents of the fixit file, followed by a w (write) command, to ed, which applies the changes to file one and saves it. Displaying the file shows that the changes have been made.
Another useful way to use the diff command is with the -p option. With this option, diff shows the differences in context format (the same format the -c option produces), which gives the viewer more context. As you’ll note below, we first get the name and modification time of each file. Then we see the contents of both files: lines that appear only in the first file are prefixed with -, and lines that differ between the files are prefixed with ! in both sections.
$ diff -p one two
*** one	2014-06-28 17:04:05.000000000 -0400
--- two	2014-06-28 15:31:33.000000000 -0400
***************
*** 1,7 ****
- one
  two
- three
  four
- five
  six
! seven
--- 1,6 ----
  two
  four
  six
! eight
! ten
! twelve
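Context diffs like this one can also be fed to patch. Here’s a minimal sketch, assuming GNU diff and GNU patch, that saves a context diff with -c and uses patch’s --dry-run option to confirm that it applies cleanly before applying it for real. The names fixit.ctx and one1 are illustrative.

```shell
#!/bin/sh
# Sketch: ship a context-format diff and let patch check it first.
# fixit.ctx and one1 are illustrative names; assumes GNU diff and patch.
set -eu

printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# -c produces the context format; diff exits 1 when the files differ,
# so only status 2 (trouble) is treated as a failure.
diff -c one two > fixit.ctx || [ $? -eq 1 ]

# --dry-run reports what patch would do without changing any files;
# under set -e, a patch that doesn't apply cleanly stops the script here.
patch --dry-run one -i fixit.ctx

# Apply it for real, writing the result to a new file.
patch one -i fixit.ctx -o one1
cmp -s one1 two && echo "one1 matches two"
```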
You might like the -y option even more, as it shows the differences between two files side by side.
$ diff -y one two
one                    <
two                          two
three                  <
four                         four
five                   <
six                          six
seven                  |     eight
                       >     ten
                       >     twelve
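If the full side-by-side listing is too wide or too long, GNU diff can narrow it with -W (--width) and hide the lines the files share with --suppress-common-lines. A minimal sketch, assuming GNU diff; the width of 40 and the output file name side.txt are arbitrary choices.

```shell
#!/bin/sh
# Side-by-side output, trimmed down (GNU diff options).
printf '%s\n' one two three four five six seven > one
printf '%s\n' two four six eight ten twelve > two

# -W 40 limits the total output width to 40 columns (the default is 130);
# --suppress-common-lines drops lines the files share, leaving only
# "<" (only in one), ">" (only in two) and "|" (changed) lines.
diff -y -W 40 --suppress-common-lines one two > side.txt
cat side.txt
```

For these two files, only the deleted lines, the changed seven/eight pair and the added ten and twelve remain.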
You can also install and use the colordiff command if you’d prefer to see the differing lines in one of two colors that indicate their source (e.g., the lines from one file might be red, the other blue).
If the files you want to compare are on two different systems, a better approach to determining whether they’re the same is to compute a checksum on each system and compare the checksums. The md5sum command is ideal for this.
server1$ md5sum file1
0789d2dcc23a7984a47319228597c1c4  file1
server2> md5sum file1
95ee44328db4819563548fd9789becb2  file1
Since the checksums don’t match, the two copies of file1 are different.
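Here’s a minimal sketch of that checksum comparison as a script. To keep it runnable on one machine, two local files stand in for the copies on the two servers; local_copy and remote_copy are hypothetical names. In practice, the second checksum would come from the remote system, for example with ssh server2 md5sum file1.

```shell
#!/bin/sh
# Compare two files by checksum rather than by content.
# local_copy and remote_copy are hypothetical stand-ins; on a real system
# the second checksum might come from: ssh server2 md5sum file1
set -eu

printf 'hello\n' > local_copy
printf 'hello, world\n' > remote_copy

# md5sum prints "<hash>  <filename>"; keep only the hash field.
sum1=$(md5sum local_copy | cut -d' ' -f1)
sum2=$(md5sum remote_copy | cut -d' ' -f1)

if [ "$sum1" = "$sum2" ]; then
    echo "checksums match: the files are (almost certainly) identical"
else
    echo "checksums differ: the files are different"
fi
```

Only the 32-character hashes need to cross the network, which is why this beats copying a large file just to run diff on it.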
Options like -i to ignore differences in case, --ignore-all-space (-w) and --ignore-blank-lines (-B) can also come in very handy when you just don’t want to be bothered with insignificant differences. The diff command has well over 60 options, which suggests that it’s a lot more complicated and versatile than you might have expected.
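Those options also affect diff’s exit status, which is handy in scripts: diff exits with 0 when it finds no differences, 1 when it does, and 2 when something goes wrong (a missing file, for example). A minimal sketch, assuming GNU diff; a.txt and b.txt are made-up sample files that differ only in letter case, spacing and one blank line.

```shell
#!/bin/sh
# diff's exit status: 0 = no differences, 1 = differences, 2 = trouble.
# a.txt and b.txt are made-up files that differ only in letter case,
# spacing and one blank line.
printf 'Hello World\n\nfoo  bar\n' > a.txt
printf 'hello world\nfoo bar\n' > b.txt

diff a.txt b.txt > /dev/null
echo "plain diff exit status: $?"         # differences found

# -i ignores case, -w ignores all white space, -B ignores blank lines.
diff -i -w -B a.txt b.txt > /dev/null
echo "diff -i -w -B exit status: $?"      # insignificant differences ignored
```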
Read more of Sandra Henry-Stocker's Unix as a Second Language blog and follow the latest IT news at ITworld, Twitter and Facebook.