Unix Dweeb

How to extract content from compressed files on Linux

Dec 20, 2017 · 3 mins

Making the process easy without having to memorize a suite of syntactical options

The easiest way to extract the content of compressed files (and compressed archives) on Linux is to prepare a script that both recognizes files by type and uses the proper commands for extracting their contents. Almost every compressed file has an easily recognizable file extension, such as .Z, .gz, or .tgz. And while the commands aren’t very complex, there sure are a lot of them, with many options for each.

So, why not attack the problem with a script that saves your precious brain cells for more challenging work? Let’s look at an example that you might want to consider.

In this script, the order in which the file types are listed is important. File extensions like .tar.gz that incorporate simpler file extensions like .gz have to be checked before .gz so that the proper extraction command is used. The case statement, after all, runs only the first pattern that matches and ignores the rest.
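To see why the ordering matters, here is a minimal sketch. The function names pick_method and pick_method_wrong_order are hypothetical and exist only for this illustration; they print the command that would be chosen rather than running it.

```shell
# Hypothetical helper: report which extraction command a file name selects.
# The compound extension is listed before the simpler one it contains.
pick_method() {
  case "$1" in
    *.tar.gz) echo "tar xzf" ;;
    *.gz)     echo "gunzip"  ;;
    *)        echo "unknown" ;;
  esac
}

# Same patterns in the wrong order: *.gz shadows *.tar.gz.
pick_method_wrong_order() {
  case "$1" in
    *.gz)     echo "gunzip"  ;;
    *.tar.gz) echo "tar xzf" ;;   # never reached
    *)        echo "unknown" ;;
  esac
}

pick_method backup.tar.gz              # prints: tar xzf
pick_method_wrong_order backup.tar.gz  # prints: gunzip
```

With the wrong order, a .tar.gz file would only be gunzipped, leaving a .tar file behind instead of the extracted contents.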

So, the case statement might look like this:


if [ -f "$1" ] ; then
  case "$1" in
    *.tar.bz2)  tar xjf "$1"    ;;
    *.tar.gz)   tar xzf "$1"    ;;
    *.tar.xz)   tar xJvf "$1"   ;;   # J (not z) selects xz
    *.bz2)      bunzip2 "$1"    ;;
    *.rar)      rar x "$1"      ;;
    *.gz)       gunzip "$1"     ;;
    *.tar)      tar xf "$1"     ;;
    *.tbz2)     tar xjf "$1"    ;;
    *.tgz)      tar xzf "$1"    ;;
    *.xz)       xz -d "$1"      ;;
    *.zip)      unzip "$1"      ;;
    *.Z)        uncompress "$1" ;;
    *)          echo "contents of '$1' cannot be extracted" ;;
  esac
else
  # reached when $1 is missing or is not a regular file
  echo "'$1' is not a regular file"
fi

Fortunately, there’s quite a bit of consistency among the extraction commands’ numerous options. For the commands shown, “x” is the “extract” option, “v” asks for verbose output, and “j” is the tar command’s option for bzip2 files. “z” is for gzip (tar’s “J” plays the same role for xz files), and “d” is xz’s decompress option. Where shown, “f” is the option to specify the file name.
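As a quick way to try these option letters safely, the following sketch builds a small gzip-compressed archive in a scratch directory and extracts it again. The names workdir, demo.tar.gz and note.txt are made up for the example, and the “c” (create) option and -C (change directory) are not part of the script above; they appear here only to set up the test.

```shell
# Scratch directory so the demonstration touches nothing else.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"
echo "hello" > "$workdir/src/note.txt"

# c = create, z = gzip, f = archive file name; -C changes directory first.
tar czf "$workdir/demo.tar.gz" -C "$workdir/src" note.txt

# x = extract, z = gzip, v = name each file as it is extracted, f = archive file name.
tar xzvf "$workdir/demo.tar.gz" -C "$workdir/out"

cat "$workdir/out/note.txt"

# When finished: rm -rf "$workdir"
```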

If a compressed file’s naming convention doesn’t match any of those included, we’ll get an error and a chance to update the script as needed.

If an extraction fails, the command that was run should report errors of its own. However, you can make the problem more obvious by saying so explicitly. Checking the return code right after the extraction tells anyone using the script whether the operation succeeded or failed. If the return code is not zero, the script reports the failure and exits.

if [ $? -ne 0 ]; then
    echo "extraction failed"
    exit 1
fi
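An alternative to testing $? afterward is to test the command directly. The sketch below assumes a hypothetical wrapper named run_extraction, which is not part of the script above; it uses return instead of exit so it can be tried in an interactive shell without closing it.

```shell
# Hypothetical wrapper: run whatever command it is given and report failure.
run_extraction() {
  if ! "$@"; then
    echo "extraction failed: $*" >&2
    return 1
  fi
}

# The archive path is deliberately bogus, so tar fails and the message appears.
run_extraction tar xzf /nonexistent/archive.tar.gz || echo "a script would exit 1 here"
```

Inside the case statement, a line such as tar xzf "$1" could then be written as run_extraction tar xzf "$1" || exit 1.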

Another thing you might want to add is the option of removing the original file once the extraction is done. Some decompression commands (such as uncompress, gunzip, bunzip2 and xz -d) decompress the file in place, so the original will be gone once the operation is complete. The archive commands (tar, unzip and rar), however, leave the original file where it was. Adding an option to remove it makes cleanup easier for users who want it. This code would follow the failure exit described above.

if [ -f "$1" ]; then
  echo -n "Do you want to remove the original file ($1) [Yn]?> "
  read ans
  if [ "$ans" = "Y" ]; then
    rm "$1"
    if [ $? -eq 0 ]; then
      echo "$1 removed"
    else
      echo "ERROR: $1 not removed"
    fi
  fi
fi

This code removes the original file if it still exists, but only if the user agrees by typing “Y”.
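The same prompt can be packaged as a function. This is only a sketch: the name confirm_remove is hypothetical, and accepting a lowercase “y” as well as “Y” is a variation on the behavior described above, not something the original script does.

```shell
# Hypothetical helper: ask before removing a file; only "Y" or "y" counts as yes.
confirm_remove() {
  file=$1
  [ -f "$file" ] || return 0          # already gone, nothing to ask about
  printf 'Do you want to remove the original file (%s) [Yn]?> ' "$file"
  read -r ans || true                 # treat end-of-input as "no answer"
  case "$ans" in
    Y|y)
      if rm -- "$file"; then
        echo "$file removed"
      else
        echo "ERROR: $file not removed" >&2
        return 1
      fi
      ;;
    *) echo "$file kept" ;;
  esac
}
```

In the extraction script, it would be called as confirm_remove "$1" after the failure check.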

Uncompressing and extracting the contents of compressed files isn’t all that complicated, but remembering all of the commands and options certainly is. A script like the one described here can save a lot of time — especially if you have to deal with compressed files of many types.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
