Unix Dweeb

Searching through compressed files on Linux

Nov 01, 2021 | 3 mins

There are quite a few ways to search through compressed text files on Linux systems without having to uncompress them first. Depending on the format of the files, you can choose to view entire files, extract specific text, navigate through file contents searching for content of interest, and sometimes even edit content.

First, to show you how this works, I compressed the words file on one of my Linux systems (/usr/share/dict/words) using these commands:

$ cp /usr/share/dict/words .
$ 7z a words.7z words
$ bzip2 -k words
$ gzip -k words
$ xz -k words
$ zip words.zip words

The -k options used with the bzip2, gzip, and xz commands kept these commands from removing the original file, which they would by default. The resultant files then looked like this:

$ ls -l
total 9164
-rw-r--r--. 1 shs shs 4953598 Oct 27 16:11 words
-rw-r--r--. 1 shs shs 1230545 Oct 27 16:14 words.7z
-rw-r--r--. 1 shs shs 1712421 Oct 27 16:11 words.bz2
-rw-r--r--. 1 shs shs 1476067 Oct 27 16:11 words.gz
-rw-r--r--. 1 shs shs 1230236 Oct 27 16:11 words.xz
-rw-r--r--. 1 shs shs 1476203 Oct 28 12:42 words.zip

Viewing compressed-file content

To view the entire content of a compressed file while leaving the compressed file intact, you can use any of these commands:

  • for 7z:  7z x -so words.7z
  • for bz2:  bzcat words.bz2
  • for gz:  zcat words.gz
  • for xz:  xzcat words.xz
  • for zip:  zcat words.zip

For example:

$ bzcat words.bz2 | head -5        $ 7z x -so words.7z | head -5
1080                               1080
10-point                           10-point
10th                               10th
11-point                           11-point
12-point                           12-point

You can also pipe the output to commands like more or grep, or simply watch it scroll rapidly down your screen.

$ 7z x -so words.7z | grep overclever
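If you view compressed files often, the per-format commands listed earlier can be folded into one small shell function. The sketch below is only an illustration: ccat is a name I made up, not a standard command, and it assumes the file extension matches the format. For zip files it uses unzip -p, which writes archive members to standard output. This is an assumption on my part about what you would want, since zcat only handles zip archives that hold a single file.

```shell
# ccat: hypothetical helper (not a standard command) that prints the
# contents of a compressed file to standard output, choosing the tool
# by file extension. Assumes the extension matches the actual format.
ccat() {
    case "$1" in
        *.7z)  7z x -so "$1" ;;
        *.bz2) bzcat "$1" ;;
        *.gz)  zcat "$1" ;;
        *.xz)  xzcat "$1" ;;
        *.zip) unzip -p "$1" ;;   # prints every member of the archive
        *)     cat "$1" ;;        # anything else is treated as plain text
    esac
}
```

With that function defined, a command like ccat words.xz | head -5 should behave the same way for any of the formats.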

Browsing with less

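You can browse some types of compressed files (bz2, gz and xz) using the less command, provided less is set up with an input preprocessor such as lesspipe, as it is by default on many distributions.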

$ less words.bz2        $ less words.gz         $ less words.xz
1080                    1080                    1080
10-point                10-point                10-point
10th                    10th                    10th
11-point                11-point                11-point
12-point                12-point                12-point
...                     ...                     ...

Searching for text in 7z files

The 7z command lets you list the files included in an archive (7z l), but searching their contents requires the extract command (x). A command like the one below sends the extracted contents to standard output rather than to disk, so the compressed file stays intact and no uncompressed copy is left behind. The -so option tells the command to write data to standard out.

$ 7z x -so words.7z | grep clever | column
clever          cleverest       cleverly        overcleverly    uncleverness
cleverality     clever-handed   cleverness      overcleverness
clever-clever   cleverish       clevernesses    unclever
cleverer        cleverishly     overclever      uncleverly

There doesn’t seem to be a grep-like command for 7z files, but commands like this work very well.
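If you find yourself typing that pipeline often, you can wrap it in a shell function. This is a minimal sketch, not part of the 7-Zip package: grep7z is a hypothetical name, and the function simply greps whatever 7z x -so streams out. If the archive holds more than one file, matches from all of them are printed together with no file names attached.

```shell
# grep7z: hypothetical wrapper (not shipped with 7-Zip).
# Usage: grep7z PATTERN ARCHIVE
grep7z() {
    if [ "$#" -ne 2 ]; then
        echo "usage: grep7z PATTERN ARCHIVE" >&2
        return 2
    fi
    # x -so streams the extracted data to standard output,
    # so nothing is written to disk and the archive is untouched.
    7z x -so "$2" | grep -e "$1"
}
```

For example, grep7z overclever words.7z runs the same search as the pipeline shown earlier.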

Searching for text in other types of compressed files

To search for specific text in compressed files, you can use commands like these:

$ bzgrep overclever words.bz2
$ zgrep overclever words.gz
$ xzgrep overclever words.xz
$ zipgrep overclever words.zip

For any of these commands, you should see the matching words pulled from the compressed word file:

overclever
overcleverly
overcleverness

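These commands pass ordinary grep options through to grep, so counting matches or showing line numbers works as usual. The sketch below uses zgrep on a small sample file built on the spot (the sample file and the demo directory are stand-ins I introduced so the example does not depend on the words file); bzgrep and xzgrep accept the same options.

```shell
# Build a small gzip-compressed sample to search.
demo=$(mktemp -d)
printf '%s\n' clever overclever overcleverly unclever > "$demo/sample"
gzip -c "$demo/sample" > "$demo/sample.gz"

# -c prints the number of matching lines instead of the lines themselves.
zgrep -c overclever "$demo/sample.gz"

# -n prefixes each matching line with its line number.
zgrep -n '^over' "$demo/sample.gz"

# -i ignores case in the pattern.
zgrep -i OVERCLEVERLY "$demo/sample.gz"
```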
Editing compressed files

Using vi or vim, you can actually edit some compressed files (bz2, gz and xz files) to add, change, or remove content. The files will remain compressed on your disk, but you’ll be able to notice the size changes.

$ xzcat words.xz | tail -3
$ vi words.xz
$ xzcat words.xz | tail -3
I added this line!
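Editing in place like this depends on the editor's support for compressed files, which a minimal vi may not have. As a fallback, you can decompress to a temporary file, change it, and recompress. The sketch below is one way to do that under some assumptions: append_line is a hypothetical helper name, it handles gzip files only, and the sample file is a stand-in. Swapping in xzcat and xz (or bzcat and bzip2) should follow the same pattern.

```shell
# append_line: hypothetical helper that appends one line to a
# gzip-compressed file. Usage: append_line FILE.gz "text to add"
append_line() {
    tmp=$(mktemp) || return 1
    # Decompress first; stop here if the file cannot be read.
    zcat "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s\n' "$2" >> "$tmp"
    # Recompress to a new file and replace the original only on success.
    gzip -c "$tmp" > "$1.new" && mv "$1.new" "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Try it on a small stand-in file.
work=$(mktemp -d)
printf 'first\nsecond\n' | gzip > "$work/sample.gz"
append_line "$work/sample.gz" 'I added this line!'
zcat "$work/sample.gz" | tail -1
```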


Given all the ways that you can browse and select content from compressed files, it might be a good time to exercise your “overcleverness” and see how helpful the methods described in this post might be.

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
