Breaking Linux files into pieces with the split command

Some simple Linux commands allow you to break up files and reassemble them as needed in order to accommodate size restrictions on file size for storage or email attachments

chocolate chunks
Marco Verch (CC BY 2.0)

Linux systems provide a very easy-to-use command for breaking files into pieces. This is something that you might need to do prior to uploading your files to some storage site that limits file sizes or emailing them as attachments. To split a file into pieces, you simply use the split command.

$ split bigfile

By default, the split command uses a very simple naming scheme. The file chunks will be named xaa, xab, xac, etc., and, presumably, if you break up a file that is sufficiently large, you might even get chunks named xza and xzz.

Unless you ask, the command runs without giving you any feedback. You can, however, use the --verbose option if you would like to see the file chunks as they are being created.

$ split –-verbose bigfile
creating file 'xaa'
creating file 'xab'
creating file 'xac'

You can also contribute to the file naming by providing a prefix. For example, to name all the pieces of your original file bigfile.xaa, bigfile.xab and so on, you would add your prefix to the end of your split command like so:

$ split –-verbose bigfile bigfile.
creating file 'bigfile.aa'
creating file 'bigfile.ab'
creating file 'bigfile.ac'

Note that a dot is added to the end of the prefix shown in the above command. Otherwise, the files would have names like bigfilexaa rather than bigfile.xaa.

Note that the split command does not remove your original file, just creates the chunks. If you want to specify the size of the file chunks, you can add that to your command using the -b option. For example:

$ split -b100M bigfile

File sizes can be specified in kilobytes, megabytes, gigabytes … up to yottabytes! Just use the appropriate letter from K, M, G, T, P, E, Z and Y.

If you want your file to be split based on the number of lines in each chunk rather than the number of bytes, you can use the -l (lines) option. In this example, each file will have 1,000 lines except, of course, for the last one which may have fewer lines.

$ split --verbose -l1000 logfile log.
creating file 'log.aa'
creating file 'log.ab'
creating file 'log.ac'
creating file 'log.ad'
creating file 'log.ae'
creating file 'log.af'
creating file 'log.ag'
creating file 'log.ah'
creating file 'log.ai'
creating file 'log.aj'

If you need to reassemble your file from pieces on a remote site, you can do that fairly easily using a cat command like one of these:

$ cat x?? > original.file
$ cat log.?? > original.file

Splitting and reassembling with the commands shown above should work for binary files as well as text files. In this example, we’ve split the zip binary into 50 kilobyte chunks, used cat to reassemble them and then compared the assembled and original files. The diff command verifies that the files are the same.

$ split --verbose -b50K zip zip.
creating file 'zip.aa'
creating file 'zip.ab'
creating file 'zip.ac'
creating file 'zip.ad'
creating file 'zip.ae'
$ cat zip.a? > zip.new
$ diff zip zip.new
$                    <== no output = no difference

The only caution I have to give at this point is that, if you use split often and use the default naming, you will likely end up overwriting some chunks with others and maybe sometimes having more chunks than you were expecting because some were left over from some earlier split.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.
Related:
Take IDG’s 2020 IT Salary Survey: You’ll provide important data and have a chance to win $500.