[Image: XKCD comic “tar”, via XKCD]

The tar utility, used to consolidate multiple files or directories into a single archive file, is ubiquitous among Unix-like operating systems; whenever source code is made available for download, it is almost always archived using tar. Nonetheless, the tar utility remains notorious for being difficult to use. Why is that?

The answer, I suspect, lies in the order of the command’s arguments. To illustrate, we’ll create a tar file called “destination.tar” containing 3 files. To create this file, the tar command takes the following form:

tar -c -f destination.tar file1.txt file2.txt file3.txt
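To try the command without touching any real files, here is a minimal sketch that builds three throwaway files in a scratch directory and archives them. The scratch directory and the file contents are assumptions made for illustration; only the tar invocation itself comes from the example above.

```shell
# Build three small sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    printf 'sample %s\n' "$n" > "file$n.txt"
done

# -c creates a new archive; -f names the archive file.
tar -c -f destination.tar file1.txt file2.txt file3.txt

# -t lists the archive's members, again reading the file named by -f.
tar -t -f destination.tar
```

The listing step is just a quick way to confirm that all three files ended up in the archive.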

That’s not so bad, right? However, the command’s confusing nature has less to do with its inherent properties than with how it differs from other common Unix commands. Most other Unix commands that take one or more source files and a destination order them the other way around: the source file(s) first, followed by the destination.

For example, to copy several files to a directory, you would use the cp command like this:

cp file1.txt file2.txt file3.txt destination/

Similarly, to move the files instead of copying them, the mv command is used in the same way:

mv file1.txt file2.txt file3.txt destination/

Other commands that also take on this form include…

How it happened

To determine why tar takes this unusual form, we must examine the history of the command. It was first released with Version 7 Unix in 1979 [1] as a utility which “saves and restores files on magtape” [2]. (tar is actually short for Tape ARchive.) As such, tar in its default form, with no archive file named, accepts a list of source files and writes them to your magnetic tape device [2]. (Your computer does have one of those, right?)

Naturally, tar is never really used in this way anymore. Instead, the -c and -f flags are used together: -c tells the command to create a new archive, and -f names the file to use as that archive. (Tellingly, every example command listed in the manual for the current version of tar contains the -f flag [3].)

Going back to our original example command:

tar -c -f destination.tar file1.txt file2.txt file3.txt

We can now see why “destination.tar” must be specified first: the filename is not a free-standing argument but the value of the -f flag, and therefore must directly follow it.
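A small sketch makes the point concrete: whether the flags are written separately or bundled, the word that follows -f (or the f at the end of a bundle) is read as the archive name. The archive names separate.tar and bundled.tar and the sample files are made up for illustration.

```shell
# Two throwaway files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\n' > file1.txt
printf 'b\n' > file2.txt

# Separate flags: the word after -f is the archive name.
tar -c -f separate.tar file1.txt file2.txt

# Bundled flags: f comes last in the bundle so that the
# next word is still read as its argument.
tar -cf bundled.tar file1.txt file2.txt

# Both archives list the same members.
tar -t -f separate.tar
tar -t -f bundled.tar
```

With the letters reversed (-fc), common tar implementations would generally treat “c” as the archive name instead, which is the same rule at work.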

Implications for design

The tar command is difficult to remember not because its argument order is inherently wrong, but because it fails to maintain consistency with other, similar commands. When designing any user interface (not just command line ones), it’s important to maintain a sense of consistency; failing to do so will, as we see here, often result in unexpected behavior, which in turn leads to frustration.
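To see what a consistent interface might look like, here is a rough sketch of a wrapper that accepts its arguments in the same order as cp and mv: sources first, destination last. The name mktar is hypothetical and not part of tar; the function simply rearranges its arguments before calling tar -c -f.

```shell
# mktar: hypothetical helper, not part of tar itself.
# Usage: mktar SOURCE... DESTINATION
mktar() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mktar SOURCE... DESTINATION" >&2
        return 1
    fi
    # Note: count, i, and dest are ordinary (global) shell variables.
    count=$#
    i=1
    # The loop's word list is expanded once, up front, so appending
    # to the positional parameters inside the loop is safe.
    for arg in "$@"; do
        if [ "$i" -lt "$count" ]; then
            set -- "$@" "$arg"   # re-append each source
        else
            dest=$arg            # the last argument is the destination
        fi
        i=$((i + 1))
    done
    shift "$count"               # drop the original arguments
    tar -c -f "$dest" "$@"
}

# Example use in a scratch directory with throwaway files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 1 2 3; do
    printf 'sample %s\n' "$n" > "file$n.txt"
done
mktar file1.txt file2.txt file3.txt destination.tar
tar -t -f destination.tar
```

This is only an illustration of the consistency argument, not a suggestion that tar itself should change; it also ignores every other tar option.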
