pv: The Pipe Viewer

When working on the command line, it is common to build complex commands that connect sub-tasks through the pipe. In those situations, it can be useful to have an idea of the data transferred between two connected processes. For example, what is the throughput? What about the volume of data transferred? Enter pv.

You can insert pv at any point in a pipe. It does not modify the data that passes through it; it simply copies its input to its output. Along the way, it collects statistics and displays them on standard error.
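A quick way to check that pv is transparent is to checksum the same data with and without pv in the pipe. The sketch below assumes pv is installed. The one-line fallback function is only there so the snippet still runs on a machine without it, and the variable names are illustrative.

```shell
# Fallback (an assumption for portability): without pv, substitute a plain copy.
command -v pv >/dev/null 2>&1 || pv() { cat; }

# Checksum the data directly, then again after it has passed through pv.
direct=$(printf 'hello pipe\n' | cksum)
through=$(printf 'hello pipe\n' | pv | cksum)

# pv reports on standard error, so the data on standard output is untouched.
if [ "$direct" = "$through" ]; then
    echo "data unchanged"
fi
```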

Let’s take an example: we need to retrieve a file from a remote host, decompress it, filter out some lines, re-compress it and store the result locally. One way to do this could be the following:

$ ssh remote@host 'cat myfile.gz' | gzip -cd | grep -v ERROR | gzip -c > myfile_filtered.gz

If we want to see in real time how fast data is transferred over the network, we can insert a pv command in the pipe between ssh and gzip:

$ ssh remote@host 'cat myfile.gz' | pv | gzip -cd | grep -v ERROR | gzip -c > myfile_filtered.gz

When running this modified command, pv displays a continuously updated status line showing the volume of data transferred, the time elapsed and the transfer rate.
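In the middle of a pipe, pv cannot know how much data to expect, so it cannot show a percentage or an ETA. If the size is known in advance, it can be passed with -s. The following sketch reproduces the pipeline locally on generated data instead of over ssh. The temporary files and sample data are made up for illustration, and the fallback function only exists so the snippet runs without pv.

```shell
# Fallback (an assumption for portability): without pv, substitute a plain copy.
command -v pv >/dev/null 2>&1 || pv() { cat; }

src=$(mktemp)
out=$(mktemp)

# Sample data: 1000 numbered lines; those ending in 0 are marked ERROR.
seq 1 1000 | sed 's/^.*0$/ERROR &/' | gzip -c > "$src"

# Size in bytes of the compressed input, used as the expected total for pv.
size=$(wc -c < "$src" | tr -d ' ')

# Same pipeline as in the article, with pv told how much data to expect.
pv -s "$size" < "$src" | gzip -cd | grep -v ERROR | gzip -c > "$out"

kept=$(gzip -cd < "$out" | wc -l | tr -d ' ')
echo "lines kept: $kept"
```

In the ssh example, the size would first have to be obtained from the remote host, for instance with a separate ssh call.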

pv is very configurable, and many options exist to control exactly what is displayed. It can also limit the data rate, report statistics on an already running command, and more. For details, as always, please refer to the man page.
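As a small illustration of these options, the sketch below selects individual display fields (-b for the byte count, -t for the timer, -r for the current rate, -a for the average rate) and combines them with a rate limit (-L, the short form of --rate-limit). The sizes are arbitrary, and the fallback function is an assumption so the snippet still runs without pv.

```shell
# Fallback (an assumption for portability): without pv, substitute a plain copy.
command -v pv >/dev/null 2>&1 || pv() { cat; }

# Push 2,000,000 zero bytes through pv, limited to 1 MiB per second,
# showing only the byte count, timer, current rate and average rate.
copied=$(head -c 2000000 /dev/zero | pv -btra -L 1M | wc -c | tr -d ' ')
echo "bytes copied: $copied"
```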

Some more examples:

$ cat /dev/urandom | pv > /dev/null
$ cat /dev/urandom | pv --rate-limit 1M > /dev/null
$ cat /dev/random | pv > /dev/null
(move the mouse and watch entropy come in :) )

$ cat random.bin | xz -c --best > random.bin.xz
Find the PID of the xz command (for example with pgrep xz), then:
$ pv --watchfd <pid>

$ pv -cN source < out.txt.gz | gzip -cd | pv -cN gzip | xz --best | pv -cN xz > out.txt.xz

A similar command is progress. This tool examines some commands currently running on your system (such as cp, mv, tar, etc.) and displays the percentage of copied data. It can also show estimated time and throughput, and provide a “top-like” mode (monitoring).
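A minimal sketch of how progress might be used: start a copy in the background, then ask progress for a one-shot report restricted to cp processes with -c. The file names and size are made up for illustration, and the call is skipped when progress is not installed. On a fast disk the copy may already be finished by the time progress looks for it.

```shell
src=$(mktemp)
dst=$(mktemp)
head -c 20000000 /dev/zero > "$src"

# Start a copy in the background so there is something to observe.
cp "$src" "$dst" &

# One-shot report limited to cp processes; skipped if progress is absent.
if command -v progress >/dev/null 2>&1; then
    progress -c cp || true
fi

# Wait for the background copy, then confirm both files are identical.
wait
cmp -s "$src" "$dst" && echo "copy complete"
```

For a continuously refreshed view, progress -m keeps monitoring for as long as matching processes are running.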