Then-chaining

In accord with the Unix philosophy, you can pipe data into or out of Miller. For example:

mlr cut --complement -f os_version *.dat | mlr sort -f hostname,uptime

Alternatively, you can chain verbs together within a single mlr invocation using the then keyword:

mlr cut --complement -f os_version then sort -f hostname,uptime *.dat

(You can precede the very first verb with then, if you like, for symmetry.)
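One place the leading then can help is a multi-line layout where every verb gets its own line. The following is a minimal sketch, not an official recipe: the temporary file and its records are made up for illustration, and the block skips the Miller calls if mlr is not installed.

```shell
# Hypothetical sample input in Miller's default key=value (DKVP) format.
data=$(mktemp)
printf 'hostname=b,uptime=2,os_version=x\nhostname=a,uptime=1,os_version=y\n' > "$data"

if command -v mlr >/dev/null 2>&1; then
  # Usual form: no "then" before the first verb.
  plain=$(mlr cut -x -f os_version then sort -f hostname "$data")

  # Symmetric form: every verb, including the first, is introduced by "then",
  # so each line of the chain has the same shape.
  leading=$(mlr \
    then cut -x -f os_version \
    then sort -f hostname \
    "$data")

  # Both forms should describe the same chain and give the same records.
  [ "$plain" = "$leading" ] && echo "same output"
fi
```

With this layout, verbs can be added, removed, or reordered one line at a time without special-casing the first one.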

Here's a performance comparison:

% cat piped.sh
mlr cut -x -f i,y data/big | mlr sort -n y > /dev/null

% time sh piped.sh
real    0m2.321s
user    0m4.878s
sys     0m1.564s

% cat chained.sh
mlr cut -x -f i,y then sort -n y data/big > /dev/null

% time sh chained.sh
real    0m2.070s
user    0m2.738s
sys     0m1.259s

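There are two reasons to use then-chaining. The first is performance, although I don't expect this to be a win in all cases. Then-chaining avoids redundant string-parsing and string-formatting at each pipeline step: input records are parsed once, fed through each pipeline stage in memory, and then formatted once on output. In the timings above, the chained version used about 2.7 seconds of user CPU time versus about 4.9 seconds for the piped version, while wall-clock time improved more modestly, from about 2.3 to about 2.1 seconds.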

The other reason to use then-chaining is simplicity: you don't have to re-type formatting flags (e.g. --csv --fs tab) at every pipeline stage.
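To make the point about formatting flags concrete, here is a small sketch comparing the two styles on tab-separated input. The temporary file and its contents are hypothetical, and the Miller calls are skipped if mlr is not on the PATH.

```shell
# Hypothetical tab-separated sample file with a header line.
tsv=$(mktemp)
printf 'hostname\tuptime\tos_version\nb\t20\t1.0\na\t10\t2.0\n' > "$tsv"

if command -v mlr >/dev/null 2>&1; then
  # Piped: each mlr process parses and formats on its own,
  # so the format flags have to be repeated at every stage.
  piped=$(mlr --csv --fs tab cut -x -f os_version "$tsv" \
            | mlr --csv --fs tab sort -f hostname)

  # Chained: the format flags are given once for the whole chain.
  chained=$(mlr --csv --fs tab cut -x -f os_version then sort -f hostname "$tsv")

  # The two approaches should produce the same records.
  [ "$piped" = "$chained" ] && echo "same output"
fi
```

Forgetting the flags on the second stage of the piped form would make that stage read its input in Miller's default format rather than as tab-separated data, which is the kind of mistake the chained form avoids.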