Miller features in the context of the Unix toolkit

Contents: • File-format awareness
• awk-like features: mlr filter and mlr put
• See also

File-format awareness

Miller respects CSV headers. If you do mlr --csv cat *.csv then the header line is written once:

$ cat a.csv
a,b,c
1,2,3
4,5,6

$ cat b.csv
a,b,c
7,8,9

$ mlr --csv cat a.csv b.csv
a,b,c
1,2,3
4,5,6
7,8,9

$ mlr --csv sort -nr b a.csv b.csv
a,b,c
7,8,9
4,5,6
1,2,3

Likewise with mlr sort, mlr tac, and so on.
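For contrast, here is a minimal sketch of what header-unaware tools do with the same two files. It assumes only a POSIX shell and awk; the scratch directory and the variable names are illustrative and not part of Miller.

```shell
# Recreate the two example files in a scratch directory.
workdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$workdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$workdir/b.csv"

# Plain cat knows nothing about CSV, so the header line appears twice.
naive=$(cat "$workdir/a.csv" "$workdir/b.csv")

# A common workaround: print line 1 only for the first file.
# NR counts lines across all files; FNR restarts at 1 for each file.
merged=$(awk 'NR == 1 || FNR > 1' "$workdir/a.csv" "$workdir/b.csv")

printf '%s\n' "$naive"
echo ---
printf '%s\n' "$merged"
```

The awk workaround only skips lines. It does not check that the files actually share the same header, so the bookkeeping stays with the caller.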

awk-like features: mlr filter and mlr put

mlr filter includes/excludes records based on a filter expression, e.g. mlr filter '$count > 10'.

mlr put adds a new field as a function of others, e.g. mlr put '$xy = $x * $y' or mlr put '$counter = NR'.

The $name syntax is straight from awk’s $1 $2 $3 (adapted to name-based indexing), as are the variables FS, OFS, RS, ORS, NF, NR, and FILENAME.
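To make the name-based versus position-based distinction concrete, the following sketch shows how the two expressions above might be written in plain awk against CSV input. The sample data and its column order (x, y, count) are assumptions made up for illustration. The mlr expressions in the comments are the ones quoted in the text.

```shell
# Hypothetical sample data: a header line plus three records.
data=$(printf 'x,y,count\n2,3,5\n4,5,20\n6,7,30\n')

# awk analogue of: mlr filter '$count > 10'
# count is the third column, so awk must call it $3; NR == 1 keeps the header.
filtered=$(printf '%s\n' "$data" | awk -F, 'NR == 1 || $3 > 10')

# awk analogue of: mlr put '$xy = $x * $y'
# The header has to be extended by hand; x and y are columns 1 and 2.
with_xy=$(printf '%s\n' "$data" |
  awk -F, -v OFS=, 'NR == 1 { print $0, "xy"; next } { print $0, $1 * $2 }')

printf '%s\n' "$filtered"
echo ---
printf '%s\n' "$with_xy"
```

If the columns were reordered, both awk programs would need new field numbers, whereas expressions that refer to fields by name would not.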

While awk functions are record-based, Miller subcommands (or functions, if you like) are stream-based: each of them maps a stream of records into another stream of records.

Unlike awk, Miller doesn’t allow you to define new functions. Its domain-specific languages are limited to the filter and put syntax. Further programmability comes from chaining with then.

Unlike with awk, all variables are stream variables and all functions are stream functions. This means NF, NR, etc. change from one record to another, $x is a label for field x in the current record, and the input to sqrt($x) changes from one record to the next. Miller doesn’t let you set sum=0 before records are read, then update that sum on each record, then print its value at the end. (However, do see mlr step -a rsum on the Reference page.)

Miller is faster than awk, cut, and so on (depending on platform; see also Performance). In particular, Miller’s DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.
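
For reference, the set-up, accumulate, and report pattern described above as unavailable in Miller's put syntax looks roughly like this in awk. This is a sketch on made-up single-column input; the variable names are illustrative.

```shell
# Hypothetical input: one numeric field per line, header first.
data=$(printf 'x\n3\n4\n5\n')

# BEGIN sets sum before any record is read, the middle rule updates it on
# every data record (NR > 1 skips the header), and END prints the total.
total=$(printf '%s\n' "$data" |
  awk 'BEGIN { sum = 0 } NR > 1 { sum += $1 } END { print sum }')

# Per-record running sum: each output line is the value and the sum so far.
running=$(printf '%s\n' "$data" | awk 'NR > 1 { sum += $1; print $1, sum }')

echo "$total"
printf '%s\n' "$running"
```

The second command is the kind of per-record running total for which the text points to mlr step -a rsum.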

See also

See Reference for more on Miller’s subcommands cat, cut, head, sort, tac, tail, top, and uniq, as well as awk-like mlr filter and mlr put.