Unix-toolkit context
How does Miller fit within the Unix toolkit (grep, sed, awk, etc.)?
File-format awareness
Miller respects CSV headers. If you do mlr --csv cat *.csv then the header line is written once:
$ cat data/a.csv
a,b,c
1,2,3
4,5,6
$ cat data/b.csv
a,b,c
7,8,9
$ mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9
$ mlr --csv sort -nr b data/a.csv data/b.csv
a,b,c
7,8,9
4,5,6
1,2,3
Likewise with mlr sort, mlr tac, and so on.
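For contrast, a tool that is not format-aware, such as plain cat, repeats the header once per input file. The following is a minimal sketch, assuming the same two files as above are recreated in a scratch directory; the tmp and plain variable names are illustrative only.

```shell
# Recreate the two sample files in a scratch directory (paths are illustrative).
tmp=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmp/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmp/b.csv"

# Plain cat knows nothing about CSV, so the header line appears once per file.
plain=$(cat "$tmp/a.csv" "$tmp/b.csv")
echo "$plain"

# Miller treats the header as schema rather than data, so it is written once.
mlr --csv cat "$tmp/a.csv" "$tmp/b.csv"
```

With plain cat, a downstream sort or count would treat the second a,b,c line as a data row, which is the problem that header awareness avoids.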
awk-like features: mlr filter and mlr put
mlr filter includes/excludes records based on a filter expression, e.g. mlr filter '$count > 10'.
mlr put adds a new field as a function of others, e.g. mlr put '$xy = $x * $y' or mlr put '$counter = NR'.
The $name syntax is straight from awk’s $1 $2 $3 (adapted to name-based indexing), as are the variables FS, OFS, RS, ORS, NF, NR, and FILENAME. The ENV[...] syntax is from Ruby.
While awk functions are record-based, Miller subcommands (or verbs) are stream-based: each of them maps a stream of records into another stream of records.
Like awk, Miller (as of v5.0.0) allows you to define new functions within its put and filter expression language. Further programmability comes from chaining with then.
As with awk, $-variables are stream variables and all verbs (such as cut, stats1, put, etc.) as well as put/filter statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables NF, NR, etc. change from one line to another, $x is a label for field x in the current record, and the input to sqrt($x) changes from one record to the next. The expression language for the put and filter verbs additionally allows you to define begin {...} and end {...} blocks for actions to be taken before and after records are processed, respectively.
As with awk, Miller’s put/filter language lets you set @sum=0 before records are read, then update that sum on each record, then print its value at the end. Unlike awk, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with @, such as @sum) and variables which are local to the current expression (names starting without @, such as sum).
Miller can be faster than awk, cut, and so on, depending on platform; see also Performance. In particular, Miller’s DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.
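The awk parallels above can be sketched side by side. This is a minimal sketch rather than a canonical recipe, assuming two small CSV files like those in the first section are written to a scratch directory; the bc field name and the tmp and awk_sum variable names are made up for illustration.

```shell
# Sample inputs in a scratch directory (paths are illustrative).
tmp=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmp/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmp/b.csv"

# awk: positional fields, header lines skipped by hand, total printed in END.
awk_sum=$(awk -F, 'FNR > 1 { sum += $2 } END { print sum }' "$tmp/a.csv" "$tmp/b.csv")
echo "$awk_sum"

# Miller: named fields. @sum has extent across all records, and -q suppresses
# the per-record output so that only the emitted total is written.
mlr --csv put -q '
  begin { @sum = 0 }
  @sum += $b;
  end { emit @sum }
' "$tmp/a.csv" "$tmp/b.csv"

# filter keeps the matching records; chaining with "then" passes them to put,
# which adds a new field computed from existing ones.
mlr --csv filter '$b > 4' then put '$bc = $b * $c' "$tmp/a.csv" "$tmp/b.csv"
```

The awk version has to know that b is the second column and that each file starts with a header line; the Miller version refers to the field by name and leaves header handling to the --csv flag.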
See also
See Verbs reference for more on Miller’s subcommands cat, cut, head, sort, tac, tail, top, and uniq, as well as DSL reference for more on the awk-like mlr filter and mlr put.