Unix-toolkit context
How does Miller fit within the Unix toolkit (grep, sed, awk, etc.)?
File-format awareness
Miller respects CSV headers. If you do mlr --csv cat *.csv then the header line is written once:
$ cat data/a.csv
a,b,c
1,2,3
4,5,6
$ cat data/b.csv
a,b,c
7,8,9
$ mlr --csv cat data/a.csv data/b.csv
a,b,c
1,2,3
4,5,6
7,8,9
$ mlr --csv sort -nr b data/a.csv data/b.csv
a,b,c
7,8,9
4,5,6
1,2,3
Likewise with mlr sort, mlr tac, and so on.
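For contrast, a tool that is not format-aware, such as plain cat, repeats the header once per input file. The sketch below recreates the two sample files in a scratch directory (the directory and the variable names tmpdir, plain, and merged are illustrative and not part of Miller) and shows the manual header-skipping an awk one-liner needs to get the same merged result. It assumes every input file starts with the same header line.

```shell
# Recreate the sample files from above in a scratch directory.
tmpdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmpdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmpdir/b.csv"

# Plain cat knows nothing about CSV, so the header appears once per file.
plain=$(cat "$tmpdir/a.csv" "$tmpdir/b.csv")

# Hand-rolled awk: keep line 1 of the first file, drop line 1 of later files.
# FNR restarts at 1 for each file; NR counts lines across all files.
merged=$(awk 'FNR == 1 && NR > 1 { next } { print }' "$tmpdir/a.csv" "$tmpdir/b.csv")

printf '%s\n' "$plain"
echo '---'
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```

The awk version only works because the header-skipping rule was written out by hand; with mlr --csv the same bookkeeping comes from the --csv flag.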
awk-like features: mlr filter and mlr put
mlr filter includes/excludes records based on a filter expression, e.g. mlr filter '$count > 10'.
mlr put adds a new field as a function of others, e.g. mlr put '$xy = $x * $y' or mlr put '$counter = NR'.
The $name syntax is straight from awk’s $1 $2 $3 (adapted to name-based indexing), as are the variables FS, OFS, RS, ORS, NF, NR, and FILENAME. The ENV[...] syntax is from Ruby.
While awk functions are record-based, Miller subcommands (or verbs) are stream-based: each of them maps a stream of records into another stream of records.
Like awk, Miller (as of v5.0.0) allows you to define new functions within its put and filter expression language. Further programmability comes from chaining with then.
As with awk, $-variables are stream variables and all verbs (such as cut, stats1, put, etc.) as well as put/filter statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables NF, NR, etc. change from one record to another, $x is a label for field x in the current record, and the input to sqrt($x) changes from one record to the next. The expression language for the put and filter verbs additionally allows you to define begin {...} and end {...} blocks for actions to be taken before and after records are processed, respectively.
As with awk, Miller’s put/filter language lets you set @sum=0 before records are read, then update that sum on each record, then print its value at the end. Unlike awk, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with @, such as @sum) and variables which are local to the current expression (names not starting with @, such as sum).
Miller can be faster than awk, cut, and so on, depending on platform; see also Performance. In particular, Miller’s DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C.
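The parallels with awk can be sketched side by side on the sample data from earlier on this page. This is a minimal illustration, not a canonical recipe: the scratch directory, the shell variables awk_sum and awk_rows, and the output field name ab are all made up for the example, and the Miller commands run only if mlr is found on the PATH.

```shell
# Scratch copies of the sample data used earlier on this page.
tmpdir=$(mktemp -d)
printf 'a,b,c\n1,2,3\n4,5,6\n' > "$tmpdir/a.csv"
printf 'a,b,c\n7,8,9\n' > "$tmpdir/b.csv"

# awk: fields are positional ($2 is column b), and the header line of
# each file has to be skipped by hand with FNR > 1.
awk_sum=$(awk -F, 'BEGIN { sum = 0 } FNR > 1 { sum += $2 } END { print sum }' \
  "$tmpdir/a.csv" "$tmpdir/b.csv")
echo "awk sum of column 2: $awk_sum"

# awk: keep rows with b > 4 and append a*b as a fourth column.
awk_rows=$(awk -F, -v OFS=, 'FNR > 1 && $2 > 4 { print $1, $2, $3, $1 * $2 }' \
  "$tmpdir/a.csv" "$tmpdir/b.csv")
printf '%s\n' "$awk_rows"

# Miller: the same two tasks using field names instead of positions.
if command -v mlr >/dev/null 2>&1; then
  # @sum persists across records; the end block emits it after the last one.
  # put -q suppresses the per-record output so only the emitted sum appears.
  mlr --csv put -q 'begin { @sum = 0 } @sum += $b; end { emit @sum }' \
    "$tmpdir/a.csv" "$tmpdir/b.csv"
  # Verbs chained with "then": filter's output stream is put's input stream.
  mlr --csv filter '$b > 4' then put '$ab = $a * $b' \
    "$tmpdir/a.csv" "$tmpdir/b.csv"
fi

rm -rf "$tmpdir"
```

In the awk commands the header skipping and the column positions are spelled out manually, whereas the Miller commands refer to $a and $b by name and leave header handling to the --csv flag.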
See also
See Verbs reference for more on Miller’s subcommands cat, cut, head, sort, tac, tail, top, and uniq, as well as DSL reference for more on the awk-like mlr filter and mlr put.