Miller features in the context of the Unix toolkit
File-format awareness
Miller respects CSV headers. If you run mlr --csv cat *.csv, the header line is written only once:
$ cat a.csv
a,b,c
1,2,3
4,5,6
$ cat b.csv
a,b,c
7,8,9
$ mlr --csv cat a.csv b.csv
a,b,c
1,2,3
4,5,6
7,8,9
$ mlr --csv sort -nr b a.csv b.csv
a,b,c
7,8,9
4,5,6
1,2,3
Likewise with mlr sort, mlr tac, and so on.
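If you'd like to see the header-once behavior spelled out, here is a minimal Python sketch of the same idea. It is not how Miller is implemented. It assumes every input file has the same header line and makes no attempt to handle files whose headers differ. The in-memory FILES dict is a hypothetical stand-in for a.csv and b.csv, with b.csv's contents inferred from the cat output above.

```python
import csv
import io

# Hypothetical in-memory stand-ins for a.csv and b.csv.
FILES = {
    "a.csv": "a,b,c\n1,2,3\n4,5,6\n",
    "b.csv": "a,b,c\n7,8,9\n",
}


def read_records(names):
    """Yield one dict per data row, across all files, in order."""
    for name in names:
        # DictReader consumes each file's own header line as the field names.
        yield from csv.DictReader(io.StringIO(FILES[name]))


def write_csv(records):
    """Write records as CSV, emitting the header only once, before the first row."""
    out = io.StringIO()
    writer = None
    for rec in records:
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(rec), lineterminator="\n")
            writer.writeheader()
        writer.writerow(rec)
    return out.getvalue()


# Analogue of: mlr --csv cat a.csv b.csv
cat_output = write_csv(read_records(["a.csv", "b.csv"]))

# Analogue of: mlr --csv sort -nr b a.csv b.csv
# Sorting needs the whole stream before it can emit anything, so it is materialized first.
sorted_output = write_csv(
    sorted(read_records(["a.csv", "b.csv"]), key=lambda r: float(r["b"]), reverse=True)
)

print(cat_output, end="")
print(sorted_output, end="")
```

Because the header comes from the first record's keys rather than from any one file, concatenating files and reordering records both leave exactly one header line at the top.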
awk-like features: mlr filter and mlr put
mlr filter includes/excludes records based on a filter
expression, e.g. mlr filter '$count > 10'.
mlr put adds a new field as a function of others, e.g. mlr
put '$xy = $x * $y' or mlr put '$counter = NR'.
The $name syntax is straight from awk’s $1, $2,
$3 (adapted to name-based indexing), as are the variables FS,
OFS, RS, ORS, NF, NR, and
FILENAME.
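To make the record-at-a-time model concrete, here is a small Python sketch in which a record is a dict keyed by field name, so rec["count"] plays the role of $count. filter_records and put_records are hypothetical helper names for this illustration, not Miller APIs. The enumerate counter stands in for NR.

```python
from typing import Callable, Dict, Iterable, Iterator

# A record maps field names to values; Miller's $name reads a field by name.
Record = Dict[str, float]


def filter_records(pred: Callable[[Record], bool],
                   records: Iterable[Record]) -> Iterator[Record]:
    """Pass through only the records for which pred returns True."""
    for rec in records:
        if pred(rec):
            yield rec


def put_records(assign: Callable[[Record, int], None],
                records: Iterable[Record]) -> Iterator[Record]:
    """Let assign add or overwrite fields on a copy of each record.

    The second argument is the 1-based record number, playing the role of NR.
    """
    for nr, rec in enumerate(records, start=1):
        out = dict(rec)  # copy so the input record is left untouched
        assign(out, nr)
        yield out


data = [
    {"count": 5, "x": 2, "y": 3},
    {"count": 12, "x": 4, "y": 5},
    {"count": 30, "x": 1, "y": 7},
]

# Analogue of: mlr filter '$count > 10'
big = list(filter_records(lambda r: r["count"] > 10, data))

# Analogue of: mlr put '$xy = $x * $y'  (dict.update returns None, as assign expects)
with_xy = list(put_records(lambda r, nr: r.update(xy=r["x"] * r["y"]), data))

# Analogue of: mlr put '$counter = NR'
with_counter = list(put_records(lambda r, nr: r.update(counter=nr), data))
```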
While awk functions are record-based, Miller subcommands (or
functions, if you like) are stream-based: each of them maps a stream of records
into another stream of records.
Unlike awk, Miller doesn’t allow you to define new functions.
Its domain-specific languages are limited to the filter and
put syntax. Further programmability comes from chaining subcommands with
then.
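Chaining with then is easiest to picture as function composition over streams. In this Python sketch, with hypothetical names throughout, each verb takes an iterator of records and returns a new iterator. then_chain wires the verbs left to right, roughly like mlr filter '$x > 0' then put '$x2 = $x * 2' then head -n 2. Generators keep the processing record-at-a-time instead of loading everything up front.

```python
from functools import reduce
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, float]
# A verb maps a stream of records into another stream of records.
Verb = Callable[[Iterable[Record]], Iterator[Record]]


def then_chain(*verbs: Verb) -> Verb:
    """Compose verbs left to right: each verb's output stream feeds the next."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return reduce(lambda stream, verb: verb(stream), verbs, iter(records))
    return run


def keep_positive_x(records: Iterable[Record]) -> Iterator[Record]:
    # cf. filter '$x > 0'
    return (rec for rec in records if rec["x"] > 0)


def add_x2(records: Iterable[Record]) -> Iterator[Record]:
    # cf. put '$x2 = $x * 2'; builds a new dict rather than mutating the input
    return ({**rec, "x2": rec["x"] * 2} for rec in records)


def first_two(records: Iterable[Record]) -> Iterator[Record]:
    # cf. head -n 2
    return islice(records, 2)


pipeline = then_chain(keep_positive_x, add_x2, first_two)
data = [{"x": 3}, {"x": -1}, {"x": 5}, {"x": 7}]
result = list(pipeline(data))
```

Each verb only knows about the stream it receives. That is why new behavior comes from adding another link to the chain rather than from user-defined functions.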
Unlike with awk, all variables are stream variables and all
functions are stream functions. This means NF, NR, etc.
change from one record to another, $x is a label for field x in
the current record, and the input to sqrt($x) changes from one record
to the next. Miller doesn’t let you set sum=0 before
records are read, then update that sum on each record, then print its value at the
end. (However, do see mlr step -a rsum on the
Reference page.)
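Here is a minimal sketch of why a running sum still fits the stream model. The accumulator lives inside the verb, not in user code, and each record leaves carrying the total so far, so the last record carries the grand total. This is a Python illustration of the idea behind step -a rsum, not its implementation. The step_rsum name and the x_rsum output field are choices made for this sketch, so check the Reference page for the exact field name Miller emits.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, float]


def step_rsum(field: str, records: Iterable[Record]) -> Iterator[Record]:
    """Attach a running sum of `field` to each record as it streams past.

    The "<field>_rsum" output name is an illustrative choice for this sketch.
    """
    total = 0.0  # state owned by the verb, carried from one record to the next
    for rec in records:
        total += rec[field]
        yield {**rec, f"{field}_rsum": total}


rows = [{"x": 1.0}, {"x": 2.5}, {"x": 4.0}]
out = list(step_rsum("x", rows))
grand_total = out[-1]["x_rsum"]  # the final record carries the overall sum
```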
Miller is faster than awk, cut, and so on (depending on
platform; see also Performance). In
particular, Miller’s DSL syntax is parsed into C control structures at
startup time, with the bulk data-stream processing all done in C.
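The parse-once, run-many design can be sketched in a few lines. The toy grammar below, which accepts only "$out = $left OP $right", is hypothetical and far smaller than Miller's DSL. Python closures stand in for the C control structures. The point is only that the expression text is parsed a single time, after which each record costs a few field lookups and one arithmetic call.

```python
import operator
import re
from typing import Callable, Dict

Record = Dict[str, float]

# Hypothetical toy grammar: "$out = $left OP $right" with OP one of + - * /.
_ASSIGN = re.compile(r"^\s*\$(\w+)\s*=\s*\$(\w+)\s*([-+*/])\s*\$(\w+)\s*$")
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def compile_assignment(src: str) -> Callable[[Record], None]:
    """Parse src once at startup and return a closure that runs per record."""
    m = _ASSIGN.match(src)
    if m is None:
        raise ValueError(f"unsupported expression: {src!r}")
    out, left, op, right = m.groups()
    fn = _OPS[op]  # operator resolved once, not on every record

    def apply(rec: Record) -> None:
        rec[out] = fn(rec[left], rec[right])

    return apply


put_xy = compile_assignment("$xy = $x * $y")  # parsing happens here, once
records = [{"x": 2.0, "y": 3.0}, {"x": 4.0, "y": 0.5}]
for rec in records:
    put_xy(rec)  # per-record work: field lookups and one multiply
```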
See also
See Reference for more on Miller’s
subcommands cat, cut, head, sort,
tac, tail, top, and uniq, as well as the awk-like
mlr filter and mlr put.