.. PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE. Unix-toolkit context ================================================================ How does Miller fit within the Unix toolkit (`grep`, `sed`, `awk`, etc.)? File-format awareness ---------------------------------------------------------------- Miller respects CSV headers. If you do ``mlr --csv cat *.csv`` then the header line is written once: :: $ cat data/a.csv a,b,c 1,2,3 4,5,6 :: $ cat data/b.csv a,b,c 7,8,9 :: $ mlr --csv cat data/a.csv data/b.csv a,b,c 1,2,3 4,5,6 7,8,9 :: $ mlr --csv sort -nr b data/a.csv data/b.csv a,b,c 7,8,9 4,5,6 1,2,3 Likewise with ``mlr sort``, ``mlr tac``, and so on. awk-like features: mlr filter and mlr put ---------------------------------------------------------------- * ``mlr filter`` includes/excludes records based on a filter expression, e.g. ``mlr filter '$count > 10'``. * ``mlr put`` adds a new field as a function of others, e.g. ``mlr put '$xy = $x * $y'`` or ``mlr put '$counter = NR'``. * The ``$name`` syntax is straight from ``awk``'s ``$1 $2 $3`` (adapted to name-based indexing), as are the variables ``FS``, ``OFS``, ``RS``, ``ORS``, ``NF``, ``NR``, and ``FILENAME``. The ``ENV[...]`` syntax is from Ruby. * While ``awk`` functions are record-based, Miller subcommands (or *verbs*) are stream-based: each of them maps a stream of records into another stream of records. * Like ``awk``, Miller (as of v5.0.0) allows you to define new functions within its ``put`` and ``filter`` expression language. Further programmability comes from chaining with ``then``. * As with ``awk``, ``$``-variables are stream variables and all verbs (such as ``cut``, ``stats1``, ``put``, etc.) as well as ``put``/``filter`` statements operate on streams. This means that you define actions to be done on each record and then stream your data through those actions. The built-in variables ``NF``, ``NR``, etc. change from one line to another, ``$x`` is a label for field ``x`` in the current record, and the input to ``sqrt($x)`` changes from one record to the next. The expression language for the ``put`` and ``filter`` verbs additionally allows you to define ``begin {...}`` and ``end {...}`` blocks for actions to be taken before and after records are processed, respectively. * As with ``awk``, Miller's ``put``/``filter`` language lets you set ``@sum=0`` before records are read, then update that sum on each record, then print its value at the end. Unlike ``awk``, Miller makes syntactically explicit the difference between variables with extent across all records (names starting with ``@``, such as ``@sum``) and variables which are local to the current expression (names starting without ``@``, such as ``sum``). * Miller can be faster than ``awk``, ``cut``, and so on, depending on platform; see also :doc:`performance`. In particular, Miller's DSL syntax is parsed into C control structures at startup time, with the bulk data-stream processing all done in C. See also ---------------------------------------------------------------- See :doc:`reference-verbs` for more on Miller's subcommands ``cat``, ``cut``, ``head``, ``sort``, ``tac``, ``tail``, ``top``, and ``uniq``, as well as :doc:`reference-dsl` for more on the awk-like ``mlr filter`` and ``mlr put``.