Overview
Miller handles name-indexed data using several formats: some you probably
know by name, such as CSV, TSV, and JSON, and others you’re
likely already seeing and using in your structured data.
 Examples
$ mlr --usage-data-format-examples
  DKVP: delimited key-value pairs (Miller default format)
  +---------------------+
  | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  | dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
  +---------------------+
  NIDX: implicitly numerically indexed (Unix-toolkit style)
  +---------------------+
  | the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
  | fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
  +---------------------+
  CSV/CSV-lite: comma-separated values with separate header line
  +---------------------+
  | apple,bat,cog       |
  | 1,2,3               | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +---------------------+
  Tabular JSON: nested objects are supported, although arrays within them are not:
  +---------------------+
  | {                   |
  |  "apple": 1,        | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  |  "bat": 2,          |
  |  "cog": 3           |
  | }                   |
  | {                   |
  |   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
  |     "egg": 7,       |
  |     "flint": 8      |
  |   },                |
  |   "garlic": ""      |
  | }                   |
  +---------------------+
  PPRINT: pretty-printed tabular
  +---------------------+
  | apple bat cog       |
  | 1     2   3         | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +---------------------+
  XTAB: pretty-printed transposed tabular
  +---------------------+
  | apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  | bat   2             |
  | cog   3             |
  |                     |
  | dish 7              | Record 2: "dish" => "7", "egg" => "8"
  | egg  8              |
  +---------------------+
  Markdown tabular (supported for output only):
  +-----------------------+
  | | apple | bat | cog | |
  | | ---   | --- | --- | |
  | | 1     | 2   | 3   | | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | | 4     | 5   | 6   | | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +-----------------------+
CSV/TSV/etc.
When mlr is invoked with the --csv or --csvlite option,
key names are taken from the first line of input (the header line) and values
are taken from each subsequent line.  See
Record-heterogeneity for how Miller handles
changes of field names within a single data stream.
 Miller has record separator RS and field separator FS,
just as awk does.  For TSV, use --fs tab; to convert TSV to
CSV, use --ifs tab --ofs comma, etc.  (See also
Reference.)
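As a rough illustration of the header-line model and of swapping field separators, here is a minimal Python sketch. It is not Miller's implementation; the helper names tsv_to_csv and records are hypothetical, and it leans on Python's standard csv module rather than anything Miller provides.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-emit tab-separated text as comma-separated text.

    Loosely analogous to reading with a tab input field separator and
    writing with a comma output field separator.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=",", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def records(csv_text):
    """The header line supplies the keys; each later line supplies values."""
    return list(csv.DictReader(io.StringIO(csv_text)))


csv_text = tsv_to_csv("apple\tbat\tcog\n1\t2\t3\n4\t5\t6\n")
print(csv_text, end="")
for record in records(csv_text):
    print(record)
```

Each dictionary printed at the end corresponds to one record in the sense used in the examples above: keys from the header, values from one data line.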
 The following are synonymous pairs:
 
 
 
 DKVP: Key-value pairs
Miller’s default file format is DKVP, for delimited key-value pairs. Example:
$ mlr cat data/small
a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729
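To make the DKVP record model concrete, the following Python sketch parses and re-emits one line. It is an illustration only, not Miller's code (Miller is written in C); the function names parse_dkvp and format_dkvp are hypothetical. Following the usage example earlier on this page, a field with no "=" is keyed by its position on the line.

```python
def parse_dkvp(line, ifs=",", ips="="):
    """Split one DKVP line into an ordered dict of string keys and values."""
    record = {}
    if not line:
        return record
    for position, field in enumerate(line.split(ifs), start=1):
        key, sep, value = field.partition(ips)
        if sep:
            record[key] = value
        else:
            # No pair separator: key the field by its 1-up position.
            record[str(position)] = field
    return record


def format_dkvp(record, ofs=",", ops="="):
    """Join a record back into a single DKVP line."""
    return ofs.join(f"{key}{ops}{value}" for key, value in record.items())


print(parse_dkvp("apple=1,bat=2,cog=3"))
print(parse_dkvp("dish=7,egg=8,flint"))
print(format_dkvp({"a": "pan", "b": "pan", "i": "1"}))
```

Values stay as strings here; any type inference is a separate concern from the file format.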
Such data are easy to generate, e.g. in Ruby with
puts "host=#{hostname},seconds=#{t2-t1},message=#{msg}"
puts mymap.collect{|k,v| "#{k}=#{v}"}.join(',')
or with print statements in other languages:
echo "type=3,user=$USER,date=$date\n";
logger.log("type=3,user=$USER,date=$date\n");
Field names need not be the same from one line to the next, which suits log-style output such as
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100, resource=/path/to/file
resource=/some/other/path,loadsec=0.97,ok=false
 NIDX: Index-numbered (toolkit style)
With --inidx --ifs ' ' --repifs, Miller splits lines on whitespace and
assigns integer field names starting with 1. This recapitulates Unix-toolkit
behavior.
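The splitting rule can be sketched in a few lines of Python. This is only an approximation for illustration: the name parse_nidx is hypothetical, and Python's str.split() with no arguments treats any run of whitespace (not just repeated spaces) as one separator and ignores leading and trailing whitespace.

```python
def parse_nidx(line):
    """Split on runs of whitespace and key each token by its 1-up position."""
    return {str(i): token for i, token in enumerate(line.split(), start=1)}


print(parse_nidx("the quick brown"))
print(parse_nidx("fox   jumped"))
```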
 Example with index-numbered output:
 
 
 
 Tabular JSON
JSON is a format which supports arbitrarily deep nesting of
“objects” (hashmaps) and “arrays” (lists), while Miller
is a tool for handling tabular data.
 Single-level JSON objects
An array of single-level objects is, quite simply, a table:
$ mlr --json head -n 2 then cut -f color,shape data/json-example-1.json
{ "color": "yellow", "shape": "triangle" }
{ "color": "red", "shape": "square" }
$ mlr --json --jvstack head -n 2 then cut -f color,u,v data/json-example-1.json
{
  "color": "yellow",
  "u": 0.6321695890307647,
  "v": 0.9887207810889004
}
{
  "color": "red",
  "u": 0.21966833570651523,
  "v": 0.001257332190235938
}
$ mlr --ijson --opprint stats1 -a mean,stddev,count -f u -g shape data/json-example-1.json
shape    u_mean   u_stddev u_count
triangle 0.583995 0.131184 3
square   0.409355 0.365428 4
circle   0.366013 0.209094 3
 Nested JSON objects
Additionally, Miller can tabularize nested objects by concatenating keys:
$ mlr --json --jvstack head -n 2 data/json-example-2.json
{
  "flag": 1,
  "i": 11,
  "attributes": {
    "color": "yellow",
    "shape": "triangle"
  },
  "values": {
    "u": 0.632170,
    "v": 0.988721,
    "w": 0.436498,
    "x": 5.798188
  }
}
{
  "flag": 1,
  "i": 15,
  "attributes": {
    "color": "red",
    "shape": "square"
  },
  "values": {
    "u": 0.219668,
    "v": 0.001257,
    "w": 0.792778,
    "x": 2.944117
  }
}
$ mlr --ijson --opprint head -n 4 data/json-example-2.json
flag i  attributes:color attributes:shape values:u values:v values:w values:x
1    11 yellow           triangle         0.632170 0.988721 0.436498 5.798188
1    15 red              square           0.219668 0.001257 0.792778 2.944117
1    16 red              circle           0.209017 0.290052 0.138103 5.065034
0    48 red              square           0.956274 0.746720 0.775542 7.117831
$ mlr --json --jvstack head -n 1 then put '${values:uv} = ${values:u} * ${values:v}' data/json-example-2.json
{
  "flag": 1,
  "i": 11,
  "attributes": {
    "color": "yellow",
    "shape": "triangle"
  },
  "values": {
    "u": 0.632170,
    "v": 0.988721,
    "w": 0.436498,
    "x": 5.798188,
    "uv": 0.625040
  }
}
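The key concatenation seen in the examples above can be sketched in Python as a flatten/unflatten pair. This is a hypothetical illustration, not Miller's implementation: the function names and the handling of edge cases (for instance, empty nested objects simply disappear when flattened) are assumptions.

```python
def flatten(obj, sep=":", prefix=""):
    """Turn nested dicts into one flat dict whose keys are joined with sep."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat, sep=":"):
    """Rebuild nested dicts from keys that contain sep."""
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


record = {
    "flag": 1,
    "i": 11,
    "attributes": {"color": "yellow", "shape": "triangle"},
    "values": {"u": 0.632170, "v": 0.988721},
}
flat = flatten(record)
# A new field named with the same separator lands inside "values" on output.
flat["values:uv"] = flat["values:u"] * flat["values:v"]
print(list(flat))
print(unflatten(flat)["values"])
```

Once flattened, a name such as values:u is just an ordinary field name that happens to contain a colon, which is why an assignment to values:uv ends up nested under values when the record is written back out as JSON.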
 Arrays
Arrays aren’t supported in Miller’s put/filter DSL. By default, JSON arrays are read in as integer-keyed maps. Suppose you have arrays like this in your input data:
$ cat data/json-example-3.json
{
  "label": "orange",
  "values": [12.2, 13.8, 17.2]
}
{
  "label": "purple",
  "values": [27.0, 32.4]
}
$ mlr --ijson --oxtab cat data/json-example-3.json
label    orange
values:0 12.2
values:1 13.8
values:2 17.2

label    purple
values:0 27.0
values:1 32.4
$ mlr --json --jvstack cat data/json-example-3.json
{
  "label": "orange",
  "values": {
    "0": 12.2,
    "1": 13.8,
    "2": 17.2
  }
}
{
  "label": "purple",
  "values": {
    "0": 27.0,
    "1": 32.4
  }
}
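The array-to-map conversion shown above can be mimicked with a short Python sketch. It is an assumption-laden illustration rather than Miller's parser: the function name arrays_to_maps is hypothetical, and indices start at 0 to match the output above.

```python
import json


def arrays_to_maps(value):
    """Recursively replace each list with a dict keyed by "0", "1", "2", ..."""
    if isinstance(value, list):
        return {str(i): arrays_to_maps(item) for i, item in enumerate(value)}
    if isinstance(value, dict):
        return {key: arrays_to_maps(item) for key, item in value.items()}
    return value


record = json.loads('{"label": "orange", "values": [12.2, 13.8, 17.2]}')
print(json.dumps(arrays_to_maps(record), indent=2))
```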
 Formatting JSON options
JSON isn’t a parameterized format, so RS, FS, PS aren’t specifiable. Nonetheless, you can do the following:

 JSON non-streaming
The JSON parser Miller uses does not return until all input is parsed: in particular this means that, unlike for other file formats, Miller does not (at present) handle JSON files in tail -f contexts.
 PPRINT: Pretty-printed tabular
Miller’s pretty-print format is like CSV, but column-aligned.  For example, compare
 
$ mlr --opprint --barred cat data/small
+-----+-----+---+---------------------+---------------------+
| a   | b   | i | x                   | y                   |
+-----+-----+---+---------------------+---------------------+
| pan | pan | 1 | 0.3467901443380824  | 0.7268028627434533  |
| eks | pan | 2 | 0.7586799647899636  | 0.5221511083334797  |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006  | 0.8636244699032729  |
+-----+-----+---+---------------------+---------------------+
 XTAB: Vertical tabular
This is perhaps most useful for looking at very wide and/or multi-column
data which would otherwise wrap across lines on the screen (but see also https://github.com/twosigma/ngrid
for an entirely different, very powerful option). Namely:
 
 Markdown tabular
Markdown format looks like this:
$ mlr --omd cat data/small
| a | b | i | x | y |
| --- | --- | --- | --- | --- |
| pan | pan | 1 | 0.3467901443380824 | 0.7268028627434533 |
| eks | pan | 2 | 0.7586799647899636 | 0.5221511083334797 |
| wye | wye | 3 | 0.20460330576630303 | 0.33831852551664776 |
| eks | wye | 4 | 0.38139939387114097 | 0.13418874328430463 |
| wye | pan | 5 | 0.5732889198020006 | 0.8636244699032729 |
As of Miller 4.3.0, markdown format is supported only for output, not input.
 Data-conversion keystroke-savers
While you can do format conversion using mlr --icsv --ojson cat myfile.csv,
there are also keystroke-savers for this purpose, such as mlr --c2j cat myfile.csv.
For a complete list:
$ mlr --usage-format-conversion-keystroke-saver-options
As keystroke-savers for format-conversion you may use the following:
        --c2t --c2d --c2n --c2j --c2x --c2p --c2m
        --t2c --t2d --t2n --t2j --t2x --t2p --t2m
        --d2c --d2t --d2n --d2j --d2x --d2p --d2m
        --n2c --n2t --n2d --n2j --n2x --n2p --n2m
        --j2c --j2t --j2d --j2n --j2x --j2p --j2m
        --x2c --x2t --x2d --x2n --x2j --x2p --x2m
        --p2c --p2t --p2d --p2n --p2j --p2x --p2m
The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
PPRINT, and markdown, respectively. Note that markdown format is available for
output only.
 Autodetect of line endings
 Default line endings (--irs and --ors) are 'auto'
which means