Sample CSV data file:
$ cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
mlr cat is like cat ...
$ mlr --csv cat example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
... but it can also do format conversion (here, to pretty-printed tabular format):
$ mlr --icsv --opprint cat example.csv
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
mlr head and mlr tail count records rather than lines. The CSV header is included either way:
$ mlr --csv head -n 4 example.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
$ mlr --csv tail -n 4 example.csv
color,shape,flag,index,quantity,rate
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
Sort primarily alphabetically on one field, then secondarily
numerically descending on another field:
$ mlr --icsv --opprint sort -f shape -nr index example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
yellow circle 1 73 63.9785 4.2370
red circle 1 16 13.8103 2.9010
purple square 0 91 72.3735 8.2430
red square 0 64 77.1991 9.5310
red square 0 48 77.5542 7.4670
red square 1 15 79.2778 0.0130
purple triangle 0 65 80.1405 5.8240
purple triangle 0 51 81.2290 8.5910
yellow triangle 1 11 43.6498 9.8870
Use cut to retain only specified fields, in input-data order:
$ mlr --icsv --opprint cut -f flag,shape example.csv
shape flag
triangle 1
square 1
circle 1
square 0
triangle 0
square 0
triangle 0
circle 1
circle 1
square 0
Use cut -o to retain only specified fields, in your specified order:
$ mlr --icsv --opprint cut -o -f flag,shape example.csv
flag shape
1 triangle
1 square
1 circle
0 square
0 triangle
0 square
0 triangle
1 circle
1 circle
0 square
Use cut -x to omit specified fields:
$ mlr --icsv --opprint cut -x -f flag,shape example.csv
color index quantity rate
yellow 11 43.6498 9.8870
red 15 79.2778 0.0130
red 16 13.8103 2.9010
red 48 77.5542 7.4670
purple 51 81.2290 8.5910
red 64 77.1991 9.5310
purple 65 80.1405 5.8240
yellow 73 63.9785 4.2370
yellow 87 63.5058 8.3350
purple 91 72.3735 8.2430
Use filter to retain specified records:
$ mlr --icsv --opprint filter '$color == "red"' example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
red square 0 64 77.1991 9.5310
$ mlr --icsv --opprint filter '$color == "red" && $flag == 1' example.csv
color shape flag index quantity rate
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
Use put to add or replace fields which are computed from other fields:
$ mlr --icsv --opprint put '$ratio = $quantity / $rate; $color_shape = $color . "_" . $shape' example.csv
color shape flag index quantity rate ratio color_shape
yellow triangle 1 11 43.6498 9.8870 4.414868 yellow_triangle
red square 1 15 79.2778 0.0130 6098.292308 red_square
red circle 1 16 13.8103 2.9010 4.760531 red_circle
red square 0 48 77.5542 7.4670 10.386260 red_square
purple triangle 0 51 81.2290 8.5910 9.455127 purple_triangle
red square 0 64 77.1991 9.5310 8.099790 red_square
purple triangle 0 65 80.1405 5.8240 13.760388 purple_triangle
yellow circle 1 73 63.9785 4.2370 15.099953 yellow_circle
yellow circle 1 87 63.5058 8.3350 7.619172 yellow_circle
purple square 0 91 72.3735 8.2430 8.779995 purple_square
Even though Miller’s main selling point is name-indexing, sometimes you really want to refer to a field by its positional index. Use $[[3]] to access the name of field 3, or $[[[3]]] to access the value of field 3:
$ mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv
color shape NEW index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
$ mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv
color shape flag index quantity rate
yellow triangle NEW 11 43.6498 9.8870
red square NEW 15 79.2778 0.0130
red circle NEW 16 13.8103 2.9010
red square NEW 48 77.5542 7.4670
purple triangle NEW 51 81.2290 8.5910
red square NEW 64 77.1991 9.5310
purple triangle NEW 65 80.1405 5.8240
yellow circle NEW 73 63.9785 4.2370
yellow circle NEW 87 63.5058 8.3350
purple square NEW 91 72.3735 8.2430
JSON output:
$ mlr --icsv --ojson put '$ratio = $quantity/$rate; $shape = toupper($shape)' example.csv
{ "color": "yellow", "shape": "TRIANGLE", "flag": 1, "index": 11, "quantity": 43.6498, "rate": 9.8870, "ratio": 4.414868 }
{ "color": "red", "shape": "SQUARE", "flag": 1, "index": 15, "quantity": 79.2778, "rate": 0.0130, "ratio": 6098.292308 }
{ "color": "red", "shape": "CIRCLE", "flag": 1, "index": 16, "quantity": 13.8103, "rate": 2.9010, "ratio": 4.760531 }
{ "color": "red", "shape": "SQUARE", "flag": 0, "index": 48, "quantity": 77.5542, "rate": 7.4670, "ratio": 10.386260 }
{ "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 51, "quantity": 81.2290, "rate": 8.5910, "ratio": 9.455127 }
{ "color": "red", "shape": "SQUARE", "flag": 0, "index": 64, "quantity": 77.1991, "rate": 9.5310, "ratio": 8.099790 }
{ "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 65, "quantity": 80.1405, "rate": 5.8240, "ratio": 13.760388 }
{ "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 73, "quantity": 63.9785, "rate": 4.2370, "ratio": 15.099953 }
{ "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350, "ratio": 7.619172 }
{ "color": "purple", "shape": "SQUARE", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430, "ratio": 8.779995 }
JSON output with vertical-formatting flags:
$ mlr --icsv --ojson --jvstack --jlistwrap tail -n 2 example.csv
[
{
"color": "yellow",
"shape": "circle",
"flag": 1,
"index": 87,
"quantity": 63.5058,
"rate": 8.3350
}
,{
"color": "purple",
"shape": "square",
"flag": 0,
"index": 91,
"quantity": 72.3735,
"rate": 8.2430
}
]
Use then to pipe commands together. Also, the -g option for many Miller commands is for group-by: here, head -n 1 -g shape outputs the first record for each distinct value of the shape field. This means we’re finding the record with the highest index for each distinct shape:
$ mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv
color shape flag index quantity rate
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
purple triangle 0 65 80.1405 5.8240
Statistics can be computed with or without group-by field(s). Also, the first of these two examples uses the --oxtab output format, which is a nice alternative to --opprint when you have lots of columns:
$ mlr --icsv --oxtab --from example.csv stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate
rate_p0 0.013000
rate_p10 2.901000
rate_p25 4.237000
rate_p50 8.243000
rate_p75 8.591000
rate_p90 9.887000
rate_p99 9.887000
rate_p100 9.887000
$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape
shape quantity_count quantity_min quantity_mean quantity_max
triangle 3 43.649800 68.339767 81.229000
square 4 72.373500 76.601150 79.277800
circle 3 13.810300 47.098200 63.978500
$ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape,color
shape color quantity_count quantity_min quantity_mean quantity_max
triangle yellow 1 43.649800 43.649800 43.649800
square red 3 77.199100 78.010367 79.277800
circle red 1 13.810300 13.810300 13.810300
triangle purple 2 80.140500 80.684750 81.229000
circle yellow 2 63.505800 63.742150 63.978500
square purple 1 72.373500 72.373500 72.373500
Often we want to print output to the screen; Miller does this by default, as we’ve seen in the previous examples. Sometimes we want to print output to another file instead: just use '> outputfilenamegoeshere' at the end of your command:
% mlr --icsv --opprint cat example.csv > newfile.csv
# Output goes to the new file;
# nothing is printed to the screen.
% cat newfile.csv
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
Other times we just want our files to be changed in-place:
just use 'mlr -I'.
% cp example.csv newfile.txt
% cat newfile.txt
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
red,square,1,15,79.2778,0.0130
red,circle,1,16,13.8103,2.9010
red,square,0,48,77.5542,7.4670
purple,triangle,0,51,81.2290,8.5910
red,square,0,64,77.1991,9.5310
purple,triangle,0,65,80.1405,5.8240
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
purple,square,0,91,72.3735,8.2430
% mlr -I --icsv --opprint cat newfile.txt
% cat newfile.txt
color shape flag index quantity rate
yellow triangle 1 11 43.6498 9.8870
red square 1 15 79.2778 0.0130
red circle 1 16 13.8103 2.9010
red square 0 48 77.5542 7.4670
purple triangle 0 51 81.2290 8.5910
red square 0 64 77.1991 9.5310
purple triangle 0 65 80.1405 5.8240
yellow circle 1 73 63.9785 4.2370
yellow circle 1 87 63.5058 8.3350
purple square 0 91 72.3735 8.2430
Also, using mlr -I you can bulk-operate on lots of files, e.g.
mlr -I --csv cut -x -f unwanted_column_name *.csv
If you like, you can first copy off your original data somewhere else, before doing in-place operations.
Lastly, using tee within put, you can split your input data into separate files per one or more field names:
$ mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*'
$ cat circle.csv
color,shape,flag,index,quantity,rate
red,circle,1,16,13.8103,2.9010
yellow,circle,1,73,63.9785,4.2370
yellow,circle,1,87,63.5058,8.3350
$ cat square.csv
color,shape,flag,index,quantity,rate
red,square,1,15,79.2778,0.0130
red,square,0,48,77.5542,7.4670
red,square,0,64,77.1991,9.5310
purple,square,0,91,72.3735,8.2430
$ cat triangle.csv
color,shape,flag,index,quantity,rate
yellow,triangle,1,11,43.6498,9.8870
purple,triangle,0,51,81.2290,8.5910
purple,triangle,0,65,80.1405,5.8240
What’s a CSV file, really? It’s an array of rows, or records, each being a list of key-value pairs, or fields: for CSV it so happens that all the keys are shared in the header line, and the values vary data line by data line.
For example, if you have
shape,flag,index
circle,1,24
square,0,36
then that’s a way of saying
shape=circle,flag=1,index=24
shape=square,flag=0,index=36
Data written this way are called DKVP, for delimited key-value pairs.
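This records-as-key-value-pairs model can be sketched in a few lines of Python (an illustration of the data model only, not Miller’s implementation):

```python
# A CSV file is a header line plus data lines; zipping each data
# line with the header yields a record of key-value pairs.
csv_lines = ["shape,flag,index", "circle,1,24", "square,0,36"]

header = csv_lines[0].split(",")
records = [dict(zip(header, line.split(","))) for line in csv_lines[1:]]

# The same records, written out as DKVP lines:
for record in records:
    print(",".join(f"{key}={value}" for key, value in record.items()))
```

This prints shape=circle,flag=1,index=24 and shape=square,flag=0,index=36.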
We’ve also already seen other ways to write the same data:
CSV:
shape,flag,index
circle,1,24
square,0,36

PPRINT:
shape  flag index
circle 1    24
square 0    36

DKVP:
shape=circle,flag=1,index=24
shape=square,flag=0,index=36

XTAB:
shape circle
flag  1
index 24

shape square
flag  0
index 36

JSON:
[
  {
    "shape": "circle",
    "flag": 1,
    "index": 24
  },
  {
    "shape": "square",
    "flag": 0,
    "index": 36
  }
]
Anything we can do with CSV input data, we can do with any
other format input data. And you can read from one format, do any
record-processing, and output to the same format as the input, or to a
different output format.
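As intuition for that read-one-format, write-another flow, here is a toy Python sketch using only the standard library (Miller itself additionally handles type inference, quoting, and many more formats):

```python
import csv
import io
import json

# Toy converter: CSV text in, one JSON object per record out.
csv_text = "shape,flag,index\ncircle,1,24\nsquare,0,36\n"

for record in csv.DictReader(io.StringIO(csv_text)):
    print(json.dumps(record))
```

Note that in this sketch all values stay as strings, whereas Miller infers numeric types.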
I like to produce SQL-query output with header-column and tab delimiter: this is CSV but with a tab instead of a comma, also known as TSV. Then I post-process with mlr --tsv or mlr --tsvlite. This means I can do some (or all, or none) of my data processing within SQL queries, and some (or none, or all) of my data processing using Miller — whichever is most convenient for my needs at the moment.
For example, using default output formatting in mysql we get formatting like Miller’s --opprint --barred:
$ mysql --database=mydb -e 'show columns in mytable'
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | MUL | NULL | |
| category | varchar(256) | NO | | NULL | |
| is_permanent | tinyint(1) | NO | | NULL | |
| assigned_to | bigint(20) | YES | | NULL | |
| last_update_time | int(11) | YES | | NULL | |
+------------------+--------------+------+-----+---------+-------+
Using mysql’s -B flag we get TSV output:
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --opprint cat
Field Type Null Key Default Extra
id bigint(20) NO MUL NULL -
category varchar(256) NO - NULL -
is_permanent tinyint(1) NO - NULL -
assigned_to bigint(20) YES - NULL -
last_update_time int(11) YES - NULL -
Since Miller handles TSV output, we can do as much or as little processing as we want in the SQL query, then send the rest on to Miller. This includes outputting as JSON, doing further selects/joins in Miller, doing stats, etc.
$ mysql --database=mydb -B -e 'show columns in mytable' | mlr --itsvlite --ojson --jlistwrap --jvstack cat
[
{
"Field": "id",
"Type": "bigint(20)",
"Null": "NO",
"Key": "MUL",
"Default": "NULL",
"Extra": ""
},
{
"Field": "category",
"Type": "varchar(256)",
"Null": "NO",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "is_permanent",
"Type": "tinyint(1)",
"Null": "NO",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "assigned_to",
"Type": "bigint(20)",
"Null": "YES",
"Key": "",
"Default": "NULL",
"Extra": ""
},
{
"Field": "last_update_time",
"Type": "int(11)",
"Null": "YES",
"Key": "",
"Default": "NULL",
"Extra": ""
}
]
$ mysql --database=mydb -B -e 'select * from mytable' > query.tsv
$ mlr --from query.tsv --t2p stats1 -a count -f id -g category,assigned_to
category assigned_to id_count
special 10000978 207
special 10003924 385
special 10009872 168
standard 10000978 524
standard 10003924 392
standard 10009872 108
...
Again, all the examples in the CSV section apply here — just change the input-format
flags.
Another of my favorite use-cases for Miller is doing ad-hoc processing of log-file data. Here’s where DKVP format really shines: for one, since the field names and field values are present on every line, every line stands on its own — that means you can grep it, or what have you. For another, not every line needs to have the same list of field names (“schema”).
Again, all the examples in the CSV section apply here — just change
the input-format flags. But there’s more you can do when not all the
records have the same shape.
When you write a program — in any language whatsoever — you can have it print log lines as it goes along, with items for various events jumbled together. After the program has finished running, you can sort it all out, filter it, analyze it, and learn from it.
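Such a program’s logging might look like this minimal Python sketch (the dkvp helper is a hypothetical name for illustration, not part of any library):

```python
def dkvp(**fields):
    """Format one event as a DKVP line: delimited key-value pairs."""
    return ",".join(f"{key}={value}" for key, value in fields.items())

# Each print carries only local information, and events of
# different shapes ("schemas") can be freely interleaved.
print(dkvp(op="enter", time=1472819681))
print(dkvp(op="cache", type="A9", hit=0))
print(dkvp(time=1472819690, batch_size=100, num_filtered=237))
```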
Suppose your program has printed something like this:
$ cat log.txt
op=enter,time=1472819681
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819690,batch_size=100,num_filtered=237
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A1,hit=1
time=1472819705,batch_size=100,num_filtered=348
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
time=1472819713,batch_size=100,num_filtered=493
op=cache,type=A9,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=1
time=1472819720,batch_size=100,num_filtered=554
op=cache,type=A1,hit=0
op=cache,type=A4,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=0
op=cache,type=A4,hit=0
op=cache,type=A9,hit=0
time=1472819736,batch_size=100,num_filtered=612
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
op=cache,type=A4,hit=1
op=cache,type=A1,hit=1
op=cache,type=A9,hit=0
op=cache,type=A9,hit=0
time=1472819742,batch_size=100,num_filtered=728
Each print statement simply contains local information: the current timestamp, whether a particular cache was hit or not, etc. Then, using either the system grep command, or Miller’s having-fields, or is_present, we can pick out the parts we want and analyze them:
$ grep op=cache log.txt \
| mlr --idkvp --opprint stats1 -a mean -f hit -g type then sort -f type
type hit_mean
A1 0.857143
A4 0.714286
A9 0.090909
$ mlr --from log.txt --opprint \
filter 'is_present($batch_size)' \
then step -a delta -f time,num_filtered \
then sec2gmt time
time batch_size num_filtered time_delta num_filtered_delta
2016-09-02T12:34:50Z 100 237 0 0
2016-09-02T12:35:05Z 100 348 15 111
2016-09-02T12:35:13Z 100 493 8 145
2016-09-02T12:35:20Z 100 554 7 61
2016-09-02T12:35:36Z 100 612 16 58
2016-09-02T12:35:42Z 100 728 6 116
Alternatively, we can simply group the similar data for a better look:
$ mlr --opprint group-like log.txt
op time
enter 1472819681
op type hit
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A1 1
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A9 1
cache A1 1
cache A9 0
cache A9 0
cache A9 1
cache A1 0
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A4 0
cache A4 0
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A9 0
time batch_size num_filtered
1472819690 100 237
1472819705 100 348
1472819713 100 493
1472819720 100 554
1472819736 100 612
1472819742 100 728
$ mlr --opprint group-like then sec2gmt time log.txt
op time
enter 2016-09-02T12:34:41Z
op type hit
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A1 1
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A9 1
cache A1 1
cache A9 0
cache A9 0
cache A9 1
cache A1 0
cache A4 1
cache A9 0
cache A9 0
cache A9 0
cache A4 0
cache A4 0
cache A9 0
cache A1 1
cache A9 0
cache A9 0
cache A9 0
cache A9 0
cache A4 1
cache A1 1
cache A9 0
cache A9 0
time batch_size num_filtered
2016-09-02T12:34:50Z 100 237
2016-09-02T12:35:05Z 100 348
2016-09-02T12:35:13Z 100 493
2016-09-02T12:35:20Z 100 554
2016-09-02T12:35:36Z 100 612
2016-09-02T12:35:42Z 100 728