.. PLEASE DO NOT EDIT DIRECTLY. EDIT THE .rst.in FILE PLEASE. Miller in 10 minutes ==================== CSV-file examples ^^^^^^^^^^^^^^^^^ Suppose you have this CSV data file:: $ cat example.csv color,shape,flag,index,quantity,rate yellow,triangle,1,11,43.6498,9.8870 red,square,1,15,79.2778,0.0130 red,circle,1,16,13.8103,2.9010 red,square,0,48,77.5542,7.4670 purple,triangle,0,51,81.2290,8.5910 red,square,0,64,77.1991,9.5310 purple,triangle,0,65,80.1405,5.8240 yellow,circle,1,73,63.9785,4.2370 yellow,circle,1,87,63.5058,8.3350 purple,square,0,91,72.3735,8.2430 ``mlr cat`` is like cat -- it passes the data through unmodified:: $ mlr --csv cat example.csv color,shape,flag,index,quantity,rate yellow,triangle,1,11,43.6498,9.8870 red,square,1,15,79.2778,0.0130 red,circle,1,16,13.8103,2.9010 red,square,0,48,77.5542,7.4670 purple,triangle,0,51,81.2290,8.5910 red,square,0,64,77.1991,9.5310 purple,triangle,0,65,80.1405,5.8240 yellow,circle,1,73,63.9785,4.2370 yellow,circle,1,87,63.5058,8.3350 purple,square,0,91,72.3735,8.2430 but it can also do format conversion (here, you can pretty-print in tabular format):: $ mlr --icsv --opprint cat example.csv color shape flag index quantity rate yellow triangle 1 11 43.6498 9.8870 red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 red square 0 48 77.5542 7.4670 purple triangle 0 51 81.2290 8.5910 red square 0 64 77.1991 9.5310 purple triangle 0 65 80.1405 5.8240 yellow circle 1 73 63.9785 4.2370 yellow circle 1 87 63.5058 8.3350 purple square 0 91 72.3735 8.2430 ``mlr head`` and ``mlr tail`` count records rather than lines. Whethere you're getting the first few records or the last few, the CSV header is included either way:: $ mlr --csv head -n 4 example.csv color,shape,flag,index,quantity,rate yellow,triangle,1,11,43.6498,9.8870 red,square,1,15,79.2778,0.0130 red,circle,1,16,13.8103,2.9010 red,square,0,48,77.5542,7.4670 :: $ mlr --csv tail -n 4 example.csv color,shape,flag,index,quantity,rate purple,triangle,0,65,80.1405,5.8240 yellow,circle,1,73,63.9785,4.2370 yellow,circle,1,87,63.5058,8.3350 purple,square,0,91,72.3735,8.2430 You can sort primarily alphabetically on one field, then secondarily numerically descending on another field:: $ mlr --icsv --opprint sort -f shape -nr index example.csv color shape flag index quantity rate yellow circle 1 87 63.5058 8.3350 yellow circle 1 73 63.9785 4.2370 red circle 1 16 13.8103 2.9010 purple square 0 91 72.3735 8.2430 red square 0 64 77.1991 9.5310 red square 0 48 77.5542 7.4670 red square 1 15 79.2778 0.0130 purple triangle 0 65 80.1405 5.8240 purple triangle 0 51 81.2290 8.5910 yellow triangle 1 11 43.6498 9.8870 You can use ``cut`` to retain only specified fields, in the same order they appeared in the input data:: $ mlr --icsv --opprint cut -f flag,shape example.csv shape flag triangle 1 square 1 circle 1 square 0 triangle 0 square 0 triangle 0 circle 1 circle 1 square 0 You can also use ``cut -o`` to retain only specified fields in your preferred order:: $ mlr --icsv --opprint cut -o -f flag,shape example.csv flag shape 1 triangle 1 square 1 circle 0 square 0 triangle 0 square 0 triangle 1 circle 1 circle 0 square You can use ``cut -x`` to omit fields you don't care about:: $ mlr --icsv --opprint cut -x -f flag,shape example.csv color index quantity rate yellow 11 43.6498 9.8870 red 15 79.2778 0.0130 red 16 13.8103 2.9010 red 48 77.5542 7.4670 purple 51 81.2290 8.5910 red 64 77.1991 9.5310 purple 65 80.1405 5.8240 yellow 73 63.9785 4.2370 yellow 87 63.5058 8.3350 purple 91 72.3735 8.2430 You can use ``filter`` to keep only records you care about:: $ mlr --icsv --opprint filter '$color == "red"' example.csv color shape flag index quantity rate red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 red square 0 48 77.5542 7.4670 red square 0 64 77.1991 9.5310 :: $ mlr --icsv --opprint filter '$color == "red" && $flag == 1' example.csv color shape flag index quantity rate red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 You can use ``put`` to create new fields which are computed from other fields:: $ mlr --icsv --opprint put '$ratio = $quantity / $rate; $color_shape = $color . "_" . $shape' example.csv color shape flag index quantity rate ratio color_shape yellow triangle 1 11 43.6498 9.8870 4.414868 yellow_triangle red square 1 15 79.2778 0.0130 6098.292308 red_square red circle 1 16 13.8103 2.9010 4.760531 red_circle red square 0 48 77.5542 7.4670 10.386260 red_square purple triangle 0 51 81.2290 8.5910 9.455127 purple_triangle red square 0 64 77.1991 9.5310 8.099790 red_square purple triangle 0 65 80.1405 5.8240 13.760388 purple_triangle yellow circle 1 73 63.9785 4.2370 15.099953 yellow_circle yellow circle 1 87 63.5058 8.3350 7.619172 yellow_circle purple square 0 91 72.3735 8.2430 8.779995 purple_square Even though Miller's main selling point is name-indexing, sometimes you really want to refer to a field name by its positional index. Use ``$[[3]]`` to access the name of field 3 or ``$[[[3]]]`` to access the value of field 3:: $ mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv color shape NEW index quantity rate yellow triangle 1 11 43.6498 9.8870 red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 red square 0 48 77.5542 7.4670 purple triangle 0 51 81.2290 8.5910 red square 0 64 77.1991 9.5310 purple triangle 0 65 80.1405 5.8240 yellow circle 1 73 63.9785 4.2370 yellow circle 1 87 63.5058 8.3350 purple square 0 91 72.3735 8.2430 :: $ mlr --icsv --opprint put '$[[[3]]] = "NEW"' example.csv color shape flag index quantity rate yellow triangle NEW 11 43.6498 9.8870 red square NEW 15 79.2778 0.0130 red circle NEW 16 13.8103 2.9010 red square NEW 48 77.5542 7.4670 purple triangle NEW 51 81.2290 8.5910 red square NEW 64 77.1991 9.5310 purple triangle NEW 65 80.1405 5.8240 yellow circle NEW 73 63.9785 4.2370 yellow circle NEW 87 63.5058 8.3350 purple square NEW 91 72.3735 8.2430 JSON-file examples ^^^^^^^^^^^^^^^^^^ OK, CSV and pretty-print are fine. But Miller can also convert between a few other formats -- let's take a look at JSON output:: $ mlr --icsv --ojson put '$ratio = $quantity/$rate; $shape = toupper($shape)' example.csv { "color": "yellow", "shape": "TRIANGLE", "flag": 1, "index": 11, "quantity": 43.6498, "rate": 9.8870, "ratio": 4.414868 } { "color": "red", "shape": "SQUARE", "flag": 1, "index": 15, "quantity": 79.2778, "rate": 0.0130, "ratio": 6098.292308 } { "color": "red", "shape": "CIRCLE", "flag": 1, "index": 16, "quantity": 13.8103, "rate": 2.9010, "ratio": 4.760531 } { "color": "red", "shape": "SQUARE", "flag": 0, "index": 48, "quantity": 77.5542, "rate": 7.4670, "ratio": 10.386260 } { "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 51, "quantity": 81.2290, "rate": 8.5910, "ratio": 9.455127 } { "color": "red", "shape": "SQUARE", "flag": 0, "index": 64, "quantity": 77.1991, "rate": 9.5310, "ratio": 8.099790 } { "color": "purple", "shape": "TRIANGLE", "flag": 0, "index": 65, "quantity": 80.1405, "rate": 5.8240, "ratio": 13.760388 } { "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 73, "quantity": 63.9785, "rate": 4.2370, "ratio": 15.099953 } { "color": "yellow", "shape": "CIRCLE", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350, "ratio": 7.619172 } { "color": "purple", "shape": "SQUARE", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430, "ratio": 8.779995 } Or, JSON output with vertical-formatting flags:: $ mlr --icsv --ojsonx tail -n 2 example.csv { "color": "yellow", "shape": "circle", "flag": 1, "index": 87, "quantity": 63.5058, "rate": 8.3350 } { "color": "purple", "shape": "square", "flag": 0, "index": 91, "quantity": 72.3735, "rate": 8.2430 } Sorts and stats ^^^^^^^^^^^^^^^ Now suppose you want to sort the data on a given column, *and then* take the top few in that ordering. You can use Miller's ``then`` feature to pipe commands together. Here are the records with the top three ``index`` values:: $ mlr --icsv --opprint sort -f shape -nr index then head -n 3 example.csv color shape flag index quantity rate yellow circle 1 87 63.5058 8.3350 yellow circle 1 73 63.9785 4.2370 red circle 1 16 13.8103 2.9010 Lots of Miller commands take a ``-g`` option for group-by: here, ``head -n 1 -g shape`` outputs the first record for each distinct value of the ``shape`` field. This means we're finding the record with highest ``index`` field for each distinct ``shape`` field:: $ mlr --icsv --opprint sort -f shape -nr index then head -n 1 -g shape example.csv color shape flag index quantity rate yellow circle 1 87 63.5058 8.3350 purple square 0 91 72.3735 8.2430 purple triangle 0 65 80.1405 5.8240 Statistics can be computed with or without group-by field(s):: $ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape shape quantity_count quantity_min quantity_mean quantity_max triangle 3 43.649800 68.339767 81.229000 square 4 72.373500 76.601150 79.277800 circle 3 13.810300 47.098200 63.978500 :: $ mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape,color shape color quantity_count quantity_min quantity_mean quantity_max triangle yellow 1 43.649800 43.649800 43.649800 square red 3 77.199100 78.010367 79.277800 circle red 1 13.810300 13.810300 13.810300 triangle purple 2 80.140500 80.684750 81.229000 circle yellow 2 63.505800 63.742150 63.978500 square purple 1 72.373500 72.373500 72.373500 If your output has a lot of columns, you can use XTAB format to line things up vertically for you instead:: $ mlr --icsv --oxtab --from example.csv stats1 -a p0,p10,p25,p50,p75,p90,p99,p100 -f rate rate_p0 0.013000 rate_p10 2.901000 rate_p25 4.237000 rate_p50 8.243000 rate_p75 8.591000 rate_p90 9.887000 rate_p99 9.887000 rate_p100 9.887000 .. _10min-choices-for-printing-to-files: Choices for printing to files ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Often we want to print output to the screen. Miller does this by default, as we've seen in the previous examples. Sometimes we want to print output to another file: just use **> outputfilenamegoeshere** at the end of your command: :: % mlr --icsv --opprint cat example.csv > newfile.csv # Output goes to the new file; # nothing is printed to the screen. :: % cat newfile.csv color shape flag index quantity rate yellow triangle 1 11 43.6498 9.8870 red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 red square 0 48 77.5542 7.4670 purple triangle 0 51 81.2290 8.5910 red square 0 64 77.1991 9.5310 purple triangle 0 65 80.1405 5.8240 yellow circle 1 73 63.9785 4.2370 yellow circle 1 87 63.5058 8.3350 purple square 0 91 72.3735 8.2430 Other times we just want our files to be **changed in-place**: just use **mlr -I**:: % cp example.csv newfile.txt % cat newfile.txt color,shape,flag,index,quantity,rate yellow,triangle,1,11,43.6498,9.8870 red,square,1,15,79.2778,0.0130 red,circle,1,16,13.8103,2.9010 red,square,0,48,77.5542,7.4670 purple,triangle,0,51,81.2290,8.5910 red,square,0,64,77.1991,9.5310 purple,triangle,0,65,80.1405,5.8240 yellow,circle,1,73,63.9785,4.2370 yellow,circle,1,87,63.5058,8.3350 purple,square,0,91,72.3735,8.2430 % mlr -I --icsv --opprint cat newfile.txt % cat newfile.txt color shape flag index quantity rate yellow triangle 1 11 43.6498 9.8870 red square 1 15 79.2778 0.0130 red circle 1 16 13.8103 2.9010 red square 0 48 77.5542 7.4670 purple triangle 0 51 81.2290 8.5910 red square 0 64 77.1991 9.5310 purple triangle 0 65 80.1405 5.8240 yellow circle 1 73 63.9785 4.2370 yellow circle 1 87 63.5058 8.3350 purple square 0 91 72.3735 8.2430 Also using ``mlr -I`` you can bulk-operate on lots of files: e.g.:: mlr -I --csv cut -x -f unwanted_column_name *.csv If you like, you can first copy off your original data somewhere else, before doing in-place operations. Lastly, using ``tee`` within ``put``, you can split your input data into separate files per one or more field names:: $ mlr --csv --from example.csv put -q 'tee > $shape.".csv", $*' :: $ cat circle.csv color,shape,flag,index,quantity,rate red,circle,1,16,13.8103,2.9010 yellow,circle,1,73,63.9785,4.2370 yellow,circle,1,87,63.5058,8.3350 :: $ cat square.csv color,shape,flag,index,quantity,rate red,square,1,15,79.2778,0.0130 red,square,0,48,77.5542,7.4670 red,square,0,64,77.1991,9.5310 purple,square,0,91,72.3735,8.2430 :: $ cat triangle.csv color,shape,flag,index,quantity,rate yellow,triangle,1,11,43.6498,9.8870 purple,triangle,0,51,81.2290,8.5910 purple,triangle,0,65,80.1405,5.8240 Other-format examples ^^^^^^^^^^^^^^^^^^^^^ What's a CSV file, really? It's an array of rows, or *records*, each being a list of key-value pairs, or *fields*: for CSV it so happens that all the keys are shared in the header line and the values vary data line by data line. For example, if you have:: shape,flag,index circle,1,24 square,0,36 then that's a way of saying:: shape=circle,flag=1,index=24 shape=square,flag=0,index=36 Data written this way are called **DKVP**, for *delimited key-value pairs*. We've also already seen other ways to write the same data:: CSV PPRINT JSON shape,flag,index shape flag index [ circle,1,24 circle 1 24 { square,0,36 square 0 36 "shape": "circle", "flag": 1, "index": 24 }, DKVP XTAB { shape=circle,flag=1,index=24 shape circle "shape": "square", shape=square,flag=0,index=36 flag 1 "flag": 0, index 24 "index": 36 } shape square ] flag 0 index 36 Anything we can do with CSV input data, we can do with any other format input data. And you can read from one format, do any record-processing, and output to the same format as the input, or to a different output format.