| Overview: • About Miller • Miller in 10 minutes • File formats • Miller features in the context of the Unix toolkit • Record-heterogeneity • Internationalization Using Miller: • FAQ • Cookbook part 1 • Cookbook part 2 • Cookbook part 3 • Data-diving examples • Manpage • Reference • Reference: Verbs • Reference: DSL • Documents by release • Installation, portability, dependencies, and testing Background: • Why? • Why C? • Why call it Miller? • How original is Miller? • Performance Repository: • Things to do • Contact information • GitHub repo | • Fields not selected • Diagnosing delimiter specifications • How do I examine then-chaining? • I assigned $9 and it’s not 9th • Why doesn’t mlr cut put fields in the order I want? • NR is not consecutive after then-chaining • Why am I not seeing all possible joins occur? • What about XML or JSON file formats? No output at all
Try od -xcv and/or cat -e on your file to check for non-printable characters.
If you’re using Miller version less than 5.0.0 (try
mlr --version on your system to find out), when the
line-ending-autodetect feature was introduced, please see
here.
 Fields not selected
Check the field-separators of the data, e.g. with the command-line
head program. Example: for CSV, Miller’s default record
separator is comma; if your data is tab-delimited, e.g. aTABbTABc,
then Miller won’t find three fields named a, b, and
c but rather just one named aTABbTABc.  Solution in this
case: mlr --fs tab {remaining arguments ...}.
Also try od -xcv and/or cat -e on your file to check for non-printable characters.
 Diagnosing delimiter specifications# Use the `file` command to see if there are CR/LF terminators (in this case, # there are not): $ file data/colours.csv data/colours.csv: UTF-8 Unicode text # Look at the file to find names of fields $ cat data/colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah # Extract a few fields: $ mlr --csv cut -f KEY,PL,RO data/colours.csv (only blank lines appear) # Use XTAB output format to get a sharper picture of where records/fields # are being split: $ mlr --icsv --oxtab cat data/colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah # Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being # treated as a single field name in the CSV header, and likewise each # subsequent line is being treated as a single field value. This is because # the default field separator is a comma but we have semicolons here. # Use XTAB again with different field separator (--fs semicolon): mlr --icsv --ifs semicolon --oxtab cat data/colours.csv KEY masterdata_colourcode_1 DE Weiß EN White ES Blanco FI Valkoinen FR Blanc IT Bianco NL Wit PL Biały RO Alb TR Beyaz KEY masterdata_colourcode_2 DE Schwarz EN Black ES Negro FI Musta FR Noir IT Nero NL Zwart PL Czarny RO Negru TR Siyah # Using the new field-separator, retry the cut: mlr --csv --fs semicolon cut -f KEY,PL,RO data/colours.csv KEY;PL;RO masterdata_colourcode_1;Biały;Alb masterdata_colourcode_2;Czarny;Negru How do I examine then-chaining?
Then-chaining found in Miller is intended to function the same as Unix
pipes, but with less keystroking. You can print your data one pipeline step at
a time, to see what intermediate output at one step becomes the input to the
next step.
First, look at the input data:
 $ cat data/then-example.csv Status,Payment_Type,Amount paid,cash,10.00 pending,debit,20.00 paid,cash,50.00 pending,credit,40.00 paid,debit,30.00 $ mlr --icsv --opprint count-distinct -f Status,Payment_Type data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --icsv --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --csv count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --opprint sort -nr count Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 I assigned $9 and it’s not 9th
 Miller records are ordered lists of key-value pairs. For NIDX format, DKVP
format when keys are missing, or CSV/CSV-lite format with
--implicit-csv-header, Miller will sequentially assign keys of the
form 1, 2, etc. But these are not integer array indices:
they’re just field names taken from the initial field ordering in the
input data.
 $ echo x,y,z | mlr --dkvp cat 1=x,2=y,3=z $ echo x,y,z | mlr --dkvp put '$6="a";$4="b";$55="cde"' 1=x,2=y,3=z,6=a,4=b,55=cde $ echo x,y,z | mlr --nidx cat x,y,z $ echo x,y,z | mlr --csv --implicit-csv-header cat 1,2,3 x,y,z $ echo x,y,z | mlr --dkvp rename 2,999 1=x,999=y,3=z $ echo x,y,z | mlr --dkvp rename 2,newname 1=x,newname=y,3=z $ echo x,y,z | mlr --csv --implicit-csv-header reorder -f 3,1,2 3,1,2 z,x,y Why doesn’t mlr cut put fields in the order I want?
Example: columns x,i,a were requested but they appear here in the order a,i,x:
 $ cat data/small a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533 a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797 a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776 a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463 a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729 $ mlr cut -f x,i,a data/small a=pan,i=1,x=0.3467901443380824 a=eks,i=2,x=0.7586799647899636 a=wye,i=3,x=0.20460330576630303 a=eks,i=4,x=0.38139939387114097 a=wye,i=5,x=0.5732889198020006 $ mlr cut -o -f x,i,a data/small x=0.3467901443380824,i=1,a=pan x=0.7586799647899636,i=2,a=eks x=0.20460330576630303,i=3,a=wye x=0.38139939387114097,i=4,a=eks x=0.5732889198020006,i=5,a=wye NR is not consecutive after then-chaining
 Given this input data:
 $ cat data/small a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533 a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797 a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776 a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463 a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729 $ mlr filter '$x > 0.5' then put '$NR = NR' data/small a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797,NR=2 a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729,NR=5 $ echo x=1,y=2,z=3 | mlr put '$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF' nf1=3,u=4,nf2=5,nf3=3 
$ mlr --opprint --from data/small put '
  begin{ @nr1 = 0 }
  @nr1 += 1;
  $nr1 = @nr1
' \
then filter '$x>0.5' \
then put '
  begin{ @nr2 = 0 }
  @nr2 += 1;
  $nr2 = @nr2
'
a   b   i x                  y                  nr1 nr2
eks pan 2 0.7586799647899636 0.5221511083334797 2   1
wye pan 5 0.5732889198020006 0.8636244699032729 5   2
$ mlr filter '$x > 0.5' then cat -n data/small n=1,a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797 n=2,a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729 Why am I not seeing all possible joins occur?
For example, the right file here has nine records, and the left file should
add in the hostname column — so the join output should also have
9 records:
 $ mlr --icsvlite --opprint cat data/join-u-left.csv hostname ipaddr nadir.east.our.org 10.3.1.18 zenith.west.our.org 10.3.1.27 apoapsis.east.our.org 10.4.5.94 $ mlr --icsvlite --opprint cat data/join-u-right.csv ipaddr timestamp bytes 10.3.1.27 1448762579 4568 10.3.1.18 1448762578 8729 10.4.5.94 1448762579 17445 10.3.1.27 1448762589 12 10.3.1.18 1448762588 44558 10.4.5.94 1448762589 8899 10.3.1.27 1448762599 0 10.3.1.18 1448762598 73425 10.4.5.94 1448762599 12200 $ mlr --icsvlite --opprint join -j ipaddr -f data/join-u-left.csv data/join-u-right.csv ipaddr hostname timestamp bytes 10.3.1.27 zenith.west.our.org 1448762579 4568 10.4.5.94 apoapsis.east.our.org 1448762579 17445 10.4.5.94 apoapsis.east.our.org 1448762589 8899 10.4.5.94 apoapsis.east.our.org 1448762599 12200 $ mlr --icsvlite --opprint join -u -j ipaddr -f data/join-u-left.csv data/join-u-right.csv ipaddr hostname timestamp bytes 10.3.1.27 zenith.west.our.org 1448762579 4568 10.3.1.18 nadir.east.our.org 1448762578 8729 10.4.5.94 apoapsis.east.our.org 1448762579 17445 10.3.1.27 zenith.west.our.org 1448762589 12 10.3.1.18 nadir.east.our.org 1448762588 44558 10.4.5.94 apoapsis.east.our.org 1448762589 8899 10.3.1.27 zenith.west.our.org 1448762599 0 10.3.1.18 nadir.east.our.org 1448762598 73425 10.4.5.94 apoapsis.east.our.org 1448762599 12200 What about XML or JSON file formats?
Miller handles  
# DKVP
x=1,y=2
z=3
# XML
<table>
  <record>
    <field>
      <key> x </key> <value> 1 </value>
    </field>
    <field>
      <key> y </key> <value> 2 </value>
    </field>
  </record>
    <field>
      <key> z </key> <value> 3 </value>
    </field>
  <record>
  </record>
</table>
# JSON
[{"x":1,"y":2},{"z":3}]
 |