• No output at all
• Fields not selected
• Diagnosing delimiter specifications
• Error-output in certain string cases
• How do I examine then-chaining?
• Why doesn’t mlr cut put fields in the order I want?
• Why am I not seeing all possible joins occur?
• What about XML or JSON file formats?

Number one FAQ

Please use mlr --csv --rs lf for native Un*x (linefeed-terminated) CSV files.

No output at all

Check the line terminators of the data, e.g. with the command-line file program. Example: for CSV, Miller’s default line terminator is CR/LF (carriage return followed by linefeed, following RFC4180). If your CSV has *nix-standard LF line endings, Miller will keep reading the file looking for a CR/LF which never appears. Solution in this case: tell Miller the input has LF line terminators, e.g. mlr --csv --rs lf {remaining arguments ...}.

Also try od -xcv and/or cat -e on your file to check for non-printable characters.

Fields not selected

Check the field separators of the data, e.g. with the command-line head program. Example: for CSV, Miller’s default field separator is comma; if your data is tab-delimited, e.g. aTABbTABc, then Miller won’t find three fields named a, b, and c but rather just one named aTABbTABc. Solution in this case: mlr --fs tab {remaining arguments ...}.
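The two checks above (line terminator, then field separator) can also be scripted. The following is a minimal sketch in Python, not part of Miller: the function name sniff_separators and the list of candidate separators are assumptions made for illustration. It only inspects the first line of the data.

```python
def sniff_separators(raw, candidates=(b",", b"\t", b";", b"|")):
    """Guess (line terminator, field separator) from the first line of raw bytes.

    Hypothetical helper for illustration; the candidate list is an assumption.
    Returns the terminator as "crlf", "lf", or "none", and the separator as
    the candidate byte string occurring most often in the first line (or None).
    """
    first_line, newline, _ = raw.partition(b"\n")
    if not newline:
        terminator = "none"
    elif first_line.endswith(b"\r"):
        terminator = "crlf"
        first_line = first_line[:-1]  # drop the CR before counting separators
    else:
        terminator = "lf"

    counts = {c: first_line.count(c) for c in candidates}
    best = max(counts, key=counts.get)
    separator = best if counts[best] > 0 else None
    return terminator, separator


if __name__ == "__main__":
    sample = "KEY;DE;EN\nmasterdata_colourcode_1;Weiß;White\n".encode("utf-8")
    print(sniff_separators(sample))  # ('lf', b';')
```

A result of ("lf", b";") corresponds to the --rs lf and --fs semicolon flags used elsewhere in this FAQ; ("lf", b"\t") would correspond to --rs lf --fs tab.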
Also try od -xcv and/or cat -e on your file to check for non-printable characters.

Diagnosing delimiter specifications

    # Use the `file` command to see if there are CR/LF terminators (in this case,
    # there are not):
    $ file colours.csv
    colours.csv: UTF-8 Unicode text

    # Look at the file to find names of fields
    $ cat colours.csv
    KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
    masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
    masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah

    # Try (unsuccessfully) to extract a few fields:
    $ mlr --csv cut -f KEY,PL,RO colours.csv
    (no output)

    # Use LF record separator (--rs lf) since the file doesn't have CR/LF line
    # endings -- but still unsuccessfully:
    $ mlr --csv --rs lf cut -f KEY,PL,RO colours.csv
    (only blank lines appear)

    # Use XTAB output format to get a sharper picture of where records/fields
    # are being split:
    $ mlr --icsv --irs lf --oxtab cat colours.csv
    KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz

    KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah

    # Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being
    # treated as a single field name in the CSV header, and likewise each
    # subsequent line is being treated as a single field value. This is because
    # the default field separator is a comma but we have semicolons here.

    # Use XTAB again with different field separator (--fs semicolon):
    $ mlr --icsv --irs lf --ifs semicolon --oxtab cat colours.csv
    KEY masterdata_colourcode_1
    DE  Weiß
    EN  White
    ES  Blanco
    FI  Valkoinen
    FR  Blanc
    IT  Bianco
    NL  Wit
    PL  Biały
    RO  Alb
    TR  Beyaz

    KEY masterdata_colourcode_2
    DE  Schwarz
    EN  Black
    ES  Negro
    FI  Musta
    FR  Noir
    IT  Nero
    NL  Zwart
    PL  Czarny
    RO  Negru
    TR  Siyah

    # Using the new field-separator, retry the cut:
    $ mlr --csv --rs lf --fs semicolon cut -f KEY,PL,RO colours.csv
    KEY;PL;RO
    masterdata_colourcode_1;Biały;Alb
    masterdata_colourcode_2;Czarny;Negru

Error-output in certain string cases

mlr put '$y = string($x); $z=$y.$y' gives (error) on numeric data such as x=123 while mlr put '$z=string($x).string($x)' does not. This is because in the former case y is computed and stored as a string, then re-parsed as an integer, for which string-concatenation is an invalid operator.

How do I examine then-chaining?

Then-chaining found in Miller is intended to function the same as Unix pipes. You can print your data one pipeline step at a time, to see what intermediate output at one step becomes the input to the next step.
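Conceptually, a then-chain is function composition over a stream of records. The sketch below illustrates that idea in Python; it is not how Miller is implemented. The functions count_distinct, sort_nr, and chain are hypothetical stand-ins for the count-distinct and sort -nr verbs and for then, and the rows are copied from data/then-example.csv as shown in this FAQ.

```python
from collections import Counter

# Rows copied from data/then-example.csv as shown in this FAQ.
ROWS = [
    {"Status": "paid", "Payment_Type": "cash", "Amount": "10.00"},
    {"Status": "pending", "Payment_Type": "debit", "Amount": "20.00"},
    {"Status": "paid", "Payment_Type": "cash", "Amount": "50.00"},
    {"Status": "pending", "Payment_Type": "credit", "Amount": "40.00"},
    {"Status": "paid", "Payment_Type": "debit", "Amount": "30.00"},
]


def count_distinct(records, fields):
    """Stand-in for `count-distinct -f`: one record per distinct combination,
    in first-seen order, with a `count` field appended."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [dict(zip(fields, key), count=n) for key, n in counts.items()]


def sort_nr(records, field):
    """Stand-in for `sort -nr`: numeric descending; ties keep input order."""
    return sorted(records, key=lambda r: float(r[field]), reverse=True)


def chain(records, *steps):
    """Stand-in for `then`: feed each step's output to the next step."""
    for step in steps:
        records = step(records)
    return records


step1 = lambda recs: count_distinct(recs, ["Status", "Payment_Type"])
step2 = lambda recs: sort_nr(recs, "count")

# Running the steps one at a time (the "pipe" view) ...
intermediate = step1(ROWS)
piped = step2(intermediate)

# ... gives the same records as running them as one chain.
chained = chain(ROWS, step1, step2)

if __name__ == "__main__":
    for rec in chained:
        print(rec)
```

Inspecting intermediate here plays the same role as running the first verb alone before adding the next one.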
First, review the input data:

    $ cat data/then-example.csv
    Status,Payment_Type,Amount
    paid,cash,10.00
    pending,debit,20.00
    paid,cash,50.00
    pending,credit,40.00
    paid,debit,30.00

Next, run the first step of the chain by itself:

    $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type data/then-example.csv
    Status  Payment_Type count
    paid    cash         2
    pending debit        1
    pending credit       1
    paid    debit        1

Then add the second step using then:

    $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv
    Status  Payment_Type count
    paid    cash         2
    pending debit        1
    pending credit       1
    paid    debit        1

The same two steps connected by a Unix pipe give the same output:

    $ mlr --csv --rs lf count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --rs lf --opprint sort -nr count
    Status  Payment_Type count
    paid    cash         2
    pending debit        1
    pending credit       1
    paid    debit        1

Why doesn’t mlr cut put fields in the order I want?

Example: columns x,i,a were requested but they appear here in the order a,i,x:

    $ cat data/small
    a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
    a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
    a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
    a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
    a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729

    $ mlr cut -f x,i,a data/small
    a=pan,i=1,x=0.3467901443380824
    a=eks,i=2,x=0.7586799647899636
    a=wye,i=3,x=0.20460330576630303
    a=eks,i=4,x=0.38139939387114097
    a=wye,i=5,x=0.5732889198020006

As the output shows, cut without -o emits the selected fields in the order they appear in the input records. With -o, the fields come out in the order given to -f:

    $ mlr cut -o -f x,i,a data/small
    x=0.3467901443380824,i=1,a=pan
    x=0.7586799647899636,i=2,a=eks
    x=0.20460330576630303,i=3,a=wye
    x=0.38139939387114097,i=4,a=eks
    x=0.5732889198020006,i=5,a=wye

Why am I not seeing all possible joins occur?

For example, the right file here has nine records, and the left file should add in the hostname column, so the join output should also have nine records:

    $ mlr --icsvlite --opprint cat data/join-u-left.csv
    hostname              ipaddr
    nadir.east.our.org    10.3.1.18
    zenith.west.our.org   10.3.1.27
    apoapsis.east.our.org 10.4.5.94

    $ mlr --icsvlite --opprint cat data/join-u-right.csv
    ipaddr    timestamp  bytes
    10.3.1.27 1448762579 4568
    10.3.1.18 1448762578 8729
    10.4.5.94 1448762579 17445
    10.3.1.27 1448762589 12
    10.3.1.18 1448762588 44558
    10.4.5.94 1448762589 8899
    10.3.1.27 1448762599 0
    10.3.1.18 1448762598 73425
    10.4.5.94 1448762599 12200

    $ mlr --icsvlite --opprint join -j ipaddr -f data/join-u-left.csv data/join-u-right.csv
    ipaddr    hostname              timestamp  bytes
    10.3.1.27 zenith.west.our.org   1448762579 4568
    10.4.5.94 apoapsis.east.our.org 1448762579 17445
    10.4.5.94 apoapsis.east.our.org 1448762589 8899
    10.4.5.94 apoapsis.east.our.org 1448762599 12200

Only four of the nine expected records appear. Neither file is sorted by ipaddr, and in this version of Miller join expects both inputs to be sorted by the join field unless it is given the -u (unsorted input) option, so some records go unpaired.

What about XML or JSON file formats?

Miller handles

    # DKVP
    x=1,y=2
    z=3

    # XML
    <table>
      <record>
        <field>
          <key> x </key> <value> 1 </value>
        </field>
        <field>
          <key> y </key> <value> 2 </value>
        </field>
      </record>
      <record>
        <field>
          <key> z </key> <value> 3 </value>
        </field>
      </record>
    </table>

    # JSON
    [{"x":1,"y":2},{"z":3}]
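The DKVP and JSON renderings above describe the same two heterogeneous records. As a minimal sketch of that correspondence, the Python below parses DKVP lines into a list of dictionaries and serializes them as JSON. It is an illustration, not Miller's own conversion: the function names, the default "," and "=" separators, and the rule that numeric-looking values become JSON numbers are assumptions chosen to reproduce the example.

```python
import json


def parse_value(text):
    """Return text as int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def dkvp_to_records(lines, ifs=",", ips="="):
    """Parse DKVP lines (key=value pairs joined by ifs) into a list of dicts.

    Each line becomes one record; records need not share field names.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = {}
        for pair in line.split(ifs):
            key, _, value = pair.partition(ips)
            record[key] = parse_value(value)
        records.append(record)
    return records


def records_to_json(records):
    """Serialize records as compact JSON, one object per record."""
    return json.dumps(records, separators=(",", ":"))


if __name__ == "__main__":
    print(records_to_json(dkvp_to_records(["x=1,y=2", "z=3"])))
    # [{"x":1,"y":2},{"z":3}]
```

Because each record is a flat mapping from keys to scalar values, the JSON produced this way is always a list of unnested objects.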