• About Miller • File formats • Miller features in the context of the Unix toolkit • Record-heterogeneity • Reference • Data examples • FAQ • Internationalization • Compiling, portability, dependencies, and testing • Performance • Why C? • Why call it Miller? • How original is Miller? • Things to do • Documents by release • Contact information • GitHub repo |
• Fields not selected • Diagnosing delimiter specifications • Error-output in certain string cases • How do I parse log-file output? • How do I examine then-chaining? • How do I do arithmetic on fields with currency symbols? No output at allCheck the line-terminators of the data, e.g. with the command-line file program. Example: for CSV, Miller’s default line terminator is CR/LF (carriage return followed by linefeed, following RFC4180). Yet if your CSV has *nix-standard LF line endings, Miller will keep reading the file looking for a CR/LF which never appears. Solution in this case: tell Miller the input has LF line-terminator, e.g. mlr --csv --rs lf {remaining arguments ...}. Also try od -xcv and/or cat -e on your file to check for non-printable characters.Fields not selectedCheck the field-separators of the data, e.g. with the command-line head program. Example: for CSV, Miller’s default record separator is comma; if your data is tab-delimited, e.g. aTABbTABc, then Miller won’t find three fields named a, b, and c but rather just one named aTABbTABc. Solution in this case: mlr --fs tab {remaining arguments ...}. Also try od -xcv and/or cat -e on your file to check for non-printable characters.Diagnosing delimiter specifications# Use the `file` command to see if there are CR/LF terminators (in this case, # there are not): $ file colours.csv colours.csv: UTF-8 Unicode text # Look at the file to find names of fields $ cat colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah # Try (unsuccessfully) to extract a few fields: $ mlr --csv cut -f KEY,PL,RO colours.csv (no output) # Use LF record separator (--rs lf) since the file doesn't have CR/LF line # endings -- but still unsuccessfully: $ mlr --csv --rs lf cut -f KEY,PL,RO colours.csv (only blank lines appear) # Use XTAB output format to get a sharper picture of where records/fields # are being split: $ mlr --icsv --irs lf --oxtab cat colours.csv KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah # Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being # treated as a single field name in the CSV header, and likewise each # subsequent line is being treated as a single field value. This is because # the default field separator is a comma but we have semicolons here. # Use XTAB again with different field separator (--fs semicolon): $ mlr --icsv --irs lf --ifs semicolon --oxtab cat colours.csv KEY masterdata_colourcode_1 DE Weiß EN White ES Blanco FI Valkoinen FR Blanc IT Bianco NL Wit PL Biały RO Alb TR Beyaz KEY masterdata_colourcode_2 DE Schwarz EN Black ES Negro FI Musta FR Noir IT Nero NL Zwart PL Czarny RO Negru TR Siyah # Using the new field-separator, retry the cut: $ mlr --csv --rs lf --fs semicolon cut -f KEY,PL,RO colours.csv KEY;PL;RO masterdata_colourcode_1;Biały;Alb masterdata_colourcode_2;Czarny;Negru Error-output in certain string casesmlr put '$y = string($x); $z=$y.$y' gives (error) on numeric data such as x=123 while mlr put '$z=string($x).string($x)' does not. This is because in the former case y is computed and stored as a string, then re-parsed as an integer, for which string-concatenation is an invalid operator.How do I parse log-file output?Suppose you have log-file lines such as2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378 grep 'various sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status How do I examine then-chaining?Then-chaining found in Miller is intended to function the same as Unix pipes. You can print your data one pipeline step at a time, to see what intermediate output at one step becomes the input to the next step. First, review the input data:$ cat data/then-example.csv Status,Payment_Type,Amount paid,cash,10.00 pending,debit,20.00 paid,cash,50.00 pending,credit,40.00 paid,debit,30.00 $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 $ mlr --csv --rs lf count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --rs lf --opprint sort -nr count Status Payment_Type count paid cash 2 pending debit 1 pending credit 1 paid debit 1 How do I do arithmetic on fields with currency symbols?$ cat sample.csv EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25 10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54 10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36 10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96 10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19 10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19 10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00 $ mlr --icsv --opprint cat sample.csv EventOccurred EventType Description Status PaymentType NameonAccount TransactionNumber Amount 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking John 1 $230.36 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Fred 2 $32.25 10/1/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Bob 3 $39.02 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Alice 4 $57.54 10/1/2015 Charged Back Reason: Authorization Revoked By Customer Disputed Checking Jungle 5 $230.36 10/1/2015 Charged Back Reason: Payment Stopped Disputed Checking Joe 6 $281.96 10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 7 $188.19 10/2/2015 Charged Back Reason: Customer Advises Not Authorized Disputed Checking Joseph 8 $188.19 10/2/2015 Charged Back Reason: Payment Stopped Disputed Checking Anthony 9 $250.00 $ mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv Amount_sum 1497.870000 $ mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv Amount_sum 1497.87 |