FAQ

No output at all

Check the line terminators of the data, e.g. with the command-line file program. Example: for CSV, Miller’s default line terminator is CR/LF (carriage return followed by linefeed, following RFC 4180). If your CSV has *nix-standard LF line endings, Miller will keep reading the file looking for a CR/LF that never appears. Solution in this case: tell Miller the input has an LF line terminator, e.g. mlr --csv --rs lf {remaining arguments ...}.

Also try od -xcv and/or cat -e on your file to check for non-printable characters.
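
As a rough sketch of the check described above, the following uses only standard tools to tell CR/LF input from LF-only input. The sample file names and the helper name count_cr are made up for illustration and are not part of Miller.

```shell
# Build two tiny sample files (hypothetical names): one CR/LF, one LF-only.
printf 'a,b,c\r\n1,2,3\r\n' > crlf-sample.csv
printf 'a,b,c\n1,2,3\n' > lf-sample.csv

# count_cr FILE: print the number of carriage-return bytes in FILE.
# Zero suggests LF-only line endings; a count equal to the number of
# lines suggests CR/LF line endings.
count_cr() {
  tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

count_cr crlf-sample.csv   # prints 2
count_cr lf-sample.csv     # prints 0

# od -c shows the terminators literally as \r and \n.
od -c crlf-sample.csv
```

A count of zero on a CSV file is a hint that --rs lf may be needed.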

Fields not selected

Check the field separators of the data, e.g. with the command-line head program. Example: for CSV, Miller’s default field separator is comma; if your data is tab-delimited, e.g. aTABbTABc, then Miller won’t find three fields named a, b, and c but rather just one named aTABbTABc. Solution in this case: mlr --fs tab {remaining arguments ...}.

Also try od -xcv and/or cat -e on your file to check for non-printable characters.
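
One way to make the separator check concrete is to count candidate separator characters in the header line. This is a minimal sketch using standard tools only; the sample file name and the helper name count_sep are hypothetical.

```shell
# A tab-delimited sample like the aTABbTABc example above (hypothetical name).
printf 'a\tb\tc\n1\t2\t3\n' > tab-sample.txt

# count_sep CHAR FILE: count occurrences of CHAR in the first line of FILE.
# CHAR may be a literal character or a tr escape such as '\t'.
count_sep() {
  head -n 1 "$2" | tr -cd "$1" | wc -c | tr -d ' '
}

count_sep ','  tab-sample.txt   # prints 0
count_sep '\t' tab-sample.txt   # prints 2
count_sep ';'  tab-sample.txt   # prints 0
```

The candidate with the highest count is a reasonable first guess for the separator, though a quick look at the data itself is still worthwhile.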

Diagnosing delimiter specifications

# Use the `file` command to see if there are CR/LF terminators (in this case,
# there are not):
$ file colours.csv
colours.csv: UTF-8 Unicode text

# Look at the file to find names of fields
$ cat colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR
masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz
masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah

# Try (unsuccessfully) to extract a few fields:
$ mlr --csv cut -f KEY,PL,RO colours.csv
(no output)

# Use LF record separator (--rs lf) since the file doesn't have CR/LF line
# endings -- but still unsuccessfully:
$ mlr --csv --rs lf cut -f KEY,PL,RO colours.csv
(only blank lines appear)

# Use XTAB output format to get a sharper picture of where records/fields
# are being split:
$ mlr --icsv --irs lf --oxtab cat colours.csv
KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_1;Weiß;White;Blanco;Valkoinen;Blanc;Bianco;Wit;Biały;Alb;Beyaz

KEY;DE;EN;ES;FI;FR;IT;NL;PL;RO;TR masterdata_colourcode_2;Schwarz;Black;Negro;Musta;Noir;Nero;Zwart;Czarny;Negru;Siyah

# Using XTAB output format makes it clearer that KEY;DE;...;RO;TR is being
# treated as a single field name in the CSV header, and likewise each
# subsequent line is being treated as a single field value. This is because
# the default field separator is a comma but we have semicolons here.
# Use XTAB again with a different input field separator (--ifs semicolon):
$ mlr --icsv --irs lf --ifs semicolon --oxtab cat colours.csv
KEY masterdata_colourcode_1
DE  Weiß
EN  White
ES  Blanco
FI  Valkoinen
FR  Blanc
IT  Bianco
NL  Wit
PL  Biały
RO  Alb
TR  Beyaz

KEY masterdata_colourcode_2
DE  Schwarz
EN  Black
ES  Negro
FI  Musta
FR  Noir
IT  Nero
NL  Zwart
PL  Czarny
RO  Negru
TR  Siyah

# Using the new field-separator, retry the cut:
$ mlr --csv --rs lf --fs semicolon cut -f KEY,PL,RO colours.csv
KEY;PL;RO
masterdata_colourcode_1;Biały;Alb
masterdata_colourcode_2;Czarny;Negru

Error-output in certain string cases

mlr put '$y = string($x); $z=$y.$y' gives (error) on numeric data such as x=123, while mlr put '$z=string($x).string($x)' does not. In the former case y is computed and stored as a string, but when $y is read back it is re-parsed as an integer, and string concatenation is not a valid operation on integers. In the latter case the results of string() are concatenated directly, with no re-parsing in between.

How do I parse log-file output?

Suppose you have log-file lines such as

2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378

I prefer to pre-filter with grep and/or sed to extract the structured text, then hand that to Miller. Example:

grep 'various/sorts' *.log | sed 's/.*} //' | mlr --fs space --repifs --oxtab stats1 -a min,p10,p50,p90,max -f time -g status
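
To see what Miller actually receives from the pre-filter, it can help to run the sed stage on its own. The sketch below feeds the sample log line from above through the same sed expression; the variable name line and the helper name strip_prefix are made up for illustration.

```shell
# The sample log line from above.
line='2015-10-08 08:29:09,445 INFO com.company.path.to.ClassName @ [sometext] various/sorts/of data {& punctuation} hits=1 status=0 time=2.378'

# strip_prefix: drop everything up to and including the last "} ",
# leaving only the space-separated key=value pairs.
strip_prefix() {
  sed 's/.*} //'
}

printf '%s\n' "$line" | strip_prefix
# prints: hits=1 status=0 time=2.378
```

What remains is key=value text separated by spaces, which is why the Miller command uses --fs space.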

How do I examine then-chaining?

Then-chaining in Miller is intended to work the same way as Unix pipes. You can print your data one pipeline step at a time to see how the output of one step becomes the input to the next.

First, review the input data:

$ cat data/then-example.csv
Status,Payment_Type,Amount
paid,cash,10.00
pending,debit,20.00
paid,cash,50.00
pending,credit,40.00
paid,debit,30.00

Next, run the first step of your command, omitting anything from the first then onward:

$ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type data/then-example.csv
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

After that, run it with the next then step included:

$ mlr --icsv --rs lf --opprint count-distinct -f Status,Payment_Type then sort -nr count data/then-example.csv
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

Now if you include another then step after this, the columns Status, Payment_Type, and count will be its input.

Note, by the way, that you’ll get the same results using pipes:

$ mlr --csv --rs lf count-distinct -f Status,Payment_Type data/then-example.csv | mlr --icsv --rs lf --opprint sort -nr count
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

How do I do arithmetic on fields with currency symbols?

$ cat sample.csv
EventOccurred,EventType,Description,Status,PaymentType,NameonAccount,TransactionNumber,Amount
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,John,1,$230.36
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Fred,2,$32.25
10/1/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Bob,3,$39.02
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Alice,4,$57.54
10/1/2015,Charged Back,Reason: Authorization Revoked By Customer,Disputed,Checking,Jungle,5,$230.36
10/1/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Joe,6,$281.96
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,7,$188.19
10/2/2015,Charged Back,Reason: Customer Advises Not Authorized,Disputed,Checking,Joseph,8,$188.19
10/2/2015,Charged Back,Reason: Payment Stopped,Disputed,Checking,Anthony,9,$250.00

$ mlr --icsv --opprint cat sample.csv
EventOccurred EventType    Description                               Status   PaymentType NameonAccount TransactionNumber Amount
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    John          1                 $230.36
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Fred          2                 $32.25
10/1/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Bob           3                 $39.02
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Alice         4                 $57.54
10/1/2015     Charged Back Reason: Authorization Revoked By Customer Disputed Checking    Jungle        5                 $230.36
10/1/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Joe           6                 $281.96
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        7                 $188.19
10/2/2015     Charged Back Reason: Customer Advises Not Authorized   Disputed Checking    Joseph        8                 $188.19
10/2/2015     Charged Back Reason: Payment Stopped                   Disputed Checking    Anthony       9                 $250.00

$ mlr --csv put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv
Amount_sum
1497.870000

$ mlr --csv --ofmt '%.2lf' put '$Amount = sub(string($Amount), "\$", "")' then stats1 -a sum -f Amount sample.csv
Amount_sum
1497.87
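
As a cross-check of the same idea without Miller, the sketch below strips the dollar sign and sums the last column using awk. It is an illustration only: the file name, the reduced three-column excerpt, and the helper name sum_amounts are hypothetical, and it assumes no field contains an embedded comma.

```shell
# A three-row, reduced-column excerpt of the data above (hypothetical file name).
cat > amounts-sample.csv <<'EOF'
NameonAccount,TransactionNumber,Amount
John,1,$230.36
Fred,2,$32.25
Bob,3,$39.02
EOF

# sum_amounts FILE: skip the header, remove "$" from the last column,
# and print the total with two decimal places.
sum_amounts() {
  awk -F, 'NR > 1 { gsub(/\$/, "", $NF); sum += $NF } END { printf "%.2f\n", sum }' "$1"
}

sum_amounts amounts-sample.csv   # prints 301.63
```

The order of operations matches the Miller command: remove the currency symbol first so the value parses as a number, then aggregate.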