Skip to content

Questions about then-chaining

How do I examine then-chaining?

Then-chaining found in Miller is intended to function the same as Unix pipes, but with less keystroking. You can print your data one pipeline step at a time, to see what intermediate output at one step becomes the input to the next step.

First, look at the input data:

cat data/then-example.csv
Status,Payment_Type,Amount
paid,cash,10.00
pending,debit,20.00
paid,cash,50.00
pending,credit,40.00
paid,debit,30.00

Next, run the first step of your command, omitting anything from the first then onward:

mlr --from data/then-example.csv --c2p count-distinct -f Status,Payment_Type
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

After that, run it with the next then step included:

mlr --from data/then-example.csv --c2p count-distinct -f Status,Payment_Type \
  then sort -nr count
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

Now if you use then to include another verb after that, the columns Status, Payment_Type, and count will be the input to that verb.

Note, by the way, that you'll get the same results using pipes:

mlr --from data/then-example.csv --csv count-distinct -f Status,Payment_Type \
| mlr --c2p sort -nr count
Status  Payment_Type count
paid    cash         2
pending debit        1
pending credit       1
paid    debit        1

NR is not consecutive after then-chaining

Given this input data:

cat data/small
a=pan,b=pan,i=1,x=0.346791,y=0.726802
a=eks,b=pan,i=2,x=0.758679,y=0.522151
a=wye,b=wye,i=3,x=0.204603,y=0.338318
a=eks,b=wye,i=4,x=0.381399,y=0.134188
a=wye,b=pan,i=5,x=0.573288,y=0.863624

why don't I see NR=1 and NR=2 here??

mlr --from data/small filter '$x > 0.5' then put '$NR = NR'
a=eks,b=pan,i=2,x=0.758679,y=0.522151,NR=2
a=wye,b=pan,i=5,x=0.573288,y=0.863624,NR=5

The reason is that NR is computed for the original input records and isn't dynamically updated. By contrast, NF is dynamically updated: it's the number of fields in the current record, and if you add/remove a field, the value of NF will change:

echo x=1,y=2,z=3 | mlr put '$nf1 = NF; $u = 4; $nf2 = NF; unset $x,$y,$z; $nf3 = NF'
nf1=3,u=4,nf2=5,nf3=3

NR, by contrast (and FNR as well), retains the value from the original input stream, and records may be dropped by a filter within a then-chain. To recover consecutive record numbers, you can use out-of-stream variables as follows:

mlr --opprint --from data/small put '
  begin{ @nr1 = 0 }
  @nr1 += 1;
  $nr1 = @nr1
' \
then filter '$x>0.5' \
then put '
  begin{ @nr2 = 0 }
  @nr2 += 1;
  $nr2 = @nr2
'
a   b   i x        y        nr1 nr2
eks pan 2 0.758679 0.522151 2   1
wye pan 5 0.573288 0.863624 5   2

Or, simply use mlr cat -n:

mlr filter '$x > 0.5' then cat -n data/small
n=1,a=eks,b=pan,i=2,x=0.758679,y=0.522151
n=2,a=wye,b=pan,i=5,x=0.573288,y=0.863624