Performance

Contents:
• Data
• Comparands
• Raw results
• Analysis
• Conclusion

Data

Test data were of the form

a=pan,b=pan,i=1,x=0.3467901443380824,y=0.7268028627434533
a=eks,b=pan,i=2,x=0.7586799647899636,y=0.5221511083334797
a=wye,b=wye,i=3,x=0.20460330576630303,y=0.33831852551664776
a=eks,b=wye,i=4,x=0.38139939387114097,y=0.13418874328430463
a=wye,b=pan,i=5,x=0.5732889198020006,y=0.8636244699032729

a,b,i,x,y
pan,pan,1,0.3467901443380824,0.7268028627434533
eks,pan,2,0.7586799647899636,0.5221511083334797
wye,wye,3,0.20460330576630303,0.33831852551664776
eks,wye,4,0.38139939387114097,0.13418874328430463
wye,pan,5,0.5732889198020006,0.8636244699032729

for DKVP and CSV, respectively, where fields a and b take one of five text values, uniformly distributed; i is a 1-up line counter; x and y are independent uniformly distributed floating-point numbers in the unit interval.

Data files of one million lines (totalling about 50MB for CSV and 60MB for DKVP) were used. In experiments not shown here, I also varied the file sizes; the size-dependent results were the expected, completely unsurprising linearities, and so I produced no file-size-dependent plots for your viewing pleasure.
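Test data of this form can be reproduced with a short generator. The following is a sketch of my own, not the script used to produce the actual test data: only pan, eks, and wye appear in the excerpt above, so the other two of the five text values (zee, hat here) and the random seed are assumptions.

```shell
# Hypothetical generator for DKVP test data of the form shown above.
# The full value list and the seed are assumptions.
awk -v n=5 'BEGIN {
  srand(1)
  split("pan,eks,wye,zee,hat", v, ",")
  for (i = 1; i <= n; i++) {
    printf "a=%s,b=%s,i=%d,x=%.16f,y=%.16f\n",
      v[int(rand()*5)+1], v[int(rand()*5)+1], i, rand(), rand()
  }
}'
# For the CSV variant, pipe through: mlr --idkvp --ocsv cat
```

With n set to one million, this produces files of roughly the sizes quoted above.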

Comparands

The cat, cut, awk, sed, sort tools were compared to mlr on an 8-core Darwin laptop; RAM capacity was nowhere near challenged. The catc program is a simple line-oriented line-printer (source here) which is intermediate between Miller (which is record-aware as well as line-aware) and cat (which is only byte-aware).

Raw results

Note that for CSV data, the command is mlr --csvlite ... rather than mlr ....

   Mac     Mac         Comparand
   DKVP    CSV
  seconds seconds

   0.016   0.013       cat
   0.189   0.189       catc
   3.657   4.388       awk -F, '{print}'
   2.027   1.795       mlr cat

   2.292   1.940       cut -d , -f 1,4
   3.540   4.516       awk -F, '{print $1,$4}'
   1.600   1.390       mlr cut -f a,x
   1.694   1.648       mlr cut -x -f a,x

   0.845   0.643       sed -e 's/x=/EKS=/' -e 's/b=/BEE=/'
   2.076   1.842       mlr rename x,EKS,b,BEE

   5.643   5.031       awk -F, '{gsub("x=","",$4);gsub("y=","",$5);print $4+$5}'
   4.019   3.679       mlr put '$z=$x+$y'

   2.481   2.628       mlr stats1 -a mean -f x,y -g a,b

   2.587   2.389       mlr stats2 -a corr -f x,y -g a,b

  23.247  14.466       sort -t, -k 1,2
   3.023   5.658       mlr sort -f a,b

  17.224  15.523       sort -t, -k 4,5
   5.807   5.194       mlr sort -nf x,y
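Timings of this kind can be collected with a loop like the following. This is a hedged sketch, not the script actually used for the table above: the file name data.dkvp and the use of the shell's time keyword are my assumptions.

```shell
# Hypothetical timing harness; the original measurement script is not shown.
# Assumes the million-line input is named data.dkvp. Add the Miller commands
# (e.g. 'mlr cat data.dkvp') to the list once mlr is on the PATH.
for cmd in 'cat data.dkvp' "awk -F, '{print}' data.dkvp" 'sed -e s/x=/EKS=/ data.dkvp'; do
  echo "== $cmd"
  time eval "$cmd" > /dev/null   # wall-clock time is printed on stderr
done
```

Repeating each run a few times helps smooth out filesystem-cache effects.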

Analysis

  • As expected, cat is very fast — it needs only to stream bytes as quickly as possible; it doesn’t even need to touch individual bytes.
  • My catc is also faster than Miller: it needs to read and write lines, but it doesn’t segment lines into records; in fact it does no iteration over bytes in each line.
  • Miller does not outperform sed, which is string-oriented rather than record-oriented.
  • For the tools which do need to pick apart fields (cut, awk, sort), Miller is comparable to, or outperforms, the others. As noted above, this effect persists linearly across file sizes.
  • For univariate and bivariate statistics, I didn’t attempt to compare to other tools wherein such computations are less straightforward; rather, I attempted only to show that Miller’s processing time here is comparable to its own processing time for other problems.

Conclusion

For record-oriented data transformations, Miller meets or beats the Unix toolkit in many contexts. Field renames in particular, where sed outperforms mlr rename, are worth doing as a pre-pipe or post-pipe using sed.
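Acting on that advice might look like the following pipeline — a sketch under assumptions: the input file name data.dkvp is hypothetical, and the stats1 step is just one example of downstream record-aware work.

```shell
# sed does the cheap stream-level rename; Miller then does the record-aware
# aggregation on the renamed fields. (File name data.dkvp is hypothetical.)
sed -e 's/x=/EKS=/' -e 's/b=/BEE=/' data.dkvp \
  | mlr stats1 -a mean -f EKS -g a,BEE
```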