
Statistics examples

Computing interquartile ranges

For one or more specified field names, compute p25 and p75 with stats1, then use put to write the IQR as p75 minus p25:

mlr --oxtab stats1 -f x -a p25,p75 \
    then put '$x_iqr = $x_p75 - $x_p25' \
    data/medium 
x_p25 0.24667037823231752
x_p75 0.7481860062358446
x_iqr 0.5015156280035271

For field names selected by regular expression (here --fr '[i-z]'), first compute p25 and p75, then loop over the output fields whose names match (.*)_p25 and write the corresponding _iqr field:

mlr --oxtab stats1 --fr '[i-z]' -a p25,p75 \
    then put 'for (k,v in $*) {
      if (k =~ "(.*)_p25") {
        $["\1_iqr"] = $["\1_p75"] - $["\1_p25"]
      }
    }' \
    data/medium 
i_p25 2501
i_p75 7501
x_p25 0.24667037823231752
x_p75 0.7481860062358446
y_p25 0.25213670524015686
y_p75 0.7640028449996572
i_iqr 5000
x_iqr 0.5015156280035271
y_iqr 0.5118661397595003
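
As a cross-check outside Miller, the same arithmetic can be sketched in Python. This is a minimal sketch, not Miller's implementation: it assumes the non-interpolated percentile is the sorted value at index int(p/100 * n), clamped to the valid range, and the names percentile_non_interpolated and iqr_fields are hypothetical. The sample records are a made-up stand-in for data/medium, assuming its i field runs from 1 to 10000; under that assumption the rule gives the same i_p25, i_p75, and i_iqr as shown above.

```python
def percentile_non_interpolated(values, p):
    """Sorted value at index int(p/100 * n), clamped to [0, n-1].

    Assumed rule, not taken from Miller's source.
    """
    ordered = sorted(values)
    n = len(ordered)
    index = int(p / 100 * n)
    index = max(0, min(index, n - 1))
    return ordered[index]


def iqr_fields(records, field_names):
    """Return name_p25, name_p75, and name_iqr for each requested field."""
    out = {}
    for name in field_names:
        # Skip records that lack the field, as a heterogeneous file might.
        column = [rec[name] for rec in records if name in rec]
        p25 = percentile_non_interpolated(column, 25)
        p75 = percentile_non_interpolated(column, 75)
        out[f"{name}_p25"] = p25
        out[f"{name}_p75"] = p75
        out[f"{name}_iqr"] = p75 - p25
    return out


# Hypothetical stand-in for data/medium: i runs from 1 to 10000.
records = [{"i": i} for i in range(1, 10001)]
print(iqr_fields(records, ["i"]))
```

Keeping the percentile rule in its own function makes it easy to swap in an interpolated definition if that is what you need to compare against.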

Computing weighted means

This might be more elegantly implemented as an option within the stats1 verb. Meanwhile, it's expressible within the DSL:

mlr --from data/medium put -q '
  # Using the y field for weighting in this example
  weight = $y;

  # Using the a field for weighted aggregation in this example
  @sumwx[$a] += weight * $i;
  @sumw[$a] += weight;

  @sumx[$a] += $i;
  @sumn[$a] += 1;

  end {
    map wmean = {};
    map mean  = {};
    for (a in @sumwx) {
      wmean[a] = @sumwx[a] / @sumw[a]
    }
    for (a in @sumx) {
      mean[a] = @sumx[a] / @sumn[a]
    }
    #emit wmean, "a";
    #emit mean, "a";
    emit (wmean, mean), "a";
  }'
a=pan,wmean=4979.563722208067,mean=5028.259010091302
a=eks,wmean=4890.3815931472145,mean=4956.2900763358775
a=wye,wmean=4946.987746229947,mean=4920.001017293998
a=zee,wmean=5164.719684856538,mean=5123.092330239375
a=hat,wmean=4925.533162478552,mean=4967.743946419371
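
The DSL expression above computes, per value of a, the weighted mean sum(w * x) / sum(w) alongside the plain mean sum(x) / n. The same accumulation can be sketched in Python to check results on a small input. This is a minimal sketch: the function name weighted_and_plain_means and the three sample records are made up for illustration and are not taken from data/medium.

```python
from collections import defaultdict


def weighted_and_plain_means(records, group_field, value_field, weight_field):
    """Accumulate the four running sums per group, then divide at the end."""
    sumwx = defaultdict(float)  # sum of weight * value
    sumw = defaultdict(float)   # sum of weights
    sumx = defaultdict(float)   # sum of values
    sumn = defaultdict(int)     # record count
    for rec in records:
        key = rec[group_field]
        w = rec[weight_field]
        x = rec[value_field]
        sumwx[key] += w * x
        sumw[key] += w
        sumx[key] += x
        sumn[key] += 1
    return {
        key: {"wmean": sumwx[key] / sumw[key], "mean": sumx[key] / sumn[key]}
        for key in sumwx
    }


# Made-up records using the same field names as the example above.
records = [
    {"a": "pan", "i": 1, "y": 0.5},
    {"a": "pan", "i": 3, "y": 1.5},
    {"a": "eks", "i": 10, "y": 2.0},
]
for key, means in weighted_and_plain_means(records, "a", "i", "y").items():
    print(key, means["wmean"], means["mean"])
```

For the pan group the weighted mean is (0.5 * 1 + 1.5 * 3) / (0.5 + 1.5) = 2.5, while the plain mean is 2.0: the record with the larger weight pulls the weighted mean toward its value.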