DSL higher-order functions¶

A higher-order function is one that takes another function as an argument. As of Miller 6 you can use select, apply, reduce, fold, sort, any, and every to express flexible, intuitive operations on arrays and maps, as an alternative to writing explicit for-loops.

See also the get_keys and get_values functions which, when given a map, return an array of its keys or an array of its values, respectively.

select¶

The select function takes a map or array as its first argument and a function as its second argument. The output contains each input element for which the function returns true.

For arrays, that function should take one argument, the array element; for maps, it should take two, the map-element key and value. In either case it should return a boolean.

A perhaps helpful analogy: the select function is to arrays and maps as the filter verb is to records.

Array examples:

mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];

    print "Original:";
    print my_array;

    print;
    print "Evens:";
    print select(my_array, func (e) { return e % 2 == 0});

    print;
    print "Odds:";
    print select(my_array, func (e) { return e % 2 == 1});
    print;
  }
'

Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Evens:
[2, 10, 4, 8, 6]

Odds:
[9, 3, 1, 5, 7]

Map examples:

mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "Keys with an 'o' in them:";
    print select(my_map, func (k,v) { return k =~ "o"});

    print;
    print "Values with last digit >= 5:";
    print select(my_map, func (k,v) { return v % 10 >= 5});
  }
'

Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Keys with an o in them:
{
  "bottle": 107
}

Values with last digit >= 5:
{
  "apple": 199,
  "bottle": 107
}

apply¶

The apply function takes a map or array as its first argument and a function as its second argument. It applies the function to each element of the array or map and returns the results as a new array or map.

For arrays, the function should take one argument, the array element, and return a new element. For maps, it should take two, the map-element key and value, and return a new key-value pair (i.e. a single-entry map).

A perhaps helpful analogy: the apply function is to arrays and maps as the put verb is to records.

Array examples:

mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
    print "Original:";
    print my_array;

    print;
    print "Squares:";
    print apply(my_array, func(e) { return e**2 });

    print;
    print "Cubes:";
    print apply(my_array, func(e) { return e**3 });

    print;
    print "Sorted cubes:";
    print sort(apply(my_array, func(e) { return e**3 }));
  }
'

Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Squares:
[4, 81, 100, 9, 1, 16, 25, 64, 49, 36]

Cubes:
[8, 729, 1000, 27, 1, 64, 125, 512, 343, 216]

Sorted cubes:
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

Map examples:

mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "Squared values:";
    print apply(my_map, func(k,v) { return {k: v**2} });

    print;
    print "Cubed values, sorted by key:";
    print sort(apply(my_map, func(k,v) { return {k: v**3} }));

    print;
    print "Same, with upcased keys:";
    print sort(apply(my_map, func(k,v) { return {toupper(k): v**3} }));
  }
'

Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Squared values:
{
  "cubit": 677329,
  "dale": 169,
  "apple": 39601,
  "ember": 36481,
  "bottle": 11449
}

Cubed values, sorted by key:
{
  "apple": 7880599,
  "bottle": 1225043,
  "cubit": 557441767,
  "dale": 2197,
  "ember": 6967871
}

Same, with upcased keys:
{
  "APPLE": 7880599,
  "BOTTLE": 1225043,
  "CUBIT": 557441767,
  "DALE": 2197,
  "EMBER": 6967871
}

reduce¶

The reduce function takes a map or array as its first argument and a function as its second argument. It accumulates entries into a single final output -- for example, a sum or product.

For arrays, the function should take two arguments, the accumulated value and the array element; for maps, it should take four, the accumulated key and value and the map-element key and value. In either case it should return the updated accumulator: a single value for arrays, or a single-entry map for maps.

The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps.

mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];

    print "Original:";
    print my_array;

    print;
    print "First element:";
    print reduce(my_array, func (acc,e) { return acc });

    print;
    print "Last element:";
    print reduce(my_array, func (acc,e) { return e });

    print;
    print "Sum of values:";
    print reduce(my_array, func (acc,e) { return acc + e });

    print;
    print "Product of values:";
    print reduce(my_array, func (acc,e) { return acc * e });

    print;
    print "Concatenation of values:";
    print reduce(my_array, func (acc,e) { return acc . "," . e });
  }
'

Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

First element:
2

Last element:
6

Sum of values:
55

Product of values:
3628800

Concatenation of values:
2,9,10,3,1,4,5,8,7,6

mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "First key-value pair:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {acck: accv}});

    print;
    print "Last key-value pair:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {ek: ev}});

    print;
    print "Concatenate keys and values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {acck . "," . ek: accv . "," . ev}});

    print;
    print "Sum of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev }});

    print;
    print "Product of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"product": accv * ev }});

    print;
    print "String-join of values:";
    print reduce(my_map, func (acck,accv,ek,ev) { return {"joined": accv . "," . ev }});
  }
'

Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

First key-value pair:
{
  "cubit": 823
}

Last key-value pair:
{
  "bottle": 107
}

Concatenate keys and values:
{
  "cubit,dale,apple,ember,bottle": "823,13,199,191,107"
}

Sum of values:
{
  "sum": 1333
}

Product of values:
{
  "product": 43512437137
}

String-join of values:
{
  "joined": "823,13,199,191,107"
}

fold¶

The fold function is the same as reduce, except that you supply the starting value for the accumulator as the third argument, rather than having it taken from the first entry of the array or map.

mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];

    print "Original:";
    print my_array;

    print;
    print "Sum with reduce:";
    print reduce(my_array, func (acc,e) { return acc + e });

    print;
    print "Sum with fold and 0 initial value:";
    print fold(my_array, func (acc,e) { return acc + e }, 0);

    print;
    print "Sum with fold and 1000000 initial value:";
    print fold(my_array, func (acc,e) { return acc + e }, 1000000);
  }
'

Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Sum with reduce:
55

Sum with fold and 0 initial value:
55

Sum with fold and 1000000 initial value:
1000055

mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
    print "Original:";
    print my_map;

    print;
    print "First key-value pair -- note this is the starting accumulator:";
    print fold(my_map, func (acck,accv,ek,ev) { return {acck: accv}}, {"start": 999});

    print;
    print "Last key-value pair:";
    print fold(my_map, func (acck,accv,ek,ev) { return {ek: ev}}, {"start": 999});

    print;
    print "Sum of values with fold and 0 initial value:";
    print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 0});

    print;
    print "Sum of values with fold and 1000000 initial value:";
    print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 1000000});
  }
'

Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

First key-value pair -- note this is the starting accumulator:
{
  "start": 999
}

Last key-value pair:
{
  "bottle": 107
}

Sum of values with fold and 0 initial value:
{
  "sum": 1333
}

Sum of values with fold and 1000000 initial value:
{
  "sum": 1001333
}

sort¶

The sort function takes a map or array as its first argument, and optionally a second argument. Unlike with the other higher-order functions, the second argument can be omitted when the natural ordering is desired -- ordered by array element for arrays, or by key for maps.

As a second option, character flags such as r for reverse or c for case-folded lexical sort can be supplied as the second argument.

As a third option, a function can be supplied as the second argument.

For arrays, that function should take two arguments a and b, returning a negative, zero, or positive number when a<b, a==b, or a>b respectively. For maps, the function should take four arguments ak, av, bk, and bv, again returning negative, zero, or positive, using a and b's keys and values.

Array examples:

mlr -n put '
  end {
    my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];

    print "Original:";
    print my_array;

    print;
    print "Ascending:";
    print sort(my_array);
    print sort(my_array, func (a,b) { return a <=> b });

    print;
    print "Descending:";
    print sort(my_array, "r");
    print sort(my_array, func (a,b) { return b <=> a });
  }
'

Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

Ascending:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Descending:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Map examples:

mlr -n put '
  end {
    my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};

    print "Original:";
    print my_map;

    print;
    print "Ascending by key:";
    print sort(my_map);
    print sort(my_map, func(ak,av,bk,bv) { return ak <=> bk });

    print;
    print "Descending by key:";
    print sort(my_map, "r");
    print sort(my_map, func(ak,av,bk,bv) { return bk <=> ak });

    print;
    print "Ascending by value:";
    print sort(my_map, func(ak,av,bk,bv) { return av <=> bv });

    print;
    print "Descending by value:";
    print sort(my_map, func(ak,av,bk,bv) { return bv <=> av });
  }
'

Original:
{
  "cubit": 823,
  "dale": 13,
  "apple": 199,
  "ember": 191,
  "bottle": 107
}

Ascending by key:
{
  "apple": 199,
  "bottle": 107,
  "cubit": 823,
  "dale": 13,
  "ember": 191
}
{
  "apple": 199,
  "bottle": 107,
  "cubit": 823,
  "dale": 13,
  "ember": 191
}

Descending by key:
{
  "ember": 191,
  "dale": 13,
  "cubit": 823,
  "bottle": 107,
  "apple": 199
}
{
  "ember": 191,
  "dale": 13,
  "cubit": 823,
  "bottle": 107,
  "apple": 199
}

Ascending by value:
{
  "dale": 13,
  "bottle": 107,
  "ember": 191,
  "apple": 199,
  "cubit": 823
}

Descending by value:
{
  "cubit": 823,
  "apple": 199,
  "ember": 191,
  "bottle": 107,
  "dale": 13
}

Please see the sorting page for more examples.

any and every¶

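These functions compute a logical OR (any) or AND (every) of several boolean expressions, without explicit ||/&& operators and without a for-loop. The any function returns true if the given function returns true for at least one element; every returns true only if it returns true for all of them. This is a keystroke-saving convenience.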

mlr --c2p cat example.csv

color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430

mlr --c2p --from example.csv filter 'any({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

color  shape  flag  k  index quantity rate
red    square true  2  15    79.2778  0.0130
red    circle true  3  16    13.8103  2.9010
red    square false 4  48    77.5542  7.4670
red    square false 6  64    77.1991  9.5310
purple square false 10 91    72.3735  8.2430

mlr --c2p --from example.csv filter 'every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

color shape  flag  k index quantity rate
red   square true  2 15    79.2778  0.0130
red   square false 4 48    77.5542  7.4670
red   square false 6 64    77.1991  9.5310

mlr --c2p --from example.csv put '$is_red_square = every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

color  shape    flag  k  index quantity rate   is_red_square
yellow triangle true  1  11    43.6498  9.8870 false
red    square   true  2  15    79.2778  0.0130 true
red    circle   true  3  16    13.8103  2.9010 false
red    square   false 4  48    77.5542  7.4670 true
purple triangle false 5  51    81.2290  8.5910 false
red    square   false 6  64    77.1991  9.5310 true
purple triangle false 7  65    80.1405  5.8240 false
yellow circle   true  8  73    63.9785  4.2370 false
yellow circle   true  9  87    63.5058  8.3350 false
purple square   false 10 91    72.3735  8.2430 false

mlr --c2p --from example.csv filter 'any([16,51,61,64], func(e) {return $index == e})'

color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310

This last example could also be done using a map:

mlr --c2p --from example.csv filter '
  begin {
    @indices = {16:true, 51:true, 61:true, 64:true};
  }
  @indices[$index] == true;
'

color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310

Combined examples¶

Using a paradigm from the page on operating on all records, we can retain a column from the input data as an array, then apply some higher-order functions to it:

mlr --c2p cat example.csv

color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430

mlr --c2p --from example.csv put -q '
  begin {
    @indexes = [] # So auto-extend will make an array, not a map
  }
  @indexes[NR] = $index;
  end {

    print "Original:";
    print @indexes;

    print;
    print "Sorted:";
    print sort(@indexes, "r");

    print;
    print "Sorted, then cubed:";
    print apply(
      sort(@indexes, "r"),
      func(e) { return e**3 },
    );

    print;
    print "Sorted, then cubed, then summed:";
    print reduce(
      apply(
        sort(@indexes, "r"),
        func(e) { return e**3 },
      ),
      func(acc, e) { return acc + e },
    )
  }
'

Original:
[11, 15, 16, 48, 51, 64, 65, 73, 87, 91]

Sorted:
[91, 87, 73, 65, 64, 51, 48, 16, 15, 11]

Sorted, then cubed:
[753571, 658503, 389017, 274625, 262144, 132651, 110592, 4096, 3375, 1331]

Sorted, then cubed, then summed:
2589905

Caveats¶

Remember return¶

Coming from other languages, it's easy to accidentally omit the return keyword and write

mlr -n put 'end { print select([1,2,3,4,5], func (e) { e >= 3 })}'

mlr: select: function returned non-boolean "(absent)".

instead of

mlr -n put 'end { print select([1,2,3,4,5], func (e) { return e >= 3 })}'

[3, 4, 5]

No IIFEs¶

As of September 2021, immediately invoked function expressions (IIFEs) are not part of the Miller DSL's grammar. For example, this doesn't work yet:

mlr -n put '
  end {
    x = 3;
    y = (func (e) { return e**7 })(x);
    print y;
  }
'

mlr: cannot parse DSL expression.
Parse error on token "(" at line 4 column 35.
Please check for missing semicolon.
Expected one of:
  ; } > >> | ? || ^^ && ?? ??? =~ !=~ == != <=> >= < <= ^ & << >>> + - .+
  .- * / // % .* ./ .// . **

but this does:

mlr -n put '
  end {
    x = 3;
    f = func (e) { return e**7 };
    y = f(x);
    print y;
  }
'

Built-in functions currently unsupported as arguments¶

As of September 2021, built-in functions are handled somewhat separately from user-defined functions inside Miller, and can't be passed directly as arguments to higher-order functions. Wrapping the built-in in a function literal works around this.

For example, this doesn't work yet:

mlr -n put '
  end {
    notches = [0,1,2,3];
    radians = apply(notches, func (e) { return e * M_PI / 8 });
    cosines = apply(radians, cos);
    print cosines;
  }
'

mlr: apply: second argument must be a function; got absent.

but this does:

mlr -n put '
  end {
    notches = [0,1,2,3];
    radians = apply(notches, func (e) { return e * M_PI / 8 });
    # cosines = apply(radians, cos);
    cosines = apply(radians, func (e) { return cos(e) });
    print cosines;
  }
'

[1, 0.9238795325112867, 0.7071067811865476, 0.38268343236508984]