DSL higher-order functions¶
A higher-order function is one which takes another function as an argument.
As of Miller 6 you can use select, apply, reduce, fold, sort, any, and every to express flexible, intuitive operations on arrays and maps, as an alternative to things which would otherwise require for-loops.
See also the get_keys and get_values functions which, when given a map, return an array of its keys or an array of its values, respectively.
select¶
The select function takes a map or array as its first argument and a function as second argument. It includes each input element in the output if the function returns true.
For arrays, that function should take one argument, the array element; for maps, it should take two, the map-element key and value. In either case it should return a boolean.
A perhaps helpful analogy: the select function is to arrays and maps as the filter verb is to records.
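For readers who know Python, the rules above can be modeled roughly as follows. This is only a sketch of the semantics described in this section, not Miller code and not how Miller implements select; the Python names `select` and `_check` are hypothetical.

```python
# Hypothetical Python model of select's behavior; this is not Miller code.
def _check(result):
    """Insist on a real boolean, since the callback is expected to return one."""
    if not isinstance(result, bool):
        raise TypeError(f"select: function returned non-boolean {result!r}")
    return result


def select(collection, predicate):
    """Return the entries of a list or dict for which predicate is True."""
    if isinstance(collection, dict):
        # Maps: the predicate receives the key and the value.
        return {k: v for k, v in collection.items() if _check(predicate(k, v))}
    # Arrays: the predicate receives the element only.
    return [e for e in collection if _check(predicate(e))]


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

evens = select(my_array, lambda e: e % 2 == 0)
with_o = select(my_map, lambda k, v: "o" in k)
print(evens)
print(with_o)
```

In this model, input order is preserved in the output, which matches the outputs shown in the examples that follow.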
Array examples:
mlr -n put '
end {
my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
print "Original:";
print my_array;
print;
print "Evens:";
print select(my_array, func (e) { return e % 2 == 0});
print;
print "Odds:";
print select(my_array, func (e) { return e % 2 == 1});
print;
}
'
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
Evens:
[2, 10, 4, 8, 6]
Odds:
[9, 3, 1, 5, 7]
Map examples:
mlr -n put '
end {
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
print "Original:";
print my_map;
print;
print "Keys with an o in them:";
print select(my_map, func (k,v) { return k =~ "o"});
print;
print "Values with last digit >= 5:";
print select(my_map, func (k,v) { return v % 10 >= 5});
}
'
Original:
{
"cubit": 823,
"dale": 13,
"apple": 199,
"ember": 191,
"bottle": 107
}
Keys with an o in them:
{
"bottle": 107
}
Values with last digit >= 5:
{
"apple": 199,
"bottle": 107
}
apply¶
The apply function takes a map or array as its first argument and a function as second argument. It applies the function to each element of the array or map.
For arrays, the function should take one argument, the array element, and it should return a new element. For maps, it should take two, the map-element key and value, and it should return a new key-value pair (i.e. a single-entry map).
A perhaps helpful analogy: the apply function is to arrays and maps as the put verb is to records.
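The contract for the callback, in particular the single-entry map returned for maps, can be sketched in Python. This is an illustrative model under the assumptions stated in this section, not Miller's implementation; the Python function name `apply` is hypothetical.

```python
# Hypothetical Python model of apply's behavior; this is not Miller code.
def apply(collection, f):
    """Transform each entry of a list or dict with f."""
    if isinstance(collection, dict):
        out = {}
        for k, v in collection.items():
            pair = f(k, v)
            # For maps, f must hand back exactly one key-value pair.
            if not (isinstance(pair, dict) and len(pair) == 1):
                raise TypeError("apply: function must return a single-entry map")
            out.update(pair)
        return out
    # For arrays, f hands back the new element directly.
    return [f(e) for e in collection]


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

squares = apply(my_array, lambda e: e**2)
upcased = apply(my_map, lambda k, v: {k.upper(): v**3})
print(squares)
print(upcased)
```

Because the callback returns the key as well as the value, a map callback can rename keys, as the upcased-keys example in this section does.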
Array examples:
mlr -n put '
end {
my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
print "Original:";
print my_array;
print;
print "Squares:";
print apply(my_array, func(e) { return e**2 });
print;
print "Cubes:";
print apply(my_array, func(e) { return e**3 });
print;
print "Sorted cubes:";
print sort(apply(my_array, func(e) { return e**3 }));
}
'
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
Squares:
[4, 81, 100, 9, 1, 16, 25, 64, 49, 36]
Cubes:
[8, 729, 1000, 27, 1, 64, 125, 512, 343, 216]
Sorted cubes:
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
Map examples:
mlr -n put '
end {
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
print "Original:";
print my_map;
print;
print "Squared values:";
print apply(my_map, func(k,v) { return {k: v**2} });
print;
print "Cubed values, sorted by key:";
print sort(apply(my_map, func(k,v) { return {k: v**3} }));
print;
print "Same, with upcased keys:";
print sort(apply(my_map, func(k,v) { return {toupper(k): v**3} }));
}
'
Original:
{
"cubit": 823,
"dale": 13,
"apple": 199,
"ember": 191,
"bottle": 107
}
Squared values:
{
"cubit": 677329,
"dale": 169,
"apple": 39601,
"ember": 36481,
"bottle": 11449
}
Cubed values, sorted by key:
{
"apple": 7880599,
"bottle": 1225043,
"cubit": 557441767,
"dale": 2197,
"ember": 6967871
}
Same, with upcased keys:
{
"APPLE": 7880599,
"BOTTLE": 1225043,
"CUBIT": 557441767,
"DALE": 2197,
"EMBER": 6967871
}
reduce¶
The reduce function takes a map or array as its first argument and a function as second argument. It accumulates entries into a final output -- for example, sum or product.
For arrays, the function should take two arguments, the accumulated value and the array element; for maps, it should take four, the accumulated key and value followed by the map-element key and value. In either case it should return the updated accumulator: a value for arrays, or a single-entry map for maps.
The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps.
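The seeding rule can be made concrete with a small Python model. This is a sketch of the semantics described above, not Miller code; `reduce_array` and `reduce_map` are hypothetical names, and the model does not attempt to say what Miller does for empty input.

```python
# Hypothetical Python model of reduce's behavior; this is not Miller code.
from functools import reduce as py_reduce


def reduce_array(xs, f):
    """Arrays: the first element seeds the accumulator; f runs on the rest."""
    # functools.reduce without an initial value behaves the same way.
    return py_reduce(f, xs)


def reduce_map(m, f):
    """Maps: the first key-value pair seeds the accumulator."""
    items = list(m.items())
    acck, accv = items[0]
    for ek, ev in items[1:]:
        pair = f(acck, accv, ek, ev)
        # f returns a single-entry map, which becomes the new accumulator.
        [(acck, accv)] = pair.items()
    return {acck: accv}


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

total = reduce_array(my_array, lambda acc, e: acc + e)
map_sum = reduce_map(my_map, lambda acck, accv, ek, ev: {"sum": accv + ev})
joined = reduce_map(my_map, lambda acck, accv, ek, ev: {f"{acck},{ek}": f"{accv},{ev}"})
print(total)
print(map_sum)
print(joined)
```

Since the callback is never invoked on the seed itself, a callback that returns the accumulator unchanged yields the first entry, and one that returns the current element yields the last entry.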
mlr -n put '
end {
my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
print "Original:";
print my_array;
print;
print "First element:";
print reduce(my_array, func (acc,e) { return acc });
print;
print "Last element:";
print reduce(my_array, func (acc,e) { return e });
print;
print "Sum of values:";
print reduce(my_array, func (acc,e) { return acc + e });
print;
print "Product of values:";
print reduce(my_array, func (acc,e) { return acc * e });
print;
print "Concatenation of values:";
print reduce(my_array, func (acc,e) { return acc . "," . e });
}
'
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
First element:
2
Last element:
6
Sum of values:
55
Product of values:
3628800
Concatenation of values:
2,9,10,3,1,4,5,8,7,6
mlr -n put '
end {
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
print "Original:";
print my_map;
print;
print "First key-value pair:";
print reduce(my_map, func (acck,accv,ek,ev) { return {acck: accv}});
print;
print "Last key-value pair:";
print reduce(my_map, func (acck,accv,ek,ev) { return {ek: ev}});
print;
print "Concatenate keys and values:";
print reduce(my_map, func (acck,accv,ek,ev) { return {acck . "," . ek: accv . "," . ev}});
print;
print "Sum of values:";
print reduce(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev }});
print;
print "Product of values:";
print reduce(my_map, func (acck,accv,ek,ev) { return {"product": accv * ev }});
print;
print "String-join of values:";
print reduce(my_map, func (acck,accv,ek,ev) { return {"joined": accv . "," . ev }});
}
'
Original:
{
"cubit": 823,
"dale": 13,
"apple": 199,
"ember": 191,
"bottle": 107
}
First key-value pair:
{
"cubit": 823
}
Last key-value pair:
{
"bottle": 107
}
Concatenate keys and values:
{
"cubit,dale,apple,ember,bottle": "823,13,199,191,107"
}
Sum of values:
{
"sum": 1333
}
Product of values:
{
"product": 43512437137
}
String-join of values:
{
"joined": "823,13,199,191,107"
}
fold¶
The fold function is the same as
reduce, except that instead of the starting value for the accumulation being
taken from the first entry of the array/map, you specify it as the third
argument.
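The difference from reduce comes down to where the accumulator starts, which a short Python model can show. This is a sketch only, not Miller code; `fold_array` and `fold_map` are hypothetical names.

```python
# Hypothetical Python model of fold's behavior; this is not Miller code.
from functools import reduce as py_reduce


def fold_array(xs, f, initial):
    """Arrays: start from the caller-supplied value and run f on every element."""
    return py_reduce(f, xs, initial)


def fold_map(m, f, initial):
    """Maps: the caller-supplied start value is a single-entry map."""
    [(acck, accv)] = initial.items()
    for ek, ev in m.items():
        # f returns a single-entry map, which becomes the new accumulator.
        [(acck, accv)] = f(acck, accv, ek, ev).items()
    return {acck: accv}


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

from_zero = fold_array(my_array, lambda acc, e: acc + e, 0)
from_million = fold_array(my_array, lambda acc, e: acc + e, 1000000)
map_sum = fold_map(my_map, lambda acck, accv, ek, ev: {"sum": accv + ev}, {"sum": 0})
print(from_zero)
print(from_million)
print(map_sum)
```

One consequence in this model is that the callback runs on every entry, including the first, so the accumulator may have a different type or shape from the elements.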
mlr -n put '
end {
my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
print "Original:";
print my_array;
print;
print "Sum with reduce:";
print reduce(my_array, func (acc,e) { return acc + e });
print;
print "Sum with fold and 0 initial value:";
print fold(my_array, func (acc,e) { return acc + e }, 0);
print;
print "Sum with fold and 1000000 initial value:";
print fold(my_array, func (acc,e) { return acc + e }, 1000000);
}
'
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
Sum with reduce:
55
Sum with fold and 0 initial value:
55
Sum with fold and 1000000 initial value:
1000055
mlr -n put '
end {
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
print "Original:";
print my_map;
print;
print "First key-value pair -- note this is the starting accumulator:";
print fold(my_map, func (acck,accv,ek,ev) { return {acck: accv}}, {"start": 999});
print;
print "Last key-value pair:";
print fold(my_map, func (acck,accv,ek,ev) { return {ek: ev}}, {"start": 999});
print;
print "Sum of values with fold and 0 initial value:";
print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 0});
print;
print "Sum of values with fold and 1000000 initial value:";
print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 1000000});
}
'
Original:
{
"cubit": 823,
"dale": 13,
"apple": 199,
"ember": 191,
"bottle": 107
}
First key-value pair -- note this is the starting accumulator:
{
"start": 999
}
Last key-value pair:
{
"bottle": 107
}
Sum of values with fold and 0 initial value:
{
"sum": 1333
}
Sum of values with fold and 1000000 initial value:
{
"sum": 1001333
}
sort¶
The sort function takes a map or
array as its first argument, and it can take a function as second argument.
Unlike the other higher-order functions, the second argument can be omitted
when the natural ordering is desired -- ordered by array element for arrays, or by
key for maps.
As a second option, character flags such as r for reverse or c for
case-folded lexical sort can be supplied as the second argument.
As a third option, a function can be supplied as the second argument.
For arrays, that function should take two arguments a and b, returning a
negative, zero, or positive number as a<b, a==b, or a>b respectively.
For maps, the function should take four arguments ak, av, bk, and bv,
again returning negative, zero, or positive, using a and b's keys and
values.
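The comparator convention (negative, zero, or positive) is the same one used by comparison-based sorts in many languages. A Python sketch of the third option, using the standard `functools.cmp_to_key`, is given here as an analogy; `spaceship`, `sort_array`, and `sort_map` are hypothetical names, and the model covers only values of a single comparable type rather than Miller's full natural ordering.

```python
# Hypothetical Python model of sort with a comparator; this is not Miller code.
from functools import cmp_to_key


def spaceship(a, b):
    """Stand-in for the <=> operator: returns -1, 0, or 1."""
    return (a > b) - (a < b)


def sort_array(xs, cmp=spaceship):
    """Arrays: the comparator takes two elements a and b."""
    return sorted(xs, key=cmp_to_key(cmp))


def sort_map(m, cmp=None):
    """Maps: the comparator takes ak, av, bk, bv; the default orders by key."""
    if cmp is None:
        cmp = lambda ak, av, bk, bv: spaceship(ak, bk)
    pairs = sorted(
        m.items(),
        key=cmp_to_key(lambda a, b: cmp(a[0], a[1], b[0], b[1])),
    )
    return dict(pairs)


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

descending = sort_array(my_array, lambda a, b: spaceship(b, a))
by_value = sort_map(my_map, lambda ak, av, bk, bv: spaceship(av, bv))
print(descending)
print(by_value)
```

Swapping the operands inside the comparator reverses the order, which is why the descending examples in this section compare b against a.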
Array examples:
mlr -n put '
end {
my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
print "Original:";
print my_array;
print;
print "Ascending:";
print sort(my_array);
print sort(my_array, func (a,b) { return a <=> b });
print;
print "Descending:";
print sort(my_array, "r");
print sort(my_array, func (a,b) { return b <=> a });
}
'
Original:
[2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
Ascending:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Descending:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Map examples:
mlr -n put '
end {
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
print "Original:";
print my_map;
print;
print "Ascending by key:";
print sort(my_map);
print sort(my_map, func(ak,av,bk,bv) { return ak <=> bk });
print;
print "Descending by key:";
print sort(my_map, "r");
print sort(my_map, func(ak,av,bk,bv) { return bk <=> ak });
print;
print "Ascending by value:";
print sort(my_map, func(ak,av,bk,bv) { return av <=> bv });
print;
print "Descending by value:";
print sort(my_map, func(ak,av,bk,bv) { return bv <=> av });
}
'
Original:
{
"cubit": 823,
"dale": 13,
"apple": 199,
"ember": 191,
"bottle": 107
}
Ascending by key:
{
"apple": 199,
"bottle": 107,
"cubit": 823,
"dale": 13,
"ember": 191
}
{
"apple": 199,
"bottle": 107,
"cubit": 823,
"dale": 13,
"ember": 191
}
Descending by key:
{
"ember": 191,
"dale": 13,
"cubit": 823,
"bottle": 107,
"apple": 199
}
{
"ember": 191,
"dale": 13,
"cubit": 823,
"bottle": 107,
"apple": 199
}
Ascending by value:
{
"dale": 13,
"bottle": 107,
"ember": 191,
"apple": 199,
"cubit": 823
}
Descending by value:
{
"cubit": 823,
"apple": 199,
"ember": 191,
"bottle": 107,
"dale": 13
}
Please see the sorting page for more examples.
any and every¶
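The any and every functions take a map or array as first argument and a boolean-valued function as second argument. They compute a logical OR and AND, respectively, of the function's results over all elements, without the explicit ||/&& and without a for-loop. This is a keystroke-saving convenience.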
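As an analogy, Python's built-in `any` and `all` play the same roles. The sketch below models the map and array forms against a single record; it is not Miller code, and `any_of`, `every`, `record`, and `wanted` are hypothetical names, with `record` standing in for the current record's fields.

```python
# Hypothetical Python model of any and every; this is not Miller code.
def any_of(collection, f):
    """True if f is true for at least one entry (logical OR)."""
    if isinstance(collection, dict):
        return any(f(k, v) for k, v in collection.items())
    return any(f(e) for e in collection)


def every(collection, f):
    """True if f is true for all entries (logical AND)."""
    if isinstance(collection, dict):
        return all(f(k, v) for k, v in collection.items())
    return all(f(e) for e in collection)


# Stand-in for one input record.
record = {"color": "red", "shape": "circle", "index": 16}
wanted = {"color": "red", "shape": "square"}

some_match = any_of(wanted, lambda k, v: record[k] == v)
all_match = every(wanted, lambda k, v: record[k] == v)
index_listed = any_of([16, 51, 61, 64], lambda e: record["index"] == e)
print(some_match, all_match, index_listed)
```

For this record the color matches but the shape does not, so the OR form is true and the AND form is false.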
mlr --c2p cat example.csv
color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
mlr --c2p --from example.csv filter 'any({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
color  shape  flag  k  index quantity rate
red    square true  2  15    79.2778  0.0130
red    circle true  3  16    13.8103  2.9010
red    square false 4  48    77.5542  7.4670
red    square false 6  64    77.1991  9.5310
purple square false 10 91    72.3735  8.2430
mlr --c2p --from example.csv filter 'every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
color shape  flag  k index quantity rate
red   square true  2 15    79.2778  0.0130
red   square false 4 48    77.5542  7.4670
red   square false 6 64    77.1991  9.5310
mlr --c2p --from example.csv put '$is_red_square = every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'
color  shape    flag  k  index quantity rate   is_red_square
yellow triangle true  1  11    43.6498  9.8870 false
red    square   true  2  15    79.2778  0.0130 true
red    circle   true  3  16    13.8103  2.9010 false
red    square   false 4  48    77.5542  7.4670 true
purple triangle false 5  51    81.2290  8.5910 false
red    square   false 6  64    77.1991  9.5310 true
purple triangle false 7  65    80.1405  5.8240 false
yellow circle   true  8  73    63.9785  4.2370 false
yellow circle   true  9  87    63.5058  8.3350 false
purple square   false 10 91    72.3735  8.2430 false
mlr --c2p --from example.csv filter 'any([16,51,61,64], func(e) {return $index == e})'
color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310
This last example could also be done using a map:
mlr --c2p --from example.csv filter '
begin {
@indices = {16:true, 51:true, 61:true, 64:true};
}
@indices[$index] == true;
'
color  shape    flag  k index quantity rate
red    circle   true  3 16    13.8103  2.9010
purple triangle false 5 51    81.2290  8.5910
red    square   false 6 64    77.1991  9.5310
Combined examples¶
Using a paradigm from the page on operating on all records, we can retain a column from the input data as an array, then apply some higher-order functions to it:
mlr --c2p cat example.csv
color  shape    flag  k  index quantity rate
yellow triangle true  1  11    43.6498  9.8870
red    square   true  2  15    79.2778  0.0130
red    circle   true  3  16    13.8103  2.9010
red    square   false 4  48    77.5542  7.4670
purple triangle false 5  51    81.2290  8.5910
red    square   false 6  64    77.1991  9.5310
purple triangle false 7  65    80.1405  5.8240
yellow circle   true  8  73    63.9785  4.2370
yellow circle   true  9  87    63.5058  8.3350
purple square   false 10 91    72.3735  8.2430
mlr --c2p --from example.csv put -q '
begin {
@indexes = [] # So auto-extend will make an array, not a map
}
@indexes[NR] = $index;
end {
print "Original:";
print @indexes;
print;
print "Sorted:";
print sort(@indexes, "r");
print;
print "Sorted, then cubed:";
print apply(
sort(@indexes, "r"),
func(e) { return e**3 },
);
print;
print "Sorted, then cubed, then summed:";
print reduce(
apply(
sort(@indexes, "r"),
func(e) { return e**3 },
),
func(acc, e) { return acc + e },
)
}
'
Original:
[11, 15, 16, 48, 51, 64, 65, 73, 87, 91]
Sorted:
[91, 87, 73, 65, 64, 51, 48, 16, 15, 11]
Sorted, then cubed:
[753571, 658503, 389017, 274625, 262144, 132651, 110592, 4096, 3375, 1331]
Sorted, then cubed, then summed:
2589905
Caveats¶
Remember return¶
Coming from other languages, it's easy to forget the return keyword and accidentally write
mlr -n put 'end { print select([1,2,3,4,5], func (e) { e >= 3 })}'
mlr: select: function returned non-boolean "(absent)".
instead of
mlr -n put 'end { print select([1,2,3,4,5], func (e) { return e >= 3 })}'
[3, 4, 5]
No IIFEs¶
As of September 2021, immediately invoked function expressions (IIFEs) are not part of the Miller DSL's grammar. For example, this doesn't work yet:
mlr -n put '
end {
x = 3;
y = (func (e) { return e**7 })(x);
print y;
}
'
mlr: cannot parse DSL expression.
Parse error on token "(" at line 4 column 35.
Please check for missing semicolon.
Expected one of:
; } > >> | ? || ^^ && ?? ??? =~ !=~ == != <=> >= < <= ^ & << >>> + - .+
.- * / // % .* ./ .// . **
but this does:
mlr -n put '
end {
x = 3;
f = func (e) { return e**7 };
y = f(x);
print y;
}
'
2187
Built-in functions currently unsupported as arguments¶
Built-in functions are, as of September 2021, a bit separate from user-defined functions internally to Miller, and can't be used directly as arguments to higher-order functions.
For example, this doesn't work yet:
mlr -n put '
end {
notches = [0,1,2,3];
radians = apply(notches, func (e) { return e * M_PI / 8 });
cosines = apply(radians, cos);
print cosines;
}
'
mlr: apply: second argument must be a function; got absent.
but this does:
mlr -n put '
end {
notches = [0,1,2,3];
radians = apply(notches, func (e) { return e * M_PI / 8 });
# cosines = apply(radians, cos);
cosines = apply(radians, func (e) { return cos(e) });
print cosines;
}
'
[1, 0.9238795325112867, 0.7071067811865476, 0.38268343236508984]