DSL higher-order functions
A higher-order function is one which takes another function as an argument. As of Miller 6 you can use select, apply, reduce, fold, sort, any, and every to express flexible, intuitive operations on arrays and maps, as an alternative to things which would otherwise require for-loops.
See also the get_keys and get_values functions which, when given a map, return an array of its keys or an array of its values, respectively.
select
The select function takes a map or array as its first argument and a function as its second argument. It includes each input element in the output if the function returns true for that element.
For arrays, that function should take one argument, the array element; for maps, it should take two, the map element's key and value. In either case it should return a boolean.
A perhaps helpful analogy: the select function is to arrays and maps as the filter verb is to records.
Array examples:
    mlr -n put '
      end {
        my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
        print "Original:";
        print my_array;
        print;
        print "Evens:";
        print select(my_array, func (e) { return e % 2 == 0});
        print;
        print "Odds:";
        print select(my_array, func (e) { return e % 2 == 1});
        print;
      }
    '

    Original:
    [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

    Evens:
    [2, 10, 4, 8, 6]

    Odds:
    [9, 3, 1, 5, 7]
Map examples:
    mlr -n put '
      end {
        my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
        print "Original:";
        print my_map;
        print;
        print "Keys with an o in them:";
        print select(my_map, func (k,v) { return k =~ "o"});
        print;
        print "Values with last digit >= 5:";
        print select(my_map, func (k,v) { return v % 10 >= 5});
      }
    '

    Original:
    {
      "cubit": 823,
      "dale": 13,
      "apple": 199,
      "ember": 191,
      "bottle": 107
    }

    Keys with an o in them:
    {
      "bottle": 107
    }

    Values with last digit >= 5:
    {
      "apple": 199,
      "bottle": 107
    }
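For readers more familiar with Python, the select semantics described above can be sketched roughly as follows. This is an analogy only, not Miller code and not Miller's implementation; the Python helper named select and the TypeError it raises are choices made for this sketch.

```python
def select(collection, predicate):
    """Keep the entries of a list or dict for which predicate returns True."""

    def check(result):
        # The predicate is expected to return a real boolean.
        if not isinstance(result, bool):
            raise TypeError(f"select: function returned non-boolean {result!r}")
        return result

    if isinstance(collection, dict):
        # Map case: the predicate receives key and value.
        return {k: v for k, v in collection.items() if check(predicate(k, v))}
    # Array case: the predicate receives the element only.
    return [e for e in collection if check(predicate(e))]


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

evens = select(my_array, lambda e: e % 2 == 0)
keys_with_o = select(my_map, lambda k, v: "o" in k)
print(evens)
print(keys_with_o)
```

The output keeps the input's shape: an array in gives an array out, and a map in gives a map out.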
apply
The apply function takes a map or array as its first argument and a function as its second argument. It applies the function to each element of the array or map.
For arrays, the function should take one argument, the array element, and it should return a new element. For maps, it should take two, the map element's key and value, and it should return a new key-value pair (i.e. a single-entry map).
A perhaps helpful analogy: the apply function is to arrays and maps as the put verb is to records.
Array examples:
    mlr -n put '
      end {
        my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
        print "Original:";
        print my_array;
        print;
        print "Squares:";
        print apply(my_array, func(e) { return e**2 });
        print;
        print "Cubes:";
        print apply(my_array, func(e) { return e**3 });
        print;
        print "Sorted cubes:";
        print sort(apply(my_array, func(e) { return e**3 }));
      }
    '

    Original:
    [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

    Squares:
    [4, 81, 100, 9, 1, 16, 25, 64, 49, 36]

    Cubes:
    [8, 729, 1000, 27, 1, 64, 125, 512, 343, 216]

    Sorted cubes:
    [1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
Map examples:

    mlr -n put '
      end {
        my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
        print "Original:";
        print my_map;
        print;
        print "Squared values:";
        print apply(my_map, func(k,v) { return {k: v**2} });
        print;
        print "Cubed values, sorted by key:";
        print sort(apply(my_map, func(k,v) { return {k: v**3} }));
        print;
        print "Same, with upcased keys:";
        print sort(apply(my_map, func(k,v) { return {toupper(k): v**3} }));
      }
    '

    Original:
    {
      "cubit": 823,
      "dale": 13,
      "apple": 199,
      "ember": 191,
      "bottle": 107
    }

    Squared values:
    {
      "cubit": 677329,
      "dale": 169,
      "apple": 39601,
      "ember": 36481,
      "bottle": 11449
    }

    Cubed values, sorted by key:
    {
      "apple": 7880599,
      "bottle": 1225043,
      "cubit": 557441767,
      "dale": 2197,
      "ember": 6967871
    }

    Same, with upcased keys:
    {
      "APPLE": 7880599,
      "BOTTLE": 1225043,
      "CUBIT": 557441767,
      "DALE": 2197,
      "EMBER": 6967871
    }
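The shape of the apply contract, in particular the single-entry map returned for each map element, can be illustrated with a rough Python analogy. This is not Miller code; the Python helper named apply and the error raised for a malformed return value are assumptions of this sketch, not documented Miller behavior.

```python
def apply(collection, func):
    """Transform each entry of a list or dict with func."""
    if isinstance(collection, dict):
        out = {}
        for k, v in collection.items():
            pair = func(k, v)
            # For maps, func must hand back exactly one key-value pair.
            if not isinstance(pair, dict) or len(pair) != 1:
                raise TypeError("apply: expected a single-entry map")
            out.update(pair)
        return out
    # For arrays, func maps one element to one new element.
    return [func(e) for e in collection]


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

squares = apply(my_array, lambda e: e**2)
upcased_cubes = apply(my_map, lambda k, v: {k.upper(): v**3})
print(squares)
print(upcased_cubes)
```

Because the function returns a whole key-value pair for maps, it can rewrite keys as well as values, as the upcased-keys example shows.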
reduce
The reduce function takes a map or array as its first argument and a function as its second argument. It accumulates entries into a final output, for example a sum or product.
For arrays, the function should take two arguments, the accumulated value and the array element; for maps, it should take four, the accumulated key and value and the map element's key and value. In either case it should return the updated accumulator (for maps, a single-entry map).
The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps.
    mlr -n put '
      end {
        my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
        print "Original:";
        print my_array;
        print;
        print "First element:";
        print reduce(my_array, func (acc,e) { return acc });
        print;
        print "Last element:";
        print reduce(my_array, func (acc,e) { return e });
        print;
        print "Sum of values:";
        print reduce(my_array, func (acc,e) { return acc + e });
        print;
        print "Product of values:";
        print reduce(my_array, func (acc,e) { return acc * e });
        print;
        print "Concatenation of values:";
        print reduce(my_array, func (acc,e) { return acc . "," . e });
      }
    '

    Original:
    [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

    First element:
    2

    Last element:
    6

    Sum of values:
    55

    Product of values:
    3628800

    Concatenation of values:
    2,9,10,3,1,4,5,8,7,6
    mlr -n put '
      end {
        my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
        print "Original:";
        print my_map;
        print;
        print "First key-value pair:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {acck: accv}});
        print;
        print "Last key-value pair:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {ek: ev}});
        print;
        print "Concatenate keys and values:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {acck . "," . ek: accv . "," . ev}});
        print;
        print "Sum of values:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev }});
        print;
        print "Product of values:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {"product": accv * ev }});
        print;
        print "String-join of values:";
        print reduce(my_map, func (acck,accv,ek,ev) { return {"joined": accv . "," . ev }});
      }
    '

    Original:
    {
      "cubit": 823,
      "dale": 13,
      "apple": 199,
      "ember": 191,
      "bottle": 107
    }

    First key-value pair:
    {
      "cubit": 823
    }

    Last key-value pair:
    {
      "bottle": 107
    }

    Concatenate keys and values:
    {
      "cubit,dale,apple,ember,bottle": "823,13,199,191,107"
    }

    Sum of values:
    {
      "sum": 1333
    }

    Product of values:
    {
      "product": 43512437137
    }

    String-join of values:
    {
      "joined": "823,13,199,191,107"
    }
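The accumulation order described above (start from the first entry, then fold in each remaining entry) can be sketched in Python as a rough analogy. This is not Miller's implementation. The names reduce_array and reduce_map are hypothetical, and returning None for empty input is a placeholder chosen for this sketch, since the text above does not cover that case.

```python
def reduce_array(array, func):
    """Accumulate a list, seeding the accumulator with its first element."""
    it = iter(array)
    acc = next(it, None)  # placeholder for empty input (assumption)
    for e in it:
        acc = func(acc, e)
    return acc


def reduce_map(mapping, func):
    """Accumulate a dict, seeding the accumulator with its first pair."""
    items = iter(mapping.items())
    first = next(items, None)
    if first is None:
        return None  # placeholder for empty input (assumption)
    acck, accv = first
    for ek, ev in items:
        # func returns a single-entry dict; unpack it into the accumulator.
        (acck, accv), = func(acck, accv, ek, ev).items()
    return {acck: accv}


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

array_sum = reduce_array(my_array, lambda acc, e: acc + e)
joined = reduce_array(my_array, lambda acc, e: f"{acc},{e}")
map_sum = reduce_map(my_map, lambda acck, accv, ek, ev: {"sum": accv + ev})
print(array_sum, joined, map_sum)
```

Note that the function is never called for the first entry, which is why returning the accumulator unchanged yields the first element and returning the current element yields the last.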
fold
The fold function is the same as reduce, except that you specify the starting value for the accumulation as the third argument, rather than having it taken from the first entry of the array or map.
    mlr -n put '
      end {
        my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
        print "Original:";
        print my_array;
        print;
        print "Sum with reduce:";
        print reduce(my_array, func (acc,e) { return acc + e });
        print;
        print "Sum with fold and 0 initial value:";
        print fold(my_array, func (acc,e) { return acc + e }, 0);
        print;
        print "Sum with fold and 1000000 initial value:";
        print fold(my_array, func (acc,e) { return acc + e }, 1000000);
      }
    '

    Original:
    [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

    Sum with reduce:
    55

    Sum with fold and 0 initial value:
    55

    Sum with fold and 1000000 initial value:
    1000055
    mlr -n put '
      end {
        my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
        print "Original:";
        print my_map;
        print;
        print "First key-value pair -- note this is the starting accumulator:";
        print fold(my_map, func (acck,accv,ek,ev) { return {acck: accv}}, {"start": 999});
        print;
        print "Last key-value pair:";
        print fold(my_map, func (acck,accv,ek,ev) { return {ek: ev}}, {"start": 999});
        print;
        print "Sum of values with fold and 0 initial value:";
        print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 0});
        print;
        print "Sum of values with fold and 1000000 initial value:";
        print fold(my_map, func (acck,accv,ek,ev) { return {"sum": accv + ev} }, {"sum": 1000000});
      }
    '

    Original:
    {
      "cubit": 823,
      "dale": 13,
      "apple": 199,
      "ember": 191,
      "bottle": 107
    }

    First key-value pair -- note this is the starting accumulator:
    {
      "start": 999
    }

    Last key-value pair:
    {
      "bottle": 107
    }

    Sum of values with fold and 0 initial value:
    {
      "sum": 1333
    }

    Sum of values with fold and 1000000 initial value:
    {
      "sum": 1001333
    }
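The difference from reduce is only where the accumulator starts. A rough Python analogy, assuming the hypothetical helper names fold_array and fold_map (this is not Miller code), uses the three-argument form of functools.reduce for arrays and an explicit loop for maps:

```python
from functools import reduce as py_reduce


def fold_array(array, func, initial):
    """Accumulate a list, starting from an explicit initial value."""
    return py_reduce(func, array, initial)


def fold_map(mapping, func, initial):
    """Accumulate a dict, starting from an explicit single-entry dict."""
    acc = initial
    for ek, ev in mapping.items():
        # The accumulator is always a single-entry dict; unpack it.
        (acck, accv), = acc.items()
        acc = func(acck, accv, ek, ev)
    return acc


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

sum_from_zero = fold_array(my_array, lambda acc, e: acc + e, 0)
sum_from_million = fold_array(my_array, lambda acc, e: acc + e, 1000000)
map_sum = fold_map(
    my_map,
    lambda acck, accv, ek, ev: {"sum": accv + ev},
    {"sum": 1000000},
)
print(sum_from_zero, sum_from_million, map_sum)
```

Unlike reduce, the function here is called for every entry including the first, since the accumulator no longer comes from the input. In this Python sketch, an empty input simply returns the initial value.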
sort
The sort function takes a map or array as its first argument, and it can take a function as its second argument. Unlike the other higher-order functions, the second argument can be omitted when the natural ordering is desired: ordered by array element for arrays, or by key for maps.
As a second option, character flags such as r for reverse or c for case-folded lexical sort can be supplied as the second argument.
As a third option, a function can be supplied as the second argument.
For arrays, that function should take two arguments a and b, returning a negative, zero, or positive number as a<b, a==b, or a>b respectively. For maps, the function should take four arguments ak, av, bk, and bv, again returning negative, zero, or positive, using a and b's keys and values.
Array examples:
    mlr -n put '
      end {
        my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6];
        print "Original:";
        print my_array;
        print;
        print "Ascending:";
        print sort(my_array);
        print sort(my_array, func (a,b) { return a <=> b });
        print;
        print "Descending:";
        print sort(my_array, "r");
        print sort(my_array, func (a,b) { return b <=> a });
      }
    '

    Original:
    [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]

    Ascending:
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    Descending:
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Map examples:
    mlr -n put '
      end {
        my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107};
        print "Original:";
        print my_map;
        print;
        print "Ascending by key:";
        print sort(my_map);
        print sort(my_map, func(ak,av,bk,bv) { return ak <=> bk });
        print;
        print "Descending by key:";
        print sort(my_map, "r");
        print sort(my_map, func(ak,av,bk,bv) { return bk <=> ak });
        print;
        print "Ascending by value:";
        print sort(my_map, func(ak,av,bk,bv) { return av <=> bv });
        print;
        print "Descending by value:";
        print sort(my_map, func(ak,av,bk,bv) { return bv <=> av });
      }
    '

    Original:
    {
      "cubit": 823,
      "dale": 13,
      "apple": 199,
      "ember": 191,
      "bottle": 107
    }

    Ascending by key:
    {
      "apple": 199,
      "bottle": 107,
      "cubit": 823,
      "dale": 13,
      "ember": 191
    }
    {
      "apple": 199,
      "bottle": 107,
      "cubit": 823,
      "dale": 13,
      "ember": 191
    }

    Descending by key:
    {
      "ember": 191,
      "dale": 13,
      "cubit": 823,
      "bottle": 107,
      "apple": 199
    }
    {
      "ember": 191,
      "dale": 13,
      "cubit": 823,
      "bottle": 107,
      "apple": 199
    }

    Ascending by value:
    {
      "dale": 13,
      "bottle": 107,
      "ember": 191,
      "apple": 199,
      "cubit": 823
    }

    Descending by value:
    {
      "cubit": 823,
      "apple": 199,
      "ember": 191,
      "bottle": 107,
      "dale": 13
    }
Please see the sorting page for more examples.
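The comparator convention used by the function form of sort (negative, zero, or positive for less, equal, or greater) is the same one Python exposes through functools.cmp_to_key. The following is a rough Python analogy, not Miller code: the names spaceship, sort_array, and sort_map are hypothetical, and the character flags such as r and c are not modeled here.

```python
from functools import cmp_to_key


def spaceship(a, b):
    """Three-way compare, in the spirit of the <=> operator used above."""
    return (a > b) - (a < b)


def sort_array(array, cmp=spaceship):
    """Sort a list with a two-argument comparator."""
    return sorted(array, key=cmp_to_key(cmp))


def sort_map(mapping, cmp=None):
    """Sort a dict with a four-argument comparator; default is by key."""
    if cmp is None:
        cmp = lambda ak, av, bk, bv: spaceship(ak, bk)
    # Adapt the four-argument comparator to compare (key, value) pairs.
    pair_cmp = lambda a, b: cmp(a[0], a[1], b[0], b[1])
    return dict(sorted(mapping.items(), key=cmp_to_key(pair_cmp)))


my_array = [2, 9, 10, 3, 1, 4, 5, 8, 7, 6]
my_map = {"cubit": 823, "dale": 13, "apple": 199, "ember": 191, "bottle": 107}

descending = sort_array(my_array, lambda a, b: spaceship(b, a))
by_value = sort_map(my_map, lambda ak, av, bk, bv: spaceship(av, bv))
print(descending)
print(by_value)
```

Swapping the two operands inside the comparator reverses the order, which is how the descending examples above work.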
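any and every
The any and every functions take a map or array as their first argument and a boolean-returning function as their second. They compute a logical OR and AND, respectively, of several boolean expressions, without explicit || or && operators and without a for-loop. They are a keystroke-saving convenience.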
    mlr --c2p cat example.csv

    color  shape    flag  k  index quantity rate
    yellow triangle true  1  11    43.6498  9.8870
    red    square   true  2  15    79.2778  0.0130
    red    circle   true  3  16    13.8103  2.9010
    red    square   false 4  48    77.5542  7.4670
    purple triangle false 5  51    81.2290  8.5910
    red    square   false 6  64    77.1991  9.5310
    purple triangle false 7  65    80.1405  5.8240
    yellow circle   true  8  73    63.9785  4.2370
    yellow circle   true  9  87    63.5058  8.3350
    purple square   false 10 91    72.3735  8.2430

    mlr --c2p --from example.csv filter 'any({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

    color  shape  flag  k  index quantity rate
    red    square true  2  15    79.2778  0.0130
    red    circle true  3  16    13.8103  2.9010
    red    square false 4  48    77.5542  7.4670
    red    square false 6  64    77.1991  9.5310
    purple square false 10 91    72.3735  8.2430

    mlr --c2p --from example.csv filter 'every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

    color shape  flag  k index quantity rate
    red   square true  2 15    79.2778  0.0130
    red   square false 4 48    77.5542  7.4670
    red   square false 6 64    77.1991  9.5310

    mlr --c2p --from example.csv put '$is_red_square = every({"color":"red","shape":"square"}, func(k,v) {return $[k] == v})'

    color  shape    flag  k  index quantity rate   is_red_square
    yellow triangle true  1  11    43.6498  9.8870 false
    red    square   true  2  15    79.2778  0.0130 true
    red    circle   true  3  16    13.8103  2.9010 false
    red    square   false 4  48    77.5542  7.4670 true
    purple triangle false 5  51    81.2290  8.5910 false
    red    square   false 6  64    77.1991  9.5310 true
    purple triangle false 7  65    80.1405  5.8240 false
    yellow circle   true  8  73    63.9785  4.2370 false
    yellow circle   true  9  87    63.5058  8.3350 false
    purple square   false 10 91    72.3735  8.2430 false

    mlr --c2p --from example.csv filter 'any([16,51,61,64], func(e) {return $index == e})'

    color  shape    flag  k index quantity rate
    red    circle   true  3 16    13.8103  2.9010
    purple triangle false 5 51    81.2290  8.5910
    red    square   false 6 64    77.1991  9.5310
This last example could also be done using a map:
    mlr --c2p --from example.csv filter '
      begin {
        @indices = {16:true, 51:true, 61:true, 64:true};
      }
      @indices[$index] == true;
    '

    color  shape    flag  k index quantity rate
    red    circle   true  3 16    13.8103  2.9010
    purple triangle false 5 51    81.2290  8.5910
    red    square   false 6 64    77.1991  9.5310
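The record-matching pattern used in the examples above (test a record against a map of wanted field values) can be sketched in Python with the built-in any and all. This is an analogy rather than Miller code: the helper names any_of and every_of are hypothetical, and the three records are a hand-copied subset of the fields in example.csv.

```python
def any_of(collection, predicate):
    """True if predicate holds for at least one entry of a list or dict."""
    if isinstance(collection, dict):
        return any(predicate(k, v) for k, v in collection.items())
    return any(predicate(e) for e in collection)


def every_of(collection, predicate):
    """True if predicate holds for all entries of a list or dict."""
    if isinstance(collection, dict):
        return all(predicate(k, v) for k, v in collection.items())
    return all(predicate(e) for e in collection)


# Three records taken from example.csv (only some fields shown).
records = [
    {"color": "yellow", "shape": "triangle", "index": 11},
    {"color": "red", "shape": "square", "index": 15},
    {"color": "red", "shape": "circle", "index": 16},
]
wanted = {"color": "red", "shape": "square"}

# Indexes of records matching any, or every, wanted field value.
matches_any = [r["index"] for r in records
               if any_of(wanted, lambda k, v: r[k] == v)]
matches_every = [r["index"] for r in records
                 if every_of(wanted, lambda k, v: r[k] == v)]
print(matches_any)
print(matches_every)
```

The red circle record passes the any test because its color matches, but fails the every test because its shape does not.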
Combined examples
Using a paradigm from the page on operating on all records, we can retain a column from the input data as an array, then apply some higher-order functions to it:
    mlr --c2p cat example.csv

    color  shape    flag  k  index quantity rate
    yellow triangle true  1  11    43.6498  9.8870
    red    square   true  2  15    79.2778  0.0130
    red    circle   true  3  16    13.8103  2.9010
    red    square   false 4  48    77.5542  7.4670
    purple triangle false 5  51    81.2290  8.5910
    red    square   false 6  64    77.1991  9.5310
    purple triangle false 7  65    80.1405  5.8240
    yellow circle   true  8  73    63.9785  4.2370
    yellow circle   true  9  87    63.5058  8.3350
    purple square   false 10 91    72.3735  8.2430

    mlr --c2p --from example.csv put -q '
      begin {
        @indexes = [] # So auto-extend will make an array, not a map
      }
      @indexes[NR] = $index;
      end {
        print "Original:";
        print @indexes;
        print;
        print "Sorted:";
        print sort(@indexes, "r");
        print;
        print "Sorted, then cubed:";
        print apply(
          sort(@indexes, "r"),
          func(e) { return e**3 },
        );
        print;
        print "Sorted, then cubed, then summed:";
        print reduce(
          apply(
            sort(@indexes, "r"),
            func(e) { return e**3 },
          ),
          func(acc, e) { return acc + e },
        )
      }
    '

    Original:
    [11, 15, 16, 48, 51, 64, 65, 73, 87, 91]

    Sorted:
    [91, 87, 73, 65, 64, 51, 48, 16, 15, 11]

    Sorted, then cubed:
    [753571, 658503, 389017, 274625, 262144, 132651, 110592, 4096, 3375, 1331]

    Sorted, then cubed, then summed:
    2589905
Caveats
Remember return
Coming from other languages, it's easy to accidentally omit the return keyword and write

    mlr -n put 'end { print select([1,2,3,4,5], func (e) { e >= 3 })}'

    mlr: select: function returned non-boolean "(absent)".

instead of

    mlr -n put 'end { print select([1,2,3,4,5], func (e) { return e >= 3 })}'

    [3, 4, 5]
No IIFEs
As of September 2021, immediately invoked function expressions (IIFEs) are not part of the Miller DSL's grammar. For example, this doesn't work yet:

    mlr -n put '
      end {
        x = 3;
        y = (func (e) { return e**7 })(x);
        print y;
      }
    '

    mlr: cannot parse DSL expression.
    Parse error on token "(" at line 4 column 35.
    Please check for missing semicolon.
    Expected one of:
      ; } > >> | ? || ^^ && ?? ??? =~ !=~ == != <=> >= < <= ^ & << >>> + - .+ .-
      * / // % .* ./ .// . **

but this does:

    mlr -n put '
      end {
        x = 3;
        f = func (e) { return e**7 };
        y = f(x);
        print y;
      }
    '

    2187
Built-in functions currently unsupported as arguments
As of September 2021, built-in functions are handled somewhat separately from user-defined functions inside Miller, and can't be used directly as arguments to higher-order functions.
For example, this doesn't work yet:

    mlr -n put '
      end {
        notches = [0,1,2,3];
        radians = apply(notches, func (e) { return e * M_PI / 8 });
        cosines = apply(radians, cos);
        print cosines;
      }
    '

    mlr: apply: second argument must be a function; got absent.

but wrapping the built-in in a function literal does:

    mlr -n put '
      end {
        notches = [0,1,2,3];
        radians = apply(notches, func (e) { return e * M_PI / 8 });
        # cosines = apply(radians, cos);
        cosines = apply(radians, func (e) { return cos(e) });
        print cosines;
      }
    '

    [1, 0.9238795325112867, 0.7071067811865476, 0.38268343236508984]