DSL user-defined functions

As of Miller 5.0.0 you can define your own functions, as well as subroutines.

User-defined functions

Here's the obligatory example of a recursive function, in this case one that computes factorials:

mlr --opprint --from data/small put '
    func f(n) {
        if (is_numeric(n)) {
            if (n > 0) {
                return n * f(n-1);
            } else {
                return 1;
            }
        }
        # implicitly return absent-null if non-numeric
    }
    $ox = f($x + NR);
    $oi = f($i);
'
a   b   i x        y        ox                 oi
pan pan 1 0.346791 0.726802 0.4670549976810001 1
eks pan 2 0.758679 0.522151 3.6808304227112796 2
wye wye 3 0.204603 0.338318 1.7412477437471126 6
eks wye 4 0.381399 0.134188 18.588317372151177 24
wye pan 5 0.573288 0.863624 211.38663947090302 120

Properties of user-defined functions:

  • Function bodies start with func and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested functions.)

  • A function is identified by its name alone and may not be redefined: you can neither define two user-defined functions with the same name nor redefine a built-in function. However, functions and subroutines have separate namespaces: you can define a subroutine log (for logging messages to stderr, say) which does not clash with the mathematical log (logarithm) function.

  • Functions may be defined either before or after use -- there is an object-binding/linkage step at startup. In particular, functions may be recursive or mutually recursive.

  • Functions may be defined and called either within mlr filter or mlr put.

  • Argument values may be reassigned: they are not read-only.

  • When a function does not explicitly return a value, the result is absent-null. (In the example above, if there were records for which the argument to f is non-numeric, the assignments would be skipped.) See also the null-data reference page.

  • See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within functions.

  • See the section on Expressions from files for information on the use of -f and -e flags.

User-defined subroutines

Example:

mlr --opprint --from data/small put -q '
  begin {
    @call_count = 0;
  }
  subr s(n) {
    @call_count += 1;
    if (is_numeric(n)) {
      if (n > 1) {
        call s(n-1);
      } else {
        print "numcalls=" . @call_count;
      }
    }
  }
  print "NR=" . NR;
  call s(NR);
'
NR=1
numcalls=1
NR=2
numcalls=3
NR=3
numcalls=6
NR=4
numcalls=10
NR=5
numcalls=15

Properties of user-defined subroutines:

  • Subroutine bodies start with subr and a parameter list, defined outside of begin, end, or other func or subr blocks. (I.e. the Miller DSL has no nested subroutines.)

  • A subroutine (uniqified by its name) may not be redefined. However, functions and subroutines have separate namespaces: you can define a subroutine log which does not clash with the mathematical log function.

  • Subroutines may be defined either before or after use -- there is an object-binding/linkage step at startup. In particular, subroutines may be recursive or mutually recursive. Subroutines may call functions.

  • Subroutines may be defined and called either within mlr filter or mlr put.

  • Subroutines have read/write access to $-variables and @-variables.

  • Argument values may be reassigned: they are not read-only.

  • See the section on Local variables for information on scope and extent of arguments, as well as for information on the use of local variables within subroutines.

  • See the section on Expressions from files for information on the use of -f and -e flags.

Differences between functions and subroutines

Subroutines cannot return values, and they are invoked by the keyword call.

In hindsight, subroutines needn't have been invented. If foo is a function then you can write foo(1,2,3) while ignoring its return value, and that plays the role of a subroutine quite well.

Loading a library of functions

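If you have a file with UDFs you use frequently, say my-functions.mlr, you can use --load or --mload to define them for your Miller scripts. The name given to --load may also be a directory, in which case the .mlr files in it are loaded. For example, in your shell,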

alias mlr='mlr --load ~/my-functions.mlr'

or

alias mlr='mlr --load /u/miller-udfs/'

See the miscellaneous-flags page for more information.