
Data types

List of types

Miller's types are:

See also the list of type-checking functions for the Miller programming language.

See also Differences from other programming languages.

Type inference for literal and record data

Miller's input and output are all text-oriented: all the file formats supported by Miller are human-readable text, such as CSV, TSV, and JSON; binary formats such as BSON and Parquet are not supported (as of mid-2021). In this sense, everything is a string in and out of Miller -- be it in data files, or in DSL expressions you key in.

In the DSL, 7 is an int and 8.9 is a float, as one would expect. Likewise, on input from data files, string values representable as numbers, e.g. 1.2 or 3, are treated as float or int, respectively. If a record has x=1,y=2 then mlr put '$z=$x+$y' will produce x=1,y=2,z=3.
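The x=1,y=2 record described above can be tried directly from the shell. This is a minimal sketch, assuming mlr is on your PATH and that no format flags are given, so the default key=value (DKVP) format applies:

```shell
# Feed one DKVP record on standard input; $x and $y are inferred as ints,
# so the + operator does integer addition.
echo 'x=1,y=2' | mlr put '$z = $x + $y'
# x=1,y=2,z=3
```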

Numbers retain their original string representation, so if x is 1.2 on one record and 1.200 on another, they'll print out that way on output (unless of course they've been modified during processing, e.g. mlr put '$x = $x + 10').
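As a small illustration of this, the two records below differ only in how the same number is written. The sketch assumes mlr is on your PATH. The second command modifies x, so its output formatting may vary by Miller version and is not shown here:

```shell
# Unmodified numeric fields are written back exactly as they were read.
printf 'x=1.2\nx=1.200\n' | mlr cat
# x=1.2
# x=1.200

# Once a field is assigned a computed value, the original text is no longer used.
printf 'x=1.2\nx=1.200\n' | mlr put '$x = $x + 10'
```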

Generally strings, numbers, and booleans don't mix; use type-casting like string($x) to convert. However, the dot (string-concatenation) operator has been special-cased: mlr put '$z=$x.$y' does not give an error, because the dot operator has been generalized to stringify non-strings.

Examples:

mlr --csv cat data/type-infer.csv
a,b,c
1.2,3,true
4,5.6,buongiorno
mlr --icsv --oxtab --from data/type-infer.csv put '
  $d = $a . $c;
  $e = 7;
  $f = 8.9;
  $g = $e + $f;
  $ta = typeof($a);
  $tb = typeof($b);
  $tc = typeof($c);
  $td = typeof($d);
  $te = typeof($e);
  $tf = typeof($f);
  $tg = typeof($g);
' then reorder -f a,ta,b,tb,c,tc,d,td,e,te,f,tf,g,tg
a  1.2
ta float
b  3
tb int
c  true
tc string
d  1.2true
td string
e  7
te int
f  8.9
tf float
g  15.9
tg float

a  4
ta int
b  5.6
tb float
c  buongiorno
tc string
d  4buongiorno
td string
e  7
te int
f  8.9
tf float
g  15.9
tg float

On input, string values representable as boolean (e.g. "true", "false") are not automatically treated as boolean. This is because "true" and "false" are ordinary words, and automatic string-to-boolean conversion on a column consisting of words would result in some strings mixed with some booleans. Use the boolean function to coerce: e.g. give the record x=1,y=2,w=false to mlr filter '$z=($x<$y) || boolean($w)'.
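To see the coercion itself, typeof can be applied before and after the boolean function. This is a sketch assuming mlr is on your PATH; the field names tw and tb are made up for illustration:

```shell
# w arrives from the data as the word "false", so it is a string
# until boolean() converts it explicitly.
echo 'x=1,y=2,w=false' | mlr put '$tw = typeof($w); $tb = typeof(boolean($w))'
# x=1,y=2,w=false,tw=string,tb=boolean
```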

The same is true for inf, +inf, -inf, infinity, +infinity, -infinity, NaN, and all upper-cased/lower-cased/mixed-case variants of those. These are valid IEEE floating-point numbers, but Miller treats these as strings. You can explicitly force conversion: if x=infinity in a data file, then typeof($x) is string but typeof(float($x)) is float.
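The same typeof check can be run for the infinity case. This is a sketch assuming mlr is on your PATH, with t1 and t2 as illustrative field names. Per the description above, t1 should report string and t2 should report float, though the handling of these values may differ between Miller versions:

```shell
# Compare the inferred type of the raw field with the type after float().
echo 'x=infinity' | mlr put '$t1 = typeof($x); $t2 = typeof(float($x))'
```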

JSON parse and stringify

If you have, say, a CSV file whose columns contain strings which are well-formatted JSON, they will not be auto-converted, but you can use the json-parse verb or the json_parse DSL function:

mlr --csv --from data/json-in-csv.csv cat
id,blob
100,"{""a"":1,""b"":[2,3,4]}"
105,"{""a"":6,""b"":[7,8,9]}"
mlr --icsv --ojson --from data/json-in-csv.csv cat
[
{
  "id": 100,
  "blob": "{\"a\":1,\"b\":[2,3,4]}"
},
{
  "id": 105,
  "blob": "{\"a\":6,\"b\":[7,8,9]}"
}
]
mlr --icsv --ojson --from data/json-in-csv.csv json-parse -f blob
[
{
  "id": 100,
  "blob": {
    "a": 1,
    "b": [2, 3, 4]
  }
},
{
  "id": 105,
  "blob": {
    "a": 6,
    "b": [7, 8, 9]
  }
}
]
mlr --icsv --ojson --from data/json-in-csv.csv put '$blob = json_parse($blob)'
[
{
  "id": 100,
  "blob": {
    "a": 1,
    "b": [2, 3, 4]
  }
},
{
  "id": 105,
  "blob": {
    "a": 6,
    "b": [7, 8, 9]
  }
}
]

These have their respective operations to convert back to string: the json-stringify verb and json_stringify DSL function.
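A round trip through both verbs might look like the following sketch. It assumes mlr is on your PATH and reads a one-row copy of the sample data from standard input instead of data/json-in-csv.csv. The blob field should come back as a string, although the spacing inside the stringified JSON may not match the original text exactly:

```shell
# Parse the blob column into a map, then convert it back to a JSON string.
mlr --icsv --ojson json-parse -f blob then json-stringify -f blob <<'EOF'
id,blob
100,"{""a"":1,""b"":[2,3,4]}"
EOF
```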
