Arrays

Miller data types are listed on the Data types page; here we focus specifically on arrays.

Arrays are supported as of Miller 6 and are one of its major advantages.

Array literals

Array literals are written in square brackets, and their elements are accessed by integer index. Array slots can be any Miller data type (including other arrays, or maps).

mlr -n put '
  end {
    x = [ "a", 1, "b", {"x": 2, "y": [3,4,5]}, 99, true];
    print x;
  }
'
[
  "a",
  1,
  "b",
  {
    "x": 2,
    "y": [3, 4, 5]
  },
  99,
  true
]

As with maps and argument-lists, trailing commas are supported:

mlr -n put '
  end {
    x = [
      "a",
      "b",
      "c",
    ];
    print x;
  }
'
["a", "b", "c"]

Also note that several built-in functions operate on arrays and/or return arrays.

1-up indexing

The most important difference between Miller's arrays and arrays in other languages is that indices start with 1, not 0. (The same is true for Miller strings.) This is intentional.

1-up array indices may feel like a thing of the past, belonging to Fortran and Matlab, say, although the more modern R and Julia use them as well. Still, the overall trend is decidedly toward 0-up. This means that if Miller does 1-up array indices, it should do so for good reasons.

When arrays were introduced into Miller 6, it quickly became clear that 1-up indexing is the right thing for Miller. So many other things are already 1-up in Miller, and always have been, mostly inherited from AWK:

  • The awk-like built-in variables NF, NR, and FNR are 1-up in Miller. So for idioms like @records[NR] = $* it's natural to index from 1; @records[NR-1] = $* would be error-prone and would result in frequent off-by-one errors.
  • In particular, fields have always been indexed 1-up for NIDX and DKVP formats.
  • Regex captures run from "\1" to "\9" ("\0" is the entire match substring).

Negative-index aliasing

Imitating Python and other languages, you can use negative indices to read backward from the end of the array, while positive indices read forward from the start. If an array has length n then -n..-1 are aliases for 1..n, respectively; 0 is never a valid array index in Miller.

mlr -n put '
  end {
    x = [10, 20, 30, 40, 50];
    print x[1];
    print x[-1];
    print x[1:2];
    print x[-2:-1];
  }
'
10
50
[10, 20]
[40, 50]
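
For readers who think in 0-up terms, the aliasing rule above can be modeled in a few lines of Python. This is only a sketch of the rule as stated here, not Miller's implementation; the function name to_zero_up is hypothetical.

```python
from typing import Optional

def to_zero_up(index: int, n: int) -> Optional[int]:
    """Map a Miller-style index to a 0-up position for an array of length n.

    1..n count from the start and -n..-1 count from the end. Anything else,
    including 0, has no position, so None is returned.
    """
    if 1 <= index <= n:
        return index - 1
    if -n <= index <= -1:
        return index + n
    return None

x = [10, 20, 30, 40, 50]
print(x[to_zero_up(1, len(x))])   # 10
print(x[to_zero_up(-1, len(x))])  # 50
```

Both -5 and 1 map to position 0 here, which is the sense in which -n..-1 are aliases for 1..n.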

Slicing

Miller supports slicing using [lo:hi] syntax. Either or both of the indices in a slice can be negatively aliased as described above. Unlike in Python, Miller array-slice indices are inclusive on both sides: x[3:5] means [x[3], x[4], x[5]].

mlr -n put '
  end {
    x = [10, 20, 30, 40, 50];
    print x[3:4];
    print x[:2];
    print x[3:];
    print x[1:-1];
    print x[2:-2];
  }
'
[30, 40]
[10, 20]
[30, 40, 50]
[10, 20, 30, 40, 50]
[20, 30, 40]
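
To make the contrast with Python concrete, here is a rough Python model of the inclusive slice. It is a sketch under stated assumptions: the name miller_slice is hypothetical, and only bounds that land inside the array with lo <= hi are handled; everything else is deliberately left out rather than guessed at.

```python
def miller_slice(x, lo=None, hi=None):
    """Model x[lo:hi] with 1-up, inclusive bounds that fall inside the array.

    A missing bound defaults to the first or last element; a negative bound
    counts from the end. Index 0 and out-of-range bounds are not modeled.
    """
    n = len(x)
    lo = 1 if lo is None else lo
    hi = n if hi is None else hi
    if lo < 0:
        lo += n + 1
    if hi < 0:
        hi += n + 1
    if not (1 <= lo <= hi <= n):
        raise ValueError("bounds outside 1 <= lo <= hi <= n are not modeled")
    # 1-up inclusive [lo, hi] is the same span as 0-up half-open [lo - 1, hi).
    return x[lo - 1:hi]

x = [10, 20, 30, 40, 50]
print(miller_slice(x, 3, 4))   # [30, 40]
print(miller_slice(x, hi=2))   # [10, 20]
print(miller_slice(x, 2, -2))  # [20, 30, 40]
```

The only arithmetic involved is subtracting one from the lower bound: the inclusive upper bound in 1-up terms already equals the exclusive upper bound in 0-up terms.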

Out-of-bounds indexing

Somewhat imitating Python, an out-of-bounds index access yields an absent value, while an out-of-bounds slice access trims the indices, resulting in a short array or even the empty array:

mlr -n put '
  end {
    x = [10, 20, 30, 40, 50];
    print x[1];
    print x[5];
    print x[6]; # absent
  }
'
10
50

mlr -n put '
  end {
    x = [10, 20, 30, 40, 50];
    print x[1:2];
    print x[1:6];
    print x[10:20];
  }
'
[10, 20]
[10, 20, 30, 40, 50]
[]
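
For comparison, this is the Python behavior being imitated, shown with Python's own 0-up, half-open slices. It illustrates Python only: slices past the end are trimmed in the same spirit as above, while a single out-of-bounds index raises an exception instead of producing an absent value.

```python
x = [10, 20, 30, 40, 50]

# Slices that run past the end are trimmed rather than rejected.
print(x[0:6])    # [10, 20, 30, 40, 50]
print(x[9:20])   # []

# A single out-of-bounds index is an error in Python.
try:
    x[5]
except IndexError as err:
    print("IndexError:", err)
```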

Auto-create results in maps

As noted on the maps page, indexing any as-yet-unassigned local variable or out-of-stream variable results in auto-create of that variable as a map variable:

mlr --csv --from example.csv put -q '
  # You can do this but you do not need to:
  # begin { @last_rates = {} }
  @last_rates[$shape] = $rate;
  end {
    dump @last_rates;
  }
'
{
  "triangle": 5.8240,
  "square": 8.2430,
  "circle": 8.3350
}

This also means that auto-create results in maps, not arrays, even if keys are integers. If you want to auto-extend an array, initialize it explicitly to [].

mlr --csv --from example.csv head -n 4 then put -q '
  begin {
    @my_array = [];
  }
  @my_array[NR] = $quantity;
  @my_map[NR] = $rate;
  end {
    dump
  }
'
{
  "my_array": [43.6498, 79.2778, 13.8103, 77.5542],
  "my_map": {
    "1": 9.8870,
    "2": 0.0130,
    "3": 2.9010,
    "4": 7.4670
  }
}

Auto-extend and null-gaps

Once an array is initialized, it can be extended by assigning to indices beyond its length. If each write is one past the end of the array, the array will grow by one. (Miller handles the memory management for you: capacity is doubled as needed, so performance doesn't suffer a reallocation on every single extend.)

This is important in Miller so you can do things like @records[NR] = $* with a minimum of keystrokes without worrying about explicitly resizing arrays. In particular, you can iteratively populate arrays as you read your data files, without having to first know how many records they have.

However, if an array is written to more than one past its end, values of type JSON-null are used to fill in the gaps. These are called null-gaps.

mlr -n put '
  end {
    no_gaps = [];
    no_gaps[1] = "a";
    no_gaps[2] = "b";

    gaps = [];
    gaps[1] = "a";
    gaps[5] = "e";

    print no_gaps;
    print gaps;
  }
'
["a", "b"]
["a", null, null, null, "e"]
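
The extend-and-fill rule can be summarized with a small Python model. This is a sketch of the behavior described above, not Miller's code: the name assign is hypothetical, None stands in for JSON null, and only positive indices are modeled.

```python
def assign(arr: list, index: int, value) -> None:
    """Write value at a positive 1-up index, growing arr when needed."""
    if index < 1:
        raise ValueError("only positive indices are modeled in this sketch")
    # Pad with None (standing in for JSON null) up to the slot before index.
    while len(arr) < index - 1:
        arr.append(None)
    if index == len(arr) + 1:
        arr.append(value)        # one past the end: grow by one
    else:
        arr[index - 1] = value   # existing slot: overwrite in place

gaps = []
assign(gaps, 1, "a")
assign(gaps, 5, "e")
print(gaps)  # ['a', None, None, None, 'e']
```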

Unset as shift

Unsetting an array index results in shifting all higher-index elements down by one:

mlr -n put '
  end {
    x = [ "a", "b", "c", "d", "e"];
    print x;
    unset x[2];
    print x;
  }
'
["a", "b", "c", "d", "e"]
["a", "c", "d", "e"]

More generally, you can get shift and pop operations by unsetting indices 1 and -1:

$ mlr repl -q
[mlr] x=[1,2,3,4,5]
[mlr] unset x[-1]
[mlr] x
[1, 2, 3, 4]
[mlr] unset x[-1]
[mlr] x
[1, 2, 3]
[mlr]
[mlr] x=[1,2,3,4,5]
[mlr] unset x[1]
[mlr] x
[2, 3, 4, 5]
[mlr] unset x[1]
[mlr] x
[3, 4, 5]
[mlr]
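
In Python terms, unsetting an index behaves like del on a list, with the 1-up and negative-index rules applied first. The sketch below uses the hypothetical name unset; what Miller does for an out-of-range unset is not described on this page, so the sketch simply raises in that case.

```python
def unset(arr: list, index: int) -> None:
    """Remove the element at a 1-up or negative index, shifting the rest down."""
    n = len(arr)
    if 1 <= index <= n:
        del arr[index - 1]
    elif -n <= index <= -1:
        del arr[index + n]
    else:
        # Assumption of this sketch only: out-of-range unsets are rejected.
        raise IndexError("out-of-range unset is not modeled in this sketch")

x = [1, 2, 3, 4, 5]
unset(x, -1)   # pop from the end
print(x)       # [1, 2, 3, 4]
unset(x, 1)    # shift from the front
print(x)       # [2, 3, 4]
```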

Looping

See single-variable for-loops and key-value for-loops.

Array-valued fields in CSV files

See the flatten/unflatten page.