• I/O options • Formats • In-place mode • Compression • Record/field/pair separators • Number formatting • Data transformations (verbs) • Expression language for filter and put • then-chaining • Auxiliary commands • Data types • Null data: empty and absent • String literals • Regular expressions • Regex captures • Arithmetic • Input scanning • Conversion by math routines • Conversion by arithmetic operators • Pythonic division • On-line help
Command overview
Whereas the Unix toolkit is made of the separate executables cat, tail, cut, sort, etc., Miller has subcommands, invoked as follows:
mlr tac *.dat
mlr cut --complement -f os_version *.dat
mlr sort -f hostname,uptime *.dat
I/O options
Formats
Options:
--dkvp --idkvp --odkvp
--nidx --inidx --onidx
--csv --icsv --ocsv
--csvlite --icsvlite --ocsvlite
--pprint --ipprint --opprint --right
--xtab --ixtab --oxtab
--json --ijson --ojson
These are as discussed in File formats, with the exception of --right, which makes pretty-printed output right-aligned.
In-place mode
Use the mlr -I flag to process files in-place. For example, mlr -I --csv cut -x -f unwanted_column_name mydata/*.csv will remove unwanted_column_name from all the *.csv files in your mydata/ subdirectory.
By default, Miller output goes to the screen (or you can redirect it to a file using > or to another process using |). With -I, for each file name on the command line, output is written to a temporary file in the same directory. Miller writes its output into that temp file, which is then renamed over the original. Then, processing continues on the next file. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file; statistics are only over each file's own records; and so on.
Please see here for examples.
Compression
Options:
--prepipe {command}
The prepipe command is anything which reads from standard input and produces data acceptable to Miller. Nominally this allows you to use whichever decompression utilities you have installed on your system, on a per-file basis. If the command has flags, quote them: e.g. mlr --prepipe 'zcat -cf'.
Examples:
# These two produce the same output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz
# With multiple input files you need --prepipe:
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz
$ mlr --prepipe gunzip --idkvp --oxtab cut -f hostname,uptime myfile1.dat.gz myfile2.dat.gz
# Similar to the above, but with compressed output as well as input:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz | gzip > outfile.csv.gz
$ mlr --prepipe gunzip cut -f hostname,uptime myfile1.csv.gz myfile2.csv.gz | gzip > outfile.csv.gz
# Similar to the above, but with different compression tools for input and output:
$ gunzip < myfile1.csv.gz | mlr cut -f hostname,uptime | xz -z > outfile.csv.xz
$ xz -cd < myfile1.csv.xz | mlr cut -f hostname,uptime | gzip > outfile.csv.gz
$ mlr --prepipe 'xz -cd' cut -f hostname,uptime myfile1.csv.xz myfile2.csv.xz | xz -z > outfile.csv.xz
... etc.
Record/field/pair separators
Miller has record separators IRS and ORS, field separators IFS and OFS, and pair separators IPS and OPS. For example, in the DKVP line a=1,b=2,c=3, the record separator is newline, the field separator is comma, and the pair separator is the equals sign. These are the default values.
Options:
--rs --irs --ors
--fs --ifs --ofs --repifs
--ps --ips --ops
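To make the three separator roles concrete, here is a minimal Python sketch of DKVP reading and writing. It is an illustration only, not Miller's implementation: the function names parse_dkvp and format_dkvp are hypothetical, separators are assumed to be plain strings, and --repifs and CSV-style quoting are not modeled. Keying a field that has no pair separator by its 1-up position follows the DKVP example in the mlr --help output reproduced under On-line help.

```python
def parse_dkvp(text, irs="\n", ifs=",", ips="="):
    """Parse DKVP text into a list of dicts, one dict per record."""
    records = []
    for line in text.split(irs):
        if line == "":
            continue  # skip empty lines, such as the one after a trailing IRS
        record = {}
        for position, field in enumerate(line.split(ifs), start=1):
            key, sep, value = field.partition(ips)
            if sep == "":
                # No pair separator in this field: key it by 1-up position
                record[str(position)] = field
            else:
                record[key] = value
        records.append(record)
    return records


def format_dkvp(records, ors="\n", ofs=",", ops="="):
    """Write records back out as DKVP text using the output separators."""
    lines = (ofs.join(k + ops + v for k, v in r.items()) for r in records)
    return "".join(line + ors for line in lines)


if __name__ == "__main__":
    recs = parse_dkvp("a=1,b=2,c=3\n")
    # Same record, written with semicolon as OFS and colon as OPS
    print(format_dkvp(recs, ofs=";", ops=":"), end="")  # prints a:1;b:2;c:3
```

Keeping the input and output separators as independent parameters mirrors the split between --ifs/--ips/--irs and --ofs/--ops/--ors.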
Number formatting
The command-line option --ofmt {format string} is the global number format for commands which generate numeric output, e.g. stats1, stats2, histogram, and step, as well as mlr put. Examples:
--ofmt %.9le
--ofmt %.6lf
--ofmt %.0lf

$ echo 'x=3.1,y=4.3' | mlr put '$z=fmtnum($x*$y,"%08lf")'
x=3.1,y=4.3,z=13.330000

$ echo 'x=0xffff,y=0xff' | mlr put '$z=fmtnum(int($x*$y),"%08llx")'
x=0xffff,y=0xff,z=00feff01

$ echo 'x=0xffff,y=0xff' | mlr put '$z=hexfmt($x*$y)'
x=0xffff,y=0xff,z=0xfeff01

Data transformations (verbs)
Please see the separate page here.
Expression language for filter and put
Please see the separate page here.
then-chaining
In accord with the
Unix philosophy, you can pipe data into or out of
Miller. For example:
mlr cut --complement -f os_version *.dat | mlr sort -f hostname,uptime

The same pipeline can be written as a single Miller invocation by chaining the verbs with the then keyword:
mlr cut --complement -f os_version then sort -f hostname,uptime *.dat

A timing comparison of the piped and chained forms:
% cat piped.sh
mlr cut -x -f i,y data/big | mlr sort -n y > /dev/null

% time sh piped.sh
real 0m2.828s
user 0m3.183s
sys 0m0.137s

% cat chained.sh
mlr cut -x -f i,y then sort -n y data/big > /dev/null

% time sh chained.sh
real 0m2.082s
user 0m1.933s
sys 0m0.137s

Auxiliary commands
There are a few nearly-standalone programs which have nothing to do with the rest of Miller, do not
participate in record streams, and do not deal with file formats. They might as well be little standalone executables
but they’re delivered within the main Miller executable for convenience.
$ mlr aux-list
Available subcommands:
  aux-list
  lecat
  termcvt
  hex
  unhex
  netbsd-strptime
For more information, please invoke mlr {subcommand} --help

$ echo 'Hello, world!' | mlr hex
00000000: 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a |Hello, world!.|

$ echo 'Hello, world!' | mlr hex -r
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a

$ echo 'Hello, world!' | mlr hex -r | mlr unhex
Hello, world!

$ echo 'Hello, world!' | mlr lecat --mono
Hello, world![LF]

Data types
Miller’s input and output are all string-oriented: there is (as of
August 2015 anyway) no support for binary record packing. In this sense,
everything is a string in and out of Miller. During processing, field names
are always strings, even if they have names like "3"; field values are usually
strings. Field values’ ability to be interpreted as a non-string type
only has meaning when comparison or function operations are done on them. And
it is an error condition if Miller encounters non-numeric (or otherwise
mistyped) data in a field in which it has been asked to do numeric (or
otherwise type-specific) operations.
Field values are treated as numeric for the following:
Null data: empty and absent
One of Miller’s key features is its support for heterogeneous
data. For example, take mlr sort: if you try to sort on field
hostname when not all records in the data stream have a field
named hostname, it is not an error (although you could pre-filter the
data stream using mlr having-fields --at-least hostname then sort
...). Rather, records lacking one or more sort keys are simply output
contiguously by mlr sort.
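As a rough illustration of that behavior, the following Python sketch sorts heterogeneous records numerically on one field. It is an assumption-laden model rather than Miller's code: sort_numeric is a hypothetical name, records are plain dicts of strings, only decimal values are parsed, and the placement of empty values (last when ascending, first when descending) is taken from the "nulls sort last" and "nulls sort first" wording in the mlr sort --help text reproduced under On-line help.

```python
def sort_numeric(records, field, descending=False):
    """Sort records numerically on one field; records lacking it go last."""
    has_field = [r for r in records if field in r]
    lacks_field = [r for r in records if field not in r]

    def sort_key(record):
        value = record[field]
        if value == "":
            # Empty values compare greater than every number, so they land
            # last when ascending and first when descending
            return (1, 0.0)
        return (0, float(value))

    # list.sort is stable, so ties keep their input order
    has_field.sort(key=sort_key, reverse=descending)
    # Records without the sort key are emitted together, in input order
    return has_field + lacks_field


if __name__ == "__main__":
    data = [{"a": "3", "b": "2"}, {"x": "9", "b": "10"}, {"a": "1", "b": "8"}]
    for rec in sort_numeric(data, "a"):
        print(rec)
```

Partitioning first, rather than inventing a sentinel key for absent fields, keeps the keyless records in the order they were encountered regardless of sort direction.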
Miller has two kinds of null data:
$ mlr cat data/sort-null.dat
a=3,b=2
a=1,b=8
a=,b=4
x=9,b=10
a=5,b=7

$ mlr sort -n a data/sort-null.dat
a=1,b=8
a=3,b=2
a=5,b=7
a=,b=4
x=9,b=10

$ mlr sort -nr a data/sort-null.dat
a=,b=4
a=5,b=7
a=3,b=2
a=1,b=8
x=9,b=10

$ echo 'x=2,y=3' | mlr put '$a=$x+$y'
x=2,y=3,a=5

$ echo 'x=,y=3' | mlr put '$a=$x+$y'
x=,y=3,a=

$ echo 'x=,y=3' | mlr put '$a=log($x);$b=log($y)'
x=,y=3,a=,b=1.098612

$ echo 'x=,y=3' | mlr put '$a=min($x,$y);$b=max($x,$y)'
x=,y=3,a=3,b=3

$ echo 'x=2,y=3' | mlr put '$a=$u+$v; $b=$u+$y; $c=$x+$y'
x=2,y=3,b=3,c=5

$ echo 'x=2,y=3' | mlr put '$a=min($x,$v);$b=max($u,$y);$c=min($u,$v)'
x=2,y=3,a=2,b=3
$ mlr cat data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false

$ mlr put 'is_present($loadsec) { $loadmillis = $loadsec * 1000 }' data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true,loadmillis=450.000000
record_count=100,resource=/path/to/file
resource=/path/to/second/file,loadsec=0.32,ok=true,loadmillis=320.000000
record_count=150,resource=/path/to/second/file
resource=/some/other/path,loadsec=0.97,ok=false,loadmillis=970.000000

$ mlr put '$loadmillis = (is_present($loadsec) ? $loadsec : 0.0) * 1000' data/het.dkvp
resource=/path/to/file,loadsec=0.45,ok=true,loadmillis=450.000000
record_count=100,resource=/path/to/file,loadmillis=0.000000
resource=/path/to/second/file,loadsec=0.32,ok=true,loadmillis=320.000000
record_count=150,resource=/path/to/second/file,loadmillis=0.000000
resource=/some/other/path,loadsec=0.97,ok=false,loadmillis=970.000000

$ mlr --print-type-arithmetic-info
(+)    | error  absent empty  string int    float  bool
------ + ------ ------ ------ ------ ------ ------ ------
error  | error  error  error  error  error  error  error
absent | error  absent absent error  int    float  error
empty  | error  absent empty  error  empty  empty  error
string | error  error  error  error  error  error  error
int    | error  int    empty  error  int    float  error
float  | error  float  empty  error  float  float  error
bool   | error  error  error  error  error  error  error

String literals
You can use the following backslash escapes in string literals, that is, between the double quotes in contexts such as
mlr filter '$name =~ "..."',
mlr put '$name = $othername . "..."',
mlr put '$name = sub($name, "...", "...")', etc.:
Regular expressions
Miller lets you use regular expressions (of type POSIX.2) in the following contexts:
$ cat data/regex-in-data.dat
name=jane,regex=^j.*e$
name=bill,regex=^b[ou]ll$
name=bull,regex=^b[ou]ll$

$ mlr filter '$name =~ $regex' data/regex-in-data.dat
name=jane,regex=^j.*e$
name=bull,regex=^b[ou]ll$

Regex captures
Regex captures of the form \0 through \9 are supported as follows:
mlr put '$b = sub($a, "(..)_(...)", "\2-\1"); $c = sub($a, "(..)_(.)(..)", ":\1:\2:\3")'
mlr put '$a =~ "(..)_(....)"; $b = "left_\1"; $c = "right_\2"'
mlr put '$a =~ "(..)_(....)"' then {... something else ...} then put '$b = "left_\1"; $c = "right_\2"'
mlr filter '$a =~ "(..)_(....)"'

Arithmetic
Input scanning
Numbers in Miller are double-precision floats or 64-bit signed integers. Anything scannable as int, e.g. 123 or 0xabcd, is treated as an integer; otherwise, input scannable as float (4.56 or 8e9) is treated as float; everything else is a string.
If you want all numbers to be treated as floats, then you may use float() in your filter/put expressions (e.g. replacing $c = $a * $b with $c = float($a) * float($b)) or, more simply, use mlr filter -F and mlr put -F, which force all numeric input, whether from expression literals or field values, to float. Likewise mlr stats1 -F and mlr step -F force integerable accumulators (such as count) to be done in floating-point.
Conversion by math routines
For most math functions, integers are cast to float on input, and produce float output: e.g. exp(0) = 1.0 rather than 1. The following, however, produce integer output if their inputs are integers: + - * / // % abs ceil floor max min round roundm sgn. As well, stats1 -a min, stats1 -a max, stats1 -a sum, step -a delta, and step -a rsum produce integer output if their inputs are integers.
Conversion by arithmetic operators
The sum, difference, and product of integers are again integers, except when the result would overflow a 64-bit integer, at which point Miller converts the result to float.
The short of it is that Miller does this transparently for you so you needn’t think about it.
Implementation details of this, for the interested: integer adds and subtracts overflow by at most one bit so it suffices to check sign-changes.
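The sign-change check for adds can be sketched in Python as follows. This is a hedged model of the technique named in the text, not Miller's C source: Python integers do not overflow, so the hypothetical helper wrap_int64 simulates what a 64-bit two's-complement register would hold, and add_with_overflow_to_float (also a made-up name) falls back to a float sum when the wrapped result has the wrong sign.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def wrap_int64(n):
    """Reduce n to the value a 64-bit two's-complement register would hold."""
    return (n + 2**63) % 2**64 - 2**63


def add_with_overflow_to_float(a, b):
    """Add two in-range 64-bit integers, returning a float on overflow."""
    wrapped = wrap_int64(a + b)
    # An add can overflow only when both operands have the same sign, and it
    # did overflow exactly when the wrapped sum has the opposite sign
    same_sign = (a >= 0) == (b >= 0)
    sign_changed = (wrapped >= 0) != (a >= 0)
    if same_sign and sign_changed:
        return float(a) + float(b)
    return wrapped


if __name__ == "__main__":
    print(add_with_overflow_to_float(1, 2))          # prints 3
    print(type(add_with_overflow_to_float(INT64_MAX, 1)).__name__)  # prints float
```

Subtraction can be handled the same way with the sign test adjusted for the negated second operand; that case is omitted here for brevity.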
Thus, Miller allows you to add and subtract arbitrary 64-bit signed integers, converting to float precisely when the result is less than -2^63 or greater than 2^63-1. Multiplies, on the other hand, can overflow by a word size, and a sign-change technique does not suffice to detect overflow. Instead Miller tests whether the floating-point product exceeds the representable integer range. Now, 64-bit integers have 64-bit precision while IEEE doubles have only 52-bit mantissas (53 bits including the implicit leading one). The following experiment explicitly demonstrates the resolution at this range:

64-bit integer     64-bit integer      Casted to double           Back to 64-bit
in hex             in decimal                                     integer
0x7ffffffffffff9ff 9223372036854774271 9223372036854773760.000000 0x7ffffffffffff800
0x7ffffffffffffa00 9223372036854774272 9223372036854773760.000000 0x7ffffffffffff800
0x7ffffffffffffbff 9223372036854774783 9223372036854774784.000000 0x7ffffffffffffc00
0x7ffffffffffffc00 9223372036854774784 9223372036854774784.000000 0x7ffffffffffffc00
0x7ffffffffffffdff 9223372036854775295 9223372036854774784.000000 0x7ffffffffffffc00
0x7ffffffffffffe00 9223372036854775296 9223372036854775808.000000 0x8000000000000000
0x7ffffffffffffffe 9223372036854775806 9223372036854775808.000000 0x8000000000000000
0x7fffffffffffffff 9223372036854775807 9223372036854775808.000000 0x8000000000000000

Pythonic division
Division and remainder are pythonic:
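For readers unfamiliar with the convention being referred to, this short Python sketch shows Python's own floor-division and remainder rules next to the truncating rules used by C. It demonstrates Python, not Miller: the helper names are hypothetical, and one known difference is not modeled, namely that the list under "Conversion by math routines" says Miller's / can produce integer output for integer inputs, whereas Python's / always returns a float.

```python
def floor_div_mod(a, b):
    """Python-style: quotient rounds toward negative infinity."""
    q = a // b
    r = a % b  # the remainder takes the sign of the divisor b
    assert a == b * q + r
    return q, r


def truncating_div_mod(a, b):
    """C-style, for contrast: quotient rounds toward zero."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - b * q  # the remainder takes the sign of the dividend a


if __name__ == "__main__":
    print(7 / 2)                      # prints 3.5
    print(floor_div_mod(7, 2))        # prints (3, 1)
    print(floor_div_mod(-7, 2))       # prints (-4, 1)
    print(truncating_div_mod(-7, 2))  # prints (-3, -1)
```

The two conventions agree whenever both operands have the same sign and differ only for mixed-sign operands with a nonzero remainder.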
On-line help
Examples:
$ mlr --help
Usage: mlr [I/O options] {verb} [verb-dependent options ...] {zero or more file names}

Command-line-syntax examples:
  mlr --csv cut -f hostname,uptime mydata.csv
  mlr --tsv --rs lf filter '$status != "down" && $upsec >= 10000' *.tsv
  mlr --nidx put '$sum = $7 < 0.0 ? 3.5 : $7 + 2.1*$8' *.dat
  grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group
  mlr join -j account_id -f accounts.dat then group-by account_name balances.dat
  mlr --json put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/*.json
  mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/*
  mlr stats2 -a linreg-pca -f u,v -g shape data/*
  mlr put -q '@sum[$a][$b] += $x; end {emit @sum, "a", "b"}' data/*
  mlr --from estimates.tbl put '
  for (k,v in $*) {
    if (is_numeric(v) && k =~ "^[t-z].*$") {
      $sum += v; $count += 1
    }
  }
  $mean = $sum / $count # no assignment if count unset'
  mlr --from infile.dat put -f analyze.mlr
  mlr --from infile.dat put 'tee > "./taps/data-".$a."-".$b, $*'
  mlr --from infile.dat put 'tee | "gzip > ./taps/data-".$a."-".$b.".gz", $*'
  mlr --from infile.dat put -q '@v=$*; dump | "jq .[]"'
  mlr --from infile.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}'

Data-format examples:
  DKVP: delimited key-value pairs (Miller default format)
  +---------------------+
  | apple=1,bat=2,cog=3 | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  | dish=7,egg=8,flint  | Record 2: "dish" => "7", "egg" => "8", "3" => "flint"
  +---------------------+

  NIDX: implicitly numerically indexed (Unix-toolkit style)
  +---------------------+
  | the quick brown     | Record 1: "1" => "the", "2" => "quick", "3" => "brown"
  | fox jumped          | Record 2: "1" => "fox", "2" => "jumped"
  +---------------------+

  CSV/CSV-lite: comma-separated values with separate header line
  +---------------------+
  | apple,bat,cog       |
  | 1,2,3               | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | 4,5,6               | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +---------------------+
  Tabular JSON: nested objects are supported, although arrays within them are not:
  +---------------------+
  | {                   |
  |  "apple": 1,        | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  |  "bat": 2,          |
  |  "cog": 3           |
  | }                   |
  | {                   |
  |   "dish": {         | Record 2: "dish:egg" => "7", "dish:flint" => "8", "garlic" => ""
  |     "egg": 7,       |
  |     "flint": 8      |
  |   },                |
  |   "garlic": ""      |
  | }                   |
  +---------------------+

  PPRINT: pretty-printed tabular
  +---------------------+
  | apple bat cog       |
  | 1     2   3         | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | 4     5   6         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +---------------------+

  XTAB: pretty-printed transposed tabular
  +---------------------+
  | apple 1             | Record 1: "apple" => "1", "bat" => "2", "cog" => "3"
  | bat   2             |
  | cog   3             |
  |                     |
  | dish 7              | Record 2: "dish" => "7", "egg" => "8"
  | egg  8              |
  +---------------------+

  Markdown tabular (supported for output only):
  +-----------------------+
  | | apple | bat | cog | |
  | | --- | --- | --- |   |
  | | 1 | 2 | 3 |         | Record 1: "apple => "1", "bat" => "2", "cog" => "3"
  | | 4 | 5 | 6 |         | Record 2: "apple" => "4", "bat" => "5", "cog" => "6"
  +-----------------------+

Help options:
  -h or --help                Show this message.
  --version                   Show the software version.
  {verb name} --help          Show verb-specific help.
  --help-all-verbs            Show help on all verbs.
  -l or --list-all-verbs      List only verb names.
  -L                          List only verb names, one per line.
  -f or --help-all-functions  Show help on all built-in functions.
  -F                          Show a bare listing of built-in functions by name.
  -k or --help-all-keywords   Show help on all keywords.
  -K                          Show a bare listing of keywords by name.
Verbs:
   bar bootstrap cat check count-distinct cut decimate filter fraction grep
   group-by group-like having-fields head histogram join label least-frequent
   merge-fields most-frequent nest nothing put regularize rename reorder repeat
   reshape sample sec2gmt sec2gmtdate seqgen shuffle sort stats1 stats2 step tac
   tail tee top uniq unsparsify

Functions for the filter and put verbs:
   + + - - * / // % ** | ^ & ~ << >> == != =~ !=~ > >= < <= && || ^^ ! ? : .
   gsub strlen sub substr tolower toupper abs acos acosh asin asinh atan atan2
   atanh cbrt ceil cos cosh erf erfc exp expm1 floor invqnorm log log10 log1p
   logifit madd max mexp min mmul msub pow qnorm round roundm sgn sin sinh sqrt
   tan tanh urand urand32 urandint dhms2fsec dhms2sec fsec2dhms fsec2hms gmt2sec
   hms2fsec hms2sec sec2dhms sec2gmt sec2gmt sec2gmtdate sec2hms strftime
   strptime systime is_absent is_bool is_boolean is_empty is_empty_map is_float
   is_int is_map is_nonempty_map is_not_empty is_not_map is_not_null is_null
   is_numeric is_present is_string asserting_absent asserting_bool
   asserting_boolean asserting_empty asserting_empty_map asserting_float
   asserting_int asserting_map asserting_nonempty_map asserting_not_empty
   asserting_not_map asserting_not_null asserting_null asserting_numeric
   asserting_present asserting_string boolean float fmtnum hexfmt int string
   typeof depth haskey joink joinkv joinv leafcount length mapdiff mapexcept
   mapselect mapsum splitkv splitkvx splitnv splitnvx

Please use "mlr --help-function {function name}" for function-specific help.

Data-format options, for input, output, or both:
  --idkvp   --odkvp   --dkvp      Delimited key-value pairs, e.g "a=1,b=2"
                                  (this is Miller's default format).
  --inidx   --onidx   --nidx      Implicitly-integer-indexed fields
                                  (Unix-toolkit style).
  --icsv    --ocsv    --csv       Comma-separated value (or tab-separated
                                  with --fs tab, etc.)
  --itsv    --otsv    --tsv       Keystroke-savers for "--icsv --ifs tab",
                                  "--ocsv --ofs tab", "--csv --fs tab".
  --ipprint --opprint --pprint    Pretty-printed tabular (produces no
                                  output until all input is in).
                      --right     Right-justifies all fields for PPRINT output.
                      --barred    Prints a border around PPRINT output
                                  (only available for output).
            --omd                 Markdown-tabular (only available for output).
  --ixtab   --oxtab   --xtab      Pretty-printed vertical-tabular.
                      --xvright   Right-justifies values for XTAB format.
  --ijson   --ojson   --json      JSON tabular: sequence or list of one-level
                                  maps: {...}{...} or [{...},{...}].
    --json-map-arrays-on-input    JSON arrays are unmillerable. --json-map-arrays-on-input
    --json-skip-arrays-on-input   is the default: arrays are converted to integer-indexed
    --json-fatal-arrays-on-input  maps. The other two options cause them to be skipped, or
                                  to be treated as errors. Please use the jq tool for full
                                  JSON (pre)processing.
                      --jvstack   Put one key-value pair per line for JSON
                                  output.
                      --jlistwrap Wrap JSON output in outermost [ ].
                    --jknquoteint Do not quote non-string map keys in JSON output.
                     --jvquoteall Quote map values in JSON output, even if they're
                                  numeric.
              --jflatsep {string} Separator for flattening multi-level JSON keys,
                                  e.g. '{"a":{"b":3}}' becomes a:b => 3 for
                                  non-JSON formats. Defaults to :.
  -p is a keystroke-saver for --nidx --fs space --repifs

  Examples: --csv for CSV-formatted input and output; --idkvp --opprint for
  DKVP-formatted input and pretty-printed output.

Format-conversion keystroke-saver options, for input, output, or both:
As keystroke-savers for format-conversion you may use the following:
        --c2t --c2d --c2n --c2j --c2x --c2p --c2m
  --t2c       --t2d --t2n --t2j --t2x --t2p --t2m
  --d2c --d2t       --d2n --d2j --d2x --d2p --d2m
  --n2c --n2t --n2d       --n2j --n2x --n2p --n2m
  --j2c --j2t --j2d --j2n       --j2x --j2p --j2m
  --x2c --x2t --x2d --x2n --x2j       --x2p --x2m
  --p2c --p2t --p2d --p2n --p2j --p2x       --p2m
The letters c t d n j x p m refer to formats CSV, TSV, DKVP, NIDX, JSON, XTAB,
PPRINT, and markdown, respectively. Note that markdown format is available for
output only.
Compressed-data options:
  --prepipe {command} This allows Miller to handle compressed inputs. You can do
  without this for single input files, e.g. "gunzip < myfile.csv.gz | mlr ...".
  However, when multiple input files are present, between-file separations are
  lost; also, the FILENAME variable doesn't iterate. Using --prepipe you can
  specify an action to be taken on each input file. This pre-pipe command must
  be able to read from standard input; it will be invoked with
    {command} < {filename}.
  Examples:
    mlr --prepipe 'gunzip'
    mlr --prepipe 'zcat -cf'
    mlr --prepipe 'xz -cd'
    mlr --prepipe cat
  Note that this feature is quite general and is not limited to decompression
  utilities. You can use it to apply per-file filters of your choice.
  For output compression (or other) utilities, simply pipe the output:
    mlr ... | {your compression command}

Separator options, for input, output, or both:
  --rs     --irs     --ors              Record separators, e.g. 'lf' or '\r\n'
  --fs     --ifs     --ofs  --repifs    Field separators, e.g. comma
  --ps     --ips     --ops              Pair separators, e.g. equals sign

  Notes about line endings:
  * Default line endings (--irs and --ors) are "auto" which means autodetect from
    the input file format, as long as the input file(s) have lines ending in either
    LF (also known as linefeed, '\n', 0x0a, Unix-style) or CRLF (also known as
    carriage-return/linefeed pairs, '\r\n', 0x0d 0x0a, Windows style).
  * If both irs and ors are auto (which is the default) then LF input will lead to LF
    output and CRLF input will lead to CRLF output, regardless of the platform you're
    running on.
  * The line-ending autodetector triggers on the first line ending detected in the input
    stream. E.g. if you specify a CRLF-terminated file on the command line followed by an
    LF-terminated file then autodetected line endings will be CRLF.
  * If you use --ors {something else} with (default or explicitly specified) --irs auto
    then line endings are autodetected on input and set to what you specify on output.
  * If you use --irs {something else} with (default or explicitly specified) --ors auto
    then the output line endings used are LF on Unix/Linux/BSD/MacOSX, and CRLF on Windows.

  Notes about all other separators:
  * IPS/OPS are only used for DKVP and XTAB formats, since only in these formats
    do key-value pairs appear juxtaposed.
  * IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines;
    XTAB records are separated by two or more consecutive IFS/OFS -- i.e.
    a blank line. Everything above about --irs/--ors/--rs auto becomes --ifs/--ofs/--fs
    auto for XTAB format. (XTAB's default IFS/OFS are "auto".)
  * OFS must be single-character for PPRINT format. This is because it is used
    with repetition for alignment; multi-character separators would make
    alignment impossible.
  * OPS may be multi-character for XTAB format, in which case alignment is
    disabled.
  * TSV is simply CSV using tab as field separator ("--fs tab").
  * FS/PS are ignored for markdown format; RS is used.
  * All FS and PS options are ignored for JSON format, since they are not relevant
    to the JSON format.
  * You can specify separators in any of the following ways, shown by example:
    - Type them out, quoting as necessary for shell escapes, e.g.
      "--fs '|' --ips :"
    - C-style escape sequences, e.g. "--rs '\r\n' --fs '\t'".
    - To avoid backslashing, you can use any of the following names:
      cr crcr newline lf lflf crlf crlfcrlf tab space comma pipe slash colon semicolon equals
  * Default separators by format:
      File format  RS       FS       PS
      dkvp         auto     ,        =
      json         auto     (N/A)    (N/A)
      nidx         auto     space    (N/A)
      csv          auto     ,        (N/A)
      csvlite      auto     ,        (N/A)
      markdown     auto     (N/A)    (N/A)
      pprint       auto     space    (N/A)
      xtab         (N/A)    auto     space

Relevant to CSV/CSV-lite input only:
  --implicit-csv-header Use 1,2,3,... as field labels, rather than from line 1
                     of input files. Tip: combine with "label" to recreate
                     missing headers.
  --headerless-csv-output Print only CSV data lines.
Double-quoting for CSV output:
  --quote-all        Wrap all fields in double quotes
  --quote-none       Do not wrap any fields in double quotes, even if they have
                     OFS or ORS in them
  --quote-minimal    Wrap fields in double quotes only if they have OFS or ORS
                     in them (default)
  --quote-numeric    Wrap fields in double quotes only if they have numbers
                     in them
  --quote-original   Wrap fields in double quotes if and only if they were
                     quoted on input. This isn't sticky for computed fields:
                     e.g. if fields a and b were quoted on input and you do
                     "put '$c = $a . $b'" then field c won't inherit a or b's
                     was-quoted-on-input flag.

Numerical formatting:
  --ofmt {format}    E.g. %.18lf, %.0lf. Please use sprintf-style codes for
                     double-precision. Applies to verbs which compute new
                     values, e.g. put, stats1, stats2. See also the fmtnum
                     function within mlr put (mlr --help-all-functions).
                     Defaults to %lf.

Other options:
  --seed {n} with n of the form 12345678 or 0xcafefeed. For put/filter
                     urand()/urandint()/urand32().
  --nr-progress-mod {m}, with m a positive integer: print filename and record
                     count to stderr every m input records.
  --from {filename}  Use this to specify an input file before the verb(s),
                     rather than after. May be used more than once. Example:
                     "mlr --from a.dat --from b.dat cat" is the same as
                     "mlr cat a.dat b.dat".
  -n                 Process no input files, nor standard input either. Useful
                     for mlr put with begin/end statements only. (Same as --from
                     /dev/null.) Also useful in "mlr -n put -v '...'" for
                     analyzing abstract syntax trees (if that's your thing).
  -I                 Process files in-place. For each file name on the command
                     line, output is written to a temp file in the same
                     directory, which is then renamed over the original. Each
                     file is processed in isolation: if the output format is
                     CSV, CSV headers will be present in each output file;
                     statistics are only over each file's own records; and so on.

Then-chaining:
Output of one verb may be chained as input to another using "then", e.g.
  mlr stats1 -a min,mean,max -f flag,u,v -g color then sort -f color

For more information please see http://johnkerl.org/miller/doc and/or
http://github.com/johnkerl/miller. This is Miller version v5.2.0.

$ mlr sort --help
Usage: mlr sort {flags}
Flags:
  -f  {comma-separated field names}  Lexical ascending
  -n  {comma-separated field names}  Numerical ascending; nulls sort last
  -nf {comma-separated field names}  Numerical ascending; nulls sort last
  -r  {comma-separated field names}  Lexical descending
  -nr {comma-separated field names}  Numerical descending; nulls sort first
Sorts records primarily by the first specified field, secondarily by the second
field, and so on. (Any records not having all specified sort keys will appear
at the end of the output, in the order they were encountered, regardless of the
specified sort order.) The sort is stable: records that compare equal will sort
in the order they were encountered in the input record stream.

Example:
  mlr sort -f a,b -nr x,y,z
which is the same as:
  mlr sort -f a -f b -nr x -nr y -nr z