Special symbols and formatting
How can I handle commas-as-data in various formats?
CSV handles this well and by design:
cat commas.csv
Name,Role
"Xiao, Lin",administrator
"Khavari, Darius",tester
Likewise JSON:
mlr --icsv --ojson cat commas.csv
[
{
  "Name": "Xiao, Lin",
  "Role": "administrator"
},
{
  "Name": "Khavari, Darius",
  "Role": "tester"
}
]
For Miller's XTAB there is no escaping for carriage returns, but commas work fine:
mlr --icsv --oxtab cat commas.csv
Name Xiao, Lin
Role administrator

Name Khavari, Darius
Role tester
But for the key-value-pair and index-numbered formats, commas are the default field separator, and (as of Miller 5.4.0, anyway) these formats have no CSV-style double-quote handling. So commas within the data look like delimiters:
mlr --icsv --odkvp cat commas.csv
Name=Xiao, Lin,Role=administrator
Name=Khavari, Darius,Role=tester
One solution is to use a different delimiter, such as a pipe character:
mlr --icsv --odkvp --ofs pipe cat commas.csv
Name=Xiao, Lin|Role=administrator
Name=Khavari, Darius|Role=tester
To be extra-sure to avoid data/delimiter clashes, you can also use control characters as delimiters -- here, control-A:
mlr --icsv --odkvp --ofs '\001' cat commas.csv | cat -v
Name=Xiao, Lin^ARole=administrator
Name=Khavari, Darius^ARole=tester
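If a downstream tool needs to read control-A-delimited output, standard utilities can split on that byte too. The following is only a sketch: it uses printf to simulate one line of such output rather than calling mlr, and the variable names are illustrative.

```shell
# Simulated line of output using control-A (octal 001) as the field separator.
line=$(printf 'Name=Xiao, Lin\001Role=administrator')

# tr maps the control character to a visible pipe for display.
shown=$(printf '%s\n' "$line" | tr '\001' '|')
printf '%s\n' "$shown"

# cut can split on the control character directly; field 2 is the Role pair.
role=$(printf '%s\n' "$line" | cut -d "$(printf '\001')" -f 2)
printf '%s\n' "$role"
```

The comma inside the name stays intact because the comma is no longer treated as a delimiter anywhere in the pipeline.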
How can I handle field names with special symbols in them?
Simply surround the field names with curly braces:
echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}'
x.a=3,y:b=4,z/c=5,product.all=60
How can I put single quotes into strings?
This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file:
$a = "It's OK, I said, then 'for now'."
echo a=bcd | mlr put -f data/single-quote-example.mlr
a=It's OK, I said, then 'for now'.
So: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.
Without putting the update expression in a file, it's messier:
echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."'
a=It's OK, I said, 'for now'.
The idea is that the outermost single quotes protect the put expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle, you need to put it outside the shell's single-quoting. The pieces are the following, all concatenated together:
'$a="It'
\'
's OK, I said, '
\'
'for now'
\'
'."'
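To see what the shell actually hands to Miller, the same concatenation can be checked without running mlr at all. This is a small sketch; the variable name expr is just for illustration.

```shell
# Each '\'' closes the single-quoted string, adds a backslash-escaped quote,
# and reopens single quoting; the shell joins the adjacent pieces into one word.
expr='$a="It'\''s OK, I said, '\''for now'\''."'

# Show the single argument that mlr put would receive.
printf '%s\n' "$expr"
# prints: $a="It's OK, I said, 'for now'."
```

Printing the argument this way is a quick check when a quoted expression does not behave as expected.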
How to escape '?' in regexes?
One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.
cat data/question.dat
a=is it?,b=it is!
mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat
a is it?
b it is!
c is it ...
mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat
a is it?
b it is!
c is it ...
The ssub function exists precisely for this reason: so you don't have to escape anything.
How to apply math to regex output?
- Use parentheses for capture groups
- Use \1, \2, etc. to refer to the captures
- The matched patterns are strings, so cast them to int or float
See also the page on regular expressions.
echo "a=14°45'" | mlr put '$a =~"^([0-9]+)°([0-9]+)" {$degrees = float("\1") + float("\2") / 60}'
a=14°45',degrees=14.75