Special symbols and formatting

How can I handle commas-as-data in various formats?

CSV handles this well and by design:

cat commas.csv
Name,Role
"Xiao, Lin",administrator
"Khavari, Darius",tester

Likewise JSON:

mlr --icsv --ojson cat commas.csv
[
{
  "Name": "Xiao, Lin",
  "Role": "administrator"
},
{
  "Name": "Khavari, Darius",
  "Role": "tester"
}
]

For Miller's XTAB there is no escaping for carriage returns, but commas work fine:

mlr --icsv --oxtab cat commas.csv
Name Xiao, Lin
Role administrator

Name Khavari, Darius
Role tester

But for key-value-pairs and index-numbered formats, commas are the default field separator. And, as of Miller 5.4.0 anyway, these formats have no double-quote handling like there is for CSV. So commas within the data look like delimiters:

mlr --icsv --odkvp cat commas.csv
Name=Xiao, Lin,Role=administrator
Name=Khavari, Darius,Role=tester

One solution is to use a different delimiter, such as a pipe character:

mlr --icsv --odkvp --ofs pipe cat commas.csv
Name=Xiao, Lin|Role=administrator
Name=Khavari, Darius|Role=tester

To be extra-sure to avoid data/delimiter clashes, you can also use control characters as delimiters -- here, control-A:

mlr --icsv --odkvp --ofs '\001' cat commas.csv | cat -v
Name=Xiao, Lin^ARole=administrator
Name=Khavari, Darius^ARole=tester
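
The clash is easiest to see from the consumer's side. The following is a minimal sketch, not Miller itself: it uses plain awk (chosen here only for illustration) to count the fields a naive splitter finds in the first output record above, once with comma and once with pipe as the separator. The variable names are hypothetical.

```shell
# First DKVP record from above, written once with each separator.
comma_line='Name=Xiao, Lin,Role=administrator'
pipe_line='Name=Xiao, Lin|Role=administrator'

# Splitting on commas: the comma inside the name counts as a delimiter.
comma_fields=$(printf '%s\n' "$comma_line" | awk -F, '{ print NF }')

# Splitting on pipes: only the real field boundary is found.
pipe_fields=$(printf '%s\n' "$pipe_line" | awk -F'|' '{ print NF }')

echo "comma-separated: $comma_fields fields"   # comma-separated: 3 fields
echo "pipe-separated: $pipe_fields fields"     # pipe-separated: 2 fields
```

The record has two fields, so only the pipe-separated form splits correctly.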

How can I handle field names with special symbols in them?

Simply surround the field names with curly braces:

echo 'x.a=3,y:b=4,z/c=5' | mlr put '${product.all} = ${x.a} * ${y:b} * ${z/c}'
x.a=3,y:b=4,z/c=5,product.all=60

How can I put single quotes into strings?

This is a little tricky due to the shell's handling of quotes. For simplicity, let's first put an update script into a file, data/single-quote-example.mlr:

$a = "It's OK, I said, then 'for now'."
echo a=bcd | mlr put -f data/single-quote-example.mlr
a=It's OK, I said, then 'for now'.

So: Miller's DSL uses double quotes for strings, and you can put single quotes (or backslash-escaped double-quotes) inside strings, no problem.

Without putting the update expression in a file, it's messier:

echo a=bcd | mlr put '$a="It'\''s OK, I said, '\''for now'\''."'
a=It's OK, I said, 'for now'.

The outermost single quotes protect the put expression from the shell, and the double quotes within them are for Miller. To get a single quote in the middle, you have to put it outside the shell's single-quoting: close the single-quoted string, write the quote as \', and then reopen the single-quoted string. The pieces are the following, all concatenated together (each piece other than \' sits inside its own pair of single quotes):

  • $a="It
  • \'
  • s OK, I said,
  • \'
  • for now
  • \'
  • ."
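
The concatenation can be checked without running Miller at all. This is a small sketch under one assumption: the variable name expr is hypothetical and is used only to hold the put expression so it can be printed exactly as the shell assembles it.

```shell
# Each piece is quoted separately; the shell joins adjacent pieces into one word.
# '\'' means: close the single-quoted string, emit a literal ', reopen it.
expr='$a="It'\''s OK, I said, '\''for now'\''."'

# Print the single argument that Miller would receive as its put expression.
printf '%s\n' "$expr"   # $a="It's OK, I said, 'for now'."
```

Holding the expression in a variable also gives a less cluttered invocation, since it can then be passed as mlr put "$expr".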

How can I escape '?' in regexes?

One way is to use square brackets; an alternative is to use simple string-substitution rather than a regular expression.

cat data/question.dat
a=is it?,b=it is!
mlr --oxtab put '$c = gsub($a, "[?]"," ...")' data/question.dat
a is it?
b it is!
c is it ...
mlr --oxtab put '$c = ssub($a, "?"," ...")' data/question.dat
a is it?
b it is!
c is it ...

The ssub function exists precisely for this reason: so you don't have to escape anything.
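
To see why the question mark needs this treatment in a regular expression at all, here is a minimal sketch that uses sed with extended regular expressions instead of Miller. sed is a stand-in chosen for illustration; the assumption is only that '?' acts as the usual "zero or one" quantifier in both tools. The variable names are hypothetical.

```shell
input='is it?'

# Unescaped: '?' is a quantifier here, so 'it?' means "i, then an optional t".
unescaped=$(printf '%s\n' "$input" | sed -E 's/it?/X/g')

# Bracketed: '[?]' matches a literal question mark.
bracketed=$(printf '%s\n' "$input" | sed -E 's/it[?]/X/g')

echo "$unescaped"   # Xs X?
echo "$bracketed"   # is X
```

In the unescaped case the pattern also matches the lone "i" in "is" and never consumes the question mark, which is the kind of surprise the square brackets avoid.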