Sharing data with other languages

As discussed in the section on File formats, Miller supports several file formats. Different tools are good at different things, so it’s important to be able to move data into and out of other languages. CSV and JSON are well known, of course; here are some examples using DKVP format, with Python and Ruby.

DKVP I/O in Python

Here are the I/O routines:

#!/usr/bin/env python

# ================================================================
# Example of DKVP I/O using Python.
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#   bash$ python -i dkvp_io.py
#   # READ
#   >>> map = dkvpline2map('x=1,y=2', '=', ',')
#   >>> map
#   OrderedDict([('x', 1), ('y', 2)])
#   # MODIFY
#   >>> map['z'] = map['x'] + map['y']
#   >>> map
#   OrderedDict([('x', 1), ('y', 2), ('z', 3)])
#   # WRITE
#   >>> line = map2dkvpline(map, '=', ',')
#   >>> line
#   'x=1,y=2,z=3'
# ================================================================

import re
import collections

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs):
	pairs = re.split(ifs, line)
	map = collections.OrderedDict()
	for pair in pairs:
		key, value = re.split(ips, pair, 1)

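		# Type inference: try int, then float; otherwise leave as string.
		try:
			value = int(value)
		except ValueError:
			try:
				value = float(value)
			except ValueError:
				pass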

		map[key] = value
	return map

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs):
	pairs = []
	for key in map:
		pairs.append(str(key) + ops + str(map[key]))
	return ofs.join(pairs)
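
As a minimal sketch of the same round trip on Python 3.7 or later, where a plain dict preserves insertion order, the routines can be written without regular expressions. The names `infer_type`, `parse_dkvp`, and `format_dkvp` are hypothetical, chosen for illustration; they are not part of Miller or of dkvp_io. This variant treats the separators as literal strings rather than as regular expressions.

```python
def infer_type(text):
    """Return text as an int if possible, else a float if possible, else unchanged."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text

def parse_dkvp(line, ips="=", ifs=","):
    """Split one DKVP line into a dict, inferring int and float values."""
    record = {}
    for pair in line.split(ifs):
        # Split on the first pair separator only, so values may contain it.
        key, value = pair.split(ips, 1)
        record[key] = infer_type(value)
    return record

def format_dkvp(record, ops="=", ofs=","):
    """Join a dict back into one DKVP line."""
    return ofs.join(str(key) + ops + str(value) for key, value in record.items())

# READ, MODIFY, WRITE, as in the interactive session shown in the comments above.
record = parse_dkvp("x=1,y=2.5,name=pan")
record["z"] = record["x"] + record["y"]
line = format_dkvp(record)
print(line)  # x=1,y=2.5,name=pan,z=3.5
```

Non-numeric values such as `pan` fall through both conversions and stay strings, which is why `+` concatenates them rather than adding.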

And here is an example using them:

$ cat polyglot-dkvp-io/
#!/usr/bin/env python

import sys
import re
import dkvp_io

while True:
	# Read the original record:
	line = sys.stdin.readline().strip()
	if line == '':
		break
	map = dkvp_io.dkvpline2map(line, '=', ',')

	# Drop a field:
	del map['x']

	# Compute some new fields:
	map['ab'] = map['a'] + map['b']
	map['iy'] = map['i'] + map['y']

	# Add new fields which show type of each already-existing field:
	keys = list(map.keys())  # copy the keys, since the loop adds fields to map
	for key in keys:
		# Convert "<type 'int'>" (Python 2) or "<class 'int'>" (Python 3) to just "int", etc.:
		type_string = str(map[key].__class__)
		type_string = re.sub("<(type|class) '", "", type_string)
		type_string = re.sub("'>", "", type_string)
		map['t'+key] = type_string

	# Write the modified record:
	print(dkvp_io.map2dkvpline(map, '=', ','))
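
Testing `readline().strip()` against the empty string cannot tell a blank line from end of input, since both yield `''`. As a hedged sketch of one alternative for Python 3, the lines can be iterated directly and blank ones skipped. The helper names (`parse_dkvp`, `records`, `transform`) are hypothetical, the sample values are made up, and `io.StringIO` stands in for `sys.stdin` so that the sketch runs on its own.

```python
import io

def parse_dkvp(line, ips="=", ifs=","):
    """Hypothetical minimal parser: values stay strings unless int() or float() accepts them."""
    record = {}
    for pair in line.split(ifs):
        key, value = pair.split(ips, 1)
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                pass
        record[key] = value
    return record

def records(stream):
    """Yield one parsed record per non-blank line of stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield parse_dkvp(line)

def transform(record):
    """Drop a field (x here), then compute the new fields ab and iy."""
    record.pop("x", None)
    record["ab"] = record["a"] + record["b"]
    record["iy"] = record["i"] + record["y"]
    return record

# Made-up sample input; note the blank line between the two records.
stream = io.StringIO("a=pan,b=pan,i=1,x=0.5,y=0.25\n\na=eks,b=wye,i=2,x=0.5,y=0.5\n")
output_lines = [
    ",".join(f"{key}={value}" for key, value in transform(record).items())
    for record in records(stream)
]
print("\n".join(output_lines))
# a=pan,b=pan,i=1,y=0.25,ab=panpan,iy=1.25
# a=eks,b=wye,i=2,y=0.5,ab=ekswye,iy=2.5
```

In a real filter, `records(sys.stdin)` would replace the `io.StringIO` stand-in.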

Run as-is:

$ python polyglot-dkvp-io/ < data/small

Run as-is, then pipe to Miller for pretty-printing:

$ python polyglot-dkvp-io/ < data/small | mlr --opprint cat
a   b   i y              ab     iy            ta  tb  ti  ty    tab tiy
pan pan 1 0.726802862743 panpan 1.72680286274 str str int float str float
eks pan 2 0.522151108333 ekspan 2.52215110833 str str int float str float
wye wye 3 0.338318525517 wyewye 3.33831852552 str str int float str float
eks wye 4 0.134188743284 ekswye 4.13418874328 str str int float str float
wye pan 5 0.863624469903 wyepan 5.8636244699  str str int float str float

DKVP I/O in Ruby

Here are the I/O routines:

#!/usr/bin/env ruby

# ================================================================
# Example of DKVP I/O using Ruby.
# Key point: Use Miller for what it's good at; pass data into/out of tools in
# other languages to do what they're good at.
#   bash$ irb -I. -r dkvp_io.rb
#   # READ
#   irb(main):001:0> map = dkvpline2map('x=1,y=2', '=', ',')
#   => {"x"=>1, "y"=>2}
#   # MODIFY
#   irb(main):002:0> map['z'] = map['x'] + map['y']
#   => 3
#   # WRITE
#   irb(main):003:0> line = map2dkvpline(map, '=', ',')
#   => "x=1,y=2,z=3"
# ================================================================

# ----------------------------------------------------------------
# ips and ifs (input pair separator and input field separator) are nominally '=' and ','.
def dkvpline2map(line, ips, ifs)
  map = {}
  line.split(ifs).each do |pair|
    (k, v) = pair.split(ips, 2)

    # Type inference: try Integer, then Float; otherwise leave as string.
    begin
      v = Integer(v)
    rescue ArgumentError
      begin
        v = Float(v)
      rescue ArgumentError
        # Leave as string
      end
    end

    map[k] = v
  end
  map
end

# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map, ops, ofs)
  map.collect{|k,v| k.to_s + ops + v.to_s}.join(ofs)
end

And here is an example using them:

$ cat polyglot-dkvp-io/example.rb
#!/usr/bin/env ruby

require 'dkvp_io'

ARGF.each do |line|
  # Read the original record:
  map = dkvpline2map(line.chomp, '=', ',')

  # Drop a field:
  map.delete('x')

  # Compute some new fields:
  map['ab'] = map['a'] + map['b']
  map['iy'] = map['i'] + map['y']

  # Add new fields which show type of each already-existing field:
  keys = map.keys
  keys.each do |key|
    map['t'+key] = map[key].class
  end

  # Write the modified record:
  puts map2dkvpline(map, '=', ',')
end

Run as-is:

$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small

Run as-is, then pipe to Miller for pretty-printing:

$ ruby -I./polyglot-dkvp-io polyglot-dkvp-io/example.rb data/small | mlr --opprint cat
a   b   i y                   ab     iy                 ta     tb     ti     ty    tab    tiy
pan pan 1 0.7268028627434533  panpan 1.7268028627434533 String String Fixnum Float String Float
eks pan 2 0.5221511083334797  ekspan 2.5221511083334796 String String Fixnum Float String Float
wye wye 3 0.33831852551664776 wyewye 3.3383185255166477 String String Fixnum Float String Float
eks wye 4 0.13418874328430463 ekswye 4.134188743284304  String String Fixnum Float String Float
wye pan 5 0.8636244699032729  wyepan 5.863624469903273  String String Fixnum Float String Float

