Randomizing examples

Generating random numbers from various distributions

Here we can chain together a few simple building blocks:

cat expo-sample.sh
# Generate 100,000 pairs of independent, identically distributed
# exponentially distributed random variables, all with the same rate
# parameter (namely, 2.5). Then compute histograms of one of them and
# of their sum.
#
# See also https://en.wikipedia.org/wiki/Exponential_distribution
#
# Here I'm using a specified random-number seed so this example always
# produces the same output for this web document: in everyday practice we
# wouldn't do that.

mlr -n \
  --seed 0 \
  --opprint \
  seqgen --stop 100000 \
  then put '
    # https://en.wikipedia.org/wiki/Inverse_transform_sampling
    func expo_sample(lambda) {
      return -log(1-urand())/lambda
    }
    $u = expo_sample(2.5);
    $v = expo_sample(2.5);
    $s = $u + $v;
  ' \
  then histogram -f u,s --lo 0 --hi 2 --nbins 50 \
  then bar -f u_count,s_count --auto -w 20

Namely:

  • Set the Miller random-number seed so this web document looks the same every time it's regenerated.
  • Use pretty-printed tabular output.
  • Use seqgen to produce 100,000 records i=0, i=1, etc.
  • Send those to a put step which defines an inverse-transform-sampling function, calls it twice to get two independent samples, and computes their sum.
  • Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.
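The inverse-transform step can be illustrated outside of Miller as well. Here is a minimal Python sketch of the same sampler (this is an illustration of the math, not part of the pipeline above): if U is uniform on (0,1), then -log(1-U)/λ is exponentially distributed with rate λ, so the sample mean should come out near 1/λ = 0.4.

```python
import math
import random

def expo_sample(rng, lam):
    # Inverse transform sampling: if U ~ Uniform(0,1), then
    # -log(1 - U) / lam is Exponential(lam)-distributed.
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(0)  # fixed seed, mirroring mlr's --seed 0
lam = 2.5
samples = [expo_sample(rng, lam) for _ in range(100_000)]

# The mean of an Exponential(lam) distribution is 1/lam = 0.4.
mean = sum(samples) / len(samples)
print(round(mean, 3))
```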

The output is as follows:

sh expo-sample.sh
bin_lo bin_hi u_count                        s_count
0      0.04   [64]*******************#[9554] [326]#...................[3703]
0.04   0.08   [64]*****************...[9554] [326]*****...............[3703]
0.08   0.12   [64]****************....[9554] [326]*********...........[3703]
0.12   0.16   [64]**************......[9554] [326]************........[3703]
0.16   0.2    [64]*************.......[9554] [326]**************......[3703]
0.2    0.24   [64]************........[9554] [326]*****************...[3703]
0.24   0.28   [64]**********..........[9554] [326]******************..[3703]
0.28   0.32   [64]*********...........[9554] [326]******************..[3703]
0.32   0.36   [64]********............[9554] [326]*******************.[3703]
0.36   0.4    [64]*******.............[9554] [326]*******************#[3703]
0.4    0.44   [64]*******.............[9554] [326]*******************.[3703]
0.44   0.48   [64]******..............[9554] [326]*******************.[3703]
0.48   0.52   [64]*****...............[9554] [326]******************..[3703]
0.52   0.56   [64]*****...............[9554] [326]******************..[3703]
0.56   0.6    [64]****................[9554] [326]*****************...[3703]
0.6    0.64   [64]****................[9554] [326]******************..[3703]
0.64   0.68   [64]***.................[9554] [326]****************....[3703]
0.68   0.72   [64]***.................[9554] [326]****************....[3703]
0.72   0.76   [64]***.................[9554] [326]***************.....[3703]
0.76   0.8    [64]**..................[9554] [326]**************......[3703]
0.8    0.84   [64]**..................[9554] [326]*************.......[3703]
0.84   0.88   [64]**..................[9554] [326]************........[3703]
0.88   0.92   [64]**..................[9554] [326]************........[3703]
0.92   0.96   [64]*...................[9554] [326]***********.........[3703]
0.96   1      [64]*...................[9554] [326]**********..........[3703]
1      1.04   [64]*...................[9554] [326]*********...........[3703]
1.04   1.08   [64]*...................[9554] [326]********............[3703]
1.08   1.12   [64]*...................[9554] [326]********............[3703]
1.12   1.16   [64]*...................[9554] [326]********............[3703]
1.16   1.2    [64]*...................[9554] [326]*******.............[3703]
1.2    1.24   [64]#...................[9554] [326]******..............[3703]
1.24   1.28   [64]#...................[9554] [326]*****...............[3703]
1.28   1.32   [64]#...................[9554] [326]*****...............[3703]
1.32   1.36   [64]#...................[9554] [326]****................[3703]
1.36   1.4    [64]#...................[9554] [326]****................[3703]
1.4    1.44   [64]#...................[9554] [326]****................[3703]
1.44   1.48   [64]#...................[9554] [326]***.................[3703]
1.48   1.52   [64]#...................[9554] [326]***.................[3703]
1.52   1.56   [64]#...................[9554] [326]***.................[3703]
1.56   1.6    [64]#...................[9554] [326]**..................[3703]
1.6    1.64   [64]#...................[9554] [326]**..................[3703]
1.64   1.68   [64]#...................[9554] [326]**..................[3703]
1.68   1.72   [64]#...................[9554] [326]*...................[3703]
1.72   1.76   [64]#...................[9554] [326]*...................[3703]
1.76   1.8    [64]#...................[9554] [326]*...................[3703]
1.8    1.84   [64]#...................[9554] [326]#...................[3703]
1.84   1.88   [64]#...................[9554] [326]#...................[3703]
1.88   1.92   [64]#...................[9554] [326]#...................[3703]
1.92   1.96   [64]#...................[9554] [326]#...................[3703]
1.96   2      [64]#...................[9554] [326]#...................[3703]
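The two shapes differ for a reason: an exponential density is largest at zero, so u peaks in the first bin, while the sum of two iid Exponential(λ) variables has an Erlang(2, λ) density whose mode is at 1/λ = 0.4, which is where the s column peaks. A quick Python check of this, independent of Miller, using the same 50 bins over [0, 2):

```python
import math
import random

rng = random.Random(0)
lam = 2.5

def expo_sample():
    return -math.log(1.0 - rng.random()) / lam

# Draw pairwise sums and bin them into 50 bins over [0, 2),
# mirroring histogram's --lo 0 --hi 2 --nbins 50.
nbins, lo, hi = 50, 0.0, 2.0
counts = [0] * nbins
sums = [expo_sample() + expo_sample() for _ in range(100_000)]
for s in sums:
    if lo <= s < hi:
        counts[int((s - lo) / (hi - lo) * nbins)] += 1

mean_s = sum(sums) / len(sums)        # expected: 2/lam = 0.8
peak_bin = counts.index(max(counts))  # expected near s = 1/lam = 0.4
print(round(mean_s, 3), peak_bin)
```

The density is quite flat near the mode, so the exact winning bin wanders a little from seed to seed, but it stays in the neighborhood of 0.4.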

Randomly selecting words from a list

Given the word list data/english-words.txt, first take a look at its first few lines:

head data/english-words.txt
a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca

Then the following will randomly sample ten words with four to eight characters in them:


mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
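The sample verb keeps a fixed-size uniform sample from a stream without knowing the stream's length in advance, in the style of reservoir sampling. A minimal Python sketch of that idea, with a hypothetical stand-in for the word list and the same 4-to-8-character length filter:

```python
import random

def reservoir_sample(stream, k, rng):
    # Uniformly sample k items from a stream of unknown length:
    # item i replaces a random slot with probability k / (i + 1).
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(0)
# Hypothetical stand-in for data/english-words.txt.
words = ["word%d" % i for i in range(1000)]
# Mirror the strlen filter: keep words of 4 to 8 characters.
candidates = (w for w in words if 4 <= len(w) <= 8)
picked = reservoir_sample(candidates, 10, rng)
print(picked)
```

The appeal is the same as with mlr's streaming model: only k items are ever held in memory, no matter how long the input is.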

Randomly generating jabberwocky words

These are simple character-level n-grams. The common helper functions live in ./ngrams/ngfuncs.mlr, and there are companion scripts for 1-grams through 5-grams; the 5-gram script, ./ngrams/ng5.mlr, is used below.

The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as bromance and spork:

mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ng5.mlr
beard
plastinguish
politicially
noise
loan
country
controductionary
suppery
lose
lessors
dollar
judge
rottendence
lessenger
diffendant
suggestional
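The n-gram idea above can be sketched in Python as well. This is not the Miller DSL from the scripts, just an illustration of the technique: record which letter follows each (n-1)-letter context in the training words, then walk those transitions from the word-start marker until a word-end marker is drawn. The tiny vocabulary here is a hypothetical stand-in for gsl-2000.txt.

```python
import random

def train_ngrams(words, n):
    # Map each (n-1)-letter context to the letters observed after it.
    # "^" pads word starts; "$" marks word ends.
    table = {}
    for w in words:
        padded = "^" * (n - 1) + w + "$"
        for i in range(len(padded) - n + 1):
            ctx, nxt = padded[i:i + n - 1], padded[i + n - 1]
            table.setdefault(ctx, []).append(nxt)
    return table

def generate(table, n, rng, maxlen=20):
    # Start from the all-"^" context and follow observed transitions.
    ctx, out = "^" * (n - 1), []
    while len(out) < maxlen:
        nxt = rng.choice(table[ctx])
        if nxt == "$":
            break
        out.append(nxt)
        ctx = (ctx + nxt)[-(n - 1):]
    return "".join(out)

rng = random.Random(0)
vocab = ["country", "control", "production", "dictionary", "suggest"]
table = train_ngrams(vocab, 3)
word = generate(table, 3, rng)
print(word)
```

Every step follows a transition seen in some training word, which is why the output tends to look pronounceable: locally it always imitates real words, even when the whole is a jabberwocky word like "controductionary".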