
Randomizing examples

Generating random numbers from various distributions

Here we can chain together a few simple building blocks:

cat expo-sample.sh
# Generate 100,000 pairs of independent and identically distributed
# exponentially distributed random variables with the same rate parameter
# (namely, 2.5). Then compute histograms of one of them, along with a
# histogram of their sum.
#
# See also https://en.wikipedia.org/wiki/Exponential_distribution
#
# Here I'm using a specified random-number seed so this example always
# produces the same output for this web document: in everyday practice we
# wouldn't do that.

mlr -n \
  --seed 0 \
  --opprint \
  seqgen --stop 100000 \
  then put '
    # https://en.wikipedia.org/wiki/Inverse_transform_sampling
    func expo_sample(lambda) {
      return -log(1-urand())/lambda
    }
    $u = expo_sample(2.5);
    $v = expo_sample(2.5);
    $s = $u + $v;
  ' \
  then histogram -f u,s --lo 0 --hi 2 --nbins 50 \
  then bar -f u_count,s_count --auto -w 20

Namely:

  • Set the Miller random-number seed so this webdoc looks the same every time I regenerate it.
  • Use pretty-printed tabular output.
  • Use seqgen to produce 100,000 records i=1, i=2, etc.
  • Send those to a put step which defines an inverse-transform-sampling function and calls it twice, then computes the sum of the two samples.
  • Send those to a histogram, and from there to a bar-plotter. This is just for visualization; you could just as well output CSV and send that off to your own plotting tool, etc.
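The inverse-transform step is the mathematical heart of the pipeline. The same idea can be sketched outside Miller in plain Python (names here are illustrative, not Miller's implementation):

```python
import math
import random

def expo_sample(rng, lam):
    # Inverse transform sampling: if U ~ Uniform(0,1), then
    # -log(1 - U) / lambda ~ Exponential(lambda), because the CDF
    # F(x) = 1 - exp(-lambda * x) inverts to F^{-1}(u) = -log(1 - u) / lambda.
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(0)  # fixed seed, analogous to mlr --seed 0
samples = [expo_sample(rng, 2.5) for _ in range(100_000)]
mean = sum(samples) / len(samples)
# An Exponential(2.5) variable has theoretical mean 1/2.5 = 0.4,
# so the sample mean should land very close to that.
print(round(mean, 3))
```
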

The output is as follows:

sh expo-sample.sh
bin_lo bin_hi u_count                        s_count
0      0.04   [64]*******************#[9554] [326]#...................[3703]
0.04   0.08   [64]*****************...[9554] [326]*****...............[3703]
0.08   0.12   [64]****************....[9554] [326]*********...........[3703]
0.12   0.16   [64]**************......[9554] [326]************........[3703]
0.16   0.2    [64]*************.......[9554] [326]**************......[3703]
0.2    0.24   [64]************........[9554] [326]*****************...[3703]
0.24   0.28   [64]**********..........[9554] [326]******************..[3703]
0.28   0.32   [64]*********...........[9554] [326]******************..[3703]
0.32   0.36   [64]********............[9554] [326]*******************.[3703]
0.36   0.4    [64]*******.............[9554] [326]*******************#[3703]
0.4    0.44   [64]*******.............[9554] [326]*******************.[3703]
0.44   0.48   [64]******..............[9554] [326]*******************.[3703]
0.48   0.52   [64]*****...............[9554] [326]******************..[3703]
0.52   0.56   [64]*****...............[9554] [326]******************..[3703]
0.56   0.6    [64]****................[9554] [326]*****************...[3703]
0.6    0.64   [64]****................[9554] [326]******************..[3703]
0.64   0.68   [64]***.................[9554] [326]****************....[3703]
0.68   0.72   [64]***.................[9554] [326]****************....[3703]
0.72   0.76   [64]***.................[9554] [326]***************.....[3703]
0.76   0.8    [64]**..................[9554] [326]**************......[3703]
0.8    0.84   [64]**..................[9554] [326]*************.......[3703]
0.84   0.88   [64]**..................[9554] [326]************........[3703]
0.88   0.92   [64]**..................[9554] [326]************........[3703]
0.92   0.96   [64]*...................[9554] [326]***********.........[3703]
0.96   1      [64]*...................[9554] [326]**********..........[3703]
1      1.04   [64]*...................[9554] [326]*********...........[3703]
1.04   1.08   [64]*...................[9554] [326]********............[3703]
1.08   1.12   [64]*...................[9554] [326]********............[3703]
1.12   1.16   [64]*...................[9554] [326]********............[3703]
1.16   1.2    [64]*...................[9554] [326]*******.............[3703]
1.2    1.24   [64]#...................[9554] [326]******..............[3703]
1.24   1.28   [64]#...................[9554] [326]*****...............[3703]
1.28   1.32   [64]#...................[9554] [326]*****...............[3703]
1.32   1.36   [64]#...................[9554] [326]****................[3703]
1.36   1.4    [64]#...................[9554] [326]****................[3703]
1.4    1.44   [64]#...................[9554] [326]****................[3703]
1.44   1.48   [64]#...................[9554] [326]***.................[3703]
1.48   1.52   [64]#...................[9554] [326]***.................[3703]
1.52   1.56   [64]#...................[9554] [326]***.................[3703]
1.56   1.6    [64]#...................[9554] [326]**..................[3703]
1.6    1.64   [64]#...................[9554] [326]**..................[3703]
1.64   1.68   [64]#...................[9554] [326]**..................[3703]
1.68   1.72   [64]#...................[9554] [326]*...................[3703]
1.72   1.76   [64]#...................[9554] [326]*...................[3703]
1.76   1.8    [64]#...................[9554] [326]*...................[3703]
1.8    1.84   [64]#...................[9554] [326]#...................[3703]
1.84   1.88   [64]#...................[9554] [326]#...................[3703]
1.88   1.92   [64]#...................[9554] [326]#...................[3703]
1.92   1.96   [64]#...................[9554] [326]#...................[3703]
1.96   2      [64]#...................[9554] [326]#...................[3703]
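The peak of the s histogram near 0.4 has a closed-form explanation: the sum of two independent Exponential(2.5) samples follows a Gamma distribution with shape 2 and rate 2.5, whose density peaks at 1/lambda = 0.4. A quick numerical check in plain Python (illustrative only, nothing Miller-specific):

```python
import math

def gamma2_pdf(x, lam):
    # Density of the sum of two iid Exponential(lam) variables:
    # Gamma with shape 2 and rate lam, f(x) = lam^2 * x * exp(-lam * x).
    return lam * lam * x * math.exp(-lam * x)

lam = 2.5
# Scan the histogram's range [0, 2] on a fine grid to locate the mode.
grid = [i / 1000 for i in range(1, 2001)]
mode = max(grid, key=lambda x: gamma2_pdf(x, lam))
print(mode)  # close to the analytic mode 1/lam = 0.4
```
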

Randomly selecting words from a list

Given this word list, first take a look to see what the first few lines look like:

head data/english-words.txt
a
aa
aal
aalii
aam
aardvark
aardwolf
aba
abac
abaca

Then the following will randomly sample ten words with four to eight characters in them:

mlr --from data/english-words.txt --nidx filter -S 'n=strlen($1);4<=n&&n<=8' then sample -k 10
thionine
birchman
mildewy
avigate
addedly
abaze
askant
aiming
insulant
coinmate
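The sample -k 10 step keeps a fixed-size uniform sample in a single streaming pass, without loading the whole word list into memory; Miller's documentation describes this verb as reservoir sampling. A self-contained Python sketch of that technique (Algorithm R; the function name here is illustrative):

```python
import random

def reservoir_sample(stream, k, rng):
    # Algorithm R: keep the first k items; thereafter, the i-th item
    # replaces a uniformly chosen reservoir slot with probability k/i.
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            j = rng.randrange(i)  # uniform over [0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

rng = random.Random(0)
words = (f"word{i}" for i in range(100_000))
picked = reservoir_sample(words, 10, rng)
print(picked)
```
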

Randomly generating jabberwocky words

These are simple n-grams, adapted from a previous version of this example. The common functions live in ./ngrams/ngfuncs.mlr and the main Miller script in ./ngrams/ngrams.mlr, as invoked below.

The idea is that words from the input file are consumed, then taken apart and pasted back together in ways which imitate the letter-to-letter transitions found in the word list -- giving us automatically generated words in the same vein as bromance and spork:

mlr --nidx --from ./ngrams/gsl-2000.txt put -q -f ./ngrams/ngfuncs.mlr -f ./ngrams/ngrams.mlr
burse
serious
land
seasure
clainst
tray
wherhoose
stry
jourt
strue
partist
ornear
devel
praction
roup
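The ngrams scripts themselves aren't reproduced here, but the letter-to-letter-transition idea can be sketched in a few lines of Python. This is a generic bigram Markov word generator, using assumed "^" and "$" markers for word start and end; it is not the actual Miller DSL implementation:

```python
import random

def train_bigrams(words):
    # Record, for each letter (with "^" as start marker), the letters
    # that follow it across all training words; "$" marks end-of-word.
    transitions = {}
    for word in words:
        chars = ["^"] + list(word) + ["$"]
        for a, b in zip(chars, chars[1:]):
            transitions.setdefault(a, []).append(b)
    return transitions

def generate_word(transitions, rng, max_len=12):
    # Walk the transition table from the start marker until we reach
    # end-of-word or the length cap.
    out, state = [], "^"
    while len(out) < max_len:
        state = rng.choice(transitions[state])
        if state == "$":
            break
        out.append(state)
    return "".join(out)

rng = random.Random(0)
table = train_bigrams(["serious", "land", "tray", "burse", "court"])
made = [generate_word(table, rng) for _ in range(5)]
print(made)
```

Sampling transitions proportionally to their observed frequency is what makes the output imitate the training words' letter statistics.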