APCS Notes

Random Numbers


Copyright Chris Nevison, Computer Science Department, Colgate University, Hamilton, NY, 13346, January, 2001.  This material may be reproduced for face-to-face teaching purposes only.   This material may not be reproduced or used for any commercial purpose. This copyright notice must appear on any copies made.


There are many applications in computer science where random numbers can be useful.  These range from the obvious game programs that need an element of chance, for example shuffling cards for a card game, to simulations where unpredictable real phenomena can be reasonably modeled using random values, for example, arrivals at a restaurant can be modeled with random arrival times, to complex mathematical calculations where random techniques can be used to generate highly accurate approximations to solutions.  Of course computers are predictable machines that produce exactly the output that a program calls for (most of the time!).  How can they be used to generate random results?

Pseudo-Random Numbers

In reality, computers do not produce truly random numbers.  Instead, when some degree of randomness is called for in a program, an algorithm is used that produces a sequence of numbers that appear to be random.  These are called pseudo-random numbers.  There are many algorithms that have been developed for generating pseudo-random numbers and there are some very sophisticated statistical tests to determine whether a sequence truly appears random.  This is an active area of research, because there is need for random number generators that produce longer andlonger sequences with good sttistical properties.

Many programming languages provide a built-in random number generator or have library functions for doing so.  We will use a random number generator provided with the APCS Marine Biology Case Study in the class RandGen.  A class is used to encapsulate a collection of functions that provide different types of random numbers and to hide the underlying details of the implementation.  We will only use this class, we will not investigate how it is put together.  That is a topic for a class on simulation or random algorithms.

class RandGen

The class RandGen provides two type of random number functions, integer and real, and there are two of each:


    int RandInt(int max);                  // returns int in [0..max)
    int RandInt(int low, int max);         // returns int in [low..max]
    double RandReal();                     // returns double in [0..1)
    double RandReal(double low, double max); // returns double in [low..max)


In addition, RandGen has two constructors.  The default constructor (no parameters), starts the random number stream at a different point every time a program is run.  The constructor that takes an integer parameter uses that parameter to "seed" the stream of random numbers, so the same stream is produced every time the program is run using that same seed parameter.  If a different seed is used, then a different stream will be produced.

The class RandGen only produces one stream of random numbers for the program, no matter how many different instances of RandGen are created.  Consequently, the determination of whether the stream is controlled by the seed of a constructor and produces the same sequence of random numbers or is started by a default constructor depends on which one is called first in the program.  Often this is not an issue because we only have one instance of RandGen in the program.

When testing a program using a random number generator, it is a good idea to force the program to do the same thing each time by giving the random sequence the same seed each run.  When we think the program is working properly, we can use different seeds to check the randomness of its behavior.  Finally, if we want unpredictable behavior, rather than a controlled experiment, we can use RandGen with the default constructor.  This actually uses the time function of the computer to get the seed for the random sequence, so it will be different every time.

Random Integers

The simplest function provided by class RandGen is one that generates random integers from 0 up to, but not including the parameter given:
    int RandInt(int max);                  // returns int in [0..max)
If we call this number with max = 2, then we get the simplest case, a random number that is either 0 or 1.  By random, we mean that every number that can be produced is equally likely, so for  RandInt(2), we should get 0 half the time and 1 half the time.  Of course, because it is random, we do not expect this to be exact, but for many trials, the sequence of 0's and 1's should behave like like flips of a coin.  In fact a simple program can produce a sequence of coin flips.

#include <iostream>  // if you do not use namespaces, add a .h here
using namespace std; // and delete this line
#include "randgen.h"

int main()
{
  RandGen rg(2345);
  int totHeads = 0;
  int k;

  for(k = 0; k < 10; k++){
    if(rg.RandInt(2) == 0){
      cout << heads << endl;
      totHeads++;
    }
    else{
      cout << tails << endl;
    }
  }
  cout << endl << "number of heads is " << totHeads << endl;

  return 0;
}

Run this program two or three times.  Remember that you will need to add the file randgen.cpp to the project as well as this program file.  You should get the same sequence of heads and tails each time, because we have given RandGen the same seed each time.  Type in a new seed and recompile the program and run ir two or three times.  You should get a different sequence of heads and tails from the first set of runs, but the same for each of these runs.  Now replace the line declaring the RandGen object with the line


  RandGen rg;


and recompile and run the program.  This time you should get different behavior every time that you run the program.  Now we have a tool where we can model things involving randomness, using this tool.

Suppose we want to model throwing dice for some sort of game.  For example, in some games you throw two dice and something special happens (getting out of jail in thegame of monopoly) is you get doubles (both dice show the same number).  How can we model the dice in a program?

The throws of a six-sided die produces a result of 1, 2, 3, 4, 5, or 6 (the number of spots showing on top), and each of these results is equally likely.  We can model this in two ways sing our RandGen class.  First, we could use call the function RandInt(6).  It would produce the numbers 0, 1, 2, 3, 4, 5,  randomly, so then we just add 1 to the result to get the random value for the throw of a die.  Alternatively, we could use the function RandInt(1, 6).  This produces one of the numbers 1, 2, 3, 4, 5, 6 randomly.  We choose the latter to write a program that rolls two dice at a time and counts the number of rolls needed before we get doubles.

int main()
{
  RandGen rg;
  int dice1, dice2;

  int cnt = 0;
  do{
    dice1 = rg.RandInt(1, 6);
    dice2 = rg.RandInt(1, 6);
    cout << dice1 << " " << dice2 << endl;
    cnt++;
  }while(dice1 != dice2);

  cout << endl << "rolls to get doubles " << cnt << endl;

  return 0;
}

Exercises

1.  Write a function that flips a coin num times, where num is a parameter, and returns the number of heads.  Use this function in a program that does 1000 experiments, each flipping a coin a fixed number of times (say ten), and find the average number of heads for that number of flips.
2.  Write a function that returns the number of rolls of a pair of dice taken before the first doubles comes up (both dice having the same number).  Write a program that uses this function to run the dice rolling experiment 1000 times and calculates the average number of rolls before doubles comes up.
 

Arbitrary Probabilities

Sometimes we want to model random phenomena where the probablities do not correspond to a equal choice among integers.  We can do this by making use of the RandGen functions that produce random real numbers.  The first of these takes no parameters and produces a number between 0.0 and 1.0 (including 0.0 as a possibility, but not 1.0).

In theory, we want a random number that can take any real value between 0.0 and 1.0.  However, since real numbers are modeled with finite precision on computers, this is impossible.  We can only get real values that can be represented by the variable type selected; in the case of the RandGen RandReal function, the type returned is double.  In addition, the mechanism by which the random numbers are generated may limit the precision of the values that can be produced further.  This is an implementation detail with which we do not want to concern ourselves.  Suffice it to say that the RandReal function produces random real values with type double that are evenly spread across the range 0.0 to 1.0.

With this in mind, we can produce decisions based on any given probability.  To say that an event E happens with probability prob (0.0 <= prob <= 1.0), means that if we do enough trials, E will happen a proportion of the time equal to prob, but randomly.  We can generate this behavior by observing the the random number produced by a call to RandReal() will be less than prob a proportion of the time equal to prob, and greater than or equal to prob the rest of the time --
this is what is meant when we say any real value between 0.0 and 1.0 is equally likely.

So to model something that happens with probability prob, we use the conditional statement

if( rg.RandReal() < prob){
  <something that happens with probability prob>
}
For example, suppose we want to experiment with a "coin" that is weighted, so that heads comes up with a probability prob, where prob is some number greater than 1/2.  We can use a statement like the one given above to determine whether heads comes up, and produce the following program (where we prompt the user for the probability of heads).

int main()
{
  RandGen rg;
  int totHeads = 0;
  int k;
  double probHeads;

  do{
    cout << "Enter the probability of heads (between 0.0 and 1.0): ";
    cin  >> probHeads;
    if((probHeads < 0.0) || (probHeads > 1.0))
      cout << "Invalid probability, please reenter." << endl;
  }while((probHeads < 0.0) || (probHeads > 1.0));

  for(k = 0; k < 10; k++){
    if(rg.RandReal() < probHeads){
      cout << "heads" << endl;
      totHeads++;
    }
    else{
      cout << "tails" << endl;
    }
  }
  cout << endl << "number of heads is " << totHeads << endl;

  return 0;
}

Try this program a few times.  If you pick the probability of, say, 0.55 each time, then the behavior of the program looks very much like the first coin-flipping program.  How can we be sure that the probabilities are really coming out "right"?  What does "right" mean if the behavior is supposed to be random?  We can do a simple statistical "test" to check, as described in exercise   More sophisticated tests of the random numbers requires a knowledge of statistics (try the AP Statistics course).

Exercise

3.  Write a function that flips a coin num times, where num is a parameter, and returns the number of heads.  Do this for a "weighted coin" where the probability of  heads is a second parameter to the function.  Use this function in a program that does 1000 experiments, each flipping a coin a fixed number of times (say a hundred) with a fixed probability of heads, and find the average number of heads for that number of flips.  Do the results match what you would expect for the probability used?

Sometimes we need to make a random selection among several possible events, each with different probabilities.  This was the case with the dice, where there were six possible values for one die, but they were all equally likely to occur, so we could use the RandInt function.  When the probabilities are different, we cannot do that.  For example, suppose that one of four things will happen, A with probability 0.10, B with probability 0.15, C with probability 0.25, or D with probability 0.50 (note, the probabilities add to 1.0 as they should when we have described all possibilities).  We might try to randomly select from the four possibilities as follows:

// incorrect code for the selection

if(rg.RandReal() < 0.10){
  <select A>
}
else if(rg.RandReal() < 0.15){
  <select B>
}
else if(rg.RandReal() < 0.25){
  <select C>
}
else{
  <select D>
}

Exercise

4. Write a program that uses this form, and repeats the selection process 1000 times, keeping count of the number of times each choice is made.  Do the frequncies of A, B, C, and D match the probabilities?  The first one should, the second and third should be low and the last should be high.

The problem with the code above is that we generate three random values for one selection.  We really should be generating only one value and checking which part of the interval it falls in, partitionling the interval into four pieces, with the lengths given by the respecitve probabilities.

Here is a correct version of the program.  Try running it and compare the results to the previous version.

int main()
{
  RandGen rg;
  int Atot = 0;
  int Btot = 0;
  int Ctot = 0;
  int Dtot = 0;
  int k;
  double probA = 0.10;
  double probB = 0.15;
  double probC = 0.25;
  double probD = 0.50;
  double randval;
 

  for(k = 0; k < 1000; k++){
    randval = rg.RandReal();
    if(randval < probA){
      Atot++;
    }
    else if(randval < probA + probB){
      Btot++;
    }
    else if(randval < probA + probB + probC){
      Ctot++;
    }
    else{
      Dtot++;
    }
  }
  cout << "A total = " << Atot << endl;
  cout << "B total = " << Btot << endl;
  cout << "C total = " << Ctot << endl;
  cout << "D total = " << Dtot << endl;

  return 0;
}

This gives us a tool for selecting one among several possibilities with different probabilities.

Exercises

5.  Suppose that you have dice that are fixed, so that the probabilities of 1, and 2 are each 0.12 and the probability of 3 and 4 are each 0.18, and the probability of 5 and 6 are each 0.20.  (a)  Write a function that returns an integer with value 1, 2, 3, 4, 5, or 6 according to these probabilities.  (b)  Using this function to model the roll of one weighted die, write a function that returns the number of rolls of a pair of dice taken before the first doubles comes up.  (c) Write a program that uses this function to run the dice rolling experiment 1000 times and calculates the average number of rolls before doubles comes up.  If you were playing a game where doubles was a result that you wanted, would you rather use the fair dice or the weighted dice?