Given a function which generates a random integer in the range 1 to 7, write a function which generates a random integer in the range 1 to 10 uniformly.

This appear to be one of those probabilistic analysis questions. You should be familiar with the concept of expected value, as it could be extremely helpful in probabilistic analysis.

**Hint:**

Assume you could generate a random integer in the range **1** to **49**. How would you generate a random integer in the range of **1** to** 10**? What would you do if the generated number is in the desired range? What if it’s not?

**Solution:**

This solution is based upon Rejection Sampling. The main idea is when you generate a number in the desired range, output that number immediately. If the number is out of the desired range, reject it and re-sample again. As each number in the desired range has the same probability of being chosen, a uniform distribution is produced.

Obviously, we have to run *rand7()* function at least twice, as there are not enough numbers in the range of **1** to **10**. By running *rand7()* twice, we can get integers from **1** to **49** uniformly. Why?

1 2 3 4 5 6 711 2 3 4 5 6 728 9 10 1 2 3 435 6 7 8 9 10 142 3 4 5 6 7 859 10 1 2 3 4 566 7 8 9 10 * *7* * * * * * *

A table is used to illustrate the concept of rejection sampling. Calling *rand7()* twice will get us row and column index that corresponds to a unique position in the table above. Imagine that you are choosing a number randomly from the table above. If you hit a number, you return that number immediately. If you hit a *, you repeat the process again until you hit a number.

Since **49** is not a multiple of tens, we have to use rejection sampling. Our desired range is integers from **1** to **40**, which we can return the answer immediately. If not (the integer falls between** 41** to **49**), we reject it and repeat the whole process again.

1 2 3 4 5 6 7 8 9 |
int rand10() { int row, col, idx; do { row = rand7(); col = rand7(); idx = col + (row-1)*7; } while (idx > 40); return 1 + (idx-1)%10; } |

Now let’s get our hands dirty to calculate the expected value for the number of calls to *rand7()* function.

E(# calls to rand7)= 2 * (40/49) + 4 * (9/49) * (40/49) + 6 * (9/49)^{2}* (40/49) + ..._{∞}=∑2k * (9/49)^{k-1}* (40/49) k=1 = (80/49) / (1 - 9/49)^{2}=2.45

**Optimization:**

There are a total of **2.45** calls to *rand7()* on average using the above method. Can we do better? Glad that you asked. In fact, we are able to improved the above method by **10%** faster.

It seems wasteful to throw away the integers in the range **41** to **49**. In fact, we could reuse them in the hope of minimizing the number of calls to *rand7()*. In the event that we could not generate a number in the desired range (**1** to **40**), it is equally likely that each number of **41** to **49** would be chosen. In other words, we are able to obtain integers in the range of **1 **to **9** uniformly. Now, run* rand7()* again and we obtain integers in the range of **1** to **63** uniformly. Apply rejection sampling where the desired range is **1** to **60**. If the generated number is in the desired range (**1** to** 60**), we return the number. If it is not (**61** to **63**), we at least obtain integers of **1** to **3** uniformly. Run *rand7()* again and we obtain integers in the range of **1** to **21** uniformly. The desired range is **1** to **20**, and in the unlikely event we get a **21**, we reject it and repeat the entire process again.

Below is the code for this optimized method. Note that there are code sections that are repeated, but I leave it as it is for code clarity. (Take it as a challenge to refactor the code below!)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
int rand10Imp() { int a, b, idx; while (true) { a = rand7(); b = rand7(); idx = b + (a-1)*7; if (idx <= 40) return 1 + (idx-1)%10; a = idx-40; b = rand7(); // get uniform dist from 1 - 63 idx = b + (a-1)*7; if (idx <= 60) return 1 + (idx-1)%10; a = idx-60; b = rand7(); // get uniform dist from 1-21 idx = b + (a-1)*7; if (idx <= 20) return 1 + (idx-1)%10; } } |

The expected value for the number of calls to *rand7()* function using this optimization is:

E(# calls to rand7)= 2 * (40/49) + 3 * (9/49) * (60/63) + 4 * (9/49) * (3/63) * (20/21) + (9/49) * (3/63) * (1/21) * [ 6 * (40/49) + 7 * (9/49) * (60/63) + 8 * (9/49) * (3/63) * (20/21) ] + ((9/49) * (3/63) * (1/21))^{2}* [ 10 * (40/49) + 11 * (9/49) * (60/63) + 12 * (9/49) * (3/63) * (20/21) ] + ... =2.2123

I omitted some steps above because it involves some tedious math. Anyway, using this optimization, we are able to achieve **2.2123** calls to* rand7()* on average, which is **10%** less calls than the unoptimized method.

Hi, I subscribed your blog through rss. Usually I read your posts using google reader. But I can't help coming here to tell you that you have done a great job of composing these articles. I especially like the way you solve the problems, that is first giving a solution which is more straightforward and easier to come into the mind, and then keep refining and improving the solution.

I encourage you to write more details about the process of tackling a tech question, e.g. what prompt you to give such a solution, what will you do when encounter a not-so-easy problem that you never see before, what is your strategy or heuristic when meeting a question that you cannot recognize its type (such as DP, BFS, etc). Because I think these are more important than the answers π

Keep up the good works!

+4Thank you so much for those encouraging words, because it meant a great deal of motivation to me.

I especially appreciate the suggestion you gave, that's actually the most thoughtful comment, thank you π

+4Hi, for the optimization part, I kind of know what you are saying, but not quite sure if I understand and it right. Could you please post some codes like the previous one? That will be really helpful. Thank you very much! Like your posting as always!

0Read above for newly added code for the optimized part. Appreciate your thanks!

0you can calculate the expected number of calls much more easily: if X is the random variable giving the number of calls,

E[X] = 2 * 40/49 + 9/49 * (2 + E[X])

0Good stuff, and the optimization is definitely very subtle. Though for the numbers in [41, 49] it would probably be "good enough" (simpler code) to reject 48,49 and use [41,47] as if it was a call to rand7().

0Sorry if this is a lame question but it is not clear why numbers from 41-49 were rejected. We can always get the sample from those numbers as well by taking modulos with 10 which will give us a sample in 1-10 range?

0It cannot ensure the uniform property

0Probability of getting 10 is less as compared to any number ranging from 1 to 9. All outcomes must have equal probability to maintain uniform property

0He’s adding the two numbers together. He built a table (2-d array) and use the two random variables as indices of row x and column y. And the probability that (x,y) falls on element 1-40 is uniform.

0Am I missing something here? Is there a requirement that you’re only allowed to use integers that you left out?

This can be done a lot more easily simply by taking the random number, dividing by 7, (after first casting to a double), then multiplying the resulting decimal by 10.

0Nope that wouldn’t work. The random gen provided in an INTEGER generator from 1 to 7 (aka 1 2 3 4 5 6 7). Dividing it by 7 and multiplied by 10 would give you a generator that generate 7 certain double numbers from 1 to 10.

0Pingback: Rejection Sampling | ε£βδΊΊδΊΊβμ Blog

@ihas1337 can u explain how you achieved this 2 * (40/49) +

4 * (9/49) * (40/49) +

6 * (9/49)2 * (40/49) +

…

β

= β 2k * (9/49)k-1 * (40/49)

k=1

= (80/49) / (1 – 9/49)2

= 2.45

please reply asap

0@ihas1337 can u explain how you achieved this

2 * (40/49) +

4 * (9/49) * (40/49) +

6 * (9/49)2 * (40/49) +

…

β

= β 2k * (9/49)k-1 * (40/49)

k=1

= (80/49) / (1 – 9/49)2

= 2.45

please reply asap

0I think the following works as well, might call rand7 a few more times, but much simpler.

int v;

do {

v = rand7();

} while (v > 4);

int result = rand7() + v – 1;

0I don’t think your code works. For number 10, there are two ways to get it, 7+3 and 6+4, but for number 9, there are three ways to get it, 7+2, 6+3, and 5+4. In a word, their probabilities are not equals.

0Excellent post (as usual). I have a question: Why

while

is the same and more efficient?!

01 + (idx – 1) % 10 vs 1 + idx % 10

Suppose idx = 40.

1 + (idx – 1) % 10 = 10 (Correct)

(1 + idx) % 10 = 1 (wrong)

1 + (idx % 10) = 1 (wrong)

0why 1+(idx%10)=1 is wrong here? I am also confused here. idx is between 1 and 40, then idx%10 is between 0 and 9. Therefore, 1+(idx%10) is between 1 and 10.

+1Excellent Post. Very informative.

0how calling rand7 twice produce uniform distribution from 1 to 49 and calling it thrice produces upto 63 and calling it again produces upto 21. Can’t get that ? Pls explain

0Now got it. Thanks anyway. Great explanation

0Or more simply generate two random numbers x and y, respectively one in the ranges 1-2 and 1-5. the return value would be 5*(x-1)+y;

0What about this

int i=0;

while(i<10)

{

r = r + rand7();

i++;

}

// generating numbers between 1-70 with probability 1/70

return r/7 + 1; // number between 1-10 are retunred with probablity 1/10

+1Why cant we just do this ?

imagine an array arr[7][7] ie prepopulated with below values

int val=0;

for(int i=0;i++,i<7)

{

for (int j=0;j<7;j++)

{

arr[i][j] = (val%11)+1;

val++;

}

}

you will get below array

1 2 3 4 5 6 7

1 1 2 3 4 5 6 7

2 8 9 10 1 2 3 4

3 5 6 7 8 9 10 1

4 2 3 4 5 6 7 8

5 9 10 1 2 3 4 5

6 6 7 8 9 10 1 2

7 3 4 5 6 7 8 9

int rand10()

{

int row, col, idx;

row = rand7();

col = rand7();

return arr[row*col];

}

002.2123 is not so optimal. Using compression/entropy principles, it should only cost log(10)/log(7)=1.183 steps on average

0What about this solution:

0