The classic Monte Carlo "Hello World" is to calculate $\pi$ by generating random numbers and accepting or rejecting them. The algorithm below is taken from one of my graduate textbooks on molecular simulations - I apologize that I can't remember where, and that book is long gone. Essentially one is throwing darts into the unit square and then measuring if it is within the quarter of the unit circle. $\pi$ equals the number of darts in the circle divided by all of the attempts. Since it is a quarter of the unit circle, we multiple by 4.

I thought it would be fun to try to use some of the available tooling to get a speed performance in Python as well as using some Fortan inline since Fortran was the major language I used in graduate school. Note that fortranmagic is located in the fortran-magic package.

%load_ext fortranmagic
import numpy as np
from numba import jit, njit, prange

n = 10000000


## Plain Python and Numpy

def calc_pi_p(n):
accept = 0.0
for i in range(n):
r = np.random.random(2)
if (np.dot(r,r) <= 1.0):
accept += 1.0

return 4.0 * accept / float(n)

%%time
calc_pi_p(n)

CPU times: user 47.9 s, sys: 25.6 ms, total: 47.9 s
Wall time: 48.1 s

3.141434

## Numba

@jit
def calc_pi_numba(n):
accept = 0.0
for i in range(n):
r = np.random.random(2)
if (np.dot(r,r) <= 1.0):
accept += 1.0

return 4.0 * accept / float(n)

%%time
calc_pi_numba(n)

CPU times: user 2.29 s, sys: 13.2 ms, total: 2.31 s
Wall time: 2.31 s

3.1418784

## Numba in parallel

@njit(parallel=True)
def calc_pi_numba_parallel(n):
accept = 0.0
for i in prange(n):
r = np.random.random(2)
if (np.dot(r,r) <= 1.0):
accept += 1.0

return 4.0 * accept / float(n)

%%time
calc_pi_numba_parallel(n)

CPU times: user 1.24 s, sys: 9.69 ms, total: 1.25 s
Wall time: 1.09 s

3.1403444

## Fortran

%%fortran
function calc_pi_f(n) result(pi)
implicit none
integer(8) :: accept
real(8) :: r(2), pi
integer(8), intent(in) :: n
integer(8) :: i

accept = 0
do i = 1, n
call random_number(r)
if (dot_product(r,r) <= 1.0) then
accept = accept + 1
end if
end do

pi = 4.0d0 * dble(accept)/dble(n)
end function calc_pi_f

%%time
calc_pi_f(n)

CPU times: user 406 ms, sys: 0 ns, total: 406 ms
Wall time: 405 ms

3.1423488

Looks like Fortran is the winner in this case. We didn't even parallelize the operation, but Numba does give a signifcant speedup compared to plain Python/Numpy. There might be a way to vectorize some of the operations in Numpy though to perform better, but I failed to find out how to do that in this case.