Advanced SQL: Approximating π Using the Monte Carlo Method in Postgres

Databases are commonly used as dumb storage bins for CRUD application data. However, this doesn’t do them justice: database systems such as PostgreSQL are much more capable and versatile than that, supporting a wide range of operations across many different data types.

The standard way of interfacing with Postgres is SQL – you know, that thing you may have briefly learned in webdev class where you SELECT stuff FROM a set of tables WHERE some condition is met. But that’s a far cry from what you can achieve when taking advantage of the full feature set of this declarative and – surprise! – Turing complete programming language.

In this post, I’ll describe how to approximate with a fairly compact SQL query. 1

This series of posts is loosely based on what I learned in the Advanced SQL lecture held by Torsten Grust in the summer of 2017 at the University of Tübingen. Take a look at the lecture slides for in-depth explanations and a wide array of examples.

Theory

Some math first:2 You might remember that the area of a circle with radius is

We can divide by to get an equation for :

That means that as long as we know the radius and find a way of estimating , we can estimate !

Now let’s imagine our circle centered in a tight square box:

Note that the square’s side length is , yielding the square area

.

If we generate random points in this square, we can count how many points fall inside the circle. That’s the Monte Carlo method – generating random samples and estimating based on the observed distribution: 3

This gives us an approximation of the fraction of the square that is occupied by the circle, which in turn lets us approximate :

That means that is approximately four times the number of points inside the circle divided by the total number of points within the bounds of our square. This is starting to sound easy to implement!

To make things a bit more straightforward later on, let’s agree to set and place the circle’s center at . This conveniently turns our square into a unit square:

Counting the random points in this example, we notice that out of points fall inside the circle. Let’s plug these values into the formula we derived above and see what we get:

Not too far off!

Implementation

Before showing you the query, here are some “advanced” SQL features that will come in handy:

Putting all of this together and combining it with our approximation formula for , we end up with something along the lines of the following query:

\set n 1000000

SELECT 4 * count(*) :: float / :n AS pi
FROM   generate_series(1, :n) AS _
WHERE  circle(point(0.5, 0.5), 0.5) @> point(random(), random());

To run it, simply spin up psql, paste the query, press return and bask in the glory of what we’ve achieved today:

$ psql
psql=# \set n 1000000
psql=# SELECT 4 * count(*) :: float / :n AS pi
psql-# FROM   generate_series(1, :n) AS _
psql-# WHERE  circle(point(0.5, 0.5), 0.5) @> point(random(), random());
+----------+
|    pi    |
+----------+
| 3.140892 |
+----------+
(1 row)

Time: 682.926 ms

Addendum: Accidental LaTeX Implementation

If you’re only here for the SQL query, go away.

The visualizations in the “theory” section above have been drawn in LaTeX/TikZ6, including the random points used to approximate in the SQL query.

At some point I realized that it wouldn’t be too hard to add a counter to the \foreach loop that’s responsible for filling the points with different shades of gray depending on whether they fall inside the circle. 7 This enables keeping track of the number of points inside the circle. Because the iteration count is known, computing the corresponding approximation for works the same way as in the SQL query’s SELECT clause. Displaying the result as part of the drawing then was trivial, as you can see once you scroll down a bit.

First, here’s the code:

\RequirePackage{luatex85}
\documentclass{standalone}
\usepackage{fontenc,unicode-math}
\setmainfont[Ligatures=TeX]{TeX Gyre Pagella}
\setmathfont[Ligatures=TeX]{TeX Gyre Pagella Math}
\usepackage{tikz}
\usetikzlibrary{calc}
\begin{document}

\scalebox{2}{
\begin{tikzpicture}[scale=3.5]
    % axes
    \draw [<->,thick] (0,1.2) node (yaxis) [above] {$y$}
                   |- (1.2,0) node (xaxis) [right] {$x$};
    % draw circle
    \coordinate (m) at (0.5,0.5);
    \draw (m) circle (0.5cm) node (mlabel) [right] {$m$};
    \fill[black] (m) circle (0.4pt);
    \draw[dashed] (yaxis |- m) node[left] {$0.5$}
               -| (xaxis -| m) node[below] {$0.5$};
    \draw[dashed] (m) -- node[right] {$r$} (0.23,0.08);
    % draw rectangle
    \coordinate (one) at (1,1);
    \draw (0,0) rectangle (one);
    \draw[dashed] (yaxis |- one) node[left] {$1$}
               -| (xaxis -| one) node[below] {$1$};
    % draw random points
    \pgfmathsetmacro{\i}{100}
    \newcounter{inpoints}
    \setcounter{inpoints}{0}
    \pgfmathsetseed{3455632}
    \def\incolor{gray!50!black}
    \def\outcolor{gray!50!white}
    \foreach \p in {1,...,\i} {
        \pgfmathsetmacro{\x}{0.5*rand+0.5}
        \pgfmathsetmacro{\y}{0.5*rand+0.5}
        \pgfmathparse{(\x-0.5)^2+(\y-0.5)^2}
        \pgfmathsetmacro{\dist}{\pgfmathresult}
        \ifdim\dist pt < 0.25pt
            \addtocounter{inpoints}{1}
            \fill[fill=\incolor] (\x,\y) circle (0.25pt);
        \else
            \fill[fill=\outcolor] (\x,\y) circle (0.25pt);
        \fi
    }
    \pgfmathparse{int(\i-\theinpoints)}
    \pgfmathsetmacro{\theoutpoints}{\pgfmathresult}
    \pgfmathparse{(4*\theinpoints/\i)}
    \node[above of=m,yshift=1.15cm,xshift=0.18cm] {$\pi \approx \frac{4 \cdot \textcolor{\incolor}{\theinpoints}}{\textcolor{\outcolor}{\theoutpoints} + \textcolor{\incolor}{\theinpoints}} \approx \pgfmathresult$};
\end{tikzpicture}
}

\end{document}

Simply adjust the random seed 3455632 and optionally the iteration count \i in the LaTeX document above, compile8 and you’ll observe a different distribution of the points and, most likely, a slightly different approximation of below the plot. You can also change the colors of the points to your liking:

Turning this visualization into an animation by making the points appear one by one (while continually adjusting the approximation) would be interesting. Consider that an exercise for the reader. 😉

  1. Note that we’ll use some PostgreSQL-specific functions, however due to SQL’s Turing completeness, everything we do here is theoretically possible in any standards-compliant RDBMS.

  2. Bear with me here.

  3. If points are too boring, you could do the same with darts. As long as you’re not too good at darts.

  4. As usual, the range is actually , meaning that is not included. As usual, this doesn’t really matter for the problem at hand.

  5. Other RDBMSes have similar features, see here or here.

  6. The excellent pdf2svg utility (which can be installed via Homebrew) was used to convert to a web-accessible format.

  7. Which can be tested by checking whether (\x-0.5)^2+(\y-0.5)^2 (where 0.5 is both the x and y coordinate of the circle’s center) is smaller than 0.25.

  8. If you remove lines 1 and 3-5, you can use any LaTeX engine, otherwise you’ll be constrained to LuaLaTeX.