While attempting to learn Javascript and D3.js a couple months ago, I wrote a little library for displaying area proportional venn diagrams.

One thing this library didn't do though is consider the intersection areas of 3 or more circles when placing each set in the venn diagram. Its a trickier problem than I first thought, mainly because of all the special cases that can arise when the number of circles gets large. While the 2 circle case is a simple calculus problem, I failed to extend this solution to calculate the intersection area of an arbitrary number of circles.

The research papers I read on this both avoided calculating the circle intersection by using approximation techniques. One paper approximated the circles using polygons and used polygon intersection techniques to get the area, and the other approximated by plotting each circle and using binary indexing to compute the area. I tried out the latter approach, but found it to be too slow for realtime use.

Since then, I've had some ideas on different approaches to this problem that I wanted to try out. To keep up with learning D3, I also thought I'd try visualizing each approach here.

One of these ideas was to run a Monte Carlo simulation to try to figure this out.

Its an exceedingly simple approach: just randomly sample a bunch of points, and compute the ratio of points that are inside all the circles. The area of the intersection is approximately this ratio multiplied by the size of the bounding rectangle.

I thought of using this method after seeing someone use a Monte Carlo simulation to estimate Pi - and it seemed like a pretty easy tweak to extend to handle multiple circles. I've included the 1 circle case to include that estimate here:

Its looking like you're missing javascript, or are running an old web browser that doesn't support SVG. If no diagrams show up then you might want to consider loading this up in a different browser. I've tested this with Chrome, Safari and Firefox only.

Area: | 0 |

Ratio: | 0 / 0 |

Error: | 0 % |

When I remove all the visualization overhead, my laptop can sample about 10 million points a second. Taking a sample of 10k points happens in around a millisecond, and is accurate to within an percentage point or so. Which isn't really an awesome result, unless you're the kind of person comfortable with your estimate of Pi being 3.15. Also even a millisecond evaluation time leads to noticeable lag in my use case, since I have to compute this hundreds of times. On the plus side, its a very easy method to implement.

Another idea I had was to decompose the intersection area recursively using a Quadtree style approach.

Since the intersection areas here are convex, a rectangle is fully contained inside the intersection area if all four of its corners are inside. So the idea here is just to divide the region recursively into 4 rectangles, and check if the corners of these rectangles are contained in each circle. If all the corners are inside all the circles, we don't need to recurse. Likewise we don't need to recurse if all the corners are outside all the circles (assuming an initial bounding rectangle that is tight around the intersection area):

Area: | 0 +/- 1 |

Rectangle Count: | 0 |

Tree Depth: | 0 |

Error: | 0 % |

I didn't bother benchmarking this one, because as I was looking at it I finally figured out a way to calculate the intersection area without resorting to using an approximation.

The key here is that each intersection area is just a polygon, with an extra circle arc bulging outwards from every line segment:

Area: | 0 |

Polygon Area: | 0 |

Arc Area: | 0 |

The polygon can be found by examining all the possible intersection points for all pairs of circles. The intersection points that are inside all the circles define the perimeter of the polygon. After sorting these points by their angle from the centre of the polygon, its relatively straightforward to calculate the area of the polygon.

Calculating the area of each circle arc is a little trickier. For each line segment on the inner polygon, there can be many circles that link both points - and for each circle there are two different arcs between the two points. We need to pick the arc that lies in the right direction, which can be done in a bunch of ways: I'm doing this by picking the arc that has an angle between the angle of the two points of the line segment. The circle is the one where the arc has the smallest distance to the line segment. Finally the area is computed by integrating the circle up to the width of the arc.

I've put the code up on github, and changed my venn.js library to use the exact method when calculating the loss function for 3+ sets.

While not fast enough or accurate enough for my use case, both of the approximation methods came in useful in testing out the exact solution. I generated a million random layouts and tested that the exact solution was within the error bound given by the quadtree approximation. If the answers didn't match up, I investigated manually using the monte carlo estimate to figure out if it was the quadtree method or the exact method that had the issue. This caught a bunch of bugs that I don't think I would have caught otherwise, especially with the quadtree estimate.

In retrospect the exact solution seems really obvious. I think the only reason I didn't come up with it originally is that I ended up googling for papers on laying out venn diagrams when I got stuck. Since all the papers I read used approximation techniques, I went that route instead.

Published on 19 November 2013

Enter your email address to get an email whenever I write a new post: