Computers can serve as exciting tools for discovery, with which we can
model and explore complex phenomena. For example, to test theories about
the formation of the universe, we can perform *simulations* to predict
how galaxies evolve. To do this, we could gather the estimated mass and
location of stars and then model their gravitational interactions over time.

Another avenue for discovery is to *visualize* complex information to
reveal structure and patterns. Consider this network diagram, showing
connections between people in a social network. We can use such diagrams
to examine community groups and identify people who bridge between them.

Though they may seem quite different at first, these two examples share
a common need: they require computing forces that arise from pairwise
interactions among a set of points, often referred to as an *N-body problem*.
In the case of astronomical simulation, we seek to model the
gravitational forces among stars. In the case of network visualization,
we compute the layout using a similar physical simulation:
nodes in the network act as charged particles that repel each other, while
links act as springs that pull related nodes together.

*To get a sense of how this force-directed layout works, drag the nodes or
use the slider to adjust the force strength.* Negative values indicate
repulsive forces, while positive values indicate attractive forces.

*Force Strength *-30.00

A straightforward approach to computing N-body forces is to consider all
pairs of individual points and add up the contributions of each
interaction. This naïve scheme has *quadratic complexity*: as the number
of points *n* increases, the running time grows proportionally
to *n*^{2}, quickly leading to intractably long calculations.
How might we do better?

To accelerate computation and make large-scale simulations possible, the
astronomers Josh Barnes and Piet Hut devised a clever scheme.
The key idea is to approximate long-range
forces by replacing a group of distant points with their center
of mass. In exchange for a small amount of error,
this scheme significantly speeds up calculation, with
complexity *n log n* rather than *n*^{2}.

Central to this approximation is a *spatial index*: a “map” of space
that helps us model groups of points as a single center of mass. In
two dimensions, we can use a quadtree
data structure, which recursively
subdivides square regions of space into four equal-sized quadrants. (In three dimensions, one can use an analogous octree that divides a cubic volume into eight sub-cubes.)

The Barnes-Hut approximation involves three steps:

- Construct the spatial index (e.g., quadtree)
- Calculate centers of mass
- Estimate forces

Let’s explore each step in turn. We will assume we are computing *repulsive*
forces for the purposes of network layout. This setup is akin to modeling
anti-gravity or electric forces with similarly-charged particles. While we
will use the term “center of mass”, this could readily
be replaced with “center of charge”.

We begin with a set of two-dimensional points as input. When we insert the first point into the quadtree, it is added to the top-level root cell of the tree.

Next we insert another point, which requires expanding the tree by subdiving the space. Upon each subsequent insertion, more fine-grained cells may be added until all points reside in their own cell.

*Advance the slider to add each point and produce the full quadtree*.

*Inserted Points *0

After quadtree construction, we calculate centers of mass for each cell of the tree. The center of mass of a quadtree cell is simply the weighted average of the centers of its four child cells.

We visit the leaf node cells first and then visit subsequent parent cells, merging data as we pass upwards through the tree. Once the traversal completes, each cell has been updated with the position and strength of its center of mass.

Now we are ready to estimate forces!

To measure forces at a given point, let's add a "probe" to our diagram (). The purple line extending from the probe indicates the direction and magnitude of the total force at that location. (To promote visibility, the purple line is three times longer than the actual pixel distance the probe would be moved in a single timestep of the force simulation.) The dotted lines extending to the probe represent the force components exerted by individual points.

*Move the probe (click or
drag in the visualization) to explore the force field*.

Ignoring the quadtree, we can naïvely calculate forces by summing the
contributions of *all* individual points. Of course, we would instead like
to use the quadtree to accelerate calculation and approximate long-range
forces. Rather than compute interactions among individual
points, we can compute interactions
with centers of mass, using smaller quadtree cells for nearer points and
larger cells for more distant points.

At this point we’ve skipped a critical detail: what constitutes
“long-range” versus “short-range” forces? We consider both
the *distance* to the center of a quadtree cell and that cell’s *width*.
If the ratio *width / distance* falls below a chosen threshold
(a parameter named *theta*), we treat the quadtree cell as a
source of long-range forces and use its center of mass.
Otherwise, we will recursively visit the child cells in the quadtree.

When theta = 1, a quadtree cell’s center of mass will be used — and its internal points ignored — if the distance from the sample point to the cell’s center is greater than or equal to the cell’s width.

*Adjust the theta parameter to view its effect on force estimation*. How does
the number of considered points change based on the probe location and theta?
How does the direction and magnitude of the total force vary with theta?

*Theta *0.00

We can now perform force estimation for each individual point, using the Barnes-Hut approximation to limit the total number of comparisons!

To assess the performance of the Barnes-Hut approximation, let’s look at
both the running time and accuracy of force estimation. We will compare
naïve (*n*^{2}) calculation to different settings of the theta
parameter.

We will take measurements using different point sets, ranging from 500 to 10,000 points. For each point count, we average the results from 50 separate runs of force estimation, each time using a different set of points placed at uniformly random coordinates within a 900 x 500 pixel rectangle.

The running time results confirm that the Barnes-Hut approximation can significantly speed-up computation. As expected, the naïve approach exhibits a quadratic relationship, whereas increasing the theta parameter leads to faster calculations. A low setting of theta=0.5 does not fare better than the naïve approach until processing about 6,000 points. Until that point, the overhead of quadtree construction and center of mass calculation outstrips any gains in force estimation. For theta=1 and theta=1.5, however, we see a significant improvement in running time, with similar performance for each.

To evaluate approximation error, we measure the average vector distance between the results of the naïve scheme and Barnes-Hut. In the context of a force-directed graph layout, this error represents the difference (in pixels) between node positions after applying the naïve and approximate methods.

Looking at the error results, we first see that the average
error is relatively small: only ~5% of a single pixel in
difference! However, we should take care in interpreting these
results, as we use the *average* error per point and
the *maximum* error may be substantially higher. While theta=1 and theta=1.5 exhibit similar *running times*,
here we see notably higher *error rates* for theta=1.5
versus theta=1 and theta=0.5.

These results suggest that a good default value for theta
— with low running time *and* low approximation error
— is around 1.0. Indeed, in practice it is common to see default
settings slightly below 1. In
visualization applications, where errors on the order of a few
pixels are not a problem, even higher theta values may be used
without issue.

The Barnes-Hut approximation has had a major impact on both physical simulation and network visualization, enabling n-body calculations to scale to much larger data sets than naïve force calculation permits.

Returning to our initial network diagram, we can use Barnes-Hut to efficiently compute repulsive forces at each timestep. For each animation frame, we perform the approximation anew, creating a new quadtree, accumulating centers of mass, and (approximately) estimating forces.

Want to learn more or apply Barnes-Hut in your own work?

- The popular D3.js library by Mike Bostock includes Barnes-Hut for performing force-directed layout as part of the d3-force module. The examples in this article are based on the D3 implementation.
- For more details, including pseudo-code and performance analysis, Prof. James Demmel of UC Berkeley has online lecture notes on the Barnes-Hut approximation. These notes provided my first introduction to the technique!
- While the error of the Barnes-Hut approximation is acceptable in a variety of applications, sometimes greater precision is needed. In such cases, more complicated schemes such as the Fast Multipole Method can be used in lieu of Barnes-Hut.

This article was created using Idyll, with visualizations powered by D3 and Vega. The source code is available on GitHub.