Tuesday, November 28, 2017

Multi-colored Gauss-Seidel for projective dynamics

Graph coloring is a tough problem but useful in a wide range of applications. I'm interested in multi-color Gauss-Seidel (MCGS), in which solving for the unknowns in Ax=b can be accelerated with parallelism. The general idea is that a graph (the matrix A) is partitioned into independent nodes (equations/rows) as a part of a set (color). We loop over the colors and perform an x-update on each node in the color in parallel.

Recently, Fratarcangeli et al. used this approach to accelerate the kind of linear solves that show up in graphics (e.g. position-based dynamics and projective dynamics) with a solver called Vivace. The performance is reasonable when compared to a pre-factored linear solve, given the fact that it conquers the biggest limitation in projective dynamics: reliance on a constant constraint manifold.

In projective dynamics, energy terms used to simulate elasticity were assumed constant. So when we want stuff like tearing or remeshing, the matrix that represents these energy terms had to be refactored (a costly operation). The bigger problem with this was dealing with collisions, which are represented as repulsion-springs. There are some hacks to avoid this overhead, but they have their own set of limitations that I won't get into here. Using MCGS, adding and removing repulsion springs for collisions only requires a recoloring of the matrix. So, if we can recolor fast, we can have nice, fast, dynamic simulations. Fratarcangeli et al. compare several graph coloring algorithms in their paper. They found a method by Grable and Panconesi to perform best given certain considerations (run time and number of colors).

We use this approach in our recent TVCG publication. The code (including the graph coloring) is finally up on my github.

In my implementation, the graph coloring has a "stride", allowing you to skip rows/cols of the matrix. The motivation for a stride is when your matrix has redundant rows/cols (arising from x, y, z constraint independence in projective dynamics), but wanting to store your vertices as a single column: (x,y,z)^T. The recent paper by Liu et al. explains it further (the text following Eq. 3) in which they avoid these redundant coefficients. It's on my to-do list to refactor the code and remove the redundant coeffs, since we tend to stick to isotropic models anyway.

Friday, May 12, 2017

1D Clustering with KDE

Kernel Density Estimation (KDE) is a useful technique for clustering one-dimensional data. For example, I recently implemented an interface for clustered parallel coordinates, in which I needed to cluster about 600k variables at the click of a button:

Of course, KDE requires a lot of expensive operations. Making the above happen in a few seconds required a few efficiency tricks at the cost of perfectly accurate clusters. So, I put together this mini-tutorial to explain my approach.

What is Kernel Density Estimation?

KDE helps identify the density of a distribution of data. That is, it can be useful in finding where a lot of data is grouped together and where it isn't. Naturally, it can be used for 1D clustering by creating clusters about the points of highest density (local maxima), separated by the points of lowest density (local minima). There are many great online tutorials about KDE, and I recommend familiarizing yourself before moving on.

1D clustering with KDE can be done in roughly 4 steps:
  1. Normalize data (0 to 1) and sort
  2. Compute densities
  3. Find local maxima
  4. Find minima and cluster
For all of my examples I'll be using Matlab with the final script linked at the bottom. Because it's Matlab it will be slow. However, the methods translate well to other languages.

1.) Normalize data and sort

With standard KDE you don't need to sort, because density is calculated from every other point in your dataset. Since the following section makes use of a limited neighborhood, sorting is necessary.

2) Compute densities

This is the only step that requires a little bit of work and some knowledge of KDE. It's also where we can start taking shortcuts for faster clustering.

The general idea is for a given point along the data, we compare the distance of that point to its neighbors. Neighbors that are really close add more to the density and neighbors that are far away add much less. How much they add is dependent on the choice of a smoothing kernel with radius h. A good choice for h is the Silverman's rule of thumb: h=std(x)*(4/3/n)^(1/5). For example, the density at point i, data x, number of elements n, and often-used Gaussian/normal kernel would be:

    sum = 0;
    for j = 1:n
        v = ( x(i) - x(j) )/h;

        sum = sum + exp(-0.5*v*v) / sqrt(2*pi);
    density = sum/(n*h);

For full KDE, that's an unseemly n^2 checks. To reduce the workload, we can make two simplifications:
  1. Compute density in bins instead of every point
  2. Check only nearby neighbors that have a greater impact on density
For a small dataset this is a little unnecessary. But consider 600k with 10 bins and checking 100 neighbors: you've reduced the loop from 360 billion to 1000. The choice of number of bins and neighbors is dependent on application. Less bins gives you less clusters. Less neighbors will dramatically improve run time but may give you noisier results, especially when an ignored neighbor would add a not-so-insignificant amount to the density. Sometimes is sufficient to add a round of smoothing, where the density of a bin is recomputed as the average of itself and left and right bins. The following code snippet applies these simplifications:

h = std(x) * (4/(3*n))^(1/5);
nn = 10; % number of neighbors to check on either side
nb = 10; % number of bins
bh = 1/10; % bin step size
bins = zeros(nb,1); % nb bins for computing density
for i = 1:n

    bin = ceil(x(i)/bh); % bin index
    sum = 0;
    for j = max(1,i-nn):min(n,i+nn)
        v = (x(i)-x(j))/h;

        sum = sum + exp( -0.5*v*v ) / sqrt(2*pi);
    dens = sum/(2*nn*h);
    bins(bin) = bins(bin) + dens;
for i = 2:nb-1 % Smoothing
    bins(i) = (bins(i-1)+bins(i)+bins(i+1))/3;

Of course, the bins aren't exactly centered at the points of highest density. For a lot of applications this is perfectly fine. Plotting the resulting density and data gives us:


3) Find local maxima

Local maxima are the peaks of the curves in the above plot. They can easily be discovered by checking the bins on either side. If it's the largest, it's a local maximum! If your curve is jaggy with too many maxima, you'll need to increase the number of neighbors sampled when computing density. Play with the tunable variables until you have something that looks nice:


4) Find local minima

The minima are simply the bins of lowest density between two maximums:

Local minima define where the clusters split. In the above example, we can see the data is split into three clusters.

And that's it. The script I used to create the plots above can be found right here.

Friday, January 6, 2017

Optimization-Based Material Point Method

The Material Point Method (MPM) is a powerful and interesting technique for physics simulation. MPM is great for continuum substances like fluids and soft bodies, as well as for dealing with complex problems like fracture and self collisions. It has received a lot of recent attention from Disney, most notably the snow simulation and phase change/heat transfer papers that appeared in SIGGRAPH.

Recently I coded up optimization based implicit MPM to test out some research ideas. My ideas didn't work out, but I put the code up on github instead of letting it collect dust on my hard drive. There are many graphics-oriented MPM implementations online, but very few of them use implicit integration, and even less (any?) use optimization.

I'm going to skip a lot of the mathematical details in this post. I feel they are better covered by the referenced papers. Instead, I'll give just enough to understand the code.

Material Point Method

The basic idea is that particles move around the domain and carry mass. When it's time to do time integration, the particle mass/momentum is mapped to a grid. New velocities are computed on the grid, then mapped back to the particles, which are then advected. The grid essentially acts as a scratch pad for simulation that is cleared and reset each time step. For a (better) background on MPM, read the introduction chapter from the SIGGRAPH course notes.

Time integration

The best starting point for computing new velocities is explicit time integration. Though explicit integration is fast to compute and easy to understand, it has problems with stability and overshooting. If we take too large of a time step, the particle goes past the physically-correct trajectory. Intermediate techniques like midpoint method can alleviate such problems, but the silver bullet is implicit integration.

With implicit integration we figure out what the velocity needs to be to make the particles to end up where they should. The hard part is that we now have both unknowns in the velocity calculation, which requires solving a system of nonlinear equations. This is computationally expensive and time consuming, especially for large and complex systems.

Fortunately, many important forces (like elasticity) are conservative. Mathematically, that means we can represent the forces as the negative gradient of potential energy. This lets us do implicit integration through a technique called optimization. This makes it a bit easier to formulate and can have better performance than a lot of other implicit solvers.

By representing our forces as energy potentials, we can do implicit time integration by minimizing total system energy. The concept itself is not new, but was recently applied to Finite Elements/MPM in an excellent paper by Gast et al.  Section 6 of the paper covers the algorithm for optimization-based MPM. One thing to keep in mind: in MPM the velocity calculations happen on grid nodes, not grid boundaries.


Mapping to/from the grid is an important part of the material point method. MPM was originally an extension of Fluid Implicit Particle (FLIP), which in turn was an extension of the Particle in Cell (PIC) method. PIC and FLIP are the standard in fluid simulation, and were used in the above Gast paper. Mike Seymour wrote a good article on fxguide that covers what PIC/FLIP is and why they are useful. However, they suffer from a few well known limitations. Namely, PIC causes things to "clump" together, and FLIP can be unstable.

A technique called the Affine Particle In Cell (APIC) was recently introduced to combat these limitations. It's more stable and does a better job at conserving momentum. In short, it's worth using. To do so, you only need to make a few subtle changes to the MPM algorithm. The SIGGRAPH course notes explain what to change, but the Gast paper does not.


Everything you need to know to code up MPM can be found in the SIGGRAPH course notes. If you've ever done fluid simulation with PIC/FLIP, it should be straight forward. If not, here are some things I found useful:

1. It helps with stability to have a cell size large enough to have 8-16 particles.

2. Keep a fully-allocated grid as the domain boundary, but do minimization on a separate "active grid" list made up of pointers to grid nodes.

3. A "make grid loop" function for particle/grid mapping: given a particle, return the indices of grid nodes within the kernel radius, as well as kernel smoothing values/derivatives. This was probably the trickiest part of the whole method. Much of this can be preallocated if you have a lot of memory, but I found (in 3D) allocating/deallocating on the spot was necessary.

Another thing to consider is what algorithm to use for minimization. I'm too lazy to deal with Hessians, so I generally stick to L-BFGS and Conjugate Gradient. Optimization is widely used across disciplines, so there are some open-implementations you may want to consider using. I've found that Pat Wie's CppOptimizationLibrary works well as a go-to for most of my test projects, because I like the interface and the Eigen3 compatibility. It also includes a lot of nice things like finite difference gradient/hessian, line search algorithms, etc... However, there are some issues with its L-BFGS and CG. For some of my applications, they failed to converge (although that may have been fixed in recent revisions, I haven't checked). Instead I use an L-BFGS implementation by Ioannis Karamouzas that I modified for Pat Wie's library (included in my MPM github repo).


Since the code was just to test out some ideas, I didn't put any effort into rendering. Instead, I just visualized the particle locations using VDB. If you haven't used VDB, it's a pretty cool tool for drawing out primitives in fresh simulation code. All you have to do is run the viewer, include the header, then call a few functions (like "vdb_point(x,y,z)") to draw stuff.