Adrien Corenflos about 1 year ago
Parallel scans compute running accumulations of a sequence on GPUs (or other parallel hardware) in a number of parallel steps that is logarithmic in the size of the input, provided the combining operation is associative. The canonical example is the cumulative sum (a, a+b, a+b+c, ...) of an input (a, b, c, ...), but this is hardly the only use: Kalman filtering, for example, can also be handled in parallel.
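To make the logarithmic claim concrete, here is a minimal sketch of a Hillis-Steele style inclusive scan in plain Python. It runs sequentially, so it only simulates the parallel rounds. The names `inclusive_scan` and `compose_affine` are made up for this illustration. The second example is a scalar linear recurrence x_t = a_t * x_{t-1} + b_t, chosen as a much simpler stand-in for the kind of recursion that appears in filtering. It is not the parallel Kalman filter itself.

```python
import operator


def inclusive_scan(op, xs):
    """Hillis-Steele inclusive scan for an associative `op`.

    Returns (prefix_results, number_of_rounds). Each round is written as one
    list comprehension: every entry depends only on the previous round, so on
    parallel hardware all entries of a round could be computed at once.
    """
    out = list(xs)
    n = len(out)
    offset, rounds = 1, 0
    while offset < n:
        # Combine each element with the one `offset` positions to its left.
        # The earlier element is the left operand, so non-commutative
        # (but associative) operators are handled correctly.
        out = [out[i] if i < offset else op(out[i - offset], out[i])
               for i in range(n)]
        offset *= 2
        rounds += 1
    return out, rounds


# Canonical example: cumulative sums of 8 numbers in log2(8) = 3 rounds.
sums, rounds = inclusive_scan(operator.add, [1, 2, 3, 4, 5, 6, 7, 8])


def compose_affine(left, right):
    """Compose two affine maps x -> a*x + b, applying `left` first."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_l * a_r, a_r * b_l + b_r)


# Linear recurrence x_t = 0.5 * x_{t-1} + 1.0, started from x = 0.
# Each step is an affine map (a_t, b_t); scanning the maps gives every prefix
# composition, and the `b` component is the state reached from x = 0.
steps = [(0.5, 1.0)] * 4
prefix_maps, _ = inclusive_scan(compose_affine, steps)
states = [b for (_, b) in prefix_maps]

print(sums, rounds)  # [1, 3, 6, 10, 15, 21, 28, 36] 3
print(states)        # [1.0, 1.5, 1.75, 1.875]
```

The only requirement on `op` is associativity, which is why the same routine handles both addition and composition of affine maps. This variant does O(n log n) total work; work-efficient schemes exist, and in practice one would likely reach for a library primitive such as `jax.lax.associative_scan` rather than hand-rolling the rounds.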