Adrien Corenflos 11 months ago
Parallel scans accumulate sequences on GPUs (or other parallel hardware) at logarithmic cost in the size of the input. The canonical example is cumulative sums (a, a+b, a+b+c, ...) from an input (a, b, c, ...), but this is hardly the only use, and, e.g., Kalman filtering can be handled in parallel.