## Calculating percentiles in Python – use numpy not scipy!

This is just a brief public service announcement reporting something that I’ve just found: `np.percentile` is **a lot faster** than `scipy.stats.scoreatpercentile`, almost an order of magnitude faster in some cases.

Someone recently asked me why on earth I was using `scoreatpercentile` anyway. It turns out that `np.percentile` was only added in numpy 1.7, which was released part-way through my PhD in Feb 2013, and that is why the scipy function is used in some of my code.

In my code I frequently calculate percentiles from satellite images represented as large 2D numpy arrays – and the speed differences can be quite astounding:

| Image size (pixels per side) | `scoreatpercentile` | `percentile` | Speedup |
|---|---|---|---|
| 100 | 595 µs | 169 µs | 3.5x |
| 1000 | 84 ms | 13 ms | 6.5x |
| 3000 | 927 ms | 104 ms | 9x |
| 8000 | 8 s | 1 s | 8x |

As you can see, we get 3-4 times speedup for even small arrays (100 x 100, so 10,000 elements), and up to 8-9 times speedup for large arrays (tens of millions of elements).
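For anyone who wants to try this on their own machine, here is a minimal sketch of how such a comparison could be timed. It is not the exact benchmark behind the table above: the helper names `time_call` and `compare`, the random test images, and the choice of the 50th percentile are all assumptions made for illustration.

```python
import timeit

import numpy as np


def time_call(func, arr, q=50, repeats=3):
    """Return the best-of-`repeats` wall time (seconds) for func(arr, q)."""
    return min(timeit.repeat(lambda: func(arr, q), number=1, repeat=repeats))


def compare(sizes=(100, 1000)):
    """Time both functions on random square images (hypothetical test data)."""
    # scipy is imported here so that time_call stays usable without it
    from scipy.stats import scoreatpercentile

    rng = np.random.default_rng(0)
    rows = []
    for n in sizes:
        image = rng.random((n, n))  # stand-in for a satellite image
        t_scipy = time_call(scoreatpercentile, image, 50)
        t_numpy = time_call(np.percentile, image, 50)
        rows.append((n, t_scipy, t_numpy, t_scipy / t_numpy))
    return rows
```

Calling `compare()` returns one tuple per image size: the side length, the two timings in seconds, and the ratio between them. The numbers will depend heavily on your hardware and on the numpy and scipy versions installed, so don't expect them to match the table exactly.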

Anyway, the two functions have very similar signatures and options – the only thing missing from `np.percentile` is the ability to set hard upper or lower limits – so it should be fairly easy to switch over, and it’s worth it for the speed boost!
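If you do rely on the limits, one possible workaround is to filter the array yourself before calling `np.percentile`. The sketch below assumes that the limits are inclusive and that values outside them should simply be ignored, which is how the scipy documentation describes its `limit` argument. The function name `percentile_with_limits` and its parameters are hypothetical, not part of numpy.

```python
import numpy as np


def percentile_with_limits(a, q, lower=-np.inf, upper=np.inf):
    """Percentile of the values of `a` that lie within [lower, upper].

    Hypothetical helper: values outside the closed interval are dropped
    before the percentile is computed on the flattened remainder.
    """
    a = np.asarray(a)
    kept = a[(a >= lower) & (a <= upper)]  # boolean mask flattens the result
    return np.percentile(kept, q)
```

For example, `percentile_with_limits(image, 50, lower=0, upper=1)` would give the median of only those pixels with values between 0 and 1. Note that the filtering makes a copy of the selected values, so it costs some extra time and memory on very large images.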


This post originally appeared on Robin's Blog.


One big difference is that scipy does linear interpolation and numpy does not. Have you tried disabling linear interpolation (for instance by passing `interpolation_method='lower'`)?
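The interpolation question is easy to check directly. The sketch below compares `np.percentile` with a hand-written linear interpolation between the two nearest sorted values. The helper `linear_percentile` is hypothetical and written only for this check. With default arguments `np.percentile` also interpolates linearly, so the two should agree.

```python
import numpy as np


def linear_percentile(values, q):
    """Percentile via linear interpolation between neighbouring sorted values."""
    data = np.sort(np.asarray(values, dtype=float).ravel())
    pos = (q / 100.0) * (data.size - 1)  # fractional index into sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, data.size - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])


sample = [1.0, 2.0, 4.0, 8.0]
# True if numpy's default matches the hand-written linear interpolation
matches = bool(np.isclose(linear_percentile(sample, 50), np.percentile(sample, 50)))
```

If `matches` is true, interpolation by itself is unlikely to explain the speed difference, although that would need proper profiling to confirm.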