Pandas-FSDR: a simple function for finding significant differences in pandas DataFrames

September 12, 2023

In the spirit of my Previously Unpublicised Code series, today I’m going to share Pandas-FSDR. This is a simple library with one function which finds significant differences between two columns in a pandas DataFrame.

For example, imagine you had the following data frame:

Subject	UK	World
Biology	50	40
Geography	75	80
Computing	100	50
Maths	1500	1600
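
To follow along, the table above can be built as a DataFrame like this. It is only a minimal sketch: using Subject as the index is an assumption, made so that rows can be referred to by name.

```python
import pandas as pd

# Example data from the table above.
# Assumption: Subject is used as the index so rows are identified by name.
df = pd.DataFrame(
    {
        "Subject": ["Biology", "Geography", "Computing", "Maths"],
        "UK": [50, 75, 100, 1500],
        "World": [40, 80, 50, 1600],
    }
).set_index("Subject")

print(df)
```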

You may be interested in the differences between the values for the UK and the World (these could be test scores or something similar). Pandas-FSDR will tell you – by running one function you can get output like this:

Maths is significantly smaller for UK (1500 for UK compared to 1600 for World)
Computing is significantly larger for UK (100 for UK compared to 50 for World)

Differences are calculated in absolute and relative terms, and all thresholds can be altered by changing parameters to the function. The function will even output pre-formatted Markdown text for display in an IPython notebook, inclusion in a dashboard or similar. The output above was created by running this code:

result = FSDR(df, 'UK', 'World', rel_thresh=30, abs_thresh=75)
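
To give a feel for what thresholds like these mean, here is a rough sketch of one way such a comparison could work. It is not the Pandas-FSDR implementation: the helper name `find_significant_rows` is hypothetical, and it assumes that a row counts as significant when either threshold is exceeded and that the relative difference is a percentage of the second column. Under those assumptions it flags the same two rows as the output above.

```python
import pandas as pd


def find_significant_rows(df, col_a, col_b, rel_thresh=30, abs_thresh=75):
    """Hypothetical helper, NOT the Pandas-FSDR implementation.

    Assumptions: a row is significant if EITHER threshold is exceeded,
    and the relative difference is a percentage of col_b.
    """
    abs_diff = (df[col_a] - df[col_b]).abs()
    # A zero in col_b gives inf (or NaN for 0/0); real code should handle that.
    rel_diff = abs_diff / df[col_b].abs() * 100
    significant = (abs_diff > abs_thresh) | (rel_diff > rel_thresh)
    return df.index[significant].tolist()


df = pd.DataFrame(
    {"UK": [50, 75, 100, 1500], "World": [40, 80, 50, 1600]},
    index=["Biology", "Geography", "Computing", "Maths"],
)

rows = find_significant_rows(df, "UK", "World", rel_thresh=30, abs_thresh=75)
print(rows)  # ['Computing', 'Maths']
```

Computing is flagged on the relative threshold (a difference of 50 is 100% of the World value), while Maths is flagged on the absolute threshold (a difference of 100 is more than 75, even though it is only about 6% in relative terms).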

This is a pretty simple function, but I thought it might be worth sharing. I originally wrote it for some contract data science work I did years ago, where I was sharing the output of Jupyter Notebooks with clients directly, and wanted something that would ‘write the text’ of the comparisons for me, so it could be automatically updated when I had new data. If you don’t want it to write anything then it’ll just output a list of row indices which have significant differences.
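
As an illustration of the 'write the text' idea, a sentence like the ones above could be produced with a small formatter along these lines. This is a sketch only: `describe_difference` is a hypothetical name, and putting the row name in bold Markdown is an assumption rather than the library's actual output format.

```python
def describe_difference(name, label_a, value_a, label_b, value_b):
    """Hypothetical formatter; the bold Markdown styling is an assumption."""
    direction = "larger" if value_a > value_b else "smaller"
    return (
        f"**{name}** is significantly {direction} for {label_a} "
        f"({value_a} for {label_a} compared to {value_b} for {label_b})"
    )


sentence = describe_difference("Maths", "UK", 1500, "World", 1600)
print(sentence)
```

Because the sentence is generated from the values, re-running the notebook on new data updates the text without any manual editing.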

Anyway, it’s nothing special but someone may find it useful.

The code and full documentation are available in the Pandas-FSDR GitHub repository.

This post originally appeared on Robin's Blog.

Categorised as: Previously Unpublicised Code, Programming, Python
