John Snow’s famous cholera analysis data in modern GIS formats
In 1854 there was a massive cholera outbreak in Soho, London – in three days over 120 people died from the disease. Famously, John Snow plotted the locations of the deaths on a map and found they clustered around a pump in Broad Street – he suggested that the pump be taken out of service – thus helping to end the epidemic. This then helped him formulate his theory of the spread of cholera by dirty water.
This analysis is famous as it is often considered to be:
- The first epidemiological analysis of disease – trying to understand the spread of cases by factors in the environment
- The first geographical analysis of disease data – plotting points on a map and looking for relationships
So, that’s what I did – and the data is available to download as SnowGIS.zip. All of the data is provided in TIFF (with TFW) or SHP formats, ready for loading into ArcGIS, QGIS, R, or anything else. There is a README in the zip file, but read on below for more details on what’s included.
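If you haven’t met the TFW ("world file") that accompanies each TIFF before, it is just six numbers, one per line, that define an affine transform from pixel positions to map coordinates. Here is a minimal Python sketch of how that transform works. The values in EXAMPLE_TFW are made up for illustration and are not copied from the files in SnowGIS.zip, and the function names are my own.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file (one per line) into a dict.

    Line order in a world file is A, D, B, E, C, F.
    """
    a, d, b, e, c, f = (float(line) for line in text.split())
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_map(wf, col, row):
    """Return the map coordinates of the centre of pixel (col, row)."""
    x = wf["A"] * col + wf["B"] * row + wf["C"]
    y = wf["D"] * col + wf["E"] * row + wf["F"]
    return x, y


# Hypothetical world file (made-up values): 0.5 m pixels, no rotation,
# upper-left pixel centred at easting 529000.25, northing 181499.75.
EXAMPLE_TFW = """0.5
0.0
0.0
-0.5
529000.25
181499.75
"""

wf = parse_world_file(EXAMPLE_TFW)
print(pixel_to_map(wf, 0, 0))      # centre of the upper-left pixel
print(pixel_to_map(wf, 100, 200))  # 100 pixels right, 200 pixels down
```

The negative E value is why northings decrease as the row number increases: image rows count downwards, while grid northings count upwards. GIS packages read the TFW for you, so this is only needed if you want to work with the rasters by hand.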
To create the data I took a copy of Snow’s original map, georeferenced it to the Ordnance Survey National Grid, warped it to fit correctly, and then digitised the locations of the deaths and the pumps. This allows the data that Snow collected to be overlaid on a modern OS map:
The pumps are shown in blue, and the size of the red circles indicates the number of deaths at that location. Of course, the data can be overlaid on the original map created by Snow (so you can check I digitised it properly!):
- How about performing some sort of statistical cluster analysis on the deaths data? Does it identify the correct pump as the source?
- What if the data were only provided in aggregated form? Lots of healthcare data is provided in that way today because of privacy concerns – but if the data had been provided aggregated to (for example) census output areas or a standard grid, would the right pump have been identified?
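As a starting point for both questions, here is a rough Python sketch. It is not a formal statistical cluster analysis: it simply computes the death-weighted mean centre of the cases and reports the pump closest to it, first from point-level data and then after aggregating the same data to a square grid. All coordinates, pump names and the 100 m cell size are hypothetical values chosen for illustration; they are not the real SnowGIS data.

```python
import math
from collections import defaultdict

# Hypothetical coordinates in metres (NOT the real SnowGIS values).
PUMPS = {"A": (0.0, 0.0), "B": (200.0, 0.0), "C": (0.0, 200.0)}
# Each record is (x, y, number of deaths at that location).
DEATHS = [(10.0, 20.0, 3), (30.0, -10.0, 1), (-20.0, 15.0, 2), (150.0, 40.0, 1)]


def weighted_mean_centre(points):
    """Mean of the locations, weighted by the number of deaths at each."""
    total = sum(n for _, _, n in points)
    cx = sum(x * n for x, _, n in points) / total
    cy = sum(y * n for _, y, n in points) / total
    return cx, cy


def nearest_pump(point, pumps):
    """Name of the pump with the smallest straight-line distance to point."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


def aggregate_to_grid(points, cell):
    """Sum deaths per square grid cell and place each total at the cell centre."""
    counts = defaultdict(int)
    for x, y, n in points:
        counts[(math.floor(x / cell), math.floor(y / cell))] += n
    return [((i + 0.5) * cell, (j + 0.5) * cell, n) for (i, j), n in counts.items()]


point_centre = weighted_mean_centre(DEATHS)
gridded = aggregate_to_grid(DEATHS, cell=100.0)
grid_centre = weighted_mean_centre(gridded)

print("Point-level centre:", point_centre, "->", nearest_pump(point_centre, PUMPS))
print("Gridded centre:    ", grid_centre, "->", nearest_pump(grid_centre, PUMPS))
```

With the real data you would replace PUMPS and DEATHS with the digitised locations and try several cell sizes, to see how coarse the aggregation can get before a different pump is picked. Straight-line distance is also a simplification, since people walked along streets to fetch water.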
Nice idea! If you’re interested in an R application, you can get the Snow data set and many other interesting historical data sets without bothering with a zip file – just install the HistData package, compiled by Michael Friendly. Here’s a blog post we wrote about it: http://sas-and-r.blogspot.com/2011/03/example-832-histdata-package-sunflower.html
Doesn’t help directly with the other formats, though, I guess.
I saw this used in a workshop at Yale
http://guides.library.yale.edu/content.php?pid=29977&sid=1244888
See the Intermediate GIS Skills Workshop
I will definitely use this in my GIS & Spatial Data Analysis course next fall. I’ve used images in that course for years; now I can have some GIS fun with it! Thanks very much.
If you have it available, can you provide the R code to get this data in a workable format? I haven’t seen these file formats before.
Nice post. I definitely appreciate historical examples like this.
Very nice work, and excellent point about the fetish for privacy around health data.
This caught my interest, I’m trying to track down some street lines so I can try running a network analysis and look at what the coverage of this pump might be using that instead of Snow’s initial radius method and the average human walking speed. Think it’d be a neat way to demonstrate the difference between service-area buffers via network analysis and straight buffer to one of my classes. Hoping it’ll capture those cases that sit outside a 250 yard buffer.
This is fantastic! I am currently creating lessons and labs for the Geomatics 12 course in NS, Canada and this would work great in the lesson!
This is a nice post. Thanks. If you have it, could you please provide the R code for overlaying the Snow data on the OS map?
Hey, I used this to make a map in our online galleries,
http://developers.cartodb.com/gallery/maps/johnsnow.html#/example
It is such a great story and such an awesome piece of history, thanks for posting the data.
Andrew
Great post! Thanks for your effort.
About the story of the map, there is another version. According to Orford [1], Dr Snow enquired “…where the people who had died had obtained their drinking water. He quickly isolated the Broad Street pump as the likely source of the outbreak, and he got the handle of the pump removed.
Dr Snow actually drew his map of cholera some time after the epidemic had abated and definitely after the handle of the pump had been removed. The map formed part of a report written by Dr Snow.”
I found this amazing, especially because I was always told the other version.
[1] Orford, 2005. Cartography and visualization. In: Castree, N. et al. Questioning Geography: Fundamental Debates.
Hi, I got to know your work some days ago and had to take the challenge and add another interactive version. Here is the link to page: http://ixmaps.blogspot.it/2013/03/another-interactive-version-of-john.html
Thank you for posting the data and I hope you enjoy the map.
Guenter Richter
Good work! This is really interesting, I think it would make for a great lab activity in a spatial analysis course. Thanks for the effort!