Robin's Blog

John Snow’s famous cholera analysis data in modern GIS formats

In 1854 there was a massive cholera outbreak in Soho, London – in three days over 120 people died from the disease. Famously, John Snow plotted the locations of the deaths on a map and found that they clustered around a pump in Broad Street. He suggested that the pump be taken out of service, which helped to end the epidemic. This in turn helped him formulate his theory that cholera is spread by contaminated water.

This analysis is famous as it is often considered to be:

  • The first epidemiological analysis of disease – trying to understand the spread of cases by factors in the environment
  • The first geographical analysis of disease data – plotting points on a map and looking for relationships

Snow’s work is often used as a case study in courses in GIS and the geographies of health. So, I thought – why not convert Snow’s data into a format that will work with modern GIS software, so that students (and others, of course) can analyse the data themselves with all of the capabilities of modern tools?

So, that’s what I did – and the data is available to download as SnowGIS.zip. All of the data is provided in TIFF (with TFW) or SHP formats, ready for loading into ArcGIS, QGIS, R, or anything else. There is a README in the zip file, but read on below for more details on what’s included.
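
If you haven’t met TFW files before: a TFW is a "world file", a six-line text file that stores the affine transform from pixel positions in the TIFF to map coordinates. The sketch below (plain Python, standard library only) shows how those six numbers are applied. The example values are made up for illustration and are not the ones in SnowGIS.zip, and the function names are my own; GIS software does this for you when it loads the image.

```python
def parse_world_file(text):
    """Parse the six numbers of a world file, in file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return tuple(values)


def pixel_to_map(col, row, params):
    """Return the map coordinates of the centre of pixel (col, row)."""
    a, d, b, e, c, f = params
    # A and E are the pixel sizes in x and y (E is normally negative),
    # B and D are rotation terms, and (C, F) is the centre of the upper-left pixel.
    return (a * col + b * row + c, d * col + e * row + f)


# Hypothetical world file: 0.5 m pixels, no rotation (values invented for this example)
example = "0.5\n0.0\n0.0\n-0.5\n529000.25\n181499.75\n"
params = parse_world_file(example)
print(pixel_to_map(0, 0, params))
print(pixel_to_map(100, 200, params))
```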

To create the data I took a copy of Snow’s original map, georeferenced it to the Ordnance Survey National Grid, warped it to fit correctly, and then digitised the locations of the deaths and the pumps. This allows the data that Snow collected to be overlaid on a modern OS map:

The pumps are shown in blue, and the size of the red circles indicates the number of deaths at that location. Of course, the data can be overlaid on the original map created by Snow (so you can check I digitised it properly!):

So, that’s basically the data that’s included in the zip file (plus a greyscale version of the OS map to make for easier visualisation in certain circumstances). The question then is – what can you do with it? I’d be very interested to see what you do – but here are a few ideas:
  • How about performing some sort of statistical cluster analysis on the deaths data? Does it identify the correct pump as the source?
  • What if the data were only provided in aggregated form? Lots of healthcare data is provided in that way today because of privacy concerns – but if the data had been provided aggregated to (for example) census output areas or a standard grid, would the right pump have been identified?
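
As a starting point for both ideas, here is a rough sketch in Python (standard library only). It is not a proper statistical cluster analysis: it simply assigns each death location to its nearest pump by straight-line distance, then repeats the tally after aggregating the deaths to a square grid. The coordinates, the pump names and the 20 m cell size are all made up for illustration and are not taken from SnowGIS.zip.

```python
import math
from collections import Counter

# Made-up coordinates in metres, for illustration only (NOT the values in SnowGIS.zip).
# Each death record is (x, y, count); each pump is name -> (x, y).
deaths = [(10.0, 12.0, 3), (14.0, 9.0, 1), (40.0, 42.0, 2), (11.0, 15.0, 4)]
pumps = {"A": (12.0, 11.0), "B": (45.0, 40.0)}


def nearest_pump(x, y, pumps):
    """Return the name of the pump closest to (x, y) in straight-line distance."""
    return min(pumps, key=lambda name: math.dist((x, y), pumps[name]))


def deaths_per_pump(deaths, pumps):
    """Tally deaths by nearest pump, weighting each location by its death count."""
    tally = Counter()
    for x, y, count in deaths:
        tally[nearest_pump(x, y, pumps)] += count
    return tally


def aggregate_to_grid(deaths, cell_size):
    """Sum deaths per square grid cell and return one record per cell centre."""
    cells = Counter()
    for x, y, count in deaths:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))] += count
    return [((i + 0.5) * cell_size, (j + 0.5) * cell_size, n)
            for (i, j), n in cells.items()]


# Compare the pump with the most deaths using exact locations and gridded data
exact = deaths_per_pump(deaths, pumps)
gridded = deaths_per_pump(aggregate_to_grid(deaths, cell_size=20.0), pumps)
print(exact.most_common(1)[0][0], gridded.most_common(1)[0][0])
```

With the real data you would read the coordinates and counts from the Cholera_Deaths and Pumps shapefiles instead, and you could try different cell sizes to see when the answer starts to change.
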
So – have fun, and please let me know what you’ve done with the data in the comments (particularly if you do any useful analyses or use it in teaching).

Categorised as: Academic, GIS, R


28 Comments

  1. Thiago Silva says:

    Good work! This is really interesting, I think it would make for a great lab activity in a spatial analysis course. Thanks for the effort!

  2. Robin Wilson says:

    Glad you like it :-) I am in discussions with some of the lecturers in the department about using it in teaching. If you (or anyone known to you) do use it then please drop me a line and let me know – I’d be very interested to see how well it works.

  3. Ken K. says:

    Nice idea! If you’re interested in an R application, you can get the Snow data set and many other interesting historical data sets without bothering with a zip file – just install the HistData package, compiled by Michael Friendly. Here’s a blog post we wrote about it: http://sas-and-r.blogspot.com/2011/03/example-832-histdata-package-sunflower.html

    Doesn’t help directly with the other formats, though, I guess.

  4. Cody says:

    I saw this used in a workshop at Yale

    http://guides.library.yale.edu/content.php?pid=29977&sid=1244888

    See the Intermediate GIS Skills Workshop

  5. Dave MacLean says:

    I will definitely use this in my GIS & Spatial Data Analysis course next fall. I’ve used images in that course for years; now I can have some GIS fun with it! Thanks very much.

  6. Robin Wilson says:

    Thanks David – that’s great. Can you drop me an email ([email protected]) after the course and let me know what you did and how it went? Thanks.

  7. Max Mitchell says:

    If you have it available, can you provide the R code to get this data in a workable format? I haven’t seen these file formats before.

  8. Giacomo says:

    Nice post. I definitely appreciate historical examples like this.

  9. Robin Wilson says:

    Max,

    The code below will load the data into R in a simple way:

    install.packages("maptools")
    library(maptools)
    # Run this from the folder containing the unzipped shapefiles
    deaths <- readShapePoints("Cholera_Deaths")
    pumps <- readShapePoints("Pumps")
    # Print the deaths data: all of the attributes (count and ID in this case) plus the X and Y co-ords
    deaths
    # Create a simple (horrible) plot of the data
    plot(deaths, col='red')
    points(pumps, col='blue')

    Hope that helps,

    Robin

  10. Chris says:

    Very nice work, and excellent point about the fetish for privacy around health data.

  11. Dave S says:

    This caught my interest. I’m trying to track down some street lines so I can try running a network analysis and look at what the coverage of this pump might be using that instead of Snow’s initial radius method and the average human walking speed. I think it’d be a neat way to demonstrate to one of my classes the difference between service-area buffers from network analysis and a straight buffer. I’m hoping it’ll capture those cases that sit outside a 250-yard buffer.

  12. Alicia says:

    This is fantastic! I am currently creating lessons and labs for the Geomatics 12 course in NS, Canada and this would work great in the lesson!

  13. Fefe says:

    This is a nice post. Thanks. If you have it, could you please provide the R code to overlay the Snow data on the OS map?

  14. Hey, I used this to make a map in our online galleries,

    http://developers.cartodb.com/gallery/maps/johnsnow.html#/example

    It is such a great story and such an awesome piece of history, thanks for posting the data.

    Andrew

  15. [...] Cholera analysis data in modern GIS formats, and georeferenced to the British National Grid (see my previous post). Unfortunately, there was an error in some of the attributes of the Cholera Deaths shapefile which [...]

  16. Ali says:

    Great post! Thanks for your effort.

    About the story of the map, there is another version. According to Orford [1], Dr Snow enquired “…where the people who had died had obtained their drinking water. He quickly isolated the Broad Street pump as the likely source of the outbreak, and he got the handle of the pump removed.

    Dr Snow actually drew his map of cholera some time after the epidemic had abated and definitely after the handle of the pump had been removed. The map formed part of a report written by Dr Snow.”

    I found this amazing, especially because I was always told the other version.

    [1] Orford, 2005. Cartography and visualization. In: Castree, N. et al. Questioning Geography: Fundamental Debates.

  17. [...] as romantic as the original one but it is really interesting to play with the data. Here you can download some [...]

  18. [...] save you reading my previous blog posts on the subject, I’ll give a brief overview of my data. John Snow produced a famous [...]

  19. [...] to Robin Wilson at Southampton University, we have the data. Robin painstakingly georeferenced every cholera death and pump location, so we could recreate the [...]

  20. [...] to Robin Wilson at Southampton University, we have the data. Robin painstakingly georeferenced every cholera death and pump location, so we could recreate the [...]

  21. [...] he then digitized the plotted locations of cholera deaths and pumps.  Wilson has made all of the GIS data freely available for downloading in a ZIP file.  The file includes the georeferenced scan of John Snow’s map, shapefiles of the cholera [...]

  22. Guenter Richter says:

    Hi, I got to know your work some days ago and had to take the challenge and add another interactive version. Here is the link to the page: http://ixmaps.blogspot.it/2013/03/another-interactive-version-of-john.html
    Thank you for posting the data, and I hope you enjoy the map.

    Guenter Richter

  23. [...] Jon Snow’s map of cholera has been celebrated widely anew, with new exhibitions and even a GIS data package from a Southampton University postgraduate researcher, displayed to great effect by the [...]

  24. Are the dates of the reported cases available somewhere? I am interested in the spatiotemporal evolution of the disease… let me know.

    thanks

    Matteo

  25. Robin Wilson says:

    Hi Matteo. Unfortunately I don’t know of any temporal data about the evolution of the epidemic – sorry.

  26. Doug Sharp says:

    Tom Koch wrote extensively about Snow’s data in Cartographies of Disease: Maps, Mapping, and Medicine. In particular he discusses the various iterations of analysis that people – ESRI included – have put this data through and points out the changing perspectives we have taken of Snow’s groundbreaking work.

  27. […] Guardian Data Blog” has, with the help of CartoDB and data from Robert Wilson at Southampton University, made an interactive version of the old map, to show how today’s technology can easily […]
