Robin's Blog

John Snow’s famous cholera analysis data in modern GIS formats

In 1854 there was a massive cholera outbreak in Soho, London: in three days over 120 people died from the disease. Famously, John Snow plotted the locations of the deaths on a map and found that they clustered around a pump in Broad Street. He suggested that the pump be taken out of service, which helped to end the epidemic. The episode also helped him formulate his theory that cholera is spread by dirty water.

This analysis is famous as it is often considered to be:

  • The first epidemiological analysis of disease – trying to understand the spread of cases by factors in the environment
  • The first geographical analysis of disease data – plotting points on a map and looking for relationships

Snow’s work is often used as a case study in courses on GIS and the geographies of health. So I thought: why not convert Snow’s data into a format that works with modern GIS software, so that students (and others, of course) can analyse the data themselves with all of the capabilities of modern tools?

So, that’s what I did, and the data is available to download as SnowGIS.zip. All of the data is provided in TIFF (with TFW) or SHP formats, ready for loading into ArcGIS, QGIS, R, or anything else. There is a README in the zip file, but read on below for more details on what’s included.
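For anyone who has not met a TFW file before: it is a "world file", a six-line plain-text sidecar that tells GIS software how pixel positions in the TIFF map to ground coordinates. The sketch below shows that mapping in Python. The numbers are made up for illustration and are not the values shipped in SnowGIS.zip, and the function names are hypothetical.

```python
def parse_world_file(text):
    """Return the six world-file coefficients in file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file holds exactly six numbers")
    return tuple(values)


def pixel_to_map(coeffs, col, row):
    """Map a pixel (col, row) to the map coordinates of that pixel's centre."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)


# Hypothetical world file: 0.5 m pixels, no rotation, and the centre of the
# top-left pixel at easting 529000.25, northing 181499.75.
# These are illustrative values only, not those in SnowGIS.zip.
EXAMPLE_TFW = """0.5
0.0
0.0
-0.5
529000.25
181499.75
"""

coeffs = parse_world_file(EXAMPLE_TFW)
print(pixel_to_map(coeffs, 0, 0))      # centre of the top-left pixel
print(pixel_to_map(coeffs, 100, 200))  # 100 columns right, 200 rows down
```

The negative fourth value reflects that image rows count downwards while northings increase upwards. In practice a GIS package reads the TFW automatically, so this is only meant to show what the file contains.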

To create the data I took a copy of Snow’s original map, georeferenced it to the Ordnance Survey National Grid, warped it to fit correctly, and then digitised the locations of the deaths and the pumps. This allows the data that Snow collected to be overlaid on a modern OS map.

In that overlay, the pumps are shown in blue, and the size of the red circles indicates the number of deaths at that location. Of course, the data can also be overlaid on the original map created by Snow (so you can check I digitised it properly!).

So that is essentially what the zip file contains, plus a greyscale version of the OS map that makes visualisation easier in some circumstances. The question then is: what can you do with it? I’d be very interested to see what you come up with, but here are a few ideas:

  • How about performing some sort of statistical cluster analysis on the deaths data? Does it identify the correct pump as the source?
  • What if the data were only provided in aggregated form? Lots of healthcare data is provided in that way today because of privacy concerns – but if the data had been provided aggregated to (for example) census output areas or a standard grid, would the right pump have been identified?
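As a starting point for the aggregation question, here is a minimal Python sketch. It is not a statistical cluster analysis: it simply assigns each death to its nearest pump by straight-line distance, first using the point locations and then after summing the deaths into square grid cells. All coordinates, pump names and the 250 m cell size are made up for illustration and are not taken from SnowGIS.zip.

```python
from collections import Counter
from math import floor, hypot

# Made-up coordinates in metres, NOT taken from SnowGIS.zip.
PUMPS = {"A": (100.0, 100.0), "B": (300.0, 120.0), "C": (180.0, 320.0)}
DEATHS = [  # (x, y, number of deaths at that location)
    (110.0, 95.0, 4),
    (90.0, 120.0, 2),
    (130.0, 110.0, 3),
    (205.0, 105.0, 1),
    (170.0, 300.0, 1),
]


def nearest_pump(x, y, pumps):
    """Name of the pump closest to (x, y) in straight-line distance."""
    return min(pumps, key=lambda name: hypot(pumps[name][0] - x, pumps[name][1] - y))


def deaths_per_pump(deaths, pumps):
    """Total deaths assigned to each pump by straight-line proximity."""
    totals = Counter()
    for x, y, count in deaths:
        totals[nearest_pump(x, y, pumps)] += count
    return totals


def aggregate_to_grid(deaths, cell):
    """Sum deaths per square grid cell and place each total at the cell centre."""
    cells = Counter()
    for x, y, count in deaths:
        cells[(floor(x / cell), floor(y / cell))] += count
    return [((i + 0.5) * cell, (j + 0.5) * cell, n) for (i, j), n in cells.items()]


point_totals = deaths_per_pump(DEATHS, PUMPS)
grid_totals = deaths_per_pump(aggregate_to_grid(DEATHS, 250.0), PUMPS)
print("point data:", dict(point_totals))
print("250 m grid:", dict(grid_totals))
```

With these toy values the point data assigns 9, 1 and 1 deaths to pumps A, B and C, whereas the 250 m grid assigns 10 to A and 1 to C: the single death nearest pump B is absorbed into a cell whose centre is nearest A. Whether the real data behaves similarly at realistic aggregation units is exactly the open question.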

So have fun, and please let me know in the comments what you’ve done with the data (particularly if you do any useful analyses or use it in teaching).

This post originally appeared on Robin's Blog.


Categorised as: Academic, GIS, R


50 Comments

  1. Thiago Silva says:

    Good work! This is really interesting, I think it would make for a great lab activity in a spatial analysis course. Thanks for the effort!

  2. Robin Wilson says:

    Glad you like it 🙂 I am in discussions with some of the lecturers in my department about using it in teaching. If you (or anyone you know) do use it, then please drop me a line and let me know – I’d be very interested to see how well it works.

  3. Ken K. says:

    Nice idea! If you’re interested in an R application, you can get the Snow data set and many other interesting historical data sets without bothering with a zip file: just install the HistData package, compiled by Michael Friendly. Here’s a blog post we wrote about it: http://sas-and-r.blogspot.com/2011/03/example-832-histdata-package-sunflower.html

    Doesn’t help directly with the other formats, though, I guess.

  4. Cody says:

    I saw this used in a workshop at Yale

    http://guides.library.yale.edu/content.php?pid=29977&sid=1244888

    See the Intermediate GIS Skills Workshop

  5. Dave MacLean says:

    I will definitely use this in my GIS & Spatial Data Analysis course next fall. I’ve used images in that course for years; now I can have some GIS fun with it! Thanks very much.

  6. Robin Wilson says:

    Thanks David – that’s great. Can you drop me an email (robin@rtwilson.com) after the course and let me know what you did and how it went? Thanks.

  7. Max Mitchell says:

    If you have it available, can you provide the R code to get this data in a workable format? I haven’t seen these file formats before.

  8. Giacomo says:

    Nice post. I definitely appreciate historical examples like this.

  9. Robin Wilson says:

    Max,

    The code below will load the data into R in a simple way:

    # Note: maptools has since been retired from CRAN;
    # sf::st_read() is a current way to read the same shapefiles.
    install.packages("maptools")
    library(maptools)

    deaths <- readShapePoints("Cholera_Deaths")
    pumps <- readShapePoints("Pumps")

    # Prints out the deaths data: shows all of the attributes
    # (count and ID in this case) plus the X and Y co-ords
    deaths

    # Create a simple (horrible) plot of the data
    plot(deaths, col='red')
    points(pumps, col='blue')

    Hope that helps,

    Robin

  10. Chris says:

    Very nice work, and excellent point about the fetish for privacy around health data.

  11. Dave S says:

    This caught my interest, I’m trying to track down some street lines so I can try running a network analysis and look at what the coverage of this pump might be using that instead of Snow’s initial radius method and the average human walking speed. Think it’d be a neat way to demonstrate the difference between service-area buffers via network analysis and straight buffer to one of my classes. Hoping it’ll capture those cases that sit outside a 250 yard buffer.

  12. Alicia says:

    This is fantastic! I am currently creating lessons and labs for the Geomatics 12 course in NS, Canada and this would work great in the lesson!

  13. Fefe says:

    This is a nice post. Thanks. Please if you have it, can you provide the R codes on how to overlay the Snow data on the OS Map.

  14. Hey, I used this to make a map in our online galleries,

    http://developers.cartodb.com/gallery/maps/johnsnow.html#/example

    It is such a great story and such an awesome piece of history, thanks for posting the data.

    Andrew

  15. […] Cholera analysis data in modern GIS formats, and georeferenced to the British National Grid (see my previous post). Unfortunately, there was an error in some of the attributes of the Cholera Deaths shapefile which […]

  16. Ali says:

    Great post! Thanks for your effort.

    About the story of the map, there is another version. According to Orford [1], Dr Snow enquired “…where the people who had died had obtained their drinking water. He quickly isolated the Broad Street pump as the likely source of the outbreak, and he got the handle of the pump removed.

    Dr Snow actually drew his map of cholera some time after the epidemic had abated and definitely after the handle of the pump had been removed. The map formed part of a report written by Dr Snow.”

    I found this amazing, especially because I was always told the other version.

    [1] Orford, 2005. Cartography and visualization. In: Castree, N. et al. Questioning Geography: Fundamental Debates.

  17. […] as romantic as the original one but it is really interesting to play with the data. Here you can download some […]

  18. […] save you reading my previous blog posts on the subject, I’ll give a brief overview of my data. John Snow produced a famous […]

  19. […] to Robin Wilson at Southampton University, we have the data. Robin painstakingly georeferenced every cholera death and pump location, so we could recreate the […]

  20. […] to Robin Wilson at Southampton University, we have the data. Robin painstakingly georeferenced every cholera death and pump location, so we could recreate the […]

  21. […] he then digitized the plotted locations of cholera deaths and pumps.  Wilson has made all of the GIS data freely available for downloading in a ZIP file.  The file includes the georeferenced scan of John Snow’s map, shapefiles of the cholera […]

  22. Guenter Richter says:

    Hi, I got to know your work some days ago and had to take the challenge and add another interactive version. Here is the link to the page: http://ixmaps.blogspot.it/2013/03/another-interactive-version-of-john.html
    Thank you for posting the data and I hope you enjoy the map

    Guenter Richter

  23. […] Jon Snow’s map of cholera has been celebrated widely anew, with new exhibitions and even a GIS data package from a Southampton University postgraduate researcher, displayed to great effect by the […]

  24. Are the dates of the reported cases available somewhere? I am interested in the spatiotemporal evolution of the disease. Let me know.

    thanks

    Matteo

  25. Robin Wilson says:

    Hi Matteo. Unfortunately I don’t know of any temporal data about the evolution of the epidemic – sorry.

  26. Doug Sharp says:

    Tom Koch wrote extensively about Snow’s data in Cartographies of Disease: Maps, Mapping, and Medicine. In particular he discusses the various iterations of analysis that people – ESRI included – have put this data through and points out the changing perspectives we have taken of Snow’s groundbreaking work.

  27. […] Guardian Data Blog” has, with the help of CartoDB and data from Robert Wilson at Southampton University, made an interactive version of the old map, to show how today’s technology can easily […]

  28. Dr. Snow says:

    help of CartoDB and data from Robert Wilson at Southampton University made an

  29. Dr. Snow says:

    Thank you for contributing to my project!

  30. […] last year, the original data used by Snow was made available in GIS format. Being the inquisitive person I am, I decided to run my own analysis on the data. Here’s what […]

  31. […] digitisation of John Snow’s 1854 cholera map has seen wide use. A few selected uses include: a Guardian DataBlog article, multiple journal […]

  32. Anna says:

    Hello,

    Could I use the map as part of a blog post around disease prevention?

    I am happy to send you a link once it publishes.

    Thank you,

    Anna

  33. Robin Wilson says:

    Of course – please do, and feel free to comment here with a link afterwards.

  34. Alfredo Conetta says:

    Robin, thanks for the data. I have placed it all in a geodatabase and will definitely use it when I deliver training. I’m not too far from the location, so I may see if there are any medical diaries or suchlike that could provide temporal information for some swish visualisation. No quick turnaround, unfortunately, as I’m busy at the moment.

  35. […] build a point map using John Snow’s famous cholera map.  Go to Robin Wilson’s Blog, who created and made available the Shapefiles for his recreation.  Download the SnowGIS.zip file […]

  36. […] ¹¹https://blog.rtwilson.com/john-snows-famous-cholera-analysis-data-in-modern-gis-formats […]

  37. The daily number of cholera deaths (Aug-Sept 1854) is available in Shiode et al, 2015 (Erratum): https://ij-healthgeographics.biomedcentral.com/articles/10.1186/s12942-015-0016-6
    Eventually, I expect to include a timeline in my project (in Spanish): http://www.cerebroperiferico.com/projects/colera/
    Best,
    Daniel

  38. Jamer Costa says:

    First of all, I would like to congratulate you on the initiative. I took your data and made a tutorial, in Portuguese, using QGIS 3.0. Thanks!

    https://youtu.be/paRLugrCFrA

  39. chrisarp says:

    This is very interesting, and well suited to my purpose… but I would like to know whether Snow’s cholera map is a map for hypothesis or for Exploratory Spatial Data Analysis?

  40. Dave Unwin says:

    Hi
    These data were digitised many years ago by Rusty Dodson at Santa Barbara and used by the late, great Waldo Tobler to create a really neat teaching package in (if I recall) GWBASIC! I have used them for years in my teaching and for the cover of a textbook, but came across your excellent resource when I thought about putting them into OSGB as you have. Most of the popular accounts about ending the epidemic by closing down the pump are wrong. To find out more, check out Steven Johnson’s excellent book on the entire affair (THE GHOST MAP) or almost anything on the same issue by Tom Koch, including his heartfelt plea that we allow “John Snow – RIP”. There is work by Allan Brimicombe on the time sequence, but as far as I am aware the data never had a time mark attached to them and Allan had to fudge things a bit. As to analysis, even a KDE based on crow’s-flight distance locates a single hot spot almost exactly over the infected pump’s location, and after a Voronoi tessellation around the pumps one finds that all the cases (bar one) fall into the tile around the infected pump. A third approach is simply to use distances from all cases to all pumps.

  41. I used QGIS to show this map as an example in the course “GIS-analysis” that we had here in Nordland in Norway last week. Thank you very much!

  42. wayne winston says:

    Thanks for this great analysis. I wrote a chapter in an upcoming book on Snow and I will mail you a copy when it comes out.

  43. Robin Wilson says:

    That’s great Wayne – I look forward to receiving it!

  44. SIBILA LILIAN OSIS says:

    Hi Robin Wilson,

    I would like authorization to use the image (with credits) and also quote access to your blog. I am preparing epidemiology material for a course in Amazonas (Brazil). I would be very grateful if you would grant permission.
    Thanks

  45. Robin Wilson says:

    Sure – just make sure you credit me and give a link back to the blog post.

  46. Miriam says:

    Thank you so much. It’s very kind of you to share it.

  47. Stephen Neville says:

    I’ve been georectifying in QGIS and then came across your link. I thought great GIS data!! However it seems the ZIP file is no longer there in 2022. Can you help please?

  48. […] (You can find a copy of these data on J:Current_ProjectsSnowGIS.  These data were created by Robin Wilson) […]

  49. […] digitisation of John Snow’s 1854 cholera map has seen wide use. A few selected uses include: a Guardian DataBlog article, multiple journal […]
