Sphinx is a great tool for documenting Python programs (and lots of other things – I recently saw a lecturer who had done all of his lecture notes using Sphinx!) and I’ve used it for my new project (which will be announced on this blog in the next few days). Now that the project is near release, I wanted to get the documentation onto ReadTheDocs to make it nice and easily accessible (and so that it can be easily built every time I commit to GitHub).
The theory is that you just point ReadTheDocs to your GitHub repository, it finds the Sphinx conf.py file, and all is fine. However, if you use any module outside of the standard library, and you’re using the Sphinx autodoc module, then it will fail to compile the documentation. This is because the Python code that you are documenting needs to be able to be imported for autodoc to work, and if you are trying to import a module that doesn’t exist by default on a Python install then an error will be produced.
The ReadTheDocs FAQ says that you can setup a pip_requirements file to install any modules that are needed for your code, but this won’t work for any modules that include C code. This is understandable – as ReadTheDocs don’t want any random C code executing on their server – but it means that trying to build the docs for any code that uses numpy, scipy or matplotlib (or many other modules) will fail.
The FAQ suggests how to solve this – using a so-called ‘mock’. This is an object that pretends to be one of these modules, so that it can be imported, but doesn’t actually do anything. This doesn’t matter as it is not normally necessary to actually run the code to produce the docs, just to be able to import it. However, the code that is provided by ReadTheDocs doesn’t work for any modules that you import using the * operator – for example, from matplotlib import *. After asking a StackOverflow question, I found how to fix this for the code that ReadTheDocs provide, but a comment suggested a far easier way to do this, simply add code like the following to the top of your conf.py file:
MOCK_MODULES = ['numpy', 'scipy', 'matplotlib', 'matplotlib.pyplot', 'scipy.interpolate']
for mod_name in MOCK_MODULES:
sys.modules[mod_name] = mock.Mock()
In the MOCK_MODULES list put the names of all of the modules that you import. It is important to list submodules (such as matplotlib.pyplot) as well as the main modules. After committing your changes and pushing to GitHub, you should find that your docs compile properly.
While processing some data at work today I had an issue where I had a raster dataset in ArcGIS, where all cells with invalid data had been set to 9999. Of course, this caused a lot of issues for the statistics on the dataset – basically they were all nonsense – so I needed to fix it. I couldn’t seem to get reclass to work properly with floating point data, so I posted a question to the GIS StackExchange site. Within a few minutes I had a couple of answers: one suggesting a possible way using the interface, and one suggesting a method using the ArcPy API.
However, I managed to find a way to do this easily using the approach recommended for ArcPy in the response from Aragon (thanks!), but using the GUI interface. To do this, follow the instructions below:
1. Run the Spatial Analyst -> Conditional -> Set Null tool
2. Set the input conditional raster to be the raster dataset which you want to change
3. Set the expression to VALUE = 9999 where 9999 is the value that you want to replace with NoData.
4. Set the Input false raster or constant value to the same raster dataset that you select in step 2.
5. Choose a sensible output raster. The dialog should look roughly like this (click for more detail):
6. Click OK
It’s done! Your new raster should have all of the values of 9999 replaced with NoData, and your statistics should work fine.
In a project recently I was struggling to find a way to parse strings that contain a date range, for example:
27th-29th June 2010
Tuesday 29 May -> Sat 2 June 2012
From 27th to 29th March 1999
None of the Python modules I investigated (including parsedatetime) seemed to be able to cope with the range of strings that I had to deal with. I investigated patching parsedatetime to allow it to do what I wanted, but I found it very hard to get into the code. So, I thought, why not write my own…
So I did, and I’ve released it under the LGPL and you can install it right now by running:
The current version will parse a wide range of formats (see the examples in the documentation) and will deal with individual dates as well as date ranges. The API is very simple – just import the parse method and run it, giving the date range string as an argument. For example:
from daterangeparser import parse
print parse("14th-19th Feb 2010")
This will produce an output tuple with two datetime objects in it: the start and end date of the range you gave.
The parser is built using PyParsing – a great Python parsing framework that I have found very easy to get to grips with. It is incredibly powerful, very easy to use, and really shows how limited regular expressions can be! Now that I’ve done this I have an urge to use PyParsing to write parsers for all of the horrible scientific data formats that I have to deal with in my PhD….watch this space!
In my field I often need to plot polar contour plots, and generally plotting tools don’t make this easy. In fact, I think I could rate every single graphing/plotting package in the world by the ease of producing a polar contour plot – and most would fail entirely! Still, I have managed to find a fairly nice way of doing this using my plotting package of choice: matplotlib.
I must warn you first – a Google search for matplotlib polar contour or a similar search term will produce a lot of completely out-dated answers. The most commonly found answers are those such as this StackOverflow question and this forum post. In fact, the first question was asked by me last year – and got an answer which is now completely out of date. Basically, all of these answers tell you that you can’t do a polar contour plot directly in matplotlib, and you must convert your points from polar co-ordinates to cartesian co-ordinates first. This isn’t difficult, but is a pain to do, and of course you then end up with cartesian axes which doesn’t look great. The great news is that you can now do polar contour plots directly with matplotlib!
So, how do you do them? Simple really, you just create some polar axes and plot a contour plot on them:
This produces a filled contour plot, as it uses the contourf function, using the contour function would give simple contour lines. The first three parameters which must be given to this function are all two-dimensional arrays containing: the radii, the angles (theta) and the actual values to contour. The final parameter is the number of contour levels to plot – you tend to want lower numbers for line contours and higher numbers for filled contour plots (to get a smooth look).
I never quite understood these two-dimensional arrays, and why they were needed. I normally had my data in the form of three lists that were basically columns of a table, where each row of the table defined a point and value. For example:
Each of these rows define a point – for example, the first row defines a point with a radius of 10, an angle of 0 degrees and a value of 0.7. I could never understand why the contour function didn’t just take these three lists and plot me a contour plot. In fact, I’ve written a function that will do just that, which I will describe below, but first let me explain how those values are converted to two-dimensional arrays.
First of all, lets think of the dimensions: we obviously have dimensions here in our data, the radius and the angle. In fact, we could re-shape our values array so that it is two-dimensional fairly easily. We can see from the table above that we are doing all the azimuth angles for a radius of 10 degrees, then the same azimuth angles for a radius of 20 degrees, etc. Thus, rather than our values being stored in a one-dimensional list, we could put them in a two-dimensional table where the columns are the azimuth angles, and the rows are the radii:
This is exactly the sort of two dimensional array that we need to give to the contourf function. That’s not too hard to understand – but why on earth do the radii and angle arrays have to be two-dimensional too. Well, basically we just need two arrays like the one above, but with the relevant radii and angles in the cells, rather than the values. So, for the angles, we’d have:
And for the radii we’d have:
Then, when we take all three arrays together, each cell will define the three bits of information we need. So, the top left cell gives us an angle of 0, a radius of 10 and a value of 0.7. Luckily, you don’t have to make these arrays by hand – a handy NumPy function called meshgrid will do it for you:
One thing to remember is that the plotting function requires the angle (theta) in radians, not degrees, so if your data is in degrees (as it often is) then you’ll need to convert it to radians using the NumPy radians function.
After doing all of this you can get your data into the contour plotting function correctly, and you can get some polar axes for it to be plotted on. However, if you do this, you’ll find that your axes look something like this:
You can see that zero degrees isn’t at the top, it’s at the ‘East’ or ’3 o’clock’ position, and the angles go round the wrong way! Apparently that’s how these things are often done in maths – but in my field particularly people want to have a polar plot like a compass, with zero at the top!
If you try and find how to do this, you’ll find a StackOverflow answer with a brilliant subclass of PolarAxes which does this for you. It’s brilliant that matplotlib allows you do this sort of customisation, but if you look below the accepted answer you’ll find a link to the matplotlib documentation for a function called set_theta_zero_location. This function very nicely takes a compass direction (“N” or “E” or “NE” etc) for where zero should be, and puts it there! Similarly, the function set_theta_direction sets the direction in which the angles will increase. All you need to do to use these is call them from the axes object:
The example above will set up the plot for a ‘normal’ compass-style plot with zero degrees at the north, and the angles increasing clockwise. If you find that these lines of code give an error you need to update your matplotlib version – these methods were only added in the latest version (v1.1.0).
So, now we’ve covered everything that I’ve gradually learnt about doing this, we can put it all together in a function. I use the function below whenever I want to plot a polar contour plot, and it works fine for me. It is documented through the docstring shown in the code below.
I can’t guarantee the code will work for you, but hopefully this post has been helpful and you’ll now be able to go away and create polar contour plots in matplotlib.
import numpy as np
from matplotlib.pyplot import *
def plot_polar_contour(values, azimuths, zeniths):
"""Plot a polar contour plot, with 0 degrees at the North.
* `values` -- A list (or other iterable - eg. a NumPy array) of the values to plot on the
contour plot (the `z` values)
* `azimuths` -- A list of azimuths (in degrees)
* `zeniths` -- A list of zeniths (that is, radii)
The shapes of these lists are important, and are designed for a particular
use case (but should be more generally useful). The values list should be `len(azimuths) * len(zeniths)`
long with data for the first azimuth for all the zeniths, then the second azimuth for all the zeniths etc.
This is designed to work nicely with data that is produced using a loop as follows:
values = 
for azimuth in azimuths:
for zenith in zeniths:
# Do something and get a result
After that code the azimuths, zeniths and values lists will be ready to be passed into this function.
theta = np.radians(azimuths)
zeniths = np.array(zeniths)
values = np.array(values)
values = values.reshape(len(azimuths), len(zeniths))
r, theta = np.meshgrid(zeniths, np.radians(azimuths))
fig, ax = subplots(subplot_kw=dict(projection='polar'))
cax = ax.contourf(theta, r, values, 30)
cb = fig.colorbar(cax)
return fig, ax, cax
In 1854 there was a massive cholera outbreak in Soho, London – in three days over 120 people died from the disease. Famously, John Snow plotted the locations of the deaths on a map and found they clustered around a pump in Broad Street – he suggested that the pump be taken out of service – thus helping to end the epidemic. This then helped him formulate his theory of the spread of cholera by dirty water.
This analysis is famous as it is often considered to be:
The first epidemiological analysis of disease – trying to understand the spread of cases by factors in the environment
The first geographical analysis of disease data – plotting points on a map and looking for relationships
Snow’s work is often used as a case study in courses in GIS and the geographies of health. So, I thought – why not convert Snow’s data into a format that will work with modern GIS systems to allow students (and others, of course) to analyse the data themselves with all of the capabilities of modern tools.
So, that’s what I did – and the data is available to download as SnowGIS.zip. All of the data is provided in TIFF (with TFW) or SHP formats, ready for loading in to ArcGIS, QGis, R, or anything else. There is a README in the zip file, but read on below for more details on what’s included.
To create the data I took I a copy of Snow’s original map, georeferenced it to the Ordnance Survey National Grid, warped it to fit correctly, and then digitised the locations of the deaths and the pumps. This allows the data that Snow collected to be overlaid on a modern OS map (click for larger copy):
The pumps are shown in blue, and the size of the red circles indicates the number of deaths at that location. Of course, the data can be overlaid on the original map created by Snow (so you can check I digitised it properly!):
So, that’s basically the data that’s included in the zip file (plus a greyscale version of the OS map to make for easier visualisation in certain circumstances). The question then is – what can you do with it? I’d be very interested to see what you do – but here are a few ideas:
How about performing some sort of statistical cluster analysis on the deaths data? Does it identify the correct pump as the source?
What if the data were only provided in aggregated form? Lots of healthcare data is provided in that way today because of privacy concerns – but if the data had been provided aggregated to (for example) census output areas or a standard grid, would the right pump have been identified?
So – have fun, and please let me know what you’ve done with the data in the comments (particularly if you do any useful analyses or use it in teaching).
Why can’t I get auto-complete to work properly in Eclipse?
I’ve just set up a new University computer and things don’t work like they do on my laptop (maybe that one’s just me…)
It’s actually very simple to solve, but the problem is actually nothing to do with Eclipse. First of all, let’s see what the problem is:
You’ve just installed Eclipse, are starting to do some programming in it, and want to use the very handy auto-complete feature. So, you type part of a function name and press Ctrl-Space, waiting for the magic to work and the rest of the name to be automatically typed….but it doesn’t happen!
In the image above (which unfortunately doesn’t include the cursor) I had typed ST, and pressed Ctrl-Space to autocomplete it but nothing happened.
When trying to fix this myself, I went in to the Eclipse options (Windows->Preferences, then General->Keys) and tried to find the command for auto-complete. Helpfully, it’s not called autocomplete or anything like that – it’s called Content Assist. This showed me that, as I expected, Content Assist was assigned to Ctrl-Space:
So why wasn’t Eclipse picking this up? I tried setting the key for Content Assist manually, but when I deleted the text in the key binding box and pressed Ctrl-Space, it showed that only Ctrl registered – somehow the spacebar press was being ‘eaten up’ by something else. What could it be?
The simple answer is: the Windows language services utility – obvious really! This seems to be set by default (at least some of the time) to switch between languages by using Ctrl-Space. On my personal computer I only have one language set up (English (UK)), but on the university computers there are loads – French, German, Italian, Chinese (simplified) etc. You can find out what languages you have set up by going to Control Panel -> Region and Language -> Keyboards and Languages (tab) and then Change Keyboards (again, how obvious…). You’ll see a list of languages installed – remove any that you don’t want (click the language and then click the Remove button) until you only have the ones you want left. That fixed it for me, but you can also check the Advanced Key Settings tab to make sure that none of the keyboard shortcuts that are set include Ctrl-Space.
Once you’ve done that, Ctrl-Space should work nicely
When you try and reproject a dataset in ArcGIS (for example, by using the Project Raster tool) you will see a dialog a bit like the one below:
The highlighted field wants you to specific a Geographic Tranformation. Although it says that it is optional, it often isn’t (I think the optionality depends on the type of transformation you’re trying to do). I’ve often found that there are a number of items available in the dropdown box and I have absolutely no idea which one to choose!
For example, when converting from OSGB to WGS84 there are the following options:
How on earth should you choose one of these? Until now I had been choosing semi-randomly – picking one, seeing if the result looks good, if not then trying another. However, the other day I found out about a list of these transformations in the ArcGIS documentation – available to download from the ArcGIS documentation page. This document (available for each different version of ArcGIS) lists all of the transformations available in ArcGIS and their theoretical accuracy. So, for example, we can find out that OSGB_1936_To_WGS_1984_2 is meant to cover England only, and OSGB_1936_To_WGS_1984_4 is for Scotland. The accuracies seem to be around 20m for each transform, although OSGB_1936_To_WGS_1984_5 (for Wales) has an accuracy of 35m.
I can’t believe I’d never come across this resource before – it allows me to actually make intelligent decisions about which transformation to use. I’d strongly suggest you get familiar with this document.
(I’d like to thank the GIS.SE users who helped me with a question I asked related to this problem)
I often find myself using Julian days as a simple method to represent dates in my code. It’s nice and easy, because every day is simply an integer (the number of days since the beginning of the year) and any time during the day can be represented as a floating point number (the fraction of that day which has elapsed at that point). Furthermore, lots of satellite imagery is provided with the dates specified as Julian days – for example, MODIS data filenames include the year and then the Julian day.
It’s fairly easy to convert a standard date (eg. 24th March) to the Julian day in most programming languages (there will either be a library function to do it, or it’s fairly simple to write one yourself) – but it’s not that easy to do in your head. So, I have created a Julian Day calendar poster:
It’s a very simple table that lets you choose the month and day, and get the Julian Day number. Just make sure you remember the note at the top – add one to the number (after February) if it’s a leap year!
I was helping some undergraduates with some work the other day, and they decided to use the following colour scheme for representing river depth:
Deep water: Red
Medium-depth water: Bright green
Shallow water: Pink
Why did they do this? Well, either they were the default values used by the software they were using (unlikely), or they just chose randomly. Not a good idea.
If you look you’ll find a huge amount of literature about this (I should put some references here but I can’t really be bothered at this time at night), and it really makes your maps a HUGE amount more useable if you’re using sensible colours. For example:
Deep water: Dark blue
Medium-depth water: Medium-blue
Shallow water: Light blue
Why is this sensible? Well it makes sense on a number of levels – water is normally shown as blue (so it’s obviously some kind of water), and the different levels of colour imply some sort of ordering. With the original colours above there is no inherent ordering – is green ‘higher’ or ‘lower’ than red? Of course, red and green being used for ‘incorrect’ and ‘correct’ on a different map would be very sensible…
Isn’t it hard work to come up with nice colour schemes for all of your maps? Nope not at all – ColorBrewer has done it already! If you haven’t used this website already I urge you to do so, it provides a number of carefully-chosen colour-schemes designed for various different purposes. For representing river depth you’d probably want to use one of the blue Sequential schemes, but there are also Diverging schemes for data that goes off in two directions, as well as schemes for representing Qualitative data (those that have no explicit ordering). What’s more you can tell it to only show schemes that are color-blind-friendly, photocopier-safe etc, and it’ll produce a preview for you with various map styles (labels, cities, coastlines etc). All in all it’s very impressive, and very useful.
Plugins and extensions are available for a number of pieces of software to allow ColorBrewer colours to be easily used. These include an ArcGIS plugin (see the bottom answer for how to install with ArcGIS 10), R package, Python module and IDL routines.
I’m starting a new series here on my blog about interesting and somewhat out-of-the-ordinary applications of remote sensing. So…here is number 1.
An analysis firm called Remote Sensing Metrics LLC has been using satellite images to predict changes in the profits of firms such as Walmart, and thus help predict share prices. How does this work? Well it’s fairly simply – for a firm like Walmart, the more people shop in its stores, the more profitable it will be. Of course, this is a generalisation, but there is likely to be a fairly strong correlation between these two variables. So, all you need to do is find a way to estimate how many people shop at Walmart.
How does remote sensing come into this? Well, you just use high resolution imagery and count the cars in the car parks at their stores. Obtaining base-line data for a number of stores, and then looking at the trends should give a guide to trends in profits. If you want to go further – and these analysts did – you can perform a regression to obtain a quantitative relationship between the number of cars seen in the car parks and the profits of the company.
The technicalities of the approach aren’t detailed in the article – for obvious commercial reasons – but it seems that the data is provided by sensors such as QuickBird and Geo-Eye, which have resolutions down to 40cm. This means they can easily resolve cars – which tend to be at least 1m wide by 2m long. Whether they count the cars using human analysts, or whether some sort of object-based image analysis is used to classify cars in defined areas (the selected car parks), I don’t know – but I suspect the latter.
What do you think of this? Clever use of technology, or “Cold War-style satellite surveillance” (as CNBC put it)? Why not leave a comment.