Category Archives: Python
van Heuklon Ozone model implementation in Python
As part of my PhD I wanted to use a simple model which would give me an estimation of the atmospheric ozone amount given a location and time of year. A simple model to do this was created by van Heuklon in 1979, and was described in a delightfully simple paper (unfortunately not freely available online, but see the reference below) using a simple formula. The map below shows the modelled ozone amounts across the world for 15th April (click to enlarge):
The formula was built up from a baseline ozone amount (235 matm-cm) which then has various components added to it, some of which varied based upon latitude, longitude or time of year. The model isn’t intended to be accurate or high-resolution, but provides me with a rough estimate of the ozone amount at a given location at a given time. Accurate measurements of ozone amounts are available from satellites, but I would need to get data from many different times of year to be able to take into account the seasonal variation in ozone amounts, and the satellite data would take up a lot of space – hence using this simple model.
The code for this model is available on Github and has two implementations: get_ozone_conc and old_get_ozone_conc. The latter uses standard Python code with if statements to set the different values in the formula depending on the input latitudes and longitudes, which means it can only process one latitude/longitude combination at once. get_ozone_conc, on the other hand, uses somewhat more complicated numpy code which allows you to pass it arrays of latitudes and longitudes and it will return an array of results. This is what was used to produce the map above – the code for which is also in the repository.
This code was created for use as part of my PhD, but has been released into the public domain (as CC0) in case anyone else finds it useful.
Also, for those of you using Py6S (my Python interface to the 6S atmospheric Radiative Transfer Model), I’m sure you will be pleased to know that this model will shortly be integrated into Py6S to allow you to easily specify ozone amounts using it.
References:
van Heuklon, T. K. (1979). Estimating atmospheric ozone for solar radiation models. Solar Energy, 22(1), 63-68
John Snow’s Cholera data in more formats
In honour of the bicentenary of John Snow’s birth – and because I was asked to by someone via email – I have now released my digitisation of John Snow’s Cholera data in a few other formats: KML and as Google Fusion Tables.
To save you reading my previous blog posts on the subject, I’ll give a brief overview of my data. John Snow produced a famous map in 1854 showing the deaths caused by a cholera outbreak in Soho, London, and the locations of water pumps in the area. By doing this he found there was a significant clustering of the deaths around a certain pump – and removing the handle of the pump stopped the outbreak. This is a bit of a simplification (see Wikipedia or the John Snow Society for more details), but generally covers what happened.
Anyway, I digitised John Snow’s original data and georeferenced it to the Ordnance Survey co-ordinate system, so that I could overlay it on modern maps of that area, as below (using the OS OpenData StreetView data, containing Ordnance Survey data © Crown copyright and database right 2013):
while still being able to overlay it on John Snow’s original map:
Anyway, the data that is available is:
- Cholera Death locations (Vector) with attribute data giving the number of deaths at each point
- Pump locations (Vector)
- John Snow’s original map georeferenced to the Ordnance Survey National Grid (Raster)
- Current Ordnance Survey maps of the area (from those released under OS OpenData; Contains Ordnance Survey data © Crown copyright and database right 2013; Raster)
These are available for download/use in a number of formats:
- A zip file with the Vector data as Shapefiles and the Raster data as TIFF images
(this is the original data provided for download by me – and is probably what you want for importing into a GIS system)
Download - A zip file with the Vector data as KML files and the Raster data as TIFF images
(suitable for importing into Google Earth and other products that use KML files)
Download - Links to Google Fusion Tables with the vector data already imported
Cholera Deaths
Pumps
Deaths and Pumps together (this dataset has both pump and death points in the same table: pump points have Count values of -999, death points have Count values > 0 which give the number of Cholera deaths at that location)
The latter is particularly cool, I think as it allows you to very easily overlay the data on modern Google Maps data, and should allow some interesting ‘mashups’ to be created. All of the tables are set to be shared publically, so you could be able to copy them (using the Copy Table command in the File menu) and play around with them as much as you want! If you click the Example Map tab then you’ll see a very rudimentary map I’ve created using the data on top of Google Maps (see below) – I’m sure you’ll be able to do far better visualisations than that.
The folks at CartoDB have also used this data in one of their tutorials which shows you how to import the data to CartoDB and create a styled map to show the deaths with different sized markers – yet another way you can use the ‘first real GIS data’ in today’s modern web-based GIS tools.
So, enjoy – and please let me know (via the comments below) what you create!
Update: There was a problem with the KML files and Google Fusion Tables that I uploaded yesterday, caused by an incorrect co-ordinate transformation between the Ordnance Survey grid references and latitude/longitude. This has now been fixed and the downloads and tables have been updated. Sorry about this.
Converting latitude/longitude co-ordinates to Landsat WRS-2 paths/rows
As part of some work I was doing for my PhD, I needed to automatically find what Landsat scene path and row would contain a pixel with a certain latitude/longitude co-ordinate (this was basically so I could automatically download the relevant Landsat image and do some processing on it).
There is an online converter (provided by the USGS) which allows you to input latitude/longitude and get back path/row, but it doesn’t have an API of any kind, and I couldn’t see an easy to way to automatically submit requests to the form, as they’ve (very sensibly really) included methods to stop people doing that.
Summary: So, I wrote my own code to do the conversion. It’s available on Github and depends on GDAL/OGR and Shapely. See the README file on Github for instructions.
Very helpfully, the USGS have made shapefiles of the WRS-2 scene boundaries available to download, so all my code needs to do is load the shapefile, do some ‘Point in Polygon’ operations to find out which polygon(s) the point is in, and extract the ‘path’ and ‘row’ attributes from those polygons. This is fairly easy to do in Python using the wonderful OGR library to access shapefiles, and Shapely to do the ‘Point in Polygon’ operations. When writing this I used the Python Geospatial Development book as a good reference for how to combine GDAL and Shapely in this way.
The code is available at GitHub, with instructions on how to install it (basically, download the shapefiles from the USGS – I’m deliberately not re-distributing them here – and download the Python script), and it is very easy to use:
>>> import get_wrs
>>> conv = ConvertToWRS()
>>> conv.get_wrs(50.14, -1.7)
[{'path': 202, 'row': 25}]
>>> conv.get_wrs(50.14, -1.43)
[{'path': 201, 'row': 25}, {'path': 202, 'row': 25}]
Beware:
- Instantiating the class may take a while, as it has to load in the shapefile
- The get_wrs method may return multiple results, as Landsat scenes overlap at all four edges.
Please let me know (via a comment on this post) if you find this code useful.
For reference, the code is shown below, and is well-commented to show how it works:
Can I atmospherically-correct my images with Py6S?
I’m a big fan of Matt Might’s blog, and thought I’d implement one of his tips for blogging as an academic – namely Reply to Public. I’ve had a number of emails from Py6S users asking me questions about how to atmospherically-correct entire satellite images with Py6S – so I thought ‘d respond online, so that in the future people either find this response without needing to email me (ideally), or at the very least, I can point them to this blog post when they do email me.
Unfortunately, the simple answer to the question is: Py6S cannot (yet!) atmospherically correct satellite images – at least, not in a sensible manner – so I’d suggest using other atmospheric correction software such as ATCOR or FLAASH. The longer answer is below…
If you read the Py6S documentation you’ll find that there is an atmospheric correction option which you can use to take an at-sensor radiance (often called a Top of Atmosphere radiance when you’re dealing with satellite sensors) and atmospherically-correct it to get a corrected surface reflectance (or radiance) value. You could write some code to do that – it might look a bit like this:
from Py6S import * # Create a SixS object called s # (used as the standard name by convention) s = SixS() # Set the atmospheric conditions as usual s.aero_profile = AeroProfile.PredefinedType(AeroProfile.Maritime) s.aot550 = 0.05 s.atmos_profile = AtmosProfile.UserWaterAndOzone(2, 0.318) # Set the wavelength s.wavelength=Wavelength(PredefinedWavelengths.MODIS_B6) # Set the altitudes s.altitudes.set_target_sea_level() s.altitudes.set_sensor_satellite_level() # Set the geometry s.geometry = Geometry.User() s.geometry.solar_z= 35.8 s.geometry.solar_a= 149.4 s.geometry.view_z= 5.1 s.geometry.view_a= 106.8 s.geometry.month=06 s.geometry.date=28 # Turn on atmospheric correction mode and set it to do the # correction assuming a Lambertian surface with a TOA # radiance of 137.5 W/m^2 s.atmos_corr = AtmosCorr.AtmosCorrLambertianFromRadiance(137.5) # Run the model s.run() # Print the result of the atmospheric correction # (assuming Lambertian reflectance) # This is the ground-reflectance for the given radiance, # under the given atmospheric conditions print s.outputs.atmos_corrected_reflectance_lambertian
This works fine, and so you might think that all you need to do to correct a satellite image is to take the code above, and run it for each pixel of the image using a for loop, as in the pseudo-code below:
# Load in the image (eg. using GDAL) for pixel in image: reflectance = run_py6s(various, parameters, here) # Store reflectance in the output image # Save output image
Unfortunately, that won’t work very well. Or rather, it will work – it’ll just take a long time! Let’s do some maths to work out roughly how long:
- As a rough approximation, it takes Py6S around two seconds to atmospherically correct a single pixel in a single band (the exact time depends on how complicated the parameterisation is – for example, a full manual AERONET-based aerosol parameterisation will take longer than a standard aerosol model)
- A full Landsat scene is roughly 5500 x 6000 pixels (170km x 185km), which is around 33 million pixels
- 33 million pixels x 2 seconds per pixel = 66 million seconds, or 763 days!
- If I started running this now – on one band of a Landsat scene – it’d be finished around April 2015. Even if I used a simple parameterisation that only took one second per pixel, it’d still take over a year!
Now, I suspect you’re now thinking the Py6S is really awful because it takes such a long time. Well, unfortunately it’s not down to Py6S (if it was, I could try and improve it!). In fact, the Python bit of Py6S adds very little time - the vast majority of the time is taken by 6S itself.
So, maybe it’s 6S that is rubbish. Well, again, unfortunately not, or we could switch to something that is better. All other Radiative Transfer Models, such as MODTRAN, SCIATRAN etc, suffer from the same problem.
So, how on earth do people manage to correct satellite images? Obviously people have managed to do it – and there is commercial software available to do it. Well, they use something called a Lookup Table (LUT). This involves running the Radiative Transfer Model many times to produce a table of results, and then looking up the result you want in this table. Creating the table takes a long time (but nowhere near as long as 763 days!) and it often takes a lot of space to store the resulting table (for example, the Lookup Table used in ATCOR is a multi-Gigabyte file), but once you’ve got the table you can correct a pixel in a tiny fraction of a second – rather than one or two seconds – which means the correction of an image is a lot quicker.
A lookup table is a good example of two standard trade-offs in computing:
- Setup time vs Run time: Often algorithms can either be designed to have a very short (or no) setup time, but then take a long time to run; or they can be designed to have a long setup phase, and then run very quickly from then onwards. Often the decision of which one to focus on depends on how often you’re going to run your process – if you want to run it multiple times (like an atmospheric correction algorithm) then it is normally better to have a long setup time (which you only have to do once!) and then a short run time.
- Memory vs Time: Again, algorithms often trade speed against memory – a fast algorithm normally requires more memory, and vice-versa. In this case, a Lookup Table trades speed for memory – we have to store the Lookup Table in memory, but once we have access to it, we can do corrections very fast.
I haven’t written any code to use Py6S to create a lookup table and then use that to correct an image (although I will almost certainly write some code to do this sometime during my PhD) – but it wouldn’t be too difficult to do. The procedure would be something like the following:
Setup Phase:
- Come up with a list of parameter values for all of the important parameters – choosing a number of parameters across their realistic range (there is a trade-off between accuracy and speed here, as always). For example maybe: AOT = [0.1, 0.2, 0.3, 0.4…]; solar_z = [10, 20, 30, 40, 50…]; radiance = [10, 20, 30, 40, 50, 60, 70] etc.
- Write some code to loop over all of these lists and run every possible combination of them
- Store the result of all of these runs of Py6S (ie. the atmospherically-corrected reflectance) and the parameters used to get the result in a big table of some sort.
Correction Phase:
Loop through every pixel in the image, and for each pixel:
- Get the parameters (AOT, angles, radiance etc) for that pixel
- Choose the nearest parameters that were used to create results in the table (for ALL of the parameters you used)
- Interpolate between these parameters as needed (eg. for an AOT of 0.15, get the results for an AOT of 0.1 and and AOT of 0.2 and interpolate) to get an interpolated ground reflectance.
- Store this reflectance in another image.
This basic procedure has been used in almost all atmospheric correction tools for the last thirty years, and is described in various papers including Fraser et al. (1992) and Liang et al. (1997). Another way to speed-up the computation even more is by introducing parallel computation. If we can assume that each pixel’s atmospheric correction is entirely independent from every other pixel’s (which may not be the case in the real-world, but is generally assumed by most atmospheric correction algorithms) then we can split the pixels between a number of processors and thus correct many pixels in parallel – which gets an even greater speedup as we can do this both when generating the LUT, and when correcting the image.
Unfortunately, actually implementing this is a bit more complicated than I’ve explained here and various decisions have to be taken (including what ranges of parameters to use, how to interpolate, how best to store the multi-dimensional lookup table and more…) – but it is definitely on my ‘to do’ list as part of my PhD, and as part of Py6S.
So, in conclusion: Py6S can’t do atmospheric correction of satellite imagery in a sensible time at the moment, but should be able to within the next few years. In the meantime, I suggest using other atmospheric correction software (most of which, unfortunately, is commercial software). You may think that not being able to do this makes Py6S useless…but, as my next post will show, Py6S is still very useful for remote-sensing research.
References
Fallah-Adl, Hassan, et al. “Fast algorithms for removing atmospheric effects from satellite images.” Computational Science & Engineering, IEEE 3.2 (1996): 66-77.
Fraser, R. S., Ferrare, R. A., Kaufman, Y. J., Markham, B. L., & Mattoo, S. (1992). Algorithm for atmospheric corrections of aircraft and satellite imagery.International Journal of Remote Sensing, 13(3), 541-557.
Liang, Shunlin, et al. “An operational atmospheric correction algorithm for Landsat Thematic Mapper imagery over the land.” Journal of Geophysical Research 102.D14 (1997): 17173-17.
Py6S: Run a radiative transfer model with a user-defined input spectrum
Version 1.2 of Py6S has recently been released (including a couple of bug fix releases, taking the most recent version to v1.2.2), and the major new feature in this version is the ability to use any spectrum as the ground reflectance for a model run. Previously, users were restricted to using the built-in 6S ground spectra (vegetation, clear water, lake water and sand) or doing some complicated parameterisation to set the user-defined reflectances up in the correct manner – so complicated that it was almost never worth doing!
Luckily, that has all changed in this version – read on to find out more…
Setting the Ground Reflectance
Setting the ground reflectance parameterisation in Py6S is very simple – you just use one of the methods of the GroundReflectance class. For example, to create a 6S model and set a constant reflectance of 0.5 for all wavelengths, simply run:
from Py6S import * s = SixS() s.ground_reflectance = GroundReflectance.HomogeneousLambertian(0.5) s.run()
In previous versions the parameter given to HomogeneousLambertian (or HeterogeneousLambertian) could be any of:
- A single reflectance value as a float, which is used as a uniform reflectance for all wavelengths:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(0.7)
- A constant stating a built-in 6S reflectance value:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(GroundReflectance.GreenVegetation)
- An array giving a user-defined spectrum given in steps of 2.5nm for the wavelength range used for that simulation (the last bit is important!)
s.ground_reflectance = GroundReflectance.HomogeneousLambertian([0.6, 0.8, 0.34, 0.453])
But now there is one more option:
- A two-dimensional array containing wavelengths (column 0, in micrometres) and reflectance values (column 1, as a fraction). This array will then be taken by Py6S, resampled to 2.5nm, subset to only the range of wavelengths being used for the current simulation, and used with the underlying 6S model.
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array)
This saves a lot of time and extra code, compared to doing all of the resampling yourself and using the user-defined spectrum option which was always available in Py6S. However, there are more benefits than this… Py6S will store the whole spectrum that you give, and just re-sample it to the correct wavelength range every time you run the model, which means you can use the lovely helper methods like run_vnir and run_landsat_tm to run simulations for a spectrum for a number of wavelengths with very little code. For example:
from Py6S import * s = SixS() # Somehow generate a 2D array called spectrum_array here s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array) wavelengths, results = SixSHelpers.Wavelengths.run_vnir(s, output_name="apparent_radiance")
Getting spectral data
Of course, that example won’t work until we add in some code to generate this 2D array for us. There are a two ways we could get this array:
- Generate it within our code somehow – for example, using a simple model, or interpolation between some hard-coded values.
- Load it from some sort of external datafile, for example from a CSV file using np.loadtxt
Both of those are ways that you might want to use, but I’ve added functions to Py6S to simplify a couple of ways of doing this that may be useful to people.
Firstly, the functions in the Spectra module make it easy to import spectra from the USGS Spectral Library and the ASTER Spectral Library. These two libraries seem to be the most commonly used – as far as I am aware, at least – and have a very wide range of spectra in them. The two functions are very simple to use – you simply pass the URL or file path to the spectrum data file, and it returns the right sort of array. Thus you can do:
spectrum_array = Spectra.import_from_usgs("http://speclab.cr.usgs.gov/spectral.lib06/ds231/ASCII/V/russianolive.dw92-4.30728.asc")
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array)
Or
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(Spectra.import_from_usgs("http://speclab.cr.usgs.gov/spectral.lib06/ds231/ASCII/V/russianolive.dw92-4.30728.asc"))
(Either of the URLs could be replaced with a simple local file path instead, which is particularly useful if you have downloaded the USGS Spectral Library archive file, or obtained the CD of the ASTER Spectral Library).
The other way to get data is to use a model. Many models can produce spectral data, and most will output in a format like CSV that can be imported into a NumPy array really easily – but to make things even easier I have also released a Python interface to the ProSAIL called – you guessed it – PyProSAIL. Again, the methods have been carefully designed to return the right sort of arrays, so you can simply do:
# Make sure you have both PyProSAIL and Py6S installed import pyprosail from Py6S import * spectrum = pyprosail.run(1.5, 40, 8, 0, 0.01, 0.009, 1, 3, 0.01, 30, 0, 10, 0, pyprosail.Planophile) s = SixS() s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum) s.run()
So, hopefully that has been a useful introduction to how to use user-defined spectra with Py6S. I’ll leave you with an example of how to put all of this together to do something useful – that is, to run a Py6S simulation with the ground reflectance set to the spectra from every USGS spectral library file in a directory, and store the results:
from Py6S import *
import glob
import numpy as np
# Create the SixS model and set some parameters
s = SixS()
s.aero_profile = AeroProfile.PredefinedType(AeroProfile.NoAerosols)
s.altitudes.set_sensor_satellite_level()
s.altitudes.set_target_sea_level()
# Get a list of files that have the extension .asc
# (the extension used by the USGS spectral library files)
files = glob.glob("./USGSSpecLib/Selected/*.asc")
results = []
# Run the model using each filename as the ground reflectance spectrum
# using the Landsat ETM+ bands
for filename in files:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(Spectra.import_from_usgs(filename))
wvs, res = SixSHelpers.Wavelengths.run_landsat_etm(s, output_name='apparent_radiance')
results.append(res)
# Stack all of the results into an array
results_arr = np.vstack(results)
# Write the results out to a CSV file for further analysis
np.savetxt("results.csv", results_arr, delimeter=",")
My first academic paper – on Py6S
Another exciting update for this new year: my first academic journal paper has been published!
It’s called Py6S: A Python interface to the 6S Radiative Transfer Model, and is published in Computers and Geosciences. If you’re reading this from a university with a subscription to Computers and Geosciences then you can read it at ScienceDirect – if you don’t, then you can read the post-peer-review (but pre-publisher-typsetting) version at my website.
The paper describes the Py6S python module that I wrote to assist me in running simulations using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) model, and is fairly short, but – I hope – fairly useful. It also gives a good way for people to cite Py6S when (if…) they use it in their work – they simply cite this paper.
Of course, the issue with journal papers is that they are static – indeed, a number of things changed in Py6S during the time between submitting the paper and it coming out in print (which was around nine months!). None of those affected the example code given in the paper – and to be honest, they were mostly bugfixes – but there are some new features gradually working their way into the code.
I’m planning to start a series of posts about Py6S on this blog, showing how various things work ‘under the hood’, announcing new features, and showing examples of how to use Py6S well. Hopefully this will be useful for me and also useful for other people who may be interested in using Py6s, but may require a bit of help.
So… stay tuned for the next post which will be on a new feature involving user-defined spectra.
Announcing bib2coins – convert BibTeX files to COINS metadata
Recently I was shocked to find that there didn’t seem to be a simple tool which would convert BibTeX files to COINS metadata span tags – so I wrote one!
That sentence probably made no sense to you – so lets go through it in a bit more depth. I use LaTeX to write all of my reports for my PhD, and therefore I keep all of my references in BibTeX format. I also use BibTeX format to keep a list of all of my publications, which I then use to automatically produce a nice listing of my publications on my website. I’ve recently become a big fan of Zotero, which will import references from webpages with a single click. This works for many sites like Google Scholar, Web of Knowledge, Science Direct etc – and I wanted to get the same for my publications page.
Examining the information given on the Zotero Exposing Metadata page suggests that one of the ways to expose this metadata in a HTML page is to use COINS (ContextObjects in Spans). This involves putting a number of strange-looking <SPAN> elements into your HTML page, which Zotero (and various other tools like Mendeley) will then use to automatically add the bibliographic data to their database.
So, how should I create the COINS metadata? Well, you can generate one item at a time using the online generator, or you can export items from Zotero as COINS, but neither of these methods can be automated. I’d really like to have a simple command-line tool that would take a BibTeX file and produce COINS metadata for all of the entries in the file…
So that’s what I created! It’s called bib2coins and it is available on the Python Package Index, to install simply run pip install bib2coins, and it will automatically be placed on your path. You can then just run it as bib2coins bibtexfile.bib and it will print out a load of COINS spans to standard output – just ready for you to pipe into the middle of a HTML file!
The code is fairly simple, and uses a BibTeX parser written by Vassilios Karakoidas combined with my own code to create the COINS spans themselves. It is not finished yet, and currently works well for journals and ‘inproceedings’ items but hasn’t been tested on much else (I haven’t written any books, so I’m not so concerned about creating COINS metadata for them!). However, I will be updating this tool to support more bibliographic item types in the near future.
How to: Make your Sphinx documentation compile with ReadTheDocs when you’re using Numpy and Scipy
Sphinx is a great tool for documenting Python programs (and lots of other things – I recently saw a lecturer who had done all of his lecture notes using Sphinx!) and I’ve used it for my new project (which will be announced on this blog in the next few days). Now that the project is near release, I wanted to get the documentation onto ReadTheDocs to make it nice and easily accessible (and so that it can be easily built every time I commit to GitHub).
The theory is that you just point ReadTheDocs to your GitHub repository, it finds the Sphinx conf.py file, and all is fine. However, if you use any module outside of the standard library, and you’re using the Sphinx autodoc module, then it will fail to compile the documentation. This is because the Python code that you are documenting needs to be able to be imported for autodoc to work, and if you are trying to import a module that doesn’t exist by default on a Python install then an error will be produced.
The ReadTheDocs FAQ says that you can setup a pip_requirements file to install any modules that are needed for your code, but this won’t work for any modules that include C code. This is understandable – as ReadTheDocs don’t want any random C code executing on their server – but it means that trying to build the docs for any code that uses numpy, scipy or matplotlib (or many other modules) will fail.
The FAQ suggests how to solve this – using a so-called ‘mock’. This is an object that pretends to be one of these modules, so that it can be imported, but doesn’t actually do anything. This doesn’t matter as it is not normally necessary to actually run the code to produce the docs, just to be able to import it. However, the code that is provided by ReadTheDocs doesn’t work for any modules that you import using the * operator – for example, from matplotlib import *. After asking a StackOverflow question, I found how to fix this for the code that ReadTheDocs provide, but a comment suggested a far easier way to do this, simply add code like the following to the top of your conf.py file:
import mock MOCK_MODULES = ['numpy', 'scipy', 'matplotlib', 'matplotlib.pyplot', 'scipy.interpolate'] for mod_name in MOCK_MODULES: sys.modules[mod_name] = mock.Mock()
In the MOCK_MODULES list put the names of all of the modules that you import. It is important to list submodules (such as matplotlib.pyplot) as well as the main modules. After committing your changes and pushing to GitHub, you should find that your docs compile properly.
Announcing DateRangeParser: Parse strings like “27th-29th June 2010″
In a project recently I was struggling to find a way to parse strings that contain a date range, for example:
- 27th-29th June 2010
- Tuesday 29 May -> Sat 2 June 2012
- From 27th to 29th March 1999
None of the Python modules I investigated (including parsedatetime) seemed to be able to cope with the range of strings that I had to deal with. I investigated patching parsedatetime to allow it to do what I wanted, but I found it very hard to get into the code. So, I thought, why not write my own…
So I did, and I’ve released it under the LGPL and you can install it right now by running:
pip install daterangeparser
Or you can visit the DateRangeParser PyPI page to download it manually, read the documentation, or hack on the code.
The current version will parse a wide range of formats (see the examples in the documentation) and will deal with individual dates as well as date ranges. The API is very simple – just import the parse method and run it, giving the date range string as an argument. For example:
from daterangeparser import parse
print parse("14th-19th Feb 2010")
This will produce an output tuple with two datetime objects in it: the start and end date of the range you gave.
The parser is built using PyParsing – a great Python parsing framework that I have found very easy to get to grips with. It is incredibly powerful, very easy to use, and really shows how limited regular expressions can be! Now that I’ve done this I have an urge to use PyParsing to write parsers for all of the horrible scientific data formats that I have to deal with in my PhD….watch this space!
Producing polar contour plots with matplotlib
In my field I often need to plot polar contour plots, and generally plotting tools don’t make this easy. In fact, I think I could rate every single graphing/plotting package in the world by the ease of producing a polar contour plot – and most would fail entirely! Still, I have managed to find a fairly nice way of doing this using my plotting package of choice: matplotlib.
I must warn you first – a Google search for matplotlib polar contour or a similar search term will produce a lot of completely out-dated answers. The most commonly found answers are those such as this StackOverflow question and this forum post. In fact, the first question was asked by me last year – and got an answer which is now completely out of date. Basically, all of these answers tell you that you can’t do a polar contour plot directly in matplotlib, and you must convert your points from polar co-ordinates to cartesian co-ordinates first. This isn’t difficult, but is a pain to do, and of course you then end up with cartesian axes which doesn’t look great. The great news is that you can now do polar contour plots directly with matplotlib!
So, how do you do them? Simple really, you just create some polar axes and plot a contour plot on them:
fig, ax = subplots(subplot_kw=dict(projection='polar')) cax = ax.contourf(theta, r, values, nlevels)
This produces a filled contour plot, as it uses the contourf function, using the contour function would give simple contour lines. The first three parameters which must be given to this function are all two-dimensional arrays containing: the radii, the angles (theta) and the actual values to contour. The final parameter is the number of contour levels to plot – you tend to want lower numbers for line contours and higher numbers for filled contour plots (to get a smooth look).
I never quite understood these two-dimensional arrays, and why they were needed. I normally had my data in the form of three lists that were basically columns of a table, where each row of the table defined a point and value. For example:
| Radius | Theta | Value |
| 10 | 0 | 0.7 |
| 10 | 90 | 0.45 |
| 10 | 180 | 0.9 |
| 10 | 270 | 0.23 |
| 20 | 0 | 0.5 |
| 20 | 90 | 0.13 |
| 20 | 180 | 0.52 |
| 20 | 270 | 0.98 |
Each of these rows define a point – for example, the first row defines a point with a radius of 10, an angle of 0 degrees and a value of 0.7. I could never understand why the contour function didn’t just take these three lists and plot me a contour plot. In fact, I’ve written a function that will do just that, which I will describe below, but first let me explain how those values are converted to two-dimensional arrays.
First of all, lets think of the dimensions: we obviously have dimensions here in our data, the radius and the angle. In fact, we could re-shape our values array so that it is two-dimensional fairly easily. We can see from the table above that we are doing all the azimuth angles for a radius of 10 degrees, then the same azimuth angles for a radius of 20 degrees, etc. Thus, rather than our values being stored in a one-dimensional list, we could put them in a two-dimensional table where the columns are the azimuth angles, and the rows are the radii:
| 0 | 90 | 180 | 270 | |
| 10 | 0.7 | 0.45 | 0.9 | 0.23 |
| 20 | 0.5 | 0.13 | 0.52 | 0.98 |
This is exactly the sort of two dimensional array that we need to give to the contourf function. That’s not too hard to understand – but why on earth do the radii and angle arrays have to be two-dimensional too. Well, basically we just need two arrays like the one above, but with the relevant radii and angles in the cells, rather than the values. So, for the angles, we’d have:
| 0 | 90 | 180 | 270 | |
| 10 | 0 | 90 | 180 | 270 |
| 20 | 0 | 90 | 180 | 270 |
And for the radii we’d have:
| 0 | 90 | 180 | 270 | |
| 10 | 10 | 10 | 10 | 10 |
| 20 | 0 | 20 | 20 | 20 |
Then, when we take all three arrays together, each cell will define the three bits of information we need. So, the top left cell gives us an angle of 0, a radius of 10 and a value of 0.7. Luckily, you don’t have to make these arrays by hand – a handy NumPy function called meshgrid will do it for you:
>>> radii = np.arange(0, 60, 10)
>>> print radii
[ 0 10 20 30 40 50]
>>> angles = np.arange(0, 360, 90)
>>> print angles
[ 0 90 180 270]
>>> np.meshgrid(angles, radii)
(array([[ 0, 90, 180, 270],
[ 0, 90, 180, 270],
[ 0, 90, 180, 270],
[ 0, 90, 180, 270],
[ 0, 90, 180, 270],
[ 0, 90, 180, 270]]),
array([[ 0, 0, 0, 0],
[10, 10, 10, 10],
[20, 20, 20, 20],
[30, 30, 30, 30],
[40, 40, 40, 40],
[50, 50, 50, 50]]))
One thing to remember is that the plotting function requires the angle (theta) in radians, not degrees, so if your data is in degrees (as it often is) then you’ll need to convert it to radians using the NumPy radians function.
After doing all of this you can get your data into the contour plotting function correctly, and you can get some polar axes for it to be plotted on. However, if you do this, you’ll find that your axes look something like this:
You can see that zero degrees isn’t at the top, it’s at the ‘East’ or ’3 o’clock’ position, and the angles go round the wrong way! Apparently that’s how these things are often done in maths – but in my field particularly people want to have a polar plot like a compass, with zero at the top!
If you try and find how to do this, you’ll find a StackOverflow answer with a brilliant subclass of PolarAxes which does this for you. It’s brilliant that matplotlib allows you do this sort of customisation, but if you look below the accepted answer you’ll find a link to the matplotlib documentation for a function called set_theta_zero_location. This function very nicely takes a compass direction (“N” or “E” or “NE” etc) for where zero should be, and puts it there! Similarly, the function set_theta_direction sets the direction in which the angles will increase. All you need to do to use these is call them from the axes object:
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
The example above will set up the plot for a ‘normal’ compass-style plot with zero degrees at the north, and the angles increasing clockwise. If you find that these lines of code give an error you need to update your matplotlib version – these methods were only added in the latest version (v1.1.0).
So, now we’ve covered everything that I’ve gradually learnt about doing this, we can put it all together in a function. I use the function below whenever I want to plot a polar contour plot, and it works fine for me. It is documented through the docstring shown in the code below.
I can’t guarantee the code will work for you, but hopefully this post has been helpful and you’ll now be able to go away and create polar contour plots in matplotlib.
import numpy as np
from matplotlib.pyplot import *
def plot_polar_contour(values, azimuths, zeniths):
"""Plot a polar contour plot, with 0 degrees at the North.
Arguments:
* `values` -- A list (or other iterable - eg. a NumPy array) of the values to plot on the
contour plot (the `z` values)
* `azimuths` -- A list of azimuths (in degrees)
* `zeniths` -- A list of zeniths (that is, radii)
The shapes of these lists are important, and are designed for a particular
use case (but should be more generally useful). The values list should be `len(azimuths) * len(zeniths)`
long with data for the first azimuth for all the zeniths, then the second azimuth for all the zeniths etc.
This is designed to work nicely with data that is produced using a loop as follows:
values = []
for azimuth in azimuths:
for zenith in zeniths:
# Do something and get a result
values.append(result)
After that code the azimuths, zeniths and values lists will be ready to be passed into this function.
"""
theta = np.radians(azimuths)
zeniths = np.array(zeniths)
values = np.array(values)
values = values.reshape(len(azimuths), len(zeniths))
r, theta = np.meshgrid(zeniths, np.radians(azimuths))
fig, ax = subplots(subplot_kw=dict(projection='polar'))
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
autumn()
cax = ax.contourf(theta, r, values, 30)
autumn()
cb = fig.colorbar(cax)
cb.set_label("Pixel reflectance")
return fig, ax, cax



