Robin's Blog

Automatically generating a legend for a choropleth layer in Leaflet

Some work I’ve been doing recently has involved putting together a webmap using the Leaflet library. I’ve been very impressed with how Leaflet works, and the range of plugins available for it.

leaflet-choropleth is an extension for Leaflet that allows easy generation of choropleth maps in Leaflet. The docs for this module are pretty good, so I’ll just show a quick example of how to use it in a fairly basic way:

var layer_IMD = L.choropleth(geojson, {
        valueProperty: 'IMDRank',
        scale: ['red', 'orange', 'yellow'],
        style: {
            color: '#111111', // border color
            weight: 1,
            fillOpacity: 0.5,
            fillColor: '#ffffff'

This displays a choropleth based on the GeoJSON data in geojson, and uses a red-orange-yellow colourmap, basing the colours on the IMDRank property of each GeoJSON feature.

This will produce something like this – a map of Index of Multiple Deprivation values in Southampton, UK (read later if you want to see a Github repository of a full map):

One thing I wanted to do was create a legend for this layer in the Leaflet layers control. The leaflet-choropleth docs give an example of creating a legend, but I don’t really like the style, and the legend appears in a separate box rather than in the layers control for the map.

So, I put together a javascript function to create the sort of legend I wanted. For those who just want to use the function, it’s below. For those who want more details, read on…

function legend_for_choropleth_layer(layer, name, units, id) {
    // Generate a HTML legend for a Leaflet layer created using choropleth.js
    // Arguments:
    // layer: The leaflet Layer object referring to the layer - must be a layer using
    //        choropleth.js
    // name: The name to display in the layer control (will be displayed above the legend, and next
    //       to the checkbox
    // units: A suffix to put after each numerical range in the layer - for example to specify the
    //        units of the values - but could be used for other purposes)
    // id: The id to give the <ul> element that is used to create the legend. Useful to allow the legend
    //     to be shown/hidden programmatically
    // Returns:
    // The HTML ready to be used in the specification of the layers control
    var limits = layer.options.limits;
    var colors = layer.options.colors;
    var labels = [];

    // Start with just the name that you want displayed in the layer selector
    var HTML = name

    // For each limit value, create a string of the form 'X-Y'
    limits.forEach(function (limit, index) {
        if (index === 0) {
            var to = parseFloat(limits[index]).toFixed(0);
            var range_str = "< " + to;
        else {
            var from = parseFloat(limits[index - 1]).toFixed(0);
            var to = parseFloat(limits[index]).toFixed(0);
            var range_str = from + "-" + to;

        // Put together a <li> element with the relevant classes, and the right colour and text
        labels.push('<li class="sublegend-item"><div class="sublegend-color" style="background-color: ' +
            colors[index] + '"> </div> ' + range_str + units + '</li>');

    // Put all the <li> elements together in a <ul> element
    HTML += '<ul id="' + id + '" class="sublegend">' + labels.join('') + '</ul>';

    return HTML;

This function is fairly simple: it loops through the limits that have been defined for each of the categories in the choropleth map, and generates a chunk of HTML for each of the different categories (specifically, a <li> element), and these elements are put together and wrapped in a <ul> to produce the final HTML for the legend. We also set CSS classes for each element of the legend, so we can style them nicely later.

When setting up the layers control in Leaflet you pass an object mapping display names (the text you want displayed in the layers control) to Layer objects – something like this:

var layers = {
    'OpenStreetMap': layer_OSM,
    'IMD': layer_IMD

var layersControl = L.control.layers({},
    { collapsed: false }).addTo(map);

To use the function to generate a legend, replace the simple display name with a call to the function, wrapped in []‘s because of javascript’s weird inability to parse function calls in object keys. For example:

var layers = {
    'OpenStreetMap': layer_OSM,
    [legend_for_choropleth_layer(layer_IMD, 'IMD', '', 'legend_IMD')]: layer_IMD

Here we’re passing layer_IMD as the Layer object, IMD as the name to display above the legend, no units (so the empty string), and telling it to give the legend HTML element an ID of legend_IMD.

This produces a legend that looks something like this:

To get this nice looking legend, we use the following CSS:

.sublegend-color {
    width: 20px;
    border: 1px solid #666666;
    display: inline-block;
    opacity: 0.5;

.sublegend-item {
    padding-top: 0.2em;

.sublegend {
    list-style: none;
    padding-inline-start: 24px;
    margin-top: 0px;

Just for one final touch, I’d like the legend to disappear when the layer is ‘turned off’, and appear again when it is ‘turned on’ again. This is particularly useful when you have multiple choropleth layers on a map and the combined length of the legends make the layers control very long.

We can do this with a quick bit of jQuery (yes, I know it can be done in pure javascript, but I prefer using jQuery as it’s generally easier). Remember that one of the parameters to the legend_for_choropleth_layer function was the HTML ID to give the legend? Now you know why: we need to use that ID to hide and show the legend.

We connect to some of the Leaflet events to find out when the layers are turned on or off, and then use the jQuery hide and show methods. There’s one little niggle though: we have to use the setTimeout function to ensure that we only run this once – otherwise we get multiple events raised and it causes problems. So, the code to do this is:

layer_IMD.on('add', function () {
    // Need setTimeout so that we don't get multiple
    // onadd/onremove events raised
    setTimeout(function () {

layer_IMD.on('remove', function () {
    // Need setTimeout so that we don't get multiple
    // onadd/onremove events raised
    setTimeout(function () {

You can see how this works by looking at the final map here – try turning the IMD layer off and on again.

All of the code behind this example is available on Github if you want to check how it all fits together.

This work was done while analysing GIS data and producing a webmap for a freelancing client. If you’d like me to do something similar for you, have a look at my freelance website or email me.

Easily hiding items from the legend in matplotlib

When producing some graphs for a client recently, I wanted to hide some labels from a legend in matplotlib. I started investigating complex arguments to the plt.legend function, but it turned out that there was a really simple way to do it…

If you start your label for a plot item with an underscore (_) then that item will be hidden from the legend.

For example:

plt.plot(np.random.rand(20), label='Random 1')
plt.plot(np.random.rand(20), label='Random 2')
plt.plot(np.random.rand(20), label='_Hidden label')

produces a plot like this:

You can see that the third line is hidden from the legend – just because we started its label with an underscore.

I found this particularly useful when I wanted to plot a load of lines in the same colour to show all the data for something, and then highlight a few lines that meant specific things. For example:

for i in range(20):
    plt.plot(np.random.rand(20), label='_Hidden', color='gray', alpha=0.3)
plt.plot(np.random.rand(20), label='Important Line 1')
plt.plot(np.random.rand(20), label='Important Line 2')


My next step was to do this when plotting from pandas. In this case I had a dataframe that had a column for each line I wanted to plot in the ‘background’, and then a separate dataframe with each of the ‘special’ lines to highlight.

This code will create a couple of example dataframes:

df = pd.DataFrame()

for i in range(20):
    df[f'Data{i}'] = np.random.rand(20)

special = pd.Series(data=np.random.rand(20))

Plotting this produces a legend with all the individual lines showing:

df.plot(color='gray', alpha=0.3)


However, just by changing the column names to start with an underscore you can hide all the entries in the legend. In this example, I actually set one of the columns to a name without an underscore, so that column can be used as a label to represent all of these lines:

cols = ["_" + col for col in df.columns]
cols[0] = 'All other data'
df.columns = cols

Plotting again using exactly the same command as above gives us this – along with some warnings saying that a load of legend items are going to be ignored (in case we accidentally had pandas columns starting with _)


Putting it all together, we can plot both dataframes, with a sensible legend:

ax = df.plot(color='gray', alpha=0.3)
special.plot(ax=ax, label='Special data')


Advert: I do freelance data science work – please see here for more details.

Calculating Rayleigh Reflectance using Py6S

A user of Py6S recently contacted me to ask if it was possible to get an output of Rayleigh reflectance from Py6S. Unfortunately this email wasn’t sent to the Py6s Google Group, so I thought I’d write a blog post explaining how to do this, and showing a few outputs (reminder: please post Py6S questions there rather than emailing me directly, then people with questions in the future can find the answers there rather than asking again).

So, first of all, what is Rayleigh reflectance? Well, it’s the reflectance (as measured at the top-of-atmosphere) that is caused by Rayleigh scattering in the atmosphere. This is the wavelength-dependent scattering of light by gas molecules in the atmosphere – and it is an inescapable effect of light passing through the atmosphere.

So, on to how to calculate it in Py6S. Unfortunately the underlying 6S model doesn’t provide Rayleigh reflectance as an output, so we have to do a bit more work to calculate it.

First, let’s import Py6S and set up a few basic parameters:

from Py6S import *

s = SixS()

# Standard altitude settings for the sensor
# and target

# Wavelength of 0.5nm
s.wavelength = Wavelength(0.5)

Now, to calculate the reflectance which is entirely due to Rayleigh scattering we need to ‘turn off’ everything else that is going on that could contribute to the reflectance. First, we ‘turn off’ the ground reflectance by setting it to zero, so we won’t have any contribution from the ground reflectance:

s.ground_reflectance = GroundReflectance.HomogeneousLambertian(0)

Then we turn off aerosol scattering:

s.aero_profile = AeroProfile.PredefinedType(AeroProfile.NoAerosols)

and also atmospheric absorption by gases:

s.atmos_profile = AtmosProfile.PredefinedType(AtmosProfile.NoGaseousAbsorption)

We can then run the simulation (using and look at the outputs. The best way to do this is to just run:


to look at the ‘pretty’ text output that Py6S provides. The value we want is the ‘apparent reflectance’ – which is the reflectance at the top-of-atmosphere. Because we’ve turned off everything else, this will be purely caused by the Rayleigh reflectance.

We can access this value programmatically as s.outputs.apparent_reflectance.

So, that’s how to get the Rayleigh reflectance – but there are a few more interesting things to say…

Firstly, we don’t actually have to set the ground reflectance to zero. If we set the ground reflectance to something else – for example:

s.ground_reflectance = GroundReflectance.HomogeneousLambertian(GroundReflectance.GreenVegetation)

and run the simulation, then we will get a different answer for the apparent radiance – because the ground reflectance is now being taken into account – but we will see the value we want as the atmospheric intrinsic reflectance. This is the reflectance that comes directly from the atmosphere (in this case just from Rayleigh scattering, but in normal situations this would include aerosol scattering as well). This can be accessed programmatically as s.outputs.atmospheric_intrinsic_reflectance.

One more thing, just to show that Rayleigh reflectance in Py6S behaves in the manner that we’d expect from what we know of the physics… We can put together a bit of code that will extract the Rayleigh reflectance at various wavelengths and plot a graph – we’d expect an exponentially-decreasing curve, showing high Rayleigh reflectance at low wavelengths, and vice versa.

The code below will do this:

from Py6S import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

s = SixS()

s.aero_profile = AeroProfile.PredefinedType(AeroProfile.NoAerosols)
s.atmos_profile = AtmosProfile.PredefinedType(AtmosProfile.NoGaseousAbsorption)

wavelengths = np.arange(0.3, 1.0, 0.05)
results = []

for wv in wavelengths:
    s.wavelength = Wavelength(wv)

    results.append({'wavelength': wv,
                   'rayleigh_refl': s.outputs.atmospheric_intrinsic_reflectance})

results = pd.DataFrame(results)

results.plot(x='wavelength', y='rayleigh_refl', style='x-', label='Rayleigh Reflectance', grid=True)
plt.xlabel('Wavelength ($\mu m$)')
plt.ylabel('Rayleigh Reflectance (no units)')

This produces the following graph, which shows exactly what the physics predicts:


There’s nothing particularly revolutionary in that chunk of code – we’ve just combined the code I demonstrated earlier, and then looped through various wavelengths and run the model for each wavelength.

The way that we’re storing the results from the model deserves a brief explanation, as this is a pattern I use a lot. Each time the model is run, a new dict is appended to a list – and this dict has entries for the various parameters we’re interested in (in this case just wavelength) and the various results we’re interested in (in this case just Rayleigh reflectance). After we’ve finished the loop we can simply pass this list of dicts to pd.DataFrame() and get a nice pandas DataFrame back – ready to display, plot or analyse further.

Automatically annotating a boxplot in matplotlib

You can probably tell from the sudden influx of matplotlib posts that I’ve been doing a lot of work plotting graphs recently…

I have produced a number of boxplots to compare different sets of data. Some of these graphs are for a non-technical audience, and my client agreed that a boxplot was the best way to visualise the data, but wanted the various elements of the boxplot to be labelled so the audience could work out how to interpret it.

I started doing this manually using the plt.annotate function, but quickly got fed up with manually positioning everything – so I wrote a quick function to do it for me.

If you just want the code then here it is – it’s not perfect, but it should be a good starting point for you:

def annotate_boxplot(bpdict, annotate_params=None,
                     x_offset=0.05, x_loc=0,
    """Annotates a matplotlib boxplot with labels marking various centile levels.

    - bpdict: The dict returned from the matplotlib `boxplot` function. If you're using pandas you can
    get this dict by setting `return_type='dict'` when calling `df.boxplot()`.
    - annotate_params: Extra parameters for the plt.annotate function. The default setting uses standard arrows
    and offsets the text based on other parameters passed to the function
    - x_offset: The offset from the centre of the boxplot to place the heads of the arrows, in x axis
    units (normally just 0-n for n boxplots). Values between around -0.15 and 0.15 seem to work well
    - x_loc: The x axis location of the boxplot to annotate. Usually just the number of the boxplot, counting
    from the left and starting at zero.
    text_offset_x: The x offset from the arrow head location to place the associated text, in 'figure points' units
    text_offset_y: The y offset from the arrow head location to place the associated text, in 'figure points' units
    if annotate_params is None:
        annotate_params = dict(xytext=(text_offset_x, text_offset_y), textcoords='offset points', arrowprops={'arrowstyle':'->'})

    plt.annotate('Median', (x_loc + 1 + x_offset, bpdict['medians'][x_loc].get_ydata()[0]), **annotate_params)
    plt.annotate('25%', (x_loc + 1 + x_offset, bpdict['boxes'][x_loc].get_ydata()[0]), **annotate_params)
    plt.annotate('75%', (x_loc + 1 + x_offset, bpdict['boxes'][x_loc].get_ydata()[2]), **annotate_params)
    plt.annotate('5%', (x_loc + 1 + x_offset, bpdict['caps'][x_loc*2].get_ydata()[0]), **annotate_params)
    plt.annotate('95%', (x_loc + 1 + x_offset, bpdict['caps'][(x_loc*2)+1].get_ydata()[0]), **annotate_params)

You can then run code like this:

df = pd.DataFrame({'Column 1': np.random.normal(size=100),
                   'Column 2': np.random.normal(scale=2, size=100)})

bpdict = df.boxplot(whis=[5, 95], return_type='dict')
annotate_boxplot(bpdict, x_loc=1)

This will produce something like this:


You can pass various parameters to change the display. For example, to make the labels closer to the boxplot and lower than the thing they’re pointing at, set the parameters as: text_offset_x=20 and text_offset_y=-20, giving:


So, how does this work? Well, when you create a boxplot, matplotlib very helpfully returns you a dict containing the matplotlib objects referring to each part of the boxplot: the box, the median line, the whiskers etc. It looks a bit like this:

{'boxes': [<matplotlib.lines.Line2D object at 0x1179db908>,
           <matplotlib.lines.Line2D object at 0x1176ac3c8>],
 'caps': [<matplotlib.lines.Line2D object at 0x117736668>,
          <matplotlib.lines.Line2D object at 0x1177369b0>,
          <matplotlib.lines.Line2D object at 0x1176acda0>,
          <matplotlib.lines.Line2D object at 0x1176ace80>],
 'fliers': [<matplotlib.lines.Line2D object at 0x117736dd8>,
            <matplotlib.lines.Line2D object at 0x1176b07b8>],
 'means': [],
 'medians': [<matplotlib.lines.Line2D object at 0x117736cf8>,
             <matplotlib.lines.Line2D object at 0x1176b0470>],
 'whiskers': [<matplotlib.lines.Line2D object at 0x11774ef98>,
              <matplotlib.lines.Line2D object at 0x117736320>,
              <matplotlib.lines.Line2D object at 0x1176ac710>,
              <matplotlib.lines.Line2D object at 0x1176aca58>]}

Each of these objects is a matplotlib.lines.Line2D object, which has a get_xdata() and get_ydata() method (see the docs for more details). In this case, all we’re interested in is the y locations, so get_ydata() suffices.

All we do then is grab the right co-ordinate from the list of co-ordinates that are returned (noting that for the box we have to look at the 0th and the 2nd co-ordinates to get the bottom and top of the box respectively). We also have to remember that the caps dict entry has two objects for each individual boxplot – as there are caps at the bottom and the top – so we have to be a bit careful with selecting those.

The other useful thing to point out is that you can choose what co-ordinate systems you use with the plt.annotate function – so you can set textcoords='offset points' and then set the same xytext value each time we call it – and all the labels will be offset from their arrowhead location the same amount. That saves us manually calculating a location for the text each time.

This function was gradually expanded to allow more configurability – and could be expanded a lot more, but should work as a good starting point for anyone wanting to do the same sort of thing. It depends on a lot of features of plt.annotate – for more details on how to use this function look at its documentation or the Advanced Annotation guide.

Matplotlib titles have configurable locations – and you can have more than one at once!

Just a quick post here to let you know about a matplotlib feature I’ve only just found out about.

I expect most of my readers know how to produce a simple plot with a title using matplotlib:

plt.plot([1, 2, 3])
plt.title('Title here')

which gives this output:

I spent a while today playing around with special code (using plt.annotate) to put some text on the right-hand side of the title line – but found it really difficult to get the text in just the right location…until I found that you can do this with the plt.title function:

plt.plot([1, 2, 3])
plt.title('On the Right!', loc='right')

giving this:

You can probably guess how to put a title on the left – yup, it’s loc='left'.

What makes things even better is that you can put multiple titles in different places:

plt.plot([1, 2, 3])
plt.title('Centre Title')
plt.title('RH title', loc='right')
plt.title('LH title', loc='left')



I used this recently as part of some freelance work to produce graphs of air quality in Southampton. I had lots of graphs using data from different periods – one might be just for spring, or one just for early August – and I wanted to make it clear what date range was used for each graph. Putting the date range covered by each graph on the right-hand side of the title line made it very easy for the reader to see what data was used – and I did it with a simple bit of code like this:

plt.plot([1, 2, 3])
plt.title('Straight line graph')
plt.title('1st-5th June', loc='right', fontstyle='italic')

producing this:

(Note: you can pass arguments like fontstyle='italic' to any matplotlib function that produces text – things like title(), xlabel() and so on)

I am now a freelancer in Remote Sensing, GIS, Data Science & Python

I’ve been doing a bit of freelancing ‘on the side’ for a while – but now I’ve made it official: I am available for freelance work. Please look at my new website or contact me if you’re interested in what I can do for you, or carry on reading for more details.

Since I stopped working as an academic, and took time out to focus on my work and look after my new baby, I’ve been trying to find something which allows me to fit my work nicely around the rest of my life. I’ve done bits of short part-time work contracts, and various bits of freelance work – and I’ve now decided that freelancing is the way forward.

I’ve created a new freelance website which explains what I do and the experience I have – but to summarise here, my areas of focus are:

  • Remote Sensing – I am an expert at processing satellite and aerial imagery, and have processed time-series of thousands of images for a range of clients. I can help you produce useful information from raw satellite data, and am particularly experienced at atmospheric remote sensing and atmospheric correction.
  • GIS – I can process geographic data from a huge range of sources into a coherent data library, perform analyses and produce outputs in the form of static maps, webmaps and reports.
  • Data science – I have experience processing terabytes of data to produce insights which were used directly by the United Nations, and I can apply the same skills to processing your data: whether it is a single questionnaire or a huge automatically-generated dataset. I am particularly experienced at making research reproducible and self-documenting.
  • Python – I am an experienced Python programmer, and maintain a number of open-source modules (such as Py6S). I produce well-written, Pythonic code with high-quality tests and documentation.

The testimonials on my website show how much previous clients have valued the work I’ve done for them.

I’ve heard from a various people that they were rather put off by the nature of the auction that I ran for a day’s work from me – so if you were interested in working with me but wanted a standard sort of contract, and more than a day’s work, then please get in touch and we can discuss how we could work together.

(I’m aware that the last few posts on the blog have been focused on the auction for work, and this announcement of freelance work. Don’t worry – I’ve got some more posts lined up which are more along my usual lines. Stay tuned for posts on Leaflet webmaps and machine learning of large raster stacks)

Less than a week left to bid for a day’s work from me

Just a quick reminder that you’ve only got until next Tuesday to bid for a day’s work from me – so get bidding here.

The full details and rules are available in my previous post, but basically I’ll do a day’s work for the highest bidder in this auction – working on coding, data science, GIS/remote sensing, teaching…pretty much anything in my areas of expertise. This could be a great way to get some work from me for a very reasonable price – so please have a look, and share with anyone else who you think might be interested.

Bid for a day’s work from me

Summary: I will do a day’s work for the highest bidder in this auction. This could mean you get a day’s work from me very cheaply. Please read all of this post carefully, and then submit your bid here before 5th Feb.

This experiment is based very heavily on David MacIver’s experiment in auctioning off a day’s work (see his blog posts introducing it, and summarising the results). It seemed to work fairly well for him, and I am interested to see how it will work for me.

So, if you win this auction, I will do one day (8 hours) of work for you, on a project of your choosing. If you’ve been following this blog then you’ll have a reasonable idea of what sort of things I can do – but to jog your memory, here are some ideas:
  • Working on an open-source project: I could work to add features to, fix bugs in, or document, an open-source project of mine – probably either Py6S or recipy.
  • Pair programming: I could work with you to write some code – it could be code to do pretty-much anything, but I’m most experienced in data science, geographical data processing, computer vision, remote sensing/GIS and similar areas.
  • Programming: I could write some code by myself, to do a reasonably-simple task of your choosing, providing the well-documented code for you to use or develop further. As above, this could be anything, but would work best if it were in my areas of expertise.
  • Data science: I could do some analysis of a reasonably simple dataset for you, providing well-documented code to allow you to extend the analysis.
  • GIS/Remote Sensing: I could perform some remote sensing/GIS analysis on a dataset, potentially producing well-designed maps as outputs.
  • Teaching: I could work with you, online or in person, to help you understand a topic with which I am familiar – for example, Python programming, data science, computer science, remote sensing, GIS and so on.
  • Review & comments: I could review and give comments on documents in my areas of expertise, for example, a draft paper, chapter of a thesis, or similar.
These are just a few ideas of things I could do – I am happy to do most things, although I will let you know if I think that I do not have the required expertise to do what you are requesting.


  1. The bid is only for me to work for 8 hours, so I strongly suggest either a short self-contained project, or something that can be stopped at any point and still be useful. If you want me to continue working past 8 hours then I would be happy to negotiate some further work – but this would be entirely outside of the bidding process.
  2. The 8 hours work will likely be split over multiple days: due to my health I find working for 8 hours straight to be very difficult, so I will probably do the work in two or three chunks. I am happy to do the work entirely independently, or to work in close collaboration with you.
  3. If I produce something tangible as part of this work (eg. some code, some documentation) then I will give you the rights to do whatever you wish with these (the only exception being work on my open-source projects, for which I will require you to agree to release the work under the same open-source license as the rest of the project).
  4. Following David’s lead, the auction will be a Vickrey Auction, where all bids are secret, and the highest bidder wins but pays the second highest bidder’s bid. This means that the mathematically best amount to bid is exactly the amount you are willing to pay for my time.
  5. If there is only one bidder, then you will get a day of my work and pay nothing for it.
  6. If there is a tie for top place then I will pick the work I most want to do, and charge the highest bid.
  7. The auction closes at 23:59 UTC on the 5th February 2019. Bids submitted after that time will be invalid.
  8. The day of work must be claimed by the end of March 2019. I will contact the winner to arrange dates and times. I will send an invoice after the work is completed, and this must be paid within 30 days.
  9. If your company wants to bid then I am happy to invoice them after the work is complete and, within reason, jump through the necessary hoops to get the invoice paid.
  10. If you wish me to work in-person then I will invoice you for travel costs on top of the bid payment. Work can only be carried out in a wheelchair accessible building, and in general I would prefer remote work.
  11. If you ask me to do something illegal, unethical, or just something that I firmly do not want to do, then I will delete your bid. If you would have been one of the top bidders then I will inform you of this.
  12. After the auction is over, and the work has been completed, I will post on this blog a summary of the bids received, the winning bid and so on.
To go ahead and submit your bid, please fill in the form here.

I give talks – on science, programming and more

The quick summary of this post is: I give talks. You might like them. Here are some details of talks I’ve done. Feel free to invite me to speak to your group – contact me at [email protected]. Read on for more details.

I enjoy giving talks on a variety of subjects to a range of groups. I’ve mentioned some of my programming talks on my blog before, but I haven’t mentioned anything about my other talks so far. I’ve spoken at amateur science groups (Cafe Scientifique or U3A science groups and similar), programming conferences (EuroSciPy, PyCon UK etc), schools (mostly to sixth form students), unconferences (including short talks made up on the day) and at academic conferences.

Feedback from audiences has been very good. I’ve won the ‘best talk’ prize at a number of events including the Computational Modelling Group at the University of Southampton, the Student Conference on Complexity Science, and EuroSciPy. A local science group recently wrote:

“The presentation that Dr Robin Wilson gave on Complex systems in the world around us to our Science group was excellent. The clever animated video clips, accompanied by a clear vocal description gave an easily understood picture of the underlining principles involved. The wide range of topics taken from situations familiar to everyone made the examples pertinent to all present and maintained their interest throughout. A thoroughly enjoyable and thought provoking talk.”

A list of talks I’ve done, with a brief summary for each talk, is at the end of this post. I would be happy to present any of these talks at your event – whether that is a science group, a school Geography class, a programming meet-up or something else appropriate. Just get in touch on [email protected].

Science talks

All of these are illustrated with lots of images and videos – and one even has live demonstrations of complex system models. They’re designed for people with an interest in science, but they don’t assume any specific knowledge – everything you need is covered from the ground up.

Monitoring the environment from space

Hundreds of satellites orbit the Earth every day, collecting data that is used for monitoring almost all aspects of the environment. This talk will introduce to you the world of satellite imaging, take you beyond the ‘pretty pictures’ to the scientific data behind them, and show you how the data can be applied to monitor plant growth, air pollution and more.

From segregation to sand dunes: complex systems in the world around us

‘Complex’ systems are all around us, and are often difficult to understand and control. In this talk you will be introduced to a range of complex systems including segregation in cities, sand dune development, traffic jams, weather forecasting, the cold war and more – and will show how looking at these systems in a decentralised way can be useful in understanding and controlling them. I’m also working on a talk for a local science and technology group on railway signalling, which should be fascinating. I’m happy to come up with new talks in areas that I know a lot about – just ask.

Programming talks

These are illustrated with code examples, and can be made suitable for a range of events including local programming meet-ups, conferences, keynotes, schools and more.

Writing Python to process millions of row of mobile data – in a weekend

In April 2105 there was a devastating earthquake in Nepal, killing thousands and displacing hundreds of thousands more. Robin Wilson was working for the Flowminder Foundation at the time, and was given the task of processing millions of rows of mobile phone call records to try and extract useful information on population displacement due to the disaster. The aid agencies wanted this information as quickly as possible – so he was given the unenviable task of trying to produce preliminary outputs in one bank-holiday weekend… This talk is the story of how he wrote code in Python to do this, and what can be learnt from his experience. Along the way he’ll show how Python enables rapid development, introduce some lesser-used built-in data structures, explain how strings and dictionaries work, and show a slightly different approach to data processing.

xarray: the power of pandas for multidimensional arrays

“I wish there was a way to easily manipulate this huge multi-dimensional array in Python…”, I thought, as I stared at a huge chunk of satellite data on my laptop. The data was from a satellite measuring air quality – and I wanted to slice and dice the data in some supposedly simple ways. Using pure numpy was just such a pain. What I wished for was something like pandas – with datetime indexes, fancy ways of selecting subsets, group-by operations and so on – but something that would work with my huge multi-dimensional array.

The solution: xarray – a wonderful library which provides the power of pandas for multi-dimensional data. In this talk I will introduce the xarray library by showing how just a few lines of code can answer questions about my data that would take a lot of complex code to answer with pure numpy – questions like ‘What is the average air quality in March?’, ‘What is the time series of air quality in Southampton?’ and ‘What is the seasonal average air quality for each census output area?’.

After demonstrating how these questions can be answered easily with xarray, I will introduce the fundamental xarray data types, and show how indexes can be added to raw arrays to fully utilise the power of xarray. I will discuss how to get data in and out of xarray, and how xarray can use dask for high-performance data processing on multiple cores, or distributed across multiple machines. Finally I will leave you with a taster of some of the advanced features of xarray – including seamless access to data via the internet using OpenDAP, complex apply functions, and xarray extension libraries.

recipy: effortless provenance in Python

Imagine the situation: You’ve written some wonderful Python code which produces a beautiful output: a graph, some wonderful data, a lovely musical composition, or whatever. You save that output, naturally enough, as awesome_output.png. You run the code a couple of times, each time making minor modifications. You come back to it the next week/month/year. Do you know how you created that output? What input data? What version of your code? If you’re anything like me then the answer will often, frustratingly, be “no”.

This talk will introduce recipy, a Python module that will save you from this situation! With the addition of a single line of code to the top of your Python files, recipy will log each run of your code to a database, keeping track of all of your input files, output files and the code that was used – as well as a lot of other useful information. You can then query this easily and find out exactly how that output was created.

In this talk you will hear how to install and use recipy and how it will help you, how it hooks into Python and how you can help with further development.


School talks/lessons

Decentralised systems, complexity theory, self-organisation and more

This talk/lesson is very similar to my complex systems talk described above, but is altered to make it more suitable for use in schools. So far I have run this as a lesson in the International Baccalaureate Theory of Knowledge (TOK) course, but it would also be suitable for A-Level students studying a wide range of subjects.

GIS/Remote sensing for geographers

I’ve run a number of lessons for sixth form geographers introducing them to the basics of GIS and remote sensing. These topics are often included in the curriculum for A-Level or equivalent qualifications, but it’s often difficult to teach them without help from outside experts. In this lesson I provide an easily-understood introduction to GIS and remote sensing, taking the students from no knowledge at all to a basic understanding of the methods involved, and then run a discussion session looking at potential uses of GIS/RS in topics they have recently covered. This discussion session really helps the content stick in their minds and relates it to the rest of their course.


As an experienced programmer, and someone with formal computer science education, I have provided input to a range of computing lessons at sixth-form level. This has included short talks and part-lessons covering various programming topics, including examples of ‘programming in the real world’ and discussions on structuring code for larger projects. Recently I have provided one-on-one support to A-Level students on their coursework projects, including guidance on code structure, object-oriented design, documentation and GUI/backend interfaces.

Mismatch between 6S atmospheric correction results & those from coefficients

A while back a friend on Twitter pointed me towards a question on the GIS StackExchange site about the 6S model, asking if “that was the thing you wrote”. I didn’t write the 6S model (Eric Vermote and colleagues did that), but I did write a fairly well-used Python interface to the 6S model, so I know a fair amount about it.

The question was about atmospherically correcting radiance values using 6S. When you configure the atmospheric correction mode in 6S you give it a radiance value measured at the sensor and it outputs an atmospherically-corrected radiance value. Simple. However, it also outputs three coefficients: xa, xb and xc which can be used to atmospherically-correct other at-sensor radiance values. These coefficients are used in the following formulae, given in the 6S output:

y=xa*(measured radiance)-xb

where acr is the atmospherically-corrected radiance.

The person asking the question had found that when he used the formula to correct the same radiance that he had corrected using 6S itself, he got a different answer. In his case, the result from 6S itself was 0.02862, but when he ran his at-sensor radiance through the formula he got a different answer: 0.02879, a difference of 0.6%.

I was intrigued by this question, as I’ve used 6S for a long time and never noticed this before…strangely, I’d never thought to check! The rest of this post is basically a copy of my answer on the StackExchange site, but with a few bits of extra explanation.

I thought originally that it might be an issue with the parameterisation of 6S – but I tried a few different parameterisations myself and came up with the same issue – I was getting a slightly different atmospherically-corrected reflectance when putting the coefficients through the formula, compared to the reflectance that was output by the 6S model directly.

The 6S manual is very detailed, but somehow never seems to answer the questions that I have – for example, it doesn’t explain anywhere how the three coefficients are calculated. It does, however, have an example output file which includes the atmospheric correction results (see the final page of Part 1 of the manual). This includes the following outputs:

* atmospheric correction result *
* ----------------------------- *
* input apparent reflectance : 0.100 *
* measured radiance [w/m2/sr/mic] : 38.529 *
* atmospherically corrected reflectance *
* Lambertian case : 0.22180 *
* BRDF case : 0.22180 *
* coefficients xa xb xc : 0.00685 0.03885 0.06835 *
* y=xa*(measured radiance)-xb; acr=y/(1.+xc*y) *

If you work through the calculation using the formula given you find that the result of the calculation doesn’t match the 6S output. Let me say that again: in the example provided by the 6S authors, the model output and formula don’t match! I couldn’t quite believe this…

So, I wondered if the formula was some sort of simple curve fitting to a few outputs from 6S, and would therefore be expected to have a small error compared to the actual model outputs. As mentioned earlier, the manual explains a lot of things in a huge amount of detail, but is completely silent on the calculation of these coefficients. Luckily the 6S source code is available to download. Less conveniently, the source code is in written in Fortran 77!

I am by no means an expert in Fortran 77 (in fact, I’ve never written any Fortran code in real-life), but I’ve had a dig in to the code to try and find out how the coefficients are calculated.

If you want to follow along, the code to calculate the coefficients starts at line 3382 of main.f. The actual coefficients are set in lines 3393-3397:


(strangely xb is set twice, to the same value, and another coefficient xap is set, which never seems to be used – I have no idea why!).

It’s fairly obvious from this code that there is no complicated curve fitting algorithm used – the coefficients are simply algebraic manipulations of other variables used in the model. For example, xc is set to the value of the variable sast, which, through a bit of detective work, turns out to be the total spherical albedo (see line 3354). You can check this in the 6S output: the value of xc is always the same as the total spherical albedo which is shown a few lines further up in the output file. Similarly xb is calculated based on various variables including tgasm, which is the total global gas transmittance and sdtott, which is the total downward scattering transmittance, and so on. (These variables are difficult to decode, because Fortran 77 has a limit of six characters for variable names, so they aren’t very descriptive!).

I was stumped at this point, until I thought about numerical precision. I realised that the xacoefficient has a number of zeros after the decimal point, and wondered if there might not be enough significant figures to produce an accurate output when using the formula. It turned out this was the case, but I’ll go through how I altered the 6S code to test this.

Line 3439 of main.f is responsible for writing the coefficients to the file. It consists of:

write(iwr, 944)xa,xb,xc

This tells Fortran to write the output to the file/output stream iwr using the format code specified at line 944, and write the three variables xaxb and xc. Looking at line 944 (that is, the line given a Fortran line number of 944, which is actually line 3772 in the file…just to keep you on your toes!) we see:

  944 format(1h*,6x,40h coefficients xa xb xc                 :, 
     s           1x, 3(f8.5,1x),t79,1h*,/,1h*,6x,
     s           ' y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)',
     s               t79,1h*,/,79(1h*))

This rather complicated line explains how to format the output. The key bit is 3(f8.5,1x) which tells Fortran to write a floating point number (f) with a maximum width of 8 characters, and 5 decimal places (8.5) followed by a space (1x), and to repeat that three times (the 3(...)). We can alter this to print out more decimal places – for example, I changed it to 3(f10.8,1x), which gives us 8 decimal places. If we do this, then we find that the output runs into the *‘s that are at the end of each line, so we need to alter a bit of the rest of the line to reduce the number of spaces after the text coefficients xa xb xc. The final, working line looks like this:

  944 format(1h*,6x,35h coefficients xa xb xc            :, 
 s           1x, 3(f10.8,1x),t79,1h*,/,1h*,6x,
 s           ' y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)',
 s               t79,1h*,/,79(1h*))

If you alter this line in main.f and recompile 6S, you will see that your output looks like this:

*                        atmospheric correction result                        *
*                        -----------------------------                        *
*       input apparent reflectance            :    0.485                      *
*       measured radiance [w/m2/sr/mic]       :  240.000                      *
*       atmospherically corrected reflectance                                 *
*       Lambertian case :      0.45439                                        *
*       BRDF       case :      0.45439                                        *
*       coefficients xa xb xc            : 0.00297362 0.20291930 0.24282509   *
*       y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)                         *

If you then apply the formula you will find that the output of the formula, and the output of the model match – at least, to the number of decimal places of the model output.

In my tests of this, I got the following for the original 6S code:

  • Model: 0.4543900000
  • Formula: 0.4537049078
  • Perc Diff: 0.1507718536%

(the percentage difference I was getting was smaller than the questioner found – but that will just depend on the parameterisation used)

and this for my altered 6S code:

  • Model: 0.4543900000
  • Formula: 0.4543942552
  • Perc Diff: -0.0009364659%

A lot better!

For reference, to investigate this I used Py6S, the Python interface to the 6S model that I wrote. I used the following functions to automatically calculate the results using the formula from a Py6S SixS object, and to calculate the percentage difference automatically:

def calc_acr(radiance, xa, xb, xc):
    y = xa * radiance - xb
    acr = y/(1.0 + xc * y)

    return acr

def calc_acr_from_obj(radiance, s):
    return calc_acr(radiance, s.outputs.coef_xa, s.outputs.coef_xb, s.outputs.coef_xc)

def difference_between_formula_and_model(s):
    formula = calc_acr_from_obj(s.outputs.measured_radiance, s)
    model = s.outputs.atmos_corrected_reflectance_lambertian

    diff = model - formula

    perc_diff = (diff / model) * 100

    print("Model: %.10f" % model)
    print("Formula: %.10f" % formula)
    print("Perc Diff: %.10f%%" % perc_diff)

and my example errors above came from running Py6S using the following parameterisation:

s = SixS()
s.atmos_corr = AtmosCorr.AtmosCorrLambertianFromRadiance(240)
s.wavelength = Wavelength(PredefinedWavelengths.LANDSAT_OLI_B1)

Just as a slight addendum, if you’re atmospherically-correcting Sentinel-2 data with 6S then you might want to consider using ARCSI – an atmospheric correction tool that uses Py6S internally, but does a lot of the hard work for you. The best way to learn ARCSI is with their tutorial document.