Robin's Blog

I give talks – on science, programming and more

The quick summary of this post is: I give talks. You might like them. Here are some details of talks I’ve done. Feel free to invite me to speak to your group – contact me at robin@rtwilson.com. Read on for more details.

I enjoy giving talks on a variety of subjects to a range of groups. I’ve mentioned some of my programming talks on my blog before, but I haven’t mentioned anything about my other talks so far. I’ve spoken at amateur science groups (Cafe Scientifique or U3A science groups and similar), programming conferences (EuroSciPy, PyCon UK etc), schools (mostly to sixth form students), unconferences (including short talks made up on the day) and at academic conferences.

Feedback from audiences has been very good. I’ve won the ‘best talk’ prize at a number of events including the Computational Modelling Group at the University of Southampton, the Student Conference on Complexity Science, and EuroSciPy. A local science group recently wrote:

"The presentation that Dr Robin Wilson gave on Complex systems in the world around us to our Science group was excellent. The clever animated video clips, accompanied by a clear vocal description gave an easily understood picture of the underlining principles involved. The wide range of topics taken from situations familiar to everyone made the examples pertinent to all present and maintained their interest throughout. A thoroughly enjoyable and thought provoking talk."

A list of talks I’ve done, with a brief summary for each talk, is at the end of this post. I would be happy to present any of these talks at your event – whether that is a science group, a school Geography class, a programming meet-up or something else appropriate. Just get in touch on robin@rtwilson.com.

Science talks

All of these are illustrated with lots of images and videos – and one even has live demonstrations of complex system models. They’re designed for people with an interest in science, but they don’t assume any specific knowledge – everything you need is covered from the ground up.

Monitoring the environment from space

Hundreds of satellites orbit the Earth every day, collecting data that is used for monitoring almost all aspects of the environment. This talk will introduce you to the world of satellite imaging, take you beyond the ‘pretty pictures’ to the scientific data behind them, and show you how the data can be applied to monitor plant growth, air pollution and more.

From segregation to sand dunes: complex systems in the world around us

‘Complex’ systems are all around us, and are often difficult to understand and control. In this talk you will be introduced to a range of complex systems – including segregation in cities, sand dune development, traffic jams, weather forecasting, the Cold War and more – and shown how looking at these systems in a decentralised way can be useful in understanding and controlling them. I’m also working on a talk for a local science and technology group on railway signalling, which should be fascinating. I’m happy to come up with new talks in areas that I know a lot about – just ask.

Programming talks

These are illustrated with code examples, and can be made suitable for a range of events including local programming meet-ups, conferences, keynotes, schools and more.

Writing Python to process millions of rows of mobile data – in a weekend

In April 2015 there was a devastating earthquake in Nepal, killing thousands and displacing hundreds of thousands more. Robin Wilson was working for the Flowminder Foundation at the time, and was given the task of processing millions of rows of mobile phone call records to try and extract useful information on population displacement due to the disaster. The aid agencies wanted this information as quickly as possible, so he was given the unenviable task of trying to produce preliminary outputs in one bank-holiday weekend! This talk is the story of how he wrote code in Python to do this, and what can be learnt from his experience. Along the way he’ll show how Python enables rapid development, introduce some lesser-used built-in data structures, explain how strings and dictionaries work, and show a slightly different approach to data processing.

xarray: the power of pandas for multidimensional arrays

"I wish there was a way to easily manipulate this huge multi-dimensional array in Python…", I thought, as I stared at a huge chunk of satellite data on my laptop. The data was from a satellite measuring air quality – and I wanted to slice and dice the data in some supposedly simple ways. Using pure numpy was just such a pain. What I wished for was something like pandas – with datetime indexes, fancy ways of selecting subsets, group-by operations and so on – but something that would work with my huge multi-dimensional array.

The solution: xarray – a wonderful library which provides the power of pandas for multi-dimensional data. In this talk I will introduce the xarray library by showing how just a few lines of code can answer questions about my data that would take a lot of complex code to answer with pure numpy – questions like ‘What is the average air quality in March?’, ‘What is the time series of air quality in Southampton?’ and ‘What is the seasonal average air quality for each census output area?’.

After demonstrating how these questions can be answered easily with xarray, I will introduce the fundamental xarray data types, and show how indexes can be added to raw arrays to fully utilise the power of xarray. I will discuss how to get data in and out of xarray, and how xarray can use dask for high-performance data processing on multiple cores, or distributed across multiple machines. Finally I will leave you with a taster of some of the advanced features of xarray – including seamless access to data via the internet using OpenDAP, complex apply functions, and xarray extension libraries.
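As a quick flavour of what this looks like in practice, here is a minimal sketch of the kinds of operations the talk covers – note that the file name, variable name and coordinates below are invented purely for illustration:

import xarray as xr

# Hypothetical dataset with dimensions (time, lat, lon) and an air quality variable
ds = xr.open_dataset("air_quality.nc")
no2 = ds["no2"]

# 'What is the average air quality in March?'
march_mean = no2.groupby("time.month").mean("time").sel(month=3)

# 'What is the time series of air quality in Southampton?'
soton_series = no2.sel(lat=50.9, lon=-1.4, method="nearest")

# 'What is the seasonal average air quality?'
seasonal_mean = no2.groupby("time.season").mean("time")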

recipy: effortless provenance in Python

Imagine the situation: You’ve written some wonderful Python code which produces a beautiful output: a graph, some wonderful data, a lovely musical composition, or whatever. You save that output, naturally enough, as awesome_output.png. You run the code a couple of times, each time making minor modifications. You come back to it the next week/month/year. Do you know how you created that output? What input data? What version of your code? If you’re anything like me then the answer will often, frustratingly, be "no".

This talk will introduce recipy, a Python module that will save you from this situation! With the addition of a single line of code to the top of your Python files, recipy will log each run of your code to a database, keeping track of all of your input files, output files and the code that was used – as well as a lot of other useful information. You can then query this easily and find out exactly how that output was created.
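As a rough sketch of what this looks like in practice (the script below and its file names are invented for illustration, and assume recipy’s usual wrapping of common input/output functions):

import recipy   # the single added line – this run will now be logged

import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("input_data.csv", delimiter=",")   # recorded as an input
plt.plot(data)
plt.savefig("awesome_output.png")                    # recorded as an output

You can then run something like recipy search awesome_output.png at the command line (or use the recipy GUI) to see exactly which script, inputs and version of the code produced that file.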

In this talk you will hear how to install and use recipy, how it will help you, how it hooks into Python, and how you can help with further development.

 

School talks/lessons

Decentralised systems, complexity theory, self-organisation and more

This talk/lesson is very similar to my complex systems talk described above, but is altered to make it more suitable for use in schools. So far I have run this as a lesson in the International Baccalaureate Theory of Knowledge (TOK) course, but it would also be suitable for A-Level students studying a wide range of subjects.

GIS/Remote sensing for geographers

I’ve run a number of lessons for sixth form geographers introducing them to the basics of GIS and remote sensing. These topics are often included in the curriculum for A-Level or equivalent qualifications, but it’s often difficult to teach them without help from outside experts. In this lesson I provide an easily-understood introduction to GIS and remote sensing, taking the students from no knowledge at all to a basic understanding of the methods involved, and then run a discussion session looking at potential uses of GIS/RS in topics they have recently covered. This discussion session really helps the content stick in their minds and relates it to the rest of their course.

Computing

As an experienced programmer, and someone with formal computer science education, I have provided input to a range of computing lessons at sixth-form level. This has included short talks and part-lessons covering various programming topics, including examples of ‘programming in the real world’ and discussions on structuring code for larger projects. Recently I have provided one-on-one support to A-Level students on their coursework projects, including guidance on code structure, object-oriented design, documentation and GUI/backend interfaces.


Mismatch between 6S atmospheric correction results & those from coefficients

A while back a friend on Twitter pointed me towards a question on the GIS StackExchange site about the 6S model, asking if "that was the thing you wrote". I didn’t write the 6S model (Eric Vermote and colleagues did that), but I did write a fairly well-used Python interface to the 6S model, so I know a fair amount about it.

The question was about atmospherically correcting radiance values using 6S. When you configure the atmospheric correction mode in 6S you give it a radiance value measured at the sensor and it outputs an atmospherically-corrected reflectance value. Simple. However, it also outputs three coefficients: xa, xb and xc, which can be used to atmospherically-correct other at-sensor radiance values. These coefficients are used in the following formulae, given in the 6S output:

y=xa*(measured radiance)-xb
acr=y/(1.+xc*y) 

where acr is the atmospherically-corrected reflectance.

The person asking the question had found that when he used the formula to correct the same radiance that he had corrected using 6S itself, he got a different answer. In his case, the result from 6S itself was 0.02862, but running his at-sensor radiance through the formula gave 0.02879 – a difference of 0.6%.

I was intrigued by this question, as I’ve used 6S for a long time and never noticed this before…strangely, I’d never thought to check! The rest of this post is basically a copy of my answer on the StackExchange site, but with a few bits of extra explanation.

I thought originally that it might be an issue with the parameterisation of 6S – but I tried a few different parameterisations myself and came up with the same issue – I was getting a slightly different atmospherically-corrected reflectance when putting the coefficients through the formula, compared to the reflectance that was output by the 6S model directly.

The 6S manual is very detailed, but somehow never seems to answer the questions that I have – for example, it doesn’t explain anywhere how the three coefficients are calculated. It does, however, have an example output file which includes the atmospheric correction results (see the final page of Part 1 of the manual). This includes the following outputs:

*******************************************************************************
* atmospheric correction result *
* ----------------------------- *
* input apparent reflectance : 0.100 *
* measured radiance [w/m2/sr/mic] : 38.529 *
* atmospherically corrected reflectance *
* Lambertian case : 0.22180 *
* BRDF case : 0.22180 *
* coefficients xa xb xc : 0.00685 0.03885 0.06835 *
* y=xa*(measured radiance)-xb; acr=y/(1.+xc*y) *
*******************************************************************************

If you work through the calculation using the formula given you find that the result of the calculation doesn’t match the 6S output. Let me say that again: in the example provided by the 6S authors, the model output and formula don’t match! I couldn’t quite believe this…
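To see this for yourself, here is the manual’s example worked through in a few lines of Python:

# Coefficients and measured radiance from the example 6S output above
xa, xb, xc = 0.00685, 0.03885, 0.06835
radiance = 38.529

y = xa * radiance - xb        # approximately 0.2251
acr = y / (1.0 + xc * y)      # approximately 0.2217

# 6S itself reports an atmospherically-corrected reflectance of 0.22180,
# so the formula and the model output disagree in the third decimal place.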

So, I wondered if the formula was some sort of simple curve fitting to a few outputs from 6S, and would therefore be expected to have a small error compared to the actual model outputs. As mentioned earlier, the manual explains a lot of things in a huge amount of detail, but is completely silent on the calculation of these coefficients. Luckily the 6S source code is available to download. Less conveniently, the source code is written in Fortran 77!

I am by no means an expert in Fortran 77 (in fact, I’ve never written any Fortran code in real life), but I’ve had a dig into the code to try and find out how the coefficients are calculated.

If you want to follow along, the code to calculate the coefficients starts at line 3382 of main.f. The actual coefficients are set in lines 3393-3397:

 xa=pi*sb/xmus/seb/tgasm/sutott/sdtott
 xap=1./tgasm/sutott/sdtott
 xb=ainr(1,1)/sutott/sdtott/tgasm
 xb=ainr(1,1)/sutott/sdtott/tgasm
 xc=sast

(strangely xb is set twice, to the same value, and another coefficient xap is set, which never seems to be used – I have no idea why!).

It’s fairly obvious from this code that there is no complicated curve fitting algorithm used – the coefficients are simply algebraic manipulations of other variables used in the model. For example, xc is set to the value of the variable sast, which, through a bit of detective work, turns out to be the total spherical albedo (see line 3354). You can check this in the 6S output: the value of xc is always the same as the total spherical albedo which is shown a few lines further up in the output file. Similarly xb is calculated based on various variables including tgasm, which is the total global gas transmittance and sdtott, which is the total downward scattering transmittance, and so on. (These variables are difficult to decode, because Fortran 77 has a limit of six characters for variable names, so they aren’t very descriptive!).

I was stumped at this point, until I thought about numerical precision. I realised that the xa coefficient has a number of zeros after the decimal point, and wondered if there might not be enough significant figures to produce an accurate output when using the formula. It turned out this was the case, but I’ll go through how I altered the 6S code to test this.

Line 3439 of main.f is responsible for writing the coefficients to the file. It consists of:

write(iwr, 944)xa,xb,xc

This tells Fortran to write the output to the file/output stream iwr using the format code specified at line 944, and write the three variables xa, xb and xc. Looking at line 944 (that is, the line given a Fortran line number of 944, which is actually line 3772 in the file…just to keep you on your toes!) we see:

  944 format(1h*,6x,40h coefficients xa xb xc                 :, 
     s           1x, 3(f8.5,1x),t79,1h*,/,1h*,6x,
     s           ' y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)',
     s               t79,1h*,/,79(1h*))

This rather complicated line explains how to format the output. The key bit is 3(f8.5,1x) which tells Fortran to write a floating point number (f) with a maximum width of 8 characters, and 5 decimal places (8.5) followed by a space (1x), and to repeat that three times (the 3(...)). We can alter this to print out more decimal places – for example, I changed it to 3(f10.8,1x), which gives us 8 decimal places. If we do this, then we find that the output runs into the *‘s that are at the end of each line, so we need to alter a bit of the rest of the line to reduce the number of spaces after the text coefficients xa xb xc. The final, working line looks like this:

  944 format(1h*,6x,35h coefficients xa xb xc            :, 
     s           1x, 3(f10.8,1x),t79,1h*,/,1h*,6x,
     s           ' y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)',
     s               t79,1h*,/,79(1h*))

If you alter this line in main.f and recompile 6S, you will see that your output looks like this:

*******************************************************************************
*                        atmospheric correction result                        *
*                        -----------------------------                        *
*       input apparent reflectance            :    0.485                      *
*       measured radiance [w/m2/sr/mic]       :  240.000                      *
*       atmospherically corrected reflectance                                 *
*       Lambertian case :      0.45439                                        *
*       BRDF       case :      0.45439                                        *
*       coefficients xa xb xc            : 0.00297362 0.20291930 0.24282509   *
*       y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)                         *
*******************************************************************************

If you then apply the formula you will find that the output of the formula, and the output of the model match – at least, to the number of decimal places of the model output.

In my tests of this, I got the following for the original 6S code:

  • Model: 0.4543900000
  • Formula: 0.4537049078
  • Perc Diff: 0.1507718536%

(the percentage difference I was getting was smaller than the questioner found – but that will just depend on the parameterisation used)

and this for my altered 6S code:

  • Model: 0.4543900000
  • Formula: 0.4543942552
  • Perc Diff: -0.0009364659%

A lot better!

For reference, to investigate this I used Py6S, the Python interface to the 6S model that I wrote. I used the following functions to automatically calculate the results using the formula from a Py6S SixS object, and to calculate the percentage difference automatically:

def calc_acr(radiance, xa, xb, xc):
    y = xa * radiance - xb
    acr = y/(1.0 + xc * y)

    return acr

def calc_acr_from_obj(radiance, s):
    return calc_acr(radiance, s.outputs.coef_xa, s.outputs.coef_xb, s.outputs.coef_xc)

def difference_between_formula_and_model(s):
    formula = calc_acr_from_obj(s.outputs.measured_radiance, s)
    model = s.outputs.atmos_corrected_reflectance_lambertian

    diff = model - formula

    perc_diff = (diff / model) * 100

    print("Model: %.10f" % model)
    print("Formula: %.10f" % formula)
    print("Perc Diff: %.10f%%" % perc_diff)

and my example errors above came from running Py6S using the following parameterisation:

from Py6S import *

s = SixS()
s.altitudes.set_sensor_satellite_level()
s.atmos_corr = AtmosCorr.AtmosCorrLambertianFromRadiance(240)
s.wavelength = Wavelength(PredefinedWavelengths.LANDSAT_OLI_B1)
s.run()
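For completeness, the comparison numbers above then come from simply calling the helper function on that SixS object once the run has finished (with Py6S pointed at either the original or the recompiled 6S executable, depending on which version is being tested):

difference_between_formula_and_model(s)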

Just as a slight addendum, if you’re atmospherically-correcting Sentinel-2 data with 6S then you might want to consider using ARCSI – an atmospheric correction tool that uses Py6S internally, but does a lot of the hard work for you. The best way to learn ARCSI is with their tutorial document.


PyCon UK 2018: My thoughts – including childcare review

As I mentioned in the previous post, I attended – and spoke at – PyCon UK 2018 in Cardiff. Last time I provided a link to my talk on xarray – this time I want to provide some general thoughts on the conference, some suggested talks to watch, and a particular comment on the creche/childcare that was available.

In summary: I really enjoyed my time at PyCon UK and I would strongly suggest you attend. Interestingly, for the first time I think I got more out of some of the informal activities than some of the talks – people always say that the ‘hallway track’ is one of the best bits of the conference, but I’d never really found this to be true before.

So, what bits did I particularly enjoy?

Talks

Of the many talks that I attended, I’d particularly recommend watching the videos of:

Other things

There were two other things that went on that were very interesting. One was a ‘bot competition’ run by Peter Ingelsby, where you had to write Python bots to play Connect 4 against each other. I didn’t have the time (or energy!) to write a bot, but I enjoyed looking at the code of the various bots that won at the end – some very clever techniques in there! Some of the details of the bots are described in this presentation at the end of the conference.

On the final day of the conference, people traditionally take part in ‘sprints’ – working on a whole range of Python projects. However, this year there was another activity taking place during the sprints day: a set of ‘Lean Coffee’ discussions run by David MacIver. I won’t go into the way this worked in detail, as David has written a post all about it, but I found it a very satisfying way to finish the conference. We had discussions about a whole range of issues – including the best talks at the conference, how to encourage new speakers, testing methods for Python code, other good conferences, how to get the most out of the ‘hallway track’ and lots more. Because of the way the ‘Lean Coffee’ works, each discussion is time-bound, and only occurs if the majority of the people around the table are interested in it – so it felt far more efficient than most group discussions I’ve been in. I left wanting to run some Lean Coffee sessions myself sometime (and, while writing this, am kicking myself for not suggesting it at a local unconference I went to last week!). I may also have volunteered myself to run some more sessions like it during the main conference next year – wait to hear more on that front.

Creche/Childcare

My wife and I wouldn’t have been able to attend PyCon UK without their childcare offer. The childcare is described on the conference website, but there isn’t a huge amount of detail. My aim in this section is to provide a bit more real-world information on how it actually worked and what it was like – along with some cute photos.

So, having said we wanted to use the creche when we booked our tickets, we got an email a few days before the conference asking us to provide our child’s name, age and any special requirements. We turned up on the first day at about 8:45 (the first session started at 9:00), not really sure what to expect, and found a room for the creche just outside of the main hall (the Assembly Room). It was a fairly small room, but that didn’t matter as there weren’t that many children.

Inside there were two nursery staff, from Brecon Mobile Childcare. They specialise in doing childcare at conferences, parties, weddings and so on – so they were used to looking after children that they didn’t know very well. They introduced themselves to us, and to our son, and got us to fill in a form with our details and his details, including emergency contact details for us. We talked a little about his routine and when he tends to nap, snack and so on, and then we kissed him goodbye and left. They assured us that if he got really upset and they couldn’t settle him (because they didn’t know him very well) then they’d call our mobiles and we could come and calm him down. We could then go off and enjoy the conference – and, in fact, the staff suggested that we shouldn’t come visiting during the breaks as that was likely to just upset him as he’d have to say goodbye to Mummy and Daddy multiple times.

I think there were something like 5 children there on the first day, ranging in age from about six months to ten years. The room had a variety of toys in it suitable for various different ages (including colouring and board games for the older ones, and soft toys and play mats for the younger ones), plus a small TV showing some children’s TV programmes (Teletubbies was on when we came in).

We came back at lunchtime and found that he’d had a good time. He cried a little when we left, but stopped in about a minute, and the staff engaged him with some of the toys. He’d had a short nap in his pram (we left that with them in the room) and had a few of his snacks. We collected him for lunch and took him down to the main lunch hall to get some food.

PyCon UK make it very clear that children are welcomed in all parts of the conference venue, and no-one looked at us strangely for having a child with us at lunchtime. Various other attendees engaged with our son nicely, and we soon had him sitting on a seat and eating some of the food provided. Those with younger children should note that there wasn’t any special food provided for children: our son was nearly 18 months old, so he could just eat the same as us, but younger children may need food bringing specially for them. There also weren’t any high chairs around, which could have been useful – but our son managed fairly well sitting on a chair and then on the floor, and didn’t make too much mess.

After eating lunch we took him for a walk in his pram around the park outside the venue, with the aim of getting him to sleep. We didn’t manage to get him to sleep, but he did get some fresh air. We then took him up to the creche room again and said goodbye, and left him to have fun playing with the staff for the afternoon.

We were keen to go to the lightning talks that afternoon, so went to the main hall at 5:30pm in time for them. Part-way through the talks, when popping to the toilet, we found one of the creche staff outside the main hall with our son. It turned out that the creche only continued until 5:30, not until 6:30 when the conference actually finished. We were a little surprised by this (and gave feedback to the organisers saying that the creche should finish when the main conference finishes), but it didn’t actually cause us much of a problem. We’d been told that children are welcome in any of the talks – and the lightning talks are more informal than most of the talks – so we brought him into the main hall and played with him at the back.

He enjoyed wandering around with his Mummy’s conference badge around his neck, and kept walking up and down the aisle smiling at people. Occasionally he got a bit too near the front, and we were asked very nicely by one of the organisers the next day to try and keep him out of the main eye-line of the speakers as it can be a bit distracting for them, but we were assured that they were more than happy to have him in the room. He even did some of his climbing over Mummy games at the back, and then breastfed for a bit, and no-one minded at all.

The rest of the days were just like the first, except that there were fewer children in the creche, and therefore only one member of staff. For most of the days there were just two children: our son, and a ten-year-old girl. On the last day (the sprints day) there was just our son, Julian. During some of these days the staff member was able to take Julian out for a walk in his pram, which was nice, and got him a bit of fresh air.

So, that’s pretty-much all there is to say about the creche. It worked very well, and it allowed both my wife and me to attend – something which isn’t possible with most conferences. We were happy to leave our son with the staff, and he seemed to have a nice time. We’ll definitely use the creche again!


PyCon UK 2018: My talk on xarray

Last week I attended PyCon UK 2018 in Cardiff, and had a great time. I’m going to write a few posts about this conference – and this first one is focused on my talk.

I spoke in the ‘PyData’ track, with a talk entitled XArray: the power of pandas for multidimensional arrays. PyCon UK always do a great job of getting the videos up online very quickly, so you can watch the video of my talk below:

https://www.youtube.com/watch?v=Dgr_d8iEWk4

The slides for my talk are available here and a Github repository with the notebook which was used to create the slides here.

I think the talk went fairly well, although I found my positioning a bit awkward as I was trying to keep out of the way of the projector, while also being in range of the microphone, and trying to use my pointer to point out specific parts of the screen.

Feedback was generally good, with some useful questions afterwards, and a number of positive comments from people throughout the rest of the conference. One person emailed me to say that my talk was "the highlight of the conference" for him – which was very pleasing. My tweet with a link to the video of my talk also got a number of retweets, including from the PyData and NumFocus accounts, which got it quite a few views.

In the interests of full transparency, I have posted online the full talk proposal that I submitted, as this may be helpful to others trying to come up with PyCon talk proposals.

Next up in my PyCon UK series of posts: a general review of the conference.


Automatic PDF calendar generation with pcal

During the Nepal earthquake response project I worked on, we were gradually getting access to historical mobile phone data for use in our analyses. I wanted to keep track of which days of data we had got access to, and which ones we were still waiting for.

I wrote a simple script to print out a list of days that we had data for – but that isn’t very easy to interpret. Far easier would be a calendar with the relevant days highlighted. I thought this would be very difficult to generate – but then I found the pcal utility, which makes it easy to produce exactly that: a nicely-formatted calendar with particular days highlighted.

I’m not going to go into huge detail here, as the pcal man page is very comprehensive – and pcal can do far more than I show here. However, to create an output like this you’ll need to put together a list of dates in a text file. Here’s what my dates.txt file looks like:

01/05/2018*
03/05/2018*
05/05/2018*
09/05/2018*
...

It is simply a list of dates (in dd/mm/yyyy format), each followed by an asterisk and a newline.
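If you already have the list of days in Python (as I did), a few lines like the following will write them out in the format pcal expects – the dates here are just examples:

import datetime

days_with_data = [
    datetime.date(2018, 5, 1),
    datetime.date(2018, 5, 3),
    datetime.date(2018, 5, 5),
    datetime.date(2018, 5, 9),
]

with open("dates.txt", "w") as f:
    for day in days_with_data:
        f.write(day.strftime("%d/%m/%Y") + "*\n")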

Then, to create the calendar, install pcal (on Linux it should be available via your package manager, on OS X it is available through brew) and run it like this:

pcal -E -s 1.0:0.0:0.0 -n /18 -b sat-sun -f dates.txt 5 2018 1 > calendar.ps

The arguments do the following:

  • -E configures pcal to use European-style dates (dd/mm/yyyy)
  • -s 1.0:0.0:0.0 sets up the highlighting colour in R:G:B format, in this case, pure red
  • -n /18 sets the font (in this case the default, so blank) and the font size (the /18 bit)
  • -b sat-sun stops Saturday and Sunday being highlighted, which is the default
  • -f dates.txt takes a list of dates from dates.txt
  • 5 2018 1 tells pcal to produce a calendar starting on the 5th month (May) of 2018, and running for one month. 5 2018 6 would do the same, but producing 6 separate pages with one month per page

This produces a PostScript file, which can be opened directly on many systems (e.g. on OS X it opens by default in Preview) or can be converted to PDF using the ps2pdf tool.

There are loads of other options for pcal – one particularly useful one is -w, which switches to a year-per-page layout, handy for getting an overview of data availability across a whole year.


Assumptions in Remote Sensing

Back in 2012, I wrote the following editorial for SENSED, the magazine of the Remote Sensing and Photogrammetry Society. I found it recently while looking through back issues, and thought it deserved a wider audience, as it is still very relevant. I’ve made a few updates to the text, but it is mostly as published.

In this editorial, I’d like to delve a bit deeper into our subject, and talk about the assumptions that we all make when doing our work.

In a paper written almost twenty years ago, Duggin and Robinove produced a list of assumptions which they thought were implicit in most remote sensing analyses. These were:

  1. There is a very high degree of correlation between the surface attributes of interest, the optical properties of the surface, and the data in the image.
  2. The radiometric calibration of the sensor is known for each pixel.
  3. The atmosphere does not affect the correlation (see 1 above), or the atmospheric correction perfectly corrects for this.
  4. The sensor spatial response characteristics are accurately known at the time of image acquisition.
  5. The sensor spectral response and calibration characteristics are accurately known at the time of image acquisition.
  6. Image acquisition conditions were adequate to provide good radiometric contrast between the features of interest and the background.
  7. The scale of the image is appropriate to detect and quantify the features of interest.
  8. The correlation (see 1 above) is invariant across the image.
  9. The analytical methods used are appropriate and adequate to the task.
  10. The imagery is analysed at the appropriate scale.
  11. There is a method of verifying the accuracy with which ground attributes have been determined, and this method is uniformly sensitive across the image.

These all come from the following paper, in which there is a far more detailed discussion of each of these: Duggin and Robinove, 1990, Assumptions implicit in remote sensing data acquisition and analysis, International Journal of Remote Sensing, 11:10, p1669.

I firmly believe that now is a very important time to start examining this list more closely. We are in an era when products are being produced routinely from satellites: end-user products such as land-cover maps, but also products designed to be used by the remote sensing community, such as atmospherically-corrected surface reflectance products. Similarly, GUI-based ‘one-click’ software is being produced which purports to perform very complicated processing, such as atmospheric correction or vegetation canopy modelling, very easily.

My question to you, as scientists and practitioners in the field, is this: have you stopped to examine the assumptions underlying the products you use? And even if you’re not using products such as those above, have you looked at your analysis to see whether it really stands up to scrutiny of its assumptions?

I suspect the answer is no – it certainly was for me until recently. There is a great temptation to use satellite-derived products without really looking into how they are produced and the assumptions that may have been made in their production process (seriously, read the Algorithm Theoretical Basis Document!). Ask yourself, are those assumptions valid for your particular use of the data?

Looking at the list of assumptions above, I can see a number which are very problematic. Number 8 is one that I have struggled with myself – how do I know whether the correlation between the ground data of interest and the image data is uniform across the image? I suspect it isn’t – but I’d need a lot of ground data to test it, and even then, what could I do about it? Of course, number 11 causes lots of problems for validation studies too. Numbers 4 and 5 are primarily related to the calibration of the sensors, which is normally managed by the operators themselves. We might not be able to do anything about it – but have we considered it, particularly when using older and therefore less well-calibrated data?

As a relatively young member of the field, it may seem like I’m ‘teaching my grandparents to suck eggs’, and I’m sure this is familiar to many of you. Those of you who have been in the field a while have probably read the paper – more recent entrants may not have done so. Regardless of experience, I think we could all do with thinking these through a bit more. So go on, have a read of the list above, maybe read the paper, and have a think about your last project: were your assumptions valid?

I’m interested in doing some more detailed work on the Duggin and Robinove paper, possibly leading to a new paper revisiting their assumptions in the modern era of remote sensing. If you’re interested in collaborating with me on this then please get in touch via robin@rtwilson.com.


BankClassify: simple automatic classification of bank statement entries

This is another entry in my ‘Previously Unpublicised Code’ series – explanations of code that has been sitting on my Github profile for ages, but has never been discussed publicly before. This time, I’m going to talk about BankClassify, a tool for automatically classifying transactions on bank statements into categories like Supermarket, Eating Out and Mortgage. It is an interactive command-line application that looks like this:

For each entry in your bank statement, it will guess a category, and let you correct it if necessary – learning from your corrections.

I’ve been using this tool for a number of years now, as I never managed to find another tool that did quite what I wanted. I wanted to have an interactive classification process where the computer guessed a category for each transaction but you could correct it if it got it wrong. I also didn’t want to be restricted in what I could do with the data once I’d categorised it – I wanted a simple CSV output, so I could just analyse it using pandas. BankClassify meets all my needs.

If you want to use BankClassify as it is written at the moment then you’ll need to be banking with Santander – as it can only import text-format data files downloaded from Santander Online Banking. However, if you’ve got a bit of Python programming ability (quite likely if you’re reading this blog) then you can write another file import function, and use the rest of the module as-is. To get going, just look at the README in the repository.

So, how does this work? Well it uses a Naive Bayesian classifier – a very simple machine learning tool that is often used for spam filtering (see this excellent article by Paul Graham introducing its use for spam filtering). It simply splits text into tokens (more on this later) and uses training data to calculate probabilities that text containing each specific token belongs in each category. The term ‘naive’ is used because of various naive, and probably incorrect, assumptions which are made about independence between features, using a uniform prior distribution and so on.

Creating a Naive Bayesian classifier in Python is very easy, using the textblob package. There is a great tutorial on building a classifier using textblob here, but I’ll run quickly through my code anyway:

First we load all the previous data from the aptly-named AllData.csv file, and pass it to the _get_training function to get the training data from this file in a format acceptable to textblob. This is basically a list of tuples, each of which contains (text, classification). In our case, the text is the description of the transaction from the bank statement, and the classification is the category that we want to assign it to. For example ("CARD PAYMENT TO SHELL TOTHILL,2.04 GBP, RATE 1.00/GBP ON 29-08-2013", "Petrol"). We use the _extractor function to split the text into tokens and generate ‘features’ from these tokens. In our case this is simply a function that splits the text by either spaces or the '/' symbol, and creates a boolean feature with the value True for each token it sees.

Now we’ve got the classifier, we read in the new data (_read_santander_file) and the list of categories (_read_categories) and then get down to the classification (_ask_with_guess). The classification just calls the classifier.classify method, giving it the text to classify. We then do a bit of work to nicely display the list of categories (I use colorama to do nice fonts and colours in the terminal) and ask the user whether the guess is correct. If it is, then we just save the category to the output file – but if it isn’t we call the classifier.update function with the correct tuple of (text, classification), which will update the probabilities used within the classifier to take account of this new information.
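Stripped of all the plumbing, the core of this approach looks something like the sketch below – the training data and the new transaction are invented examples, and the real code lives in the functions described above:

from textblob.classifiers import NaiveBayesClassifier

def extractor(text):
    # Split on spaces or '/' and create a boolean feature for each token
    tokens = text.replace("/", " ").split()
    return {token: True for token in tokens}

# Training data: (description, category) tuples built from AllData.csv
train = [
    ("CARD PAYMENT TO SHELL TOTHILL,2.04 GBP, RATE 1.00/GBP ON 29-08-2013", "Petrol"),
    ("CARD PAYMENT TO SAINSBURYS S/MKTS,15.20 GBP", "Supermarket"),
]

classifier = NaiveBayesClassifier(train, feature_extractor=extractor)

new_description = "CARD PAYMENT TO SHELL WINCHESTER,20.00 GBP"
guess = classifier.classify(new_description)          # hopefully 'Petrol'

# If the user corrects the guess, update the classifier with the right answer
classifier.update([(new_description, "Petrol")])

(The ‘most informative features’ listing further down can be produced with the classifier’s show_informative_features method.)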

That’s pretty-much it – all of the rest of the code is just plumbing that joins all of this together. This just shows how easy it is to produce a useful tool, using a simple machine learning technique.

Just as a brief aside, you can do interesting things with the classifier object, like ask it to tell you what the most informative features are:

Most Informative Features
                      IN = True           Cheque : nan    =      6.5 : 1.0
              UNIVERSITY = True           Cheque : nan    =      6.5 : 1.0
                 PAYMENT = None           Cheque : nan    =      6.5 : 1.0
                COWHERDS = True           Eating : nan    =      6.5 : 1.0
                    CARD = None           Cheque : nan    =      6.5 : 1.0
                  CHEQUE = True           Cheque : nan    =      6.5 : 1.0
        TICKETOFFICESALE = True           Travel : nan    =      6.5 : 1.0
             SOUTHAMPTON = True           Cheque : nan    =      6.5 : 1.0
                   CRAFT = True            Craft : nan    =      4.3 : 1.0
                     LTD = True            Craft : nan    =      4.3 : 1.0
                   HOBBY = True            Craft : nan    =      4.3 : 1.0
                    RATE = None           Cheque : nan    =      2.8 : 1.0
                     GBP = None           Cheque : nan    =      2.8 : 1.0
              SAINSBURYS = True           Superm : nan    =      2.6 : 1.0
                WAITROSE = True           Superm : nan    =      2.6 : 1.0

Here we can see that tokens like IN, UNIVERSITY, PAYMENT and SOUTHAMPTON are highly predictive of the category Cheque (as most of my cheque pay-ins are shown in my statement as PAID IN AT SOUTHAMPTON UNIVERSITY), and that CARD not existing as a feature is also highly predictive of the category being cheque (fairly obviously). Names of supermarkets also appear there as highly predictive for the Supermarket class and TICKETOFFICESALE for Travel (as that is what is displayed on my statement for a ticket purchase at my local railway station). You can even see some of my food preferences in there, with COWHERDS being highly predictive of the Eating Out category.

So, have a look at the code on Github, and have a play with it – let me know if you do anything cool.


Android + OpenCV = Confusion…

This is another post that I found sitting in my drafts folder… It was written by my wife while she was doing her Computer Science MSc about 18 months ago. I expect that most of what she says is still correct, but things may have changed since then. Also, please don’t comment asking questions about Android and OpenCV – I have no experience with it, and my wife isn’t writing for Android these days.

Hello, I’m Olivia, Robin’s wife, and I thought I’d write a guest post for this blog about how to get Android and OpenCV playing nicely together.

Having just started writing for Android, I was astonished at the number of separate tutorials I had to follow just to get started. It didn’t help that:

  • I had never programmed for Android
  • I had never programmed in Java (yes, I know)
  • I had never used a complex code building IDE
  • I needed to get OpenCV working with this

Needless to say I was a little flummoxed by the sheer amount of things that I had to learn. The fact that methods can only have one return type still bewilders me (coming from a Python background, where methods can easily return loads of different objects, of all sorts of different types).

So for those of you who are in a similar situation I thought I’d provide a quick start guide which joins together the tutorials I found the best to create your first Android app using OpenCV.

First things first, you will need to have the following downloaded:

  • Android Studio
  • OpenCV for Android (as a zip file)

First start by following Google’s first Android tutorial Building Your First App through to Starting Another Activity (if you get to Supporting different devices then you’ve gone too far). This takes you through setting up a project, the files you will need to edit, setting up an emulator (or using an actual Android device), running the app, creating a user interface and, possibly most importantly, Android ‘Activities’. It also gives a brilliant introduction in the sidebars to the concepts involved. If you just want to get things working with OpenCV, the only part you need to follow is up to Running Your Application.

To get OpenCV working within Android Studio you need to have unzipped the download. Then follow the instructions from this blog post, using whichever OpenCV version you have. This should get everything set up properly, but now we need to check everything has worked and that our app compiles.

Add these import statements at the top of your activity file:

import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;

You should then add this code within the Activity class before any other methods.

static {
    // If you use opencv 2.4, System.loadLibrary("opencv_java")
    System.loadLibrary("opencv_java3");
}

Then add the following into the onCreate method at the bottom:

Mat test = new Mat(200, 200, CvType.CV_8UC1);
Imgproc.equalizeHist(test, test);

(the example code above was taken from here).

If you now try and run this, it should work; if it doesn’t, that suggests OpenCV isn’t properly loaded. If you get stuck, the comments on this blog post are fairly comprehensive – your problem has probably already been solved!

If you understand all of this and just want to have a ready made ‘project template’ to use all of this then I have put one together at https://github.com/oew1v07/AndroidOpenCVTemplate where all the libraries are already in the required folders. To get this working follow these steps:

  1. Clone the repository using git clone https://github.com/oew1v07/AndroidOpenCVTemplate.git
  2. Open Android Studio and choose Open an existing Android Studio project
  3. Navigate to the cloned folder and click Open.
  4. Using the Project button on the left side, navigate to Gradle Scripts/build.gradle (Module: app). The settings here are not necessarily the ones that you will want to use. The following example (from here) shows the areas you might want to change.
    • compileSdkVersion is the Android version you wish to compile for
    • buildToolsVersion are the build tools version you want to use
    • minSdkVersion is the minimum Android version you want the app to run under
    • targetSdkVersion is the version that will be the most commonly used.
    • applicationId should reflect the name of your app. However this is very difficult to change after the initial setting up. The one in the cloned repository is com.example.name.myapplication. Should you wish to change this there is a list of places you would need to edit, in a list at the bottom of this article.
apply plugin: 'com.android.application'

android {
    compileSdkVersion 22
    buildToolsVersion "22.0.1"

    defaultConfig {
        applicationId "com.haidermushtaq.bullincarnadine.piclone"
        minSdkVersion 15
        targetSdkVersion 22
        versionCode 1
        versionName "1.0"
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
}

dependencies {
    compile fileTree(include: ['*.jar'], dir: 'libs')
    compile 'com.android.support:appcompat-v7:22.1.1'
}
  5. In the file local.properties you need to change sdk.dir to the directory where the Android SDK is installed on your computer.
  6. Once this is finished you should be able to run it with no problems, and it should display a text message saying “Elmer is the best Elephant of them all!” (for an explanation of this message, see here)

Hopefully this was helpful to other people in a similar situation to me – and brought all of the various tutorials together.

List of places to change app name

These files and folders can be navigated from the cloned repository.

In each of the files/folders below, change the text (or folder name) shown before the arrow to the text shown after it:

  • app/build.gradle: com.example.name.myapplication → whatever you want this to be called (generally something along the lines of com.example.maker.appname)
  • app/src/androidTest/java/com/example/: rename the folder ‘name’ → maker
  • app/src/androidTest/java/com/example/name: rename the folder ‘myapplication’ → appname
  • app/src/androidTest/java/com/example/name/myapplication/ApplicationTest.java: package com.example.name.myapplication → package com.example.maker.appname
  • app/src/main/AndroidManifest.xml: package com.example.name.myapplication → package com.example.maker.appname
  • app/src/main/java/com/example/: rename the folder ‘name’ → maker
  • app/src/main/java/com/example/name: rename the folder ‘myapplication’ → appname
  • app/src/main/java/com/example/name/myapplication/MyActivity.java: package com.example.name.myapplication; → package com.example.maker.appname
  • app/src/main/res/layout/activity_display_message.xml: tools:context="com.example.name.myapplication.DisplayMessageActivity" → tools:context="com.example.maker.appname.DisplayMessageActivity"
  • app/src/main/res/layout/activity_my.xml: tools:context="com.example.name.myapplication.MyActivity" → tools:context="com.example.maker.appname.MyActivity"
  • app/src/main/res/layout/content_display_message.xml: tools:context="com.example.name.myapplication.DisplayMessageActivity" → tools:context="com.example.maker.appname.DisplayMessageActivity"
  • app/src/main/res/menu/menu_my.xml: tools:context="com.example.name.myapplication.MyActivity" → tools:context="com.example.maker.appname.MyActivity"
  • app/src/main/res/values/strings.xml: if you want to change the name of the app (as shown on the phone), change <string name="app_name">My App</string> → whatever you want to call it
  • app/src/test/java/com/example/: rename the folder ‘name’ → maker
  • app/src/test/java/com/example/name: rename the folder ‘myapplication’ → appname
  • app/src/test/java/com/example/name/myapplication/ExampleUnitTest.java: package com.example.name.myapplication; → package com.example.maker.appname

 


Cloud frequency map for Europe & update to cloud frequency web app

I’ve posted before about the cloud frequency map that I created using Google Earth Engine. This post is just a quick update to mention a couple of changes.

Firstly, I’ve produced some nice pretty maps of the data from 2017 over Europe and the UK respectively. I posted the Europe one to the DataIsBeautiful subreddit and got quite a few upvotes, so people obviously liked the visualisation. The two maps are below – click on the images to get the full resolution copies.

Interestingly, you can see quite a lot of artefacts around the coast – particularly in the UK one. I think this is a problem with the algorithm that occurs around coasts – or at least discontinuities from the different algorithms used over land and water.

I’ve also updated the interactive cloud frequency web app to use data from 2017.


Attending EGU in a wheelchair

This is an old post that I found stuck in my ‘drafts’ folder – somehow I never got round to clicking ‘publish’. I attended EGU in 2016, and haven’t been back since – so things may have changed. However, I suspect that the majority of this post is still correct.

Right, so, in case you hadn’t guessed from the title of this post: I use a wheelchair. I won’t go into all of the medical stuff, but in summary: I can’t walk more than about 100m without getting utterly exhausted, so I use an electric wheelchair for anything more than that. This has only happened relatively recently, and I got my new electric wheelchair about a month ago.

For a number of years I’d been trying to go to either the European Geophysical Union General Assembly (EGU) or the American Geophysical Union Fall Meeting (AGU), but I either hadn’t managed to get any funding, or hadn’t been well enough to travel.

This year, I’d had two talks accepted for oral presentation at EGU, and had also managed to win the Early Career Scientists Travel Award, which would pay my registration fees. So, if I was going to go, then this year was the right time to do it… I decided to ‘bite the bullet’ and go – and it actually went very well.

However, before I went I was quite worried about the whole process: I hadn’t flown with my wheelchair before, I didn’t know how accessible the conference centre would be, I was worried about getting too tired, or getting rude comments about being in a wheelchair, and so on. The rest of this post is going to be very detailed – and some of you may wonder why on earth I’ve gone in to so much detail. The reason is explained very well in this post by Hannah Ensor:

Friends, family – and even strangers – often want to be supportive. The most common phrase is: “Don’t worry, I’m sure it will be fine.” After all, accessibility is a legal requirement so that shouldn’t be a problem. And you want to make me feel better about the trip.

But think about it: Do you really understand all my needs and differences, and have an equally detailed knowledge of everything that might present challenges, and suitable solutions to each one from the moment I leave my home until I return to it again? Have you inspected the accessible loos and checked the temperature control in the rooms? Do you realise how many places that call themselves ‘accessible’ have steps to the bathroom or even steps to the entrance?

Most of the information I could find before going was of this ‘everything will be fine; it is accessible’ type – and so that’s why I am putting all of the details in this post: hopefully it will make someone else’s life far easier in the future. Although I’m writing primarily from the point of view of a wheelchair user, I’ve also tried to think about some of the issues that people with other disabilities may experience.

I’ll start with the things that will be most generally applicable – in this case, the venue. EGU is held at the Austria Centre Vienna, close to the Danube in Vienna. The EGU website helpfully stated “The conference centre is fully-accessible”, but gave no further details – and most disabled people have learnt from painful experience not to trust these sorts of statements.

Luckily, that statement was actually true. Most people will get to the conference centre from the local U-bahn station (Kaisermühlen-VIC – which, like all of the U-bahn stations, has lifts to each platform) or one of the local hotels. There are various sets of steps along the paths leading to the conference centre, but there are always nice long sloping ramps provided too:

The entrance to the conference centre is large and flat. There are automatic doors into the ‘entrance hall’, and then push/pull doors into the conference centre itself (they are possible to open in a wheelchair, but most of the time people held them open for me).

Some of the poster halls are in a separate building (although they can be accessed by an underground link from the main conference centre). The entrance here isn’t totally flat: there is a small, but significant bump (‘mini step’) which my electric wheelchair didn’t like. Going in the door backwards worked, but that can be a bit difficult if there are lots of people around.

Each floor of the conference centre is entirely on the level, and there are multiple sets of lifts (two in each set, and I think there are three locations in the buildings):

The lifts are of a reasonable size, with enough space for me to turn my wheelchair around inside them. I could reach the buttons inside the lifts, but people who have shorter arms than me might struggle to reach from their chair. Also, as far as I could see, there were no braille markings on any of the buttons, which would make it difficult for a visually-impaired person to use the lifts.

The other minor issue with the lifts was that it was sometimes difficult to reach the call buttons for the lifts, as a set of recycling bins were usually located directly in front of the call buttons (I have no idea why…). I could reach ok most of the time, but people with shorter arms than me would struggle.

One thing that you may have noticed from the pictures above is how well-signed everything is: this was a really pleasing aspect of the conference organisation. As you may also have noticed from the maps, the building is symmetrical in a number of axes, so it could be hard to work out where you were (everything kinda looked the same…) – so the maps and signs were much appreciated!

The rooms that were actually used for the talks varied significantly in size from small ‘classroom/seminar room’ size to large ‘auditorium’ size. I didn’t manage to get many photos of the rooms because I was usually busy either listening to a talk or giving a talk, but here is an example of one of the smaller rooms:

As you can see, the floor is entirely flat here, so it is very easy to get anywhere you need to get to. When listening to talks in these sorts of rooms I tended to place my wheelchair at the end of one of the rows of seats, or – if appropriate – ask someone to move one of the seats out of the way to give me space to fit into a row properly.

When I gave a talk in one of these sorts of rooms, I spoke from my wheelchair at the front of the room (directly underneath the projection screen), with a portable microphone and a remote to change the slides. I didn’t use the official lectern as in my chair I would have been hidden entirely behind it – and I’m not sure the microphone would have reached properly!

I haven’t got a photo of any of the larger rooms, but they have a stage at the front – which obviously makes things a bit more difficult for wheelchair users. I was offered a range of ways to present in that room: from my wheelchair on the main floor (i.e. not up on the stage), or walking up the stairs to the stage and presenting from a seat behind the lectern, or presenting from a seat behind the convenor’s table on the stage. I chose the latter option, with a microphone and laptop to control the slides – but any of them would have worked.

Each time when trying to sort out the arrangements for my talk, I found the ‘people in yellow’ (the EGU assistants in yellow t-shirts who sort out the presentations, laptops and so on) to be very helpful in arranging anything I needed.

I didn’t do a PICO presentation, but I attended a number of PICO sessions, and was impressed to see that each ‘PICO Spot’ had a lower screen for wheelchair users:

The exhibition area was generally accessible with two unfortunate exceptions: both the Google Earth Engine stand and the EGU stand were on a raised platform about 3 inches off the floor…very frustrating! I expressed my frustration to the people on the EGU stand and was assured that this wouldn’t be the case next year. All of the rest of the stands were flat on the ground and I could access them very easily.

I’ve left one of the most important things to last…a real essential item: disabled toilets. As many disabled people will know, disabled toilets can leave a lot to be desired. However, I was generally impressed with the conference centre’s toilets:

It’s difficult to show the full size of the toilet without a fisheye lens, but they were large enough to get my wheelchair in and still have a fair amount of space to move around (far better than the sort of ‘disabled’ toilets that will barely fit a wheelchair!). They were also clean, nicely decorated, and had all of the extra handles and arm-rests that should be present. What’s more, the space next to the toilet itself was kept free so that if you needed to transfer directly from a chair to the toilet then that would be possible.

All of the toilets in the main part of the conference centre were like the example above – very impressive – but unfortunately the toilets in the other building (which contained poster halls X1-4) weren’t as good. They were still better than some toilets I’ve used, but they had turned into a bit of a store-room, making them cluttered and difficult to manoeuvre around, and also stopping anyone from transferring directly from a chair to the toilet. In this building the disabled toilets were also located in such a way that the open door to the disabled toilet would block the entrance to one of the ‘normal’ toilets…and this meant that when you opened the door to come out of the toilet it was quite easy to almost knock someone over!

In summary, things worked remarkably well, and accessibility was good. I would have no hesitation in attending EGU again, and using my wheelchair while there.