Robin's Blog

Travel times, over time

March 6, 2023

A fun analysis I did a while back was using the Google Maps API to look at travel times between certain locations over time. I originally got interested in this because I found that travelling from my house to the university (yes, that’s how long ago this started…) seemed to either take a very short time, or a very long time, but rarely anything in the middle. I wondered if the histogram of travel times might be bi-modal – so I thought I’d investigate. This then led to doing various other analyses of local travel times.

It was actually very easy to gather the data for this. Google Maps will give you an estimated travel time for any route that you plan, and I’ve usually found these estimated times quite accurate – so I’ve relied on these for my data here. There is a googlemaps package for Python that wraps the Google Maps API and you can get the travel time between two locations using code like this:

def get_travel_time(from_loc, to_loc):
    now = datetime.now()
    directions_result = gmaps.directions(from_loc,
                                         to_loc,
                                         mode="driving",
                                         departure_time=now)

    return directions_result[0]['legs'][0]['duration_in_traffic']['value'] / 60

Back when I did this, the Google Maps API didn’t require authentication for most uses, and had a very generous free tier. This changed after a while, and now I suspect you’d need to give an API key, and have a credit card set up on your API account, and so on.

Anyway, as you can see, this code is just one API call, and then extracting the ‘duration_in_traffic’ from the result (it comes in seconds, we convert it to minutes).

To get a dataset of travel times over time, you just need to run this regularly (using chron, or equivalent), giving a sensible set of from and to locations. These locations can be anything that Google Maps recognises: an address, a lat/lon pair, a business etc. One warning is that you must pick the from and to locations carefully if you’re starting on a dual carriageway or a motorway: if you give the starting location on the wrong carriageway (just a small change in the latitude and longitude values) then your route will be wrong as it will show you travelling down that carriage way until a junction where you can turn onto the correct carriageway.

So, lets look at some results:

Home to University histograms

Looking at my initial question, here is a histogram of travel times from my home to the university (click on any image to see a larger version):

You can see that my hypothesis was wrong: the distribution is not bimodal, but it is definitely one-sided. The peak is around 11-12 minutes, and then there is a long tail extending to the right as delays increase, with a few journeys taking almost double that time.

Southampton to Bournemouth

A more interesting analysis is the travel time between Southampton and Bournemouth (for those not familiar with UK geography, Southampton is a city on the south coast of the UK but without a beach, and Bournemouth has a nice beach about 45mins drive from Southampton). Again, this idea came from personal experience: I was going over to the Bournemouth area relatively frequently during the spring/summer to go to the beach, and was interested in how long it would take. It’s well-known that the traffic between Southampton and Bournemouth is particularly bad on a summer weekend, and I wondered how bad it was on bank holidays.

So, I plotted travel time between Southampton and Bournemouth on normal Mondays, compared to bank holiday Mondays:

There are a few interesting patterns in this graph: firstly, and most obviously, there is a big difference in travel times on bank holidays – travel in mid-morning can take over 50% longer than on a normal Monday. You can see the rush hour peaks on normal Mondays for journeys leaving at around 7am and 4:30pm. These peaks are still there on bank holiday Mondays, but are significantly smaller. The travel time on a bank holiday starts to exceed a normal day from around 8-9am, and reaches its peak for journeys starting from Southampton around 11am (after all, who wants to get up early on a bank holiday!).

Looking at the reverse journey, from Bournemouth back to Southampton, you can see a far broader peak on bank holidays:

Again, there is a standard rush hour peak (though less well-defined) on normal Mondays at around 7am and 4-5pm, but the bank holiday peak starts around 10am (presumably for people travelling from Bournemouth for some other reason than to go directly to Southampton), and then there is a broad peak starting in the early afternoon of people travelling back to Southampton, extending right up until around 8-9pm.

So, from a practical purpose, if you’re going to the beach in Bournemouth on a bank holiday Monday, when should you leave Southampton? Traffic is probably best if you leave before 8am (a bit early for a bank holiday!) or after 3pm, and you’re likely to hit the traffic if you come home any time after lunch.

Just for amusement purposes, I decided to look at the travel time between Southampton and Bournemouth on Fridays, Saturdays and Sundays and compare it to the temperature. I got the temperature data for Bournemouth from the Weather Underground API (which doesn’t seem to work any more, as I found when I tried to re-run this analysis). Plotting temperature against travel time gives this graph:

The straight line fit doesn’t look great, but apparently it has a R^2 of 0.67!

The Avenue in Southampton

Finally, let’s look at The Avenue in Southampton. This is a long, straight road running north-south from the end of the M3 at the north of Southampton, right down to the city centre in the south. It gets fairly busy during rush hour, so I thought I’d look at travel time along it. Here’s the graph:

Firstly, you can see that it is quicker to travel southbound on the Avenue than it is to travel northbound. I think this is due to the way that traffic lights and junctions wait: there are various right turns when travelling northbound that cause traffic to queue, and travelling southbound most junctions are either No Right Turn, or have filter lanes for right turns.

Also, the peak in the morning is offset: travel time peaks earlier for traffic going northbound, and later for traffic going southbound. I assume this is because traffic going northbound is leaving Southampton to go somewhere else, so it is the beginning of their journey, whereas traffic going southbound is arriving in Southampton as their destination. There are some other interesting patterns on this graph that I haven’t quite worked out yet – so I’ll leave you to ponder them.

1 Comment

Beach wheelchairs at Sandown, Isle of Wight and Barry, South Wales

January 17, 2023

As you may be aware, I use a wheelchair for anything over very short distances. This can rather limit my ability to enjoy going to the beach, as wheelchairs don’t really work on sand…

There are some ways around this – for example, finding a beach where I can park my wheelchair on the promenade and get down onto the beach easily, and going at high tide I don’t have to walk very far to the sea – but even in these situations I find it tires me out a lot. However, there is a better way…

Some beaches have ‘beach wheelchairs’ that you can borrow. They have big inflatable tyres so they don’t sink into the sand, they’re waterproof, and they even float! Here is an example photo of me using a beach wheelchair:

Various beaches around the country have these wheelchairs available to borrow, but it’s quite difficult to find out where they are and how to get access to them. This blog will explain how to find and access the beach wheelchairs in Sandown, Isle of Wight and Barry, South Wales.

Sandown, Isle of Wight

Sandown has a lovely sandy beach on the south-eastern coast of the Isle of Wight. They have two beach wheelchairs, and they are available from the lifeguards in the middle of the section of the beach to the north-east of the pier. They are only available when the lifeguards are on duty (certain hours during the summer months). Go to the lifeguard station, which is an elevated platform at the top of the beach, marked on the map below (click to enlarge) and linked on Google Maps here, and ask for the wheelchairs. It may take them a while to find the relevant keys and get cover for lifeguarding so they can go to their storage room, but they should then come back with the life guard. One of the Sandown wheelchairs is shown in the photo above.

Other tips: there is parking along the road that runs at the top of the beach, free for Blue Badge holders, and there is a nice ice cream shop just opposite the parking bays.

Barry, South Wales

Barry also has a lovely sandy beach, in a bay in the area called Barry Island, just to the west of Cardiff. They have at least four beach wheelchairs, possibly more. They are stored in a room off the Changing Places toilet at the top of the beach, near the funfair.

To find them, go to the toilets behind Marco’s Cafe (see arrow on the map below – click to enlarge), or this location on Google Maps. You can get to the toilets by going either to the left or right of Marco’s cafe, and then around the back. You will find a circular building with standard disabled toilets in it, as well as steps up to normal male/female toilets. Behind the circular building is a rectangular building with a ramp up to one of the doors. This is where the Changing Places toilet (with a hoist etc) is located. Have a look around for a toilet attendant – they will often be in a little store-room at the back of the circular building, or they may be in one of the other toilets, or hanging around the general area. If you ask them for the beach wheelchairs they will open up the Changing Places toilet and then unlock a separate room inside. The beach wheelchairs will be there, and you should also be able to leave your normal wheelchair there to collect later.

I don’t know whether the wheelchairs here are available all the time, or just in the summer season (a lot of things in Barry close down for the winter).

Other tips: There is free Blue Badge parking on some of the roads nearby, particularly the parking bays off Friar’s Rd. Zio’s Gelateria is nice, and Boofy’s Fish and Chip shop does gluten-free.

Share your thoughts

Karabiner Elements and Goku for custom keyboard shortcuts in MacOS

January 11, 2023

I came across Karabiner Elements a number of years ago when trying to find a way to get easy access to the # symbol on my MacBook Pro keyboard. I’m not sure why, but it seems that using a UK keyboard layout on a MBP means that to get # you have to press some weird two-key combination – and I could never remember it. As I program in Python, being able to easily use # is fairly important!

So, I managed to get this set up in Karabiner Elements – but a couple of years ago I came across a better way to configure it, which enabled me to easily create a load of extra keyboard shortcuts.

A lot of this post is really just reminding me how I did this – but hopefully it’ll be useful to other people.

The key tool here is Goku which lets you write simple short configuration files which are translated into a complicated Karabiner Elements configuration file.

To get going, install Karabiner Elements and then follow these instructions to install Goku, and set up Karabiner Elements properly.

Then you’ll need to write your Goku configuration file, which you should place in ~/.config/karabiner.edn. Mine is below, and I’ll try and take you through it (I am not an expert at Goku configuration though!).

{:layers {:caps_lock-mode {:key :caps_lock}},
 :main [{:des "capskey",
         :rules [:caps_lock-mode
                 [:open_bracket :!!open_bracket]
                 [:close_bracket :!!close_bracket]
                 [:semicolon :!!semicolon]
                 [:quote :!!quote]
                 [:comma :!!comma]
                 [:period :!!period]
                 [:slash :!!slash]
                 [:a :!!a]
                 [:b ["osascript ~/code/applescript/switch_to_chrome.scpt"]]
                 [:c ["osascript ~/code/applescript/open_with_vscode.scpt"]]
                 [:d :!!d]
                 [:e [:x :delete_forward :left_arrow :down_arrow]]
                 [:f :!!f]
                 [:g :!!g]
                 [:h :!!h]
                 [:i :!!i]
                 [:j :!!j]
                 [:k :!!k]
                 [:l :!!l]
                 [:m :!!m]
                 [:n :!!n]
                 [:o :!!o]
                 [:p :!!p]
                 [:q :!!q]
                 [:r :!!r]
                 [:s :!!s]
                 [:t :!!t]
                 [:u :!!u]
                 [:v :!!v]
                 [:w :!!w]
                 [:x :!!x]
                 [:y :!!y]
                 [:z :!!z]
                 [:1 :!!1]
                 [:2 :!!2]
                 [:3 :!!3]
                 [:4 :!!4]
                 [:5 :!!5]
                 [:6 :!!6]
                 [:7 :!!7]
                 [:8 :!!8]
                 [:9 :!!9]
                 [:0 :!!0]]}
                 {:des "top left to hash" :rules [[:non_us_backslash :!O3]]}
                 {:des "wireless presenter" :rules [[:dictation :!!P]]}]}

Let’s start from the bottom: the penultimate rule (starting with :des) remaps the non-US-backslash key (which is the top left key directly below Escape on my MBP keyboard) to the key press Option-3, which is what gets the # symbol on my keyboard (I always have to look that up…).

The final rule is for my Logitech Wireless presenter remote, which connects via a USB dongle and acts as a keyboard. Unfortunately, when pressing the laser pointer button, it sends some sort of key press as well – and that seems to stop videos playing when presenting a Powerpoint presentation. Very inconvenient when you want to use the laser pointer to point out something in a video! So, this rule just remaps that keypress (apparently the ‘dictation’ button) to an unused key combination. In this case, that is !!P which is Cmd-Ctrl-Option-Shift-P – a key combination which is rather unlikely to already be used…! (There is documentation on what these codes like !! means here).

The rest of the rules set up a Caps Lock mode which means Caps Lock can be used normally if you just tap it by itself, but can also be used as a modifier key to allow it to be used with other keys to trigger shortcuts. This works quite well because Caps Lock is on the ‘home row’ so is easy to press with other keys. Most of these remappings just map Caps Lock plus another key to !!<key> – which, as mentioned above, is Cmd-Ctrl-Option-Shift plus the key. This may not seem very useful, but these you can then use this new complicated (but easy to press) key combination as a user-defined shortcut for various things.

For example, I use Alfred, and it has a clipboard history/snippet tool. I’ve assigned that to Caps Lock-T, so pressing that will bring up the tool. This new shortcut won’t conflict with any others (like Ctrl-T or Cmd-T) because nothing by default uses all the modifiers at once.

Another shortcut I use often is Caps Lock-Y to add the selected files to Yoink – this is just configured in the standard Mac Keyboard Shortcuts settings, as Yoink adds a configurable option there under the Finder category.

Most of the rules just map a key to !!<key> but a few do other things. You can run command-line tools directly from Goku/Karabiner Elements, so a Caps Lock-B is remapped to run a small AppleScript that switches to Chrome, and Caps Lock-C is remapped to a small AppleScript that opens the selected file in VS Code. Finally, Caps Lock-E is remapped to a little macro that types x and then deletes the character in front of the cursor, moves left and moves down. This was a useful shortcut for filling in long lists of Github-Flavoured Markdown todo lists, which were often used for PR checklists in a previous job.

So, that’s how I use Goku and Karabiner Elements to get useful keyboard shortcuts – I hope this has been helpful. Just remember that once you’ve edited your karabiner.edn file, you need to either run goku or have it already running as a service – otherwise you’ll wonder why your modifications didn’t work.

Note: I’m trying to get back into blogging, after a very busy time at work in the last couple of years meaning that I haven’t had much time. Hopefully this will be the first of some more regular posts.

1 Comment

Py6S v1.9.0 released – plus code cleanup and how I got M1 support for the underlying 6S model

June 15, 2021

Last week I released version 1.9.0 of Py6S – my Python interface to the 6S radiative transfer model.

It’s been the first non-bugfix release for quite a while (mainly because I’ve been busy with paid work), and so I just wanted to highlight a few aspects of the release.

The full release notes are available here, and they include a couple of minor breaking changes (with very specific exceptions being raised if you run into them). Specifically, the pysolar dependency has been updated to the latest version (old versions are not supported as the pysolar API changed significantly between versions), and ambiguous dates in Geometry.from_time_and_location has been dealt with by requiring date-times to be passed in ISO 8601 format. We also no longer require the matplotlib dependency, which makes Py6S a lot easier (and quicker) to install if graphical output is not required.

Code cleanup

More importantly, from my perspective, the Py6S code has had a significant clean-up. The code was relatively high-quality in terms of its operation, but it wasn’t formatted very well, and didn’t use various Python best practices. My excuse for this is that it was first written a long time ago, and I haven’t had time to update it…until now!

As part of this release, I ran the black formatter on the code-base, and used isort to sort the imports. I then added a pre-commit configuration to run black and isort every time I commit. The following is the contents of my .pre-commit-config.yaml file:

repos:
  - repo: https://github.com/python/black
    rev: 20.8b1
    hooks:
      - id: black
        language_version: python3.7
  - repo: https://github.com/asottile/seed-isort-config
    rev: v2.2.0
    hooks:
    -   id: seed-isort-config
  - repo: https://github.com/timothycrosley/isort
    rev: 5.5.2  # pick the isort version you'd like to use from https://github.com/timothycrosley/isort/releases
    hooks:
    -   id: isort

This will run black and isort on every commit, and also includes an isort ‘seed’ step that seeds a list of third-party modules that are used, which isort uses internally for some of its operations.

When writing Py6S I’d used a lot of * imports – like from Py6S import *. This is not best practice (and wasn’t even best practice when I wrote it!) and so I wanted to remove these – and I found a handy tool to do this: removestar. Simply run this on your source files, and it automatically replaces * imports with imports of the actual objects that you use from that module.

I also went through and fixed all flake8 errors across the whole project.

M1 support for 6S

Another recent change has been the addition of support for running the underlying 6S model on the Apple M1 processor (sometimes referred to as osx-arm64). I was expecting Apple to announce new MacBook Pros with M1 (or similar) processors at WWDC last week – so wanted to make sure that I could run 6S on a new machine (as I was intending to purchase one). Unfortunately, these laptops weren’t announced – but it’s good to get this working for other people anyway.

The recommended way to install Py6S involves installing 6S using conda, with the package hosted on conda-forge – and conveniently conda-forge provide instructions for setting up M1 builds for conda-forge projects. Building 6S on the M1 is a little difficult, as there is no Fortran compiler available for the M1’s at the moment – so a complex cross-compilation step is required.

I’m glad to say that the instructions provided by conda-forge worked really easily. As 6S is not a Python package, I could ignore the long list of additional build dependencies to add, and simply had to add

build_platform:
  osx_arm64: osx_64
test_on_native_only: true

to the conda-forge.yml file and then make sure that ${CMAKE_ARGS} was used in the call to cmake in build.sh. You can see the full PR for these changes here (there are lots of changes to other files caused by ‘re-rendering’ the feedstock to use the latest conda-forge configurations).

As I don’t have a M1 Mac yet, I can’t test it directly – but I am assured by a correspondent on Twitter that it works – hooray!

I’m glad I had chance to do some of these changes to Py6S, and hopefully I will have chance to update it more often in the future.

Share your thoughts

Pint + SQLAlchemy = Unit consistency and enforcement in your database

November 19, 2020

Last week I presented a poster at PyData Global 2020, about linking the pint and SQLAlchemy libraries together to provide robust handling of units with databases in Python.

The poster is shown below: click to enlarge so you can read the text:

The example code is available on Github and is well-commented to make it fairly easy to understand.

That poster was designed to be read when you had the opportunity to chat to me about it: as that isn’t necessarily the case, I’ll explain some of it in more detail below.

Firstly, the importance of units: I’m sure you can come up with lots of examples of situations when it’s really important to know what units your data are in. 42 metres, 42 miles and 42 lightyears all mean very different things in the real world! One of the most famous examples of this is the failure of the Mars Climate Orbiter – a spacecraft sent to observe Mars which failed when some data was provided in the wrong units.

The particular situation in which we came across this problem was in some software for processing shipping data. This data was derived from a range of sources, all of which used different units. We needed to make sure that we were associating each measurement with the right units, so that we could then accurately compare measurements.

When you need to deal with units in Python, the answer is almost always to use the pints library. This provides a great Quantity object which stores a numerical value alongside its units. These objects can be created really easily using the multiplication operator. For example:

distance = 5.2 * unit_registry.metres

(Yes, as someone asked in the poster session, both spellings of metres/meters are accepted!)

Pint has data on almost every unit you could think of built-in, along with their conversion factors, their unit abbreviations, and so on.

Once you’ve got a Quantity object like that, you can do useful things like convert it to another unit:

distance.to(unit_registry.miles)

This is great, and we built this in to our software from an early stage, using Quantity objects as much as possible. However, we then needed to store these measurements in a database. Originally our code looked like this:

# Read measurement from file
measurement = 50.2
# Assign units to it
measurement = measurement * unit_registry.yards
# Convert to units we want in the database
measurement_in_m = measurement.to(unit_registry.metre)
# Store in the database
db_model.distance = measurement

This was very error-prone, as we could forget to assign the proper units, or forget to convert to the units we were using in the database. We wanted a way for our code to stop us making these sorts of mistakes!

We found we could do this by implementing a hybrid_property in our SQLAlchemy model, which would check our data and do any necessary conversions before setting the value in the database. These work in the same way as standard ‘getter and setter’ properties on objects, where they run some code when you set or get a value from an attribute.

As is often the case with using getter/setters methods, the actual value is stored in a variable prefixed with an underscore – such as _distance, and the getter/setter names are distance.

In the getter we grab the value from _distance, which is always stored in metres, and return it as a Quantity object with the units set to metres.

In the setter, we check that the value we’re passing has a unit assigned, and then check the ‘dimensionality’ of the unit – for example, we check it is a valid length unit, or a valid speed unit. We then convert it to metres and store it in the _distance member variable.

For more details on how this all works, have a look at the example code.

This is ‘good enough’ for the work I’m doing at the moment, but I’m hoping to find some time to look at extending this: someone at the conference suggested that we could get rid of some of the boilerplate here by using factories or metaclasses, and I’d like to investigate that.

1 Comment

A Python sqlite3 context manager gotcha

November 16, 2020

I’ve neglected this blog for a while – partly due to the chaos of 2020 (which is not great), and partly due to being busy with work (which is good!). Anyway, I’m starting to pick it up again, and I thought I’d start with something that caught me out the other day.

So, let’s start with some fairly simple code using the sqlite3 module from the Python standard library:

import sqlite3
with sqlite3.connect('test.db') as connection:
    result = connection.execute("SELECT name FROM sqlite_master;")
    # Do some more SQL queries here

# Do something else here

What would you expect the state of the connection variable to be at the end of that code?

If you thought it would be closed (or possibly even undefined), then you’ve made the same mistake that I made!

I assumed that using sqlite3.connect as a context manage (in a with block) would open a connection when you entered the block, and close a connection when you exited the block.

It turns out that’s not the case! According to the documentation:

Connection objects can be used as context managers that automatically commit or rollback transactions. In the event of an exception, the transaction is rolled back; otherwise, the transaction is committed.

That is, it’s not the connect function that is providing the context manager, it’s the connection object that the functions returns which provides the context manager. And, using a connection object as a context manager handles transactions in the database rather than opening or closing the database connection itself: not what I had imagined.

So, if you want to use the context manager to get the transaction handling, then you need to add an explicit connection.close() outside of the block. If you don’t need the transaction handling then you can do it the ‘old-fashioned’ way, like this:

import sqlite3

connection = sqlite3.connect('test.db')
result = connection.execute("SELECT name FROM sqlite_master;")
# Do some more SQL queries here
connection.close()

Personally, I think that this is poor design. To replicate the usage of context managers elsewhere in Python (most famously with the open function), I think a context manager on the connect call should be used to open/close the database, and there should be a separate call to deal with transactions (like with connection.transaction():). Anyway, it’s pretty-much impossible to change it now – as it would break too many things – so we’d better get used to the way it is.

For context (and to help anyone Googling for the exact problem I had), I was struggling with getting some tests working on Windows. I’d opened a SQLite database using a context manager, executed some SQL, and then was trying to delete the SQLite file. Windows complained that the file was still open and therefore it couldn’t be deleted – and this was because the context manager wasn’t actually closing the connection.

3 Comments

Favourite books I read in 2019

January 20, 2020

I made a concerted effort to read more in 2019 – and I succeeded, reading a total of 42 books over the year (and this doesn’t even include the many books I read to my toddler).

I’ve chosen a selection of my favourites from the year to highlight in this blog post in the hope that you may enjoy some of them too.

Non-fiction

Countdown to Zero Day: Stuxnet and the Launch of the World’s First Digital Weapon by Kim Zetter

This book covers the fascinating story of the Stuxnet virus which was created to attack nuclear enrichment plants in Iran. It has enough technical details to satisfy me (particularly if you read all the footnotes), but is still accessible to less technical readers. The story is told in a very engaging manner, and the subject matter is absolutely fascinating. My only criticism would be that the last couple of chapters get a bit repetitive – but that’s a minor issue.
Amazon link

Just Mercy by Bryan Stephenson

I saw this book recommended on many other people’s reading lists, and was glad it read it. It was well-written and easy to read from a language point of view, but very hard to read from an emotional point of view. The stories of miscarriages of justice – particularly for black people – are terrifying, and really reinforced my opposition to capital punishment.
Amazon link

The Hut 6 Story by Gordon Welchman

I visited Bletchley Park last year – on a rare child-free weekend with my wife – and saw this book referred to a number of times in the various exhibitions there. I’d read a lot of books about the Bletchley Park codebreakers before but this one is far more technical than most and gives a really detailed description of the method that Gordon worked out for cracking one of the Enigma codes. I must admit that the appendix covering how the ‘diagonal board’ addition to the Bombes worked went a bit over my head – but the rest of it was great.
Amazon link

Atomic Accidents by James Mahaffey

I was recommended this book by various people on Twitter who kept quoting bits about how people thought ‘Plutonium fires wouldn’t be a big deal’, alongside other amusing quotes. I thought I knew quite a bit about nuclear accidents – given that I worked for a nuclear power station company, and have quite an interest in accident investigations – but I really enjoyed this book and learned a lot about various accidents that I hadn’t heard of before. It’s very readable – although occasionally a bit repetitive – and a fun read.
Amazon link

Prisoners of Geography by Tim Marshall

I can’t remember how I came across this book, but I’m glad that I did – it’s a fascinating look at how geography (primarily physical geography) affects countries and their relationships with each other. Things like the locations of mountain ranges, or places where you can access deep-water ports, have huge geopolitical consequences – and this book explores this for a selection of ten countries/regions. This book really helped me understand a number of world events in their geopolitical context, and I think of it often when listening to the news or reading about current events.
Amazon link

The Matter of the Heart: A History of the Heart in Eleven Operations by Thomas Morris

This is a big book – and fairly heavy-going in places – but it’s worth the effort. It’s a fascinating look at the heart and how humans have learnt to ‘fix’ it in various ways. It’s split into chapters about various different operations – such as implanting pacemakers, replacing valves, or transplanting an entire heart – and each chapter covers the whole historical development of that operation, from first conception to eventual widespread success. There are lot of fascinating stories (did you know that CPR was only really introduced in the 1960s?) and it’s amazing how informally a lot of these operations started – and how many people unfortunately died before the operations became successful.
Amazon link

The Dam Busters by Paul Brickhill

I’d enjoyed some of Paul Brickhill’s other books (such as The Great Escape), and yet this book had been sitting on my shelf, unread, for years. I finally got round to reading it, and enjoyed it more than I thought. A lot of the first half of the book is actually about the development of the bomb – I thought it would be all about the actual raid itself – and I found this very enjoyable from a technical perspective. The story of the raid is well-written – but I found the later chapters about some of the other things that the squadron did less interesting.
Amazon link

The Vaccine Race: How Scientists Used Human Cells to Combat Killer Viruses by Meredith Wadman

I’d never really thought about how vaccines were made – but I found this book around the time that I took my son for some of his childhood vaccinations, and found it fascinating. There are a lot of great stories in this book, but the reason it’s at the end of my list is that it is a bit heavy-going at times, and some of the stories are probably a bit gruesome for some people. Still, it’s a good read.
Amazon link

Fiction

I’ve read far more fiction in the last year than I have for quite a while – but they were mostly books by two authors, so I’ll deal with those two authors separately below.

Robert Harris

I read Robert Harris’ book Enigma many years ago and really enjoyed it, but never got round to reading any of his other work. This year I made up for this, reading Conclave – which is about the intrigue surrounding the election of a new Pope, Pompeii – which focuses on an aqueduct engineer noticing changes around Vesuvius before the eruption, and An Officer and a Spy – which tells the true story of a miscarriage of justice in 19th century France. I thoroughly enjoyed all of these – there’s something about the way that Harris sets a scene and really helps you to get the atmosphere of Roman Pompeii or the Sistine Chapel during the vote for a new Pope.

Rosie Lewis

I came across Rosie Lewis through a free book available on the Kindle store and was gripped. Rosie is a foster carer and writes with clarity and feeling about her experiences fostering various children. Her books include Torn, Taken, Broken and Betrayed, and each of them has thoroughly engaged me. As with Just Mercy above, it is an easy, but emotional, read – I cried multiple times while reading these. My favourite was probably Taken, but they were all good.

The Last Days of Night by Graham Moore

This book is a novelisation of true events around the development of electricity and the electric light bulb, focusing particularly on the patent dispute between Tesla, Westinghouse and Edison over who invented the lightbulb – and also their arguments over the best sort of current to use (AC vs DC). The book has everything: nice technical detail on electrical engineering, and a love story with lots of intrigue along the way.
Amazon link

2 Comments

Talks I’m giving this year

January 10, 2020

As I’ve mentioned before, I give talks on a range of topics to various different audiences, including local science groups, school students and at programming conferences.

I’ve already got a number of talks in the calendar for this year, as detailed below. I’ll try and keep this post up-to-date as I agree to do more talks. All of these talks (so far) are in southern England – so if you’re local then please do come along and listen.

So far all of my bookings are for one of my talks – an introduction to satellite imaging and remote sensing called Monitoring the environment from space. I do a number of other talks (see list here) and I’d love the opportunity to present them to your group: please get in touch to find out more details.

Southampton Cafe Scientifique
21st January @ 20:00
St Denys, Southampton
Title: Monitoring the environment from space
More details

Isle of Wight Cafe Scientifique
10th February @ 19:00
Shanklin, Isle of Wight
Title: Monitoring the environment from space
More details

Three Counties Science Group
17th February @ 13:45
Chiddingford, near Godalming, Surrey
Title: Monitoring the environment from space
More details

Southampton Astronomy Society
9th April @ 19:30
Shirley, Southampton
Title: Monitoring the environment from space
More details

Share your thoughts

Creating an email service for my son’s childhood memories with Python

December 10, 2019

For a number of years – since my now-toddler son was a small baby – I’ve been keeping track of various childhood achievements or memories. When I first came up with this I was rather sleep-deprived, and couldn’t decide what the best way to store this information would be – so I went with a very simple option. I just created a Word document with a table in it with two columns: the date, and the activity/achievement/memory. For example:

file

This was very flexible as it allowed me to keep anything else I wanted in this document – and it was portable (to anyone who have access to some way of reading Word documents) – and accessible to non-technical people such as my son’s grandparents.

After a while though, I wondered if I’d made the right decision: shouldn’t I have put it into some other format that could be accessed programmatically? After all, if I kept doing this for his entire childhood then I’d have a lot of interesting data in there…

Well, it turns out that a Word table isn’t too awful a format to store this sort of data in – and you can access it fairly easily from Python.

Once I realised this, I worked out what I wanted to create: a service that would email me every morning listing the things I’d put as diary entries for that day in previous years. I was modelling this very much on the Timehop app that does a similar thing with photographs, tweets and so on, so I called it julian_timehop.

If you just want to go to the code then have a look at the github repo – otherwise, read on to find out how I did it.

Steps

Let’s start by thinking about what the main steps we need to take are:

First we need to get hold of the document. I update it fairly regularly, and it lives on my laptop – whereas this script would need to run on my Linux server, so it can easily run at the same time each day. The easiest way around this was to store the document in Dropbox and use the Dropbox API to grab a copy when we run the script.
We then need to parse the document to extract the table of diary entries.
Once we’ve got the table, we can subset it to the rows that match today’s date (ignoring the year)
We then need to prepare the text of an email based on these rows, and then send the email

Let’s look at each of these in turn now.

Getting the file from Dropbox

We want to do pretty-much the simplest operation possible using the Dropbox API: login and retrieve the latest version of a file. I’ve used the Dropbox API from Python before (see my post about analysing my thesis-writing timeline with Dropbox) and it’s pretty easy. In fact, you can accomplish this task with just four lines of code.

First, we need to connect to Dropbox and authenticate. To do this, we’ll use a Dropbox API key (see here for instructions on how to get one). We don’t want to include this API key directly in the code – as we could accidentally share it with someone else (for example, by uploading the code to Github) – so we store in an environment variable called DROPBOX_KEY.

We can get the key from this environment variable with

dropbox_key = os.environ.get('DROPBOX_KEY')

We can then create a Dropbox API connection and authenticate

dbx = dropbox.Dropbox(dropbox_key)

To download a file, we just call the files_download_to_file method

dbx.files_download_to_file(output_filename, path)

In this case the path argument is the path of the file inside the Dropbox folder – in my case the path is /Notes and diary entries for Julian.docx as the file is in the root Dropbox folder.

Putting this together we get a function to download a file from Dropbox

def download_file(path):
    """
    Download a file from Dropbox, returning the local filename of the downloaded file

    Requires the DROPBOX_KEY env var to be set to a valid Dropbox API key
    """
    dropbox_key = os.environ.get('DROPBOX_KEY')
    dbx = dropbox.Dropbox(dropbox_key)
    output_filename = 'document.docx'
    dbx.files_download_to_file(output_filename, path)

    return output_filename

That’s the first step completed; next we need to extract the table from the Word document.

Extracting the table

In a previous job I did some work which involved automating the creation of Powerpoint presentations, and I used the excellent python-pptx library for reading and writing Powerpoint files. Conveniently, there is a sister library available for Word documents called python-docx which works in a very similar way.

We’re going to convert the Word table to a pandas DataFrame, so after installing python-docx we need to import the main Document class, along with pandas itself

from docx import Document
import pandas as pd

We can parse the document by creating an instance of the Document class with the filename as a parameter

doc = Document(filename)

The doc object has various useful methods and attributes – and one of these is a list of tables in the document. We know that we want to parse the first table – so we just select the 0th index

tab = doc.tables[0]

To create a pandas DataFrame, we need a list containing the contents of each column: here that means a list of dates and a list of entries.

tab.column_cells(0) gives us an iterator over all the cells in column 0, and each cell has a .text method to give the text content of that cell – so we can write a list comprehension to extract all of the contents into a list

dates = [cell.text for cell in tab.column_cells(0)]

We can then use the very handy pd.to_datetime function to convert these to actual date objects. We pass the argument errors='coerce' to force it to parse all entries in the list, without giving errors if one of them isn’t a valid date (in this case it will return NaT or Not a Time).

We can do the same for descriptions, and then put the descriptions and dates together into a DataFrame.

Here is the full code:

def read_table_from_doc(filename):
    doc = Document(filename)
    tab = doc.tables[0]

    dates = [cell.text for cell in tab.column_cells(0)]
    dates = pd.to_datetime(dates, errors='coerce')

    descs = [cell.text for cell in tab.column_cells(1)]

    df = pd.DataFrame({'desc':descs}, index=dates)

    return df

Creating the text for an email

The next step is to create the text to put in an email message, listing the date and the various memories. I wanted an output that looked like this:

01 December

2018:
Memory from 2018

2018:
Another memory from 2018

2017:
A memory from 2017

The code for this is fairly simple, and I’ll only mention the interesting bits.

Firstly, we create a subset of the DataFrame, where we only have the rows where the date was the same as today’s date (ignoring the year):

today = datetime.datetime.now()
subdf = df[(df.index.month == today.month) & (df.index.day == today.day)]

Here we’re combining two boolean indexing operations with & – though do remember to use brackets, as the order of precedence inside these boolean expressions doesn’t always work in the way you’d expect (I’ve been caught out by this a number of times).

As I knew this would only be running on my server, I could use new Python 3.7 features – so I used f-strings. This means that the body of my loop to create the HTML for the email body looks like this

text += f"<p><b>{i.year!s}:</b></br>{row['desc']}</p>\n\n"

Here we’re including variables such as i.year (the year value of the datetime index) and row['desc'] (the value of the desc column of this row).

Putting it together into a function gives the following code, which either returns the HTML text of the email or None if there are no events matching this date

def get_formatted_message(df):
    today = datetime.datetime.now()

    subdf = df[(df.index.month == today.month) & (df.index.day == today.day)]

    if len(subdf) == 0:
        return

    title_date = datetime.datetime.now().strftime('%d %B')
    text = f'<h2>{title_date}</h2>\n\n'
    for i, row in subdf.iterrows():
        text += f"<p><b>{i.year!s}:</b></br>{row['desc']}</p>\n\n"

    return text

Sending the email

I’ve written code to send emails in Python before, and had great difficulty. Emails are a lot more difficult than a lot of people think, and often my emails never got sent, or never got to their destination, or broke in some other way.

This time I managed to avoid all of those problems by using the emails library. Writing a send_email function using this library was so easy:

def send_email(text):
    message = emails.html(html=text,
                          subject='Julian Timehop',
                          mail_from=('Julian Timehop Emailer', 'robin@rtwilson.com'))

    password = os.environ.get('SMTP_PASSWORD')

    r = message.send(to=('R Wilson', 'robin@rtwilson.com'),
                     smtp={'host':'mail.rtwilson.com',
                           'port': 465, 'ssl': True,
                           'user': 'robin@rtwilson.com', 'password': password})

All of the code above is self-explanatory: we’re creating a HTML email message (it details with all of the escaping and encoding necessary), grabbing the password from another environment variable and then sending the email. Easy!

Putting it all together

We’ve now written a function for each individual step – and now we just need to put the functions together. The benefit of writing your script this way is that the main part of your script is just a few lines.

In this case, all we need is

filename = download_file('/Notes and diary entries for Julian.docx')
df = read_table_from_doc(filename)
text = get_formatted_message(df)
if text is not None:
    send_email(text)

Another benefit of this is that for anyone (including yourself) coming back to it in the future, it is very easy to get an overview of what the script does, before delving into the details.

So, I now have this set up on my server to send me an email every morning with some memories of my son’s childhood. Here’s to many more happy memories – and remember to check out the code if you’re interested.

I do freelance work on Python programming and data science – please see my freelance website for more details.

Share your thoughts

Automatically downloading nursery photos from ParentZone using Selenium

December 4, 2019

My son goes to a nursery part-time, and the nursery uses a system called ParentZone from Connect Childcare to send information between us (his parents) and nursery. Primarily, this is used to send us updates on the boring details of the day (what he’s had to eat, nappy changes and so on), and to send ‘observations’ which include photographs of what he’s been doing at nursery. The interfaces include a web app (pictured below) and a mobile app:

file

I wanted to be able to download all of these photos easily to keep them in my (enormous) set of photos of my son, without manually downloading each one. So, I wrote a script to do this, with the help of Selenium.

If you want to jump straight to the script, then have a look at the ParentZonePhotoDownloader Github repository. The script is documented and has a nice command-line interface. For more details on how I created it, read on…

Selenium is a browser automation tool that allows you to control pretty-much everything a browser does through code, while also accessing the underlying HTML that the browser is displaying. This makes it the perfect choice for scraping websites that have a lot of Javascript – like the ParentZone website.

To get Selenium working you need to install a ‘webdriver’ that will connect to a particular web browser and do the actual controlling of the browser. I’ve chosen to use chromedriver to control Google Chrome. See the Getting Started guide to see how to install chromedriver – but it’s basically as simple as downloading a binary file and putting it in your PATH.

My script starts off fairly simply, by creating an instance of the Chrome webdriver, and navigating to the ParentZone homepage:

driver = webdriver.Chrome()
driver.get("https://www.parentzone.me/")

The next line: driver.implicitly_wait(10) tells Selenium to wait up to 10 seconds for elements to appear before giving up and giving an error. This is useful for sites that might be slightly slow to load (eg. those with large pictures).

We then fill in the email address and password in the login form:

email_field = driver.find_element_by_xpath('//*[@id="login"]/fieldset/div[1]/input')
email_field.clear()
email_field.send_keys(email)

Here we’re selecting the email address field using it’s XPath, which is a sort of query language for selecting nodes from an XML document (or, by extension, an HTML document – as HTML is a form of XML). I have some basic knowledge of XPath, but usually I just copy the expressions I need from the Chrome Dev Tools window. To do this, select the right element in Dev Tools, then right click on the element’s HTML code and choose ‘Copy->Copy XPath’:

file

We then clear the field, and fake the typing of the email string that we took as a command-line argument.

We then repeat the same thing for the password field, and then just send the ‘Enter’ key to submit the field (easier than finding the right submit button and fake-clicking it).

Once we’ve logged in and gone to the correct page (the ‘timeline’ page) we want to narrow down the page to just show ‘Observations’ (as these are usually the only posts that have photographs). We do this by selecting a dropdown, and then choosing an option from the dropdown box:

dropdown = Select(driver.find_element_by_xpath('//*[@id="filter"]/div[2]/div[4]/div/div[1]/select'))
dropdown.select_by_value('7')

I found the right value (7) to set this to by reading the HTML code where the options were defined, which included this line: <option value="7">Observation</option>.

We then click the ‘Submit’ button:

submit_button = driver.find_element_by_id('submit-filter')
submit_button.click()

Now we get to the bit that had me stuck for a while… The page has ‘infinite scrolling’ – that is, as you scroll down, more posts ‘magically’ appear. We need to scroll right down to the bottom so that we have all of the observations before we try to download them.

I tried using various complicated Javascript functions, but none of them seemed to work – so I settled on a naive way to do it. I simply send the ‘End’ key (which scrolls to the end of the page), wait a few seconds, and then count the number of photos on the page (in this case, elements with the class img-responsive, which is used for photos from observations). When this number stops increasing, I know I’ve reached the point where there are no more pictures to load.

The code that does this is fairly easy to understand:

html = driver.find_element_by_tag_name('html')
old_n_photos = 0
while True:
    # Scroll
    html.send_keys(Keys.END)
    time.sleep(3)
    # Get all photos
    media_elements = driver.find_elements_by_class_name('img-responsive')
    n_photos = len(media_elements)

    if n_photos > old_n_photos:
        old_n_photos = n_photos
    else:
        break

We’ve now got a page with all the photos on it, so we just need to extract them. In fact, we’ve already got a list of all of these photo elements in media_elements, so we just iterate through this and grab some details for each image. Specifically, we get the image URL with element.get_attribute('src'), and then extract the unique image ID from that URL. We then choose the filename to save the file as based on the type of element that was used to display it on the web page (the element.tag_name). If it was a <img> tag then it’s an image, if it was a <video> tag then it was a video.

We then download the image/video file from the website using the requests library (that is, not through Selenium, but separately, just using the URL obtained through Selenium):

# For each image that we've found
for element in media_elements:
    image_url = element.get_attribute('src')
    image_id = image_url.split("&d=")[-1]

    # Deal with file extension based on tag used to display the media
    if element.tag_name == 'img':
        extension = 'jpg'
    elif element.tag_name == 'video':
        extension = 'mp4'
    image_output_path = os.path.join(output_folder,
                                        f'{image_id}.{extension}')

    # Only download and save the file if it doesn't already exist
    if not os.path.exists(image_output_path):
        r = requests.get(image_url, allow_redirects=True)
        open(image_output_path, 'wb').write(r.content)

Putting this all together into a command-line script was made much easier by the click library. Adding the following decorators to the top of the main function creates a whole command-line interface automatically – even including prompts to specify parameters that weren’t specified on the command-line:

@click.command()
@click.option('--email', help='Email address used to log in to ParentZone',
              prompt='Email address used to log in to ParentZone')
@click.option('--password', help='Password used to log in to ParentZone',
              prompt='Password used to log in to ParentZone')
@click.option('--output_folder', help='Output folder',
              default='./output')

So, that’s it. Less than 100 lines in total for a very useful script that saves me a lot of tedious downloading. The full script is available on Github

_I do freelance work in Python programming and data science – see my freelance website for more details._

11 Comments