As I’ve mentioned before, I give talks on a range of topics to a variety of audiences, including local science groups, school students and audiences at programming conferences.
I’ve already got a number of talks in the calendar for this year, as detailed below. I’ll try and keep this post up-to-date as I agree to do more talks. All of these talks (so far) are in southern England – so if you’re local then please do come along and listen.
So far all of my bookings are for one of my talks – an introduction to satellite imaging and remote sensing called Monitoring the environment from space. I do a number of other talks (see list here) and I’d love the opportunity to present them to your group: please get in touch to find out more details.
Southampton Cafe Scientifique
21st January @ 20:00
St Denys, Southampton
Title: Monitoring the environment from space
Isle of Wight Cafe Scientifique
10th February @ 19:00
Shanklin, Isle of Wight
Title: Monitoring the environment from space
Three Counties Science Group
17th February @ 13:45
Chiddingfold, near Godalming, Surrey
Title: Monitoring the environment from space
Southampton Astronomy Society
9th April @ 19:30
Shirley, Southampton
Title: Monitoring the environment from space
For a number of years – since my now-toddler son was a small baby – I’ve been keeping track of various childhood achievements or memories. When I first came up with this I was rather sleep-deprived, and couldn’t decide what the best way to store this information would be – so I went with a very simple option. I just created a Word document with a table in it with two columns: the date, and the activity/achievement/memory. For example:
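| Date | Entry |
| ---- | ----- |
| 12/03/2017 | Rolled over by himself for the first time |
| 05/06/2018 | Took his first steps |

(These example rows are invented, but the real document has exactly this two-column structure.)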
This was very flexible, as it allowed me to keep anything else I wanted in the same document – it was portable (to anyone who has access to some way of reading Word documents) – and accessible to non-technical people such as my son’s grandparents.
After a while though, I wondered if I’d made the right decision: shouldn’t I have put it into some other format that could be accessed programmatically? After all, if I kept doing this for his entire childhood then I’d have a lot of interesting data in there…
Well, it turns out that a Word table isn’t too awful a format to store this sort of data in – and you can access it fairly easily from Python.
Once I realised this, I worked out what I wanted to create: a service that would email me every morning listing the things I’d put as diary entries for that day in previous years. I was modelling this very much on the Timehop app that does a similar thing with photographs, tweets and so on, so I called it julian_timehop.
If you just want to go to the code then have a look at the github repo – otherwise, read on to find out how I did it.
Steps
Let’s start by thinking about the main steps we need to take:
First we need to get hold of the document. I update it fairly regularly, and it lives on my laptop – whereas this script would need to run on my Linux server, so it can easily run at the same time each day. The easiest way around this was to store the document in Dropbox and use the Dropbox API to grab a copy when we run the script.
We then need to parse the document to extract the table of diary entries.
Once we’ve got the table, we can subset it to the rows that match today’s date (ignoring the year).
We then need to prepare the text of an email based on these rows, and send the email.
Let’s look at each of these in turn now.
Getting the file from Dropbox
We want to do pretty-much the simplest operation possible using the Dropbox API: login and retrieve the latest version of a file. I’ve used the Dropbox API from Python before (see my post about analysing my thesis-writing timeline with Dropbox) and it’s pretty easy. In fact, you can accomplish this task with just four lines of code.
First, we need to connect to Dropbox and authenticate. To do this, we’ll use a Dropbox API key (see here for instructions on how to get one). We don’t want to include this API key directly in the code – as we could accidentally share it with someone else (for example, by uploading the code to Github) – so we store it in an environment variable called DROPBOX_KEY.
We can get the key from this environment variable with
dropbox_key = os.environ.get('DROPBOX_KEY')
We can then create a Dropbox API connection and authenticate
dbx = dropbox.Dropbox(dropbox_key)
To download a file, we just call the files_download_to_file method
dbx.files_download_to_file(output_filename, path)
In this case the path argument is the path of the file inside the Dropbox folder – in my case the path is /Notes and diary entries for Julian.docx as the file is in the root Dropbox folder.
Putting this together we get a function to download a file from Dropbox
import os

import dropbox


def download_file(path):
    """
    Download a file from Dropbox, returning the local filename of the downloaded file.

    Requires the DROPBOX_KEY env var to be set to a valid Dropbox API key.
    """
    dropbox_key = os.environ.get('DROPBOX_KEY')
    dbx = dropbox.Dropbox(dropbox_key)

    output_filename = 'document.docx'
    dbx.files_download_to_file(output_filename, path)

    return output_filename
That’s the first step completed; next we need to extract the table from the Word document.
Extracting the table
In a previous job I did some work which involved automating the creation of Powerpoint presentations, and I used the excellent python-pptx library for reading and writing Powerpoint files. Conveniently, there is a sister library available for Word documents called python-docx which works in a very similar way.
We’re going to convert the Word table to a pandas DataFrame, so after installing python-docx we need to import the main Document class, along with pandas itself
from docx import Document
import pandas as pd
We can parse the document by creating an instance of the Document class with the filename as a parameter
doc = Document(filename)
The doc object has various useful methods and attributes – and one of these is a list of tables in the document. We know that we want to parse the first table – so we just select the 0th index
tab = doc.tables[0]
To create a pandas DataFrame, we need a list containing the contents of each column: here that means a list of dates and a list of entries.
tab.column_cells(0) gives us an iterator over all the cells in column 0, and each cell has a .text attribute giving the text content of that cell – so we can write a list comprehension to extract all of the contents into a list
dates = [cell.text for cell in tab.column_cells(0)]
We can then use the very handy pd.to_datetime function to convert these to actual date objects. We pass the argument errors='coerce' to force it to parse all entries in the list, without giving errors if one of them isn’t a valid date (in this case it will return NaT or Not a Time).
We can do the same for descriptions, and then put the descriptions and dates together into a DataFrame.
Here is the full code:
def read_table_from_doc(filename):
    doc = Document(filename)
    tab = doc.tables[0]

    dates = [cell.text for cell in tab.column_cells(0)]
    dates = pd.to_datetime(dates, errors='coerce')

    descs = [cell.text for cell in tab.column_cells(1)]

    df = pd.DataFrame({'desc': descs}, index=dates)

    return df
Creating the text for an email
The next step is to create the text to put in an email message, listing the date and the various memories. I wanted an output that looked like this:
01 December

2018:
Memory from 2018

2018:
Another memory from 2018

2017:
A memory from 2017
The code for this is fairly simple, and I’ll only mention the interesting bits.
Firstly, we create a subset of the DataFrame, where we only have the rows where the date was the same as today’s date (ignoring the year):
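today = datetime.datetime.now()
subdf = df[(df.index.month == today.month) & (df.index.day == today.day)]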
Here we’re combining two boolean indexing operations with & – though do remember to use brackets, as the order of precedence inside these boolean expressions doesn’t always work in the way you’d expect (I’ve been caught out by this a number of times).
As I knew this would only be running on my server, I could rely on a modern version of Python – so I used f-strings (introduced in Python 3.6). This means that the body of my loop to create the HTML for the email body looks like this
text += f"<p><b>{i.year!s}:</b></br>{row['desc']}</p>\n\n"
Here we’re including variables such as i.year (the year value of the datetime index) and row['desc'] (the value of the desc column of this row).
Putting it together into a function gives the following code, which either returns the HTML text of the email or None if there are no events matching this date
import datetime


def get_formatted_message(df):
    today = datetime.datetime.now()
    subdf = df[(df.index.month == today.month) & (df.index.day == today.day)]

    if len(subdf) == 0:
        return

    title_date = today.strftime('%d %B')
    text = f'<h2>{title_date}</h2>\n\n'

    for i, row in subdf.iterrows():
        text += f"<p><b>{i.year!s}:</b><br>{row['desc']}</p>\n\n"

    return text
Sending the email
I’ve written code to send emails in Python before, and had great difficulty. Sending email is harder than many people think – often my emails never got sent, never reached their destination, or broke in some other way.
This time I managed to avoid all of those problems by using the emails library. Writing a send_email function using this library was so easy:
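The original function isn’t reproduced here, but it looked something like this sketch – the addresses, SMTP settings and the EMAIL_PASSWORD variable name are placeholders of mine:

import os

import emails


def send_email(text):
    # Build a HTML email message - addresses and SMTP details here are
    # placeholders, not the original values
    message = emails.html(html=text,
                          subject='julian_timehop',
                          mail_from=('julian_timehop', 'timehop@example.com'))
    message.send(to='me@example.com',
                 smtp={'host': 'smtp.example.com', 'port': 465, 'ssl': True,
                       'user': 'me@example.com',
                       'password': os.environ.get('EMAIL_PASSWORD')})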
All of the code above is fairly self-explanatory: we’re creating a HTML email message (the library deals with all of the escaping and encoding necessary), grabbing the password from another environment variable and then sending the email. Easy!
Putting it all together
We’ve now written a function for each individual step – and now we just need to put the functions together. The benefit of writing your script this way is that the main part of your script is just a few lines.
In this case, all we need is
filename = download_file('/Notes and diary entries for Julian.docx')
df = read_table_from_doc(filename)
text = get_formatted_message(df)
if text is not None:
    send_email(text)
Another benefit of this is that for anyone (including yourself) coming back to it in the future, it is very easy to get an overview of what the script does, before delving into the details.
So, I now have this set up on my server to send me an email every morning with some memories of my son’s childhood. Here’s to many more happy memories – and remember to check out the code if you’re interested.
I do freelance work on Python programming and data science – please see my freelance website for more details.
My son goes to a nursery part-time, and the nursery uses a system called ParentZone from Connect Childcare to send information between us (his parents) and nursery. Primarily, this is used to send us updates on the boring details of the day (what he’s had to eat, nappy changes and so on), and to send ‘observations’ which include photographs of what he’s been doing at nursery. The interfaces include a web app (pictured below) and a mobile app:
I wanted to be able to download all of these photos easily to keep them in my (enormous) set of photos of my son, without manually downloading each one. So, I wrote a script to do this, with the help of Selenium.
If you want to jump straight to the script, then have a look at the ParentZonePhotoDownloader Github repository. The script is documented and has a nice command-line interface. For more details on how I created it, read on…
Selenium is a browser automation tool that allows you to control pretty-much everything a browser does through code, while also accessing the underlying HTML that the browser is displaying. This makes it the perfect choice for scraping websites that have a lot of Javascript – like the ParentZone website.
To get Selenium working you need to install a ‘webdriver’ that will connect to a particular web browser and do the actual controlling of the browser. I’ve chosen to use chromedriver to control Google Chrome. See the Getting Started guide to see how to install chromedriver – but it’s basically as simple as downloading a binary file and putting it in your PATH.
My script starts off fairly simply, by creating an instance of the Chrome webdriver, and navigating to the ParentZone homepage:
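Roughly like this (the exact URL is an assumption of mine):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.parentzone.me/')
driver.implicitly_wait(10)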
The next line: driver.implicitly_wait(10) tells Selenium to wait up to 10 seconds for elements to appear before giving up and giving an error. This is useful for sites that might be slightly slow to load (eg. those with large pictures).
We then fill in the email address and password in the login form:
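Something along these lines:

# 'email' comes from the command-line options (see the click decorators below);
# the XPath expression here is illustrative, not the site's real one
email_field = driver.find_element_by_xpath('//*[@id="login-form"]/input[1]')
email_field.clear()
email_field.send_keys(email)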
Here we’re selecting the email address field using its XPath, which is a sort of query language for selecting nodes from an XML document (or, by extension, an HTML document – as HTML is a form of XML). I have some basic knowledge of XPath, but usually I just copy the expressions I need from the Chrome Dev Tools window. To do this, select the right element in Dev Tools, then right-click on the element’s HTML code and choose ‘Copy->Copy XPath’:
We then clear the field, and fake the typing of the email string that we took as a command-line argument.
We then repeat the same thing for the password field, and then just send the ‘Enter’ key to submit the field (easier than finding the right submit button and fake-clicking it).
Once we’ve logged in and gone to the correct page (the ‘timeline’ page) we want to narrow down the page to just show ‘Observations’ (as these are usually the only posts that have photographs). We do this by selecting a dropdown, and then choosing an option from the dropdown box:
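Using Selenium’s Select helper, something like this (the locator is an assumption – the real script finds the relevant select element on the page):

from selenium.webdriver.support.ui import Select

# Find the post-type filter dropdown and choose the 'Observation' option
dropdown = Select(driver.find_element_by_id('filter'))
dropdown.select_by_value('7')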
I found the right value (7) to set this to by reading the HTML code where the options were defined, which included this line: <option value="7">Observation</option>.
Now we get to the bit that had me stuck for a while… The page has ‘infinite scrolling’ – that is, as you scroll down, more posts ‘magically’ appear. We need to scroll right down to the bottom so that we have all of the observations before we try to download them.
I tried using various complicated Javascript functions, but none of them seemed to work – so I settled on a naive way to do it. I simply send the ‘End’ key (which scrolls to the end of the page), wait a few seconds, and then count the number of photos on the page (in this case, elements with the class img-responsive, which is used for photos from observations). When this number stops increasing, I know I’ve reached the point where there are no more pictures to load.
The code that does this is fairly easy to understand:
import time

from selenium.webdriver.common.keys import Keys

html = driver.find_element_by_tag_name('html')
old_n_photos = 0

while True:
    # Scroll to the bottom of the page
    html.send_keys(Keys.END)
    time.sleep(3)

    # Get all photos currently loaded
    media_elements = driver.find_elements_by_class_name('img-responsive')
    n_photos = len(media_elements)

    if n_photos > old_n_photos:
        old_n_photos = n_photos
    else:
        break
We’ve now got a page with all the photos on it, so we just need to extract them. In fact, we’ve already got a list of all of these photo elements in media_elements, so we just iterate through this and grab some details for each image. Specifically, we get the image URL with element.get_attribute('src'), and then extract the unique image ID from that URL. We then choose the filename to save the file as based on the type of element that was used to display it on the web page (the element.tag_name): if it was an <img> tag then it’s an image, and if it was a <video> tag then it’s a video.
We then download the image/video file from the website using the requests library (that is, not through Selenium, but separately, just using the URL obtained through Selenium):
import os

import requests

# For each image that we've found
for element in media_elements:
    image_url = element.get_attribute('src')
    image_id = image_url.split("&d=")[-1]

    # Choose the file extension based on the tag used to display the media
    if element.tag_name == 'img':
        extension = 'jpg'
    elif element.tag_name == 'video':
        extension = 'mp4'

    image_output_path = os.path.join(output_folder,
                                     f'{image_id}.{extension}')

    # Only download and save the file if it doesn't already exist
    if not os.path.exists(image_output_path):
        r = requests.get(image_url, allow_redirects=True)
        with open(image_output_path, 'wb') as f:
            f.write(r.content)
Putting this all together into a command-line script was made much easier by the click library. Adding the following decorators to the top of the main function creates a whole command-line interface automatically – even including prompts to specify parameters that weren’t specified on the command-line:
@click.command()
@click.option('--email', help='Email address used to log in to ParentZone',
              prompt='Email address used to log in to ParentZone')
@click.option('--password', help='Password used to log in to ParentZone',
              prompt='Password used to log in to ParentZone')
@click.option('--output_folder', help='Output folder',
              default='./output')
So, that’s it: less than 100 lines in total for a very useful script that saves me a lot of tedious downloading. The full script is available on Github.
_I do freelance work in Python programming and data science – see my freelance website for more details._
This is more a ‘note to myself’ than anything else, but I expect some other people might find it useful.
I’ve often struggled with accessing MySQL from Python, as the ‘default’ MySQL library for Python is MySQLdb. This library has a number of problems: 1) it is Python 2 only, and 2) it requires compiling against the MySQL C library and header files, and so can’t be simply installed using pip.
There is a Python 3 version of MySQLdb called mysqlclient, but this also requires compiling against the MySQL libraries and header files, so can be complicated to install.
The best library I’ve found as a replacement is PyMySQL, which is a pure Python library (so no need to install the MySQL libraries and header files). Its API is basically exactly the same as MySQLdb, so it’s easy to switch across.
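As an illustration of how close the two APIs are, PyMySQL even provides a shim that makes code written for MySQLdb run unchanged:

import pymysql
pymysql.install_as_MySQLdb()

import MySQLdb  # actually PyMySQL under the hood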
Right, that’s the introduction – now we’re at the actual point of this post, which is how to use the PyMySQL library ‘under the hood’ when you’re accessing databases through SQLAlchemy.
The weird thing is that I’m not actually using SQLAlchemy by choice in my code – but it is used by pandas to convert between SQL and data frames.
For example, you can write code like this:
from sqlalchemy import create_engine

# df is an existing pandas DataFrame
eng = create_engine('mysql://user:pass@127.0.0.1/database')
df.to_sql('table', eng, if_exists='append', index=False)
which will append the data in df to a table in a database running on the local machine.
The create_engine call is a SQLAlchemy function which creates an engine to handle all of the complex communication to and from a specific database.
Now, when you specify a database connection string with the mysql:// prefix, SQLAlchemy tries to use the MySQLdb library to do the underlying communication with the MySQL database – and fails if it can’t be found.
So, now we’re at the actual solution: which is that you can give SQLAlchemy a ‘dialect’ to use to connect to a database – and this can be used to change the underlying library that is used to talk to the database.
So, you can change your connection string to mysql+pymysql://user:pass@127.0.0.1/database and it will use the PyMySQL library. It’s as simple as that!
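So the pandas example from earlier just becomes:

eng = create_engine('mysql+pymysql://user:pass@127.0.0.1/database')
df.to_sql('table', eng, if_exists='append', index=False)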
There are other dialects that you can use to connect to MySQL using different underlying libraries – although these aren’t recommended by the authors of SQLAlchemy. You can find a list of them here.
_I do data science work – including processing data in MySQL databases – as part of my freelance work. Please contact me for more details._
Just a quick post today, to tell you about a couple of simple zsh functions that I find handy as a Python programmer.
First, pyimp – a very simple function that tries to import a module in Python and displays the output. If there is no output then the import succeeded, otherwise you’ll see the error. This saves constantly going into a Python interpreter and trying to import something, making that ‘has it worked or not’ cycle a bit quicker when installing a tricky package.
The function is defined as
function pyimp() { python -c "import $1" }
This just calls Python with the -c flag which tells it to execute the code you’ve given on the command line – which in this case is just an import command.
You can see below that it returns nothing for a module which is importable, but returns the error for anything which fails:
$ pyimp numpy
$ pyimp blah
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'blah'
The second is pycd which changes directory to the folder where a particular module is defined. This can be useful if you want to inspect the code of the module in depth, or if you’ve installed the module in ‘develop mode’ and want to actually edit the code.
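The original definition isn’t shown here, but a zsh function along these lines does the job (a sketch, not necessarily the exact original):

function pycd() {
    # Ask Python where the module lives, then cd to that folder
    cd $(python -c "import os.path, $1; print(os.path.dirname($1.__file__))")
}

So, for example, pycd numpy drops you straight into the folder containing the numpy package.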
Another quick matplotlib tip today: specifically, how to easily specify colours from the standard matplotlib colour cycle.
A while back, when matplotlib overhauled their themes and colour schemes, they changed the default cycle of colours used for lines in matplotlib. Previously the first line was pure blue (color='b' in matplotlib syntax), then red, then green etc. They, very sensibly, changed this to a far nicer selection of colours.
However, this change made one thing a bit more difficult – as I found recently. I had plotted a couple of simple lines:
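The code looked roughly like this – the data here is invented, but the structure is the same:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

# Shade an area extending 5 either side of the second line,
# using a partially-transparent yellow
plt.fill_between(x, np.cos(x) - 5, np.cos(x) + 5, color='y', alpha=0.3)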
This produces a shaded line which extends from 5 below the line to 5 above the line:
Unfortunately the colours don’t look quite right: the line isn’t yellow, so doing a partially-transparent yellow background doesn’t look quite right.
I spent a while looking into how to extract the colour of the line so I could use this for the shading, before finding a really easy way to do it. To get the colours in the default colour cycle you can simply use the strings 'C0', 'C1', 'C2' etc. So, in this case just
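# 'C1' is the second colour in the default cycle - matching the second line
# (data as in the earlier invented example)
plt.fill_between(x, np.cos(x) - 5, np.cos(x) + 5, color='C1', alpha=0.3)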
The result looks far better now the colours match:
I found out about this from a wonderful graphical matplotlib cheatsheet created by Nicolas Rougier – I’d strongly suggest you check it out, there are all sorts of useful things on there that I never knew about!
In case you need to do this the manual way, there are two fairly straightforward ways to get the colour of the second line.
The first is to get the default colour cycle from the matplotlib settings, and extract the relevant colour:
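(The two snippets below are minimal sketches rather than the original code.)

# Ask matplotlib for the list of colours in the default cycle,
# and take the second one
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
second_colour = colors[1]

The second is to grab the colour from the Line2D object returned by plt.plot when plotting the second line:

line = plt.plot(x, np.cos(x))
colour = line[0].get_color()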
Here we save the result of the plt.plot call when we plot the second line. This gives us a list of the Line2D objects that were created, and we then extract the first (and only) element and call the get_color() method to extract the colour.
I do freelance work in data science and data visualisation – including using matplotlib. If you’d like to work with me, have a look at my freelance website or email me.
As you may have noticed, I hadn’t blogged here for quite a while, but have recently started blogging regularly again. This is mostly due to sorting out various WordPress issues I was having, and installing some new plugins to make writing blog posts fun again.
Ever since I installed the WordPress update that added the ‘Gutenberg’ editor, I had various problems with editing and creating new posts. I eventually switched back to the Classic Editor (following these instructions), but still wasn’t really happy. I’ve never really been a huge fan of the WordPress editor – it has always been fiddly to get things formatted the way I want, and it’s never dealt with code snippets very well.
I’ve had some plugins installed to do syntax highlighting, but these have required typing the code into a separate little dialog, and not being able to edit it easily after adding it. I really wanted to be able to include code as easily as I do in Markdown documents using ‘code fence’ syntax. For example, something like this:
` ` `python
def func(x):
    print(x)
    return 2*x
` ` `
(with no spaces between the backticks – I had to include those or that example would have been syntax highlighted for me)
Basically, I wanted to write my posts in Markdown. I investigated static blog generators, but didn’t want to deal with converting all of my previous posts, and trying to make sure URLs still redirected properly and so on.
Anyway, I found a solution which works really well for me: the WP Githuber MD plugin.
This allows you to write your posts in Markdown, and it supports Github-style fenced code blocks, with syntax highlighting.
All you need to do is install it and then enable the correct settings. To do this:
Go to the Plugins -> Installed Plugins page
Find ‘WP Githuber MD’ and click ‘Settings’
Go to the ‘Modules’ tab at the top
Turn the switch on the right-hand side of the ‘Syntax Highlight’ heading on
Fiddle with the syntax highlighting settings to your own preferences
(Optional) Turn on the switch next to ‘Image Paste’ to make it really easy to add images to your posts
That’s all that needs doing – now your code blocks will be nicely formatted, and you don’t have to bother with typing code into silly dialogs: just write the post in Markdown, insert code as usual, and everything ‘just works’.
As a brief postscript, the ‘Image Paste’ functionality is also really useful. Simply copy an image from somewhere on your computer – often I’m copying something like a matplotlib graph produced by a Python script – and then switch to the Markdown editor and paste. The image will then be uploaded to your WordPress Media Library and the right code to include the image will be inserted. All done with a single keypress!
So yes, overall, I am a big fan of WP Githuber MD – I’ve not been asked to say this, but it has really transformed my blog editing experience!
I keep gathering links of interesting Python things I’ve seen around the internet: new packages, good tutorials, and so on – and so I thought I’d start a series where I share them every so often.
Not all of these are brand new – some have been around for a while but are new to me – and so they might be new to you too!
Also, there is a distinct ‘PyData’ flavour to these things – they’re all things I’ve come across in my work in data science and geographic processing with Python.
I try really hard to follow the PEP8 style guide for my Python code – but I wasn’t so disciplined in the past, and so I’ve got a lot of old code sitting around which isn’t styled particularly well.
One of the things PEP8 recommends against is using: from blah import *. In my code I used to do a lot of from matplotlib.pyplot import *, and from Py6S import * – but it’s a pain to go through old code and work out what functions are actually used, and replace the import with something like from matplotlib.pyplot import plot, xlabel, title.
removestar is a tool that will do that for you! Just install it with pip install removestar and then it provides a command-line tool to fix your imports for you.
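Usage is along these lines (check removestar --help for the exact flags):

# Rewrite the file in place, replacing 'import *' with explicit imports
removestar -i old_script.py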
If you use OS X then you’ll know about the very handy ‘quicklook’ feature that shows you a preview of the selected file in Finder when pressing the spacebar. You can add support for new filetypes to quicklook using quicklook plugins – and I’d already set up a number of useful plugins which will show syntax-highlighted code, preview JSON, CSV and Markdown files nicely, and so on.
I only discovered ipynb-quicklook last week, and it does what you’d expect: it provides previews of Jupyter Notebook files from the Finder. Simply follow the instructions to place the ipynb-quicklook.qlgenerator file in your ~/Library/QuickLook folder, and it ‘Just Works’ – and it’s really quick to render the files too!
This is a great cheatsheet for the matplotlib plotting library from Nicolas Rougier. It’s a great quick reference for all the various matplotlib settings and functions, and reminded me of a number of things matplotlib can do that I’d forgotten about.
Find the high-resolution cheatsheet image here and the repository with all the code used to create it here. Nicolas is also writing a book called Scientific Visualization – Python & Matplotlib which looks great – and it’ll be released open-access once it’s finished (you can donate to see it ‘in progress’).
If you’re not interested in geographic data processing using Python then this probably won’t interest you… but for those who are, this looks great. PyGEOS provides native Python bindings to the GEOS library, which is used for geometry operations by many geospatial tools (such as calculating distances, or finding out whether one geometry contains another). However, by using the underlying C library, PyGEOS bypasses the Python interpreter for a lot of the calculations, allowing them to be vectorised efficiently and making it very fast to apply these geometry functions: their preliminary performance tests show speedups ranging from 4x to 136x. The interface is very simple too – for example:
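import numpy as np
import pygeos

# Create 10,000 point geometries from a numpy array in a single vectorised
# call, then test which of them fall within a box
# (an illustration of my own, based on the PyGEOS docs)
points = pygeos.points(np.random.uniform(0, 10, size=(10000, 2)))
box = pygeos.box(2, 2, 7, 7)
inside = pygeos.contains(box, points)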
This project is still in the early days – but definitely one to watch as I think it will have a big impact on the efficiency of Python-based spatial analysis.
napari is a fast multi-dimensional image viewer for Python. I found out about it through an extremely comprehensive blog post written by Juan Nunez-Iglesias where he explains the background to the project and what problems it is designed to solve.
One of the key features of napari is that it has a full Python API, allowing you to easily visualise images from within Python – as easily as using imshow() from matplotlib, but with far more features. For example, to view three of the scikit-image sample images just run:
from skimage import data
import napari

with napari.gui_qt():
    viewer = napari.Viewer()
    viewer.add_image(data.astronaut(), name='astronaut')
    viewer.add_image(data.moon(), name='moon')
    viewer.add_image(data.camera(), name='camera')
You can then add some vector points over the image – for example, to use as starting points for a segmentation:
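import numpy as np

# A few illustrative seed points (invented for this example), as (y, x) coordinates
points = np.array([[100, 100], [200, 200], [300, 100]])
viewer.add_points(points, size=30)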
That is very useful for me already, and it’s just a tiny taste of what napari has to offer. I’ve only played with it for a short time, but I can already see it being really useful for me next time I’m doing a computer vision project, and I’m already planning to discuss some potential new features to help with satellite imagery work. Definitely something to check out if you’re involved in image processing in any way.
If you liked this, then get me to work for you! I do freelance work in data science, Python development and geospatial analysis – please contact me for more details, or look at my freelance website
Following on from my last post on plotting choropleth maps with the leaflet-choropleth library, I’m now going to talk about a small addition I’ve made to the library.
Leaflet-choropleth has built-in functionality to automatically categorise your data: you tell it how many categories you’d like and it splits it up. However, once I’d set up my webmap with leaflet-choropleth, using the automatically generated categories, my client said she wanted specific categories to be used. Unfortunately leaflet-choropleth didn’t support that…so I added it!
(It always pleases me a lot that if you’re in a situation where some open-source code doesn’t do what you want it to do, you can just modify it – and then you can contribute the code back to the project too!)
The pull request for this new functionality hasn’t yet been merged, but the updated code is available from my fork. The specific file you need is the updated choropleth.js file. Once you’ve replaced the original choropleth.js with this new version, you will be able to use a new limits option when calling L.choropleth. For example:
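var layer_IMD = L.choropleth(geojson, {
    valueProperty: 'IMDRank',
    // The new option: explicit category boundaries
    // (these particular values are illustrative)
    limits: [1000, 5000, 10000, 20000, 32844],
    scale: ['red', 'orange', 'yellow'],
    style: { weight: 1, fillOpacity: 0.8 }
}).addTo(map);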
The value of the limits property should be the ‘dividing lines’ for the limits: so in this case there will be categories of < 1000, 1000-5000, etc.
I think that’s pretty-much all I can say about this – the code for an example map using this new functionality is available on Github and you can see a live map demo here.
This work was done while analysing GIS data and producing a webmap for a freelancing client. If you’d like me to do something similar for you, have a look at my freelance website or email me.
Some work I’ve been doing recently has involved putting together a webmap using the Leaflet library. I’ve been very impressed with how Leaflet works, and the range of plugins available for it.
leaflet-choropleth is an extension for Leaflet that allows easy generation of choropleth maps in Leaflet. The docs for this module are pretty good, so I’ll just show a quick example of how to use it in a fairly basic way:
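(This snippet is a reconstruction based on the library’s docs – the option values are illustrative.)

var layer_IMD = L.choropleth(geojson, {
    valueProperty: 'IMDRank',           // which GeoJSON property to base colours on
    scale: ['red', 'orange', 'yellow'], // the colourmap
    steps: 5,                           // number of categories
    mode: 'q',                          // quantile breaks
    style: {
        color: '#fff',
        weight: 1,
        fillOpacity: 0.8
    }
}).addTo(map);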
This displays a choropleth based on the GeoJSON data in geojson, and uses a red-orange-yellow colourmap, basing the colours on the IMDRank property of each GeoJSON feature.
This will produce something like this – a map of Index of Multiple Deprivation values in Southampton, UK (read on if you want to see a Github repository of a full map):
One thing I wanted to do was create a legend for this layer in the Leaflet layers control. The leaflet-choropleth docs give an example of creating a legend, but I don’t really like the style, and the legend appears in a separate box rather than in the layers control for the map.
So, I put together a javascript function to create the sort of legend I wanted. For those who just want to use the function, it’s below. For those who want more details, read on…
function legend_for_choropleth_layer(layer, name, units, id) {
    // Generate a HTML legend for a Leaflet layer created using choropleth.js
    //
    // Arguments:
    //   layer: The Leaflet Layer object referring to the layer - must be a layer using
    //          choropleth.js
    //   name:  The name to display in the layer control (will be displayed above the legend,
    //          and next to the checkbox)
    //   units: A suffix to put after each numerical range in the legend - for example to
    //          specify the units of the values - but could be used for other purposes
    //   id:    The id to give the <ul> element that is used to create the legend. Useful to
    //          allow the legend to be shown/hidden programmatically
    //
    // Returns:
    //   The HTML ready to be used in the specification of the layers control
    var limits = layer.options.limits;
    var colors = layer.options.colors;
    var labels = [];

    // Start with just the name that you want displayed in the layer selector
    var HTML = name;

    // For each limit value, create a string of the form 'X-Y'
    limits.forEach(function (limit, index) {
        if (index === 0) {
            var to = parseFloat(limits[index]).toFixed(0);
            var range_str = "< " + to;
        }
        else {
            var from = parseFloat(limits[index - 1]).toFixed(0);
            var to = parseFloat(limits[index]).toFixed(0);
            var range_str = from + "-" + to;
        }
        // Put together a <li> element with the relevant classes, and the right colour and text
        labels.push('<li class="sublegend-item"><div class="sublegend-color" style="background-color: ' +
            colors[index] + '">&nbsp;</div>&nbsp;' + range_str + units + '</li>');
    });

    // Put all the <li> elements together in a <ul> element
    HTML += '<ul id="' + id + '" class="sublegend">' + labels.join('') + '</ul>';
    return HTML;
}
This function is fairly simple: it loops through the limits that have been defined for each of the categories in the choropleth map, and generates a chunk of HTML for each of the different categories (specifically, a <li> element), and these elements are put together and wrapped in a <ul> to produce the final HTML for the legend. We also set CSS classes for each element of the legend, so we can style them nicely later.
When setting up the layers control in Leaflet you pass an object mapping display names (the text you want displayed in the layers control) to Layer objects – something like this:
var layers = {
    'OpenStreetMap': layer_OSM,
    'IMD': layer_IMD
};

var layersControl = L.control.layers({},
    layers,
    { collapsed: false }).addTo(map);
To use the function to generate a legend, replace the simple display name with a call to the function, wrapped in []‘s because of javascript’s weird inability to parse function calls in object keys. For example:
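var layers = {
    'OpenStreetMap': layer_OSM,
    [legend_for_choropleth_layer(layer_IMD, 'IMD', '', 'legend_IMD')]: layer_IMD
};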
Here we’re passing layer_IMD as the Layer object, IMD as the name to display above the legend, no units (so the empty string), and telling it to give the legend HTML element an ID of legend_IMD.
This produces a legend that looks something like this:
To get this nice looking legend, we use the following CSS:
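The original stylesheet isn’t reproduced here, but CSS along these lines (a sketch of mine, using the class names from the function above) gives that effect:

.sublegend {
    list-style: none;
    margin: 2px 0;
    padding-left: 24px;
}

.sublegend-item {
    margin-top: 2px;
}

.sublegend-color {
    display: inline-block;
    width: 12px;
    height: 12px;
    border: 1px solid #999;
}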
Just for one final touch, I’d like the legend to disappear when the layer is ‘turned off’, and appear again when it is ‘turned on’ again. This is particularly useful when you have multiple choropleth layers on a map and the combined length of the legends make the layers control very long.
We can do this with a quick bit of jQuery (yes, I know it can be done in pure javascript, but I prefer using jQuery as it’s generally easier). Remember that one of the parameters to the legend_for_choropleth_layer function was the HTML ID to give the legend? Now you know why: we need to use that ID to hide and show the legend.
We connect to some of the Leaflet events to find out when the layers are turned on or off, and then use the jQuery hide and show methods. There’s one little niggle though: we have to use the setTimeout function to ensure that we only run this once – otherwise we get multiple events raised and it causes problems. So, the code to do this is:
layer_IMD.on('add', function () {
// Need setTimeout so that we don't get multiple
// onadd/onremove events raised
setTimeout(function () {
$('#legend_IMD').show();
});
});
layer_IMD.on('remove', function () {
// Need setTimeout so that we don't get multiple
// onadd/onremove events raised
setTimeout(function () {
$('#legend_IMD').hide();
});
});
You can see how this works by looking at the final map here – try turning the IMD layer off and on again.
All of the code behind this example is available on Github if you want to check how it all fits together.
This work was done while analysing GIS data and producing a webmap for a freelancing client. If you’d like me to do something similar for you, have a look at my freelance website or email me.