Robin's Blog

Introducing offline_folium

Another new-ish package that I’ve never got around to writing about on my blog is offline_folium. It has a somewhat niche use-case, but it seems like a few people have found it useful.

In brief, it allows you to use the folium package for creating interactive maps from Python, but without an internet connection. Folium is built on top of the Leaflet web mapping library – and it loads all the relevant JS and CSS files directly from CDNs. This means that if you try and use folium without an internet connection it just doesn’t work. Offline_folium works around this by downloading the files when you do have an internet connection, and then re-writing the links to the files to point to the offline versions.

I originally created this package when doing freelance work on a project for the UK Navy – they wanted to have interactive maps on an air-gapped computer, so I built a system that would allow this (in this case, the JS/CSS file download step was run as part of building the ‘app’ we produced). I should note at this point that without an internet connection the default OpenStreetMap background mapping will not work. In some situations that would be a significant problem – but we were using other background mapping, coastline vector files and so on, so it wasn’t a problem for us.

So, how do you use this package?

First, install it using pip install offline_folium. Then make sure you have an internet connection and run python -m offline_folium – this will download the JS/CSS files and store them in a sensible place (chosen automatically). Now you’re all set up.

Once you have no internet connection and want to use folium, you must import it in a slightly different way – first import offline from offline_folium, and then import folium, like this:

from offline_folium import offline
import folium

Now you can use folium as usual, for example by creating a simple map object:

m = folium.Map()

So, that’s pretty-much it. People are actively submitting pull requests at the moment, so hopefully the functionality will expand to work with various folium plugins too.

For more information (or to submit a PR yourself), see the Github repo.


Introducing pyAURN – a Python package for accessing UK air quality data

I realised recently that I’d never actually blogged about my pyAURN package – so it’s about time that I did.

When doing some freelance work on air quality a while back, I wanted an easy way to access UK air quality from the Automatic Urban and Rural Network (AURN). Unfortunately, there isn’t a nice API for accessing the data. Strangely, though, they do provide the data in a series of RData files for use by the openair R package. I wanted to use the data from Python though – but conveniently there is a Python package for reading RData files.

So, I put all these together into a simple Python package called pyAURN. It is strongly based upon openair, but is a lot more limited in functionality – it basically only covers importing the data, and doesn’t have many plotting or analysis functions.

Here’s an example of how to use it:

from pyaurn import importAURN, importMeta, timeAverage

# Download metadata of site IDs, names, locations etc
metadata = importMeta()

# Download 4 years of data for the Marylebone Road site
# (MY1 is the site ID for this site)
# Note: range(2016, 2022) will produce a list of six years: 2016, 2017, 2018, 2019, 2020, and 2021. 
# Alternatively define a list of years to use eg. [2016,2017,2018,2019,2020,2021]
data = importAURN("MY1", range(2016, 2022))

# Group the DataFrame by a frequency of monthly, and the statistic mean(). 
data_monthly = timeAverage(data,avg_time="month",statistic="mean")

I found this really useful for my air quality analysis work – and I hope you do too. The package is on PyPI, so you can run pip install pyaurn or view the project on Github.


How to install the Python triangle package on an Apple Silicon Mac

I was recently trying to set up RasterVision on my Apple Silicon Mac (specifically a M1 MacBook Pro, but I’m pretty sure this applies to any Apple Silicon Mac). It all went fine until it came time to install the triangle package, when I got an error. The error output is fairly long, but the key part is the end part here:

triangle/core.c:196:12: fatal error: 'longintrepr.h' file not found
        #include "longintrepr.h"
                 ^~~~~~~~~~~~~~~
      1 error generated.
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

It took me quite a bit of searching to find the answer (Google just isn’t very good at giving relevant results these days), but actually it turns out to be very simple. The latest version of triangle on PyPI doesn’t work on Apple Silicon, but the code in the Github repository does work, so you can install directly from Github with this command:

pip install git+https://github.com/drufat/triangle.git

and it should all work fine.

Once you’ve done this, install rastervision again and it should recognise that the triangle package is already installed and not try to install it again.


How to subscribe to releases on Github

For a while I’d wished there was an easy way to get notified when my favourite open-source packages release a new version. I’d often see something on social media, but that tended to only be for the larger packages – and I wanted to keep up with the smaller ones too.

When I actually bothered to put five minutes of work into finding out if this was possible, I found it was pretty easy. So, here’s how to do it:

  1. Go to the Github repository page for the package you’re interested in.

  2. Click the down arrow next to the ‘Watch’ badge at the top right of the header:

  1. Click ‘Custom’ and then tick ‘Releases’ and click ‘Apply’:

And that’s it! Now you’ll get a notification email when a new release is created on Github. Of course, this relies on the specific project using Github Releases, which not all open-source projects do, but a fairly high proportion of them seem to. After doing this you’ll get an email when there is a new release, with the details of the release and links to see it on Github. An example email for the stac-fastapi repository is below:

Update: Peter in the comments informed me that you can also get an Atom feed for use in your RSS reader by going to https://github.com/{owner}/{repo}/releases.atom. For example, this link for my Py6S package


Some matplotlib tips – a reblog

I was looking through my past blog posts recently, and thought a few of them were worth ‘reblogging’ so that more people could see them (now that my blog posts are getting more readers). So, here are a few posts on matplotlib tips.

Matplotlib titles have configurable locations – and you can have more than one at once!


This post explains how to create matplotlib titles in various locations.

Easily hiding items from the legend in matplotlib


This post explains how to easily hide items from the legend in matplotlib.

Easily specifying colours from the default colour cycle in matplotlib


This post shows how to specify colours from the default colour cycle – ranging from a very simple way to more complex methods that might work in other situations.


New Projects page on my website

Just a quick post here to say that I’ve added a new Projects page to my freelance website. I realised I didn’t have anywhere online that I could point people to that had links to all of the ‘non-work’ (maybe that should be ‘non-paid’) projects I’ve made.

These projects include my Free GIS Data site, the British Placename Mapper, Py6S and more. I’ve also put together a separate page (linked from the projects page) with all my university theses (PhD, MSc and undergraduate) and other university work – which still get a remarkably high number of downloads.

Have a look here, or see a screenshot of the first few entries below:


Simple segmentation of geospatial images

I had a need to do some segmentation of some satellite imagery the other day, for a client. Years ago I was quite experienced at doing segmentation and classification using eCognition but that was using the university’s license, and I don’t have a license myself (and they’re very expensive). So, I wanted a free solution.

First, however, let’s talk a little bit about what segmentation is (thanks to Sonya in the comments for pointing out that I didn’t cover this!). Segmentation is a way of splitting an image up into groups of adjacent pixels (‘segments’ or ‘objects’) that ‘look right’ and, ideally, represent objects in the image. For example, an image of cells from a microscope might be segmented into individual cells, or individual organelles inside the cell (depending on the scale), a satellite image might be segmented into fields, clumps of urban area or individual buildings/swimming pools/driveways – again, depending on the scale. Segmentation uses properties of the image (like colour, texture, sharp lines etc) to try and identify these segments.

Once you’ve segmented an image, you can then do things with the segments – for example, you can classify each segment into a different category (building, road, garden, lake). This is often easier than classifying individual pixels, as you have group statistics about the segment (not just ‘value in the red band’, but a whole histogram of values in the red band, plus mean, median, max etc, plus texture measures and more). Sometimes you may want to use the segment outlines themselves as part of your output (eg. as building outlines), other times they are just used as a way of doing something else (like classification). This whole approach to image processing is often known as Object-based Image Analysis.

There are various segmentation tools in the scikit-image library, but I’ve often struggled using them on satellite or aerial imagery – the algorithms seem better suited to images with a clear foreground and background.

Luckily, I remembered RSGISLib – a very comprehensive library of remote sensing and GIS functions. I last used it many years ago, when most of the documentation was for using it from C++, and installation was a pain. I’m very pleased to say that installation is nice and easy now, and all of the examples are in Python.

So, doing segmentation – using an algorithm specifically designed for segmenting satellite/aerial images – is actually really easy now. Here’s how:

First, install RSGISLib. By far the easiest way is to use conda, but there is further documentation on other installation methods, and there are Docker containers available.

conda install -c conda-forge rsgislib

Then it’s a simple matter of calling the relevant function from Python. The documentation shows the segmentation functions available, and the one you’re most likely to want to use is the Shepherd segmentation algorithm, which is described in this paper). So, to call it, run something like this:

from rsgislib.segmentation.shepherdseg import run_shepherd_segmentation

run_shepherd_segmentation(input_image, output_seg_image,
                          gdalformat=’GTiff’,
                          calc_stats=False,
                          num_clusters=20,
                          min_n_pxls=300)

The parameters are fairly self-explanatory – it will take the input_image filename (any GDAL-supported format will work), produce an output in output_seg_image filename in the gdalformat given. The calc_stats parameter is important if you’re using a format like GeoTIFF, or any format that doesn’t support a Raster Attribute Table (these are mostly supported by somewhat more unusual formats like KEA). You’ll need to set it to False if your format doesn’t support RATs – and I found that if I forgot to set it to false then the script crashed when trying to write stats.

The final two parameters control how the segmentation algorithm itself works. I’ll leave you to read the paper to find out the details, but the names are fairly self-explanatory.

The output of the algorithm will look something like this:

It’s a raster where the value of all the pixels in the first segment are 1, the pixels in the second segment are 2, and so on. The image above uses a greyscale ‘black to white’ colormap, so as the values of the segments increase towards the bottom of the image, they show as more white.

You can convert this raster output to a set of vector polygons, one for each segment, by using any standard raster to vector ‘polygonize’ algorithm. The easiest is probably using GDAL, by running a command like:

gdal_polygonize.py SegRaster.tif SegVector.gpkg

This will give you a result that looks like the red lines on this image:

So, there’s a simple way of doing satellite image segmentation in Python. I hope it was useful.


What’s the largest building in Southampton? Find out with 5 lines of code

Recently I became suddenly curious about the sizes of buildings in Southampton, UK, where I live. I don’t know what triggered this sudden curiosity, but I wanted to know what the largest buildings in Southampton are. In this context, I’m using “largest” to mean largest in terms of land area covered – ie. the area of the outline when viewed in plan view. Luckily I know about useful sources of geographic data, and also know how to use GeoPandas, so I could answer this question pretty quickly – in fact, in only five lines of code. I’ll take you through this code below, as an example of a very simple GIS analysis.

First I needed to get hold of the data. I know Ordnance Survey release data on buildings in Great Britain, but to make this even easier we can go to Alastair Rae’s website where he has split the OS data up into Local Authority areas. We need to download the buildings data for Southampton, so we go here and download a GeoPackage file.

Then we need to create a Python environment to do the analysis in. You can do this in various ways – with virtualenvs, conda environments or whatever – but you just need to ensure that Jupyter and GeoPandas are installed. Then create a new notebook and you’re ready to start coding.

First, we import geopandas:

import geopandas as gpd

and then load the buildings data:

buildings = gpd.read_file("Southampton_buildings.gpkg")

The buildings GeoDataFrame has a load of polygon geometries, one for each building. We can calculate the area of a polygon with the .area property – so to create a new ‘area’ column in the GeoDataFrame we can run:

buildings[’area’] = buildings.geometry.area

I’m only interested in the largest buildings, so we can now sort by this new area column, and take the first twenty entries:

top20 = buildings.sort_values(’area’, ascending=False).head(20)

We can then use the lovely explore function to show these buildings on a map. This will load an interactive map in the Jupyter notebook:

top20.explore()

If you’d like to save the interactive map to a standalone HTML file then you can do this instead:

top20.explore().save(“map.html”)

I’ve done that, and uploaded that HTML file to my website – and you can view it here.

So, putting all the code together, we have:

import geopandas as gpd
buildings = gpd.read_file("Southampton_buildings.gpkg")
buildings[’area’] = buildings.geometry.area
top20 = buildings.sort_values(’area’, ascending=False).head(20)
top20.explore()

Five lines of code, with a simple analysis, resulting in an interactive map, and all with the power of GeoPandas.

Hopefully in a future post I’ll do a bit more work on this data – I’d like to make a prettier map, and I’d like to try and find some way to test my friends and see if they can work out what buildings they are.


Free GIS Data site update – all links now working!

This is just a quick post to let you know that my Free GIS Data site is still going, and I’ve recently been through and fixed (or removed) all of the broken links.

For those of you who aren’t aware of the site, it provides categorised links to over 500 sites which have freely available geographic data available for download. Categories include Physical Geography, Natural Hazards, Land Use, Population and countries ranging from Afghanistan to USA.

I got a bit behind with fixing broken links (some sites had just disappeared, but many sites had just changed their URLs without setting up any redirects – remember cool URLs don’t change), but they should all be fixed now. I try and keep the site up-to-date with new data sources I find – if you know of any that should be listed then please send me an email, a tweet, a toot or similar.