Robin's Blog

Learning resources for GIS in Python with cloud-native geospatial, PostGIS and more

March 28, 2025

I recently gave a careers talk to students at Solent University, and through that I got to know an MSc student there who had previous GIS experience and was now doing a Data Analytics and AI MSc course. Her GIS experience was mostly in the ESRI stack (ArcGIS and related tools) and she was keen to learn other tools and how to combine her new Python and data knowledge with her previous GIS knowledge. I wrote her a long email with links to loads of resources and, with her permission, I’m publishing it here as it may be useful to others. The general focus is on the tools I use, which are mostly Python-focused, but also on becoming familiar with a range of tools rather than using tools from just one ecosystem (like ESRI). I hope it is useful to you.

Tools to investigate:

GDAL
GDAL is a library that consists of two parts: GDAL and OGR. It provides ways to read and write geospatial data formats like shapefile, GeoPackage, GeoJSON, GeoTIFF etc – both raster (GDAL) and vector (OGR). It has a load of command-line tools like gdal_translate, ogr2ogr, gdalwarp and so on. These are extremely useful for converting between formats, importing data to databases, doing basic processing etc. It will be very useful for you to become familiar with the GDAL command-line tools. It comes with a Python interface which is a bit of a pain to use, and there are nicer libraries that make it easier to use GDAL functionality from Python. A good tutorial for the command-line tools is at https://courses.spatialthoughts.com/gdal-tools.html
Command line tools in general
Getting familiar with running things from the command-line (Command Prompt on Windows) is very useful. On Windows I suggest installing ‘Windows Terminal’ (https://apps.microsoft.com/detail/9n0dx20hk701?hl=en-GB&gl=GB) and using that – but make sure you select the standard Command Prompt, not PowerShell, when you open a terminal using it.
Git
There’s a good tutorial at https://swcarpentry.github.io/git-novice/
qgis2web
This is a plugin for QGIS that will export a QGIS project to a standalone web map, with the styling/visualisation as close to the original QGIS project as they can manage. It’s very useful for exporting maps to share easily. There’s a tutorial at https://www.qgistutorials.com/en/docs/web_mapping_with_qgis2web.html and the main page is at https://plugins.qgis.org/plugins/qgis2web/

Python libraries to investigate:

GeoPandas – like pandas but including geometry columns for geospatial information. Try the geopandas explore() method, which will do an automatic webmap of a GeoPandas GeoDataFrame (like you did manually with Folium, but automatically)
rasterio – a nice wrapper around GDAL functionality that lets you easily load/manipulate/save raster geospatial data
fiona – a similar wrapper for vector data in GDAL/OGR
shapely – a library for representing vector geometries in Python – used by fiona, GeoPandas etc
rasterstats – a library for doing ‘zonal statistics’ – i.e. getting raster values within a polygon, at a point etc

Cloud Native Geospatial
There’s a good ‘zine’ that explains the basics behind cloud-native geospatial – see https://zines.developmentseed.org/zines/cloud-native/. Understanding the basics of the topics in there would be good. There are loads of good tutorials online for using STAC catalogues, COG files and so on. See https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/ and https://planetarycomputer.microsoft.com/docs/tutorials/cloudless-mosaic-sentinel2/ and https://github.com/microsoft/PlanetaryComputerExamples/blob/main/tutorials/surface_analysis.ipynb

My Blog
You can subscribe via email on the left-hand side at the bottom of the sidebar
Relevant posts:

Conference talks
These can be a really good way to get a brief intro to a topic, to know where to delve in deeper later. I often put them on and half-listen while I’m doing something else, and then switch to focusing on them fully if they get particularly interesting. There are loads of links here, don’t feel like you have to look at them all!

PostGIS Day conference: https://www.crunchydata.com/blog/postgis-day-2024-summary
Particularly relevant talks:

https://www.youtube.com/watch?v=O45Zy5zKkm8 – good example of building up a complex SQL query
https://www.youtube.com/watch?v=mR0WshjWfVY – includes descriptions of on-the-fly Mapbox Vector Tile creation like we use in GPAP
https://www.youtube.com/watch?v=BsFCVTBzTvY – also includes on-the-fly MVT creation
https://www.youtube.com/watch?v=1KhWJHKuNCY

FOSS4G UK conference last year in Bristol: https://uk.osgeo.org/foss4guk2024/bristol.html
Most relevant talks for you are the following (just the slides):

https://uk.osgeo.org/foss4guk2024/decks/08-robin-wilson-cloud-native-flood-risk.pdf – see what you think of the large SQL query that I talk through
https://uk.osgeo.org/foss4guk2024/decks/11-al-graham-overture.pdf
https://uk.osgeo.org/foss4guk2024/decks/13-alexei-qgis-grid-layout-plugin.pdf
https://uk.osgeo.org/foss4guk2024/decks/14-matt-travis-overture.pdf

FOSS4G conference YouTube videos: https://www.youtube.com/@FOSS4G/videos – they have a load of ones from 2022 at the top for some reason, but if you scroll down a long way you can find 2023 and 2024 stuff. Actually, better is to use this playlist of talks from the 2023 global conference: https://www.youtube.com/playlist?list=PLqa06jy1NEM2Kna9Gt_LDKZHv1dl4xUoZ
Here’s a few talks that might be particularly interesting/relevant to you, in no particular order

Suggestions for learning projects/tasks
(These are quite closely related to the MSc project that this student might be doing, but are probably useful for people generally)
I know when you’re starting off it is hard to work out what sort of things to do to develop your skills. One thing that is really useful is to become a bit of a ‘tool polyglot’, so you can do the same task in various tools depending on what makes sense in the circumstances.

I’ve listed a couple of tasks below. I’d suggest trying to complete them in a few ways:

Using QGIS and clicking around in the GUI
Using Python libraries like geopandas, rasterio and so on
Using PostGIS
(Possibly – not essential) Using the QGIS command-line, or model builder or similar

Task 1 – Flood risk

Download the ‘Flood Zone 2’ flood risk data from https://environment.data.gov.uk/explore/86ec354f-d465-11e4-b09e-f0def148f590?download=true for a particular area (maybe the whole of Southampton?)
Download OS data on buildings from this page – https://automaticknowledge.org/gb/ – you can download it for a specific local authority area
Find all buildings at risk of flooding, and provide a count of buildings at risk and a map of buildings at risk (static map or web map)
Extension task: also provide a total ground area of buildings at risk

Task 2 – Elevation data
(Don’t do this with PostGIS as its raster functionality isn’t great, but you could probably do all of this with GDAL command-line tools if you wanted)

Download Digital Terrain Model data from https://environment.data.gov.uk/survey – download multiple tiles
Mosaic the tiles together into one large image file
Do some basic processing on the DEM data. For example, try:
a) Subtracting the minimum value, so the lowest elevation comes out as a value of zero
b) Running a smoothing algorithm across the DEM to remove noise
Produce a map – either static or web map
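
As a rough illustration of steps (a) and (b) in Task 2, here is a minimal sketch in plain Python on a tiny made-up grid of elevations. The grid values and the function names (subtract_minimum, mean_smooth) are invented for illustration, and a 3×3 moving average is just one simple choice of smoothing algorithm. On real DTM tiles you would more likely load the data with rasterio and work on arrays rather than nested lists.

```python
def subtract_minimum(dem):
    """Shift all elevations so the lowest cell becomes zero."""
    lowest = min(min(row) for row in dem)
    return [[value - lowest for value in row] for row in dem]


def mean_smooth(dem):
    """Smooth with a 3x3 moving average, using only the neighbours that exist at the edges."""
    n_rows, n_cols = len(dem), len(dem[0])
    smoothed = []
    for r in range(n_rows):
        new_row = []
        for c in range(n_cols):
            window = [
                dem[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
            ]
            new_row.append(sum(window) / len(window))
        smoothed.append(new_row)
    return smoothed


# Hypothetical 3x3 DEM in metres, with one noisy spike in the middle
dem = [
    [12.0, 12.0, 12.0],
    [12.0, 21.0, 12.0],
    [12.0, 12.0, 12.0],
]

zeroed = subtract_minimum(dem)
smoothed = mean_smooth(zeroed)
print(zeroed[1][1])    # height of the spike above the new zero level
print(smoothed[1][1])  # the spike after averaging with its 8 neighbours
```

Doing the same thing in QGIS or with the GDAL command-line tools should give comparable results, which is a handy way of checking your own code.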


A load more links

March 17, 2025

I did a post a while back which was just a lot of links to things I found interesting, mostly in the geospatial/data/programming sphere. Since then I’ve collected a lot more links – so here are some of them. The theme, such as there is, seems to be ‘this would have really helped me about X contracts ago, if it had existed then/I had known about it then’. Make of that what you will…

The stac-tools raster footprint utility – a useful new-ish tool that generates nice, accurate but simple outlines (‘footprints’) for the area covered by a raster file (as shown above) all ready to put into a STAC catalog
Is Antarctica Greening? – a brief article looking at some of the technicalities of using NDVI time series to monitor greening in Antarctica (reminds me of some of the issues I had using time series in my PhD)
VTracer – an interactive web interface to an open-source tool to convert raster images to vector SVGs (not geospatial images, just images in general). Gives great immediate feedback on parameter changes
tilegroxy – a proxy that can sit between end users of web map raster/vector tiles and the sources, managing things like caching, authentication etc. Seriously considering using this with my current client.
grid-banger – a Python package for converting between Ordnance Survey grid co-ordinates and latitude/longitude. Unlike many converters, it works with both fully numerical grid references, and those that start with letters (like TQ213455)
GeoDeep – a new and simple (but seemingly quite powerful) tool for doing basic deep learning on geospatial rasters. Can do things like extract cars, trees, buildings – and even extract road polygons (something which would have been useful a couple of clients ago…)
LosslessCut – not geospatial or data related, but very useful: I’ve recently been digitising some old VHS tapes, and this makes it very easy to mark chunks of video files and then export each chunk to a separate file – all while keeping the quality high
labellines – a neat little Python package for labelling lines on a graph by putting the label inside the line, rather than relying on a separate legend (see image below)
act – lets you run GitHub Actions locally, to save doing a million commits to see how your CI/deployment/etc runs. This would have saved me a lot of time about four contracts ago.
cuttle – this one isn’t even tech related, it’s the rules for a fairly fun card game that I’ve been playing with my wife recently. Sometime I’ll do a blog post containing my player cheatsheet that I put together.
It’s cool to care – a blog post from Alex Chan where they explain something that is very important to me: that it is cool to be enthusiastic/interested/excited by something, even if other people aren’t.
FILTER in Postgres – simple explanation of a nice bit of SQL syntax in Postgres that allows you to write things like SELECT COUNT(*) FILTER (WHERE b > 11)
QMapCompare plugin for QGIS – a useful plugin that lets you compare two views of a map, either side-by-side, with a swipe or with a focus area following the map
BGNix – a 100% free way to remove backgrounds from images (again, not geospatial in this case – just things like photos or clipart). It uses an AI model that runs entirely on your local device, and so doesn’t send your images anywhere making it high-privacy too!
Spy for changes with sys.monitoring – nice example of how to use the new sys.monitoring functionality (a newer, better version of sys.settrace) in Python to help with debugging
QGIS Deepness – a plugin for easily running deep learning models in QGIS, including a ‘model zoo’ of models that can be set up very quickly
lonboard – I mentioned lonboard in my last list of links – it’s a Python library for creating interactive maps – but hugely faster than most alternatives. Here’s a new version with animation added, allowing some pretty cool animated maps to be made.
geopandas – another new release, this time for geopandas with some nice new functionality
maplibre-cog-protocol – a plugin for maplibre that lets you load COGs directly (like georaster-layer-for-leaflet does for Leaflet)
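
To give a flavour of the FILTER syntax mentioned in the list above, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a database server; this assumes a reasonably recent SQLite (FILTER on aggregates arrived in version 3.30). The table t and its values are made up.

```python
import sqlite3

# In-memory SQLite database standing in for Postgres (an assumption for this sketch)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (b INTEGER)")
conn.executemany("INSERT INTO t (b) VALUES (?)", [(5,), (11,), (12,), (20,)])

# One pass over the table gives both the overall count and a filtered count
total, over_11 = conn.execute(
    "SELECT COUNT(*), COUNT(*) FILTER (WHERE b > 11) FROM t"
).fetchone()
print(total, over_11)
conn.close()
```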


My talk at FOSS4G UK South West 2024

March 14, 2025

As always, this post is very delayed – apologies. In fact, I was encouraged to write this by a friend who I see at PyData Southampton (Hi, if you’re reading this!). I mentioned my talk in passing to her, and she asked if I’d blogged about it yet. I admitted that I hadn’t, and promised I would by the next PyData Southampton. Well, I totally failed at that – but there is another PyData Southampton meetup on Tuesday, so I’m going to get it done in time for that.

The FOSS4G UK South West conference 2024 took place in Bristol on 12th November. I gave a talk there entitled Using cloud-native geospatial technologies to build a web app for analysing and reducing flood risk, talking about some of the work I’ve done with the company I’m currently working with: Rebalance Earth.

The talk covers the development of a web app for looking at assets (businesses, buildings, substations etc) that are at risk from flooding in the UK, and comparing various flood scenarios to understand how risk could be reduced by Natural Flood Management strategies such as river restoration. After introducing Rebalance Earth and the web app itself, I talk about the technologies behind it and the ‘cloud native’ manner in which it was designed. I specifically cover generating Mapbox Vector Tiles on-the-fly from a PostGIS database, and generating raster tiles on-the-fly from COG files stored in cloud storage.

Full slides are available here. There is also a video recording of the talk available, but it’s a bit hard to watch as you can’t see the slides on the video.

Once you’ve had a look at my talk, don’t forget to check out the other talks that were given at the conference, they were great!


The Feynman approach to small monetary compensation

November 26, 2024

I just shared this approach with some friends, and thought I’d blog it here too.

When I get a relatively small amount of monetary compensation for something, I take the ‘Feynman Approach’ to it and buy something fun with the money, giving me a sense of satisfaction from the compensation (which, presumably, was to compensate me for something bad that happened).

This comes from a story related in Surely you’re joking, Mr Feynman:

Now, it’s some dopey legal thing, but when you give the patent to the government, the document you sign is not a legal document unless there’s some exchange, so the paper I signed said,
"For the sum of one dollar, I, Richard P. Feynman, give this idea to the government . . ."
I sign the paper.
"Where’s my dollar?"
…[he eventually gets the dollar]
I take the dollar, and I realize what I’m going to do. I go down to the grocery store, and I buy a dollar’s worth – which was pretty good, then – of cookies and goodies, those chocolate goodies with marshmallow inside, a whole lot of stuff.
I come back to the theoretical laboratory, and I give them out: "I got a prize, everybody! Have a cookie! I got a prize! A dollar for my patent! I got a dollar for my patent!"
…[everyone else wants to get a real dollar for their patent]

I don’t usually spend the compensation amount on sweets or baked goods (unless it’s really quite small), but I often buy myself a little something I wanted for fun – like a few electrical components, a 3D printing accessory (or filament), or just a second-hand book (or a new one if the amount is enough).

Recent bits of compensation I’ve used this approach with are mostly Delay Repay for delayed trains (it means I was inconvenienced by the train delay, so I ‘deserve’ something nice), but it can apply to other things too. I don’t usually apply it if the compensation is reasonably large, and obviously not if I’m really short of money (in that case the money goes into ‘general funds’), but for compensation less than £15-20 it’s often an approach I take.

(I should point out that I definitely don’t agree with Feynman on everything, particularly some of his views on women, but in general I enjoyed his books)


Join the GeoTAM hackathon to work out business turnovers!

November 4, 2024

Summary: I’m involved in organising a hackathon, and I’d love you to take part. The open-source GeoTAM hackathon focuses on estimating turnover for individual business locations in the UK, from a variety of open datasets. Please check out the hackathon page and sign up. There are prizes of up to £2,000!


I’m currently working with Rebalance Earth, a boutique asset manager who are focused on making nature an investable asset. Our aim is to mobilise investment in UK natural infrastructure – for example, by arranging investment to undertake river restoration and reduce the risk of flooding. We will do this by finding businesses at risk of flooding, designing restoration schemes that will reduce this risk, and setting up ‘Nature-as-a-Service’ contracts with businesses to pay for the restoration.

I’m the Lead Geospatial Developer at Rebalance Earth, and am leading the development of our Geospatial Predictive Analytics Platform (GPAP), which helps us assess businesses at risk of flooding and design schemes to reduce this flooding.

An important part of deciding which areas to focus on is estimating the total business value at risk from flooding. A good way of establishing this is to use an estimate of the business turnover. However, there are no openly-available datasets showing business turnover in the UK – which is where the hackathon comes in.

We’re looking for participants to bring their expertise in programming, data science, machine learning and more to take some datasets we provide, combine them with other open data and try and estimate turnover. Specifically, we’re interested in turnover of individual business locations – for example, the turnover of a specific supermarket, not the whole supermarket chain.

The hackathon runs from 20th – 26th November 2024. We’ll provide some datasets, some ideas, and a Discord server to communicate through. We’d like you to bring your expertise and see what you can produce. This is a tricky task, and we’re not expecting fully polished solutions; proof-of-concept solutions are absolutely fine. You can enter as a team or an individual.

Most importantly, there are prizes:

£2,000 for the First Prize
£1,000 for the Second Prize
£500 for the Third Prize

and there’s a possibility that we might even hire you to continue work on your idea!

So, please sign up and tell your friends!


How to convert hex-encoded WKB geometries in PostGIS

October 9, 2024

A quick post today to talk about a couple of PostGIS functions I learnt recently.

I had a CSV file that contained well-known binary (WKB) representations of geometries, stored as hexadecimal strings. I imported the CSV into a PostGIS database, and wanted to convert these to be proper PostGIS geometries.

I initially went for the ST_GeomFromWKB function, but kept getting an error that the function didn’t exist. Well, actually the error said that it couldn’t find a function that exists with that name and those specific parameter types. That’s because I was calling it with the text column containing the hex strings, and the documentation for ST_GeomFromWKB says that its signature is one of:

geometry ST_GeomFromWKB(bytea geom);
geometry ST_GeomFromWKB(bytea geom, integer srid);

So, we need to convert the hexadecimal string to a bytea type – that is, a proper set of bytes. We can do this using the decode function, which takes two parameters: the text to decode, and a format specifier which must be one of base64, escape or hex. In this case, we’re decoding a hex string, so we want the latter.

Putting this together, we can write some simple SQL that does what we want. First, we create a geom column in our table:

ALTER TABLE test ADD geom Geometry;

and then we set that column to be the result of decoding the hex and converting the WKB to a geometry:

UPDATE test SET geom = ST_GeomFromWKB(decode(wkb_column, 'hex'), 4326);

Note that here I knew that my WKB was encoding a point in the WGS84 latitude/longitude co-ordinate system, so I passed the EPSG code 4326 which refers to this co-ordinate system.
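
To make it clearer what the hex strings actually contain, here is a minimal Python sketch that decodes one by hand. It builds its own example value rather than using real data (the coordinates are hypothetical), and parse_wkb_point is an illustrative helper that only handles a plain 2D WKB point, not other geometry types or the EWKB variant with an embedded SRID.

```python
import struct

# Build a hex-encoded WKB point to stand in for one value from the CSV
# (hypothetical coordinates: longitude -1.4, latitude 50.9).
# Layout: 1 byte for byte order (1 = little-endian), a 4-byte geometry
# type (1 = Point), then x and y as 8-byte doubles.
wkb_hex = struct.pack("<BIdd", 1, 1, -1.4, 50.9).hex()
print(wkb_hex)


def parse_wkb_point(hex_string):
    """Decode a hex string holding a plain 2D WKB point into (x, y)."""
    raw = bytes.fromhex(hex_string)  # same job as decode(..., 'hex') in Postgres
    byte_order = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError(f"Expected a WKB Point (type 1), got type {geom_type}")
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return x, y


x, y = parse_wkb_point(wkb_hex)
print(x, y)
```

Note that the WKB itself carries no co-ordinate system here, which is why the SRID has to be supplied separately in the SQL.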


I won two British Cartographic Society awards!

October 1, 2024

It’s been a while since I posted here – I kind of lost momentum over the summer (which is a busy time with a school-aged child) and never really picked it up again.

Anyway, I wanted to write a quick post to tell people that I won two awards at the British Cartographic Society awards ceremony a few weeks ago.

They were both for my British Placename Mapper web app, which is described in more detail in this blog post. If you haven’t seen it already, I strongly recommend you check it out.

I won a Highly Commended certificate in the Avenza Award for Electronic Mapping, and the First Prize trophy for the Ordnance Survey Award (for any map using OS data).

The certificates came in a lovely frame, and the trophy is enormous – about 30cm high and weighing over 3kg!

Here’s the trophy:

I was presented with the trophy at the BCS Annual Conference in London, but they very kindly offered to keep the trophy to save me carrying it across London on my wheelchair and back on the train, so they invited me to Ordnance Survey last week to be presented with it again. I had a lovely time at OS – including 30 minutes with their Director General/CEO – and was formally presented with my trophy again (standing in front of the first ever Ordnance Survey map!):

Full information on the BCS awards is available on their website and I strongly recommend submitting any appropriate maps you’ve made for next year’s awards. I need to get my thinking cap on for next year’s entry…


Searching an aerial photo with text queries – a demo and how it works

July 11, 2024

Summary: I’ve created a demo web app where you can search an aerial photo of Southampton, UK using text queries such as "roundabout", "tennis court" or "ship". It uses vector embeddings to do this – which I explain in this blog post.

In this post I’m going to try and explain a bit more about how this works.

Firstly, I should explain that the only data used for the searching is the aerial image data itself – even though a number of these things will be shown on the OpenStreetMap map, none of that data is used, so you can also search for things that wouldn’t be shown on a map (like a blue bus)

The main technique that lets us do this is vector embeddings. I strongly suggest you read Simon Willison’s great article/talk on embeddings but I’ll try and explain here too. An embedding model lets you turn a piece of data (for example, some text, or an image) into a constant-length vector – basically just a sequence of numbers. This vector would look something like [0.283, -0.825, -0.481, 0.153, ...] and would be the same length (often hundreds or even thousands of elements long) regardless of how long the data you fed into it was.

In this case, I’m using the SkyCLIP model which produces vectors that are 768 elements long. One of the key features of these vectors is that the model is trained to produce similar vectors for things that are similar in some way. For example, a text embedding model may produce a similar vector for the words "King" and "Queen", or "iPad" and "tablet". The ‘closer’ a vector is to another vector, the more similar the data that produced it.

The SkyCLIP model was trained on image-text pairs – so a load of images that had associated text describing what was in the image. SkyCLIP’s training data "contains 5.2 million remote sensing image-text pairs in total, covering more than 29K distinct semantic tags" – and these semantic tags and the text descriptions of them were generated from OpenStreetMap data.

Once we’ve got the vectors, how do we work out how close vectors are? Well, we can treat the vectors as encoding a point in 768-dimensional space. That’s a bit difficult to visualise – so imagine a point in 2- or 3-dimensional space as that’s easier, plotted on a graph. Vectors for similar things will be located physically closer on the graph – and one way of calculating similarity between two vectors is just to measure the multi-dimensional distance on a graph. In this situation we’re actually using cosine similarity, which gives a number between -1 and +1 representing the similarity of two vectors.
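
Cosine similarity is simple enough to write out by hand. This is a minimal sketch using made-up 2-dimensional vectors rather than real 768-element embeddings; the function name is just for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 2D 'embeddings', purely for illustration
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
right_angles = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, right_angles, opposite)
```

Only the direction of the vectors matters, not their length. If the vectors are first scaled to unit length, the cosine similarity is just their dot product.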

So, we now have a way to calculate an embedding vector for any piece of data. The next step we take is to split the aerial image into lots of little chunks – we call them ‘image chips’ – and calculate the embedding of each of those chunks, and then compare them to the embedding calculated from the text query.

I used the RasterVision library for this, and I’ll show you a bit of the code. First, we generate a sliding window dataset, which will allow us to then iterate over image chips. We define the size of the image chip to be 200×200 pixels, with a ‘stride’ of 100 pixels which means each image chip will overlap the ones on each side by 100 pixels. We then configure it to resize the output to 224×224 pixels, which is the size that the SkyCLIP model expects as input.

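ds = SemanticSegmentationSlidingWindowGeoDataset.from_uris(
    image_uri=uri,
    image_raster_source_kw=dict(channel_order=[0, 1, 2]),
    size=200,      # each chip is 200x200 pixels
    stride=100,    # move 100 pixels between chips, so neighbours overlap
    out_size=224,  # resize to the input size SkyCLIP expects
)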

We then iterate over all of the image chips, run the model to calculate the embedding and stick it into a big array:

import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

dl = DataLoader(ds, batch_size=24)

EMBEDDING_DIM_SIZE = 768
# one row per image chip
embs = torch.zeros(len(ds), EMBEDDING_DIM_SIZE)

with torch.inference_mode(), tqdm(dl, desc='Creating chip embeddings') as bar:
    i = 0
    for x, _ in bar:
        x = x.to(DEVICE)
        emb = model.encode_image(x)
        # copy this batch's embeddings into the right rows
        embs[i:i + len(x)] = emb.cpu()
        i += len(x)

# normalize the embeddings
embs /= embs.norm(dim=-1, keepdim=True)

embs.shape

We also do a fair amount of fiddling around to get the locations of each chip and store those too.
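
As a rough idea of what working out chip locations involves, here is a minimal sketch that lists the pixel offsets of each chip for a given image size. This is not RasterVision's own logic or the code used in the demo: chip_windows is a hypothetical helper, the image size is made up, and it simply ignores any partial chips at the right and bottom edges.

```python
def chip_windows(width, height, size=200, stride=100):
    """List (x_offset, y_offset, size) for every full chip that fits in the image."""
    windows = []
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            windows.append((x, y, size))
    return windows


# Hypothetical 400x300 pixel image
windows = chip_windows(400, 300)
print(len(windows))
print(windows[:3])
```

These are pixel offsets, so to show a chip on a web map they would still need converting to geographic co-ordinates using the image's georeferencing.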

Once we’ve stored all of those (I’ll get on to storage in a moment), we need to calculate the embedding of the text query too – which can be done with code like this:

text = tokenizer(text_queries)
with torch.inference_mode():
    text_features = model.encode_text(text.to(DEVICE))
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_features = text_features.cpu()

It’s then ‘just’ a matter of comparing the text query embedding to the embeddings of all of the image chips, and finding the ones that are closest to each other.

To do this, we can use a vector database. There are loads of different vector databases to choose from, but I’d recently been to a tutorial at PyData Southampton (I’m one of the co-organisers, and I strongly recommend attending if you’re in the area) which used the Pinecone serverless vector database, and they have a fairly generous free tier, so I thought I’d try that.

Pinecone, like all other vector databases, allows you to insert a load of vectors and their metadata (in this case, their location in the image) into the database, and then search the database to find the vectors closest to a ‘search vector’ you provide.
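
To show the idea without any external service, here is a tiny brute-force, in-memory stand-in for a vector database. It is emphatically not Pinecone's API: the class, its method names, the 3-element vectors and the metadata are all invented for illustration, and a real vector database uses clever indexing rather than comparing against every stored vector.

```python
import math


def normalise(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector]


class TinyVectorIndex:
    """Brute-force, in-memory stand-in for a vector database (not Pinecone's API)."""

    def __init__(self):
        self.items = []  # list of (id, unit_vector, metadata)

    def upsert(self, item_id, vector, metadata):
        # Replace any existing entry with the same id, then store the new one
        self.items = [item for item in self.items if item[0] != item_id]
        self.items.append((item_id, normalise(vector), metadata))

    def query(self, vector, top_k=2):
        # Score every stored vector against the query and keep the best matches
        query_vector = normalise(vector)
        scored = [
            (sum(q * v for q, v in zip(query_vector, stored)), item_id, metadata)
            for item_id, stored, metadata in self.items
        ]
        scored.sort(key=lambda entry: entry[0], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex()
index.upsert("chip-0", [1.0, 0.0, 0.0], {"x": 0, "y": 0})
index.upsert("chip-1", [0.0, 1.0, 0.0], {"x": 100, "y": 0})
index.upsert("chip-2", [0.9, 0.1, 0.0], {"x": 200, "y": 0})

# A made-up 'search vector' standing in for the text query embedding
results = index.query([1.0, 0.0, 0.0], top_k=2)
for score, item_id, metadata in results:
    print(item_id, round(score, 3), metadata)
```

The metadata returned with each match is what lets the results be placed back on the map.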

I won’t bother showing you all the code for this side of things: it’s fairly standard code for calling Pinecone APIs, mostly copied from their tutorials.

I then wrapped this all up in a FastAPI API, and put a simple JavaScript front-end on it to display the results on a Leaflet web map. I also added some basic caching to stop us hitting the Pinecone API too frequently (as there is a limit to the number of API calls you can make on the free plan). And that’s pretty much it.
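
One simple way to do this sort of caching in Python is functools.lru_cache from the standard library. The sketch below is only an illustration of the idea, not the caching used in the demo app: search_vector_db is a hypothetical stand-in for the real API call, and the counter exists only to show when it gets hit.

```python
from functools import lru_cache

api_calls = 0  # counts how often the (pretend) remote API is hit


def search_vector_db(query):
    """Hypothetical stand-in for a slow, rate-limited vector database call."""
    global api_calls
    api_calls += 1
    return (f"results for {query}",)


@lru_cache(maxsize=256)
def cached_search(query):
    # Repeated queries with the same text are answered from memory
    return search_vector_db(query)


cached_search("tennis court")
cached_search("tennis court")  # served from the cache, no second API call
cached_search("roundabout")
print(api_calls)
```

A cache like this lives in the memory of one process and is lost on restart, which is usually fine for a small demo.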

I hope the explanation made sense: have a play with the app here and post a comment with any questions.


Who reads my blog? Send me an email or comment if you do!

July 7, 2024

I’m interested to find out who is reading my blog. Following the lead of Jamie Tanna who was in turn copying Terence Eden (both of whose blogs I read), I’d like to ask people who read this to drop me an email or leave a comment on this post if you read this blog and see this post. I have basic analytics on this blog, and I seem to get a reasonable number of views – but I don’t know how many of those are real people, and how many are bots etc.

Feel free to just say hi, but if you have chance then I’d love to find out a bit more about you and how you read this. Specifically, feel free to answer any or all of the following questions:

Do you read this on the website or via RSS?
Do you check regularly/occasionally for new posts, or do you follow me on social media (if so, which one?) to see new posts?
How did you find my blog in the first place?
Are you interested in and/or working in one of my specialisms – like geospatial data, general data science/data processing, Python or similar?
Which posts do you find most interesting? Programming posts on how to do things? Geographic analyses? Book reviews? Rare unusual posts (disability, recipes etc)?
Have you met me in real life?
Is there anything particular you’d like me to write about?

The comments box should be just below here, and my email is robin@rtwilson.com

Thanks!

