Robin's Blog

Pint + SQLAlchemy = Unit consistency and enforcement in your database

Last week I presented a poster at PyData Global 2020, about linking the pint and SQLAlchemy libraries together to provide robust handling of units with databases in Python.

The poster is shown below: click to enlarge so you can read the text:

The example code is available on Github and is well-commented to make it fairly easy to understand.

That poster was designed to be read when you had the opportunity to chat to me about it: as that isn’t necessarily the case, I’ll explain some of it in more detail below.

Firstly, the importance of units: I’m sure you can come up with lots of examples of situations when it’s really important to know what units your data are in. 42 metres, 42 miles and 42 lightyears all mean very different things in the real world! One of the most famous examples of this is the failure of the Mars Climate Orbiter – a spacecraft sent to observe Mars which failed when some data was provided in the wrong units.

The particular situation in which we came across this problem was in some software for processing shipping data. This data was derived from a range of sources, all of which used different units. We needed to make sure that we were associating each measurement with the right units, so that we could then accurately compare measurements.

When you need to deal with units in Python, the answer is almost always to use the pints library. This provides a great Quantity object which stores a numerical value alongside its units. These objects can be created really easily using the multiplication operator. For example:

distance = 5.2 * unit_registry.metres

(Yes, as someone asked in the poster session, both spellings of metres/meters are accepted!)

Pint has data on almost every unit you could think of built-in, along with their conversion factors, their unit abbreviations, and so on.

Once you’ve got a Quantity object like that, you can do useful things like convert it to another unit:

distance.to(unit_registry.miles)

This is great, and we built this in to our software from an early stage, using Quantity objects as much as possible. However, we then needed to store these measurements in a database. Originally our code looked like this:

# Read measurement from file
measurement = 50.2
# Assign units to it
measurement = measurement * unit_registry.yards
# Convert to units we want in the database
measurement_in_m = measurement.to(unit_registry.metre)
# Store in the database
db_model.distance = measurement

This was very error-prone, as we could forget to assign the proper units, or forget to convert to the units we were using in the database. We wanted a way for our code to stop us making these sorts of mistakes!

We found we could do this by implementing a hybrid_property in our SQLAlchemy model, which would check our data and do any necessary conversions before setting the value in the database. These work in the same way as standard ‘getter and setter’ properties on objects, where they run some code when you set or get a value from an attribute.

As is often the case with using getter/setters methods, the actual value is stored in a variable prefixed with an underscore – such as _distance, and the getter/setter names are distance.

In the getter we grab the value from _distance, which is always stored in metres, and return it as a Quantity object with the units set to metres.

In the setter, we check that the value we’re passing has a unit assigned, and then check the ‘dimensionality’ of the unit – for example, we check it is a valid length unit, or a valid speed unit. We then convert it to metres and store it in the _distance member variable.

For more details on how this all works, have a look at the example code.

This is ‘good enough’ for the work I’m doing at the moment, but I’m hoping to find some time to look at extending this: someone at the conference suggested that we could get rid of some of the boilerplate here by using factories or metaclasses, and I’d like to investigate that.


If you found this post useful, please consider buying me a coffee.
This post originally appeared on Robin's Blog.


Categorised as: Programming, Python


One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *