Pint + SQLAlchemy = Unit consistency and enforcement in your database
Last week I presented a poster at PyData Global 2020, about linking the pint and SQLAlchemy libraries together to provide robust handling of units with databases in Python.
The poster is shown below: click to enlarge so you can read the text:
The example code is available on Github and is well-commented to make it fairly easy to understand.
That poster was designed to be read when you had the opportunity to chat to me about it: as that isn’t necessarily the case, I’ll explain some of it in more detail below.
Firstly, the importance of units: I’m sure you can come up with lots of examples of situations when it’s really important to know what units your data are in. 42 metres, 42 miles and 42 lightyears all mean very different things in the real world! One of the most famous examples of this is the failure of the Mars Climate Orbiter – a spacecraft sent to observe Mars which failed when some data was provided in the wrong units.
The particular situation in which we came across this problem was in some software for processing shipping data. This data was derived from a range of sources, all of which used different units. We needed to make sure that we were associating each measurement with the right units, so that we could then accurately compare measurements.
When you need to deal with units in Python, the answer is almost always to use the pints library. This provides a great
Quantity object which stores a numerical value alongside its units. These objects can be created really easily using the multiplication operator. For example:
distance = 5.2 * unit_registry.metres
(Yes, as someone asked in the poster session, both spellings of metres/meters are accepted!)
Pint has data on almost every unit you could think of built-in, along with their conversion factors, their unit abbreviations, and so on.
Once you’ve got a
Quantity object like that, you can do useful things like convert it to another unit:
This is great, and we built this in to our software from an early stage, using Quantity objects as much as possible. However, we then needed to store these measurements in a database. Originally our code looked like this:
# Read measurement from file measurement = 50.2 # Assign units to it measurement = measurement * unit_registry.yards # Convert to units we want in the database measurement_in_m = measurement.to(unit_registry.metre) # Store in the database db_model.distance = measurement
This was very error-prone, as we could forget to assign the proper units, or forget to convert to the units we were using in the database. We wanted a way for our code to stop us making these sorts of mistakes!
We found we could do this by implementing a hybrid_property in our SQLAlchemy model, which would check our data and do any necessary conversions before setting the value in the database. These work in the same way as standard ‘getter and setter’ properties on objects, where they run some code when you set or get a value from an attribute.
As is often the case with using getter/setters methods, the actual value is stored in a variable prefixed with an underscore – such as
_distance, and the getter/setter names are
In the getter we grab the value from
_distance, which is always stored in metres, and return it as a Quantity object with the units set to metres.
In the setter, we check that the value we’re passing has a unit assigned, and then check the ‘dimensionality’ of the unit – for example, we check it is a valid length unit, or a valid speed unit. We then convert it to metres and store it in the
_distance member variable.
For more details on how this all works, have a look at the example code.
This is ‘good enough’ for the work I’m doing at the moment, but I’m hoping to find some time to look at extending this: someone at the conference suggested that we could get rid of some of the boilerplate here by using factories or metaclasses, and I’d like to investigate that.
Categorised as: Programming, Python
[…] Read More […]