Jupyter (formerly known as IPython) notebooks are great – but have you ever accidentally deleted a cell that contained a really important function that you want to keep? Well, this post might help you get it back.
So, imagine you have a notebook with the following code:
and then you accidentally delete the top cell, with the definition of your function…oops! Furthermore, you can’t find it in any of your ‘Checkpoints’ (look under the File menu). Luckily, your function is still defined…so you can still run it:
This is essential for what follows…because as the function is still defined, the Python interpreter still knows internally what the code is, and it gives us a way to get this out!
So, if you’re stuck and just want the way to fix it, then here it is:
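In essence it is just this – a minimal sketch reconstructed from the explanation below, so the details may differ slightly from the original:

import inspect

def rescue_code(function):
    # Grab the source lines of the (still-defined) function, join them into a
    # single string, and push that into a new notebook cell below this one
    get_ipython().set_next_input("".join(inspect.getsourcelines(function)[0]))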
Just call this as rescue_code(f) (or whatever your function is called), and a new cell should be created with the code of your function: problem solved! If you want to learn how it works then read on…
The code is actually very simple: inspect.getsourcelines(function) returns a tuple containing a list of the lines of code of the function, and the line of the source file that the code starts on (as we’re operating in a notebook this is always 1). We extract the 0th element of this tuple, then join the lines of code into one big string (the lines already have \n at the end of them, so we don’t have to deal with that). The only other bit is a little IPython magic to create a new cell below the current cell and set its contents…and that’s it!
I hope this is helpful to someone – I’m definitely going to keep this function in my toolkit.
Another instalment in my Previously Unpublicised Code series…this time RPiNDVI, my code for displaying live NDVI images from the Raspberry Pi NoIR camera.
It isn’t perfect, and it isn’t finished – but it does the job as a proof-of-concept. If you point the camera out of your window you should see high NDVI values (white) over vegetation, and low NDVI values (black) over various other things (particularly the sky!).
This is the point at which I would like to include a screenshot of the program running…but unfortunately I can’t actually find my Raspberry Pi to run it! (I guess that’s the problem with small computers…).
I can’t say the code is exceptionally exciting – it’s only about 100 lines – but it might be useful to someone. It demonstrates how to do real-time (or near-real-time) processing of video from the Raspberry Pi camera using OpenCV, and also has a few handy functions for doing contrast stretching of imagery and combining multiple images on a single display.
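For context, the NDVI calculation itself is only a couple of lines of numpy – this is a sketch of the idea rather than the exact code from the repo, where nir and red are arrays extracted from each camera frame:

import numpy as np

def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red), computed in floating point
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + 0.001)  # small offset avoids division by zero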
As always, the code is available at Github, along with a list of requirements – so have fun!
I wrote my PhD thesis in LaTeX, and stored all of the files in my Dropbox folder. Dropbox stores previous versions of your files – for up to 30 days if you are on their free plan. Towards the end of my PhD, I realised that I could write a fairly simple Python script that would grab all of these previous versions, which I could then use to do some interesting analyses. So – over a year after my thesis was submitted, I’ve finally got around to looking at the data.
I should point out here that this data comes from a sample size of one – and so if you’re writing a PhD thesis then don’t compare your speed/volume/length/whatever to me! So, with that disclaimer, on to how I did it, and what I found…
Getting the data
I wrote a nice simple class in Python to grab all previous versions of a file from Dropbox. It’s available in the DropboxBasedWordCount repo on Github – and can be used entirely independently from the LaTeX analysis that I did. It is really easy to use: just grab the DropboxDownloader.py file, install the Dropbox library (pip install dropbox) and run something like this:
from DropboxDownloader import DropboxDownloader
# Initialise the object and give it the folder to store its downloads in
d = DropboxDownloader('/Users/robin/ThesisFilesDropboxLog')
# Download all available previous versions
d.download_history_for_files("/Users/robin/Dropbox/_PhD/_FinalThesis",  # Folder containing files to download
                             "*.tex",                                   # 'glob' string specifying files to download
                             "/Users/robin/Dropbox/")                   # Path to your Dropbox folder
The code inside the DropboxDownloader class is actually quite simple – it basically just calls the revisions method of the DropboxClient object, does a bit of processing of filenames and timestamps, and then grabs the file contents with the get_file method, making sure to set the rev parameter appropriately.
Counting the words
Now we have a folder (or set of folders) full of files, we need to actually count the words in them. This will vary significantly depending on what typesetting system you’re using, but for LaTeX we can use the wonderful texcount. You’ll probably find it is installed automatically with your TeX distribution, and it has a very comprehensive set of documentation that I’ll let you go away and read…
For our purposes, we wanted a simple output of the total number of words in the file, so I ran it as:
texcount -brief -total -1 -sum file.tex
I ran this from Python using subprocess.Popen (far better than os.system!) for each file, combining the results into a Pandas DataFrame.
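For each file this was wrapped in a little function along these lines – a sketch, which assumes the total is the first token of texcount’s output with the flags above:

import subprocess

def count_words(filename):
    # Run texcount with the flags above and parse the total word count
    p = subprocess.Popen(['texcount', '-brief', '-total', '-1', '-sum', filename],
                         stdout=subprocess.PIPE)
    output, _ = p.communicate()
    return int(output.decode().split()[0])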
Doing the analysis
Now we get to the interesting bit: what can we find out about how I wrote my thesis. I’m not going to go into details about exactly how I did all of this, but I will occasionally link to useful Pandas or NumPy functions that I used.
When you get hold of some data – particularly if it is time-series – then it is always good to plot it and see what it looks like. The pandas plot function makes this very easy – and we can easily get a plot like this:
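Something as simple as this produces that sort of plot – here word_counts is assumed to be a Series of total word counts indexed by timestamp:

import matplotlib.pyplot as plt

word_counts.plot()   # pandas handles the time axis automatically
plt.ylabel('Total word count')
plt.show()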
This shows the total word count of my thesis over time. I didn’t have the idea of writing this code until well into my PhD, so the time series starts in June 2014 when I was busy working on the practical side of my PhD. By that point I had already written some chapters (such as the literature review), but I didn’t really write anything else until early August (exactly the 1st August, as it happens). I then wrote quite steadily until my word count peaked on the 18th September, around the time that I submitted my final draft to my supervisors. The decrease after that was me removing a number of ‘less useful’ bits on advice from them!
Overall, I wrote 22,317 words between those two dates (a period of 48 days), which equates to an average of 464 words a day. However, on 22 of those days I wrote nothing – so on days that I actually wrote, I wrote an average of 858 words. My maximum number of words written in one day was 2,516, and the minimum was -7,139 (when I removed a lot!). The minimum non-zero count was 5 words…that must have been a day when I was lacking in inspiration!
Some interesting graphs
One thing that I thought would be interesting would be to look at the total number of words I wrote each day of the week:
This shows a very noticeable tailing off as the week goes on, and then a peak again on Saturday. However, as this is a sum over the whole period it may hide a lot of interesting patterns. To see these, we can plot a heatmap showing the total number of words written each day of each week:
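In pandas this is just a groupby on the day of the week, plus a pivot by week number for the heatmap – a rough sketch, assuming daily_words is a Series of words written per day:

# Total words written on each day of the week (0 = Monday)
by_weekday = daily_words.groupby(daily_words.index.dayofweek).sum()

# Words written per day, arranged as weeks x days, ready for a heatmap
weekly = daily_words.groupby(
    [daily_words.index.week, daily_words.index.dayofweek]).sum().unstack()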
It seems like weeks 6 and 7 were very productive, and things tailed off gradually over the whole period, until the last week when they suddenly increased again (note that some of the very high values were when I copied things I’d written elsewhere into my main thesis documents).
Looking at the number of words written over each hourly period is very easy in Pandas by grouping by the hour and then applying the ohlc function (Open-High-Low-Close), and then subtracting the Open value (number of words at the start of the hour) from the Close value (number of words at the end of the hour). Again, we can look at the total number of words written in each hour – summed across the whole period:
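In code that looks roughly like this (again assuming word_counts is the time-indexed Series of total words):

hourly = word_counts.resample('H').ohlc()          # open/high/low/close for each hour
words_per_hour = hourly['close'] - hourly['open']  # net words written in each hour
by_hour_of_day = words_per_hour.groupby(words_per_hour.index.hour).sum()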
This shows that I had a big peak just after lunchtime (I tend to take a fairly early lunch around 12:00 or 12:30), with some peaks before breakfast (around 8:00) and after breakfast (10:00) – and similarly around the time of my evening meal (18:00), and then increasing as a bit of late work before bed. Of course, this shows the total contribution of each of these hours across the whole writing period, and doesn’t take into account how often I actually did any writing during these periods.
To see that we need to look at the mean number of words written during each hourly period:
This still shows a bit of a peak around lunchtime, but shows that by far my most productive time was early in the morning. Basically, when I wrote early in the morning I got a lot written, but I didn’t write early in the morning very often!
As before, we can look at this in more detail in a heatmap, in this instance by both hour of the day and day of the week:
You can really start to see my schedule here. For example, I rarely wrote much on Sunday mornings because I was at church, but wrote quite effectively once I got back from work. I wrote very little around my evening meal time, and wrote very little on Monday mornings or Friday afternoons – which makes sense!
So, I hope you enjoyed this little tour through my thesis writing. All of the code for grabbing the versions from Dropbox is available on Github, along with a (very badly-written and badly-documented) notebook.
Summary: When you type script.py at the Command Prompt on Windows, the Python executable used to run the script is not the first python.exe file found on your PATH; it is the executable that is configured to run .py files when you double-click on them, which is set in the registry.
I ran into a strange problem on Windows recently, when I was trying to run one of the GDAL command-line Python scripts (I think it was gdal_merge.py). I had installed GDAL in my conda environment, and gdal_merge.py was available on my PATH, but when I ran it I got an error saying that it couldn’t import the gdal module. This confused me a bit, so I did some more investigation.
I eventually ended up editing the gdal_merge.py script and adding a few lines at the top.
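The exact lines don’t really matter – anything like this is enough to show what’s going on:

import sys
print(sys.executable)   # which python.exe is actually running this script?
print(sys.path)         # and therefore which site-packages folders it can see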
This showed me that the script was being run by a completely different Python interpreter, with a completely separate site-packages folder – so it was hardly surprising that it couldn’t find the gdal library. It turns out that this ‘other’ Python interpreter was the one installed automatically by ArcGIS (hint: during the ArcGIS setup wizard, tell it to install Python to c:\ArcPython27, then it’s easy to tell which is which). But, how could this be, as I’d removed anything to do with the ArcGIS Python from my PATH…?
After a bit of playing around and Googling things, I found that when you type something like gdal_merge.py at the Command Prompt it doesn’t look on your PATH to find a python.exe file to execute the file with…instead it does the same thing as it would do if you double-clicked on the Python file in Explorer. This is kind of obvious in retrospect, but I spent a long time working it out!
The upshot of this is that if you want to change the Python installation that is used, then you need to change the file type association for .py files. This can be done by editing the registry (look at HKEY_CLASSES_ROOT\Python.File\shell\open\command) or on the command-line using the ftype command (see here and here).
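If you just want to check what is currently configured, you can read that registry value from Python itself – a quick sketch using the standard winreg module (called _winreg in Python 2):

import winreg

# Read the command that Windows uses when a .py file is double-clicked
command = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT,
                            r"Python.File\shell\open\command")
print(command)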
So, I really like the Jupyter notebook (formerly known as the IPython notebook), but I often find myself missing the ‘fancy’ features that ‘proper’ editors have. I particularly miss the amazing multiple cursor functionality of editors like Sublime Text and Atom.
I’ve known for a while that you can edit a cell in your default $EDITOR by running %%edit at the top of the cell – but I’ve recently found out that you can configure Jupyter to use Sublime Text-style keyboard shortcuts when editing cells in the notebook – all thanks to CodeMirror, the javascript-based text editor component that the Jupyter notebook uses. Brilliantly, this also brings with it the multiple-cursor functionality! So, you can get something like this:
So, how do you do this? It’s really simple.
1. Find your Jupyter configuration folder by running jupyter --config-dir
2. Open the custom.js file in the custom sub-folder in your favourite editor
3. Add the following lines to the bottom of the file
require(["codemirror/keymap/sublime", "notebook/js/cell", "base/js/namespace"],
    function(sublime_keymap, cell, IPython) {
        // setTimeout(function(){ // uncomment line to fake race-condition
        cell.Cell.options_default.cm_config.keyMap = 'sublime';
        var cells = IPython.notebook.get_cells();
        for (var cl = 0; cl < cells.length; cl++) {
            cells[cl].code_mirror.setOption('keyMap', 'sublime');
        }
        // }, 1000) // uncomment line to fake race condition
    }
);
That should be it – if you start a notebook now all of the Sublime Text shortcuts should be working!
I recently saw Michael Galloy’s post at http://michaelgalloy.com/2016/02/18/ten-little-idl-programs.html, showing some short (less than ten lines long) programs in IDL. I used to do a lot of programming in IDL, but have switched almost all of my work to Python now – and was intrigued to see what the code looked like in Python.
I can’t guarantee that all of my Python code here will give exactly the same answer as the IDL code – but the code should accomplish the same aim. I’ve included the IDL code that Michael provided, and for each example I provide a few comments about the differences between the Python and IDL code. I haven’t shown the output of the IDL examples in the notebook (yes, I know I can run IDL through the Jupyter Notebook, but I don’t have that set up on this machine).
Firstly, we import the various modules that we need for Python. This is rather different to IDL, where all of the functionality for these programs is built in – but is not necessarily a disadvantage, as it allows an easier separation of functionality, allowing you to only include the functions you need. I’m going to ‘cheat’ slightly, and not count these import lines in the number of lines of code used below – which I think is fair.
It is also worth noting that counting the number of lines of a bit of Python code is rather arbitrary – because although whitespace is syntactically important, you can still often combine multiple lines into one line. For example:
a=5;b=10;print(a+b)
15
Here I’m going to keep to writing Python in a sensible, relatively standard (hopefully PEP8 compliant) way.
1 line: output, calling a procedure:
IDL
print,'Hello, world!'
Python
print("Hello, world!")
Hello, world!
These are almost exactly the same…not much to see here!
2 lines: assignment, calling a function, system variables, array operations, keywords:
This is fairly similar, the major differences being the use of the np. prefix on various functions, as they are part of the numpy library (this can be avoided by importing numpy with from numpy import *, but that is not recommended). The only other real differences are the use of a function to convert from degrees to radians, rather than a constant conversion factor, and the name of the function that produces an array containing a range of values – I personally found findgen always made me think of FINDing something, rather than Floating INDex GENeration, but that’s just me!
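To make the comparison concrete, the Python side looks something like this – an illustrative sketch rather than the exact cell from the original notebook:

import numpy as np
import matplotlib.pyplot as plt

# np.arange plays the role of findgen; np.radians replaces the !dtor conversion factor
x = np.radians(np.arange(360))
plt.plot(x, np.sin(x))
plt.show()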
3 lines: input, output format codes:
IDL
name = ''
read, 'What is your name? ', name
print, name, format='("Hello, ", A, "!")'
Python
name = input('What is your name? ')
print("Hello {name}!".format(name=name))
What is your name? Robin
Hello Robin!
This is the first example where the lengths differ – and Python is very slightly shorter. The only reason for this is that IDL requires you to initialise the name variable before you can read into it, whereas Python does not. I prefer the way that the formatting of the string works in Python – although this is but one of multiple ways of doing it in Python. For reference, you could also do any of the following:
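For example, these standard alternatives (my own illustrations, not the cell from the original notebook) all produce the same greeting:

print("Hello %s!" % name)            # old-style % formatting
print("Hello " + name + "!")         # simple concatenation
print("Hello {0}!".format(name))     # new-style formatting with a positional argument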
This example is also slightly shorter in Python, mainly because we don’t have to create the display window manually, and therefore we don’t need to find out the size of the image before-hand. On the other hand, Python has no way to set the title of a plot in the call to the plotting function (in this case imshow, which I personally think is a more understandable name than tv), which adds an extra line.
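The pattern looks something like the sketch below – the image array here is just a stand-in, since I haven’t reproduced the original file reading:

import numpy as np
import matplotlib.pyplot as plt

image = np.random.rand(300, 400)   # stand-in for the image read from disk

plt.imshow(image, cmap='gray')
plt.title('My image')              # the title needs its own call, unlike IDL's tv
plt.show()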
6 lines: logical unit numbers, read binary data, contour plots, line continuation:
<matplotlib.contour.QuadContourSet at 0x1182a4c50>
This is also shorter, although I must admit that I haven’t configured the contour levels manually as was done in the IDL code – as I often find I don’t need to do that. Again, you can see that we don’t need to create the array before we read in the file, and we don’t have to deal with all of the opening, reading and closing of the file as the np.fromfile function does all of that for us. (If we did want to work at a lower level then we could – using functions like open and close). I’ve also shown a line continuation in Python, which in many circumstances works with no explicit ‘continuation characters’ – even though it wasn’t really needed in this situation.
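The heart of the Python version is something like this – the file name, dtype and shape below are placeholders for whatever the original example used:

import numpy as np
import matplotlib.pyplot as plt

# np.fromfile opens, reads and closes the binary file in one go
data = np.fromfile('elevation.dat', dtype=np.float32).reshape(64, 64)

plt.contour(data)
plt.show()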
7 lines (contributed by Mark Piper): query image, image processing, automatic positioning of images:
Here the Python version is longer than the IDL version – although the majority of this length comes from the subplot commands which are used to combine multiple plots into one window (or one output image). Apart from that, the majority of the code is very similar – albeit with some extra parameters for the python imshow command to force nearest-neighbour interpolation and a gray-scale colormap (though these can easily be configured to be the defaults).
8 lines: writing a function, compile_opt statement, if statements, for loops:
def mg_fibonacci(x):
    if x == 0:
        return 0
    if x == 1:
        return x
    else:
        return mg_fibonacci(x - 1) + mg_fibonacci(x - 2)

for i in range(10):  # Only 10 lines of output to keep the blog post reasonably short!
    print(i, mg_fibonacci(i))
0 0
1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
The code here is almost the same length (9 lines for Python and 8 for IDL), even though the Python code looks a lot more ‘spacious’. This is mainly because we don’t need the .compile or compile_opt lines in Python. Apart from that, the code is very similar with the main differences being Python’s use of syntactic whitespace and use of ‘proper’ equals signs rather than IDL’s eq (and gt, lt etc).
9 lines (contributed by Mark Piper): array generation, FFTs, line plots, multiple plots/window, query for screen size:
The Python code is a lot longer here, but that is mainly due to Python requiring a separate function call to set each piece of text on a plot (the title, x-axis label, y-axis label etc). Apart from that there aren’t many differences beyond those already discussed above.
I’m going to go through the Python code in a few bits for this one…
Firstly, reading CSVs in Python is really easy using the pandas library. The first six lines of IDL code can be replaced with this single function call:
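That call is simply pandas’ read_csv – the file name here is a placeholder for whichever CSV the original example used:

import pandas as pd

df = pd.read_csv('weather_stations.csv')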
And you can print out the DataFrame and check that the CSV has loaded properly:
df

         lon      lat  elev  temp  dew  wspd  wdir
0  -156.9500  20.7833   399    68   64    10    60
1  -116.9667  33.9333   692    77   50     8   270
2  -104.2545  32.3340  1003    87   50    10   340
3  -114.5225  37.6073  1333    66   35     0     0
4  -106.9418  47.3222   811    68   57     8   140
5   -94.7500  31.2335    90    89   73    10   250
6   -73.6063  43.3362   100    75   64     3   180
7  -117.1765  32.7335     4    64   62     5   200
8  -116.0930  44.8833  1530    55   51     0     0
9  -106.3722  31.8067  1206    82   57     9    10
10  -93.2237  30.1215     4    87   77     7   260
11 -109.6347  32.8543   968    80   46     0     0
12  -76.0225  43.9867    99    75   66     7   190
13  -93.1535  36.2597   415    86   71    10   310
14 -118.7213  34.7395  1378    71   46     5   200
Unfortunately the code for actually plotting the map is a bit more complicated, but it does lead to a nice looking map. Basically, the code below creates a map with a specified extent: this is controlled by the keyword arguments called things like llcrnrlat. I usually find that Python has more understandable names than IDL, but in this case they’re pretty awful: this stands for “lower-left corner latitude”.
Once we’ve created the map, and assigned it to the variable m, we use various methods to display things on the map. Note how we can use the column names of the DataFrame in the scatter call – far nicer than using column indexes (as it also works if you add new columns!). If you un-comment the m.shadedrelief() line then you even get a lovely shaded relief background…
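The structure of that code is roughly as follows – the map extent and styling here are placeholder values, not the ones used for the map in the post:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

m = Basemap(projection='cyl', resolution='l',
            llcrnrlat=24, urcrnrlat=50,      # lower-left / upper-right corner latitudes
            llcrnrlon=-125, urcrnrlon=-66)   # ...and longitudes
m.drawcoastlines()
m.drawcountries()
# m.shadedrelief()   # un-comment for the shaded relief background
m.scatter(df.lon.values, df.lat.values, c=df.temp.values, latlon=True)
plt.show()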
Just as a little ‘show off’ at the end of this comparison, I wanted to show how you can make nice interactive maps in Python. I haven’t gone into any of the advanced features of the folium library here – but even just these few lines of code allow you to interactively move around and see where the points are located: and it is fairly easy to add colours, popups and so on.
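A minimal folium sketch along those lines, assuming the same df as above:

import folium

fmap = folium.Map(location=[40, -100], zoom_start=4)
for _, row in df.iterrows():
    folium.Marker([row['lat'], row['lon']]).add_to(fmap)
fmap   # in a notebook, this displays the interactive map inline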
So, what has this shown?
Well, I don’t want to get into a full IDL vs Python war…but I will try and make a few summary statements:
Sometimes tasks can be achieved in fewer lines of code in IDL, sometimes in Python – but overall, the number of lines of code doesn’t really matter: it’s far more important to have clear, easily understandable code.
The majority of the tasks are accomplished in a very similar way in IDL and Python – and with a bit of time most experienced programmers could work out what code in either language is doing.
A number of operations can be achieved in a simpler way using Python – for example, reading files (particularly CSV files) and displaying plots – as they don’t require the extra boilerplate code that IDL requires (to do things like get the screen size, open a display window, create an empty array to read data into etc).
Most IDL plotting functions take arguments allowing you to set things like the x-axis label or title of the plot in the same function that you use to plot the data – whereas Python requires the use of separate functions like xlabel and title.
I tend to find that Python has more sensible names for functions (things like arange rather than findgen and imshow rather than tv) – but that is probably down to personal taste.
In my opinion, Python’s plots look better by default than IDL’s plots – and if you don’t like the standard matplotlib style then they can be changed relatively easily. I’ve always struggled to get IDL plots looking really nice – but that may just be my lack of expertise.
IDL has a huge amount of functionality ‘baked-in’ to the language by default, whereas Python provides lots of functionality through external libraries. Many of the actual functions are almost exactly equivalent – however, there are a number of disadvantages to the library-based approach, including issues with installing and updating libraries, lack of support for some libraries, trying to choose the best library to use, and the extra ‘clutter’ that comes from having to import libraries and use prefixes like np..
Overall though, most things can be accomplished in either language. I prefer Python, and do nearly all of my programming in Python these days: but it’s good to know that I can still drop back in to IDL if I need to – for example, when interfacing with ENVI.
The Jupyter Notebook used to create this post is available to download here.
I have a Coding bookmarks folder which is stuffed full of loads of interesting articles that I’ve never shared with anyone because they don’t really fit into any of my posts. So, taking an idea from The Old New Thing, I’m going to run a few ‘Link Clearance’ posts. This is the Python-focused one (there will be more soon, including a general programming one).
(Yes, I know it is now the middle of February 2016, but things got delayed a bit! Most of these links are from 2015 – with a few more recent ones added too.)
General Python:
Elements of Python Style: This Python style guide goes beyond PEP8 to give useful advice on the subtler art of writing high-quality Python code.
Supporting Python 3: An in-depth guide: Now you’ve decided you want to use Python 3, you need to make your code work with it. This is a good place to start.
Python 2.7 Quick Reference: Very comprehensive (and not necessarily ‘quick’) reference for Python 2.7 (but also mostly applicable to Python 3). Great to have open and rapidly search with Ctrl-F.
Python String Format Cookbook: I’m sure I’m not the only person who struggles to remember some of the more complex options for new-style string formatting in Python – this should help
The ever useful and neat subprocess module: A very comprehensive guide to this powerful – but sometimes rather complex – module. Please use this instead of os.system – it may be slightly harder to get started, but it will help you in the long run.
Hands-On Introduction to Python Programming: Very detailed slides and notes (use t to switch between them) for a course in Python programming. Rather than just showing you how to do things, this takes you inside the language showing how things actually work.
Modules and Packages – Live and Let Die: Slides from a presentation taking an in-depth look at how Python modules and packages work (note: not package management via pip, but modules and packages themselves). There’s a lot in here that I never knew before!
Bayesian Methods for Hackers: An introduction to Bayesian methods from a programming-perspective – also book-length and definitely worth a read.
Think Bayes: If you didn’t like the previous book relying on the PyMC module then you might prefer this one – it teaches similar concepts but with pure Python (with a bit of numpy later on). It gave me a far better understanding of probability in general – not just Bayesian thinking.
Kalman and Bayesian filters in Python: Yup, yet another book – but I promise this is the last one. It covers some of what has been covered in the two previous books, but goes into a lot of depth about Kalman filters, in a very easy-to-understand way.
100 numpy exercises: This link is actually far more interesting than it sounds – it’s amazing what can be done in numpy in very few lines of code. I’d recommend starting at the top and seeing how many of the exercises you can complete…and then looking at the answers which will probably teach you a lot!
Pandas and Python: Top 10: A great introduction to useful pandas features. I often use this as a reference for functions that confuse me slightly (like map, apply and applymap).
Python GDAL/OGR Cookbook!: Some good ‘cookbook’-style examples of using the Python interface to GDAL/OGR (for reading/writing geographic data). Particularly useful as the main GDAL docs are focused on the C++ interface
Fitting models using R-style formulas: Have you ever wished for R-style formulas for fitting models in Python? Well, look no further – it can be done easily using a combination of statsmodels and patsy
Probability distributions in SciPy: A great brief summary of probability distributions included in scipy, and how to use the various methods available on them
Overview of Python Visualization: Visualisation options for Python were a lot less confusing when the only option was matplotlib! This should help you navigate the range of options now available
What is your Jupyter workflow like?: As with many Reddit discussions, there is some gold buried amongst the less-useful comments. I definitely learnt some new ways of working.
pypath-magic: A handy command-line tool and IPython magic to allow you to easily change your PYTHONPATH – very useful!
MoviePy: Lovely simple interface to make animations/videos in Python – using whatever libraries/functions you want to create the actual images
SWAPY: A simple GUI to allow you to interactively generate pywinauto scripts to automate functions on Windows. Even better is that you can then edit the resulting Python code if you want – far nicer than switching to something like AutoHotKey
Glue: A great Python-based GUI for exploring data relationships, principally based on ‘linked displays’. All functionality is available through the Python API too – and the documentation is great.
Gloo: I really loved the ProjectTemplate library for R, but somehow never quite got as comfortable with this port of the library to Python. I really should try again – as the idea of a standardised structure for all analysis projects is very appealing.
pudb: Interactive, curses-style debugger, even accessible remotely and through IPython. I must remember to use this more!
pony: An interesting new Object-Relational Model, a potential competitor to SQLAlchemy. I like its pythonic-nature
pyserial: Simple and easy-to-use library for serial communications in Python. I’ve used this for connecting to scientific instruments as well as for home automation.
xmltodict: This makes working with XML feel like you are working with JSON, by parsing XML data to a dict. You wouldn’t want to use it on enormous XML files, but for quick scripts it’s great!
uncertainties: A very easy-to-use package that lets you do calculations with uncertain numbers (eg. 3 +/- 0.3) – even in numpy arrays
pathlib: Do you hate os.path.join as much as I do? How does dir / output_folder / filename seem instead? A great pythonic path-handling package, which is a part of the standard library since Python 3.4. This package allows you to get the same functionality in previous versions.
fuzzywuzzy: Simple but comprehensive fuzzy string matching library
blessings: The easiest way to introduce colour, font styles and positioning to your terminal programs
PrettyPandas: Handy API for making nicely-formatted Pandas tables
pandas-profiling: I think this is slightly misleadingly named: it doesn’t do profiling in a ‘speed’ sense, but in a ‘summary’ sense. Basically it’ll produce a lovely HTML summary of your Pandas DataFrame, with a huge amount of detail
PyDataset: Do you envy R programmers with their handy access to various nice test datasets with data(cars) and so on? Well, this does the same for Python – with an even larger range of data
pyq: Allows you to search Python code using jQuery-like selectors, such as class:extends(#IntegerField) for all classes that extend the IntegerField class. Fascinating, and I can see all sorts of interesting uses for this…if only I had the time!
Conda
I use the Anaconda scientific Python distribution to get a standard, easily-configurable Python set up on all of my machines. I’m not going to give full details for each of these links, as they are fairly self-explanatory – but definitely very useful for those using Anaconda.
The most difficult part of programming is designing and structuring your code: the actual ‘getting the computer to do what you want’ bit is often relatively easy. This becomes particularly difficult with larger projects. The links below are all interesting discussions of software architecture with a Python focus. I find the 500 Lines or Less posts to be particularly interesting: they all implement challenging programs in relatively short pieces of code. They’ll all be released in book form eventually – and I’m definitely going to buy a copy!
Summary: Microsoft now provides a single, small installer to get all that you need to compile Python 2.7 binary packages on Windows!
This is just a brief post to share the news on something that I didn’t know about until yesterday – but that would have saved me a lot of trouble!
You may have experienced this situation: you’re trying to install a Python package on Windows, and you run pip install packagename but get loads of errors because Python can’t find a C compiler on your system. This usually manifests itself as an error about vcvarsall.bat – but all sorts of other errors point to the same problem.
Often the easiest way to solve this is to go to Christoph Gohlke‘s wonderful page which has Windows binary downloads for loads of useful Python packages. It is very comprehensive, but sometimes I find a package that isn’t available – or I want to compile a development build of a package for some reason.
Previously my strategy was to install the whole of Microsoft Visual Studio and muck around with the paths etc until it worked. However, yesterday I found a very useful download on the Microsoft website. Downloading that file, and running through the install process, gets everything that you need all set up – and then my pip install command just worked!
This was where I was going to leave this article…however, the next time I tried a pip install command on the same machine I ran into problems. I’ve absolutely no idea why…but it seems that sometimes (and no, I don’t know why only sometimes) you need to use the Microsoft Visual C++ for Python Command Prompt link you’ll find in your Start Menu, rather than a normal Command Prompt window. Once you’ve loaded up this command prompt, run the following, and then your normal pip install command:
SET DISTUTILS_USE_SDK=1
SET MSSdk=1
The only problem is that this is only available for Python 2.7. For Python 3.x you still have to go through the whole process of downloading Microsoft Visual Studio and mucking about with everything (see this link for some guidance). Hopefully Microsoft will make a similar download available for Python 3.x soon – downloading 6Gb of Visual Studio rubbish just to get a tiny Python package installed is just silly!
This isn’t normal content for my blog, but I thought a post here might reach people who would be interested in the job. Don’t worry, normal service will be resumed shortly – this isn’t going to turn into a job listing site!
A research assistant is required to assist with the development of an algorithm to monitor air quality at high-resolution from satellite images. Your day-to-day work will involve investigating extensions to the algorithm, implementing these extensions in Python code, and automating the processing and validation of large volumes of satellite imagery. You should have good scientific Python programming skills (including experience with libraries such as numpy, scipy and pandas), along with experience with one or more of remote sensing, image processing, computer vision and GIS.
You will be collaborating closely with the original developer of the algorithm, who is based at the University of Southampton and working closely with the Flowminder Foundation. The work is currently fixed-term for 2-5 months full-time on a contract basis, with pay equivalent to a salary of around £30,000pa. There is potential for contract extension and/or further work with the Flowminder Foundation. Part-time, remote or flexible working may be possible, and this role may suit a recent MSc or PhD graduate.
So, last time we worked out how communications were encrypted and managed to read the current status of the heating system (whether the boiler is on or not, the current temperature, and so on). That’s great – but it’d be even better if we could actually control the thermostat from Python: set the temperature, change from timer mode to manual mode etc. That’s what we’re going to focus on today.
So, using the same ‘man-in-the-middle’ approach I monitored the communications from the app as I changed various settings. When changing the temperature I got a message like this:
If we decode the text at the bottom we find that it decodes as:
{"value":16}\x00\x00\x00\x00
This looks like JSON, with a bit of null-padding at the end (presumably so that the encryption routine is given data with a length divisible by a certain number) – and it makes sense, as I set the thermostat to 16 degrees.
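Building a payload like that is straightforward – a quick sketch of the idea, with the block size being my assumption rather than something I’ve confirmed:

import json

def encode_payload(value, block_size=16):
    # JSON-encode the value and null-pad it to a multiple of the block size
    payload = json.dumps({"value": value}).encode()
    padding = (-len(payload)) % block_size
    return payload + b"\x00" * padding

print(encode_payload(16))   # b'{"value": 16}\x00\x00\x00'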
So, if we send this message (but with a different number) then presumably the temperature setting will change? Well…sort of!
You see, changing the temperature depends on what mode you are in. There are two modes, and the UMD part of the status message tells you which one you are in: manual or clock. If you’re in manual mode then you can change the temperature by simply sending a PUT message (like the one above) to /heatingCircuits/hc1/temperatureRoomManual, with JSON of {"value":21} (or whatever temperature you want).
However, if you’re in clock mode then you have to set the ‘override temperature’ (a PUT message to /heatingCircuits/hc1/manualTempOverride/temperature) with the same JSON as above, and you then have to turn the ‘temperature override function’ on (a PUT message to /heatingCircuits/hc1/manualTempOverride/status with the JSON {"value": "on"}).
Oh, and if you want to change the mode then you can just send a PUT message to /heatingCircuits/hc1/usermode with the JSON {"value": "clock"} or {"value":"manual"}.
You may be wondering whether you get a response from these messages or not: you do, but they’re not very interesting. Unless there has been an error, all you get is:
No Content
Content-Type: application/json
connection: close
There are loads of other messages you can send to do all sorts of complicated things (such as changing the timer programme), but I haven’t bothered to investigate any of those yet. I know they will use the same format as these messages, they’ll just have slightly more complicated JSON payloads, and may require the sending of multiple messages. I’m just happy that I can read the status of my thermostat, and control basic settings (mode and temperature)!
So, I haven’t actually mentioned Python that much in this series so far (sorry!) – although, in fact, most of my ‘trial and error’ work was done through Python using the great sleekxmpp library. I have to confess here that I haven’t written the code as well as I should have done: I should really have designed it to implement a proper Finite State Machine, and send and receive the appropriate messages at the appropriate times, all while updating internal information in the Python classes…but I didn’t! Sorry – dealing with all of that, working asynchronously, was too much like hard work.
So, I wrote a BaseWaveMessageBot class that implemented connecting, sending messages, encoding and decoding message payloads and a bit of simple error handling. That class has all of the complicated stuff in it, so I then wrote a couple of very simple classes (StatusBot and SetBot) that send the appropriate messages and process the responses. I then combined these in a nice class called WaveThermo. Currently there aren’t many methods in WaveThermo, but I will gradually add more functionality as I need it.
The code is available on Github and is fairly easy to use:
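Usage looks something like this – note that the module, constructor arguments and method names below are hypothetical illustrations, so check the repo for the real interface:

from wavethermo import WaveThermo   # module name is a guess - see the repo

# Constructor arguments and method names here are hypothetical
thermo = WaveThermo(username, password)
print(thermo.status)          # read the current state of the thermostat
thermo.set_temperature(18.0)  # change basic settings such as the set-point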
Of course, I’ve only tested it with my thermostat – so if it doesn’t work for you then please let me know!
So, that’s it for this time – next time I’ll talk about a bit of the work I’ve done with automated monitoring of the temperature and the thermostat state, and some of the interesting patterns I’ve found.