Robin's Blog

My top 5 ‘new’ Python modules of 2015

As I’ve been blogging a lot more about Python over the last year, I thought I’d list a few of my favourite ‘new’ Python modules from 2015. These aren’t necessarily modules that were newly released in 2015, but modules that were ‘new to me’ this year – and may be new to you too!

tqdm

This module is so simple but so useful – it makes it stupidly easy to display progress bars for loops in your code. So, if you have some code like this:

for item in items:
    process(item)

Just wrap the iterable in the tqdm function:

from tqdm import tqdm

for item in tqdm(items):
    process(item)

and you’ll get a beautiful progress bar display like this:
 18%|############ | 9/50 [00:09<00:41,  1.00it/s]

I introduced my wife to this recently and it blew her mind – it’s so beautifully simple, and so useful.
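For a version you can run as-is, here is a minimal sketch. The process function and the items list are placeholders invented for this example (they are not part of tqdm), and desc is an optional tqdm argument that labels the bar.

```python
import time

from tqdm import tqdm


def process(item):
    """Hypothetical stand-in for real work: pause briefly, then square the item."""
    time.sleep(0.01)
    return item * item


items = list(range(50))
results = []

# Wrapping the iterable is the only change needed to get a progress bar.
for item in tqdm(items, desc="Processing"):
    results.append(process(item))
```

If you see an error such as "name 'process' is not defined", it just means the placeholder function from the snippet has not been defined in your own code.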

joblib

I had come across joblib previously, but I never really ‘got’ it – it seemed a bit of a mishmash of various functions. I still feel like it’s a bit of a mishmash, but it’s very useful. I was re-introduced to it by one of my Flowminder colleagues, and we used it extensively in our data analysis code. So, what does it do? Well, three main things: 1) caching, 2) parallelisation, and 3) persistence (saving/loading data). I must admit that I haven’t used the parallel programming functionality yet, but I have used the other functions extensively.

The caching functionality allows you to easily ‘memoize’ functions with a simple decorator. This caches the results, and loads them from the cache when calling the function again using the same parameters – saving a lot of time. One tip for this is to choose the arguments of the function that you memoize carefully: although joblib uses a fairly fast hashing function to compare the arguments, it can still take a while if it is processing an absolutely enormous array (many Gigabytes!). In this case, it is often better to memoize a function that takes arguments of filenames, dates, model parameters or whatever else is used to create the large array – cutting out the loading of the large array and the hashing of that array on each call.
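As a rough sketch of the memoization described above (not code from my actual analysis), the example below caches a function that takes small arguments. The function name, the filename and the call_log list are all made up for illustration, and the cache directory is a temporary one.

```python
import tempfile

from joblib import Memory

# Cache results on disk in a throwaway directory (assumption: any writable path works).
memory = Memory(tempfile.mkdtemp(), verbose=0)

call_log = []  # records each time the function body really runs


@memory.cache
def build_summary(filename, scale):
    """Hypothetical expensive step, keyed on small arguments rather than a huge array."""
    call_log.append((filename, scale))
    return {"source": filename, "total": sum(range(1000)) * scale}


first = build_summary("data_2015.csv", 2)   # runs the function and stores the result
second = build_summary("data_2015.csv", 2)  # same arguments: loaded from the cache
```

After both calls, call_log should contain a single entry, because the second call never executes the function body.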

The persistence functionality is strongly linked to the memoization functions – as it is what is used to save the cached results to file. It basically performs the same function as the built-in pickle module (or the dill module – see below), but works really efficiently for objects that contain numpy arrays. The interface is exactly the same as the pickle interface (simple load and dump functions), so it’s really easy to switch over. One thing I didn’t realise before is that if you set the compress argument of dump (a compression level from 0 to 9) then a) your output files will be smaller (obviously!) and b) the output will all be in one file (as opposed to the default, which produces a .pkl file along with many .npy files).
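A minimal sketch of saving and loading with compression might look like this. The data dictionary and the file location are invented for the example, and the compression level of 3 is an arbitrary choice.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical object mixing plain Python data with a numpy array.
data = {"label": "example", "values": np.arange(1000, dtype=float)}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# compress takes a level from 0 (none) to 9 (smallest); 3 is an arbitrary middle choice.
joblib.dump(data, path, compress=3)

restored = joblib.load(path)
```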

folium

I’ve barely scratched the surface of this library, but it’s been really helpful for doing quick visualisations of geographic data from within Python – and it even plays well with the Jupyter Notebook!

One of the pieces of example code from the documentation shows how easily it can be used:

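import folium

# Note: simple_marker and create_map are the folium API as of 2015; later
# releases replaced them with folium.Marker(...).add_to(map_1) and map_1.save(...).
map_1 = folium.Map(location=[45.372, -121.6972])
map_1.simple_marker([45.3288, -121.6625], popup='Mt. Hood Meadows')
map_1.simple_marker([45.3311, -121.7113], popup='Timberline Lodge')
map_1.create_map(path='output.html')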

You can easily configure almost every aspect of the map above, including the background map used (any Leaflet tileset will work), point icons, colours, sizes and pretty much anything else. You can visualise GeoJSON data and do choropleth maps too (even linking to Pandas data frames!).

Again, I used this in my work with Flowminder, but have since used it in all sorts of other contexts too. Just taking the code above and putting the call to simple_marker in a loop makes it really easy to visualise a load of points.

The example above shows how to save a map to a specified HTML file – but to use it within the Jupyter Notebook just make sure that the map object (map_1 in the example above) is by itself on the final line in a cell, and the notebook will work its magic and display it inline…perfect!

tinydb

The first version of my ‘new’ module recipy (as presented at the Collaborations Workshop 2015) used MongoDB as the backend data store. However, this added significant complexity to the installation/set-up process, as you needed to install a MongoDB server first, get it running etc. I went looking for a pure-Python NoSQL database and came across TinyDB…which had a simple interface, and has handled everything I’ve thrown at it so far!

In the longer-term we are thinking of making the backend for recipy configurable – so that people can use MongoDB if they want some of the advantages that brings (being able to share the database easily across a network, better performance), but we’ll still keep TinyDB as the default as it just makes life so much easier!

dill

dill is a better pickle (geddit?). You’ve probably used the built-in pickle module to store various Python objects to disk but every so often you may have received an error like this:

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-5-aa42e6ee18b1> in <module>()
----> 1 pickle.dumps(x)

PicklingError: Can't pickle <function <lambda> at 0x103671e18>: attribute lookup <lambda> on __main__ failed

That’s because the pickle module is relatively limited in what it can pickle: it can’t cope with nested functions, lambdas, slices, and more. You may not often want to pickle those objects directly – but it is fairly common to come across these inside other objects you want to pickle, thus causing the pickling to fail.
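To see how a lambda hidden inside another object triggers the failure, here is a small sketch using only the standard library. The config dictionary is a made-up example.

```python
import pickle

# A container holding a lambda: a plausible way for one to sneak into real data.
config = {"name": "doubler", "transform": lambda value: value * 2}

try:
    pickle.dumps(config)
    pickled_ok = True
except (pickle.PicklingError, AttributeError) as error:
    # Which exception type is raised can vary between Python versions.
    pickled_ok = False
    message = str(error)
```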

dill solves this by simply being able to pickle a lot more stuff – almost everything, in fact, apart from frames, generators and tracebacks. As with joblib above, the interface is exactly the same as pickle (load and dump), so it’s really easy to switch.
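As a quick sketch, a dictionary containing a lambda (which the built-in pickle module rejects) round-trips through dill without complaint. The config dictionary is a made-up example.

```python
import dill

# A container holding a lambda, which plain pickle cannot serialise.
config = {"name": "doubler", "transform": lambda value: value * 2}

# dill.dumps / dill.loads mirror pickle.dumps / pickle.loads.
payload = dill.dumps(config)
restored = dill.loads(payload)

# The restored lambda is callable just like the original.
result = restored["transform"](21)
```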

Also, for those of you who like R’s ability to save the entire session in a .RData file, dill has dump_session and load_session functions too – which do exactly what you’d expect!

Bonus: Anaconda

This is a ‘bonus’ as it isn’t actually a Python module, but is something that I started using for the first time in 2015 (yes, I was late to the party!) but couldn’t manage without now!

Anaconda can be a little confusing because it consists of a number of separate things – several of which are called ‘Anaconda’. So, these are:

  • A cross-platform Scientific Python distribution – along the same lines as the Enthought Python Distribution, WinPython and so on. Once you’ve downloaded and installed it you get a full scientific Python stack including all of the standard libraries (numpy, scipy, pandas, matplotlib, sklearn, skimage…and many more). It is available in four flavours overall: Python 2 or Python 3, each of which has the option of the full Anaconda distribution, or the reduced-size Miniconda distribution. This leads nicely on to…
  • The conda package management system. This is designed to work with Python packages, but can be used for installing any binaries. You may think this sounds very much like pip, but it’s far better because a) it installs binaries (no more compiling numpy from source!), b) it deals with dependencies better, and c) you can easily create multiple environments which can have different packages installed and run different Python versions.
  • The anaconda.org repository (previously called binstar) where you can create an account and upload binary conda packages to easily share with others. For example, I’ve got a couple of conda packages hosted there, which makes them really easy to install for anyone running conda.

So, there we go – my top five Python modules that were new to me in 2015. Hope you found it useful – and Merry Christmas and Happy New Year to you all!

(If you liked this post, then you may also like An easy way to install Jupyter Notebook extensions, Bokeh plots with DataFrame-based tooltips, and Orthogonal Distance Regression in Python)


This post originally appeared on Robin's Blog.


Categorised as: Programming, Python


41 Comments

  1. Francis Kim says:

    tqdm looks awesome!

  2. shams kitz says:

    tqdm – WOW!

  3. Bassem says:

    Indeed, tqdm — mind blown!

  4. Anthony Thomas says:

    how much overhead does tqdm add?

  5. Robin Wilson says:

    Good question – I’ll have a play and do a follow-up blog post!

  6. […] As I’ve been blogging a lot more about Python over the last year, I thought I’d list a few of my favourite ‘new’ Python modules from 2015. These aren’t necessarily modules that were newly released in 2015, but modules that were ‘new to me’ this year – and may be new to you too! Source: My top 5 ‘new’ Python modules of 2015 « Robin’s Blog […]

  7. EvanZ says:

    We use pip and virtualenvs. Is it that advantageous to move to conda?

  8. Dan Gayle says:

    Anthony Thomas, from the docs:

    “Overhead is low — about 60ns per iteration (80ns with gui=True). By comparison, the well established ProgressBar has an 800ns/iter overhead.”

  9. Jaysunn says:

    FTW, TQDM. What’s that fancy autocomplete he’s using in the shell?

    So cool.

    Jaysunn

  10. David says:

    A dill has never been a better pickle! Bread and butter all the way.

  11. kyaw zeya says:

    Pls share, how I can get tqdm modules?

  12. Milan says:

    Awesome Share, Most of them are new to us, Thanks

  13. Mike says:

    Does tqdm assume all iterations are of equal size. In other words, does a 100 item list count each item as 1%?… that would be unsophisticated and obviously a poor assumption but still cool.

    If it does not, then bravo! I would like to see more progress bar and other types of useful debugging modules.

  14. Nelson Liu says:

    Great article! I can see myself making frequent use of tqdm in the future.
    Just a minor correction, but joblib 0.9.3 doesn’t actually take a boolean compressed parameter. It takes an int from 0 to 9 that indicates the level of compression in a ‘compress’ parameter. See: https://pythonhosted.org/joblib/generated/joblib.dump.html

    Thanks for posting!

  15. rudyryk says:

    Amazing article, thanks Robin!

  16. Robin Wilson says:

    I think it assumes that by default, but there are all sorts of fancy ways to control it in more detail – see the docs at https://github.com/tqdm/tqdm

  17. Robin Wilson says:

    You should be able to install it with `pip install tqdm` – see https://github.com/tqdm/tqdm

  18. This is a nice Xmas present, thanks a lot for your brilliant selection.

  19. nasr says:

    i find tqdm amazing!

  20. […] Wilson’s top five new Python modules for 2015 includes one which I’d not come across before but seems utterly indispensible particularly if […]

  21. bastula says:

    Looks like the fancy auto completer is from bpython: http://bpython-interpreter.org/

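  22. […] Note: as an extension of pickle there is apparently also dill, which was introduced here, but that strays from the main topic, so please look into it yourself if you are interested. […]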

  23. […] came across a blog post on Hacker News about a few interesting Python modules and one that I found particularly useful […]

  24. Season’s greetings and thanks for your post about (among other things) our repo!

    Casper

  25. lrq3000 says:

    I’m another dev from tqdm, thanks for the link, and thanks to the person who notified us, because I discovered your wonderful recipy tool! I’m always looking for tools to help in reproducible research, and I’m glad to see such a tool with up-to-date technologies 🙂 On the subject, you might be interested in NeuralEnsemble’s Sumatra, which is very similar to your app’s concept.

    One question: does it work with Jupyter Notebooks? If not, do you think that would be possible to do in the future?

  26. Saumitra says:

    How to download and use..i cauld not find..guess it doesnt come with standard package

  27. Robin Wilson says:

    Most of these modules will need installing as they don’t come with Python by default. Clicking on the module name in the blog post should take you to the webpage for that module which will tell you how to install it – normally it is something like `pip install tqdm`.

  28. […] My top 5 ‘new’ Python modules of 2015 « Robin’s Blog (blog.rtwilson.com) […]

  29. Shubham Pandey says:

    awesome article about python modules .
    Thanks for sharing.

  30. Piyush Wanare says:

    tqdm is just awesome module thanks lrq3000 for creative and more helpful development and thanks to Robin for sharing it.

  31. […] last post about my favourite ‘new’ (well, new to me) Python packages seemed to be very well received. I’ll post a ‘debrief’ post within the next few […]

  32. […] My top 5 ‘new’ Python modules of 2015 […]

  33. Anabolika says:

    tqdm looks awesome!

  34. […] My top 5 ‘new’ Python modules of 2015 – Robin’s Blog. Skimmed through it. […]

  35. […] are all packages that didn’t quite fit in to my Top 5 Python Packages of 2015 post, but are still […]

  36. pavan says:

    while using tqdm i got this error “name ‘process’ is not defined” how can i fix this?

  37. Robin Wilson says:

    Sorry, I’m not sure – I suggest you contact the tqdm developers about this.
