My top 5 ‘new’ Python modules of 2015

December 23, 2015

As I’ve been blogging a lot more about Python over the last year, I thought I’d list a few of my favourite ‘new’ Python modules from 2015. These aren’t necessarily modules that were newly released in 2015, but modules that were ‘new to me’ this year – and may be new to you too!

tqdm

This module is so simple but so useful – it makes it stupidly easy to display progress bars for loops in your code. So, if you have some code like this:

for item in items:
    process(item)

Just wrap the iterable in the tqdm function:

from tqdm import tqdm

for item in tqdm(items):
    process(item)

and you’ll get a beautiful progress bar display like this:
18%|############ | 9/50 [00:09<00:41, 1.00it/s]

I introduced my wife to this recently and it blew her mind – it’s so beautifully simple, and so useful.
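tqdm sizes the bar by calling len() on whatever you wrap. For a generator, or when items take very different amounts of work, you can give it a total yourself and move the bar along with update(). Here's a minimal sketch, where read_chunks is a hypothetical stand-in for something like reading a file in pieces:

```python
from tqdm import tqdm


def read_chunks(sizes):
    """Hypothetical generator standing in for e.g. reading a file in chunks."""
    for size in sizes:
        yield b"x" * size


sizes = [10, 200, 5, 1000]
total_bytes = sum(sizes)  # a generator has no len(), so we supply the total

processed = 0
with tqdm(total=total_bytes, unit="B") as bar:
    for chunk in read_chunks(sizes):
        processed += len(chunk)
        bar.update(len(chunk))  # advance by the chunk's size, not by 1
```

Here the bar counts bytes rather than loop iterations, so the 1,000-byte chunk moves it much further than the 5-byte one.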

joblib

I had come across joblib previously, but I never really ‘got’ it – it seemed a bit of a mishmash of various functions. I still feel like it’s a bit of a mishmash, but it’s very useful. I was re-introduced to it by one of my Flowminder colleagues, and we used it extensively in our data analysis code. So, what does it do? Well, three main things: 1) caching, 2) parallelisation, and 3) persistence (saving/loading data). I must admit that I haven’t used the parallel programming functionality yet, but I have used the other functions extensively.
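For reference, the parallel side follows a simple pattern: wrap the function with joblib's delayed and hand the resulting calls to Parallel. This is a minimal sketch using a toy task (square roots of perfect squares):

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(sqrt)(i ** 2) records the call without running it;
# Parallel then farms those recorded calls out to n_jobs workers.
# prefer="threads" is only a hint to use threads, which keeps this sketch
# lightweight; by default joblib runs the work in separate processes.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(sqrt)(i ** 2) for i in range(10)
)
```

The results come back as a list in the same order as the inputs.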

The caching functionality allows you to easily ‘memoize’ functions with a simple decorator. This caches the results, and loads them from the cache when calling the function again using the same parameters – saving a lot of time. One tip for this is to choose the arguments of the function that you memoize carefully: although joblib uses a fairly fast hashing function to compare the arguments, it can still take a while if it is processing an absolutely enormous array (many Gigabytes!). In this case, it is often better to memoize a function that takes arguments of filenames, dates, model parameters or whatever else is used to create the large array – cutting out the loading of the large array and the hashing of that array on each call.
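Here's a minimal sketch of that tip, assuming a hypothetical build_dataset function. The cached function takes small arguments (a size and a scale factor) and returns the big object, so joblib only ever hashes the cheap arguments. The calls counter is only there to show that the second call is served from the cache:

```python
import tempfile

from joblib import Memory

# Cache results on disk in a throwaway directory (any writable path works)
memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = 0  # counts how often the function body really runs


@memory.cache
def build_dataset(n, scale):
    """Hypothetical expensive step: cheap arguments in, large result out."""
    global calls
    calls += 1
    return [i * scale for i in range(n)]


first = build_dataset(100_000, 2.5)   # computed and written to the cache
second = build_dataset(100_000, 2.5)  # same arguments: loaded from the cache
```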

The persistence functionality is strongly linked to the memoization functions – as it is what is used to save the cached results to file. It basically performs the same function as the built-in pickle module (or the dill module – see below), but works really efficiently for objects that contain numpy arrays. The interface is almost the same as the pickle interface (simple load and dump functions, although joblib’s dump takes a filename rather than an open file), so it’s really easy to switch over. One thing I didn’t realise before is that if you set the compress parameter (an integer compression level from 0 to 9, for example compress=3) then a) your output files will be smaller (obviously!) and b) the output will all be in one file (as opposed to the default, which produces a .pkl file along with many .npy files).
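This short sketch saves and loads a dictionary containing a numpy array with compression turned on. The dictionary contents and the choice of compress=3 are arbitrary, picked just for illustration:

```python
import os
import tempfile

import joblib
import numpy as np

data = {"label": "example", "values": np.arange(1_000_000, dtype=np.float64)}

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
joblib.dump(data, path, compress=3)  # compress: 0 (off) up to 9 (smallest, slowest)
restored = joblib.load(path)
```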

folium

I’ve barely scratched the surface of this library, but it’s been really helpful for doing quick visualisations of geographic data from within Python – and it even plays well with the Jupyter Notebook!

One of the pieces of example code from the documentation shows how easily it can be used:

import folium

map_1 = folium.Map(location=[45.372, -121.6972])
map_1.simple_marker([45.3288, -121.6625], popup='Mt. Hood Meadows')
map_1.simple_marker([45.3311, -121.7113], popup='Timberline Lodge')
map_1.create_map(path='output.html')

You can easily configure almost every aspect of the map above, including the background map used (any Leaflet tileset will work), point icons, colours, sizes and pretty much anything else. You can visualise GeoJSON data and do choropleth maps too (even linking to Pandas data frames!).

Again, I used this in my work with Flowminder, but have since used it in all sorts of other contexts too. Just taking the code above and putting the call to simple_marker in a loop makes it really easy to visualise a load of points.

The example above shows how to save a map to a specified HTML file – but to use it within the Jupyter Notebook just make sure that the map object (map_1 in the example above) is by itself on the final line in a cell, and the notebook will work its magic and display it inline…perfect!

tinydb

The first version of my ‘new’ module recipy (as presented at the Collaborations Workshop 2015) used MongoDB as the backend data store. However, this added significant complexity to the installation/set-up process, as you needed to install a MongoDB server first, get it running etc. I went looking for a pure-Python NoSQL database and came across TinyDB…which had a simple interface, and has handled everything I’ve thrown at it so far!

In the longer-term we are thinking of making the backend for recipy configurable – so that people can use MongoDB if they want some of the advantages that brings (being able to share the database easily across a network, better performance), but we’ll still keep TinyDB as the default as it just makes life so much easier!

dill

dill is a better pickle (geddit?). You’ve probably used the built-in pickle module to store various Python objects to disk but every so often you may have received an error like this:

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-5-aa42e6ee18b1> in <module>()
----> 1 pickle.dumps(x)

PicklingError: Can't pickle <function <lambda> at 0x103671e18>: attribute lookup <lambda> on __main__ failed

That’s because the pickle module is relatively limited in what it can pickle: it can’t cope with nested functions, lambdas, slices, and more. You may not often want to pickle those objects directly – but it is fairly common to come across these inside other objects you want to pickle, thus causing the pickling to fail.

dill solves this by simply being able to pickle a lot more stuff – almost everything, in fact, apart from frames, generators and tracebacks. As with joblib above, the interface is exactly the same as pickle (load and dump), so it’s really easy to switch.
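To see the difference, here's a small sketch that tries both modules on a lambda and on a nested function (a closure). The make_multiplier helper is purely illustrative. pickle raises an error for both, much like the traceback above, while dill round-trips them:

```python
import pickle

import dill


def make_multiplier(factor):
    # Returns a nested function (a closure), which pickle cannot serialise
    def multiply(x):
        return x * factor
    return multiply


square = lambda x: x * x
triple = make_multiplier(3)

failures = []
for obj in (square, triple):
    try:
        pickle.dumps(obj)
    # The exact exception type varies between Python versions
    except (pickle.PicklingError, AttributeError, TypeError):
        failures.append(obj)

# dill serialises both by value, so they can be restored and called
square_again = dill.loads(dill.dumps(square))
triple_again = dill.loads(dill.dumps(triple))
```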

Also, for those of you who like R’s ability to save the entire session in a .RData file, dill has dump_session and load_session functions too – which do exactly what you’d expect!

Bonus: Anaconda

This is a ‘bonus’ as it isn’t actually a Python module, but is something that I started using for the first time in 2015 (yes, I was late to the party!) but couldn’t manage without now!

Anaconda can be a little confusing because it consists of a number of separate things – multiple of which are called ‘Anaconda’. So, these are:

A cross-platform Scientific Python distribution – along the same lines as the Enthought Python Distribution, WinPython and so on. Once you’ve downloaded and installed it you get a full scientific Python stack including all of the standard libraries (numpy, scipy, pandas, matplotlib, sklearn, skimage…and many more). It is available in four flavours overall: Python 2 or Python 3, each of which has the option of the full Anaconda distribution, or the reduced-size Miniconda distribution. This leads nicely on to…
The conda package management system. This is designed to work with Python packages, but can be used for installing any binaries. You may think this sounds very much like pip, but it’s far better because a) it installs binaries (no more compiling numpy from source!), b) it deals with dependencies better, and c) you can easily create multiple environments which can have different packages installed and run different Python versions.
The anaconda.org repository (previously called binstar) where you can create an account and upload binary conda packages to easily share with others. For example, I’ve got a couple of conda packages hosted there, which makes them really easy to install for anyone running conda.

So, there we go – my top five Python modules that were new to me in 2015. Hope you found it useful – and Merry Christmas and Happy New Year to you all!

(If you liked this post, then you may also like An easy way to install Jupyter Notebook extensions, Bokeh plots with DataFrame-based tooltips, and Orthogonal Distance Regression in Python)

This post originally appeared on Robin's Blog.

Categorised as: Programming, Python

Francis Kim says:

December 23, 2015 at 1:15 pm

tqdm looks awesome!

shams kitz says:

December 23, 2015 at 2:16 pm

tqdm – WOW!

Bassem says:

December 23, 2015 at 3:48 pm

Indeed, tqdm — mind blown!

Anthony Thomas says:

December 23, 2015 at 7:05 pm

how much overhead does tqdm add?

Robin Wilson says:

December 23, 2015 at 7:09 pm

Good question – I’ll have a play and do a follow-up blog post!

EvanZ says:

December 23, 2015 at 10:19 pm

We use pip and virtualenvs. Is it that advantageous to move to conda?

Dan Gayle says:

December 23, 2015 at 10:30 pm

Anthony Thomas, from the docs:

“Overhead is low — about 60ns per iteration (80ns with gui=True). By comparison, the well established ProgressBar has an 800ns/iter overhead.”

Jaysunn says:

December 23, 2015 at 11:30 pm

FTW, TQDM. What’s that fancy autocomplete he’s using in the shell?

So cool.

Jaysunn

David says:

December 24, 2015 at 12:10 am

A dill has never been a better pickle! Bread and butter all the way.

kyaw zeya says:

December 24, 2015 at 12:55 am

Pls share, how I can get tqdm modules?

Milan says:

December 24, 2015 at 2:02 am

Awesome Share, Most of them are new to us, Thanks

Mike says:

December 24, 2015 at 2:41 am

Does tqdm assume all iterations are of equal size. In other words, does a 100 item list count each item as 1%?… that would be unsophisticated and obviously a poor assumption but still cool.

If it does not, then bravo! I would like to see more progress bar and other types of useful debugging modules.

Nelson Liu says:

December 24, 2015 at 4:10 am

Great article! I can see myself making frequent use of tqdm in the future.
Just a minor correction, but joblib 0.9.3 doesn’t actually take a boolean compressed parameter. It takes an int from 0 to 9 that indicates the level of compression in a ‘compress’ parameter. See: https://pythonhosted.org/joblib/generated/joblib.dump.html

Thanks for posting!

rudyryk says:

December 24, 2015 at 7:18 am

Amazing article, thanks Robin!

December 24, 2015 at 8:52 am

I think it assumes that by default, but there are all sorts of fancy ways to control it in more detail – see the docs at https://github.com/tqdm/tqdm

You should be able to install it with `pip install tqdm` – see https://github.com/tqdm/tqdm

jean pierre huart says:

December 24, 2015 at 9:30 am

This is a nice Xmas present, thanks a lot for your brilliant selection.

nasr says:

December 24, 2015 at 1:20 pm

i find tqdm amazing!

Week 51 | import digest says:

December 24, 2015 at 2:34 pm

[…] Wilson’s top five new Python modules for 2015 includes one which I’d not come across before but seems utterly indispensible particularly if […]

bastula says:

December 24, 2015 at 7:22 pm

Looks like the fancy auto completer is from bpython: http://bpython-interpreter.org/

A rough summary of Python's joblib module – Stray Geek says:

December 24, 2015 at 8:45 pm

[…] Note: for extending pickle, there is apparently also dill, which was introduced here, but it strays from the main topic, so if you're interested please look it up yourself. […]

Crawling in Python | B. Doyle says:

December 25, 2015 at 4:41 am

[…] came across a blog post on Hacker News about a few interesting Python modules and one that I found particularly useful […]

tqdm developers says:

December 25, 2015 at 3:31 pm

Season’s greetings and thanks for your post about (among other things) our repo!

Casper

lrq3000 says:

December 25, 2015 at 11:31 pm

I’m another dev from tqdm, thanks for the link, and thanks to the person who notified us, because I discovered your wonderful recipy tool! I’m always looking for tools to help in reproducible research, and I’m glad to see such a tool with up-to-date technologies 🙂 On the subject, you might be interested in NeuralEnsemble’s Sumatra, which is very similar to your app’s concept.

One question: does it work with Jupyter Notebooks? If not, do you think that would be possible to do in the future?

Saumitra says:

December 27, 2015 at 2:22 am

How to download and use? I could not find it. I guess it doesn't come with the standard package.

December 27, 2015 at 8:59 am

Most of these modules will need installing as they don’t come with Python by default. Clicking on the module name in the blog post should take you to the webpage for that module which will tell you how to install it – normally it is something like `pip install tqdm`.

Shubham Pandey says:

December 28, 2015 at 5:38 am

awesome article about python modules .
Thanks for sharing.

Piyush Wanare says:

December 28, 2015 at 11:31 am

tqdm is just awesome module thanks lrq3000 for creative and more helpful development and thanks to Robin for sharing it.

My 2015 Python life « Robin's Blog says:

December 30, 2015 at 10:29 am

[…] last post about my favourite ‘new’ (well, new to me) Python packages seemed to be very well received. I’ll post a ‘debrief’ post within the next few […]
