I’ve just signed the Science Code Manifesto because I firmly believe in what it says. Ok well, that probably doesn’t tell you much – generally I tend to believe in things that I sign – but I’d like to tell you why I signed it, and why I think it’s really important.
A lot of my PhD life is spent writing code (a lot of my life outside of my PhD is also spent writing code, but that’s another story). When I tell people this, quite a few of them are surprised that I’m not doing a computer science PhD – because surely they’re the only ones who spend their time writing code? Well…no! Scientists in almost every subject spend a lot of time writing code for their research.
Why do they do that? Well, nearly every research project involves at least one of the following activities:
- Data processing
- Plotting graphs
- Calculating statistics
- Running models
- Building new models, simulations and so on
All of these activities can easily be done through code, and in fact it’s often far more efficient to do them through code than by other methods. However, mistakes can be made in code, and people will often want to check the results of other people’s papers (that is, to ensure reproducibility – a key factor in science) – but to do that they need the code. That is what the first tenet of the Science Code Manifesto says: “All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper”. That means that as a reader (or reviewer) I can read the code (to check it looks like it does what it’s meant to do), and run the code (to check it actually does what it’s meant to do). It also means that if I have the code to do the processing, plus the input data, I can generate the output data that the authors got, and check it against the results in the paper.
I was reading a paper today which examined aerosol optical depth variations across Europe. The authors had really high resolution data, and I’d love to have seen a map of the distribution across the UK in detail, but it wasn’t included in the paper (they had a lower-resolution map of the whole of Europe instead). If I’d had access to the code (and the data) then I could have generated the data myself, plotted a map over the whole of Europe (to check that it looked the same as their published map) and then zoomed in on the UK to examine it in more detail.
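The kind of check described above (regenerate the output from the code and the input data, then compare it with the published results) might look something like this in Python. This is only a minimal sketch: the `mean_aod` processing step, the site names and all of the numbers are made up for illustration, and a real paper’s processing would obviously be far more involved.

```python
import math


def mean_aod(readings):
    """Hypothetical processing step: average a list of optical depth readings."""
    return sum(readings) / len(readings)


def reproduces(published, regenerated, rel_tol=1e-6):
    """Return True if every regenerated value matches its published counterpart."""
    if published.keys() != regenerated.keys():
        return False
    return all(
        math.isclose(published[key], regenerated[key], rel_tol=rel_tol)
        for key in published
    )


# Made-up input data and "published" results, for illustration only.
input_data = {"site_a": [0.10, 0.20, 0.30], "site_b": [0.40, 0.60]}
published_results = {"site_a": 0.2, "site_b": 0.5}

# Re-run the processing on the input data, then compare with the paper.
regenerated = {site: mean_aod(values) for site, values in input_data.items()}
print(reproduces(published_results, regenerated))
```

The comparison uses a tolerance rather than exact equality because floating-point results can differ very slightly between machines and library versions, even when the code is doing exactly the same thing.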
Scientific papers are designed to be built upon. As Newton said, “If I have seen further it is by standing on the shoulders of giants” – as scientists we all need to stand on the shoulders of those who came before us (giants or not). If you have the code that other scientists have used to produce the results in their paper, you may well want to modify it (for example, to fix some errors you’ve found), extend it (to make it applicable to your particular study area), and share it or its modifications with your colleagues. You won’t be able to do this unless you know what license the code was originally released under – hence the second tenet: “The copyright ownership and license of any released source code must be clearly stated”.
The next two tenets are very important as they place scientific code at the same level as scientific papers, books and other more ‘traditional’ academic outputs. They state that “Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications” and “Software contributions must be included in systems of scientific assessment, credit, and recognition”. This is important because if we believe that scientific code is important (as I, and the 846 people who have signed the manifesto so far, do) then we need to recognise it. This means two things: firstly citing it, so that we give the proper attribution to the authors, and let people see how it is being used; and secondly giving credit for writing code when we assess how good researchers are. This is something that varies significantly by department and research area – but it is something which I think should be standard across all fields. If you write a good piece of scientific software (not a 10 line Python script in a random file somewhere, but something which is properly released, useful, documented and sustainable) then you should be given credit for it, just as if you had written a paper or a journal article! As a number of people have commented before: a scientific paper which describes a new algorithm is not the scientific work itself – it is just an advert for the work. The real scientific work, and scientific product, is the code that implements the algorithm.
Finally, the manifesto touches on the subject of software sustainability – something that I will (hopefully) be doing a lot more work on in the near future. This refers to the practice of sustaining software so that it can continue to be used (and, ideally, continue to be improved) in the future. Software is a funny thing – it is susceptible to rotting away, just like organic material. This is known as software decay and is primarily caused by the rapid progress made in technology: it may be that the ‘latest, greatest’ technology that you used to write your software in 2012 can’t be run in 2020, or 2025, but the job the software does may still be very important. I think (hope) that all of my code will be able to run for the foreseeable future as I’ve written it in fairly standard programming languages (such as Python and R), but this may not be the case – for example, libraries can easily break as standards evolve, and if the author is no longer maintaining their libraries then they may not get fixed. This can be a big issue, and leads on to the other part of sustaining software: that of generating a community around the software, which will help sustain it in the years to come. The manifesto is actually fairly limited in what it says on this (“Source code must remain available, linked to related materials, for the useful lifetime of the publication”), but I feel that a lot of the other things I’ve raised in this paragraph are also relevant.
So, there we go. That’s why I signed the manifesto – now have a think about it, and if you agree go and sign it too!