Resources for Sustainable Software and Reproducible Research in Remote Sensing presentation
I recently did a presentation entitled Sustainable Software and Reproducible Research in Remote Sensing at the Wavelength 2013 conference. The slides are available at SpeakerDeck, and shown below.
The rest of this page is a rather rough categorised list of various links and resources that may be useful to you after seeing the presentation (either in person at the conference, or online).
My blog posts
I’ve looked at some of these issues before on this blog. The most relevant posts are:
- Science Code Manifesto
- ProjectTemplate for Reproducible Research
- Standard test images for Remote Sensing
Papers referenced in the presentation
- Saleska, Scott R., et al. “Amazon forests green-up during 2005 drought.” Science 318.5850 (2007): 612-612. PDF
- Samanta, Arindam, et al. “Amazon forests did not green-up during the 2005 drought.” Geophysical Research Letters 37.5 (2010): L05401. PDF
- Irish, Richard R., et al. “Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm.” Photogrammetric engineering and remote sensing 72.10 (2006): 1179. PDF
- Tang, Jiakui, et al. “Aerosol optical thickness determination by exploiting the synergy of TERRA and AQUA MODIS.” Remote Sensing of Environment 94.3 (2005): 327-334. PDF
- Aruliah, D. A., et al. “Best Practices for Scientific Computing.” arXiv preprint arXiv:1210.0530 (2012). PDF
Sustainable software is basically just good coding practice. Various resources are relevant:
- Software Carpentry – look under Lessons for online lessons on various things (automated testing, version control, documentation, debugging, program structure etc.) and Boot Camps to see if there is a training day coming up near you.
- Code Complete (2nd Edition) – a big book with loads of information on good coding styles etc. Your library is likely to have a copy.
- Version Control by Example – a great book by Eric Sink (disclosure: I proof-read it for him and gave various bits of feedback, and therefore got a free copy) which covers the concepts of version control in general, and then shows how they apply to various real-world systems such as Git, Mercurial and Subversion.
- Science Code Manifesto – a manifesto for how software and code should be treated in science.
- ArcGIS has a number of tools and functions that make it easier to do reproducible research.
- Model Builder is a great way to get started with automating your ArcGIS analyses without having to learn to program. It allows you to build an automatic workflow through drag-and-drop, and is really handy for automating things you do frequently, for documenting your processes and more. It can even export your models to Python code, so you can start bridging the gap between Model Builder and full Python programming. For more information look at the documentation, or Google for tutorials. (Similar tools exist in a number of other pieces of software, including Idrisi – but unfortunately not ENVI – although if you have the ArcGIS ENVI toolbox installed you can use many ENVI tools from ArcGIS).
- My ArcGIS Provenance tool is not yet working well enough to be released, but until it is, you can look at your history manually (it's hard to understand that way, but better than nothing). See my blog post about ArcGIS logs here XXX
- ENVI can be scripted in IDL, but it is generally harder than ArcGIS scripting in Python – details are on the ENVI website in the ENVI Programmers Guide. At the time of writing (March 2013), I would strongly suggest using the ENVI Classic IDL API, rather than the modern API for ENVI 5.0 onwards, as the latter is not really finished yet.
- ENVI also has a log system (turn it on under File->XXX). The logs it produces are easier to read by hand than those created by ArcGIS, but harder for a computer to read…
- If you are developing some sort of method/algorithm which works on satellite data to produce an output, publishing the results generated from a freely-available input image (e.g. MODIS, Landsat etc.) allows others to verify their work when they try to reimplement your method.
- If you are collecting field spectra, please try to collect metadata along with them – fulfilling the requirements of: Hüni, Andreas, et al. “Metadata of spectral data collections.” 5th EARSeL Workshop on Imaging Spectroscopy, Bruges, Belgium. 2007. PDF
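To make the verification point above concrete, here is a minimal Python sketch of how someone reimplementing a method might check their output against published results for the same freely-available input image. It is only an illustration: the function name `outputs_match`, the tolerances and the pixel values are all made up, and real outputs would be read from image files rather than typed in as lists.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if two grids of pixel values have the same shape
    and every pair of values agrees within the given tolerances."""
    if len(reference) != len(candidate):
        return False
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            return False
        for ref_val, cand_val in zip(ref_row, cand_row):
            if not math.isclose(ref_val, cand_val,
                                rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Hypothetical values standing in for a published result and a
# reimplementation run on the same input image.
published = [[0.120, 0.135], [0.150, 0.142]]
reimplemented = [[0.120, 0.135], [0.150, 0.1420000001]]

print(outputs_match(published, reimplemented))
```

A tolerance-based comparison is used rather than an exact one because two correct implementations will rarely agree to the last floating-point digit.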
General Reproducible Research
All of the tools below are available to use today – so have a go, try them out, and see what happens. The only tool which isn’t available is something called Burrito. It is described in an article by its author who wrote it as part of his PhD. It was only a prototype and requires some rather unusual system settings (for example, Linux with an entirely different filesystem) – but it really shows the sort of things we should aim for in terms of reproducible research tools.
- ProjectTemplate – the R package used for my example in the presentation. Also see my blog post.
- Gloo – same idea as ProjectTemplate, but for Python
- Repro – a build-tool-based approach for automating scientific computation
- Sumatra – tracking and documentation tool for scientific computation
- VCR – Verifiable Computational Research, a tool to allow you to store and verify computer-based research.
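To give a rough feel for the kind of information that tracking tools like these record, here is a minimal Python sketch using only the standard library. It does not use the API of any of the tools listed above. The function name `provenance_record`, the fields it stores and the `threshold` parameter are my own assumptions for illustration.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def provenance_record(script_path, parameters):
    """Build a dictionary describing one run of an analysis script."""
    with open(script_path, "rb") as handle:
        script_hash = hashlib.sha256(handle.read()).hexdigest()
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "script": script_path,
        "script_sha256": script_hash,  # detects later changes to the code
        "parameters": parameters,
    }


# A throwaway file stands in for a real analysis script here.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"print('analysis')\n")

record = provenance_record(tmp.name, {"threshold": 0.3})
print(json.dumps(record, indent=2))
os.remove(tmp.name)
```

Saving a record like this next to each output file means you can later work out which version of the code, and which parameters, produced it.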
Lots of people online have come up with useful tutorials, guides and discussions about reproducible research – some of them say that they are about other fields, but lots of ideas are easily transferable between fields.
- Tutorial on Reproducible Research in Computational Neuroscience – This is useful for almost all fields, and gives a good overview of a lot of important areas like Version Control, Provenance Tracking etc. Very well written.
- How to define Reproducible Research – This is a more philosophical discussion of what Reproducible Research actually is – whether being able to click a button and run the computations again is all that is needed, or whether there is more to it than that.
- The Software Sustainability Institute do lots of work on software sustainability, but also cover data sustainability, and work with Software Carpentry.
- Software Carpentry have various online lessons, and also run boot camps, all of which teach good scientific software development practices.
- The Open Data Handbook has a good section on open file formats.