Category Archives: Academic
Beware: Latitude/Longitude co-ordinates in ENVI may not be in WGS-84
Summary: When you use the Pixel Locator or Cursor Location/Value tool in ENVI, the latitude and longitude co-ordinates given are based on the datum that the image is in, not necessarily WGS-84.
This may be obvious to some people, but it wasn’t to me – and I thought that if I got confused then some other people probably would too – hence this article. I often use the Pixel Locator dialog box in ENVI to find a specific location in the image by entering the X and Y pixel location (referred to in ENVI as Samples and Lines) or the map co-ordinates of whatever co-ordinate system the image is projected into (for example, the Ordnance Survey National Grid):
Alternatively, you can click the button with the arrows on it, and enter a location in latitude and longitude:
Very handily, all of these values are updated as you move around the image manually – so this dialog can also be used as an equivalent of the Pixel Location/Value window – as shown below.
This all sounds fine – but the problem is that the latitude/longitude values that are shown are calculated using the datum that is defined for the image. You can see this in the image below, where the latitude and longitude value is displayed, and the datum is listed above it:
This seems like a sensible thing to do, but it makes comparison with latitude and longitude co-ordinates from other sources very difficult – as nearly all other latitude and longitude co-ordinates are provided in the WGS-84 datum. Locations from GPS systems are always in WGS-84, and locations on web maps, in GIS systems and most other sources of latitude and longitude co-ordinate systems are very frequently in WGS-84.
So, this raises two questions:
What effect does this have?
Well, I haven’t yet done a full investigation of this (sometime I will sit down and write some code to do some proper testing), but when I found the problem it was causing offsets of around 100-200m – which can be quite significant in many applications.
What can we do about this?
There is a way to tell ENVI to use WGS-84 as the datum for calculating its latitude and longitude values, but it is a bit fiddly. Basically, when you’re in the Pixel Locator dialog before you switch to the latitude/longitude display, click the Change Proj… button and then select click the Datum button to select a new datum. Then switch to the latitude/longitude display, and you’ll find that the datum is listed as WGS-84, and the values will be correct in the WGS-84 datum:
Unfortunately, this means that the map co-ordinate values (for example, the Ordnance Survey Grid References) will now be wrong, and you’ll need to switch back to the original datum to get those to be correct. Also, I can’t find a way to get the Cursor Location/Value dialog to display WGS-84 latitudes and longitudes.
I actually found this while designing a lecture and practical session for MSc students at the University of Southampton on using the ENVI API in IDL. While writing a code example for the students on finding the pixel value at a specific lat/lon co-ordinate, I found that my results came up between ten and twenty pixels away from the place I thought they would (I was using 10m SPOT imagery), and I got different results from my IDL code and ENVI – even though my IDL code was using the ENVI API! Luckily I managed to find out what the problem was – and could therefore explain it to my students, and hopefully stop them running in to the same problem as me. It may be that I’m being silly here, and everyone naturally realises what ENVI does, and thinks it is the right thing to do – but it definitely confused me, so maybe it will helped you.
van Heuklon Ozone model implementation in Python
As part of my PhD I wanted to use a simple model which would give me an estimation of the atmospheric ozone amount given a location and time of year. A simple model to do this was created by van Heuklon in 1979, and was described in a delightfully simple paper (unfortunately not freely available online, but see the reference below) using a simple formula. The map below shows the modelled ozone amounts across the world for 15th April (click to enlarge):
The formula was built up from a baseline ozone amount (235 matm-cm) which then has various components added to it, some of which varied based upon latitude, longitude or time of year. The model isn’t intended to be accurate or high-resolution, but provides me with a rough estimate of the ozone amount at a given location at a given time. Accurate measurements of ozone amounts are available from satellites, but I would need to get data from many different times of year to be able to take into account the seasonal variation in ozone amounts, and the satellite data would take up a lot of space – hence using this simple model.
The code for this model is available on Github and has two implementations: get_ozone_conc and old_get_ozone_conc. The latter uses standard Python code with if statements to set the different values in the formula depending on the input latitudes and longitudes, which means it can only process one latitude/longitude combination at once. get_ozone_conc, on the other hand, uses somewhat more complicated numpy code which allows you to pass it arrays of latitudes and longitudes and it will return an array of results. This is what was used to produce the map above – the code for which is also in the repository.
This code was created for use as part of my PhD, but has been released into the public domain (as CC0) in case anyone else finds it useful.
Also, for those of you using Py6S (my Python interface to the 6S atmospheric Radiative Transfer Model), I’m sure you will be pleased to know that this model will shortly be integrated into Py6S to allow you to easily specify ozone amounts using it.
References:
van Heuklon, T. K. (1979). Estimating atmospheric ozone for solar radiation models. Solar Energy, 22(1), 63-68
A simple way to improve sustainability, reproducibility and releasability of your code and data
This is a first of a number of posts based upon discussions I had while at the Collaborations Workshop 2013 (#CollabW13 on twitter) in Oxford, UK. During one of the sessions I described a simple technique that I try and use to increase the sustainability, reproducibility and releasability of code that I write, data I collect and the results of my work – and people thought this idea was great, and that I should blog about it…
So, what is this wonderful technique:
On Friday afternoon (when you’re bored and can’t be bothered to do any hard work…) spend an hour or two cleaning up and documenting your work from that week
It’s a very simple idea, but it really does work – and it also gives you something to do during the last afternoon of the week when you’re feeling tired and can’t really be bothered. If you can’t commit to doing it every week, you can try every fortnight or month – and I even go as far as adding it as an event to my calendar, to try and stop people arranging meetings with me then!
So, what sort of things can you do during this time?
- Document your code: Depending on the project and the intended use of the documentation, this can be anything from adding some better comments to your code, to documenting individual functions/methods (for example, using docstrings in Python) or writing high-level documentation of the whole system.
- Refactor your code: Refactoring is a “disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behaviour” – that is, basically tidying up, cleaning up, and possibly redesigning you’re code. If you’re anything like me, the code you write when actually doing science isn’t very neat, nice or well-designed – because you’re focused on the science at that point. These few hours on a Friday are your time to focus on the code for a bit…
- Generalise your code to create a library: There are probably a number of things in your code that could be used in many other programs – things like reading certain file formats, performing statistical operations or applying algorithms to array data. This is a perfect time to take these algorithms and ‘decouple’ them from their immediate application in this code so that they can be easily used in other programs you may write in the future. Ideally, this generalised functionality can be packaged into a library and distributed for others to use. My Py6S library was created in this way: I took code that I was using to run the 6S model, generalised it into a library, documented it well, released it – and it has taken on a life of its own!
- Add README files to your folders: If you’re anything like me, you’ll have loads of folders containing data and code – some of which isn’t particularly well named and may not have metadata. One of the easiest (and most effective) ways to deal with this is to create simple README files in each folder explaining what is in the folder, where it came from, what format it’s in – basically anything that you think you’ll want to know about it in a year’s time if you come back to it. I can say from experience just how useful having these README files is!
The key benefit of all of these is that it makes it so much easier to come back to your research later on, and it also makes it so much easier for you to share your research, make it reproducible and allow others to build upon it – and the great thing is that it doesn’t even take that much work. Thirty seconds writing a few notes in a README file could easily save you a week of work in a year’s time, and extending your code into a library would allow you to re-use it in other projects without much extra work.
Another similar idea that was mentioned by someone else at the Collaborations Workshop was for Research Councils to force people to add extra time to the end of their grants to do this sort of thing – although personally, I think it is a far better idea to do this as you go along. Trying to add documentation to some code that you wrote two years ago is often quite challenging…
So, there it is – a simple way to use up some time at the end of the week (when you can’t really be bothered to do anything ‘new’) which will significantly improve the sustainability, reproducibility and releasability of your code and data. Try it out, and let us know how you do in the comments below!
Review: Programming ArcGIS 10.1 with Python Cookbook
Summary: A useful guide to automating ArcGIS using Python, which is fully up-to-date with the latest version of ArcGIS. Definitely provides “quick answers to common problems”, but it may take more effort to get a deep understanding of the methods used. Good breadth of coverage – but notably lacks raster examples – and well explained throughout. I have recommended to colleagues using ArcGIS who want to start off with Python.

Reference: Pimpler, E., 2013, Programming ArcGIS 10.1 with Python Cookbook, Packt Publishing, 508pp Amazon Link, Publishers Link (with sample chapter and Table of Contents)
I wrote this review in consultation with my wife, Olivia, who has been an ArcGIS user for a few years, but is fairly new to Python programming (she’s done a bit in the last few months, but nothing within ArcGIS). I am more experienced with Python – having released a number of packages to the Python Package Index, and having used Python with ArcGIS in various contexts with both ArcGIS 9.3 and 10.1, but I thought it would be useful to get an opinion from both a more- and less-experienced user.
We both found the book useful, and learnt a lot from it, but weren’t entirely convinced by the cookbook format. In general, if you like cookbook-style books then this is likely to appeal to you, but if they annoy you then you won’t like it. The benefits of the cookbook format are that it is very easy to pick up the book, find the relevant recipe for what you’re trying to achieve, turn to it and do it immediately, as each recipe is almost entirely self-contained. Of course, the disadvantage is learning in this way lends to lead to superficial knowledge rather than deep understanding: you may know what to do to achieve a certain result, but not why you’re doing it, and thus cannot adapt the recipes easily to other situations. Another disadvantage is that the recipes are often very repetitive – for example, each recipe in this book starts with the saying “Import the ArcPy module using import arcpy”, and often “import arcpy.da” – something which could be mentioned once and then referred to in future recipes.
Moving away from the cookbook format, which is very much a matter of personal taste, the book covers a wide range of tasks that you can scripts using Python in ArcGIS including interacting with the map window, laying out and printing maps, running geoprocessing operations and accessing the raw data in each layer. The ordering of the book may seem strange to you if you are interested in automating processing operations, rather than automating layout and map style operations, as the geoprocessing part of the book doesn’t start until chapter 6. However, when you get to this section it is very good, and chapters 6 to 9 provide almost all of the information needed to automate geoprocessing tools either using the built-in ArcGIS geoprocessing tools, or by writing your own geoprocessing algorithms using the raw spatial and attribute data.
I was impressed to see coverage of creating ArcGIS Addins using Python in chapter 11, as many ArcGIS experts are not aware of the possibility of creating addins using Python. These addins can provide very user-friendly ways to interact with processes which have been automated in Python, so it is good to see they were covered in this book. I was also pleased to see a chapter on error handling which covers both generic Python error-handling techniques (such as the try-except block) and ArcPy-specific methods, including how to get error messages from ArcGIS into Python. However, there is a noteable absence of any recipes involving raster data – the only exception is in chapter 10 when the use of the Describe() function to find out information about images is explained. However, ArcGIS 10.0 introduced various functions to operate on rasters – including returning the raster data as a numpy array, and allowing updates of raster data from numpy arrays – the coverage of which would really have made the book complete.
In terms of ancilliary information in the book: there is a useful twenty page introduction to the Python language, as well as useful appendices covering automation of Python scripts (using the Windows command-line, and Task Scheduler) and “Five Things Every GIS Programmer Should Know How to Do with Python” (including downloading via FTP, sending emails, and reading various types of files).
The book is generally well-written and formated, and things are explained well – although the quality of the images used is lacking sometimes (with JPEG compression artefacts clearly visible). Overall, the book is a very useful addition to a GIS library for people who are new to automating ArcGIS using Python, and particularly those who want to find out quickly how to automate a particular operation.
John Snow’s Cholera data in more formats
In honour of the bicentenary of John Snow’s birth – and because I was asked to by someone via email – I have now released my digitisation of John Snow’s Cholera data in a few other formats: KML and as Google Fusion Tables.
To save you reading my previous blog posts on the subject, I’ll give a brief overview of my data. John Snow produced a famous map in 1854 showing the deaths caused by a cholera outbreak in Soho, London, and the locations of water pumps in the area. By doing this he found there was a significant clustering of the deaths around a certain pump – and removing the handle of the pump stopped the outbreak. This is a bit of a simplification (see Wikipedia or the John Snow Society for more details), but generally covers what happened.
Anyway, I digitised John Snow’s original data and georeferenced it to the Ordnance Survey co-ordinate system, so that I could overlay it on modern maps of that area, as below (using the OS OpenData StreetView data, containing Ordnance Survey data © Crown copyright and database right 2013):
while still being able to overlay it on John Snow’s original map:
Anyway, the data that is available is:
- Cholera Death locations (Vector) with attribute data giving the number of deaths at each point
- Pump locations (Vector)
- John Snow’s original map georeferenced to the Ordnance Survey National Grid (Raster)
- Current Ordnance Survey maps of the area (from those released under OS OpenData; Contains Ordnance Survey data © Crown copyright and database right 2013; Raster)
These are available for download/use in a number of formats:
- A zip file with the Vector data as Shapefiles and the Raster data as TIFF images
(this is the original data provided for download by me – and is probably what you want for importing into a GIS system)
Download - A zip file with the Vector data as KML files and the Raster data as TIFF images
(suitable for importing into Google Earth and other products that use KML files)
Download - Links to Google Fusion Tables with the vector data already imported
Cholera Deaths
Pumps
Deaths and Pumps together (this dataset has both pump and death points in the same table: pump points have Count values of -999, death points have Count values > 0 which give the number of Cholera deaths at that location)
The latter is particularly cool, I think as it allows you to very easily overlay the data on modern Google Maps data, and should allow some interesting ‘mashups’ to be created. All of the tables are set to be shared publically, so you could be able to copy them (using the Copy Table command in the File menu) and play around with them as much as you want! If you click the Example Map tab then you’ll see a very rudimentary map I’ve created using the data on top of Google Maps (see below) – I’m sure you’ll be able to do far better visualisations than that.
The folks at CartoDB have also used this data in one of their tutorials which shows you how to import the data to CartoDB and create a styled map to show the deaths with different sized markers – yet another way you can use the ‘first real GIS data’ in today’s modern web-based GIS tools.
So, enjoy – and please let me know (via the comments below) what you create!
Update: There was a problem with the KML files and Google Fusion Tables that I uploaded yesterday, caused by an incorrect co-ordinate transformation between the Ordnance Survey grid references and latitude/longitude. This has now been fixed and the downloads and tables have been updated. Sorry about this.
Resources for Sustainable Software and Reproducible Research in Remote Sensing presentation
I recently did a presentation entitled Sustainable Software and Reproducible Research in Remote Sensing at the Wavelength 2013 conference. The slides are available at SpeakerDeck, and shown below.
The rest of this page is a rather rough categorised list of various links and resources that may be useful to you after seeing the presentation (either in person at the conference, or online).
My blog posts
I’ve looked at some of these issues before on this blog. The most relevant posts are:
- Science Cost Manifesto
- ProjectTemplate for Reproducible Research
- Standard test images for Remote Sensing
Papers referenced in the presentation
- Saleska, Scott R., et al. “Amazon forests green-up during 2005 drought.”Science 318.5850 (2007): 612-612. PDF
- Samanta, Arindam, et al. “Amazon forests did not green-up during the 2005 drought.” Geophysical Research Letters 37.5 (2010): L05401. PDF
- Irish, Richard R., et al. “Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm.” Photogrammetric engineering and remote sensing 72.10 (2006): 1179. PDF
- Tang, Jiakui, et al. “Aerosol optical thickness determination by exploiting the synergy of TERRA and AQUA MODIS.” Remote Sensing of Environment 94.3 (2005): 327-334. PDF
- Aruliah, D. A., et al. “Best Practices for Scientific Computing.” arXiv preprint arXiv:1210.0530 (2012). PDF
Sustainable Coding
This is basically just good coding practices. Various resources are relevant:
- Software Carpentry – look under Lessons for online lessons on various things (automated testing, version control, documentation, debugging, code program structure etc) and Boot Camps to see if there is a training day coming up near you.
- Code Complete (2nd Edition) – a big book with loads of information on good coding styles etc. Your library is likely to have a copy
- Version Control by Example – a great book by Eric Sink (disclosure: I proof-read it for him and gave various bits of feedback, and therefore got a free copy) which covers the concepts of version control in general, and then shows how they apply to various real-world systems such as Git, Mercurial and Subversion.
- Science Code Manifesto – a manifesto for how software and code should be treated in science.
Remote-Sensing-related:
- ArcGIS has a number of tools and functions that make it easier to do reproducible research.
- Model Builder is a great way to get started with automating your ArcGIS analyses without having to learn to program. It allows you to build an automatic workflow through drag-and-drop, and is really handy for automating things you do frequently, for documenting your processes and more. It can even export your models to Python code, so you can start bridging the gap between Model Builder and full Python programming. For more information look at the documentation, or Google for tutorials. (Similar tools exist in a number of other pieces of software, including Idrisi – but unfortunately not ENVI – although if you have the ArcGIS ENVI toolbox installed you can use many ENVI tools from ArcGIS).

Example of ArcGIS Model Builder
- My ArcGIS Provenance tool is not yet working well enough to be released, but until it is, you can look at your history manually (its hard to understand that way, but better than nothing). See my blog post about ArcGIS logs here XXX
- Model Builder is a great way to get started with automating your ArcGIS analyses without having to learn to program. It allows you to build an automatic workflow through drag-and-drop, and is really handy for automating things you do frequently, for documenting your processes and more. It can even export your models to Python code, so you can start bridging the gap between Model Builder and full Python programming. For more information look at the documentation, or Google for tutorials. (Similar tools exist in a number of other pieces of software, including Idrisi – but unfortunately not ENVI – although if you have the ArcGIS ENVI toolbox installed you can use many ENVI tools from ArcGIS).
- ENVI
- ENVI can be scripted in IDL, but it is generally harder than ArcGIS scripting in Python – details are on the ENVI website in the ENVI Programmers Guide. At the time of writing (March 2013), I would strongly suggest using the ENVI Classic IDL API, rather than the modern API for ENVI 5.0 onwards, as the latter is not really finished yet.
- ENVI also had a log system (turn it on under File->XXX). The logs it produces are easier to read by hand than those created by ArcGIS, but harder for a computer to read…
- If you are developing some sort of method/algorithm which works on satellite data to produce an output, publishing the results generated from a freely-available input image (eg. MODIS, Landsat etc) allows others to verify their work when they try and reimplement your method.
- If you are collecting field spectra, please try and collect metadata along with it – fulfilling the requirements of: Hüni, Andreas, et al. “Metadata of spectral data collections.” 5th EARSeL Workshop on Imaging Spectroscopy, Bruges, Belgium. 2007. PDF
General Reproducible Research
Tools
All of the tools below are available to use today – so have a go, try them out, and see what happens. The only tool which isn’t available is something called Burrito. It is described in an article by its author who wrote it as part of his PhD. It was only a prototype and requires some rather unusual system settings (for example, Linux with an entirely different filesystem) – but it really shows the sort of things we should aim for in terms of reproducible research tools.
- ProjectTemplate – the R package used for my example in the presentation. Also see my blog post.
- Gloo – same idea as ProjectTemplate, but for Python
- Repro – a build-tool-based approach for automating scientific computation
- Sumatra – tracking and documentation tool for scientific computation
- VCR – Verifiable Computational Research, a tool to allow you to store and verify computer-based research.
Useful sites
Lots of people online have come up with useful tutorials, guides and discussions about reproducible research – some of them say that they are about other fields, but lots of ideas are easily transferrable between fields.
- Tutorial on Reproducible Research in Computational Neuroscience – This is useful for almost all fields, and gives a good overview of a lot of important areas like Version Control, Provenance Tracking etc. Very well written.
- How to define Reproducible Research – This is a more philosophical discussion of what Reproducible Research actually is – whether being able to click a button and run the computations again is all that is needed, or whether there is more to it than that.
General Sustainability
- The Software Sustainability Institute do lots of work on software sustainability, but also cover data sustainability, and work with Software Carpentry
- Software Carpentry have various online lessons, and also run boot camps, all of which teach good scientific software development practices.
- The Open Data Handbook has a good section on open file formats.
Converting latitude/longitude co-ordinates to Landsat WRS-2 paths/rows
As part of some work I was doing for my PhD, I needed to automatically find what Landsat scene path and row would contain a pixel with a certain latitude/longitude co-ordinate (this was basically so I could automatically download the relevant Landsat image and do some processing on it).
There is an online converter (provided by the USGS) which allows you to input latitude/longitude and get back path/row, but it doesn’t have an API of any kind, and I couldn’t see an easy to way to automatically submit requests to the form, as they’ve (very sensibly really) included methods to stop people doing that.
Summary: So, I wrote my own code to do the conversion. It’s available on Github and depends on GDAL/OGR and Shapely. See the README file on Github for instructions.
Very helpfully, the USGS have made shapefiles of the WRS-2 scene boundaries available to download, so all my code needs to do is load the shapefile, do some ‘Point in Polygon’ operations to find out which polygon(s) the point is in, and extract the ‘path’ and ‘row’ attributes from those polygons. This is fairly easy to do in Python using the wonderful OGR library to access shapefiles, and Shapely to do the ‘Point in Polygon’ operations. When writing this I used the Python Geospatial Development book as a good reference for how to combine GDAL and Shapely in this way.
The code is available at GitHub, with instructions on how to install it (basically, download the shapefiles from the USGS – I’m deliberately not re-distributing them here – and download the Python script), and it is very easy to use:
>>> import get_wrs
>>> conv = ConvertToWRS()
>>> conv.get_wrs(50.14, -1.7)
[{'path': 202, 'row': 25}]
>>> conv.get_wrs(50.14, -1.43)
[{'path': 201, 'row': 25}, {'path': 202, 'row': 25}]
Beware:
- Instantiating the class may take a while, as it has to load in the shapefile
- The get_wrs method may return multiple results, as Landsat scenes overlap at all four edges.
Please let me know (via a comment on this post) if you find this code useful.
For reference, the code is shown below, and is well-commented to show how it works:
Can I atmospherically-correct my images with Py6S?
I’m a big fan of Matt Might’s blog, and thought I’d implement one of his tips for blogging as an academic – namely Reply to Public. I’ve had a number of emails from Py6S users asking me questions about how to atmospherically-correct entire satellite images with Py6S – so I thought ‘d respond online, so that in the future people either find this response without needing to email me (ideally), or at the very least, I can point them to this blog post when they do email me.
Unfortunately, the simple answer to the question is: Py6S cannot (yet!) atmospherically correct satellite images – at least, not in a sensible manner – so I’d suggest using other atmospheric correction software such as ATCOR or FLAASH. The longer answer is below…
If you read the Py6S documentation you’ll find that there is an atmospheric correction option which you can use to take an at-sensor radiance (often called a Top of Atmosphere radiance when you’re dealing with satellite sensors) and atmospherically-correct it to get a corrected surface reflectance (or radiance) value. You could write some code to do that – it might look a bit like this:
from Py6S import * # Create a SixS object called s # (used as the standard name by convention) s = SixS() # Set the atmospheric conditions as usual s.aero_profile = AeroProfile.PredefinedType(AeroProfile.Maritime) s.aot550 = 0.05 s.atmos_profile = AtmosProfile.UserWaterAndOzone(2, 0.318) # Set the wavelength s.wavelength=Wavelength(PredefinedWavelengths.MODIS_B6) # Set the altitudes s.altitudes.set_target_sea_level() s.altitudes.set_sensor_satellite_level() # Set the geometry s.geometry = Geometry.User() s.geometry.solar_z= 35.8 s.geometry.solar_a= 149.4 s.geometry.view_z= 5.1 s.geometry.view_a= 106.8 s.geometry.month=06 s.geometry.date=28 # Turn on atmospheric correction mode and set it to do the # correction assuming a Lambertian surface with a TOA # radiance of 137.5 W/m^2 s.atmos_corr = AtmosCorr.AtmosCorrLambertianFromRadiance(137.5) # Run the model s.run() # Print the result of the atmospheric correction # (assuming Lambertian reflectance) # This is the ground-reflectance for the given radiance, # under the given atmospheric conditions print s.outputs.atmos_corrected_reflectance_lambertian
This works fine, and so you might think that all you need to do to correct a satellite image is to take the code above, and run it for each pixel of the image using a for loop, as in the pseudo-code below:
# Load in the image (eg. using GDAL) for pixel in image: reflectance = run_py6s(various, parameters, here) # Store reflectance in the output image # Save output image
Unfortunately, that won’t work very well. Or rather, it will work – it’ll just take a long time! Let’s do some maths to work out roughly how long:
- As a rough approximation, it takes Py6S around two seconds to atmospherically correct a single pixel in a single band (the exact time depends on how complicated the parameterisation is – for example, a full manual AERONET-based aerosol parameterisation will take longer than a standard aerosol model)
- A full Landsat scene is roughly 5500 x 6000 pixels (170km x 185km), which is around 33 million pixels
- 33 million pixels x 2 seconds per pixel = 66 million seconds, or 763 days!
- If I started running this now – on one band of a Landsat scene – it’d be finished around April 2015. Even if I used a simple parameterisation that only took one second per pixel, it’d still take over a year!
Now, I suspect you’re now thinking the Py6S is really awful because it takes such a long time. Well, unfortunately it’s not down to Py6S (if it was, I could try and improve it!). In fact, the Python bit of Py6S adds very little time - the vast majority of the time is taken by 6S itself.
So, maybe it’s 6S that is rubbish. Well, again, unfortunately not, or we could switch to something that is better. All other Radiative Transfer Models, such as MODTRAN, SCIATRAN etc, suffer from the same problem.
So, how on earth do people manage to correct satellite images? Obviously people have managed to do it – and there is commercial software available to do it. Well, they use something called a Lookup Table (LUT). This involves running the Radiative Transfer Model many times to produce a table of results, and then looking up the result you want in this table. Creating the table takes a long time (but nowhere near as long as 763 days!) and it often takes a lot of space to store the resulting table (for example, the Lookup Table used in ATCOR is a multi-Gigabyte file), but once you’ve got the table you can correct a pixel in a tiny fraction of a second – rather than one or two seconds – which means the correction of an image is a lot quicker.
A lookup table is a good example of two standard trade-offs in computing:
- Setup time vs Run time: Often algorithms can either be designed to have a very short (or no) setup time, but then take a long time to run; or they can be designed to have a long setup phase, and then run very quickly from then onwards. Often the decision of which one to focus on depends on how often you’re going to run your process – if you want to run it multiple times (like an atmospheric correction algorithm) then it is normally better to have a long setup time (which you only have to do once!) and then a short run time.
- Memory vs Time: Again, algorithms often trade speed against memory – a fast algorithm normally requires more memory, and vice-versa. In this case, a Lookup Table trades speed for memory – we have to store the Lookup Table in memory, but once we have access to it, we can do corrections very fast.
I haven’t written any code to use Py6S to create a lookup table and then use that to correct an image (although I will almost certainly write some code to do this sometime during my PhD) – but it wouldn’t be too difficult to do. The procedure would be something like the following:
Setup Phase:
- Come up with a list of parameter values for all of the important parameters – choosing a number of parameters across their realistic range (there is a trade-off between accuracy and speed here, as always). For example maybe: AOT = [0.1, 0.2, 0.3, 0.4…]; solar_z = [10, 20, 30, 40, 50…]; radiance = [10, 20, 30, 40, 50, 60, 70] etc.
- Write some code to loop over all of these lists and run every possible combination of them
- Store the result of all of these runs of Py6S (ie. the atmospherically-corrected reflectance) and the parameters used to get the result in a big table of some sort.
Correction Phase:
Loop through every pixel in the image, and for each pixel:
- Get the parameters (AOT, angles, radiance etc) for that pixel
- Choose the nearest parameters that were used to create results in the table (for ALL of the parameters you used)
- Interpolate between these parameters as needed (eg. for an AOT of 0.15, get the results for an AOT of 0.1 and and AOT of 0.2 and interpolate) to get an interpolated ground reflectance.
- Store this reflectance in another image.
This basic procedure has been used in almost all atmospheric correction tools for the last thirty years, and is described in various papers including Fraser et al. (1992) and Liang et al. (1997). Another way to speed-up the computation even more is by introducing parallel computation. If we can assume that each pixel’s atmospheric correction is entirely independent from every other pixel’s (which may not be the case in the real-world, but is generally assumed by most atmospheric correction algorithms) then we can split the pixels between a number of processors and thus correct many pixels in parallel – which gets an even greater speedup as we can do this both when generating the LUT, and when correcting the image.
Unfortunately, actually implementing this is a bit more complicated than I’ve explained here and various decisions have to be taken (including what ranges of parameters to use, how to interpolate, how best to store the multi-dimensional lookup table and more…) – but it is definitely on my ‘to do’ list as part of my PhD, and as part of Py6S.
So, in conclusion: Py6S can’t do atmospheric correction of satellite imagery in a sensible time at the moment, but should be able to within the next few years. In the meantime, I suggest using other atmospheric correction software (most of which, unfortunately, is commercial software). You may think that not being able to do this makes Py6S useless…but, as my next post will show, Py6S is still very useful for remote-sensing research.
References
Fallah-Adl, Hassan, et al. “Fast algorithms for removing atmospheric effects from satellite images.” Computational Science & Engineering, IEEE 3.2 (1996): 66-77.
Fraser, R. S., Ferrare, R. A., Kaufman, Y. J., Markham, B. L., & Mattoo, S. (1992). Algorithm for atmospheric corrections of aircraft and satellite imagery.International Journal of Remote Sensing, 13(3), 541-557.
Liang, Shunlin, et al. “An operational atmospheric correction algorithm for Landsat Thematic Mapper imagery over the land.” Journal of Geophysical Research 102.D14 (1997): 17173-17.
Py6S: Run a radiative transfer model with a user-defined input spectrum
Version 1.2 of Py6S has recently been released (including a couple of bug fix releases, taking the most recent version to v1.2.2), and the major new feature in this version is the ability to use any spectrum as the ground reflectance for a model run. Previously, users were restricted to using the built-in 6S ground spectra (vegetation, clear water, lake water and sand) or doing some complicated parameterisation to set the user-defined reflectances up in the correct manner – so complicated that it was almost never worth doing!
Luckily, that has all changed in this version – read on to find out more…
Setting the Ground Reflectance
Setting the ground reflectance parameterisation in Py6S is very simple – you just use one of the methods of the GroundReflectance class. For example, to create a 6S model and set a constant reflectance of 0.5 for all wavelengths, simply run:
from Py6S import * s = SixS() s.ground_reflectance = GroundReflectance.HomogeneousLambertian(0.5) s.run()
In previous versions the parameter given to HomogeneousLambertian (or HeterogeneousLambertian) could be any of:
- A single reflectance value as a float, which is used as a uniform reflectance for all wavelengths:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(0.7)
- A constant stating a built-in 6S reflectance value:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(GroundReflectance.GreenVegetation)
- An array giving a user-defined spectrum given in steps of 2.5nm for the wavelength range used for that simulation (the last bit is important!)
s.ground_reflectance = GroundReflectance.HomogeneousLambertian([0.6, 0.8, 0.34, 0.453])
But now there is one more option:
- A two-dimensional array containing wavelengths (column 0, in micrometres) and reflectance values (column 1, as a fraction). This array will then be taken by Py6S, resampled to 2.5nm, subset to only the range of wavelengths being used for the current simulation, and used with the underlying 6S model.
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array)
This saves a lot of time and extra code, compared to doing all of the resampling yourself and using the user-defined spectrum option which was always available in Py6S. However, there are more benefits than this… Py6S will store the whole spectrum that you give, and just re-sample it to the correct wavelength range every time you run the model, which means you can use the lovely helper methods like run_vnir and run_landsat_tm to run simulations for a spectrum for a number of wavelengths with very little code. For example:
from Py6S import * s = SixS() # Somehow generate a 2D array called spectrum_array here s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array) wavelengths, results = SixSHelpers.Wavelengths.run_vnir(s, output_name="apparent_radiance")
Getting spectral data
Of course, that example won’t work until we add in some code to generate this 2D array for us. There are a two ways we could get this array:
- Generate it within our code somehow – for example, using a simple model, or interpolation between some hard-coded values.
- Load it from some sort of external datafile, for example from a CSV file using np.loadtxt
Both of those are ways that you might want to use, but I’ve added functions to Py6S to simplify a couple of ways of doing this that may be useful to people.
Firstly, the functions in the Spectra module make it easy to import spectra from the USGS Spectral Library and the ASTER Spectral Library. These two libraries seem to be the most commonly used – as far as I am aware, at least – and have a very wide range of spectra in them. The two functions are very simple to use – you simply pass the URL or file path to the spectrum data file, and it returns the right sort of array. Thus you can do:
spectrum_array = Spectra.import_from_usgs("http://speclab.cr.usgs.gov/spectral.lib06/ds231/ASCII/V/russianolive.dw92-4.30728.asc")
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum_array)
Or
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(Spectra.import_from_usgs("http://speclab.cr.usgs.gov/spectral.lib06/ds231/ASCII/V/russianolive.dw92-4.30728.asc"))
(Either of the URLs could be replaced with a simple local file path instead, which is particularly useful if you have downloaded the USGS Spectral Library archive file, or obtained the CD of the ASTER Spectral Library).
The other way to get data is to use a model. Many models can produce spectral data, and most will output in a format like CSV that can be imported into a NumPy array really easily – but to make things even easier I have also released a Python interface to the ProSAIL called – you guessed it – PyProSAIL. Again, the methods have been carefully designed to return the right sort of arrays, so you can simply do:
# Make sure you have both PyProSAIL and Py6S installed import pyprosail from Py6S import * spectrum = pyprosail.run(1.5, 40, 8, 0, 0.01, 0.009, 1, 3, 0.01, 30, 0, 10, 0, pyprosail.Planophile) s = SixS() s.ground_reflectance = GroundReflectance.HomogeneousLambertian(spectrum) s.run()
So, hopefully that has been a useful introduction to how to use user-defined spectra with Py6S. I’ll leave you with an example of how to put all of this together to do something useful – that is, to run a Py6S simulation with the ground reflectance set to the spectra from every USGS spectral library file in a directory, and store the results:
from Py6S import *
import glob
import numpy as np
# Create the SixS model and set some parameters
s = SixS()
s.aero_profile = AeroProfile.PredefinedType(AeroProfile.NoAerosols)
s.altitudes.set_sensor_satellite_level()
s.altitudes.set_target_sea_level()
# Get a list of files that have the extension .asc
# (the extension used by the USGS spectral library files)
files = glob.glob("./USGSSpecLib/Selected/*.asc")
results = []
# Run the model using each filename as the ground reflectance spectrum
# using the Landsat ETM+ bands
for filename in files:
s.ground_reflectance = GroundReflectance.HomogeneousLambertian(Spectra.import_from_usgs(filename))
wvs, res = SixSHelpers.Wavelengths.run_landsat_etm(s, output_name='apparent_radiance')
results.append(res)
# Stack all of the results into an array
results_arr = np.vstack(results)
# Write the results out to a CSV file for further analysis
np.savetxt("results.csv", results_arr, delimeter=",")
My first academic paper – on Py6S
Another exciting update for this new year: my first academic journal paper has been published!
It’s called Py6S: A Python interface to the 6S Radiative Transfer Model, and is published in Computers and Geosciences. If you’re reading this from a university with a subscription to Computers and Geosciences then you can read it at ScienceDirect – if you don’t, then you can read the post-peer-review (but pre-publisher-typsetting) version at my website.
The paper describes the Py6S python module that I wrote to assist me in running simulations using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) model, and is fairly short, but – I hope – fairly useful. It also gives a good way for people to cite Py6S when (if…) they use it in their work – they simply cite this paper.
Of course, the issue with journal papers is that they are static – indeed, a number of things changed in Py6S during the time between submitting the paper and it coming out in print (which was around nine months!). None of those affected the example code given in the paper – and to be honest, they were mostly bugfixes – but there are some new features gradually working their way into the code.
I’m planning to start a series of posts about Py6S on this blog, showing how various things work ‘under the hood’, announcing new features, and showing examples of how to use Py6S well. Hopefully this will be useful for me and also useful for other people who may be interested in using Py6s, but may require a bit of help.
So… stay tuned for the next post which will be on a new feature involving user-defined spectra.



