Robin's Blog

How to: Fix weird ENVI startup file issues

This post is more a note to myself than anything else – but it might prove useful for someone sometime.

In the dim and distant mists of time, I set up a startup file for ENVI which automatically loaded a specific image every time you opened ENVI. I have no idea why I did that – but it seemed like a good idea at the time. When tidying up my hard drive, I removed that particular file – and ever since then I’ve got a message each time I load ENVI telling me that it couldn’t find the file.

I looked in the ENVI preferences window, and there was nothing listed in the Startup File box (see below) – but somehow a file was still being loaded at startup. Strange.

[Screenshot: the ENVI preferences window, with an empty Startup File box]

I couldn’t find anything in the documentation about where else a startup file could be configured, and I searched all of the configuration files in the ENVI program folder just in case there was some sort of command in one of them – and I couldn’t find it anywhere.

Anyway, to cut a long story short, it seems that ENVI will automatically run a startup file called envi.ini located in your home directory (C:\Users\username on Windows, /home/username on Linux, /Users/username on OS X). This file existed on my machine, and contained the contents below – and deleting it stopped ENVI trying to open this non-existent file.

; envi startup script
open file = C:\Data\_Datastore\SPOT\SPOT_ROI.bsq
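
If you want to check for (and get rid of) this file without hunting around manually, here’s a quick sketch in Python – it renames the file rather than deleting it, just in case you want it back later:

import os

# ENVI's per-user startup file, as described above
startup_file = os.path.join(os.path.expanduser("~"), "envi.ini")

if os.path.exists(startup_file):
    print("Found ENVI startup file at %s" % startup_file)
    os.rename(startup_file, startup_file + ".bak")  # keep a backup rather than deleting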

 


Simple parameter files for Python class-based algorithms

As part of my PhD I’ve developed a number of algorithms which are implemented as a class in Python code. An example would be something like this:

class Algorithm:
    def __init__(self, input_filename, output_basename, thresh, n_iter=10):
        self.input_filename = input_filename
        self.output_basename = output_basename
        self.thresh = thresh
        self.n_iter = n_iter

    def run(self):
        self.preprocess()
        self.do_iterations()
        self.postprocess()

    def preprocess(self):
        # Do something, using the self.xxx parameters
        pass

    def do_iterations(self):
        # Do something, using the self.xxx parameters
        pass

    def postprocess(self):
        # Do something, using the self.xxx parameters
        pass

The way you’d use this algorithm normally would be to instantiate the class with the required parameters, and then call the run method:

alg = Algorithm("test.txt", "output", 0.67, 20)
alg.run()

That’s fine for interactive use from a Python console, or for writing scripts that automatically vary the parameters (e.g. trying every threshold from 0.1 to 1.0 in steps of 0.1), but sometimes it’d be nice to be able to run the algorithm from a file containing the right parameters. This would be particularly useful for users who aren’t so experienced with Python, but it can also help with reproducibility: having a parameter file stored in the same folder as your outputs lets you easily re-run the processing.

For a while I’ve been trying to work out how to support both parameter files and the standard way of calling the class (as in the example above) without lots of repeated code – and I think I’ve found an approach that works fairly well. I’ve added an extra method to the class which writes out a parameter file:

def write_params(self):
	with open(self.output_basename + "_params.txt", 'w') as f:
		for key, value in self.__dict__.iteritems():
			if key not in ['m', 'c', 'filenames']:
				if type(value) == int:
					valuestr = "%d" % value
				elif type(value) == float:
					valuestr = "%.2f" % value
				else:
					valuestr = "%s" % repr(value)

				f.write("%s = %s\n" % (key, valuestr))
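
For illustration, if you call write_params() on the example instance created earlier, the resulting file (output_params.txt in this case) would look something like this – the exact formatting and ordering depend on your parameters:

input_filename = 'test.txt'
output_basename = 'output'
thresh = 0.67
n_iter = 20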

This function is generic enough to be used with almost any class: it simply writes out the contents of all of the variables stored in the instance. The only bit that’ll need modifying is the part that excludes certain variables (in this case filenames, m and c, which aren’t parameters but internal attributes used by the class – in an updated version I’ll rename these to start with an underscore, which will make them really easy to filter out, as sketched below).
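
As a rough sketch of that future, underscore-based version (this is an assumption about how I’d restructure it, not code from the current algorithm), write_params could shrink to:

def write_params(self):
    with open(self.output_basename + "_params.txt", 'w') as f:
        for key, value in self.__dict__.iteritems():
            if key.startswith('_'):
                continue  # skip internal attributes such as _m, _c and _filenames
            f.write("%s = %s\n" % (key, repr(value)))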

The key thing is that – through the use of the repr() function – the parameter file is valid Python code, and running it will just set a load of variables corresponding to the parameters. In fact, the code to write out the parameters could be even simpler – just using repr() for every parameter – but to make the parameter file a bit nicer to look at, I decided to format floats and ints separately (two decimal places is the right accuracy for the parameters in the particular algorithm I was using – yours may differ). One of the other benefits of using configuration files that are valid Python code is that you can use any Python you want in there – string interpolation or even loops – plus you can include comments. The disadvantage is that it’s not a particularly secure way of dealing with parameter files, but for scientific algorithms this isn’t normally a major problem.

The result of writing the parameter file as valid Python code is that it is very simple to read it in:

params = {}
execfile(filename, params)

This creates an empty dictionary, then executes the file and places all of the variables into a dictionary, giving us exactly what we’d want: a dictionary of all of our parameters. Because they’re written out from the class instance itself, any issues with default values will already have been dealt with, and the values written out will be the exact values used. Now we’ve got this dictionary, we can simply use ** to expand it to parameters for the __init__ function, and we’ve got a function that will read parameter files and create the object for us:

@classmethod
def fromparams(cls, filename):
	params = {}
	execfile(filename, params)
	del params['__builtins__']
	return cls(**params)

So, if we put all of this together we get code which automatically writes out a parameter file when a class is instantiated, and a class method that can instantiate a class from a parameter file. Here’s the final code, followed by an example of usage:

class Algorithm:
    def __init__(self, input_filename, output_basename, thresh, n_iter=10):
        self.input_filename = input_filename
        self.output_basename = output_basename
        
        self.thresh = thresh
        
        self.n_iter = n_iter

        self.write_params()

    def write_params(self):
        with open(self.output_basename + "_params.txt", 'w') as f:
            for key, value in self.__dict__.iteritems():
                if key not in ['m', 'c', 'filenames']:
                    if type(value) == int:
                        valuestr = "%d" % value
                    elif type(value) == float:
                        valuestr = "%.2f" % value
                    else:
                        valuestr = "%s" % repr(value)

                    f.write("%s = %s\n" % (key, valuestr))
            
    def run(self):
        self.preprocess()
        
        self.do_iterations()
        
        self.postprocess()

    @classmethod
    def fromparams(cls, filename):
        params = {}
        execfile(filename, params)
        del params['__builtins__']
        return cls(**params)
        
    def preprocess(self):
        # Do something, using the self.xxx parameters
        pass

    def do_iterations(self):
        # Do something, using the self.xxx parameters
        pass

    def postprocess(self):
        # Do something, using the self.xxx parameters
        pass

And the usage goes something like:

# Create instance with code
alg = Algorithm("input.txt", "output", 0.25, n_iter=20)
alg.run()

# Create instance from parameter file
alg = Algorithm.fromparams('output_params.txt')
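
One caveat: execfile() (and iteritems()) only exist in Python 2. If you’re on Python 3, a rough equivalent of fromparams (an untested sketch, not part of the original code) would read the file and pass it to exec() instead – and write_params would need items() rather than iteritems():

@classmethod
def fromparams(cls, filename):
    params = {}
    with open(filename) as f:
        exec(f.read(), params)
    del params['__builtins__']
    return cls(**params)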

How to: Find out what modules a Python script requires

I do a lot of my academic programming in Python, and – even though I often write about the importance of reproducible research – I don’t always document my code very well. This sometimes leads to problems where I have some code running fine, but I don’t know which modules it requires. These could be external libraries, or modules I’ve written myself – and it’s very frustrating to have to work out the module requirements by trial and error if I transfer the code to a new machine.

However, today I’ve realised there’s a better way: the modulefinder module. I’ve written a short piece of code which produces a list of all of the ‘base’ or ‘root’ modules that your code uses (for example, if you run from LandsatUtils.metadata import parse_metadata, this code will record LandsatUtils), so you know which modules you need to install.
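
The original script isn’t reproduced here, but a minimal sketch along the same lines, using modulefinder (the function name and output format are my own choices), might look like this:

from modulefinder import ModuleFinder
import sys

def find_root_modules(script_path):
    finder = ModuleFinder()
    finder.run_script(script_path)
    # Keep just the 'root' part of each dotted module name
    # (e.g. LandsatUtils.metadata -> LandsatUtils)
    return sorted({name.split('.')[0] for name in finder.modules})

if __name__ == "__main__":
    for module in find_root_modules(sys.argv[1]):
        print(module)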

Hopefully, like me, it’ll save you some time.


How to: Solve GDAL error ‘An error occurred while writing a dirty block’

When running GDAL on my university’s supercomputer yesterday I got the following error:

ERROR 1: Landsat_Soton.tif, band 1: An error occured while writing a dirty block

This post is really just to remind me how to solve this error – I imagine the error may have a multitude of possible causes. In my case though, I knew I’d seen it before – and fixed it – but I couldn’t remember how. It turns out that it’s really simple: GDAL is giving an error saying that it can’t write part of the output file to the hard drive. In this case, it’s because the supercomputer that I’m using has quotas for the amount of storage space each user can use – and I’d gone over the quota ‘hard limit’, and therefore the operating system was refusing to write any of my files.

So, the simple answer is to delete some files, and then everything will work properly!

(If you’re not using a shared computer with quotas, then this may be because your hard drive is actually full!)
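
If you’re not sure whether space is the problem, you can quickly check the free space on the relevant filesystem from Python – this uses shutil.disk_usage (Python 3.3+), the path is just a placeholder, and note that it reports filesystem free space rather than your personal quota:

import shutil

total, used, free = shutil.disk_usage("/path/to/output/directory")
print("Free space: %.1f GB" % (free / 1e9))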

 

 


How effective is my research programming workflow? The Philip Test – Part 4

10. Can you re-generate any intermediate data set from the original raw data by running a series of scripts?

It depends which of my projects you’re talking about. For some of my nicely self-contained projects this is very easy – everything is encapsulated in a script or a series of scripts, and you can go from raw data, through all of the intermediate datasets, to the final results very easily. The methods by which this is done vary, and include a set of Python scripts, or the use of the ProjectTemplate package in R. Since learning more about reproducible research, I try to ‘build in’ reproducibility from the very beginning of my research projects. However, I’ve found it very difficult to add to a project retrospectively – if I start a project without considering this then I’m in trouble. Unfortunately, a good proportion of my PhD is in that category, so not everything in the PhD is reproducible. However, the main algorithm that I’m developing is – and that is fully source-controlled, relatively well documented and reproducible. Thank goodness!

11. Can you re-generate all of the figures and tables in your research paper by running a single command?

The answer here is basically the same as above: for some of my projects definitely yes, for others definitely no. Again, there seems to be a pattern that smaller, more self-contained projects are more reproducible – and not all of the figures/tables in my PhD thesis can be reproduced – but generally you’ve got a relatively good chance. At the moment I don’t use things like Makefiles, and don’t write documents with Sweave, knitr or equivalents – so to reproduce a figure or table you’ll often have to find a specific Python file and run it (e.g. create_boxplot.py, or plot_fig1.py), but it should still produce the right results.

12. If you got hit by a bus, can one of your lab-mates resume your research where you left off with less than a week of delay?

Not really – it would be difficult, even for my supervisor or someone who knew a lot about what I was doing, to take over my work. My “bus factor” is definitely 1 (although I hope that the bus factor for Py6S is fractionally greater than 1). Someone with a good knowledge of Python programming, including numpy, scipy, pandas and GDAL, would have a good chance of taking over one of my better-documented and more-reproducible smaller projects – but I think they would struggle to pick up my PhD. In many ways, though, that’s kind of the point of a PhD – you’re meant to end up being the world expert in your very specific area of research, which makes it very difficult for anyone to pick up anyone else’s PhD project.

For one of my other projects, it may take a while to get familiar with it – but it should be perfectly possible to take my code, along with drafts of papers and/or other documentation I’ve written and continue the research. In many ways that is the whole point of reproducible research: aiming to develop research that someone else can easily reproduce and extend. The only difference is that usually the research is reproduced/extended after it’s been completed by you, whereas if you get hit by a bus then it’ll never have been completed in the first place!


How to: Create a Landsat Metadata database, so you can select images by various criteria

Recently I ran into a situation where I needed to select Landsat scenes by various criteria – for example, to find images over a certain location, within a certain date range, with other requirements on cloudiness and so on. Normally I’d do this sort of filtering using a tool like EarthExplorer, but I needed to do this for about 300 different sets of criteria – making an automated approach essential.

So, I found a way to get all of the Landsat metadata and import it into a database so that I could query it at will and get the scene IDs for all the images I’m interested in. This post shows how to go about doing this – partly as a reference for me in case I need to do it again, but hopefully other people will find it useful.

So, to start, you need to get the Landsat metadata from the USGS. On this page, you can download the metadata files for each of the Landsat satellites separately (with Landsat 7 metadata split into SLC-on and SLC-off).

[Screenshot: the USGS Landsat metadata download page]

You’ll want the CSV files, so click the link and have a break while it downloads (the CSV files are many hundreds of megabytes!). If you look at the first line of the CSV file once you’ve downloaded it (you may not want to load it in a text editor as it is such a huge file, but something like the head command will work fine), you’ll see the huge number of column headers giving every piece of metadata you could want! Of course, most of the time you won’t want all of the metadata items, so we want to extract just the columns we want.

The problem with this is that lots of the traditional tools used for processing CSV files – including text editors, database import tools and Excel – really don’t cope well with large CSV files. These Landsat metadata files are many hundreds of megabytes in size, so we need a different approach. In this case, I found that the best approach was to use one of the tools from csvkit, a set of command-line tools for processing CSV files, written in Python. One of the key benefits of these tools is that they process the file one line at a time, in a very memory-efficient way, so they can work on enormous files very easily. To extract columns from a CSV file we want to use csvcut, which we can call with the following command line:

csvcut -c 5,6,1,2,3,7,8,20,25,30 LANDSAT_ETM.csv > LANDSAT_ETM_Subset.csv

This will extract the 5th, 6th, 1st, 2nd, 3rd (and so on) columns from LANDSAT_ETM.csv into LANDSAT_ETM_Subset.csv. To get a list of the columns in the file along with their ID numbers, so that you can choose which ones you want to extract, you can run:

csvcut -n LANDSAT_ETM.csv

After doing this you’ll have a far smaller CSV file in LANDSAT_ETM_Subset.csv that just contains the columns you’re interested in. There’s only one problem with this file – it still has the headers at the beginning. This is great for a normal CSV file, but when we import it into the database we’ll find that the header line gets imported too – not what we want! The easiest way to remove it is using the following command:

sed -i "1 d" LANDSAT_ETM_Subset.csv

Again, this doesn’t load the whole file into memory, so it will work happily with large files.

We then need to create the database. This can be done with any database system, but to get a simple local database I decided to use SQLite. Once you’ve installed this you can do everything you need from the command-line (you can create the tables using a GUI tool such as SQLite Administrator, but you won’t be able to do the import using that tool – it’ll crash on large CSV files). To create a database simply run:

sqlite3 LandsatMetadata.sqlite

which will create a database file with that name, and then drop you into the SQLite console. From here you can type any SQL commands (including those to create or modify tables, plus queries), as well as SQLite dot-commands, which start with a full stop. In this case, we need to create a table for the various columns we’ve chosen from our CSV. It is important here to make sure that the column names are exactly the same as those in the CSV, or the import command won’t work (you can change the names later with ALTER TABLE if needed). You can take the following SQL and modify it to your needs.

CREATE TABLE [images] (
[browseAvailable] BOOLEAN  NULL,
[browseURL] VARCHAR(500)  NULL,
[sceneID] VARCHAR(100)  NULL,
[sensor] VARCHAR(20)  NULL,
[acquisitionDate] DATE  NULL,
[path] INTEGER  NULL,
[row] INTEGER  NULL,
[cloudCoverFull] FLOAT  NULL,
[dayOrNight] VARCHAR(10)  NULL,
[sceneStartTime] VARCHAR(30)  NULL
);

Just type this into the SQLite console and the table will be created. We now need to import the CSV file, and first we have to define what is used as the separator in the file. Obviously, for a CSV file, this is a comma, so we type:

.separator ,

And then to actually import the CSV file we simply type:

.import LANDSAT_ETM_Subset.csv images

That is, .import followed by the name of the CSV file and the name of the table to import into. Once this is finished – it may take a while – you can check that it imported all of the rows of the CSV file by running the following query to get the number of rows in the table:

SELECT COUNT(*) FROM images;

and you can compare that to the output of

wc -l LANDSAT_ETM_Subset.csv

which will count the lines in the original file.

Your data is now in the database and you’re almost done – there’s just one more thing to do. This involves changing how the scene start times are stored, so that you can query them easily: we add a new startTime column, and fill it with just the time portion of the sceneStartTime field. Still in the SQLite console, run:

ALTER TABLE images ADD COLUMN startTime TIME;

UPDATE images
SET startTime=time(substr(images.sceneStartTime, 10, length(images.sceneStartTime)));

And then…you’re all done! You can now select images using queries like:

SELECT * FROM images WHERE path=202 AND row=24
AND acquisitionDate > date("2002-03-17","-1 months")
AND acquisitionDate < date("2002-03-17","+1 months")

Once you’ve got the results from a query you’re interested in, you can simply create a text file containing the sceneIDs for those images and use the Landsat Bulk Download Tool to download them.
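
If you’d rather do that last step from Python than from the SQLite console, a minimal sketch using the built-in sqlite3 module (the query and output filename are just examples) might be:

import sqlite3

conn = sqlite3.connect("LandsatMetadata.sqlite")
query = """SELECT sceneID FROM images
           WHERE path=202 AND row=24
           AND acquisitionDate > date('2002-03-17','-1 months')
           AND acquisitionDate < date('2002-03-17','+1 months')"""

# Write one scene ID per line, ready for the Landsat Bulk Download Tool
with open("scene_ids.txt", "w") as f:
    for (scene_id,) in conn.execute(query):
        f.write(scene_id + "\n")

conn.close()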


How effective is my research programming workflow? The Philip Test – Part 3

7. Do you use version control for your scripts?

Yes, almost always. I’ve found this a lot easier since I started using Git – starting to use version control with Git simply requires running git init, whereas with SVN you had to configure a new repository and do all sorts of admin work before you could start. Of course, if you want to host the Git repository remotely then you’ll need to do some admin, but all of that can be done later, once you’ve started recording changes to your code. This really helps me use version control from the beginning of a project – there’s less admin to interrupt me when I’m in the mood to write code.

Occasionally I still forget to start using version control at the beginning of a project, and I always kick myself when I don’t. This tends to be when I’m playing around with something that doesn’t seem important, and won’t have any future…but it’s amazing how many of those things go on to become very important (and sometimes get released as open-source projects of their own). I’m trying to remember to run git init as soon as I create a new folder to write code in – regardless of what I’m writing or how important I think it’ll be.

8. If you show analysis results to a colleague and they offer a suggestion for improvement, can you adjust your script, re-run it, and produce updated results within an hour?

Sometimes, but often not – mainly because most of my analyses take more than an hour to run! However, I can often do the alterations to the script/data and start it running within the hour – and colleagues are often surprised at this. For example, someone I’m currently working with was very surprised that I could re-run my analyses, producing maximum values rather than mean values really easily – in fact, within about five minutes! Of course, all I needed to do in my code was change the mean() function to the max() function, and re-run: simple!

I’ve already talked a bit on this blog about ProjectTemplate – a library for R that I really like for doing reproducible research – and its benefits are linked to this question. By defining separate folders for library functions and analysis functions – and by caching the results of analyses so they don’t have to be run again unless something has changed – it really helps me know what to change, and lets me produce new results quickly. I’d recommend all R programmers check it out – it’s really handy.

9. Do you use assert statements and test cases to sanity check the outputs of your analyses?

Rarely – except in libraries that I develop, such as Py6S. I recently wrote a post on how I’d set up continuous integration and automated testing for Py6S, and I am strongly aware of how important it is to ensure that Py6S is producing accurate results, and that changes I make to improve (or fix!) one part of the library don’t cause problems elsewhere. However, I don’t tend to do the same for non-library code. Why?

Well, I’m not really sure. I can definitely see the benefits, but I’m not sure whether the benefits outweigh the time that I’d have to put in to creating the tests. Assert statements would be easier to add, but I’m not sure whether I should start using them in my code. Currently I use exception handling in Python to catch errors, print error messages and – crucially – set all of the results affected by that error to something appropriate (normally NaN), which has the benefit that the code continues to run. So, if one of the 100 images that I’m processing has something strange about it, my entire processing chain doesn’t stop: the error gets recorded for that image, and the rest of the images still get processed. I actually think that’s better – but if someone has an argument as to why I should be using assert statements then I’d love to hear it.
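
For what it’s worth, the pattern I’m describing looks roughly like this – the processing function and filenames here are purely illustrative:

import numpy as np

def process(image_path):
    # Stand-in for the real per-image analysis
    if "bad" in image_path:
        raise ValueError("something strange about this image")
    return 42.0

images = ["scene_001.tif", "scene_bad.tif", "scene_003.tif"]

results = {}
for image in images:
    try:
        results[image] = process(image)
    except Exception as e:
        # Record the failure and carry on with the rest of the images
        print("Error processing %s: %s" % (image, e))
        results[image] = np.nan

print(results)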


Software choices in remote sensing

I recently read the article Don’t be a technical masochist on John D. Cook’s blog, and it struck a chord with me about the way that I see people choosing software and programming tools in my field.

John states: “Sometimes tech choices are that easy: if something is too hard, stop doing it. A great deal of pain comes from using a tool outside its intended use, and often that’s avoidable…a lot of technical pain is self-imposed. If you keep breaking your leg somewhere, stop going there.”

I like to think that I can do some GIS/remote-sensing analyses more efficiently than others – and I think this is because I have a broad range of skills across many tools. If the only GIS/remote-sensing tool you know how to use is ArcGIS, then you try to do everything in Arc – even if the task is very difficult (or even impossible) to do within Arc. Similarly, if you only know Python and won’t touch R, then you can’t take advantage of the libraries that are only available in R, which might save you a huge amount of time. I wouldn’t say I’m an expert in all of these tools, and I prefer some to others, but I have a working knowledge of most of them – and am able to acquire a more in-depth knowledge when needed.

Don’t get me wrong, sometimes it is very important to be able to do things within a certain tool – and sometimes it’s worth pushing the boat out a bit to try and see whether it’s possible to get a tool to do something weird but useful. Often though, it’s better to use the best tool for the job. That’s why, if you watch me work, you’ll see me switching backwards and forwards between various tools and technologies to get my job done. For example:

  • I’m currently working on a project which involves a lot of time series manipulation. When I started this project, the Python pandas library (which deals very nicely with time series) wasn’t very mature, and I wasn’t prepared to ‘bet the project’ on this very immature library. So, even though Python is my preferred programming language, I chose to use R, with the xts library to handle my time-series analysis.
  • I don’t use ArcGIS for remote sensing analysis, and I don’t use ENVI to do GIS. Yes, both programs allow you to deal with raster and vector data, but it’s almost always easier to use ENVI to do the remote sensing work, and then transfer things into ArcGIS to overlay it with vector data or produce pretty output maps (ENVI’s map output options are pretty awful!). I’ve lost track of the number of students I’ve found who’ve been really struggling to do satellite data processing in ArcGIS that would have taken them two minutes in ENVI.
  • If there’s a great library for a different programming language then I use it. For example, I recently needed to create a set of images where each pixel contained a random value, but there was spatial correlation between all of the values. I investigated various tools which purported to be able to do this (including some random Python code I found, ArcGIS, specialist spatial stats tools and R), and in the end the one I found easiest to get working was R – so that’s what I used. Yes, it meant I couldn’t drive it easily from Python (although I could have done so using RPy or, as a last resort, by running R from the command line via the Python subprocess module), but that was far easier than trying to write code to do this from scratch in Python.

Overall, the tools I use for my day-to-day work of GIS/Remote-sensing data processing include (in approximate order of frequency of use): ENVI, GDAL’s command-line tools, QGIS, ArcGIS, GRASS, eCognition, PostGIS, Google Earth, BEAM, and probably more that I can’t think of at the moment. On top of that, in terms of programming languages, I use Python the most, but also R and IDL fairly frequently – and I’ve even been known to write Matlab, Mathematica and C++ code when it seems to be the best option (for example, Mathematica for symbolic algebra work).

Having a basic knowledge of all of these tools (and, of course, having them installed and set up on my computers and servers) allows me to get my work done significantly faster, by using the best (that is, normally, easiest) tool for the job.


How effective is my research programming workflow? The Philip Test – Part 2

This is the second in my series of posts examining how well I fulfil each of the items on the Philip Test. The first part, with an explanation of exactly what this is, is available here; this time we’re moving on to the next three items in the list:

4. Are your scripts, data sets and notes backed up on another computer?

Let’s take these one at a time. My scripts are nearly always backed up. The exact method varies: sometimes it is just by using Dropbox, but I try to use proper source control (with Git and Github) as much as possible. Where this falls apart is when I’ve been developing some code for a while and somehow ‘forgot’ to put it in source control at the start – and then never realised! This is particularly frustrating when I want to look at the history of a project later on and find one huge commit at the beginning with a commit message saying “First commit, forgot to do this earlier – oops”.

Of course, Git by itself doesn’t count as a backup – you need to actually push the commits to some sort of remote repository to get a proper backup. I try to keep as much of my code open as possible, and make it public on Github (see the list of my repositories), but I can’t do this with all of my code – particularly for collaborative projects where I don’t have the permission of the other contributors, or where the license for parts of the code is unknown. For these I tend to either have private repositories on Github (I have five of these free as part of a deal I got), or just push to a git remote on my Rackspace server.

Notes are fairly straightforward: electronic notes are synchronised through Dropbox (for my LaTeX notes) and through Simplenote for my other ASCII notes. My paper notes aren’t backed up anywhere – so I hope I don’t lose my notebook!

Data is the difficult part of this, as the data I use is very large. Depending on what I’m processing, individual image files can range from under 100MB to 30–40GB for a single image (the latter being airborne images, which contain absolutely huge amounts of data). Once you start gathering together a lot of images for whatever you’re working on, and then combine these with the results of your analyses (which will often be the same size as the input images, or possibly even larger), you end up using a huge amount of space. It’s difficult enough finding somewhere to store this data – let alone somewhere to back it up! At the moment, my computer at work has a total of 4.5TB of storage, through both internal and external hard drives, plus access to around 1TB of networked storage for backup – but I’m having to think about buying another external hard drive soon, as I’m running out of space.

One major issue in this area is that university IT services haven’t yet caught up with ‘the data revolution’, and don’t realise that anyone needs more than a few GB of storage space – something that really needs to change! In fact, data management by itself is becoming a big part of my workload: downloading data, putting it in sensible folder structures, converting it, removing old datasets and so on takes a huge amount of time. (It doesn’t help that I’m scared of deleting anything in case I need it in future!)

5. Can you quickly identify errors and inconsistencies in your raw datasets?

Hmm, I’d probably say “most of the time”. The problem with working on satellite images is that often the only sensible way to identify errors and inconsistencies is to view the images – which is fun (I like actually looking at the images, rather than always working with the raw numbers), but time-consuming. As for non-image data, I find a quick look at the data after importing, and using some simple code to sanity-check the data (such as np.all(data > 0) to check that all of the data have positive values) works well.
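
As a concrete example of that kind of quick check (the filename and column name here are entirely made up):

import numpy as np
import pandas as pd

data = pd.read_csv("ground_measurements.csv")

# Quick sanity checks straight after import
print(data.head())
print("Any missing values? %s" % data.isnull().any().any())
print("All reflectances positive? %s" % np.all(data["reflectance"] > 0))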

The key tools that allow me to do this really easily are Python – particularly with numpy and pandas, ENVI for looking at satellite images (unfortunately I haven’t found any open-source tools that I am quite as productive with), and text editors for reading files. I often use Excel for looking at raw CSV data, although I hate how much Excel pesters me about how “not all features are supported in this file format” – I’d really like a nice simple ‘CSV file viewer’, if anyone knows of one?

6. Can you write scripts to acquire and merge together data from different sources and in different formats?

Yes – but only because I have access to such brilliant libraries.

One thing I end up doing a lot of is merging time series – trying to calculate the closest measurement from a satellite to some sort of ground measurement. I’ve done this in a couple of different ways: sometimes using xts in R and sometimes with Pandas in Python. To be honest, there isn’t much to choose between them, and I tend to use Python now as most of my other code is written in Python.
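
As a rough illustration of the kind of ‘nearest measurement in time’ matching I mean, using pandas (the numbers and timestamps here are entirely made up):

import pandas as pd

satellite = pd.Series(
    [0.31, 0.28, 0.35],
    index=pd.to_datetime(["2014-06-01 10:32", "2014-06-09 10:29", "2014-06-17 10:35"]))

ground = pd.Series(
    [0.30, 0.27, 0.36, 0.33],
    index=pd.to_datetime(["2014-06-01 10:00", "2014-06-09 11:00",
                          "2014-06-17 09:30", "2014-06-25 10:00"]))

# For each satellite overpass, pick the ground measurement closest in time
nearest_ground = ground.reindex(satellite.index, method="nearest")
merged = pd.DataFrame({"satellite": satellite, "ground": nearest_ground})
print(merged)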

GDAL/OGR is an essential tool for me to access spatial data through Python code – and, depending on the application, I often use the nicer interfaces that are provided by fiona, rasterio and RIOS.
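
For example, reading a raster band into a numpy array with the GDAL Python bindings is only a few lines (the filename is just a placeholder):

from osgeo import gdal

ds = gdal.Open("some_image.tif")
band = ds.GetRasterBand(1)   # GDAL band numbering starts at 1
data = band.ReadAsArray()    # gives a numpy array of the pixel values
print(data.shape)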

More to come in the next instalment…


How to: Fix ‘WARNING: terminal is not fully functional’ error on Windows with Cygwin/Msysgit

For a while now I’ve been frustrated by an error that I get whenever I’m using git on Windows. When I try and run certain git commands – such as git log or git diff – I get the following message:

[Screenshot: Git error message stating “WARNING: terminal not fully functional”]

The error message “WARNING: terminal not fully functional” appears, but if you press return to continue, the command runs fine (albeit without nice colours). I’d just put up with this for ages, but decided to fix it today – and the fix is really simple. Basically, the problem is that the TERM environment variable is set to something strange, and git can’t cope with it – so it warns you that there might be trouble.

So, to fix this, you just need to make sure that the TERM environment variable is set to “xterm”, which is the standard graphical terminal on Linux machines. There are a number of ways that you can do this:

  • Set this variable using the built-in Windows interface – see these instructions
  • Use a helpful tool to make this easier (particularly if you often set other environment variables, as I do), like Rapid Environment Editor
  • Set it in an initialisation batch script – such as the init.bat file used by Cmder, a swish Windows command-line tool

Once you’ve set this (you can even test it by just running set TERM=xterm on the command line), everything should work fine – and you’ll get colours back again:

[Screenshot: git output with colours working again]