Robin's Blog
A remote-sensing PhD student talking about interesting things…

Want to write some code? Get away from your computer!

March 19, 2011   

I’ve recently realised something. The best place to write code isn’t in front of your computer, with your compiler, IDE and tools. The best place to write code is far, far away from any of these tools – somewhere where you can think properly. For a language with which you are fairly familiar, the mechanics of translating the program in your mind to a program that the compiler can compile (or the interpreter can interpret) is fairly easy – it’s coming up with that program in your mind which is hard.

The other day I was on a train journey. I had my laptop, but no internet. Unfortunately I was using a commercial programming language (IDL, as it happens) for which I need to use my university’s site license. As I didn’t have access to the internet, I couldn’t get hold of the site license, so couldn’t run the compiler and IDE. Say what you like about commercial programming languages which require expensive licenses, but in this case the licensing stopped me from writing code in my editor and running it through the compiler. And…guess what…it actually made me think!

I guess this post is somewhat along the lines of ‘Does Visual Studio rot the mind?’ and the following quote:

One of the best lessons I learnt from my first boss was: “when your code doesn’t behave as expected, don’t use the debugger, think.”

That is what being away from your compiler forces you to do. It’s very easy to slip into the mindset of:

  1. Write a bit of (fairly bad) code
  2. Compile and run
  3. Test with a poorly chosen test case
  4. Find it doesn’t work
  5. Make small change to the code on the off-chance that it might solve the problem
  6. Repeat…

Of course, this eventually leads to code that is poorly understood by the programmer, probably fairly buggy and not well tested.

Being away from the computer forces you to run through all of the thoughts in your head – which tends to take longer than getting a computer to compile and run your code (for small code bases at least…). So you don’t tend to make tiny changes and re-run things, you tend to actually think about what the code is doing. Until I did this on the train the other day, I hadn’t actually run a piece of code on paper (that is, written down columns for each of the variables and worked out what each value will be at each stage in the program) since my Computing A-Level exam!

In the case of the code I was writing the other day, I managed to produce some high quality, fast, bug-free code by writing it in long-hand on a piece of paper, thinking about it, gradually typing up bits of it, thinking some more, and then after a long time trying it in the compiler. The code (which was some region-growing image segmentation code which involved lots of recursion) was eventually copied from my piece of paper to my IDE, compiled (with only one syntax error – impressive I think) and ran correctly first time (and completed all of the tests that I had also devised on paper).

Job well done, I think, and a useful piece of advice, I hope.

How to: Set up a simple service to run in the background on a Linux machine (using daemontools)

March 19, 2011   

I have just set up a new home server (a review of which will be coming soon) and have been installing various programs that I want to run on it. A number of these are servers, such as sshd, apache, samba etc. All of these have fairly easy installs under Debian and will automatically run at startup, and can be controlled by /etc/init.d scripts.

However, I also have a number of other programs I want to run as services, constantly in the background, which don’t come with nice Debian init.d scripts. After asking a question on SuperUser I found that one fairly simple solution would be to use a set of tools called daemontools. These tools provide a simple way of defining services which run constantly, and can be controlled by an administrator (in a similar way to /etc/init.d services). daemontools seem to be designed very well, and are quite easy to use, but the documentation on the website seems to lack a simple quickstart guide…so I thought I’d write my own.

At this point I should mention that I have only been using daemontools for a few hours, so I could be completely wrong about anything I say below. These instructions are for Debian, but should be fairly easy to use with other distros (the only bit that will be significantly different is exactly how to install it with the package manager). Anyway, proceed at your own risk!

  1. Install daemontools

    In Debian you will need to run apt-get install daemontools daemontools-run (both packages are important – I didn’t install the latter package and it caused me lots of frustration). This will install the tools themselves and also add the required lines to the startup files to ensure that all of the required daemontools services start when the machine boots.

  2. Create a service directory

    You will probably find that the installer has created a /service directory for you. If it hasn’t then create one yourself. Then create a directory under that directory for each service that you want to run. Here we will be creating a test service, so create a directory called test. Run chmod 1755 on this directory.

  3. Create the service run file

    daemontools needs to know what command(s) to run for this service, and these commands should be put inside a shell script called run in the service directory. This script must be executable (run chmod 755 run), otherwise daemontools won’t be able to start it. For example, the file could contain:

     

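    #!/bin/sh
    echo Running service
    # exec replaces this shell, so supervise watches the command itself;
    # the command must stay in the foreground (if it forks into the
    # background, supervise sees it exit and restarts it)
    exec some-command-here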

  4. Finished!

    That should be all you need to get the service running. You should probably restart the machine now, as that will ensure that all of the daemontools monitoring services have started correctly. Once the machine has started, the new service should be running. If it crashes or ends for some reason it will be restarted after one second. Any new services you add (which you can do exactly as above) should start within five seconds. You can use the svc command to control the services you have created (see man svc for details).
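Steps 2 and 3 can also be scripted. The sketch below is my own helper, not something that ships with daemontools: make_service is a hypothetical name, and it assumes svscan is watching /service as described above (set SERVICE_ROOT first if your system uses a different directory) and that you run it as root. It writes the run file under a temporary name, makes it executable and only then renames it into place, so supervise never picks up a half-written or non-executable script.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 as a script (not part of daemontools itself).
# Assumption: svscan watches $SERVICE_ROOT, which defaults to /service;
# set SERVICE_ROOT before calling if your system uses another directory.
SERVICE_ROOT="${SERVICE_ROOT:-/service}"

# make_service NAME COMMAND  (hypothetical helper name)
# Creates $SERVICE_ROOT/NAME with a run script that execs COMMAND.
make_service() {
    name="$1"
    command="$2"
    dir="$SERVICE_ROOT/$name"

    # Step 2: the service directory, with the permissions used above
    mkdir -p "$dir" || return 1
    chmod 1755 "$dir" || return 1

    # Step 3: write the run script under a temporary name first;
    # $command is expanded now, when the script is generated
    cat > "$dir/run.tmp" <<EOF
#!/bin/sh
echo Running service
exec $command
EOF
    # make it executable (supervise cannot start a non-executable run)
    chmod 755 "$dir/run.tmp" || return 1
    # rename it into place in a single step
    mv "$dir/run.tmp" "$dir/run"
}
```

For example, running make_service test "sleep 1000" as root (with sleep 1000 standing in for your real command) should get the test service running within about five seconds. After that, svc -d /service/test takes the service down, svc -u /service/test brings it back up, and svstat /service/test reports whether it is up and for how long.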

Review: R Graph Cookbook by Hrishi Mittal

March 11, 2011   

Summary: Very useful for reference while producing graphs, and very comprehensive (including heat-maps, 3D graphs and maps).

Reference: Mittal, H. V., 2011, R Graph Cookbook, Packt Publishing, Birmingham, UK, 272 pages.

As a scientist I often need to plot graphs of my data, so I am keen to learn more about how to do this in various languages. I tend to use R for most of my statistical analysis, so plotting graphs in R is something that I often need to do. I have a bit of knowledge about R already (mainly gained from the books about R that I have previously reviewed), and looked to this book to explain more about graphing in R. As the title states, it is a ‘cookbook’ – a type of technical book that provides a number of ‘recipes’ for performing various tasks – and this is both one of the main advantages and one of the main disadvantages of this book.

I generally have a love-hate relationship with these ‘cookbook’-style books – I find them useful when wanting a quick answer to something, but I am slightly concerned about the manner in which the teaching/learning takes place. These books can be very useful, as the cookbook style allows the reader to very quickly learn how to do something which they need to do, but this learning does not always take place within a context which allows the reader to understand why they are doing what they are doing. For example, in a book like this, the reader could be told exactly what commands to type to plot a line graph – but they may not actually learn anything about what each of these commands does, and how to adjust them if they need to do something slightly different.

However, I am pleased to say that this book is actually very good. It starts with an overview chapter that contains basic recipes for plotting various types of graph (all of which are covered in greater detail later in the book), as well as for exporting the graphs to be used in other documents. Then comes one of the most important chapters – a detailed explanation of the par() command for adjusting parameters such as margins, colours, fonts and styles. This chapter is also presented in ‘recipe’ form (of which more below), which is again a double-edged sword: it makes it easy to find the parameter setting you’re looking for, but harder to get an overview of the range of different parameters you can set. A simple table at the end of this chapter listing the parameters and the possible options for each of them would have been very useful – but sadly one was not included.

The rest of the book goes through a number of types of graph, providing detailed recipes for creating them. It starts with the most important types of plot: scatter graphs and line graphs (with a helpful emphasis on plotting time-series data with sensible axis labels), before moving on to bar charts, pie charts, histograms and box-and-whisker plots. All of this would be expected in a book on graphing software – however, this book goes further by providing a section on heat-maps and contour plots, and then a section on creating maps. The heat-maps section is particularly interesting, and I can see a number of applications for the example visualisations provided. The book then closes with a chapter on exporting graphs for display – to both raster and vector formats.

As mentioned already, all of the information is provided in the form of ‘recipes’, which have a standard format of: introduction, getting ready, how to do it, how it works…, there’s more… and see also. This tends to work well for most parts of the book – with the introduction explaining the type of graph and why you might want to use it, getting ready showing you how to load the required libraries, how to do it providing code and how it works explaining the code, with more options being explained in there’s more and see also. However, this falls down slightly when dealing with topics that require a little more explanation – such as the section on exporting graphics for publication, which could really do with a more detailed explanation of the difference between raster and vector output, and how to choose between them.

The book generally chooses sensible datasets to plot for each graph, although at times the code is made unnecessarily confusing by the addition of lots of code to download datasets via web APIs (useful to be able to do, but not hugely relevant to the topic of this book). Apart from this, the code is generally well written, although some extra comments in the code would have been helpful, as they would have saved me from constantly flicking between the how to do it and how it works sections.

Overall, I will definitely keep this book on my shelf as a handy reference for when I need to create a graph quickly in R, although I would recommend combining this book with another book (for example R in a Nutshell) for more details on the graphing functions and the rest of R.

(Disclaimer: I was given a free review copy of this book)

My new bookcase – arranged by the XKCD ‘Purity’ comic

March 4, 2011   

I’ve recently moved into a new flat, and have bought lots of bookcases to store all of my books. Of course, I then had the terrible decision of how to arrange all of my science books. I mean, I could categorise them fairly easily (maths, physics, biology etc), but what order should I put these categories in?

While thinking about this I remembered the xkcd ‘Purity’ comic, and had an idea:

Why not arrange them by ‘purity’? Well – that’s what I did (see below):

Science bookshelf photo

I should point out that computing is definitely not purer than mathematics, but it is on the top shelf as that is the only shelf that my fiancée can’t reach, and she rarely (if ever) uses those books. Apart from that, though, it is pretty much in purity order…

Review: Python Geospatial Development by Erik Westra

March 2, 2011   

Summary: Great book – both for GIS concepts and for teaching Python libraries. Lives up to the boast on the front cover – you really will learn to create complete mapping applications, learning a lot of useful tools and techniques on the way.

Reference: Westra, E., 2010, Python Geospatial Development, Packt Publishing, Birmingham, UK, 508 pages.

Before I start this review I should probably point out that I am a PhD student working in Remote Sensing and Geographical Information Systems, so I expected to know a fair amount of the theory in this book. The reason I wanted to read it, though, was to learn how to do this analysis using Python and its associated libraries. This book succeeded in teaching me how to use Python to perform geospatial analyses, and also taught me a significant amount of wider GIS knowledge that I had not picked up on any of my university courses.

The book is divided into four main sections: the first introduces general GIS concepts, the second explains basic GIS operations in Python, the third shows how to use databases with geographic data and the fourth combines all of the previous information into two GIS web-apps. It is always difficult to work out what level to pitch this sort of book at – as a number of potential readers will already have experience with GIS (and are using the book to learn about doing GIS analysis in Python, like me), but some will be complete beginners who want to introduce map-based analysis into their applications. The first few chapters of this book are pitched nicely at a mid-point between these two reader groups: the author explains things clearly and precisely without seeming patronising. Although I already knew much of the basics, I found the section on projections and co-ordinate systems very useful as I had never properly understood these (I’d always seen the different options in software I was using for Projected Co-ordinate Systems and Geographic Co-ordinate Systems, but I never knew the difference until I read this book!).

The next section explains how to use a number of Python GIS libraries such as GDAL/OGR, PyProj and Shapely. The author starts with a general description of the capabilities of each library, and continues with a ‘cookbook-style’ approach showing how to do various tasks with these libraries. For example, instructions are given for how to convert between projections and calculate great-circle distances with PyProj, how to extract shape geometries and attributes from shapefiles using OGR, and how to do basic GIS analysis with Shapely. Details are also given on joining these libraries together to exchange data between them using the Well-Known Text (WKT) format – a format that I hadn’t come across before, but which appears to be very useful. This section finishes by combining a number of these libraries to do some real-world tasks, such as identifying parks near urban areas.

The third section focuses on geodatabases – an area I know very little about. This section gives a good overview of the concept of a geodatabase, and then specific details about three geodatabases: PostGIS, MySQL and SpatiaLite. I was pleased to see that three contrasting databases were chosen, and a good listing of advantages and disadvantages of each was given. After introducing the workings of these databases, code examples for linking them to Python are given, starting from basic queries and going right up to complex spatial analysis performed within the database. A geospatial application (called DISTAL) is then implemented, showing how to combine geodatabase access with the GIS analyses explained in the previous section. This is implemented as a web-application, but previous experience with web programming is not needed as it is implemented using simple Python CGI scripts, and there are sidebars explaining terms that the reader may not have come across before.

The fourth section is by far the most complicated, and deals with producing maps using a library called Mapnik and producing geo-enabled web-applications using GeoDjango. I must admit that I didn’t quite follow all of this part of the book, although this is probably because I’m not hugely interested in, or experienced with, building web-apps. In some ways a little too much emphasis is placed on how to do things using Django, and not enough on the GIS (trying to introduce any web-app framework, be it Django, Ruby on Rails or anything else, in one chapter is a tall order), but I can see why the author included it, as it brings together a fair amount of the tools covered in the book into one coherent whole.

Overall, I’m very impressed with this book. If I had my way (and you never know, if I end up as a lecturer one day I might…), I’d make Chapter 2 part of the core reading for any GIS course, as I was shocked to find that it covers areas that were never covered even on the Advanced GIS courses I took at degree level! I should mention that, as well as the chapters mentioned above, there is a useful chapter on sources of geospatial data (which, again, mentioned sources that I’d not heard of), and a comprehensive index which makes it very easy to find things. The instructions on how to use the Python libraries (and, more importantly, how to join them together) are well-written and comprehensive and the introductions to GIS concepts are pitched at just the right level. I would thoroughly recommend this book for any GIS or geospatial data user for two main reasons: firstly, it gives great introductions to GIS concepts they may not have come across, and secondly, knowing how to do these things in Python can make certain jobs so much easier (how about a 10 line Python script rather than a few hours of repetitive data conversion?).

(Disclaimer: I was provided with a free review copy of this book)