Robin's Blog
A remote-sensing PhD student talking about interesting things…

Category Archives: Books

Review: Programming ArcGIS 10.1 with Python Cookbook

March 18, 2013

Summary: A useful guide to automating ArcGIS using Python, which is fully up-to-date with the latest version of ArcGIS. Definitely provides “quick answers to common problems”, but it may take more effort to get a deep understanding of the methods used. Good breadth of coverage – but notably lacks raster examples – and well explained throughout. I have recommended it to colleagues using ArcGIS who want to start off with Python.

Reference: Pimpler, E., 2013, Programming ArcGIS 10.1 with Python Cookbook, Packt Publishing, 508pp. Amazon Link, Publisher’s Link (with sample chapter and Table of Contents)

I wrote this review in consultation with my wife, Olivia, who has been an ArcGIS user for a few years, but is fairly new to Python programming (she’s done a bit in the last few months, but nothing within ArcGIS). I am more experienced with Python – having released a number of packages to the Python Package Index, and having used Python with ArcGIS in various contexts with both ArcGIS 9.3 and 10.1, but I thought it would be useful to get an opinion from both a more- and less-experienced user.

We both found the book useful, and learnt a lot from it, but weren’t entirely convinced by the cookbook format. In general, if you like cookbook-style books then this is likely to appeal to you, but if they annoy you then you won’t like it. The benefits of the cookbook format are that it is very easy to pick up the book, find the relevant recipe for what you’re trying to achieve, turn to it and do it immediately, as each recipe is almost entirely self-contained. Of course, the disadvantage is that learning in this way tends to lead to superficial knowledge rather than deep understanding: you may know what to do to achieve a certain result, but not why you’re doing it, and thus cannot adapt the recipes easily to other situations. Another disadvantage is that the recipes are often very repetitive – for example, each recipe in this book starts by saying “Import the ArcPy module using import arcpy”, and often “import arcpy.da” – something which could be mentioned once and then referred to in future recipes.

Moving away from the cookbook format, which is very much a matter of personal taste, the book covers a wide range of tasks that you can script using Python in ArcGIS, including interacting with the map window, laying out and printing maps, running geoprocessing operations and accessing the raw data in each layer. The ordering of the book may seem strange to you if you are interested in automating processing operations, rather than automating layout and map style operations, as the geoprocessing part of the book doesn’t start until chapter 6. However, when you get to this section it is very good, and chapters 6 to 9 provide almost all of the information needed to automate geoprocessing tools either using the built-in ArcGIS geoprocessing tools, or by writing your own geoprocessing algorithms using the raw spatial and attribute data.

I was impressed to see coverage of creating ArcGIS Addins using Python in chapter 11, as many ArcGIS experts are not aware of the possibility of creating addins using Python. These addins can provide very user-friendly ways to interact with processes which have been automated in Python, so it is good to see they were covered in this book. I was also pleased to see a chapter on error handling which covers both generic Python error-handling techniques (such as the try-except block) and ArcPy-specific methods, including how to get error messages from ArcGIS into Python. However, there is a notable absence of any recipes involving raster data – the only exception is in chapter 10 when the use of the Describe() function to find out information about images is explained. This is a shame, as ArcGIS 10.0 introduced various functions to operate on rasters – including returning the raster data as a numpy array, and allowing updates of raster data from numpy arrays – the coverage of which would really have made the book complete.

In terms of ancillary information in the book: there is a useful twenty-page introduction to the Python language, as well as useful appendices covering automation of Python scripts (using the Windows command-line, and Task Scheduler) and “Five Things Every GIS Programmer Should Know How to Do with Python” (including downloading via FTP, sending emails, and reading various types of files).

The book is generally well-written and formatted, and things are explained well – although the quality of the images used is lacking sometimes (with JPEG compression artefacts clearly visible). Overall, the book is a very useful addition to a GIS library for people who are new to automating ArcGIS using Python, and particularly those who want to find out quickly how to automate a particular operation.

Review: Code – The Hidden Language of Computer Hardware and Software by Charles Petzold

December 29, 2012

Summary: This book takes you all the way from Morse Code to a fully working computer, explaining everything along the way. What’s more, it’s a great read too! If you ever wondered how a computer worked then buy this and read it – even if you think you already know (unless you’re, you know, a chip designer at Intel or something!)

Reference: Petzold, C., 2000, Code: The Hidden Language of Computer Hardware and Software, Microsoft Press, 395pp Amazon Link

As you’ll probably know if you’ve read many articles on this site: I’m a computer programmer and general ‘geek’. So, it won’t surprise you to know that I am quite interested in how computers work – and picked up this book thinking that I’d already know quite a lot of it. I knew a fair bit – but I learnt a huge amount from reading it, and it helped me gain a full understanding of what is going on when I write computer programs – right down to the level of the electricity inside the processor. By the end of the book I was itching to buy lots of relays or transformers and make a computer on my living room table!

The book starts by looking at the ways you, as a child, might try and communicate with your best friend who lives across the street – after your parents think you’ve gone to bed. The natural solution to this is Morse code using a torch, and Petzold takes this simple code as a good starting point to explain the concepts of a code. He then moves on to Braille, which is significantly more complex than I thought, and which gives the opportunity to look at some of the more complex things you find in codes (e.g. shift characters and escape characters – both of which Braille has). You’ll note that nothing about computers has been introduced yet – and that is a key feature of the first part of the book: it doesn’t go straight in to “this is how a computer works”, but starts at a very basic (but still interesting) level that becomes useful when thinking about computers later in the book, but isn’t too scary.

Electricity and electrical circuits are introduced when describing how you might communicate with another friend whose window you can’t see from yours. This is introduced almost entirely from scratch – explaining how circuits work, what voltage is, how batteries work etc – but it actually went beyond my previous knowledge in electricity fairly quickly, and taught me much of interest. Whenever circuits are drawn in the book – from here onwards – they are shown with the wires that have current in them in red, making it very easy to see what is going on.

The discussion of electricity for sending messages leads into the history of telegraph networks, and then the concept of relays. I’d never really understood relays before, but Petzold introduces them with a very good analogy as a ‘labour saving device’ at a telegraph station. Around this point a number of other key – but rather unrelated – topics are covered like Boolean logic (True/False, AND, OR etc) and number systems (particularly number bases and binary). There is a very practical emphasis on everything – and the point about the importance of binary as on/off, true/false, open/closed and so on, is very much emphasised. After these introductions, the relays discussed earlier are combined to produce logic gates (AND, OR, NOT, NAND, XOR and so on) with the aim of producing a circuit to help you choose a cat (yes, it sounds strange, but works well as an example!). Here you can start to see how this is moving towards a computer…

I’m not going to go much further into detail about the rest of the book, except to say that you move towards being able to ‘build’ (conceptually if not actually physically) a fully-working computer gradually, one step at a time. From logic gates, to adding circuits and subtracting circuits and from clocks to flip-flops and RAM you gradually work up to a full, programmable computer which you have basically built by page 260! Given how much detail everything is explained in – and how little knowledge is assumed – fitting it into 260 pages is very impressive!
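
To give a flavour of the step from logic gates to adding circuits, here is a minimal Python sketch of my own (it is not code from the book, which builds everything from relays). Each gate is a function on 0/1 values built up from a single NAND gate, and the gates are then wired into a half adder, a full adder and a ripple-carry adder. The function names and the 8-bit default width are my own choices for illustration.

```python
def nand(a, b):
    """NAND gate: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor(a, b):
    # 1 when exactly one input is 1
    return and_(or_(a, b), nand(a, b))


def half_adder(a, b):
    """Add two bits; return (sum_bit, carry_bit)."""
    return xor(a, b), and_(a, b)


def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry; return (sum_bit, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_(c1, c2)


def add_bits(x, y, width=8):
    """Ripple-carry addition of two non-negative ints, one bit at a time.

    Returns (result, carry_out); the result wraps around at `width` bits.
    """
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry
```

Calling add_bits(5, 3) gives (8, 0), while add_bits(200, 100) overflows the 8 bits and gives (44, 1): the carry-out is the ninth bit that the circuit has nowhere to store.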

Of course, the book continues past page 260, going on to cover topics including input and output (from keyboards and to the screen), high and low level programming languages, graphics, multimedia and more. Interestingly, transistors aren’t mentioned until after you’ve got almost all of the way to building a computer – but this is almost certainly because relays are far easier to understand, and accomplish the same job. Once they have been introduced, a couple of important processors (the Intel 8080 and the Motorola 6800) are examined in detail – a really interesting opportunity to see how the concepts you’ve learnt about have been applied in real life by chip designers.

I can think of very few issues with this book – although the last chapter does read rather strangely, as if the author was trying to fit far too much into far too little space (trying to cover multimedia, networking, WIMP interfaces and more in one chapter is a bit of a tall order though!), but I very much like the book as a whole. It is one of those rare books that is suitable for a very wide range of audiences – from those with almost no knowledge of the subject at all (it starts from the very beginning, so that isn’t a problem) right up to those who are experienced programmers and know some of it (they will still find a lot they don’t know, and realise a lot of things). Overall: a great read, very interesting and very educational. You won’t be disappointed.

Review: Machine Learning: An Algorithmic Perspective by Stephen Marsland

April 26, 2011

Summary: Great book – clear explanations, useful example code and a friendly, easy-going writing style. One of my favourite academic books ever!

Reference: Marsland, S., 2009, Machine learning: An Algorithmic Perspective, Chapman & Hall/CRC, Boca Raton, Florida, 390pp Amazon Link

Machine Learning can be a difficult topic – as I found out when taking a Masters-level machine learning course this year. It can become very mathematical – particularly when dealing with complicated areas such as Support Vector Machines – and it is very difficult to pitch a university lecture course at a level where all of the students can understand it. Unfortunately, in my course I was one of the students who didn’t really understand the lectures particularly well…but this book saved me! In fact, it was so well written that I was reading it in bed at night, and staying up late to finish the chapter!

I firmly believe that fields such as Machine Learning and other practical computing topics (such as Computer Vision, Statistics, Programming etc) should be taught using practical examples and practical teaching sessions wherever possible. Also – for all but the most complex topics – full algorithms should be provided, and implementations in appropriate programming languages shown. This book definitely fulfils this – showing detailed algorithm explanations for all of the algorithms considered (apart from Support Vector Machines, which the author decided – sensibly in my opinion – were too difficult to cover in full detail), and implementing them in Python. Parts of the Python code are shown in the book, and all of the code is available online.

The explanations and motivations for the techniques shown in the book are brilliant – both easy-to-read and comprehensive. Mathematical explanations are clear – with ‘big picture’ explanations given in textual form for those who want to skip the detailed maths – and diagrams are well-chosen and easy to understand. Furthermore, sensible examples are given for the uses of machine learning – ranging from standard datasets like iris, to more unusual examples like ozone layer depth prediction.

The book covers most of what you’d need for an introductory Machine Learning course, starting with perceptrons, before moving to multi-layer perceptrons, radial basis functions, support vector machines, decision trees, unsupervised techniques and genetic algorithms. The book also covers introductions to probability (including Bayesian inference), dimensionality reduction (including PCA and LDA) and optimisation techniques. The wide range of techniques covered makes this useful not just for machine learning students: genetic algorithms, dimensionality reduction and optimisation have many applications in other fields.
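
As an illustration of the book’s starting point, the perceptron, here is a minimal sketch of my own in plain Python (it is not the author’s code, which is available online). It trains a single perceptron with the classic error-correction rule to learn the logical AND function. The function names, the integer learning rate of 1 and the 20 training epochs are assumptions I have made to keep the example small.

```python
def predict(weights, bias, inputs):
    """Threshold activation: fire (1) if the weighted sum is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Train on (inputs, target) pairs using the perceptron update rule."""
    n_inputs = len(samples[0][0])
    weights = [0] * n_inputs
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            # error is -1, 0 or +1; weights only change on a wrong prediction
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND, which is linearly separable
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
print(predictions)  # prints [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable functions such as AND, so the same code would never settle on correct weights for XOR; that limitation is what motivates the move to multi-layer perceptrons.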

This is a shorter review than many of my reviews, as I really can’t find much to fault with the book. It’s great. My advice is, if you’re interested in this topic, need to learn it for a course, or think you’ll want to use Machine Learning techniques in your work then buy it – you won’t regret it!

Review: Matplotlib for Python Developers by Sandro Tosi

April 14, 2011

Summary: I’d recommend this for people interested in adding Matplotlib functionality to GUI and web applications, and for those who need a bit more information on how to do advanced plotting with Matplotlib. Most general users will be able to get the information they need from the Matplotlib website.

Reference: Tosi, S., 2009, Matplotlib for Python Developers, Packt Publishing, Birmingham, UK, 293 pages, Amazon Link Publisher’s Website

This book is designed to teach readers how to do two things: make graphs with matplotlib, and embed matplotlib into GUI and web-based applications. My primary focus in my working life is on creating graphs using Python directly, rather than embedding graphs in other applications, although I was interested to learn about the latter, as this may come in useful in a few years’ time (for example, when building GUI front-ends for some of my modelling code). I think this book succeeds more in the latter objective than the former – as much of the first half of the book seems to be repeating information that is easily available on the Matplotlib website – where it is actually explained and phrased better!

Starting at the beginning, the book explains what matplotlib is, and gives a list of good reasons for using Matplotlib. However, there is then a fairly long section on output formats and Matplotlib backends. This is fairly technically detailed, and not totally easy to understand, so presenting it before the ‘Getting Started with Matplotlib’ chapter seems rather strange. This ‘Getting Started with Matplotlib’ section covers the basics of plotting including plotting lines, changing axes, adding gridlines and creating legends. It goes on to explain how to save graphs to files, how to use the interactive plot display window, and how to use the ipython pylab mode. After this, though, there is another strange arrangement of chapters – with a detailed section on matplotlib configuration files. This, again, is fairly detailed, and not hugely useful (as I haven’t yet seen any examples online where people have chosen to modify the configuration files), and it is strange to have it before the (far more useful) section on customising plots. This section is fairly good, covering how to change line styles and colours, polar plots, and plotting text and annotations. However, there is no mention of how to include more than one plot in a figure – which seems a bit of a strange absence.

The next chapter deals with object-oriented Matplotlib – that is, accessing the constituent objects directly rather than through the PyPlot commands. This is interesting, particularly as it isn’t explained particularly well on the Matplotlib website (that really just has an explanation of all of the methods in each class), but it could do with more information on the ‘big picture’ – for example, a basic class diagram showing how all of the different classes (Figure, Axes etc) fit together. I was pleased to see the inclusion of a section on plotting dates – something which I find I need to do quite often, but which often isn’t covered in tutorials and books.

The first part of the book now seems finished, and the book starts the section about integrating Matplotlib with various GUI and web frameworks. However, there is still one more useful chapter for those who aren’t interested in web or GUI programming, and that is the last chapter – ‘Matplotlib in the Real World’ – which covers a number of examples of using Matplotlib in the real world. Again, this seems a slightly strange arrangement of chapters, as none of the content of this chapter depends on anything covered in the GUI and web programming chapters. As for the examples, they are fairly good examples of real-world usage, and use freely-available data from the internet. As with all real-world examples of code – the code is more complicated than it possibly should be in a teaching book – but it does show you how to do everything from obtaining the data to producing the final plot. This chapter also contains a fairly detailed section on plotting maps in Matplotlib – again, a useful section, but it feels like a rather unusual inclusion.

It would be unfair of me to make much comment on the chapters of the book on integrating Matplotlib with GTK+, Qt4 and WxWidgets as the author emphasises that the chapters will not teach you the basics of programming in these GUI frameworks, and I have no experience in using them. However, I understood most of the content of these chapters without having any experience with the GUI frameworks, which bodes well for how useful the chapters will be for those with more experience. I have more experience with web programming, and found this chapter very useful. It gave me a number of ideas for how I could include matplotlib plotting in an interactive data visualisation web-application I am hoping to develop for a monitoring system that I have designed. I was originally planning to use CGI scripts to run a Python script which wrote its output to a file, which was then displayed in the page, but through reading this chapter I found a far better way to do it – by using Matplotlib’s functionality to send the output directly to stdout – which, in this case, is the webpage itself. Useful tips are also given about using StringIO to do a similar thing.

Overall it is the second half of the book which is really useful – particularly the sections on using Matplotlib with GUI applications, web applications, and the real-world usage examples. The first two chapters seem to repeat, in a more confusing way, the Matplotlib tutorial available on the project’s website. I must admit to being slightly concerned at the standard of English throughout the book, which may hint at poor proof-reading. Examples include “e is the most common letter in the English writings” and “Matplotlib has a plotting function ad hoc for dates” – not a major problem, but something that grates on me when I’m reading.

(Disclaimer: I was given a free copy of this book by Packt Publishing)

Review: R Graphs Cookbook by Hrishi Mittal

March 11, 2011

Summary: Very useful for reference while producing graphs, and very comprehensive (including heat-maps, 3D graphs and maps).

Reference: Mittal, H. V., 2011, R Graphs Cookbook, Packt Publishing, Birmingham, UK, 272 pages, Amazon Link Publisher’s Website

As a scientist I often need to plot graphs of my data, so I am keen to learn more about how to do this in various languages. I tend to use R for most of my statistical analysis, so plotting graphs in R is something that I often need to do. I have a bit of knowledge about R already (mainly gained from the books about R that I have previously reviewed), and looked to this book to explain more about graphing in R. As stated in the title it is a ‘cookbook’ – a type of technical book that provides a number of ‘recipes’ for performing various tasks, and this is both one of the main advantages and main disadvantages of this book.

I generally have a love-hate relationship with these ‘cookbook’-style books – I find them useful when wanting a quick answer to something, but I am slightly concerned about the manner in which the teaching/learning takes place. These books can be very useful, as the cookbook style allows the reader to very quickly learn how to do something which they need to do, but this learning does not always take place within a context which allows the reader to understand why they are doing what they are doing. For example, in a book like this, the reader could be told exactly what commands to type to plot a line graph – but they may not actually learn anything about what each of these commands does, and how to adjust them if they need to do something very slightly different.

However, I am pleased to say that this book is actually very good. It starts with an overview chapter that contains basic recipes for plotting various types of graphs (all of which are covered in greater detail later on in the book) as well as exporting the graphs to be used in other documents. Then comes one of the most important chapters – a detailed explanation of the par() command for adjusting parameters such as margins, colours, fonts and styles. Again, this is presented in ‘recipe’ form (of which more below) which again is a double-edged sword: it makes it easy to find the parameter setting you’re looking for, but harder to get an overview of the range of different parameters you can set. A simple table at the end of this chapter listing the parameters and the possible options for each of them would have been very useful – but was sadly not included.

The rest of the book goes through a number of types of graph, providing detailed recipes for creating them. They start with the most important types of plot: scatter graphs and line graphs (with a helpful emphasis on plotting time-series data with sensible axis labels) before moving on to bar charts, pie charts, histograms and box and whisker plots. All of this would be expected in a book on graphing software – however, this book goes further by providing a section on heat-maps and contour plots, and then a section on creating maps. The heat-maps section is particularly interesting, and I can see a number of applications of the example visualisations they have provided. The book then closes with a chapter on exporting graphs for display – both to raster and vector formats.

As mentioned already, all of the information is provided in the form of ‘recipes’, which have a standard format of: introduction, getting ready, how to do it, how it works…, there’s more… and see also. This tends to work well for most parts of the book – with the introduction explaining the type of graph and why you might want to use it, getting ready showing you how to load the required libraries, how to do it providing code and how it works explaining the code, with more options being explained in there’s more and see also. However, this falls down slightly when dealing with topics that require a little more explanation – such as the section on exporting graphics for publication, which could really do with having a more detailed section on the difference between raster and vector output, and how to choose between them.

The book generally chooses sensible datasets to plot for each graph, although at times the code is made unnecessarily confusing by adding lots of code to download datasets via web APIs (useful to be able to do, but perhaps not hugely relevant to the topic of this book). Apart from this, the code is generally well written, although some extra comments in the code might have been helpful – as it would save me constantly referencing between the how to do it and how it works sections.

Overall, I will definitely keep this book on my shelf as a handy reference for when I need to create a graph quickly in R, although I would recommend combining this book with another book (for example R in a Nutshell) for more details on the graphing functions and the rest of R.

(Disclaimer: I was given a free review copy of this book)

Review: Python Geospatial Development by Erik Westra

March 2, 2011

Summary: Great book – both for GIS concepts and for teaching Python libraries. Lives up to the boast on the front cover – you really will learn to create complete mapping applications, learning a lot of useful tools and techniques on the way.

Reference: Westra, E., 2010, Python Geospatial Development, Packt Publishing, Birmingham, UK, 508 pages Amazon Link

Before I start this review I should probably point out that I am a PhD student working in Remote Sensing and Geographical Information Systems, so I expected to know a fair amount of the theory in this book. The reason I wanted to read it though, was to learn how to do this analysis using Python and its associated libraries. This book succeeded in teaching me how to use Python to perform geospatial analyses, and actually taught me a significant amount of wider GIS knowledge which I had not picked up through any of my university courses.

The book is divided into four main sections: the first introduces general GIS concepts, the second explains basic GIS operations in Python, the third shows how to use databases with geographic data and the fourth combines all of the previous information into two GIS web-apps. It is always difficult to work out what level to pitch this sort of book at – as a number of potential readers will already have experience with GIS (and are using the book to learn about doing GIS analysis in Python, like me), but some will be complete beginners who want to introduce map-based analysis into their applications. The first few chapters of this book are pitched nicely at a mid-point between these two reader groups: the author explains things clearly and precisely without seeming patronising. Although I already knew much of the basics, I found the section on projections and co-ordinate systems very useful as I had never properly understood these (I’d always seen the different options in software I was using for Projected Co-ordinate Systems and Geographic Co-ordinate Systems, but I never knew the difference until I read this book!).

The next section explains how to use a number of Python GIS libraries such as GDAL/OGR, PyProj, and Shapely. The author starts with a general description of the capabilities of each library, and continues with a ‘cookbook-style’ approach showing how to do various tasks with these libraries. For example, instructions are given for how to convert projections and calculate Great Circle Distances with PyProj, how to extract shape geometries and attributes from shapefiles using OGR and how to do basic GIS analysis with Shapely. Details are given on joining these libraries together to exchange data between them using the very useful Well-Known Text (WKT) format – a format that I hadn’t come across, but which appears to be very useful. This section finishes by putting together a number of these libraries to do some real-world tasks such as identifying parks near urban areas.
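
To show the kind of calculation involved, here is a rough sketch of a great-circle distance in plain Python. This is not the PyProj approach the book uses: it swaps in the haversine formula, which treats the Earth as a perfect sphere (I have assumed a mean radius of 6371 km), so the results will differ slightly from those of a library that works on an ellipsoid. The function name and constant are my own.

```python
import math

# Assumed mean Earth radius in kilometres (spherical approximation)
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example: approximate distance from London to Paris
print(round(great_circle_km(51.5074, -0.1278, 48.8566, 2.3522), 1), "km")
```

For anything where accuracy matters, a proper geodesic calculation on an ellipsoid (as the libraries covered in the book provide) is the better choice; the sketch is only meant to show what “great-circle distance” means in code.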

The third section focuses on geodatabases – an area I know very little about. This section gives a good overview of the concept of a geodatabase, and then specific details about three geodatabases: PostGIS, MySQL and SpatiaLite. I was pleased to see that three contrasting databases were chosen, and a good listing of advantages and disadvantages of each was given. After introducing the workings of these databases, code examples for linking them to Python are given, starting from basic queries and going right up to complex spatial analysis performed within the database. A geospatial application (called DISTAL) is then implemented, showing how to combine geodatabase access with the GIS analyses explained in the previous section. This is implemented as a web-application, but previous experience with web programming is not needed as it is implemented using simple Python CGI scripts, and there are sidebars explaining terms that the reader may not have come across before.

The fourth section is by far the most complicated, and deals with producing maps using a library called Mapnik and producing geo-enabled web-applications using GeoDjango. I must admit that I didn’t quite follow all of this chapter, although this is probably because I’m not hugely interested in, or experienced with, building web-apps. In some ways a little too much emphasis is placed on how to do things using Django – and trying to introduce any web-app framework (be it Django, Ruby on Rails, or anything else) in one chapter is a tall order – and not enough on the GIS, but I can see why the author included it – as it brings together a fair amount of the tools covered in the book into one coherent whole.

Overall, I’m very impressed with this book. If I had my way (and you never know, if I end up as a lecturer one day I might…), I’d make Chapter 2 part of the core reading for any GIS course, as I am completely shocked that it covers areas that I have never covered even when doing Advanced GIS courses at degree level! I should mention that as well as the chapters mentioned above there is a useful chapter on sources of geospatial data (which, again, mentioned sources that I’d not heard of), and a comprehensive index which makes it very easy to find things. The instructions on how to use the Python libraries (and, more importantly, how to join them together) are well-written and comprehensive and the introductions to GIS concepts are pitched at just the right level. I would thoroughly recommend this book for any GIS or geospatial data user for two main reasons: firstly, it gives great introductions to GIS concepts they may not have come across, and secondly, knowing how to do these things in Python can make certain jobs so much easier (how about a 10 line Python script rather than a few hours of repetitive data conversion?).

(Disclaimer: I was provided with a free review copy of this book)

Review: The Geek Atlas by John Graham-Cumming

February 14, 2011

Summary: Very interesting, and great fun for a geek like me! Now I just need to find the time/money to visit these places…

Reference: Graham-Cumming, J., 2009, The Geek Atlas, O’Reilly, 544 pages Amazon Link O’Reilly Link

I’m a great fan of John Graham-Cumming’s blog, so when the chance came to review his book, The Geek Atlas, I jumped at it. The book is part travel guide, part popular science textbook. It provides information about 128 ‘geeky’ places to visit around the world and provides brief introductions to the science behind the places mentioned. After reading the Table of Contents I realised that this was definitely a book for me: I’ve been to 12 out of the 128 places mentioned already, and would love to visit most of the others. In fact, I spent the next hour going through them with my fiancee and trying to work out when we’d be able to visit some more of them…

Anyway, back to the review. The places listed in the book are all interesting for some reason or other: they range from museums, churches and graveyards to company training centres and ex-military bases. The ‘interestingness’ is normally provided through some link to science or technology, an area that most people who self-identify as ‘geeks’ find very interesting. However, the author seems to have gone to great lengths to make this more than just a simple list of places to visit, and has provided very readable introductions to the science and technology behind the places. These brief introductions cover topics such as suspension bridges, natural selection and breaking the Enigma code. At times it feels that they assume a little too much of the reader (sometimes there are some rather scary looking equations which I suspect many readers would not understand), but they are generally pitched at an appropriate level. The description of Bayes’ Theorem, for example, is by far the best I have ever read (significantly better than the descriptions in a number of probability and statistics textbooks I have read).

As for the geographical distribution of the places, the majority are in Europe and America, with the highest concentrations in the UK and USA. Although I could easily predict some of the places on the list (for example, I knew that the Science Museum would be on the list, as would Bletchley Park), there are a number of more unusual places like the British Airways Flight Training Centre (where you can use the same flight simulators that real pilots use) and the Chernobyl Exclusion Zone (which apparently is fairly safe to enter these days). It is impressive how much science is covered in the book – from 17th century work by Newton right through to the design of the new Airbus A380 aircraft – and this is shown by the number of cross-references in the book. One of the attractions is a cemetery where a number of famous scientists were buried, and nearly all of these scientists had been covered, at least to an extent, elsewhere in the book.

I can’t really think of much more to say, other than to note that the book is quite thick and has enough interesting places in it to keep you going for a number of years. On that note – I’m off to plan some travelling…

(Disclaimer: I was provided with a free review copy of this book)

Review: R in a Nutshell by Joseph Adler

January 17, 2011

Summary: Very comprehensive and very useful, but not good for a beginner. Great book though – definitely has a place on my bookshelf.

Reference: Adler, J., 2010, R in a Nutshell, O’Reilly, Sebastopol, CA, 611 pages Amazon Link O’Reilly Link

After reviewing a book about R designed for beginners (see my previous post) I thought I’d step up the pace slightly and look at a more advanced book. I’m pleased to say that I was not disappointed. This book is so comprehensive – you can find nearly anything you want in it!

The book starts with a brief, but comprehensive, R tutorial. This tutorial is rather light on actual statistics, but gives a very good introduction to the syntax of the R language. This focus on the language continues through the whole of Part II, which contains detailed chapters on the language, syntax, objects, symbols, functions and high-performance programming in R. This is very different to most books on R which jump straight into the statistics. Although this part may seem rather boring to some, it provides a very good grounding in the basics of R programming, which make the rest of the book significantly easier to understand.

Through reading Part II I learnt a huge amount about the R language – finding out some things that I never knew, and realising how many of the things that I did know actually worked. This part may well confuse those with no previous programming experience (see my comments on this later), but for those who are at least slightly familiar with the terminology, it gives a very comprehensive explanation of the language itself.

Part III is where the book starts to get into R’s main use: statistics and statistical graphics. Very sensibly, this section starts at the beginning of the process with how to import data (including instruction on how to connect R to databases) and then a lengthy (over thirty pages) section on preparing data for analysis. This is incredibly useful as this can often take a significant proportion of the time spent on a project.  The graphics chapters after this provide a comprehensive introduction to the standard (‘base’) graphics system, and then the lattice graphics system. I’m glad to see that a non-base-graphics system is given space in this book. From what I’ve seen on the web, it seems that very few R programmers use the base graphics system for producing production graphics, so this is a sensible inclusion.

Finally, after all the preparation, we get to the statistical analysis section (Part IV). Of course, as the book has covered so much of the fundamentals earlier, the statistics section can fly along, focussing mainly on the statistics themselves and the syntax of the tools that R provides, rather than the mechanics of how to write valid R commands. A wide range of statistical tests are included, and they are covered in a very sensible order (in fact, almost exactly the same order that my statistics class covered them: starting with summary statistics, then probability distributions, on to statistical models and then beyond that to classification and machine learning). I won’t lie and say that I’ve read every word of this section, but the bits I have read have been very good: concise but comprehensive.

There’s not really much more I can say about this book: it has become my ‘go-to’ reference for anything I need to do in R. It would be intimidating for a beginner, but it is not aimed at beginners, so that’s fine. For those of us who are slightly more experienced with R, it is a great book, and I thoroughly recommend it.

(Disclaimer: I was provided with a free review copy of this book)

Review: Statistical Analysis with R: Beginner’s Guide by John M. Quick

January 1, 2011

Summary: If you can get past the strange underlying story, then this gives a good introduction to R to someone with no programming experience. However, if you have any experience with other programming languages then another book is likely to be more suitable.

Reference: Quick, J. M., Statistical Analysis with R: Beginner’s Guide, Packt Publishing, Birmingham, UK, 300 pages. Amazon Link Sample Chapter

As a new science PhD student, I wanted to get to grips with the most powerful statistical analysis software around, and that meant trying to understand the intricacies of programming in the R Project for Statistical Computing. John M. Quick has posted extensively on the internet about programming in R, so I had high hopes for his new book, Statistical Analysis with R in the Beginner’s Guide series recently published by Packt Publishing. Unfortunately, my high hopes were not entirely fulfilled. The book provides useful and correct information about programming in R, but it is underlain by a strange story about Chinese wars and has a number of niggling problems that prevent me from fully recommending it.

Before starting, I should explain my previous experience. I have used a number of pieces of statistical software in the past (such as SPSS) and have a small amount of experience with R from a PhD statistics skills class. I do, however, have significant programming experience in a number of languages. I think this is the part of my experience that distinguishes me from the intended audience for this book, as it is designed for real beginners. Those who have any experience in a modern programming language or Linux shell will find the first few chapters very easy, and therefore somewhat frustrating. However, these chapters do give a good basic introduction to running commands in the R shell, and working out which lines are commands and which are outputs. They also explain the [1] that appears at the beginning of R output lines, which is not mentioned in many of the introductory R tutorials that I have read. Once we’re past the rather contrived example of solving a magic square by using R as a calculator the interesting bit starts…

However, before describing the data analysis section of the book, I should explain the underlying story used throughout the book. The introductory chapter gives a bit of ancient Chinese history, and states that you, the reader, have been chosen to succeed the famous military leader Zhuge Liang and need to learn how to use R to analyse his data and plan the future of the military campaign. The rest of the book takes on this theme, both in the data analysis (comparing the Shu and Wei armies, and predicting battle outcomes using regression) and the general phrasing (headings like “Have a go hero!” and emphasis that if you fail the Chinese kingdom will collapse). I’ll be honest: this story doesn’t work for me at all. In fact, it drives me nuts having it constantly throughout the book. As for why it annoys me, I’m not entirely sure: probably partly because it seems like the examples have had to be twisted rather to make them fit the story, and partly because I have no interest in ancient Chinese kingdoms, or using R to plan military campaigns.

I understand that not all readers will agree with me here, and that putting a story like this behind the scary process of learning new statistics software may help people get to grips with it. From my point of view I would have preferred to see a range of datasets used from the examples provided with R (all of the datasets listed here are built in to R), as this would (a) mean that the datasets are always available from within R and (b) provide interest for a wide range of readers.

So, apart from my personal views of the underlying story, the actual content of the book is quite good. The chapters cover the whole process of statistical analysis from data import (Chapter 4), through summary statistics (Chapter 5) and modelling (Chapters 5-7) to graphical output (Chapter 10). The final chapter of the book gives good pointers for more help on R, from the inbuilt help through to recommended blogs and websites. It also covers installing packages, although more emphasis could have been placed on how useful packages can be when performing analysis in R. The book assumes some statistical knowledge, but briefly explains concepts the reader may not have experienced before (such as correlation coefficients and AIC). Each chapter takes the form of some instructions (‘Time for action’), followed by an explanation (‘What just happened?’), a few questions (‘Pop quiz’) and a suggestion for the reader to try and do (‘Have a go hero’), and this approach seems to suit the material quite well. Although it can get frustrating at times (I tend to automatically skip over quizzes in books), I think the structure would help less confident readers.

The range of content that the book covers is impressive, as it goes from installing R to comparing models using AIC and customising graphs, although at times the explanations seem a bit verbose. More worryingly, the code examples, although completely correct, are written in a programming style that I suspect no real-world R programmer uses. The arguments for each command are stored as variables before the command is run (not that unusual for complex arguments, but a bit strange to do for every argument) and these variables have incredibly long names. The code snippet below (from page 172) is a good example:

> #create a box plot that compares the number of soldiers required across the battle methods
> #get the data formula to be used in the plot
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
> #customize the plot
> boxPlotAllMethodsShuSoldiersLabelMain <- "Number of Soldiers Required by Battle Method"
> boxPlotAllMethodsShuSoldiersLabelX <- "Battle Method"
> boxPlotAllMethodsShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(formula = boxplotAllMethodsShuSoldiersData, main =
      boxPlotAllMethodsShuSoldiersLabelMain, xlab = boxPlotAllMethodsShuSoldiersLabelX,
      ylab = boxPlotAllMethodsShuSoldiersLabelY)

John’s code samples on the internet (for example here) do not have this verboseness, so I assume he put it in to try and make the code samples more easily readable. The problem is that, particularly when there is no colour syntax highlighting in the book, it actually makes it far more difficult to read and understand the code.
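For comparison, here is a rough sketch of how the same plot might be written more compactly, with the arguments passed directly to boxplot(). This is not code from the book or from John's posts. The battleHistory data frame below is a hypothetical stand-in with invented values and invented method names, included only so that the example runs on its own. Only the column names ShuSoldiers and Method are taken from the snippet above.

```r
# Hypothetical stand-in for the book's battleHistory data set:
# the method names and soldier counts are invented for illustration.
battleHistory <- data.frame(
  Method = factor(c("Ambush", "Ambush", "Fire", "Fire", "Siege", "Siege")),
  ShuSoldiers = c(12000, 15000, 30000, 26000, 50000, 45000)
)

# Same box plot as the book's snippet, with the formula and labels
# passed inline rather than stored in long-named variables first.
boxplot(ShuSoldiers ~ Method, data = battleHistory,
        main = "Number of Soldiers Required by Battle Method",
        xlab = "Battle Method",
        ylab = "Number of Soldiers")
```

Using the data argument means the column names do not need the battleHistory$ prefix in the formula, which shortens the call further.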

In conclusion, this book provides a good overview of using R and is correctly pitched for an audience of beginners. The underlying story frustrates me, but this is likely to be a matter of personal taste (try looking at the sample chapter linked at the top of this post to see how you feel about the story). Apart from the verbosity of the code examples the information is accurate and up-to-date. I would recommend this book for someone who has absolutely no experience with R or other programming languages and is somewhat scared of trying to learn R, as the underlying story and structure will provide a safe and comfortable environment for learning, but for those who feel they are more confident another book may be more suitable.

(Disclaimer: I was given a free review copy of this book by PacktPub)

Review: A Thousand Years of Nonlinear History by Manuel De Landa

December 2, 2010

Summary: Very unusual approach, but also provides an interesting new view of geography.


Reference: De Landa, M. 1997 A Thousand Years of Nonlinear History, Zone Books, New York. 333 pages. Amazon Link

When dipping into a chapter entitled Geological History 1000-1700 AD one would expect to find information on rock types, the development of landforms and possibly the history of the development of geological thought. In Manuel De Landa’s book A Thousand Years of Nonlinear History, however, this is not the case – what will actually be found is discussion of Christaller’s Central Place Theory, the development of urban areas in both Europe and the Far East, and different philosophical perspectives on these. This aspect of surprise continues throughout the book – De Landa’s approach to all the topics covered is novel, and the insights gained from these approaches are huge.

Although the book is entitled A Thousand Years of Nonlinear History, it is by no means a standard history book – it focuses on the application of historical processes, and generally the passage of time, to many areas within human geography. The most important word in the title is probably “nonlinear” as this is the way in which De Landa approaches all the areas covered in his book. It is very difficult to define what is meant by nonlinear – the author takes many pages for his explanation – but simply put it is considering history as a tree with many branches rather than one pure and straight linear course. This idea of nonlinearity is extended throughout the book to cover different types of nonlinear development (such as hierarchies and meshworks) and is used as part of the explanation for many areas of geographical development.

The book is divided into three parts (Lavas and Magmas; Flesh and Genes; and Memes and Norms), each of which contains chapters which look at the specified topic from 1000-1700 AD and then from 1700-2000 AD. Sandwiched in the middle of each part is a section elaborating on some of the ideas introduced in the part – for example the Sandstone and Granite chapter within Lavas and Magmas elaborates on the ideas of hierarchies and meshworks, their definitions (within a variety of fields from biology to economics) and their effect on the development of urban geography. As the chapter title mentioned in the first paragraph of this review suggests, the names of the parts are metaphors for the content within them. For example, the first part is entitled Lavas and Magmas, and this metaphor is explained towards the end of the part by an analogy between lava and the physical constructs of cities. Some of these analogies are rather tenuous, but they all serve to give interesting new perspectives on familiar aspects of human geography.

Although De Landa’s book is very interesting, and in many ways unique, it is also a difficult read. This is really par for the course when one is explaining the sort of complex ideas which are used in this book, and some may find this book completely inaccessible because of the complexity of the ideas discussed. The majority of topics are explained very well – but some topics come across as rather confusing. Also, some of the language is rather pretentious, and one can’t help feeling that some of the ideas are not quite as complex as De Landa makes them out to be.

The presentation of A Thousand Years of Nonlinear History is, like the rest of the book, rather unusual. The striking front cover design makes the book stand out on a bookshelf – although the complexity of this cover design hinders the reading of the blurb on the back – one of the first places a prospective reader will look for information about the book. The choice of font size throughout the book is also interesting. De Landa has chosen to use larger font sizes at the beginning of each chapter – gradually reducing to a rather small font for the majority of each chapter and then increasing again towards the end. I assume this was chosen to accentuate the introduction and conclusion of each chapter – and in some ways that is a good aim. However, this has not helped my reading of the book – or my identification of the important parts of the chapter. It also has the side-effect of making the body of the chapter look very small, and this has made it quite difficult and tiring to read.

Overall, De Landa’s book is a very interesting read. It takes a new approach to almost every topic covered and provides much food for thought. Although A Thousand Years of Nonlinear History should not be used as the main text for any of the topics covered it provides much useful background reading. Some parts of the book are difficult to read and understand, but perseverance will result in appreciation of the new perspectives raised by this unusual book.