As part of the OS Open Data initiative the Ordnance Survey has released a free version of their 1:50,000 scale gazetteer. This lists all of the names shown on the 1:50,000 scale OS maps, linked to information such as their location (in both Ordnance Survey grid references and WGS84 latitude/longitude pairs) and type (city, town, water feature etc). I’ve had a little play with this data to do some statistics on it and see if I can find out anything interesting. Hopefully you’ll agree that at least some of the stuff below is interesting.
My main statistical analysis was to count the frequency of each name in the gazetteer, and then extract the ten most commonly-used names for each of the feature types listed in the database.
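The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script actually used for the analysis: it assumes the gazetteer has already been read into (name, feature type) pairs, and the function name `top_names_by_type` and the sample rows are made up for the example.

```python
from collections import Counter, defaultdict


def top_names_by_type(records, n=10):
    """Return {feature_type: [(name, count), ...]} holding the n most common names."""
    counts = defaultdict(Counter)
    for name, feature_type in records:
        counts[feature_type][name] += 1
    return {ftype: counter.most_common(n) for ftype, counter in counts.items()}


# Toy rows for illustration only; these are not copied from the gazetteer.
sample = [
    ("Seaton", "town"), ("Seaton", "town"), ("Seaton", "town"),
    ("Newport", "town"), ("Newport", "town"),
    ("Newton", "other"), ("Newton", "other"), ("West End", "other"),
]

result = top_names_by_type(sample, n=2)
for feature_type, names in result.items():
    print(feature_type, names)
```

Keeping one Counter per feature type means the whole file only needs to be read once, however many feature types it contains.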
Cities: No cities in the UK share a name with another city.
Towns: There are 1,245 towns in the dataset and the results here are slightly more interesting – the winner is Seaton, of which there are three (in Cornwall, Devon and Yorkshire). This isn’t particularly surprising as Seaton was probably derived from Sea Town, and given that we are an island nation, there is a lot of sea by which towns could be located! A number of names are well known for belonging to two towns in the UK, such as Newport, St. Ives and Ashford, but there are 12 such towns, listed below:
Other settlements: This is where it starts to get more interesting, but also more confusing. The feature type of Other Settlements seems to include anything that’s not a town or a city and there are over 34,000 of them. Nearly 31,000 of these are unique (a fairly impressive feat), but a number of them seem to be more popular than the rest – the top ten are listed below and include three variations on calling a place a new town (New Town, Newtown and Newton) as well as a number of other names relating the settlements to their location (West End, North End, East End) or what is there (Church End).
I was intrigued by the differences between the names of New Town, Newtown and Newton and wondered if there may be differences across the country in which name was used. Also, I wondered whether any areas of the country had significantly more of these new settlements than others. The map below shows the distribution of the three names across the country:
A number of patterns are noticeable:
There seem to be a lot of these settlements around the Welsh border (the area known as the Welsh Marches). I’d suspect this is because of the frequently changing location of the border in this area, leading both sides to rename a town as ‘new’ once they’d captured (or re-captured) it.
There is a distinct lack of these settlements in central Wales – fairly obviously, as most names there are in Welsh.
Scotland has far more Newtons than any other version of the name, and these are concentrated on the eastern side of the country.
New Town seems to be far more common in southern England, although there is a concentration of Newtons in Cornwall.
I’m not sure what exactly one can learn from the above, but it’s interesting to me at least. If anyone has any ideas which explain the distribution of these names better then leave a comment.
1. Conference presentations are not like undergraduate lectures
In an undergraduate lecture you go in, listen to someone talk for a long time, and try and write down/remember as much of it as possible – because almost any of it could come up on the exam. It was not uncommon for me to have 3-4 pages of handwritten notes for a 45 minute lecture. Conferences aren’t like that – you don’t need to remember (or even understand!) all of the details and you only need to care about the bits you actually care about. Why is this? Well, you’re not going to be examined on it, and you can get the details from the author another way (through the paper in the conference proceedings or via email etc). Most importantly though – you only need to be interested in the stuff that is interesting or useful to you – now you’re a researcher no-one else is telling you what to be interested in, you have to decide that yourself!
I first noticed this during some departmental seminars where I was taking notes and so was my supervisor, who was sitting next to me. During the seminar I took about 2 pages of notes, and my supervisor wrote about 5 words. I had treated the seminar like a lecture, and tried to get down all of the information – he had focussed on just the bits that were really interesting to him.
I must admit I still find this very difficult – four years of lectures have got me into the ‘write everything’ mindset, but I tried hard to be more selective at this conference, and I think I did fairly well. I tried to make notes on the bottom of the page in the conference booklet that showed the abstract for the presentation – which could be quite hard if the abstract was long!
2. Show your interest in other people’s work
People want you to be interested in their work – if you get excited about it they’ll be excited too; they won’t label you as a ‘geek’ or ‘weird’. In fact, nothing seems to make researchers happier than having a really in-depth conversation about their work with someone who is really interested.
3. Show your interest in your work – all the time
No matter who you are talking to, show how interested and excited you are about your work, and how great you think it is. Why? Well, hopefully you are interested in your work, but also – you never know who you’re talking to. I spoke to someone who I thought was ‘just an academic’, and it turned out he was in charge of funding for the majority of my field in my country. My excitement in my project and explanation of why I think it is important may have paid off…
4. Social events are important
We had a lot of nice social events at this conference (including a boat trip around Poole harbour and a very nice meal at a fancy hotel). Most of the really interesting conversations I had were during these events – whether they were learning about the academic environment in different countries, discussing research projects, or coming up with really bad remote sensing jokes. Getting to know people in the social events gets you known amongst the community, so next time someone sees your name as an author they remember you – always useful.
I’ve just discovered something that I feel I must share here – partly to make more people aware of it, and partly so I don’t forget it. In the IDL programming language you will sometimes find your program interrupted by a line saying something like:
% Program caused arithmetic error: Floating divide by 0
Sometimes it will be obvious where the error is – but often you can spend ages looking for it (just like with segfaults in C…). What I only just found out is that if you run the command
!EXCEPT=2
at the IDL prompt before running your program you will get a far more informative error message like
% Program caused arithmetic error: Floating divide by 0
% Detected at JUNK 3 junk.pro
The key thing is that this tells you what line the error occurred on (line 3 of junk.pro in the above example) – which helps you to narrow down the problem far more quickly.
More details on the values that !EXCEPT can take are available here – basically the options are no messages (0), unhelpful messages (1, the default) and helpful messages (2).
This is very useful – but just beware that running with !EXCEPT=2 all the time will slow down your code, so only do it if you need to for debugging purposes.
If you use iPython, particularly if you do work with matplotlib or with parallel programming, you should use the latest release.
It. Is. Great.
For those who don’t know, iPython is a replacement console for Python that offers many improvements over the standard console. For example, everything has tab completion, syntax highlighting is available, there are many magic commands (eg. %run, %edit etc), you can run standard terminal commands simply by adding ! to the beginning (eg. !wget www.google.com) and much more. With the latest (admittedly still unstable) release there are two major new features:
Parallel Programming support - you can create a team of python worker processes which can then be, very easily, set tasks to do. It even has support for automatic load-balancing (something which even some very fancy parallel programming environments can’t do!). As nearly all computers are multicore these days, and as many scientists have access to fairly sophisticated supercomputers, this is a great step for Python
GUI console - you can now use iPython in a GUI. So what? you ask – I like the terminal! Well, I like the terminal too, but having a GUI means two things:
Tooltips - as soon as you type a function name and an open bracket a tooltip will pop up giving you the list of parameters for the function and the doc-string (or at least as much of it as will fit).
Embedding of graphics - yes, now you can embed plots from matplotlib in your console just like Matlab or Mathematica users can. What’s more, you can also export the whole console session (with both the figures and the text) to an HTML file – great for teaching use (I’m already planning how I can do this…)
iPython was always great – now it’s even better!
For more details please see here, and follow SciPyTip on Twitter to get regular tips and news for SciPy users.
I haven’t had chance to try it yet, but I’m wondering whether libraries like the Python Imaging Library or Spectral Python will also be able to put figures directly into the iPython GUI terminal. Maybe I need to work on some integration for those – that’d be great for my remote sensing teaching!
As part of my research I do a fair amount of data collection in the field. Some of the instruments I use are very modern and connect to a computer via USB, interacting with custom-written client software which allows such luxuries as timed logging, triggered logging and local calibration. However, a number of the instruments are older and don’t have computer-based logging capability, requiring you to log data to their internal memory and then download it later.
This is often perfectly satisfactory, but timing can be an issue: when several instruments are used together, it is often important that their measurements are taken at the same time. For example, if spectral measurements are being taken while other instruments (such as sunshine sensors, like the sensor shown below) gather data that will be used to atmospherically correct the spectra, the two sets of readings need to be simultaneous. This is a particular problem in areas of fast-changing weather like the UK, where sky conditions can change very quickly.
A tool called SJinn allows you to send simple strings over an RS-232 (standard serial port) connection and then obtain data sent back by the instruments. One of the examples given by SJinn is the following:
rs232 -b600 -p7n2 -s"\n" -r16
This sends a newline character over the serial port (at 600 baud with 7 data bits, no parity and 2 stop bits) and then returns the next 16 characters sent on the line. In this case, it would provide the voltage measured by a digital voltmeter. As this is simply a command-line tool, it is very easy to combine into scripts, and thus use to collect timed measurements (eg. via the use of the cron daemon). I have used similar techniques to obtain measurements from the sunshine sensor shown above – a script for which will be available on my website soon.
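As a rough idea of what such a script might look like, here is a minimal Python sketch that wraps the SJinn example command and attaches a timestamp to each reading. It is not the script mentioned above: the function names are made up for illustration, the flags are simply copied from the example, and it assumes the rs232 executable is on the PATH and that the instrument replies with plain text.

```python
import subprocess
from datetime import datetime, timezone

# Flags copied from the SJinn example. "-s\\n" passes a literal backslash-n,
# which is what the shell passes for -s"\n" in double quotes.
RS232_COMMAND = ["rs232", "-b600", "-p7n2", "-s\\n", "-r16"]


def take_reading(command=RS232_COMMAND, runner=subprocess.run):
    """Run the command once and return (UTC timestamp, reply text)."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    completed = runner(command, capture_output=True, text=True, check=True)
    return timestamp, completed.stdout.strip()


def format_log_line(timestamp, reading):
    """Format one reading as a CSV line ready to append to a log file."""
    return f"{timestamp},{reading}"
```

A cron job could then call take_reading() once a minute and append format_log_line(...) to a file, so that readings from different instruments can be matched up by time afterwards. The runner argument exists only so the function can be tried out without an instrument attached.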
You may find, as I have done recently, that a network printer installed on a Windows Vista machine suddenly starts showing as Offline even when other machines on the network can access it fine. I originally thought it would be an IP address issue, but it turned out not to be anything to do with that. In fact, the solution was far simpler – but also slightly strange…
It turns out that Windows Vista automatically enables SNMP support for networked printers, and if it can’t get a response to an SNMP message then it assumes the printer is offline. SNMP stands for Simple Network Management Protocol and is a way of getting information from network devices (such as routers, servers and printers), mainly for the purposes of finding out if there are any problems with the devices. A number of networked printers implement SNMP, and will respond to SNMP queries with information, but some don’t. My printer (a fairly old Lexmark T640) is one of the ones that doesn’t implement it – so of course Vista will never get a response to an SNMP message. The result is that the printer will start showing as offline at a seemingly random time, because Vista has just sent an SNMP message to it and it hasn’t responded.
Thankfully there is a simple way to fix this – and it just involves telling Vista not to try and communicate with the printer via SNMP. Simply right-click on the printer in the Printers window, choose Properties, select the Ports tab, and click Configure Port. At the bottom you will see a checkbox saying something like SNMP Status Enable. Untick that, and the printer should start showing as online again.
(Update: If this doesn’t work, then try the method described in Coxy’s comment, below)
The first piece of software in my series of essential OS X software is a very handy tool which reminds you when you haven’t attached a file in an email when you intended to. How does it do this? Well, it searches for key words in the email and reminds you if, for example, you use the word attached without attaching a file.
This sort of functionality is already present in a number of other email apps such as Gmail and Thunderbird, but isn’t present by default in OS X’s mail application. However, this free tool will add it. Simply download it from http://eaganj.free.fr/code/mail-plugin/ and follow the instructions (just make sure you download the beta version if you’ve got Snow Leopard, or it won’t work!)
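The keyword check described above is simple enough to sketch in Python. This is only an illustration of the idea, not how the plugin itself is written: the word list and the function name are assumptions made up for the example.

```python
import re

# Hypothetical trigger words; the real plugin's list is not known here.
ATTACHMENT_WORDS = re.compile(
    r"\b(attach(ed|ment|ments|ing)?|enclosed)\b", re.IGNORECASE
)


def needs_attachment_reminder(body, attachment_count):
    """Return True if the text mentions an attachment but none is present."""
    mentions_attachment = ATTACHMENT_WORDS.search(body) is not None
    return attachment_count == 0 and mentions_attachment


print(needs_attachment_reminder("Please find the report attached.", 0))
```

Matching on whole words keeps the check cheap, at the cost of the occasional false alarm when the word is used in passing.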
I have recently discovered PyDev – a Python IDE which runs within Eclipse. Although I’d given up on big all-singing, all-dancing IDEs a few years ago I’m really liking it. The Ctrl-Space completion is very handy, as are the number of refactorings that are available from the menus.
Anyway, I use the Enthought Python Distribution (EPD) on my Mac, as it provides Python with a number of important scientific libraries (NumPy, SciPy, Matplotlib etc) in an easy-to-install package for OS X. It’s really handy – and is free for academic use. The only problem with using EPD is that applications can sometimes get confused between EPD and the Apple-provided version of Python.
It turns out that PyDev is one of those applications. If you follow the PyDev installation instructions, it suggests you click the Auto Config button to configure your Python interpreter. This will not work for EPD! Instead, (after deleting the interpreter you have configured already, if you’ve already configured one), click the New button and then fill in the fields as below:
Interpreter Name: This is just a name to refer to the interpreter by – it can be anything you like. I tend to use EPDPython.
Interpreter Path: You’ll need to find the python executable provided by EPD. This is normally located somewhere like:
The best way to find it is to navigate from /Library down the path, choosing the most sensible folder at each stage. When you get to the Versions folder, make sure you choose the latest version (highest number) folder, and then choose the bin directory and then the python executable. Once this is done, PyDev will automatically find the relevant folders to add to your PYTHONPATH, and everything will be working.
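If you would rather not click through the folders by hand, the "pick the highest-numbered folder" step can be sketched in Python. This is a minimal sketch under some assumptions: the Versions folder is passed in by you (no particular path is assumed here), version folders have dotted numeric names, and the function names are made up for illustration.

```python
from pathlib import Path


def version_key(name):
    """Turn a name like '6.1' into (6, 1) so that 6.10 sorts above 6.2."""
    return tuple(int(p) if p.isdigit() else -1 for p in name.split("."))


def latest_python(versions_dir):
    """Return the bin/python path inside the highest-numbered version folder."""
    versions_dir = Path(versions_dir)
    # Skip entries that do not start with a digit (e.g. a "Current" link).
    candidates = [
        d for d in versions_dir.iterdir()
        if d.is_dir() and d.name[:1].isdigit()
    ]
    if not candidates:
        raise FileNotFoundError(f"no version folders found in {versions_dir}")
    newest = max(candidates, key=lambda d: version_key(d.name))
    return newest / "bin" / "python"
```

Comparing the version numbers as tuples of integers rather than as strings matters once a two-digit component appears, since "6.10" sorts before "6.2" as text.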
For a long time I have been searching for a simple, easy-to-use, comprehensive list of freely available GIS datasets that I can use in my academic work – or for any other non-commercial purposes (eg. teaching, ‘just for fun’ applications, etc). All of the lists that I have found have been out-of-date, riddled with adverts, or specific to one field or country, so…I decided to make my own:
The screenshot above just shows the first few links – there are many more, split into categories by field (for example, natural disasters, population, ecology and administrative boundaries) and by geographical area (for example, global, UK, Puerto Rico). The site is being updated regularly, and I am frequently running a checking tool to ensure that all of the links still work. It is intentionally designed to be one simple page – my thoughts being that it is far simpler to search for a dataset using Ctrl-F (the in-browser search facility) than to navigate around a slow database-backed site.
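For anyone curious what a link check involves, here is a minimal Python sketch using only the standard library. It is not the checking tool mentioned above, just an illustration of the idea: the function names are made up, and the opener argument exists only so the check can be tried without network access.

```python
import urllib.error
import urllib.request


def check_links(urls, opener=urllib.request.urlopen, timeout=10):
    """Return {url: status}, where status is an HTTP code or an error string."""
    results = {}
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error code such as 404.
            results[url] = err.code
        except OSError as err:
            # Covers URLError, timeouts and refused connections.
            results[url] = f"error: {err}"
    return results


def broken_links(results):
    """List the URLs whose final status was anything other than 200."""
    return sorted(url for url, status in results.items() if status != 200)
```

Running check_links over the full list of links and printing broken_links(...) gives a short list of entries to fix or remove by hand.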
I think that screenshot is enough really, but just to give a few more pieces of information:
In the screenshot above you can see twenty-one separate pieces of software whose name starts with TOSHIBA. These are 21 different pieces of software which were supplied, by Toshiba, as part of the original build of the machine. I suspect the majority of them run at startup, although I haven’t fully investigated that.
Many of the pieces of software seem not to be very important or useful – for example TOSHIBA ReelTime? What is that? Or TOSHIBA Online Product Information – since when has anyone used anything like that? As for what on earth TOSHIBA Value Added Package is, I don’t know, but I suspect it isn’t adding much value to my use of the computer!
As well as the software above (nicely screenshotable (yes, that is a word) because they’re all next to each other in the list), the laptop came with a number of other applications including: Bing Toolbar, Ebay, TRORMLauncher (seems to be something else from Toshiba) and a number of others.
I wanted a computer to use – not a computer filled with junk like this. I think the best thing to do is probably to wipe the machine and reinstall a clean system, but I should not have to do that with a new computer. If any computer manufacturers are reading this (unlikely, but possible) then please stop doing this.
(For the record, I was given this computer under a university scheme, I did not purchase it myself – if I had I wouldn’t have gone for a Toshiba!)