How to: Consolidate your internet presence
Recently I took a while to try and simplify and consolidate my online presence. I thought it was an appropriate time to do this, as I had just bought a domain name (rtwilson.com), where I was hosting my academic website (www.rtwilson.com/academic) and my blog (which is what you’re reading now!). I thought it’d be useful to document the steps I took:
Stage 1: Create a new, clean identity
I will assume here that you have a good, sensible email address that you are likely to keep for a long time. If you don’t, then I’d recommend buying a domain name and getting a sensible email. This will be the email that you use for all web-based communications and accounts.
The tasks below basically involve setting up accounts on major cross-site platforms, and ensuring they have sensible details, images etc. You may want to find an appropriate photo of yourself, or some other photo that you don’t mind representing you on the internet.
1. Create a Gravatar account with that email, and give it a sensible photo. In case you don’t know, Gravatar is a system used by a number of websites, blogs etc. to provide an image to go with your account. It’s all linked through your email, and is very simple to set up – simply go to the Gravatar signup page and follow the instructions.
2. Create an OpenID account. You may already have one of these – check here to see what account you may already have that has an OpenID associated with it. If you don’t have one, either set one up through a dedicated provider like MyOpenID, or link one to your domain.
3. Clean up and configure your Google account. You probably want to update your Google Profile with a sensible image and some sensible information.
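If you are curious how sites find your Gravatar from nothing but your email: they build an image URL from a hash of the address. Here is a minimal Python sketch, assuming the classic scheme (trim the address, lower-case it, take the MD5 hash). The function name and the example address are made up for illustration and are not part of any Gravatar library.

```python
import hashlib

def gravatar_url(email, size=80):
    """Build an avatar URL from an email address (assumes the MD5-based scheme)."""
    # The address is trimmed and lower-cased before hashing,
    # so " Someone@Example.com " and "someone@example.com" match.
    normalised = email.strip().lower()
    digest = hashlib.md5(normalised.encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{digest}?s={size}"

# Hypothetical address, for illustration only
print(gravatar_url("  Someone@Example.com "))
```

This is why using the same email everywhere matters: any site that knows your address can work out the same URL and show the same picture.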
Stage 2: Remove old accounts/identities and link them to new ones
This is a little more difficult – but just because you have to remember where you have accounts. Some of these accounts you may wish to close (Bebo, MySpace etc), but some you’ll want to keep and associate with your new email address.
This may involve simply changing the email address of your account and updating details (photo, profile etc) or removing an account and creating a new one. You will find that a number of sites will now let you link your account through OpenID or to your Google Profile, and loads of sites will pick up your avatar from Gravatar. A list of sites you might want to check is below:
- Bebo
- MySpace
- Yahoo
- Delicious
- StackExchange sites (StackOverflow, SuperUser etc)
- Blogging accounts
- Flickr
- YouTube
- Wikipedia
- Forums
- Hacker News
Stage 3: Update as necessary
Last but not least – keep an eye on your profiles on these sites. Update them when they need updating – I found a site which still said I was at school – and make sure photos (if you are using them) are at least vaguely up-to-date.
(If you liked this, you might enjoy my other how-to’s and my book reviews)
Review: Machine Learning: An Algorithmic Perspective by Stephen Marsland
Summary: Great book – clear explanations, useful example code and a friendly, easy-going writing style. One of my favourite academic books ever! 
Reference: Marsland, S., 2009, Machine learning: An Algorithmic Perspective, Chapman & Hall/CRC, Boca Raton, Florida, 390pp Amazon Link
Machine Learning can be a difficult topic – as I found out when taking a Masters-level machine learning course this year. It can become very mathematical – particularly when dealing with complicated areas such as Support Vector Machines – and it is very difficult to pitch a university lecture course at a level where all of the students can understand it. Unfortunately, in my course I was one of the students who didn’t really understand the lectures particularly well…but this book saved me! In fact, it was so well written that I was reading it in bed at night, and staying up late to finish the chapter!
I firmly believe that fields such as Machine Learning and other practical computing topics (such as Computer Vision, Statistics, Programming etc) should be taught using practical examples and practical teaching sessions wherever possible. Also – for all but the most complex topics – full algorithms should be provided, and implementations in appropriate programming languages shown. This book definitely fulfils this – showing detailed algorithm explanations for all of the algorithms considered (apart from Support Vector Machines, which the author decided – sensibly in my opinion – were too difficult to cover in full detail), and implementing them in Python. Parts of the Python code are shown in the book, and all of the code is available online.
The explanations and motivations for the techniques shown in the book are brilliant – both easy-to-read and comprehensive. Mathematical explanations are clear – with ‘big picture’ explanations given in textual form for those who want to skip the detailed maths – and diagrams are well-chosen and easy to understand. Furthermore, sensible examples are given for the uses of machine learning – ranging from standard datasets like iris, to more unusual examples like ozone layer depth prediction.
The book covers most of what you’d need for an introductory Machine Learning course, starting with perceptrons, before moving to multi-layer perceptrons, radial basis functions, support vector machines, decision trees, unsupervised techniques and genetic algorithms. The book also covers introductions to probability (including Bayesian inference), dimensionality reduction (including PCA and LDA) and optimisation techniques. The wide range of techniques covered makes this useful not just for machine learning students: genetic algorithms, dimensionality reduction and optimisation have many applications in other fields.
This is a shorter review than many of my reviews, as I really can’t find much to fault with the book. It’s great. My advice is, if you’re interested in this topic, need to learn it for a course, or think you’ll want to use Machine Learning techniques in your work then buy it – you won’t regret it!
Finally – a way to move an ArcGIS map to another computer without breaking it
Just a quick post this time, as I’m currently enjoying a nice holiday (well, holiday combined with work) in France. I had to post this because I’ve just realised that one of my biggest gripes with ArcGIS has been fixed in version 10! Hooray!
I suspect a lot of other people have been frustrated by this too: if you want to take an ArcGIS map document and use it on another computer it is (or at least, was) very difficult. The .mxd file only contains references to the actual data for the map, so you have to find where all of the data is stored and take that with you too – and what’s more, the references are often stored as relative path names, so even moving it to a different folder on the same computer is a pain. In fact, I seem to remember being taught in an undergraduate ArcGIS course never to move an ArcGIS .mxd file once I had created it!
Anyway, ArcGIS 10 appears to have a new function called Map Packages. This allows you to package a .mxd file with all of the data that it uses into one (quite big, probably) .mpk file, which is then completely portable. Sounds great!
I haven’t been able to test it yet (I haven’t got ArcGIS on the laptop with me in France), but it sounds like it’ll be very useful.
For more information, see this ArcGIS blog post.
Programming << Pointers << Parallel Programming
I remember, fairly early on in my programming career, reading Joel Spolsky’s article about interviewing for programmers. At the time I thought I might want to get a job as a programmer (in fact, I’ve now got a job in academia – albeit in a field that involves a fair amount of programming), so I was interested to know what sort of things he thought interviewers should look for. One of the key things is being able to understand pointers, which he suggests is far harder than most of the rest of programming:
I’ve come to realize that understanding pointers in C is not a skill, it’s an aptitude. In first year computer science classes, there are always about 200 kids at the beginning of the semester, all of whom wrote complex adventure games in BASIC for their PCs when they were 4 years old. They are having a good ol’ time learning C or Pascal in college, until one day the professor introduces pointers, and suddenly, they don’t get it. They just don’t understand anything any more. 90% of the class goes off and becomes Political Science majors, then they tell their friends that there weren’t enough good looking members of the appropriate sex in their CompSci classes, that’s why they switched. For some reason most people seem to be born without the part of the brain that understands pointers. Pointers require a complex form of doubly-indirected thinking that some people just can’t do, and it’s pretty crucial to good programming. A lot of the “script jocks” who started programming by copying JavaScript snippets into their web pages and went on to learn Perl never learned about pointers, and they can never quite produce code of the quality you need.
Joel Spolsky (from The Guerrilla Guide to Interviewing)
At the time, I’d never properly dealt with pointers, so I made it my goal to understand them, which I did – mainly through reading the great K&R book. In his quote above, Joel refers to the type of ‘doubly-indirected thinking’ that is required to understand pointers – realising that what you have isn’t the object itself, it’s just a reference to the object, and that you can manipulate the pointer without necessarily manipulating the object, and so on.
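Python has no raw pointers, but its variable names behave like references, so the same idea can be sketched there. This is only an analogy of mine (it is not taken from Joel’s article or from K&R), and the variable names are made up:

```python
scores = [1, 2, 3]      # the object itself
alias = scores          # a second reference to the very same object

alias.append(4)         # change the object through one reference...
print(scores)           # ...and the other reference sees the change

alias = [9, 9]          # re-point the reference: the original object is untouched
print(scores)
print(alias is scores)  # the two names now refer to different objects
```

The first half manipulates the object through a reference; the second half manipulates only the reference and leaves the object alone.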
However, I’ve just discovered the next thing along the line from the doubly-indirected thinking that pointers require: it’s the n-indirected thinking that parallel programming requires. That’s what the title refers to – using the mathematical symbolism of << being ‘significantly less than’ (yes there may be a unicode character for this, no I didn’t bother to find it). So – why’s all this parallel programming even harder (conceptually, at least)? Well…with pointers you have one thing (the pointer) which points to another thing (a memory location storing something – an integer, or a double or something). In parallel programming, everything has multiple copies, all of which may (or may not) have different values at any one time.
I’ve been doing some programming using MPI – the Message Passing Interface – recently. The way this works is that each core (that is, individual processing unit – whether it is combined on a piece of silicon with other cores or not) is treated as entirely separate from every other core in terms of memory (this is in distinct contrast to other methods like OpenMP). Therefore, if cores want to exchange data they have to send an explicit message to another core to get the data. This sounds very restrictive, but by carefully distributing data to begin with, you can minimise the amount of communication you need to do.
One other thing about MPI, and the most relevant for this post, is that each process runs exactly the same code – so the whole code runs on each processor (unless you do things like ‘if (process_id == 0)’). This means that, at any point in code, one variable (for example, num_rows, holding the number of rows of the array this processor is operating on) can actually hold different values in each processor. For example, in the code I was writing, this was the same for most processes (as I tried to split up the array evenly) but with a few processes having more or less than the others. So, one variable has multiple values – ok, doesn’t sound too complicated…
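To make that concrete, here is a sketch in plain Python (no MPI involved) of how one variable ends up holding a different value in each process. The function name and the split of 10 rows over 4 processes are illustrative assumptions, not the code I was actually writing. In a real MPI program each process would only ever compute the value for its own rank.

```python
def rows_for_rank(total_rows, num_procs, rank):
    """Number of rows handled by `rank` when rows are split as evenly as possible."""
    base, extra = divmod(total_rows, num_procs)
    # The first `extra` ranks take one additional row each.
    return base + (1 if rank < extra else 0)

# Every process executes the same code, but num_rows differs by rank.
for rank in range(4):
    num_rows = rows_for_rank(10, 4, rank)
    print(f"rank {rank}: num_rows = {num_rows}")
```

With 10 rows and 4 processes, ranks 0 and 1 get 3 rows each and ranks 2 and 3 get 2 rows each, so the single name num_rows stands for four values at once.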
However, when you start sending values from one place to another you realise that you can get in a terrible mental muddle (I find scribbling lots of diagrams helps!) as you’re sending variables from one process to variables in another process that may have different values for all of the other variables. Of course, when you’re dealing with pointers as well you’ve got the confusion of pointers, then the n-indirectedness of dealing with all of the variables. Fun!
(Oh and add to this the realisation that each process can be doing different things at the same time, but that send and receive calls must still match up….and your brain starts to explode!)
Still, all of this parallel programming is worth it – I’ve got really great speedups for some of my code!
Review: Matplotlib for Python Developers by Sandro Tosi
Summary: I’d recommend this for people interested in adding Matplotlib functionality to GUI and web applications, and for those who need a bit more information on how to do advanced plotting with Matplotlib. Most general users will be able to get the information they need from the Matplotlib website.
Reference: Tosi, S., 2009, Matplotlib for Python Developers, Packt Publishing, Birmingham, UK, 293 pages, Amazon Link Publisher’s Website
This book is designed to teach readers how to do two things: make graphs with matplotlib, and embed matplotlib into GUI and web-based applications. My primary focus in my working life is on creating graphs using Python directly, rather than embedding graphs in other applications, although I was interested to learn about the latter, as this may come in useful in a few years’ time (for example, when building GUI front-ends for some of my modelling code). I think this book succeeds more in the latter objective than the former – as much of the first half of the book seems to be repeating information that is easily available on the Matplotlib website – where it is actually explained and phrased better!
Starting at the beginning, the book explains what matplotlib is, and gives a list of good reasons for using Matplotlib. However, there is then a fairly long section on output formats and Matplotlib backends. This is fairly technically detailed, and not totally easy to understand, so presenting it before the ‘Getting Started with Matplotlib’ chapter seems rather strange. This ‘Getting Started with Matplotlib’ section covers the basics of plotting including plotting lines, changing axes, adding gridlines and creating legends. It goes on to explain how to save graphs to files, how to use the interactive plot display window, and how to use the ipython pylab mode. After this, though, there is another strange arrangement of chapters – with a detailed section on matplotlib configuration files. This, again, is fairly detailed, and not hugely useful (as I haven’t yet seen any examples online where people have chosen to modify the configuration files), and it is strange to have it before the (far more useful) section on customising plots. This section is fairly good, covering how to change line styles and colours, polar plots, and plotting text and annotations. However, there is no mention of how to include more than one plot in a figure – which seems a bit of a strange absence.
The next chapter deals with object-oriented Matplotlib – that is, accessing the constituent objects directly rather than through the PyPlot commands. This is interesting, particularly as it isn’t explained particularly well on the Matplotlib website (that really just has an explanation of all of the methods in each class), but it could do with more information on the ‘big picture’ – a basic class diagram showing how all of the different classes (Figure, Axes etc) fit together. I was pleased to see the inclusion of a section on plotting dates – something which I find I need to do quite often, but which often isn’t covered in tutorials and books.
The first part of the book now seems finished, and the book starts the section about integrating Matplotlib with various GUI and web frameworks. However, there is still one more useful chapter for those who aren’t interested in web or GUI programming, and that is the last chapter – ‘Matplotlib in the Real World’ – which covers a number of examples of using Matplotlib in the real world. Again, this seems a slightly strange arrangement of chapters, as none of the content of this chapter depends on anything covered in the GUI and web programming chapters. As for the examples, they are fairly good examples of real-world usage, and use freely-available data from the internet. As with all real-world examples of code – the code is more complicated than it possibly should be in a teaching book – but it does show you how to do everything from obtaining the data to producing the final plot. This chapter also contains a fairly detailed section on plotting maps in Matplotlib – again, a useful section, but it feels like a rather unusual inclusion.
It would be unfair of me to make much comment on the chapters of the book on integrating Matplotlib with GTK+, Qt4 and WxWidgets as the author emphasises that the chapters will not teach you the basics of programming in these GUI frameworks, and I have no experience in using them. However, I understood most of the content of these chapters without having any experience with the GUI frameworks, which bodes well for how useful the chapters will be for those with more experience. I have more experience with web programming, and found this chapter very useful. It gave me a number of ideas for how I could include matplotlib plotting in an interactive data visualisation web-application I am hoping to develop for a monitoring system that I have designed. I was originally planning to use CGI scripts to run a python script which wrote its output to a file, which was then displayed in the page, but through reading this chapter I found a far better way to do it – by using Matplotlib’s functionality to send the output directly to stdout – which, in this case, is the webpage itself. Useful tips are also given about using StringIO to do a similar thing.
Overall it is the second half of the book which is really useful – particularly the sections on using Matplotlib with GUI applications, web applications, and the real-world usage examples. The first two chapters seem to repeat, in a more confusing way, the Matplotlib tutorial available on the project’s website. I must admit to being slightly concerned at the standard of English throughout the book, which may hint at poor proof-reading. Examples include “e is the most common letter in the English writings” and “Matplotlib has a plotting function ad hoc for dates” – not a major problem, but something that grates on me when I’m reading.
(Disclaimer: I was given a free copy of this book by Packt Publishing)
My LaTeX preamble
Since I started my PhD I have forced myself to use LaTeX for all of the documents that I write (yes, absolutely everything), and this has really helped me get to grips with how to do things in LaTeX. Overall I have been very impressed – my documents now look really professional, and LaTeX actually works really well and isn’t that hard to get to grips with.
I thought I’d write this post to explain the various packages that I always load in my LaTeX preamble. I’ve often been meaning to put all of my preamble into a style file so that I can load it in one line, but I’ve never quite got round to it. Maybe I’ll write another blog post about this when I get round to doing it…
Anyway, without further ado, below is my standard LaTeX preamble, with brief comments explaining the packages I’ve loaded. Below the code I go into further detail about the packages I use most.
\documentclass[12pt]{article}
% Pretty much all of the ams maths packages
\usepackage{amsmath,amsthm,amssymb,amsfonts}
% Allows you to manipulate the page a bit
\usepackage[a4paper]{geometry}
% Pulls the page out a bit - makes it look better (in my opinion)
\usepackage{a4wide}
% Removes paragraph indentation (not needed most of the time now)
\usepackage{parskip}
% Allows inclusion of graphics easily and configurably
\usepackage{graphicx}
% Provides ways to make nice looking tables
\usepackage{booktabs}
% Allows you to rotate tables and figures
\usepackage{rotating}
% Allows shading of table cells
\usepackage{colortbl}
% Define a simple command to use at the start of a table row to make it have a shaded background
\newcommand{\gray}{\rowcolor[gray]{.9}}
% Provides extra text symbols (such as \textdegree)
\usepackage{textcomp}
% Provides commands to make subfigures (figures with (a), (b) and (c))
\usepackage{subfigure}
% Typesets URLs sensibly - with tt font, clickable in PDFs, and not breaking across lines
\usepackage{url}
% Makes references hyperlinks in PDF output
\usepackage{hyperref}
% Provides ways to include syntax-highlighted source code
\usepackage{listings}
\lstset{frame=single, basicstyle=\ttfamily}
% Provides Harvard-style referencing
\usepackage{natbib}
\bibpunct{(}{)}{;}{a}{,}{,}
% Provides good access to colours
\usepackage{color}
\usepackage{xcolor}
% Simple command I defined to allow me to mark TODO items in red
\newcommand{\todo}[1] {\textbf{\textcolor{red}{#1}}}
% Allows fancy stuff in the page header
\usepackage{fancyhdr}
\pagestyle{fancy}
% Vastly improves the standard formatting of captions
\usepackage[margin=10pt,font=small,labelfont=bf, labelsep=endash]{caption}
% Standard title, author etc.
\title{COMP6024\\Model Specification based on \citet{Telfer:2010}}
\author{Robin Wilson\\ID: 21985588}
\date{}
% Put text on the left-hand and right-hand side of the header
\fancyhead{}
\lhead{COMP6024}
\rhead{Robin Wilson}
\chead{}
So, which of those are most important, and why do I use them:
- amsmath – This is a package by the American Mathematical Society for typesetting mathematics sensibly. I’m not a mathematician, but I often have to include maths in my documents. Nearly all of the tutorials you read on the internet for doing maths in LaTeX will be for amsmath, and it generally gives very good results (and every possible mathematical symbol/feature you will ever need).
- parskip – I hate indented paragraphs in word-processed documents – I’m sure there are various typesetting rules against doing it. Anyway, this removes all paragraph indentation without screwing anything else up!
- booktabs, rotating and colortbl – These provide everything I need to produce good quality tables. Booktabs gives me a number of different options for alignment, as well as providing \toprule and \bottomrule which give thicker lines for the top and bottom of tables. Rotating allows you to produce sideways tables and figures when they won’t fit properly on a portrait page. Colortbl allows shading of table cells. I’ve defined my own command here because I often want to shade a table row with a light gray. This command (\gray) is simply replaced with \rowcolor[gray]{.9}.
- subfigure – This gives a really simple way to include various image files as one figure, labelling each one as (a), (b), (c), with different captions. Simple, but works really well.
- url – Again, this does one very simple thing very well: it typesets URLs sensibly. Simply wrap the url in \url{} and it typesets it in typewriter font, makes it into a clickable link (in PDF files) and stops it breaking over lines.
- caption – This is one of the most important packages in my list. It allows you to style the captions produced for tables and figures to make them actually look different from the rest of the text. This is very important to help guide the reader around the page. The way I call the package is \usepackage[margin=10pt,font=small,labelfont=bf, labelsep=endash]{caption}, and this makes the caption indented, in a smaller font, with the label (eg. Figure 10) in bold and separated from the rest of the caption by a dash. Again, this really tidies up the appearance of documents.
- fancyhdr – This lets you put useful things in the header of the document. For example, for my university work I put my name on the right-hand side of the header, and the course code on the left-hand side of the header. The simple \lhead and \rhead commands do that for me easily.
I haven’t been through all of my packages above, but the rest of them are fairly easily understandable. In my view, they are the really key packages for almost any type of LaTeX document.
Amazing software you haven’t heard of
Every so often, on my travels around the internet, I come across a piece of software which is so great that I wonder why on earth I haven’t heard of it before. The software listed below falls into this category, and hopefully by posting the list here I will allow more people to find them.
AeroFS
This is an online file-synchronisation service similar to Dropbox but with one key difference: nothing is stored on a cloud server unless you specify that it should be. That is, the synchronisation takes place through a securely encrypted tunnel between your computers running AeroFS, and is never stored in the cloud. This has a number of benefits: it means you can store as much as you want on your AeroFS drive without having to pay for cloud storage, and it means that data is not stored on third party computers (essential for some business applications). It is cross-platform (Windows, Linux, Mac) and free – what more could you want?
Caffeine
This simple app does one simple thing, but is invaluable. Do you ever find that your MacBook screen’s backlight goes off while you’re busy watching a film, showing your family photos, or watching a process complete? By clicking the coffee cup icon that Caffeine puts in your menu bar you can stop the screen backlight from switching off. Simply click the icon again to get it back to normal.
Max
We’ve all done it: suddenly needed to convert an audio file and googled “Convert from X to Y” and found a huge list of ad-riddled pages explaining how to do it if you buy their ghastly shareware software. Although I sometimes like to stick to good-old command-line tools like ffmpeg, I quite like finding a nice GUI tool to do this. Max allows you both to rip CDs (through a variety of methods) and convert audio files that you already have, all through a nice GUI interface, with no dependencies on other software. Unfortunately it’s Mac only.
Evom
Similar to Max, but for video – this program will convert any video files you have to other formats, and download YouTube videos to any format you want. It’ll even let you convert files into just the right format for playing on various hand-held devices (iPods, iPads, mobile phones etc).
DTerm
I’ve mentioned DTerm before on my blog, and I really can’t live without it on my Mac now. It allows you to quickly open a simple command prompt in any directory, and execute a command there (with full output shown), or switch immediately to a terminal focussed on that folder, ready to do any other processing you might need. It does full command-line completion, and I haven’t yet found a command that won’t work in DTerm’s terminal.
Git helps me get round to using source control
I’ve always heard how source control should be used for every project, including those which you think are just going to be throwaway code. However, I’ve often not got around to doing this – if I write a piece of code in ten minutes, but it takes five minutes to set up a new subversion repository on my server, then it seems like a waste of time to set it all up (it probably isn’t, given how version control can help further down the line, but because of this perception it often doesn’t get done).
However, git has changed that. When working with git all I need to do is type git init and I have a brand new git repository in my folder. I can then quickly shove in a .gitignore file from my stash (acquired from this git repository), add all of the files (git add *) and commit. I can do anything I want with this repository without having to connect it to a remote server. If, later in the project, I want to put it all on a remote server, I can just create a new GitHub repository and add it as a remote.
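The same routine is easy to script. Below is a rough Python sketch, assuming git is installed and on your PATH. The function name, the ignore patterns and the commit message are placeholders of my own. It uses git add --all rather than git add * so that the .gitignore file itself gets added too (a shell * skips files whose names start with a dot).

```python
import subprocess
from pathlib import Path

def start_repo(folder, ignore_patterns, message="Initial commit", run_git=True):
    """Write a .gitignore in `folder`, then init, add and commit with git."""
    folder = Path(folder)
    # Drop in a .gitignore before anything is added.
    (folder / ".gitignore").write_text("\n".join(ignore_patterns) + "\n")
    commands = [
        ["git", "init"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
    ]
    if run_git:
        for command in commands:
            # check=True stops at the first git command that fails
            subprocess.run(command, cwd=folder, check=True)
    return commands
```

Calling start_repo(".", ["*.pyc", "__pycache__/"]) in a project folder would set everything up in one go. Passing run_git=False only writes the .gitignore and returns the commands, which is handy if you want to see what would be run first.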
As this is just so easy, I’ve put far more code into source control, and it’s really saved my bacon a number of times!
(Of course, this applies to other version control systems too – it’s just that I use git)
