Robin's Blog

Android + OpenCV = Confusion…

May 11, 2018

This is another post that I found sitting in my drafts folder… It was written by my wife while she was doing her Computer Science MSc about 18 months ago. I expect that most of what she says is still correct, but things may have changed since then. Also, please don’t comment asking questions about Android and OpenCV – I have no experience with it, and my wife isn’t writing for Android these days.

Hello, I’m Olivia, Robin’s wife, and I thought I’d write a guest post for this blog about how to get Android and OpenCV playing nicely together.

Having just started writing for Android, I was astonished at the amount of separate tutorials I had to follow just to get started. It didn’t help that:

I had never programmed for Android
I had never programmed in Java (yes, I know)
I had never used a complex code building IDE
I needed to get OpenCV working with this

Needless to say I was a little flummoxed by the sheer amount of things that I had to learn. The fact that methods can only have one return type still bewilders me (coming from a Python background, where methods can easily return loads of different objects, of all sorts of different types).

So for those of you who are in a similar situation I thought I’d provide a quick start guide which joins together the tutorials I found the best to create your first Android app using OpenCV.

First things first, you will need to have the following downloaded:

Android Studio (when installing pretty much tick everything, just to make sure!)
OpenCV for Android

First start by following Google’s first Android tutorial Building Your First App through to Starting Another Activity (if you get to Supporting different devices then you’ve gone too far). This takes you through setting up a project, the files you will need to edit, setting up an emulator (or using a actual Android device), running the app, creating a user interface, and possibly most importantly Android ‘Activities’. It also gives a brilliant introduction in the sidebars into the concepts involved. If you just want to get things working with OpenCV the only parts you need to follow is up to Running Your Application.

To get OpenCV working within Android Studio you need to have unzipped the download. Then follow the instructions from this blog post, using whichever OpenCV version you have. This should get everything set up properly, but now we need to check everything has worked and that our app compiles.

Add these import statements at the top of your activity file:

import org.opencv.core.CvType;
import org.opencv.core.Mat;

You should then add this code within the Activity class before any other methods.

static {
// If you use opencv 2.4, System.loadLibrary("opencv_java")
 System.loadLibrary("opencv_java3");
}

Then add the following into the onCreate method at the bottom:

Mat test = new Mat(200, 200, CvType.CV_8UC1);
Imgproc.equalizeHist(test, test);

(the example code above was taken from here).

If you now try and run this, it should work, if it doesn’t it may suggest that OpenCV isn’t properly loaded. If this doesn’t work, then the comments on this blog post are fairly comprehensive – your problem has probably already been solved!

If you understand all of this and just want to have a ready made ‘project template’ to use all of this then I have put one together at https://github.com/oew1v07/AndroidOpenCVTemplate where all the libraries are already in the required folders. To get this working follow these steps:

Clone the repository using git clone https://github.com/oew1v07/AndroidOpenCVTemplate.git
Open Android Studio and choose Open an existing Android Studio project
Navigate to the cloned folder and click Open.
Using the Project button on the left side navigate to Gradle Scripts/build.gradle (Module: app). The settings here are not necessarily the ones that you will want to use. The following example from here shows in blue the areas you might want to change.
- compileSdkVersion is the Android version you wish to compile for
- buildToolsVersion are the build tools version you want to use
- minSdkVersion is the minimum Android version you want the app to run under
- targetSdkVersion is the version that will be the mostly commonly used.
- applicationId should reflect the name of your app. However this is very difficult to change after the initial setting up. The one in the cloned repository is com.example.name.myapplication. Should you wish to change this there is a list of places you would need to edit, in a list at the bottom of this article.

apply plugin: 'com.android.application'

android {
compileSdkVersion 22
buildToolsVersion "22.0.1"

defaultConfig {
applicationId "com.haidermushtaq.bullincarnadine.piclone"
minSdkVersion 15
targetSdkVersion 22
 versionCode 1
 versionName "1.0"
 }
 buildTypes {
 release {
 minifyEnabled false
 proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
 }
 }
}

dependencies {
 compile fileTree(include: ['*.jar'], dir: 'libs')
 compile 'com.android.support:appcompat-v7:22.1.1' 
}

In the file local.properties you need to change sdk.dir to the directory where the Android SDK is installed on your computer.
Once this is finished you should be able to run it with no problems, and it should display a text message saying “Elmer is the best Elephant of them all!” (for an explanation of this message, see here)

Hopefully this was helpful to other people in a similar situation to me – and brought all of the various tutorials together.

List of places to change app name

These files and folders can be navigated from the cloned repository.

Where	What to change	What to change it to
`app/build.gradle`	com.example.name.myapplication	Whatever you want this to be called. Generally something along the lines of `com.example.maker.appname`
`app/src/androidTest/java/com/example/`	The folder ‘name’	maker
`app/src/androidTest/java/com/example/name`	The folder ‘myapplication’	appname
`app/src/androidTest/java/com/example/name/myapplication/ApplicationTest.java`	package com.example.name.myapplication	package com.example.maker.appname
`app/src/main/AndroidManifest.xml`	package com.example.name.myapplication	package com.example.maker.appname
`app/src/main/java/com/example/`	The folder ‘name’	maker
`app/src/main/java/com/example/name`	The folder ‘myapplication’	appname
`app/src/main/java/com/example/name/myapplication/MyActivity.java`	package com.example.name.myapplication;	package com.example.maker.appname
`app/src/main/res/layout/activity_display_message.xml`	tools:context=”com.example.name.myapplication.DisplayMessageActivity”	tools:context=”com.example.maker.appname.DisplayMessageActivity”
`app/src/main/res/layout/activity_my.xml`	tools:context=”com.example.name.myapplication.MyActivity”	tools:context=”com.example.maker.appname.MyActivity”
`app/src/main/res/layout/content_display_message.xml`	tools:context=”com.example.name.myapplication.DisplayMessageActivity”	tools:context=”com.example.maker.appname.DisplayMessageActivity”
`app/src/main/res/menu/menu_my.xml`	tools:context=”com.example.name.myapplication.MyActivity”	tools:context=”com.example.maker.appname.MyActivity”
`app/src/main/res/values/strings.xml`	If you want to change the name of the app (when on the phone). Change <string name=”app_name”>My App</string>	Whatever you want to call it
`app/src/test/java/com/example/`	The folder ‘name’	maker
`app/src/test/java/com/example/name`	The folder ‘myapplication’	appname
`app/src/test/java/com/example/name/myapplication/ExampleUnitTest.java`	package com.example.name.myapplication;	package com.example.maker.appname

Share your thoughts

Cloud frequency map for Europe & update to cloud frequency web app

May 8, 2018

I’ve posted before about the cloud frequency map that I created using Google Earth Engine. This post is just a quick update to mention a couple of changes.

Firstly, I’ve produced some nice pretty maps of the data from 2017 over Europe and the UK respectively. I posted the Europe one to the DataIsBeautiful subreddit and got quite a few upvotes, so people obviously liked the visualisation. The two maps are below – click on the images to get the full resolution copies.

Interestingly, you can see quite a lot of artefacts around the coast – particularly in the UK one. I think this is a problem with the algorithm that occurs around coasts – or at least discontinuities from the different algorithms used over land and water.

I’ve also updated the interactive cloud frequency web app to use data from 2017.

1 Comment

Attending EGU in a wheelchair

May 3, 2018

This is an old post that I found stuck in my ‘drafts’ folder – somehow I never got round to clicking ‘publish’. I attended EGU in 2016, and haven’t been back since – so things may have changed. However, I suspect that the majority of this post is still correct.

Right, so, in case you hadn’t guessed from the title of this post: I use a wheelchair. I won’t go into all of the medical stuff, but in summary: I can’t walk more than about 100m without getting utterly exhausted, so I use an electric wheelchair for anything more than that. This has only happened relatively recently, and I got my new electric wheelchair about a month ago.

For a number of years I’d been trying to go to either the European Geophysical Union General Assembly (EGU) or the American Geophysical Union Fall Meeting (AGU), but I either hadn’t managed to get any funding, or hadn’t been well enough to travel.

This year, I’d had two talks accepted for oral presentation at EGU, and had also managed to win the Early Career Scientists Travel Award, which would pay my registration fees. So, if I was going to go, then this year was the right time to do it… I decided to ‘bite the bullet’ and go – and it actually went very well.

However, before I went I was quite worried about the whole process: I hadn’t flown with my wheelchair before, I didn’t know how accessible the conference centre would be, I was worried about getting too tired, or getting rude comments about being in a wheelchair, and so on. The rest of this post is going to be very detailed – and some of you may wonder why on earth I’ve gone in to so much detail. The reason is explained very well in this post by Hannah Ensor:

Friends, family – and even strangers – often want to be supportive. The most common phrase is: “Don’t worry, I’m sure it will be fine.” After all, accessibility is a legal requirement so that shouldn’t be a problem. And you want to make me feel better about the trip.

But think about it: Do you really understand all my needs and differences, and have an equally detailed knowledge and of everything that might present challenges, and suitable solutions to each one from the moment I leave my home until I return to it again? Have you inspected the accessible loos and checked the temperature control in the rooms? Do you realise how many places that call themselves ‘accessible’ have steps to the bathroom or even steps to the entrance?

Most of the information I could find before going was of this ‘everything will be fine; it is accessible’ type – and so that’s why I am putting all of the details in this post: hopefully it will make someone else’s life far easier in the future. Although I’m writing primarily from the point of view of a wheelchair user, I’ve also tried to think about some of the issues that people with other disabilities may experience.

I’ll start with the things that will be most generally applicable – in this case, the venue. EGU is held at the Austria Centre Vienna, close to the Danube in Vienna. The EGU website helpfully stated “The conference centre is fully-accessible”, but gave no further details – and most disabled people have learnt from painful experience not to trust these sorts of statements.

Luckily, that statement was actually true. Most people will get to the conference centre from the local U-bahn station (KaisermÃ¼hlen-VIC – which, like all of the U-bahn stations, has lifts to each platform) or one of the local hotels. There are various sets of steps along the paths leading to the conference centre, but there are always nice long sloping ramps provided too:

The entrance to the conference centre is large and flat. There are automatic doors into the ‘entrance hall’, and then push/pull doors into the conference centre itself (they are possible to open in a wheelchair, but most of the time people held them open for me).

Some of the poster halls are in a separate building (although they can be accessed by an underground link from the main conference centre). The entrance here isn’t totally flat: there is a small, but significant bump (‘mini step’) which my electric wheelchair didn’t like. Going in the door backwards worked, but that can be a bit difficult if there are lots of people around.

Each floor of the conference centre is entirely on the level, and there are multiple sets of lifts (two in each set, and I think there are three locations in the buildings):

The lifts are of a reasonable size, with enough space for me to turn my wheelchair around inside them. I could reach the buttons inside the lifts, but people who have shorter arms than me might struggle to reach from their chair. Also, as far as I could see, there were no braille markings on any of the buttons, which would make it difficult for a visually-impaired person to use the lifts.

The other minor issue with the lifts was that it was sometimes difficult to reach the call buttons for the lifts, as a set of recycling bins were usually located directly in front of the call buttons (I have no idea why…). I could reach ok most of the time, but people with shorter arms than me would struggle.

One thing that you may have noticed from the pictures above is how well-signed everything is: this was a really pleasing aspect of the conference organisation. As you may also have noticed from the maps, the building is symmetrical in a number of axes, so it could be hard to work out where you were (everything kinda looked the same…) – so the maps and signs were much appreciated!

The rooms that were actually used for the talks varied significantly in size from small ‘classroom/seminar room’ size to large ‘auditorium’ size. I didn’t manage to get many photos of the rooms because I was usually busy either listening to a talk or giving a talk, but here is an example of one of the smaller rooms:

As you can see, the floor is entirely flat here, so it is very easy to get anywhere you need to get to. When listening to talks in these sorts of rooms I tended to place my wheelchair at the end of one of the rows of seats, or – if appropriate – ask someone to move one of the seats out of the way to give me space to fit into a row properly.

When I gave a talk in one of these sorts of rooms, I spoke from my wheelchair at the front of the room (directly underneath the projection screen), with a portable microphone and a remote to change the slides. I didn’t use the official lecturn as in my chair I would have been hidden entirely behind it – and I’m not sure the microphone would have reached properly!

I haven’t got a photo of any of the larger rooms, but they have a stage at the front – which obviously makes things a bit more difficult for wheelchair users. I was offered a range of ways to present in that room: from my wheelchair on the main floor (ie. not up on the stage), or walking up the stairs to the stage and presenting from a seat behind the lecturn, or presenting from a seat behind the convenor’s table on the stage. I chose the latter option, with a microphone and laptop to control the slides – but any of them would have worked.

Each time when trying to sort out the arrangements for my talk, I found the ‘people in yellow’ (the EGU assistants in yellow t-shirts who sort out the presentations, laptops and so on) to be very helpful in arranging anything I needed.

I didn’t do a PICO presentation, but I attended a number of PICO sessions, and was impressed to see that each ‘PICO Spot’ had a lower screen for wheelchair users:

The exhibition area was generally accessible with two unfortunate exceptions: both the Google Earth Engine stand and the EGU stand were on a raised platform about 3 inches off the floor…very frustrating! I expressed my frustration to the people on the EGU stand and was assured that this wouldn’t be the case next year. All of the rest of the stands were flat on the ground and I could access them very easily.

I’ve left one of the most important things to last…a real essential item: disabled toilets. As many disabled people will know, disabled toilets can leave a lot to be desired. However, I was generally impressed with the conference centre’s toilets:

It’s difficult to show the full size of the toilet without a fisheye lens, but they were large enough to get my wheelchair in and still have a fair amount of space to move around (far better than the sort of ‘disabled’ toilets that will barely fit a wheelchair!). They were also clean, nicely decorated, and had all of the extra handles and arm-rests that should be present. What’s more, the space next to the toilet itself was kept free so that if you needed to transfer directly from a chair to the toilet then that would be possible.

All of the toilets in the main part of the conference centre were like the example above – very impressive – but unfortunately the toilets in the other building (which contained poster halls X1-4) weren’t as good. They were still better than some toilets I’ve used, but they had turned into a bit of a store-room making them cluttered and difficult to manouvere around, and also stopping anyone from transferring directly from a chair to the toilet. In this building the disabled toilets were also located in such a way that the open door to the disabled toilet would block the entrance to one of the ‘normal’ toilets…and this means that when you open the door to come out of the toilet it is quite easy to almost knock someone over!

In summary, things worked remarkably well, and accessibility was good. I would have no hesitation in attending EGU again, and using my wheelchair while there.

1 Comment

Some recent papers of mine

April 23, 2018

I’ve just realised that I haven’t posted about the last few papers that I’ve authored. Some of these came from before I stopped paid work due to ill-health, and some were based on work that was done before I stopped work, but have only been published since. Anyway, on with the papers:

Rapid and near real-time assessments of population displacement using mobile phone data following disasters: the 2015 Nepal earthquake

Link to article

I am first author of this paper, alongside over a dozen others from the Flowminder Foundation. It details work that we did in response to the 2015 Nepal earthquake, where we used mobile phone data to analyse population movements resulting from the disaster. I led this project, and – a while after we’d finished providing data to the aid agencies in Nepal – I co-ordinated the writing of the paper.

Rather than repeat things, I’ll point you to the detailed write-up on the Flowminder website, and suggest that you watch the video below:

https://www.youtube.com/watch?v=hY9g0kPi8BA

This video was part of our submission to the Global Mobile Awards 2016, and I’m very proud to say that we won in the Mobile in Emergency or Humanitarian Situations category. The judges commented that this was ‘A brilliant example of how the application of big data analysis to mobile technologies can be used to accelerate emergency aid, and provide intelligence to help prepare for future disasters.’ I’m also pleased to say that this paper has 26 citations (at the time of writing) – tying for first place with my Py6S paper.

Reference: Wilson, R., zu Erbach-Schoenberg, E., Albert, M., Power, D., Tudge, S., Gonzalez, M., Guthrie, S., Chamberlain, H., Brooks, C., Hughes, C. and Pitonakova, L. et al., 2016, Rapid and near real-time assessments of population displacement using mobile phone data following disasters: the 2015 Nepal Earthquake. PLoS currents, 8.

Predictors of Daily Mobility of Adults in Peri-Urban South India

Link to article

This paper came out of a collaboration with the London School of Hygiene and Tropical Medicine. The collaboration actually started with me providing some satellite-derived estimates of air pollution over a study area near Hyderabad, India for a project they were working on there. Alongside these I also provided some other satellite products, including satellite-derived night-time lights intensity data. They were interested in this for a number of other analyses that were taking place in that study area, and so I did some more work on the night-time lights data, calibrating it and providing data over the villages in the Andhra Pradesh Children and Parents Study (APCAPS). This night-time light data was then used as a measure of ‘urbanicity’ (the extent to which an area is urban) for each village, something which has been found to have a significant impact on health. In this study, villages which had a higher urbanicity were associated with more mobility in and around home for both women and men.

Although night-time lights data has been used as a measure of urbanicity before, it is still relatively novel – and it has not been applied as a predictor for mobility before. This has potentially useful implications for predicting mobility of populations – and this is important because mobility has a significant impact on health (for example, affecting exposure to sources of disease).

Reference: Sanchez, M., Ambros, A., Salmon, M., Bhogadi, S., Wilson, R.T., Kinra, S., Marshall, J.D. and Tonne, C., 2017. Predictors of daily mobility of adults in peri-urban South India. International journal of environmental research and public health, 14(7), p.783.

Is increasing urbanicity associated with changes in breastfeeding duration in rural India? An analysis of cross-sectional household data from the Andhra Pradesh children and parents study

Link to article

This paper is another resulting from my collaboration with the London School of Hygiene and Tropical Medicine – and again it uses the night-time light intensity data that I processed for them. It is used as a measure of urbanicity again – although this time urbanicity is categorised as low, medium or high, and used as a predictor of breastfeeding duration. It was found that higher urbanicity was linked with a shorter duration of breastfeeding – something which is important, as longer breastfeeding duration is linked to many health benefits. Interestingly, this paper was published in the year that my son was born – so I was observing a lot of breastfeeding (and the issues associated with it) in my personal life.

Reference: Oakley, L., Baker, C.P., Addanki, S., Gupta, V., Walia, G.K., Aggarwal, A., Bhogadi, S., Kulkarni, B., Wilson, R.T., Prabhakaran, D. and Ben-Shlomo, Y., 2017. Is increasing urbanicity associated with changes in breastfeeding duration in rural India? An analysis of cross-sectional household data from the Andhra Pradesh children and parents study. BMJ open, 7(9), p.e016331.

Share your thoughts

Roadside leafiness from space – Part 2

April 17, 2018

In the last post in this series, I showed some pretty maps of roadside leafiness, created by extracting NDVI values near roads – like this:

This time, I want to move away from pretty images to some numerical analysis – and also move from a local scale to a national scale.

My first question was: which is the leafiest place in the country?

Now, that question requires a bit of refining – firstly to work out what we mean by a "place". If we look in the countryside then we’re going to find very leafy roads – so we need to restrict this to urban areas. Luckily, the Office for National Statistics have produced a dataset that helps us here: the Built-Up Areas Boundary dataset. This data, from 2011, is automatically created from Ordnance Survey data, and consists of vector outlines of built-up areas in England. We can take these polygons and extract the average leafiness in each polygon and produce a map like this:

You can see here that lots of built-up areas are present, including many very small areas. You can also see that some areas are very large – built-up areas within a certain distance of each other are merged, which creates a ‘South Hampshire Built-Up Area’ which includes Southampton, Eastleigh, Chandlers Ford, Fareham, Gosport, Portsmouth and more. It’s rather frustrating from the perspective of our analysis, but I suppose it shows how built-up this area of South Hampshire is.

We can go ahead and export this aggregated leafiness data to CSV (I chose to aggregate using mean and median to compare them) to continue the analysis in Python. If you want to see how I analysed the CSV data then have a look at this notebook – but it’s fairly simple pandas analysis, and the main results are reproduced here.

So – which are the leafiest places in the country?

Name	Leafiness
Winchester	0.23
Northwich	0.18
Maidenhead	0.18
Heswall	0.17
Worcester	0.16

You can see that Winchester comes out top by a fair way. Winchester is probably the only one of these that I would have immediately thought of as very leafy – but that probably says more about my preconceptions than anything else. These were calculated using mean leafiness, but the top four stay the same, and the fifth entry is Great Malvern. I think that is likely an anomaly as the built-up area outline for Great Malvern actually includes Malvern Link, West Malvern, Colwall and a number of other smaller settlements nearby – and includes the green areas between these individual settlements.

The lowest areas are:

Name	Leafiness
Grays	0.05
Thanet	0.05
Greater London	0.06
Stevenage	0.06
Exeter	0.07

Embarassingly, I’d never heard of Grays – but it turns out it is on the banks of the Thames in East London, just east of the Dartford Crossing.

The most variable areas are:

Name	CV
Thanet	0.90
Felixstowe	0.67
Reading	0.64
Blackpool	0.62
Greater Manchester	0.60

and the least variable areas are:

Name	CV
Bath	0.22
Yeovil	0.23
Stafford	0.25
York	0.25
Durham	0.27

We can also plot mean against coefficient of variation, which allows us to examine where areas are placed when considering both the leafiness and the variability of this leafiness. The image below shows a static version of this graph – I actually produced an interactive version using my code to easily produce Bokeh plots with tooltips, and that interactive version can be found at the bottom of the notebook used to do the analysis.

I also did a bit more analysis, extracting leafiness over all Lower Layer Super Output Areas (LSOAs) in Southampton, and joining the data with the Index of Multiple Deprivation from the 2011 census – aiming to discover if there is a link between leafiness and the deprivation of an area. Short answer: there isn’t – as the graph below demonstrates:

So, that’s the end of a fun bit of analysis – thanks to James O’Connor again for the idea.

Share your thoughts

Roadside leafiness from space – Part 1

April 3, 2018

If you just want to see some pretty maps of roadside leafiness then scroll down…otherwise, start at the top to find out how I did this.

I’ve recently started doing a little bit of satellite imaging work again, and started off with a project that was inspired by a post on James O’Connor’s blog. He posted about looking at the leafiness of streets in cities by extracting the NDVI (Normalized Difference Vegetation Index) within 20m of a road. He did this for Cleveland, Ohio using data from the Landsat 5 sensor, and produced a nice output map showing some interesting patterns.

This caught my interest, and I decided to see if I could scale up the analysis to larger areas. My go-to tool for doing simple but large-scale remote sensing work is Google Earth Engine – a free tool provided by Google which can easily process large volumes of satellite imagery stored on Google’s servers. To use this tool you have to be a ‘registered tester’, as the tool is not fully released yet, but it is easy to register to become a tester.

For this particular task I needed two datasets: some relatively high-resolution satellite imagery to use to calculate the NDVI, and some vector data of the road network. Conveniently, Google Earth Engine already contains datasets that fulfil these criteria: all of the Sentinel-2 data is in there, which provides visible and near-infrared bands at 10-30m resolution (depending on the band – the ones that we need for NDVI calculation are provided at 10m), and the TIGER Roads dataset containing vector data on roads across the USA. I thought this would give me a good starting point – although I was keen to perform this analysis for the UK, so I decided to find an equivalent vector roads dataset for the UK. I downloaded OS Open Roads, and then uploaded it to Earth Engine so it could be used in my analysis. (Actually, there was a bit more to it than this, as the Open Roads data is provided in 100km tiles, so I had to stitch all of these together before uploading – I really wish OS would provide the data in a single large file. I then ran into various problems with projections – see my questions to the Google Earth Engine developers – but eventually managed to get a reasonable accuracy by uploading in the WGS-84 projection.)

The code to perform the analysis is actually very simple, and is shown below. For those of you who have access to Earth Engine, this link will open the code up in the Earth Engine workspace. In summary, what we do is select all of the Sentinel-2 data for 2017, and take the pixel-wise median (this is a very crude way of filtering out clouds, shadows and other anomalies – leaving a relatively representative view of the ground). We then calculate the NDVI and extract pixels within 10m of a road (I chose 10m as this was more likely to just get vegetation that was at the edge of the road, and was possible as Sentinel-2 is a higher resolution sensor than Landsat 5). The easiest way to do this extraction is to create a raster of distance from the vector roads data (putting in a small maximum distance to search to increase computational efficiency) and then use this to mask the NDVI. Finally, we visualise the data on the map provided in the Earth Engine workspace, and download it as a GeoTIFF for further analysis.

var roads = ee.FeatureCollection("TIGER/2016/Roads"),
    sentinel2 = ee.ImageCollection("COPERNICUS/S2"),
    roads_uk = ee.FeatureCollection("users/rtwilson/RoadLink_All_WGS84"),

var roads_uk_fc = ee.FeatureCollection(roads_uk);

// Load Sentinel collection, filter to 2017, select B4 (Red) and B8 (NIR)
// and create a median composite
var s2_median = sentinel2
                .filterDate('2017-01-01', '2017-12-31')
                .select(['B4', 'B8'])
                .median();

// Calculate NDVI
var ndvi = s2_median.normalizedDifference(['B8', 'B4']).rename('NDVI');

// Calculate distance from road for each pixel
// The first parameter (10) is the maximum distance to look, in metres
// Looking 10m in each direction gives a buffer of 20m
var dist = roads_uk_fc.distance(10)

// Threshold the distance to a road to get a binary image for masking with
var thresh_distance = dist.lt(10);

// Use the distance threshold image to mask the NDVI
// var masked_ndvi = ndvi.mask(thresh_distance);
var masked_ndvi = ndvi.multiply(thresh_distance);

Map.addLayer(masked_ndvi, {}, 'NDVI near roads');

// Export the image, specifying scale and region.
Export.image.toDrive({
  image: masked_ndvi,
  description: 'Masked_NDVI',
  scale: 10,
  region: area_to_export,
  crs: "EPSG:27700",
  maxPixels: 8185883500
});

This further analysis was carried out in QGIS – and there are a few more posts coming on some issues that I ran into while doing the QGIS-analysis, and a few tips of useful GDAL functionality that can help with this sort of analysis. In this post I’m going to focus on the visualisation I did and show some pretty pictures. In the next post I will show a bit more quantitative analysis comparing leafiness across settlements in the UK.

Anyway, on to the pictures. I used the viridis colourmap as this is generally a well-designed colourmap, and quite appropriate in this context as the high values are green/yellow. The first set of images below are all on the same colour scale, so you can compare colours across the images.

First, of course, I looked at my home town, Southampton:

You can see some interesting patterns here. There is a major area of high leafiness in the Upper Shirley area, to the west of the common, with some high areas also in Highfield and Bassett. Most of the city centre was very low – interestingly, even including the areas around the major parks – and most of the other areas of the city were quite low as well. There seems to be a relationship between leafiness and the wealth of an area, with relatively wealthy areas having high leafiness and relatively poor areas having low leafiness – as you’d expect.

Moving slightly north to Eastleigh and Chandlers Ford, you see a very noticeable hot spot:

The area of Hiltingbury, at the northern end of Chandlers Ford, is very leafy – which definitely matches up with my experience visiting there! In general, the western side of Chandlers Ford is leafier, with the eastern side of Chandlers Ford having almost as low values as Eastleigh. In general Chandlers Ford is considered significantly more desirable than Eastleigh, so I’m slightly surprised that there isn’t more of a difference. The centre of Eastleigh is extremely low – as these streets are mostly terraced or semi-detached houses with little vegetation around (either trees or front gardens) – but again, I’m slightly surprised to see the hot spot on the west of Eastleigh.

Zooming out to encompass multiple settlements in southern Hampshire, we see a very obvious pattern:

Winchester is completely dominating the area, with far higher values even than most rural areas. Again, this matches up with expectations given that Winchester is a wealthy city, and is known as a nice, leafy place to live. The rural area between Eastleigh and Winchester is also very high, as is – slightly surprisingly – the area around Bursledon to the east of Southampton.

Zooming in on Winchester shows the pattern within the city:

There is a significant east-west divide, with the estates of detached houses that have grown up on the west of the city having very high leafiness, and the older terraced and semi-detached areas on the east of the city having a far lower leafiness.

So, there are some interesting patterns here and some pretty maps. In Part 2, I’ll look at some statistical analysis of leafiness across the England, Wales and Scotland, and then consider some of the limitations of this approach.

4 Comments

Quick way to delete columns from a shapefile

March 10, 2018

I haven’t posted anything on this blog for a long time – sorry about that. I’ve been quite ill, and had a new baby – so blogging hasn’t been my top priority. Hopefully I’ll manage some slightly more regular posts now. Anyway, on with the post…

I recently needed to delete some attribute columns from a very large (multi-GB) shapefile. I had the shapefile open in QGIS, so decided the easiest way would be to do it through the GUI as follows:

Open up the attribute table
Turn on editing (far left toolbar button)
Click the Delete Field button (third from the right, or press Ctrl-L) and select the fields to delete

I was surprised to find that this took ages. It seemed to refresh the attribute table multiple times throughout the process (maybe after deleting each separate field?), and that took ages to do (because the shapefile was so large).

I then found I needed to do this process again, and looked for a more efficient way – and I found one. Unsurprisingly, it uses the GDAL/OGR command-line tools – a very helpful set of tools which often provide superior features and/or performance.

Basically, rather than deleting fields, copy the data to a new file, selecting just the fields that you want. For example:

ogr2ogr -f "ESRI Shapefile" -sql "SELECT attribute1, attribute2 FROM input" output.shp input.shp

This will select just the columns attribute1 and attribute2 from the file input.shp.

Surprisingly this command doesn’t actually produce a full shapefile as an output – instead of producing output.shp, output.shx, output.prj and output.dbf (the full set of files that constitute a ‘shapefile’), it just creates output.dbf – the file that contains the attribute table. However, this is easily fixed: just copy the other input.* files and rename them as appropriate (or, if you don’t want to keep the input data, then just rename output.dbf as input.dbf).

5 Comments

Regression in Python using R-style formula – it’s easy!

August 20, 2016

I remember experimenting with doing regressions in Python using R-style formulae a long time ago, and I remember it being a bit complicated. Luckily it’s become really easy now – and I’ll show you just how easy.

Before running this you will need to install the pandas, statsmodels and patsy packages. If you’re using conda you should be able to do this by running the following from the terminal:

conda install statsmodels patsy

(and then say yes when it asks you to confirm it)

import pandas as pd
from statsmodels.formula.api import ols

Before we can do any regression, we need some data – so lets read some data on cars:

df = pd.read_csv("http://web.pdx.edu/~gerbing/data/cars.csv")

You may have noticed from the code above that you can just give a URL to the read_csv function and it will download it and open it – handy!

Anyway, here is the data:

df.head()

	Model	MPG	Cylinders	Engine Disp	Horsepower	Weight	Accelerate	Year	Origin
0	amc ambassador dpl	15.0	8	390.0	190	3850	8.5	70	American
1	amc gremlin	21.0	6	199.0	90	2648	15.0	70	American
2	amc hornet	18.0	6	199.0	97	2774	15.5	70	American
3	amc rebel sst	16.0	8	304.0	150	3433	12.0	70	American
4	buick estate wagon (sw)	14.0	8	455.0	225	3086	10.0	70	American

Before we do our regression it might be a good idea to look at simple correlations between columns. We can get the correlations between each pair of columns using the corr() method:

df.corr()

	MPG	Cylinders	Engine Disp	Horsepower	Weight	Accelerate	Year
MPG	1.000000	-0.777618	-0.805127	-0.778427	-0.832244	0.423329	0.580541
Cylinders	-0.777618	1.000000	0.950823	0.842983	0.897527	-0.504683	-0.345647
Engine Disp	-0.805127	0.950823	1.000000	0.897257	0.932994	-0.543800	-0.369855
Horsepower	-0.778427	0.842983	0.897257	1.000000	0.864538	-0.689196	-0.416361
Weight	-0.832244	0.897527	0.932994	0.864538	1.000000	-0.416839	-0.309120
Accelerate	0.423329	-0.504683	-0.543800	-0.689196	-0.416839	1.000000	0.290316
Year	0.580541	-0.345647	-0.369855	-0.416361	-0.309120	0.290316	1.000000

Now we can do some regression using R-style formulae. In this case we’re trying to predict MPG based on the year that the car was released:

model = ols("MPG ~ Year", data=df)
results = model.fit()

The ‘formula’ that we used above is the same as R uses: on the left is the dependent variable, on the right is the independent variable. The ols method is nice and easy, we just give it the formula, and then the DataFrame to use to get the data from (in this case, it’s called df). We then call fit() to actually do the regression.

We can easily get a summary of the results here – including all sorts of crazy statistical measures!

results.summary()

OLS Regression Results
Dep. Variable:	MPG	R-squared:	0.337
Model:	OLS	Adj. R-squared:	0.335
Method:	Least Squares	F-statistic:	198.3
Date:	Sat, 20 Aug 2016	Prob (F-statistic):	1.08e-36
Time:	10:42:17	Log-Likelihood:	-1280.6
No. Observations:	392	AIC:	2565.
Df Residuals:	390	BIC:	2573.
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	-70.0117	6.645	-10.536	0.000	-83.076 -56.947
Year	1.2300	0.087	14.080	0.000	1.058 1.402

Omnibus:	21.407	Durbin-Watson:	1.121
Prob(Omnibus):	0.000	Jarque-Bera (JB):	15.843
Skew:	0.387	Prob(JB):	0.000363
Kurtosis:	2.391	Cond. No.	1.57e+03

We can do a more complex model easily too. First lets list the columns of the data to remind us what variables we have:

df.columns

Index(['Model', 'MPG', 'Cylinders', 'Engine Disp', 'Horsepower', 'Weight',
       'Accelerate', 'Year', 'Origin'],
      dtype='object')

We can now add in more variables – doing multiple regression:

model = ols("MPG ~ Year + Weight + Horsepower", data=df)
results = model.fit()
results.summary()

OLS Regression Results
Dep. Variable:	MPG	R-squared:	0.808
Model:	OLS	Adj. R-squared:	0.807
Method:	Least Squares	F-statistic:	545.4
Date:	Sat, 20 Aug 2016	Prob (F-statistic):	9.37e-139
Time:	10:42:17	Log-Likelihood:	-1037.4
No. Observations:	392	AIC:	2083.
Df Residuals:	388	BIC:	2099.
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	-13.7194	4.182	-3.281	0.001	-21.941 -5.498
Year	0.7487	0.052	14.365	0.000	0.646 0.851
Weight	-0.0064	0.000	-15.768	0.000	-0.007 -0.006
Horsepower	-0.0050	0.009	-0.530	0.597	-0.024 0.014

Omnibus:	41.952	Durbin-Watson:	1.423
Prob(Omnibus):	0.000	Jarque-Bera (JB):	69.490
Skew:	0.671	Prob(JB):	8.14e-16
Kurtosis:	4.566	Cond. No.	7.48e+04

We can see that bringing in some extra variables has increased the $R^2$ value from ~0.3 to ~0.8 – although we can see that the P value for the Horsepower is very high. If we remove Horsepower from the regression then it barely changes the results:

model = ols("MPG ~ Year + Weight", data=df)
results = model.fit()
results.summary()

OLS Regression Results
Dep. Variable:	MPG	R-squared:	0.808
Model:	OLS	Adj. R-squared:	0.807
Method:	Least Squares	F-statistic:	819.5
Date:	Sat, 20 Aug 2016	Prob (F-statistic):	3.33e-140
Time:	10:42:17	Log-Likelihood:	-1037.6
No. Observations:	392	AIC:	2081.
Df Residuals:	389	BIC:	2093.
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	-14.3473	4.007	-3.581	0.000	-22.224 -6.470
Year	0.7573	0.049	15.308	0.000	0.660 0.855
Weight	-0.0066	0.000	-30.911	0.000	-0.007 -0.006

Omnibus:	42.504	Durbin-Watson:	1.425
Prob(Omnibus):	0.000	Jarque-Bera (JB):	71.997
Skew:	0.670	Prob(JB):	2.32e-16
Kurtosis:	4.616	Cond. No.	7.17e+04

We can also see if introducing categorical variables helps with the regression. In this case, we only have one categorical variable, called Origin. Patsy automatically treats strings as categorical variables, so we don’t have to do anything special – but if needed we could wrap the variable name in C() to force it to be a categorical variable.

model = ols("MPG ~ Year + Origin", data=df)
results = model.fit()
results.summary()

OLS Regression Results
Dep. Variable:	MPG	R-squared:	0.579
Model:	OLS	Adj. R-squared:	0.576
Method:	Least Squares	F-statistic:	178.0
Date:	Sat, 20 Aug 2016	Prob (F-statistic):	1.42e-72
Time:	10:42:17	Log-Likelihood:	-1191.5
No. Observations:	392	AIC:	2391.
Df Residuals:	388	BIC:	2407.
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	-61.2643	5.393	-11.360	0.000	-71.868 -50.661
Origin[T.European]	7.4784	0.697	10.734	0.000	6.109 8.848
Origin[T.Japanese]	8.4262	0.671	12.564	0.000	7.108 9.745
Year	1.0755	0.071	15.102	0.000	0.935 1.216

Omnibus:	10.231	Durbin-Watson:	1.656
Prob(Omnibus):	0.006	Jarque-Bera (JB):	10.589
Skew:	0.402	Prob(JB):	0.00502
Kurtosis:	2.980	Cond. No.	1.60e+03

You can see here that Patsy has automatically created extra variables for Origin: in this case, European and Japanese, with the ‘default’ being American. You can configure how this is done very easily – see here.

Just for reference, you can easily get any of the statistical outputs as attributes on the results object:

results.rsquared

0.57919459237581172

results.params

Intercept            -61.264305
Origin[T.European]     7.478449
Origin[T.Japanese]     8.426227
Year                   1.075484
dtype: float64

You can also really easily use the model to predict based on values you’ve got:

results.predict({'Year':90, 'Origin':'European'})

array([ 43.00766095])

8 Comments

Reminder about cross-platform case-sensitivity differences

August 11, 2016

This is just a very brief reminder about something you might run into when you’re trying to get your code to work on multiple platforms – in this case, OS X, Linux and Windows.

Basically: file names/paths are case-sensitive on Linux, but not on OS X or Windows.

Therefore, you could have some Python code like this:

f = open(os.path.join(base_path, 'LE72020252003106EDC00_B1.tif'))

which you might use to open part of a Landsat 7 image – and it would work absolutely fine on OS X and Windows, but fail on Linux. I initially assumed that the failure on Linux was due to some of the crazy path manipulation stuff that I had done to get base_path – but it wasn’t.

It was purely down to the fact that the file was actually called LE72020252003106EDC00_B1.TIF, and Linux treats LE72020252003106EDC00_B1.tif and LE72020252003106EDC00_B1.TIF as different files.

I’d always known that paths on Windows are not case-sensitive, and that they are case-sensitive on Linux – but I’d naively assumed that OS X paths were case-sensitive too, as OS X is based on a *nix backend, but I was wrong.

If you really have problems with this then you could fairly easily write a function that checked to see if a filename exists, and if it found that it didn’t then tried searching for files using something like a case-insensitive regular expression – but it’s probably just easiest to get the case of the filename right in the first place!

3 Comments