You’ve done your literature review – what about a data review as well?
As academics, we’re always told to do a literature review at the beginning of a research project (indeed, a literature review for a PhD may take many months) – but what about doing a data review?
Whether you write it up formally (like a literature review) or not, I think it is important to sit down at the start of a research project and carefully look into what data is available for you to work with. I didn’t do this for my PhD, but I think it would have saved me a lot of trouble if I’d done this right at the beginning.
So, how should you go about a data review? The key thing is to think carefully about what data you will need, and then look into whether any of it is already available or whether you will have to create the datasets yourself. Obviously the sort of data you require will vary significantly from project to project – as will its availability – but once you’ve found some possible sources you should be asking yourself these sorts of questions:
- Can I get access to the data? Does it cost money? Will I need to apply for access? If so, then apply now because it will probably take a while for you to be approved.
- Does it come with the metadata I need? I got bitten by this very recently when I used a large dataset that didn’t come with all of the metadata I needed. I then had to switch to a different data source towards the end of the project, wasting a lot of time and effort.
- Can I load and process the data? What format is it in? Do I have software that can read the format, or will I have to write my own code to process it?
- Is the data reliable and accurate? Hopefully there will be some sort of accuracy assessment accompanying the dataset, but even if there isn’t, you should assess the quality of the data yourself, for example by looking for gaps and implausible values.
- Do I understand how the data was acquired/generated, and is this acceptable for what I want to use it for? This is one of the more complicated ones: it requires reading in some depth about how the data was made (in the associated paper, report or documentation – which will hopefully exist!) and checking that it is actually valid to use it in your situation. Again, I’ve been bitten by this before, and it wasted a lot of time and energy.
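Some of the questions above (can I load it, does it have the metadata I need, how complete is it?) can often be answered roughly with a few lines of code once you have a sample of the data. Here’s a minimal sketch in Python, assuming the dataset is a CSV file; the function name `quick_data_review` and the column names in the example are hypothetical, and other formats (satellite imagery, for example) will need their own tools. It’s no substitute for reading the documentation, but it quickly shows whether the file parses, which required columns are missing, and how many blank values each column has.

```python
import csv
import io
from typing import Iterable, TextIO


def quick_data_review(f: TextIO, required_columns: Iterable[str]) -> dict:
    """First-pass checks on a CSV dataset (hypothetical helper): can it be
    parsed, are the columns we need present, and how complete is each column?"""
    reader = csv.DictReader(f)
    # fieldnames is None for an empty file, so fall back to an empty list
    columns = list(reader.fieldnames or [])

    # Metadata check: which of the columns we need are absent?
    missing = [c for c in required_columns if c not in columns]

    n_rows = 0
    empty_counts = {c: 0 for c in columns}
    for row in reader:
        n_rows += 1
        for c in columns:
            value = row.get(c)
            # Blank strings, and short rows (DictReader fills these with None),
            # are both counted as missing values
            if value is None or value.strip() == "":
                empty_counts[c] += 1

    return {
        "columns": columns,
        "missing_required": missing,
        "n_rows": n_rows,
        "empty_counts": empty_counts,
    }


# Example with a small made-up sample (hypothetical column names)
sample = io.StringIO(
    "site_id,date,value\n"
    "A,2020-01-01,1.5\n"
    "B,,2.0\n"
    "C,2020-01-03,\n"
)
report = quick_data_review(sample, ["site_id", "date", "lat", "lon"])
print(report["missing_required"])  # ['lat', 'lon']
print(report["empty_counts"])      # {'site_id': 0, 'date': 1, 'value': 1}
```

Here the missing `lat`/`lon` columns are exactly the kind of metadata gap that is much cheaper to discover at the start of a project than at the end.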
Exactly how you do the review is up to you. If I ever have PhD students of my own I’d be tempted to ask them to do a formal written data review as part of their Literature Review chapter, but it’s equally acceptable just to consider these questions informally. Just try to be as exhaustive as possible in your searching: if you start generating datasets that you could have just downloaded, you’ll really feel silly.
I’m sure there are more questions you should ask – some of them will depend on the field (I can think of all sorts of questions for satellite data), but some will be more generic. If you think of any more then please leave a comment and let me know!