Quick way to delete columns from a shapefile
I haven't posted anything on this blog for a long time - sorry about that. I've been quite ill, and had a new baby - so blogging hasn't been my top priority. Hopefully I'll manage some slightly more regular posts now. Anyway, on with the post...
I recently needed to delete some attribute columns from a very large (multi-GB) shapefile. I had the shapefile open in QGIS, so decided the easiest way would be to do it through the GUI as follows:
- Open up the attribute table
- Turn on editing (far left toolbar button)
- Click the Delete Field button (third from the right, or press Ctrl-L) and select the fields to delete
I was surprised to find that this took ages. QGIS seemed to refresh the attribute table several times during the process (maybe after deleting each separate field?), and each refresh was slow because the shapefile was so large.
I then found I needed to do this again, so I looked for a more efficient way - and I found one. Unsurprisingly, it uses the GDAL/OGR command-line tools - a very helpful set of tools which often provide more features and/or better performance than doing the same job through a GUI.
Basically, rather than deleting fields, copy the data to a new file, selecting just the fields that you want. For example:
ogr2ogr -f "ESRI Shapefile" -sql "SELECT attribute1, attribute2 FROM input" output.shp input.shp
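This will copy just the columns attribute1 and attribute2 from input.shp into a new file, output.shp. Note that ogr2ogr takes the output file first and the input file second, and that the name after FROM is the layer name, which for a shapefile is the filename without the .shp extension (so input for input.shp).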
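If you need to do this for several files, the command above can be wrapped in a small shell function. This is only a rough sketch: the function names select_sql and keep_columns are my own inventions (not part of GDAL), and it assumes that the layer and column names contain no spaces or other characters that would need quoting in the SQL.

```shell
#!/bin/sh
# Build the OGR SQL query that keeps only the named columns.
# usage: select_sql LAYER COLUMN...
# (hypothetical helper; assumes names need no SQL quoting)
select_sql() {
    layer=$1
    shift
    # Join the remaining arguments with commas; the $(...) subshell
    # keeps the IFS change from leaking into the rest of the script.
    cols=$(IFS=,; printf '%s' "$*")
    printf 'SELECT %s FROM %s' "$cols" "$layer"
}

# Copy SRC to DST, keeping only the named attribute columns.
# usage: keep_columns SRC.shp DST.shp COLUMN...
keep_columns() {
    src=$1
    dst=$2
    shift 2
    # For a shapefile, the layer name is the filename without ".shp".
    layer=$(basename "$src" .shp)
    ogr2ogr -f "ESRI Shapefile" -sql "$(select_sql "$layer" "$@")" "$dst" "$src"
}

# Example:
#   keep_columns input.shp output.shp attribute1 attribute2
```

The example call at the bottom runs the same ogr2ogr command as above, except that the column list has no space after the comma.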
Surprisingly, when I ran this command it didn't actually produce a full shapefile as output - instead of producing output.shp, output.shx, output.prj and output.dbf (the full set of files that constitute a 'shapefile'), it just created output.dbf - the file that contains the attribute table. However, this is easily fixed: just copy the other input.* files and rename them to output.* (or, if you don't want to keep the input data, just rename output.dbf to input.dbf, replacing the original).
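The copy-and-rename step can be scripted too. Here is a minimal sketch, assuming the query has no WHERE clause, so the new .dbf has the same records in the same order as the original geometry files. The function name copy_sidecars is hypothetical, and it deliberately skips any file that already exists at the destination, so it will never overwrite the new output.dbf.

```shell
#!/bin/sh
# Copy the geometry, index and projection files of SRC.shp so that they
# sit alongside DST.shp, skipping anything already present at the
# destination (such as a .shp that ogr2ogr did write).
# usage: copy_sidecars SRC.shp DST.shp
copy_sidecars() {
    src_base=${1%.shp}   # strip the extension: input.shp -> input
    dst_base=${2%.shp}
    for ext in shp shx prj; do
        if [ -f "$src_base.$ext" ] && [ ! -e "$dst_base.$ext" ]; then
            cp "$src_base.$ext" "$dst_base.$ext"
        fi
    done
}

# Example:
#   copy_sidecars input.shp output.shp
```

Other sidecar files that some shapefiles carry (for example .cpg for the text encoding) could be added to the extension list in the same way.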