Building a Rasterio Workflow for Your Project

Raster Metadata

When you read a raster using Rasterio you have to explicitly call the metadata and store it with a variable. This is because Rasterio does not automatically update the metadata. In all your Rasterio workflows, you have to double-check if the metadata matches your final output. In this case, I will use the dtype metadata as an example.

Code Block 1: dtype differences between line 9 vs line 13 are highlighted in red
  • Line 4: We load the raster into an array but force it to be read as a Float32. We do this by adding .astype(rasterio.float32) after You can call all the datatypes such as rasterio.uint16, rasterio.int8, rasterio.float64, and more. If your IDE supports auto-completion you can see the other options.
  • Line 5: We load the metadata. The metadata is a Python dictionary object so we can manipulate as such.
  • Line 7: The raster is loaded into a standard NumPy array as seen with the print statement. So you can treat it as an ordinary NumPy array and even use NumPy functions with it.
  • Line 8–9: This is the important part. Even though we force Rasterio to read the raster as a Float32 the metadata does not follow it and it still says that the image is a uint16.
  • Line 11: We update the metadata dictionary using the dtype property of our Raster array.
  • Line 13: The metadata is now updated which is highlighted in red for clarity.
Code Block 2
  • Line 1: This is our metadata from code block 1. So we see we have our modified dtype value.
  • Line 2: This shows that the raster is also a Float32 so we verify that the dtypes in the array and metadata match so we can proceed to create our new raster.
  • Line 3: Similar to your standard Python context manager. The metadata has to be used with the double asterisks “**” which is a special syntax for Python arguments.

Sample Workflow: Calculating NDVI

Calculating NDVI is a common analysis that tries to assess the health of vegetation in a satellite image. NDVI looks at the ratio between the Near-Infrared band and the Red band and should provide a value between -1 (unhealthy) to 1 (healthy). The formula for NDVI is seen below:

Image from Digital Globe
Code Block 3
  • Line 5: We anticipate floating point calculations so we read the raster as a Float32.
  • Line 11–12: Band order is Blue, Green, Red, Near-Infrared. Thus, using Python zero-based indexing we extract the red band by using 2 and the NIR band using 3.
  • Line 14: This is the formula for NDVI.
  • Lines 15–17: The output of the NDVI calculation is a flat 2 dimensional array. Rasterio still needs a 3 dimensional array even if it is a single band image. Remember that this is still a NumPy object so we can use np.expand_dims to add an extra dimension to the zero axis.
  • Lines 19–20: NDVI is a single band image so we want to update the metadata to match this. This is under the count property. A good practice is to assign it to the first index of the shape property.
  • Line 22: We can see our updated metadata for our final NDVI output.

Final Thoughts

You can see how important it is to always keep your metadata variable updated. This is the biggest hurdle to overcome when working with Rasterio. I’ve heard that other geospatial libraries handle the metadata automatically so people sometimes get confused when transferring to Rasterio so hopefully this clears up how to work with your raster and your metadata information.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Panji Brotoisworo

Panji Brotoisworo

Researcher working on geospatial sciences and general programming