Building a Rasterio Workflow for Your Project

A big tool in your remote sensing toolbox

Working with geospatial raster data using a pure Python workflow? You have probably heard of Rasterio. Rasterio is an open-source Python geospatial library allows user to work with the powerful GDAL library with a more Pythonic workflow. This tutorial will build upon the Rasterio documentation to build some sample workflows which can be used in your project. I’ll add a few extra bits of information that I wish was in the documentation.

This tutorial assumes that you have knowledge of Python, basic Rasterio knowledge, NumPy arrays, image bit depth or NumPy dtypes, and some knowledge of GIS and remote sensing concepts. We will use the concepts outlined in the tutorial to generate a Normalized Difference Vegetation Index (NDVI) map.

First we will cover the metadata object in Rasterio and the items that are stored in the metadata. This is important because when you write a new raster you need to specify the parameters. Instead of manually writing in the parameters every time, you can simply put in the metadata variable you saved when you are writing the raster.

Raster Metadata

When you read a raster using Rasterio you have to explicitly call the metadata and store it with a variable. This is because Rasterio does not automatically update the metadata. In all your Rasterio workflows, you have to double-check if the metadata matches your final output. In this case, I will use the dtype metadata as an example.

Here is a very quick intro to data types. The bigger the bit depth the more precise your values can be. The bigger the bit depth the bigger your file sizes. So you should ask yourself the question: How precise do my values need to be? Float32 is often used for scientific analysis because it provides a balance between precision and performance. Float64 takes up twice the file size and provides the highest level of decimal precision at the cost of a bigger file size and Float64 being more computationally intensive. If your research needs precision up to 15 decimal points, use Float64, otherwise use Float32. In my workflows, I tend to stick to original uint16 datatype then use Float32 if I need decimals.

A lot of satellite images are distributed in either 8-bit or 16-bit images which do not have decimal points. A common mistake is that a user loads the raster then performs raster math which involves floating point calculations which are necessary for atmospheric correction or NDVI calculations, but the user does not update the metadata.

In this code block, we are reading a raster as a Float32 data type because we know we will work with floating point numbers later. Explanations continue below.

Code Block 1 Explanations:

  • Line 4: We load the raster into an array but force it to be read as a Float32. We do this by adding .astype(rasterio.float32) after f.read(). You can call all the datatypes such as rasterio.uint16, rasterio.int8, rasterio.float64, and more. If your IDE supports auto-completion you can see the other options.
  • Line 5: We load the metadata. The metadata is a Python dictionary object so we can manipulate as such.
  • Line 7: The raster is loaded into a standard NumPy array as seen with the print statement. So you can treat it as an ordinary NumPy array and even use NumPy functions with it.
  • Line 8–9: This is the important part. Even though we force Rasterio to read the raster as a Float32 the metadata does not follow it and it still says that the image is a uint16.
  • Line 11: We update the metadata dictionary using the dtype property of our Raster array.
  • Line 13: The metadata is now updated which is highlighted in red for clarity.

Now we just have to write our new Float32 raster. That’s very easy! Explanation below.

Explanation of code block 2:

  • Line 1: This is our metadata from code block 1. So we see we have our modified dtype value.
  • Line 2: This shows that the raster is also a Float32 so we verify that the dtypes in the array and metadata match so we can proceed to create our new raster.
  • Line 3: Similar to your standard Python context manager. The metadata has to be used with the double asterisks “**” which is a special syntax for Python arguments.

This is now the end of the Raster Metadata section which I hope provided a deeper more newbie friendly explanation to metadata handling with Rasterio. We used the dtype as an example, but remember this example whenever you modify your raster using Rasterio. You need to update the metadata which can be seen later in this tutorial or on the official Rasterio documentation.

Let’s move onto the sample workflow!

Sample Workflow: Calculating NDVI

Calculating NDVI is a common analysis that tries to assess the health of vegetation in a satellite image. NDVI looks at the ratio between the Near-Infrared band and the Red band and should provide a value between -1 (unhealthy) to 1 (healthy). The formula for NDVI is seen below:

So let’s do this analysis using Rasterio. The imagery that will be used for this sample workflow is from Digital Globe which features 4-bands in this order: Blue, Green, Red, Near-Infrared.

NDVI is a decimal value from -1 to +1 so we can anticipate that we should at least use a Float32 datatype. The individual bands can be extracted according to the array index. Explanation continues after code block 3 below:

Explanation for Code Block 3:

  • Line 5: We anticipate floating point calculations so we read the raster as a Float32.
  • Line 11–12: Band order is Blue, Green, Red, Near-Infrared. Thus, using Python zero-based indexing we extract the red band by using 2 and the NIR band using 3.
  • Line 14: This is the formula for NDVI.
  • Lines 15–17: The output of the NDVI calculation is a flat 2 dimensional array. Rasterio still needs a 3 dimensional array even if it is a single band image. Remember that this is still a NumPy object so we can use np.expand_dims to add an extra dimension to the zero axis.
  • Lines 19–20: NDVI is a single band image so we want to update the metadata to match this. This is under the count property. A good practice is to assign it to the first index of the shape property.
  • Line 22: We can see our updated metadata for our final NDVI output.

The output is seen below. I loaded this up in ArcMap and applied a color map to it for clarity. Opening the layer properties of our new raster it shows that it is a single band raster with a Float32 bit depth.

In a greyscale NDVI map, the brighter the color the healthier the vegetation. In this case of this colormap, green is healthy and red is unhealthy. It should be noted that water has a very low NDVI value so these are the very red colors in the image which in this case are rice fields and a river.

Final Thoughts

You can see how important it is to always keep your metadata variable updated. This is the biggest hurdle to overcome when working with Rasterio. I’ve heard that other geospatial libraries handle the metadata automatically so people sometimes get confused when transferring to Rasterio so hopefully this clears up how to work with your raster and your metadata information.

There is so much more you can do with Rasterio if you look at the NumPy side of things, but that is beyond the scope of this tutorial. This type of workflow can be applied to the other capabilities that Rasterio has such as using a mask to clip the raster or projecting. You just need to remember to keep your metadata updated when writing your new raster.

I’m open to comments and suggestions to this workflow, I’m always trying to learn.

Researcher working on geospatial sciences and general programming