
When a raster is clipped to a polygon with Python, why does it still take on the extent of the entire shapefile?


I am clipping a raster image to each polygon row in a shapefile in ArcMap 9.3 using the Clip tool (Data Management) in Python with Clipping Geometry set. While the clipped raster successfully depicts the intersection of the raster and the polygon, the extent of the clipped raster is still that of the original raster. Here is my code:

# Import system modules
import sys
import string
import os
import arcgisscripting

# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)

# Load required toolboxes...
gp.AddToolbox("C:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")

# Designate a workspace
gp.workspace = "L:Clip"

# Assign variables...
clipto = "test.shp"  # shapefile to clip to
clipit = "vfcm.tif"  # features to be clipped

# Cursor time
rows = gp.SearchCursor(clipto)
row = rows.Next()

# clip management
while row:
    n = str(row.VAHU6)  # assign a variable for the processing message based on a field
    clipgeo = row.Shape
    print "clipping to: " + n  # tells what row was clipped
    # Clip_management <in_raster> <rectangle> <out_raster> {in_template_dataset} {nodata_value} {NONE | ClippingGeometry}
    outputname = str(row.VAHU6) + "_c"  # update str if necessary
    extent = str(clipgeo.extent)
    gp.Clip_management(clipit, extent, outputname, clipgeo, "", ClippingGeometry)
    print str(row.VAHU6)
    row = rows.Next()

# reset the array...
del rows
del gp

Do you know why I am having this problem? When I run the tool in Model Builder, it works correctly. This problem is evident both in the file size (which is 10x greater than when correctly clipped) and when I "zoom to layer" in ArcMap. If you know how to adjust the code to work, I'd be very appreciative!

Thank you!

UPDATE: Even if I export one polygon from the shapefile manually, look up the extent in the source tab, and then type those coordinates directly into the extent in my code, it still doesn't clip properly.


cross-posting this solution to this thread AND user3397's other thread

@lpinner was on the right track, by stating that the extent being used is the entire extent of the 'clipto' shapefile. This is an underlying attribute of that shapefile, and doesn't change according to the current feature being processed by a cursor. So, every time you're feeding the extent object into the Clip tool, you're giving it this same overall extent, over and over.

You can manipulate the output by changing the Extent environment setting - this is done by setting the gp.Extent property. In my experience, there are two different ways of doing this:

  1. the legacy method involves getting and setting the extent as a space-delimited string,
  2. the newer (9.3+, I think) method involves instantiating an 'Extent' object and getting and setting its XMin, YMax, etc. properties

Option 1: legacy method

# Set extent property with a space-delimited string
# extents should be in this order: XMin YMin XMax YMax
gp.Extent = "-180 -90 180 90"

Option 2: modern method

# Create Extent object, and set its attributes
ext = gp.CreateObject('Extent')
ext.XMin, ext.YMin, ext.XMax, ext.YMax = -180, -90, 180, 90
gp.Extent = ext

I'm not sure which ArcGIS versions restrict you to which method (off the top of my head, I think you can do both in 9.3 and later, but you can't do Option 2 in 9.2 and earlier).

So, to solve your problem, you need to extract the XMin, YMin, XMax and YMax lat/longs from each polygon, and change gp.Extent every time. I.e.:

while row:
    # do your other processing and extract extent co-ords from current polygon
    ext = gp.CreateObject('Extent')
    ext.XMin, ext.YMin, ext.XMax, ext.YMax = currentXMin, currentYMin, currentXMax, currentYMax
    gp.Extent = ext
    gp.Clip_management  # ... continue with Clip operation

A third option, I guess, would be to write each polygon to a new temporary feature class. This temporary FC's Extent property would then be the same as the individual polygon's extent. This would be a slow operation if you wrote these temp files to disk, but you can write them to ArcGIS's 'in_memory' workspace to keep it nippy. I.e.:

# Get the primary key field name for the polygon shapefile
oidField = gp.Describe(clipto).OIDFieldName

while row:
    # Iterate through each polygon
    oid = row.GetValue(oidField)  # get the unique id for this polygon
    # Use the Select tool to extract the individual polygon to a temporary in_memory file,
    # using the unique ID as the SQL query to extract this record
    gp.Select_management(clipto, 'in_memory/temp', '"%s" = %s' % (oidField, oid))
    # get the polygon's extent
    ext = gp.Describe('in_memory/temp').Extent

From here, you can either use this in_memory/temp polygon as ClippingGeometry for the raster Clip tool, or you can extract the Extent object's values and feed them into the Clip tool's 'extent' parameters as text.
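For example, a minimal sketch combining both options (reusing the loop variables from above):

ext = gp.Describe('in_memory/temp').Extent
extentStr = "%s %s %s %s" % (ext.XMin, ext.YMin, ext.XMax, ext.YMax)
gp.Clip_management(clipit, extentStr, outputname, 'in_memory/temp', "", "ClippingGeometry")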

I'll cut my answer off here, and wait and see if it helps. I can clarify any points some more by referring to my scripts at work - I'm writing this off the top of my head at the moment.


You are setting the extent to the entire featureclass "clipto". Instead, set the extent to that of the "clipgeo" geometry - i.e. extent = str(row.GetValue("Shape").extent) or extent = str(clipgeo.extent)

You could also try ensuring the extent environment variable is set:

gp.extent = extent
gp.Clip_management(clipit, extent, outputname, clipgeo, "", "ClippingGeometry")

If you want to select a particular region of a raster dataset in ArcGIS 9.3, use masking in the raster analysis environment.

For ArcGIS 10, I would recommend using the Extract by Mask tool in the Spatial Analyst toolbox. And of course, in ArcGIS 10 you can use a Python script.


Have you tried quoting the last argument?

gp.Clip_management(clipit, extent, outputname, clipgeo, "", ClippingGeometry)

gp.Clip_management(clipit, extent, outputname, clipgeo, "", "ClippingGeometry")


Another way to clip is to make a selection like polygontoclip["templatepolygon", ]. I found this approach for points (http://robinlovelace.net/r/2014/07/29/clipping-with-r.html), but it also works with polygons.

This doesn't answer your question, but since you mention clipping by bounding box, this post comes up in search strings:

From r-bloggers: Clipping by a bounding box

Note that due to the if statements in gClip’s body, it can handle almost any spatial data input, and still work.

Although the question is very old, a good answer might help anyone landing on this page in the future.

I think what you're trying to do is straightforward. To illustrate, let's assume I'm interested in the eastern coastline of Saudi Arabia (SA), and I have a shapefile that has the east and west coasts of SA and another shapefile of the Gulf (the prominent waterbody on the east coast of SA). We need the sf package to crop the two shapefiles.

Then load both shapefiles

You can also check whether their CRS is the same.

st_read outputs shapefile information into three fields, but we're interested in just the geometry.

You should end up with an image like the one below.



If I'm interpreting your question correctly, this should work nicely:

Use the shapefile function from the raster package to read in the shapefile:

It looks like all of the lsoa11cd values have a letter and a number as the first two characters in the string. Let's first subset the data to keep only those with 'E' as the first character of their lsoa11cd value.

Now we can remove the first two characters from each lsoa11cd string and convert to a numeric variable for easier subsetting as follows:


Aligning extents for raster and shapefile in R: import from QGIS with same CRS

I have a small raster. It's a single tile from a Digital Globe basemap (zoom = 16); it's a standard 256 x 256 pixels. In QGIS, I created a polygon shapefile and added about 50 features. The raster and shapefile are in the same CRS. In QGIS, they align fine, as in the image below.

However, when I open both the raster and shapefile in R, the two no longer align. Again, both are in the same CRS. Reproducible code is below, and the shapefile and raster may be downloaded from a GitHub folder here. I'm using brick() from the raster package to keep RGB bands separate.

The problem appears to be due to the two files having different extents, as evident in the summary:

However, when I check the extent for both files in QGIS, they overlap. The image below is for the raster in QGIS. Both the shapefile and the raster have a y-extent range in [-256, 0].

It seems that the brick() function is setting both x and y extents to be positive, though I've reviewed the package/function documentation and don't see why this would be the case. How can I get these two files to align in R?

I've tried exporting the raster as a GeoTIFF, but that does not work either. I've also tried reading in the shapefile with a few other R packages, like rgdal. I can get it to "work" if in QGIS I export the shapefile, while centered on the raster, and select "map view extent," but this is not an optimal solution, and (a) I need to work with an array of map tiles and don't want to manually zoom to each, and (b) this doesn't explain the mistake I've made. I'm not even entirely sure if my problem is exporting from QGIS or importing to R, though I'm pretty sure my error is in brick().

Note: There is a similar sounding question here, but despite the question's title, the error was with coordinate reference systems.


The lines and polygon objects have the same extent. The extent for the points object is smaller in the vertical direction than the other two because there are no points on the line at y = 8.

Resolution

The resolution of a raster represents the area on the ground that each pixel of the raster covers. The image below illustrates the effect of changes in resolution.

(Source: National Ecological Observatory Network (NEON))


A TYPICAL DATA PROCESSING WORKFLOW USING RADPROC

In the following, a typical basic radar data processing and analysis workflow, including raw data processing, temporal aggregation, heavy rainfall detection and data exchange with ArcGIS using radproc and the 17-year time series of the hourly RADKLIM RW product, is illustrated, and an overview of the most important functions is given. Whereas the RADKLIM and DWD gauge raw data processing is specific to Germany, the analyses and GIS exports shown in the other subsections are equally applicable to any other precipitation dataset imported into radproc's standardised HDF5 file format, introduced above.

RADKLIM raw data processing and clipping

The raw RADOLAN and RADKLIM data are usually provided as gzip compressed monthly tar archives containing one uncompressed binary file per 5- (YW, RY) or 60-minute (RW) time interval. Every binary file starts with a metadata header and then contains 900 × 900 (RADOLAN) or 1,100 × 900 (RADKLIM) gridded precipitation values as integers in 1/10 mm for the whole of Germany, whereby every value describes the spatially averaged precipitation sum per time interval and 1 km grid cell.

As the RADKLIM data formats were adopted from RADOLAN, the data processing is very similar for both products and both will be referred to as RADOLAN throughout this section.

All raw data archives need to be unzipped for data import using the function unzip_RW_binaries() for hourly data or unzip_YW_binaries() for 5-minute data from radproc's raw module. Both functions automatically generate a folder structure of yearly and monthly directories for the available time series, and gzip compress all unzipped binary files. The latter is a relatively slow process because of the large number of files but it is necessary to save hard drive space.
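As a minimal sketch (the paths here are hypothetical, and the functions are assumed to be importable from the package namespace):

import radproc as rp

# Unpack the monthly RW tar archives into a year/month folder structure
# and gzip-compress the unzipped binary files
rp.unzip_RW_binaries(zipFolder="C:/Data/RW_archives", outFolder="C:/Data/RW_binaries")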

Subsequently, the new folder with all unpacked, compressed binary files can be passed to the overarching function create_idraster_and_process_radolan_data(), which automates the entire process of data import, conversion to DataFrames and saving to HDF5. Internally, this function calls a series of helper and wrapper functions dividing the task into separate parts. The underlying binary file import into a two-dimensional NumPy array and a metadata dictionary is based on a slightly modified version of wradlib's read_RADOLAN_composite() function. Consecutively, all binary data are imported and the row order is reversed for each array. The latter is necessary to avoid the data grid being upside down, because the binary data block starts in the lower left grid corner whereas ESRI grids are created starting in the upper left corner. Next, the reversed arrays are reshaped to one-dimensional arrays, and these are inserted into monthly DataFrames by another function. The RADOLAN pixels are numbered and converted to DataFrame columns, whereas every DataFrame row is labelled with the corresponding timestamp from the RADOLAN metadata. These monthly DataFrames are saved as datasets in the specified HDF5 file.

Optionally, if ArcGIS is available, a polygon GIS shapefile or feature class containing the outline of a study area can be passed to the processing function. In that case, radproc's arcgis module is accessed to create a so-called ID raster for the national RADOLAN grid in stereographic projection which allows for spatial localisation of the numbered RADOLAN pixels. Each ID value of this raster corresponds to a DataFrame column since these are labelled with the ID numbers. The tool automatically detects the input radar data product and applies the corresponding grid size and location. The ID raster is then clipped to the extent of the given shapefile to obtain the IDs located within the study area. Finally, the clipped ID raster is converted into a one-dimensional NumPy array called ID array, and NoData values are removed (see Figure 3). The resulting ID array is used to select the RADOLAN pixels within the study area upon DataFrame creation.
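A sketch of this overarching call; apart from the function name, the argument names and paths are assumptions based on the description above:

# Import all binaries, build monthly DataFrames and save them to the HDF5 file;
# the optional clip feature triggers the ID-raster clipping shown in Figure 3
rp.create_idraster_and_process_radolan_data(
    inFolder="C:/Data/RW_binaries",        # unpacked, gzip-compressed binaries
    HDFFile="C:/Data/RW.h5",               # target HDF5 file with monthly datasets
    clipFeature="C:/Data/study_area.shp")  # polygon outline of the study area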

Figure 3: Methodology for data clipping using an ID raster.

The generated HDF5 file with monthly datasets, which is compressed by default to save hard drive space, can be directly and quickly accessed by pandas and is the basis for all other radproc functions. The entire workflow of raw data processing is illustrated in Figure 4.

Figure 4: Workflow for RADOLAN and RADKLIM data processing with radproc.

Temporal aggregation

Besides the use of precipitation sums for climatological or hydrological analysis or as model inputs, the aggregation of longer time periods should always be one of the first steps in a workflow using weather radar data in order to assess data quality in a given study area. Many systematic measurement and correction errors that cause bias, such as spokes, clutter pixels or areas of missing data, are visible in, e.g., a map showing the mean annual precipitation sum.

From any HDF5 file having the structure described above, single monthly DataFrames can be loaded with radproc's load_month() function, or longer periods can be loaded with load_months_from_hdf5() for further analysis, plotting or data exports.

Furthermore, the core module offers several functions for automated temporal aggregation to hours, days, months, years or hydrological seasons. These functions access the HDF5 file via the load functions and iteratively load and resample all data within the specified time period. For example, a call of the function hdf5_to_years() with the parameters year_start and year_end set to 2012 and 2017, respectively, returns a DataFrame with six rows, each of them containing the annual precipitation sum per pixel. A subsequent call of this DataFrame's mean() method yields – depending on the specified axis – either the spatially or temporally averaged annual precipitation.

Figure 5 shows the function call described above and an excerpt of the created output DataFrame located in the Harz Mountains, a low mountain range in the transition area between Northern and Central Germany, in a Jupyter Notebook (https://jupyter.org/).

Figure 5: Aggregating RADKLIM RW data to annual precipitation sums.
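For illustration, a minimal sketch of the call described above, continuing the sketches from the previous subsection (the HDF5 path is hypothetical):

# Annual precipitation sums per pixel for 2012-2017: one DataFrame row per year
years = rp.hdf5_to_years(HDFFile="C:/Data/RW.h5", year_start=2012, year_end=2017)

years.mean(axis=0)  # temporal mean: mean annual precipitation sum per pixel
years.mean(axis=1)  # spatial mean: average annual precipitation sum per year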

Internally, hdf5_to_years() is only a wrapper function that calls load_years_and_resample(), which is actually used by all of radproc's resampling functions. It iterates over all months within all years of the specified time period, whereby the DataFrame for each month is loaded and resampled individually in order to reduce the required memory. The DataFrames are either resampled to the respective target frequency or, if the latter is equal to or lower than ‘month’, they are resampled to a single-row DataFrame with the monthly precipitation sum. The first resampled month DataFrame of the first year is initialised as the future output DataFrame and afterwards, one after the other, all resampled month DataFrames are appended. After the loops, the output DataFrame is finally resampled to the target frequency.

Data exchange with ArcGIS

Radproc's arcgis module provides a set of functions for data exchange between ArcGIS and Python as well as some geospatial analysis functions, e.g., for extended zonal statistics and data extraction from raster cells to points.

For the export of radar data from DataFrames to single raster datasets, the function export_to_raster() can be used, whereas the function export_dfrows_to_gdb() handles the export of entire DataFrames into new File Geodatabases. The latter function exports every DataFrame row to one raster dataset, whereby it automatically derives the file names from the DataFrame index. Additionally, a list of statistical parameters can be passed to the function to calculate some statistical characteristics from the input DataFrame and export these, too. For example, a statistics list with the entries ‘mean’ and ‘max’ yields two additional exported raster datasets, each of them containing the mean and maximum value per cell, respectively. Figure 6 shows the function call and its results for exporting the DataFrame with the annual precipitation sums generated in the ‘temporal aggregation’ subsection.

Figure 6: Exporting the annual precipitation sums to raster datasets.
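A sketch of this export; the output path and keyword names are assumptions, and the function may additionally require the ID raster created during raw data processing:

# Export each row (year) of the DataFrame to a raster dataset in a new File Geodatabase,
# plus two extra rasters containing the per-cell mean and maximum
rp.export_dfrows_to_gdb(years, outGDBPath="C:/Data", GDBName="annual_sums",
                        statistics=["mean", "max"])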

Moreover, feature-class attribute tables can be directly imported into pandas DataFrames with attribute_table_to_df() and, in return, a list of DataFrame columns can be joined to an attribute table using join_df_columns_to_attribute_table(). Besides data exchange with other geodata, this provides a seamless integration of point feature-classes, which is the typical geodata format for rain gauge measurements, into the data analysis workflow. This is an important feature for comparison of gauge and radar datasets. To complete this data exchange circle, the function rastervalues_to_points() receives a list of raster datasets and a point feature-class and, by location, extracts all corresponding raster values to fields in the attribute table.
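A round-trip sketch using the function names from the text (paths and exact signatures are assumptions):

# Feature-class attribute table -> pandas DataFrame
gauges = rp.attribute_table_to_df("C:/Data/gauges.shp")

# Selected DataFrame columns -> new fields in the attribute table
rp.join_df_columns_to_attribute_table(gauges, ["mean_precip"], "C:/Data/gauges.shp")

# Extract raster values at the gauge locations into fields of the attribute table
rp.rastervalues_to_points(["C:/Data/annual_mean.tif"], "C:/Data/gauges.shp")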

Detection and count of heavy rainfall

One of the primary reasons for developing RADKLIM was to provide a highly resolved nationwide dataset for the analysis of recent changes in rainfall-related extreme weather events (Winterrath et al. 2017). As a starting point for heavy rainfall analysis, radproc currently offers three functions providing an overview of the heavy rainfall behaviour and frequency in a given study area.

The function find_heavy_rainfalls() checks a time period for the exceedance of a given rainfall intensity threshold and returns a DataFrame with all intervals meeting the given criteria. This way, the exact time and location of heavy rainfall intervals can be identified and the selected intervals can subsequently be exported for visualisation.

Using the same iterative approach as the resampling functions, find_heavy_rainfalls() accesses a given HDF5 file via the load functions in the core module and checks the time series between the parameters year_start and year_end for rainfall intervals exceeding specific thresholds. Here, the parameter thresholdValue defines the rainfall intensity threshold in mm per time unit (given by input data) to be checked for exceedance independently for each raster cell. Additionally, the parameter minArea specifies the number of raster cells in which the threshold must be exceeded for the interval to be selected, whereby these cells do not need to be adjacent. This parameter can be used to consider the surface area of rainfall cells, but also to take potentially known cells biased by clutter into account. Finally, the time period to be checked can be described in more detail by setting the season parameter to periods such as year, summer, winter or any single month or range of months.

As an example, Figure 7 shows a function call which checks whether a precipitation amount of 100 mm/h (as the input dataset RW has an hourly resolution) has been exceeded in at least one cell anywhere in the nationwide 1,100 × 900 grid in any month of May in the period 2001 to 2017. If this holds true, the respective interval is contained in the output DataFrame. The last two lines of code select all columns (cells) containing any value greater than 100 in order to reduce the number of displayed columns. Moreover, this cell selection gives an idea of how many cells such high rainfall amounts occurred in.

Figure 7: Detecting rainfall intervals exceeding 100 mm/h.
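A sketch of the analysis shown in Figure 7; the signature is assembled from the parameters described above and should be treated as an assumption:

rainfalls = rp.find_heavy_rainfalls(HDFFile="C:/Data/RW.h5",
                                    year_start=2001, year_end=2017,
                                    thresholdValue=100, minArea=1, season="May")

# Select all columns (cells) containing any value greater than 100 mm/h
exceedances = rainfalls.loc[:, (rainfalls > 100).any()]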

As a result, this short analysis of the RADKLIM RW dataset reveals that a precipitation amount of 100 mm/h was exceeded in nine hourly intervals in the month of May between 2001 and 2017, with a total of 97 cells exceeding this threshold at least once.

Taking the same parameters into account, the function count_heavy_rainfall_intervals() also checks a time period for exceedances meeting the given criteria, but returns a single-row DataFrame with a count of exceedances per cell instead of the intervals themselves. This count gives a good overview of the heavy rainfall frequency and its spatial distribution in the study area.

Finally, the third function duration_sum() computes the rolling precipitation sum from data in 5-minute resolution for a defined duration D and saves the resulting DataFrames to a new HDF5 file. The calculation considers transitions between subsequent months and yields monthly DataFrames in 5-minute resolution, whose intervals contain the respective precipitation sum of the last D minutes, that is, the last D/5 intervals. Due to the standardised format, the resulting HDF5 file can be used as input for find_heavy_rainfalls() to further detect and analyse extreme rainfall events which may have been separated, and thus attenuated, by the artificial interval boundaries in data with a lower temporal resolution such as RW. Nevertheless, when analysing the results, it has to be taken into account that subsequent intervals are not statistically independent, because a single original 5-minute interval influences several intervals in the duration dataset. As duration sums are a commonly used method in hydrologic engineering, further analysis methods building upon them might be implemented in radproc in the future.
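A hypothetical call computing 60-minute rolling sums from 5-minute YW data (argument names assumed):

rp.duration_sum(inHDFFile="C:/Data/YW.h5", D=60, outHDFFile="C:/Data/YW_D60.h5")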

DWD MR90 rain gauge data processing

In order to facilitate data comparison and, thus, data quality assessment, radproc's dwd_gauge module provides functions for automated rain gauge data processing. Currently, only 1-minute gauge data in the DWD MR90 format are supported, but further functions to support other input formats, especially the freely available data from DWD Climate Data Centre, are currently under development.

A MR90 rain gauge dataset comprises one data file and one metadata file. These two files per gauge station need to be saved in separate directories. To support the creation of a point feature-class from the metadata, the function summarize_metadata_files() summarises the information on station ID, station name, geographic coordinates and height above sea level from the metadata files into one single text file. A single data file can be imported into a one-column DataFrame with stationfile_to_df().

Finally, the function dwd_gauges_to_hdf5() offers automated iterative processing and import of all data files in a directory. The gauge data are converted into the same DataFrame format as the radar data. To make the data formats match completely, the time zone of the gauge data is converted to UTC and the data are resampled to the same 5-minute intervals as the 5-minute RADKLIM product YW. The resulting DataFrame contains one column per rain gauge and is divided into monthly DataFrames, which are saved to the standardised HDF5 file format. As described above, radproc's analysis and resampling functions work for all datasets converted this way. Consequently, the function calls for resampling and heavy rainfall detection shown in Figures 5 and 7 are exactly the same for the gauge data except for a different input HDF5 file path. However, instead of exporting the rows of the output DataFrame to rasters as shown in Figure 6, the rows can be exported to new fields of a feature class attribute table using join_df_columns_to_attribute_table().
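A sketch of the gauge processing chain (paths hypothetical, signatures assumed):

# Summarise station metadata into one text file, e.g. for building a point feature-class
rp.summarize_metadata_files(inFolder="C:/Data/MR90/metadata", outFile="C:/Data/stations.txt")

# Import all MR90 data files, resample them to 5-minute intervals in UTC
# and save the monthly DataFrames to HDF5
rp.dwd_gauges_to_hdf5(inFolder="C:/Data/MR90/data", HDFFile="C:/Data/gauges.h5")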


Introduction

Universal Health Coverage (UHC) has become the focal point of health policy discourse as the world has made the transition from the Millennium Development Goals (MDGs) to the Sustainable Development Goals (Goal 3.8: “Achieve UHC, including financial risk protection, access to quality essential healthcare services and access to safe, effective, quality and affordable essential medicines and vaccines for all”). Access to healthcare, which is central to UHC, is an intractable concept: it has multiple definitions with respect to the ability to get care, the act of seeking care, the actual delivery of care and indicators thereof, and it is context specific. The framework of effective coverage propounded by Tanahashi encompasses the domain of accessibility (spatial and non-spatial) as a component in ascertaining barriers to Universal Health Coverage [1]. Accessibility is also one of the themes pertaining to access and utilization embedded in Penchansky’s conceptual framework of availability, accessibility, affordability, acceptability and accommodation [2], where spatial accessibility is captured in the first two components. Accessibility can be defined as the factors intervening between the perception of need and the realization of utility [3]. Effective accessibility to medical services reflects an individual’s or family’s ability, mobility and time to reach a service once need has been established by a potential health service user [4]; it can be distinguished from potential accessibility, which simply implies the existence of a service, regardless of whether it is effectively accessible. Spatial accessibility, measuring travel impedance (distance or time) between patient locations and service points, impacts the progression from potential to realized access.

Geographic Information Systems are one of the suite of information and communication technology enabled solutions recommended by the World Health Organization (WHO) and the Asian Development Bank (ADB) to address system resiliency and universal health coverage inefficiencies [5]. Geospatial data are directly relevant to all three main functions of a country’s public health system: monitoring community health and identifying health problems and priorities; ensuring universal access to appropriate and cost-effective care; and policy making to solve local and national health problems [6]. Most published measures of spatial accessibility to healthcare can be classified into four categories: provider-population ratios, distance to the nearest provider, average distance to a set of providers, and gravitational models of provider influence [7]. However, there is a major lacuna with respect to studies exploring the dimension of geographic accessibility, and literature delving into spatial modelling with techniques appropriate to the setting is even scarcer. In the Indian context specifically, the issue of distance and time as barriers to healthcare services has not been analyzed systematically, and digital cartography has been used only for visualization of health indicators rather than for informed decision making.

The taxonomy for healthcare studies categorizes the study of spatial accessibility as the study of spatial potential, which is studied by both distance and time approaches. Straight-line distances, however, are not representative of true provider accessibility due to varied road distance, road quality, terrain and seasonal variation, and they overestimate the population that is within an hour of a health facility [8]. Thus, using the transport network, elevation and other natural barriers can provide more accurate estimates. The literature, however, indicates the relative efficiency of raster-based techniques vis-à-vis network analysis, specifically in areas with rudimentary or dilapidated infrastructure and road networks. The study by Delamater [9] revealed the raster-based method identifying more total area, zip codes and population as underserved than the network method, as the raster method produced fewer unique contiguous areas. Raster-based analysis, by incorporating both network and off-network modes of travel and allowing for a complex array of barriers, redresses the limitations of the straight-line distance model and of network analysis, and is thus the preferred method of analysis in remote, rural and topographically challenging settings. Therefore, our study has strived to adopt a comprehensive and holistic approach to model spatial accessibility and population coverage in the study area. The current study has attempted to provide a succinct analysis via multimodal modelling to gauge proximity and coverage using both Euclidean-distance and raster-based approaches. The geographic access and spatial coverage surfaces produced in our analysis provide visually powerful tools that can be used to support health research and decision making for planning and resource allocation at the district level, thereby aiding in solving the location-allocation conundrum. The rationale for conducting this study is to demonstrate significant spatial variations in geographical accessibility and spatial coverage of the health system across different travel scenarios. The quintessential feature of this approach is to incorporate information on the demand for and supply of care in order to support health planners in identifying potential locations for new health facilities where the maximum increase in accessibility can be achieved. To our knowledge, this is the first study from India exploring different packages of healthcare services as a geographical measure of Indicator 3.8.1 (Coverage of Essential Health Services) of Goal 3 of the Sustainable Development Goals. Existing studies delve into only a single health service, or are based on the assumption that any service in the health facility network can be chosen if it is reachable within a specified travel time, which is unrealistic, as service provisioning and readiness amongst facilities are very heterogeneous in the Indian setting. Hence, our study is an improvement over previous literature, as separate analyses have been conducted by subdividing health facilities and the population grid in tandem with the type of service being investigated. We conducted a detailed service availability mapping of the entire public health facility network in our study area, enabling the identification of facilities providing specific services.
Another novelty of our study arises from the holistic methodological approach of incorporating a household survey database, allowing for a participatory, dynamic and interactive approach to the parametrization of travel scenarios, modes of transportation and travel speeds using actual utilization data and the individual experiences of users, whereas current literature is constrained by the availability of accurate input data and by limitations inherent in its assumptions and parameterizations. Improving accuracy and relevance in this context requires greater accessibility to, and flexibility in, travel time modelling tools to facilitate the incorporation of local knowledge and the rapid exploration of multiple travel scenarios [10]. Incorporating such local knowledge refines the estimates, as travelling times and the extent of catchment areas are sensitive to the modes of transportation and travelling speeds. Additionally, we have also collated and synthesized administrative data in our framework, aiding estimation of the population coverage. Finally, geographical accessibility in difficult settings such as rural, remote and conflict zones poses formidable challenges to users. Specifically, populations residing in contested borderlands have to reckon with peculiar security threats, as episodes of firing and shelling obstruct physical access further. Yet there is an absence of representation of difficult settings in the literature, and spatial accessibility to reach target populations in fragile zones is not explored, which we have attempted in the present study.


RHESSysWorkflows

Introduction

RHESSysWorkflows provides a series of Python tools for performing RHESSys data preparation workflows. These tools build on the workflow system defined by EcohydroLib.

Before reading ahead, you might want to check out this screencast, which provides a conceptual overview of RHESSysWorkflows.


For questions or support contact Brian Miles

This work was supported by the following NSF grants

Award no. 1239678 EAGER: Collaborative Research: Interoperability Testbed-Assessing a Layered Architecture for Integration of Existing Capabilities

Award no. 0940841 DataNet Federation Consortium.

Award no. 1148090 Collaborative Research: SI2-SSI: An Interactive Software Infrastructure for Sustaining Collaborative Innovation in the Hydrologic Sciences

These instructions will lead you through installing RHESSysWorkflows (and EcohydroLib) as well as the GRASS 6.4 and QGIS open-source GIS applications. GRASS is required by RHESSysWorkflows, and QGIS is convenient to have for visualizing GIS data acquired and produced as you make RHESSys models.

These instructions are tailored to OS X and Linux users (specifically Ubuntu 14.04 or 15.04; 15.10 is not compatible as it ships with GRASS 7 rather than GRASS 6.4); however, installation under other Linux distributions is also possible. RHESSysWorkflows may in theory work under Windows, but this has never been tested. Windows users are encouraged to run Ubuntu in a virtual machine.

RHESSysWorkflows is compatible with OS X 10.6 through 10.11, but only versions 10.9 through 10.11 are officially supported. For installation instructions for OS X 10.6 through 10.8, see Deprecated installation instructions toward the end of this document.

To find out what version of OS X you are currently running, click on the apple in the upper left corner of the screen and select About this Mac. To find out the latest version of OS X your computer can run, visit this helpful page.

If you encounter problems during installation, please check the installation notes before contacting the developers for support.

Installing on OS X using Homebrew

Previous methods for installing RHESSysWorkflows under OS X relied on the official GRASS 6.4 GIS packages. Unfortunately, the official GRASS 6.4 packages (and the new GRASS 7 packages, for that matter) are not compatible with new security enhancements in OS X 10.11 (El Capitan). Rather than disable this new security measure (called System Integrity Protection), we recommend that RHESSysWorkflows users use a new Homebrew-based installation method, which will install GRASS without requiring that the security features of your operating system be disabled.

Homebrew is a third-party package management system that makes it easy to install open-source software under OS X. Each software package one can install through Homebrew is called a formula. To search for available software formulae, visit Braumeister.

During installation you may be prompted to install OS X command line developer tools. Choose "Install".

If you already have Homebrew installed, make sure to do the following before proceeding:

A tap allows software developers to maintain a collection of software formulae. OSGeo4Mac curates a number of formulae related to open-source GIS software.

Now Homebrew is installed and we just need to install a few software dependencies before installing RHESSysWorkflows.

Install dependencies for GRASS, QGIS, and RHESSysWorkflows

First, install XQuartz, which is needed by GRASS, by running the following command from the Terminal application:

Note, this will install a separate copy of Python 2.7 so it will not interfere with the copy of Python 2 that ships with OS X.

Install RHESSysWorkflows and Python packages

To install RHESSysWorkflows and its dependencies (including EcohydroLib), enter the following in the Terminal:

Upgrading to a new version of RHESSysWorkflows

To upgrade to a newer version of RHESSysWorkflows, enter the following into the Terminal:

If pip does not install the version you expect, it may be necessary to first remove RHESSysWorkflows and Ecohydrolib before installing the new version (especially under Linux where some Python packages fail to build when installed via pip):

Install GRASS and QGIS using Homebrew as follows from the Terminal:

You will also need to install a Python library for accessing PostGIS geospatial databases. This is required by QGIS:

This concludes the OS X Homebrew-specific portion of the installation instructions.

Installing on Ubuntu Linux 14.04 or 15.04

Install RHESSysWorkflows/EcohydroLib dependencies using apt-get:

Note: the above works for Ubuntu 14.04. For 15.04, the packaging of GDAL has changed; use the following to install dependencies under Ubuntu 15.04:

Ubuntu 15.10 is not compatible with RHESSys or RHESSysWorkflows as this version of Ubuntu uses GRASS 7, which is not yet supported by RHESSys or RHESSysWorkflows. If you want to use another Linux distribution, make sure that it provides similar versions of the above dependencies.

Install RHESSysWorkflows and Python packages under Linux

To install RHESSysWorkflows and its dependencies (including EcohydroLib), enter the following from your Terminal:

Upgrading to a new version of RHESSysWorkflows

To upgrade to a newer version of RHESSysWorkflows, enter the following into the Terminal:

If pip does not install the version you expect, it may be necessary to first remove RHESSysWorkflows and Ecohydrolib before installing the new version (especially under Linux where some Python packages fail to build when installed via pip):

This concludes the Linux-specific portion of the installation instructions.

A note on RHESSysWorkflows version numbers

Each project can only be used with compatible versions of RHESSysWorkflows/Ecohydrolib. Compatible versions are those that write the same version number to the metadata store for a given project. This compatibility check is necessary to ensure both scientific reproducibility and to make sure your workflows do not become corrupted by incompatible versions. We strive to maintain compatibility between releases of RHESSysWorkflows/Ecohydrolib, however sometimes enabling new workflow scenarios requires incompatible changes. The release notes for each release will note when a new version breaks backward compatibility. The good news is that you can have multiple copies of RHESSysWorkflows/Ecohydrolib installed on your computer at the same time. To do so, you must do the following:

Create a new virtual environment for each version of RHESSysWorkflows you would like to run

Activate a virtual environment you would like to install a specific version of RHESSysWorkflows into

Install RHESSysWorkflows in the virtual environment, for example to install version 1.0:

pip install rhessysworkflows==1.0

Note that you do not need to use 'sudo' when running in a virtual environment, as the files are installed in a directory owned by your user account.

Install GRASS Addons for RHESSysWorkflows

Follow these steps to install the GRASS addons under OS X and Linux:

Create a new location (it doesn't matter where, we'll only use it to run the g.extension command to install the extensions).

Exit GRASS (close all GUI windows, then type exit in the GRASS command line window).

On OS X only, once you have exited GRASS do the following:

For more information on these addons (r.soils.texture and r.findtheriver), see:

Setup EcohydroLib and RHESSysWorkflows configuration file

Choose the appropriate prototype configuration file:

Save it into a file named '.ecohydro.cfg' stored in your home directory and replace all occurrences of <myusername> with your user name (to find out your OS X or Linux user name, use the whoami command in Terminal).

Set ECOHYDROLIB_CFG environment variable so that RHESSysWorkflows can find your configuration file

Under OS X, from Terminal, do the following:

echo "export ECOHYDROLIB_CFG=$/.ecohydro.cfg" >>

If you're running Linux, do the following:

echo "export ECOHYDROLIB_CFG=$/.ecohydro.cfg" >>

echo "export LD_LIBRARY_PATH=/usr/lib/grass64/lib:$" >>

Re-load bash profile (or close and open a new Terminal window):

source ~/.bash_profile (source ~/.profile under Linux)

This concludes the configuration portion of the installation and configuration instructions.

Using RHESSysWorkflows - Introduction

All EcohydroLib and RHESSysWorkflows tools are executed from the command line. Each tool stores the data and metadata associated with a single workflow in a directory, called a project directory. Metadata are stored in a file in the project directory called metadata.txt. There can only be one metadata.txt in a project directory, so it is essential that each workflow have its own project directory.

In addition to automatically recording provenance information for data and the processing steps of a workflow, the metadata store allows for loose coupling between the tools that are used to carry out a particular workflow. By design, each workflow tool performs roughly one discrete function. This allows for flexible workflows. Each workflow tool writes a series of entries to the metadata to reflect the work done by the tool. Most workflow tools require certain entries to be present in the metadata store to perform the work they will do. For example, before DEM data for the study area can be downloaded from DEMExplorer, the bounding box for the study area must be known. The tool that queries DEMExplorer need not know how the bounding box was generated; it only cares that the bounding box is present in the metadata store. Lastly, the metadata store helps users to orchestrate workflows by requiring that only the new information needed at each step be entered to run a particular command; other required information can be queried from the metadata.

Each workflow tool will print usage information when run on its own; for example, running:

This indicates that the -p (a.k.a. --projectDir) argument is required; that is, you must specify the project directory associated with the workflow for which you are running the tool. For many EcohydroLib/RHESSysWorkflows tools, this is the only required command line parameter.

It's good practice when running a command to first execute the command with no command line arguments. This will show you the required and optional parameters. To get detailed help for a given command, run the command with the -h (a.k.a. --help) argument, for example:
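GetNHDStreamflowGageIdentifiersAndLocation.py -h

(The tool name above is only an example; any EcohydroLib/RHESSysWorkflows tool accepts the -h argument.)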

Note that while this particular tool, and RHESSysWorkflows tools in general, have long names, they are long so as to be descriptive and thus easier to use. To avoid having to type these long names out, you are encouraged to make use of tab completion in Terminal. To use tab completion, simply type the first few characters of a command and then hit the 'tab' key on your keyboard; the entire command name will be 'completed' for you on the command line. If the entire name is not 'completed' for you, hit tab again to see the list of commands that match what you've typed so far. Once you type enough characters to uniquely identify the command, hitting tab once more will complete the command name.

Using RHESSysWorkflows - Typical workflows

A typical workflow will consist of running data processing/registration tools from EcohydroLib. Once the required datasets are in place (e.g. DEM, soils, landcover, etc.) RHESSysWorkflows tools can be run to create the world file and flow table associated with a RHESSys model.

In the following sections two example workflows are described: (1) using data from national spatial data infrastructure (USGS, NHD, NLCD, SSURGO, SRTM) and (2) using custom local data. The combinations of tools executed in these workflows represent two of the many unique workflows possible.

National spatial data workflow

Start by creating a directory called 'standard'. This will be your project directory for this example workflow. You can create this directory anywhere on your computer where you have write access (e.g. in your home directory).

Specify a USGS streamflow gage to locate on the NHD network

First, choose the USGS streamflow gage, identified by the USGS site number, you wish to build a RHESSys model for. Note that while you can select gages that drain large basins, if you are planning to use SSURGO soils data acquired using the RHESSysWorkflows tool GetSSURGOFeaturesForBoundingbox, the study area must be less than 10,000 sq. km.

To locate the USGS gage of interest on the NHD flow line network run the following tool:
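GetNHDStreamflowGageIdentifiersAndLocation.py -p standard -g 01589312

(The gage number is only an example; substitute your own site number. If your installation names this tool differently, run it with -h to check.)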

This will create the metadata store for your project in a file named metadata.txt in the project directory 'standard'. The metadata store will be populated with the gage ID (the site number you specified on the command line), and the NHD reachcode and reach measure associated with this gage. By default, RHESSysWorkflows will use a web service to perform this query. (If you are using a local copy of the NHDPlusV2 data, add the -s local command line argument to the above command. Most users should ignore this.)

Note that USGS NWIS gage identifiers can begin with '0'. You must enter this leading 0 when specifying a streamflow gage.

Extract NHD catchments that drain through the streamflow gage

The NHD database relates stream flowlines to the catchments that drain into them. RHESSysWorkflows can use these catchments, stored in a shapefile in your project directory, to determine the geographic bounding box for your study area (see below). This bounding box can then be used to extract spatial data for your study area from datasets stored locally as well as those available via web services interfaces.

To extract a shapefile of the NHD catchments that drain through your streamflow gage, run the following tool:
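GetCatchmentShapefileForNHDStreamflowGage.py -p standard

(Tool name per the EcohydroLib documentation; run it with -h to verify.)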

(If you are using a local copy of the NHDPlusV2 data, add the -s local command line argument to the above command. Most users should ignore this.)

You should now see the study area shapefile in your project directory. You can visualize the study area, along with the streamflow gage, in QGIS. Note that the study area shapefile does not represent the delineation of your watershed, but should instead be a superset of the watershed. We will delineate your watershed using GRASS GIS.

Get bounding box for study area

Now that RHESSysWorkflows has a GIS representation of your study area, it can determine the extent or bounding box (also sometimes called the 'minimum bounding rectangle') of the study area. Do so by running the following tool:
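GetBoundingboxFromStudyareaShapefile.py -p standard

(Tool name per the EcohydroLib documentation; run it with -h to verify.)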

As with many EcohydroLib/RHESSysWorkflows commands, you won't see much in the way of output printed to the screen; don't fear. The commands are writing what's needed for future workflow steps to the metadata store associated with your project directory. If you open the metadata store, the file called metadata.txt in the project directory standard, you can see the bounding box coordinates stored in the study_area section; look for the attribute named bbox_wgs84.

Acquire terrain data from U.S. Geological Survey

U.S. Geological Survey (USGS) has developed a prototype web service for downloading terrain data based on the National Elevation Dataset (NED). Now that we've defined the bounding box for our study area, it's very easy to download DEM data from this web service, as follows:
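GetUSGSDEMForBoundingbox.py -p standard

(Tool name as in recent RHESSysWorkflows releases; older releases used a DEM Explorer-based tool instead, so check your installation with -h.)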

By default, this tool will download an extract of terrain data for your study area from the National Elevation Dataset (NED) 30-meter (1/3 arcsecond) USA DEM. The DEM will be stored in a UTM projection (WGS84 datum) with the appropriate UTM zone chosen for you. You can override both the DEM coverage type and the target spatial reference system by specifying the appropriate command line parameters; spatial reference systems must be referred to by their EPSG code (see http://www.spatialreference.org/ref/epsg/ for more information). Additionally, you can choose to resample the DEM extract to another spatial resolution. To learn how to specify these options, issue the help command line argument as follows:
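GetUSGSDEMForBoundingbox.py -h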

Note that EcohydroLib/RHESSysWorkflows uses the DEM resolution, extent, and spatial reference as the reference for all other rasters imported into or generated by subsequent workflow tools.

Lastly, you are not required to use a DEM from the USGS web service. See the Custom local data workflow example below, as well as the Working in watersheds outside the United States section for more information.

Extract landcover data from local NLCD 2006 or 2011 data

EcohydroLib makes it easy to import custom NLCD 2006 or 2011 tiles for your study area into your project from web services hosted by U.S. Geological Survey. For example, to acquire NLCD 2011 data:

This command will download an NLCD 2011 tile matching the extent, resolution, and spatial reference of your DEM and store the tile in your project directory. (If you wish to give your NLCD tile a particular name, use the outfile command line option.) To instead download NLCD 2006 data, do the following:

Download soils data from SSURGO

The USDA NRCS provides the Soil Data Mart, a sophisticated web services-based interface for querying and downloading high-resolution SSURGO soils data. SSURGO data are structured as a complex database consisting of both spatial and tabular data. For more information on this database format and the soil survey data exposed through the SSURGO database, please see the SSURGO metadata.

EcohydroLib provides two tools that make it easy to generate soil hydraulic properties commonly needed for ecohydrology modeling (namely the numeric properties Ksat, porosity, percent sand, percent silt, and percent clay). The first tool downloads spatial mapunit features for your study area as well as tabular soil hydraulic property data. These spatial and tabular data are joined, and written to your project directory as an ESRI Shapefile. For more information on what attributes are queried and how non-spatial mapunit components are aggregated by the code, please see the EcohydroLib source code here and here.

To download SSURGO features and attributes into your project, run the following command:
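GetSSURGOFeaturesForBoundingbox.py -p standard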

Note that for server performance and network bandwidth reasons, Soil Data Mart limits SSURGO spatial queries to areas of less than roughly 10,000 sq. km. For performance reasons, EcohydroLib (and therefore RHESSysWorkflows) limits the size of SSURGO queries to roughly 2,500 sq. km. If your study area is larger than this, you must instruct GetSSURGOFeaturesForBoundingbox to tile the query into multiple sub-queries. SSURGO query tiling is enabled using the --tile option:
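GetSSURGOFeaturesForBoundingbox.py -p standard --tile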

What this does is split the larger query to the Soil Data Mart into many smaller queries (possibly hundreds or thousands). The results of these sub-queries are then automatically assembled into a single vector feature layer by EcohydroLib. To reduce download times, tiled queries are by default performed in parallel. The number of queries to run in parallel is determined automatically by the number of simultaneous threads your computer supports (see here for more information). Use the --nprocesses option to change the number of SSURGO queries to perform in parallel. For example, to perform 16 queries in parallel (which should be fine on an 8-thread machine):
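GetSSURGOFeaturesForBoundingbox.py -p standard --tile --nprocesses 16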

To disable parallel queries:
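GetSSURGOFeaturesForBoundingbox.py -p standard --tile --nprocesses 1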

You can visualize the downloaded SSURGO features and joined tabular data by opening the shapefile in QGIS. The SSURGO shapefile has a long, though descriptive, name that includes the bounding box coordinates for your study area. If you are unsure which shapefile in your project directory to open, the soil_features attribute of the manifest section of your metadata store lists the filename.

While you're looking at the metadata store, scroll down to the provenance section. While the attribute names are a bit messy, you can see that for each manifest entry EcohydroLib has recorded detailed provenance information. For the SSURGO soil features, the Soil Data Mart web services URL is listed as the datasource; for the DEM data downloaded from DEM Explorer, EcohydroLib records the exact URL used to download your DEM. Lastly, if you scroll down a bit farther, you can see that the history section of the metadata store records the order of every EcohydroLib/RHESSysWorkflows command you've run in this workflow, including all of the command line parameters.

EcohydroLib also provides a second tool for dealing with SSURGO soils data. This tool allows you to create raster maps of SSURGO mapunit polygons using the following numeric soil properties as raster values: Ksat, porosity, percent clay, percent silt, and percent sand. Use the following command to generate all of these rasters in your project directory:
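GenerateSoilPropertyRastersFromSSURGO.py -p standard

(Tool name per the EcohydroLib documentation; run it with -h to verify the options available in your installation.)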

Later on in this example workflow, we'll use the percent sand and percent clay rasters to generate a USDA soil texture map, which we'll use to define RHESSys soil parameters for our study watershed.

Registering custom local data: LAI data

EcohydroLib does not currently provide direct access to vegetation leaf area index data from remote sensing sources. LAI data are needed by RHESSys to initialize vegetation carbon and nitrogen stores. RHESSysWorkflows can use a user-supplied LAI raster to supply these initial LAI data to RHESSys. For this example workflow, you can download an LAI image here. Use the following command to register this user-supplied raster into your project:
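RegisterRaster.py -p standard -t lai -r /path/to/the/downloaded/LAI.tif

(The raster path is a placeholder; see the note below about changing it.)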

To make this command work, you'll have to change the path to the file name passed to the -r argument to reflect the location on your computer to which you downloaded the example LAI image.

Note that EcohydroLib/RHESSysWorkflows do not work with files or directories whose names contain spaces. This will be addressed in a future release.

Also, the extent of the LAI image doesn't quite match that of our DEM. By default, RegisterRaster will not import a raster that does not match the extent of the DEM. Use the --force option to force RegisterRaster to import the raster:
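RegisterRaster.py -p standard -t lai -r /path/to/the/downloaded/LAI.tif --force -b "Example Publisher"

(The path and publisher string are placeholders.)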

When using the force option, it is even more important that you check the results of the command to ensure the data registered with the workflow are appropriate for the modeling you plan to perform. Go ahead and browse to your project directory, find the DEM and LAI rasters and open them in QGIS (you will likely have to set a color map for each, otherwise all values will render in grey).

Note the -b (a.k.a. --publisher) argument given to the above command. When specified, this optional parameter will be stored in the provenance metadata store entry for the raster.

RegisterRaster is a generic EcohydroLib tool that knows how to import several types of raster into your workflow; the -t lai argument indicates that we are importing an LAI raster (see the Custom local data workflow to learn how to import other raster types). RegisterRaster will copy the raster being imported into your project directory; the raster will be resampled and reprojected to match the resolution and spatial reference of the DEM already present in the workflow. You can choose the resampling method to use, or turn off resampling, though the raster will still be resampled if the spatial reference system does not match that of the DEM; see the help message for more information.

At this point, we have enough spatial data in a generic format (e.g. GeoTIFF) to build RHESSys-specific datasets using RHESSysWorkflows.

Create a new GRASS location

RHESSys requires that all spatial data used to create a world file and flow table for a RHESSys model be stored in a GRASS GIS mapset. We'll start building these data in RHESSysWorkflows by creating a new GRASS location and mapset within our project directory, and importing our DEM into this mapset:
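CreateGRASSLocationFromDEM.py -p standard -d "Example RHESSys model location"

(The description string is a placeholder; supply your own.)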

The -d (a.k.a. *--description") parameter is a textual description of this GRASS location always wrap this parameter in quotes. If you choose, you can specify custome names of the following GRASS parameters:

  1. dbase, the directory within the project directory where your GRASS location will be stored (defaults to 'GRASSData')
  2. location (defaults to 'default')
  3. mapset (defaults to 'PERMANENT')

Use the --overwrite option to CreateGRASSLocationFromDEM to overwrite the GRASS location created by a previous invocation of CreateGRASSLocationFromDEM. Note that most RHESSysWorkflows commands provide the same option. The ability to overwrite GRASS datasets accommodates the often exploratory nature of ecohydrology data preparation workflows. While the data will be overwritten, the command history stored in the metadata store will retain a listing of the order in which you ran all workflow steps. This can help you retrace the steps you took to arrive at the current workflow state.

Go ahead and open GRASS, pointing it to the dbase named GRASSData in your project directory, and then opening the mapset PERMANENT in the location default. You should be able to load the DEM raster into the map view. We'll use GRASS to visualize the results of the next few workflow steps, so keep GRASS open in the background.

Import RHESSys source code into your project

To create worldfiles and flow tables, RHESSysWorkflows needs a copy of the RHESSys source code. RHESSysWorkflows also uses the new RHESSys ParamDB database and Python libraries to generate vegetation, soil, land use and other parameters needed by RHESSys. RHESSysWorkflows is only compatible with the pre-release version of RHESSys 5.16 and later versions of the code. At present, and for first-time users, the most reliable way to import ParamDB and RHESSys source code into your project is to download the code from GitHub using the ImportRHESSysSource tool. However, this tool is also capable of importing RHESSys source code stored on your computer, which allows you to import the code from a previous RHESSysWorkflows project. ParamDB is always downloaded from GitHub, even when RHESSys source code is imported from a local source.

To download ParamDB and the RHESSys source code and store them in your project directory, issue the following command:
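For example:

ImportRHESSysSource -p PROJECT_DIR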

If you want to check out an alternate branch, use the -b option to specify the Git branch of RHESSys to use (e.g. 'develop'). By default, ImportRHESSysSource will use the master branch, which is the appropriate branch to use with RHESSys 5.18.

Once ImportRHESSysSource finishes importing RHESSys source code into the project directory, it will compile all the tools necessary to create world files and flow tables, while also compiling the RHESSys binary. Once the command finishes, open the rhessys directory in your project directory. Here you can see the familiar RHESSys directory structure for storing model parameters, templates, worldfiles, flow tables, temporal event files, and model output; the RHESSys tools compiled by ImportRHESSysSource will be copied into the bin directory of the rhessys directory. Also note that all the source code for RHESSys is stored in the src directory.

Import RHESSys climate data

Because of the greater variability of climate data formats, and the complexity of time-series workflows, we have chosen to focus development effort on RHESSysWorkflows toward making it easier to acquire and manipulate geospatial data required for building RHESSys work files and flow tables. This means that the modeler is responsible for building the climate data necessary for building RHESSys world files and performing model runs.

RHESSysWorkflows provides the ImportClimateData tool to import RHESSys climate data into your project. To run this example workflow, download example climate data here. Unzip the file to a location on your computer (e.g. in your home directory); this will result in a directory named clim in that location. Issue the following command to import these data:
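For example (the flag used to point at the climate data directory is our assumption; check ImportClimateData's help):

ImportClimateData -p PROJECT_DIR -s /path/to/clim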

You will have to replace /path/to/clim with the path of the clim folder unpacked from the zip file downloaded above.

For your own climate data to work with ImportClimateData, the data must be stored in their own directory, with each base station having a file name that ends in .base. See the help for ImportClimateData for more information.

Create climate stations map

If your study watershed has multiple climate stations that you would like to use, you must use a climate stations map to associate each zone in your world file with a particular climate station. RHESSysWorkflows provides the GenerateBaseStationMap tool to create a raster map of your climate stations using Thiessen polygons derived from climate station points; these points must be specified in a text file in a format supported by GRASS's v.in.ascii tool. For this tutorial, we'll use a dummy point to associate with the bwi climate station imported above. You can download this point here. Once downloaded, unzip the file to reveal the text file containing the point, which should look like this:
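For example (the coordinate values are illustrative; v.in.ascii uses the pipe character as its default field separator):

1|355000.0|4350000.0|bwi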

In a real-world case, there would be additional lines in this file, one for each climate station. The first column is the base station ID and must match the base_station_ID field of the .base file associated with each climate station.

When we create the world file template later on in this tutorial, the tool that we use to do so, GenerateWorldTemplate, will make sure that there is a climate base station file for each unique raster value in your base station map; the world file template will not be generated if this is not the case.

The second and third columns represent the X and Y coordinates (or easting and northing) of the point feature we will use to represent the location of the climate station. The final column is the name of the climate station and should match the file name prefix of the corresponding .base file (i.e. if your base station file name is 'bwi.base', the final field should be 'bwi').

Now we're ready to use GenerateBaseStationMap to: import the climate station points, make Thiessen polygons based on the points, and rasterize the polygons:
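For example (the flag name for the points file is our assumption; check GenerateBaseStationMap's help):

GenerateBaseStationMap -p PROJECT_DIR -b /path/to/basestations.txt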

The GRASS tool v.voronoi is used to generate the Thiessen polygons. Note that some versions of this tool can fail if you have only two points. Hopefully this will be fixed when GRASS 6.4.3 is released later in 2013.

Delineate watershed and generate derived data products

RHESSysWorkflows automates the process of delineating your study watershed based on the location of the streamflow gage registered in the workflow. As part of this process, many datasets needed by RHESSys will be derived from the DEM. To delineate the watershed:
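For example (the threshold and area values are illustrative; both parameters are described below):

DelineateWatershed -p PROJECT_DIR -t 500 -a 1.5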

Here the -t (a.k.a. --threshold) parameter specifies the minimum size (in DEM cells) for subwatersheds generated by the GRASS command r.watershed.

The -a (a.k.a. --areaEstimate) parameter allows you to provide a guess of the area (in sq. km) of the delineated watershed. DelineateWatershed will report whether the watershed is within 20% of this area. You can view the delineated watershed in GRASS by displaying the raster map named basin. If the area or the shape of the delineated watershed differs greatly from what you expect, you may need to vary how DelineateWatershed snaps your streamflow gage onto the stream network. This is accomplished by changing the -s (a.k.a. --streamThreshold) stream threshold parameter and/or the -w (a.k.a. --streamWindow) parameter passed to r.findtheriver.

To debug watershed delineation problems, it is also helpful to view the original streamflow gage and the snapped streamflow gage overlaid on the upslope accumulated area map (UAA). DelineateWatershed will create vector layers for each of the streamflow gage coordinates (named gage and gage_snapped) as well as a UAA raster map (named uaa).

Though we do not recommend that you make changes to the metadata store by hand, as a last resort, you can snap the gage location by hand using GRASS and update the gage_easting_raw and gage_northing_raw attributes in the rhessys section of the metadata store. Then re-run DelineateWatershed as before with the addition of the --overwrite option.

For a listing of the derived datasets generated by DelineateWatershed, use the GRASS command g.list rast or check the DelineateWatershed source code.

RHESSysWorkflows provides GeneratePatchMap, an automated tool for creating gridded and clumped patch maps. Gridded patch maps consist of a regular grid of the same resolution and extent as the DEM; clumped maps can be created using elevation or topographic wetness index rasters. Modelers can also use custom patch maps registered via EcohydroLib's RegisterRaster tool and imported into GRASS using ImportRasterMapIntoGRASS (see below for a general description of this command).

To create a gridded patch map, an elevation clumped patch map, or a topographic wetness index clumped map, enter the corresponding command into your Terminal (all three variants are sketched below):
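Sketches of the three invocations (the -t and -c flag names and values are our assumptions; check GeneratePatchMap's help for the exact spelling in your version):

GeneratePatchMap -p PROJECT_DIR -t grid
GeneratePatchMap -p PROJECT_DIR -t clump -c elevation
GeneratePatchMap -p PROJECT_DIR -t clump -c wetness_index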

Clumped patch maps are generated by calling GRASS's r.clump command with the appropriate source raster as input.

By default GeneratePatchMap will set the zone map to be that of the patch map, but only if a custom zone map has not been registered with the workflow (e.g. via a combination of RegisterRaster and ImportRasterMapIntoGRASS; see the custom data tutorial below). If you wish to overwrite your custom zone map with the patch map, use the --forceZone option to GeneratePatchMap.

Generating soil texture map

Since we used EcohydroLib's SSURGO tools to generate percent sand and percent clay raster maps for our watershed, we can use the GRASS add-on r.soils.texture to generate USDA soil texture classes, for which RHESSys's ParamDB contains parameters. It is also possible to use custom soil maps, which we'll explore in the custom local data workflow section below.

To generate our soil texture map in GRASS, as well as the corresponding RHESSys soil definition files, use the GenerateSoilTextureMap tool as follows:
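For example:

GenerateSoilTextureMap -p PROJECT_DIR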

This command will print information about what soil texture classes were encountered in the soil texture map, and what RHESSys soil default IDs these classes map onto. You can view the resulting soil texture map (named soil_texture) in GRASS. The soil definition files will be stored in the defs directory of the rhessys directory stored in your project directory.

Import LAI map into GRASS

We'll use the general command ImportRasterMapIntoGRASS to import our LAI map from the project directory into GRASS, where RHESSys will be able to make use of it (you can also derive an LAI map from your landcover map; see below):
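For example (the -t raster type value is illustrative):

ImportRasterMapIntoGRASS -p PROJECT_DIR -t lai -m nearest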

The -m (a.k.a. --method) parameter specifies how GRASS should resample the raster being imported. Valid resampling methods are those supported by GRASS's r.resamp.interp command, as well as none, which will cause ImportRasterMapIntoGRASS to skip the resampling step.

Generate landcover maps in GRASS

RHESSysWorkflows uses a single landcover map to generate the following maps used by RHESSys:

  • Vegetation type (stratum)
  • Land use
  • Roads
  • Impervious surfaces
  • Leaf area index (LAI; optional)

The first step in generating these maps is to import the landcover raster from your project directory into GRASS using ImportRasterMapIntoGRASS:
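For example:

ImportRasterMapIntoGRASS -p PROJECT_DIR -t landcover -m nearest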

In our case, the landcover map in our project directory came from the NLCD 2011 data hosted by USGS. However, RHESSysWorkflows supports the use of custom landcover maps registered via RegisterRaster. In either case, we need to provide raster reclassification rules so that RHESSysWorkflows will know how to generate vegetation, land use, roads, impervious, and optionally LAI maps from the landcover map.
To do this, we use the RegisterLandcoverReclassRules tool:
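For example:

RegisterLandcoverReclassRules -p PROJECT_DIR -k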

NLCD2011 is a known landcover type in RHESSysWorkflows (in addition to NLCD2006), so all we need do is use the -k (a.k.a. --generateKnownRules) option. For a custom landcover map, we could instead use the -b (a.k.a. --buildPrototypeRules) option to generate prototype rules that we can edit as needed. It is also possible to specify that existing reclass rules should be imported from another directory on your computer using the -r (a.k.a. --ruleDir) parameter. To include LAI reclass rules when registering prototype or existing rules, you must use the -l (a.k.a. --includeLaiRules) parameter.

The known rules for NLCD2006 and NLCD2011 that ship with RHESSysWorkflows include an LAI reclass rules file with values for grassland, and evergreen needle leaf and deciduous broadleaf forests (both temperate) drawn from the International Satellite Land Surface Climatology Project II (ISLSCP II) project. These data can be downloaded here.

Whether using known rules, building prototype rules, or importing existing rules, RegisterLandcoverReclassRules will result in the following rules files being created in the rules directory of your project directory:

  • stratum.rule
  • landuse.rule
  • impervious.rule
  • road.rule
  • lai-recode.rule (if the --includeLaiRules option was selected)

There is no need to edit these rules for this NLCD2011 example, but you should take a moment to look at how these rules work. RHESSysWorkflows uses GRASS's r.reclass command (r.recode for creating LAI maps), and so the rules files follow this format.
It's important to note that the landcover reclass rules for stratum and landuse must result in raster maps whose value labels match class names present in the RHESSys ParamDB database. Thus, be very careful in editing the righthand side of the expressions in your stratum and landuse reclass rules.

Note that to keep track of edits you make to your project's reclass rules in your project metadata, you should use the RunCmd workflow command (see the section on custom workflows to learn how to use this tool).

You can find information on NLCD classes here.

Once the landcover reclass rules are in place, it is very easy to generate the raster maps derived from the landcover data, as well as the vegetation and land use definition files needed by RHESSys; this is done using the following command:
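For example (see below for the optional -l flag):

GenerateLandcoverMaps -p PROJECT_DIR -l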

If you would like an LAI map to be generated, you must use the -l (a.k.a. --makeLaiMap) parameter on the above command line. This will only work if you are using known landcover reclass rules, or if you requested that RegisterLandcoverReclassRules include LAI reclass rules when creating prototype rules or using existing rules.

As with the soil texture map and definition generation step, GenerateLandcoverMaps will provide descriptive output of the vegetation and land use definition types encountered in the raster data.

Creating the worldfile for a watershed

Now we are almost ready to create the worldfile for our watershed. First we must create the template from which the world file will be created. To do this, we'll use the GenerateWorldTemplate tool. Fortunately this is very easy because the metadata store contains nearly all the information needed to create the template. If you are using multiple climate stations, and therefore have a base station map that you created using GenerateBaseStationMap, all you need do is:
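For example:

GenerateWorldTemplate -p PROJECT_DIR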

If you are using a single climate station and did not create a climate station map, you must specify the climate station as follows:
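For example (the -c flag name for the climate station is our assumption; check GenerateWorldTemplate's help):

GenerateWorldTemplate -p PROJECT_DIR -c bwi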

Here we're using the climate station named bwi.

In either case, if your workflow is missing any information necessary for making the world template, GenerateWorldTemplate will exit with a corresponding error.

If you want to see the template file generated, as well as other information, use the -v (a.k.a. --verbose) option.

Now use the CreateWorldfile tool to create a world file using this template:
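For example:

CreateWorldfile -p PROJECT_DIR -v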

We've specified the -v (a.k.a. --verbose) command line option. This will print details about what CreateWorldfile, and the programs it runs on your behalf, is doing. This is recommended, as these programs can fail in complex ways that CreateWorldfile may not be able to detect, so you'll likely want to know what's going on under the hood.

When CreateWorldfile finishes, it will create an initial worldfile named worldfile_init in the worldfiles directory in the rhessys directory in your project directory.

As with worldfile creation, at this point in the workflow RHESSysWorkflows's metadata store contains nearly all the information needed to create a flow table using the createflowpaths (CF) RHESSys program. The two choices you have are whether CF should create a flow table that includes roads and/or a surface flow table to model non-topographic routing from rooftops. We'll route roads in this example, leaving rooftops for the custom local data workflow discussed below.

Run CreateFlowtable as follows:
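For example:

CreateFlowtable -p PROJECT_DIR --routeRoads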

This will result in the creation of a flow table called world.flow in the flow directory of your rhessys directory. Now we have almost everything we need to run RHESSys simulations.

Initializing vegetation carbon stores

RHESSys provides a program called LAIread to initialize vegetation carbon stores in your worldfile.

Note, LAIread should only be used for RHESSys simulations with static vegetation (i.e. not dynamic vegetation mode, enabled via the -g command line option to RHESSys).

Initializing carbon stores is a multi-step process that involves running LAIread to generate a redefine worldfile, running a 3-day RHESSys simulation to incorporate the redefine worldfile, and writing out a new worldfile with initialized vegetation carbon stores. RHESSysWorkflows automates all of these processes for you; it can even figure out what date to start the 3-day RHESSys simulation on based on your climate data.

In the current version of RHESSysWorkflows, RunLAIRead is only able to read simulation start dates from point time-series climate data. Users of ASCII or NetCDF gridded climate data must run LAIread by hand. The next release of RHESSysWorkflows will add support for gridded climate data to RunLAIRead.

You can run RunLAIRead as follows:
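For example:

RunLAIRead -p PROJECT_DIR -v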

Note that we use the verbose command line option here as well. The new GRASS-based version of LAIread is relatively new and not as well tested, so we advise you to keep a close watch on what it is doing.

LAIread relies on allometric relationships to initialize vegetation carbon stores. These allometric parameters have not yet been added to RHESSys ParamDB. A default version of the parameters for RHESSys base vegetation classes is stored in the RHESSys ParamDB source code repository. RHESSysWorkflows stores this file under the name allometric.txt in the allometry folder of the ParamDB copy in your project's rhessys/db folder.
You can edit this file to suit your needs before running RunLAIRead.
Consult the RHESSys wiki for more information on allometric relationships used by LAIread.

When finished, a final worldfile named world will be created in the worldfiles directory of your rhessys directory. With this worldfile, you are ready to perform subsequent model workflow steps including: spin-up, calibration, scenario runs, and analysis and visualization.

This concludes this tutorial using RHESSysWorkflows to create a RHESSys world file and flow table using standard spatial data infrastructure.

We need one more thing before we can run our model: a TEC file. TEC stands for "temporal event controller". We use a TEC file to tell RHESSys to do things at certain times during a simulation, for example, to redefine the worldfile to simulate timber harvest or forest fire. We also use TEC files to tell RHESSys what model outputs should be produced when. To create a TEC file that tells RHESSys to print daily model outputs starting on 10/1/2008, do the following:
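A sketch of the invocation (the tool name CreateTecfile and its flags are our assumption; check your RHESSysWorkflows version for the exact tool and options):

CreateTecfile -p PROJECT_DIR --date 2008 10 1 1 --type print_daily_on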

For more information on tec file format, see the RHESSys wiki.

Once you have built a RHESSys model using RHESSysWorkflows, you can run your model manually. However, this will not capture information about model runs in your project metadata. If you would like to record your runs in your project metadata, use the RunModel command:
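A sketch (the RHESSys options after the '--' are illustrative; the '--' separator, -v, and -pre are described below):

RunModel -v -p PROJECT_DIR -pre test01 -- -st 2008 10 1 1 -ed 2009 10 1 1 -b -t tec_daily.txt -w world -r world.flow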

Notice the '--' in the command line. All of the command line options before the '--' are options required by RunModel.py; some of these are also common RHESSys options. All of the options after the '--' will be passed to RHESSys. Because the project metadata knows where RHESSys is installed in your project directory, you don't have to specify the full path of any of the RHESSys input files (e.g. world files, flow tables, tec files, etc.), just the filenames. RunModel will echo RHESSys console output to the screen (if the -v or verbose option is specified as above), and will always save the same output into a file named 'rhessys.out' stored in the output folder for each model run. The output folder will be named based on the value you provide for the '-pre' or output prefix option.

Working in watersheds outside the United States

The above standard U.S. spatial data acquisition workflow steps do not provide access to data outside the U.S. (by definition). However, it is still possible to use RHESSysWorkflows to develop RHESSys models for watersheds outside the U.S. One option is to use custom local data, which is described here. If you are working in Australia, EcohydroLib (and by extension RHESSysWorkflows) provides access to 1-second (roughly 30-meter) resolution DEM data (derived from SRTM data) using web services interfaces provided by Geoscience Australia. These data can be accessed using the GetGADEMForBoundingBox command. A typical workflow would begin as follows. First, define your study area using the RegisterStudyAreaShapefile command:
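For example (the flag used to point at the shapefile is our assumption):

RegisterStudyAreaShapefile -p PROJECT_DIR -s /path/to/study_area.shp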

(Replace PROJECT_DIR with the name of your EcohydroLib project).

Next, extract the bounding box coordinates for your study area:
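For example (the command name is our best guess at EcohydroLib's bounding box tool; check your installation's command list):

GetBoundingboxFromStudyareaShapefile -p PROJECT_DIR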

Then, download Geoscience Australia DEM data using the GetGADEMForBoundingBox command, as sketched for each DEM type below.

Currently, there are three types of DEM data available:

  • 1 second SRTM Digital Elevation Model of Australia
  • 1 second SRTM Digital Elevation Model - Hydrologically Enforced
  • 1 second SRTM Digital Elevation Model - Smoothed

To acquire one of these data sets, run GetGADEMForBoundingBox with the appropriate DEM type; all three invocations are sketched below.
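Sketches of the three invocations (the -d flag name and its values are our assumptions; check GetGADEMForBoundingBox's help for the exact DEM type identifiers):

GetGADEMForBoundingBox -p PROJECT_DIR -d dem_1s
GetGADEMForBoundingBox -p PROJECT_DIR -d dem_1s_hydro
GetGADEMForBoundingBox -p PROJECT_DIR -d dem_1s_smoothed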

Consult the Geoscience Australia metadata catalog for more information about these data sets.

The remainder of your workflow would proceed with importing streamflow gage coordinates and subsequent steps described here.

In addition to Australian DEM data, EcohydroLib provides access to gridded soils data provided by CSIRO and available as part of the Soil and Landscape Grid of Australia dataset. To download these data into your project, use the GetSoilGridAustralia command:
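For example:

GetSoilGridAustralia -p PROJECT_DIR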

This will download a subset of the available gridded Australia soils data from the Australia-wide 3D Soil Attributes dataset; currently GetSoilGridAustralia will download the percent sand, percent silt, and percent clay layers. Data for the first 1 m of the soil profile are downloaded, and a depth-weighted average value for each pixel is generated using these layers. Once these data have been downloaded, you can use the GenerateSoilTextureMap command to generate a RHESSys soil texture map and parameters for USDA soil classes:
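For example:

GenerateSoilTextureMap -p PROJECT_DIR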

Custom local data workflow

The following sections outline how one might use RHESSysWorkflows to build RHESSys input files using custom data already available on your computer. Unlike the above standard spatial data tutorial, we won't provide data for the workflow steps below. Instead, we'll describe how your data should be formatted to work with each workflow tool. To avoid duplication, only those concepts specific to using local data in RHESSysWorkflows will be discussed. You are encouraged to read the standard spatial data tutorial above as well. The workflow sequence covered below is not the only possible workflow involving local data. Also, it is possible to combine steps from this example workflow with steps from the standard spatial data tutorial.

Import a DEM into your project

When working in watersheds outside the coverage of the NHD (such as when working outside of the U.S.), the first workflow step is to import digital elevation model data using the RegisterDEM tool. The DEM to be imported must be in a format readable by GDAL.

Run RegisterDEM as follows:
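For example (the -d flag name for the DEM file is our assumption; run RegisterDEM with -h to confirm):

RegisterDEM -p PROJECT_DIR -d /path/to/dem.tif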

To run this command, replace PROJECT_DIR with the absolute or relative path of an empty directory where you would like the data and metadata for your project to be stored (i.e. your project directory). It is also possible to reproject or resample the DEM on import. See RegisterDEM's help for more information (i.e. run with the -h option).

RegisterDEM will result in the DEM being copied to your project directory; the DEM extent will be used to determine the bounding box for the study area, and a polygon of the DEM extent will be generated and saved as a shapefile in your project directory.

Use a DEM with streams and storm drains burned into it

If you are working with an urbanized catchment, it is often necessary to "burn" streams or storm drains into your DEM so that you can properly delineate the "sewershed." RHESSysWorkflows allows you to use both a "stream burned" and a standard "non-burned" DEM in the same workflow. The burned DEM will only be used for operations that require it (e.g. watershed delineation, flow table creation); the standard DEM will be used for determining elevation, slope, aspect, etc. To use a stream burned DEM, do the following:
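For example (the -t raster type value for a stream burned DEM is our assumption):

RegisterRaster -p PROJECT_DIR -t stream_burned_dem -r /path/to/burned_dem.tif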

Once the stream burned raster has been registered with the workflow, the DelineateWatershed and CreateFlowtable tools will know to use this raster instead of the standard DEM; all other tools that use the DEM will continue to use the standard DEM. If you want to override this behavior (e.g. to test the effect that the burned DEM has on watershed delineation), you can pass the --ignoreBurnedDEM option to DelineateWatershed or CreateFlowtable, which will cause them to use the standard DEM instead.

We recommend the excellent open-source Whitebox GAT for burning streams into DEM datasets.

Import streamflow gage coordinates

The coordinates of the streamflow gage associated with your watershed are registered with the workflow using the RegisterGage tool. The tool takes as input a point shapefile containing one or more points; the WGS84 lat-lon coordinates for the desired gage will be extracted from the shapefile. These coordinates will be written to the metadata store, and a new point shapefile containing a point only for the selected gage will be created in the project directory.

A typical way to run RegisterGage is:
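For example (the flag names and the gage ID are illustrative; the four required inputs are described below):

RegisterGage -p PROJECT_DIR -g /path/to/gages.shp -l gages -a GAGE_ID -d 01589330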

To run this command, replace PROJECT_DIR as above, and specify the input shapefile you'd like to use, the name of the dataset within the shapefile, the name of the gage ID attribute in the dataset, and the ID of the desired gage. The name of the dataset is usually the same as the filename of the shapefile (minus the .shp). If you are unsure, you can use the command line tool ogrinfo, which ships with GDAL.

Importing data into GRASS for use with RHESSys

The following workflow steps are identical whether using standard spatial data or custom local data and will not be covered here:

  • Create a new GRASS location
  • Import RHESSys source code into your project
  • Import RHESSys climate data
  • Delineate watershed and generate derived data products
  • Generate landcover maps in GRASS

See the above standard spatial data tutorial for detailed information on these steps.

Importing other raster layers

For a list of all of the current raster map types supported by EcohydroLib, run the RegisterRaster tool as follows:
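That is, ask the tool for its help message:

RegisterRaster -h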

This will also show all of the resampling and other import options available.

What follows is a series of examples showing how to input some of these raster types. All rasters must be stored in a file format readable by GDAL (see above).
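First, a landcover import (the path and the resampling flag name are our assumptions; the --force and no-resample behavior are described below):

RegisterRaster -p PROJECT_DIR -t landcover -r /path/to/landcover.tif -s none --force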

Here we are importing a landcover raster map obtained from the Baltimore Ecosystem Study LTER, where we've asked RegisterRaster not to resample the raster unless its spatial reference system differs from the DEM (i.e. the resolution of the raster cells won't be changed). We're also telling RegisterRaster to ignore the fact that the extent of the landcover raster does not exactly match the extent of the DEM/study area. After import, you are strongly encouraged to visualize the landcover map overlaid on the DEM using QGIS to ensure that the landcover will cover an adequate portion of your study area.

For landcover maps, we recommend that you do not resample when registering the raster using RegisterRaster, but instead let GRASS handle the resampling.

To make the landcover map in the project directory available to RHESSys, it must be imported into GRASS as follows:
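For example:

ImportRasterMapIntoGRASS -p PROJECT_DIR -t landcover -m nearest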

This will import the landcover raster into GRASS, and then resample the raster using the nearest neighbor method. For a list of valid resampling methods, run ImportRasterMapIntoGRASS with the -h option; you may also specify none as the resampling method, and the raster will not be resampled.

Starting with RHESSys 5.16, the createflowpaths (CF) utility is able to create surface flow tables that can incorporate non-topographic routing of flow from rooftops to nearby impervious and pervious areas. RHESSys 5.16 can use separate surface and subsurface flow tables to simulate the effect of such non-topographic routing on the landscape. You can find more information on using surface flowtable routing in RHESSys here.

To import a rooftop connectivity raster, use RegisterRaster as follows:
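For example (the -t raster type value for rooftop connectivity is our assumption):

RegisterRaster -p PROJECT_DIR -t roof_connectivity -r /path/to/roofs.tif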

As with landcover maps, we recommend that you do not let RegisterRaster resample roof connectivity rasters, instead letting GRASS handle the resampling. RegisterRaster uses GDAL to resample rasters. GDAL ignores null/nodata pixels when resampling, whereas GRASS's r.resamp.interp does not. Thus, when a landcover raster and a rooftop connectivity raster (which contains nodata values for all non-roof pixels) are resampled in RegisterRaster, they can become mis-registered, which will result in an invalid surface routing table.

Then make your rooftop connectivity raster available for RHESSys by importing it into GRASS:

As described in the standard spatial data tutorial above, EcohydroLib/RHESSysWorkflows requires that the user provide their own LAI data, which can be imported into a project using RegisterRaster:

Now make your LAI raster available for RHESSys by importing it into GRASS:

A custom patch map can be imported into a project as follows:

Then make your patch raster available for RHESSys by importing it into GRASS:

A custom soils map can be imported into a project as follows:

Then make your soil raster available for RHESSys by importing it into GRASS:

The GeneratePatchMap tool will use the patch map as the zone map. If you wish to use another map for the zone map, do the following after running DelineateWatershed:

Then make your zone raster available for RHESSys by importing it into GRASS:

By default no isohyet map will be used when creating the world file for a watershed. If you wish to use an isohyet map, do the following before running GenerateWorldTemplate:

Then make your isohyet raster available for RHESSys by importing it into GRASS:
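For example (the --integer flag name is our assumption; see the note on integer conversion below):

ImportRasterMapIntoGRASS -p PROJECT_DIR -t isohyet -m none --integer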

Note that we tell ImportRasterMapIntoGRASS to transform the isohyet raster values to integers on import. This is necessary due to limitations in the current version of the RHESSys tool grass2world. When doing the integer conversion, ImportRasterMapIntoGRASS will multiply the raster values by 1000, giving three significant digits. To use another value, specify the --multiplier option.

Generating RHESSys definitions for custom soil data

When using custom soil data with RHESSysWorkflows, you need to create soil definition files before you can create a worldfile. To create soil definitions, you must first create raster reclass rules that map between your soil type and a soil type known to RHESSys ParamDB. At present, ParamDB contains definitions drawn from the literature for USDA soil textures. However, you may load custom soil parameters into your own local copy of ParamDB. For more information, see the ParamDB README.

To create prototype soil reclass rules for a project, do the following:
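A sketch (the tool name is hypothetical; check your RHESSysWorkflows command list for the soil reclass rule tool, and see the -b option described below):

RegisterSoilReclassRules -p PROJECT_DIR -b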

Here we're using the -b (a.k.a. --buildPrototypeRules) command line option. This will result in the creation of a file called soils.rule in the rules directory of your project directory. You will need to edit this file as necessary to map your custom soil types to ParamDB soil types.

Make sure that the soil class names on the righthand side of each reclass rule correspond to soil class names in ParamDB.

You can also import existing soil reclass rules as follows:
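Again using the hypothetical tool name from above, with the -r option described below:

RegisterSoilReclassRules -p PROJECT_DIR -r /path/to/rules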

The -r (a.k.a. --ruleDir) parameter must point to a directory that contains a file named soils.rule. This file will be copied into the rules directory of your project directory.

Once you have valid soil reclass rules in place, you can generate RHESSys soil parameter definition files for your custom soils using the following command:

This tool will print information to the screen about each soil type encountered and the RHESSys ParamDB soil parameter classes they map to. If you see no such print out, check your soil reclass rule file to make sure it is correct. The resulting soil definition files will be written to the defs directory in the rhessys directory of your project directory.

Remember, most RHESSysWorkflows commands support the --overwrite command line option for overwriting existing data stored in the project directory or in GRASS.

Creating a world file template in areas with low slope

Due to limitations in the current version of RHESSys's grass2world tool, slope values less than 1.0 will be truncated to 0.0. This causes values of NaN (i.e. not a number) to result for the spherical average of aspect calculation. To work around this, you can use the --aspectMinSlopeOne command line option to instruct GenerateWorldTemplate to use a slope map whose minimum value is 1.0 when calculating the spherical average of aspect:
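For example:

GenerateWorldTemplate -p PROJECT_DIR --aspectMinSlopeOne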

Creating a surface flow table using a roof connectivity map

If you are using a roof connectivity map in your workflow, you need to explicitly tell CreateFlowtable to use the roof connectivity map to generate a surface flow table. Do so as follows:
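For example:

CreateFlowtable -p PROJECT_DIR --routeRoofs --routeRoads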

Here we're using both the --routeRoofs and --routeRoads options. You are not required to use both together, but usually when modeling rooftop connectivity you will be working in a watershed that also has roads whose effects on routing you will also want to consider.

Creating the worldfile and initializing vegetation carbon stores

The following workflow steps are identical whether using standard spatial data or custom local data and will not be covered here:

  • Creating the worldfile for a watershed
  • Initializing vegetation carbon stores

See the above standard spatial data tutorial for detailed information on these steps.

RHESSysWorkflows provides many tools for preparing RHESSys models; however, there are many other possible tools and workflow steps that can be used to build a model. To allow arbitrary commands to be carried out on data stored in a project directory, RHESSysWorkflows provides the RunCmd command. For example, you may wish to edit your worldfile template and then re-run grass2world by hand:
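A sketch (the grass2world arguments and file paths are illustrative):

export PATH=${PATH}:PROJECT_DIR/rhessys/bin
RunCmd -p PROJECT_DIR grass2world -t rhessys/templates/worldfile.template -w rhessys/worldfiles/world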

(It is necessary to manually add your project directory's copy of the RHESSys binaries to your path because grass2world runs a helper program called rat that must be in your path.)

Although RHESSysWorkflows will not be able to capture full metadata about the input and output files used and produced by commands run through RunCmd, it will write an entry to the processing history of your project metadata. This way, you at least have a record of the custom workflow steps you applied to the data in your project directory.

Creating multiple worldfiles based on subbasins

For large model domains, it may be desirable to break up your watershed into multiple worldfiles. RHESSysWorkflows allows you to do this using the CreateWorldfileMultiple command:
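For example:

CreateWorldfileMultiple -p PROJECT_DIR -v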

This will create one worldfile for each subbasin delineated for your watershed.

Once you've created multiple worldfiles, you can create corresponding flow tables using the CreateFlowtableMultiple command:
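For example:

CreateFlowtableMultiple -p PROJECT_DIR --routeRoads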

CreateFlowtableMultiple supports the same command line options as its counterpart CreateFlowtable.

Finally, you can initialize vegetation carbon and nitrogen stores for multiple worldfiles using RunLAIReadMultiple:
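For example:

RunLAIReadMultiple -p PROJECT_DIR -v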

Visualizing RHESSys output

RHESSysWorkflows includes tools to visualize RHESSys model output.

Note that these tools are still in development, but beta versions are provided for your convenience; functionality and options may change without notice.

The first tool, RHESSysPlot, will produce plots for basin-scale variables such as streamflow. This tool is very flexible, and includes the ability to plot observed data vs. modeled data, and to plot data for multiple simulations. A prototypical usage to plot observed and simulated hydrographs with rainfall plotted on a second y-axis is as follows:
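A sketch (all flags other than --figureX and --figureY are our assumptions; run RHESSysPlot --help for the real option names):

RHESSysPlot --plottype standard -o /path/to/observed.txt -d /path/to/rhessys_basin.daily -c streamflow --secondaryData /path/to/rain.txt --figureX 8 --figureY 3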

Go here to download example observed streamflow for the DR5 study catchment used in the first part of this tutorial.

The --figureX and --figureY options control the size of the plot (in inches). RHESSysPlot also allows you to make standard time series, semi-log scale timeseries, and cumulative distribution function plots. For a full description of options, use the --help option:
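That is:

RHESSysPlot --help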

In addition to making static plots of basin-scale output variables, RHESSysWorkflows provides a tool, PatchToMovie, for making animations of patch-scale output variables. To use this tool, you first need to have RHESSys simulations for which patch-scale output was created (e.g. using the -p output option). The following example will create a 30-frames-per-second animation for infiltration:
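A sketch (the flag names, paths, and mapset are our assumptions; -t, -r, and -g are described below):

PatchToMovie -p PROJECT_DIR -r /path/to/rhessys_output -g default -f 30 -v "infiltration_unsat+infiltration_sat" -t "Infiltration (m/day)" -o infiltration.mp4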

Note that the variable can be an arbitrary mathematical expression (using '+', '-', '*', and '/') combining patch-level RHESSys variable names as well as numerical constants (as in the example). When using such expressions, you'll want to specify a title for each frame in your animation using the -t (a.k.a. --mapTitle) option (otherwise the expression will be used as the title, which likely won't fit on the frame).

When specifying simulation output (e.g. -r) and the GRASS mapset (e.g. -g), it is important to use the same GRASS mapset that was used to create the worldfile used to run the simulation.

For a full description of options, use the --help option:
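That is:

PatchToMovie --help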

PatchToMovie uses a command line program called ffmpeg to encode individual maps into a movie file. To install ffmpeg do the following:

Linux (Debian/Ubuntu-based systems)

Install ffmpeg (and vlc for viewing animations):

sudo apt-get install ffmpeg vlc

Lastly, you must add an entry to your EcohydroLib configuration file. For OS X:
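A sketch of the entry (the section and key names are our assumptions; compare against the configuration documentation linked below):

[RHESSYSWORKFLOWS]
PATH_OF_FFMPEG = /usr/local/bin/ffmpeg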

See Setup EcohydroLib and RHESSysWorkflows configuration file for more details on setting up your configuration file.

Deprecated installation instructions

OS X 10.7 through 10.10 using Kyngchaos GIS packages

Install Xcode (OS X developer tools):

Install Xcode via the App Store

Make sure that Xcode command line tools are installed by running the following from the command line (e.g. using the Terminal app):
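One common way to do this on newer OS X releases (assuming your Xcode version supports it) is:

xcode-select --install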

  • Agree to the Xcode license by running the following command (we only run this command to force Xcode to show us the license): sudo cc

Install GIS tools: GRASS & QGIS

Note, GRASS version 6.4 is required for RHESSysWorkflows (GRASS 7.0 is not supported at this time). GRASS is used internally to carry out workflow steps (leading to the creation of RHESSys world files and flow tables).
You will also find it useful to use GRASS to visualize the results from some workflow steps.

Before installing GRASS, etc. under OS X 10.8, 10.9 or 10.10, you will need to enable applications from any source to be installed. To do so open System Preferences > Security & Privacy > General and choose "Allow apps downloaded from: Anywhere". Doing so exposes your computer to more security risks from downloaded software. We recommend that you revert this setting once you are finished with installation.

Here you will need to download and install the following:

  1. GDAL Complete framework
  2. FreeType framework
  3. cairo framework
  4. PIL (Python imaging library)
  5. GRASS.app

While you are there, we recommend you also install QGIS (Quantum GIS)

In addition to GRASS and components installed above, install:

  1. NumPy from http://www.kyngchaos.com/software/python
  2. SciPy from http://www.kyngchaos.com/software/python
  3. Matplotlib Python module from http://www.kyngchaos.com/software/python
  4. QGIS from from http://www.kyngchaos.com/software/qgis

QGIS is useful for visualizing output from earlier workflow steps that precede importing data into GRASS.

Install RHESSysWorkflows Python modules (including EcohydroLib)

Before installing RHESSysWorkflows, we need to install some dependencies by hand (this is annoying, but unavoidable):

This is necessary because another dependency (statsmodels) requires that we install its dependencies first. If you are running Xcode 5.1 or later, you may encounter this error:

If you don't see the above error, skip the next step. To work around the error, install statsmodels' dependencies this way (you'll probably want to copy and paste this rather than typing it):
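A sketch of the workaround (the package list is illustrative; the ARCHFLAGS value was the commonly documented fix for this Xcode 5.1 clang error):

sudo env ARCHFLAGS="-Wno-error=unused-command-line-argument-hard-error-in-future" pip install pandas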

This too is annoying, but is unavoidable (for now).

To install RHESSysWorkflows and its dependencies (including EcohydroLib), enter the following from your Terminal if you are running XCode 5.0 or earlier:
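For example (rhessysworkflows is the Python package name referenced below):

sudo pip install rhessysworkflows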

If you are running Xcode 5.1 (but not Xcode 6.1 or later), we need to set the ARCHFLAGS variable as above:
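For example:

sudo env ARCHFLAGS="-Wno-error=unused-command-line-argument-hard-error-in-future" pip install rhessysworkflows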

Again, only do the above step if you are running Xcode 5.1, not Xcode 6.1 or later.

This may take a while as several of the modules rely on non-Python code that has to be compiled.

Why are GDAL Python libraries not included as a dependency of RHESSysWorkflows? This is to make life easier for users of OS X 10.7 and 10.8. For these OSes, the GDAL complete installer that accompanies GRASS will install GDAL Python modules in the copy of Python 2.7 that ships with the OS, and the GDAL Python module does not successfully build by itself under OS X, which would make the rhessysworkflows install fail. Linux users will have to make sure they install GDAL Python modules in addition to GDAL itself (e.g. via a companion package, or by 'sudo pip install GDAL').

Install GRASS Addons for RHESSysWorkflows

Follow these steps to install the GRASS addons needed by RHESSysWorkflows:

Create a new location (it doesn't matter where, we'll only use it to run the g.extension command to install the extensions)
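Then, from within GRASS, install the two addons (the g.extension syntax shown is for GRASS 6.4; adjust if your version differs):

g.extension extension=r.soils.texture
g.extension extension=r.findtheriver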

For more information on these addons (r.soils.texture and r.findtheriver), see:

Setup EcohydroLib and RHESSysWorkflows configuration file

Save it into a file named '.ecohydro.cfg' stored in your home directory and replace all occurrences of <myusername> with your user name (to find out your OS X user name, use the whoami command in Terminal).

Set ECOHYDROLIB_CFG environment variable so that RHESSysWorkflows can find your configuration file

From Terminal, do the following:

echo "export ECOHYDROLIB_CFG=$/.ecohydro.cfg" >>

Re-load bash profile (or close and open a new Terminal window):
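For example:

source ~/.bash_profile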

This concludes installation and configuration instructions for OS X 10.7 through 10.10 using Kyngchaos GIS packages.

Apple, and thus many third-party software developers, no longer supports OS X 10.6. If you are still running OS X 10.6, you may want to explore upgrade options, as many older Macs can run newer operating systems, up to and including the latest version. If your Mac still has some life in it, it is important to upgrade from OS X 10.6: this version is no longer receiving security updates from Apple, and newer versions have more security features by design.

If you wish to install RHESSysWorkflows on OS X 10.6, follow the instructions below (though we no longer have an OS X 10.6 machine to test on, so we won't be able to help if you run into problems).

Due to its age, there are a few more installation steps needed under OS X 10.6. Also, once Apple stops supporting this version of the OS, support for OS X 10.6 will also be dropped from subsequent releases of RHESSysWorkflows. If you were thinking of upgrading from OS X 10.6 to 10.9 for other reasons, this may add another.

You will need to use the sudo command line tool to install many of the components needed for EcohydroLib/RHESSysWorkflows. The sudo command allows you to run other commands as a super user. Under OS X, by default, only users who are 'admins' have permission to run sudo. To check if your user account is an administrator, or to make your user an administrator, open System Preferences > Users & Groups. Note that to use sudo, your account will also have to have a non-blank password. See this Apple support article for more information.

Once installation has completed, make sure that Python 2.7 is the default Python version by doing the following from the Terminal:
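That is, simply start the interpreter:

python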

This will load the Python interpreter. The first line of output will display the Python version number. Type exit() to exit the interpreter.

Install setuptools as follows:

Unpack the archive by double-clicking on it in Finder

cd setuptools-0.8
sudo python ez_setup.py

Install Xcode (OS X developer tools)

Download and install Xcode 3.2.6 and iOS SDK 4.3 for Snow Leopard here (This requires you to register for a free developer account)

RHESSysWorkflows uses Git to download RHESSys source code so you don't have to.

Install PIP, a tool for installing Python modules

Pip is the recommended way to install Python modules (i.e. rather than using easy_install). For example, Pip allows you to easily uninstall modules. To install pip, enter the following in a Terminal window:
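One common way to do this on OS X of this era (an assumption, since setuptools was installed above) is:

sudo easy_install pip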

Install GIS tools: GRASS & QGIS

Note, GRASS version 6.4 is required for RHESSysWorkflows (GRASS 7.0 is not supported at this time). GRASS is used internally to carry out workflow steps (leading to the creation of RHESSys world files and flow tables).
You will also find it useful to use GRASS to visualize the results from some workflow steps.

Here you will need to download and install the following:

  1. GDAL Complete framework
  2. FreeType framework
  3. cairo framework
  4. PIL (Python imaging library)
  5. GRASS.app

While you are there, we recommend you also install QGIS (Quantum GIS)

In addition to GRASS and components installed above, install:

  1. NumPy from http://www.kyngchaos.com/software/python
  2. SciPy from http://www.kyngchaos.com/software/python
  3. Matplotlib Python module from http://www.kyngchaos.com/software/python
  4. QGIS from from http://www.kyngchaos.com/software/qgis

QGIS is useful for visualizing output from earlier workflow steps that precede importing data into GRASS.

Install GDAL Python modules

Even though we installed the GDAL complete framework above, we still need to install the GDAL Python modules for the copy of Python 2.7 we installed above; the GDAL framework only installs the Python modules for Python 2.6, which RHESSysWorkflows is not compatible with. These installation steps are a little ugly, but bear with me (or upgrade from OS X 10.6). From a Terminal window, type the following:

Install RHESSysWorkflows Python modules (including EcohydroLib)

Before installing RHESSysWorkflows, we need to install some dependencies by hand (this is annoying, but unavoidable):

This is necessary because another dependency (statsmodels) requires that we install its dependencies first.

To install RHESSysWorkflows and its dependencies (including EcohydroLib), enter the following from your Terminal if you are running XCode 5.0 or earlier:

Install GRASS Addons for RHESSysWorkflows

Follow these steps to install the GRASS addons needed by RHESSysWorkflows:

Create a new location (it doesn't matter where, we'll only use it to run the g.extension command to install the extensions)

For more information on these addons (r.soils.texture and r.findtheriver), see:

Setup EcohydroLib and RHESSysWorkflows configuration file

Save it into a file named '.ecohydro.cfg' stored in your home directory. Replace all occurrences of <myusername> with your user name (to find out your OS X user name, use the whoami command in Terminal).

Set ECOHYDROLIB_CFG environment variable so that RHESSysWorkflows can find your configuration file

From Terminal, do the following:

echo "export ECOHYDROLIB_CFG=$/.ecohydro.cfg" >>

Re-load bash profile (or close and open a new Terminal window):

This concludes installation and configuration instructions for OS X 10.6.

RHESSysWorkflows allows you to use local copies of the National Hydrography Dataset Plus (NHD Plus) to locate USGS streamflow gages, and the National Landcover Dataset (NLCD 2006). If you will be building many models across the U.S. or are running RHESSysWorkflows in a server environment and would like to minimize calls to external web services, you may wish to install these datasets locally to improve performance. This is entirely optional. Most users can ignore this as querying webservices for these data is preferable to downloading and installing these relatively large datasets.

To set up a local copy of NLCD2006 land cover data, do the following:

It is important that you download this version of the dataset, and not the official data from http://www.mrlc.gov/nlcd06_data.php. The official data are packaged using a version of PkZip that is not compatible with OS X's GUI or command-line unzip utilities.

Copy NLCD2006 archive to the parent folder where you would like to store it

For example, under OS X, create a folder called 'data' in your home directory

Unpack the NLCD2006 data (this will take a while; time for a coffee break):

OS X 10.6: From the command line:

tar xvjf nlcd2006_landcover_4-20-11_se5.tar.bz2

OS X 10.7/10.8: double-click on the archive in Finder

Setup pre-packaged NHDPlusV2 data

If you want to set up a local copy of NHDPlusV2 data, you can obtain these data by downloading all or a subset of the NHDPlusV2 data and building the database as described in the EcohydroLib documentation. Alternatively, you can download a pre-built copy of the NHDPlusV2 database needed by RHESSysWorkflows here. To download and unpack the pre-built data, do the following:

Download pre-packaged NHDPlusV2 database here

Note, the compressed data are nearly 7 GB (nearly 11 GB uncompressed), so the download may take a while.

Copy the pre-packaged NHDPlusV2 database archive to the parent folder where you would like to store it

For example, under OS X, create a folder called 'data' in your home directory

Unpack the NHDPlusV2 database archive (this will take a while; have a cup of tea):

OS X 10.6: From the command line:

OS X 10.7/10.8: double-click on the archive in Finder

Setup EcohydroLib and RHESSysWorkflows configuration file for local data

Choose the appropriate prototype configuration file:

Save it into a file named '.ecohydro.cfg' stored in your home directory. Replace all occurrences of <myusername> with your user name (to find out your OS X or Linux user name, use the whoami command in Terminal).

Modify the example configuration to point to your NHDPlusV2 and NLCD2006 local data [if you are using these data]:
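A sketch of the relevant entries (the section and key names are our assumptions; compare against your prototype configuration file):

[NHDPLUS2]
PATH_OF_NHDPLUS2_DB = /Users/<myusername>/data/NHDPlusDB.sqlite

[NLCD]
PATH_OF_NLCD2006 = /Users/<myusername>/data/nlcd2006_landcover_4-20-11_se5.img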

If you are using OS X, and if you placed the data in a directory called 'data' in your home directory, the only change you need to make is to substitute <myusername> with your user name.

If you chose to store local NLCD or NHDPlusV2 somewhere else, simply use the absolute path of each file.

Set ECOHYDROLIB_CFG environment variable so that RHESSysWorkflows can find your configuration file


1.3.4 String concatenation vs. format

In GEOG 485, we used the + operator for string concatenation to produce strings from multiple components to then print them out or use them in some other way, as in the following two examples:
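For instance, the two examples could have looked like this (a reconstruction; the variable names and values are illustrative):

fieldName = 'POP2020'
countryName = 'Brazil'
value = 212559417
print('Field ' + fieldName + ' does not exist')
print('The value of field \'' + fieldName + '\' for country \'' + countryName + '\' is: ' + str(value))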

An alternative to this approach using string concatenation is to use the string method format(…). When this method is invoked for a particular string, the string content is interpreted as a template in which parts surrounded by curly brackets {…} should be replaced by the variables given as parameters to the method. Here is how the two examples from above would look in this approach:
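Reconstructed versions of the two examples using format(…), with fieldName, countryName, and value as in the illustrative snippet above:

print('Field {0} does not exist'.format(fieldName))
print('The value of field \'{0}\' for country \'{1}\' is: {2}'.format(fieldName, countryName, value))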

In both examples, we have a string literal '…' and then directly call the format(…) method for this string literal to give us a new string in which the occurrences of {…} have been replaced. In the simple form used here, each occurrence of this pattern will be replaced by the i-th parameter given to format(…). In the second example, {0} will be replaced by the value of variable fieldName and {1} will be replaced by variable countryName. Please note that the second example also uses \' to produce the single quotes so that the entire template could be written as a single string. The numbers within the curly brackets can also be omitted if the parameters should be inserted into the string in the order in which they appear.

The main advantages of using format(…) are that the string can be a bit easier to produce and read, as in particular in the second example, and that we don't have to explicitly convert all non-string variables to strings with str(…). In addition, format allows us to include information about how the values of the variables should be formatted. By using {i:n}, we say that the value of the i-th variable should be expanded to n characters if it's less than that. For strings, this will by default be done by adding spaces after the actual string content, while for numbers, spaces will be added before the actual string representation of the number. In addition, for numbers, we can also specify the number d of decimal digits that should be displayed by using the pattern {i:n.df}. The following example shows how this can be used to produce some well-formatted list output:
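An illustrative reconstruction of the list output example (the species names and percentages are made up):

for name, percentage in [('Oak', 13.53), ('Maple', 8.20), ('Eastern White Pine', 24.10)]:
    print('{0:20}{1:3.2f}'.format(name, percentage))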

The pattern {0:20} is used here to always fill up the names of the tree species in the list with spaces to get 20 characters. Then the pattern {1:3.2f} is used to have the percentage numbers displayed as three characters before the decimal point and two digits after. As a result, the numbers line up perfectly.

The format method can do a few more things, but we are not going to go into further details here. Check out this page about formatted output if you would like to learn more about this.


Large DEM file to Contours, what am I missing?

My issue:
The DEM is over 30 GB in size. I've tried to clip the DEM two different ways based on a polygon layer that would remove 90% of the DEM. However, both times, the resulting clipped DEM is still close to or over 30 GB, and it's taking days to process.

What I've tried:
1. Clipping the DEM via Image Analysis. I selected all the polygons via the attribute table of the polygon layer, then selected the DEM listed in Image Analysis, and used the Clip tool in the processing section. The clipped area shows up quickly. When I go to save/export the clipped DEM, the process is slow and the resulting clipped DEM is still at (or close to) the original file size, even though the DEM is clipped. When trying to create the contours of this clipped DEM, I get the default message that the file exceeds the size limit of 2 GB.

2. Using the ArcToolbox Clip Raster tool gives the same result.

3. I've also used ETSurface and ended up with the same result (but a much much faster processing time).

My questions:
1. Why is the clipped raster nearly the same size as the original?
2. Is a 2-3 day run time normal?
3. I need to end up with contours based on the polygon area.
4. Is there another way I can attempt to do this?

