Last updated: 2021-06-29
Checks: 7 0
Knit directory: globalIRmap/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20200414)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5eed8f5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: R/.Rhistory
Ignored: analysis/.Rhistory
Ignored: globalIRmap_rep/
Ignored: renv/library/
Ignored: renv/staging/
Untracked files:
Untracked: .drake/
Untracked: .gitignore
Untracked: figtabres.docx
Untracked: schema.ini
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_workflow.Rmd
) and HTML (docs/methods_workflow.html
) files. If you’ve configured a remote Git repository (see ?wflow_git_remote
), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 5eed8f5 | messamat | 2021-06-29 | wflow_publish(c(“analysis/about.Rmd”, “analysis/index.Rmd”, “analysis/methods_gettingstarted.Rmd”, |
Rmd | 8abae5d | messamat | 2021-06-29 | Update doc |
html | 8abae5d | messamat | 2021-06-29 | Update doc |
html | 6e12f71 | messamat | 2021-06-14 | Build site. |
Rmd | 5e433c3 | messamat | 2021-06-14 | Publish new pages |
html | 9b48bde | messamat | 2021-01-06 | Build site. |
Rmd | f1d9dcf | messamat | 2021-01-06 | Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation |
html | f1d9dcf | messamat | 2021-01-06 | Start building up workflowr website, start incorporating mandrake (but wait as very unstable still), plan gauge selection documentation |
This study leverages the respective strengths of R (for data wrangling, statistics, and figure-making) and Python (for spatial analysis and mapping). As a result, re-producing it requires going back and forth between these two languages and platforms. At the broadest level, the main steps of this analysis were the following:
1. Pre-processing and formatting global river network environmental attributes in Python.
2. Pre-processing and formatting spatial datasets aside from hydro-environmental attributes in Python.
3. R analysis in R.
4. Final formatting of analysis outputs for mapping in Python.
5. Map results and comparisons in ArcMap.
Below, we briefly explain how each of these steps was implemented, but additional data not currently available publicly are needed to fully reproduce the analysis. Please contact mathis.messager@mail.mcgill.ca and/or bernhard.lehner(at)mcgill.ca for additional information should you want to re-produce the results from this study. In addition please note that processing these data takes weeks of continuous computing on a normal workstation.
Main purpose: download and compute additional hydro-environmental attributes for the global river network than those already included in RiverATLAS.
utility_functions.py:
- import key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.
runUplandWeighting.py:
- define functions for routing data on river network
Downloading data requires the creation of a file called “configs.json” with login information for earthdata and alos. For guidance on formatting the json configuration file, see here.
Execute:
1. scripts for downloading data in any order
2. format_MODISmosaic.py
3. format_HydroSHEDS.py
4. format_WorldClim2.py
5. other formatting scripts in any order
6. runUplandWeighting_batch.py
7. runHydroATLASStatistics.py
Main purpose: compile and pre-process global river network; download and spatially pre-process streamflow gauging stations (reference data for model training and testing), national hydrographic datasets, and on-the-ground visual observations of flow intermittence.
utility_functions.py:
- imports key modules.
- defines utility functions used throughout the analysis.
- defines the basic folder structure of the analysis.
setup_localIRformatting.py: - defines folder structure for formatting data to compare modeled estimates of global flow intermittence to national hydrographic datasets (Comparison_databases) and to in-situ/field-based observations of flow intermittence (Insitu_databases). - defines functions used in formatting data for the comparisons
Execute:
1. scripts for downloading data in any order
2. format_RiverATLAS.py
3. format_stations.py
4. format_FROndeEaudata.py
5. format_PNWdata.py
Main purpose: QA/QC streamflow gauging station records; develop and validate random forest models, compare predictions to hydrographic datasets and on-the-ground observations, generate tables, make non-spatial figures and generate tabular predictions for global river network and spatial data for other mapping.
The structure of the Github repository stems from the fact that this project is formatted as an R package, relies on drake for organizing the analysis workflow, renv for dependency management, and includes all documents used for this workflowr website. All files and directories whose role is mainly structural (and therefore whose content can be ignored for the analysis) are marked with a X; directories are in bold, files are in italic.
After downloading the project from Github (see Getting started), launch the R project (globalIRmap.Rproj
) and execute the following lines:
renv::restore() # respond y, restores all R packages with their specific version
remotes::install_github('messamat/globalIRmap') #install project package so that the help documentation can be accessed for project functions (e.g., ?format_gaugestats)
To check the steps of the analysis, see R/IRmapping_plan.R, which contains the drake “plan”, the high-level catalog of all the steps in the workflow (see the corresponding chapter in the drake user manual). This plan defines the order of functions to use, their inputs and outputs (usually, targets), and the relationship among targets and steps. The R object for a plan is a data.frame with columns named target
and command
; each row represents a step in the workflow; each command.Each command is a concise expression that makes use of our functions, and each target is the return value of the command (the target
column has the names of the targets, not the values). In this analysis, functions from planutil_functions.R were used to create complex branches in the plan (re-running the entire model and result formatting pipeline but when defining non-perennial watercourses as those that cease to flow at least 30 days per year, rather than 1 day per year).
To get information on a function, simply type ?function
in the R console. For instance, ?comp_GRDCqstats
.
Provided that your were given the necessary data, the entire analysis can simply be re-run with the following code found in interactive.R
:
r_make()
is the central, most important function of the drake approach. It runs all the steps of the workflow in the correct order, skipping any work that is already up to date. Because of how drake tracks global functions and objects as dependencies of targets, the use of r_make()
is needed to run the analysis pipeline in a clean reproducible environment. If all targets are up to date in the .drake/
directory, then nothing will be run.
If you were provided intermediate targets (i.e., a .drake/
directory; or once you have re-run the analysis), you can load individual targets in the environment with the following commands (even if the targets are not up to date due to e.g. a change in source path).
loadd(globaltables_gad_id_cmj)
print(globaltables_gad_id_cmj)
tab <- readd(globaltables_gad_id_cmj)
print(tab)
To reproduce the tables and (non-map) figures in the manuscript, run the following line in the R console: rmarkdown::render('figtabres.Rmd', encoding = 'UTF-8')
. Note that nearly all figures and tables were manually adjusted for aesthetic purpose prior to inclusion in manuscript, so that the rendition of figtabres.Rmd
will not exactly match the final format in form (but it will match in content).
The R analysis produces a suite of outputs necessary for subsequent mapping and analysis, notably:
- RiverATLAS_predbasic800_20210216.csv
: comma-separated value table of model predictions for the RiverATLAS global river network.
- BasinATLAS_v10_lev03_errors.gpkg
: polygons of BasinATLAS level 3 subdivisions with attributes. Used to produce Figure 3 in the manuscript main text and Figure S6 in the Supplementary Information.
- GRDCstations_predbasic800.gpkg
and GRDCstations_predbasic800_mdur30.gpkg
: points of gauging stations used in model development and validation with reference attributes and model predictions. Used to produce Extended Data Fig. 2 and Supplementary Information Fig. S3.
- pnwobs_IPR_predbasic800cat%Y%m%d%H%M%S.shp
and ondeobs_IPR_predbasic800cat%Y%m%d%H%M%S.shp
: on-the-ground observations of flow intermittence from ONDE (France) and PROSPER (U.S. Pacific Northwest) with reference attributes and model predictions. Used to produce Extended Data Fig. 6.
Run predict_RF.py in the globalIRmap_py Github repository:
- Format global river network attributes for mapping.
- Divide and export RiverATLAS global river network (with predictions) into subsets by discharge and drainage area size classes for mapping.
- Join ONDE (France) and PROSPER (U.S. Pacific Northwest) on-the-ground observations with river network for mapping (Extended Data Fig. 6).
Please contact Mathis Messager for ArcGIS map packages to reproduce specific maps from the manuscript.
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 rstudioapi_0.13 whisker_0.4 knitr_1.29
[5] magrittr_1.5 R6_2.4.1 rlang_0.4.10 fansi_0.4.1
[9] stringr_1.4.0 tools_4.0.2 xfun_0.24 utf8_1.1.4
[13] git2r_0.27.1 htmltools_0.5.0 ellipsis_0.3.2 rprojroot_1.3-2
[17] yaml_2.2.1 digest_0.6.25 tibble_3.1.1 lifecycle_0.2.0
[21] crayon_1.3.4 later_1.2.0 vctrs_0.3.8 promises_1.2.0.1
[25] fs_1.5.0 glue_1.4.0 evaluate_0.14 rmarkdown_2.7
[29] stringi_1.4.6 compiler_4.0.2 pillar_1.6.1 backports_1.1.10
[33] httpuv_1.5.4 renv_0.9.3 pkgconfig_2.0.3