Last updated: 2021-06-29
Checks: 7 passed, 0 failed
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
set.seed(20200414) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5eed8f5. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored: .Rhistory
    Ignored: .Rproj.user/
    Ignored: R/.Rhistory
    Ignored: analysis/.Rhistory
    Ignored: globalIRmap_rep/
    Ignored: renv/library/
    Ignored: renv/staging/
Untracked files:
    Untracked: .drake/
    Untracked: .gitignore
    Untracked: figtabres.docx
    Untracked: schema.ini
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/methods_gettingstarted.Rmd) and HTML (docs/methods_gettingstarted.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
|File|Version|Author|Date|Message|
|---|---|---|---|---|
|Rmd|5eed8f5|messamat|2021-06-29|wflow_publish(c("analysis/about.Rmd", "analysis/index.Rmd", "analysis/methods_gettingstarted.Rmd",|
|Rmd|0e72638|messamat|2021-06-24|wflow_publish(c("analysis/about.Rmd", "analysis/index.Rmd", "analysis/license.Rmd",|
|Rmd|5e433c3|messamat|2021-06-14|Publish new pages|
To reproduce this analysis, please contact firstname.lastname@example.org to obtain the required raw and pre-processed data, which amount to between 60 and 500 GB depending on how comprehensive a reproduction is desired.
This analysis relies as much as possible on good enough practices in scientific computing, which users are encouraged to read.
Structure: the overall project directory is structured with the following sub-directories:
bin/ (compiled code/external packages)
data/ (raw data, not to be altered)
results/ (results of the analysis, mostly reproducible through code execution, but also including manually modified results)
src/ (code written for the project)
    |--- globalIRmap (source code for analysis in R)
    |--- globalIRmap_HydroATLAS_py (source code for formatting of global river network environmental attributes)
    |--- globalIRmap_py (source code for formatting of spatial data in Python)
All scripts rely on this structure.
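Because all scripts assume this layout, it can help to verify it before running anything. The following is a minimal sketch, not part of the project code: the names `EXPECTED_DIRS`, `missing_dirs`, and `create_skeleton` are hypothetical, and the directory list is taken from the structure described above.

```python
import os

# Sub-directories named in the project structure described above.
# "src" is listed before its children so that parents are created first.
EXPECTED_DIRS = [
    "bin",
    "data",
    "results",
    "src",
    os.path.join("src", "globalIRmap"),
    os.path.join("src", "globalIRmap_HydroATLAS_py"),
    os.path.join("src", "globalIRmap_py"),
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]


def create_skeleton(root):
    """Create any expected sub-directory that is missing under `root`."""
    for d in EXPECTED_DIRS:
        path = os.path.join(root, d)
        if not os.path.isdir(path):
            os.makedirs(path)
```

Calling `missing_dirs("C:/test_globalIRmap")` on a fresh directory would list every entry; after `create_skeleton`, it should return an empty list. In practice the three `src/` sub-directories are created by cloning the repositories rather than by hand.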
The overall workflow of the project is detailed in the Workflow tab of this website.
The documentation that follows is specifically for the portions of the analysis conducted in Python, which encompass all spatial formatting and analyses.
Prerequisites: all GIS analyses in this study require an ESRI ArcGIS license that includes the Spatial Analyst extension, and ArcGIS itself requires a Windows OS. We used the arcpy Python module associated with ArcGIS 10.7, in Python 2.7 with 64-bit background processing.
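A quick way to see whether an interpreter resembles this setup is sketched below. This is an illustration rather than project code: the function name `check_prerequisites` and the keys of the returned dictionary are hypothetical. It uses `arcpy.CheckExtension("Spatial")` to query the Spatial Analyst license, and it assumes that a missing ArcGIS installation shows up as an `ImportError`.

```python
import platform
import sys


def check_prerequisites():
    """Summarise how closely the running interpreter matches the documented setup."""
    try:
        import arcpy  # only importable where ArcGIS is installed
        has_arcpy = True
        # CheckExtension returns a status string such as "Available".
        has_spatial = arcpy.CheckExtension("Spatial") == "Available"
    except ImportError:
        has_arcpy = False
        has_spatial = False
    return {
        "python_version": platform.python_version(),
        "is_python_2_7": sys.version_info[:2] == (2, 7),
        "is_64_bit": sys.maxsize > 2 ** 32,
        "is_windows": platform.system() == "Windows",
        "has_arcpy": has_arcpy,
        "has_spatial_analyst": has_spatial,
    }
```

The sketch runs under both Python 2.7 and Python 3. On a machine without ArcGIS, `has_arcpy` and `has_spatial_analyst` are simply reported as `False`.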
In Git Bash, the following commands illustrate the procedure to make a local copy (i.e. clone) of the GitHub repository in a newly created directory at C:/test_globalIRmap/src:
Mathis@DESKTOP MINGW64 ~
$ cd /c/
Mathis@DESKTOP MINGW64 /c
$ mkdir test_globalIRmap
Mathis@DESKTOP MINGW64 /c
$ mkdir /c/test_globalIRmap/src
Mathis@DESKTOP MINGW64 /c
$ cd /c/test_globalIRmap/src
Mathis@DESKTOP MINGW64 /c/test_globalIRmap/src
$ git clone https://github.com/messamat/globalIRmap_py.git
Cloning into 'globalIRmap_py'...
remote: Enumerating objects: 164, done.
remote: Counting objects: 100% (164/164), done.
remote: Compressing objects: 100% (108/108), done.
remote: Total 164 (delta 97), reused 118 (delta 53), pack-reused 0
Receiving objects: 100% (164/164), 116.62 KiB | 1003.00 KiB/s, done.
Resolving deltas: 100% (97/97), done.
The documentation that follows is specifically for the portions of the analysis conducted in R, which encompass all statistical analyses and figure-making (aside from maps).
Documentation: this project is organized as an R package, providing documented functions to reproduce and extend the analysis reported in the publication. Note that this package has been written explicitly for this project and may not be suitable for general use. See guidelines below to install the package.
R Workflow: this project is set up with a drake workflow, ensuring reproducibility. In the drake philosophy, every action is a function, and every R object is a “target” with dependencies. Intermediate targets/objects are stored in a
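The idea of targets with dependencies can be illustrated with a small Python analogy. This is not drake's API (drake is an R package) and not code from this project: the `build` function and the three-target plan are hypothetical, and the sketch only shows dependency resolution and reuse of stored values, not drake's detection of outdated targets.

```python
# Illustrative analogy only: this is not drake and not its API.
def build(plan, target, cache):
    """Build `target` after its upstream targets, reusing stored values."""
    if target in cache:
        return cache[target]
    func, deps = plan[target]
    # Every dependency is itself a target, built (or fetched) first.
    args = [build(plan, dep, cache) for dep in deps]
    cache[target] = func(*args)
    return cache[target]


# Hypothetical plan: each target maps to (function, upstream target names).
plan = {
    "raw": (lambda: [3.0, 1.0, 2.0], []),
    "clean": (lambda raw: sorted(raw), ["raw"]),
    "summary": (lambda clean: sum(clean) / len(clean), ["clean"]),
}

cache = {}  # stands in for a store of intermediate targets
summary = build(plan, "summary", cache)
```

Requesting `summary` builds `raw` and `clean` first and stores all three in `cache`, so a second request for any of them does not recompute it.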
Dependency management: the R library of this project is managed by renv. This makes sure that the exact same package versions are used when recreating the project. When calling renv::restore(), all required packages will be installed with their specific version. Please note that this project was built with R version 4.0.3 on a Windows 10 operating system. The renv packages from this project are not compatible with R versions prior to version 3.6.0.
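renv records the pinned versions in a JSON lockfile (renv.lock), so they can be inspected before restoring. The sketch below is a hedged illustration rather than project code: the helper names are hypothetical, and `SAMPLE_LOCK` is a made-up excerpt for demonstration, not the project's actual lockfile. It assumes the usual lockfile layout with top-level "R" and "Packages" entries and purely numeric version strings.

```python
import json

# Made-up excerpt for illustration; not the project's actual renv.lock.
SAMPLE_LOCK = """
{
  "R": {"Version": "4.0.3"},
  "Packages": {
    "knitr": {"Package": "knitr", "Version": "1.29", "Source": "Repository"},
    "renv": {"Package": "renv", "Version": "0.9.3", "Source": "Repository"}
  }
}
"""


def parse_renv_lock(text):
    """Return (R version, {package name: pinned version}) from lockfile text."""
    lock = json.loads(text)
    packages = {name: rec["Version"] for name, rec in lock["Packages"].items()}
    return lock["R"]["Version"], packages


def version_tuple(version):
    """Turn a dotted version such as '3.6.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def r_version_supported(r_version, minimum="3.6.0"):
    """Check an R version against the minimum stated in the text above."""
    return version_tuple(r_version) >= version_tuple(minimum)


r_version, packages = parse_renv_lock(SAMPLE_LOCK)
```

With a real lockfile, the text would be read from `renv.lock` at the root of the cloned R repository, and `r_version_supported` could be applied to the locally installed R version before calling renv::restore().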
Syntax: this analysis relies on the data.table syntax, which provides a high-performance version of data.frame. It is more concise, faster, and more memory efficient than conventional data.frames and the tidyverse syntax.
Machine learning model development: for random forest model development, this project relies on the mlr3 package and ecosystem (see the mlr3 book for learning its usage), which provides an object-oriented framework for machine learning.
In Git Bash, continuing from the previous example for downloading the GitHub repository of the Python analysis, the following commands illustrate the procedure to make a local copy of the GitHub repository of the R analysis in the same directory, C:/test_globalIRmap/src:
Mathis@DESKTOP MINGW64 /c/test_globalIRmap/src
$ git clone https://github.com/messamat/globalIRmap.git
Cloning into 'globalIRmap'...
remote: Enumerating objects: 116, done.
remote: Counting objects: 100% (116/116), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 7363 (delta 48), reused 75 (delta 19), pack-reused 7247
Receiving objects: 100% (7363/7363), 1.91 GiB | 3.78 MiB/s, done.
Resolving deltas: 100% (925/925), done.
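The two cloning steps can also be scripted. The following is a minimal sketch under stated assumptions, not part of the project: the function names and the `dry_run` flag are hypothetical, Git is assumed to be on the PATH, and only the two repository URLs come from the commands shown in this document.

```python
import os
import subprocess

# Repository URLs taken from the git clone commands in this document.
REPOS = [
    "https://github.com/messamat/globalIRmap_py.git",
    "https://github.com/messamat/globalIRmap.git",
]


def clone_commands(src_dir, repos=REPOS):
    """Return one `git clone` command (as an argument list) per repository."""
    cmds = []
    for url in repos:
        # Directory name is the last URL component without the ".git" suffix.
        name = url.rsplit("/", 1)[-1][:-len(".git")]
        cmds.append(["git", "clone", url, os.path.join(src_dir, name)])
    return cmds


def clone_all(src_dir, repos=REPOS, dry_run=True):
    """Create `src_dir` if needed, then run the clone commands unless dry_run."""
    if not os.path.isdir(src_dir):
        os.makedirs(src_dir)
    cmds = clone_commands(src_dir, repos)
    if not dry_run:
        for cmd in cmds:
            subprocess.check_call(cmd)  # raises if git exits with an error
    return cmds
```

By default `clone_all("C:/test_globalIRmap/src")` only creates the directory and returns the commands, which allows a check before downloading; passing `dry_run=False` runs them. Note that, per the output above, the R repository is about 1.91 GiB.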
In RStudio for Windows, the following procedure can be used:
The issue tracker is the place to report problems or ask questions.
See the repository history for a fine-grained view of progress and changes.
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
    LC_COLLATE=English_United States.1252
    LC_CTYPE=English_United States.1252
    LC_MONETARY=English_United States.1252
    LC_NUMERIC=C
    LC_TIME=English_United States.1252
attached base packages:
    stats graphics grDevices datasets utils methods base
other attached packages:
    workflowr_1.6.2
loaded via a namespace (and not attached):
    Rcpp_22.214.171.124 rstudioapi_0.13 whisker_0.4 knitr_1.29
    magrittr_1.5 R6_2.4.1 rlang_0.4.10 fansi_0.4.1
    stringr_1.4.0 tools_4.0.2 xfun_0.24 utf8_1.1.4
    git2r_0.27.1 htmltools_0.5.0 ellipsis_0.3.2 rprojroot_1.3-2
    yaml_2.2.1 digest_0.6.25 tibble_3.1.1 lifecycle_0.2.0
    crayon_1.3.4 later_1.2.0 vctrs_0.3.8 promises_126.96.36.199
    fs_1.5.0 glue_1.4.0 evaluate_0.14 rmarkdown_2.7
    stringi_1.4.6 compiler_4.0.2 pillar_1.6.1 backports_1.1.10
    httpuv_1.5.4 renv_0.9.3 pkgconfig_2.0.3