7 Managing Environments
7.1 Context
This is a big one for the R user. When you are unfamiliar with how Python virtual environments work, it is probably one of the most frustrating concepts to pick up from scratch.
But fear not - to fully get to grips with how Python virtual environments work and package installation works, it is good to remind ourselves how this works in R.
7.1.1 How R package installation works
Well, remind ourselves…. Python installation in R is likely something you have not really thought about, beyond writing install.packages()
.
This is one of the huge benefits that makes R easy to pick up for “non-programmers”, a lot of this stuff is handled for you.
7.1.1.1 Building from binaries vs. building from source
7.1.2 How Python package installation works
pip, conda, miniconda, mambo PyPi… where do I start?
7.2 With conda
7.2.1 Setting up an environment from scratch
conda is …
To create a project environment which manages both R (r-base) and Python (python):
conda create -n env_name r-base python=3.12
This sets up a new environment called env_name
with both R and Python install.
Notice you can specify which version of R and/or Python to install.
7.2.1.1 Python packages
You can then install Packages packages with:
conda install numpy
7.2.1.2 Python packages on Github
For Python packages on Github:
conda install git pip
# Now use pip install with git+<repo-url>
pip install git+https://github.com/davidwilby/amundsen-sea-low-index
# If you need a specific optional dependency:
pip install asli[s3]@git+https://github.com/davidwilby/amundsen-sea-low-index
7.2.1.3 R packages
R packages are installed with the same conda install
command. Note that in this case, package names are pre-pended with r-
:
conda install r-dplyr
7.2.1.4 R packages on Github, or not on CRAN
For R packages on Github:
Taken from this issue, but does not work for me
conda install conda build
conda skeleton cran <github-url>
# Replace with your version of R
conda build --R=4.2.2 r-<lowercase-packagename>
This creates a “conda skeleton” of the R package hosted on Github.
note, explain what a conda skeleton is?
Alternative approach, specify R version in conda but use renv::restore()
for non-CRAN packages
All packages installed and used should now be recorded in the environment, with their appropriate version number.
7.2.2 Exporting conda environment: environment.yml
To access and share your environment, run:
conda env export > environment.yml
This will create an environment.yml file in your project directory. This will allow for this environment to be set up on other systems.
7.2.3 Installing from an environment.yml
For others (or for yourself on a different system) to set up an identical environment, run:
conda -f create env_name environment.yml
If you can’t (or don’t want to) use conda, it is possible to manage the Python and R environments in the same project, but treat their management as seperate processes.
7.3 Without conda
7.3.1 Python without conda
In Python, this is fairly straightforward, and you can use this method regardless of the level of permissions you have on your machine. Here, you would use venv
, and requirements.txt
.
7.3.1.1 Setting up a venv and requirements.txt
# To create your venv
python -m venv env_name
This creates you virtual environment. To use it, and install your package you need to “activate” it.
# To activate your venv
source env_name/bin/activate
Now you have activated your environment, you can install a packages using pip
.
pip install pathlib
To keep track of the packages your project depends on, record them in a file called requirements.txt
. It is a very simple file that looks like this:
configparser==7.1.0
pathlib==1.0.1 xarray==2024.6.0
All it records is packagename==1.0.0
, very simple!
If you are not sure what packages you have imported.
pip freeze
This should print them in the terminal.
7.3.1.2 Installation in a new venv from a requirements.txt
To set up a new environment and install the required packages from a specified requirements.txt:
# To create your venv
python -m venv env_name
source env_name/bin/activate
pip install -r requirements.txt
7.3.2 R without conda
To manage R dependencies, you have a couple of options, depending on the level of permissions you have on the system you are working on.
7.3.2.1 With sudo permissions
If you have sudo permissions, we can use the pak
package. The great things about pak
is that, unlike install.packages()
and renv::install()
(more on renv later), it will automatically fetch the pre-built binaries for your operating system, distribution and version. More detail on pak, and how it operates in R in Production.
It will also automatically install any system-level dependencies that your R package may require. This is especially useful on linux systems, where system dependency installation can vastly differ between distributions.
If we have sudo permissions, and are using pak
, your approach could be as simple as providing an install.R
file which installs the right packages for you.
First install pak
:
Rscript -e install.packages('pak')
Now it can be called in your set up script. You would normally only run this once on deployment.
#!/usr/bin/env Rscript
# Usage R -f install.R
# Selecting p3m.dev is an optional step for linux distros
# It will speed up installation and prevents the risk of installation
# failing on external C libraries
# This is because CRAN only provides source packages for linux
# and not binary
# see: https://r-in-production.org/packages.html#installing-a-package-on-linux
# For Ubuntu 24.04
options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/noble/latest"))
# For Rocky 9
# options(repos = c(CRAN = "https://p3m.dev/cran/__linux__/rhel9/latest"))
::pak("readr")
pak::pak("paws")
pak::pak("ini")
pak::pak("assertr")
pak::pak("dplyr")
pak
# A package on Github:
::pak("thomaszwagerman/butterfly") pak
Call it with:
R -f install.R
7.3.2.2 Without sudo permission
Without sudo permission on your machine, you might have trouble running installation commands such as install.packages()
or pak::pak()
, as R might be trying to install your packages into a shared library, where you do not have ‘write’ permission.
In this case, the path of least resistance would be to use renv
. To manage your environment.
If you have not used renv
before, it is highly recommended you read Getting Started with renv before reading further.
To start using renv
:
install.packages('renv')
::init() renv
This will install and set up renv
for you. renv::init()
generates a renv.lock
file based on the packages you have installed and used.
An extract from a renv.lock
is shown below. You will notice it specifies the version of R used, which repositories it has used for installation, as well as packages and their associated version and download source.
{
"R": {
"Version": "4.4.1",
"Repositories": [
{
"Name": "P3M",
"URL": "https://packagemanager.posit.co/cran/__linux__/centos7/latest"
}
]
},
"Packages": {
"MASS": {
"Package": "MASS",
"Version": "7.3-59",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R",
"grDevices",
"graphics",
"methods",
"stats",
"utils"
],
"Hash": "0cafd6f0500e5deba33be22c46bf6055"
},
"R6": {
"Package": "R6",
"Version": "2.5.1",
"Source": "Repository",
"Repository": "CRAN",
"Requirements": [
"R"
],
"Hash": "470851b6d5d0ac559e9d01bb352b4021"
}
}
}
This destination is set to a specific operating system, centos7. This URL is obtained from the Posit Package Manager.
When using pak
, this is automatically fetched for us. Unfortunately for us, renv
does not use pak
. To prevent having to manually change this URL for each deployment on a different system, we need to insert this URL depending on the operating system we are working on.
With the above in mind, the install.R
script would look like this:
#!/usr/bin/env Rscript
# Usage R -f hpc_setup.R
#
# This will not work with opensuse and sle,
# naming inconsistencies across distros is hard
# This is not an R project, so need to manually "activate" renv
source("renv/activate.R")
install.packages("pkgcache")
# Moving on to installing r and system dependencies with renv.lock
# Have R obtain the current platform distro and release
<- data.frame(
os distribution = pkgcache::current_r_platform_data()$distribution,
release = pkgcache::current_r_platform_data()$release
)
$release <- round(as.numeric((os$release)))
os
# Some wrangling to make matching more reliable across distros
<- pkgcache::ppm_platforms()
ppm_platforms
# Take the word "linux" out of distribution names
$distribution <- gsub("linux", "", ppm_platforms$distribution)
ppm_platforms# Makes matching rocky distro possible
$release <- round(as.numeric((ppm_platforms$release)))
ppm_platforms
# Match with pak's ppm_platforms
<- merge(
os_table
os,
ppm_platforms
)
if (os_table$os == "linux") {
<- paste0(
p3m_url "https://p3m.dev/cran/__linux__/",
$binary_url,
os_table"/latest"
)else {
} <- "https://p3m.dev/cran/latest"
p3m_url
}
::lockfile_modify(repos = c(
renvP3M = p3m_url
|>
)) ::lockfile_write()
renv
::restore() renv
In summary, this script should:
- Activate
renv
and installpkgcache
. - Detect which os, distribution and version R is being run on.
- Concatenate the package manager URL and the os binary URL.
- Modify the
renv.lock
file to point the repos URL to the correct binary URL.