Creating a Singularity Container to Run HuggingFace Transformers Models in R
Source: vignettes/singularity_transformers_container.Rmd
Singularity is a container engine and an alternative to Docker. Singularity containers are well suited to the requirements of High Performance Computing (HPC) workloads.
A container packages code together with all of its
dependencies so that an application runs reliably on different
computers (or in different computing environments). Containers can be used to run
code on servers or to ensure computational reproducibility (that the
code runs on other systems, and in the future). For an introduction to
the concept of containers, see Computational
Reproducibility via Containers in Psychology. Below is code to build
a Singularity container that sets up transformers language models from
HuggingFace and runs the text package.
Code to build a Singularity container with HuggingFace models in R
Bootstrap: docker
From: ubuntu:20.04
%environment
export LANG=C.UTF-8 LC_ALL=C.UTF-8
export XDG_RUNTIME_DIR=/tmp/.run_$(uuidgen)
%post
# Refresh the package index
apt-get -y update
export R_VERSION=4.2.2
echo "export R_VERSION=${R_VERSION}" >> $SINGULARITY_ENVIRONMENT
# Install R
apt-get update
apt-get install -y --no-install-recommends software-properties-common dirmngr wget uuid-runtime
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
add-apt-repository \
"deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"
apt-get install -y --no-install-recommends \
r-base=${R_VERSION}* \
r-base-core=${R_VERSION}* \
r-base-dev=${R_VERSION}* \
r-recommended=${R_VERSION}* \
r-base-html=${R_VERSION}* \
r-doc-html=${R_VERSION}* \
libcurl4-openssl-dev \
libharfbuzz-dev \
libfribidi-dev \
libgit2-dev \
libxml2-dev \
libfontconfig1-dev \
libssl-dev \
libfreetype6-dev \
libpng-dev \
libtiff5-dev \
libjpeg-dev
# Add a default CRAN mirror
echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/lib/R/etc/Rprofile.site
# Fix R package libpaths (helps RStudio Server find the right directories)
mkdir -p /usr/lib64/R/etc
echo "R_LIBS_USER='/usr/lib64/R/library'" >> /usr/lib64/R/etc/Renviron
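# Note: R_PACKAGE_DIR is not defined in this file, so R_LIBS_SITE is written empty unless it is set in the build environment
echo "R_LIBS_SITE='${R_PACKAGE_DIR}'" >> /usr/lib64/R/etc/Renviron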
# Install python3 (before the apt package lists are removed)
apt-get -y install python3 wget
# Clean up
apt-get -y clean
rm -rf /var/lib/apt/lists/*
# Install Miniconda
cd /
wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda
/bin/bash <<EOF
rm Miniconda3-latest-Linux-x86_64.sh
source /miniconda/etc/profile.d/conda.sh
conda update -y conda
# Install reticulate and text
Rscript -e 'install.packages("pkgdown")'
Rscript -e 'install.packages("ragg")'
Rscript -e 'install.packages("textshaping")'
Rscript -e 'install.packages("reticulate")'
Rscript -e 'install.packages("devtools")'
Rscript -e 'install.packages("glmnet")'
Rscript -e 'install.packages("tidyverse")'
# Rscript -e 'install.packages("text")'
Rscript -e 'devtools::install_github("oscarkjell/text")'
# Create the Conda environment at a system folder
Rscript -e 'text::textrpp_install(prompt = FALSE, rpp_version = c("torch==1.11.0", "transformers==4.19.2", "numpy", "nltk"))'
Rscript -e 'text::textrpp_initialize(save_profile = TRUE, prompt = FALSE, textEmbed_test = TRUE)'
# Embed a test string with two models; this also downloads the model files during the build
Rscript -e 'text::textEmbed("hello", model = "distilbert-base-uncased", layers = 5)'
Rscript -e 'text::textEmbed("hello", model = "roberta-base", layers = 11)'
EOF
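Once the definition above is saved to a file, the image could be built and checked roughly as follows. This is a minimal sketch rather than part of the original vignette: the file names `text_transformers.def` and `text_transformers.sif` are hypothetical, and `--fakeroot` assumes the host permits unprivileged builds (otherwise the build is typically run with `sudo`). If Singularity or the definition file is not found, the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build the image from the definition file, then check that the
# text package is installed inside it. File names are hypothetical.
DEF_FILE="text_transformers.def"
SIF_FILE="text_transformers.sif"

# Build the image; --fakeroot assumes unprivileged builds are enabled.
build_image() {
  singularity build --fakeroot "$SIF_FILE" "$DEF_FILE"
}

# Run R inside the container and print the installed version of text.
check_text() {
  singularity exec "$SIF_FILE" Rscript -e 'packageVersion("text")'
}

if command -v singularity >/dev/null 2>&1 && [ -f "$DEF_FILE" ]; then
  build_image && check_text
else
  # Singularity or the definition file is missing: show the commands only.
  echo "Would run: singularity build --fakeroot $SIF_FILE $DEF_FILE"
  echo "Would run: singularity exec $SIF_FILE Rscript -e 'packageVersion(\"text\")'"
fi
```

The build can take a long time because it installs R, Miniconda, and the Python libraries, and downloads two models. On an HPC cluster the image is usually built on a machine where building is allowed and then copied to the cluster as a single `.sif` file.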