
The text package gives R users access to HuggingFace Transformers through the R package reticulate, which serves as an interface to Python and the Python packages torch and transformers. It is therefore important to install both the text package and a Python environment containing the Python packages that text requires. The recommended way is to use textrpp_install() to install a conda environment with the required Python packages, and textrpp_initialize() to initialize it.

Conda environment

library(text)
library(reticulate)

# Install text required python packages in a conda environment (with defaults).
text::textrpp_install()

# Show available conda environments.
reticulate::conda_list()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R. 
text::textrpp_initialize(save_profile = TRUE)

# Test that the text package works.
textEmbed("hello")
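Before initializing, it can help to confirm that the environment actually shows up in the conda listing. The following is a minimal sketch, not part of the text package: env_available() is a hypothetical helper, and the environment name "textrpp_condaenv" is assumed to be the default created by textrpp_install(); adjust it if your listing shows a different name.

```r
# Hypothetical helper: checks whether a conda environment name appears in a
# data frame shaped like the output of reticulate::conda_list().
# The default name "textrpp_condaenv" is an assumption.
env_available <- function(envs, envname = "textrpp_condaenv") {
  is.data.frame(envs) && envname %in% envs$name
}

# Stand-in for reticulate::conda_list() so the sketch runs without conda.
example_envs <- data.frame(
  name = c("base", "textrpp_condaenv"),
  python = c("/path/to/base/bin/python",
             "/path/to/envs/textrpp_condaenv/bin/python"),
  stringsAsFactors = FALSE
)

env_available(example_envs)
```

In a real session you would pass the live listing instead, for example env_available(reticulate::conda_list()).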

Solving OMP errors and R/RStudio crashes

Recently, some text users (mainly on Mac) have experienced OMP errors and crashes of R and RStudio. When this happens, the following workarounds have helped so far:

Sys.setenv(OMP_NUM_THREADS = "1") #Limit the number of threads to prevent conflicts.

Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1") 

# You might also have to restart R (this command only works in RStudio)
.rs.restartR()

# If the above does not work, you can also try this, although this solution might have some risks associated with it (for more information see https://github.com/dmlc/xgboost/issues/1715)
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE") #Temporarily allows execution despite duplicate OpenMP libraries.

# This is how you can unset the settings
Sys.unsetenv("OMP_NUM_THREADS")
Sys.unsetenv("OMP_MAX_ACTIVE_LEVELS")
Sys.unsetenv("KMP_DUPLICATE_LIB_OK")

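# This is how you can verify the settings (an empty string means the variable is not set)
print(Sys.getenv(c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS", "KMP_DUPLICATE_LIB_OK")))
print(Sys.getenv("DYLD_LIBRARY_PATH"))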


# Please let us know if you find any other solutions. 
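OpenMP settings are typically read when the library is first loaded, so setting them before R starts any Python session may be more reliable than setting them mid-session. The sketch below writes the settings to an .Renviron file, which R reads at startup. persist_env_settings() is a hypothetical helper, not part of text, and whether this resolves the crashes on your system is an assumption to verify. The demonstration writes to a temporary file so that nothing in your home directory is changed.

```r
# Hypothetical helper: store environment settings in an .Renviron file.
# Existing definitions of the same variables are replaced, other lines are kept.
persist_env_settings <- function(settings,
                                 renviron = path.expand("~/.Renviron")) {
  existing <- if (file.exists(renviron)) {
    readLines(renviron, warn = FALSE)
  } else {
    character(0)
  }
  # Drop old lines that define any of the variables we are about to set.
  pattern <- paste0("^(", paste(names(settings), collapse = "|"), ")=")
  kept <- existing[!grepl(pattern, existing)]
  new_lines <- paste0(names(settings), "=", settings)
  writeLines(c(kept, new_lines), renviron)
  invisible(new_lines)
}

# Demonstration on a temporary file instead of the real ~/.Renviron.
demo_file <- tempfile(fileext = ".Renviron")
persist_env_settings(
  c(OMP_NUM_THREADS = "1", OMP_MAX_ACTIVE_LEVELS = "1"),
  renviron = demo_file
)
readLines(demo_file)
```

To apply it for real, call the function without the renviron argument and restart R; remove the lines from ~/.Renviron to undo the change.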

Solving Mac OS errors

Failed to build tokenizers

If running textrpp_install() results in this error:

Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Install Rust (via rustup) by running the following in the terminal:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Rust compiler

Error:

"Error: Error installing package(s): ..." 
including: "error: can't find Rust compiler"

In the terminal run:

brew install rust
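To check from within R whether a Rust toolchain is visible before retrying the installation, a sketch along these lines may help. has_tool() is a hypothetical helper built on base R's Sys.which(); note that R started from a desktop launcher may see a different PATH than your terminal, so a negative result here does not prove that Rust is missing.

```r
# Hypothetical helper: TRUE if an executable with this name is found on the
# PATH that the current R session sees.
has_tool <- function(tool) {
  unname(nzchar(Sys.which(tool)))
}

# Report which Rust tools are visible to this R session.
rust_tools <- c("rustc", "cargo")
found <- vapply(rust_tools, has_tool, logical(1))
print(found)
```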

A successful installation depends on using conda, Python, and package versions that work together. Installation of the text package with its required Python packages is tested on Linux, Mac OS, and Windows using GitHub Actions. The installation procedure and details can be seen at GitHub Actions (look at the workflow runs called System specific installation NoPy).

The table below shows various combinations of Python and package versions that have been tested (it is not an exhaustive list).

OS       miniconda  Python  torch          transformers          Result
Mac OS   -          3.9.0   torch==1.11.0  transformers==4.19.2  Pass
Linux    -          3.9.0   torch==1.11.0  transformers==4.19.2  Pass
Windows  -          3.9.0   torch==1.11.0  transformers==4.19.2  Pass
Windows  4.10.1     3.9.0   torch==1.7.1   transformers==4.12.5  Fail
Mac OS   4.10.3     3.9.0   torch==1.7.1   transformers==4.12.5  Pass
Linux    4.10.3     3.9.0   torch==1.7.1   transformers==4.12.5  Pass
Windows  4.10.3     3.9.0   torch==1.7.1   transformers==4.12.5  Pass
Mac OS   4.10.3     3.8.10  torch==1.7.1   transformers==4.12.5  Pass
Linux    4.10.3     3.8.10  torch==1.7.1   transformers==4.12.5  Pass
Windows  4.10.3     3.8.10  torch==1.7.1   transformers==4.12.5  Pass
Mac OS   4.10.3     3.7.0   torch==0.4.1   transformers==3.3.1   Pass
Linux    4.10.3     3.7.0   torch==0.4.1   transformers==3.3.1   Pass
Windows  4.10.3     3.6.13  torch==1.10    transformers==3.3.1   Pass
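The version pins in the table use the name==version format. When comparing your own setup against a row, it can be convenient to split such pins into package names and versions. The following is a small base R sketch; parse_pins() is a hypothetical helper, not a function from text or reticulate.

```r
# Hypothetical helper: split "package==version" pins into a data frame.
# Entries without a pin (for example "numpy") get NA as version.
parse_pins <- function(specs) {
  parts <- strsplit(specs, "==", fixed = TRUE)
  data.frame(
    package = vapply(parts, `[`, character(1), 1),
    version = vapply(
      parts,
      function(p) if (length(p) > 1) p[2] else NA_character_,
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

pins <- parse_pins(c("torch==1.11.0", "transformers==4.19.2", "numpy", "nltk"))
print(pins)
```

The resulting data frame could then be compared with what is installed in your environment, for example with the listing from reticulate::py_list_packages().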

Virtual environments

It is also possible to use virtual environments (although this is currently only tested on Mac OS).

# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
                                 python_path = "/usr/local/bin/python3.9",
                                 envname = "textrpp_virtualenv")

# Initialize the virtual environment.
text::textrpp_initialize(virtualenv = "textrpp_virtualenv",
                         condaenv = NULL,
                         save_profile = TRUE)
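Because python_path has to point at an existing interpreter, a quick check before calling textrpp_install_virtualenv() can give a clearer error than a failed installation. This is a minimal sketch; check_python_path() is a hypothetical helper, and it only checks that the file exists, not that it is a working Python of a suitable version.

```r
# Hypothetical helper: stop early with a clear message if the given
# interpreter path is empty or does not exist.
check_python_path <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("No Python interpreter found at: ", path, call. = FALSE)
  }
  invisible(normalizePath(path))
}

# Demonstration with a temporary file standing in for an interpreter.
fake_python <- tempfile("python")
file.create(fake_python)
check_python_path(fake_python)

# A missing path raises an error, captured here so the script continues.
result <- try(check_python_path("/no/such/python3.9"), silent = TRUE)
inherits(result, "try-error")
```

In practice you would pass the same path that you plan to give to python_path, for example the one reported by Sys.which("python3").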

Versions tested for virtual environment

Virtual environments work on Mac OS, whereas the GitHub Actions workflows do not currently work for Linux and Windows. At GitHub Actions, look for a workflow run called Virtual environment for more information.

OS       Python  torch          transformers          Result
Mac OS   3.9.8   torch==1.11.0  transformers==4.19.2  Pass
Linux    3.9.8   torch==1.11.0  transformers==4.19.2  Pass
Mac OS   3.9.8   torch==1.7.1   transformers==4.12.5  Pass
Linux    -       -              -                     -
Windows  -       -              -                     -

Installation instructions for text 0.9.10

Below are the instructions for installing earlier versions of text (0.9.10 and before); these should work for newer versions of text as long as correct versions of Python and the required packages are used.

library(text)

# To install the python packages torch, transformers, numpy and nltk through R, run: 
library(reticulate)
install_miniconda()

conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)

# Windows 10
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'))

Checking your versions

If something isn’t working right, a good start is to examine what is installed and running on your system, for example to make sure that your R and Python versions are up to date.


# First, check the R version and which packages are attached and loaded.
sessionInfo()

# Second, check the Python version, and make sure you have at least version 3.6.10
library(reticulate)
py_config()
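The minimum-version check can also be done programmatically with base R's version comparison. In this sketch, meets_minimum() is a hypothetical helper and the version string is assumed to be supplied by you, for example copied from the py_config() output or from running python --version in the terminal.

```r
# Hypothetical helper: TRUE if a Python version string is at least the
# minimum mentioned above (3.6.10 by default).
meets_minimum <- function(version, minimum = "3.6.10") {
  numeric_version(version) >= numeric_version(minimum)
}

meets_minimum("3.9.0")
meets_minimum("3.6.9")
```

numeric_version() compares component by component, so "3.6.10" is correctly treated as newer than "3.6.9", which a plain string comparison would get wrong.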

Issue: RStudio crashes during textEmbed

After a new install/update of text, RStudio crashed (Abort session) when running functions that fetch word embeddings (i.e., textEmbedLayersOutput or textEmbed).

Solution: Reinstall reticulate and r-miniconda

To solve the issue, re-install reticulate (development version), then uninstall and re-install r-miniconda.

Uninstall r-miniconda by removing its entire folder (which by default on Mac is at Users/YOUR_USER_NAME/Library/r-miniconda).

(Note that on Mac the Library folder is hidden. To make it visible, go to Finder, open the path Users/YOUR_USER_NAME/, and press the three keys COMMAND + SHIFT + . (period). The Library folder should then appear, and you can find and remove r-miniconda.)

library(text)

# To re-install packages start with a fresh session by restarting R and RStudio

# Install the development version of reticulate (might not be necessary)
devtools::install_github("rstudio/reticulate")

# After having manually removed the r-miniconda folder, install it again: 
library(reticulate)
install_miniconda()

# Subsequently re-install torch, transformers, numpy and nltk by running: 
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)

The exact way to install these packages may differ across systems. Please see:
Python
torch
transformers

Share advice

If you find a good solution, please feel free to email oscar [ d_o t] kjell [a_t] psy [DOT] lu [d_o_t]se so that we can update the above instructions.
