
Extended Installation Guide
Source:vignettes/huggingface_in_r_extended_installation_guide.Rmd
huggingface_in_r_extended_installation_guide.RmdThe text-package provides users access to HuggingFace
Transformers in R through reticulate, which
interfaces with Python. It relies
on Python packages such as torch and
transformers.
To use these features, you need to install: • The text
package in R • A Python environment with the required packages •
System-specific dependencies (see below) The easiest way is
to use textrpp_install() to install a preconfigured Conda
environment and then initialize it with
textrpp_initialize().
Set up a python environment for the text-package
library(text)
library(reticulate)
# Install text-required Python packages in a Conda environment
text::textrpp_install()
# Show available Conda environments
reticulate::conda_list()
# Initialize the installed Conda environment
text::textrpp_initialize(save_profile = TRUE)
# Test that textEmbed works
textEmbed("hello")System-level dependencies
Different operating systems require different system-level dependencies, especially when working with Python packages that rely on compiled extensions or specialized libraries. Below is a summary of what’s commonly required and how to install it.
Windows
Microsoft C++ Build Tools
Some Python packages (e.g., hdbscan, flair) require compilation with
Visual C++. Windows users may need to manually install Microsoft C++
Build Tools. You must have C++ build tools for Python packages that
need compilation.
Install the latest version of Microsoft C++
Build Tools:
1. Download and run the installer from https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. During installation, check:
- “C++ build tools”.
-
Esure “Windows 10 SDK” is also selected.
3. Complete installation
and restart your computer.
Terms of Service conflict – Annaconda forge channel
Python environments used by the text and talk packages rely on
packages from the conda-forge channel.
You don’t need to install Anaconda manually – the textrpp_install()
function will install Miniconda and set conda-forge as the default
channel.
However, if you’ve previously installed Anaconda or changed channels,
you might encounter ToS conflicts (Terms of Service).
If errors mention pkgs/main or tos accept, you can fix it by running
the following in Terminal/Command Prompt (not R):
# Run this in your Terminal (not in R)
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --remove channels defaults
MacOS
On macOS, most system-level dependencies are typically pre-installed. For any missing components, the text package automatically detects them and provides clear instructions to guide the user through installation.
General troubleshooting
1. Check if you have install permissions
Can you install an R package like dplyr?
install.packages("dplyr")Can you install system-level tools like Python/miniconda?
library(reticulate)
reticulate::install_miniconda()If you do not have premissions, please contact your administrator for advice.
2. Remembered to initialize the python environment.
After restarting R, functions like textEmbed() can stop working again.
Solution: Persist the initialization in your R profile:
text::textrpp_initialize(
condaenv = "textrpp_condaenv",
refresh_settings = TRUE,
save_profile = TRUE,
) 3. Install the development version from GitHub.
# install.packages("devtools")
devtools::install_github("oscarkjell/text")4. Force reinstallation of the environment
library(text)
text::textrpp_install(
update_conda = TRUE,
force_conda = TRUE
)5. Install the python environment using reticulate
See the article Installing and
Managing Python Environments with reticulate, for
detailed information.
6. Inspect diagnostic information
If something isn’t working right, it is a good start to examine what is installed and running on your system.
library(text)
log <- text::textDiagnostics()
log Because the text package requires some system-level setup, installation is automatically verified on Windows, macOS, and Ubuntu through our GitHub Actions. If you encounter any issues, please review the tests and check the workflow file for details on system-specific installations.
To view the workflow file, select the three-dot menu on the right side of any GitHub Action run and choose View workflow file. This file specifies the operating systems, R versions, and additional libraries being tested.
Virtual environments
It is also possible to use virtual environments (although it is currently only tested on MacOS).
# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
python_path = "/usr/local/bin/python3.9",
envname = "textrpp_virtualenv")
# Initialize the virtual environment.
text::textrpp_initialize(virtualenv = "textrpp_virtualenv",
condaenv = NULL,
save_profile = TRUE)Solving OMP errors and R/Rstudio crashes
Some macOS users may experience a crash when running functions like textEmbed() from the text package. This is due to a known conflict between multiple OpenMP libraries (e.g., libomp.dylib and libiomp5.dylib) used by Python packages such as torch and transformers.
Workaround (Automatically Applied) To prevent this crash, the package sets the following environment variables when running on macOS:
Sys.setenv(OMP_NUM_THREADS = "1")
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")
These settings: - Limit the number of OpenMP threads - Avoid nested threading issues - Instruct macOS to ignore duplicate OpenMP libraries
This workaround is safe for most users and enables smooth functionality, but note that: - It may slightly reduce parallel processing performance. - It bypasses a system-level issue rather than solving it permanently.
The text package sets OpenMP-related environment variables for compatibility with PyTorch to avoid crashes due to libomp.dylib conflicts. You can skip this behavior by setting:
Sys.setenv(TEXT_SKIP_OMP_PATCH = "TRUE")
library(text)
The exact way to install these packages may differ across systems.
Please see:
Python
torch
transformers
