Function reference • text

Installation
`textrpp_install()` `textrpp_install_virtualenv()`	Install the Python packages required by text in a conda or virtualenv environment
`textrpp_uninstall()`	Uninstall the textrpp conda environment
`textrpp_initialize()`	Initialize the Python packages required by text
Transform text to word embeddings
`textEmbed()`	Extract layers and aggregate them to word embeddings, for all character variables in a given dataframe.
`textDimName()`	Change the names of the dimensions in the word embeddings.
`textEmbedRawLayers()`	Extract layers of hidden states (word embeddings) for all character variables in a given dataframe.
`textEmbedLayerAggregation()`	Select and aggregate layers of hidden states to form a word embedding.
`textEmbedReduce()`	Pre-trained dimension reduction (experimental)
`textEmbedStatic()`	Apply word embeddings from a given decontextualized static space (such as one from Latent Semantic Analysis) to all character variables.
Fine-tuning
`textFineTuneTask()`	Task Adapted Pre-Training (EXPERIMENTAL - under development)
`textFineTuneDomain()`	Domain Adapted Pre-Training (EXPERIMENTAL - under development)
Language Analysis Tasks
`textClassify()`	Predict label and probability of a text using a pretrained classifier language model. (experimental)
`textGeneration()`	Predicts the words that will follow a specified text prompt. (experimental)
`textNER()`	Named Entity Recognition. (experimental)
`textSum()`	Summarize texts. (experimental)
`textQA()`	Question Answering. (experimental)
`textTranslate()`	Translation. (experimental)
`textZeroShot()`	Zero-shot classification. (experimental)
Train word embeddings
`textTrain()`	Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable.
`textTrainLists()`	Individually trains word embeddings from several text variables to several numeric or categorical variables.
`textTrainRegression()`	Train word embeddings to a numeric variable.
`textTrainRandomForest()`	Train word embeddings to a categorical variable using random forest.
`textTrainN()`	(experimental) Compute cross-validated correlations for different sample sizes of a data set. The cross-validation process can be repeated several times to enhance the reliability of the evaluation.
`textTrainNPlot()`	(experimental) Plot cross-validated correlation coefficients across different sample sizes from the object returned by the textTrainN function. If the number of cross-validations exceeds one, error bars are included in the plot.
Predict from word embeddings or text
`textPredict()`	Predict new scores or classes from embeddings or text, using trained models created by, e.g., textTrain() or stored on, e.g., GitHub.
`textPredictTest()`	Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.
`textPredictAll()`	Predict from several models, selecting the correct input
Semantic similarities and distances
`textSimilarity()`	Compute the semantic similarity between two text variables.
`textDistance()`	Compute the semantic distance between two text variables.
`textSimilarityMatrix()`	Compute semantic similarity scores between all combinations in a word embedding
`textDistanceMatrix()`	Compute semantic distance scores between all combinations in a word embedding
`textSimilarityNorm()`	Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct).
`textDistanceNorm()`	Compute the semantic distance between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct/concept).
Plot words in the word embedding space
`textProjection()`	Compute Supervised Dimension Projection and related variables for plotting words.
`textPlot()`	Plot words from textProjection() or textWordPrediction().
`textProjectionPlot()`	Plot words according to Supervised Dimension Projection.
`textWordPrediction()`	Compute predictions based on single words for plotting words. The word embeddings of single words are trained to predict the mean value associated with that word. P-values do NOT work yet (experimental).
`textCentrality()`	Compute semantic similarity score between single words' word embeddings and the aggregated word embedding of all words.
`textCentralityPlot()`	Plot words according to semantic similarity to the aggregated word embedding.
`textPCA()`	Compute 2 PCA dimensions of the word embeddings for individual words.
`textPCAPlot()`	Plot words according to 2-D plot from 2 PCA components.
BERTopic
`textTopics()`	This function creates and trains a BERTopic model (based on the bertopic Python package) on a text variable in a tibble/data.frame. (EXPERIMENTAL)
`textTopicsTest()`	This function tests the relationship between a single topic or all topics and a variable of interest. Available tests include correlation, t-test, linear regression, binary regression, and ridge regression. (EXPERIMENTAL - under development)
`textTopicsWordcloud()`	This function plots word clouds of topics from a topic model based on their significance, as determined by a linear or binary regression.
`textTopicsReduce()`	textTopicsReduce (EXPERIMENTAL)
`textTopicsTree()`	textTopicsTree (EXPERIMENTAL): get the hierarchical topic tree
View or delete downloaded HuggingFace models in R
`textModels()`	Check downloaded, available models.
`textModelLayers()`	Get the number of layers in a given model.
`textModelsRemove()`	Delete a specified model and model associated files.
Miscellaneous
`textDescriptives()`	Compute descriptive statistics of character variables.
`textTokenize()`	Tokenize according to different HuggingFace transformers
Example Data
`Language_based_assessment_data_8`	Text and numeric data for 10 participants.
`word_embeddings_4`	Word embeddings for 4 text variables for 40 participants
`raw_embeddings_1`	Word embeddings from textEmbedRawLayers function
`Language_based_assessment_data_3_100`	Example text and numeric data.
`DP_projections_HILS_SWLS_100`	Data for plotting a Dot Product Projection Plot.
`centrality_data_harmony`	Example data for plotting a Semantic Centrality Plot.
`PC_projections_satisfactionwords_40`	Example data for plotting a Principal Component Projection Plot.