Installation
|
Install the python packages required by the text package in a conda or virtualenv environment.
|
Uninstall the textrpp conda environment.
|
Initialize the python packages required by the text package.
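
A minimal setup sketch, assuming the default conda-based installation; it uses the package's textrpp_install() and textrpp_initialize() helpers.

```r
# Install the text package from CRAN, then set up its python backend.
install.packages("text")
library(text)

# Install the python packages that text requires into a conda environment.
textrpp_install()

# Initialize the python environment; save_profile = TRUE re-initializes it
# automatically in future R sessions.
textrpp_initialize(save_profile = TRUE)

# If needed, the conda environment can be removed again:
# textrpp_uninstall()
```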
|
Transform text to word embeddings |
|
Extract layers and aggregate them to word embeddings, for all character variables in a given dataframe. |
|
Change the names of the dimensions in the word embeddings. |
|
Extract layers of hidden states (word embeddings) for all character variables in a given dataframe. |
|
Select and aggregate layers of hidden states to form a word embedding. |
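
A minimal sketch of the high-level embedding workflow; the column name harmonytexts and the nesting of the returned object assume the example data bundled with the package and may differ across versions.

```r
library(text)

# Embed a text variable with a pretrained transformer model.
word_embeddings <- textEmbed(
  texts = Language_based_assessment_data_8["harmonytexts"],
  model = "bert-base-uncased"
)

# The returned object is assumed to hold one aggregated (text-level)
# embedding tibble per text variable.
word_embeddings$texts$harmonytexts
```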
|
Pre-trained dimension reduction (experimental) |
|
Applies word embeddings from a given decontextualized static space (such as from Latent Semantic Analysis) to all character variables.
|
Fine-tuning |
|
Task Adapted Pre-Training (EXPERIMENTAL - under development) |
|
Domain Adapted Pre-Training (EXPERIMENTAL - under development) |
|
Language Analysis Tasks |
|
Predict label and probability of a text using a pretrained classifier language model. (experimental) |
|
Predicts the words that will follow a specified text prompt. (experimental) |
|
Named Entity Recognition. (experimental) |
|
Summarize texts. (experimental) |
|
Question Answering. (experimental) |
|
Translation. (experimental) |
|
Zero-shot classification. (experimental)
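
Hedged sketches of the experimental analysis-task wrappers; the argument names shown are assumptions and should be checked against the current documentation.

```r
library(text)

# Predict the label and probability of a text with a pretrained classifier.
textClassify(x = "This package makes language analysis straightforward.")

# Continue a text prompt with a generative model.
textGeneration(x = "The key finding of the study was", model = "gpt2")

# Zero-shot classification against a set of candidate labels.
textZeroShot(
  sequences = "I am looking forward to the weekend",
  candidate_labels = c("joy", "sadness", "anger")
)
```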
|
Train word embeddings |
|
Train word embeddings to a numeric (ridge regression) or categorical (random forest) variable. |
|
Individually trains word embeddings from several text variables to several numeric or categorical variables. |
|
Train word embeddings to a numeric variable. |
|
Train word embeddings to a categorical variable using random forest. |
|
(experimental) Compute cross-validated correlations for different sample sizes of a data set. The cross-validation process can be repeated several times to enhance the reliability of the evaluation.
|
(experimental) Plot cross-validated correlation coefficients across different sample sizes from the object returned by the textTrainN function. If the number of cross-validations exceeds one, error bars will be included in the plot.
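
A hedged sketch of training word embeddings to a numeric outcome with textTrain(); the example objects and their nesting (word_embeddings_4, Language_based_assessment_data_8) assume the bundled data and may differ across package versions.

```r
library(text)

# Train embeddings of the harmony texts to predict the HILS total score
# (a numeric outcome, so ridge regression is used).
model_harmony <- textTrain(
  x = word_embeddings_4$texts$harmonytexts,
  y = Language_based_assessment_data_8$hilstotal
)

# The returned list is assumed to include cross-validated performance,
# e.g., the correlation between predicted and observed scores.
model_harmony$results
```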
|
Predict from word embeddings or text |
|
Trained models created by, e.g., textTrain(), or stored on, e.g., GitHub, can be used to predict new scores or classes from embeddings or text using textPredict().
|
Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1.
|
Predict from several models, selecting the correct input.
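
A hedged sketch of scoring new data with a trained model; model_harmony refers to the textTrain() sketch above, and the argument names are assumptions to verify against the documentation.

```r
library(text)

# Apply a trained model to word embeddings (or, alternatively, raw texts).
predictions <- textPredict(
  model_info = model_harmony,
  word_embeddings = word_embeddings_4$texts$harmonytexts
)
predictions

# textPredictTest() can then compare prediction errors from two prediction
# sets against the observed scores, as described above.
```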
|
Semantic similarities and distances |
|
Compute the semantic similarity between two text variables. |
|
Compute the semantic distance between two text variables. |
|
Compute semantic similarity scores between all combinations in a word embedding.

Compute semantic distance scores between all combinations in a word embedding.
|
Compute the semantic similarity between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct).
|
Compute the semantic distance between a text variable and a word norm (i.e., a text represented by one word embedding that represents a construct/concept).
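
A minimal sketch of pairwise semantic similarity and distance between two embedded text variables; the example object names and nesting are assumptions based on the bundled data.

```r
library(text)

# Semantic similarity between harmony and satisfaction texts, row by row.
similarity_scores <- textSimilarity(
  x = word_embeddings_4$texts$harmonytexts,
  y = word_embeddings_4$texts$satisfactiontexts
)

# The corresponding semantic distances.
distance_scores <- textDistance(
  x = word_embeddings_4$texts$harmonytexts,
  y = word_embeddings_4$texts$satisfactiontexts
)
```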
|
Plot words in the word embedding space |
|
Compute Supervised Dimension Projection and related variables for plotting words. |
|
Plot words from textProjection() or textWordPrediction(). |
|
Plot words according to Supervised Dimension Projection. |
|
Compute predictions based on single words for plotting words. The word embeddings of single words are trained to predict the mean value associated with that word. P-values do NOT work yet (experimental).
|
Compute semantic similarity score between single words' word embeddings and the aggregated word embedding of all words. |
|
Plot words according to semantic similarity to the aggregated word embedding. |
|
Compute 2 PCA dimensions of the word embeddings for individual words. |
|
Plot words according to 2-D plot from 2 PCA components. |
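
A hedged sketch of plotting words from a Supervised Dimension Projection, using the pre-computed example object listed under Example Data; the word_data argument name and the final_plot element are assumptions.

```r
library(text)

# Plot words from a pre-computed Supervised Dimension Projection.
plot_projection <- textProjectionPlot(
  word_data = DP_projections_HILS_SWLS_100
)

# The returned list is assumed to contain the ggplot object as final_plot.
plot_projection$final_plot
```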
|
BERTopic |
|
This function creates and trains a BERTopic model (based on the bertopic python package) on a text variable in a tibble/data.frame. (EXPERIMENTAL)
|
This function tests the relationship between a single topic or all topics and a variable of interest. Available tests include correlation, t-test, linear regression, binary regression, and ridge regression. (EXPERIMENTAL - under development) |
|
This function plots word clouds of topics from a topic model based on their significance, as determined by a linear or binary regression.
|
textTopicsReduce (EXPERIMENTAL) |
|
textTopicsTest (EXPERIMENTAL) to get the hierarchical topic tree |
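
A heavily hedged sketch of the experimental BERTopic wrapper; the argument names and workflow are assumptions to verify against the current documentation.

```r
library(text)

# Create and train a BERTopic model on a text column of a data.frame.
topic_model <- textTopics(
  data = Language_based_assessment_data_8,
  variable_name = "harmonytexts"   # assumed name of the text-column argument
)
```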
|
View or delete downloaded HuggingFace models in R |
|
Check downloaded, available models. |
|
Get the number of layers in a given model. |
|
Delete a specified model and its associated files.
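
A short sketch of managing locally downloaded HuggingFace models; the target_model argument name is an assumption.

```r
library(text)

# List models that have been downloaded and are available locally.
textModels()

# Number of hidden layers in a given model (useful when choosing layers
# for textEmbed()).
textModelLayers(target_model = "bert-base-uncased")

# Delete a downloaded model and its associated files.
# textModelsRemove("bert-base-uncased")
```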
|
Miscellaneous |
|
Compute descriptive statistics of character variables. |
|
Tokenize text according to different HuggingFace transformers.
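
Brief sketches of the descriptive and tokenization helpers; the column name used is an assumption based on the bundled example data.

```r
library(text)

# Descriptive statistics for a character variable.
textDescriptives(Language_based_assessment_data_8$harmonytexts)

# Tokenize text with the tokenizer of a given HuggingFace model.
textTokenize(
  texts = "This is a tokenization example.",
  model = "bert-base-uncased"
)
```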
|
Example Data |
|
Text and numeric data for 10 participants. |
|
Word embeddings for 4 text variables for 40 participants.

Word embeddings from the textEmbedRawLayers function.
|
Example text and numeric data. |
|
Data for plotting a Dot Product Projection Plot. |
|
Example data for plotting a Semantic Centrality Plot. |
|
Example data for plotting a Principal Component Projection Plot.
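
The example objects ship with the package and can be inspected directly once it is loaded; a brief sketch:

```r
library(text)

# Text and rating-scale data for the example participants.
head(Language_based_assessment_data_8)

# Pre-computed word embeddings for the example text variables.
names(word_embeddings_4)
```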