text 1.8.0
- Implementing a faster way to embed. Note that textEmbed() now gets slightly different embeddings for longer texts, because a sliding window is used when there are too many tokens for the LLM.
- Provides an id variable in the text embedding output.
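The sliding-window note above can be illustrated with a minimal base R sketch. It only shows how a long token sequence might be cut into overlapping windows that each fit a model's token limit; `sliding_windows()`, `max_tokens`, and `stride` are hypothetical names chosen for illustration and are not taken from the text package, whose actual windowing and aggregation may differ.

```r
# Hypothetical helper (not part of the text package): split a token vector
# into overlapping windows of at most `max_tokens` tokens.
sliding_windows <- function(tokens, max_tokens, stride) {
  stopifnot(max_tokens >= 1, stride >= 1, stride <= max_tokens)
  n <- length(tokens)
  # Short texts fit in a single window and need no splitting.
  if (n <= max_tokens) {
    return(list(tokens))
  }
  starts <- seq(1, n - max_tokens + 1, by = stride)
  # Make sure the final window reaches the last token.
  if (tail(starts, 1) + max_tokens - 1 < n) {
    starts <- c(starts, n - max_tokens + 1)
  }
  lapply(starts, function(s) tokens[s:(s + max_tokens - 1)])
}

tokens <- paste0("tok", 1:10)
windows <- sliding_windows(tokens, max_tokens = 4, stride = 3)
length(windows)  # 3 windows: tokens 1-4, 4-7 and 7-10
```

Because each window is embedded with only part of the surrounding context, combining per-window results can plausibly differ a little from embedding the whole text in one pass, which is consistent with the note about longer texts.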
text 1.7.0
CRAN release: 2025-09-01
- Streamlines the installation procedure and adds clearer feedback about required system-level dependencies.
- Uses textrpp-py 0.1.0 by default to set up the Python environment in a robust and reproducible way.
- Updates and expands the installation instructions on the website.
- Sets the default of the device argument to GPU when available, to take advantage of hardware acceleration.
text 1.6.2
- corrected remove_non_ascii parameter in textEmbed().
text 1.6.1
- improved installation procedure with more detailed feedback.
text 1.5.6
- Improved selection of examples in textExamples()
- On macOS, OMP settings are now set at start-up.
text 1.5.5
- added bootstrap_difference and replaced “bootstrap” with “bootstrap_overlap” in the textPredictTest() function.
text 1.5.4
- changing textTrainExamples() to textExamples() and improving the filter_word function.
text 1.5.1
- adding matrix/box legend for textTrainExamples().
- improving functionality of textTopics().
text 1.5
CRAN release: 2025-05-02
- added save_output = “no_plot” in textTrainRegression() for “logistic” and “multinomial” to reduce model size of saved objects.
text 1.4.7
- Added checks for matching word_embeddings and model requirements in the textPredict() function. This is controlled via the new check_matching_word_embeddings parameter, which validates compatibility of model type, layers, and aggregation settings.
- Added a name parameter to the textDimName() function, allowing users to specify or change the name suffix for word embedding dimensions.
- Improved the dim_names = FALSE behavior in the textDimName() function to also ignore model-required dimension suffixes. Now includes clearer and more informative warnings when dimension mismatches occur.
text 1.4.6
- updating from the deprecated rsample function validation_split() to initial_validation_split(). However, this changes some results in textTrainRegression() and textTrainRandomForest().
- updating textLBAM() to take the construct_start parameter.
text 1.4.5
- removing objects in the environment of textTrainRegression() to reduce saved model sizes.
text 1.4.2
- fixing bug in layer selection in textEmbedRawLayers() (when using default -2, layer 11 was selected even for large models). This was never a problem in textEmbed().
text 1.4.1
- adding the dlatk_method to the textEmbed() function.
text 1.4
CRAN release: 2025-03-18
- adding cv_method = “group_cv” in the textTrainRegression() function.
text 1.3.6
- fixing python dependency (aiohappyeyeballs)
- adding parameters plot_n_word_random and legend_number_colour in textPlot.
- removed nltk warning when running the functions requiring python.
- anchoring group word embeddings in the textProjection() function.
- adding Cohen’s d to the output of the textProjection() function
text 1.3.4
- harmonizing wordclouds with topics-package
- implementing textTrainExamples()
- updating legend plots.
text 1.3.0
CRAN release: 2024-12-05
- Alias functions: textPredict(), textAssess() and textClassify().
- LBAM integration with textLBAM().
- Full support of implicit motives models.
- Text cleaning functionality with textClean() (removing common personal information).
- Compatibility with the topics-package, see www.r-topics.org.
text 1.2.17
- textLBAM() returns the library as a dataframe
text 1.2.16
- textPredict() detects model_type.
- Instead of having to specify the URL, one can now specify the model name from the Language-Based Assessment Model (L-BAM) library.
- Including default option to download an updated version of the L-BAM file
text 1.2.8 - 1.2.13
- fixing bugs related to text prediction functions
- adding method_typ = “texttrained” and “finetuned”
- streamlining code for implicit motives output
- adding textFindNonASCII() function and feature in textEmbed() to warn and clean non-ASCII characters. This may change results slightly.
- removed type parameter in textPredict() and instead giving both probability and class.
text 1.2.7
- textClassify() is now called textClassifyPipe()
- textPredict() is now called textPredictR()
- Making textAssess(), textPredict() and textClassify() work the same, now taking the parameter method with the string “text” to use textPredict(), and “huggingface” to use textClassifyPipe().
text 1.2.6
- updating python code, including adding parameters hg_gated, hg_token, and trust_remote_code.
- changed parameter name from return_incorrect_results to force_return_results
- changed default of function_to_apply to NULL instead of “none”; this is to mimic huggingface default.
- textWordPrediction since it is under development and not tested.
text 1.2.5
- updating security issues with python packages.
- updating the default range of penalties in textTrain() functions.
- updating textPredict() functionality
text 1.2.2
- Improving textTrainN() including subsets sampling (new: default change from random to subsets), use_same_penalty_mixture (new: default change from FALSE to TRUE) and std_err (new output).
- Improving textTrainPlot()
text 1.2.1
CRAN release: 2024-04-22
- Improving textPredict() functionality.
- Implementing experimental features related to textTopics()
text 1.2
Functions
- textTopics() trains a BERTopic model with different modules and returns the model, data, and topic_document distributions based on c-TF-IDF
- textTopicsTest() can perform multiple tests (correlation, t-test, regression) between a BERTopic model from textTopics() and data
- textTopicsWordcloud() can plot word clouds of topics tested with textTopicsTest()
- textTopicsTree() prints out a tree structure of the hierarchical topic structure
text 1.1
Functions
- textEmbed() is now fully embedding one column at a time, and reducing word_types for each column. This can break some code and produce different results in plots where word_types are based on several embedded columns.
- textTrainN() and textTrainNPlot() evaluate prediction accuracy across number of cases.
- textTrainRegression() and textTrainRandomForest() now take tibble as input in strata.
text 1.0
CRAN release: 2023-08-09
Function
- multinomial regression in textTrainRegression()
- textPredictTest() can handle auc
- textEmbed() is faster (thanks to faster handling of aggregating layers)
- Added sort parameter in textEmbedRawLayers().
text 0.9.99.9
Function
- Possibility to use GPU for MacOS M1 and M2 chip using device = “mps” in textEmbed()
text 0.9.99.8
Function
- textFineTune() is implemented as an experimental function.
- max_length implemented in textTranslate().
text 0.9.99.7
Function
- textEmbedReduce() implemented
text 0.9.99.3
Bug Fix
- changed hard coded “bert-base-uncased” to model, so that layers = -2 works in textEmbed().
- Update logging level critical using integer 50 with set_verbosity.
- changed in sorting_xs_and_x_append from Dim to Dim0 when renaming x_appended variables.
- changed first to append_first and made it an option in textTrainRegression() and textTrainRandomForest().
text 0.9.99.2
CRAN release: 2022-09-20
DEFAULT CHANGES
- The default setting of textEmbed() is now providing token-level embeddings and text-level embeddings. Word_type embeddings are optional.
- In textEmbed() layers = 11:12 is now second_to_last.
- In textEmbedRawLayers default is now second_to_last.
- In textEmbedLayerAggregation() layers = 11:12 is now layers = "all".
- In textEmbed() and textEmbedRawLayers() x is now called texts.
- textEmbedLayerAggregation() now uses layers = "all", aggregation_from_layers_to_tokens, aggregation_from_tokens_to_texts.
New Function
- textZeroShot() is implemented.
- textDistanceNorm() and textDistanceMatrix()
- textDistance() can compute cosine distance.
- textModelLayers() provides N layers for a given model
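As a reminder of what a cosine distance measures, here is a small base R sketch using the common definition (one minus cosine similarity). `cosine_distance()` is a hypothetical helper written for illustration; it is not the implementation used by textDistance(), which may handle inputs and edge cases differently.

```r
# Hypothetical helper: cosine distance between two numeric vectors,
# defined here as 1 - cosine similarity.
cosine_distance <- function(x, y) {
  stopifnot(length(x) == length(y))
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

a <- c(1, 0, 0)
b <- c(0, 1, 0)
cosine_distance(a, a)      # identical direction: distance 0
cosine_distance(a, b)      # orthogonal vectors: distance 1
cosine_distance(a, 3 * a)  # scaling does not change the distance
```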
New Setting
- max_token_to_sentence in textEmbed()
Setting name changes
- aggregate_layers is now called aggregation_from_layers_to_tokens.
- aggregate_tokens is now called aggregation_from_tokens_to_texts.
- single_word_embeddings is now called word_types_embeddings
Function name changes
- textEmbedLayersOutput() is now called textEmbedRawLayers()
text 0.9.98
- adding textDimName()
- DEFAULT CHANGE in textEmbed(): dim_name = TRUE
- DEFAULT CHANGE in textEmbed(): single_context_embeddings = TRUE
- DEFAULT CHANGE in textEmbed(): device = “gpu”
- Adding specific layer aggregations for explore_words in textPlot()
- Adding x_append_target in textPredict() function
text 0.9.97
- updating textClassify(), textGeneration(), textNER(), textSum(), textQA(), and textTranslate().
text 0.9.96
text 0.9.95
New features
- textClassify() (under development)
- textGeneration() (under development)
- textNER() (under development)
- textSum() (under development)
- textQA() (under development)
- textTranslate() (under development)
text 0.9.93
New features
- New function: textSentiment(), from huggingface transformers models.
- add progression for time consuming functions including textEmbed(), textTrainRegression(), textTrainRandomForest() and textProjection().
text 0.9.92
New features
- Option dim_names to set unique dimension names in textEmbed() and textEmbedStatic().
- textPredictAll() function that can take several models, word embeddings, and variables as input to provide multiple outputs.
- option to add variables to the embeddings in textTrain() functions with x_append.
text 0.9.91
text 0.9.90
CRAN release: 2022-05-30
text 0.9.80
New features
- Option to set model_max_length in textEmbed().
- textModels() shows downloaded models.
- textModelsRemove() deletes specified models.
text 0.9.70
New Features
- Inclusion of textDistance() function with distance measures.
- Adding more measures to textSimilarity().
- Adding functionality from textSimilarity() in textSimilarityTest(), textProjection() and textCentrality() for plotting.
- Adding information about how textTrainRegression() concatenates word embeddings when provided with a list of several word embeddings.
- Adding two word embedding dimensions to example data of single word embeddings to match the 10 of the contextualized embeddings in word_embeddings_4$singlewords_we.
Bug Fixes
- In textCentrality(), words to be plotted are selected with word_data1_all$extremes_all_x >= 1 (rather than == 1).
text 0.9.60
- textSimilarityMatrix() computes semantic similarity among all combinations in a given word embedding.
text 0.9.54
- textDescriptives() gets options to remove NA and compute total scores.
text 0.9.53
- inclusion of textDescriptives()
text 0.9.20
New Features
- New functions being tested: textWordPredictions() (which has a trial period/not fully developed and might be removed in future versions); p-values are not yet implemented.
- Possibility to use textPlot() for objects from both textProjection() and textWordPredictions()
text 0.9.17
New Features
- textrpp_initiate() runs automatically in library(text) when default environment exists
- Python warnings are captured in embedding comments
- Option to print python options to console
- Updated the permutation test for plotting and textSimilarityTest().
text 0.9.16
New Features
- textrpp_install() installs a conda environment with text required python packages.
- textrpp_install_virtualenv() installs a virtual environment with text required python packages.
- textrpp_initialize() initializes installed environment.
- textrpp_uninstall() uninstalls conda environment.
text 0.9.13
New Features
- textEmbed() and textEmbedLayersOutput() support the use of GPU using the device setting.
- remove_words makes it possible to remove specific words from textProjectionPlot()
text 0.9.12
New Features
- In textProjection() and textProjectionPlot() it is now possible to add points of the aggregated word embeddings in the plot
- In textProjection() it is now possible to manually add words to the plot in order to explore them in the word embedding space.
- In textProjection() it is possible to add color or remove words that are more frequent on the opposite “side” of its dot product projection.
- In textProjection() with split == quartile, the comparison distribution is now based on the quartile data (rather than the data for mean)
Bug Fixes
- If any of the tokens to remove is “[CLS]”, subtract 1 on token_id so that it works with layer_aggregation_helper. 0.9.11
- Can now submit one word to textEmbed() with decontexts = TRUE.
text 0.9.11
- textSimilarityTest() does not give an error when using method = unpaired with an unequal number of participants in each group.
New Features
- textPredictTest() function to significance test correlations of different models. 0.9.11
text 0.9.10
CRAN release: 2020-12-14
This version is now on CRAN.
New Features
- Adding option to deselect the step_centre and step_scale in training.
- Cross-validation method in textTrainRegression() and textTrainRandomForest() has two options: cv_folds and validation_split. (0.9.02)
- Better handling of NA in step_naomit in training.
- DistilBert model works (0.9.03)
Bug Fixes
- textProjectionPlot() plots words extreme in more than just one feature (i.e., words are now plotted that satisfy, for example, both plot_n_word_extreme and plot_n_word_frequency). (0.9.01)
- textTrainRegression() and textTrainRandomForest() now select the maximum evaluation measure where appropriate (previously the minimum was always selected, which, e.g., was correct for rmse but not for r) (0.9.02)
- removed id_nr in training and predict by using workflows (0.9.02).
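The evaluation-measure fix above comes down to choosing the optimisation direction per metric. The base R sketch below illustrates the idea only; `select_best()`, the `maximise` list, and the example tuning results are all hypothetical and are not taken from the package's training code.

```r
# Hypothetical helper: pick the best row of a tuning-results table,
# maximising metrics where higher is better and minimising the rest.
select_best <- function(results, metric) {
  maximise <- c("r", "accuracy", "auc")  # assumption: higher is better for these
  values <- results[[metric]]
  idx <- if (metric %in% maximise) which.max(values) else which.min(values)
  results[idx, , drop = FALSE]
}

# Made-up tuning results, for illustration only.
results <- data.frame(
  penalty = c(0.01, 0.1, 1),
  rmse    = c(1.2, 0.9, 1.1),
  r       = c(0.30, 0.45, 0.50)
)

select_best(results, "rmse")$penalty  # 0.1 (lowest rmse)
select_best(results, "r")$penalty     # 1 (highest r)
```

Always taking the minimum would have returned the penalty of 0.01 for r in this example, which is the worst of the three candidates.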

