Changelog
Source: NEWS.md
text 1.3.0
CRAN release: 2024-12-05
- Alias functions: textPredict(), textAssess() and textClassify().
- L-BAM integration with textLBAM().
- Full support of implicit motives models.
- Text cleaning functionality with textClean() (removing common personal information).
- Compatibility with the topics-package, see www.r-topics.org.
text 1.2.17
- textLBAM() returns the library as a data frame.
text 1.2.16
- textPredict() detects model_type.
- Instead of having to specify the URL, one can now specify the model name from the Language-Based Assessment Model (L-BAM) library.
- Including default option to download an updated version of the L-BAM file.
text 1.2.8 - 1.2.13
- fixing bugs related to text prediction functions
- adding method_typ = “texttrained” and “finetuned”
- streamlining code for implicit motives output
- adding textFindNonASCII() function and feature in textEmbed() to warn and clean non-ASCII characters. This may change results slightly.
- removed type parameter in textPredict() and instead giving both probability and class.
text 1.2.7
- textClassify() is now called textClassifyPipe()
- textPredict() is now called textPredictR()
- Making textAssess(), textPredict() and textClassify() work the same, now taking the parameter method with the string “text” to use textPredict(), and “huggingface” to use textClassifyPipe().
text 1.2.6
- updating python code, including adding parameters hg_gated, hg_token, and trust_remote_code.
- changed parameter name from return_incorrect_results to force_return_results
- changed default of function_to_apply = NULL instead of “none”; this is to mimic huggingface default.
- textWordPrediction since it is under development and not tested.
text 1.2.5
- updating security issues with python packages.
- updating the default range of penalties in textTrain() functions.
- updating textPredict() functionality
text 1.2.2
- Improving textTrainN() including subsets sampling (new: default change from random to subsets), use_same_penalty_mixture (new: default change from FALSE to TRUE) and std_err (new output).
- Improving textTrainPlot()
text 1.2.1
CRAN release: 2024-04-22
- Improving textPredict() functionality.
- Implementing experimental features related to textTopics()
text 1.2
Functions
- textTopics() trains a BERTopic model with different modules and returns the model, data, and topic_document distributions based on c-TF-IDF
- textTopicsTest() can perform multiple tests (correlation, t-test, regression) between a BERTopic model from textTopics() and data
- textTopicsWordcloud() can plot word clouds of topics tested with textTopicsTest()
- textTopicsTree() prints out a tree structure of the hierarchical topic structure
text 1.1
Functions
- textEmbed() is now fully embedding one column at a time, and reducing word_types for each column. This can break some code and produce different results in plots where word_types are based on several embedded columns.
- textTrainN() and textTrainNPlot() evaluate prediction accuracy across number of cases.
- textTrainRegression() and textTrainRandomForest() now take a tibble as input in strata.
text 1.0
CRAN release: 2023-08-09
Function
- multinomial regression in textTrainRegression()
- textPredictTest() can handle auc
- textEmbed() is faster (thanks to faster handling of aggregating layers)
- Added sort parameter in textEmbedRawLayers().
text 0.9.99.9
Function
- Possibility to use GPU for MacOS M1 and M2 chips using device = “mps” in textEmbed()
text 0.9.99.8
Function
- textFineTune() is implemented as an experimental function
- max_length implemented in textTranslate()
text 0.9.99.7
Function
- textEmbedReduce() implemented
text 0.9.99.3
Bug Fix
- changed hard coded “bert-base-uncased” to model, so that layers = -2 works in textEmbed().
- Update logging level critical using integer 50 with set_verbosity.
- changed in sorting_xs_and_x_append from Dim to Dim0 when renaming x_appended variables.
- changed first to append_first and made it an option in textTrainRegression() and textTrainRandomForest().
text 0.9.99.2
CRAN release: 2022-09-20
DEFAULT CHANGES
- The default setting of textEmbed() is now providing token-level embeddings and text-level embeddings. Word_type embeddings are optional.
- In textEmbed() layers = 11:12 is now second_to_last.
- In textEmbedRawLayers default is now second_to_last.
- In textEmbedLayerAggregation() layers = 11:12 is now layers = "all".
- In textEmbed() and textEmbedRawLayers() x is now called texts.
- textEmbedLayerAggregation() now uses layers = "all", aggregation_from_layers_to_tokens, aggregation_from_tokens_to_texts.
New Function
- textZeroShot() is implemented.
- textDistanceNorm() and textDistanceMatrix()
- textDistance() can compute cosine distance.
- textModelLayers() provides N layers for a given model
New Setting
- max_token_to_sentence in textEmbed()
Setting name changes
- aggregate_layers is now called aggregation_from_layers_to_tokens.
- aggregate_tokens is now called aggregation_from_tokens_to_texts.
- single_word_embeddings is now called word_types_embeddings
Function name changes
- textEmbedLayersOutput() is now called textEmbedRawLayers()
text 0.9.98
- adding textDimName()
- DEFAULT CHANGE in textEmbed(): dim_name = TRUE
- DEFAULT CHANGE in textEmbed(): single_context_embeddings = TRUE
- DEFAULT CHANGE in textEmbed(): device = “gpu”
- Adding specific layer aggregations for explore_words in textPlot()
- Adding x_append_target in textPredict() function
text 0.9.97
- updating textClassify(), textGeneration(), textNER(), textSum(), textQA(), and textTranslate().
text 0.9.96
text 0.9.95
New features
- textClassify() (under development)
- textGeneration() (under development)
- textNER() (under development)
- textSum() (under development)
- textQA() (under development)
- textTranslate() (under development)
text 0.9.93
New features
- New function: textSentiment(), from huggingface transformers models.
- add progression for time consuming functions including textEmbed(), textTrainRegression(), textTrainRandomForest() and textProjection().
text 0.9.92
New features
- Option dim_names to set unique dimension names in textEmbed() and textEmbedStatic().
- textPredictAll() function that can take several models, word embeddings, and variables as input to provide multiple outputs.
- option to add variables to the embeddings in textTrain() functions with x_append.
text 0.9.91
text 0.9.90
CRAN release: 2022-05-30
text 0.9.80
New features
- Option to set model_max_length in textEmbed().
- textModels() shows downloaded models.
- textModelsRemove() deletes specified models.
text 0.9.70
New Features
- Inclusion of textDistance() function with distance measures.
- Adding more measures to textSimilarity().
- Adding functionality from textSimilarity() in textSimilarityTest(), textProjection() and textCentrality() for plotting.
- Adding information about how textTrainRegression() concatenates word embeddings when provided with a list of several word embeddings.
- Adding two word embedding dimensions to example data of single word embeddings to match the 10 of the contextualized embeddings in word_embeddings_4$singlewords_we.
Bug Fixes
- In textCentrality(), words to be plotted are selected with word_data1_all$extremes_all_x >= 1 (rather than == 1).
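The effect of switching from == 1 to >= 1 can be illustrated with a toy data frame. This is only a sketch in base R: the column name is taken from the entry above, but the words and values are invented, and it assumes (not confirmed by the changelog) that extremes_all_x counts how many plotting criteria a word satisfies.

```r
# Toy data: the column name mirrors the changelog entry; the values are invented.
word_data1_all <- data.frame(
  words = c("happy", "sad", "calm", "tired"),
  extremes_all_x = c(0, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Old selection: keeps only words flagged exactly once.
old_selection <- word_data1_all[word_data1_all$extremes_all_x == 1, ]

# New selection: keeps every word flagged at least once.
new_selection <- word_data1_all[word_data1_all$extremes_all_x >= 1, ]

print(old_selection$words)
print(new_selection$words)
```

In this toy case the word with a value of 2 is dropped by the old filter and kept by the new one.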
text 0.9.60
- textSimilarityMatrix() computes semantic similarity among all combinations in a given word embedding.
text 0.9.54
- textDescriptives() gets options to remove NA and compute total scores.
text 0.9.53
- inclusion of textDescriptives()
text 0.9.20
New Features
- New functions being tested: textWordPredictions() (which has a trial period/not fully developed and might be removed in future versions); p-values are not yet implemented.
- Possibility to use textPlot() for objects from both textProjection() and textWordPredictions()
text 0.9.17
New Features
- textrpp_initiate() runs automatically in library(text) when the default environment exists
- Python warnings are captured in embedding comments
- Option to print python options to console
- Updated the permutation test for plotting and textSimilarityTest().
text 0.9.16
New Features
- textrpp_install() installs a conda environment with text required python packages.
- textrpp_install_virtualenv() installs a virtual environment with text required python packages.
- textrpp_initialize() initializes installed environment.
- textrpp_uninstall() uninstalls conda environment.
text 0.9.13
New Features
- textEmbed() and textEmbedLayersOutput() support the use of GPU using the device setting.
- remove_words makes it possible to remove specific words from textProjectionPlot()
text 0.9.12
New Features
- In textProjection() and textProjectionPlot() it is now possible to add points of the aggregated word embeddings in the plot
- In textProjection() it is now possible to manually add words to the plot in order to explore them in the word embedding space.
- In textProjection() it is possible to add color or remove words that are more frequent on the opposite “side” of its dot product projection.
- In textProjection() with split == quartile, the comparison distribution is now based on the quartile data (rather than the data for mean)
Bug Fixes
- If any of the tokens to remove is “[CLS]”, subtract 1 on token_id so that it works with layer_aggregation_helper. (0.9.11)
- Can now submit one word to textEmbed() with decontexts=TRUE.
text 0.9.11
- textSimilarityTest() no longer gives an error when using method = unpaired with an unequal number of participants in each group.
New Features
- textPredictTest() function to significance test correlations of different models. (0.9.11)
text 0.9.10
CRAN release: 2020-12-14
This version is now on CRAN.
New Features
- Adding option to deselect the step_centre and step_scale in training.
- Cross-validation method in textTrainRegression() and textTrainRandomForest() has two options, cv_folds and validation_split. (0.9.02)
- Better handling of NA in step_naomit in training.
- DistilBert model works (0.9.03)
Bug Fixes
- textProjectionPlot() plots words extreme in more than just one feature (i.e., words are now plotted that satisfy, for example, both plot_n_word_extreme and plot_n_word_frequency). (0.9.01)
- textTrainRegression() and textTrainRandomForest() also have a function that selects the max evaluation measure results (before, only the minimum was selected all the time, which, e.g., was correct for rmse but not for r) (0.9.02)
- removed id_nr in training and predict by using workflows (0.9.02).
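The min-versus-max fix for evaluation measures can be sketched in base R as follows. This is an illustration of the idea only, not the package's internal code: the tuning_results table, its values, and the select_best() helper are all hypothetical.

```r
# Hypothetical tuning results: one row per candidate penalty (values invented).
tuning_results <- data.frame(
  penalty = c(0.01, 0.1, 1, 10),
  rmse    = c(1.40, 1.10, 1.25, 1.60),
  r       = c(0.35, 0.52, 0.47, 0.20)
)

# Hypothetical helper: pick the best row given the direction of the measure.
select_best <- function(results, measure, higher_is_better) {
  scores <- results[[measure]]
  best_row <- if (higher_is_better) which.max(scores) else which.min(scores)
  results[best_row, ]
}

# rmse is an error measure, so lower is better.
best_by_rmse <- select_best(tuning_results, "rmse", higher_is_better = FALSE)

# r is a correlation, so higher is better.
best_by_r <- select_best(tuning_results, "r", higher_is_better = TRUE)

# Always taking the minimum picks the weakest correlation for r.
wrong_by_r <- select_best(tuning_results, "r", higher_is_better = FALSE)
```

With these invented numbers, the minimum rule is right for rmse but selects the worst candidate for r, which is the behavior the entry above describes as fixed.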