- Updates to `textTrainN()`, including subsets sampling (new: default changed from random to subsets), `use_same_penalty_mixture` (new: default changed from `FALSE` to `TRUE`) and `std_err` (new output).
- Updates to `textTrainNPlot()`.
- Updates to `textPredict()` functionality.
- `textTopics()` trains a BERTopic model with different modules and returns the model, data, and topic-document distributions based on c-TF-IDF.
- `textTopicsTest()` can perform multiple tests (correlation, t-test, regression) between a BERTopic model from `textTopics()` and data.
- `textTopicsWordcloud()` can plot word clouds of topics tested with `textTopicsTest()`.
- `textTopicsTree()` prints out a tree structure of the hierarchical topic structure.
- `textEmbed()` now fully embeds one column at a time, and reduces `word_types` for each column. This can break some code and produce different results in plots where `word_types` are based on several embedded columns.
- `textTrainN()` and `textTrainNPlot()` evaluate prediction accuracy across the number of cases.
- `textTrainRegression()` and `textTrainRandomForest()` now take a tibble as input in `strata`.
- `textTrainRegression()`.
- `textPredictTest()` can handle `auc`.
- `textEmbed()` is faster (thanks to faster handling of aggregating layers).
- `sort` parameter in `textEmbedRawLayers()`.
- Possibility to use the GPU on macOS with M1 and M2 chips using `device = "mps"` in `textEmbed()`.
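The `"mps"` device option can be sketched as follows (a minimal, hedged example: the texts are invented, and it assumes the `text` package with its Python backend is installed on an Apple-silicon Mac):

```r
library(text)

# Embed a column of texts on the Apple-silicon GPU;
# fall back to device = "cpu" on other machines.
embeddings <- textEmbed(
  texts = c("I feel calm and content", "Everything feels difficult"),
  device = "mps"
)
```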
- `textFineTune()` is implemented as an experimental function.
- `max_length` implemented in `textTranslate()`.
- `textEmbedReduce()` implemented.
- `model`, so that `layers = -2` works in `textEmbed()`.
- `set_verbosity`.
- `sorting_xs_and_x_append` from `Dim` to `Dim0` when renaming x_appended variables.
- `first` changed to `append_first` and made an option in `textTrainRegression()` and `textTrainRandomForest()`.
- In `textEmbed()`, `layers = 11:12` is now `second_to_last`.
- The `textEmbedRawLayers()` default is now `second_to_last`.
- In `textEmbedLayerAggregation()`, `layers = 11:12` is now `layers = "all"`.
- In `textEmbed()` and `textEmbedRawLayers()`, `x` is now called `texts`.
- `textEmbedLayerAggregation()` now uses `layers = "all"`, `aggregation_from_layers_to_tokens`, and `aggregation_from_tokens_to_texts`.
- `textZeroShot()` is implemented.
- `textDistanceNorm()` and `textDistanceMatrix()`.
- `textDistance()` can compute cosine distance.
- `textModelLayers()` provides the number of layers for a given model.
- `max_token_to_sentence` in `textEmbed()`.
- `aggregate_layers` is now called `aggregation_from_layers_to_tokens`.
- `aggregate_tokens` is now called `aggregation_from_tokens_to_texts`.
- `single_word_embeddings` is now called `word_types_embeddings`.
- `textEmbedLayersOutput()` is now called `textEmbedRawLayers()`.
- `textDimName()`.
- `textEmbed()`: `dim_name = TRUE`.
- `textEmbed()`: `single_context_embeddings = TRUE`.
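The renamed embedding arguments can be combined in one call; a hedged sketch (the argument values shown are illustrative, not required defaults, and the texts are made up):

```r
library(text)

embeddings <- textEmbed(
  texts = c("happy", "harmonious"),            # formerly the x argument
  layers = "second_to_last",                   # formerly e.g. layers = 11:12
  aggregation_from_layers_to_tokens = "concatenate",  # formerly aggregate_layers
  aggregation_from_tokens_to_texts = "mean",          # formerly aggregate_tokens
  dim_name = TRUE                              # unique dimension names per column
)
```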
- `textEmbed()`: `device = "gpu"`.
- `explore_words` in `textPlot()`.
- `x_append_target` in the `textPredict()` function.
- `textClassify()`, `textGeneration()`, `textNER()`, `textSum()`, `textQA()`, and `textTranslate()` (all under development).
- `textSentiment()`, from Hugging Face transformers models.
- `textEmbed()`, `textTrainRegression()`, `textTrainRandomForest()` and `textProjection()`.
- `dim_names` to set unique dimension names in `textEmbed()` and `textEmbedStatic()`.
- `textPredictAll()` function that can take several models, word embeddings, and variables as input to provide multiple outputs.
- `textTrain()` functions with `x_append`.
- `model_max_length` in `textEmbed()`.
- `textModels()` shows downloaded models.
- `textModelsRemove()` deletes specified models.
- `textDistance()` function with distance measures.
- `textSimilarity()`.
- `textSimilarity()` in `textSimilarityTest()`, `textProjection()` and `textCentrality()` for plotting.
- `textTrainRegression()` concatenates word embeddings when provided with a list of several word embeddings.
- `word_embeddings_4$singlewords_we`.
- In `textCentrality()`, words to be plotted are selected with `word_data1_all$extremes_all_x >= 1` (rather than `== 1`).
- `textSimilarityMatrix()` computes semantic similarity among all combinations in a given word embedding.
- `textDescriptives()` gets options to remove NA and compute total scores.
- `textDescriptives()`.
- `textWordPredictions()` (which has a trial period/is not fully developed and might be removed in future versions); p-values are not yet implemented.
- `textPlot()` for objects from both `textProjection()` and `textWordPredictions()`.
- `textrpp_initiate()` runs automatically in `library(text)` when the default environment exists.
- `textSimilarityTest()`.
- `textrpp_install()` installs a `conda` environment with text's required Python packages.
- `textrpp_install_virtualenv()` installs a virtual environment with text's required Python packages.
- `textrpp_initialize()` initializes the installed environment.
- `textrpp_uninstall()` uninstalls the `conda` environment.
- `textEmbed()` and `textEmbedLayersOutput()` support the use of GPU via the `device` setting.
- `remove_words` makes it possible to remove specific words from `textProjectionPlot()`.
- In `textProjection()` and `textProjectionPlot()` it is now possible to add points of the aggregated word embeddings in the plot.
- In `textProjection()` it is now possible to manually add words to the plot in order to explore them in the word embedding space.
- In `textProjection()` it is possible to add color to, or remove, words that are more frequent on the opposite "side" of their dot product projection.
- In `textProjection()` with `split == quartile`, the comparison distribution is now based on the quartile data (rather than the data for the mean).
- `textEmbed()` with `decontexts = TRUE`.
- `textSimilarityTest()` no longer gives an error when using `method = unpaired` with an unequal number of participants in each group.
- `textPredictTest()` function to significance-test correlations of different models.

# text 0.9.11

This version is now on CRAN.

### New Features
- Adding the option to deselect `step_centre` and `step_scale` in training.
- The cross-validation method in `textTrainRegression()` and `textTrainRandomForest()` has two options: `cv_folds` and `validation_split`. (0.9.02)
- Better handling of `NA` in `step_naomit` in training.
- `DistilBert` model works. (0.9.03)
- `textProjectionPlot()` plots words extreme in more than just one feature (i.e., words are now plotted that satisfy, for example, both `plot_n_word_extreme` and `plot_n_word_frequency`). (0.9.01)
- `textTrainRegression()` and `textTrainRandomForest()` also have a function that selects the maximum evaluation-measure result (before, only the minimum was always selected, which, e.g., was correct for rmse but not for r). (0.9.02)
- `id_nr` in training and predict by using workflows. (0.9.02)
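The two cross-validation options above can be sketched as follows (a hedged example: the `cv_method` argument name, the data, and the variable names are assumptions for illustration, not taken from this changelog):

```r
library(text)

# k-fold cross-validation
model_cv <- textTrainRegression(
  x = my_embeddings$texts$harmony_text,  # hypothetical word embeddings
  y = my_data$harmony_score,             # hypothetical outcome
  cv_method = "cv_folds"
)

# single validation split
model_vs <- textTrainRegression(
  x = my_embeddings$texts$harmony_text,
  y = my_data$harmony_score,
  cv_method = "validation_split"
)
```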