R/2_4_textPredict.R
textPredict.Rd
Use textPredict() to predict new scores or classes from embeddings or text, using a trained model that was created by, e.g., textTrain() or is stored on, e.g., GitHub.
textPredict(
model_info = NULL,
word_embeddings = NULL,
texts = NULL,
x_append = NULL,
type = NULL,
dim_names = TRUE,
save_model = TRUE,
threshold = NULL,
show_texts = FALSE,
device = "cpu",
user_id = NULL,
story_id = NULL,
dataset = NULL,
...
)
model_info: (character or R object) One of three options. 1: An R model object (e.g., saved output from textTrain()). 2: A link to a GitHub model (e.g., "https://github.com/CarlViggo/pretrained_swls_model/raw/main/trained_github_model_logistic.RDS"). 3: A path to a model stored locally (e.g., "path/to/your/model").
word_embeddings: (tibble) Embeddings from, e.g., textEmbed(). When using a pretrained model, texts and embeddings cannot be submitted simultaneously (default = NULL).
texts: (character) Text to predict. If this argument is specified, then arguments "word_embeddings" and "premade embeddings" cannot be defined (default = NULL).
x_append: (tibble) Variables to be appended after the word embeddings (x).
type: (character) Defines what output to return after logistic regression prediction: classifications ("class", the default), probabilities ("prob"), or both ("class_prob").
dim_names: (boolean) Account for the specific dimension names from textEmbed() (rather than generic names such as Dim1, Dim2, etc.). If FALSE, the model must have been trained on word embeddings created with dim_names = FALSE, so that the dimensions are named only Dim1, Dim2, etc.
save_model: (boolean) By default, the model is saved in your working directory (default = TRUE). If the model already exists in your working directory, it is automatically loaded from there.
threshold: (numeric) Classification threshold to use with a logistic model (default = 0.5).
show_texts: (boolean) Show texts together with predictions (default = FALSE).
device: Name of the device to use: 'cpu', 'gpu', 'gpu:k', or 'mps'/'mps:k' for macOS, where k is a specific device number, such as 'mps:1'.
user_id: (list) Associates sentences with their writers. user_id must be defined when calculating implicit motives (default = NULL).
story_id: (list) Associates sentences with their stories. If story_id is defined, the mean of the current and previous word embedding per story_id is calculated (default = NULL).
dataset: (R object, tibble) Dataset into which the predictions are integrated (default = NULL).
...: Settings from stats::predict can be passed.
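The interplay of the threshold and type arguments can be illustrated with a minimal base-R sketch. It uses made-up probabilities and a hypothetical helper, classify_by_threshold(), that is not part of the text package; whether textPredict() treats a probability exactly equal to the threshold as the positive class is an assumption here, not something this page states.

```r
# Made-up probabilities of the positive class for four texts (illustration only).
prob <- c(0.15, 0.55, 0.72, 0.91)

# Hypothetical helper, not part of the text package: assign class 1 when the
# probability reaches the threshold, otherwise class 0. Treating a probability
# equal to the threshold as class 1 is an assumption.
classify_by_threshold <- function(prob, threshold = 0.5) {
  as.integer(prob >= threshold)
}

class_default <- classify_by_threshold(prob)                   # threshold = 0.5
class_strict  <- classify_by_threshold(prob, threshold = 0.7)  # stricter cut-off

# Rough analogue of type = "class_prob": probabilities and classes side by side.
data.frame(prob = prob, class_default = class_default, class_strict = class_strict)
```

Raising the threshold from 0.5 to 0.7 moves the second text (probability 0.55) out of the positive class, which is the kind of change to expect when passing a larger threshold to a logistic model.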
Predictions from word-embedding or text input.
See textTrain, textTrainLists and textTrainRandomForest.
if (FALSE) {
# Text data from Language_based_assessment_data_8
text_to_predict <- "I am not in harmony in my life as much as I would like to be."
# Example 1: (predict using pre-made embeddings and an R model-object)
prediction1 <- textPredict(
trained_model,
word_embeddings_4$texts$satisfactiontexts
)
# Example 2: (predict using a pretrained github model)
prediction3 <- textPredict(
texts = text_to_predict,
model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_hils_model.RDS"
)
# Example 3: (predict using a pretrained logistic github model and return
# probabilities and classifications)
prediction4 <- textPredict(
texts = text_to_predict,
model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_github_model_logistic.RDS",
type = "class_prob",
threshold = 0.7
)
##### Automatic implicit motive coding section ######
# Create example dataset
# (row_number() is namespaced so that the example runs without attaching dplyr)
implicit_motive_data <- dplyr::mutate(.data = Language_based_assessment_data_8,
user_id = dplyr::row_number())
# Code implicit motives. (In this example, person_class will be NaN due to the absence of
# sentences classified as 'power')
implicit_motives <- textPredict(
texts = implicit_motive_data$satisfactiontexts,
model_info = "power",
user_id = implicit_motive_data$user_id,
dataset = implicit_motive_data
)
# Examine results
implicit_motives$sentence_predictions
implicit_motives$person_predictions
}
if (FALSE) {
# Examine the correlation between the predicted values and
# the Satisfaction with life scale score (pre-included in text).
psych::corr.test(
prediction1$word_embeddings__ypred,
Language_based_assessment_data_8$swlstotal
)
}
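The story_id argument is described as taking the mean of the current and previous word embedding within each story. A minimal base-R sketch of that idea follows, using made-up two-dimensional embeddings and a hypothetical helper, mean_with_previous(); it is not the package's implementation, and leaving the first sentence of each story unchanged is an assumption.

```r
# Made-up sentence embeddings for two stories (illustration only).
embeddings <- data.frame(
  story_id = c(1, 1, 1, 2, 2),
  Dim1 = c(1, 3, 5, 10, 20),
  Dim2 = c(0, 2, 4, 1, 3)
)

# Hypothetical helper: average each value with the one before it.
# The first value has no predecessor, so it is paired with itself and
# therefore stays unchanged (an assumption about the edge case).
mean_with_previous <- function(x) {
  previous <- c(x[1], x[-length(x)])
  (x + previous) / 2
}

# Apply the helper separately within each story, one dimension at a time.
smoothed <- embeddings
for (dim_name in c("Dim1", "Dim2")) {
  smoothed[[dim_name]] <- ave(
    embeddings[[dim_name]],
    embeddings$story_id,
    FUN = mean_with_previous
  )
}

smoothed
```

Grouping with ave() keeps the averaging from crossing story boundaries, so the first sentence of story 2 is not mixed with the last sentence of story 1.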