
textTrainLists() trains a separate model for each pairing of word embeddings (from one or more text variables) with each of several numeric or categorical variables.

Usage

textTrainLists(
  x,
  y,
  force_train_method = "automatic",
  save_output = "all",
  method_cor = "pearson",
  eval_measure = "rmse",
  p_adjust_method = "holm",
  ...
)

Arguments

x

Word embeddings from textEmbed (or textEmbedLayerAggregation). You can pair word embeddings from one text variable with several numeric/categorical variables or, vice versa, word embeddings from several text variables with one numeric/categorical variable. It is not possible to mix numeric and categorical variables.

y

Tibble with several numeric or categorical variables to predict. Please note that you cannot mix numeric and categorical variables.

force_train_method

(character) Default is "automatic"; see also "regression" and "random_forest".

save_output

(character) Controls how much of the output is saved; default "all". See also "only_results" and "only_results_predictions".

method_cor

(character) A character string describing the type of correlation (default "pearson").

eval_measure

(character) Type of evaluative measure to assess models on (default "rmse").

p_adjust_method

(character) Method to adjust/correct p-values for multiple comparisons (default "holm"; see also "none", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr").

...

Additional arguments from textTrainRegression or textTrainRandomForest (see the textTrain function).
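
The option names listed for p_adjust_method match the methods of base R's stats::p.adjust. Assuming the correction behaves like that function (this page does not confirm it), the sketch below shows how "holm" and "bonferroni" change a set of raw p-values. The p-values are made-up numbers for illustration, not output from textTrainLists.

```r
# Hypothetical raw p-values, e.g. one per x-y model combination (invented numbers).
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Holm correction, the default named by p_adjust_method.
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Bonferroni correction for comparison; it is never less conservative than Holm.
p_bonf <- stats::p.adjust(p_raw, method = "bonferroni")

# Side-by-side view of raw and adjusted values.
print(data.frame(p_raw, p_holm, p_bonf))
```

Holm adjusts the smallest p-value most strongly and relaxes the multiplier for each subsequent one, which is why it rejects at least as many hypotheses as Bonferroni at the same level.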

Value

Correlations between predicted and observed values (t-value, degrees of freedom (df), p-value, confidence interval, alternative hypothesis, correlation coefficient) stored in a data frame.
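
The statistics listed above are the same quantities that base R's stats::cor.test reports. As a minimal sketch of what one row of such a result could contain, the code below correlates simulated predictions with simulated observations. The data, the column names in result_row, and the use of cor.test itself are assumptions for illustration; the actual column names returned by textTrainLists may differ.

```r
set.seed(1)

# Simulated observed ratings and hypothetical model predictions (not real data).
observed <- rnorm(40)
predicted <- observed + rnorm(40, sd = 0.5)

# Pearson correlation test between predicted and observed values.
ct <- stats::cor.test(predicted, observed, method = "pearson")

# Collect the reported statistics into a one-row data frame.
# Column names here are illustrative only.
result_row <- data.frame(
  t_value     = unname(ct$statistic),
  df          = unname(ct$parameter),
  p_value     = ct$p.value,
  conf_low    = ct$conf.int[1],
  conf_high   = ct$conf.int[2],
  alternative = ct$alternative,
  correlation = unname(ct$estimate)
)

print(result_row)
```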

Examples

# Examines how well the embeddings from Language_based_assessment_data_8 can
# predict the numerical variables in Language_based_assessment_data_8.
# The training is done combination-wise, i.e., correlations are tested pairwise
# for the column pairs 1-5, 1-6, 2-5, and 2-6, resulting in a data frame with four rows.

if (FALSE) { # \dontrun{
word_embeddings <- word_embeddings_4$texts[1:2]
ratings_data <- Language_based_assessment_data_8[5:6]

trained_model <- textTrainLists(
  x = word_embeddings,
  y = ratings_data
)

# Examine results (t-value, degrees of freedom (df), p-value,
# alternative hypothesis, confidence interval, correlation coefficient).

trained_model$results
} # }
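
The combination-wise pairing described in the example comments can be enumerated with base R. This sketch only lists the four column pairs (embedding sets 1 and 2 against rating columns 5 and 6); it does not call textTrainLists, and the row order of the actual results is an assumption.

```r
# Positions of the two embedding sets (texts[1:2]) and the two rating columns (5:6).
x_cols <- 1:2
y_cols <- 5:6

# expand.grid varies its first argument fastest, so y is listed first
# to obtain the order 1-5, 1-6, 2-5, 2-6; columns are then reordered.
combos <- expand.grid(y = y_cols, x = x_cols)[, c("x", "y")]

# Label each pairing as "x-y".
combo_labels <- paste(combos$x, combos$y, sep = "-")
print(combo_labels)
```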
