`R/2_1_textTrain.R`

`textTrainLists.Rd`

Trains a separate model for each pairing of word embeddings (from one or several text variables) with a numeric or categorical variable.

```
textTrainLists(
  x,
  y,
  force_train_method = "automatic",
  save_output = "all",
  method_cor = "pearson",
  eval_measure = "rmse",
  p_adjust_method = "holm",
  ...
)
```

- x
Word embeddings from textEmbed (or textEmbedLayerAggregation). The embeddings can come from one text variable and be paired with several numeric/categorical variables or, vice versa, come from several text variables and be paired with one numeric/categorical variable. It is not possible to mix numeric and categorical variables.

- y
Tibble with several numeric or categorical variables to predict. Please note that you cannot mix numeric and categorical variables.

- force_train_method
(character) Training method to use. Default is "automatic"; the alternatives are "regression" and "random_forest".

- save_output
(character) Controls how much of the output is saved. Default is "all"; the alternatives are "only_results" and "only_results_predictions".

- method_cor
(character) A character string describing the type of correlation (default "pearson").

- eval_measure
(character) Type of evaluative measure to assess models on (default "rmse").

- p_adjust_method
Method used to adjust p-values for multiple comparisons (default "holm"; see also "none", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr").

- ...
Additional arguments for textTrainRegression or textTrainRandomForest (via the textTrain function).
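The option names listed for p_adjust_method match the methods accepted by base R's `stats::p.adjust`. As a rough illustration of what such a correction does, here is a minimal sketch that assumes the adjustment behaves like `p.adjust`; the p-values are made up for illustration and are not output from textTrainLists.

```r
# Hypothetical raw p-values, e.g. one per text-variable/outcome pairing.
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Holm correction (the default named above) versus no correction.
p_holm <- stats::p.adjust(p_raw, method = "holm")
p_none <- stats::p.adjust(p_raw, method = "none")

# Side-by-side comparison; adjusted values are never smaller than the raw ones.
data.frame(p_raw = p_raw, p_holm = p_holm, p_none = p_none)
```

With many pairings, a stricter method such as "bonferroni" or "holm" makes it less likely that a single chance correlation is reported as significant.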

Correlations between predicted and observed values, stored in a data frame together with the t-value, degrees of freedom (df), p-value, confidence interval, alternative hypothesis, and correlation coefficient.
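The statistics listed here are the same quantities that base R's `stats::cor.test` reports. The sketch below uses simulated vectors in place of real predictions; the names `observed` and `predicted` are hypothetical, and this is not necessarily how textTrainLists computes its results internally.

```r
# Simulated stand-ins for observed ratings and model predictions.
set.seed(1)
observed <- rnorm(40)
predicted <- observed + rnorm(40)

# Pearson correlation test between predicted and observed values.
ct <- stats::cor.test(predicted, observed, method = "pearson")

# Collect the reported quantities into a one-row data frame.
data.frame(
  t_value = unname(ct$statistic),
  df = unname(ct$parameter),
  p_value = ct$p.value,
  conf_low = ct$conf.int[1],
  conf_high = ct$conf.int[2],
  alternative = ct$alternative,
  r = unname(ct$estimate)
)
```

Passing `method = "spearman"` or `method = "kendall"` to `cor.test` gives rank-based alternatives, although those methods do not return a confidence interval.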

```
# Examines how well the embeddings of the texts in Language_based_assessment_data_8
# can predict the numerical variables in Language_based_assessment_data_8.
# The training is done combination-wise, i.e., correlations are tested pairwise
# (columns 1-5, 1-6, 2-5, 2-6), resulting in a data frame with four rows.
if (FALSE) { # \dontrun{
library(text)

# Embeddings for the first two text variables.
word_embeddings <- word_embeddings_4$texts[1:2]
# Columns 5 and 6 hold the numeric variables to predict.
ratings_data <- Language_based_assessment_data_8[5:6]
trained_model <- textTrainLists(
  x = word_embeddings,
  y = ratings_data
)
# Examine results (t-value, degrees of freedom (df), p-value,
# alternative hypothesis, confidence interval, correlation coefficient).
trained_model$results
} # }
```
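The combination-wise pairing described in the example comments can be sketched in base R. This only illustrates the order of the pairings; the names `x_cols`, `y_cols`, and `col_pairs` are hypothetical and do not come from the package.

```r
# Column indices from the example: two embedding columns, two outcome columns.
x_cols <- c(1, 2)
y_cols <- c(5, 6)

# expand.grid varies its first argument fastest, so y is listed first
# to obtain the order 1-5, 1-6, 2-5, 2-6.
col_pairs <- expand.grid(y = y_cols, x = x_cols)[, c("x", "y")]

col_pairs        # one row per text-variable/outcome pairing
nrow(col_pairs)  # number of models that would be trained
```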