`R/2_5_textTrainN.R`

`textTrainN.Rd`

(experimental) Compute cross-validated correlations for different sample sizes of a data set. The cross-validation can be repeated several times to make the evaluation more reliable.

```
textTrainN(
x = word_embeddings_4$texts$harmonytext,
y = Language_based_assessment_data_8$hilstotal,
sample_percents = c(25, 50, 75, 100),
n_cross_val = 1,
seed = 2023
)
```

- x
Word embeddings from textEmbed (or textEmbedLayerAggregation). If several word embeddings are provided in a list, they will be concatenated.

- y
Numeric variable to predict.

- sample_percents
(numeric) Numeric vector that specifies the percentages of the total number of data points to include in each sample (default = c(25, 50, 75, 100), i.e., correlations are evaluated for 25%, 50%, 75% and 100% of the data points). The data points in each sample are chosen randomly for each new sample.

- n_cross_val
(numeric) Number of times to repeat the cross-validation (default = 1, i.e., cross-validation is performed only once). Warning: training time grows in proportion to the number of cross-validations, so n cross-validations take roughly n times as long.

- seed
(numeric) Seed for the random number generator; set a different value to obtain different random samples (default = 2023).
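
To make the role of sample_percents and seed concrete, the following base R sketch shows one way percentages could be turned into random subsets of the data. It is an illustration only and does not reproduce the internals of textTrainN: the total of 40 data points, the rounding rule, and the names `n_total`, `sample_sizes` and `sample_indices` are all assumptions.

```r
# Hypothetical illustration; not the internal code of textTrainN.
set.seed(2023)                      # fixing the seed makes the draws reproducible

n_total <- 40                       # assumed number of data points
sample_percents <- c(25, 50, 75, 100)

# Number of data points in each sample (the rounding rule is an assumption).
sample_sizes <- round(n_total * sample_percents / 100)

# Draw a fresh random subset of row indices for each percentage.
sample_indices <- lapply(sample_sizes, function(n) {
  sample(seq_len(n_total), size = n)
})
names(sample_indices) <- paste0(sample_percents, "%")

# Size of each sample
lengths(sample_indices)
```

Each subset is drawn without replacement, so the 100% sample is simply a random permutation of all rows.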

A tibble containing correlations for each sample. If n_cross_val > 1, the correlations for each cross-validation, along with their standard deviation and mean, are included in the tibble. The information in the tibble can be visualised with the textTrainNPlot function.
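
The shape of such a result can be sketched in base R with simulated data. This is a rough illustration under stated assumptions, not the package's implementation: `lm()` stands in for the actual training procedure, the data are simulated, the subset is redrawn for every repetition, and the helper `cv_correlation` and the column names `percent`, `mean` and `std` are hypothetical.

```r
# Illustrative sketch with simulated data; lm() is a stand-in for the
# package's training procedure, and all names below are hypothetical.
set.seed(2023)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)            # stand-in for embedding dimensions
y <- as.numeric(x %*% c(0.5, -0.3, 0.2) + rnorm(n))

# One k-fold cross-validation: correlation between out-of-fold predictions and y.
cv_correlation <- function(x, y, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = length(y)))
  predicted <- numeric(length(y))
  dat <- data.frame(y = y, x)
  for (fold in seq_len(k)) {
    test <- folds == fold
    fit <- lm(y ~ ., data = dat[!test, ])
    predicted[test] <- predict(fit, newdata = dat[test, ])
  }
  cor(predicted, y)
}

sample_percents <- c(25, 50, 75, 100)
n_cross_val <- 3

# Rows: sample percentages; columns: repeated cross-validations.
correlations <- t(sapply(sample_percents, function(p) {
  replicate(n_cross_val, {
    idx <- sample(seq_len(n), size = round(n * p / 100))
    cv_correlation(x[idx, , drop = FALSE], y[idx])
  })
}))

# One row per sample: individual correlations, then their mean and SD.
results <- data.frame(
  percent = sample_percents,
  correlations,
  mean = rowMeans(correlations),
  std = apply(correlations, 1, sd)
)
results
```

The total work is `length(sample_percents) * n_cross_val` cross-validations, which is why run time scales with n_cross_val.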

See `textTrainNPlot`.

```
# Compute correlations for 25%, 50%, 75% and 100% of the data in word_embeddings and perform
# cross-validation thrice.
if (FALSE) {
tibble_to_plot <- textTrainN(
x = word_embeddings_4$texts$harmonytext,
y = Language_based_assessment_data_8$hilstotal,
sample_percents = c(25, 50, 75, 100),
n_cross_val = 3
)
# tibble_to_plot contains correlation coefficients for each cross-validation,
# plus the standard deviation and mean for each sample. The tibble can be plotted
# using the textTrainNPlot function.
# Examine tibble
tibble_to_plot
}
```