`R/3_1_textSimilarity.R`

`textDistance.Rd`

Compute the semantic distance between two text variables.

`textDistance(x, y, method = "euclidean", center = FALSE, scale = FALSE)`

- x
Word embeddings (from textEmbed).

- y
Word embeddings (from textEmbed).

- method
Character string naming the distance measure to compute; the default is "euclidean". Other measures from stats::dist() are also available, including "maximum", "manhattan", "canberra", "binary" and "minkowski". It is also possible to use "cosine", which computes the cosine distance (i.e., 1 - cosine(x, y)).

- center
(boolean; from base::scale) If center is TRUE, centering is done by subtracting the embedding mean (omitting NAs) of x from each of its dimensions; if center is FALSE, no centering is done.

- scale
(boolean; from base::scale) If scale is TRUE, scaling is done by dividing the (centered) embedding dimensions by the standard deviation of the embedding if center is TRUE, and by the root mean square otherwise; if scale is FALSE, no scaling is done.
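The effect of center and scale can be illustrated with base::scale() on a toy matrix. This is a minimal sketch, not the package's implementation: the matrix values are made up, the helper `scale_rows()` is hypothetical, and it assumes that each row is one embedding that is centered and scaled on its own.

```r
# Toy "embeddings": 3 texts (rows) by 4 dimensions (columns); values are made up.
emb <- matrix(
  c(1, 2, 3, 4,
    2, 4, 6, 8,
    0, 1, 0, 1),
  nrow = 3, byrow = TRUE
)

# Hypothetical helper (not part of the text package).
# base::scale() works column-wise, so transpose to treat each row
# (one embedding) as the unit being centered and scaled, then transpose back.
scale_rows <- function(m, center = FALSE, scale = FALSE) {
  t(base::scale(t(m), center = center, scale = scale))
}

# center = TRUE: subtract each embedding's own mean from its dimensions.
centered <- scale_rows(emb, center = TRUE)
rowMeans(centered)

# center = TRUE and scale = TRUE: additionally divide by the standard deviation.
standardized <- scale_rows(emb, center = TRUE, scale = TRUE)
apply(standardized, 1, sd)
```

With center = FALSE and scale = TRUE, base::scale() divides by the root mean square instead of the standard deviation, as described above.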

A vector comprising semantic distance scores.
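To make the returned vector concrete, the following base-R sketch computes one distance per pair of rows. It is an illustration under stated assumptions, not the function's source: the inputs are made-up matrices rather than textEmbed output, the helpers `pair_distance()` and `row_distances()` are hypothetical, and it assumes that row i of x is compared with row i of y.

```r
# Made-up embeddings: 2 texts (rows) by 3 dimensions (columns).
x <- matrix(c(1, 0, 0,
              1, 2, 2), nrow = 2, byrow = TRUE)
y <- matrix(c(0, 1, 0,
              2, 4, 4), nrow = 2, byrow = TRUE)

# Hypothetical helper: distance between two numeric vectors.
pair_distance <- function(a, b, method = "euclidean") {
  if (method == "cosine") {
    # Cosine distance: 1 minus the cosine similarity.
    return(1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))))
  }
  # All other measures are delegated to stats::dist().
  as.numeric(stats::dist(rbind(a, b), method = method))
}

# Hypothetical helper: one score per row pair (assumed pairing: row i with row i).
row_distances <- function(x, y, method = "euclidean") {
  vapply(
    seq_len(nrow(x)),
    function(i) pair_distance(x[i, ], y[i, ], method),
    numeric(1)
  )
}

euclid <- row_distances(x, y)                    # sqrt(2) and 3
cosine <- row_distances(x, y, method = "cosine") # 1 (orthogonal) and 0 (parallel)
```

The second row pair shows why the choice of measure matters: the vectors point in the same direction, so their cosine distance is 0 even though their Euclidean distance is 3.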

```
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
distance_scores <- textDistance(
  x = word_embeddings_4$texts$harmonytext,
  y = word_embeddings_4$texts$satisfactiontext
)
comment(distance_scores)
#> [1] "x embedding = .Information about the embeddings. textEmbedRawLayers: model: bert-base-uncased ; layers: 11 ; word_type_embeddings: TRUE ; max_token_to_sentence: 4 ; text_version: 0.9.99. textEmbedLayerAggregation: layers = 11 aggregation_from_layers_to_tokens = concatenate aggregation_from_tokens_to_texts = mean tokens_select = tokens_deselect = .y embedding = .Information about the embeddings. textEmbedRawLayers: model: bert-base-uncased ; layers: 11 ; word_type_embeddings: TRUE ; max_token_to_sentence: 4 ; text_version: 0.9.99. textEmbedLayerAggregation: layers = 11 aggregation_from_layers_to_tokens = concatenate aggregation_from_tokens_to_texts = mean tokens_select = tokens_deselect = .method = .euclidean.center = .FALSE.scale = .FALSE"
```