Compute the semantic distance between two text variables.

textDistance(x, y, method = "euclidean", center = FALSE, scale = FALSE)

Arguments

x

Word embeddings (from textEmbed).

y

Word embeddings (from textEmbed).

method

Character string specifying the distance measure to compute; the default is "euclidean". Other measures from stats::dist() are also accepted, including "maximum", "manhattan", "canberra", "binary" and "minkowski". It is also possible to use "cosine", which computes the cosine distance (i.e., 1 - cosine(x, y)).

center

(boolean; from base::scale) If center is TRUE, centering is done by subtracting the embedding mean (omitting NAs) of x from each of its dimensions; if center is FALSE, no centering is done.

scale

(boolean; from base::scale) If scale is TRUE, scaling is done by dividing the (centered) embedding dimensions by the standard deviation of the embedding if center is TRUE, and by the root mean square otherwise; if scale is FALSE, no scaling is done.
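
The effect of center and scale can be illustrated with base::scale() alone. The following is a minimal sketch, not the package's implementation: it assumes that normalization is applied to each embedding (row) separately, and the toy matrix emb and the helper normalize_rows() are hypothetical names introduced only for this illustration.

```r
# Toy "embeddings": 2 texts (rows) x 4 dimensions (columns); values are made up.
emb <- matrix(c(1, 2, 3, 4,
                2, 4, 6, 8),
              nrow = 2, byrow = TRUE)

# base::scale() operates on columns, so transpose to normalize each row,
# then transpose back. (Assumption: normalization is done per embedding.)
normalize_rows <- function(m, center = FALSE, scale = FALSE) {
  t(base::scale(t(m), center = center, scale = scale))
}

centered     <- normalize_rows(emb, center = TRUE)               # subtract row mean
standardized <- normalize_rows(emb, center = TRUE, scale = TRUE) # then divide by row SD
rms_scaled   <- normalize_rows(emb, center = FALSE, scale = TRUE) # divide by root mean square

rowMeans(centered)          # every row mean is 0 after centering
apply(standardized, 1, sd)  # every row SD is 1 after centering and scaling
```

Because the second toy row is exactly twice the first, both rows become identical once centered and scaled, which shows how normalization removes differences in magnitude before a distance is computed.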

Value

A vector comprising semantic distance scores.
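
To make the measures named under method concrete, here is a small base R sketch. It assumes that one score is computed for each pair of matching rows in x and y; the toy matrices and the helpers row_dist() and row_cosine_dist() are hypothetical and are not part of the package.

```r
# Toy embeddings: 2 texts (rows) x 3 dimensions (columns); values are made up.
x <- matrix(c(1, 0, 0,
              1, 2, 3), nrow = 2, byrow = TRUE)
y <- matrix(c(0, 1, 0,
              2, 4, 6), nrow = 2, byrow = TRUE)

# Distance between row i of a and row i of b, using any stats::dist() measure.
row_dist <- function(a, b, method = "euclidean") {
  vapply(seq_len(nrow(a)), function(i) {
    as.numeric(stats::dist(rbind(a[i, ], b[i, ]), method = method))
  }, numeric(1))
}

# Cosine distance: 1 minus the cosine similarity of matching rows.
row_cosine_dist <- function(a, b) {
  1 - rowSums(a * b) / (sqrt(rowSums(a^2)) * sqrt(rowSums(b^2)))
}

euclid    <- row_dist(x, y)                        # sqrt(2) and sqrt(14)
manhattan <- row_dist(x, y, method = "manhattan")  # 2 and 6
cosine    <- row_cosine_dist(x, y)                 # orthogonal rows give 1, parallel rows give 0
```

The second pair of rows points in the same direction but differs in length, so its cosine distance is zero while its Euclidean distance is not. This is the practical difference between "cosine" and the stats::dist() measures.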

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
distance_scores <- textDistance(
  x = word_embeddings_4$texts$harmonytext,
  y = word_embeddings_4$texts$satisfactiontext
)
comment(distance_scores)
#> [1] "x embedding = .Information about the embeddings. textEmbedRawLayers: model: bert-base-uncased ; layers: 11 ; word_type_embeddings: TRUE ; max_token_to_sentence: 4 ; text_version: 0.9.99. textEmbedLayerAggregation: layers =  11 aggregation_from_layers_to_tokens =  concatenate aggregation_from_tokens_to_texts =  mean tokens_select =   tokens_deselect =  .y embedding = .Information about the embeddings. textEmbedRawLayers: model: bert-base-uncased ; layers: 11 ; word_type_embeddings: TRUE ; max_token_to_sentence: 4 ; text_version: 0.9.99. textEmbedLayerAggregation: layers =  11 aggregation_from_layers_to_tokens =  concatenate aggregation_from_tokens_to_texts =  mean tokens_select =   tokens_deselect =  .method = .euclidean.center = .FALSE.scale = .FALSE"