Semantic similarity across multiple word embeddings

textSimilarityMatrix() computes semantic similarity scores between all pairwise combinations of rows in a set of word embeddings.

Usage

textSimilarityMatrix(x, method = "cosine", center = TRUE, scale = FALSE)

Arguments

x: Word embeddings from textEmbed().
method: (character) Character string describing the type of measure to compute. Default is "cosine"; see also "spearman" and "pearson", as well as the measures from textDistance(), which are computed here as 1 - textDistance: "euclidean", "maximum", "manhattan", "canberra", "binary" and "minkowski".
center: (boolean; from base::scale) If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.
scale: (boolean; from base::scale) If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise.

Value

A matrix of semantic similarity scores

Examples

similarity_scores <- textSimilarityMatrix(word_embeddings_4$texts$harmonytext[1:3, ])
round(similarity_scores, 3)
#>       [,1]  [,2]  [,3]
#> [1,] 1.000 0.855 0.729
#> [2,] 0.855 1.000 0.885
#> [3,] 0.729 0.885 1.000
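
To make the computation concrete, the following is a minimal base-R sketch of what a pairwise cosine similarity matrix looks like. It is not the package's implementation: cosine_similarity_matrix() and toy_embeddings are hypothetical names introduced here for illustration, and the sketch assumes that centering and scaling are applied column-wise through base::scale() before the similarities are computed.

```r
# Hypothetical helper (not part of the text package): pairwise cosine
# similarity between the rows of a numeric matrix.
cosine_similarity_matrix <- function(m, center = TRUE, scale = FALSE) {
  # Assumption: column-wise centering/scaling, as done by base::scale().
  m <- base::scale(as.matrix(m), center = center, scale = scale)
  # Euclidean norm of each row.
  norms <- sqrt(rowSums(m^2))
  # Dot products of all row pairs, divided by the product of their norms.
  (m %*% t(m)) / (norms %o% norms)
}

# Toy data standing in for three embeddings with five dimensions each.
set.seed(1)
toy_embeddings <- matrix(rnorm(3 * 5), nrow = 3)

sim <- cosine_similarity_matrix(toy_embeddings)
round(sim, 3)
```

As in the example above, the result is a symmetric matrix with ones on the diagonal, because each row is compared with itself there.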
