Compute the cosine semantic similarity between each single word's embedding and the aggregated word embedding of all words.

textCentrality(
  words,
  wordembeddings,
  single_wordembeddings = single_wordembeddings_df,
  aggregation = "mean",
  min_freq_words_test = 0
)

Arguments

words

Word or text variable to be plotted.

wordembeddings

Word embeddings from textEmbed for the words to be plotted (i.e., the aggregated word embeddings for the "words" variable).

single_wordembeddings

Word embeddings from textEmbed for individual words (i.e., the decontextualized word embeddings).

aggregation

Method to aggregate the word embeddings (default = "mean"; see also "min", "max" or "[CLS]").

min_freq_words_test

Option to select words that have occurred at least a specified number of times (default = 0) when creating the cosine semantic similarity scores.
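
To make the arguments above concrete, here is a minimal base R sketch of the kind of computation they describe: aggregate the single-word embeddings into one vector, then score each word by its cosine similarity to that vector. This is not the package's implementation. The toy embedding values, the word frequencies, and the helpers aggregate_embeddings() and cosine() are all hypothetical, and the "[CLS]" aggregation is left out because it needs token-level model output.

```r
# Toy single-word embeddings: one row per word, one column per dimension.
# These numbers are made up for illustration only.
single_we <- rbind(
  harmony = c(1, 0, 1),
  peace   = c(1, 1, 0),
  calm    = c(0, 1, 1)
)

# Hypothetical helper: collapse the word rows into one value per dimension.
aggregate_embeddings <- function(m, aggregation = "mean") {
  switch(aggregation,
    mean = colMeans(m),
    min  = apply(m, 2, min),
    max  = apply(m, 2, max),
    stop("unsupported aggregation: ", aggregation)
  )
}

# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Aggregated ("central") embedding and each word's similarity to it.
central <- aggregate_embeddings(single_we, "mean")
central_cosine <- apply(single_we, 1, cosine, b = central)

# Assumed frequency filter in the spirit of min_freq_words_test:
# keep only words that occur at least the given number of times.
word_freq <- c(harmony = 3, peace = 1, calm = 2)
min_freq_words_test <- 2
central_cosine_kept <- central_cosine[word_freq >= min_freq_words_test]
```

Whether the package applies the frequency filter before or after aggregation is not stated here; the sketch filters afterwards purely for simplicity.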

Value

A dataframe with variables (e.g., semantic similarity and frequencies) for the individual words, used for plotting in the textCentralityPlot function.

Examples

wordembeddings <- wordembeddings4
data <- Language_based_assessment_data_8
df_for_plotting <- textCentrality(
  data$harmonywords,
  wordembeddings$harmonywords,
  wordembeddings$singlewords_we
)
df_for_plotting
#> # A tibble: 295 x 4
#>    words         n central_cosine n_percent
#>    <chr>     <int>          <dbl>     <dbl>
#>  1 accepting     2       NaN        0.00504
#>  2 agreeing      1        -0.241    0.00252
#>  3 alcohol       1        -0.0219   0.00252
#>  4 amazed        1        -0.112    0.00252
#>  5 amicable      1        -0.112    0.00252
#>  6 amity         1         0.226    0.00252
#>  7 amused        1        -0.321    0.00252
#>  8 anger         2       NaN        0.00504
#>  9 angry         2       NaN        0.00504
#> 10 animals       1       NaN        0.00252
#> # … with 285 more rows
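
Several rows in the printed result have NaN in central_cosine. As a hedged follow-up, the sketch below shows one way to drop those rows and rank the remaining words before inspecting or plotting them. It is self-contained and uses a hand-built stand-in for df_for_plotting that copies the first six printed rows; with the real object, only the last two statements would be needed.

```r
# Stand-in for df_for_plotting, copied from the first rows printed above.
df_for_plotting <- data.frame(
  words = c("accepting", "agreeing", "alcohol", "amazed", "amicable", "amity"),
  n = c(2L, 1L, 1L, 1L, 1L, 1L),
  central_cosine = c(NaN, -0.241, -0.0219, -0.112, -0.112, 0.226),
  n_percent = c(0.00504, 0.00252, 0.00252, 0.00252, 0.00252, 0.00252),
  stringsAsFactors = FALSE
)

# Keep only rows with a defined similarity score.
scored <- df_for_plotting[!is.nan(df_for_plotting$central_cosine), ]

# Sort so the words most similar to the aggregated embedding come first.
scored <- scored[order(scored$central_cosine, decreasing = TRUE), ]
```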