R/3_2_textSimilarityTest.R
textSimilarityTest.Rd
Test whether there is a significant difference in meaning between two sets of texts (i.e., between their word embeddings).
textSimilarityTest(
  x,
  y,
  Npermutations = 10000,
  method = "paired",
  alternative = c("two_sided", "less", "greater"),
  output.permutations = TRUE,
  N_cluster_nodes = 1,
  seed = 1001
)
Argument | Description |
---|---|
x | Set of word embeddings from textEmbed. |
y | Set of word embeddings from textEmbed. |
Npermutations | Number of permutations (default 10000). |
method | Compute a "paired" or an "unpaired" test. |
alternative | Use a two-sided or a one-sided test (select one of: "two_sided", "less", "greater"). |
output.permutations | If TRUE, returns permuted values in output. |
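N_cluster_nodes | Number of cluster nodes to use; more nodes can make the computation faster (see the parallel package). |
seed | Seed for the random number generator (default 1001); set a different value to draw a different set of permutations. |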
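A list containing the p-value (p.value), the observed cosine similarity (cosine_estimate), a description of the test and, if output.permutations = TRUE, the permuted values (random.estimates.4.null).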
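To make the idea of a paired permutation test on cosine similarity concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper names `row_cosine` and `paired_cosine_perm_test`, the choice of the mean row-wise cosine as the test statistic, the shuffling of the rows of `y` to build the null distribution, and the two-sided p-value formula are all assumptions made for illustration.

```r
# Row-wise cosine similarity between two numeric matrices of equal size.
row_cosine <- function(a, b) {
  rowSums(a * b) / (sqrt(rowSums(a^2)) * sqrt(rowSums(b^2)))
}

# Hypothetical helper (not part of the text package): a paired permutation
# test that compares the observed mean cosine with means obtained after
# randomly re-pairing the rows of y.
paired_cosine_perm_test <- function(x, y, n_perm = 1000, seed = 1001) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(identical(dim(x), dim(y)))
  set.seed(seed)

  # Observed statistic: mean cosine over the original pairs.
  observed <- mean(row_cosine(x, y))

  # Null distribution: break the pairing by shuffling the rows of y.
  null <- replicate(
    n_perm,
    mean(row_cosine(x, y[sample(nrow(y)), , drop = FALSE]))
  )

  # Two-sided p-value (assumed form): how often a permuted estimate lies at
  # least as far from the centre of the null as the observed one, with an
  # add-one correction so the p-value is never exactly zero.
  centre <- mean(null)
  p <- (sum(abs(null - centre) >= abs(observed - centre)) + 1) / (n_perm + 1)

  list(cosine_estimate = observed, null_estimates = null, p.value = p)
}

# Toy data: y is a noisy copy of x, so paired rows should be more similar
# than randomly re-paired rows.
set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40)
y <- x + matrix(rnorm(40 * 5, sd = 0.5), nrow = 40)
res <- paired_cosine_perm_test(x, y, n_perm = 200)
res$p.value
```

With real embeddings, `x` and `y` would be the numeric embedding columns of two equally long sets of texts; the toy matrices above only stand in for them.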
x <- wordembeddings4$harmonywords
y <- wordembeddings4$satisfactionwords
textSimilarityTest(x, y,
  method = "paired",
  Npermutations = 10,
  N_cluster_nodes = 1,
  alternative = "two_sided"
)
#> $random.estimates.4.null
#>  [1] 0.4983119 0.5576852 0.5302025 0.5523948 0.5192839 0.5069734 0.5426047
#>  [8] 0.5364955 0.5186255 0.5659261
#>
#> $embedding_x
#> [1] "x : "
#>
#> $embedding_y
#> [1] "y : "
#>
#> $test_description
#> [1] "permutations = 10 method = paired alternative = two_sided"
#>
#> $time_date
#> [1] "Duration to run the test: 0.031413 secs; Date created: 2021-02-12 19:00:50"
#>
#> $cosine_estimate
#> [1] 0.6069308
#>
#> $p.value
#> [1] 0.09090909
#>
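The printed p-value can be related to the permuted values by hand. The sketch below copies the numbers from the output above and applies an add-one permutation formula; that this is the formula the function uses internally is an assumption, although it does reproduce the printed value for this example.

```r
# Values copied from the example output above.
null_estimates <- c(0.4983119, 0.5576852, 0.5302025, 0.5523948, 0.5192839,
                    0.5069734, 0.5426047, 0.5364955, 0.5186255, 0.5659261)
cosine_estimate <- 0.6069308

# Count the permuted estimates that are at least as large as the observed
# one, then apply the add-one correction (assumed formula).
n_extreme <- sum(null_estimates >= cosine_estimate)
p_value <- (n_extreme + 1) / (length(null_estimates) + 1)
p_value
#> [1] 0.09090909
```

None of the 10 permuted estimates reaches the observed cosine, so under this formula the smallest p-value attainable with 10 permutations is 1/11; a larger Npermutations is needed to resolve smaller p-values.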