Pre-trained dimension reduction (experimental)
textEmbedReduce(
embeddings,
n_dim = NULL,
scalar = "fb20/scalar.csv",
pca = "fb20/rpca_roberta_768_D_20.csv"
)
embeddings
(list) Embedding(s), including tokens, texts and/or word_types.
n_dim
(numeric) Number of dimensions to reduce to.
scalar
(string or matrix) Name or URL of the scalar used to standardize the embeddings. If a URL, the function first checks whether the file has already been downloaded. The string should point to a csv file containing the values used to standardize the embeddings. For more information, see the reference below.
pca
(string or matrix) Name or URL of the PCA weights. If a URL, the function first checks whether the file has already been downloaded. The string should point to a csv file containing the PCA weight matrix used for matrix multiplication. For more information, see the reference below.
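Both `scalar` and `pca` accept a URL, and the function first checks whether the file has been downloaded before. The sketch below illustrates that download-once pattern in base R. It is not the text package's own implementation: `fetch_cached()` and its `cache_dir` default are hypothetical, and cached files are named by the last component of the URL, so two URLs ending in the same file name would share one cache entry.

```r
# Hypothetical helper (not part of the text package): return a local path
# for `url`, downloading the file only if no cached copy exists yet.
fetch_cached <- function(url,
                         cache_dir = file.path(tempdir(), "embed_reduce_cache")) {
  # Create the cache directory on first use
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  # Name the cached file after the last path component of the URL
  local_path <- file.path(cache_dir, basename(url))
  if (!file.exists(local_path)) {
    # Not cached yet: download once, then reuse on later calls
    utils::download.file(url, destfile = local_path, mode = "wb", quiet = TRUE)
  }
  local_path
}
```

A persistent location such as `tools::R_user_dir("text", which = "cache")` (R >= 4.0.0) would keep the files between sessions; `tempdir()` is used here only so the sketch leaves nothing behind.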
Returns the embeddings with a reduced number of dimensions.
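The reduction described by the arguments (standardize with the scalar, then multiply by the PCA weights) can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: it assumes the scalar supplies a per-dimension mean and standard deviation, that the PCA weights form a D x k matrix, and that `n_dim` keeps the first components. `reduce_embeddings()`, `means`, `sds` and `weights` are hypothetical names, and the toy weights come from `prcomp()` rather than from the fb20 files.

```r
# Hypothetical sketch of "standardize, then project with PCA weights".
# Assumes `means`/`sds` stand in for the scalar csv and `weights` (D x k)
# stands in for the PCA weight csv; not the text package's own code.
reduce_embeddings <- function(emb, means, sds, weights, n_dim = NULL) {
  emb <- as.matrix(emb)
  stopifnot(ncol(emb) == length(means),
            ncol(emb) == length(sds),
            ncol(emb) == nrow(weights))
  # Standardize each column: subtract its mean, divide by its SD
  z <- sweep(sweep(emb, 2, means, "-"), 2, sds, "/")
  # Project the standardized embeddings onto the principal components
  reduced <- z %*% weights
  # Hypothetical: keep only the first n_dim components when requested
  if (!is.null(n_dim)) {
    reduced <- reduced[, seq_len(n_dim), drop = FALSE]
  }
  reduced
}

# Toy example: 5 "texts" with 4-dimensional embeddings
set.seed(1)
emb <- matrix(rnorm(20), nrow = 5)
means <- colMeans(emb)
sds <- apply(emb, 2, sd)
# PCA weights fitted on the standardized toy data (4 x 4 rotation matrix)
weights <- prcomp(scale(emb))$rotation
reduced <- reduce_embeddings(emb, means, sds, weights, n_dim = 2)
dim(reduced)  # 5 rows, 2 columns
```

In this toy example the weights were fitted on standardized data, so the projection matches the `prcomp()` scores only when the same standardization is applied first.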
To use this method please see and cite:
Ganesan, A. V., Matero, M., Ravula, A. R., Vu, H., & Schwartz, H. A. (2021, June). Empirical evaluation of pre-trained transformers for human-level NLP: The role of sample size and dimensionality. In Proceedings of the Conference. Association for Computational Linguistics. North American Chapter. Meeting (Vol. 2021, p. 4515). NIH Public Access.
See the GitHub repository Empirical-Evaluation.
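if (FALSE) { # \dontrun{
library(text)
# Reduce the text-level embeddings in the example dataset word_embeddings_4
# using the default fb20 scalar and PCA weights.
embeddings <- textEmbedReduce(word_embeddings_4$texts)
} # }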