
Pre-trained dimension reduction (experimental)

Usage

textEmbedReduce(
  embeddings,
  n_dim = NULL,
  scalar = "fb20/scalar.csv",
  pca = "fb20/rpca_roberta_768_D_20.csv"
)

Arguments

embeddings

(list) Embedding(s), including tokens, texts and/or word_types.

n_dim

(numeric) Number of dimensions to reduce to.

scalar

(string or matrix) Name or URL of the scalar used to standardize the embeddings. If a URL is given, the function first checks whether the file has already been downloaded. The string should point to a csv file containing a matrix with the values used to standardize the embeddings before the PCA projection. For more information, see the reference below.

pca

(string or matrix) Name or URL of the PCA weights. If a URL is given, the function first checks whether the file has already been downloaded. The string should point to a csv file containing a matrix with the PCA weights used for matrix multiplication. For more information, see the reference below.
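The scalar and pca arguments describe a two-step transform: standardize each embedding dimension, then multiply the standardized embeddings by the PCA weights. The base-R sketch below illustrates only that arithmetic on a toy matrix. The helper reduce_sketch() and the matrix layouts it expects (means in the first row of scalar and scales in the second; pca as a D x k matrix) are assumptions for illustration, not the internal format read by textEmbedReduce().

```r
# Minimal sketch of "standardize, then project" dimension reduction.
# ASSUMPTIONS (hypothetical, not taken from the text package):
#   scalar: 2-row matrix, row 1 = per-dimension means, row 2 = per-dimension scales
#   pca:    D x k matrix of projection weights
reduce_sketch <- function(x, scalar, pca) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == ncol(scalar), ncol(x) == nrow(pca))
  # Standardize each embedding dimension with the stored means and scales
  x_std <- sweep(x, 2, scalar[1, ], "-")
  x_std <- sweep(x_std, 2, scalar[2, ], "/")
  # Project onto the k pre-trained components via matrix multiplication
  x_std %*% pca
}

# Toy example: 3 texts with 4-dimensional embeddings reduced to 2 dimensions
x <- matrix(c(1, 2, 3, 4,
              2, 3, 4, 5,
              0, 1, 0, 1), nrow = 3, byrow = TRUE)
scalar <- rbind(mean = c(1, 2, 2, 3), scale = c(1, 1, 2, 2))
pca <- matrix(c(1, 0,
                0, 1,
                1, 0,
                0, 1), nrow = 4, byrow = TRUE)
reduced <- reduce_sketch(x, scalar, pca)
dim(reduced)  # 3 2
```

The two sweep() calls are equivalent to base R's scale(x, center = scalar[1, ], scale = scalar[2, ]); they are spelled out here to make each step explicit.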

Value

Returns embeddings with reduced number of dimensions.

Details

If you use this method, please see and cite:
Ganesan, A. V., Matero, M., Ravula, A. R., Vu, H., & Schwartz, H. A. (2021, June). Empirical evaluation of pre-trained transformers for human-level NLP: The role of sample size and dimensionality. In Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting (Vol. 2021, p. 4515). NIH Public Access.

See the Empirical-Evaluation repository on GitHub.
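"Pre-trained" here means that the standardization values and projection weights are learned once on a reference set and then reused on new embeddings. The sketch below illustrates that idea with ordinary PCA from stats::prcomp() on hypothetical random data; it is not necessarily the procedure used to build the package's default fb20 files (see the cited work for that). The 2-row layout of my_scalar is an assumption for illustration and may not match what textEmbedReduce() expects when a matrix is passed to scalar or pca.

```r
# Sketch: learn standardization values and PCA weights on a reference set,
# then reuse them on new data. Uses ordinary PCA (stats::prcomp), which may
# differ from how the package's pre-trained files were produced.
set.seed(1)
reference <- matrix(rnorm(200 * 10), nrow = 200)  # hypothetical 10-dim embeddings
new_texts <- matrix(rnorm(5 * 10), nrow = 5)      # hypothetical new embeddings

k <- 3  # number of dimensions to keep
fit <- prcomp(reference, center = TRUE, scale. = TRUE)

# Stored "scalar" (ASSUMED layout: row 1 = means, row 2 = scales) and "pca" weights
my_scalar <- rbind(mean = fit$center, scale = fit$scale)  # 2 x 10
my_pca <- fit$rotation[, 1:k]                              # 10 x k

# Apply the stored values to new data: standardize, then project
new_std <- scale(new_texts, center = my_scalar["mean", ], scale = my_scalar["scale", ])
reduced <- new_std %*% my_pca

# Same result as projecting the new data with the fitted PCA object
all.equal(unname(reduced), unname(predict(fit, new_texts)[, 1:k]))  # TRUE
```

Because the reference set is only needed once, the same small matrices can reduce any number of new embeddings without refitting.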


Examples

if (FALSE) { # \dontrun{
library(text)
# Reduce the text-level embeddings in the example data word_embeddings_4
embeddings <- textEmbedReduce(word_embeddings_4$texts)
} # }
