R/1_1_textEmbed.R
textEmbedLayerAggregation.Rd
Select and aggregate layers of hidden states to form word embeddings.
textEmbedLayerAggregation(
  word_embeddings_layers,
  layers = "all",
  aggregation_from_layers_to_tokens = "concatenate",
  aggregation_from_tokens_to_texts = "mean",
  return_tokens = FALSE,
  tokens_select = NULL,
  tokens_deselect = NULL
)
word_embeddings_layers
Layers returned by textEmbedRawLayers.
layers
The numbers of the layers to be aggregated (e.g., c(11:12) to aggregate the eleventh and twelfth). Note that layer 0 is the input embedding to the transformer and should normally not be used. Selecting "all" therefore excludes layer 0.
aggregation_from_layers_to_tokens
Method used to aggregate across the layers for each word/token: "min", "max" or "mean", which take the minimum, maximum or mean across each column; or "concatenate", which links the layers of the word embedding together into one long row. Default is "concatenate".
aggregation_from_tokens_to_texts
Method used to aggregate across the word/token embeddings of each text: "min", "max" or "mean", which take the minimum, maximum or mean across each column; or "concatenate", which links the word/token embeddings together into one long row. Default is "mean".
return_tokens
If TRUE, return the tokens used by the specified transformer model (default FALSE).
tokens_select
Option to select only the embeddings linked to specific tokens, such as "[CLS]" and "[SEP]" (default NULL).
tokens_deselect
Option to deselect the embeddings linked to specific tokens, such as "[CLS]" and "[SEP]" (default NULL).
A tibble with word embeddings. Note that layer 0 is the input embedding to the transformer, which is normally not used.
See textEmbedRawLayers and textEmbed.
# \donttest{
# word_embeddings_layers <- textEmbedRawLayers(Language_based_assessment_data_8$harmonywords[1],
# layers = 11:12)
# word_embeddings <- textEmbedLayerAggregation(word_embeddings_layers$context, layers = 11)
# }
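
The four aggregation methods named in the arguments ("min", "max", "mean" and "concatenate") can be illustrated on a toy matrix in base R. This is only a sketch of the column-wise idea and is not the package's implementation: the helper aggregate_rows and the numeric values are hypothetical, and the rows stand in for the layers of a single token (or, equally, for the tokens of a single text).

```r
# Toy stand-in for one token's hidden states: 2 layers x 3 dimensions.
# Illustrative values only; real layers come from textEmbedRawLayers.
toy_layers <- rbind(
  layer_11 = c(0.1, 0.4, -0.2),
  layer_12 = c(0.3, 0.0, 0.6)
)

# Hypothetical helper (not part of the text package): aggregate the rows
# of a numeric matrix with one of the four documented methods.
aggregate_rows <- function(x, method = c("concatenate", "mean", "min", "max")) {
  method <- match.arg(method)
  switch(method,
    mean = colMeans(x),               # mean of each column
    min = apply(x, 2, min),           # minimum of each column
    max = apply(x, 2, max),           # maximum of each column
    concatenate = as.vector(t(x))     # row 1, then row 2, ... as one long row
  )
}

agg_mean <- aggregate_rows(toy_layers, "mean")        # 0.2 0.2 0.2
agg_min <- aggregate_rows(toy_layers, "min")          # 0.1 0.0 -0.2
agg_max <- aggregate_rows(toy_layers, "max")          # 0.3 0.4 0.6
agg_cat <- aggregate_rows(toy_layers, "concatenate")  # length 6
```

With "min", "max" and "mean" the result keeps the original number of dimensions (3 here), whereas "concatenate" multiplies it by the number of rows being joined (2 x 3 = 6 here).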