Summarize texts. (experimental)
Usage
textSum(
  x,
  min_length = 10L,
  max_length = 20L,
  model = "t5-small",
  device = "cpu",
  tokenizer_parallelism = FALSE,
  logging_level = "warning",
  force_return_results = FALSE,
  return_text = TRUE,
  return_tensors = FALSE,
  clean_up_tokenization_spaces = FALSE,
  set_seed = 202208L
)
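The following is a minimal sketch of a call using the signature above, not a definitive recipe. It assumes the text package and its Python backend are installed and that the model can be downloaded on first use. The `reviews` data frame and the `run_model` switch are hypothetical names introduced only for illustration; the model call is skipped unless `run_model` is set to TRUE.

```r
# Sketch only: assumes the text package and its Python backend are installed,
# and that the model can be downloaded on first use.
run_model <- FALSE  # hypothetical switch; set to TRUE to actually run the model

# Hypothetical example input: a data frame with one character variable.
reviews <- data.frame(
  review = c(
    "The hotel was close to the station, the staff were friendly, and the room was quiet at night.",
    "Breakfast ended too early and the wifi kept dropping, but the view from the balcony was lovely."
  ),
  stringsAsFactors = FALSE
)

if (run_model && requireNamespace("text", quietly = TRUE)) {
  # Argument names and defaults are taken from the usage signature above.
  summaries <- text::textSum(
    reviews,
    min_length = 10L,
    max_length = 20L,
    model = "t5-small",
    device = "cpu"
  )
  print(summaries)
}
```

The structure of the returned object is not described here, so the sketch only prints it rather than assuming particular column names.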
Arguments
- x
(string) A variable or a tibble/dataframe with at least one character variable.
- min_length
(explicit integer; e.g., 10L) The minimum number of tokens in the summarized output.
- max_length
(explicit integer higher than min_length; e.g., 20L) The maximum number of tokens in the summarized output.
- model
(string) Specification of a pre-trained language model that has been fine-tuned on a summarization task, such as 'bart-large-cnn', 't5-small', 't5-base', 't5-large', 't5-3b', 't5-11b'.
- device
(string) Device to use: 'cpu', 'gpu', or 'gpu:k' where k is a specific device number.
- tokenizer_parallelism
(boolean) If TRUE this will turn on tokenizer parallelism.
- logging_level
(string) Set the logging level. Options (ordered from less logging to more logging): critical, error, warning, info, debug
- force_return_results
(boolean) Stop returning some incorrectly formatted/structured results. This setting does NOT evaluate the actual results (whether or not they make sense, exist, etc.). It only ensures that the returned results are formatted correctly (e.g., does the question-answering dictionary contain the key "answer"; do the sentiments from textClassify contain the labels "positive" and "negative").
- return_text
(boolean) Whether or not the outputs should include the decoded text.
- return_tensors
(boolean) Whether or not the output should include the prediction tensors (as token indices).
- clean_up_tokenization_spaces
(boolean) Option to clean up the potential extra spaces in the returned text.
- set_seed
(explicit integer; e.g., 202208L) The random seed to set.
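Several arguments above ask for an explicit integer. In R, a bare number such as 10 is stored as a double, and the L suffix is what makes it an integer. The sketch below illustrates the difference and wraps the documented constraints on min_length and max_length in a small check. The helper `check_sum_lengths` is hypothetical and is not part of the text package.

```r
# A bare number literal is a double; the L suffix makes it an integer.
is.integer(10)   # FALSE
is.integer(10L)  # TRUE

# Hypothetical helper (not part of the text package): checks the documented
# constraints on min_length and max_length before calling textSum().
check_sum_lengths <- function(min_length, max_length) {
  stopifnot(
    is.integer(min_length), length(min_length) == 1L,
    is.integer(max_length), length(max_length) == 1L,
    max_length > min_length
  )
  invisible(TRUE)
}

check_sum_lengths(10L, 20L)  # passes silently
```

Calling `check_sum_lengths(10, 20)` or `check_sum_lengths(20L, 10L)` raises an error, which surfaces a mistyped argument before any model is loaded.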
See also
See textClassify, textGeneration, textNER, textSum, textQA, textTranslate.