Summarize texts (STILL UNDER DEVELOPMENT)

textSum(
  x,
  min_length = 10L,
  max_length = 20L,
  model = "t5-small",
  device = "cpu",
  tokenizer_parallelism = FALSE,
  logging_level = "warning",
  return_incorrect_results = FALSE,
  return_text = TRUE,
  return_tensors = FALSE,
  clean_up_tokenization_spaces = FALSE
)
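
A minimal sketch of a call with the defaults is shown below (this assumes the text package is loaded and its Python backend has been set up, e.g., with textrpp_install() and textrpp_initialize(); the input sentence is illustrative only):

library(text)
# Summarize a single string with the default t5-small model.
sum_one <- textSum(
  "Participants wrote short descriptions of their daily lives,
   which were then analyzed with language models."
)
sum_one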

Arguments

x

(string) A character variable, or a tibble/dataframe with at least one character variable.

min_length

(explicit integer; e.g., 10L) The minimum number of tokens in the summarized output.

max_length

(explicit integer higher than min_length; e.g., 20L) The maximum number of tokens in the summarized output.

model

(string) Specification of a pre-trained language model that has been fine-tuned on a summarization task, such as 'bart-large-cnn', 't5-small', 't5-base', 't5-large', 't5-3b', 't5-11b'.

device

(string) Device to use: 'cpu', 'gpu', or 'gpu:k' where k is a specific device number.

tokenizer_parallelism

(boolean) If TRUE, this will turn on tokenizer parallelism.

logging_level

(string) Set the logging level. Options (ordered from least to most logging): critical, error, warning, info, debug.

return_incorrect_results

(boolean) Many models are not built to provide summarization; this setting stops them from returning incorrect results.

return_text

(boolean) Whether or not the outputs should include the decoded text.

return_tensors

(boolean) Whether or not the output should include the prediction tensors (as token indices).

clean_up_tokenization_spaces

(boolean) Option to clean up potential extra spaces in the returned text.
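
As a hedged sketch tying several of the arguments above together (the input text and object name are illustrative, and device = "gpu:0" assumes a CUDA-capable GPU is available; use "cpu" otherwise):

# Sketch only: explicit length bounds, GPU device, decoded text output.
sum_sketch <- textSum(
  "Text to summarize with explicit length bounds.",
  min_length = 5L,
  max_length = 15L,
  device = "gpu:0",
  return_text = TRUE,
  return_tensors = FALSE,
  clean_up_tokenization_spaces = TRUE
)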

Value

A tibble with the summarized text(s).

Examples

# \donttest{
sum_examples <- textSum(
  Language_based_assessment_data_8[1:2, 1:2],
  min_length = 5L,
  max_length = 10L
)
# }
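
A fuller sketch along the same lines, swapping in one of the alternative models listed under model (hedged: the checkpoint must be downloadable in your environment, and larger models are correspondingly slower):

# \donttest{
sum_bart <- textSum(
  Language_based_assessment_data_8[1:2, 1:2],
  model = "bart-large-cnn",
  min_length = 10L,
  max_length = 20L
)
# }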