textGeneration() predicts the words that will follow a specified text prompt. (experimental)
Usage
textGeneration(
x,
model = "gpt2",
device = "cpu",
tokenizer_parallelism = FALSE,
max_length = NULL,
max_new_tokens = 20,
min_length = 0,
min_new_tokens = NULL,
logging_level = "warning",
force_return_results = FALSE,
return_tensors = FALSE,
return_full_text = TRUE,
clean_up_tokenization_spaces = FALSE,
prefix = "",
handle_long_generation = NULL,
set_seed = 202208L
)
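A minimal sketch of a default call, assuming the text package and its Python backend are installed and initialized (e.g., via textrpp_install() and textrpp_initialize()):

library(text)

# Generate up to 20 new tokens after a short prompt, using the default
# gpt2 model on CPU; set_seed makes the sampling reproducible.
generated <- textGeneration(
  x = "The meaning of life is",
  model = "gpt2",
  device = "cpu",
  max_new_tokens = 20,
  set_seed = 202208L
)
generated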
Arguments
- x
(string) A character variable or a tibble/dataframe with at least one character variable.
- model
(string) Specification of a pre-trained language model that has been trained with an autoregressive language modeling objective, which includes uni-directional models (e.g., gpt2).
- device
(string) Device to use: 'cpu', 'gpu', or 'gpu:k', where k is a specific device number.
- tokenizer_parallelism
(boolean) If TRUE, this will turn on tokenizer parallelism.
- max_length
(Integer) The maximum total length of the generated sequence (the length of the input prompt plus max_new_tokens). Its effect is overridden by `max_new_tokens`, if also set. Defaults to NULL.
- max_new_tokens
(Integer) The maximum number of tokens to generate, ignoring the number of tokens in the prompt. The default value is 20 (see the sketch after this argument list).
- min_length
(Integer) The minimum total length of the sequence to be generated (the length of the input prompt plus min_new_tokens). Its effect is overridden by `min_new_tokens`, if also set. The default value is 0.
- min_new_tokens
(Integer) The minimum number of tokens to generate, ignoring the number of tokens in the prompt. Default is NULL.
- logging_level
(string) Set the logging level. Options (ordered from least to most logging): critical, error, warning, info, debug.
- force_return_results
(boolean) If TRUE, returns results even if they are not formatted/structured as expected. This setting does NOT evaluate the actual results (whether or not they make sense, exist, etc.); it only checks that the returned results are formatted correctly (e.g., that a question-answering dictionary contains the key "answer", or that sentiment output from textClassify contains the labels "positive" and "negative").
- return_tensors
(boolean) Whether or not the output should include the prediction tensors (as token indices).
- return_full_text
(boolean) If FALSE, only the newly generated text is returned; if TRUE, the full text (prompt plus generated text) is returned. (This setting is only meaningful when text, rather than tensors, is returned; see the sketch after this argument list.)
- clean_up_tokenization_spaces
(boolean) Whether to clean up potential extra spaces in the returned text.
- prefix
(string) Option to add a prefix to the prompt.
- handle_long_generation
(string) By default, this function does not handle long generation (prompts that exceed the model's maximum length; more info: https://github.com/huggingface/transformers/issues/14033#issuecomment-948385227). This setting provides ways to work around the problem: NULL: the default, where no particular strategy is applied. "hole": truncates the left side of the input, leaving a gap wide enough for generation to happen (this may truncate much of the prompt and is not suitable when the requested generation exceeds the model capacity). See the example at the end of this page.
- set_seed
(Integer) Set a seed for reproducible generation. The default value is 202208L.
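As referenced in the argument descriptions above, a short sketch of how the token-length and output controls interact; the prompt is illustrative:

# max_new_tokens caps only the newly generated tokens and overrides
# max_length when both are set.
full <- textGeneration(
  x = "Once upon a time",
  max_new_tokens = 10,
  return_full_text = TRUE    # returns prompt + generated text
)

# With return_full_text = FALSE, only the newly generated text is returned.
new_only <- textGeneration(
  x = "Once upon a time",
  max_new_tokens = 10,
  return_full_text = FALSE
)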
See also
textClassify, textNER, textSum, textQA, textTranslate
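A hedged sketch of the handle_long_generation = "hole" workaround for prompts that exceed the model's maximum length; the over-long prompt below is constructed artificially for illustration:

# "hole" truncates the left side of an over-long input, leaving enough
# room for max_new_tokens to be generated.
long_prompt <- paste(rep("word", 2000), collapse = " ")  # hypothetical over-long input
out <- textGeneration(
  x = long_prompt,
  handle_long_generation = "hole",
  max_new_tokens = 20
)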