Predicts the words that will follow a specified text prompt. STILL UNDER DEVELOPMENT.

textGeneration(
  x,
  model = "gpt2",
  device = "cpu",
  tokenizer_parallelism = FALSE,
  logging_level = "warning",
  return_incorrect_results = FALSE,
  return_tensors = FALSE,
  return_text = TRUE,
  return_full_text = TRUE,
  clean_up_tokenization_spaces = FALSE,
  prefix = "",
  handle_long_generation = NULL
)
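
A minimal usage sketch, assuming the text package and its Python backend are installed and initialized (e.g., with textrpp_initialize()):

library(text)

# Generate a continuation of the prompt with the default gpt2 model on CPU
generated <- textGeneration(
  "The meaning of life is",
  model = "gpt2",
  device = "cpu"
)
generated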

Arguments

x

(string) A character variable, or a tibble/dataframe with at least one character variable.

model

(string) Specification of a pre-trained language model that has been trained with an autoregressive language modeling objective, which includes uni-directional models (e.g., gpt2).

device

(string) Device to use: 'cpu', 'gpu', or 'gpu:k', where k is a specific device number.

tokenizer_parallelism

(boolean) If TRUE this will turn on tokenizer parallelism.

logging_level

(string) Set the logging level. Options (ordered from least to most logging): critical, error, warning, info, debug.

return_incorrect_results

(boolean) Many models are not created to provide text generation; when FALSE (the default), this setting stops such models from returning incorrect results.

return_tensors

(boolean) Whether or not the output should include the prediction tensors (as token indices).

return_text

(boolean) Whether or not the outputs should include the decoded text.

return_full_text

(boolean) If FALSE, only the added (generated) text is returned; otherwise the full text is returned. This setting is only meaningful when return_text is set to TRUE. (A sketch follows the argument list below.)

clean_up_tokenization_spaces

(boolean) Whether to clean up potential extra spaces in the returned text.

prefix

(string) Option to add a prefix to the prompt.

handle_long_generation

By default, this function does not handle long generation (prompts that exceed the model's maximum length; for more information see https://github.com/huggingface/transformers/issues/14033#issuecomment-948385227). This setting provides ways to work around the problem:

NULL: the default, where no particular strategy is applied.

"hole": truncates the left side of the input and leaves a gap wide enough to let generation happen. (This might truncate a large part of the prompt and is not suitable when the requested generation exceeds the model capacity.)
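
Two hedged sketches of these settings (the prompts are illustrative; all other arguments keep their defaults):

# Return only the generated continuation, without repeating the prompt
only_new <- textGeneration(
  "The meaning of life is",
  return_full_text = FALSE
)

# Work around a prompt that exceeds the model maximum length by letting
# the pipeline truncate the left side of the input (the "hole" strategy)
long_prompt <- paste(rep("background text", 500), collapse = " ")
generated_long <- textGeneration(
  long_prompt,
  handle_long_generation = "hole"
)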

Value

A tibble with generated text.

Examples

# \donttest{
generated_text <- textGeneration("The meaning of life is")
generated_text
# }
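
# A further sketch (not part of the original examples): passing a tibble
# with one character column; the column name "texts" is hypothetical.
# \donttest{
prompts <- tibble::tibble(texts = c("The meaning of life is",
                                    "Once upon a time"))
generated_tibble <- textGeneration(prompts, model = "gpt2")
generated_tibble
# }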