  1. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
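The snippet above is cut off before it explains what the Keras `Tokenizer` actually does. As a hedged, plain-Python sketch (not the Keras implementation): `fit_on_texts` builds a word index ranked by frequency, and `num_words` caps how many of the most frequent words survive when texts are converted to sequences.

```python
from collections import Counter

def fit_on_texts(texts):
    """Build a word -> index map ranked by frequency (index 1 = most
    frequent), mimicking what Keras Tokenizer.fit_on_texts does."""
    counts = Counter(w for t in texts for w in t.lower().split())
    # Index 0 is reserved (padding), so the most frequent word gets index 1.
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, num_words):
    """Convert texts to index lists, dropping any word whose index is
    >= num_words -- the effect of Tokenizer(num_words=my_max)."""
    return [[word_index[w] for w in t.lower().split()
             if word_index.get(w, num_words) < num_words]
            for t in texts]

texts = ["the cat sat", "the cat ran", "a dog ran"]
wi = fit_on_texts(texts)
# With num_words=3, only indices 1 and 2 ("the", "cat") survive.
print(texts_to_sequences(texts, wi, num_words=3))  # [[1, 2], [1, 2], []]
```

Note that in Keras itself the full word index is always built; `num_words` only takes effect when sequences are produced.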

  2. python - AutoTokenizer.from_pretrained fails to load locally saved ...

    from transformers import AutoTokenizer, AutoConfig tokenizer = AutoTokenizer.from_pretrained('distilroberta-base') config = AutoConfig.from_pretrained('distilroberta …
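The usual round trip behind this question is `tokenizer.save_pretrained(directory)` followed by `AutoTokenizer.from_pretrained(directory)` with the same local path; loading fails if the directory is missing files such as `tokenizer_config.json`. Since the real calls need a model download, here is a self-contained stand-in that mimics the save/load pattern with plain JSON:

```python
import json
import os
import tempfile

def save_pretrained(vocab, directory):
    """Write tokenizer state to disk -- a stand-in for
    tokenizer.save_pretrained(directory) in transformers."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "vocab.json"), "w") as f:
        json.dump(vocab, f)

def from_pretrained(directory):
    """Reload the state from that directory -- a stand-in for
    AutoTokenizer.from_pretrained(directory) with a local path."""
    with open(os.path.join(directory, "vocab.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    save_pretrained({"hello": 0, "world": 1}, d)
    reloaded = from_pretrained(d)
print(reloaded)  # {'hello': 0, 'world': 1}
```

The key point the sketch illustrates: `from_pretrained` accepts a local directory path exactly as it accepts a hub model id, so saving and loading must target the same directory.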

  3. How to apply max_length to truncate the token sequence from the left …

    May 11, 2022 · In the HuggingFace tokenizer, applying the max_length argument specifies the length of the tokenized text. I believe it truncates the sequence to max_length-2 (if truncation=True) by cutting …
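In recent transformers versions the direction is controlled by `tokenizer.truncation_side = "left"` (the default is `"right"`). The index arithmetic the snippet describes, with room reserved for special tokens such as [CLS]/[SEP], can be sketched with plain lists:

```python
def truncate(ids, max_length, side="right", num_special=2):
    """Keep at most max_length tokens, reserving num_special slots for
    special tokens -- a plain-list sketch of truncation_side behavior;
    the real API is tokenizer.truncation_side = 'left'."""
    keep = max_length - num_special   # room left for content tokens
    if len(ids) <= keep:
        return ids
    # "left" drops tokens from the start, "right" from the end.
    return ids[-keep:] if side == "left" else ids[:keep]

ids = [10, 11, 12, 13, 14, 15]
print(truncate(ids, max_length=5, side="right"))  # [10, 11, 12]
print(truncate(ids, max_length=5, side="left"))   # [13, 14, 15]
```

This matches the `max_length - 2` observation in the snippet: two slots go to the special tokens, so only `max_length - 2` content tokens survive.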

  4. How to add all standard special tokens to my hugging face tokenizer …

    Aug 11, 2022 · I want all special tokens to always be available. How do I do this? My first attempt to give it to my tokenizer: def does_t5_have_sep_token(): tokenizer: PreTrainedTokenizerFast = …

  5. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · # add the tokens to the tokenizer vocabulary tokenizer.add_tokens(list(new_tokens)) # add new, random embeddings for the new tokens model.resize_token_embeddings(len(tokenizer)) …
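The two truncated calls above form a pair: extend the vocabulary, then grow the embedding matrix to match. A self-contained sketch of what that pair does (plain Python standing in for `tokenizer.add_tokens` and `model.resize_token_embeddings`):

```python
import random

def add_tokens(vocab, new_tokens):
    """Append tokens not already in the vocab; return how many were
    added (mirrors tokenizer.add_tokens, which skips duplicates)."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added

def resize_embeddings(matrix, new_size, dim):
    """Grow the embedding matrix to new_size rows, initializing new rows
    randomly (mirrors model.resize_token_embeddings(len(tokenizer)))."""
    while len(matrix) < new_size:
        matrix.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return matrix

vocab = {"hello": 0, "world": 1}
emb = [[0.1, 0.2], [0.3, 0.4]]
n = add_tokens(vocab, ["<mol>", "world", "<rxn>"])  # "world" is a duplicate
resize_embeddings(emb, len(vocab), dim=2)
print(n, len(vocab), len(emb))  # 2 4 4
```

If the resize step is skipped, the model's embedding lookup will index out of range on any of the new token ids, which is why the two calls always go together.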

  6. Unable to get the tokenizer of Gemma-3 - Stack Overflow

    Mar 22, 2025 · I am trying to get the tokenizer using the huggingface AutoTokenizer library, but I am unable to fetch it. Is there any other way to get it? Where am I going wrong?

  7. json - Tokenizer.from_file() HUGGINGFACE - Stack Overflow

    Nov 1, 2022 · Tokenizer.from_file() HUGGINGFACE: Exception: data did not match any variant of untagged enum ModelWrapper

  8. python - How to fix "ValueError: This tokenizer cannot be instantiated ...

    Feb 28, 2023 · I'm trying to run a Hugging Face model using the following code in Google Colab: !pip install transformers from transformers import AutoTokenizer tokenizer = …
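The error in the title is truncated, but one common full form of this ValueError asks you to install sentencepiece, the dependency required by the slow tokenizers of models such as T5 and XLNet. Assuming that is the cause here (the snippet does not say which model is being loaded), a minimal Colab fix is:

```shell
# Install the slow-tokenizer dependency alongside transformers, then
# restart the Colab runtime so the already-imported library picks it up.
pip install transformers sentencepiece
```

After installing, the runtime must be restarted before re-running the `AutoTokenizer.from_pretrained(...)` cell, since the missing module was probed at import time.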

  9. How to load the saved tokenizer from pretrained model

    Oct 16, 2019 · I fine-tuned a pretrained BERT model in PyTorch using the huggingface transformers library. All the training/validation is done on a GPU in the cloud. At the end of training, I save the model and …

  10. what's difference between tokenizer.encode and tokenizer.encode_plus …

    The main difference stems from the additional information that encode_plus provides. If you read the documentation on the respective functions, there is a slight difference for encode(): …
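The shape of that difference can be shown with a toy mock (not the transformers implementation): `encode` returns a bare list of ids, while `encode_plus` wraps the same ids in a dict together with the extra fields the model needs. In recent transformers versions, calling the tokenizer directly, `tokenizer(text)`, returns the `encode_plus`-style dict.

```python
def encode(tokens, vocab):
    """Return just the token ids -- the shape of tokenizer.encode()."""
    return [vocab[t] for t in tokens]

def encode_plus(tokens, vocab):
    """Return the ids plus the extra fields encode_plus adds on top:
    an attention mask and token type ids (toy version, no padding,
    truncation, or overflow handling)."""
    ids = encode(tokens, vocab)
    return {
        "input_ids": ids,
        "attention_mask": [1] * len(ids),
        "token_type_ids": [0] * len(ids),
    }

vocab = {"hello": 7, "world": 9}
print(encode(["hello", "world"], vocab))       # [7, 9]
print(encode_plus(["hello", "world"], vocab))
```

The dict form is what models expect as keyword arguments, which is why `encode_plus` (and the `__call__` interface that superseded it) is preferred for feeding a model.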