defmodule Tokenizers do
  @moduledoc """
  Elixir bindings to [Hugging Face Tokenizers](https://github.com/huggingface/tokenizers).

  Hugging Face describes the Tokenizers library as:

  > Fast State-of-the-art tokenizers, optimized for both research and
  > production
  >
  > 🤗 Tokenizers provides an implementation of today's most used
  > tokenizers, with a focus on performance and versatility. These
  > tokenizers are also used in 🤗 Transformers.

  A tokenizer is effectively a pipeline of transformations that take
  a text input and return an encoded version of that text (`t:Tokenizers.Encoding.t/0`).

  The main entrypoint to this library is the `Tokenizers.Tokenizer`
  module, which defines the `t:Tokenizers.Tokenizer.t/0` struct, a
  container holding the constituent parts of the pipeline. Most
  functionality is in that module.
  """
end