# Processors

In the Transformers library, processors can mean two different things:
- the objects that pre-process inputs for multi-modal models such as [Wav2Vec2](../model_doc/wav2vec2) (speech and text)
  or [CLIP](../model_doc/clip) (text and vision)
- deprecated objects that were used in older versions of the library to preprocess data for GLUE or SQuAD.

## Multi-modal processors[[transformers.ProcessorMixin]]

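Any multi-modal model requires an object to encode or decode data that combines several modalities (among text,
vision and audio). This is handled by objects called processors, which group together two or more processing objects
such as tokenizers (for the text modality), image processors (for vision) and feature extractors (for audio).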

Those processors inherit from the following base class, which implements the saving and loading functionality:

#### transformers.ProcessorMixin[[transformers.ProcessorMixin]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L484)

This is a mixin used to provide saving/loading functionality for all processor classes.

#### apply_chat_template[[transformers.ProcessorMixin.apply_chat_template]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1522)

Similar to the `apply_chat_template` method on tokenizers, this method applies a Jinja template to input
conversations to turn them into a single tokenizable string.

The input is expected to be in the following format, where each message's content is a list of text items and,
optionally, image or video items. When `return_dict=True`, any image, video, URL or local path provided this way is
used to build `pixel_values`; otherwise, only the formatted text (optionally tokenized) is returned.

```python
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://www.ilankelman.org/stopsigns/australia.jpg"},
            {"type": "text", "text": "Please describe this image in detail."},
        ],
    },
]
```

**Parameters:**

conversation (`Union[list[dict[str, str]], list[list[dict[str, str]]]]`) : The conversation to format, or a batch of conversations.

chat_template (`Optional[str]`, *optional*) : The Jinja template to use for formatting the conversation. If not provided, the tokenizer's chat template is used.
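
To make the flattening step concrete, here is a library-free sketch. It is a hypothetical stand-in, not a real model's template: the `USER:`/`ASSISTANT:` line layout and the `<image>`/`<video>` placeholder strings are assumptions, since each model ships its own Jinja chat template. Like the `conversation` parameter, it accepts either a single conversation or a batch (a list of conversations).

```python
# Toy stand-in for a processor chat template (NOT any real model's template).
# Assumptions: one "ROLE: content" line per message, "<image>"/"<video>"
# placeholders for media items, and a trailing "ASSISTANT:" generation prompt.
def render_conversation(conversation, add_generation_prompt=True):
    # A batch is a list of conversations; render each one independently.
    if conversation and isinstance(conversation[0], list):
        return [render_conversation(c, add_generation_prompt) for c in conversation]

    lines = []
    for message in conversation:
        pieces = []
        for item in message["content"]:
            if item["type"] == "text":
                pieces.append(item["text"])
            elif item["type"] in ("image", "video"):
                # Media is replaced by a placeholder; the URL/path is not fetched here.
                pieces.append(f"<{item['type']}>")
        lines.append(f"{message['role'].upper()}: {' '.join(pieces)}")
    if add_generation_prompt:
        lines.append("ASSISTANT:")
    return "\n".join(lines)


conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://www.ilankelman.org/stopsigns/australia.jpg"},
            {"type": "text", "text": "Please describe this image in detail."},
        ],
    },
]

print(render_conversation(conversation))
# USER: <image> Please describe this image in detail.
# ASSISTANT:
```

With an actual processor, a comparable call would look like `processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=True, return_dict=True)`, where the template and the handling of media placeholders come from the checkpoint rather than from hard-coded strings.
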
#### batch_decode[[transformers.ProcessorMixin.batch_decode]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1482)

This method forwards all its arguments to PreTrainedTokenizer's [batch_decode()](/docs/transformers/v4.57.3/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.batch_decode). Please
refer to the docstring of this method for more information.
#### check_argument_for_proper_class[[transformers.ProcessorMixin.check_argument_for_proper_class]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L599)

Checks the passed argument's class against the expected transformers class. If the actual class does not match the
expected one, an error is raised. Otherwise, the retrieved class is returned.
#### decode[[transformers.ProcessorMixin.decode]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1491)

This method forwards all its arguments to PreTrainedTokenizer's [decode()](/docs/transformers/v4.57.3/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.decode). Please refer to
the docstring of this method for more information.
#### from_args_and_dict[[transformers.ProcessorMixin.from_args_and_dict]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1149)

Instantiates a `~processing_utils.ProcessorMixin` subclass from a Python dictionary of parameters.

**Parameters:**

processor_dict (`dict[str, Any]`) : Dictionary that will be used to instantiate the processor object. Such a dictionary can be retrieved from a pretrained checkpoint by leveraging the `~processing_utils.ProcessorMixin.to_dict` method.

kwargs (`dict[str, Any]`) : Additional parameters from which to initialize the processor object.

**Returns:**

``~processing_utils.ProcessorMixin``

The processor object instantiated from those
parameters.
#### from_pretrained[[transformers.ProcessorMixin.from_pretrained]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1335)

Instantiate a processor associated with a pretrained model.

This class method simply calls the `from_pretrained` methods of the feature extractor
([from_pretrained()](/docs/transformers/v4.57.3/ja/main_classes/feature_extractor#transformers.FeatureExtractionMixin.from_pretrained)), the image processor
([ImageProcessingMixin](/docs/transformers/v4.57.3/ja/internal/image_processing_utils#transformers.ImageProcessingMixin)) and the tokenizer
(`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`). Please refer to the docstrings of those
methods for more information.

**Parameters:**

pretrained_model_name_or_path (`str` or `os.PathLike`) : This can be either:

- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the [save_pretrained()](/docs/transformers/v4.57.3/ja/main_classes/feature_extractor#transformers.FeatureExtractionMixin.save_pretrained) method, e.g., `./my_model_directory/`.
- a path or url to a saved feature extractor JSON *file*, e.g., `./my_model_directory/preprocessor_config.json`.

`**kwargs` : Additional keyword arguments passed along to both [from_pretrained()](/docs/transformers/v4.57.3/ja/main_classes/feature_extractor#transformers.FeatureExtractionMixin.from_pretrained) and `~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`.
#### get_processor_dict[[transformers.ProcessorMixin.get_processor_dict]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L890)

From a `pretrained_model_name_or_path`, resolve to a dictionary of parameters, to be used for instantiating a
processor of type `~processing_utils.ProcessorMixin` using `from_args_and_dict`.

**Parameters:**

pretrained_model_name_or_path (`str` or `os.PathLike`) : The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.

subfolder (`str`, *optional*, defaults to `""`) : In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can specify the folder name here.

**Returns:**

``tuple[Dict, Dict]``

The dictionary(ies) that will be used to instantiate the processor object.
#### post_process_image_text_to_text[[transformers.ProcessorMixin.post_process_image_text_to_text]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1742)

Post-process the output of a vision-language model (VLM) to decode the text.

**Parameters:**

generated_outputs (`torch.Tensor` or `np.ndarray`) : The output of the model `generate` function. The output is expected to be a tensor of shape `(batch_size, sequence_length)` or `(sequence_length,)`.

skip_special_tokens (`bool`, *optional*, defaults to `True`) : Whether or not to remove special tokens in the output. Argument passed to the tokenizer's `batch_decode` method.

`**kwargs` : Additional arguments to be passed to the tokenizer's `batch_decode` method.

**Returns:**

``list[str]``

The decoded text.
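
The decoding step can be pictured with a toy vocabulary. In this sketch, `VOCAB`, `SPECIAL_IDS` and the space-joined output are assumptions made for illustration; a real processor forwards the work to its tokenizer's `batch_decode`, whose detokenization rules depend on the model. Python lists stand in for tensors, and, as a choice of this sketch, a flat `(sequence_length,)` input is treated as a batch of one.

```python
# Hypothetical toy vocabulary; a real processor delegates to its tokenizer.
VOCAB = {0: "<pad>", 1: "<s>", 2: "</s>", 3: "a", 4: "red", 5: "stop", 6: "sign"}
SPECIAL_IDS = {0, 1, 2}


def batch_decode(sequences, skip_special_tokens=True):
    texts = []
    for seq in sequences:
        # Optionally drop special tokens, then join the rest with spaces (toy detokenization).
        tokens = [VOCAB[i] for i in seq if not (skip_special_tokens and i in SPECIAL_IDS)]
        texts.append(" ".join(tokens))
    return texts


def post_process_image_text_to_text(generated_outputs, skip_special_tokens=True):
    # Treat a flat (sequence_length,) input as a batch containing one sequence.
    if generated_outputs and not isinstance(generated_outputs[0], (list, tuple)):
        generated_outputs = [generated_outputs]
    return batch_decode(generated_outputs, skip_special_tokens=skip_special_tokens)


batch = [[1, 3, 4, 5, 6, 2, 0], [1, 5, 6, 2, 0, 0, 0]]
print(post_process_image_text_to_text(batch))
# ['a red stop sign', 'stop sign']
print(post_process_image_text_to_text([1, 5, 6, 2], skip_special_tokens=False))
# ['<s> stop sign </s>']
```
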
#### push_to_hub[[transformers.ProcessorMixin.push_to_hub]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/utils/hub.py#L848)

Upload the processor files to the 🤗 Model Hub.

Examples:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google-bert/bert-base-cased")

# Push the processor to your namespace with the name "my-finetuned-bert".
processor.push_to_hub("my-finetuned-bert")

# Push the processor to an organization with the name "my-finetuned-bert".
processor.push_to_hub("huggingface/my-finetuned-bert")
```

**Parameters:**

repo_id (`str`) : The name of the repository you want to push your processor to. It should contain your organization name when pushing to a given organization.

use_temp_dir (`bool`, *optional*) : Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. Will default to `True` if there is no directory named like `repo_id`, `False` otherwise.

commit_message (`str`, *optional*) : Message to commit while pushing. Will default to `"Upload processor"`.

private (`bool`, *optional*) : Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.

token (`bool` or `str`, *optional*) : The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated when running `hf auth login` (stored in `~/.huggingface`). Will default to `True` if `repo_url` is not specified.

max_shard_size (`int` or `str`, *optional*, defaults to `"5GB"`) : Only applicable for models. The maximum size for a checkpoint before being sharded; each resulting shard will be smaller than this size. If expressed as a string, it needs to be digits followed by a unit (like `"5MB"`). It defaults to `"5GB"` so that users can easily load models on free-tier Google Colab instances without any CPU OOM issues.

create_pr (`bool`, *optional*, defaults to `False`) : Whether or not to create a PR with the uploaded files or directly commit.

safe_serialization (`bool`, *optional*, defaults to `True`) : Whether or not to convert the model weights to the safetensors format for safer serialization.

revision (`str`, *optional*) : Branch to push the uploaded files to.

commit_description (`str`, *optional*) : The description of the commit that will be created.

tags (`list[str]`, *optional*) : List of tags to push on the Hub.
#### register_for_auto_class[[transformers.ProcessorMixin.register_for_auto_class]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L1398)

Register this class with a given auto class. This should only be used for custom processors, as the ones
in the library are already mapped with `AutoProcessor`.

**Parameters:**

auto_class (`str` or `type`, *optional*, defaults to `"AutoProcessor"`) : The auto class to register this new processor with.
#### save_pretrained[[transformers.ProcessorMixin.save_pretrained]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L715)

Saves the attributes of this processor (feature extractor, tokenizer, etc.) in the specified directory so that it
can be reloaded using the [from_pretrained()](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.ProcessorMixin.from_pretrained) method.

This method simply calls the feature extractor's [save_pretrained()](/docs/transformers/v4.57.3/ja/main_classes/feature_extractor#transformers.FeatureExtractionMixin.save_pretrained) and the tokenizer's
[save_pretrained()](/docs/transformers/v4.57.3/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.save_pretrained). Please refer to the docstrings of those
methods for more information.

**Parameters:**

save_directory (`str` or `os.PathLike`) : Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will be created if it does not exist).

push_to_hub (`bool`, *optional*, defaults to `False`) : Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace).

legacy_serialization (`bool`, *optional*, defaults to `True`) : Whether to save processor attributes in separate config files (legacy) or in the processor's config file as a nested dict. Saving all attributes in a single dict will become the default in future versions; until then, the default is `legacy_serialization=True`.

kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments passed along to the [push_to_hub()](/docs/transformers/v4.57.3/ja/main_classes/model#transformers.utils.PushToHubMixin.push_to_hub) method.
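
The difference between the two `legacy_serialization` layouts can be sketched with a hypothetical `ToyProcessor` whose sub-components are plain config dicts. The file names (`toy_tokenizer.json`, `toy_image_processor.json`, `toy_processor.json`) are invented for this example and are not the files transformers writes; the sketch only shows that the legacy layout stores one file per attribute, the nested layout stores everything in a single processor file, and loading can accept either.

```python
import json
import os
import tempfile


class ToyProcessor:
    """Hypothetical processor bundling a tokenizer config and an image processor config."""

    # Illustrative file names only; they are not the names transformers uses.
    LEGACY_FILES = {"tokenizer": "toy_tokenizer.json", "image_processor": "toy_image_processor.json"}
    NESTED_FILE = "toy_processor.json"

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    def save_pretrained(self, save_directory, legacy_serialization=True):
        os.makedirs(save_directory, exist_ok=True)  # create the directory if it does not exist
        if legacy_serialization:
            # Legacy layout: one file per attribute.
            payloads = {name: getattr(self, attr) for attr, name in self.LEGACY_FILES.items()}
        else:
            # Nested layout: all attributes inside a single processor file.
            payloads = {self.NESTED_FILE: {attr: getattr(self, attr) for attr in self.LEGACY_FILES}}
        for filename, payload in payloads.items():
            with open(os.path.join(save_directory, filename), "w", encoding="utf-8") as f:
                json.dump(payload, f, indent=2, sort_keys=True)
        return sorted(payloads)  # names of the files written

    @classmethod
    def from_pretrained(cls, directory):
        nested_path = os.path.join(directory, cls.NESTED_FILE)
        if os.path.exists(nested_path):
            with open(nested_path, encoding="utf-8") as f:
                return cls(**json.load(f))
        kwargs = {}
        for attr, filename in cls.LEGACY_FILES.items():
            with open(os.path.join(directory, filename), encoding="utf-8") as f:
                kwargs[attr] = json.load(f)
        return cls(**kwargs)


processor = ToyProcessor({"vocab_size": 32000}, {"size": 224})
with tempfile.TemporaryDirectory() as tmp:
    print(processor.save_pretrained(os.path.join(tmp, "legacy")))
    # ['toy_image_processor.json', 'toy_tokenizer.json']
    nested_dir = os.path.join(tmp, "nested")
    print(processor.save_pretrained(nested_dir, legacy_serialization=False))
    # ['toy_processor.json']
    reloaded = ToyProcessor.from_pretrained(nested_dir)

print(reloaded.image_processor)  # {'size': 224}
```
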
#### to_dict[[transformers.ProcessorMixin.to_dict]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L620)

Serializes this instance to a Python dictionary.

**Returns:**

``dict[str, Any]``

Dictionary of all the attributes that make up this processor instance.
#### to_json_file[[transformers.ProcessorMixin.to_json_file]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L699)

Save this instance to a JSON file.

**Parameters:**

json_file_path (`str` or `os.PathLike`) : Path to the JSON file in which this processor instance's parameters will be saved.
#### to_json_string[[transformers.ProcessorMixin.to_json_string]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/processing_utils.py#L688)

Serializes this instance to a JSON string.

**Returns:**

``str``

String containing all the attributes that make up this processor instance in JSON format.
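
A round trip through `to_dict()`, `to_json_string()` and `from_args_and_dict()` can be sketched with a hypothetical `ToyProcessor`. As design choices of this toy, sub-components such as the tokenizer are passed positionally and left out of the dictionary, the class name is recorded under a `processor_class` key that is dropped again on load, and keyword arguments given to `from_args_and_dict()` override stored values.

```python
import copy
import json


class ToyProcessor:
    """Hypothetical processor with one sub-component and two plain settings."""

    attributes = ["tokenizer"]  # sub-components: passed positionally, left out of to_dict()

    def __init__(self, tokenizer, image_size=224, do_normalize=True):
        self.tokenizer = tokenizer
        self.image_size = image_size
        self.do_normalize = do_normalize

    def to_dict(self):
        # Deep-copy plain settings so callers cannot mutate the processor via the dict.
        output = {
            key: copy.deepcopy(value)
            for key, value in self.__dict__.items()
            if key not in self.attributes
        }
        output["processor_class"] = type(self).__name__
        return output

    def to_json_string(self):
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

    @classmethod
    def from_args_and_dict(cls, args, processor_dict, **kwargs):
        # Drop the bookkeeping key, then let explicit keyword arguments win.
        params = {k: v for k, v in processor_dict.items() if k != "processor_class"}
        params.update(kwargs)
        return cls(*args, **params)


tokenizer = object()  # placeholder for a real tokenizer instance
processor = ToyProcessor(tokenizer, image_size=336)
stored = json.loads(processor.to_json_string())
restored = ToyProcessor.from_args_and_dict([tokenizer], stored, do_normalize=False)
print(restored.image_size, restored.do_normalize)  # 336 False
```
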

## Deprecated processors[[transformers.DataProcessor]]

All processors follow the same architecture, which is that of the
[DataProcessor](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.DataProcessor). The processor returns a list of
[InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample). These
[InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample) objects can be converted to
`~data.processors.utils.InputFeatures` in order to be fed to the model.

#### transformers.DataProcessor[[transformers.DataProcessor]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L80)

Base class for data converters for sequence classification data sets.

#### get_dev_examples[[transformers.DataProcessor.get_dev_examples]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L97)

Gets a collection of [InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample) for the dev set.
#### get_example_from_tensor_dict[[transformers.DataProcessor.get_example_from_tensor_dict]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L83)

Gets an example from a dict of TensorFlow tensors.

**Parameters:**

tensor_dict : Keys and values should match the corresponding GLUE tensorflow_datasets examples.
#### get_labels[[transformers.DataProcessor.get_labels]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L105)

Gets the list of labels for this data set.
#### get_test_examples[[transformers.DataProcessor.get_test_examples]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L101)

Gets a collection of [InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample) for the test set.
#### get_train_examples[[transformers.DataProcessor.get_train_examples]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L93)

Gets a collection of [InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample) for the train set.
#### tfds_map[[transformers.DataProcessor.tfds_map]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L109)

Some tensorflow_datasets datasets are not formatted the same way the GLUE datasets are. This method converts
examples to the correct format.
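
The `DataProcessor` layout is easiest to see in a subclass. The sketch below is a hypothetical, library-free processor for a sentence-pair TSV file with a header row and `sentence1`, `sentence2`, `label` columns; the file names `train.tsv`, `dev.tsv` and `test.tsv`, the column order and the `guid` format are assumptions. A small `Example` dataclass stands in for `InputExample`.

```python
import csv
import os
import tempfile
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """Stand-in with the documented InputExample fields."""

    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None


class ToyPairProcessor:
    """Hypothetical processor following the DataProcessor method layout."""

    def get_labels(self) -> List[str]:
        return ["0", "1"]

    def get_train_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_test_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

    @staticmethod
    def _read_tsv(path):
        with open(path, encoding="utf-8", newline="") as f:
            return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))

    @staticmethod
    def _create_examples(lines, set_type):
        examples = []
        for i, line in enumerate(lines[1:]):  # skip the header row
            # Test files are assumed to have no label column, so the label stays None.
            label = None if set_type == "test" else line[2]
            examples.append(Example(guid=f"{set_type}-{i}", text_a=line[0], text_b=line[1], label=label))
        return examples


with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "train.tsv"), "w", encoding="utf-8") as f:
        f.write("sentence1\tsentence2\tlabel\n")
        f.write("A man is eating.\tSomeone eats.\t1\n")
        f.write("A dog runs.\tA cat sleeps.\t0\n")
    train = ToyPairProcessor().get_train_examples(data_dir)

print(train[0])
```
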

#### transformers.InputExample[[transformers.InputExample]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L30)

A single training/test example for simple sequence classification.

**Parameters:**

guid : Unique id for the example.

text_a : string. The untokenized text of the first sequence. For single sequence tasks, only this sequence must be specified.

text_b : (Optional) string. The untokenized text of the second sequence. Must only be specified for sequence pair tasks.

label : (Optional) string. The label of the example. This should be specified for train and dev examples, but not for test examples.

#### to_json_string[[transformers.InputExample.to_json_string]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L49)

Serializes this instance to a JSON string.
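
Since an `InputExample` only stores four fields, a stdlib mirror is short. This sketch assumes the class is a plain dataclass whose `to_json_string()` dumps its fields as indented JSON; treat it as an illustration of the documented fields rather than the library source.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputExample:
    """Stdlib mirror of the documented fields (assumed layout, not the library source)."""

    guid: str
    text_a: str
    text_b: Optional[str] = None  # only for sequence pair tasks
    label: Optional[str] = None  # omitted for test examples

    def to_json_string(self):
        """Serializes this instance to a JSON string."""
        return json.dumps(dataclasses.asdict(self), indent=2) + "\n"


pair = InputExample(guid="dev-0", text_a="The cat sat.", text_b="A cat was sitting.", label="entailment")
single = InputExample(guid="test-0", text_a="An unlabeled sentence.")
print(pair.to_json_string())
```
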

#### transformers.InputFeatures[[transformers.InputFeatures]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L55)

A single set of features of data. Property names are the same names as the corresponding inputs to a model.

**Parameters:**

input_ids : Indices of input sequence tokens in the vocabulary.

attention_mask : Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: Usually `1` for tokens that are NOT MASKED, `0` for MASKED (padded) tokens.

token_type_ids : (Optional) Segment token indices to indicate first and second portions of the inputs. Only some models use them.

label : (Optional) Label corresponding to the input. Int for classification problems, float for regression problems.

#### to_json_string[[transformers.InputFeatures.to_json_string]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/utils.py#L75)

Serializes this instance to a JSON string.
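
The relationship between `input_ids`, `attention_mask` and `token_type_ids` can be illustrated by building one padded sentence-pair feature by hand. The helper `build_features`, the BERT-style `[CLS] a [SEP] b [SEP]` layout and the special ids `101`/`102`/`0` are assumptions for this sketch (real ids come from the tokenizer), and the helper rejects inputs longer than `max_length` instead of truncating them.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import List, Optional

# Assumed BERT-style special token ids; real ids come from the tokenizer.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


@dataclass(frozen=True)
class Features:
    """Mirror of the documented InputFeatures fields."""

    input_ids: List[int]
    attention_mask: Optional[List[int]] = None
    token_type_ids: Optional[List[int]] = None
    label: Optional[int] = None

    def to_json_string(self):
        return json.dumps(dataclasses.asdict(self)) + "\n"


def build_features(ids_a, ids_b=None, max_length=16, label=None):
    """Hypothetical helper: lay out [CLS] a [SEP] (b [SEP]) and pad to max_length."""
    input_ids = [CLS_ID] + list(ids_a) + [SEP_ID]
    token_type_ids = [0] * len(input_ids)  # segment 0: first sequence and its markers
    if ids_b is not None:
        input_ids += list(ids_b) + [SEP_ID]
        token_type_ids += [1] * (len(ids_b) + 1)  # segment 1: second sequence
    if len(input_ids) > max_length:
        raise ValueError("input too long; this sketch does not truncate")
    attention_mask = [1] * len(input_ids)  # 1 = real token (NOT MASKED)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding  # 0 = padding (MASKED)
    token_type_ids += [0] * padding
    return Features(input_ids, attention_mask, token_type_ids, label)


features = build_features([11, 12], [13], max_length=8, label=1)
print(features.input_ids)       # [101, 11, 12, 102, 13, 102, 0, 0]
print(features.attention_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
print(features.token_type_ids)  # [0, 0, 0, 0, 1, 1, 0, 0]
```
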

## GLUE[[transformers.glue_convert_examples_to_features]]

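[General Language Understanding Evaluation (GLUE)](https://gluebenchmark.com/) is a benchmark that evaluates the
performance of models across a diverse set of existing NLU tasks. It was released together with the paper
[GLUE: A multi-task benchmark and analysis platform for natural language understanding](https://openreview.net/pdf?id=rJ4km2R5t7).

This library hosts processors for the following 10 tasks: MRPC, MNLI, MNLI (mismatched), CoLA, SST2, STSB,
QQP, QNLI, RTE and WNLI.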

Those processors are:

- `~data.processors.utils.MrpcProcessor`
- `~data.processors.utils.MnliProcessor`
- `~data.processors.utils.MnliMismatchedProcessor`
- `~data.processors.utils.ColaProcessor`
- `~data.processors.utils.Sst2Processor`
- `~data.processors.utils.StsbProcessor`
- `~data.processors.utils.QqpProcessor`
- `~data.processors.utils.QnliProcessor`
- `~data.processors.utils.RteProcessor`
- `~data.processors.utils.WnliProcessor`

Additionally, the following method can be used to convert a list of
[InputExample](/docs/transformers/v4.57.3/ja/main_classes/processors#transformers.InputExample) (or a `tf.data.Dataset`) into features that can be fed to the model.

#### transformers.glue_convert_examples_to_features[[transformers.glue_convert_examples_to_features]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/glue.py#L41)

Converts a list of `InputExamples` (or a `tf.data.Dataset`) into task-specific `InputFeatures`.

**Parameters:**

examples : List of `InputExamples` or `tf.data.Dataset` containing the examples.

tokenizer : Instance of a tokenizer that will tokenize the examples

max_length : Maximum example length. Defaults to the tokenizer's max_len

task : GLUE task

label_list : List of labels. Can be obtained from the processor using the `processor.get_labels()` method

output_mode : String indicating the output mode. Either `regression` or `classification`

**Returns:**

If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the task-specific
features. If the input is a list of `InputExamples`, will return a list of task-specific `InputFeatures` which
can be fed to the model.
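
The `label_list` and `output_mode` parameters determine how string labels become numeric targets. A minimal sketch, assuming the conversion maps each label to its index in `label_list` for `classification` and casts it to `float` for `regression`; `make_label_converter` is a hypothetical helper and the MNLI-style label names are only an example.

```python
def make_label_converter(label_list, output_mode):
    """Hypothetical helper returning a function that turns a raw label into a model target."""
    label_map = {label: i for i, label in enumerate(label_list)}

    def convert(label):
        if label is None:  # unlabeled examples, e.g. a test split
            return None
        if output_mode == "classification":
            return label_map[label]  # label string -> class index
        if output_mode == "regression":
            return float(label)  # e.g. a similarity score stored as text
        raise KeyError(output_mode)

    return convert


to_index = make_label_converter(["contradiction", "entailment", "neutral"], "classification")
to_score = make_label_converter([], "regression")
print(to_index("neutral"), to_score("3.8"))  # 2 3.8
```
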

## XNLI

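The [Cross-Lingual NLI Corpus (XNLI)](https://www.nyu.edu/projects/bowman/xnli/) is a benchmark that evaluates the
quality of cross-lingual text representations. XNLI is a crowd-sourced dataset based on [*MultiNLI*](http://www.nyu.edu/projects/bowman/multinli/): pairs of text are labeled with textual entailment annotations for 15
different languages (including both high-resource languages such as English and low-resource languages such as Swahili).

It was released together with the paper [XNLI: Evaluating Cross-lingual Sentence Representations](https://huggingface.co/papers/1809.05053).

This library hosts the processor to load the XNLI data: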

- `~data.processors.utils.XnliProcessor`

Please note that since the gold labels are available on the test set, evaluation is performed on the test set.

An example using these processors is given in the [run_xnli.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_xnli.py) script.

## SQuAD

[The Stanford Question Answering Dataset (SQuAD)](https://rajpurkar.github.io/SQuAD-explorer//) is a benchmark that
evaluates the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version
(v1.1) was released together with the paper [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://huggingface.co/papers/1606.05250). The second version (v2.0) was released alongside the paper
[Know What You Don't Know: Unanswerable Questions for SQuAD](https://huggingface.co/papers/1806.03822).

This library hosts a processor for each of the two versions:

### Processors[[transformers.data.processors.squad.SquadProcessor]]

Those processors are:

- `~data.processors.utils.SquadV1Processor`
- `~data.processors.utils.SquadV2Processor`

They both inherit from the abstract class `~data.processors.utils.SquadProcessor`.

#### transformers.data.processors.squad.SquadProcessor[[transformers.data.processors.squad.SquadProcessor]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/squad.py#L541)

Processor for the SQuAD data set. It is overridden by SquadV1Processor and SquadV2Processor, which are used for
versions 1.1 and 2.0 of SQuAD, respectively.

#### get_dev_examples[[transformers.data.processors.squad.SquadProcessor.get_dev_examples]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/squad.py#L629)

Returns the evaluation examples from the data directory.

**Parameters:**

data_dir : Directory containing the data files used for training and evaluating.

filename : None by default, specify this if the evaluation file has a different name than the original one which is `dev-v1.1.json` and `dev-v2.0.json` for squad versions 1.1 and 2.0 respectively.
#### get_examples_from_dataset[[transformers.data.processors.squad.SquadProcessor.get_examples_from_dataset]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/squad.py#L574)

Creates a list of `SquadExample` using a TFDS dataset.

Examples:

```python
>>> import tensorflow_datasets as tfds
>>> from transformers import SquadV1Processor

>>> dataset = tfds.load("squad")
>>> processor = SquadV1Processor()

>>> training_examples = processor.get_examples_from_dataset(dataset, evaluate=False)
>>> evaluation_examples = processor.get_examples_from_dataset(dataset, evaluate=True)
```

**Parameters:**

dataset : The tfds dataset loaded from *tensorflow_datasets.load("squad")*

evaluate : Boolean specifying if in evaluation mode or in training mode

**Returns:**

List of SquadExample
#### get_train_examples[[transformers.data.processors.squad.SquadProcessor.get_train_examples]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/squad.py#L607)

Returns the training examples from the data directory.

**Parameters:**

data_dir : Directory containing the data files used for training and evaluating.

filename : None by default, specify this if the training file has a different name than the original one which is `train-v1.1.json` and `train-v2.0.json` for squad versions 1.1 and 2.0 respectively.
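
Both methods read SQuAD's JSON layout, where each article in `data` holds `paragraphs`, and each paragraph holds a `context` plus its `qas`. The hypothetical helper below flattens that nesting into one record per question and falls back to the default file names given above when `filename` is not provided. It is a simplified sketch, not the processors' code: it does not build `SquadExample` objects or compute token-level answer positions.

```python
import json
import os
import tempfile

# Default file names documented for get_train_examples / get_dev_examples.
DEFAULT_FILES = {
    ("1.1", "train"): "train-v1.1.json",
    ("1.1", "dev"): "dev-v1.1.json",
    ("2.0", "train"): "train-v2.0.json",
    ("2.0", "dev"): "dev-v2.0.json",
}


def load_squad_examples(data_dir, version, split, filename=None):
    """Hypothetical helper: flatten SQuAD JSON (data -> paragraphs -> qas) into dicts."""
    path = os.path.join(data_dir, filename or DEFAULT_FILES[(version, split)])
    with open(path, encoding="utf-8") as reader:
        input_data = json.load(reader)["data"]

    examples = []
    for entry in input_data:
        for paragraph in entry["paragraphs"]:
            for qa in paragraph["qas"]:
                examples.append({
                    "id": qa["id"],
                    "title": entry["title"],
                    "question": qa["question"],
                    "context": paragraph["context"],
                    "answers": qa["answers"],
                    # Only SQuAD 2.0 marks unanswerable questions; v1.1 files lack the key.
                    "is_impossible": qa.get("is_impossible", False),
                })
    return examples


sample = {
    "version": "v2.0",
    "data": [{
        "title": "Stop_sign",
        "paragraphs": [{
            "context": "A stop sign is a red octagon.",
            "qas": [
                {"id": "q1", "question": "What shape is a stop sign?", "is_impossible": False,
                 "answers": [{"text": "octagon", "answer_start": 21}]},
                {"id": "q2", "question": "Who invented it?", "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

with tempfile.TemporaryDirectory() as data_dir:
    with open(os.path.join(data_dir, "dev-v2.0.json"), "w", encoding="utf-8") as f:
        json.dump(sample, f)
    dev_examples = load_squad_examples(data_dir, "2.0", "dev")

print([(e["id"], e["is_impossible"]) for e in dev_examples])  # [('q1', False), ('q2', True)]
```
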

Additionally, the following method can be used to convert SQuAD examples into
`~data.processors.utils.SquadFeatures` that can be used as model inputs.

#### transformers.squad_convert_examples_to_features[[transformers.squad_convert_examples_to_features]]

[Source](https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/data/processors/squad.py#L317)

Converts a list of examples into a list of features that can be directly given as input to a model. It is
model-dependent and takes advantage of many of the tokenizer's features to create the model's inputs.

Example:

```python
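from transformers import AutoTokenizer, SquadV2Processor, squad_convert_examples_to_features

# The legacy run_squad.py script loads a slow (Python) tokenizer for these utilities.
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", use_fast=False)
data_dir = "path/to/squad_v2"  # placeholder: directory containing dev-v2.0.json
evaluate = True  # build evaluation features

processor = SquadV2Processor()
examples = processor.get_dev_examples(data_dir)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=not evaluate,
)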
```

**Parameters:**

examples : list of `SquadExample`

tokenizer : an instance of a child of [PreTrainedTokenizer](/docs/transformers/v4.57.3/ja/main_classes/tokenizer#transformers.PreTrainedTokenizer)

max_seq_length : The maximum sequence length of the inputs.

doc_stride : The stride used when the context is too large and is split across several features.

max_query_length : The maximum length of the query.

is_training : Whether to create features for model training (`True`) or model evaluation (`False`).

padding_strategy : Defaults to `"max_length"`. The padding strategy to use.

return_dataset : Defaults to `False`. Either `'pt'` or `'tf'`. If `'pt'`, returns a `torch.utils.data.TensorDataset`; if `'tf'`, returns a `tf.data.Dataset`.

threads : Number of processing threads to use.

**Returns:**

list of `SquadFeatures`
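
The `doc_stride` parameter matters when a context does not fit next to the question within `max_seq_length` tokens: the context is cut into overlapping windows, one feature per window. The sketch below follows the scheme of the original BERT SQuAD script, where the query is first truncated to `max_query_length` and `doc_stride` is the step between window starts; the transformers implementation may compute window boundaries differently. `plan_windows` is a hypothetical helper that works on token counts only.

```python
def plan_windows(context_len, query_len, max_seq_length, doc_stride, max_query_length, num_special_tokens=3):
    """Return (start, length) spans of context tokens, one span per feature.

    Assumes each feature is laid out as [CLS] query [SEP] context-window [SEP],
    hence three special tokens by default.
    """
    query_len = min(query_len, max_query_length)  # truncate the question first
    max_tokens_for_doc = max_seq_length - query_len - num_special_tokens
    if max_tokens_for_doc <= 0:
        raise ValueError("max_seq_length is too small for the query")

    spans, start = [], 0
    while start < context_len:
        length = min(context_len - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == context_len:
            break  # the last window reaches the end of the context
        start += min(length, doc_stride)  # step forward; windows overlap when doc_stride < length
    return spans


print(plan_windows(context_len=10, query_len=8, max_seq_length=12, doc_stride=3, max_query_length=5))
# [(0, 4), (3, 4), (6, 4)]
```
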

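These processors, as well as the aforementioned method, can be used with files containing the data as well as with the
*tensorflow_datasets* package. Examples are given below.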

### Example usage

Here is an example using the processors as well as the conversion method using data files:

```python
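from transformers import (
    AutoTokenizer,
    SquadV1Processor,
    SquadV2Processor,
    squad_convert_examples_to_features,
)

# Placeholders: point these at your local SQuAD data directories.
squad_v1_data_dir = "path/to/squad_v1"
squad_v2_data_dir = "path/to/squad_v2"
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", use_fast=False)
max_seq_length, doc_stride, max_query_length = 384, 128, 64
evaluate = True

# Loading a V2 processor
processor = SquadV2Processor()
examples = processor.get_dev_examples(squad_v2_data_dir)

# Loading a V1 processor
processor = SquadV1Processor()
examples = processor.get_dev_examples(squad_v1_data_dir)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)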
```

Using *tensorflow_datasets* is as easy as using a data file:

```python
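import tensorflow_datasets as tfds
from transformers import AutoTokenizer, SquadV1Processor, squad_convert_examples_to_features

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased", use_fast=False)
max_seq_length, doc_stride, max_query_length = 384, 128, 64
evaluate = True

# tensorflow_datasets only handles SQuAD V1.
tfds_examples = tfds.load("squad")
examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
    doc_stride=doc_stride,
    max_query_length=max_query_length,
    is_training=not evaluate,
)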
```

Another example using these processors is given in the [run_squad.py](https://github.com/huggingface/transformers/tree/main/examples/legacy/question-answering/run_squad.py) script.