Maira Dataset Training
There are two endpoints for Maira dataset training.
Train Specific Dataset
Use this endpoint to train a specific dataset with all its documents.
Endpoint:
POST /v1/gpt/datasets/{dataset_id}/train
Example request body
{
"train_type": "text",
"batch_size": 1,
"only_payload_update": false
}
Parameters:
- dataset_id: str - ID of the dataset that is to be trained.
- train_type: enum - Type of training; possible values are text and image.
- batch_size: Optional[int] - Number of documents to train in one batch. Maximum value is 10. Keeping the default (1) is recommended unless you have a specific reason to change it.
- only_payload_update: Optional[bool] - If, after training a dataset, you need to update metadata such as filterable_fields or secondary_idx_column, set this parameter to true to update only the metadata without retraining the entire dataset. Note that when "only_payload_update" = true, Maira updates only the already-trained documents.
- Example: You can add filterable_fields to a Maira dataset. If you later want to make another field “filterable” and update the dataset using PUT /datasets/{dataset_id}, instead of retraining the whole dataset, you can update only the newly added key (filterable field) by setting this parameter to true. This can save significant time and cost.
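As a sketch of how a client might call this endpoint, the snippet below builds the POST request with the body shown above. The base URL and Bearer-token auth header are assumptions for illustration; substitute your actual Maira host and authentication scheme. The request is constructed but not sent, since sending requires real credentials.

```python
import json
from urllib import request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your Maira endpoint
API_KEY = "YOUR_API_KEY"              # hypothetical auth; check your deployment's scheme

def build_train_request(dataset_id: str, train_type: str = "text",
                        batch_size: int = 1,
                        only_payload_update: bool = False) -> request.Request:
    """Build (but do not send) a POST request for the dataset-train endpoint."""
    # Validate against the documented constraints.
    if train_type not in ("text", "image"):
        raise ValueError("train_type must be 'text' or 'image'")
    if not 1 <= batch_size <= 10:
        raise ValueError("batch_size must be between 1 and 10")
    body = json.dumps({
        "train_type": train_type,
        "batch_size": batch_size,
        "only_payload_update": only_payload_update,
    }).encode("utf-8")
    return request.Request(
        url=f"{BASE_URL}/v1/gpt/datasets/{dataset_id}/train",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )

req = build_train_request("my-dataset-id")
print(req.full_url)  # the resolved endpoint URL
```

To actually run the training job, pass the built request to urllib.request.urlopen (or your HTTP client of choice) with valid credentials.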
Train Specific Document
Use this endpoint to train a specific document of a dataset.
Endpoint:
POST /v1/gpt/datasets/{dataset_id}/documents/{document_id}/train
Request body:
{
"train_type": "text"
}
Parameters:
- dataset_id: str - ID of the dataset the document belongs to.
- document_id: str - ID of the document to train.
- train_type: enum - Type of training; possible values are text and image.
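A minimal client sketch for this endpoint follows the same pattern as the dataset-train example; only the path and request body differ. The base URL and auth header are again assumptions, and the request is built without being sent.

```python
import json
from urllib import request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your Maira endpoint
API_KEY = "YOUR_API_KEY"              # hypothetical auth; check your deployment's scheme

def build_document_train_request(dataset_id: str, document_id: str,
                                 train_type: str = "text") -> request.Request:
    """Build (but do not send) a POST request for the document-train endpoint."""
    if train_type not in ("text", "image"):
        raise ValueError("train_type must be 'text' or 'image'")
    return request.Request(
        url=f"{BASE_URL}/v1/gpt/datasets/{dataset_id}/documents/{document_id}/train",
        data=json.dumps({"train_type": train_type}).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )

req = build_document_train_request("my-dataset-id", "my-document-id")
print(req.full_url)  # the resolved endpoint URL
```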