
Time Series Data

Language models are not inherently good at performing calculations. In addition, when Maira performs vector search, date and time values are often treated as plain text and embedded alongside other query terms instead of being recognized as true datetime values.

The time series data feature addresses both problems. It ensures accurate calculations and makes Maira aware of the datetime fields in your dataset, enabling more reliable, time-aware responses to your queries.

A time series dataset contains time-related information (such as dates, timestamps, or ranges) and supports accurate querying and calculation based on time. With this feature, you can also run precise queries and calculations on the other fields of your dataset.

This feature (data creation and upload) is currently available only through the API. Support for it in the platform UI will be added soon.

Create dataset for time series data

First, create a dataset for time series data by sending a POST request to this endpoint:

https://api.recommender.gigalogy.com/v1/gpt/datasets_ts

Example request body

{
  "name": "Power Consumption Dataset",
  "description": "This is a time-series dataset containing information about ...",
  "datetime_fields": [
    "stocked_at",
    "submitted_on",
    "sold_at"
  ],
  "primary_datetime_field": "timestamp",
  "tags": [
    "tag1",
    "tag2",
    "tag3"
  ]
}

Parameters

  • name: The display name of the dataset. This does not affect Maira's response generation in any way. It is intended to help users easily find and identify datasets.

  • description: A brief summary of the dataset’s purpose or contents. This does not affect Maira's response generation in any way. It is provided solely to help users quickly understand the dataset.

  • datetime_fields: One or more additional columns in your dataset that contain datetime-related information. Maira will recognize these as datetime fields, but they are not considered primary.

  • primary_datetime_field: The primary timestamp column used to order or align the time-series data.

  • tags: A list of tags describing the dataset’s domain or use case.
    • For example, a dataset used for monitoring electricity usage might have tags such as power_consumption or grid_monitoring. Multiple tags are allowed.
    • Tags are required for Time Series Data (unlike other datasets), as they are used to build a profile for this dataset.
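The request described above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function name `build_create_request` is hypothetical, and it assumes that the create endpoint accepts the same `project-key` and `api-key` headers as the upload example later on this page. The function only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.recommender.gigalogy.com"


def build_create_request(name, description, primary_datetime_field,
                         datetime_fields, tags, project_key, api_key):
    """Build (but do not send) the POST request that creates a time series dataset."""
    if not tags:
        # Tags are required for time series datasets, so fail early.
        raise ValueError("at least one tag is required")
    payload = {
        "name": name,
        "description": description,
        "datetime_fields": list(datetime_fields),
        "primary_datetime_field": primary_datetime_field,
        "tags": list(tags),
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/gpt/datasets_ts",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            # Assumption: same authentication headers as the upload endpoint.
            "project-key": project_key,
            "api-key": api_key,
        },
    )
```

To send the request, call `urllib.request.urlopen(request)` and read the response body.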

Response

A successful response looks like this:

{
  "detail": {
    "response": "Dataset created successfully",
    "dataset_id": "10eb6dc0-46c3-4471-8677-0599b2e17e16"
  }
}
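Since the upload step needs the dataset_id, it can help to extract it from the response in code. The sketch below assumes the response body has exactly the shape shown above; `extract_dataset_id` is a hypothetical helper name, and the error handling is only a suggestion.

```python
import json


def extract_dataset_id(response_text):
    """Return the dataset_id from a create-dataset response body."""
    body = json.loads(response_text)
    try:
        return body["detail"]["dataset_id"]
    except (KeyError, TypeError) as exc:
        # Raised when the body does not follow the documented success shape.
        raise ValueError(f"unexpected response shape: {body!r}") from exc


sample = (
    '{"detail": {"response": "Dataset created successfully", '
    '"dataset_id": "10eb6dc0-46c3-4471-8677-0599b2e17e16"}}'
)
print(extract_dataset_id(sample))  # 10eb6dc0-46c3-4471-8677-0599b2e17e16
```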

Note:

  1. The dataset_id is required for uploading data in the next step.
  2. You can also find the dataset_id later using the following GET endpoint, which lists all the datasets of the project:
https://api.recommender.gigalogy.com/v1/gpt/datasets

Upload time series data

To upload time series data, send a PUT request to the following endpoint:

https://api.recommender.gigalogy.com/v1/gpt/datasets_ts/{dataset_id}/file

Use the dataset_id obtained in the previous step.

Supported file formats: CSV, JSON, PARQUET.

Notes:

  • This is a separate endpoint from the general document upload endpoint /v1/gpt/datasets/{dataset_id}/file.
  • Time series datasets do not require any training.

Here is an example curl request:

curl -X PUT \
  'https://api.recommender.gigalogy.com/v1/gpt/datasets_ts/<dataset_id>/file' \
  -H 'accept: application/json' \
  -H 'project-key: <your-project-key>' \
  -H 'api-key: <your-api-key>' \
  -H 'Content-Type: multipart/form-data' \
  -F 'dataset_file=@test_time_series.csv;type=text/csv'
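The same upload can be sketched in Python with the standard library. The sketch mirrors the curl command: a PUT request with one multipart part named `dataset_file`. The function name `build_upload_request` is hypothetical, and the multipart body is assembled by hand because `urllib` has no built-in multipart encoder. The function only builds the request and does not send it.

```python
import uuid
import urllib.request

API_BASE = "https://api.recommender.gigalogy.com"


def build_upload_request(dataset_id, filename, file_bytes, project_key, api_key,
                         content_type="text/csv"):
    """Build (but do not send) the multipart PUT request for a time series file."""
    boundary = uuid.uuid4().hex
    # A single form part named "dataset_file", matching the curl -F flag.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="dataset_file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/gpt/datasets_ts/{dataset_id}/file",
        data=head + file_bytes + tail,
        method="PUT",
        headers={
            "accept": "application/json",
            "project-key": project_key,
            "api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

In practice, read the file with `open("test_time_series.csv", "rb").read()`, pass the bytes as `file_bytes`, and send the result with `urllib.request.urlopen(request)`.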

Uploading Additional Files to an Existing Dataset

When uploading additional CSV files to an existing time series dataset, the column keys do not need to match exactly. You may have additional or missing keys compared to previous uploads.

However, all datetime columns (as specified in your dataset’s datetime_fields and primary_datetime_field) must match exactly across all uploads.
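A quick local check before each upload can catch a missing datetime column. The sketch below reads only the CSV header and reports which declared datetime columns are absent. It interprets "match exactly" as "the same column names are present", which is an assumption; it does not compare value formats. The function name `missing_datetime_columns` is hypothetical.

```python
import csv
import io


def missing_datetime_columns(csv_text, datetime_fields, primary_datetime_field):
    """Return the declared datetime columns that the CSV header lacks."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    required = [primary_datetime_field, *datetime_fields]
    return [column for column in required if column not in header]


first = "timestamp,sold_at,price\n2024-01-01T00:00:00,2024-01-02,10\n"
second = "timestamp,quantity\n2024-01-03T00:00:00,4\n"

print(missing_datetime_columns(first, ["sold_at"], "timestamp"))   # []
print(missing_datetime_columns(second, ["sold_at"], "timestamp"))  # ['sold_at']
```

Here the second file may differ in its non-datetime columns (`quantity` instead of `price`), but it would still need the `sold_at` column.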

Handling Blank Cells and Data Formats

Blank cells: Blank cells are acceptable and will be treated as null values. However, do not use a dash (-) to represent missing values in a time series column, as this will be interpreted as a string, not as a null.

Consistent formats: All values for a given key (column) must have the same data type. For example, do not mix decimal and integer values in the same column.
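Both rules can be checked locally before uploading. The following sketch scans CSV text, skips blank cells, flags dash placeholders, and reports columns that mix integers, decimals, or strings. It is an illustration only: the helper names are hypothetical, the dash check is applied to every column as a conservative choice, and the type classification is a simple regular-expression heuristic that may differ from how the platform infers types.

```python
import csv
import io
import re

INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+\.\d*|\d*\.\d+)")


def classify(value):
    """Classify a non-blank cell as 'integer', 'decimal' or 'string'."""
    if INT_RE.fullmatch(value):
        return "integer"
    if DEC_RE.fullmatch(value):
        return "decimal"
    return "string"


def check_csv(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    kinds = {}  # column name -> set of value kinds seen in that column
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header are ignored here
            value = (value or "").strip()
            if value == "":
                continue  # blank cells are treated as null
            if value == "-":
                problems.append(
                    f"line {line_no}: '-' in column {column!r}; leave the cell blank"
                )
                continue
            kinds.setdefault(column, set()).add(classify(value))
    for column, seen in kinds.items():
        if len(seen) > 1:
            problems.append(f"column {column!r} mixes types: {sorted(seen)}")
    return problems


sample = (
    "timestamp,usage_kwh\n"
    "2024-01-01T00:00:00,12\n"
    "2024-01-01T01:00:00,12.5\n"
    "2024-01-01T02:00:00,-\n"
)
for problem in check_csv(sample):
    print(problem)
```

For the sample, the check reports the dash on line 4 and the mix of integer and decimal values in `usage_kwh`.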

Create a Profile for Time Series Data

A profile must be created specifically for time series datasets.

Note: A single profile can refer to only one time series dataset. You cannot include multiple time series datasets in a single profile. However, you may refer to one time series dataset and other non-time series datasets within the same profile.

When creating the profile, you must include the dataset's tag under the "include" parameter of data tags.

For example, if the time series dataset has the tag demo_tag, then the profile must also include this tag (include: demo_tag).
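The two profile rules (the include tags must select the time series dataset, and only one time series dataset may be selected) can be sanity-checked locally before creating a profile. The sketch below is purely illustrative: the dictionary keys `name`, `tags`, and `is_time_series` are hypothetical and do not reflect the actual API response, and it assumes that a dataset is selected when at least one of its tags appears in the include list.

```python
def check_profile_tags(include_tags, datasets):
    """Return None if the selection is valid, otherwise a message describing the problem.

    datasets is a list of dicts with the hypothetical keys
    "name", "tags" and "is_time_series".
    """
    include = set(include_tags)
    selected = [
        d["name"]
        for d in datasets
        if d["is_time_series"] and include & set(d["tags"])
    ]
    if not selected:
        return "no time series dataset matches the include tags"
    if len(selected) > 1:
        return "more than one time series dataset matches: " + ", ".join(selected)
    return None


datasets = [
    {"name": "Power Consumption Dataset", "tags": ["demo_tag"], "is_time_series": True},
    {"name": "Product manuals", "tags": ["manuals"], "is_time_series": False},
]
print(check_profile_tags(["demo_tag", "manuals"], datasets))  # None
print(check_profile_tags(["manuals"], datasets))
```

The first call is valid because it selects one time series dataset plus a non-time series dataset; the second fails because no time series dataset carries an included tag.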

You can find more details about how to create a profile here.