Maira Project setup
Gigalogy's GPT-based solutions allow you to build your own GPT solutions, trained on your own data and customized to your needs. Here are some basics to get you started. You can find the Maira-related endpoints in our sandbox.
Documents
A document is a unit of information that GPT treats as a single piece of information, such as "Address of Gigalogy" or "What is Gigalogy personalization".
Datasets
A dataset is a collection of documents in a single file. For instance, a single dataset may contain documents such as "Address of Gigalogy", "What is Gigalogy personalization", and "What is Maira".
Profiles
With each request sent to the GPT endpoints POST /v1/gpt/ask and POST /v1/gpt/ask/vision, we include a parameter called gpt_profile_id. This parameter's value points to a GPT profile. GPT profiles hold information that tells the GPT how to process the information provided (query) and how to respond.
To see more about what is inside a profile, check out the parameters and the example request body of the endpoint POST /v1/gpt/profiles.
There are two types of profiles. One is for the /gpt/ask endpoint, which is a general profile for any model except gpt-4-vision-preview. The other is for the /gpt/ask/vision endpoint, for which we currently support only gpt-4-vision-preview as the model.
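As a rough illustration of how the two profile types map to endpoints, the sketch below picks the ask endpoint from a profile's model name and assembles a request body. The helper names are hypothetical, and the body fields (gpt_profile_id and query) are assumed from the description above; check the sandbox for the actual request schema.

```python
import json

VISION_MODEL = "gpt-4-vision-preview"


def ask_endpoint_for(model):
    """Return the ask endpoint path that matches the profile's model."""
    if model == VISION_MODEL:
        return "/v1/gpt/ask/vision"
    return "/v1/gpt/ask"


def build_ask_body(gpt_profile_id, query):
    """Serialize an ask request body as JSON text."""
    # Field names are assumptions; the real schema may need more fields.
    return json.dumps({"gpt_profile_id": gpt_profile_id, "query": query})


endpoint = ask_endpoint_for("gpt-4-vision-preview")
body = build_ask_body("<your profile id>", "What is Maira?")
```

Keeping the endpoint choice in one function avoids sending a vision profile to the general endpoint by mistake, or the other way round.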
Project setup
Project setup for Maira involves preparing, uploading, and training your data. Additionally, set the required parameters to suit your requirements.
Dataset
Upload Dataset
Use the endpoint POST /v1/gpt/datasets to upload a dataset that will be used to train your customized GPT bot. Currently, we accept the CSV and JSON formats. You will find the required parameters and their descriptions in the sandbox mentioned above.
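A minimal sketch of preparing a CSV dataset before upload, assuming one document per row. The idx column mirrors the idx_column_name value that appears in the dataset listing in this guide; the title and content columns and the documents_to_csv helper are hypothetical, so adapt them to the parameters described in the sandbox.

```python
import csv
import io


def documents_to_csv(documents):
    """Serialize a list of document dicts into CSV text, one document per row."""
    buffer = io.StringIO()
    # Column names are illustrative; only "idx" appears in this guide.
    writer = csv.DictWriter(buffer, fieldnames=["idx", "title", "content"])
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


# Placeholder content; replace with your own documents.
documents = [
    {"idx": 1, "title": "Address of Gigalogy", "content": "<address text>"},
    {"idx": 2, "title": "What is Maira", "content": "<description text>"},
]
csv_text = documents_to_csv(documents)
```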
How to see uploaded datasets
Once the dataset is uploaded, you can use /v1/gpt/datasets to see all the datasets in your project. The response gives the details below for each dataset, including its dataset ID. This dataset_id is required to edit, delete, or train your data.
{
"dataset_id": "a8bf8ddd-b5cb-4bea-a82b-4ac148f01c0a",
"created_at": "2023-12-24T20:23:34.992063+09:00",
"name": "NAME OF THE DATASET",
"description": "DESCRIPTION OF THE DATASET",
"idx_column_name": "idx",
"image_url_column": "images"
}
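To pick out the dataset_id values needed for later calls, something like the following could work. It assumes the listing response is a JSON array of objects shaped like the example above, which may not match the real response envelope; dataset_ids_by_name is a hypothetical helper.

```python
import json


def dataset_ids_by_name(response_text):
    """Map each dataset name to its dataset_id."""
    # Assumes a top-level JSON array; adjust if the API wraps the list.
    datasets = json.loads(response_text)
    return {item["name"]: item["dataset_id"] for item in datasets}


sample = """[
  {"dataset_id": "a8bf8ddd-b5cb-4bea-a82b-4ac148f01c0a",
   "created_at": "2023-12-24T20:23:34.992063+09:00",
   "name": "NAME OF THE DATASET",
   "description": "DESCRIPTION OF THE DATASET",
   "idx_column_name": "idx",
   "image_url_column": "images"}
]"""
ids = dataset_ids_by_name(sample)
```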
Delete dataset
Use the endpoint DELETE /v1/gpt/datasets/{dataset_id} to delete a dataset or particular documents from a dataset. You can find the expected request body, with the required parameters and values, in our sandbox.
Update Dataset
To be updated
Updating and deleting - Documents
To be updated
Training
Use the endpoint POST /v1/gpt/dataset/train to train your uploaded dataset. This endpoint takes the dataset ID and the image type. It is good practice to train only what is necessary, to optimize resource usage.
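The sketch below shows one way a training request could be assembled with Python's standard library. It only builds the request and does not send it. The base URL is a placeholder, authentication headers are omitted, and the body field names dataset_id and image_type are assumptions based on the description above, so confirm the actual schema in the sandbox.

```python
import json
import urllib.request


def build_train_request(base_url, dataset_id, image_type):
    """Build (but do not send) a POST request for the training endpoint."""
    # Body field names are assumptions; confirm them in the sandbox.
    body = json.dumps({"dataset_id": dataset_id, "image_type": image_type})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/gpt/dataset/train",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request(
    "https://api.example.com",  # placeholder base URL
    "a8bf8ddd-b5cb-4bea-a82b-4ac148f01c0a",
    "<image type>",  # placeholder; see the sandbox for valid values
)
```

Once the required authentication headers are added, the request could be sent with urllib.request.urlopen(request).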
Profile
Setup profile
Use POST /v1/gpt/profiles to set up a GPT profile. You can set up multiple profiles. However, you will need to select one as the default profile in the next step.
Notes on the "system", "intro", and "model" parameters:
System: This is an LLM feature used to set a persona or rules for the bot. Its purpose is to establish a specific persona or set of rules for answer generation.
Intro: This is a Gigalogy feature. It is more about setting the mode and giving general instructions to the bot.
For profiles designed to give product recommendations, where we might have strict rules on how we want to receive the response, it is better to put those rules in system. For general profiles that are more open (for example, ones that could answer like an FAQ or give some product details), it is better to keep system more general and add the basic instructions in intro to set the mode.
It is good practice to keep the above points in mind and try different system and intro values to find the optimal setting for your bot.
Model: We support all OpenAI GPT models, which you can select based on your needs and use cases. Please consider the purpose and the estimated token count when selecting the model, as this can significantly impact costs. You can learn more about the models in OpenAI's documentation. This setting will impact the parameters search_max_token (tokens allocated for data sent to the model) and completion_token (tokens allocated for the reply).
Note that intro, system, and query have token costs that are not included in the token size allocation. The selected model's CONTEXT WINDOW should cover the total token allocation. That is:
CONTEXT WINDOW ≥ search_max_token + completion_token + intro + system + query.
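The inequality above can be checked before saving a profile. The following is a minimal sketch: token_budget is a hypothetical helper, the token counts for intro, system, and query are assumed to have been measured separately, and the numbers are illustrative only.

```python
def token_budget(context_window, search_max_token, completion_token,
                 intro_tokens, system_tokens, query_tokens):
    """Return (fits, required) for the context-window inequality."""
    required = (search_max_token + completion_token
                + intro_tokens + system_tokens + query_tokens)
    return required <= context_window, required


# Illustrative numbers only, not recommendations for any specific model.
fits, required = token_budget(
    context_window=8192,
    search_max_token=2500,
    completion_token=2000,
    intro_tokens=100,
    system_tokens=150,
    query_tokens=50,
)
print(fits, required)  # True 4800
```

If the check fails, either lower search_max_token or completion_token, shorten intro and system, or choose a model with a larger context window.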
You can use the endpoints in our sandbox under "MyGPT Profiles" to check the profiles and to update or delete them.
Default profile setup
Once you have set up the profile(s), decide which one should be the default and use our endpoint POST /v1/gpt/settings to set your default profile.
With this, your GPT setup is complete.