Environment setup

To prepare GAIP for your site, there are two main steps:

Mapping creation
Index creation

For this, we will use the GAIP API endpoints listed in our sandbox.

You can also access our sandbox from the project setting page.

Mapping creation

The personalization engine relies on specific default keys to operate effectively. To integrate your item catalog with our solution, you must align your website's data source keys (e.g., item_name, item_description, tags, ingredients, category) with the keys of the GAIP personalization engine through proper mapping. This alignment enables the personalizer to understand and process your data accurately. Following the schema correctly is crucial for the successful mapping and functioning of the personalization system.

Item Catalogue Mapping

To create a mapping, use the endpoints listed under Catalog Mapping in the sandbox.

GET /v1/mappers to get an existing Mapper.
PUT /v1/mappers to update an existing Mapper.
POST /v1/mappers to create a new mapper.

To set up the mapping for your project, use POST /v1/mappers. The sandbox lists the keys, value types, and descriptions, along with an example request body. Replace the values in the example with your item catalogue keys and hit Execute to map your product catalogue keys to the GAIP keys.

After execution, confirm that the server response indicates success.

Here is a sample request body:

{
  "key_map": {
    "item_id": "item_id",
    "parent_item_id": "string",
    "title": "title",
    "second_title": "string",
    "third_title": "string",
    "fourth_title": "string",
    "availability": "availability",
    "description": "string",
    "image_url": "string",
    "image_url_type": "STR or LIST_STR or LIST_DICT or DICT",
    "item_url": "string",
    "price": "price",
    "categories": [
      {
        "name": "category_1",
        "separator": "_"
      },
      {
        "name": "category_2",
        "separator": ""
      }
    ],
    "flag": [
      "condition"
    ],
    "average_rating": "string",
    "user_ratings_total": "string",
    "custom": [
      {
        "name": "column3",
        "data_type": "FLOAT"
      },
      {
        "name": "column4",
        "data_type": "INT"
      }
    ],
    "item_nearby_calculation": false,
    "keywords_group_by": "string",
    "gpt_enabled": false,
    "search_settings": {
      "prioritize_category": "tags",
      "prioritize_values": [
        "value_1",
        "value_2"
      ],
      "prioritize_flags": {
        "flag1": true
      },
      "downgrade_values": [
        "value_1",
        "value_2"
      ],
      "cluster_size": 100,
      "is_keyword_enabled": true,
      "top_k": 1
    }
  }
}
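
As a rough illustration of the step above, the following Python sketch prepares a POST /v1/mappers request without sending it. The base URL, the two header names, and the catalog column names (sku, product_name, and so on) are placeholders invented for this example; take the real values from your sandbox. The key_map is shortened for readability, and whether the API accepts a partial key_map is not stated here.

```python
import json
import urllib.request

# Assumptions (not from the GAIP documentation): the base URL and the header
# names below are placeholders. Copy the real ones from your sandbox.
BASE_URL = "https://gaip.example.com"


def build_mapper_request(key_map, project_key, api_key, method="POST"):
    """Prepare (but do not send) a request to /v1/mappers."""
    body = json.dumps({"key_map": key_map}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/mappers",
        data=body,
        method=method,
        headers={
            "Content-Type": "application/json",
            "x-project-key": project_key,  # hypothetical header name
            "x-api-key": api_key,  # hypothetical header name
        },
    )


# Left: GAIP key. Right: the key used in your own catalog (made up here).
key_map = {
    "item_id": "sku",
    "title": "product_name",
    "availability": "in_stock",
    "price": "unit_price",
    "categories": [{"name": "department", "separator": ""}],
    "flag": ["on_sale"],
}

request = build_mapper_request(key_map, "my-project-key", "my-api-key")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Passing method="PUT" to the same helper would prepare an update of an existing mapper instead.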

Explanation of parameters

item_id: This is the unique identifier for each item in your catalog. This could be the item ID, item name, slug, etc.
parent_item_id: This can be an ID that groups related items under a single identifier, such as different sizes or colors of a t-shirt, or different flavors of a beverage item.
title: This is the title of the product.
second_title: A secondary title, if applicable.
third_title: Another title, if applicable. Your products can have different attributes that you can map here. This might be useful later for generating recommendations or search results.
fourth_title: Refer to the description of third_title.
availability: This should map to the parameter that shows whether an item is available or not. This has to be a boolean value.
description: Represents the detailed product description.
image_url: Represents the image URL in your data source. This is needed if you want to use image-related endpoints such as image search or recommendation based on image similarity.
image_url_type: The type of the image_url value: STR, LIST_STR, LIST_DICT, or DICT.
item_url: Represents your product or item details page URL.
price: Represents the item price.
categories: Map the parameters that categorize your items. Multiple category types can be added here. This will also affect the dynamic filtering of the search results later.
flag: Only boolean values can be mapped here. Use this for flags in your catalog, such as items on sale, featured items, free items, discontinued items, new arrivals, or bestsellers.
average_rating: This represents the average rating of an item based on user reviews.
user_ratings_total: The total number of user ratings an item has received.
custom: Any int, float, or string field that is not covered by the keys above can be mapped here.
item_nearby_calculation: Keep this false unless you are going to use the personalizer for location recommendation. This parameter helps with location-based search (based on latitude and longitude).
keywords_group_by: Generates groups of categories that are related to each other. The response of the GET /v1/categories/keywords endpoint can be used to build dynamic navigation from the catalog data. Set this to a category key that is a parent of the others.
gpt_enabled: True if you want GPT functionalities enabled in your project; otherwise, false.
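
Because availability and the flag entries must hold boolean values, it can help to check a few catalog rows against your key_map before creating the mapper. The sketch below is one possible pre-check, not part of GAIP; the function name check_item and the catalog column names are made up for illustration.

```python
def check_item(item, key_map):
    """Return a list of problems found in one catalog row (illustrative only)."""
    problems = []

    # item_id must point at a value that identifies the item.
    id_key = key_map["item_id"]
    if item.get(id_key) in (None, ""):
        problems.append(f"missing unique identifier '{id_key}'")

    # availability and every mapped flag must hold boolean values.
    bool_keys = [key_map["availability"], *key_map.get("flag", [])]
    for key in bool_keys:
        if not isinstance(item.get(key), bool):
            problems.append(f"'{key}' should be a boolean, got {item.get(key)!r}")

    return problems


# Hypothetical catalog keys on the right-hand side.
key_map = {"item_id": "sku", "availability": "in_stock", "flag": ["on_sale"]}

good_row = {"sku": "A-1", "in_stock": True, "on_sale": False}
bad_row = {"sku": "A-2", "in_stock": "yes", "on_sale": False}

print(check_item(good_row, key_map))
print(check_item(bad_row, key_map))
```

The first row passes with an empty list; the second is reported because "yes" is a string rather than a boolean.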

Parameters under search_settings affect how the POST /v1/items/search endpoint behaves. Each key is explained below.

prioritize_category: Use this key, together with prioritize_values and prioritize_flags, to show certain items first in the search results. First specify which key to consider for prioritization. Then pass the values under prioritize_values. To prioritize items with specific flags, use prioritize_flags. You can also use downgrade_values to de-prioritize certain tags.
cluster_size: When someone searches, the GAIP personalizer does not return results directly from the database. The engine first excludes, re-ranks, filters, and so on. cluster_size sets how many items from the database are considered initially as result candidates. The default value is 100. A higher value yields more result candidates but slows the search down; a lower value yields fewer candidates and a faster search. It does not have any impact on search quality.
is_keyword_enabled: If true, search is keyword-based. If false, the engine uses GPT-enabled, NLP-based search.
top_k: The value can be set between 1 and 20. With a higher value, the search engine considers more items, potentially including less relevant ones, resulting in more creative outcomes. With a lower value, the search engine is stricter, considering fewer but more relevant items.
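
The constraints described above can be checked on the client side before the mapper is submitted. The following sketch is only an illustration: the function name is made up, it treats top_k and is_keyword_enabled as required, and it assumes cluster_size must be a positive integer, none of which is confirmed by the API description.

```python
def check_search_settings(settings):
    """Return a list of problems in a search_settings dict (illustrative only)."""
    problems = []

    # top_k is documented as a value between 1 and 20.
    top_k = settings.get("top_k")
    if not isinstance(top_k, int) or isinstance(top_k, bool) or not 1 <= top_k <= 20:
        problems.append("top_k must be an integer between 1 and 20")

    # 100 is the documented default; "positive integer" is an assumption.
    cluster_size = settings.get("cluster_size", 100)
    if (
        not isinstance(cluster_size, int)
        or isinstance(cluster_size, bool)
        or cluster_size < 1
    ):
        problems.append("cluster_size must be a positive integer")

    if not isinstance(settings.get("is_keyword_enabled"), bool):
        problems.append("is_keyword_enabled must be true or false")

    # prioritize_values only makes sense once prioritize_category names the key.
    if settings.get("prioritize_values") and not settings.get("prioritize_category"):
        problems.append("prioritize_values needs prioritize_category")

    return problems


sample = {
    "prioritize_category": "tags",
    "prioritize_values": ["value_1", "value_2"],
    "cluster_size": 100,
    "is_keyword_enabled": True,
    "top_k": 1,
}
print(check_search_settings(sample))
print(check_search_settings({**sample, "top_k": 50}))
```

The sample values from the request body pass without problems, while top_k set to 50 is reported as out of range.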

Sample Code

You can find sample code for this implementation here

Once the mapper is created, you can use the GET /v1/mappers endpoint to see the mapping. You can update any of the mapped keys with the PUT /v1/mappers endpoint and check the result with the GET /v1/mappers endpoint.

User Behavior mapping

Similar to the item mapping key, there are some default keys for user behavior data.

Note

This step is required only if you want to save historical user behavior data through CSV files. If you use our data collection endpoints to collect data from now on, this step is not required.

You can find the endpoints for user mapping under the "Historical User Data Collection" section in the sandbox.

To implement this, please follow similar steps as above.

However, note that in this case there are four sets of endpoints: browsing history, purchase history, rating history, and user detail. You have to create a mapper for each one whose data you want to import.

Index creation

In this step, you need to create indices. Multiple indices are needed to run the recommender solution successfully. These indices create the schemas required to hold your data.

There are three endpoints here:

POST /v1/index/create --> Create indices to hold your data
DELETE /v1/index/delete --> Delete indices
POST /v1/reindex --> Create an index with new mappings and settings, and create an alias for the new index

Create Index

Request endpoint

POST /v1/index/create

Simply provide your project key and API key and click Execute to create the indices for your project. Note that this will throw an error if the mapping in the previous step was not done correctly.

After successful execution, all the necessary indices are created, and the item index is built in the background. You can check the status of item index creation by passing the task ID to the GET /v1/tasks/{task_id} API at the bottom of the page.

Please confirm that the task succeeded.
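
Since the item index is built in the background, a client usually has to poll the task endpoint until the task finishes. The sketch below shows one way to structure such a loop. The "status" field and its values ("running", "success", "failed") are assumptions made for this example, so check the actual response of GET /v1/tasks/{task_id} in the sandbox. The HTTP call itself is passed in as a function, which keeps the loop independent of any particular client library.

```python
import time


def wait_for_task(fetch_task, task_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll a task until it finishes or the timeout passes.

    fetch_task(task_id) should call GET /v1/tasks/{task_id} and return the
    parsed JSON. The "status" field and its values are assumptions.
    """
    waited = 0.0
    while True:
        task = fetch_task(task_id)
        status = task.get("status")
        if status == "success":
            return task
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {task}")
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
        waited += interval


# Demo with canned responses instead of real HTTP calls.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "success"}])
result = wait_for_task(lambda task_id: next(responses), "abc123", sleep=lambda s: None)
print(result)
```

In real use, fetch_task would wrap an HTTP GET against the task endpoint and the default time.sleep would be left in place.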

Delete Index

You can delete an existing index with this endpoint.

Request endpoint

DELETE /v1/index/delete

Available values are items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, logs.

If you delete an index, make sure to create it again; otherwise, you will get an error when you try to input data or the item catalogue, or when you run training.

Reindex

In Elasticsearch, reindexing is the process of copying data from one index to another, either within the same cluster or to a different cluster. This can be useful in a variety of situations, such as:

Updating the mapping of an index: If you need to make changes to the mapping of an index, you can create a new index with the updated mapping and then reindex the data from the old index to the new one.
Moving data from one index to another: If you need to move data from one index to another, you can reindex the data from the source index to the destination index.
Updating the data with new data: If you have updated data that you want to add to an index, you can reindex the data with the updated data.
Changing the shard count of an index: If you need to change the number of shards that an index is using, you can reindex the data to a new index with the desired number of shards.

You can use the Reindex API to copy data from one index to another.

Request Endpoint:

POST /v1/reindex

Here is an example of how to pass mappings and settings in the request body:

{
  "index_type": "items",
  "mappings": {
    "settings": {
      "analysis": {
        "char_filter": {
          "normalize": {
            "type": "icu_normalizer",
            "name": "nfkc",
            "mode": "compose"
          }
        },
        "tokenizer": {
          "ja_kuromoji_tokenizer": {
            "mode": "search",
            "type": "kuromoji_tokenizer",
            "discard_compound_token": "true",
            "user_dictionary_rules": []
          },
          "ja_ngram_tokenizer": {
            "type": "ngram",
            "min_gram": 2,
            "max_gram": 3,
            "token_chars": [
              "letter",
              "digit"
            ]
          }
        },
        "filter": {
          "ja_index_synonym": {
            "type": "synonym",
            "lenient": "false",
            "synonyms": []
          }
        },
        "analyzer": {
          "ja_kuromoji_index_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "ja_index_synonym",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          },
          "ja_kuromoji_search_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          },
          "ja_ngram_index_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_ngram_tokenizer",
            "filter": [
              "lowercase"
            ]
          },
          "ja_ngram_search_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_ngram_tokenizer",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "item": {
          "properties": {
            "{title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{second_title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{third_title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{description}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{price}": {
              "type": "float"
            },
            "{availability}": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

Available values for index_type are items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, logs. When you define the mappings object, use the same keys as in the item mapper that you built with the POST /v1/mappers API.

You might not need analyzers or tokenizers for all indices. You can keep the settings field empty if it is not required. Here is an example:

{
  "index_type": "search",
  "mappings": {
    "settings": {},
    "mappings": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}
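
The following sketch shows how a reindex request body could be assembled from an item mapper so that the property names stay in sync with the mapped keys. It assumes that the {title}-style placeholders in the longer example stand for the catalog keys you mapped to title, description, and so on; that reading is an interpretation, not something confirmed here. The function name and the catalog keys are made up, and analyzers are left out for brevity.

```python
INDEX_TYPES = {
    "items", "image_features", "browse", "purchase", "ratings",
    "search", "stats", "settings", "user", "tasks", "logs",
}


def build_reindex_body(index_type, key_map, settings=None):
    """Build a POST /v1/reindex body whose property names come from key_map."""
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index_type: {index_type!r}")

    properties = {}
    # Text fields: use the catalog key mapped to each GAIP key (assumption).
    for gaip_key in ("title", "second_title", "third_title", "description"):
        catalog_key = key_map.get(gaip_key)
        if catalog_key:
            properties[catalog_key] = {"type": "text"}
    if key_map.get("price"):
        properties[key_map["price"]] = {"type": "float"}
    if key_map.get("availability"):
        properties[key_map["availability"]] = {"type": "boolean"}

    return {
        "index_type": index_type,
        "mappings": {
            "settings": settings or {},  # empty when no analyzers are needed
            "mappings": {"properties": {"item": {"properties": properties}}},
        },
    }


# Hypothetical catalog keys on the right-hand side.
key_map = {"title": "product_name", "price": "unit_price", "availability": "in_stock"}
body = build_reindex_body("items", key_map)
print(body["mappings"]["mappings"]["properties"]["item"]["properties"])
```

Analyzer and tokenizer definitions like those in the longer example would be passed through the settings argument, with the matching analyzer names added to each text property.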