Environment Setup

Personalizer environment setup has two main steps and one optional step:

Mapping creation for product catalog
Mapping creation for user behavior data (Optional)
Index creation

Below, we will go through each step in detail.

Mapping creation for product catalog

What is mapping?

The personalization engine relies on specific default keys to operate effectively. To integrate your item catalog with our solution, you must align your website's data source keys (e.g., item_name, item_description, tags, ingredients, category) with the keys of the GAIP personalization engine through mapping. This alignment enables the personalizer to understand and process your data accurately. Following the schema correctly is crucial for the successful mapping and functioning of the personalization system.

Endpoints

To create, update, and view the mapping, use the endpoints listed under Catalog Mapping in the sandbox.

GET /v1/mappers gets an existing mapper.
PUT /v1/mappers updates an existing mapper.
POST /v1/mappers creates a new mapper.

Creating a new mapper

To set up the mapping for your project, use the endpoint POST /v1/mappers. Here is an example request body:

{
  "key_map": {
    "item_id": "item_id",
    "parent_item_id": "string",
    "title": "title",
    "second_title": "string",
    "third_title": "string",
    "fourth_title": "string",
    "availability": "availability",
    "description": "string",
    "image_url": "string",
    "image_url_type": "STR or LIST_STR or LIST_DICT or DICT",
    "item_url": "string",
    "price": "price",
    "categories": [
      {
        "name": "category_1",
        "separator": "_"
      },
      {
        "name": "category_2",
        "separator": ""
      }
    ],
    "flag": [
      "condition"
    ],
    "average_rating": "string",
    "user_ratings_total": "string",
    "custom": [
      {
        "name": "colum3",
        "data_type": "FLOAT"
      },
      {
        "name": "column4",
        "data_type": "INT"
      }
    ],
    "item_nearby_calculation": false,
    "keywords_group_by": "string",
    "gpt_enabled": false,
    "search_settings": {
      "prioritize_key": "title",
      "prioritize_category": "tags",
      "prioritize_values": [
        "value_1",
        "value_2"
      ],
      "prioritize_flags": {
        "flag1": true
      },
      "downgrade_values": [
        "value_1",
        "value_2"
      ],
      "keyword_ngram": 1,
      "depth": 3,
      "cluster_size": 100,
      "is_keyword_enabled": true,
      "top_k": 1,
      "is_backfill": false,
      "is_context_aware": true

    }
  }
}
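
As a rough illustration, a request like the one above could be prepared from Python with the standard library. The base URL and the two credential header names (x-project-key, x-api-key) are placeholders assumed for this sketch and are not confirmed by the API documentation; use the values shown in your sandbox instead.

```python
import json
import urllib.request

# Hypothetical values: replace with the base URL and credentials from your sandbox.
BASE_URL = "https://api.example.com"
HEADERS = {
    "Content-Type": "application/json",
    "x-project-key": "YOUR_PROJECT_KEY",  # assumed header name
    "x-api-key": "YOUR_API_KEY",          # assumed header name
}


def build_mapper_request(key_map: dict, method: str = "POST") -> urllib.request.Request:
    """Prepare (but do not send) a request to /v1/mappers."""
    if method not in ("POST", "PUT"):
        raise ValueError("use POST to create a mapper or PUT to update one")
    body = json.dumps({"key_map": key_map}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/mappers", data=body, headers=HEADERS, method=method
    )


# A reduced key_map that uses only keys from the example request body.
request = build_mapper_request({
    "item_id": "item_id",
    "title": "title",
    "availability": "availability",
    "price": "price",
})
# To actually send it: urllib.request.urlopen(request)
```

The request is built separately from sending so that the body can be inspected before anything reaches the API.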

Let's go through the parameters:

item_id: The unique identifier for each item in your catalog. This could be the item ID, item name, slug, etc.
parent_item_id: An ID that groups related items under a single identifier, such as different sizes or colors of a t-shirt, or different flavors of a beverage.
title: This is the title of the products.
second_title: Secondary titles
third_title: Another title, if applicable. Your products can have different attributes that you can map here. This might be useful later for generating recommendations or search results.
fourth_title: See the description of third_title.
availability: Maps to the parameter that shows whether an item is available. This has to be a boolean value.
description: Represents the detailed product description.
image_url: Represents the image URL in your data source. This is needed if you want to use image-related endpoints, such as image search or recommendations based on image similarity.
image_url_type: The type of the image_url value. One of STR, LIST_STR, LIST_DICT, or DICT.
item_url: Represents the URL of your product or item details page.
price: Represents the item price.
categories: Defines parameters for categorizing items. Multiple category types can be added, impacting dynamic filtering in search results. This parameter accepts values as List[Dict[str, str]].
flags: Only boolean values can be mapped here. Use this for flags in your catalog, such as items on sale, featured items, free items, discontinued items, new arrivals, or bestsellers.
average_rating: This represents the average rating of an item based on user reviews.
user_ratings_total: The total number of user ratings an item has received.
custom: Any int, float, or string field that is not covered by the keys above can be mapped here.
item_nearby_calculation: Keep this false unless you are going to use the personalizer for location recommendations. This parameter helps with location-based search (based on latitude and longitude).
keywords_group_by: Generates groups of categories that are related to each other. The response of the GET /v1/categories/keywords endpoint can be used to build dynamic navigation from the catalog data. Set this to a category key that is a parent of the others.
gpt_enabled: True if you want GPT functionalities enabled in your project, otherwise, false.
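
Some of the constraints described above can be checked locally before a mapper is submitted. The following is a minimal sketch, assuming the key_map layout of the example request body; the function name check_key_map is hypothetical, and the checks cover only rules stated in this section, not the full server-side validation.

```python
ALLOWED_IMAGE_URL_TYPES = {"STR", "LIST_STR", "LIST_DICT", "DICT"}


def check_key_map(key_map: dict) -> list:
    """Return a list of problems found in a key_map (empty if none were found)."""
    problems = []

    # image_url_type must be one of the four documented values.
    image_type = key_map.get("image_url_type")
    if image_type is not None and image_type not in ALLOWED_IMAGE_URL_TYPES:
        problems.append(f"image_url_type: unexpected value {image_type!r}")

    # categories is documented as List[Dict[str, str]].
    categories = key_map.get("categories", [])
    if not isinstance(categories, list):
        problems.append("categories: expected a list")
    else:
        for i, category in enumerate(categories):
            is_str_dict = isinstance(category, dict) and all(
                isinstance(k, str) and isinstance(v, str)
                for k, v in category.items()
            )
            if not is_str_dict:
                problems.append(f"categories[{i}]: expected a dict of str to str")

    # These two switches are shown as booleans in the example request body.
    for name in ("item_nearby_calculation", "gpt_enabled"):
        if name in key_map and not isinstance(key_map[name], bool):
            problems.append(f"{name}: expected true or false")

    return problems


problems = check_key_map({
    "image_url_type": "LIST",
    "categories": [{"name": "category_1", "separator": "_"}, "category_2"],
    "gpt_enabled": "false",
})
for problem in problems:
    print(problem)
```

Returning a list of messages rather than raising on the first problem makes it easier to fix a mapper in one pass.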

The parameters under search_settings affect how the POST /v1/items/search endpoint behaves. Each key is explained below.

prioritize_key: Defines which key should be prioritized when searching for items. For example, if you want the search engine to focus on item titles, you can pass titles here, provided that is the key in your catalog that holds item titles.
prioritize_category: Similarly, specifies which category the search engine should prioritize when searching for items. For example, if you have categories in different languages and categoryNameJP is the one you want to prioritize, you can pass it here.
prioritize_values: Defines values from the search query that the search engine should prioritize.
prioritize_flags: Use this if you want certain flags to be prioritized in the search results. For example, if you want to show items that are on sale and those items have a flag sale:true, you can pass it here to prioritize those items.
downgrade_values: The opposite of prioritize_values. Defines values from the search query that the search engine should de-prioritize.
- For example, your website only sells jackets, but of different kinds (summer, winter, designer, casual, party, etc.). In this case, you might want to "downgrade" the keyword jacket in your search queries to give more accurate search results. Now, if a user searches for "jacket for winter party", the search engine will prioritize "winter" and "party".
keyword_ngram: Defines the number of consecutive words (1 to 3) that make up a keyword. Based on this setting, the model identifies the top keywords from a query, allowing for more flexible and accurate extraction. For example, for the sentence "I love natural language processing."
- If value is 1: ["I", "love", "natural", "language", "processing"]
- If value is 2: ["I love", "love natural", "natural language", "language processing"]
- If value is 3: ["I love natural", "love natural language", "natural language processing"]
- Each n-gram represents a sequence of words from the sentence based on the specified value of n.
depth: This value can be between 1 and 3. For a small number of items, a higher depth (3) might be better. For a large number of items, a lower depth (1) is recommended.
- Explanation: When your catalog has a small number of items, similar items are less likely to be clustered together, and a multi-layered search (higher depth) is more likely to give better results. For a large number of items, similar items are more likely to be clustered together, and a lower-depth search could give better results.
cluster_size: When someone searches for something, the GAIP personalizer engine excludes, re-ranks, filters, etc. before returning the result. cluster_size sets the size of the cluster of items from the database that is initially considered as result candidates. The default value is 100. A higher value yields more result candidates but makes the search slower; a lower value yields fewer candidates and a faster search. It does not have any impact on search quality.
is_keyword_enabled: If true, search is keyword-based. If false, the engine uses GPT-enabled, NLP-based search.
top_k: The value can be set between 1-20. With a higher value, the search engine will consider more items, potentially including less relevant ones, resulting in more creative outcomes. If the value is low, the search engine will be more strict, considering fewer but most relevant items.
is_backfill: (Optional[bool]) Specifies whether to include backfill items. Note that backfill will not function if depth is set to 1. To enable backfill, depth must be set to 3.
is_context_aware: (Optional[bool]) Enables context-aware search. When set to false, contextual search is skipped, significantly improving search speed.
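
To make the keyword_ngram values and the documented ranges concrete, here is a small sketch. The helper names word_ngrams and check_search_settings are hypothetical, and the whitespace splitting is only an approximation for illustration; the engine's actual tokenization is not described in this guide.

```python
def word_ngrams(text: str, n: int) -> list:
    """Split text on whitespace and return every run of n consecutive words."""
    words = [word.strip(".,!?") for word in text.split()]
    words = [word for word in words if word]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def check_search_settings(settings: dict) -> list:
    """Check search_settings against the ranges stated in this section."""
    problems = []
    ranges = {"keyword_ngram": (1, 3), "depth": (1, 3), "top_k": (1, 20)}
    for name, (low, high) in ranges.items():
        value = settings.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    # Backfill is documented to require depth 3; checked only if depth is given.
    if settings.get("is_backfill") and "depth" in settings and settings["depth"] != 3:
        problems.append("is_backfill requires depth to be 3")
    return problems


sentence = "I love natural language processing."
for n in (1, 2, 3):
    print(n, word_ngrams(sentence, n))

print(check_search_settings({"depth": 1, "is_backfill": True, "top_k": 25}))
```

For the sample sentence, the three calls reproduce the lists shown under keyword_ngram.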

Find more about these endpoints in our API documentation.

Once the mapper is created, you can use the GET /v1/mappers endpoint to view the mapping. You can update any of the mapped keys with the PUT /v1/mappers endpoint.

User Behavior mapping

Similar to the item mapping keys, there are some default keys for user behavior data.

Note that this step is required only if you want to save historical user behavior data through CSV files. If you use our data collection endpoints to collect data from now on, this is not required.

You can find the endpoints for user mapping under the "Historical User Data Collection" section in the sandbox.

To implement this, please follow similar steps as above.

However, note that in this case there are four sets of endpoints: browsing history, purchase history, rating history, and user detail. You have to create a mapper for each data type that you want to import.

Index creation

An index is a data-organizing mechanism, similar to a database in a relational database system.

In this step, you need to create indices. Multiple indices are needed to run the recommender solution successfully. These indices will create the necessary schemas to hold your data.

There are 3 endpoints here:

POST /v1/index/create --> Create indices to hold your data
DELETE /v1/index/delete --> Delete indices
POST /v1/reindex --> Create an index with new mappings and settings, and create an alias for the new index

Create Index

Use the endpoint POST /v1/index/create to create indices for your project.

Simply provide your project key and API key, then click Execute to create the indices for your project. Note that this will return an error if the mapping in the previous step was not done correctly. This endpoint will create the indices that are required for your project.

After the successful execution, all the necessary index/indices will be created. You might see a message in the response saying that the item_index is being created in the background and will give you a background task_id. You can check the status of item index creation with the task_id using the GET /v1/tasks/{task_id} endpoint.

Please confirm that the task succeeded.
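
Checking the background task can be automated with a simple polling loop. The sketch below is illustrative only: the response field "status" and the in-progress values "pending" and "running" are assumptions, since the response schema of GET /v1/tasks/{task_id} is not shown in this guide. The fetch_task argument stands for whatever function you use to call that endpoint and parse its JSON.

```python
import time
from typing import Callable


def wait_for_task(fetch_task: Callable[[str], dict], task_id: str,
                  interval: float = 2.0, max_attempts: int = 30) -> dict:
    """Poll a task until it leaves the (assumed) in-progress states."""
    for _ in range(max_attempts):
        # fetch_task would wrap GET /v1/tasks/{task_id} and return the parsed JSON.
        task = fetch_task(task_id)
        # Assumed field name and values; adjust to the real response schema.
        if task.get("status") not in ("pending", "running"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} checks")


# Usage with a stand-in for the real HTTP call.
fake_responses = iter([{"status": "running"}, {"status": "success"}])
result = wait_for_task(lambda task_id: next(fake_responses), "example-task-id", interval=0)
print(result)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to try out without network access.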

Delete Index

You can delete an existing index or indices with this endpoint.

Request endpoint: DELETE /v1/index/delete

Available values: items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, logs, gpt_dataset, gpt_dataset_meta, questionnaire, questionnaire_mapper, questionnaire_request.

For example, if you only want to delete the items index, your request body should look like this:

{
  "index_type": "items"
}

If you delete any index, make sure to re-create it afterward.
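
Because a mistyped index_type is easy to send by accident, the request body can be validated against the documented values first. This is a minimal sketch; the helper name delete_index_body is hypothetical, and the set below simply copies the available values listed above.

```python
# Values copied from the list of available index types for DELETE /v1/index/delete.
INDEX_TYPES = {
    "items", "image_features", "browse", "purchase", "ratings", "search",
    "stats", "settings", "user", "tasks", "logs", "gpt_dataset",
    "gpt_dataset_meta", "questionnaire", "questionnaire_mapper",
    "questionnaire_request",
}


def delete_index_body(index_type: str) -> dict:
    """Build the request body for DELETE /v1/index/delete."""
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index_type: {index_type!r}")
    return {"index_type": index_type}


body = delete_index_body("items")
```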

Reindex

Reindexing refers to the process of copying documents from one index to another. This process can include filtering source documents based on a specific query or retrieving documents from a remote cluster. Reindexing allows you to modify the settings and mappings of the destination index.

In Elasticsearch, reindexing is the process of copying data from one index to another, either within the same cluster or to a different cluster. This is useful in various scenarios, such as:

Updating index mappings: Create a new index with updated mappings and reindex data from the old index to the new one.
Moving data between indexes: Reindex data from a source index to a destination index.
Adding updated data: Reindex with updated data to an existing index.
Changing shard count: Reindex data to a new index with the desired shard count.

You can use the Reindex API to copy data from one index to another.

Request Endpoint:

POST /v1/reindex

Here is an example of how to pass mappings and settings in the request body:

{
  "index_type": "items",
  "mappings": {
    "settings": {
      "analysis": {
        "char_filter": {
          "normalize": {
            "type": "icu_normalizer",
            "name": "nfkc",
            "mode": "compose"
          }
        },
        "tokenizer": {
          "ja_kuromoji_tokenizer": {
            "mode": "search",
            "type": "kuromoji_tokenizer",
            "discard_compound_token": "true",
            "user_dictionary_rules": []
          },
          "ja_ngram_tokenizer": {
            "type": "ngram",
            "min_gram": 2,
            "max_gram": 3,
            "token_chars": [
              "letter",
              "digit"
            ]
          }
        },
        "filter": {
          "ja_index_synonym": {
            "type": "synonym",
            "lenient": "false",
            "synonyms": []
          }
        },
        "analyzer": {
          "ja_kuromoji_index_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "ja_index_synonym",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          },
          "ja_kuromoji_search_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          },
          "ja_ngram_index_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_ngram_tokenizer",
            "filter": [
              "lowercase"
            ]
          },
          "ja_ngram_search_analyzer": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "ja_ngram_tokenizer",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "item": {
          "properties": {
            "{title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{second_title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{third_title}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{description}": {
              "type": "text",
              "search_analyzer": "ja_kuromoji_search_analyzer",
              "analyzer": "ja_kuromoji_index_analyzer",
              "fields": {
                "ngram": {
                  "type": "text",
                  "search_analyzer": "ja_ngram_search_analyzer",
                  "analyzer": "ja_ngram_index_analyzer"
                }
              }
            },
            "{price}": {
              "type": "float"
            },
            "{availability}": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

Available values for index_type are items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, and logs. When you define the mappings object, use the same keys as in the item mapper that you built with the POST /v1/mappers endpoint.
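
The repeated field definitions in the items example can be generated from the item mapper instead of being written by hand. The sketch below assumes that placeholders such as {title} stand for the catalog keys mapped in your key_map, which is one reading of the note above; the function names are hypothetical. The analyzer names are the ones defined in the example settings, so the same settings object must be supplied for them to resolve.

```python
def japanese_text_field() -> dict:
    """Text field definition using the analyzers from the example settings."""
    return {
        "type": "text",
        "search_analyzer": "ja_kuromoji_search_analyzer",
        "analyzer": "ja_kuromoji_index_analyzer",
        "fields": {
            "ngram": {
                "type": "text",
                "search_analyzer": "ja_ngram_search_analyzer",
                "analyzer": "ja_ngram_index_analyzer",
            }
        },
    }


def items_reindex_body(key_map: dict, settings: dict) -> dict:
    """Build a POST /v1/reindex body for the items index from a mapper key_map."""
    properties = {}
    # Text keys get the kuromoji analyzers plus an n-gram sub-field.
    for key in ("title", "second_title", "third_title", "description"):
        if key_map.get(key):
            properties[key_map[key]] = japanese_text_field()
    if key_map.get("price"):
        properties[key_map["price"]] = {"type": "float"}
    if key_map.get("availability"):
        properties[key_map["availability"]] = {"type": "boolean"}
    return {
        "index_type": "items",
        "mappings": {
            "settings": settings,
            "mappings": {"properties": {"item": {"properties": properties}}},
        },
    }


# Example: a catalog whose title, price, and availability keys have custom names.
body = items_reindex_body(
    {"title": "name", "price": "cost", "availability": "in_stock"},
    settings={"analysis": {}},  # replace with the full analysis settings shown above
)
```

Keys that are not mapped are skipped, so the generated properties object only contains fields that exist in your catalog.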

You might not need analyzers or tokenizers for all indices. You can keep the settings field empty if it is not required. Here is an example:

{
  "index_type": "search",
  "mappings": {
    "settings": {},
    "mappings": {
      "properties": {
        "date": {
          "type": "date"
        }
      }
    }
  }
}