Environment Setup
There are three main steps for setting up the Personalizer environment (the second is optional):
- Mapping creation for product catalog
- Mapping creation for user behavior data (Optional)
- Index creation
Below, we will go through each step in detail.
Mapping creation for product catalog
What is mapping?
The personalization engine relies on specific default keys to operate effectively. To integrate your item catalog with our solution, you must align your website's data source keys (e.g., item_name, item_description, tags, ingredients, category) with the keys of the GAIP personalization engine through mapping. This alignment enables the personalizer to understand and process your data accurately. Following the schema correctly is crucial for the successful mapping and functioning of the personalization system.
Endpoints
To create, update, and view the mapping, use the endpoints listed under Catalog Mapping in the sandbox:
- GET /v1/mappers: gets an existing mapper.
- PUT /v1/mappers: updates an existing mapper.
- POST /v1/mappers: creates a new mapper.
Creating a new mapper
To set up the mapping for your project, use the POST /v1/mappers endpoint.
Here is an example request body:
{
"key_map": {
"item_id": "item_id",
"parent_item_id": "string",
"title": "title",
"second_title": "string",
"third_title": "string",
"fourth_title": "string",
"availability": "availability",
"description": "string",
"image_url": "string",
"image_url_type": "STR or LIST_STR or LIST_DICT or DICT",
"item_url": "string",
"price": "price",
"categories": [
{
"name": "category_1",
"separator": "_"
},
{
"name": "category_2",
"separator": ""
}
],
"flag": [
"condition"
],
"average_rating": "string",
"user_ratings_total": "string",
"custom": [
{
"name": "column3",
"data_type": "FLOAT"
},
{
"name": "column4",
"data_type": "INT"
}
],
"item_nearby_calculation": false,
"keywords_group_by": "string",
"gpt_enabled": false,
"search_settings": {
"prioritize_key": "title",
"prioritize_category": "tags",
"prioritize_values": [
"value_1",
"value_2"
],
"prioritize_flags": {
"flag1": true
},
"downgrade_values": [
"value_1",
"value_2"
],
"keyword_ngram": 1,
"depth": 3,
"cluster_size": 100,
"is_keyword_enabled": true,
"top_k": 1
}
}
}
Let's go through the parameters:
- item_id: The unique identifier for each item in your catalog. This could be the item ID, item name, slug, etc.
- parent_item_id: An ID that groups related items under a single identifier, such as different sizes or colors of a T-shirt, or different flavors of a beverage.
- title: The title of the product.
- second_title: A secondary title.
- third_title: Another title, if applicable. Your products can have different attributes that you can map here, which can be useful later for generating recommendations or search results.
- fourth_title: Same as third_title.
- availability: Map this to the key that indicates whether an item is available. Its value must be a boolean.
- description: The detailed product description.
- image_url: The image URL in your data source. This is required if you want to use image-related endpoints, such as image search or recommendations based on image similarity.
- image_url_type: How image URLs are stored in your catalog: STR (a single string), LIST_STR (a list of strings), LIST_DICT (a list of objects), or DICT (an object).
- item_url: The URL of the product or item detail page.
- price: The item price.
- categories: Defines the keys used to categorize items. You can add multiple category types, and they affect dynamic filtering in search results. This parameter accepts values of type List[Dict[str, str]].
- flag: Only boolean values can be mapped here. Use this for flags in your catalog, such as items on sale, featured items, free items, discontinued items, new arrivals, or bestsellers.
- average_rating: This represents the average rating of an item based on user reviews.
- user_ratings_total: The total number of user ratings an item has received.
- custom: Any int, float, or string field not covered by the keys above. Each entry gives the column name and its data_type, as in the example request body.
- item_nearby_calculation: Keep this false unless you are going to use the personalizer for location recommendations. This parameter supports location-based search (by latitude and longitude).
- keywords_group_by: Generates groups of categories that are related to each other. Set this to a category key that is the parent of other categories. The response of the GET /v1/categories/keywords endpoint can then be used to build dynamic navigation from the catalog data.
- gpt_enabled: true if you want GPT functionality enabled in your project; otherwise, false.
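Before creating a mapper, a quick client-side check of the key_map can catch malformed values early. The sketch below is a hypothetical helper and not part of the GAIP API; the server performs its own validation. It checks only the rules stated above: the four image_url_type values, the List[Dict[str, str]] shape of categories, the flag key as a list of column names (as in the example request body), and the two true/false switches.

```python
from typing import Any

# Allowed formats for image_url_type, as listed in the parameter description.
IMAGE_URL_TYPES = {"STR", "LIST_STR", "LIST_DICT", "DICT"}


def check_key_map(key_map: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a key_map (an empty list means none found).

    Hypothetical client-side pre-check; it does not replace server-side validation.
    """
    problems: list[str] = []

    image_url_type = key_map.get("image_url_type")
    if image_url_type is not None and image_url_type not in IMAGE_URL_TYPES:
        problems.append(f"image_url_type must be one of {sorted(IMAGE_URL_TYPES)}")

    # categories is documented as List[Dict[str, str]].
    categories = key_map.get("categories", [])
    if not isinstance(categories, list) or not all(
        isinstance(c, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in c.items())
        for c in categories
    ):
        problems.append("categories must be a list of string-to-string objects")

    # flag lists the catalog columns that hold boolean values.
    flags = key_map.get("flag", [])
    if not isinstance(flags, list) or not all(isinstance(f, str) for f in flags):
        problems.append("flag must be a list of column names")

    # These switches are documented as true/false values.
    for switch in ("item_nearby_calculation", "gpt_enabled"):
        if switch in key_map and not isinstance(key_map[switch], bool):
            problems.append(f"{switch} must be true or false")

    return problems
```

For example, `check_key_map({"image_url_type": "URL"})` reports one problem. A key_map shaped like the example request body passes without problems.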
Parameters under search_settings affect how the POST /v1/items/search endpoint behaves. Each key is explained below.
- prioritize_key: Defines which key is prioritized when searching for items. For example, to make the search engine focus on item titles, pass titles here, provided that is the key that holds item titles in your catalog.
- prioritize_category: Specifies which category the search engine prioritizes when searching for items. For example, your catalog might have category names in several languages. If categoryNameJP is the one you want to prioritize, pass it here.
- prioritize_values: Defines values in the search query that the search engine should prioritize.
- prioritize_flags: Use this to prioritize items with certain flags in search results. For example, to show items that are on sale, where those items have the flag sale: true, pass {"sale": true} here.
- downgrade_values: The opposite of prioritize_values. Defines values in the search query that should be de-prioritized.
- For example, suppose your website sells only jackets of different kinds (summer, winter, designer, casual, party, etc.). In this case, you might want to downgrade the keyword jacket in search queries to get more accurate results. If a user then searches for "jacket for winter party", the search engine prioritizes "winter" and "party".
- keyword_ngram: Defines how many words (1 to 3) make up each keyword. Based on this setting, the model identifies the top keywords in a query, which allows more flexible and accurate extraction. For example, for the sentence "I love natural language processing.":
- If value is 1: ["I", "love", "natural", "language", "processing"]
- If value is 2: ["I love", "love natural", "natural language", "language processing"]
- If value is 3: ["I love natural", "love natural language", "natural language processing"]
- Each n-gram represents a sequence of words from the sentence based on the specified value of n.
- depth: A value between 1 and 3. For a small catalog, a higher depth (3) might work better. For a large number of items, a lower depth (1) is recommended.
- Explanation: In a small catalog, similar items are less likely to be clustered together, so a multi-layered search (higher depth) is more likely to give better results. In a large catalog, similar items are more likely to be clustered together, and a lower-depth search can give better results.
- cluster_size: When someone runs a search, the GAIP personalizer engine excludes, re-ranks, and filters items before returning results. cluster_size sets how many items from the database are initially considered as result candidates. The default value is 100. A higher value gives more result candidates but slows the search down. A lower value gives fewer candidates and a faster search. It does not affect search quality.
- is_keyword_enabled: If true, search is keyword-based. If false, the engine uses GPT-enabled, NLP-based search.
- top_k: A value between 1 and 20. With a higher value, the search engine considers more items, possibly including less relevant ones, which produces more creative results. With a lower value, the search engine is stricter and considers fewer items, keeping only the most relevant.
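The keyword_ngram examples and the numeric ranges above can be reproduced with a short sketch. keyword_ngrams and check_search_settings are hypothetical client-side helpers written for illustration. The engine also ranks the extracted keywords, and this sketch does not attempt that step. The rule that cluster_size must be a positive integer is an assumption; the documentation only gives its default of 100.

```python
# Documented ranges for the bounded search_settings keys.
SEARCH_SETTING_RANGES = {"keyword_ngram": (1, 3), "depth": (1, 3), "top_k": (1, 20)}


def keyword_ngrams(sentence: str, n: int) -> list[str]:
    """Split a sentence into word n-grams, mirroring the keyword_ngram examples."""
    if not 1 <= n <= 3:
        raise ValueError("keyword_ngram must be between 1 and 3")
    # Drop trailing sentence punctuation so "processing." becomes "processing".
    words = sentence.rstrip(".!?").split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def check_search_settings(settings: dict) -> list[str]:
    """Return problems with the numeric search_settings values (empty list if none)."""
    problems = []
    for key, (low, high) in SEARCH_SETTING_RANGES.items():
        value = settings.get(key)
        if value is not None and not (isinstance(value, int) and low <= value <= high):
            problems.append(f"{key} must be an integer between {low} and {high}")
    # Assumption: cluster_size must be a positive integer.
    cluster_size = settings.get("cluster_size")
    if cluster_size is not None and not (isinstance(cluster_size, int) and cluster_size > 0):
        problems.append("cluster_size must be a positive integer")
    return problems
```

`keyword_ngrams("I love natural language processing.", 2)` returns the same bigrams as the value-2 example above. The settings from the example request body (keyword_ngram 1, depth 3, cluster_size 100, top_k 1) pass `check_search_settings` without problems.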
Find more about these endpoints in our API documentation.
Once the mapper is created, you can use the GET /v1/mappers endpoint to view the mapping. You can update any of the mapped keys with the PUT /v1/mappers endpoint.
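If you call the mapper endpoints from code rather than from the sandbox, the three operations differ only in HTTP method and body. The sketch below builds the requests with Python's standard urllib.request but does not send them. BASE_URL and the authentication header names are hypothetical placeholders, not the real GAIP values, so check the sandbox for the actual ones. It also assumes PUT accepts the same {"key_map": {...}} body as POST.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder: use your own deployment URL.
BASE_URL = "https://api.example.com"
# Assumed header names and values; check the sandbox for what your project expects.
AUTH_HEADERS = {"X-Project-Key": "YOUR_PROJECT_KEY", "X-Api-Key": "YOUR_API_KEY"}


def mapper_request(method: str, key_map: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a request to the /v1/mappers endpoint.

    GET views the mapping, POST creates a mapper, and PUT updates it.
    """
    if method not in {"GET", "POST", "PUT"}:
        raise ValueError("method must be GET, POST, or PUT")
    headers = dict(AUTH_HEADERS)
    data = None
    if method in {"POST", "PUT"}:
        # Assumption: PUT takes the same {"key_map": {...}} body as POST.
        data = json.dumps({"key_map": key_map or {}}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/v1/mappers", data=data, headers=headers, method=method
    )
```

For example, `mapper_request("POST", {"item_id": "item_id", "title": "title"})` builds a create request. Passing the result to `urllib.request.urlopen` sends it. Keeping request construction separate from sending makes it easy to inspect the exact body before calling the API.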
User Behavior mapping
As with item mapping, user behavior data has its own set of default keys.
Note that this step is required only if you want to import historical user behavior data from CSV files. If you use our data collection endpoints to collect data from now on, it is not required.
You can find the endpoints for user mapping under the "Historical User Data Collection" section in the sandbox.
To implement this, follow the same steps as above.
However, note that there are four sets of endpoints in this case: browsing history, purchase history, rating history, and user details. You must create a mapper for each type of data you want to import.
Index creation
Indices are the mechanism for organizing data, similar to a database in a relational database system.
In this step, you create the indices. The recommender solution needs multiple indices to run, and these indices define the schemas that hold your data.
There are three endpoints:
- POST /v1/index/create: creates indices to hold your data.
- DELETE /v1/index/delete: deletes indices.
- POST /v1/reindex: creates an index with new mappings and settings, and creates an alias for the new index.
Create Index
Use the POST /v1/index/create endpoint to create indices for your project.
Enter your project key and API key and click Execute to create the indices your project requires. Note that this endpoint throws an error if the mapping from the previous step is not set up correctly.
After a successful execution, all the necessary indices are created. The response might say that the item_index is being created in the background and include a background task_id. To check the status of the item index creation, pass that task_id to the GET /v1/tasks/{task_id} endpoint.
Confirm that the task completed successfully.
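Because the item index is built in the background, a script usually polls the task until it finishes. The sketch below shows a simple polling loop. fetch_status is a hypothetical function you supply that calls GET /v1/tasks/{task_id} and returns the task's status string. The status values "SUCCESS" and "FAILURE" are assumptions, since the documentation does not list them, so adjust them to what the endpoint actually returns.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[str], str],
    task_id: str,
    done_statuses: frozenset = frozenset({"SUCCESS"}),  # assumed status value
    failed_statuses: frozenset = frozenset({"FAILURE"}),  # assumed status value
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Poll a background task via fetch_status until it succeeds, fails, or times out."""
    for _ in range(max_attempts):
        status = fetch_status(task_id)
        if status in done_statuses:
            return status
        if status in failed_statuses:
            raise RuntimeError(f"task {task_id} failed with status {status}")
        # Not finished yet: wait before asking again.
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} checks")
```

Injecting fetch_status keeps the loop independent of any HTTP client, and it lets you test the loop with a fake that returns a fixed sequence of statuses.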
Delete Index
You can delete an existing index or indices with the DELETE /v1/index/delete endpoint.
Available index_type values are items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, logs, gpt_dataset, gpt_dataset_meta, questionnaire, questionnaire_mapper, and questionnaire_request.
For example, if you only want to delete the items index, your request body should look like this:
{
"index_type": "items"
}
If you delete an index, make sure to re-create it.
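A typo in index_type is easy to make, so building the delete body from the list of available values can help. delete_index_body is a hypothetical helper, not part of the API. It checks the name against the values listed for DELETE /v1/index/delete and returns the JSON body in the same shape as the example above.

```python
import json

# index_type values listed for DELETE /v1/index/delete.
DELETABLE_INDEX_TYPES = frozenset({
    "items", "image_features", "browse", "purchase", "ratings", "search",
    "stats", "settings", "user", "tasks", "logs", "gpt_dataset",
    "gpt_dataset_meta", "questionnaire", "questionnaire_mapper",
    "questionnaire_request",
})


def delete_index_body(index_type: str) -> str:
    """Return the JSON request body for DELETE /v1/index/delete."""
    if index_type not in DELETABLE_INDEX_TYPES:
        raise ValueError(f"unknown index_type: {index_type!r}")
    return json.dumps({"index_type": index_type})
```

`delete_index_body("items")` produces the same body as the example above. After the delete succeeds, re-create the index with POST /v1/index/create.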
Reindex
Reindexing refers to the process of copying documents from one index to another. This process can include filtering source documents based on a specific query or retrieving documents from a remote cluster. Reindexing allows you to modify the settings and mappings of the destination index.
In Elasticsearch, reindexing is the process of copying data from one index to another, either within the same cluster or to a different cluster. This is useful in various scenarios, such as:
- Updating index mappings: Create a new index with updated mappings and reindex data from the old index to the new one.
- Moving data between indexes: Reindex data from a source index to a destination index.
- Adding updated data: Reindex with updated data to an existing index.
- Changing shard count: Reindex data to a new index with the desired shard count.
Use the reindex API to copy data from one index to another.
Request endpoint: POST /v1/reindex
Here is an example of how to pass mappings and settings in the request body:
{
"index_type": "items",
"mappings": {
"settings": {
"analysis": {
"char_filter": {
"normalize": {
"type": "icu_normalizer",
"name": "nfkc",
"mode": "compose"
}
},
"tokenizer": {
"ja_kuromoji_tokenizer": {
"mode": "search",
"type": "kuromoji_tokenizer",
"discard_compound_token": "true",
"user_dictionary_rules": []
},
"ja_ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 3,
"token_chars": [
"letter",
"digit"
]
}
},
"filter": {
"ja_index_synonym": {
"type": "synonym",
"lenient": "false",
"synonyms": []
}
},
"analyzer": {
"ja_kuromoji_index_analyzer": {
"type": "custom",
"char_filter": [
"normalize"
],
"tokenizer": "ja_kuromoji_tokenizer",
"filter": [
"kuromoji_baseform",
"kuromoji_part_of_speech",
"ja_index_synonym",
"cjk_width",
"ja_stop",
"kuromoji_stemmer",
"lowercase"
]
},
"ja_kuromoji_search_analyzer": {
"type": "custom",
"char_filter": [
"normalize"
],
"tokenizer": "ja_kuromoji_tokenizer",
"filter": [
"kuromoji_baseform",
"kuromoji_part_of_speech",
"cjk_width",
"ja_stop",
"kuromoji_stemmer",
"lowercase"
]
},
"ja_ngram_index_analyzer": {
"type": "custom",
"char_filter": [
"normalize"
],
"tokenizer": "ja_ngram_tokenizer",
"filter": [
"lowercase"
]
},
"ja_ngram_search_analyzer": {
"type": "custom",
"char_filter": [
"normalize"
],
"tokenizer": "ja_ngram_tokenizer",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"item": {
"properties": {
"{title}": {
"type": "text",
"search_analyzer": "ja_kuromoji_search_analyzer",
"analyzer": "ja_kuromoji_index_analyzer",
"fields": {
"ngram": {
"type": "text",
"search_analyzer": "ja_ngram_search_analyzer",
"analyzer": "ja_ngram_index_analyzer"
}
}
},
"{second_title}": {
"type": "text",
"search_analyzer": "ja_kuromoji_search_analyzer",
"analyzer": "ja_kuromoji_index_analyzer",
"fields": {
"ngram": {
"type": "text",
"search_analyzer": "ja_ngram_search_analyzer",
"analyzer": "ja_ngram_index_analyzer"
}
}
},
"{third_title}": {
"type": "text",
"search_analyzer": "ja_kuromoji_search_analyzer",
"analyzer": "ja_kuromoji_index_analyzer",
"fields": {
"ngram": {
"type": "text",
"search_analyzer": "ja_ngram_search_analyzer",
"analyzer": "ja_ngram_index_analyzer"
}
}
},
"{description}": {
"type": "text",
"search_analyzer": "ja_kuromoji_search_analyzer",
"analyzer": "ja_kuromoji_index_analyzer",
"fields": {
"ngram": {
"type": "text",
"search_analyzer": "ja_ngram_search_analyzer",
"analyzer": "ja_ngram_index_analyzer"
}
}
},
"{price}": {
"type": "float"
},
"{availability}": {
"type": "boolean"
}
}
}
}
}
}
}
Available index_type values are items, image_features, browse, purchase, ratings, search, stats, settings, user, tasks, and logs. When you define the mappings object, use the same keys as in the item mapper you created with the POST /v1/mappers API. In the example above, replace placeholders such as {title} and {price} with the corresponding keys from your item mapper.
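Replacing the field-name placeholders by hand is error-prone when the mapping has many fields. The sketch below does the substitution automatically. It assumes that each {name} placeholder refers to the key_map entry with that name in your item mapper, for example {title} becomes whatever key_map["title"] holds. resolve_field_names is a hypothetical helper, not part of the API.

```python
import re
from typing import Any

# Matches whole field names such as "{title}" or "{second_title}".
PLACEHOLDER = re.compile(r"^\{(\w+)\}$")


def resolve_field_names(properties: dict[str, Any], key_map: dict[str, Any]) -> dict[str, Any]:
    """Replace "{name}" field names with the catalog keys from the item mapper's key_map."""
    resolved: dict[str, Any] = {}
    for name, spec in properties.items():
        match = PLACEHOLDER.match(name)
        if match:
            default_key = match.group(1)
            mapped = key_map.get(default_key)
            if not isinstance(mapped, str):
                raise KeyError(f"{default_key!r} is not mapped to a catalog key")
            name = mapped
        # Recurse into nested objects such as "item": {"properties": {...}}.
        if isinstance(spec, dict) and isinstance(spec.get("properties"), dict):
            spec = {**spec, "properties": resolve_field_names(spec["properties"], key_map)}
        resolved[name] = spec
    return resolved
```

Apply it to the inner mappings["properties"] object before sending the body to POST /v1/reindex. Field names without braces, such as item, are left unchanged.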
You might not need analyzers or tokenizers for all indices. You can leave the settings field empty if it is not required. Here is an example:
{
"index_type": "search",
"mappings": {
"settings": {},
"mappings": {
"properties": {
"date": {
"type": "date"
}
}
}
}
}