Analytics (a.k.a. KPIs)
This document presents analytics-related logic. KPIs (Key Performance Indicators) are the subset of analytics that is of special interest to our customers.
Pipeline
The various steps are presented, followed by a complete end-to-end example of the pipeline.
Task Preparation
Before any analytics task can be run, the API needs specific information to be present in the database’s metadata and in the table’s metadata.
Database Metadata
The main pieces of information are the interaction types (which are specific to each database) and how they relate to the analytics to calculate.
Each Interaction Types Group (ITG) groups all interaction types that play a similar role with regard to analytics. For example, an e-commerce database could have two types of interactions that we would interpret the same way: purchased and bought. We’d add both interaction types to the ITG ecommerce:purchase, on which analytics regarding purchases are calculated.
All existing ITGs are presented in this table:

ITG | Description
---|---
ecommerce:click | Click
ecommerce:purchase | Purchase
ecommerce:add_to_cart | Adding to cart
ecommerce:remove_from_cart | Removing from cart
conversion:from | Initial state of a conversion (ex: view of a recommended item)
conversion:to | Final state of a conversion (ex: purchase on same item)
When updating the database analytics metadata with the endpoint PATCH databases/current/, the keys related to an analytics task are:

- groups_interactions_types (required): an object mapping strings to lists of strings, where keys name the analytics we can calculate (the ITGs) and values list interaction types present in the database. Every individual key within the object is optional.
- item_property_price (optional): when an interaction type corresponds to a purchase and the price of the items is available as an item-property, this key holds that item-property’s name. This item-property should be numerical and non-repeated.
- session_duration (optional): the number of seconds a session consists of. Must be between 30 and 172800 seconds (2 days).
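As a sketch of what this update can look like from a client, here is a minimal Python example using the requests library. The base URL and the X-API-Key header are illustrative assumptions, not part of this document; adapt them to your deployment.

import requests

# Hypothetical base URL and authentication header (assumptions for illustration).
BASE_URL = "https://api.example.com"
HEADERS = {"X-API-Key": "my-api-key"}

# Map this database's interaction types to ITGs and set the optional keys.
payload = {
    "metadata": {
        "analytics": {
            "groups_interactions_types": {
                "ecommerce:click": ["clicked"],
                "ecommerce:purchase": ["bought", "purchased"],
            },
            "item_property_price": "product_price",  # numerical, non-repeated item-property
            "session_duration": 180,  # in seconds; must be between 30 and 172800
        }
    },
    "preserve": True,  # as in the pipeline example below
}
resp = requests.patch(f"{BASE_URL}/databases/current/", json=payload, headers=HEADERS)
resp.raise_for_status()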
Table Metadata
Analytics reports are stored in a pre-created table with the 'LOCAL' connection, using the endpoint PATCH connections/<str:connection>/tables/<str:name>/. This table may contain analytics metadata used during the analytics calculation task:

- segmentation_id (optional): the ID of a segmentation; see Analytics Segmentation (a.k.a. Split) for more.
- start_timestamp (optional): timestamp before which interactions are skipped in the analytics calculation.
- end_timestamp (optional): timestamp after which interactions are skipped in the analytics calculation.
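A minimal sketch of pre-creating such a table, reusing the assumed BASE_URL and HEADERS from the previous snippet:

# Pre-create the report table on the LOCAL connection, with optional
# analytics metadata restricting which interactions are considered.
table_payload = {
    "resource": "analytics_report",
    "resource_params": {"columns_mapping": {}},
    "connection_table_params": {"file_format": "dat", "options": {}},
    "connection_table_metadata": {
        "analytics": {
            "segmentation_id": "seg123",  # optional
            "start_timestamp": 12345.0,   # optional: skip earlier interactions
            "end_timestamp": 54321.0,     # optional: skip later interactions
        }
    },
}
resp = requests.patch(
    f"{BASE_URL}/connections/LOCAL/tables/my_table_1.json/",
    json=table_payload,
    headers=HEADERS,
)
resp.raise_for_status()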
Task Trigger
Analytics calculation background tasks are triggered using the endpoint POST tasks/private/analytics_report_export/; the output table must be specified.
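Continuing the same sketched client, triggering the task for the table created above:

# Trigger the analytics calculation task, pointing it at the output table.
task_payload = {
    "context_task": {
        "output_table": {"connection": "LOCAL", "name": "my_table_1.json"}
    }
}
resp = requests.post(
    f"{BASE_URL}/tasks/private/analytics_report_export/",
    json=task_payload,
    headers=HEADERS,
)
resp.raise_for_status()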
Report Fetching
To list all analytics_report tables and their metadata, use the endpoint GET tables/. When the background task is completed, the analytics reports can be obtained using the endpoint GET storage-signed-urls/local-tables/<str:resource>/download/ with resource analytics_report.
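Under the same assumptions as the previous snippets, listing the report tables and requesting a signed download URL could look like:

# List analytics_report tables on the LOCAL connection.
resp = requests.get(
    f"{BASE_URL}/tables/",
    params={"connection": "LOCAL", "resource": "analytics_report"},
    headers=HEADERS,
)
resp.raise_for_status()
tables = resp.json()["tables"]

# Request a short-lived signed URL for the first report table.
resp = requests.get(
    f"{BASE_URL}/storage-signed-urls/local-tables/analytics_report/download/",
    params={"name": tables[0]["name"]},
    headers=HEADERS,
)
resp.raise_for_status()
signed_url = resp.json()["url"]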
Example
Here is an example of the full pipeline, using the default segmentation.
Let’s update the database metadata, knowing that the current database has:

- interaction types: clicked, bought, purchased, remove_from_cart, added_to_cart, view_recommended;
- an item-property containing the price of items: product_price;
- user sessions lasting approximately 3 minutes.
UPDATE DATABASE’S ANALYTICS METADATA

PATCH databases/current/ HTTP/1.1
Content-Type: text/javascript

{
  "metadata": {
    "analytics": {
      "groups_interactions_types": {
        "ecommerce:click": ["clicked"],
        "ecommerce:purchase": ["bought", "purchased"],
        "ecommerce:remove_from_cart": ["remove_from_cart"],
        "ecommerce:add_to_cart": ["added_to_cart"],
        "conversion:from": ["view_recommended"],
        "conversion:to": ["purchased", "bought"]
      },
      "item_property_price": "product_price",
      "session_duration": 180
    }
  },
  "preserve": true
}
Let’s enforce that the segmentation used is the default one by creating a new segmentation:
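A sketch of that creation call under the same client assumptions as above; the payload shape follows the A/B test example later in this document, and the response field holding the new segmentation’s ID is an assumption:

# Create a minimal segmentation (assumed shape: a bare "analytics" leaf).
seg_payload = {
    "name": "my_default_segmentation",
    "segmentation": {"type": "analytics"},
}
resp = requests.post(
    f"{BASE_URL}/analytics/segmentations/params/",
    json=seg_payload,
    headers=HEADERS,
)
resp.raise_for_status()
segmentation_id = resp.json().get("id")  # assumed response field, e.g. "seg123"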
Let’s create a table with this segmentation, and specify some time-frame for the interactions to consider:
CREATE TABLE WITH METADATA

PATCH connections/LOCAL/tables/my_table_1.json/ HTTP/1.1
Content-Type: text/javascript

{
  "resource": "analytics_report",
  "resource_params": {"columns_mapping": {}},
  "connection_table_params": {"file_format": "dat", "options": {}},
  "connection_table_metadata": {
    "analytics": {
      "segmentation_id": "seg123",
      "start_timestamp": 12345.0,
      "end_timestamp": 54321.0
    }
  }
}
Let’s trigger the task corresponding to this table:
TRIGGER ANALYTICS TASK

POST tasks/private/analytics_report_export/ HTTP/1.1
Content-Type: text/javascript

{
  "context_task": {
    "output_table": {"connection": "LOCAL", "name": "my_table_1.json"}
  }
}
Later, let’s list the tables to retrieve the name we forgot:
LIST ANALYTICS_REPORT TABLES

GET tables/?connection=LOCAL&resource=analytics_report HTTP/1.1
Content-Type: text/javascript

{
  "has_next": false,
  "next_page": 2,
  "tables": [
    {
      "name": "my_table_1.json",
      "connection": "LOCAL",
      "resource": "analytics_report",
      "status": "available",
      "created_timestamp": 123456789,
      "modified_timestamp": 123456789,
      "connection_table_params": {
        "file_format": "dat",
        "options": {}
      },
      "resource_params": {
        "columns_mapping": {}
      },
      "connection_table_metadata": {
        "segmentation_id": "seg123",
        "start_timestamp": 12345.0,
        "end_timestamp": 54321.0
      }
    }
  ]
}
Using this table name, let’s generate a short-lived URL to fetch the report:
GET URL TO THE ANALYTICS REPORT

GET storage-signed-urls/local-tables/analytics_report/download/?name=my_table_1.json HTTP/1.1
Content-Type: text/javascript

{
  "url": "https://stora...com/analytics_report/Up131.../my_table_1.json?X-Goog..."
}
The JSON report can then be read in a browser, or fetched using a request.
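For instance, a minimal Python sketch downloading the report through the signed URL obtained in the earlier snippets (the signed URL itself requires no API headers):

# Download and parse the JSON report.
report = requests.get(signed_url).json()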
[WIP migrate documentation on how to use the API from Confluence to here]
Analytics Metrics
A minimalistic report (without segmentation given to the analytics calculation task) has the following structure:
{
  "segmentation": {
    "type": "analytics",
    "analytics": [
      {"name": "<analytics_name>", "value": 123.0}
    ]
  },
  "database_metadata_snapshot": {
    "groups_interactions_types": {
      "<ITG>": ["<interaction_type>"]
    },
    "item_property_price": "<item_property>",
    "session_duration": 12.0
  }
}
The database metadata snapshot is added to facilitate the verification of analytics; for brevity we skip it in the following examples.
Analytics, in the above list of name/value maps, come in 3 flavors:

- a singly-defined analytics, such as users_count;
- an analytics bearing various statistics, generating multiple numbers named following the pattern '{statistic}_{analytics_name}', where statistic belongs to min, max, mean, std, sum and analytics_name belongs to average_purchase_value_per_user, cart_abandonment_rate_per_user, interaction_count_per_user, purchase_of_missing_item_count, purchase_value_per_user, remove_from_cart_rate_per_user;
- an analytics with a parameter, named '{parameter}param_{analytics_name}', for example the number of days over which we check customer retention (this analytics is not available yet, but is provided for the example).
As many analytics as possible are calculated. If an analytics would be infinite or NaN, it is removed from the list; this includes analytics that cannot be calculated (such as purchase analytics when items don’t have a price property). For example:
{
  "segmentation": {
    "type": "analytics",
    "analytics": [
      {"name": "mean_purchase_value_per_user", "value": 60.0},
      {"name": "max_purchase_value_per_user", "value": 110.0},
      {"name": "min_purchase_value_per_user", "value": 10.0},
      {"name": "std_purchase_value_per_user", "value": 50.0},
      {"name": "sum_purchase_value_per_user", "value": 120.0},
      {"name": "users_count", "value": 2}
    ]
  }
}
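A small sketch of how such a report might be consumed in Python, indexing the list of name/value maps by name (report is assumed to hold the parsed JSON above):

# Index the analytics by name for direct lookup.
values = {
    entry["name"]: entry["value"]
    for entry in report["segmentation"]["analytics"]
}
print(values["mean_purchase_value_per_user"])  # 60.0
print(values["users_count"])                   # 2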
Analytics Segmentation (a.k.a. Split)
It is possible to segment (a.k.a. split) analytics by groups of users, such as users satisfying certain conditions, or users in the respective groups A and B of an A/B test.
Splits can be nested in any order. This means that segmenting analytics can be arbitrarily complex.
Analytics segmentation parameters are first created using the endpoint POST analytics/segmentations/params/. Then an analytics report is created by triggering the background task POST tasks/private/analytics_report_export/ referring to the previous segmentation.
The structure of a segmented analytics report follows the structure of the segmentation parameters. The report simply adds the analytics object to every leaf. The report also adds the key skip to all splits, containing the analytics report when the split is not applied.
[WIP migrate documentation on how to use the API from Confluence to here]
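Since splits can be nested in any order, a report consumer typically walks the tree recursively. Here is a minimal Python sketch of such a traversal; the function name and path convention are illustrative, not part of the API, and report is assumed to hold a parsed segmented report:

def iter_leaf_analytics(node, path=()):
    """Yield (path, name, value) for every analytics leaf of a segmented report."""
    if node.get("type") == "analytics":
        for entry in node.get("analytics", []):
            yield path, entry["name"], entry["value"]
        return
    # Non-leaf node: recurse into the split's child segmentations and the "skip" branch.
    for key, child in node.get("split", {}).items():
        if isinstance(child, dict) and "type" in child:
            yield from iter_leaf_analytics(child, path + (key,))
    if "skip" in node:
        yield from iter_leaf_analytics(node["skip"], path + ("skip",))

for path, name, value in iter_leaf_analytics(report["segmentation"]):
    print("/".join(path), name, value)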
A/B Test Split
A split of type ab_test refers to the A/B test parameters created using the endpoint POST ab-tests/params/, and branches into three child segmentations:

- group_a for users belonging to group A,
- group_b for users belonging to group B,
- skip for users in neither group A nor B.
These splits are complementary to A/B Test Scenarios, which are used to evaluate the A/B test at runtime.
Some additional information is added to the report for convenience:

- the A/B test’s name and probability_a,
- the A/B test scenarios that refer to this A/B test, and the scenario_a and scenario_b they also refer to. WARNING: all such scenarios will be included regardless of whether they were actually evaluated at runtime!
CREATE SEGMENTATION PARAMETERS WITH A/B TEST

POST analytics/segmentations/params/ HTTP/1.1
Content-Type: text/javascript

{
  "name": "my_ab_test_segmentation",
  "segmentation": {
    "type": "ab_test",
    "split": {
      "ab_test_id": "ab_test_123",
      "group_a": {"type": "analytics"},
      "group_b": {"type": "analytics"}
    },
    "skip": {"type": "analytics"}
  }
}
The corresponding analytics report then looks like:

{
  "segmentation": {
    "type": "ab_test",
    "split": {
      "name": "ab_test_123",
      "probability_a": 0.666,
      "ab_test_scenarios": [
        {
          "name": "scenario_ab_test123",
          "reco_type": "profile_to_items",
          "scenario_a": {"name": "scenario_123a"},
          "scenario_b": {"name": "scenario_123b"}
        }
      ],
      "group_a": {
        "type": "analytics",
        "analytics": [{"name": "analytics_name", "value": 2.0}]
      },
      "group_b": {
        "type": "analytics",
        "analytics": [{"name": "analytics_name", "value": 0.0}]
      }
    },
    "skip": {
      "type": "analytics",
      "analytics": [{"name": "analytics_name", "value": 0.0}]
    }
  }
}
User Condition Split
WORK-IN-PROGRESS
See also Condition Scenarios.
Cohorts Split
WORK-IN-PROGRESS
Property Explode Split
WORK-IN-PROGRESS