Analytics (a.k.a. KPIs)

This document presents analytics-related logic. KPIs (Key Performance Indicators) are the subset of analytics that are of special interest to our customers.

Pipeline

The various steps are presented, followed by a rather complete example of the pipeline.

Task Preparation

Before any analytics task can be run, the API needs some information to be present in the database’s metadata and in a table’s metadata.

Database Metadata

The main piece of information is the interaction types (which are specific to each database), and how they relate to the analytics to calculate.

Each Interaction Types Group (ITG) regroups all interaction types that play a similar role with regard to analytics. For example, an e-commerce database could have two types of interactions that we would interpret the same way: purchased and bought. We’d add both interaction types to the ITG ecommerce:purchase, on which analytics regarding purchases are calculated.

All existing ITGs are presented in this table:

ITG                           Description
ecommerce:click               Click
ecommerce:purchase            Purchase
ecommerce:add_to_cart         Adding to cart
ecommerce:remove_from_cart    Removing from cart
conversion:from               Initial state of a conversion (ex: reco_seen)
conversion:to                 Final state of a conversion (ex: purchase on same item)

The database analytics metadata is updated with the endpoint PATCH databases/current/. The keys related to an analytics task are:

  • groups_interactions_types (required): an object mapping strings to lists of strings, where keys are ITGs related to the analytics we can calculate and values are interaction types present in the database. Every key is optional.

  • item_property_price (optional): if an interaction type corresponds to a purchase and the price of the items is available as an item-property, this key is that item-property’s name. The item-property should be numerical and non-repeated.

  • session_duration (optional): the number of seconds a session consists of. Must be between 30 and 172800 seconds (2 days).

Table Metadata

Analytics reports are stored in a pre-created table on the 'LOCAL' connection, using the endpoint PATCH connections/<str:connection>/tables/<str:name>/.

This table may contain analytics metadata used during the analytics calculation task:

  • segmentation_id (optional): the ID of a segmentation; see Analytics Segmentation (a.k.a Split) for more.

  • start_timestamp (optional): timestamp before which interactions are skipped in the analytics calculation.

  • end_timestamp (optional): timestamp after which interactions are skipped in the analytics calculation.

Task Trigger

Analytics calculation background tasks are triggered using the endpoint POST tasks/private/analytics_report_export/; the table must be specified.

Report Fetching

To list all analytics_report tables and their metadata, use the endpoint GET tables/.

When the background task is completed, the analytics reports can be obtained using the endpoint GET storage-signed-urls/local-tables/<str:resource>/download/ with resource analytics_report.
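
As a client-side illustration, here is a minimal sketch of this flow using Python and the requests library. The base URL, the authentication header, and the idea of polling the table status until it is "available" are assumptions of this sketch; the endpoint paths and response fields come from the examples in this document.

PYTHON SKETCH: WAIT FOR THE REPORT TABLE AND GET A SIGNED URL
import time
import requests

BASE_URL = "https://api.example.com"              # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}     # hypothetical auth header

def wait_for_table(name, connection="LOCAL", poll_seconds=30):
    """Poll the table listing until the given analytics_report table is available."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/tables/",
            params={"connection": connection, "resource": "analytics_report"},
            headers=HEADERS,
        )
        resp.raise_for_status()
        tables = {t["name"]: t for t in resp.json()["tables"]}
        if tables.get(name, {}).get("status") == "available":
            return tables[name]
        time.sleep(poll_seconds)

def get_report_url(name):
    """Ask for a short-lived signed URL pointing to the report file."""
    resp = requests.get(
        f"{BASE_URL}/storage-signed-urls/local-tables/analytics_report/download/",
        params={"name": name},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()["url"]

# Usage: wait until the example table is ready, then obtain its signed URL.
wait_for_table("my_table_1.json")
signed_url = get_report_url("my_table_1.json")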

Example

Here is an example of the full pipeline, using the default segmentation.

Let’s update the database metadata, knowing that the current database has:

  • for interaction types: clicked, bought, purchased, remove_from_cart, added_to_cart, view_recommended;

  • for item-property containing the price of items: product_price;

  • user sessions lasting approximately 3 minutes.

    UPDATE DATABASE’S ANALYTICS METADATA
    PATCH databases/current/ HTTP/1.1
    Content-Type: text/javascript
    
    {
        "metadata": {
            "analytics": {
                "groups_interactions_types": {
                    "ecommerce:click": ["clicked"],
                    "ecommerce:purchase": ["bought", "purchased"],
                    "ecommerce:remove_from_cart": ["remove_from_cart"],
                    "ecommerce:add_to_cart": ["added_to_cart"],
                    "conversion:from": ["view_recommended"],
                    "conversion:to": ["purchased", "bought"]
                },
                "item_property_price": "product_price",
                "session_duration": 180
            }
        },
        "preserve": true
    }
    

Let’s make sure the default segmentation is used by explicitly creating a new segmentation:

CREATE SEGMENTATION (optional)
POST analytics/segmentations/params/ HTTP/1.1
Content-Type: text/javascript

{
  "name": "my_analytics_segmentation",
  "segmentation": {"type": "analytics"}
}
RESPONSE WITH SEGMENTATION ID
{
    "id": "seg123"
}

Let’s create a table with this segmentation, and specify some time-frame for the interactions to consider:

CREATE TABLE WITH METADATA
PATCH connections/LOCAL/tables/my_table_1.json/ HTTP/1.1
Content-Type: text/javascript

{
    "resource": "analytics_report",
    "resource_params": {"columns_mapping": {}},
    "connection_table_params": {"file_format": "dat", "options": {}},
    "connection_table_metadata": {
        "analytics": {
            "segmentation_id": "seg123",
            "start_timestamp": 12345.0,
            "end_timestamp": 54321.0
        }
    }
}

Let’s trigger the task corresponding to this table:

TRIGGER ANALYTICS TASK
POST tasks/private/analytics_report_export/ HTTP/1.1
Content-Type: text/javascript

{
    "context_task": {
        "output_table": {"connection": "LOCAL", "name": "my_table_1.json"}
    }
}

Later, let’s list the tables to retrieve the name we forgot:

LIST ANALYTICS_REPORT TABLES
GET tables/?connection=LOCAL&resource=analytics_report HTTP/1.1
Content-Type: text/javascript

{
    "has_next": false,
    "next_page": 2,
    "tables": [
        {
            "name": "my_table_1.json",
            "connection": "LOCAL",
            "resource": "analytics_report",
            "status": "available",
            "created_timestamp": 123456789,
            "modified_timestamp": 123456789,
            "connection_table_params": {
                "file_format": "dat",
                "options": {}
            },
            "resource_params": {
                "columns_mapping": {}
            },
            "connection_table_metadata": {
                "segmentation_id": "seg123",
                "start_timestamp": 12345.0,
                "end_timestamp": 54321.0
            }

        }
    ]
}

Using this table name, let’s generate a short-lived URL to fetch the report:

GET URL TO THE ANALYTICS REPORT
GET storage-signed-urls/local-tables/analytics_report/download/?name=my_table_1.json HTTP/1.1
Content-Type: text/javascript

{
    "url": "https://stora...com/analytics_report/Up131.../my_table_1.json?X-Goog..."
}

The JSON report can then be read in a browser, or fetched programmatically with an HTTP request, as in the sketch below.
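
As a minimal sketch (assuming Python and the requests library), the report can be fetched from the signed URL returned above:

PYTHON SKETCH: DOWNLOAD THE REPORT
import requests

# Placeholder: use the "url" field of the previous response.
signed_url = "<signed URL from the previous response>"
report = requests.get(signed_url).json()   # the analytics report as a Python dict
print(report["segmentation"]["type"])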

[WIP migrate documentation on how to use the API from Confluence to here]

Analytics Metrics

A minimalistic report (without segmentation given to the analytics calculation task) has the following structure:

MINIMALISTIC REPORT FORMAT
{
    "segmentation": {
        "type": "analytics",
        "analytics": [
          {"name": "<analytics_name>", "value": 123.0}
        ]
    },
    "database_metadata_snapshot": {
        "groups_interactions_types": {
            "<ITG>": ["<interaction_type>"]
        },
        "item_property_price": "<item_property>",
        "session_duration": 12.0
    }
}

The database metadata snapshot is added to facilitate the verification of analytics; for brevity we skip it in the following examples.

Analytics, in the above list of name/value maps, come in 3 flavors:

  • a plain analytics producing a single value, such as users_count

  • an analytics bearing various statistics. These analytics generate multiple numbers, named following the pattern '{statistic}_{analytics_name}', where statistic is one of min, max, mean, std, sum, and analytics_name is one of average_purchase_value_per_user, cart_abandonment_rate_per_user, interaction_count_per_user, purchase_of_missing_item_count, purchase_value_per_user, remove_from_cart_rate_per_user

  • an analytics with a parameter, named '{parameter}param_{analytics_name}'. For example, the number of days over which we check customer retention (this analytics is not available yet, but is given as an illustration)

As many analytics as possible are calculated. If an analytics would be infinite or NaN, it is removed from the list. This includes analytics that cannot be calculated at all (such as purchase-related analytics when items don’t have a price property).

MINIMALISTIC REPORT WITH PURCHASE VALUE STATISTICS
{
    "segmentation": {
        "type": "analytics",
        "analytics": [
            {"name": "mean_purchase_value_per_user", "value": 60.0},
            {"name": "max_purchase_value_per_user", "value": 110.0},
            {"name": "min_purchase_value_per_user", "value": 10.0},
            {"name": "std_purchase_value_per_user", "value": 50.0},
            {"name": "sum_purchase_value_per_user", "value": 120.0},
            {"name": "users_count", "value": 2}
        ]
    }
}
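
As a sketch of how such a report might be consumed, the flat name/value list can be indexed by analytics name. The report literal below mirrors the example above; in practice it would be the downloaded JSON. Note that a given statistic may be absent when it could not be calculated.

PYTHON SKETCH: READ THE FLAT ANALYTICS LIST
# `report` mirrors the example above; in practice it is the downloaded JSON report.
report = {
    "segmentation": {
        "type": "analytics",
        "analytics": [
            {"name": "mean_purchase_value_per_user", "value": 60.0},
            {"name": "users_count", "value": 2},
        ],
    }
}

# Index the flat name/value list by analytics name.
analytics = {a["name"]: a["value"] for a in report["segmentation"]["analytics"]}
users_count = analytics["users_count"]
# Statistics follow the '{statistic}_{analytics_name}' pattern and may be missing
# when they could not be calculated (e.g. no item price property):
purchase_stats = {
    stat: analytics.get(f"{stat}_purchase_value_per_user")
    for stat in ("min", "max", "mean", "std", "sum")
}
print(users_count, purchase_stats)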

Analytics Segmentation (a.k.a Split)

It is possible to segment (a.k.a. split) analytics by groups of users, such as users satisfying certain conditions, or users in the respective groups A and B of an A/B test.

Splits can be nested in any order. This means that segmenting analytics can be arbitrarily complex. Analytics segmentation parameters are first created using the endpoint POST analytics/segmentations/params/.

Then an analytics report is created by triggering the background task POST tasks/private/analytics_report_export/ with a reference to the previously created segmentation.

The structure of a segmented analytics report follows the structure of the segmentation parameters. The report simply adds the analytics object to every leaf. The report also adds the key skip to every split, containing the analytics as if no split were applied.
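
As an illustration, here is a sketch of one way to traverse a segmented report and collect the analytics at every leaf. It assumes that leaves are the nodes of type analytics and that any other node nests its children under split (plus the skip key), as in the A/B test example below.

PYTHON SKETCH: COLLECT LEAF ANALYTICS OF A SEGMENTED REPORT
# Sketch: collect the analytics attached to every leaf of a segmented report.
# Assumes leaves are nodes of type "analytics" and that any other node nests its
# children under "split" (plus the "skip" key), as in the A/B test example below.
def collect_leaf_analytics(node, path=()):
    if node.get("type") == "analytics":
        yield path, {a["name"]: a["value"] for a in node.get("analytics", [])}
        return
    for key, child in node.get("split", {}).items():
        if isinstance(child, dict) and "type" in child:
            yield from collect_leaf_analytics(child, path + (key,))
    if "skip" in node:
        yield from collect_leaf_analytics(node["skip"], path + ("skip",))

# Usage (with `report` holding a downloaded segmented report):
# for path, values in collect_leaf_analytics(report["segmentation"]):
#     print("/".join(path) or "<root>", values)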

[WIP migrate documentation on how to use the API from Confluence to here]

A/B Test Split

A split of type ab_test refers to the A/B test parameters created using the endpoint POST ab-tests/params/, and branches into three child segmentations:

  • group_a for users belonging to the group A,

  • group_b for users belonging to the group B,

  • skip for users in either group A or B.

These splits are complementary to A/B Test Scenarios which are used to evaluate the A/B test at runtime.

Some additional information is added to the report for convenience:

  • the A/B test’s name and probability_a,

  • the A/B test scenarios that refer to this A/B test, together with the scenario_a and scenario_b they in turn refer to. WARNING: all such scenarios are included, regardless of whether they were actually evaluated at runtime!

CREATE SEGMENTATION PARAMETERS WITH A/B TEST
POST analytics/segmentations/params/ HTTP/1.1
Content-Type: text/javascript

{
  "name": "my_ab_test_segmentation",
  "segmentation": {
    "type": "ab_test",
    "split": {
        "ab_test_id": "ab_test_123",
        "group_a": {"type": "analytics"},
        "group_b": {"type": "analytics"}
    },
    "skip": {"type": "analytics"}
  }
}
REPORT USING SEGMENTATION PARAMETERS WITH A/B TEST
{
  "segmentation": {
    "type": "ab_test",
    "split": {
      "name": "ab_test_123",
      "probability_a": 0.666,
      "ab_test_scenarios": [
        {
          "name": "scenario_ab_test123",
          "reco_type": "profile_to_items",
          "scenario_a": {"name": "scenario_123a"},
          "scenario_b": {"name": "scenario_123b"}
        }
      ],
      "group_a": {
        "type": "analytics",
        "analytics": [{"name": "analytics_name", "value": 2.0}],
      },
      "group_b": {
        "type": "analytics",
        "analytics": [{"name": "analytics_name", "value": 0.0}]
      }
    },
    "skip": {
      "type": "analytics",
      "analytics": [{"name": "analytics_name", "value": 0.0}]
    }
  }
}
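
As a sketch (assuming report holds the JSON shown above, and using the example’s placeholder metric name analytics_name), the per-group values can be read out for a quick comparison:

PYTHON SKETCH: READ GROUP A/B VALUES
# Sketch: read one metric for groups A and B from an A/B test report.
def group_metric(node, name):
    """Look up one analytics value in a leaf node's name/value list."""
    return {a["name"]: a["value"] for a in node["analytics"]}.get(name)

# Usage, with `report` holding the JSON shown above (values 2.0 and 0.0):
# split = report["segmentation"]["split"]
# value_a = group_metric(split["group_a"], "analytics_name")
# value_b = group_metric(split["group_b"], "analytics_name")
# print(split["probability_a"], value_a, value_b)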

User Condition Split

WORK-IN-PROGRESS

See also Condition Scenarios.

Cohorts Split

WORK-IN-PROGRESS

Property Explode Split

WORK-IN-PROGRESS