insights from E-Commerce retail data set

We are using Bigquery as our data warehouse solution and using standard SQL as query language . For dataset we use Google’s Google Analytics logs of an merchants website.

You need to enable your bigquery account which has a daily limit and there after it is cost effective.

Click Navigation menu > BigQuery.

nav_bq.png

Click Done.

BigQuery public datasets are not displayed by default in the BigQuery web UI. To open the public datasets project, open https://console.cloud.google.com/bigquery?p=data-to-insights&page=ecommerce in a new browser window.

In the left pane, in the Resource section, click data-to-insights. In the right pane, click Pin Project.

pin_project.png

Explore ecommerce data

Problem :  Your data analyst team exported the Google Analytics logs for an ecommerce website into BigQuery and created a new table of all the raw ecommerce visitor session data.

Explore the all_sessions_raw table data:

  1. Click data-to-insights to expand the project.
  2. Click ecommerce.
  3. Click all_sessions_raw.

In the right pane, a section opens that provides 3 views of the table data:

  • Schema tab: Field name, Type, Mode, and Description; the logical constraints used to organize the data
  • Details tab: Table metadata
  • Preview tab: Table preview
  1. Click the Details tab to view the table metadata.
view.png

Identify duplicate rows

Seeing a sample amount of data may give you greater intuition for what is included in the dataset. To preview sample rows from the table without using SQL, click the preview tab.

Scan and scroll through the rows. There is no singular field that uniquely identifies a row, so you need advanced logic to identify duplicate rows.

Your query uses the SQL GROUP BY function on every field and counts (COUNT) where there are rows that have the same values across every field.

  • If every field is unique, the COUNT will return 1 as there are no other groupings of rows with the exact same value for all fields.
  • If there is a row with the same values for all fields, they will be grouped together and the COUNT will be greater than 1. The last part of the query is an aggregation filter using HAVING to only show the results that have a COUNT of duplicates greater than 1.
  1. Copy and paste the following query into the Query editor, then Run query to find duplicate records across all columns.
#standardSQL
SELECT COUNT(*) as num_duplicate_rows, * FROM
`data-to-insights.ecommerce.all_sessions_raw`
GROUP BY
fullVisitorId, channelGrouping, time, country, city, totalTransactionRevenue, transactions, timeOnSite, pageviews, sessionQualityDim, date, visitId, type, productRefundAmount, productQuantity, productPrice, productRevenue, productSKU, v2ProductName, v2ProductCategory, productVariant, currencyCode, itemQuantity, itemRevenue, transactionRevenue, transactionId, pageTitle, searchKeyword, pagePathLevel1, eCommerceAction_type, eCommerceAction_step, eCommerceAction_option
HAVING num_duplicate_rows > 1;

Analyze the new all_sessions table

In this section you use a deduplicated table called all_sessions.

Scenario: Your data analyst team has provided you with this query, and your schema experts have identified the key fields that must be unique for each record per your schema.

Run the query to confirm that no duplicates exist, this time in the all_sessions table:

#standardSQL
# schema: https://support.google.com/analytics/answer/3437719?hl=en
SELECT
fullVisitorId, # the unique visitor ID
visitId, # a visitor can have multiple visits
date, # session date stored as string YYYYMMDD
time, # time of the individual site hit  (can be 0 to many per visitor session)
v2ProductName, # not unique since a product can have variants like Color
productSKU, # unique for each product
type, # a visitor can visit Pages and/or can trigger Events (even at the same time)
eCommerceAction_type, # maps to ‘add to cart', ‘completed checkout'
eCommerceAction_step,
eCommerceAction_option,
  transactionRevenue, # revenue of the order
  transactionId, # unique identifier for revenue bearing transaction
COUNT(*) as row_count
FROM
`data-to-insights.ecommerce.all_sessions`
GROUP BY 1,2,3 ,4, 5, 6, 7, 8, 9, 10,11,12
HAVING row_count > 1 # find duplicates

The query returns zero records.

Note: In SQL, you can GROUP BY or ORDER BY the index of the column like using “GROUP BY 1” instead of “GROUP BY fullVisitorId”

query that shows total unique visitors

  1. Write this query in the editor:
#standardSQL
SELECT
  COUNT(*) AS product_views,
  COUNT(DISTINCT fullVisitorId) AS unique_visitors
FROM `data-to-insights.ecommerce.all_sessions`;

Now write a query that shows total unique visitors(fullVisitorID) by the referring site (channelGrouping):

#standardSQL
SELECT
  COUNT(DISTINCT fullVisitorId) AS unique_visitors,
  channelGrouping
FROM `data-to-insights.ecommerce.all_sessions`
GROUP BY channelGrouping
ORDER BY channelGrouping DESC;

Write a query to list all the unique product names (v2ProductName) alphabetically:

#standardSQL
SELECT
  (v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
GROUP BY ProductName
ORDER BY ProductName

Tip: In SQL, the ORDER BY clauses defaults to Ascending (ASC) A–>Z. If you want the reverse, try ORDER BY field_name DESC

Write a query to list the five products with the most views (product_views) from all visitors (include people who have viewed the same product more than once). Your query counts number of times a product (v2ProductName) was viewed (product_views), puts the list in descending order, and lists the top 5 entries:

Tip: In Google Analytics, a visitor can “view” a product during the following interaction types: ‘page’, ‘screenview’, ‘event’, ‘transaction’, ‘item’, ‘social’, ‘exception’, ‘timing’. For our purposes, simply filter for only type = ‘PAGE’.

#standardSQL
SELECT
  COUNT(*) AS product_views,
  (v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY v2ProductName
ORDER BY product_views DESC
LIMIT 5;
98024b791d0c01f4.png

Now refine the query to no longer double-count product views for visitors who have viewed a product many times. Each distinct product view should only count once per visitor.

WITH unique_product_views_by_person AS (
-- find each unique product viewed by each visitor
SELECT 
 fullVisitorId,
 (v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY fullVisitorId, v2ProductName )


-- aggregate the top viewed products and sort them
SELECT
  COUNT(*) AS unique_view_count,
  ProductName 
FROM unique_product_views_by_person
GROUP BY ProductName
ORDER BY unique_view_count DESC
LIMIT 5

with-clause-results.png

expand your previous query to include the total number of distinct products ordered and the total number of total units ordered (productQuantity):

#standardSQL
SELECT
  COUNT(*) AS product_views,
  COUNT(productQuantity) AS orders,
  SUM(productQuantity) AS quantity_product_ordered,
  v2ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY v2ProductName
ORDER BY product_views DESC
LIMIT 5;

Results

cefe4bb5b680216e.png

Expand the query to include the average amount of product per order (total number of units ordered/total number of orders, or SUM(productQuantity)/COUNT(productQuantity)).

#standardSQL
SELECT
  COUNT(*) AS product_views,
  COUNT(productQuantity) AS orders,
  SUM(productQuantity) AS quantity_product_ordered,
  SUM(productQuantity) / COUNT(productQuantity) AS avg_per_order,
  (v2ProductName) AS ProductName
FROM `data-to-insights.ecommerce.all_sessions`
WHERE type = 'PAGE'
GROUP BY v2ProductName
ORDER BY product_views DESC
LIMIT 5;

Results

37dd38ee9e9b3532.png

Calculate a conversion rate

Answer these questions:

  • How many distinct times was the product part of an order (either complete or incomplete order)?
  • How many total units of the product were part of orders (either complete or incomplete)?
  • Which product had the highest conversion rate?

#standardSQL

SELECT COUNT(*) AS product_views, COUNT(productQuantity) AS potential_orders, SUM(productQuantity) AS quantity_product_added, (COUNT(productQuantity) / COUNT(*)) AS conversion_rate, v2ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE LOWER(v2ProductName) NOT LIKE ‘%frisbee%’ GROUP BY v2ProductName HAVING quantity_product_added > 1000 ORDER BY conversion_rate DESC LIMIT 10;

Track visitor checkout progress

#standardSQL
SELECT 
  COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors,
  eCommerceAction_type
FROM `data-to-insights.ecommerce.all_sessions` 
GROUP BY eCommerceAction_type
ORDER BY eCommerceAction_type;

Bonus: You are given this mapping for the action type: Unknown = 0 Click through of product lists = 1 Product detail views = 2 Add product(s) to cart = 3 Remove product(s) from cart = 4 Check out = 5 Completed purchase = 6 Refund of purchase = 7 Checkout options = 8

Use a Case Statement to add a new dimension

#standardSQL
SELECT 
  COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors,
  eCommerceAction_type,
  CASE eCommerceAction_type
  WHEN '0' THEN 'Unknown'
  WHEN '1' THEN 'Click through of product lists'
  WHEN '2' THEN 'Product detail views'
  WHEN '3' THEN 'Add product(s) to cart'
  WHEN '4' THEN 'Remove product(s) from cart'
  WHEN '5' THEN 'Check out'
  WHEN '6' THEN 'Completed purchase'
  WHEN '7' THEN 'Refund of purchase'
  WHEN '8' THEN 'Checkout options'
  ELSE 'ERROR'
  END AS eCommerceAction_type_label
FROM `data-to-insights.ecommerce.all_sessions` 
GROUP BY eCommerceAction_type
ORDER BY eCommerceAction_type;

 Track abandoned carts from high quality sessions

#standardSQL # high quality abandoned carts SELECT #unique_session_id CONCAT(fullVisitorId,CAST(visitId AS STRING)) AS unique_session_id, sessionQualityDim, SUM(productRevenue) AS transaction_revenue, MAX(eCommerceAction_type) AS checkout_progress FROM `data-to-insights.ecommerce.all_sessions` WHERE sessionQualityDim > 60 # high quality session GROUP BY unique_session_id, sessionQualityDim HAVING checkout_progress = ‘3’ # 3 = added to cart AND (transaction_revenue = 0 OR transaction_revenue IS NULL)

thats end (GCP training)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s