We are using Bigquery as our data warehouse solution and using standard SQL as query language . For dataset we use Google’s Google Analytics logs of an merchants website.
You need to enable your bigquery account which has a daily limit and there after it is cost effective.
Click Navigation menu > BigQuery.
BigQuery public datasets are not displayed by default in the BigQuery web UI. To open the public datasets project, open https://console.cloud.google.com/bigquery?p=data-to-insights&page=ecommerce in a new browser window.
In the left pane, in the Resource section, click data-to-insights. In the right pane, click Pin Project.
Explore ecommerce data
Problem : Your data analyst team exported the Google Analytics logs for an ecommerce website into BigQuery and created a new table of all the raw ecommerce visitor session data.
all_sessions_raw table data:
- Click data-to-insights to expand the project.
- Click ecommerce.
- Click all_sessions_raw.
In the right pane, a section opens that provides 3 views of the table data:
- Schema tab: Field name, Type, Mode, and Description; the logical constraints used to organize the data
- Details tab: Table metadata
- Preview tab: Table preview
- Click the Details tab to view the table metadata.
Identify duplicate rows
Seeing a sample amount of data may give you greater intuition for what is included in the dataset. To preview sample rows from the table without using SQL, click the preview tab.
Scan and scroll through the rows. There is no singular field that uniquely identifies a row, so you need advanced logic to identify duplicate rows.
Your query uses the SQL
GROUP BY function on every field and counts (
COUNT) where there are rows that have the same values across every field.
- If every field is unique, the
COUNTwill return 1 as there are no other groupings of rows with the exact same value for all fields.
- If there is a row with the same values for all fields, they will be grouped together and the
COUNTwill be greater than 1. The last part of the query is an aggregation filter using
HAVINGto only show the results that have a
COUNTof duplicates greater than 1.
- Copy and paste the following query into the Query editor, then Run query to find duplicate records across all columns.
#standardSQL SELECT COUNT(*) as num_duplicate_rows, * FROM `data-to-insights.ecommerce.all_sessions_raw` GROUP BY fullVisitorId, channelGrouping, time, country, city, totalTransactionRevenue, transactions, timeOnSite, pageviews, sessionQualityDim, date, visitId, type, productRefundAmount, productQuantity, productPrice, productRevenue, productSKU, v2ProductName, v2ProductCategory, productVariant, currencyCode, itemQuantity, itemRevenue, transactionRevenue, transactionId, pageTitle, searchKeyword, pagePathLevel1, eCommerceAction_type, eCommerceAction_step, eCommerceAction_option HAVING num_duplicate_rows > 1;
Analyze the new
In this section you use a deduplicated table called
Scenario: Your data analyst team has provided you with this query, and your schema experts have identified the key fields that must be unique for each record per your schema.
Run the query to confirm that no duplicates exist, this time in the
#standardSQL # schema: https://support.google.com/analytics/answer/3437719?hl=en SELECT fullVisitorId, # the unique visitor ID visitId, # a visitor can have multiple visits date, # session date stored as string YYYYMMDD time, # time of the individual site hit (can be 0 to many per visitor session) v2ProductName, # not unique since a product can have variants like Color productSKU, # unique for each product type, # a visitor can visit Pages and/or can trigger Events (even at the same time) eCommerceAction_type, # maps to ‘add to cart', ‘completed checkout' eCommerceAction_step, eCommerceAction_option, transactionRevenue, # revenue of the order transactionId, # unique identifier for revenue bearing transaction COUNT(*) as row_count FROM `data-to-insights.ecommerce.all_sessions` GROUP BY 1,2,3 ,4, 5, 6, 7, 8, 9, 10,11,12 HAVING row_count > 1 # find duplicates
The query returns zero records.
Note: In SQL, you can GROUP BY or ORDER BY the index of the column like using “GROUP BY 1” instead of “GROUP BY fullVisitorId”
query that shows total unique visitors
- Write this query in the editor:
#standardSQL SELECT COUNT(*) AS product_views, COUNT(DISTINCT fullVisitorId) AS unique_visitors FROM `data-to-insights.ecommerce.all_sessions`;
Now write a query that shows total unique visitors(
fullVisitorID) by the referring site (
#standardSQL SELECT COUNT(DISTINCT fullVisitorId) AS unique_visitors, channelGrouping FROM `data-to-insights.ecommerce.all_sessions` GROUP BY channelGrouping ORDER BY channelGrouping DESC;
Write a query to list all the unique product names (
#standardSQL SELECT (v2ProductName) AS ProductName FROM `data-to-insights.ecommerce.all_sessions` GROUP BY ProductName ORDER BY ProductName
Tip: In SQL, the ORDER BY clauses defaults to Ascending (ASC) A–>Z. If you want the reverse, try ORDER BY field_name DESC
Write a query to list the five products with the most views (
product_views) from all visitors (include people who have viewed the same product more than once). Your query counts number of times a product (
v2ProductName) was viewed (
product_views), puts the list in descending order, and lists the top 5 entries:
Tip: In Google Analytics, a visitor can “view” a product during the following interaction types: ‘page’, ‘screenview’, ‘event’, ‘transaction’, ‘item’, ‘social’, ‘exception’, ‘timing’. For our purposes, simply filter for only type = ‘PAGE’.
#standardSQL SELECT COUNT(*) AS product_views, (v2ProductName) AS ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE type = 'PAGE' GROUP BY v2ProductName ORDER BY product_views DESC LIMIT 5;
Now refine the query to no longer double-count product views for visitors who have viewed a product many times. Each distinct product view should only count once per visitor.
WITH unique_product_views_by_person AS ( -- find each unique product viewed by each visitor SELECT fullVisitorId, (v2ProductName) AS ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE type = 'PAGE' GROUP BY fullVisitorId, v2ProductName ) -- aggregate the top viewed products and sort them SELECT COUNT(*) AS unique_view_count, ProductName FROM unique_product_views_by_person GROUP BY ProductName ORDER BY unique_view_count DESC LIMIT 5
expand your previous query to include the total number of distinct products ordered and the total number of total units ordered (
#standardSQL SELECT COUNT(*) AS product_views, COUNT(productQuantity) AS orders, SUM(productQuantity) AS quantity_product_ordered, v2ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE type = 'PAGE' GROUP BY v2ProductName ORDER BY product_views DESC LIMIT 5;
Expand the query to include the average amount of product per order (total number of units ordered/total number of orders, or
#standardSQL SELECT COUNT(*) AS product_views, COUNT(productQuantity) AS orders, SUM(productQuantity) AS quantity_product_ordered, SUM(productQuantity) / COUNT(productQuantity) AS avg_per_order, (v2ProductName) AS ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE type = 'PAGE' GROUP BY v2ProductName ORDER BY product_views DESC LIMIT 5;
Calculate a conversion rate
Answer these questions:
- How many distinct times was the product part of an order (either complete or incomplete order)?
- How many total units of the product were part of orders (either complete or incomplete)?
- Which product had the highest conversion rate?
SELECT COUNT(*) AS product_views, COUNT(productQuantity) AS potential_orders, SUM(productQuantity) AS quantity_product_added, (COUNT(productQuantity) / COUNT(*)) AS conversion_rate, v2ProductName FROM `data-to-insights.ecommerce.all_sessions` WHERE LOWER(v2ProductName) NOT LIKE ‘%frisbee%’ GROUP BY v2ProductName HAVING quantity_product_added > 1000 ORDER BY conversion_rate DESC LIMIT 10;
Track visitor checkout progress
#standardSQL SELECT COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors, eCommerceAction_type FROM `data-to-insights.ecommerce.all_sessions` GROUP BY eCommerceAction_type ORDER BY eCommerceAction_type;
Bonus: You are given this mapping for the action type: Unknown = 0 Click through of product lists = 1 Product detail views = 2 Add product(s) to cart = 3 Remove product(s) from cart = 4 Check out = 5 Completed purchase = 6 Refund of purchase = 7 Checkout options = 8
Use a Case Statement to add a new dimension
#standardSQL SELECT COUNT(DISTINCT fullVisitorId) AS number_of_unique_visitors, eCommerceAction_type, CASE eCommerceAction_type WHEN '0' THEN 'Unknown' WHEN '1' THEN 'Click through of product lists' WHEN '2' THEN 'Product detail views' WHEN '3' THEN 'Add product(s) to cart' WHEN '4' THEN 'Remove product(s) from cart' WHEN '5' THEN 'Check out' WHEN '6' THEN 'Completed purchase' WHEN '7' THEN 'Refund of purchase' WHEN '8' THEN 'Checkout options' ELSE 'ERROR' END AS eCommerceAction_type_label FROM `data-to-insights.ecommerce.all_sessions` GROUP BY eCommerceAction_type ORDER BY eCommerceAction_type;
Track abandoned carts from high quality sessions
#standardSQL # high quality abandoned carts SELECT #unique_session_id CONCAT(fullVisitorId,CAST(visitId AS STRING)) AS unique_session_id, sessionQualityDim, SUM(productRevenue) AS transaction_revenue, MAX(eCommerceAction_type) AS checkout_progress FROM `data-to-insights.ecommerce.all_sessions` WHERE sessionQualityDim > 60 # high quality session GROUP BY unique_session_id, sessionQualityDim HAVING checkout_progress = ‘3’ # 3 = added to cart AND (transaction_revenue = 0 OR transaction_revenue IS NULL)
thats end (GCP training)