Analysis Techniques and Visualizations

Key Concepts

Business Intelligence is the application of data analysis concepts to transform raw data into digestible results that can inform business decision making. For quick and easy consumption of this information, visualizations, like bar and line charts, are often used. As with the module covering queries, this module on analysis techniques and commonly associated visualization options will be performed against a table titled Activity. Activity stores information pertaining to user activity in a fictional social media application.

Aggregation: simultaneously analyze many rows

The process of summarizing a group of records via an aggregation function, such as average, minimum, maximum, sum or count, to return a single value representative of the group. A single query can simultaneously perform aggregations of varying types on multiple columns. In the Data Explorer, aggregations are defined in the measures field.

SELECT count(activity_id) as "Count of Activity", max(created_date) as "Maximum Activity Date" 
FROM activity

 

Segmentation: slice and dice aggregates

Segmentation is the process of dividing a population of records into subgroups. Segmentation is performed via the GROUP BY command in SQL and by use of the dimensions field in the Data Explorer. When segmenting aggregates, a single row will be returned for each unique combination of values from the grouped fields (dimensions). 

SELECT engagement_index, activity_name, COUNT(DISTINCT activity_id) AS "Count of Activity", MAX(created_date) AS "Maximum Created Date"
FROM activity
GROUP BY engagement_index, activity_name
ORDER BY engagement_index, activity_name

 

Techniques

Simple Aggregation

Summary: Perform an aggregation function on a single field without the use of any segmentation, thereby returning a single value. Filters, which are applied via the WHERE clause, can be applied to control the subset of data aggregated.

Frequently Associated Visualization(s): Single Value, Table

Example Business Question: In total, how many users are there?

Chartio Example:

SQL: 

SELECT count(DISTINCT user_id) 
FROM users

Basic Grouping (Segmenting Aggregate Results)

Summary: Apply an aggregate function to a field and segment, or group, the results using another field, typically a dimension (ex: City, State, Company, etc). There will be a single result row for each unique combination of dimension values listed in the GROUP BY clause. 

Frequently Associated Visualization(s)TableBar

Example Business Question: For all user activities in the application, what is the total of each?

Key ConceptsGROUP BY

Chartio Example: 

SQL:

SELECT activity_name, count(DISTINCT activity_id) AS "Count of Activity" 
FROM public.activity
GROUP BY activity_name
ORDER BY activity_name ASC

Time-Series (trending over time)

Summary: Apply an aggregate function to a field and segment the results by a period of time, typically day, week, or month. Results are displayed in chronological order for the time range defined. More complex time-series analysis includes either multiple measures, meaning aggregated inputs, or additional dimension(s) for further segmentation of the data. 

Frequently Associated Visualization(s): Line, Area, Bar

Example Business Question: How is monthly user activity trending? 

Key Concepts: Group By Date, Order By Date 

Chartio Example: 

SQL:

SELECT TO_CHAR(created_date, 'YYYY-MM') AS "Month of Activity", COUNT(activity_id) AS "Count of Activity" 
FROM public.activity 
GROUP BY "Month of Activity" 
ORDER BY "Month of Activity" ASC

 

Ordering/Ranking

Summary: Sort the segments of a result set in ascending or descending order based on the associated aggregated values. Dictate the number of results by including a limit clause. 

Associated Visualization(s): Table, Bar

Example Business Question: Which 5 users have the most activity?

Key Concepts: Aggregation FunctionsGroup ByOrder By, Limit

Chartio Example: 

Sorting applied by clicking "Count of Activity" in the measures field  and selecting the Sort Option: Descending

SQL:

SELECT user_id, COUNT(DISTINCT activity_id) AS "Count of Activity" 
FROM activity
GROUP BY user_id 
ORDER BY "Count of Activity" DESC, user_id ASC 
LIMIT 5

Part-to-Whole

Summary: Part-to-Whole analysis returns the percentage that each row entry in a particular column represents compared against the sum of all row entries in that column. In raw SQL, this calculation requires more sophisticated concepts like derived queries, window functions, or common table expressions; however, Chartio makes the process simple with automatic formatting for visualization options like the Pie Chart.

Associated Visualization(s): PieBar

Key Concepts: Aggregation FunctionsGroup By

Example Business Question: Of all user activity, what is the percentage of each activity type? 

SQL:

SELECT activity_name, 100.0*COUNT(activity_id)/(SELECT count(activity_id) as Total FROM activity) 
FROM activity
GROUP BY activity_name
ORDER BY activity_name

Direct Comparison 

Summary: Compare the results of multiple measures against shared or common dimension values. For example, track monthly user activity against the number of monthly user signups. These two measurements need to be calculated independently as the date dimensions used in the aggregations are different, activity.created_date and users.created_date respectively; however, the results can be combined into a single chart based on the shared dimension values, namely the month values (Dec 2014, Jan 2015, etc). These two aggregations can be performed in separate queries using the Layers functionality in Chartio.

A more simple direct comparison use case is when the same field is used for the aggregation of each metric. To reference the Order table from the Data Storage module, one could compare the count of orders and the average unit price with each metric aggregated over the order_date field. 

Associated Visualization(s): Dual-axis line chart, Multiple-bar Chart, Bar-Line chartScatter Plot

Key SQL ConceptsAggregation FunctionsGroup By, Joins

Key Chartio Concepts: Layers

Example Business Question: How do user signups by month and user activity by month trend against one another? 

Chartio Example: 

SQL: 

SELECT "Users"."Month Joined", "Users"."User Count", "Activities"."Count of Activity" 
FROM 
    (SELECT 
        TO_CHAR(created_date, 'YYYY-MM') AS "Month Joined",
        COUNT(user_id) AS "User Count" 
    FROM users 
    GROUP BY "Month Joined"
    ORDER BY "Month Joined" ASC) as "Users"
JOIN 
    (SELECT 
        TO_CHAR(created_date, 'YYYY-MM') AS "Month of Activity",
        COUNT(activity_id) AS "Count of Activity" 
    FROM activity 
    GROUP BY "Month of Activity" 
    ORDER BY "Month of Activity" ASC) as "Activities"
ON "Users"."Month Joined" = "Activities"."Month of Activity"

 

Approach

Looking for a way to get started? Follow this playbook for getting started with data analysis! For the business questions below, proceed through each of the following points: 

  • Is this information tracked, and if so, do I have access to the data?
  • What data source, object (table), and field(s) holds the information I need?
  • What aggregation function(s), if any, need to be applied? 
  • Do I need to perform any transformations to the data after initial extraction? 
  • Once I obtain the answer to the question, what is the best visualization approach displaying the results?