Business Intelligence is the application of data analysis concepts to transform raw data into digestible results that can inform business decision making. For quick and easy consumption of this information, visualizations, like bar and line charts, are often used. As with the module covering queries, this module on analysis techniques and commonly associated visualization options will be performed against a table titled Activity. Activity stores information pertaining to user activity in a fictional social media application.
Aggregation: simultaneously analyze many rows
The process of summarizing a group of records via an aggregation function, such as average, minimum, maximum, sum or count, to return a single value representative of the group. A single query can simultaneously perform aggregations of varying types on multiple columns. In the Data Explorer, aggregations are defined in the measures field.
SELECT count(activity_id) as "Count of Activity", max(created_date) as "Maximum Activity Date" FROM activity
Segmentation: slice and dice aggregates
Segmentation is the process of dividing a population of records into subgroups. Segmentation is performed via the GROUP BY command in SQL and by use of the dimensions field in the Data Explorer. When segmenting aggregates, a single row will be returned for each unique combination of values from the grouped fields (dimensions).
SELECT engagement_index, activity_name, COUNT(DISTINCT activity_id) AS "Count of Activity", MAX(created_date) AS "Maximum Created Date" FROM activity GROUP BY engagement_index, activity_name ORDER BY engagement_index, activity_name
Summary: Perform an aggregation function on a single field without the use of any segmentation, thereby returning a single value. Filters, which are applied via the WHERE clause, can be applied to control the subset of data aggregated.
Example Business Question: In total, how many users are there?
SELECT count(DISTINCT user_id) FROM users
Basic Grouping (Segmenting Aggregate Results)
Summary: Apply an aggregate function to a field and segment, or group, the results using another field, typically a dimension (ex: City, State, Company, etc). There will be a single result row for each unique combination of dimension values listed in the GROUP BY clause.
Example Business Question: For all user activities in the application, what is the total of each?
Key Concepts: GROUP BY
SELECT activity_name, count(DISTINCT activity_id) AS "Count of Activity" FROM public.activity GROUP BY activity_name ORDER BY activity_name ASC
Time-Series (trending over time)
Summary: Apply an aggregate function to a field and segment the results by a period of time, typically day, week, or month. Results are displayed in chronological order for the time range defined. More complex time-series analysis includes either multiple measures, meaning aggregated inputs, or additional dimension(s) for further segmentation of the data.
Example Business Question: How is monthly user activity trending?
SELECT TO_CHAR(created_date, 'YYYY-MM') AS "Month of Activity", COUNT(activity_id) AS "Count of Activity" FROM public.activity GROUP BY "Month of Activity" ORDER BY "Month of Activity" ASC
Example Business Question: Which 5 users have the most activity?
SELECT user_id, COUNT(DISTINCT activity_id) AS "Count of Activity" FROM activity GROUP BY user_id ORDER BY "Count of Activity" DESC, user_id ASC LIMIT 5
Summary: Part-to-Whole analysis returns the percentage that each row entry in a particular column represents compared against the sum of all row entries in that column. In raw SQL, this calculation requires more sophisticated concepts like derived queries, window functions, or common table expressions; however, Chartio makes the process simple with automatic formatting for visualization options like the Pie Chart.
Example Business Question: Of all user activity, what is the percentage of each activity type?
SELECT activity_name, 100.0*COUNT(activity_id)/(SELECT count(activity_id) as Total FROM activity) FROM activity GROUP BY activity_name ORDER BY activity_name
Summary: Compare the results of multiple measures against shared or common dimension values. For example, track monthly user activity against the number of monthly user signups. These two measurements need to be calculated independently as the date dimensions used in the aggregations are different, activity.created_date and users.created_date respectively; however, the results can be combined into a single chart based on the shared dimension values, namely the month values (Dec 2014, Jan 2015, etc). These two aggregations can be performed in separate queries using the Layers functionality in Chartio.
A more simple direct comparison use case is when the same field is used for the aggregation of each metric. To reference the Order table from the Data Storage module, one could compare the count of orders and the average unit price with each metric aggregated over the order_date field.
Key Chartio Concepts: Layers
Example Business Question: How do user signups by month and user activity by month trend against one another?
SELECT "Users"."Month Joined", "Users"."User Count", "Activities"."Count of Activity" FROM (SELECT TO_CHAR(created_date, 'YYYY-MM') AS "Month Joined", COUNT(user_id) AS "User Count" FROM users GROUP BY "Month Joined" ORDER BY "Month Joined" ASC) as "Users" JOIN (SELECT TO_CHAR(created_date, 'YYYY-MM') AS "Month of Activity", COUNT(activity_id) AS "Count of Activity" FROM activity GROUP BY "Month of Activity" ORDER BY "Month of Activity" ASC) as "Activities" ON "Users"."Month Joined" = "Activities"."Month of Activity"
Looking for a way to get started? Follow this playbook for getting started with data analysis! For the business questions below, proceed through each of the following points:
- Is this information tracked, and if so, do I have access to the data?
- What data source, object (table), and field(s) holds the information I need?
- What aggregation function(s), if any, need to be applied?
- Do I need to perform any transformations to the data after initial extraction?
- Once I obtain the answer to the question, what is the best visualization approach displaying the results?