Business Intelligence is the application of data analysis concepts to transform raw data into digestable results that can inform business decision making. For quick and easy consumption of this information, visualizations, like bar and line charts, are often used. As with the module covering queries, this modulle on analysis techniques and commonly associated visualization options will be performed against a table titled Documents. Documents stores information pertaining to the files loaded into the fictional e-signiture platform, Digisign.
Aggregation: simultaneously analyze many rows
The process of summarizing a group of records via an aggregation function, such as average, minimum, maximum, sum or count, to return a single value representative of the group. A single query can simultaneously perform aggregations, of varying types, on multiple columns. In the Chart Creator, aggregation is defined in the measures box. Example: count of documents and the maximum uploaded date
SELECT count(id) as "Count of Documents", max(date_uploaded) as "Maximum Date Uploaded" FROM documents
Segmentation: slice and dice aggregates
Segmentation is the process of dividing a population of records into subgroups. Segmentation is performed via the GROUP BY command in SQL and by use of the dimensions box in the Chartio Chart Creator. When segmenting aggregates, a single row will be returned for each unique combination of values from the grouped fields.
SELECT status, type, COUNT(DISTINCT "Documents"."id") AS "Count of Documents", MAX("Documents"."date_uploaded") AS "Maximum Date Uploaded" FROM "public"."documents" AS "Documents" GROUP BY status, type ORDER BY status, type
Summary: Perform an aggregation function on a single field without the use of any segmentation, thereby returning a single value. Filters, which are applied via the WHERE clause, can by applied to control the subset of data aggregated.
Example Business Question: In total, how many documents have been loaded into Digisign?
SELECT count(id) FROM documents
Basic Grouping (Segmenting Aggregate Results)
Summary: Apply an aggregate function to a field and segment, or group, the results using another field, typically a dimension (ex: City, State, Company, etc). There will be a single result row for each unique combination of dimension values listed in the GROUP BY clause.
Example Business Question: How many documents are currently in each status?
Key Concepts: GROUP BY
SELECT status, COUNT(id) AS "Count of Documents" FROM public.documents GROUP BY status ORDER BY status ASC
Time-Series (trending over time)
Summary: Apply an aggregate function to a field and segment the results by a period of time, typically day, week, or month. Results are displayed in chronological order for the time range defined. More complex time-series analysis includes either multiple measures, meaning aggregated inputs, or additional dimension(s) for further segmentation of the data.
Example Business Question: How are monthly document uploads trending?
SELECT TO_CHAR(date_uploaded, 'YYYY-MM') AS "Month of Date Uploaded", COUNT(id) AS "Count of Documents" FROM documents GROUP BY "Month of Date Uploaded" ORDER BY "Month of Date Uploaded" ASC
Example Business Question: Which 5 users have uploaded the most documents?
* Sorting applied by clicking the "Count of Documents" box and selecting the Sort Option: Descending
SELECT uploader_id, COUNT(id) AS "Count of Documents" FROM documents GROUP BY "Uploader Id" ORDER BY "Count of Documents" DESC, "Uploader Id" ASC LIMIT 5
Summary: Part-to-Whole analysis returns the percentage that each row entry in a particular column represents compared against the sum of all row entries in that column. In raw SQL, this calculation requires more sophisticated concepts like derived queries, window functions, or common table expressions; however, Chartio makes the process simple with automatic formatting for visualization options like the Pie Chart.
Example Business Question: Of all uploaded documents, what percentage are currently in each status?
SELECT status, 100.0*COUNT(id)/(SELECT count(id) as Total FROM documents) FROM documents GROUP BY status ORDER BY status
Summary: Compare the results of multiple measures against shared or common dimension values. For example, track the monthly document upload count against the number of monthly users signups. These two measurements need to be calculated independently as the date dimensions used in the aggregations are different, date_uploaded and date_joined respectively; however, the results can be combined into a single chart based on the shared dimension values, namely the month values (Dec 2014, Jan 2015, etc). These two aggregations can be performed in separate queries using the Layers functionality in Chartio.
* A more simple direct comparison use case is when the same field is used for the aggregation of each metric. To reference the Order table from the Data Storage module, one could compare the count of orders and the average unit price with each metric aggregated over the order_date field.
Key Chartio Concepts: Layers
Example Business Question: How do user signups by month and number of documents uploaded by month trend against one another?
SELECT users."Month Joined", users."Count of Users", documents."Count of Documents" FROM (SELECT TO_CHAR(date_joined, 'YYYY-MM') AS "Month Joined", COUNT(id) AS "Count Of Users" FROM users GROUP BY "Month Joined" ORDER BY "Month Joined" ASC) as Users JOIN (SELECT TO_CHAR(date_uploaded, 'YYYY-MM') AS "Month Uploaded", COUNT(id) AS "Count of Documents" FROM documents GROUP BY "Month Uploaded" ORDER BY "Month Uploaded" ASC) as documents ON users."Month Joined" = documents."Month Uploaded"
Looking for a way to get started? Follow this playbook for getting started with data analysis! For the business questions below, proceed through each of the following points:
- Is this information tracked, and if so, do I have access to the data?
- What data source, object (table), and field(s) holds the information I need?
- What aggregation function(s), if any, need to be applied?
- Do I need to perform any transformations to the data after initial extraction?
- Once I obtain the answer to the question, what is the best visualization approach displaying the results?