Extracting Data via Queries

What is a query?

A query is a request to retrieve information from a data source based on specific criteria. Queries typically originate in the form of a business question, ex: how many users are on our platform, how many subscriptions started this week, etc, and are then translated into a standardized language that can interact with the data source. SQL, which stands for Structured Query Language, is that language.

Why should I learn about SQL?

Though Chartio provides a drag-and-drop interface to abstract users from the syntactic challenges of writing SQL code, understanding the fundamentals of SQL query construction is a key component in learning to perform data analysis at any level. As such, each of the essential SQL commands is documented below and accompanied by both an example code excerpt and an example of how to reproduce that code using the Chartio Query Creator. Concepts build upon those previously listed.

Introduction to SQL

SQL observes certain constructs that dictate the use and order of specific commands; however, there are subtle differences in the features, functions, and syntax allowed by each type of database. The explanations and examples below are written against a fictional table in a Postgres instance called Documents. Documents stores information pertaining to the files loaded into the fictional e-signiture platform, Digisign. Below you will find a 10-row excerpt from the Documents table, which you can use as a reference.

COMMANDS

SELECT

The essential command that initiates every SQL query, SELECT specifies what is to be retrieved from the database and often consists of fields and/or expressions from one or more tables. When selecting multiple fields or expressions, each entry should be separated by a comma. This example returns a simple text string, 'This is a test'.  This first example is not retrieving a result from a table, but rather is returning the raw input string which is indicated by the single quote marks.

SELECT 'this is a test'

Optional keywords/commands used in conjunction with SELECT to narrow the scope of results are listed below based on their order of potential appearance in a query.

FROM

Used to indicate the table from which the data will be retrieved. The fields listed in the SELECT clause below (type, id, account_id, etc) live on the Documents table. No quotation marks are used when referencing field names existing on the table listed in the FROM clause. 

SELECT type, status, id, date_uploaded 
FROM documents
* Queries constructed with interactive mode and no aggregation or grouping require fields to be placed in the measures box with no aggregation function applied.

To identify the source table for any field listed in a SQL query, the table or table alias (if used), precedes the field and is separated from the field name with a period. This is most commonly used when extracting data from multiple tables at once. Example formatting shown below:

SELECT table_name.field_name_A, table_name.field_name_B 
FROM table_name

WHERE

Used to pare down the result of a query to only those rows of the source data which meet the specified criteria. This will let you hone in on a certain user, group of statuses, range of dates, etc. WHERE clause syntax typically follows this pattern: WHERE column_name OPERATOR value. Multiple criteria can be used simultaneously via the AND or OR operators. This example returns only records for which the uploader_id is not null and the date_uploaded is greater than or equal to July 1, 2014 and less than January 1, 2015. Note that fields not listed in the SELECT clause of the statement are still eligible for use in the WHERE clause. Additional detail regarding the WHERE command provided here.

SELECT type, status, id, date_uploaded 
FROM documents 
WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND
      uploader_id is not null

GROUP BY

Used to group all records with matching values such that each unique combination has exactly one resulting record. Group By commands are typically used with aggregation functions in the SELECT clause in order to perform a mathematical function across the same population of records, for example: count, minimum, maximum, average, sum, etc. In this example, the groups are defined by the combination of unique values from the Type and Status fields. The final column, Count of Ids, represents the number of rows encompassed by each group. 

SELECT type, status, COUNT(id) 
FROM documents 
WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND
      uploader_id is not null 
GROUP BY type, status

*Note that any field listed in the select clause without an aggregation function must also be listed in the GROUP BY clause.

ORDER BY

Used to dictate the sequence in which the query results (rows) are displayed. Ordering, also known as sorting, can be applied to multiple fields, with prioritization assigned from first field listed to last. Ordering is by default ascending (ASC: smallest to largest) and does not need to be declared; however, to sort results descending (DESC: largest to smallest), one needs to follow the column/expression with 'DESC'.

SELECT type, status, COUNT(id)
FROM documents 
WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND
     uploader_id is not null 
GROUP BY type, status 
ORDER BY type, count(id) DESC
* Sorting in this example shown using the Chartio Data Pipeline

LIMIT

Used to dictate the maximum number of results returned by a query.

SELECT type, status, COUNT(id) 
FROM documents 
WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND
      uploader_id is not null 
GROUP BY type, status 
ORDER BY type, count(id) DESC 
LIMIT 3

*The majority of queries will contain SELECT and FROM clauses along with some combination of WHERE, GROUP BY, ORDER BY and LIMIT. That said, the ancillary clauses are independent and each can be used with or without the others.

ALIASES

Aliases are used to rename a table or column heading for the duration of the query. Aliases are typically used to change the visual appearance of a result or to shorten the text used to identify a table or column. The word AS is used for applying aliases to both columns and tables. 

Renaming Columns

Syntax Structure:

SELECT field_name AS field_alias 
FROM table_name

Example:

SELECT type, type as "Example Alias" 
FROM documents

Renaming Tables

Syntax Structure:

SELECT table_alias.field_name 
FROM table_name AS table_alias

Renaming tables is a common practice when utilizing more advanced SQL practices like joins and nested queries but can be used to simply shorten or abbreviate long table names.

Example:

SELECT Docs.id, Docs.type 
FROM documents as Docs

OPERATORS

Operators specify an action to be performed on one or more inputs. Common operator classifications are listed below.

Comparison Operators

Comparison operators evaluate one input against another and return a boolean (true/false) result. Records for which this boolean evaluation is false are excluded from the result set. Example operators listed below (comprehensive list available in database-specific documentation)

  • = (equal to)
  • != (not equal to)
  • > (greater than)
  • >= (greater than or equal to)

Example: Return all records for which the uploaded_date was before October 1, 2014.

SELECT id, type, date_uploaded 
FROM documents as Docs 
WHERE date_uploaded < '2014-10-01' 
ORDER BY date_uploaded DESC 
LIMIT 5

Any record with a date_uploaded greater than or equal to October 1, 2014 fails the comparison established in the WHERE clause and is therefore excluded from the result set.

Logical Operators

The logical operators consist of AND, OR, and NOT. These operators evaluate the truth of inputs and return a boolean (true/false) result based on the criteria established by the operator (defined below). Records for which the boolean evaluation is false are excluded from the result set.

  • AND: Returns TRUE if the boolean evaluation of both inputs is true
  • OR: Returns TRUE if the boolean evaluation of either input is true
  • NOT: Negates the boolean evaluation of an input (True becomes False, or False becomes True)

Example: Return only records for which the uploaded_date was prior to October 1, 2014 AND the document type is PDF.

SELECT id, type, date_uploaded 
FROM documents as Docs 
WHERE date_uploaded < '2014-10-01' AND 
      type = 'pdf' 
ORDER BY date_uploaded DESC 
LIMIT 5

Mathematical Operators

Mathematical Operators manipulate numeric inputs. Common math operators including addition (+), subtraction (-), multiplication (*) and division (/). Example:

SELECT 4*5 as "4*5", id, id*2 as "Id*2" 
FROM documents

String Operators

String Operators accept string inputs and output strings. An example String Operator is concatenate, which joins strings end-to-end. The concatenate operator is represented by double pipe symbols, "||". Example:

SELECT type, status, type || '-' || status 
FROM documents

FUNCTIONS

Functions accept input(s), perform a task/transformation, and provide output(s). These inputs are either values to be transformed, or parameters that configuration how the function will be applied. Not all functions require inputs (ex: NOW() returns the current date and time). Based on the nature of the inputs, transformations and outputs, functions can be subclassified. This section will highlight aggregation functions, mathematical functions, and string (text) functions.

*Function availability and use is database specific. Check the documentation for your database for specifics.

Aggregation Functions

Aggregation functions output a single value that represents the collection of input values (for SQL purposes, the inputs are row entries). The value returned by the function varies depending on the aggregation type, which is most commonly the sum, count, average, minimum or maximum. Aggregation functions are typically used in conjunction with GROUP BY clauses, and will consequently output one value per group in the result set.

SELECT type, COUNT(id) as "Count of Ids", MAX(date_uploaded) as "Most Recent Upload" 
FROM documents 
GROUP BY type 
ORDER BY type

Mathematical Functions

This collection of functions deals specifically with the application of traditional mathematical concepts within the database. This example displays the id and the square root of the id. Given the data type of the id field (integer), the database permits this input for the function.

SELECT id, SQRT(id) 
FROM Documents

String Functions

String functions are primarily used to measure or manipulate strings. The example below demonstrates the measurement of strings (length), as well as string manipulation (left).

SELECT status as "Status", length(status) "Length of Status", left(status, 3) "Leftmost 3 Characters in Status" 
FROM documents 
GROUP BY status

*Note that functions can be nested, meaning the input for a function can be the result of another function. Ex: Round(SQRT(id),2) - this will return the square root of the ID value rounded to the nearest hundredth.

EXPRESSIONS

Expressions are combination of one or more values, operators, and/or functions that evaluate to a value. Complex expressions may can be formed through the combination of simple expressions. The following are all example expressions and the data type of each expression precedes the actual expression:

  • Number: 1+1
  • String: 'This sentence is an expression'
  • Date: 2015-01-01
  • Number: ROUND(3.14159*10**2, 2)