What is a query?
A query is a request to retrieve information from a data source based on specific criteria. Queries typically originate in the form of a business question, ex: how many users are on our platform, how many subscriptions started this week, etc, and are then translated into a standardized language that can interact with the data source. SQL, which stands for Structured Query Language, is that language.
Why should I learn about SQL?
Though Chartio provides a drag-and-drop interface to abstract users from the syntactic challenges of writing SQL code, understanding the fundamentals of SQL query construction is a key component in learning to perform data analysis at any level. As such, each of the essential SQL commands is documented below and accompanied by both an example code excerpt and an example of how to reproduce that code using the Chartio Query Creator. Concepts build upon those previously listed.
Introduction to SQL
SQL observes certain constructs that dictate the use and order of specific commands; however, there are subtle differences in the features, functions, and syntax allowed by each type of database. The explanations and examples below are written against a fictional table in a Postgres instance called Documents. Documents stores information pertaining to the files loaded into the fictional e-signiture platform, Digisign. Below you will find a 10-row excerpt from the Documents table, which you can use as a reference.
The essential command that initiates every SQL query, SELECT specifies what is to be retrieved from the database and often consists of fields and/or expressions from one or more tables. When selecting multiple fields or expressions, each entry should be separated by a comma. This example returns a simple text string, 'This is a test'. This first example is not retrieving a result from a table, but rather is returning the raw input string which is indicated by the single quote marks.
SELECT 'this is a test'
Optional keywords/commands used in conjunction with SELECT to narrow the scope of results are listed below based on their order of potential appearance in a query.
Used to indicate the table from which the data will be retrieved. The fields listed in the SELECT clause below (type, id, account_id, etc) live on the Documents table. No quotation marks are used when referencing field names existing on the table listed in the FROM clause.
Queries constructed with interactive mode and no aggregation or grouping require fields to be placed in the measures box with no aggregation function applied.
SELECT type, status, id, date_uploaded FROM documents
To identify the source table for any field listed in a SQL query, the table or table alias (if used), precedes the field and is separated from the field name with a period. This is most commonly used when extracting data from multiple tables at once. Example formatting shown below:
SELECT table_name.field_name_A, table_name.field_name_B FROM table_name
Used to pare down the result of a query to only those rows of the source data which meet the specified criteria. This will let you hone in on a certain user, group of statuses, range of dates, etc. WHERE clause syntax typically follows this pattern: WHERE column_name OPERATOR value. Multiple criteria can be used simultaneously via the AND or OR operators. This example returns only records for which the uploader_id is not null and the date_uploaded is greater than or equal to July 1, 2014 and less than January 1, 2015. Note that fields not listed in the SELECT clause of the statement are still eligible for use in the WHERE clause. Additional detail regarding the WHERE command provided here.
SELECT type, status, id, date_uploaded FROM documents WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND uploader_id is not null
Used to group all records with matching values such that each unique combination has exactly one resulting record. Group By commands are typically used with aggregation functions in the SELECT clause in order to perform a mathematical function across the same population of records, for example: count, minimum, maximum, average, sum, etc. In this example, the groups are defined by the combination of unique values from the Type and Status fields. The final column, Count of Ids, represents the number of rows encompassed by each group.
SELECT type, status, COUNT(id) FROM documents WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND uploader_id is not null GROUP BY type, status
*Note that any field listed in the select clause without an aggregation function must also be listed in the GROUP BY clause.
Used to dictate the sequence in which the query results (rows) are displayed. Ordering, also known as sorting, can be applied to multiple fields, with prioritization assigned from first field listed to last. Ordering is by default ascending (ASC: smallest to largest) and does not need to be declared; however, to sort results descending (DESC: largest to smallest), one needs to follow the column/expression with 'DESC'.
SELECT type, status, COUNT(id) FROM documents WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND uploader_id is not null GROUP BY type, status ORDER BY type, count(id) DESC
Used to dictate the maximum number of results returned by a query.
SELECT type, status, COUNT(id) FROM documents WHERE date_uploaded BETWEEN '2014-07-01' AND '2015-01-01' AND uploader_id is not null GROUP BY type, status ORDER BY type, count(id) DESC LIMIT 3
*The majority of queries will contain SELECT and FROM clauses along with some combination of WHERE, GROUP BY, ORDER BY and LIMIT. That said, the ancillary clauses are independent and each can be used with or without the others.
Aliases are used to rename a table or column heading for the duration of the query. Aliases are typically used to change the visual appearance of a result or to shorten the text used to identify a table or column. The word AS is used for applying aliases to both columns and tables.
SELECT field_name AS field_alias FROM table_name
SELECT type, type as "Example Alias" FROM documents
SELECT table_alias.field_name FROM table_name AS table_alias
Renaming tables is a common practice when utilizing more advanced SQL practices like joins and nested queries but can be used to simply shorten or abbreviate long table names.
SELECT Docs.id, Docs.type FROM documents as Docs
Operators specify an action to be performed on one or more inputs. Common operator classifications are listed below.
Comparison operators evaluate one input against another and return a boolean (true/false) result. Records for which this boolean evaluation is false are excluded from the result set. Example operators listed below (comprehensive list available in database-specific documentation)
- = (equal to)
- != (not equal to)
- > (greater than)
- >= (greater than or equal to)
Example: Return all records for which the uploaded_date was before October 1, 2014.
SELECT id, type, date_uploaded FROM documents as Docs WHERE date_uploaded < '2014-10-01' ORDER BY date_uploaded DESC LIMIT 5
Any record with a date_uploaded greater than or equal to October 1, 2014 fails the comparison established in the WHERE clause and is therefore excluded from the result set.
The logical operators consist of AND, OR, and NOT. These operators evaluate the truth of inputs and return a boolean (true/false) result based on the criteria established by the operator (defined below). Records for which the boolean evaluation is false are excluded from the result set.
- AND: Returns TRUE if the boolean evaluation of both inputs is true
- OR: Returns TRUE if the boolean evaluation of either input is true
- NOT: Negates the boolean evaluation of an input (True becomes False, or False becomes True)
Example: Return only records for which the uploaded_date was prior to October 1, 2014 AND the document type is PDF.
SELECT id, type, date_uploaded FROM documents as Docs WHERE date_uploaded < '2014-10-01' AND type = 'pdf' ORDER BY date_uploaded DESC LIMIT 5
Mathematical Operators manipulate numeric inputs. Common math operators including addition (+), subtraction (-), multiplication (*) and division (/). Example:
SELECT 4*5 as "4*5", id, id*2 as "Id*2" FROM documents
String Operators accept string inputs and output strings. An example String Operator is concatenate, which joins strings end-to-end. The concatenate operator is represented by double pipe symbols, "||". Example:
SELECT type, status, type || '-' || status FROM documents
Functions accept input(s), perform a task/transformation, and provide output(s). These inputs are either values to be transformed, or parameters that configuration how the function will be applied. Not all functions require inputs (ex: NOW() returns the current date and time). Based on the nature of the inputs, transformations and outputs, functions can be subclassified. This section will highlight aggregation functions, mathematical functions, and string (text) functions.
*Function availability and use is database specific. Check the documentation for your database for specifics.
Aggregation functions output a single value that represents the collection of input values (for SQL purposes, the inputs are row entries). The value returned by the function varies depending on the aggregation type, which is most commonly the sum, count, average, minimum or maximum. Aggregation functions are typically used in conjunction with GROUP BY clauses, and will consequently output one value per group in the result set.
SELECT type, COUNT(id) as "Count of Ids", MAX(date_uploaded) as "Most Recent Upload" FROM documents GROUP BY type ORDER BY type
This collection of functions deals specifically with the application of traditional mathematical concepts within the database. This example displays the id and the square root of the id. Given the data type of the id field (integer), the database permits this input for the function.
SELECT id, SQRT(id) FROM Documents
String functions are primarily used to measure or manipulate strings. The example below demonstrates the measurement of strings (length), as well as string manipulation (left).
SELECT status as "Status", length(status) "Length of Status", left(status, 3) "Leftmost 3 Characters in Status" FROM documents GROUP BY status
*Note that functions can be nested, meaning the input for a function can be the result of another function. Ex: Round(SQRT(id),2) - this will return the square root of the ID value rounded to the nearest hundredth.
Expressions are combination of one or more values, operators, and/or functions that evaluate to a value. Complex expressions may can be formed through the combination of simple expressions. The following are all example expressions and the data type of each expression precedes the actual expression:
- Number: 1+1
- String: 'This sentence is an expression'
- Date: 2015-01-01
- Number: ROUND(3.14159*10**2, 2)