Data Storage Basics

Terms and Concepts

Before any data analysis can occur, data must first be captured and stored. The below list of terms, definitions, and images are provided to establish a baseline definition for frequently cited analytics concepts involving data capture and storage. 

Data Source

A tool that gathers and stores information, referred to as data. Examples include GoogleAnalytics, MySQL, Salesforce, and Amazon's Redshift.

Database

A data source that is typically organized in a format that allows for efficient storage and retrieval of information. There are many types of databases, but the some of the most common include Relational, Document Oriented, Graph, and Key-Value.

Relational Database

A database that stores related information in tables (defined below), also known as relations, which are 2-dimensional structures comprised of columns and rows. Examples include Postgres, MySQL, and Redshift.

Schema

A collection of objects within a database, typically consisting of tables (defined below) and the relationships between those tables.

Table

Tables are a collection of logically related information represented in 2 dimensions, called columns and rows, and are the fundamental building block of data storage in relational databases. Common examples include: Users table, Payments table, Customers table, etc. Below we will construct an example Orders table (empty framework shown below).

Row

Also called a record, rows represent a specific instance of related data within a table. In the example case of a Orders Table, there would be a single record for each order.

Column

Also called a field, columns store information of a particular type for each record in a table. Collectively, all columns in a table comprise the structure for each record. In an Orders table, some example columns could be: Id, user_Id, order_date, amount. Different sub-classifications of columns are listed below.

Primary Key

A column in a table for which each entry uniquely represents a single row in that table. Primary keys are often called IDs (short for identifier). Example: id in an Orders table.

Foreign Key

A column in a table that establishes an association with another table via shared values. To accomplish this, a foreign key is populated with values of the primary key from the other table. Foreign keys are also typically titled IDs, but prepended with the name of the referenced table. Example: In the Orders table, there is a foreign key called user_id which matches exactly one primary key value from the Users table.

Dimension

Dimensions provide contextual information and are typically comprised of text or date values (though numeric values can also be dimensions). Dimension is a blanket term that can be used to describe specific fields, or collections of fields (i.e. tables). Example dimensions: Users (table), or status (single field).

Fact

Facts are generally measurable, quantitative components of data. Example facts include quantity, unit_amount, revenue, etc

Data Type

A Data Type is a characteristic of a column that dictates the possible values that the column can assume as well as the methods available for interacting with the data. Some common data types include integers, dates, strings (text), and boolean (true or false) values, though variations and availability differ from data source to data source.