Flat-File Databases
A “table” written into a single file. The most common file type for this purpose is CSV.
The CSV format reads each line as a row, and each comma-separated value as a column.
The first row on a CSV file is used to describe the data in each column. If a , is present on the dataset, surround that entry with " to ’escape’ it.
These files can be read and written to using languages like c-language and python.
Relational Databases
Instead of using a single table, a relational database can store data in multiple tables, and then define relationships between them.
Each table must have one column with an unique key that identifies each row.
Relationships
One-to-one Relationship
In this example, id in shows corresponds to the show_id in the ratings table.
One-to-many Relationship
shows has a one-to-many relationship with genres, because a single show entry can have many genders.
Many-to-many Relationship
Both the people and writers table have a field shared with stars.
SQL
SQL is a language designed specifically for interfacing with relational databases.
To use SQL, a database is needed, so, for this example, I will use sqlite3.
Create a Database
sqlite3 name.db # Create a db file and initialize it.
sqlite3 Commands
Import a CSV Table
.mode csv
.import name.csv table_name
General Commands
.schema -- Print all tables and fields (using the commands used for creation)
.schema table -- Show the command used for creating a table
.exit
Base Syntax
CREATE TABLE table (column type, ...); -- Create a new table
SELECT columns FROM table; -- Output/Print data
INSERT INTO table (column, ...) VALUES(value, ...); -- Add data
UPDATE table SET column = value WHERE condition; -- Update values
DELETE FROM table WHERE condition; -- Delete data
In
SQL, there is NO WAY to undo actions. Especially when writing or deleting from the database, do not type;unless you know exactly what you are doing!
Wildcard
SELECT * FROM table; -- Outputs every column in the table (wildcard selector)
Functions
Math Functions
AVGCOUNTDISTINCTLOWERMAXMINUPPERLOWER…SELECT COUNT(*) FROM table; -- Counts the number of rows in a table SELECT DISTINCT column from table; -- Show only the unique values in a column SELECT COUNT(DISTINCT column) FROM table; -- Count the unique values in a column
Logic Functions
GROUP BYLIKELIMITORDER BYWHERE…
-- Count every instance of a value in a column
SELECT COUNT(*) FROM table WHERE column = 'string';
-- LIKE is used to select using formatting. '%' selects every character after)
SELECT * FROM favorites WHERE prog_language = 'C' AND problem LIKE "Hello, %"
-- Group all individual rows and display their count
SELECT prog_language, COUNT(*) FROM favorites GROUP BY prog_language;
-- Sort values by their count, ascending.
SELECT prog_language, COUNT(*) FROM favorites GROUP BY prog_language ORDER BY COUNT(*);
-- Sort values by their count, descending.
SELECT prog_language, COUNT(*) FROM favorites GROUP BY prog_language ORDER BY COUNT(*) DESC;
-- Limit the output to a single row (showing the most popular language, in this case)
SELECT prog_language, COUNT(*) FROM favorites GROUP BY prog_language ORDER BY COUNT(*) DESC LIMIT 1;
Operators
ANDOR
-- Using the AND operator to select two values
SELECT * FROM table WHERE column1 = 'a' AND column2 = 'b';
-- AND and OR (' is escaped using '' (' twice))
SELECT * FROM table WHERE prog_language = 'C' AND (problem = 'Hello, World' OR problem = 'Hello, It''s Me');
Aliases
AS
-- The column COUNT(*) will be renamed to 'n'.
SELECT prog_language, COUNT(*) AS n FROM favorites GROUP BY prog_language ORDER BY n DESC;
Conditions
IS
-- Delete all values where timestamp1 is a NULL value.
DELETE FROM favorites WHERE timestamp1 IS NULL;
Data Types
BLOBINTEGERNUMERICREALTEXTNULLNOT NULLUNIQUEPRIMARY KEYFOREIGN KEY
Relationships
INJOIN
One-to-one
-- 'shows' table
CREATE TABLE shows (
id INTEGER,
title TEXT NOT NULL,
year NUMERIC,
episodes INTEGER,
PRIMARY KEY(id)
);
-- Table connected with a one-to-one relationship with the 'shows' table
CREATE TABLE ratings (
show_id INTEGER NOT NULL,
rating REAL NOT NULL,
votes INTEGER NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id)
);
-- Lists 'show_id', but not the actual names of each show.
SELECT show_id FROM ratings WHERE rating >= 6.0 LIMIT 10;
-- Executes the nested query first, then shows the entries selected on 'shows'
SELECT * FROM shows WHERE id IN
(SELECT show_id FROM ratings WHERE rating >= 6.0)
LIMIT 10;
-- Join both tables
SELECT title, rating FROM shows JOIN
ratings ON shows.id = ratings.show_id WHERE rating >= 6.0
LIMIT 10;
One-to-many
ON
CREATE TABLE genres (
show_id INTEGER NOT NULL,
genre TEXT NOT NULL,
FOREIGN KEY(show_id) REFERENCES shows(id)
);
-- Search for all genres for a show
SELECT genre FROM genres WHERE show_id =
(SELECT id FROM shows WHERE title = 'TitleOfShow');
-- Join tables, to show title alongside genres of a show
SELECT title, genre FROM shows JOIN genres ON shows.id =
genres.show_id WHERE id =
(SELECT id FROM shows WHERE title = 'TitleOfShow');
Many-to-many

-- Select every person who starred in a show
SELECT name FROM people WHERE id IN
(SELECT person_id FROM stars WHERE show_id =
(SELECT id FROM shows WHERE title = 'Some Name'));
-- Joined table with the show name and stars
SELECT title FROM shows JOIN people.name WHERE id IN
(SELECT person_id FROM stars WHERE show_id =
(SELECT id FROM shows WHERE title = 'Some Name'));
-- Another way to join tables
SELECT title FROM shows, stars, people
WHERE shows.id = stars.show_id
AND people.id = stars.person_id
AND name = 'Name';
Indexes
Load an index of the database into RAM to optimize searches.
The underneath algorithm is a B-Tree, which takes up more space in memory and slightly slows write speed on the database.
CREATE INDEX name ON table (column, ...);
.timer ON -- sqlite3 command to show how long operations take to complete
SELECT * FROM shows WHERE title = 'Name'; -- 0.043s
CREATE INDEX title_index ON shows(title); -- Index title on shows
SELECT * FROM shows WHERE title = 'Name'; -- 0.001s
SELECT name FROM people WHERE id IN
(SELECT person_id FROM stars WHERE show_id =
(SELECT id FROM shows WHERE title = 'Some Name')); -- 0.215s
CREATE INDEX name_index ON people(name);
CREATE INDEX person_index ON stars(person_id);
SELECT name FROM people WHERE id IN
(SELECT person_id FROM stars WHERE show_id =
(SELECT id FROM shows WHERE title = 'Some Name')); -- 0.001s
Race Conditions
When SQL is integrated with languages such as Python, for example, or is being accessed by multiple instances, race conditions may arise.
For example, consider the following code:
...
rows = db.execute("SELECT likes FROM posts WHERE id = ?", id);
likes = rows[0]["likes"]
db.execute("UPDATE posts SET likes = ? WHERE id = ?", likes + 1, id);
If this code is executed twice at the same time, instead of adding 2 likes to the database, only one will be added, because the program would read the database in the same state, and then add +1 like to that same, previous state.SQL provides solutions to this in the form of transactions:
BEGIN TRANSACTIONCOMMITROLLBACK
OnPython, a solution could be:
db.execute("BEGIN TRANSACTION")
rows = db.execute("SELECT likes FROM posts WHERE id = ?", id);
likes = rows[0]["likes"]
db.execute("UPDATE posts SET likes = ? WHERE id = ?", likes + 1, id);
db.execute("COMMIT")
Where the database would be “locked” while the program ran.
SQL Injection Attacks
If a program asks for user input, and that input is then passed over to the database, the user could write special SQL syntax to break the query.
For example:
If a program takes an email address as input, some user could type: email@example.com'--, which marks a comment.
If the query was similar to the following:
rows = db.execute(f"SELECT * FROM users WHERE email = '{email}' AND password = {psk}")
Replacing values…
rows = db.execute(f"SELEC * FROM users WHERE email = 'email@example.com'--' AND password = 'password123'")
Which would result in the user being able to log in without needing a password.
The solution to this is to use a library which uses placeholders to insert data in queries.
rows = db.execute("SELECT * FROM users WHERE email = ? AND password = ?", email, psk)