Intro to Databases

1. Database Design and Modeling

1.1 What is a Database?

A database is an organized collection of structured information or data, typically stored electronically in a computer system. Databases are managed by Database Management Systems (DBMS), which provide tools to create, retrieve, update, and manage data.

Key Points:

Data Storage: Efficient storage and retrieval of data.
Data Management: Ensures data integrity, security, and consistency.
Types: Relational, NoSQL, etc.

1.2 Types of Databases

Relational Databases (RDBMS): Use tables (relations) to store data. Examples: MySQL, PostgreSQL, Oracle.
NoSQL Databases: Designed for specific data models and use cases. Examples:
- Document Stores: MongoDB
- Key-Value Stores: Redis
- Column Stores: Cassandra
- Graph Databases: Neo4j
NewSQL: Combines scalability of NoSQL with ACID guarantees of RDBMS. Examples: Google Spanner.

1.3 Relational Database Concepts

Tables (Entities): Consist of rows (records) and columns (attributes).
Relationships: Define how tables relate to each other.
- One-to-One
- One-to-Many
- Many-to-Many

1.4 Entity-Relationship (ER) Diagrams

ER Diagrams visually represent the structure of a database.

Components:

Entities: Represented by rectangles. Example: User, Order.
Attributes: Represented by ovals connected to their entity. Example: User has UserID, Name.
Relationships: Represented by diamonds connecting entities. Example: User places Order.

Example:

[User]---places---[Order]

1.5 Normalization

Normalization is a database design technique that organizes tables to minimize redundancy and dependency. The goal is to ensure data integrity and to make it easier to maintain the database. Normalization involves dividing a database into two or more tables and defining relationships between them. There are several normal forms, but the most commonly used are:

First Normal Form (1NF):
- A table is in 1NF if it contains only atomic (indivisible) values and each entry in a column is of the same data type.
- Example:
  - Consider a Students table with a column for Courses:
    
    StudentID StudentName Courses
    1 Alice Math, Science
    2 Bob History
  - This violates 1NF because the Courses column contains multiple values. To convert it to 1NF:
    
    StudentID StudentName Course
    1 Alice Math
    1 Alice Science
    2 Bob History
  - Now each course is atomic, and the table is in 1NF.
Second Normal Form (2NF):
- A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means there should be no partial dependency of any column on the primary key.
- Example:
  - Consider a table of Enrollments:
    
    StudentID Course Instructor
    1 Math Dr. Smith
    1 Science Dr. Johnson
    2 History Dr. Brown
  - Here, Instructor depends only on Course, not on the composite primary key (StudentID, Course). To convert to 2NF, we split it into two tables:
    - Enrollments:
      
      StudentID Course
      1 Math
      1 Science
      2 History
    - Courses:
      
      Course Instructor
      Math Dr. Smith
      Science Dr. Johnson
      History Dr. Brown
Third Normal Form (3NF):
- A table is in 3NF if it is in 2NF and there are no transitive dependencies. This means that non-key attributes should not depend on other non-key attributes.
- Example:
  - Consider a Students table with Department and DepartmentHead:
    
    StudentID StudentName Department DepartmentHead
    1 Alice CS Dr. Adams
    2 Bob Math Dr. Baker
  - Here, DepartmentHead depends on Department, creating a transitive dependency. To convert to 3NF, split it into two tables:
    - Students:
      
      StudentID StudentName Department
      1 Alice CS
      2 Bob Math
    - Departments:
      
      Department DepartmentHead
      CS Dr. Adams
      Math Dr. Baker
Boyce-Codd Normal Form (BCNF):
- A table is in BCNF if it is in 3NF and every determinant is a candidate key. This further reduces redundancy.
- Example:
  - Consider a Projects table where a project can have multiple employees working on it:
    
    ProjectID EmployeeID Role
    1 101 Developer
    1 102 Tester
    2 101 Manager
  - In this case, the combination of ProjectID and EmployeeID forms the primary key, but Role can also determine the EmployeeID. To normalize it to BCNF, we need to separate the roles:
    - ProjectAssignments:
      
      ProjectID EmployeeID
      1 101
      1 102
      2 101
    - Roles:
      
      EmployeeID Role
      101 Developer
      102 Tester
      101 Manager

StudentID	StudentName	Courses
1	Alice	Math, Science
2	Bob	History

StudentID	StudentName	Course
1	Alice	Math
1	Alice	Science
2	Bob	History

StudentID	Course	Instructor
1	Math	Dr. Smith
1	Science	Dr. Johnson
2	History	Dr. Brown

StudentID	Course
1	Math
1	Science
2	History

Course	Instructor
Math	Dr. Smith
Science	Dr. Johnson
History	Dr. Brown

StudentID	StudentName	Department	DepartmentHead
1	Alice	CS	Dr. Adams
2	Bob	Math	Dr. Baker

StudentID	StudentName	Department
1	Alice	CS
2	Bob	Math

Department	DepartmentHead
CS	Dr. Adams
Math	Dr. Baker

ProjectID	EmployeeID	Role
1	101	Developer
1	102	Tester
2	101	Manager

ProjectID	EmployeeID
1	101
1	102
2	101

EmployeeID	Role
101	Developer
102	Tester
101	Manager

By applying these normalization forms, databases become more efficient and easier to maintain, reducing data redundancy and enhancing data integrity. However, it’s essential to balance normalization with performance considerations, as overly normalized databases can lead to complex queries and performance issues.

1.6 Keys

Primary Key: A unique identifier for each record in a table. Cannot be NULL. Example: UserID in User table.
Foreign Key: A field (or collection of fields) in one table that uniquely identifies a row of another table. Establishes a relationship between tables.
Composite Key: A primary key composed of multiple columns.
Unique Key: Ensures all values in a column are unique, but can be NULL.
Surrogate Key: An artificially generated key, usually an auto-increment integer.

2. Basic SQL Queries

Structured Query Language (SQL) is used to interact with relational databases.

2.1 SELECT Statements

Retrieve data from one or more tables.

Syntax:

SELECT column1, column2, ...
FROM table_name;

Examples:

Select specific columns:
```
SELECT name, email FROM users;
```
Select all columns:
```
SELECT * FROM users;
```

2.2 INSERT, UPDATE, DELETE

INSERT: Add new records to a table.

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example:

INSERT INTO users (name, email)
VALUES ('Alice', '[email protected]');

UPDATE: Modify existing records.

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:

UPDATE users
SET email = '[email protected]'
WHERE user_id = 1;

DELETE: Remove records from a table.

DELETE FROM table_name
WHERE condition;

Example:

DELETE FROM users
WHERE user_id = 1;

Caution: Always use WHERE clause with UPDATE and DELETE to avoid affecting all records.

2.3 WHERE Clause

Filters records based on specified conditions.

Operators:

Comparison: =, !=, <, >, <=, >=
Logical: AND, OR, NOT
Range: BETWEEN ... AND ...
Set: IN (...)
Pattern Matching: LIKE

Examples:

Simple condition:
```
SELECT * FROM users WHERE age > 25;
```

Multiple conditions:

SELECT * FROM users
WHERE age > 25 AND city = 'New York';

Pattern matching:

SELECT * FROM users
WHERE email LIKE '%@example.com';

2.4 ORDER BY and GROUP BY

ORDER BY: Sorts the result set.

Syntax:

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;

Example:

SELECT name, age FROM users
ORDER BY age DESC;

GROUP BY: Groups records that have the same values in specified columns. Often used with aggregate functions.

Syntax:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Example:

SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city;

HAVING Clause: Filters groups based on conditions.

Example:

SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 10;

3. Joins

Joins are used in SQL to combine rows from two or more tables based on a related column between them. There are several types of joins, each serving a different purpose. Understanding joins is essential for querying relational databases effectively.

3.1 INNER JOIN

An inner join returns only the rows that have matching values in both tables. It is the most common type of join.

Example: Consider two tables: Students and Enrollments.

Students:

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Charlie     |

Enrollments:

| StudentID | Course   |
|-----------|----------|
| 1         | Math     |
| 1         | Science  |
| 2         | History  |

SQL Query:

SELECT Students.StudentName, Enrollments.Course
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;

Result:

| StudentName | Course   |
|-------------|----------|
| Alice       | Math     |
| Alice       | Science  |
| Bob         | History  |

3.2 LEFT (OUTER) JOIN

A left join returns all the rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.

Example: Using the same Students and Enrollments tables:

SQL Query:

SELECT Students.StudentName, Enrollments.Course
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;

Result:

| StudentName | Course   |
|-------------|----------|
| Alice       | Math     |
| Alice       | Science  |
| Bob         | History  |
| Charlie     | NULL     |

3.3 RIGHT (OUTER) JOIN

A right join returns all the rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.

Example: Suppose we have a Courses table in addition to the existing ones:

Courses:

| CourseID | CourseName |
|----------|------------|
| 1        | Math       |
| 2        | Science    |
| 3        | History    |

SQL Query:

SELECT Students.StudentName, Courses.CourseName
FROM Students
RIGHT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
RIGHT JOIN Courses ON Enrollments.Course = Courses.CourseName;

Result:

| StudentName | CourseName |
|-------------|------------|
| Alice       | Math       |
| Alice       | Science    |
| Bob         | History    |
| NULL        | History    |

3.4 FULL (OUTER) JOIN

A full join returns all rows from both tables, with NULLs in place where there are no matches.

Example: Using the previous tables:

SQL Query:

SELECT Students.StudentName, Enrollments.Course
FROM Students
FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;

Result:

| StudentName | Course   |
|-------------|----------|
| Alice       | Math     |
| Alice       | Science  |
| Bob         | History  |
| Charlie     | NULL     |

3.5 CROSS JOIN

A cross join returns the Cartesian product of the two tables, combining every row of the first table with every row of the second table.

Example:

SQL Query:

SELECT Students.StudentName, Courses.CourseName
FROM Students
CROSS JOIN Courses;

Result:

| StudentName | CourseName |
|-------------|------------|
| Alice       | Math       |
| Alice       | Science    |
| Alice       | History    |
| Bob         | Math       |
| Bob         | Science    |
| Bob         | History    |
| Charlie     | Math       |
| Charlie     | Science    |
| Charlie     | History    |

3.6 Self JOIN

A table is joined with itself to compare rows within the same table.

Syntax:

SELECT a.columns, b.columns
FROM table1 a
INNER JOIN table1 b
ON a.column = b.column;

Example:

SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
INNER JOIN employees e2
ON e1.manager_id = e2.employee_id;

4. Indexes

Indexes improve the speed of data retrieval operations on a database table at the cost of additional storage and slower write operations.

4.1 What is an Index?

An index is a data structure that allows the DBMS to find and retrieve specific rows much faster than without an index.

Analogy: Similar to an index in a book, which allows you to quickly locate pages without scanning every page.

4.2 Types of Indexes

Primary Index: Automatically created when a primary key is defined. Ensures uniqueness.
Unique Index: Ensures all values in the indexed column are unique.
Non-Unique Index: Allows duplicate values; used to speed up queries.
Clustered Index: Determines the physical order of data in a table. A table can have only one clustered index.
Non-Clustered Index: Separate from the data storage; contains pointers to the data. A table can have multiple non-clustered indexes.
Composite Index: An index on multiple columns.

4.3 How Indexes Work

B-Tree Indexes: Most common type, where data is stored in a balanced tree structure allowing logarithmic time complexity for searches.
Hash Indexes: Used for equality searches; faster but limited in functionality.
Bitmap Indexes: Efficient for columns with a limited number of distinct values.

4.4 Pros and Cons of Indexes

Pros:

Faster Query Performance: Significantly reduces the time to retrieve data.
Efficient Sorting: Helps in ORDER BY and GROUP BY operations.
Uniqueness Enforcement: Ensures data integrity for unique columns.

Cons:

Storage Overhead: Requires additional disk space.
Slower Write Operations: INSERT, UPDATE, DELETE operations can be slower due to index maintenance.
Complexity: Managing multiple indexes can complicate database design.

4.5 Creating and Managing Indexes

Creating an Index:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

Example:

CREATE INDEX idx_users_email
ON users (email);

Creating a Unique Index:

CREATE UNIQUE INDEX idx_unique_email
ON users (email);

Dropping an Index:

DROP INDEX index_name;

Note: Syntax may vary by DBMS.

Viewing Indexes:

MySQL:
```
SHOW INDEX FROM table_name;
```
PostgreSQL:
```
\d table_name
```

Best Practices:

Index Columns Used in WHERE Clauses: Speeds up search operations.
Index Foreign Keys: Enhances join performance.
Avoid Over-Indexing: Balance between read and write performance.
Use Composite Indexes Wisely: Order matters; place the most selective columns first.

Additional Tips for Your Interview

Understand Use Cases:
- Be prepared to explain when and why to use certain types of joins or indexes.
Explain Trade-offs:
- Discuss the balance between normalization and denormalization.
- Talk about the impact of indexes on read vs. write performance.
Practice Writing SQL:
- Be comfortable writing and explaining SQL queries on a whiteboard or in a coding environment.
Real-World Scenarios:
- Think of examples from projects or experiences where you designed a database, wrote complex queries, or optimized performance.
Terminology:
- Ensure you’re familiar with all key terms and can define them clearly.
Performance Considerations:
- Understand how indexing strategies affect query performance.
- Know basic query optimization techniques.
Be Ready for Problem-Solving:
- You might be given a scenario and asked to design a database schema or write queries to retrieve specific information.

Example Scenario

Problem: Design a simple database for an online bookstore. Include tables for Books, Authors, and Orders. Ensure that each order can include multiple books and that each book can have multiple authors.

Solution:

Entities and Relationships:
- Books: Each book has one or more authors.
- Authors: Each author can write multiple books.
- Orders: Each order can include multiple books.
Tables:
- Books:
  - book_id (Primary Key)
  - title
  - isbn
  - published_date
  - price
- Authors:
  - author_id (Primary Key)
  - name
  - bio
- Books_Authors (Join Table for Many-to-Many Relationship):
  - book_id (Foreign Key referencing Books)
  - author_id (Foreign Key referencing Authors)
  - Primary Key: (book_id, author_id)
- Orders:
  - order_id (Primary Key)
  - user_id (Foreign Key referencing Users)
  - order_date
- Orders_Books (Join Table for Many-to-Many Relationship):
  - order_id (Foreign Key referencing Orders)
  - book_id (Foreign Key referencing Books)
  - quantity
  - Primary Key: (order_id, book_id)
Indexes:
- Index on Books.title for faster search.
- Index on Orders.order_date for querying recent orders.
- Composite index on Books_Authors.author_id for efficient author-based queries.

notes MongoDB Cheatsheet