1. Database Design and Modeling
1.1 What is a Database?
A database is an organized collection of structured information or data, typically stored electronically in a computer system. Databases are managed by Database Management Systems (DBMS), which provide tools to create, retrieve, update, and manage data.
Key Points:
- Data Storage: Efficient storage and retrieval of data.
- Data Management: Ensures data integrity, security, and consistency.
- Types: Relational, NoSQL, etc.
1.2 Types of Databases
- Relational Databases (RDBMS): Use tables (relations) to store data. Examples: MySQL, PostgreSQL, Oracle.
- NoSQL Databases: Designed for specific data models and use cases. Examples:
- Document Stores: MongoDB
- Key-Value Stores: Redis
- Column Stores: Cassandra
- Graph Databases: Neo4j
- NewSQL: Combines scalability of NoSQL with ACID guarantees of RDBMS. Examples: Google Spanner.
1.3 Relational Database Concepts
- Tables (Entities): Consist of rows (records) and columns (attributes).
- Relationships: Define how tables relate to each other.
- One-to-One
- One-to-Many
- Many-to-Many
1.4 Entity-Relationship (ER) Diagrams
ER Diagrams visually represent the structure of a database.
Components:
- Entities: Represented by rectangles. Example:
User,Order. - Attributes: Represented by ovals connected to their entity. Example:
UserhasUserID,Name. - Relationships: Represented by diamonds connecting entities. Example:
UserplacesOrder.
Example:
[User]---places---[Order]1.5 Normalization
Normalization is a database design technique that organizes tables to minimize redundancy and dependency. The goal is to ensure data integrity and to make it easier to maintain the database. Normalization involves dividing a database into two or more tables and defining relationships between them. There are several normal forms, but the most commonly used are:
-
First Normal Form (1NF):
- A table is in 1NF if it contains only atomic (indivisible) values and each entry in a column is of the same data type.
- Example:
-
Consider a
Studentstable with a column forCourses:StudentID StudentName Courses 1 Alice Math, Science 2 Bob History -
This violates 1NF because the
Coursescolumn contains multiple values. To convert it to 1NF:StudentID StudentName Course 1 Alice Math 1 Alice Science 2 Bob History -
Now each course is atomic, and the table is in 1NF.
-
-
Second Normal Form (2NF):
- A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means there should be no partial dependency of any column on the primary key.
- Example:
-
Consider a table of
Enrollments:StudentID Course Instructor 1 Math Dr. Smith 1 Science Dr. Johnson 2 History Dr. Brown -
Here,
Instructordepends only onCourse, not on the composite primary key (StudentID,Course). To convert to 2NF, we split it into two tables:-
Enrollments:StudentID Course 1 Math 1 Science 2 History -
Courses:Course Instructor Math Dr. Smith Science Dr. Johnson History Dr. Brown
-
-
-
Third Normal Form (3NF):
- A table is in 3NF if it is in 2NF and there are no transitive dependencies. This means that non-key attributes should not depend on other non-key attributes.
- Example:
-
Consider a
Studentstable withDepartmentandDepartmentHead:StudentID StudentName Department DepartmentHead 1 Alice CS Dr. Adams 2 Bob Math Dr. Baker -
Here,
DepartmentHeaddepends onDepartment, creating a transitive dependency. To convert to 3NF, split it into two tables:-
Students:StudentID StudentName Department 1 Alice CS 2 Bob Math -
Departments:Department DepartmentHead CS Dr. Adams Math Dr. Baker
-
-
-
Boyce-Codd Normal Form (BCNF):
- A table is in BCNF if it is in 3NF and every determinant is a candidate key. This further reduces redundancy.
- Example:
-
Consider a
Projectstable where a project can have multiple employees working on it:ProjectID EmployeeID Role 1 101 Developer 1 102 Tester 2 101 Manager -
In this case, the combination of
ProjectIDandEmployeeIDforms the primary key, butRolecan also determine theEmployeeID. To normalize it to BCNF, we need to separate the roles:-
ProjectAssignments:ProjectID EmployeeID 1 101 1 102 2 101 -
Roles:EmployeeID Role 101 Developer 102 Tester 101 Manager
-
-
By applying these normalization forms, databases become more efficient and easier to maintain, reducing data redundancy and enhancing data integrity. However, it’s essential to balance normalization with performance considerations, as overly normalized databases can lead to complex queries and performance issues.
1.6 Keys
- Primary Key: A unique identifier for each record in a table. Cannot be NULL. Example:
UserIDinUsertable. - Foreign Key: A field (or collection of fields) in one table that uniquely identifies a row of another table. Establishes a relationship between tables.
- Composite Key: A primary key composed of multiple columns.
- Unique Key: Ensures all values in a column are unique, but can be NULL.
- Surrogate Key: An artificially generated key, usually an auto-increment integer.
2. Basic SQL Queries
Structured Query Language (SQL) is used to interact with relational databases.
2.1 SELECT Statements
Retrieve data from one or more tables.
Syntax:
SELECT column1, column2, ...
FROM table_name;Examples:
- Select specific columns:
SELECT name, email FROM users; - Select all columns:
SELECT * FROM users;
2.2 INSERT, UPDATE, DELETE
INSERT: Add new records to a table.
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);Example:
INSERT INTO users (name, email)
VALUES ('Alice', '[email protected]');UPDATE: Modify existing records.
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;Example:
UPDATE users
SET email = '[email protected]'
WHERE user_id = 1;DELETE: Remove records from a table.
DELETE FROM table_name
WHERE condition;Example:
DELETE FROM users
WHERE user_id = 1;Caution: Always use WHERE clause with UPDATE and DELETE to avoid affecting all records.
2.3 WHERE Clause
Filters records based on specified conditions.
Operators:
- Comparison:
=,!=,<,>,<=,>= - Logical:
AND,OR,NOT - Range:
BETWEEN ... AND ... - Set:
IN (...) - Pattern Matching:
LIKE
Examples:
- Simple condition:
SELECT * FROM users WHERE age > 25; - Multiple conditions:
SELECT * FROM users WHERE age > 25 AND city = 'New York'; - Pattern matching:
SELECT * FROM users WHERE email LIKE '%@example.com';
2.4 ORDER BY and GROUP BY
ORDER BY: Sorts the result set.
Syntax:
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;Example:
SELECT name, age FROM users
ORDER BY age DESC;GROUP BY: Groups records that have the same values in specified columns. Often used with aggregate functions.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;Example:
SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city;HAVING Clause: Filters groups based on conditions.
Example:
SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 10;3. Joins
Joins are used in SQL to combine rows from two or more tables based on a related column between them. There are several types of joins, each serving a different purpose. Understanding joins is essential for querying relational databases effectively.
3.1 INNER JOIN
-
An inner join returns only the rows that have matching values in both tables. It is the most common type of join.
-
Example: Consider two tables:
StudentsandEnrollments.-
Students:| StudentID | StudentName | |-----------|-------------| | 1 | Alice | | 2 | Bob | | 3 | Charlie | -
Enrollments:| StudentID | Course | |-----------|----------| | 1 | Math | | 1 | Science | | 2 | History | -
SQL Query:
SELECT Students.StudentName, Enrollments.Course FROM Students INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID; -
Result:
| StudentName | Course | |-------------|----------| | Alice | Math | | Alice | Science | | Bob | History |
-
3.2 LEFT (OUTER) JOIN
-
A left join returns all the rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.
-
Example: Using the same
StudentsandEnrollmentstables:-
SQL Query:
SELECT Students.StudentName, Enrollments.Course FROM Students LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID; -
Result:
| StudentName | Course | |-------------|----------| | Alice | Math | | Alice | Science | | Bob | History | | Charlie | NULL |
-
3.3 RIGHT (OUTER) JOIN
-
A right join returns all the rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.
-
Example: Suppose we have a
Coursestable in addition to the existing ones:-
Courses:| CourseID | CourseName | |----------|------------| | 1 | Math | | 2 | Science | | 3 | History | -
SQL Query:
SELECT Students.StudentName, Courses.CourseName FROM Students RIGHT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID RIGHT JOIN Courses ON Enrollments.Course = Courses.CourseName; -
Result:
| StudentName | CourseName | |-------------|------------| | Alice | Math | | Alice | Science | | Bob | History | | NULL | History |
-
3.4 FULL (OUTER) JOIN
-
A full join returns all rows from both tables, with NULLs in place where there are no matches.
-
Example: Using the previous tables:
-
SQL Query:
SELECT Students.StudentName, Enrollments.Course FROM Students FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID; -
Result:
| StudentName | Course | |-------------|----------| | Alice | Math | | Alice | Science | | Bob | History | | Charlie | NULL |
-
3.5 CROSS JOIN
-
A cross join returns the Cartesian product of the two tables, combining every row of the first table with every row of the second table.
-
Example:
-
SQL Query:
SELECT Students.StudentName, Courses.CourseName FROM Students CROSS JOIN Courses; -
Result:
| StudentName | CourseName | |-------------|------------| | Alice | Math | | Alice | Science | | Alice | History | | Bob | Math | | Bob | Science | | Bob | History | | Charlie | Math | | Charlie | Science | | Charlie | History |
-
3.6 Self JOIN
A table is joined with itself to compare rows within the same table.
Syntax:
SELECT a.columns, b.columns
FROM table1 a
INNER JOIN table1 b
ON a.column = b.column;Example:
SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
INNER JOIN employees e2
ON e1.manager_id = e2.employee_id;4. Indexes
Indexes improve the speed of data retrieval operations on a database table at the cost of additional storage and slower write operations.
4.1 What is an Index?
An index is a data structure that allows the DBMS to find and retrieve specific rows much faster than without an index.
Analogy: Similar to an index in a book, which allows you to quickly locate pages without scanning every page.
4.2 Types of Indexes
- Primary Index: Automatically created when a primary key is defined. Ensures uniqueness.
- Unique Index: Ensures all values in the indexed column are unique.
- Non-Unique Index: Allows duplicate values; used to speed up queries.
- Clustered Index: Determines the physical order of data in a table. A table can have only one clustered index.
- Non-Clustered Index: Separate from the data storage; contains pointers to the data. A table can have multiple non-clustered indexes.
- Composite Index: An index on multiple columns.
4.3 How Indexes Work
- B-Tree Indexes: Most common type, where data is stored in a balanced tree structure allowing logarithmic time complexity for searches.
- Hash Indexes: Used for equality searches; faster but limited in functionality.
- Bitmap Indexes: Efficient for columns with a limited number of distinct values.
4.4 Pros and Cons of Indexes
Pros:
- Faster Query Performance: Significantly reduces the time to retrieve data.
- Efficient Sorting: Helps in
ORDER BYandGROUP BYoperations. - Uniqueness Enforcement: Ensures data integrity for unique columns.
Cons:
- Storage Overhead: Requires additional disk space.
- Slower Write Operations:
INSERT,UPDATE,DELETEoperations can be slower due to index maintenance. - Complexity: Managing multiple indexes can complicate database design.
4.5 Creating and Managing Indexes
Creating an Index:
CREATE INDEX index_name
ON table_name (column1, column2, ...);Example:
CREATE INDEX idx_users_email
ON users (email);Creating a Unique Index:
CREATE UNIQUE INDEX idx_unique_email
ON users (email);Dropping an Index:
DROP INDEX index_name;Note: Syntax may vary by DBMS.
Viewing Indexes:
- MySQL:
SHOW INDEX FROM table_name; - PostgreSQL:
\d table_name
Best Practices:
- Index Columns Used in WHERE Clauses: Speeds up search operations.
- Index Foreign Keys: Enhances join performance.
- Avoid Over-Indexing: Balance between read and write performance.
- Use Composite Indexes Wisely: Order matters; place the most selective columns first.
Additional Tips for Your Interview
- Understand Use Cases:
- Be prepared to explain when and why to use certain types of joins or indexes.
- Explain Trade-offs:
- Discuss the balance between normalization and denormalization.
- Talk about the impact of indexes on read vs. write performance.
- Practice Writing SQL:
- Be comfortable writing and explaining SQL queries on a whiteboard or in a coding environment.
- Real-World Scenarios:
- Think of examples from projects or experiences where you designed a database, wrote complex queries, or optimized performance.
- Terminology:
- Ensure you’re familiar with all key terms and can define them clearly.
- Performance Considerations:
- Understand how indexing strategies affect query performance.
- Know basic query optimization techniques.
- Be Ready for Problem-Solving:
- You might be given a scenario and asked to design a database schema or write queries to retrieve specific information.
Example Scenario
Problem: Design a simple database for an online bookstore. Include tables for Books, Authors, and Orders. Ensure that each order can include multiple books and that each book can have multiple authors.
Solution:
-
Entities and Relationships:
- Books: Each book has one or more authors.
- Authors: Each author can write multiple books.
- Orders: Each order can include multiple books.
-
Tables:
- Books:
book_id(Primary Key)titleisbnpublished_dateprice
- Authors:
author_id(Primary Key)namebio
- Books_Authors (Join Table for Many-to-Many Relationship):
book_id(Foreign Key referencing Books)author_id(Foreign Key referencing Authors)- Primary Key: (
book_id,author_id)
- Orders:
order_id(Primary Key)user_id(Foreign Key referencing Users)order_date
- Orders_Books (Join Table for Many-to-Many Relationship):
order_id(Foreign Key referencing Orders)book_id(Foreign Key referencing Books)quantity- Primary Key: (
order_id,book_id)
- Books:
-
Indexes:
- Index on
Books.titlefor faster search. - Index on
Orders.order_datefor querying recent orders. - Composite index on
Books_Authors.author_idfor efficient author-based queries.
- Index on