databasesIntro to Databases

1. Database Design and Modeling

1.1 What is a Database?

A database is an organized collection of structured information or data, typically stored electronically in a computer system. Databases are managed by Database Management Systems (DBMS), which provide tools to create, retrieve, update, and manage data.

Key Points:

  • Data Storage: Efficient storage and retrieval of data.
  • Data Management: Ensures data integrity, security, and consistency.
  • Types: Relational, NoSQL, etc.

1.2 Types of Databases

  • Relational Databases (RDBMS): Use tables (relations) to store data. Examples: MySQL, PostgreSQL, Oracle.
  • NoSQL Databases: Designed for specific data models and use cases. Examples:
    • Document Stores: MongoDB
    • Key-Value Stores: Redis
    • Column Stores: Cassandra
    • Graph Databases: Neo4j
  • NewSQL: Combines scalability of NoSQL with ACID guarantees of RDBMS. Examples: Google Spanner.

1.3 Relational Database Concepts

  • Tables (Entities): Consist of rows (records) and columns (attributes).
  • Relationships: Define how tables relate to each other.
    • One-to-One
    • One-to-Many
    • Many-to-Many

1.4 Entity-Relationship (ER) Diagrams

ER Diagrams visually represent the structure of a database.

Components:

  • Entities: Represented by rectangles. Example: User, Order.
  • Attributes: Represented by ovals connected to their entity. Example: User has UserID, Name.
  • Relationships: Represented by diamonds connecting entities. Example: User places Order.

Example:

[User]---places---[Order]

1.5 Normalization

Normalization is a database design technique that organizes tables to minimize redundancy and dependency. The goal is to ensure data integrity and to make it easier to maintain the database. Normalization involves dividing a database into two or more tables and defining relationships between them. There are several normal forms, but the most commonly used are:

  1. First Normal Form (1NF):

    • A table is in 1NF if it contains only atomic (indivisible) values and each entry in a column is of the same data type.
    • Example:
      • Consider a Students table with a column for Courses:

        StudentIDStudentNameCourses
        1AliceMath, Science
        2BobHistory
      • This violates 1NF because the Courses column contains multiple values. To convert it to 1NF:

        StudentIDStudentNameCourse
        1AliceMath
        1AliceScience
        2BobHistory
      • Now each course is atomic, and the table is in 1NF.

  2. Second Normal Form (2NF):

    • A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means there should be no partial dependency of any column on the primary key.
    • Example:
      • Consider a table of Enrollments:

        StudentIDCourseInstructor
        1MathDr. Smith
        1ScienceDr. Johnson
        2HistoryDr. Brown
      • Here, Instructor depends only on Course, not on the composite primary key (StudentID, Course). To convert to 2NF, we split it into two tables:

        • Enrollments:

          StudentIDCourse
          1Math
          1Science
          2History
        • Courses:

          CourseInstructor
          MathDr. Smith
          ScienceDr. Johnson
          HistoryDr. Brown
  3. Third Normal Form (3NF):

    • A table is in 3NF if it is in 2NF and there are no transitive dependencies. This means that non-key attributes should not depend on other non-key attributes.
    • Example:
      • Consider a Students table with Department and DepartmentHead:

        StudentIDStudentNameDepartmentDepartmentHead
        1AliceCSDr. Adams
        2BobMathDr. Baker
      • Here, DepartmentHead depends on Department, creating a transitive dependency. To convert to 3NF, split it into two tables:

        • Students:

          StudentIDStudentNameDepartment
          1AliceCS
          2BobMath
        • Departments:

          DepartmentDepartmentHead
          CSDr. Adams
          MathDr. Baker
  4. Boyce-Codd Normal Form (BCNF):

    • A table is in BCNF if it is in 3NF and every determinant is a candidate key. This further reduces redundancy.
    • Example:
      • Consider a Projects table where a project can have multiple employees working on it:

        ProjectIDEmployeeIDRole
        1101Developer
        1102Tester
        2101Manager
      • In this case, the combination of ProjectID and EmployeeID forms the primary key, but Role can also determine the EmployeeID. To normalize it to BCNF, we need to separate the roles:

        • ProjectAssignments:

          ProjectIDEmployeeID
          1101
          1102
          2101
        • Roles:

          EmployeeIDRole
          101Developer
          102Tester
          101Manager

By applying these normalization forms, databases become more efficient and easier to maintain, reducing data redundancy and enhancing data integrity. However, it’s essential to balance normalization with performance considerations, as overly normalized databases can lead to complex queries and performance issues.

1.6 Keys

  • Primary Key: A unique identifier for each record in a table. Cannot be NULL. Example: UserID in User table.
  • Foreign Key: A field (or collection of fields) in one table that uniquely identifies a row of another table. Establishes a relationship between tables.
  • Composite Key: A primary key composed of multiple columns.
  • Unique Key: Ensures all values in a column are unique, but can be NULL.
  • Surrogate Key: An artificially generated key, usually an auto-increment integer.

2. Basic SQL Queries

Structured Query Language (SQL) is used to interact with relational databases.

2.1 SELECT Statements

Retrieve data from one or more tables.

Syntax:

SELECT column1, column2, ...
FROM table_name;

Examples:

  • Select specific columns:
    SELECT name, email FROM users;
  • Select all columns:
    SELECT * FROM users;

2.2 INSERT, UPDATE, DELETE

INSERT: Add new records to a table.

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example:

INSERT INTO users (name, email)
VALUES ('Alice', '[email protected]');

UPDATE: Modify existing records.

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:

UPDATE users
SET email = '[email protected]'
WHERE user_id = 1;

DELETE: Remove records from a table.

DELETE FROM table_name
WHERE condition;

Example:

DELETE FROM users
WHERE user_id = 1;

Caution: Always use WHERE clause with UPDATE and DELETE to avoid affecting all records.

2.3 WHERE Clause

Filters records based on specified conditions.

Operators:

  • Comparison: =, !=, <, >, <=, >=
  • Logical: AND, OR, NOT
  • Range: BETWEEN ... AND ...
  • Set: IN (...)
  • Pattern Matching: LIKE

Examples:

  • Simple condition:
    SELECT * FROM users WHERE age > 25;
  • Multiple conditions:
    SELECT * FROM users
    WHERE age > 25 AND city = 'New York';
  • Pattern matching:
    SELECT * FROM users
    WHERE email LIKE '%@example.com';

2.4 ORDER BY and GROUP BY

ORDER BY: Sorts the result set.

Syntax:

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;

Example:

SELECT name, age FROM users
ORDER BY age DESC;

GROUP BY: Groups records that have the same values in specified columns. Often used with aggregate functions.

Syntax:

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Example:

SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city;

HAVING Clause: Filters groups based on conditions.

Example:

SELECT city, COUNT(*) as user_count
FROM users
GROUP BY city
HAVING COUNT(*) > 10;

3. Joins

Joins are used in SQL to combine rows from two or more tables based on a related column between them. There are several types of joins, each serving a different purpose. Understanding joins is essential for querying relational databases effectively.

3.1 INNER JOIN

  • An inner join returns only the rows that have matching values in both tables. It is the most common type of join.

  • Example: Consider two tables: Students and Enrollments.

    • Students:

      | StudentID | StudentName |
      |-----------|-------------|
      | 1         | Alice       |
      | 2         | Bob         |
      | 3         | Charlie     |
    • Enrollments:

      | StudentID | Course   |
      |-----------|----------|
      | 1         | Math     |
      | 1         | Science  |
      | 2         | History  |
    • SQL Query:

      SELECT Students.StudentName, Enrollments.Course
      FROM Students
      INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
    • Result:

      | StudentName | Course   |
      |-------------|----------|
      | Alice       | Math     |
      | Alice       | Science  |
      | Bob         | History  |

3.2 LEFT (OUTER) JOIN

  • A left join returns all the rows from the left table and the matched rows from the right table. If there is no match, NULL values are returned for columns from the right table.

  • Example: Using the same Students and Enrollments tables:

    • SQL Query:

      SELECT Students.StudentName, Enrollments.Course
      FROM Students
      LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
    • Result:

      | StudentName | Course   |
      |-------------|----------|
      | Alice       | Math     |
      | Alice       | Science  |
      | Bob         | History  |
      | Charlie     | NULL     |

3.3 RIGHT (OUTER) JOIN

  • A right join returns all the rows from the right table and the matched rows from the left table. If there is no match, NULL values are returned for columns from the left table.

  • Example: Suppose we have a Courses table in addition to the existing ones:

    • Courses:

      | CourseID | CourseName |
      |----------|------------|
      | 1        | Math       |
      | 2        | Science    |
      | 3        | History    |
    • SQL Query:

      SELECT Students.StudentName, Courses.CourseName
      FROM Students
      RIGHT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
      RIGHT JOIN Courses ON Enrollments.Course = Courses.CourseName;
    • Result:

      | StudentName | CourseName |
      |-------------|------------|
      | Alice       | Math       |
      | Alice       | Science    |
      | Bob         | History    |
      | NULL        | History    |

3.4 FULL (OUTER) JOIN

  • A full join returns all rows from both tables, with NULLs in place where there are no matches.

  • Example: Using the previous tables:

    • SQL Query:

      SELECT Students.StudentName, Enrollments.Course
      FROM Students
      FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;
    • Result:

      | StudentName | Course   |
      |-------------|----------|
      | Alice       | Math     |
      | Alice       | Science  |
      | Bob         | History  |
      | Charlie     | NULL     |

3.5 CROSS JOIN

  • A cross join returns the Cartesian product of the two tables, combining every row of the first table with every row of the second table.

  • Example:

    • SQL Query:

      SELECT Students.StudentName, Courses.CourseName
      FROM Students
      CROSS JOIN Courses;
    • Result:

      | StudentName | CourseName |
      |-------------|------------|
      | Alice       | Math       |
      | Alice       | Science    |
      | Alice       | History    |
      | Bob         | Math       |
      | Bob         | Science    |
      | Bob         | History    |
      | Charlie     | Math       |
      | Charlie     | Science    |
      | Charlie     | History    |

3.6 Self JOIN

A table is joined with itself to compare rows within the same table.

Syntax:

SELECT a.columns, b.columns
FROM table1 a
INNER JOIN table1 b
ON a.column = b.column;

Example:

SELECT e1.name AS Employee, e2.name AS Manager
FROM employees e1
INNER JOIN employees e2
ON e1.manager_id = e2.employee_id;

4. Indexes

Indexes improve the speed of data retrieval operations on a database table at the cost of additional storage and slower write operations.

4.1 What is an Index?

An index is a data structure that allows the DBMS to find and retrieve specific rows much faster than without an index.

Analogy: Similar to an index in a book, which allows you to quickly locate pages without scanning every page.

4.2 Types of Indexes

  • Primary Index: Automatically created when a primary key is defined. Ensures uniqueness.
  • Unique Index: Ensures all values in the indexed column are unique.
  • Non-Unique Index: Allows duplicate values; used to speed up queries.
  • Clustered Index: Determines the physical order of data in a table. A table can have only one clustered index.
  • Non-Clustered Index: Separate from the data storage; contains pointers to the data. A table can have multiple non-clustered indexes.
  • Composite Index: An index on multiple columns.

4.3 How Indexes Work

  • B-Tree Indexes: Most common type, where data is stored in a balanced tree structure allowing logarithmic time complexity for searches.
  • Hash Indexes: Used for equality searches; faster but limited in functionality.
  • Bitmap Indexes: Efficient for columns with a limited number of distinct values.

4.4 Pros and Cons of Indexes

Pros:

  • Faster Query Performance: Significantly reduces the time to retrieve data.
  • Efficient Sorting: Helps in ORDER BY and GROUP BY operations.
  • Uniqueness Enforcement: Ensures data integrity for unique columns.

Cons:

  • Storage Overhead: Requires additional disk space.
  • Slower Write Operations: INSERT, UPDATE, DELETE operations can be slower due to index maintenance.
  • Complexity: Managing multiple indexes can complicate database design.

4.5 Creating and Managing Indexes

Creating an Index:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

Example:

CREATE INDEX idx_users_email
ON users (email);

Creating a Unique Index:

CREATE UNIQUE INDEX idx_unique_email
ON users (email);

Dropping an Index:

DROP INDEX index_name;

Note: Syntax may vary by DBMS.

Viewing Indexes:

  • MySQL:
    SHOW INDEX FROM table_name;
  • PostgreSQL:
    \d table_name

Best Practices:

  • Index Columns Used in WHERE Clauses: Speeds up search operations.
  • Index Foreign Keys: Enhances join performance.
  • Avoid Over-Indexing: Balance between read and write performance.
  • Use Composite Indexes Wisely: Order matters; place the most selective columns first.

Additional Tips for Your Interview

  1. Understand Use Cases:
    • Be prepared to explain when and why to use certain types of joins or indexes.
  2. Explain Trade-offs:
    • Discuss the balance between normalization and denormalization.
    • Talk about the impact of indexes on read vs. write performance.
  3. Practice Writing SQL:
    • Be comfortable writing and explaining SQL queries on a whiteboard or in a coding environment.
  4. Real-World Scenarios:
    • Think of examples from projects or experiences where you designed a database, wrote complex queries, or optimized performance.
  5. Terminology:
    • Ensure you’re familiar with all key terms and can define them clearly.
  6. Performance Considerations:
    • Understand how indexing strategies affect query performance.
    • Know basic query optimization techniques.
  7. Be Ready for Problem-Solving:
    • You might be given a scenario and asked to design a database schema or write queries to retrieve specific information.

Example Scenario

Problem: Design a simple database for an online bookstore. Include tables for Books, Authors, and Orders. Ensure that each order can include multiple books and that each book can have multiple authors.

Solution:

  1. Entities and Relationships:

    • Books: Each book has one or more authors.
    • Authors: Each author can write multiple books.
    • Orders: Each order can include multiple books.
  2. Tables:

    • Books:
      • book_id (Primary Key)
      • title
      • isbn
      • published_date
      • price
    • Authors:
      • author_id (Primary Key)
      • name
      • bio
    • Books_Authors (Join Table for Many-to-Many Relationship):
      • book_id (Foreign Key referencing Books)
      • author_id (Foreign Key referencing Authors)
      • Primary Key: (book_id, author_id)
    • Orders:
      • order_id (Primary Key)
      • user_id (Foreign Key referencing Users)
      • order_date
    • Orders_Books (Join Table for Many-to-Many Relationship):
      • order_id (Foreign Key referencing Orders)
      • book_id (Foreign Key referencing Books)
      • quantity
      • Primary Key: (order_id, book_id)
  3. Indexes:

    • Index on Books.title for faster search.
    • Index on Orders.order_date for querying recent orders.
    • Composite index on Books_Authors.author_id for efficient author-based queries.