Computer Architecture
Components
Disk storage, persistent
- Capacity generally measured in gigabytes or terabytes
- 1 byte = 8 bits; all stored information is represented as combinations of bytes and bits
- R/W speeds are measured in milliseconds
Memory/RAM (Random Access Memory), not persistent
- Typically a lot smaller than disk storage, because it is more expensive
- However, it is much faster than disk (by orders of magnitude), and R/W speeds are measured in microseconds
CPU (Central Processing Unit)
- Reads/writes information from disk/memory and executes code that is written somewhere in memory
- All computations are done in the CPU
- CPUs also have a cache, which is a small amount of memory, usually a few megabytes, that is much faster than RAM (by an order of magnitude) and R/W speeds are measured in nanoseconds
- The CPU cache is used to store frequently accessed data from memory
- RAM is only accessed on CPU cache misses
Why distributed systems?
Moore’s Law is breaking down, and it is unfeasible to keep scaling up a single CPU. Instead, we can use multiple CPUs to work together to solve a problem.
Application Architecture
Developer + CI/CD Server
- Local Development: Developers write code and run it locally for building, testing, and debugging.
- Production: Code is built, tested, and deployed in a production environment using a CI/CD server.
Server
- Storage: The server can store data locally or connect to a database server over the network.
- User Interaction:
- A user sends a request to the server (frontend or backend), and the server responds accordingly.
- Logging:
- Server logs can be stored in a dedicated log server for debugging and monitoring.
- Logs are typically time-series data and can generate metrics.
- Alerts: Metrics can trigger an alerts service, which notifies developers when something goes wrong (e.g., when metrics fall below a threshold).
Vertical Scaling
- Definition: Increasing the capacity of a single server by adding more resources (CPU, memory, etc.).
- Use Case: Effective when a single server is under strain due to a high number of user requests.
Horizontal Scaling
- Definition: Replicating the server and distributing tasks among multiple servers.
- Load Balancing:
- A load balancer forwards user requests to different servers using various methods (a small sketch follows this list):
- Round Robin: Sends the request to the next available server in rotation.
- Least Connections: Directs traffic to the server with the fewest active connections.
- IP Hash: Routes the request to a server based on the user’s IP address.
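A minimal Python sketch of two of these strategies, round robin and IP hash (the server pool and client IPs are made up for illustration):

```python
import hashlib
from itertools import cycle

# Hypothetical pool of backend servers.
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round robin: hand out servers in rotation.
round_robin = cycle(SERVERS)

def pick_round_robin() -> str:
    return next(round_robin)

# IP hash: the same client IP always maps to the same server.
def pick_ip_hash(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

if __name__ == "__main__":
    print([pick_round_robin() for _ in range(4)])   # rotates through the pool
    print(pick_ip_hash("203.0.113.7"))              # deterministic per client IP
```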
Database
- Location: The database can either run on the same server as the application or be hosted on a separate server.
Design Requirements
Core System Functions
- Move data: Between machines or within the same machine.
- Store data: In memory, on disk, databases, blob stores, etc.
- Transform data: Process data into meaningful information (e.g., metrics from logs).
Service Level Agreement (SLA)
- Definition: A contract between a service provider and a customer, outlining the expected level of service.
- Purpose: An SLA is backed by measurable Service Level Objectives (SLOs) that the provider commits to meeting. For example, an SLO could define “99.9% availability over a month.”
Key Metrics for System Design
1. Availability
- Formula: Availability = Uptime / (Uptime + Downtime).
- Measured in nines:
- 99% uptime = ~3.65 days of downtime per year.
- 99.999% uptime = ~5.26 minutes of downtime per year.
- Improvement: Vertical scaling (increasing power of existing servers) can improve availability.
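The “nines” figures above follow directly from the availability formula; a quick sketch of the arithmetic:

```python
# Downtime per year implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_per_year(availability_pct: float) -> float:
    """Return allowed downtime in minutes for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_per_year(pct):,.2f} minutes/year")
# 99%     -> ~5,256 minutes (~3.65 days)
# 99.999% -> ~5.26 minutes
```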
2. Reliability
- Definition: The probability that the system will perform without failure.
- Impact of Single Points of Failure (SPOF): A single server introduces a higher failure risk.
- Improvement:
- Add redundant servers to increase reliability.
- Horizontal scaling (adding more servers) also improves reliability by spreading the load.
3. Fault Tolerance
- Definition: The ability of a system to continue functioning if part of it fails.
4. Redundancy
- Definition: Duplication of critical components or systems to improve reliability.
- Best Practice: Place redundant servers in different geographic locations to mitigate regional failures.
5. Throughput
- Definition: The amount of work a system can handle in a given time period (e.g., requests per second).
- Scalability:
- Vertical scaling increases throughput but creates a single point of failure.
- Horizontal scaling improves throughput with added complexity (e.g., load balancers are required).
- Measurement:
- For databases: Queries per second.
- For data pipelines: Bytes per second.
6. Latency
- Definition: The time taken to perform an action or produce a result.
- End-to-end latency: Time from user request to response.
- Causes: Can result from network delays, database processing, etc.
- Optimization:
- Use caching and CDNs to reduce latency.
- Adding more servers can also reduce latency, though the added complexity can make reliability harder to maintain.
Networking Basics
IP Addresses
- Purpose: Every device communicating over the internet needs an IP address, which uniquely identifies it on the network.
- Structure:
- IPv4: 32-bit address, split into four 8-bit segments (max value 255 each), e.g., 192.168.1.1.
- IPv6: Newer, 128-bit address, designed to expand the available pool of addresses.
- Public vs. Private:
- Public IP: Assigned to devices that communicate directly over the internet.
- Private IP: Assigned within a local network by a router.
- Static vs. Dynamic IPs:
- Static: Does not change.
- Dynamic: Changes over time; DNS resolves changing server IPs.
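To make the “four 8-bit segments” concrete, a small sketch that packs a dotted-quad IPv4 address into its underlying 32-bit integer and back:

```python
def ipv4_to_int(addr: str) -> int:
    """Pack a dotted-quad IPv4 address into its 32-bit integer value."""
    a, b, c, d = (int(octet) for octet in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ipv4(value: int) -> str:
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(ipv4_to_int("192.168.1.1"))   # 3232235777
print(int_to_ipv4(3232235777))      # 192.168.1.1
```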
Data Transmission & Packets
- Packet Structure: Data is broken into small chunks (packets) for transmission.
- Metadata: Contains an IP header and sequence number (for TCP) to reassemble packets correctly.
- TCP (Transmission Control Protocol): Ensures data is delivered in the correct order and without errors.
- HTTP: A common application-level protocol built on top of TCP.
Ports
- Purpose: Ports allow communication between specific services or applications on devices (IP addresses).
- A port is a 16-bit value, giving 65,536 possible ports (0–65535).
- Default ports:
- HTTP: 80
- HTTPS: 443
- Localhost: 127.0.0.1 is a reserved IP address for communicating with the local machine.
TCP and UDP
TCP (Transmission Control Protocol)
- Data Transmission: Large data is broken into smaller packets, which are reassembled at the destination.
- Reliability:
- If packets arrive out of order, TCP reorders them.
- TCP guarantees retransmission of undelivered packets, ensuring reliable delivery.
- Connection-Oriented: A connection is established between the sender and
receiver using a 3-way handshake (SYN, SYN-ACK, ACK) before data is sent.
- Trade-off: While reliable, the handshake process adds overhead, making TCP slower.
- Common Use Cases: Protocols like HTTP, HTTPS, FTP, SMTP, and WebSocket rely on TCP for reliable data transmission.
UDP (User Datagram Protocol)
- Connectionless: No need to establish a connection between the client and server before sending data.
- Unreliable:
- Packets may be lost or arrive out of order, and UDP doesn’t retransmit missing packets.
- This lack of overhead makes UDP much faster than TCP.
- Common Use Cases: Ideal for scenarios where speed is crucial and occasional data loss is acceptable, such as live streaming (video/audio), online gaming, and DNS.
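The contrast shows up directly in the socket API: TCP requires connecting before sending, while UDP simply fires datagrams at an address. A minimal sketch using Python's standard socket module (hosts and ports are placeholders):

```python
import socket

# TCP: connection-oriented -- the 3-way handshake happens inside connect().
def tcp_send(host: str, port: int, payload: bytes) -> None:
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)          # delivery and ordering are guaranteed

# UDP: connectionless -- no handshake, no delivery guarantee.
def udp_send(host: str, port: int, payload: bytes) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))   # fire and forget

# Example (placeholder endpoints):
# tcp_send("example.com", 80, b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
# udp_send("127.0.0.1", 9999, b"ping")
```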
Domain Name System
Domain Name System (DNS)
- Purpose: DNS maps domain names (like example.com) to IP addresses, making it easier for users to access websites without remembering numeric IP addresses.
- Management: The DNS address space is managed by ICANN (Internet Corporation for Assigned Names and Numbers).
- Domain Registrars: Platforms like GoDaddy, Namecheap, and Google Domains allow users to purchase domain names.
Types of DNS Records
- A Record: Maps a domain name to an IP address.
- CNAME Record: Maps a domain name to another domain name (used for aliases).
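An A-record lookup is essentially “give me the IP for this name”; Python's standard library exposes this directly (the domain is just an example):

```python
import socket

# Resolve a domain name to one of its A-record IP addresses.
ip = socket.gethostbyname("example.com")
print(ip)  # whatever address the resolver returns

# getaddrinfo returns all records, including IPv6 (AAAA) if available.
for family, _, _, _, sockaddr in socket.getaddrinfo("example.com", 443):
    print(family, sockaddr)
```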
Caching and Security
- DNS Caching: Clients can cache DNS records to speed up lookup times and reduce the need for repeated DNS queries.
- Firewalls: Can block DNS requests to prevent unauthorized access to a network.
Structure of a URL
Example: https://domains.google.com/get-started
- https: Protocol (defines how data is transmitted, e.g., HTTP or HTTPS for secure transmission).
- domains: Subdomain (part of the larger domain, often used to separate services).
- google: Second-level domain (the registered domain name).
- .com: Top-level domain (TLD).
- get-started: Path (specific resource or page on the website).
HTTP
Application Protocols
- Client: Doesn’t always refer to the end user; it could be another server interacting with the system.
- RPC (Remote Procedure Call): A protocol where one program requests services from another program on a different machine, functioning like a function call without the need to understand the underlying network.
- Application Layer Protocols: Many application protocols, such as HTTP, can be seen as specialized RPCs.
HTTP (Hypertext Transfer Protocol)
- Definition: HTTP is a request-response protocol, and the latest version is HTTP/3.
- Statelessness: HTTP does not manage state; every interaction is self-contained within a request and its response.
HTTP Request and Response
- Request Structure:
- Composed of headers and body.
- There are three types of headers: general, request, and response.
- HTTP Methods:
- GET: Retrieve data.
- POST: Submit data (used for creating new resources).
- PUT: Update an existing resource.
- DELETE: Delete a resource.
- PATCH: Partially update a resource.
- HEAD: Retrieve headers without the body.
- OPTIONS: Retrieve supported methods for a resource.
- HTTP Status Codes:
- 1xx: Informational.
- 2xx: Success.
- 200: OK.
- 201: Created (typically used with POST).
- 3xx: Redirection.
- 4xx: Client Error.
- 400: Bad Request.
- 401: Unauthorized.
- 404: Not Found.
- 5xx: Server Error.
- 500: Internal Server Error.
- 502: Bad Gateway.
- 503: Service Unavailable.
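A minimal sketch using Python's standard library that issues a GET request and inspects the status code and headers described above (the URL is just an example):

```python
import urllib.request
import urllib.error

try:
    # GET request; the response carries a status code, headers, and a body.
    with urllib.request.urlopen("https://example.com/") as resp:
        print(resp.status)                       # e.g. 200
        print(resp.headers.get("Content-Type"))  # e.g. text/html; charset=UTF-8
        body = resp.read()
except urllib.error.HTTPError as err:
    # 4xx/5xx responses are raised as HTTPError by urllib.
    print(err.code)  # e.g. 404 or 500
```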
Headers
- Request Headers:
- Method: Specifies the action (GET, POST, PUT, etc.).
- URI: Uniform Resource Identifier.
- Path: Resource location on the server.
- User-Agent: Information about the client making the request.
- Accept: Specifies acceptable media types for the response.
- Response Headers:
- Status Code: Indicates the result of the request (e.g., 200, 404, 500).
- Content-Type: Media type of the response body.
- Content-Length: Size of the response.
- Server: Information about the server handling the request.
Endpoints and Methods
- A route (URL path) can be paired with multiple methods (GET, POST, etc.).
- The combination of a route and a method forms an endpoint.
- GET requests should not be used to send sensitive data, as the data appears in the URL.
- POST requests are used to send data in the body, offering more privacy.
Caching and Idempotency
- GET: Idempotent, meaning repeated requests will have the same effect. It’s also more cacheable.
- POST: Not idempotent; repeated requests could have different effects.
- DELETE: Idempotent, as repeating it won’t affect a previously deleted resource.
SSL/TLS
- SSL (Secure Sockets Layer) and TLS (Transport Layer Security): Cryptographic protocols that ensure secure communication over a network.
- Security Features: These protocols protect against eavesdropping, tampering, and message forgery (e.g., Man-in-the-Middle (MITM) attacks).
WebSockets
Application Protocols Overview
- HTTP (port 80): A request-response protocol where the client initiates requests, and the server responds.
- WebSockets (port 80 for ws, 443 for wss): A full-duplex protocol that allows for real-time, two-way communication between client and server.
- FTP (port 21): A protocol for file transfer between machines over a network.
- SMTP (port 25): A protocol for sending email between servers.
- SSH (port 22): A secure shell protocol for secure remote login and command execution.
- WebRTC: A protocol for real-time communication, using UDP over dynamically negotiated ports to enable peer-to-peer video, audio, and data transfer.
WebSockets
- Problem with HTTP: Traditional HTTP requires polling the server for updates, which is inefficient due to the overhead of TCP and HTTP, especially for real-time updates.
WebSocket Connection
- Persistent Connection: After the initial handshake over HTTP, a WebSocket establishes a persistent, full-duplex connection between the client and the server.
- Server Push: With WebSockets, the server can push data to the client whenever new information is available, rather than waiting for the client to request it.
- Efficiency: WebSockets reduce the overhead seen in HTTP polling, making it ideal for applications needing real-time updates (e.g., live chats, stock tickers).
- Comparison with HTTP/2:
- HTTP/2 introduced streaming, allowing servers to push updates to clients similar to WebSockets.
- WebSockets, however, provide a more flexible full-duplex communication model.
- Connection Issues: A downside of WebSockets is that if the connection is interrupted, the server may still attempt to send data, leading to issues if the client is unaware of the broken connection.
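A sketch of a WebSocket client using the third-party websockets library (assumed installed; the URL is a placeholder), showing the persistent, two-way connection:

```python
import asyncio
import websockets  # third-party: pip install websockets

async def main() -> None:
    # One persistent, full-duplex connection (the HTTP upgrade handshake
    # happens inside connect()).
    async with websockets.connect("wss://example.com/chat") as ws:
        await ws.send("hello")          # client -> server at any time
        while True:
            msg = await ws.recv()       # server can push whenever it likes
            print("received:", msg)

# asyncio.run(main())  # placeholder endpoint, so not run here
```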
API Paradigms
The three most popular API paradigms are REST, GraphQL, and gRPC. Each has unique characteristics and is suited for different use cases.
REST (Representational State Transfer)
- Built on HTTP: REST APIs operate over HTTP and use its methods to perform operations.
- Stateless Server: The server doesn’t maintain any session state. Each request from the client must contain all the necessary information to process it.
- Client-Side State Management: State is handled on the client side, typically using cookies, local storage, or session storage.
- Query Parameters: Used to filter, sort, or paginate data. They are appended to the URL after a ? and separated by &. For example: /users?sort=desc&limit=10.
- Path Parameters: Used to specify a resource within the URL path itself, such as /users/{id}.
- Scalability: REST APIs are scalable horizontally, meaning you can add more servers to handle additional load.
- No Verbs in Endpoints: REST uses HTTP methods (GET, POST, PUT, DELETE, PATCH) to represent CRUD operations, so there’s no need to include verbs in the URL itself.
- Data Format: The most common format is JSON (JavaScript Object Notation), which uses key-value pairs, is human-readable, and easy to work with.
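A sketch of these conventions using Flask (a common Python web framework, 2.x assumed; routes and data are made up), showing path parameters, query parameters, and method-based CRUD without verbs in the URL:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
USERS = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Grace"}}

# Path parameter: /users/<id> identifies one resource.
@app.get("/users/<int:user_id>")
def get_user(user_id):
    user = USERS.get(user_id)
    return (jsonify(user), 200) if user else (jsonify(error="not found"), 404)

# Query parameters: /users?limit=10 filters/paginates the collection.
@app.get("/users")
def list_users():
    limit = int(request.args.get("limit", 10))
    return jsonify(list(USERS.values())[:limit])

# Same route, different method: POST creates a resource (no verb in the URL).
@app.post("/users")
def create_user():
    data = request.get_json()
    new_id = max(USERS) + 1
    USERS[new_id] = {"id": new_id, **data}
    return jsonify(USERS[new_id]), 201
```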
GraphQL
- Built on HTTP: GraphQL is typically served over HTTP, using only POST requests, since GET requests do not support bodies in a standardized way.
- Data Fetching: In GraphQL, the client can specify the exact resources and fields it wants in a request. This prevents overfetching (getting unnecessary data) and underfetching (getting too little data), as often happens with REST.
- Single Endpoint: GraphQL uses a single endpoint for all requests, unlike REST where each resource has its own endpoint. In GraphQL, operations are classified as:
- Query: For fetching data.
- Mutation: For updating or modifying data.
- POST Requests Not Idempotent: Since GraphQL relies on POST requests, actions can produce different results each time they are called.
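To illustrate “ask for exactly the fields you need,” here is a sketch of a GraphQL query sent as a POST body (the endpoint URL and schema are hypothetical):

```python
import json
import urllib.request

# The client names exactly the fields it wants -- nothing more, nothing less.
query = """
query {
  user(id: 42) {
    name
    email
  }
}
"""

payload = json.dumps({"query": query}).encode()
req = urllib.request.Request(
    "https://example.com/graphql",   # hypothetical single endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # {"data": {"user": {"name": ..., "email": ...}}}
```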
gRPC (gRPC Remote Procedure Calls)
- Built on HTTP/2: gRPC leverages the features of HTTP/2, such as multiplexed streams, header compression, and bi-directional communication.
- Not Browser-Compatible: Browsers lack the low-level control over headers that gRPC requires, so they cannot directly use gRPC. A proxy is often needed for browser-based communication with gRPC servers.
- Protocol Buffers: Instead of using JSON, gRPC uses protocol buffers (protobuf), a binary format that is more efficient to serialize and deserialize. This results in smaller payloads and faster communication.
- Bidirectional Streaming: gRPC supports bidirectional streaming, where the client and server can exchange multiple messages in parallel, without waiting for the other party.
- Efficiency: gRPC is faster and more efficient than REST and is commonly used for server-to-server communication.
- Debugging: While gRPC is powerful, REST APIs are easier to debug and test. REST provides status codes (e.g., 404, 500) that make it straightforward to understand what went wrong. In contrast, with gRPC, you must define your own status messages.
API Design
In API design, the focus is on creating a clear and consistent interface for clients, rather than the internal details of the implementation, such as the programming language or framework used.
Key Considerations for API Design
1. Interface-Centric Design
- Interface First: The primary concern is the interface the API exposes. This includes endpoints, request methods, parameters, and response formats.
- Implementation Details: The backend technologies (programming language, framework, database) are irrelevant to clients, as long as the interface remains consistent and functional.
2. Backward Compatibility
- Public-Facing APIs: For APIs exposed to the public, it’s crucial to maintain backward compatibility. Once an endpoint is live, it cannot be easily changed or deprecated without affecting existing users.
- Breaking Changes: If a breaking change is necessary, a common practice is to implement API versioning.
3. API Versioning
- Versioning: Introduce versioning by adding the version number as the first path parameter in the endpoint. For example, /v1/users could become /v2/users for a newer version with different behavior or structure.
- Gradual Transition: Allow clients to continue using older versions while migrating to the newer version to avoid disrupting service.
4. Pagination
- Purpose: Pagination is essential when dealing with large datasets, ensuring efficient handling of data without overwhelming clients or servers.
- Limit and Offset: A simple form of pagination involves using limit (number of items to return) and offset (starting point). For example, /items?limit=10&offset=20.
- Cursor-Based Pagination: Alternatively, cursor-based pagination uses a pointer (cursor) to mark the current position in the dataset, which can be more efficient for certain use cases.
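A sketch of both pagination styles over an in-memory dataset (the data and the cursor encoding are made up for illustration):

```python
ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 101)]

# Limit/offset: skip `offset` items, then return the next `limit`.
def page_by_offset(limit, offset):
    return ITEMS[offset:offset + limit]

# Cursor-based: the cursor records where the previous page ended (here, the
# last id returned), so the next page starts right after it.
def page_by_cursor(limit, cursor=None):
    start = 0
    if cursor is not None:
        start = next((i for i, item in enumerate(ITEMS) if item["id"] > cursor),
                     len(ITEMS))
    page = ITEMS[start:start + limit]
    next_cursor = page[-1]["id"] if page else cursor
    return page, next_cursor

print(page_by_offset(limit=10, offset=20)[0])  # first item of the third page
page, cursor = page_by_cursor(limit=10)
print(cursor)                                  # 10 -- passed back as ?cursor=10
```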
Caching
Caching Overview
- Purpose: Reduces latency and increases throughput by speeding up read/write operations.
- Process:
- Browser checks cache before making network requests.
- If data exists in the cache (cache hit), it is used; if not, a network request is made (cache miss).
- Key Term: Cache hit ratio = cache hits / (cache hits + cache misses).
- Server-side caching: In-memory caches (e.g., Redis) store frequently accessed data.
HTTP Caching
- Cache-Control header: Defines caching policies for requests and responses.
- Cached content: Must be static (i.e., doesn’t change often).
Caching Algorithms
Write-around
- Mechanism: Data is written directly to storage, bypassing the cache.
- Advantage: Reduces cache flooding with write operations that are not frequently read.
Write-through
- Mechanism: Data is written to both the cache and storage simultaneously.
- Advantage: Ensures cache always has the latest data.
Write-back
- Mechanism: Data is written to the cache first and acknowledged. Later, it is written to storage in the background.
- Advantage: Faster write acknowledgment, but potential risk if the cache fails before data is written to storage.
Cache Eviction Policies
Least Recently Used (LRU)
- Process: Evicts the least recently used items first.
- Implementation: Recently accessed items are moved to the front; evict items from the end.
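A minimal sketch of an LRU cache using Python's OrderedDict (the capacity and keys are arbitrary):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                       # cache miss
        self.data.move_to_end(key)            # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)     # evict the oldest (LRU) entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now most recently used
cache.put("c", 3)        # evicts "b"
print(list(cache.data))  # ['a', 'c']
```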
First In, First Out (FIFO)
- Process: Evicts the oldest items first.
- Drawback: Not ideal for cache, as older items might still be frequently accessed.
Least Frequently Used (LFU)
- Process: Evicts items based on how infrequently they are accessed.
- Implementation: Tracks access count for each item using key-value pairs.
Content Delivery Network (CDN)
- Function: Distributes data across global data centers to improve proximity to users.
- Static Files: Only static content (e.g., images, videos, CSS, JavaScript) can be placed on a CDN.
- Benefits: Increases reliability and speeds up content delivery.
CDN Types
- Push CDN: Content is manually pushed to the CDN after updates, propagating changes across all CDN nodes.
- Pull CDN: The CDN pulls content from the server when a user requests it, caching it locally for future requests without propagating across all CDNs.
Caching with CDN
- Cache-Control: The public directive in the Cache-Control header allows CDN servers to cache content.
Proxies
Forward Proxy
- Function: Acts as an intermediary between clients and servers, hiding the true origin of the request from the server.
Reverse Proxy
- Function: Hides the server from the client. The client interacts with the reverse proxy without knowledge of the actual server.
- Examples:
- CDNs: Act as reverse proxies by abstracting the origin server from the client.
- Load Balancers: Function as reverse proxies by distributing requests to a pool of backend servers.
Load Balancers
- Purpose: Distribute incoming traffic across multiple horizontally scaled backend servers.
- Stateless Servers: Servers behind load balancers might be stateless, often using REST.
Load Balancer Algorithms
- Round Robin: Distributes requests sequentially in a loop.
- Weighted Round Robin: Assigns requests based on the weight (capacity) of each server.
- Least Connections: Sends requests to the server with the fewest active connections.
- Locality-based: Routes requests based on the client’s location.
- IP Hash: Routes requests by hashing the client’s IP address.
Types of Load Balancers
- Layer 4 Load Balancer: Operates at the network and transport layers (IP, TCP, UDP). It’s faster but less flexible.
- Layer 7 Load Balancer: Operates at the application layer (HTTP, HTTPS, SMTP). More expensive due to multiple connections to backend servers but provides greater flexibility.
Load Balancer Redundancy
- Single Point of Failure: Because all traffic passes through it, a load balancer can itself become a critical failure point.
- Solution: Use backup load balancers to take over in case the primary load balancer fails.
Consistent Hashing
- Definition: Maps user requests to servers by hashing the user’s IP address.
- Purpose: Ensures that a user’s requests are consistently routed to the same server.
- Stateless Servers (e.g., REST): Consistent hashing is less relevant because any server can handle the request.
- Stateful Servers: Important when servers have their own in-memory caches, ensuring data continuity for users.
Server Removal and Hashing
- Server Removal: When a server is removed, its users are routed to the next server in the ring.
- Hashing Function Impact: Changes in the hashing function could result in many users being routed to different servers.
- Intelligent Hashing: Choosing a smart hashing function minimizes the number of users rerouted to different servers during changes.
Use Cases
- Applications: Consistent hashing is useful in scenarios like CDNs and distributed databases.
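A sketch of a consistent hash ring (server names and the choice of MD5 are arbitrary): each server hashes onto a ring, a key is routed to the first server clockwise from its own hash, and removing a server only remaps that server's keys.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers):
        # Place each server at a point on the ring (its hash value).
        self.ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        # First server clockwise from the key's position (wrap around at the end).
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

    def remove(self, server):
        self.ring = [(h, s) for h, s in self.ring if s != server]

ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.lookup("203.0.113.7"))   # always the same server for this key
ring.remove("server-b")             # only keys that lived on server-b move
print(ring.lookup("203.0.113.7"))
```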
Structured Query Language (SQL)
- Relational Databases: Use disk for persistent data storage.
- B+ Tree Structure:
- Data structure used by relational databases, where each node can have up to m children and holds up to m-1 keys.
- Minimizes tree height, reducing read operations by increasing the number of children per node.
- Data is stored in the leaf nodes, which are connected in a linked-list fashion.
- B+ trees allow for efficient data sorting.
SQL
- Tables and Schema: SQL databases are structured around tables, each defined by a schema, which specifies the structure of data.
- Records and Primary Keys: Every row (record) in the table must satisfy the schema and have a unique identifier (primary key).
- Joins: Data from two or more tables can be combined (joined) to retrieve related information.
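A sketch of tables, primary keys, and a join using Python's built-in sqlite3 module (the schema and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
    INSERT INTO users  VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.50), (11, 1, 12.00), (12, 2, 8.25);
""")

-- comment removed: joining the two tables combines related information
rows = conn.execute("""
    SELECT users.name, SUM(orders.total)
    FROM users JOIN orders ON orders.user_id = users.id
    GROUP BY users.id
""").fetchall()
print(rows)  # [('Ada', 111.5), ('Grace', 8.25)]
```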
ACID Properties
- Atomicity: Each transaction is all or nothing—either the entire transaction succeeds or none of it does.
- Consistency: Ensures the database remains in a valid state according to defined rules.
- Isolation: Transactions are kept independent to avoid side effects like dirty reads or phantom reads.
- Durability: Changes from committed transactions persist, even in the event of system failure.
NoSQL
- Purpose: Non-relational databases designed to overcome limitations of SQL databases, offering better scalability.
- Scalability:
- Vertical scaling (increasing the capacity of a single server) is the simplest approach.
- Horizontal scaling (distributing data across multiple databases) is also possible without foreign key constraints.
- Replication and Consistency: Nodes can be replicated, ensuring eventual consistency—data will eventually synchronize across all nodes. A leader node ensures write propagation to followers.
Key-Value Stores
- Examples: Redis, Memcached.
- Use Case: Often used as a caching layer in front of SQL databases, storing data as key-value pairs.
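A sketch of the cache-in-front-of-the-database pattern with the redis-py client (assumes the library is installed and a Redis server is running locally; the key format, TTL, and load_user_from_db helper are hypothetical):

```python
import json
import redis  # third-party: pip install redis

r = redis.Redis(host="localhost", port=6379)

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)                    # try the cache first
    if cached is not None:
        return json.loads(cached)          # cache hit
    user = load_user_from_db(user_id)      # cache miss: hypothetical DB call
    r.set(key, json.dumps(user), ex=300)   # cache the result for 5 minutes
    return user
```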
Document Stores
- Definition: Store documents (JSON objects) with flexible structures.
- Primary Key: Each document has a unique identifier.
- Schema: No schema is required, providing flexibility in data structure.
- Popular Example: MongoDB, a widely-used open-source document store.
Wide-Column Stores
- Use Case: Designed for write-heavy workloads such as time-series data, where existing records rarely need to be updated.
- Schema: Schema is optional.
- Examples: Cassandra, Google BigTable.
Graph Databases
- Use Case: Ideal for data with complex relationships, like social networks.
- Advantage: Graph databases use directed graphs to model relationships, making it easier to query relationship-based data without the need for extensive joins (as required in SQL databases).
Replication
- Challenge: As traffic to a database increases, both reads and writes can slow down.
Leader-Follower Replication (Master-Slave Replication)
- Setup: A single leader database replicates its data to multiple followers.
- Asynchronous Replication:
- The leader replicates data to followers eventually, not immediately.
- Followers may serve stale data to clients for a brief period.
- Synchronous Replication:
- The leader waits for followers to replicate data before proceeding.
- Guarantees up-to-date data across all followers, but increases latency, especially if replicas are distributed globally.
- Improves reliability, as a follower can be promoted to leader if the current leader fails.
Multi-Leader Replication
- Setup: Multiple leaders allow both reads and writes across several replicas.
- Scaling: Not only do reads scale, but writes also scale across the replicas.
- Synchronization: Leaders do not need to be synchronized frequently, which adds flexibility.
Sharding
- Definition: Dividing the database into smaller pieces (shards), each hosted on a different machine, to scale reads and writes.
- Range-based Sharding: Data is partitioned based on a range of values (e.g., user IDs).
- Downside: Running joins across shards can be slow.
- Hash-based Sharding: Data is distributed based on a hash of the key, ensuring even distribution across shards.
- SQL Databases: For relational databases like MySQL and Postgres, sharding logic must be implemented at the application level.
- NoSQL Databases: Many NoSQL databases have built-in sharding due to the lack of foreign key constraints and non-relational data structure.
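A sketch of application-level hash-based sharding (the shard count and key choice are arbitrary): hash the key, then take it modulo the number of shards.

```python
import hashlib

NUM_SHARDS = 4  # e.g. four separate database instances

def shard_for(key):
    """Map a key (e.g. a user ID) to a shard deterministically."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user_id in ("user-1", "user-2", "user-3"):
    print(user_id, "-> shard", shard_for(user_id))

# Note: naive modulo sharding remaps most keys when NUM_SHARDS changes;
# consistent hashing (sketched earlier) avoids that.
```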
CAP Theorem
- Consistency: Ensures every read returns the most up-to-date data. During network partitioning, data from the leader won’t be written to followers. Achieving consistency may require sacrificing availability by making a node temporarily unavailable until the partition is resolved.
- Availability: Prioritizes availability by allowing access to data even if it’s stale. This means every node can be accessed unless it crashes or is unreachable.
- Partition Tolerance: The system remains operational even when there’s a network partition or communication failure between nodes.
- Trade-offs: In distributed storage systems, you can only guarantee two of the three properties (Consistency, Availability, Partition Tolerance) at any given time. Partition tolerance is essential, leaving the choice between consistency and availability.
- PACELC: An extension of the CAP theorem, which states that during a network Partition (P), you must choose between Availability (A) and Consistency (C). Else (E), when there is no partition, you balance Latency (L) versus Consistency (C).
Further reading: “Please stop calling databases CP or AP” (Martin Kleppmann).
Object Storage
- Data Structure: Data is stored in a flat manner, unlike the hierarchical structure of file systems and databases.
- Popular Services: Examples include AWS S3, Google Cloud Storage, and Azure Blob Storage.
- Flat Storage Model: Although it may appear like file storage, object storage stores data in a flat structure.
- Objects are fetched directly using a name, similar to how keys are accessed in a hash map, without the need to search through a file tree.
- Binary Large Objects (BLOBs): Object storage evolved from BLOBs, which were typically used for storing large, unstructured data like images and videos.
- These objects cannot be updated or edited once stored, emphasizing fast retrieval over modification.
- Global Uniqueness: Object names must be globally unique (e.g., bucket names in AWS S3) to avoid duplication.
- Interface: Interactions with object storage are done via HTTP, simplifying access by making requests over the network rather than querying with SQL.
Message Queues
- Purpose: Used to handle large numbers of events asynchronously by queuing them for later processing. This decouples the sender of the event from the processor of the event, allowing for more scalable and flexible systems.
- Event Flow: Events are sent to the message queue, where they are stored until the application server processes them.
- Durability: Message queues are durable, meaning that even if the queue goes down, queued events are not lost.
- Processing:
- Events can be processed in different orders, including FIFO (First In, First Out).
- The app server can either pull messages from the queue periodically, or messages can be pushed to the server.
- Message Acknowledgment: Once the app server processes a message, it sends an acknowledgment (ACK) to the queue. Upon acknowledgment, the message is removed from the queue to prevent message loss.
- Examples: Popular message queue services include:
- RabbitMQ
- Apache Kafka
- GCP Pub/Sub
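A sketch of the produce/consume/acknowledge flow using an in-process queue (the real systems listed above are networked and durable; this only illustrates the control flow):

```python
import queue
import threading

events = queue.Queue()

def producer():
    for i in range(3):
        events.put({"event_id": i, "type": "payment"})   # enqueue and move on

def consumer():
    while True:
        msg = events.get()          # pull a message when ready
        print("processing", msg)
        events.task_done()          # "ACK": only now is it considered handled

threading.Thread(target=consumer, daemon=True).start()
producer()
events.join()   # blocks until every message has been acknowledged
```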
Pub/Sub Systems
- Publish/Subscribe Model:
- In pub/sub systems, event producers are called publishers, and event consumers are subscribers.
- Publishers send events to the message queue, which acts as an intermediary.
- Topics and Subscriptions:
- Events in the message queue are organized into topics based on event types (e.g., payment data, analytics data).
- Subscriptions are created for topics, allowing multiple subscribers to receive events from a single topic.
- A single topic can have multiple subscriptions, enabling fan-out, where multiple subscribers can listen to the same topic.
- Flexibility:
- A single app server can subscribe to multiple topics.
- This decoupling allows subscribers to scale or change without affecting the publishers.
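A sketch of topics, subscriptions, and fan-out (the topic names and handlers are invented): publishing to one topic delivers the event to every subscription on that topic.

```python
from collections import defaultdict

# topic -> list of subscriber callbacks (each represents a subscription)
subscriptions = defaultdict(list)

def subscribe(topic, handler):
    subscriptions[topic].append(handler)

def publish(topic, event):
    # Fan-out: every subscription on the topic receives the event.
    for handler in subscriptions[topic]:
        handler(event)

subscribe("payments", lambda e: print("billing service got", e))
subscribe("payments", lambda e: print("analytics service got", e))
subscribe("analytics", lambda e: print("dashboard got", e))

publish("payments", {"order_id": 42, "amount": 9.99})
# Both payment subscribers receive the event; the analytics topic is untouched.
```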