A data structure is a logical way that data is organized
Uses:
- Decision making
- Numerical and statistical analysis
- Blockchain

RAM

Each integer takes 4 bytes. Each character takes 1 byte.
Each byte is 8 bits
Array store values at contiguous memory locations

Arrays

Static arrays

Contiguous block of data
Index starts at 0
Static arrays are fixed sized arrays (not really a concern of Python or JavaScript)
We delete values by overwriting, such as with a 0
Static array operations: - Reading /writing at i-th position: $O(1)$ - Insert / remove at end: $O(1)$ - Insert / remove middle: $O(n)$

Dynamic arrays

Pushing / popping at the end: $O(1)$
After dynamic array increases in size and takes new spot in memory, old array is deallocated. The amoritized time complexity of pushing to a dynamic array is $O(1)$ . because we are going to be runnig out of space infrequently (on average).
Time complexities same as static arrays.

Stacks

Linear data structure, LIFO (Last In First Out)
Uses:
- Temporary storage during recursive operations
- Redo and undo operations in doc editors
- Reversing a string
- Parenthesis matching
- Postfix to Infix expressions
- Function calls order
Stack methods:
- push
- pop
- top
- isEmpty
- size

Queue

Linear data structure, items are added to the rear end. Items are removed from the front of the queue.
Uses:
- Breadth-first search algorithm in graphs
- Operating systems: job scheduling, disk scheduling, CPU scheduling
Methods:
- enqueue
- dequeue
- isEmpty
- rear
- front
- size

Doubly ended queue (deque)

Deques allow items to be inserted at either end
Types of deques:
- Input restricted deques: insertion operations are performed at only one end while deletion is performed at both ends
- Output restricted deque: deletion operations are performed at only one end while insertion is performed at both ends in the output restricted queue
Methods:
- insertFront(): This adds an element to the front of the Deque.
- insertLast(): This adds an element to the rear of the Deque.
- deleteFront(): This deletes an element from the front of the Deque.
- deleteLast():This deletes an element from the front of the Deque.
- getFront(): This gets an element from the front of the Deque.
- getRear(): This gets an element from the rear of the Deque.
- isEmpty(): This checks whether Deque is empty or not.
- isFull(): This checks whether Deque is full or not.

Priority queue

Each element is assigned a priority value
The order in which elements in a priority queue are served is determined by their priority
Uses:
- Used in graph algorithms like Dijkstra, Prim’s Minimum spanning tree etc.
- Huffman code for data compression
- Finding Kth Largest/Smallest element
Different implementations

Operations	peek	insert	delete
Linked List	O(1)	O(n)	O(1)
Binary Heap	O(1)	O(log n)	O(log n)
Binary Search Tree	O(1)	O(log n)	O(log n)

Difference between stack and queue

Stack	Queue
Stack is a linear data structure where data is added and removed from the top.	Queue is a linear data structure where data is ended at the rear end and removed from the front.
Stack is based on LIFO (Last In First Out) principle	Queue is based on FIFO (First In First Out) principle
Insertion operation in Stack is known as push.	Insertion operation in Queue is known as enqueue.
Delete operation in Stack is known as pop.	Delete operation in Queue is known as dequeue.
Only one pointer is available for both addition and deletion: top()	Two pointers are available for addition and deletion: front() and rear()
Used in solving recursion problems	Used in solving sequential processing problems

Arrays

One-dimensional array: stores elements in contiguous memory locations, accessing them using a single index value. It is a linear data structure holding all the elements in a sequence
Two-dimensional array: tabular array that includes rows adn columns and stores data. An $m\times n$ two-dimensional array is created by grouping $m$ rows and $n$ columns.

Linked List

A series of linked nodes (or items) that are connected by links. Each link represents an entry into the linekd list, and each entry points to the next node in the sequence. The order in which nodes are added to the list is determined by the order in which they are created.
Uses
- Stack, queue, binary trees, and graphs are implemented using linked lists
- Dynamic management for oeprating system memory
- Round robin scheduling for operating system tasks
- Forward and backward operation in the browser

Difference between array and linked list

Arrays	Linked Lists
An array is a collection of data elements of the same type.	A linked list is a collection of entities known as nodes. The node is divided into two sections: data and address.
It keeps the data elements in a single memory.	It stores elements at random, or anywhere in the memory.
The memory size of an array is fixed and cannot be changed during runtime.	The memory size of a linked list is allocated during runtime.
An array's elements are not dependent on one another.	Linked List elements are dependent on one another.
It is easier and faster to access an element in an array.	In the linked list, it takes time to access an element.
Memory utilization is ineffective in the case of an array.	Memory utilization is effective in the case of linked lists.
Operations like insertion and deletion take longer time in an array.	Operations like insertion and deletion are faster in the linked list.

Hash Sets / Maps

Hash maps don’t maintain orders
Cannot contain duplicates, just like a BST or TreeMap
Average time complexities
- Insert: $O(1)$
- Remove: $O(1)$
- Search: $O(1)$
A hashtable is a collection of key value pairs
Each key value pair is known as an entry
Each storage location is a bucket
Common resolution for a collision in a linked list: turn each bucket into a linked list
Methods get and put have $O(1)$ complexity assuming the hash function used in the hash map distributes elements uniformly among the buckets
The key or value object that gets used in the hashmap must implement these methods:
- equals() used when trying to retrieve a value from the map
- hashcode() method is used when inserting the key object into the map
When hashmap is half full, its size is doubled. When it is doubled, it needs to be rehashed (elements need to be placed into correct positions)

Hash map implementation

names = ["alice", "brad", "collin", "brad", "dylan", "kim"]
 
countMap = {}
 
for name in names:
     if name not in countMap:
          countMap[name] = 1;
     else:
          countMap[name] += 1;

const names = ["alice", "brad", "collin", "brad", "dylan", "kim"];
const countMap = new Map();
 
for (let i = 0; i < names.length; i++) {
  // If countMap does not contain name
  if (!countMap.has(names[i])) {
    countMap.set(names[i], 1);
  } else {
    countMap.set(names[i], countMap.get(names[i]) + 1);
  }
}

Binary trees

Binary search tree

All elements in the left subtree of a node should have a value less than or equal to the parent node’s value
All elements in the right subtree of a node should have a value greater than or equal to the parent node’s value
Insertion in balanced BST takes $O(logn)$ time

Depth First Search

class TreeNode {
  constructor(val) {
    this.val = val;
    this.left = null;
    this.right = null;
  }
}
 
/*
1. travel to the left most node
2. process it
3. check if it has any right subtrees
4. if yes, process that
5. then pop back up one level, process it, then process its subtree
 
time complexity: O(n), same as sorted array
*/
 
// processes in ascending order
function inorder(root) {
  if (root == null) {
    return;
  }
  inorder(root.left);
  console.log(root.val);
  inorder(root.right);
}
 
// process a node before moving on to its children
function preorder(root) {
  if (root == null) {
    return;
  }
  console.log(root.val);
  preorder(root.left);
  preorder(root.right);
}
 
// process the left and right subtree, and then process root node
function postorder(root) {
  if (root == null) {
    return;
  }
  postorder(root.left);
  postorder(root.right);
  console.log(root.val);
}

Breadth first search

class TreeNode {
  constructor(val) {
    this.val = val;
    this.left = null;
    this.right = null;
  }
}
 
function bfs(root) {
  let queue = [];
  if (root != null) {
    queue.push(root);
  }
  let level = 0;
  while (queue.length > 0) {
    console.log("level " + level + ": ");
    let levelLength = queue.length;
    for (let i = 0; i < levelLength; i++) {
      let curr = queue.shift();
      console.log(curr.val + " ");
      if (curr.left != null) {
        queue.push(curr.left);
      }
      if (curr.right != null) {
        queue.push(curr.right);
      }
    }
    level++;
    console.log();
  }
}

Tree traversal

Inorder Traversal => Left, Root, Right : [4, 2, 5, 1, 3]
Preorder Traversal => Root, Left, Right : [1, 2, 4, 5, 3]
Postorder Traversal => Left, Right, Root : [4, 5, 2, 3, 1]

Inorder traversal

In binary search trees, inorder traversal gives nodes in ascending order
Algorithm:
- Traverse the left subtree
- Visit the root.
- Traverse the right subtree

Preorder traversal

Preorder traversal is commonly used to create a copy of the tree
It is also used to get prefix expression of an expression tree
Algorithm:
- Visit the root
- Traverse the left subtree
- Traverse the right subtree

Postorder traversal

Postorder traversal is commonly used to delete the tree
It is also used to get the postfix expression of an expression tree
Algorithm:
- Traverse the left subtree
- Traverse the right subtree
- Visit the root

Graph

Non-linear data structure made up of nodes and edges
The nodes are also known as vertices, and edges are lines or arcs that connect two nodes
Two most common graph representations:
- Adjacency matrix: two-dimensional array with the dimensions $V\times V$ , where $V$ is the number of vertices in a graph. It takes $O(1)$ time to remove an edge. Queries such as whether there is an edge from vertex $u$ to vertex $v$ are efficient and can be done in $O(1). However, even if the graph is sparse (has fewer edges), it takes the same amount of space. Adding a vertex takes$ O(V^2) $time and computing all of a vertex's neighbors takes$ O(V)$ time.
- Adjacency list: each node holds a list of nodes that are directly connected to that vertex. In the worst case, a graph can have $C(V,2)$ edges, consuming $O(V^2)$ space. It is simpler to add a vertex, and it takes the least amount of time to compute all of a vertex’s neighbors. In the worst case, checking adjacency takes $O(V)$ time in the worst case.

Breadth First Search (BFS) and Depth First Search (DFS)

Breadth First Search (BFS)	Depth First Search (DFS)
It stands for “Breadth First Search”	It stands for “Depth First Search”
BFS (Breadth First Search) finds the shortest path using the Queue data structure.	DFS (Depth First Search) finds the shortest path using the Stack data structure.
We walk through all nodes on the same level before passing to the next level in BFS.	DFS begins at the root node and proceeds as far as possible through the nodes until we reach the node with no unvisited nearby nodes.
When compared to DFS, BFS is slower.	When compared to BFS, DFS is faster.
BFS performs better when the target is close to the source.	DFS performs better when the target is far from the source.
BFS necessitates more memory.	DFS necessitates less memory.
Nodes that have been traversed multiple times are removed from the queue.	When there are no more nodes to visit, the visited nodes are added to the stack and then removed.
Backtracking is not an option in BFS.	The DFS algorithm is a recursive algorithm that employs the concept of backtracking.
It is based on the FIFO principle (First In First Out).	It is based on the LIFO principle (Last In First Out).

AVL Tree

Height balancing binary search trees
Compares the heights of the left and right subtrees and ensures that the difference is less than one. This is called the balance factor: $balanceFactor = height(leftSubtree) − height(rightSubtree)$
Methods: same as binary search tree, but rotations are performed to balance the tree
- Insertion
- Deletion
Four types of rotations:
- Left rotation: When a node is inserted into the right subtree of the right subtree and the tree becomes unbalanced, we perform a single left rotation.
- Right rotation: If a node is inserted in the left subtree of the left subtree, the AVL tree may become unbalanced. The tree then requires right rotation.
- Left-Right rotation: The RR rotation is performed first on the subtree, followed by the LL rotation on the entire tree.
- Right-Left rotation: The LL rotation is performed first on the subtree, followed by the RR rotation on the entire tree.
Uses:
- AVL trees are typically used for in-memory sets and dictionaries.
- AVL trees are also widely used in database applications where there are fewer insertions and deletions but frequent data lookups are required.
- Apart from database applications, it is used in applications that require improved searching.

B-Tree

B-Tree is a type of m-way tree that is commonly used for disc access.
A B-Tree with order m can onlu have m-1 keys and m children.
One of the primary reasons for using a B-Tree is its ability to store a large number of keys in a single node as well as large key values while keeping the tree’s height relatively small
Properties:
- All of the leaves are at the same height.
- The term minimum degree ‘t’ describes a B-Tree. The value of t is determined by the size of the disc block.
- Except for root, every node must have at least t-1 keys. The root must contain at least one key.
- All nodes (including root) can have no more than 2*t - 1 keys.
- The number of children of a node is equal to its key count plus one.
- A node’s keys are sorted in ascending order. The child of two keys k1 and k2 contains all keys between k1 and k2.
- In contrast to Binary Search Tree, B-Tree grows and shrinks from the root.
Uses:
- It is used to access data stored on discs in large databases.
- Using a B tree, you can search for data in a data set in significantly less time.
- The indexing feature allows for multilevel indexing.
- The B-tree approach is also used by the majority of servers.

Red-black tree

Self-balancing binary tree
Each node is either red or black. By comparing the node colors on any simple path from the root to a leaf, red-black trees ensure that no path is more than twice as long as any other, ensuring that the tree is generally balanced
Red-black trees are similar binary trees in that they both store their data in two’s complementary binary formats
Red-black trees are faster to access than binary trees
Properties:
- Every node is either red or black.
- The tree’s root is always black.
- There are no two red nodes that are adjacent.
- There is the same number of black nodes on every path from a node to any of its descendant’s NULL nodes.
- All of the leaf nodes are black.

Tree Maze

Backtracking

Example problem: Determine if a path exists from the root of the tree to a leaf node. It may not contain any zeros

The idea is to check every possibility, and backtrack if the solution fails
If the solution fails, we return false
That’s how the root node would know that the solution failed
If the recursion returns true, we found a solution

function canReachLeaf(root) {
  if (root == null || root.val == 0) {
    return false;
  }
  if (root.left == null && root.right == null) {
    return true;
  }
  if (canReachLeaf(root.left)) {
    return true;
  }
  if (canReachLeaf(root.right)) {
    return true;
  }
  return false;
}

Modified problem: return the path as well

function leafPath(root, path) {
  if (root == null || root.val == 0) {
    return false;
  }
  path.push(root.val);
  if (root.left == null && root.right == null) {
    return true;
  }
  if (leafPath(root.left, path)) {
    return true;
  }
  if (leafPath(root.right, path)) {
    return true;
  }
  path.pop();
  return false;
}

Heap / Priority Queue

Heap is implemented with a complete binary tree, could be a minheap or a maxheap
Adding and popping takes $O(logn)$ time
Distinction: Binary tree, NOT a binary search tree
Order property:
- In min heap, root is smaller than every single descendant
- This is true recursively
- Reverse is true for max heap
- Tree is implemented using arrays, index starting at 1
- leftChild: 2*i
- rightChild: 2*i + 1
- parent: i/2 (round down when using a right child)
Structure property:
- All levels are completely filled except possibly the last level
- The last level has all elements as left as possible
Max heap
- In a Max-Heap the data element present at the root node must be the greatest among all the data elements present in the tree.
- This property should be recursively true for all sub-trees of that binary tree.
Min heap
- In a Min-Heap the data element present at the root node must be the smallest (or minimum) among all the data elements present in the tree.
- This property should be recursively true for all sub-trees of that binary tree.

Push / Pop

Push

To push, insert at the end of the array
Swap with parent until order property is satisfied
Time complexity: height of the tree, $O(logn)$ , where n is the number of nodes

Pop

To remove the root, replace it with the last value of the tree (array)
Swap with child until order property is satisfied
Algorithm

// If size == 1, heap is empty, return null
// If size == 2, heap just has the root value, pop and return
// Assign the root, result = heap[1], to a temp variable
// Assign last value in array to root
// Percolate down while we at least have a left child
// Case 1: Two children
// Case 2: Just a left child
// Case 3: No children

Heapify

Process of building a heap
$O(n)$ algorithm

Graphs

Non-linear data structure made up of nodes (vertices) and edges (pointers)
$E <= V^2$
Node can be pointing to itself
Types of graphs:
- Directed graph: edges have a direction
- Undirected graph: edges do not have a direction
- Weighted graph: edges have a weight
- Unweighted graph: edges do not have a weight
Three kinds:
- Matrix (common)
  - Typically a square matrix
  - 0 or 1 values represent if a path exists
  - The 2D array is the graph space itself
- Adjacency matrix (less common)
  - Typically a square matrix
  - Each index represents an edge between a node / vertex
  - ajacencyMatrix[i][j] = 1 if there is an edge from i to j
  - ajacencyMatrix[j][i] = 1 if there is an edge from i to j
  - Space complexity: $O(V^2)$
- Adjacency list (most common)
  - Neighbors list of each node contains a list of all the nodes it is connected to / pointing to

Graph algorithms

Matrix DFS

Time complexity: $O(4^(n*m))$ Memory complexity: $O(n*m)$

// Count paths (backtracking)
function dfs(grid, r, c, visit) {
    let ROWS = grid.length, COLS = grid[0].length;
 
    if (Math.min(r, c) < 0 || r == ROWS || c == COLS ||
        visit[r][c] == 1 || grid[r][c] == 1) {
        return 0;
    }
    if (r == ROWS - 1 && c == COLS - 1) {
        return 1;
    }
    visit[r][c] = 1;
 
    let count = 0;
    count += dfs(grid, r + 1, c, visit);
    count += dfs(grid, r - 1, c, visit);
    count += dfs(grid, r, c + 1, visit);
    count += dfs(grid, r, c - 1, visit);
 
    visit[r][c] = 0;
    return count;
}
 
print dfs((grid, 0, 0, [][]));

Matrix BFS

Time and memory complexity: $O(n*m)$

// Matrix (2D Grid)
let grid = [
  [0, 0, 0, 0],
  [1, 1, 0, 0],
  [0, 0, 0, 1],
  [0, 1, 0, 0],
];
 
// Shortest path from top left to bottom right
function bfs(grid) {
  let ROWS = grid.length;
  let COLS = grid[0].length;
  let visit = new Array(4).fill(0).map(() => Array(4).fill(0)); // 4x4 2d array
  let queue = new Array();
 
  queue.push(new Array(2).fill(0)); // Add {0, 0}
  visit[0][0] = 1;
 
  let length = 0;
  while (queue.length > 0) {
    let queueLength = queue.length;
    for (let i = 0; i < queueLength; i++) {
      let pair = queue.shift();
      let r = pair[0],
        c = pair[1];
 
      if (r == ROWS - 1 && c == COLS - 1) {
        return length;
      }
      // We can directly build the four neighbors
      let neighbors = [
        [r, c + 1],
        [r, c - 1],
        [r + 1, c],
        [r - 1, c],
      ];
      for (let j = 0; j < 4; j++) {
        let newR = neighbors[j][0],
          newC = neighbors[j][1];
        if (
          Math.min(newR, newC) < 0 ||
          newR == ROWS ||
          newC == COLS ||
          visit[newR][newC] == 1 ||
          grid[newR][newC] == 1
        ) {
          continue;
        }
        queue.push(neighbors[j]);
        visit[newR][newC] = 1;
      }
    }
    length++;
  }
  return length; // This should never be called
}

Adjacency List

// GraphNode used for adjacency list
class GraphNode {
  constructor(val) {
    this.val = val;
    this.neighbors = new Array();
  }
}
 
function buildAdjList() {
  let adjList = new Map();
  let edges = [
    ["A", "B"],
    ["B", "C"],
    ["B", "E"],
    ["C", "E"],
    ["E", "D"],
  ];
  adjList.set("A", new Array());
  adjList.set("B", new Array());
 
  for (let edge of edges) {
    let src = edge[0],
      dst = edge[1];
    if (!adjList.has(src)) {
      adjList.set(src, new Array());
    }
    if (!adjList.has(dst)) {
      adjList.set(dst, new Array());
    }
    adjList.get(src).push(dst);
  }
  return adjList;
}
 
// Count paths (backtracking)
function dfs(node, target, adjList, visit) {
  if (visit.has(node)) {
    return 0;
  }
  if (node == target) {
    return 1;
  }
  let count = 0;
  visit = new Set();
  visit.add(node);
  for (let neighbor of adjList.get(node)) {
    count += dfs(neighbor, target, adjList, visit);
  }
  visit.delete(node);
  return count;
}
 
// Shortest path from node to target.
function bfs(node, target, adjList) {
  let length = 0;
  let visit = new Set();
  let q = [];
  visit.add(node);
  q.push(node);
 
  while (q.length != 0) {
    let queueLength = q.length;
 
    for (let i = 0; i < queueLength; i++) {
      let curr = q.shift();
      if (curr === target) {
        return length;
      }
      for (let neighbor of adjList.get(curr)) {
        if (!visit.has(neighbor)) {
          visit.add(neighbor);
          q.push(neighbor);
        }
      }
    }
    length++;
  }
  return length;
}

notes Intro to Databases