Asymptotic Analysis
Bachmann-Landau Notation
Definition (big O)
Let \(f(n)\) and \(g(n)\) be functions mapping positive integers to positive real numbers. We say that \(f(n)\) is \(O(g(n))\) if there is a real constant \(c \gt 0\) and an integer constant \(n_0 \ge 1\) such that $$ f(n) \leq c \cdot g(n), \text{ for } n \geq n_0 $$
Simplification Rules
- Ignore non-dominant terms
- Ignore constant factors
Growth Rate Classifications
In \(T_{case}(n) = f(n) \in O(g(n))\), \(g(n)\) is one of the following:
Growth type | Function |
---|---|
Constant | \(1\) |
Logarithmic | \(\log n\) |
Linear | \(n\) |
Linearithmic | \(n \log n\) |
Quadratic | \(n^2\) |
Cubic | \(n^3\) |
Exponential | \(2^n\) |
Factorial | \(n!\) |
Worst Case Time Complexity
Iterative Algorithms
Complexity of Primitive Operations
The following primitive operations take \(O(1)\):
- Declaring a variable
- Assigning a value to a variable
- Following an object reference
- Performing an arithmetic operation
- Comparing two numbers
- Accessing a single element of an array by index
- Accessing the length of an array
- Calling a method
- Returning from a method
Complexity of Loops
The complexity of a loop is expressed as the sum of the complexities of each iteration of the loop. Whenever there are nested loops, begin by calculating the innermost loop and proceed outwards, which will give nested summations.
Some useful summation properties and formulas:
Property (big O and summation) $$ \sum_{i \in I} O(f(i)) = O\left(\sum_{i \in I} f(i)\right) $$
Property (summation of a constant) $$ \sum_{i = m}^{n} ca_i = c \sum_{i = m}^{n} a_i $$
Property (addition and subtraction of summations) $$ \sum_{i = m}^{n} (a_i \pm b_i) = \sum_{i = m}^{n} a_i \pm \sum_{i = m}^{n} b_i $$
Formula (summation of a \(1\)) $$ \sum_{i = m}^{n} 1 = n - m + 1 $$
Formula (summation of arithmetic series) $$ \sum_{i = 1}^{n} i = \frac{n(n + 1)}{2} $$
Formula (summation of quadratic series) $$ \sum_{i = 1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6} $$
Formula (summation of cubic series) $$ \sum_{i = 1}^{n} i^3 = \left(\frac{n(n + 1)}{2}\right)^2 $$
Formula (summation of \(i^k\) series) $$ \sum_{i = 1}^{n} i^k \approx \frac{1}{k + 1}n^{k + 1} $$
Formula (summation of geometric series) $$ \sum_{i = 1}^{n}ar^{i - 1} = a\left(\frac{r^n - 1}{r - 1}\right) = a\left(\frac{1 - r^n}{1 - r}\right), \text{ where } r > 0 \text{ and } r \neq 1 $$
Formula (summation of harmonic series) $$ \sum_{i = 1}^{n} \frac{1}{i} \approx \ln n + \gamma, \text{ where } \gamma \approx 0.5772 \dots $$
Formula (summation of \(\log_{2}\) series) $$ \sum_{i = 1}^{n} \log_2 i = O\left(\sum_{i = 1}^{n} \log_2 n\right) = O(n\log_2 n) $$
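For instance (a worked example, not tied to any particular algorithm above), a doubly nested loop whose inner loop performs \(O(1)\) work and runs \(i\) times for each outer iteration \(i = 1, \dots, n\) costs $$ \sum_{i = 1}^{n} \sum_{j = 1}^{i} O(1) = O\left(\sum_{i = 1}^{n} i\right) = O\left(\frac{n(n + 1)}{2}\right) = O(n^2) $$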
General Steps
- Calculate the total cost by summing the costs of statements where the statements may be primitive operations or loops
- Apply Bachmann-Landau notation simplification rules
Note
When analyzing the time and space complexity of graph algorithms, the standard formula is \(O(V + E)\), where \(V\) is the number of vertices and \(E\) is the number of edges. However, whenever the state encompasses more than just the vertex position, we need to reframe our analysis:
Standard case: State = vertex position only
- Time complexity: \(O(V + E)\)
- We visit each vertex at most once
State-space case: State = (vertex, additional_constraints)
- Let \(S\) = total number of possible states
- Time complexity: \(O(S + E_{state})\) where \(E_{state}\) is the number of state transitions
- We visit each state at most once, but may revisit vertices under different constraints
Example: In a shortest path problem with fuel constraints on an \(m \times n\) grid:
- State = (grid_position, remaining_fuel)
- \(S = m \times n \times k\) where \(k\) is max fuel
- \(E_{state} \approx 4 \times m \times n \times k\) (each state can transition to ~4 neighbors)
- Time complexity: \(O(S + E_{state}) = O(mnk + 4mnk) = O(mnk)\)
The key insight is to analyze complexity based on unique states processed, not just unique vertices visited.
Recursive Algorithms
- Come up with a recurrence relation of the following form
$$ T(n)= aT\left(\frac{n}{b}\right)+f(n) $$
- \(T(n)\): Time complexity for input size \(n\)
- \(a\): Number of recursive calls
- \(b\): Factor by which the problem size is divided
- \(f(n)\): Time complexity of the work done outside the recursive calls
- Use backward substitution, recursion tree, or master theorem to solve the recurrence relation
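For example (standard illustrations of the recipe, not tied to a specific algorithm in this section): binary search satisfies \(T(n) = T(n/2) + O(1)\), which solves to \(O(\log n)\), while merge sort satisfies \(T(n) = 2T(n/2) + O(n)\), which solves to \(O(n \log n)\); both follow directly from the master theorem.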
Worst Case Space Complexity
Common data structure space complexities:
- Variable or pointer: \(O(1)\)
- Array: \(O(n)\)
- Node-Pointer Data Structure: \(O(n)\)
- Matrices: \(O(m \cdot n)\)
- Adjacency List: \(O(V + E)\)
Iterative Algorithms
- Add up the space complexities of the intermediate data structures
- Apply Bachmann-Landau notation simplification rules
Recursive Algorithms
- Add up the space complexities of the intermediate data structures
- For a recursive function, total cost as follows
$$ \text{Cost} = \text{call stack depth} \cdot \text{cost per call} $$
- Apply Bachmann-Landau notation simplification rules
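For example (a generic illustration): a recursion of depth \(O(\log n)\) that only uses \(O(1)\) of local variables per call costs \(O(\log n)\) auxiliary space, while a recursion of depth \(O(n)\) that allocates an \(O(n)\) array per call costs \(O(n) \cdot O(n) = O(n^2)\).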
Note
The algorithm's input and output are typically excluded from the total space complexity cost.
Algorithmic Patterns
Classic Two Pointers Pattern
Two Pointers Opposite
Use Case
The list input is sorted, and you are looking for a pair (or pairs) that meet a specific condition.
Usage
def two_pointers_opposite(array):
left = 0
right = len(array) - 1
result = 0
while left < right:
# Some logic involving `array[left]` and/or `array[right]`
if <condition>:
left += 1
else:
right -= 1
return result
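# As a concrete instance of the template above (a sketch of a hypothetical
# pair-sum question on a sorted array; the name and inputs are illustrative):
def pair_with_target_sum(array, target):
    """Return True if some pair in the sorted `array` sums to `target`."""
    left = 0
    right = len(array) - 1
    while left < right:
        pair_sum = array[left] + array[right]
        if pair_sum == target:
            return True
        if pair_sum < target:
            left += 1   # need a larger sum
        else:
            right -= 1  # need a smaller sum
    return False

# pair_with_target_sum([1, 3, 4, 6, 8], 10) -> True (4 + 6)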
Two Pointers Same Speed
Use Case
The two list inputs are sorted, and you want to compare elements between the two.
Usage
def two_pointers_same_speed(array1, array2):
i = 0
j = 0
result = 0
while i < len(array1) and j < len(array2):
# Some logic involving `array1[i]` and `array2[j]`
if <condition>:
i += 1
else:
j += 1
while i < len(array1):
# Some logic involving `array1[i]`
i += 1
while j < len(array2):
# Some logic involving `array2[j]`
j += 1
return result
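# As a concrete instance (a sketch, assuming the goal is to merge two sorted
# arrays; the names are illustrative):
def merge_sorted(array1, array2):
    i = 0
    j = 0
    result = []
    while i < len(array1) and j < len(array2):
        if array1[i] <= array2[j]:
            result.append(array1[i])
            i += 1
        else:
            result.append(array2[j])
            j += 1
    # at most one of the two arrays still has leftover elements
    result.extend(array1[i:])
    result.extend(array2[j:])
    return result

# merge_sorted([1, 4, 7], [2, 3, 9]) -> [1, 2, 3, 4, 7, 9]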
Sliding Window Pattern
Fixed Sliding Window
Use Case
You are looking for a subarray or substring of some length k
that satisfies a certain constraint.
Usage
def fixed_sliding_window(array, k):
current = 0
result = 0
for i in range(k):
# Some logic to build first window involving `current` and/or other variables
for right in range(k, len(array)):
        # Add `array[right]` to the window
        # Remove `array[right - k]` from the window
# Update `result`
return result
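# As a concrete instance (a sketch of the classic "maximum sum of a length-k
# subarray" question; assumes len(array) >= k):
def max_sum_length_k(array, k):
    current = sum(array[:k])         # build the first window
    result = current
    for right in range(k, len(array)):
        current += array[right]      # add the new rightmost element
        current -= array[right - k]  # drop the element leaving the window
        result = max(result, current)
    return result

# max_sum_length_k([1, 2, 3, 4], 2) -> 7 (the subarray [3, 4])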
Variable Sliding Window
Use Case
You are looking for a subarray or substring (usually the longest) that satisfies a certain constraint.
Usage
def variable_sliding_window(array):
left = 0
current = 0
result = 0
for right in range(len(array)):
# Some logic to add `array[right]` to `current`
while <broken window condition>:
# Remove `array[left]` from `current`
left += 1
# Update `result`, where current window length
# at any time is `right - left + 1`
return result
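# As a concrete instance (a sketch of a hypothetical question: the longest
# subarray of non-negative numbers whose sum is at most s):
def longest_subarray_with_sum_at_most(array, s):
    left = 0
    current = 0
    result = 0
    for right in range(len(array)):
        current += array[right]
        while current > s:           # the window broke the constraint, so shrink it
            current -= array[left]
            left += 1
        result = max(result, right - left + 1)
    return result

# longest_subarray_with_sum_at_most([3, 1, 2, 7, 2], 6) -> 3 (the subarray [3, 1, 2])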
Note (When this Technique Fails)
The constraint must be preserved as the sliding window shrinks. If shrinking the window can break the constraint, consider using a prefix sum + hash map technique instead.
Note (Length of a Window)
The formula for the length of a window is `right - left + 1`.
Prefix Sum Pattern
Use Case
You need a subroutine which involves finding the sum of any subarray in \(O(1)\).
Usage
def build_prefix_sum(nums):
prefix_sum = [0]
for num in nums:
prefix_sum.append(prefix_sum[-1] + num)
return prefix_sum
# Usage:
# To get the sum of nums[i:j] (i.e. the half-open range [i, j)), compute `prefix_sum[j] - prefix_sum[i]`
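# A quick usage check of the helper above (the sample numbers are illustrative):
# nums = [3, 1, 4, 1, 5]
# prefix_sum = build_prefix_sum(nums)         # [0, 3, 4, 8, 9, 14]
# prefix_sum[4] - prefix_sum[1] == 1 + 4 + 1  # sum of nums[1:4]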
HashMap and HashSet Pattern
HashMap
Use Case
- Track elements seen so far for uniqueness (with any extra info stored as the value)
- Frequency counting
- Basic mapping
- Memoization table
- Count the number of subarrays with a constraint that doesn't necessarily hold as you shrink the window, as in the sketch below
def count_num_subarrays_with_exact_constraint(array, k):
freq = {0 : 1} # prefix sum frequency map
prefix_sum = 0
result = 0
for num in array:
# Some logic to change `prefix_sum` as necessary (e.g. prefix_sum += num)
result += freq.get(prefix_sum - k, 0)
        freq[prefix_sum] = freq.get(prefix_sum, 0) + 1
return result
Usage
# Create an empty map
my_map = dict()
# Create a map with initial values
my_map = {
<key 1>: <value 1>,
<key 2>: <value 2>
}
# Add new entry or update current entry
my_map[<immutable key>] = <value>
# Remove an entry
del my_map[<key>]
# Remove all entries
my_map.clear()
# Get number of entries
len(my_map)
# Check if my_map is empty
not my_map
# Get value
my_map[<key>]
# Get the value with a default value if key isn't found
# Note: I like this over `defaultdict(<default value type>)` for counting since it can avoid accidentally creating phantom keys
my_map.get(<key>, <default value>)
# Note: pretty much a must for adjacency list creation
from collections import defaultdict
my_map = defaultdict(<default value's type>)
# Check if key exists
<key> in my_map
# Get all entries
my_map.items()
# Get all keys
my_map.keys()
# Get all values
my_map.values()
HashSet
Use Case
- Track elements seen so far for uniqueness
- Store a chunk (or all) of the input for fast lookups
Usage
# Create an empty set
my_set = set()
# Add an element
my_set.add(<element>)
# Check if a set contains an element
<element> in my_set
# Remove an element
my_set.remove(<element>)
# Get the set difference
my_set.difference(other_set)
# Get the set intersection
my_set.intersection(other_set)
# Get the set union
my_set.union(other_set)
# Remove all elements
my_set.clear()
# Get the number of elements
len(my_set)
# Check if the set is empty
not my_set
Fast and Slow Pointers/Floyd's Cycle Detection Pattern
Use Case
- Detect cycles in linked list
- Find middle of a linked list
Usage
def fast_and_slow_pointers(head):
fast = head
slow = head
result = <initial value>
while fast and fast.next:
# Some logic
fast = fast.next.next
        slow = slow.next
    return result
Reversing a Linked List Pattern
Use Case
- The problem largely involves reversing pointers in a linked list
- You need a subroutine which involves classically reversing pointers in a linked list
Usage
def reverse_linked_list(head):
prev_node = None
curr_node = head
while curr_node:
next_node = curr_node.next
curr_node.next = prev_node
prev_node = curr_node
        curr_node = next_node
    return prev_node
Note (Dummy Head)
Create a dummy head node `dummy = ListNode(0, head)` to reduce edge cases.
Stack and Queue Pattern
Stack
Use Case
Elements in the input interacting with each other, with a LIFO order.
Usage
# Create an empty stack
stack = list()
# Push an element onto the top
stack.append(<element>)
# Pop the element at the top
stack.pop()
# Get the element at the top
stack[-1]
# Get the number of elements in the stack
len(stack)
# Check if the stack is empty
not stack
Note (\(O(n)\) String Building)
We often use the stack to store the result and convert it to a string using the \(O(n)\) string builder technique.
def build_string(string):
array = []
for char in string:
array.append(char)
return "".join(array)
Queue
Use Case
Processing elements in a FIFO order.
Usage
from collections import deque
# Create a queue from an initial list
queue = deque(<initial list>)
# Enqueue an element at the back
queue.append(<element>)
# Dequeue the element at the front
queue.popleft()
# Get the element at the front
queue[0]
# Get the number of elements
len(queue)
# Check if queue is empty
not queue
Monotonic Stack and Monotonic Deque Pattern
Monotonic Stack
Use Case
- Looking for the "next" or "previous" "greater" or "smaller" element
- "Span until" i.e. counting a range
Usage
def monotonic_non_decreasing_stack(array):
stack = []
result = <initial value>
for i in range(len(array)):
while stack and array[stack[-1]] > array[i]:
stack.pop()
# Utilize stack.pop() in result
# e.g. result[stack.pop()] = <something involving i>
# e.g. result += stack.pop()
stack.append(i)
return result
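# As a concrete instance (a sketch of the classic "distance to the next greater
# element" question; the result holds 0 where no greater element exists):
def next_greater_distance(array):
    result = [0] * len(array)
    stack = []  # indices whose next greater element hasn't been found yet
    for i in range(len(array)):
        while stack and array[stack[-1]] < array[i]:
            j = stack.pop()
            result[j] = i - j
        stack.append(i)
    return result

# next_greater_distance([73, 74, 75, 71, 72]) -> [1, 1, 0, 1, 0]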
Monotonic Deque
Use Case
Maximum or minimum values in sliding window or ranges.
Usage
from collections import deque
def monotonic_non_increasing_deque(array, k):
result = []
dq = deque()
for i in range(len(array)):
while dq and array[dq[-1]] < array[i]:
dq.pop()
dq.append(i)
if dq[0] <= i - k:
dq.popleft()
if i >= k - 1:
result.append(array[dq[0]])
return result
Note (Monotonicity)
A monotonic increasing/decreasing stack or queue implies that the elements are always increasing/decreasing, while a monotonic non-decreasing/non-increasing one may include repeated elements.
Note (Indices Tweak)
Store indices when you want to calculate distances or want to mutate the result list later.
Binary Tree Traversal Patterns
Binary Tree Depth-First Search
Use Case
Most binary tree problems that don't involve processing nodes by their levels.
Usage
def recursive_preorder_dfs(root):
if not root:
return <base case result>
# Additional base cases
# Some logic involving the current node and the current result
recursive_preorder_dfs(root.left)
recursive_preorder_dfs(root.right)
return <current result and the two recursive calls above>
def iterative_preorder_dfs(root):
    if not root:
        return <initial value>
    stack = [root]
result = <initial value>
while stack:
node = stack.pop()
# Additional base cases
# Some logic involving the popped node and the result
if node.right:
stack.append(node.right)
if node.left:
stack.append(node.left)
return result
Note (Edge Cases)
- Empty tree (i.e. `None` root)
- Single node tree
- Skewed tree (i.e. unbalanced trees where all nodes are on one side)
- Duplicate values in a binary search tree
Note (Common Depth-First Search Visit Order)
In the iterative depth-first search, the flow is usually pre-order (`pop node → process node → push right → push left`), while in the recursive depth-first search, pre-order (`process node → recurse left → recurse right`) is the most common, followed by post-order (`recurse left → recurse right → process node`), then in-order (`recurse left → process node → recurse right`).
Note (Additional Context)
In a recursive depth-first search, when you need to track additional context or values across recursive calls—such as parent nodes, path lengths, or accumulated data—you include that information as an additional parameter. In an iterative depth-first search, you would store them as part of a tuple `(node, <value 1>, <value 2>, …, <value N>)` that'll be pushed and popped from the explicit stack you create outside the while loop.
Note (Keeping Track of the Final Result)
In a recursive depth-first search, `result` is typically implicit, since returning values up the call chain is usually sufficient; an explicit `result` is sometimes a good choice, though, where you create it within the scope of the provided function, then create and call an inner depth-first search function to do the actual work. In an iterative depth-first search, you typically create an explicit `result` variable outside the while loop.
Note (BST Techniques)
BST problems typically use DFS traversal. Common techniques include:
- Checking if the current node's value is within bounds (e.g., `low <= node.val <= high`)
- Leveraging the BST property to prune subtrees: if `node.val < low`, skip the left subtree; if `node.val > high`, skip the right subtree (where `low` or `high` can also just be a target value)
- Using inorder traversal to collect values in sorted order for problems requiring sorted data without explicit sorting.
Binary Tree Breadth-First Search
Use Case
Binary tree problems that involve processing nodes by their levels.
Usage
from collections import deque
def bfs(root):
if not root:
return
queue = deque([root])
result = 0
while queue:
level_width = len(queue)
# Some logic involving the current level
for _ in range(level_width):
node = queue.popleft()
# Some logic involving the current node
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
return result
Graph Traversal Patterns
Adjacency List Graph Depth-First Search
Use Case
Most graph problems.
Usage
def adjacency_list_recursive_dfs(graph):
def dfs(vertex):
visited.add(vertex)
result = <initial value>
for neighbour in graph[vertex]:
if neighbour not in visited:
result += dfs(neighbour)
return result
result = <initial value>
visited = set()
for vertex in graph:
if vertex not in visited:
dfs(vertex)
# Some logic involving the result per connected component
def adjacency_list_iterative_dfs(graph):
def dfs(vertex):
result = <initial value>
stack = [vertex]
while stack:
vertex = stack.pop()
visited.add(vertex)
for neighbour in graph[vertex]:
if neighbour not in visited:
result += <calculation>
stack.append(neighbour)
return result
result = <initial value>
visited = set()
for vertex in graph.keys():
if vertex not in visited:
dfs(vertex)
# Some logic involving the result per connected component
Matrix Graph Depth-First Search (Flood Fill)
Use Case
Most graph problems where the input is a matrix.
Usage
def matrix_recursive_dfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
def dfs(row, col):
visited.add((row, col))
result = <initial value>
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
if valid_cell(neighbour_row, neighbour_col) and (neighbour_row, neighbour_col) not in visited:
result += dfs(neighbour_row, neighbour_col)
return result
visited = set()
result = <initial value>
for row in range(ROWS):
for col in range(COLUMNS):
if (row, col) not in visited:
dfs(row, col)
# Some logic involving the result per connected component
return result
def matrix_iterative_dfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
def dfs(row, col):
stack = [(row, col)]
result = <initial value>
while stack:
row, col = stack.pop()
visited.add((row, col))
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
if valid_cell(neighbour_row, neighbour_col) and (neighbour_row, neighbour_col) not in visited:
result += <some calculation>
stack.append((neighbour_row, neighbour_col))
return result
visited = set()
result = <initial value>
for row in range(ROWS):
for col in range(COLUMNS):
if (row, col) not in visited:
dfs(row, col)
# Some logic involving the result per connected component
return result
Adjacency List Graph Breadth-First Search
Use Case
Distance in a graph.
Usage
from collections import deque
def adjacency_list_bfs(graph):
queue = deque([(<source vertex>,<additional state>, <initial distance>)])
visited = set([<source vertex>])
while queue:
vertex, <additional state>, dist = queue.popleft()
if vertex == <destination vertex>:
return dist
for neighbour in graph[vertex]:
if neighbour not in visited:
visited.add(neighbour)
queue.append((neighbour, <additional state>, dist + 1))
# Some logic involving the neighbour
Matrix Graph Breadth-First Search
Use Case
Distance in a graph given as a matrix.
Usage
from collections import deque
def matrix_iterative_bfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, -1), (-1, 1), (1, -1), (-1, 0), (0, -1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
    queue = deque([(<source row>, <source col>, <initial distance>)])
    visited = set([(<source row>, <source col>)])
while queue:
row, col, dist = queue.popleft()
if (row, col) == (<destination row>, <destination col>):
return dist
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
neighbour_dist = dist + 1
if (neighbour_row, neighbour_col) not in visited and valid_cell(neighbour_row, neighbour_col):
visited.add((neighbour_row, neighbour_col))
                queue.append((neighbour_row, neighbour_col, neighbour_dist))
# Some logic involving the neighbour's row and col
Note (Various Types of Graph Inputs)
Unlike linked lists and binary trees, where we are given `head` or `root` respectively, there are various graph inputs:
- Matrix: A 2D list, where each element represents a vertex (the vertices are not numbered `0` to `n`), its neighbours are the adjacent squares, and the edges are determined by the problem description.
- Edge list: A list of edges `edges`. It's useful to turn it into an adjacency list.
from collections import defaultdict
def build_adjacency_list_graph(edges):
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)  # comment out this line if the input is a directed graph
    return graph
- Integer Adjacency List: A 2D list of integers `graph`, where the `n` nodes are numbered from `0` to `n - 1`, and `graph[i]` represents the neighbours of node `i`.
- Integer Adjacency Matrix: A 2D list of integers, where the `n` nodes are numbered from `0` to `n - 1`, thereby forming an `n x n` square matrix; when `graph[i][j] == 1`, there exists an edge between node `i` and node `j`, and when `graph[i][j] == 0`, there is no edge between node `i` and node `j`. It's also useful to pre-process it into an adjacency list.
from collections import defaultdict
def build_adjacency_list_graph(adjacency_matrix):
    graph = defaultdict(list)
    n = len(adjacency_matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency_matrix[i][j]:
                graph[i].append(j)
                graph[j].append(i)  # comment out this line if the input is a directed graph
    return graph
Even if the input is none of the above, it may still be an implicit graph problem, often where vertices aren't explicitly given but can be generated on the fly through valid transitions or transformations. These problems typically involve:
- A starting state and a goal/end state
- A defined set of valid transitions or mutations
- Optional constraints like invalid/intermediate states
Note (`visited` Container)
`visited` is typically a HashSet, but you might achieve better runtime performance by using a boolean array when the node range is predetermined (which is typical, since graph problems usually number nodes from `0` to `n - 1`).
Note (Prohibited Vertices)
Whenever the problem mentions prohibited vertices, that's an indicator to add them straight away to the `visited` container.
Note (Distance vs Path Length for BFS)
When using BFS to find shortest paths:
- If the problem asks for distance (number of moves/steps), initialize source vertices with `distance = 0`
- If the problem asks for path length (number of cells in the path), initialize source vertices with `path_length = 1`
Note (Multi-source BFS)
For a multi-source BFS, create a for loop that visits all source nodes and appends them to the queue for the BFS.
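A minimal sketch of that idea (assuming a grid input and a `sources` list of `(row, col)` cells; both names are illustrative):
from collections import deque

def multi_source_bfs_distances(grid, sources):
    ROWS, COLUMNS = len(grid), len(grid[0])
    distances = [[-1] * COLUMNS for _ in range(ROWS)]
    queue = deque()
    for row, col in sources:  # seed the queue with every source at distance 0
        distances[row][col] = 0
        queue.append((row, col))
    while queue:
        row, col = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + dr, col + dc
            if 0 <= r < ROWS and 0 <= c < COLUMNS and distances[r][c] == -1:
                distances[r][c] = distances[row][col] + 1
                queue.append((r, c))
    return distances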
Note (Reframing the Problem)
For graph problems, it's useful to rephrase the problem in terms of its inverse. For instance, take LeetCode #1557. The original problem description asks us to find the smallest set of vertices from which all nodes in the graph are reachable. Instead, we can rephrase the problem in terms of its inverse: find the smallest set of nodes that cannot be reached from other nodes, since if a node can be reached from another node, then we would rather include the pointer rather than the pointee in our set. Another example is LeetCode #542. The brute force solution would be to perform BFS from each cell with a 1, but instead, we can perform a multi-source BFS starting from all cells with a 0 (if we have a cell `x` with value 1 and its nearest cell `y` has value 0, then it doesn't make a difference whether we traverse from `x -> y` or `y -> x`; both give the same distance).
Priority Queue Pattern
Use Case
- Repeatedly find the maximum or minimum element
- Get the "top" \(k\) elements (create an empty priority queue that prioritizes the elements you want to discard: use a min heap to keep the largest `k` elements, or a max heap to keep the smallest `k`; insert elements from the input list, and whenever the queue exceeds size `k`, pop the root to maintain only the top `k` you care about)
import heapq
def top_k(array, k):
pq = []
for num in array:
        # Some logic to push an element according to the problem's criteria
heapq.heappush(pq, (<criteria as key>, num))
if len(pq) > k:
heapq.heappop(pq)
return [num for _, num in pq]
- Finding a median (involves one max heap priority queue for the lower half of the list, and one min heap priority queue for the upper half of the list)
Usage
import heapq
# Create an empty priority queue
priority_queue = []
heapq.heapify(priority_queue)
# Add an element
heapq.heappush(priority_queue, <element>)
# Remove the top element
heapq.heappop(priority_queue)
# Peek at the top element
priority_queue[0]
# Get the number of elements
len(priority_queue)
# Check if the priority queue is empty
not priority_queue
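# As a concrete instance of the top-k recipe above (a sketch: keep the k largest
# values using a min heap):
import heapq

def k_largest(array, k):
    pq = []  # min heap holding the k largest values seen so far
    for num in array:
        heapq.heappush(pq, num)
        if len(pq) > k:
            heapq.heappop(pq)  # discard the smallest of the k + 1 values
    return pq

# k_largest([5, 1, 9, 3, 7], 2) -> [7, 9]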
Note (Heaps in Python's Standard Library)
Python's `heapq` module only implements min-heaps. To simulate a max heap, first build a new list `pq` from the elements of the input but negated using `-`, call `heapq.heapify(pq)`, and then negate each popped value to get back the original.
Note (Using a Key for Priority)
To assign priority based on a specific key, wrap each item in your list as a tuple `(key, element)`. For max heaps, negate the key with `-`, and if the question involves tie-breakers, negate the element with `-` as well.
Greedy Algorithm Pattern
Use Case
Typically finding the minimum or the maximum of a property of the input array.
Usage
- Determine if you should greedily pick the minimum or the maximum at each step.
- Sort the input array (often you may need to sort based on the frequency of each element using `Counter(<list>)`).
- Iterate over the sorted array and increment the result, making use of a heap as needed.
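A minimal illustration of the recipe (a sketch of a hypothetical question: the maximum number of items that can be bought without exceeding a budget):
def max_items_within_budget(prices, budget):
    count = 0
    for price in sorted(prices):  # greedily buy the cheapest items first
        if price > budget:
            break
        budget -= price
        count += 1
    return count

# max_items_within_budget([3, 5, 2, 8], 7) -> 2 (buy the items costing 2 and 3)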
Binary Search Pattern
On an Array
Use Case
For some input array `array` and a desired element `target`, `array` must be sorted, and you want to:
- Find the index of `target` if it is in `array`.
- Find the lower bound (first index) or upper bound (last index) at which `target` can be inserted to maintain the sorted order of `array`.
Usage
def binary_search(array, target):
left = 0
right = len(array) - 1
while left <= right:
mid = (left + right) // 2
if array[mid] == target:
# Some logic
return
        if array[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
return left # if `target` is not in `array`, `left` is at the insertion point
def lower_bound(array, target):
"""
Finds the first index at which `target` can be inserted in `array`
to maintain sorted order (i.e., the lower bound).
If `target` exists in `array`, returns the index of its first occurrence.
If `target` does not exist, returns the index where it can be inserted.
Equivalent to bisect.bisect_left in the Python standard library.
Parameters:
array (List[int]): A sorted list of integers.
target (int): The value to search for.
Returns:
int: The index of the lower bound for `target`.
"""
left = 0
right = len(array)
while left < right:
mid = (left + right) // 2
        if array[mid] < target:
left = mid + 1
else:
right = mid
return left
def upper_bound(array, target):
"""
Finds the first index at which a value greater than `target`
can be inserted in `array` to maintain sorted order (i.e., the upper bound).
Returns the index of the first element greater than `target`.
If all elements are less than or equal to `target`, returns `len(array)`.
Equivalent to bisect.bisect_right in the Python standard library.
Parameters:
array (List[int]): A sorted list of integers.
target (int): The value to search for.
Returns:
int: The index of the upper bound for `target`.
"""
left = 0
right = len(array)
while left < right:
mid = (left + right) // 2
        if array[mid] <= target:
left = mid + 1
else:
right = mid
return left
Note (Duplicate Elements)
`binary_search()` doesn't work if `array` contains duplicates, but `lower_bound()` and `upper_bound()` do allow duplicates in `array`.
Note (Descending Elements)
If the input array is sorted in descending order, simply invert the inequality in the if condition.
On a Solution Space
Use Case
You're trying to find a maximum or minimum value, and:
- You can verify (usually with a greedy algorithm) in \(O(n)\) time (or faster) whether a given candidate `x` is a valid solution.
- If `x` is a valid solution:
  - For a maximum, \(\text{all values} \le x\) are also valid.
  - For a minimum, \(\text{all values} \ge x\) are also valid.
- If `x` is not a valid solution:
  - For a maximum, \(\text{all values} \gt x\) are also invalid.
  - For a minimum, \(\text{all values} \lt x\) are also invalid.
Usage
def binary_search_minimum(array):
def is_valid(x):
# Some O(n) algorithm (usually also a greedy algorithm)
return <boolean>
left = <min possible answer>
right = <max possible answer>
while left <= right:
mid = (left + right) // 2
if is_valid(mid):
right = mid - 1
else:
left = mid + 1
return left
def binary_search_maximum(array):
def is_valid(x):
# Some O(n) algorithm (usually also a greedy algorithm)
return <boolean>
left = <min possible answer>
right = <max possible answer>
while left <= right:
mid = (left + right) // 2
if is_valid(mid):
left = mid + 1
else:
right = mid - 1
return right
Note (Safely Calculating `mid`)
For most languages, to avoid integer overflow, calculate `mid` using `left + (right - left) / 2` instead.
Backtracking (Complete Search)
Unconstrained Backtracking
Use Case
Generating permutations, subsets, or combinations
Template
class Solution:
def permute(self, nums: List[int]) -> List[List[int]]:
result = []
def backtrack(path):
if len(path) == len(nums):
result.append(path.copy())
return
for num in nums:
if num not in path:
path.append(num)
backtrack(path)
path.pop()
backtrack([])
return result
class Solution:
def subsets(self, nums: List[int]) -> List[List[int]]:
result = []
def backtrack(path, start):
if start > len(nums):
return
result.append(path.copy())
for i in range(start, len(nums)):
                path.append(nums[i])
backtrack(path, i + 1)
path.pop()
backtrack([], 0)
return result
class Solution:
def combine(self, n: int, k: int) -> List[List[int]]:
result = []
def backtrack(path, start):
if len(path) == k:
result.append(path.copy())
return
for num in range(start, n + 1):
path.append(num)
backtrack(path, num + 1)
path.pop()
backtrack([], 1)
return result
Backtracking with pruning
Use Case
Problems with constraints that let you prune branches which can no longer lead to a valid solution.
Usage
TO DO
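A minimal sketch (assuming a combination-sum-style problem: sorting the candidates lets us prune a branch as soon as the remaining target is smaller than the next candidate):
def combination_sum_with_pruning(candidates, target):
    candidates = sorted(candidates)
    result = []
    def backtrack(path, start, remaining):
        if remaining == 0:
            result.append(path.copy())
            return
        for i in range(start, len(candidates)):
            if candidates[i] > remaining:
                break  # prune: every later candidate is at least as large
            path.append(candidates[i])
            backtrack(path, i, remaining - candidates[i])
            path.pop()
    backtrack([], 0, target)
    return result

# combination_sum_with_pruning([2, 3, 5], 8) -> [[2, 2, 2, 2], [2, 3, 3], [3, 5]]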
Note (Backtracking as a Decision Tree)
A backtracking solution can be visualized as a tree, where each node represents the current state of the path during a recursive function call. Thebacktrack()
calls explore different branches of this tree, building potential solutions along the way. The leaves of the tree correspond to base cases—often representing complete solutions, though not necessarily in every problem.
Bit Manipulation
Basic Bitwise Operators
Bitwise NOT
Syntax
!a (in Rust; C and Python use ~a instead)
Behaviour
The output is the inverted input (i.e. `0` becomes `1` and `1` becomes `0`).
Input | Output
------|-------
1 | 0
0 | 1
Bitwise AND
Syntax
a & b
Behaviour
If both inputs are `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 0
0 | 1 | 0
1 | 1 | 1
Bitwise OR
Syntax
a | b
Behaviour
If at least one input is `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 1
Bitwise XOR
Syntax
a ^ b
Behaviour
If exactly one input is `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
Bitwise Left Shift
Syntax
`n << k`, where `n` and `k` are integers.
Behaviour
Shifts the bits of an integer \(n\) to the left by \(k\) places, filling the bottom \(k\) bits with zeroes.
Mathematically, `n << k` is equivalent to \(n \cdot 2^k\).
Bitwise Right Shift
Syntax
`n >> k`, where `n` and `k` are integers.
Behaviour
Shifts the bits of an integer \(n\) to the right by \(k\) places, filling the top \(k\) bits with zeroes (for unsigned values; an arithmetic right shift on a signed value instead fills them with copies of the sign bit).
Mathematically, `n >> k` is equivalent to \(\lfloor\frac{n}{2^k}\rfloor\).
Bit Mask
A bit mask is a sequence of bits used to set, clear, or isolate specific bits in another value using bitwise operators.
The bit mask is typically a constant integer matching the same length as the value which we're masking on.
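A minimal illustration (Python is used here for brevity; the literals and the low-nibble mask are arbitrary):
value = 0b0101_0110
mask = 0b0000_1111       # mask selecting the low nibble

isolated = value & mask  # isolate the masked bits -> 0b0000_0110
set_bits = value | mask  # set the masked bits     -> 0b0101_1111
cleared = value & ~mask  # clear the masked bits   -> 0b0101_0000
toggled = value ^ mask   # toggle the masked bits  -> 0b0101_1001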
Data Representation
Number Systems
Decimal System
Decimal is a number system that uses base 10, characterized by two fundamental properties:
- Each digit position can hold one of 10 unique values (0 through 9). Values greater than 9 require carrying to an additional digit position to the left.
- Each digit's position determines its contribution to the overall value through positional notation. Digits are labeled from right to left as \(d_0, d_1, d_2, \dots, d_{N - 1}\) where each successive position represents ten times the value of the position to its right.
More formally, any \(N\)-digit decimal number can be expressed using positional notation as:
$$ (d_{N-1} \cdot 10^{N-1}) + (d_{N-2} \cdot 10^{N-2}) + \dots + (d_2 \cdot 10^2) + (d_1 \cdot 10^1) + (d_0 \cdot 10^0) $$
Decimal numbers are typically written without prefixes, though they may be denoted as \(N_{10}\) when clarification is needed.
Binary System
Binary is a number system that uses base 2, characterized by:
- Each digit position (called a bit) can hold one of two unique values: 0 or 1. Values greater than 1 require carrying to an additional bit position to the left.
- Each bit's position determines its contribution through powers of 2: \(2^{N-1}, \dots, 2^1, 2^0\).
Binary numbers are typically prefixed with `0b`.
Binary Terminology and Groupings
Since individual bits represent limited information, they are commonly grouped:
- Byte: 8 bits (the smallest addressable unit of memory)
- Nibble: 4 bits (half a byte)
- Word: The natural data width of a CPU (typically 32 or 64 bits)
In any binary sequence, we identify:
- Most Significant Bit (MSB): The leftmost bit
- Least Significant Bit (LSB): The rightmost bit
Hexadecimal System
Hexadecimal uses base 16, characterized by:
- Each digit position can hold one of 16 unique values. Since we only have 10 decimal digits (0 to 9), letters A to F represent values 10 to 15 respectively.
- Each digit's position determines its contribution through powers of 16: \(16^{N-1}, \dots, 16^1, 16^0\).
Hexadecimal numbers are prefixed with `0x`.
Hexadecimal-Binary-Decimal Conversion Table
Hex | Binary | Decimal |
---|---|---|
0 | 0000 | 0 |
1 | 0001 | 1 |
2 | 0010 | 2 |
3 | 0011 | 3 |
4 | 0100 | 4 |
5 | 0101 | 5 |
6 | 0110 | 6 |
7 | 0111 | 7 |
8 | 1000 | 8 |
9 | 1001 | 9 |
A | 1010 | 10 |
B | 1011 | 11 |
C | 1100 | 12 |
D | 1101 | 13 |
E | 1110 | 14 |
F | 1111 | 15 |
Converting Between Number Systems
Decimal to Any Base (Division method)
- Repeatedly divide the decimal number by the target base
- Record the remainder at each step
- Continue until the quotient becomes 0
- Read the remainders from bottom to top to get the result
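A small sketch of the division method (the function name is illustrative; digits beyond 9 use the letters A to F):
def decimal_to_base(n, base):
    if n == 0:
        return "0"
    digits = "0123456789ABCDEF"
    result = []
    while n > 0:
        n, remainder = divmod(n, base)
        result.append(digits[remainder])
    return "".join(reversed(result))  # remainders read from bottom to top

# decimal_to_base(12, 2) -> "1100", decimal_to_base(255, 16) -> "FF"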
Any Base to Decimal (Addition of Positional Notation)
For a number with digits \(d_{N-1}, d_{N-2}, \dots, d_1, d_0\) in base \(B\), its decimal value is calculated using the following formula:
$$ (d_{N-1} \cdot B^{N-1}) + (d_{N-2} \cdot B^{N-2}) + \dots + (d_1 \cdot B^1) + (d_0 \cdot B^0) $$
Binary to Hexadecimal
- Group binary digits into sets of 4 bits from right to left
- Pad the leftmost group with leading zeros if necessary
- Convert each 4-bit group to its hexadecimal equivalent
Hexadecimal to Binary
- Convert each hexadecimal digit to its 4-bit binary equivalent
- Concatenate all binary groups
Computer Data Units
Memory and Storage Capacities
Unit | Value |
---|---|
1 KB (Kilobyte) | \(2^{10}\) bytes (1,024 bytes) |
1 MB (Megabyte) | \(2^{20}\) bytes (1,048,576 bytes) |
1 GB (Gigabyte) | \(2^{30}\) bytes (1,073,741,824 bytes) |
1 TB (Terabyte) | \(2^{40}\) bytes (1,099,511,627,776 bytes) |
1 PB (Petabyte) | \(2^{50}\) bytes (1,125,899,906,842,624 bytes) |
1 EB (Exabyte) | \(2^{60}\) bytes (1,152,921,504,606,846,976 bytes) |
Data Transfer Units
Unit | Value |
---|---|
1 Kilobit (Kb) | 1,000 bits |
1 Megabit (Mb) | 1,000,000 bits |
1 Gigabit (Gb) | 1,000,000,000 bits |
1 Terabit (Tb) | 1,000,000,000,000 bits |
Character Representation
ASCII
ASCII is a character encoding standard that defines 7-bit codes for characters; characters are typically stored in an 8-bit byte, where the eighth bit historically served as a parity bit for error detection, ensuring either an even number of 1s (even parity) or an odd number of 1s (odd parity) in the complete byte.
Types of ASCII Characters
- Non-printable characters
  - Control characters, from `0x00` to `0x1F` (i.e. \(0_{10}\) to \(31_{10}\))
  - Delete character, `0x7F` (i.e. \(127_{10}\))
- Printable characters, from `0x20` to `0x7E` (\(32_{10}\) to \(126_{10}\)).
ASCII Table
Hex | Char | Description |
---|---|---|
00 | NUL | Null character |
01 | SOH | Start of Heading |
02 | STX | Start of Text |
03 | ETX | End of Text |
04 | EOT | End of Transmission |
05 | ENQ | Enquiry |
06 | ACK | Acknowledge |
07 | BEL | Bell |
08 | BS | Backspace |
09 | HT | Horizontal Tab |
0A | LF | Line Feed |
0B | VT | Vertical Tab |
0C | FF | Form Feed |
0D | CR | Carriage Return |
0E | SO | Shift Out |
0F | SI | Shift In |
10 | DLE | Data Link Escape |
11 | DC1 | Device Control 1 (XON) |
12 | DC2 | Device Control 2 |
13 | DC3 | Device Control 3 (XOFF) |
14 | DC4 | Device Control 4 |
15 | NAK | Negative Acknowledge |
16 | SYN | Synchronous Idle |
17 | ETB | End of Transmission Block |
18 | CAN | Cancel |
19 | EM | End of Medium |
1A | SUB | Substitute |
1B | ESC | Escape |
1C | FS | File Separator |
1D | GS | Group Separator |
1E | RS | Record Separator |
1F | US | Unit Separator |
20 | SP | Space |
21 | ! | Exclamation mark |
22 | " | Double quote |
23 | # | Number sign |
24 | $ | Dollar sign |
25 | % | Percent sign |
26 | & | Ampersand |
27 | ' | Apostrophe |
28 | ( | Left parenthesis |
29 | ) | Right parenthesis |
2A | * | Asterisk |
2B | + | Plus sign |
2C | , | Comma |
2D | - | Hyphen-minus |
2E | . | Period |
2F | / | Slash |
30 | 0 | Digit 0 |
31 | 1 | Digit 1 |
32 | 2 | Digit 2 |
33 | 3 | Digit 3 |
34 | 4 | Digit 4 |
35 | 5 | Digit 5 |
36 | 6 | Digit 6 |
37 | 7 | Digit 7 |
38 | 8 | Digit 8 |
39 | 9 | Digit 9 |
3A | : | Colon |
3B | ; | Semicolon |
3C | < | Less than |
3D | = | Equal sign |
3E | > | Greater than |
3F | ? | Question mark |
40 | @ | At sign |
41 | A | Uppercase A |
42 | B | Uppercase B |
43 | C | Uppercase C |
44 | D | Uppercase D |
45 | E | Uppercase E |
46 | F | Uppercase F |
47 | G | Uppercase G |
48 | H | Uppercase H |
49 | I | Uppercase I |
4A | J | Uppercase J |
4B | K | Uppercase K |
4C | L | Uppercase L |
4D | M | Uppercase M |
4E | N | Uppercase N |
4F | O | Uppercase O |
50 | P | Uppercase P |
51 | Q | Uppercase Q |
52 | R | Uppercase R |
53 | S | Uppercase S |
54 | T | Uppercase T |
55 | U | Uppercase U |
56 | V | Uppercase V |
57 | W | Uppercase W |
58 | X | Uppercase X |
59 | Y | Uppercase Y |
5A | Z | Uppercase Z |
5B | [ | Left square bracket |
5C | \ | Backslash |
5D | ] | Right square bracket |
5E | ^ | Caret |
5F | _ | Underscore |
60 | ` | Grave accent |
61 | a | Lowercase a |
62 | b | Lowercase b |
63 | c | Lowercase c |
64 | d | Lowercase d |
65 | e | Lowercase e |
66 | f | Lowercase f |
67 | g | Lowercase g |
68 | h | Lowercase h |
69 | i | Lowercase i |
6A | j | Lowercase j |
6B | k | Lowercase k |
6C | l | Lowercase l |
6D | m | Lowercase m |
6E | n | Lowercase n |
6F | o | Lowercase o |
70 | p | Lowercase p |
71 | q | Lowercase q |
72 | r | Lowercase r |
73 | s | Lowercase s |
74 | t | Lowercase t |
75 | u | Lowercase u |
76 | v | Lowercase v |
77 | w | Lowercase w |
78 | x | Lowercase x |
79 | y | Lowercase y |
7A | z | Lowercase z |
7B | { | Left curly brace |
7C | \| | Vertical bar |
7D | } | Right curly brace |
7E | ~ | Tilde |
7F | DEL | Delete |
Unicode
Unicode is a character set standard that assigns a unique hexadecimal identifier, called a code point, to every character across all writing systems. These code points follow the format `U+<....>`.
Code Point Categories
- Surrogate code points: Code points in the range `U+D800` to `U+DFFF`, which are reserved for use in UTF-16 encoding
- Scalar values: All other Unicode code points (excluding surrogate code points)
Character Encoding Standards
Unicode defines three primary character encoding standards for converting code points into bytes for storage and transmission. These standards use code units as their basic storage elements to represent characters.
- UTF-8
- Code unit size: 8 bits (1 byte)
- Character representation: 1 to 4 code units per character
- Encoding type: Variable-length encoding
- UTF-16
- Code unit size: 16 bits (2 bytes)
- Character representation: 1 code unit for characters in the Basic Multilingual Plane, or 2 code units for characters outside it; these 2 code units are built from surrogate code points and are called a surrogate pair
- UTF-32
- Code unit size: 32 bits (4 bytes)
- Character representation: Exactly 1 code unit per character
Grapheme clusters
A grapheme cluster represents what users typically think of as a "character", that is, the smallest unit of written language that has semantic meaning.
Example
The grapheme clusters in the Hindi word "क्षत्रिय"
are ["क्ष", "त्रि", "य"]
, where each cluster can comprise multiple Unicode code points. While "य"
is a single code point, "क्ष"
is a conjunct consonant formed from three code points: the base character "क"
(U+0915
), the virama "्"
(U+094D
), and "ष"
(U+0937
), which combine to create one visual unit. The cluster "त्रि"
is even more complex, consisting of four code points: "त"
(U+0924
), "्"
(U+094D
), "र"
(U+0930
), and the vowel sign "ि"
(U+093F
), where the vowel mark appears visually before the consonant cluster despite following it in the Unicode sequence.
The example above demonstrates why simply counting Unicode code points doesn't always correspond to what users perceive as individual characters, making grapheme cluster boundaries crucial for correct string processing.
Integer Representation
Unsigned Integer Representation
Unsigned integers use all available bits to represent positive values (including zero). No bit is reserved for a sign, allowing for a larger range of positive values compared to signed integers of the same bit width.
An \(N\)-bit unsigned integer can take up values from \(0\) to \(2^N - 1\).
Signed Integer Representation
Signed integers reserve one bit (typically the MSB) to indicate sign, reducing the range of representable positive values but enabling representation of negative numbers.
Two's Complement
Two's complement is the most common encoding system to represent signed integers. It treats the MSB exclusively as a sign bit:
- MSB = 0: positive number (or zero)
- MSB = 1: negative number
An \(N\)-bit signed integer using two's complement can take values from \(-2^{N-1}\) to \(2^{N-1} - 1\).
To represent a number using two's complement:
- Start with the binary representation of the positive number
- Toggle all the bits
- Add 1 to the result
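A worked example (representing \(-5\) as an 8-bit two's complement value; Python integers are arbitrary-precision, so the result is masked to 8 bits):
positive = 0b0000_0101                  # step 1: binary representation of +5
toggled = positive ^ 0b1111_1111        # step 2: toggle all 8 bits -> 0b1111_1010
twos_complement = (toggled + 1) & 0xFF  # step 3: add 1             -> 0b1111_1011

# Interpreting the bit pattern as a signed 8-bit value gives back -5:
# twos_complement - 2**8 == -5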
Note
When converting from binary to decimal, if the MSB is a 1, treat that digit as \(-1\) (so it contributes \(-2^{N-1}\)), then proceed as discussed in the "Any Base to Decimal" section.
Floating-Point Representation
IEEE 754 Structure
Single Precision (32-bit)
| Sign | Biased Exponent | Normalized Mantissa |
| 1 | 8 | 23 |
Double Precision (64-bit)
| Sign | Biased Exponent | Normalized Mantissa |
| 1 | 11 | 52 |
Fields
- Sign: Represents the sign of the floating-point number using signed magnitude encoding (0 = positive, 1 = negative)
- Normalized Mantissa (Significand): Represents the significant digits of the number. The mantissa is normalized, meaning we move the decimal point so it falls in the range \([1.0, 2.0)\), but we implicitly store the 1, and explicitly store the digits after the decimal point (i.e., the fractional part).
- Biased Exponent: Determines the power of 2 by which to scale the mantissa. The exponent is biased, meaning we add a constant bias to the actual exponent before storing it. For single precision, the bias is \(127_{10}\); for double precision, the bias is \(1023_{10}\)
Note
The mantissa is actually 24 bits total in single precision, if we count the implicit number before the decimal point, and 53 bits total in double precision due to the implicit leading number.
Converting Decimal to IEEE 754
To convert \(12.375_{10}\) to single precision IEEE 754:
Step 1: Convert to binary
\(12.375_{10} = 1100.011_2\)
Step 2: Determine the sign
Since it's a positive number, the sign bit is 0.
Step 3: Normalize the mantissa
\(1100.011_2 = 1.100011_2 \times 2^3\)
The normalized mantissa (fractional part only), padded to 23 bits, is: \(10001100000000000000000_2\)
Step 4: Calculate the biased exponent
Biased exponent = actual exponent + bias = \(3 + 127 = 130 = 10000010_2\)
Step 5: Combine all fields
| Sign | Exponent | Mantissa |
| 0 | 10000010 | 10001100000000000000000 |
Final result: \(01000001010001100000000000000000_2\)
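A quick way to sanity-check this result (using Python's `struct` module to reinterpret the bit pattern of the 32-bit float):
import struct

bits, = struct.unpack(">I", struct.pack(">f", 12.375))
assert bits == 0b01000001010001100000000000000000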
Special Numbers
Denormalized Numbers
When calculations produce results that are too small for normal representation (i.e., below \(1.0 \times 2^{1 - 127}\) for single-precision and \(1.0 \times 2^{1 -1023}\) for double-precision) but are non-zero, we use denormalized representation. This is achieved by:
- Setting the exponent field to all zeros (which corresponds to using the fixed exponent \(1 -127\) for single-precision or \(1 - 1023\) for double-precision)
- Using an implicit leading 0 instead of 1 before the decimal point in the mantissa
- Calculating the mantissa such that \(0.\text{mantissa} \times 2^{1 -127}\) (or \(2^{1 - 1023}\) for double-precision) equals the initial number's binary scientific notation.
Infinity
When calculations exceed the representable range (overflow) or a non-zero value is divided by zero, the result is infinity. The exponent field becomes all ones and the mantissa is all zeros. The sign bit determines positive or negative infinity.
This allows programs to continue running with meaningful results rather than crashing on overflow.
NaN (Not a Number)
Represents mathematically undefined results. Like infinity, the exponent is all ones, but now the mantissa is non-zero. Any operation with NaN produces NaN, and NaN is never equal to anything, including itself. It also has two types:
- Quiet NaN: Silently propagates through calculations
- Signaling NaN: Triggers an exception/error when encountered in operations
Integer Overflow
Integer overflow occurs when you try to store a number that exceeds the maximum value that can be represented with the available number of bits, causing the result to wrap around to an unexpected value.
The overflowed result can be found as follows:
- Unsigned integer: \(\text{(expected value)} \bmod 2^N\)
- Signed integer:
  - Positive overflow (i.e., \(\text{expected value} > 2^{N-1} - 1\)): \(\text{expected value} - 2^N\)
  - Negative overflow (i.e., \(\text{expected value} < -2^{N-1}\)): \(\text{expected value} + 2^N\)
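A small sketch of the wraparound rules (simulating 8-bit integers in Python, whose native integers never overflow):
def wrap_unsigned_8(expected):
    return expected % 2**8

def wrap_signed_8(expected):
    wrapped = expected % 2**8
    return wrapped - 2**8 if wrapped > 2**7 - 1 else wrapped

# wrap_unsigned_8(255 + 1) -> 0
# wrap_signed_8(127 + 1) -> -128 (positive overflow wraps around to the minimum)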
Integer Byte Order
Byte order, also known as the endianness of a system, defines how the bytes of multi-byte values are assigned to memory addresses.
- Big endian Byte Order: The most significant byte is stored at the lowest memory address.
- Little endian Byte Order: The least significant byte is stored at the lowest memory address.
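A small illustration (using Python's `struct` module; the value `0x12345678` is arbitrary):
import struct

value = 0x12345678
struct.pack(">I", value).hex()  # big endian:    '12345678'
struct.pack("<I", value).hex()  # little endian: '78563412'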
struct
Packing
Background
Alignment refers to the set of memory addresses where a value is permitted to start. A type's natural alignment is the default alignment, determined by the platform, that enables efficient memory access (i.e. single-instruction loads & stores). Without natural alignment, a value might be placed such that it straddles machine-word boundaries. As a result, the CPU may need to perform multiple memory accesses and combine the pieces in registers, which increases the cost of reading or writing the value. In Rust, you can check a type's alignment using std::mem::align_of
.
For data structures, notably structs or unions, the compiler often inserts padding — extra unused bytes — to ensure each field begins at an address satisfying its own alignment and/or the alignment of the entire data structure. Padding added between fields to align them properly is called internal field padding, while padding added at the end to satisfy overall alignment requirements is called trailing padding.
The data structure size is the total number of bytes the structure occupies in memory, including both actual data and any inserted padding. You can retrieve this size using std::mem::size_of
.
We say a type is self-aligned when its alignment equals its size.
The following table lists common types along with their natural alignment and size (in bytes) on a x86-64 system.
Type | Natural Alignment for x86-64 systems (bytes) | Size (bytes) |
---|---|---|
Primitive | Memory address divisible by the type's size | Depends on the type ( bool = 1, u8 / i8 = 1, u16 / i16 = 2, char = 4, f32 / i32 / u32 = 4, f64 / i64 / u64 = 8, u128 / i128 = 16, usize / isize = determined by the maximum addressable memory on a given architecture (i.e. 8 bytes on 64-bit systems, 4 bytes on 32-bit systems)) |
Array / Slice | Memory address divisible by the element type's natural alignment | $\text{element size} \cdot \text{number of elements in the array}$ |
Pointer | Memory address divisible by the pointer size | Determined by the maximum addressable memory on a given architecture (i.e. 8 bytes on 64-bit systems, 4 bytes on 32-bit systems) |
Struct | Maximum natural alignment among all fields | $\text{size of fields} + \text{internal field padding} + \text{trailing padding}$ |
Note
In C, the order of fields of a struct is preserved exactly as written, while in Rust, the compiler may reorder fields to reduce padding unless you use the `#[repr(C)]` attribute.
Field Re-Ordering
A common technique to pack a struct is re-ordering fields by decreasing natural alignment, since a field with larger alignment guarantees that any following smaller-alignment field will already be properly aligned:
#[repr(C)]
struct DefaultStruct {
    c: u8,         // 1 byte
                   // 7 bytes of field padding
    p: *const u8,  // 8 bytes
    x: i16,        // 2 bytes
                   // 6 bytes of trailing padding
} // total size: 24 bytes
#[repr(C)]
struct PackedStruct {
    p: *const u8,  // 8 bytes
    x: i16,        // 2 bytes
    c: u8,         // 1 byte
                   // 5 bytes of trailing padding
} // total size: 16 bytes
Another common trick is to group fields of the same alignment and same size together (on top of applying the trick above):
#[repr(C)]
struct DefaultStruct {
    a: i32,   // 4 bytes
    c1: u8,   // 1 byte
              // 3 bytes of field padding
    b: i32,   // 4 bytes
    c2: u8,   // 1 byte
              // 1 byte of field padding
    s1: i16,  // 2 bytes
    s2: i16,  // 2 bytes
              // 2 bytes of trailing padding
} // total size: 20 bytes
#[repr(C)]
struct PackedStruct {
    a: i32,   // 4 bytes
    b: i32,   // 4 bytes
    s1: i16,  // 2 bytes
    s2: i16,  // 2 bytes
    c1: u8,   // 1 byte
    c2: u8,   // 1 byte
              // 2 bytes of trailing padding
} // total size: 16 bytes
Compiler Struct Packing
In Rust, you can add an attribute to force the compiler to lay out struct fields back-to-back with no padding, thereby breaking natural alignment:
#[repr(packed)]
struct PackedStruct {
    a: u8,
    b: i32,
    c: f64,
}
Note
Accessing unaligned fields in packed structs may lead to significantly slower code or even cause undefined behaviour on some platforms which can't handle unaligned access at all.
Bit Packing
Bit Fields
Bit fields represent a technique for using integers as structured data containers by partitioning them into consecutive segments of bits, with each segment storing a distinct field. This approach involves a predetermined number of fields arranged within the integer's bit pattern.
pub struct BitField(u32);

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum FieldId {
    Field1,
    Field2,
    Field3,
    Field4,
    Field5,
}

#[derive(Debug, PartialEq)]
pub enum BitFieldError {
    InvalidField,
    ValueTooLarge(u32),
}

impl BitField {
    // Overall visualization:
    // | 31 30 29 28 | 27 ... 18 | 17 ... 12 | 11 10 9 8 | 7 ... 0 |
    // +-------------+-----------+-----------+-----------+---------+
    // |   Field5    |  Field4   |  Field3   |  Field2   | Field1  |
    // |   4 bits    |  10 bits  |  6 bits   |  4 bits   | 8 bits  |

    // bits 0 to 7, possible values from 0 to 255
    const FIELD1_START: u32 = 0;
    const FIELD1_WIDTH: u32 = 8;
    // bits 8 to 11, possible values from 0 to 15
    const FIELD2_START: u32 = 8;
    const FIELD2_WIDTH: u32 = 4;
    // bits 12 to 17, possible values from 0 to 63
    const FIELD3_START: u32 = 12;
    const FIELD3_WIDTH: u32 = 6;
    // bits 18 to 27, possible values from 0 to 1023
    const FIELD4_START: u32 = 18;
    const FIELD4_WIDTH: u32 = 10;
    // bits 28 to 31, possible values from 0 to 15
    const FIELD5_START: u32 = 28;
    const FIELD5_WIDTH: u32 = 4;

    pub const fn new() -> Self {
        Self(0)
    }

    pub fn set_field(&mut self, field: FieldId, value: u32) -> Result<(), BitFieldError> {
        let (start, width) = Self::get_field_info(field);
        let max_value = Self::get_max_value(width);
        if value > max_value {
            return Err(BitFieldError::ValueTooLarge(value));
        }
        let mask = Self::get_field_mask(start, width);
        self.0 = (self.0 & !mask) | ((value << start) & mask);
        Ok(())
    }

    pub fn get_field(&self, field: FieldId) -> u32 {
        let (start, width) = Self::get_field_info(field);
        let mask = Self::get_field_mask(start, width);
        (self.0 & mask) >> start
    }

    pub fn clear_field(&mut self, field: FieldId) -> Result<(), BitFieldError> {
        self.set_field(field, 0)
    }

    fn get_field_info(field: FieldId) -> (u32, u32) {
        match field {
            FieldId::Field1 => (Self::FIELD1_START, Self::FIELD1_WIDTH),
            FieldId::Field2 => (Self::FIELD2_START, Self::FIELD2_WIDTH),
            FieldId::Field3 => (Self::FIELD3_START, Self::FIELD3_WIDTH),
            FieldId::Field4 => (Self::FIELD4_START, Self::FIELD4_WIDTH),
            FieldId::Field5 => (Self::FIELD5_START, Self::FIELD5_WIDTH),
        }
    }

    fn get_field_mask(start: u32, width: u32) -> u32 {
        ((1u32 << width) - 1) << start
    }

    fn get_max_value(width: u32) -> u32 {
        (1u32 << width) - 1
    }
}
Bit Flags
Bit Flags is a technique where individual bits within an integer act as separate boolean toggles, each indicating whether a specific condition, feature, or option is active or inactive. The number of flags is generally predetermined and typically occupies only a portion of the available integer bits.
pub struct BitFlags(u8);

#[derive(Debug, PartialEq)]
pub enum BitFlagsError {
    InvalidFlag(u8),
}

impl BitFlags {
    pub const FLAG1: u8 = 0x01;
    pub const FLAG2: u8 = 0x02;
    pub const FLAG3: u8 = 0x04;
    pub const FLAG4: u8 = 0x08;

    const VALID_FLAGS: u8 = Self::FLAG1 | Self::FLAG2 | Self::FLAG3 | Self::FLAG4;

    pub const fn new() -> Self {
        Self(0)
    }

    pub fn set(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 |= flags;
        Ok(())
    }

    pub fn clear(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 &= !flags;
        Ok(())
    }

    pub fn toggle(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 ^= flags;
        Ok(())
    }

    pub fn has_any(&self, flags: u8) -> Result<bool, BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        Ok((self.0 & flags) != 0)
    }
}
Struct of Arrays (SoA)
Background
Struct of Arrays is a data layout where you store each field in its own contiguous array (can be as the struct's field, a global variable, or a local variable), which provides:
- Low memory usage, since it eliminates the per-element padding of an Array of Structs (AoS) layout; in a SoA layout you only pay padding once per array field
- Better cache locality
- Good memory bandwidth utilization for batched code (i.e. if you have a loop that processes several objects but only accesses a few of the fields, then the SoA layout reduces the amount of data that needs to be loaded).
struct MySoA {
    field1: Vec<Type1>,
    field2: Vec<Type2>,
    // ...
    fieldN: Vec<TypeN>,
}
Example: Polymorphism using SoA and AoS
AoS with Enum-based Polymorphism
struct Shape {
    x: f32,
    y: f32,
    kind: ShapeKind,
}

enum ShapeKind {
    Circle { radius: f32 },
    Rectangle { width: f32, height: f32 },
    Triangle { base: f32, height: f32 },
}
AoS with Subclass-based (via Composition) Polymorphism
struct Shape {
    tag: Tag,
    x: f32,
    y: f32,
}

enum Tag {
    Circle,
    Rectangle,
    Triangle,
}

struct Circle {
    base: Shape,
    radius: f32,
}

struct Rectangle {
    base: Shape,
    width: f32,
    height: f32,
}

struct Triangle {
    base: Shape,
    base_len: f32,
    height: f32,
}
Hybrid AoS and SoA Variant-Encoding-based Polymorphism
struct Shape {
    tag: Tag,
    x: f32,
    y: f32,
    extra_payload_idx: u32,
}

enum Tag {
    Circle,
    Rectangle,
    Triangle,
}

struct CirclePayload {
    radius: f32,
}

struct RectanglePayload {
    width: f32,
    height: f32,
}

struct TrianglePayload {
    base: f32,
    height: f32,
}

// Layout at runtime:
// - One Vec<Shape>
// - Additional Vec<CirclePayload>, Vec<RectanglePayload>, and/or Vec<TrianglePayload>
//   indexed by Shape.extra_payload_idx as required

// Example of accessing the extra data:
match shapes[i].tag {
    Tag::Circle => circle_payloads[shapes[i].extra_payload_idx].radius,
    Tag::Rectangle => rectangle_payloads[shapes[i].extra_payload_idx],
    Tag::Triangle => triangle_payloads[shapes[i].extra_payload_idx],
}
SIMD Data Parallelism
Background
Basics
Single Instruction, Multiple Data (SIMD) is a form of data parallelism in which a single instruction operates on multiple data values simultaneously. This is achieved using special registers called vector registers, or vectors for short, which hold multiple values at once. Operations performed on these registers are called vector operations. Each individual value within a vector register is referred to as a lane.
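As a small illustration of lanes and vector operations (a sketch using the nightly-only portable SIMD API discussed later in this section), adding two 4-lane vectors performs four element-wise additions with a single vector operation:

```rust
#![feature(portable_simd)]

use std::simd::f32x4;

fn main() {
    // Each vector holds 4 lanes of f32.
    let a = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
    let b = f32x4::splat(10.0);

    // One vector operation adds all 4 lanes at once.
    let c = a + b;
    assert_eq!(c.to_array(), [11.0, 12.0, 13.0, 14.0]);
}
```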
SIMD Instruction Set Extensions
SIMD instruction set extensions provide additional support for vector operations using vector registers. The instruction set architecture defines which SIMD instruction set extensions are possible, but it is up to the specific processor implementation to decide which of those extensions to support, and which vector widths of the chosen SIMD extension it will implement. As a result, different CPUs using the same ISA may support different SIMD extensions and vector widths. The most common SIMD instruction set extensions for x86-64 are SSE (SSE 1 to 4), AVX, AVX2, and the most recent, AVX-512, while for AArch64 it is NEON, with more advanced optional extensions like SVE and SVE2. Note, however, that native-width vector registers are not always the right choice; some algorithms are better suited to non-native vector widths (see Vector Width and Performance below).
ISA | SIMD Instruction Set Extension | First-Class Native Vector Widths |
---|---|---|
x86-64 | SSE, SSE2, SSE3, SSE4.1, SSE4.2 | 128 bits (fixed) |
x86-64 | AVX | 256 bits (fixed, float only) |
x86-64 | AVX2 | 256 bits (fixed, int + float) |
x86-64 | AVX-512 | 512 bits (fixed) |
AArch64 | NEON | 64 bits (fixed), 128 bits (fixed) |
AArch64 | SVE | 128–2048 bits (scalable in 128-bit increments) |
AArch64 | SVE2 | 128–2048 bits (scalable in 128-bit increments, more types) |
Note
Each generation of x86-64 SIMD extensions builds upon the previous one. When using narrower vector operations from older instruction sets, they operate on the lower portion of the wider registers introduced in newer extensions.
Note
To check what SIMD instruction set extensions your CPU supports, run `lscpu` and look at the `Flags` output.
Vector Width and Performance
When writing vectorized code, you have two main approaches: use a fixed, non-native vector width, or adapt to the CPU's native vector width. If you choose a fixed vector width that's smaller than the native registers by using older instruction sets, the hardware will still use its full-width registers but leave some lanes empty, wasting the processor's parallel potential. If you choose a fixed vector width that's larger than the native registers, the system must combine multiple native registers to simulate the larger width. When too many registers are needed simultaneously, values may spill (i.e., the compiler runs out of physical registers and must store data in much slower memory instead).
Rust API
Portable SIMD
Note
Portable SIMD in Rust requires the Rust nightly compiler. To install it, run `rustup toolchain install nightly`, then temporarily set it as the default toolchain via `rustup default nightly`.
Vendor SIMD Intrinsics
Note
Consider using Dynamic CPU Feature Detection instead, since this approach allows:
- The same binary to work on all CPUs
- Automatic use of the fastest available instructions
- Users to avoid worrying about compatibility issues
In contrast, the static CPU feature detection approach (using `RUSTFLAGS`) bakes specific SIMD instruction set extensions into the entire binary, which will crash on CPUs that don't support those features.
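Below is a minimal sketch of what dynamic CPU feature detection can look like on x86-64, using the standard library's `is_x86_feature_detected!` macro and the `#[target_feature]` attribute; the `sum`, `sum_avx2`, and `sum_scalar` functions are illustrative names, not part of any particular API.

```rust
fn sum(values: &[f32]) -> f32 {
    // Runtime check: only take the AVX2 path if this CPU supports it.
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: only called after confirming AVX2 support at runtime.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}

fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().sum()
}

// AVX2 is enabled for this function only; the rest of the binary
// remains compatible with older CPUs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    // The body may use AVX2 intrinsics or rely on autovectorization;
    // a plain sum keeps the sketch short.
    values.iter().sum()
}
```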
Example
Description
Given a slice of bytes `haystack` and a byte `needle`, find the first occurrence of `needle` in `haystack`.
Scalar Solution
```rust
fn scalar_find(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```
SIMD Solution
Portable SIMD Solution
```rust
#![feature(portable_simd)]

use std::simd::cmp::SimdPartialEq;
use std::simd::u8x32;

pub fn portable_simd_find(haystack: &[u8], needle: u8) -> Option<usize> {
    const VECTOR_WIDTH: usize = 32;

    // For short strings (less than 32 bytes), use simple iteration instead of SIMD
    if haystack.len() < VECTOR_WIDTH {
        return haystack.iter().position(|&b| b == needle);
    }

    // Perform SIMD on strings of at least 32 bytes.
    //
    // For every 32-byte chunk of the haystack, we create two SIMD vectors: one for the 32-byte
    // chunk of the haystack we're working on, `haystack_vec`, and one 32-byte SIMD vector
    // containing 32 copies of `needle`, `needle_vec`, so that we can compare each byte against
    // the needle in parallel using `simd_eq`, producing a mask.
    //
    // This SIMD mask is a vector of booleans, like:
    //   [false, false, false, true, false, ...]
    //
    // We convert this mask into a compact bitmask with `to_bitmask()`, which gives us
    // a u64 where each bit corresponds to an entry in the SIMD mask:
    //
    // - A `true` in the SIMD mask becomes a `1` in the bitmask.
    // - A `false` becomes a `0`.
    //
    // Importantly: the first element of the mask (index 0) maps to the least significant bit
    // (bit 0) in the bitmask, mask[1] maps to bit 1, and so on. This means
    // `bitmask.trailing_zeros()` directly gives us the index of the first `true` in the mask.
    //
    // So if the bitmask is, say, 0b00001000, then the first match was at index 3.
    //
    // We then add the chunk offset to get the actual index in the full haystack.
    let needle_vec = u8x32::splat(needle);
    let mut offset = 0;

    while offset + VECTOR_WIDTH <= haystack.len() {
        let chunk = &haystack[offset..offset + VECTOR_WIDTH];
        let haystack_vec = u8x32::from_slice(chunk);

        let mask = haystack_vec.simd_eq(needle_vec);
        let bitmask = mask.to_bitmask();

        if bitmask != 0 {
            return Some(offset + bitmask.trailing_zeros() as usize);
        }

        offset += VECTOR_WIDTH;
    }

    // Handle remaining bytes (less than 32) that couldn't be processed with SIMD
    haystack[offset..]
        .iter()
        .position(|&b| b == needle)
        .map(|pos| offset + pos)
}
```
Intrinsics (NEON) SIMD Solution
```rust
// NEON intrinsics live in std::arch::aarch64 and are only available when
// compiling for AArch64.
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*;

#[cfg(target_arch = "aarch64")]
pub fn intrinsics_simd_find(haystack: &[u8], needle: u8) -> Option<usize> {
    const VECTOR_WIDTH: usize = 16;

    // For short strings (less than 16 bytes), use simple iteration instead of SIMD
    if haystack.len() < VECTOR_WIDTH {
        return haystack.iter().position(|&b| b == needle);
    }

    // Process strings of 16+ bytes using SIMD instructions
    //
    // SIMD Strategy:
    // 1. Load 16 bytes of haystack into a SIMD vector
    // 2. Create a SIMD vector filled with 16 copies of the needle byte
    // 3. Compare all 16 positions simultaneously using vceqq_u8
    // 4. This produces a mask where matching positions become 0xFF, non-matches become 0x00
    //
    // Example mask: [0x00, 0x00, 0xFF, 0x00, 0xFF, 0x00, ...]
    //               (no match, no match, MATCH, no match, MATCH, no match, ...)
    //
    // Fast Detection:
    // - Use vmaxvq_u8 to find the maximum value in the mask
    // - If result is 0, no matches found (all bytes were 0x00)
    // - If result is non-zero, at least one match exists (at least one 0xFF present)
    //
    // Finding Exact Position:
    // When matches are found, we need to determine their exact locations:
    // 1. Reinterpret the 16-byte mask as two 64-bit integers for easier processing
    // 2. Check the lower 64 bits (bytes 0-7) and upper 64 bits (bytes 8-15) separately
    // 3. Within each 64-bit section, examine each byte by shifting and masking
    //    - Right-shift by (i * 8) bits to move byte i to the lowest position
    //    - Apply mask 0xFF to isolate just that byte
    //    - Check if the result equals 0xFF (indicating a match)
    //
    // Importantly: we use right bit shifts because of little-endian byte ordering: the byte
    // from haystack position 0 gets packed into the least significant (rightmost) bits of the
    // 64-bit integer, position 1 goes into the next 8 bits, and so on. By shifting right by
    // `i * 8` bits, we move the byte from haystack position `i` into the least significant
    // position where we can extract it with a simple mask.
    let needle_vec = unsafe { vdupq_n_u8(needle) };
    let mut offset = 0;

    while offset + VECTOR_WIDTH <= haystack.len() {
        unsafe {
            let haystack_ptr = haystack.as_ptr().add(offset);
            let haystack_vec = vld1q_u8(haystack_ptr);
            let byte_mask = vceqq_u8(haystack_vec, needle_vec);

            if vmaxvq_u8(byte_mask) != 0 {
                let mask_as_u64x2 = vreinterpretq_u64_u8(byte_mask);
                let low_bits = vgetq_lane_u64::<0>(mask_as_u64x2);
                let high_bits = vgetq_lane_u64::<1>(mask_as_u64x2);

                if low_bits != 0 {
                    for i in 0..8 {
                        if ((low_bits >> (i * 8)) & 0xFF) == 0xFF {
                            return Some(offset + i);
                        }
                    }
                } else {
                    for i in 0..8 {
                        if ((high_bits >> (i * 8)) & 0xFF) == 0xFF {
                            return Some(offset + 8 + i);
                        }
                    }
                }
            }
        }

        offset += VECTOR_WIDTH;
    }

    // Handle remaining bytes (less than 16) that couldn't be processed with SIMD
    haystack[offset..]
        .iter()
        .position(|&b| b == needle)
        .map(|pos| offset + pos)
}
```
Inline Assembly Solution
TO DO
Git
Setup
`git init <directory>`
Create an empty Git repository in `<directory>`.
Cloning & Remote
`git clone <repo-url>`
Clone the repository at `<repo-url>` onto the local machine.
`git remote -v`
Show current remotes and their fetch/push URLs.
`git remote add <remote> <repo-url>`
Add a new remote named `<remote>` pointing to `<repo-url>`.
Staging & Committing
`git add <file-or-directory>`
Stage changes in `<file-or-directory>` for the next commit.
`git commit -m "<message>"`
Commit staged changes using `<message>` as the commit message.
`git status`
List which files are staged, unstaged, and untracked.
Stashing
`git stash`
Temporarily save uncommitted changes.
`git stash pop`
Re-apply the most recently stashed changes and remove them from the stash.
Branching
`git branch`
List all local branches and highlight the current one.
`git branch <branch>`
Create a new branch named `<branch>`.
`git switch <branch>`
Switch to the branch `<branch>`.
`git switch -c <branch>`
Create and switch to a new branch named `<branch>`.
Pushing & Pulling
`git push <remote> <branch>`
Push committed changes to `<remote>` on `<branch>`. Creates `<branch>` on the remote if it doesn't exist.
`git pull <remote> <branch>`
Fetch and merge changes from `<branch>` on `<remote>` into the current branch.
`git pull --rebase <remote> <branch>`
Fetch changes from `<branch>` on `<remote>` and rebase the current branch on top of them.
Viewing History
`git log --oneline --graph --all`
Show a condensed visual commit history of all branches.
`git diff`
Show unstaged changes between the working directory and the index.
Non-destructive Changes
TO DO
Destructive Changes
`git commit --amend`
With nothing staged, edit the last commit's message.