Asymptotic Analysis
Bachmann-Landau Notation
Definition (big O)
Let \(f(n)\) and \(g(n)\) be functions mapping positive integers to positive real numbers. We say that \(f(n)\) is \(O(g(n))\) if there is a real constant \(c \gt 0\) and an integer constant \(n_0 \ge 1\) such that $$ f(n) \leq c \cdot g(n), \text{ for } n \geq n_0 $$
Simplification Rules
- Ignore non-dominant terms
- Ignore constant factors
Growth Rate Classifications
In \(T_{case}(n) = f(n) \in O(g(n))\), \(g(n)\) is one of the following:
Growth type | Function |
---|---|
Constant | \(1\) |
Logarithmic | \(\log n\) |
Linear | \(n\) |
Linearithmic | \(n \log n\) |
Quadratic | \(n^2\) |
Cubic | \(n^3\) |
Exponential | \(2^n\) |
Factorial | \(n!\) |
Worst Case Time Complexity
Iterative Algorithms
Complexity of Primitive Operations
The following primitive operations take \(O(1)\):
- Declaring a variable
- Assigning a value to a variable
- Following an object reference
- Performing an arithmetic operation
- Comparing two numbers
- Accessing a single element of an array by index
- Accessing the length of an array
- Calling a method
- Returning from a method
Complexity of Loops
The complexity of a loop is expressed as the sum of the complexities of each iteration of the loop. Whenever there are nested loops, begin by calculating the innermost loop and proceed outwards, which will give nested summations.
Some useful summation properties and formulas:
Property (big O and summation) $$ \sum_{i \in I} O(f(i)) = O\left(\sum_{i \in I} f(i)\right) $$
Property (summation of a constant) $$ \sum_{i = m}^{n} ca_i = c \sum_{i = m}^{n} a_i $$
Property (addition and subtraction of summations) $$ \sum_{i = m}^{n} (a_i \pm b_i) = \sum_{i = m}^{n} a_i \pm \sum_{i = m}^{n} b_i $$
Formula (summation of a \(1\)) $$ \sum_{i = m}^{n} 1 = n - m + 1 $$
Formula (summation of arithmetic series) $$ \sum_{i = 1}^{n} i = \frac{n(n + 1)}{2} $$
Formula (summation of quadratic series) $$ \sum_{i = 1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6} $$
Formula (summation of cubic series) $$ \sum_{i = 1}^{n} i^3 = \left(\frac{n(n + 1)}{2}\right)^2 $$
Formula (summation of \(i^k\) series) $$ \sum_{i = 1}^{n} i^k \approx \frac{1}{k + 1}n^{k + 1} $$
Formula (summation of geometric series) $$ \sum_{i = 1}^{n}ar^{i - 1} = a\left(\frac{r^n - 1}{r - 1}\right) = a\left(\frac{1 - r^n}{1 - r}\right), \text{ where } r > 0 \text{ and } r \neq 1 $$
Formula (summation of harmonic series) $$ \sum_{i = 1}^{n} \frac{1}{i} \approx \ln n + \gamma, \text{ where } \gamma \approx 0.5772 \dots $$
Formula (summation of \(\log_{2}\) series) $$ \sum_{i = 1}^{n} \log_2 i = O\left(\sum_{i = 1}^{n} \log_2 n\right) = O(n\log_2 n) $$
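For instance (a worked example, not tied to any particular algorithm above), a doubly nested loop whose inner loop performs \(O(1)\) work and runs \(i\) times for each outer iteration \(i = 1, \dots, n\) costs $$ \sum_{i = 1}^{n} \sum_{j = 1}^{i} O(1) = O\left(\sum_{i = 1}^{n} i\right) = O\left(\frac{n(n + 1)}{2}\right) = O(n^2) $$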
General Steps
- Calculate the total cost by summing the costs of statements where the statements may be primitive operations or loops
- Apply Bachmann-Landau notation simplification rules
Note
When analyzing the time and space complexity of graph algorithms, the standard formula is \(O(V + E)\), where \(V\) is the number of vertices and \(E\) is the number of edges. However, whenever the state encompasses more than just the vertex position, we need to reframe our analysis:
Standard case: State = vertex position only
- Time complexity: \(O(V + E)\)
- We visit each vertex at most once
State-space case: State = (vertex, additional_constraints)
- Let \(S\) = total number of possible states
- Time complexity: \(O(S + E_{state})\) where \(E_{state}\) is the number of state transitions
- We visit each state at most once, but may revisit vertices under different constraints
Example: In a shortest path problem with fuel constraints on an \(m \times n\) grid:
- State = (grid_position, remaining_fuel)
- \(S = m \times n \times k\) where \(k\) is max fuel
- \(E_{state} \approx 4 \times m \times n \times k\) (each state can transition to ~4 neighbors)
- Time complexity: \(O(S + E_{state}) = O(mnk + 4mnk) = O(mnk)\)
The key insight is to analyze complexity based on unique states processed, not just unique vertices visited.
Recursive Algorithms
- Come up with a recurrence relation of the following form
$$ T(n)= aT\left(\frac{n}{b}\right)+f(n) $$
- \(T(n)\): Time complexity for input size \(n\)
- \(a\): Number of recursive calls
- \(b\): Factor by which the problem size is divided
- \(f(n)\): Time complexity of the work done outside the recursive calls
- Use backward substitution, recursion tree, or master theorem to solve the recurrence relation
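For example (standard illustrations of the recipe, not tied to a specific algorithm in this section): binary search satisfies \(T(n) = T(n/2) + O(1)\), which solves to \(O(\log n)\), while merge sort satisfies \(T(n) = 2T(n/2) + O(n)\), which solves to \(O(n \log n)\); both follow directly from the master theorem.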
Worst Case Space Complexity
Common data structure space complexities:
- Variable or pointer: \(O(1)\)
- Array: \(O(n)\)
- Node-Pointer Data Structure: \(O(n)\)
- Matrices: \(O(m \cdot n)\)
- Adjacency List: \(O(V + E)\)
Iterative Algorithms
- Add up the space complexities of the intermediate data structures
- Apply Bachmann-Landau notation simplification rules
Recursive Algorithms
- Add up the space complexities of the intermediate data structures
- For a recursive function, total cost as follows
$$ \text{Cost} = \text{call stack depth} \cdot \text{cost per call} $$
- Apply Bachmann-Landau notation simplification rules
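For example (a generic illustration): a recursion of depth \(O(\log n)\) that only uses \(O(1)\) of local variables per call costs \(O(\log n)\) auxiliary space, while a recursion of depth \(O(n)\) that allocates an \(O(n)\) array per call costs \(O(n) \cdot O(n) = O(n^2)\).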
Note
The algorithm's input and output are typically excluded from the total space complexity cost.
Algorithmic Patterns
Classic Two Pointers Pattern
Two Pointers Opposite
Use Case
The list input is sorted, and you are looking for a pair (or pairs) that meet a specific condition.
Usage
def two_pointers_opposite(array):
left = 0
right = len(array) - 1
result = 0
while left < right:
# Some logic involving `array[left]` and/or `array[right]`
if <condition>:
left += 1
else:
right -= 1
return result
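# As a concrete instance of the template above (a sketch of a hypothetical
# pair-sum question on a sorted array; the name and inputs are illustrative):
def pair_with_target_sum(array, target):
    """Return True if some pair in the sorted `array` sums to `target`."""
    left = 0
    right = len(array) - 1
    while left < right:
        pair_sum = array[left] + array[right]
        if pair_sum == target:
            return True
        if pair_sum < target:
            left += 1   # need a larger sum
        else:
            right -= 1  # need a smaller sum
    return False

# pair_with_target_sum([1, 3, 4, 6, 8], 10) -> True (4 + 6)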
Two Pointers Same Speed
Use Case
The two list inputs are sorted, and you want to compare elements between the two.
Usage
def two_pointers_same_speed(array1, array2):
i = 0
j = 0
result = 0
while i < len(array1) and j < len(array2):
# Some logic involving `array1[i]` and `array2[j]`
if <condition>:
i += 1
else:
j += 1
while i < len(array1):
# Some logic involving `array1[i]`
i += 1
while j < len(array2):
# Some logic involving `array2[j]`
j += 1
return result
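# As a concrete instance (a sketch, assuming the goal is to merge two sorted
# arrays; the names are illustrative):
def merge_sorted(array1, array2):
    i = 0
    j = 0
    result = []
    while i < len(array1) and j < len(array2):
        if array1[i] <= array2[j]:
            result.append(array1[i])
            i += 1
        else:
            result.append(array2[j])
            j += 1
    # at most one of the two arrays still has leftover elements
    result.extend(array1[i:])
    result.extend(array2[j:])
    return result

# merge_sorted([1, 4, 7], [2, 3, 9]) -> [1, 2, 3, 4, 7, 9]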
Sliding Window Pattern
Fixed Sliding Window
Use Case
You are looking for a subarray or substring of some length k
that satisfies a certain constraint.
Usage
def fixed_sliding_window(array, k):
current = 0
result = 0
for i in range(k):
# Some logic to build first window involving `current` and/or other variables
for right in range(k, len(array)):
        # Add `array[right]` to the window
        # Remove `array[right - k]` from the window
# Update `result`
return result
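# As a concrete instance (a sketch of the classic "maximum sum of a length-k
# subarray" question; assumes len(array) >= k):
def max_sum_length_k(array, k):
    current = sum(array[:k])         # build the first window
    result = current
    for right in range(k, len(array)):
        current += array[right]      # add the new rightmost element
        current -= array[right - k]  # drop the element leaving the window
        result = max(result, current)
    return result

# max_sum_length_k([1, 2, 3, 4], 2) -> 7 (the subarray [3, 4])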
Variable Sliding Window
Use Case
You are looking for a subarray or substring (usually the longest) that satisfies a certain constraint.
Usage
def variable_sliding_window(array):
left = 0
current = 0
result = 0
for right in range(len(array)):
# Some logic to add `array[right]` to `current`
while <broken window condition>:
# Remove `array[left]` from `current`
left += 1
# Update `result`, where current window length
# at any time is `right - left + 1`
return result
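# As a concrete instance (a sketch of a hypothetical question: the longest
# subarray of non-negative numbers whose sum is at most s):
def longest_subarray_with_sum_at_most(array, s):
    left = 0
    current = 0
    result = 0
    for right in range(len(array)):
        current += array[right]
        while current > s:           # the window broke the constraint, so shrink it
            current -= array[left]
            left += 1
        result = max(result, right - left + 1)
    return result

# longest_subarray_with_sum_at_most([3, 1, 2, 7, 2], 6) -> 3 (the subarray [3, 1, 2])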
Note (When this Technique Fails)
The constraint must be preserved as the sliding window shrinks. If shrinking the window can break the constraint, consider using a prefix sum + hash map technique instead.
Note (Length of a Window)
The formula for the length of a window is `right - left + 1`.
Prefix Sum Pattern
Use Case
You need a subroutine which involves finding the sum of any subarray in \(O(1)\).
Usage
def build_prefix_sum(nums):
prefix_sum = [0]
for num in nums:
prefix_sum.append(prefix_sum[-1] + num)
return prefix_sum
# Usage:
# To get the sum of nums[i:j] (i.e. the half-open range [i, j)), compute `prefix_sum[j] - prefix_sum[i]`
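# A quick usage check of the helper above (the sample numbers are illustrative):
# nums = [3, 1, 4, 1, 5]
# prefix_sum = build_prefix_sum(nums)         # [0, 3, 4, 8, 9, 14]
# prefix_sum[4] - prefix_sum[1] == 1 + 4 + 1  # sum of nums[1:4]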
HashMap and HashSet Pattern
HashMap
Use Case
- Track elements seen so far for uniqueness (with any extra info stored as the value)
- Frequency counting
- Basic mapping
- Memoization table
- Count the number of subarrays with a constraint that doesn't necessarily hold as you shrink the window, as in the sketch below
def count_num_subarrays_with_exact_constraint(array, k):
freq = {0 : 1} # prefix sum frequency map
prefix_sum = 0
result = 0
for num in array:
# Some logic to change `prefix_sum` as necessary (e.g. prefix_sum += num)
result += freq.get(prefix_sum - k, 0)
        freq[prefix_sum] = freq.get(prefix_sum, 0) + 1
return result
Usage
# Create an empty map
my_map = dict()
# Create a map with initial values
my_map = {
<key 1>: <value 1>,
<key 2>: <value 2>
}
# Add new entry or update current entry
my_map[<immutable key>] = <value>
# Remove an entry
del my_map[<key>]
# Remove all entries
my_map.clear()
# Get number of entries
len(my_map)
# Check if my_map is empty
not my_map
# Get value
my_map[<key>]
# Get the value with a default value if key isn't found
# Note: I like this over `defaultdict(<default value type>)` for counting since it can avoid accidentally creating phantom keys
my_map.get(<key>, <default value>)
# Note: pretty much a must for adjacency list creation
from collections import defaultdict
my_map = defaultdict(<default value's type>)
# Check if key exists
<key> in my_map
# Get all entries
my_map.items()
# Get all keys
my_map.keys()
# Get all values
my_map.values()
HashSet
Use Case
- Track elements seen so far for uniqueness
- Store a chunk (or all) of the input for fast lookups
Usage
# Create an empty set
my_set = set()
# Add an element
my_set.add(<element>)
# Check if a set contains an element
<element> in my_set
# Remove an element
my_set.remove(<element>)
# Get the set difference
my_set.difference(other_set)
# Get the set intersection
my_set.intersection(other_set)
# Get the set union
my_set.union(other_set)
# Remove all elements
my_set.clear()
# Get the number of elements
len(my_set)
# Check if the set is empty
not my_set
Fast and Slow Pointers/Floyd's Cycle Detection Pattern
Use Case
- Detect cycles in linked list
- Find middle of a linked list
Usage
def fast_and_slow_pointers(head):
fast = head
slow = head
result = <initial value>
while fast and fast.next:
# Some logic
fast = fast.next.next
        slow = slow.next
    return result
Reversing a Linked List Pattern
Use Case
- The problem largely involves reversing pointers in a linked list
- You need a subroutine which involves classically reversing pointers in a linked list
Usage
def reverse_linked_list(head):
prev_node = None
curr_node = head
while curr_node:
next_node = curr_node.next
curr_node.next = prev_node
prev_node = curr_node
        curr_node = next_node
    return prev_node
Note (Dummy Head)
Create a dummy head node `dummy = ListNode(0, head)` to reduce edge cases.
Stack and Queue Pattern
Stack
Use Case
Elements in the input interacting with each other, with a LIFO order.
Usage
# Create an empty stack
stack = list()
# Push an element onto the top
stack.append(<element>)
# Pop the element at the top
stack.pop()
# Get the element at the top
stack[-1]
# Get the number of elements in the stack
len(stack)
# Check if the stack is empty
not stack
Note (\(O(n)\) String Building)
We often use the stack to store the result and convert it to a string using the \(O(n)\) string builder technique.
def build_string(string):
array = []
for char in string:
array.append(char)
return "".join(array)
Queue
Use Case
Processing elements in a FIFO order.
Usage
from collections import deque
# Create a queue from an initial list
queue = deque(<initial list>)
# Enqueue an element at the back
queue.append(<element>)
# Dequeue the element at the front
queue.popleft()
# Get the element at the front
queue[0]
# Get the number of elements
len(queue)
# Check if queue is empty
not queue
Monotonic Stack and Monotonic Deque Pattern
Monotonic Stack
Use Case
- Looking for the "next" or "previous" "greater" or "smaller" element
- "Span until" i.e. counting a range
Usage
def monotonic_non_decreasing_stack(array):
stack = []
result = <initial value>
for i in range(len(array)):
while stack and array[stack[-1]] > array[i]:
stack.pop()
# Utilize stack.pop() in result
# e.g. result[stack.pop()] = <something involving i>
# e.g. result += stack.pop()
stack.append(i)
return result
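# As a concrete instance (a sketch of the classic "distance to the next greater
# element" question; the result holds 0 where no greater element exists):
def next_greater_distance(array):
    result = [0] * len(array)
    stack = []  # indices whose next greater element hasn't been found yet
    for i in range(len(array)):
        while stack and array[stack[-1]] < array[i]:
            j = stack.pop()
            result[j] = i - j
        stack.append(i)
    return result

# next_greater_distance([73, 74, 75, 71, 72]) -> [1, 1, 0, 1, 0]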
Monotonic Deque
Use Case
Maximum or minimum values in sliding window or ranges.
Usage
from collections import deque
def monotonic_non_increasing_deque(array, k):
result = []
dq = deque()
for i in range(len(array)):
while dq and array[dq[-1]] < array[i]:
dq.pop()
dq.append(i)
if dq[0] <= i - k:
dq.popleft()
if i >= k - 1:
result.append(array[dq[0]])
return result
Note (Monotonicity)
A monotonic increasing/decreasing stack or queue implies that the elements are always increasing/decreasing, while a monotonic non-decreasing/non-increasing one may include repeated elements.
Note (Indices Tweak)
Store indices when you want to calculate distances or want to mutate the result list later.
Binary Tree Traversal Patterns
Binary Tree Depth-First Search
Use Case
Most binary tree problems that don't involve processing nodes by their levels.
Usage
def recursive_preorder_dfs(root):
if not root:
return <base case result>
# Additional base cases
# Some logic involving the current node and the current result
recursive_preorder_dfs(root.left)
recursive_preorder_dfs(root.right)
return <current result and the two recursive calls above>
def iterative_preorder_dfs(root):
    if not root:
        return <initial value>
    stack = [root]
result = <initial value>
while stack:
node = stack.pop()
# Additional base cases
# Some logic involving the popped node and the result
if node.right:
stack.append(node.right)
if node.left:
stack.append(node.left)
return result
Note (Edge Cases)
- Empty tree (i.e. `None` root)
- Single node tree
- Skewed tree (i.e. unbalanced trees where all nodes are on one side)
- Duplicate values in a binary search tree
Note (Common Depth-First Search Visit Order)
In the iterative depth-first search, the flow is usually pre-order (`pop node → process node → push right → push left`), while in the recursive depth-first search, pre-order (`process node → recurse left → recurse right`) is the most common, followed by post-order (`recurse left → recurse right → process node`), then in-order (`recurse left → process node → recurse right`).
Note (Additional Context)
In a recursive depth-first search, when you need to track additional context or values across recursive calls—such as parent nodes, path lengths, or accumulated data—you include that information as an additional parameter. In an iterative depth-first search, you would store them as part of a tuple `(node, <value 1>, <value 2>, …, <value N>)` that'll be pushed and popped from the explicit stack you create outside the while loop.
Note (Keeping Track of the Final Result)
In a recursive depth-first search, `result` is typically implicit, since returning values up the call chain is usually sufficient; an explicit `result` is sometimes a good choice, though, where you create it within the scope of the provided function, then create and call an inner depth-first search function to do the actual work. In an iterative depth-first search, you typically create an explicit `result` variable outside the while loop.
Note (BST Techniques)
BST problems typically use DFS traversal. Common techniques include:
- Checking if the current node's value is within bounds (e.g., `low <= node.val <= high`)
- Leveraging the BST property to prune subtrees: if `node.val < low`, skip the left subtree; if `node.val > high`, skip the right subtree (where `low` or `high` can also just be a target value)
- Using inorder traversal to collect values in sorted order for problems requiring sorted data without explicit sorting.
Binary Tree Breadth-First Search
Use Case
Binary tree problems that involve processing nodes by their levels.
Usage
from collections import deque
def bfs(root):
if not root:
return
queue = deque([root])
result = 0
while queue:
level_width = len(queue)
# Some logic involving the current level
for _ in range(level_width):
node = queue.popleft()
# Some logic involving the current node
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
return result
Graph Traversal Patterns
Adjacency List Graph Depth-First Search
Use Case
Most graph problems.
Usage
def adjacency_list_recursive_dfs(graph):
def dfs(vertex):
visited.add(vertex)
result = <initial value>
for neighbour in graph[vertex]:
if neighbour not in visited:
result += dfs(neighbour)
return result
result = <initial value>
visited = set()
for vertex in graph:
if vertex not in visited:
dfs(vertex)
# Some logic involving the result per connected component
def adjacency_list_iterative_dfs(graph):
def dfs(vertex):
result = <initial value>
stack = [vertex]
while stack:
vertex = stack.pop()
visited.add(vertex)
for neighbour in graph[vertex]:
if neighbour not in visited:
result += <calculation>
stack.append(neighbour)
return result
result = <initial value>
visited = set()
for vertex in graph.keys():
if vertex not in visited:
dfs(vertex)
# Some logic involving the result per connected component
Matrix Graph Depth-First Search (Flood Fill)
Use Case
Most graph problems where the input is a matrix.
Usage
def matrix_recursive_dfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
def dfs(row, col):
visited.add((row, col))
result = <initial value>
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
if valid_cell(neighbour_row, neighbour_col) and (neighbour_row, neighbour_col) not in visited:
result += dfs(neighbour_row, neighbour_col)
return result
visited = set()
result = <initial value>
for row in range(ROWS):
for col in range(COLUMNS):
if (row, col) not in visited:
dfs(row, col)
# Some logic involving the result per connected component
return result
def matrix_iterative_dfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
def dfs(row, col):
stack = [(row, col)]
result = <initial value>
while stack:
row, col = stack.pop()
visited.add((row, col))
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
if valid_cell(neighbour_row, neighbour_col) and (neighbour_row, neighbour_col) not in visited:
result += <some calculation>
stack.append((neighbour_row, neighbour_col))
return result
visited = set()
result = <initial value>
for row in range(ROWS):
for col in range(COLUMNS):
if (row, col) not in visited:
dfs(row, col)
# Some logic involving the result per connected component
return result
Adjacency List Graph Breadth-First Search
Use Case
Distance in a graph.
Usage
from collections import deque
def adjacency_list_bfs(graph):
queue = deque([(<source vertex>,<additional state>, <initial distance>)])
visited = set([<source vertex>])
while queue:
vertex, <additional state>, dist = queue.popleft()
if vertex == <destination vertex>:
return dist
for neighbour in graph[vertex]:
if neighbour not in visited:
visited.add(neighbour)
queue.append((neighbour, <additional state>, dist + 1))
# Some logic involving the neighbour
Matrix Graph Breadth-First Search
Use Case
Distance in a graph given as a matrix.
Usage
from collections import deque
def matrix_iterative_bfs(matrix):
ROWS = len(matrix)
COLUMNS = len(matrix[0])
# NOTE: each element is of the form (change in row, change in col)
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (-1, -1), (-1, 1), (1, -1), (-1, 0), (0, -1)]
def valid_cell(row, col):
return 0 <= row < ROWS and 0 <= col < COLUMNS and <another condition for a cell to be valid>
    queue = deque([(<source row>, <source col>, <initial distance>)])
    visited = set([(<source row>, <source col>)])
while queue:
row, col, dist = queue.popleft()
if (row, col) == (<destination row>, <destination col>):
return dist
for dr, dc in DIRECTIONS:
neighbour_row = row + dr
neighbour_col = col + dc
neighbour_dist = dist + 1
if (neighbour_row, neighbour_col) not in visited and valid_cell(neighbour_row, neighbour_col):
visited.add((neighbour_row, neighbour_col))
                queue.append((neighbour_row, neighbour_col, neighbour_dist))
# Some logic involving the neighbour's row and col
Note (Various Types of Graph Inputs)
Unlike linked lists and binary trees, where we are given `head` or `root` respectively, there are various graph inputs:
- Matrix: A 2D list, where each element represents a vertex (the vertices are not numbered `0` to `n`), its neighbours are the adjacent squares, and the edges are determined by the problem description.
- Edge list: A list of edges `edges`. It's useful to turn it into an adjacency list.
from collections import defaultdict
def build_adjacency_list_graph(edges):
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)  # comment out this line if the input is a directed graph
    return graph
- Integer Adjacency List: A 2D list of integers `graph`, where the `n` nodes are numbered from `0` to `n - 1`, and `graph[i]` represents the neighbours of node `i`.
- Integer Adjacency Matrix: A 2D list of integers, where the `n` nodes are numbered from `0` to `n - 1`, thereby forming an `n x n` square matrix; when `graph[i][j] == 1`, there exists an edge between node `i` and node `j`, and when `graph[i][j] == 0`, there is no edge between node `i` and node `j`. It's also useful to pre-process it into an adjacency list.
from collections import defaultdict
def build_adjacency_list_graph(adjacency_matrix):
    graph = defaultdict(list)
    n = len(adjacency_matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency_matrix[i][j]:
                graph[i].append(j)
                graph[j].append(i)  # comment out this line if the input is a directed graph
    return graph
Even if the input is none of the above, it may still be an implicit graph problem, often where vertices aren't explicitly given but can be generated on the fly through valid transitions or transformations. These problems typically involve:
- A starting state and a goal/end state
- A defined set of valid transitions or mutations
- Optional constraints like invalid/intermediate states
Note (`visited` Container)
`visited` is typically a HashSet, but you might achieve better runtime performance by using a boolean array when the node range is predetermined (which is typical, since graph problems usually number nodes from `0` to `n - 1`).
Note (Prohibited Vertices)
Whenever the problem mentions prohibited vertices, that's an indicator to add them straight away to the `visited` container.
Note (Distance vs Path Length for BFS)
When using BFS to find shortest paths:
- If the problem asks for distance (number of moves/steps), initialize source vertices with `distance = 0`
- If the problem asks for path length (number of cells in the path), initialize source vertices with `path_length = 1`
Note (Multi-source BFS)
For a multi-source BFS, create a for loop that visits all source nodes and appends them to the queue for the BFS.
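A minimal sketch of that idea (assuming a grid input and a `sources` list of `(row, col)` cells; both names are illustrative):
from collections import deque

def multi_source_bfs_distances(grid, sources):
    ROWS, COLUMNS = len(grid), len(grid[0])
    distances = [[-1] * COLUMNS for _ in range(ROWS)]
    queue = deque()
    for row, col in sources:  # seed the queue with every source at distance 0
        distances[row][col] = 0
        queue.append((row, col))
    while queue:
        row, col = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = row + dr, col + dc
            if 0 <= r < ROWS and 0 <= c < COLUMNS and distances[r][c] == -1:
                distances[r][c] = distances[row][col] + 1
                queue.append((r, c))
    return distances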
Note (Reframing the Problem)
For graph problems, it's useful to rephrase the problem in terms of its inverse. For instance, take LeetCode #1557. The original problem description asks us to find the smallest set of vertices from which all nodes in the graph are reachable. Instead, we can rephrase the problem in terms of its inverse: find the smallest set of nodes that cannot be reached from other nodes, since if a node can be reached from another node, then we would rather include the pointer rather than the pointee in our set. Another example is LeetCode #542. The brute force solution would be to perform BFS from each cell with a 1, but instead, we can perform a multi-source BFS starting from all cells with a 0 (if we have a cell `x` with value 1 and its nearest cell `y` has value 0, then it doesn't make a difference whether we traverse from `x -> y` or `y -> x`; both give the same distance).
Priority Queue Pattern
Use Case
- Repeatedly find the maximum or minimum element
- Get the "top" \(k\) elements (create an empty priority queue that prioritizes the elements you want to discard: use a min heap to keep the largest `k` elements, or a max heap to keep the smallest `k`; insert elements from the input list, and whenever the queue exceeds size `k`, pop the root to maintain only the top `k` you care about)
import heapq
def top_k(array, k):
pq = []
for num in array:
        # Some logic to push an element according to the problem's criteria
heapq.heappush(pq, (<criteria as key>, num))
if len(pq) > k:
heapq.heappop(pq)
return [num for _, num in pq]
- Finding a median (involves one max heap priority queue for the lower half of the list, and one min heap priority queue for the upper half of the list)
Usage
import heapq
# Create an empty priority queue
priority_queue = []
heapq.heapify(priority_queue)
# Add an element
heapq.heappush(priority_queue, <element>)
# Remove the top element
heapq.heappop(priority_queue)
# Peek at the top element
priority_queue[0]
# Get the number of elements
len(priority_queue)
# Check if the priority queue is empty
not priority_queue
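# As a concrete instance of the top-k recipe above (a sketch: keep the k largest
# values using a min heap):
import heapq

def k_largest(array, k):
    pq = []  # min heap holding the k largest values seen so far
    for num in array:
        heapq.heappush(pq, num)
        if len(pq) > k:
            heapq.heappop(pq)  # discard the smallest of the k + 1 values
    return pq

# k_largest([5, 1, 9, 3, 7], 2) -> [7, 9]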
Note (Heaps in Python's Standard Library)
Python's `heapq` module only implements min-heaps. To simulate a max heap, first build a new list `pq` from the elements of the input but negated using `-`, call `heapq.heapify(pq)`, and then negate each popped value to get back the original.
Note (Using a Key for Priority)
To assign priority based on a specific key, wrap each item in your list as a tuple `(key, element)`. For max heaps, negate the key with `-`, and if the question involves tie-breakers, negate the element with `-` as well.
Greedy Algorithm Pattern
Use Case
Typically finding the minimum or the maximum of a property of the input array.
Usage
- Determine if you should greedily pick the minimum or the maximum at each step.
- Sort the input array (often you may need to sort based on the frequency of each element using `Counter(<list>)`).
- Iterate over the sorted array and increment the result, making use of a heap as needed.
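A minimal illustration of the recipe (a sketch of a hypothetical question: the maximum number of items that can be bought without exceeding a budget):
def max_items_within_budget(prices, budget):
    count = 0
    for price in sorted(prices):  # greedily buy the cheapest items first
        if price > budget:
            break
        budget -= price
        count += 1
    return count

# max_items_within_budget([3, 5, 2, 8], 7) -> 2 (buy the items costing 2 and 3)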
Binary Search Pattern
On an Array
Use Case
For some input array `array` and a desired element `target`, `array` must be sorted, and you want to:
- Find the index of `target` if it is in `array`.
- Find the lower bound (first index) or upper bound (last index) at which `target` can be inserted to maintain the sorted order of `array`.
Usage
def binary_search(array, target):
left = 0
right = len(array) - 1
while left <= right:
mid = (left + right) // 2
if array[mid] == target:
# Some logic
return
        if array[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
return left # if `target` is not in `array`, `left` is at the insertion point
def lower_bound(array, target):
"""
Finds the first index at which `target` can be inserted in `array`
to maintain sorted order (i.e., the lower bound).
If `target` exists in `array`, returns the index of its first occurrence.
If `target` does not exist, returns the index where it can be inserted.
Equivalent to bisect.bisect_left in the Python standard library.
Parameters:
array (List[int]): A sorted list of integers.
target (int): The value to search for.
Returns:
int: The index of the lower bound for `target`.
"""
left = 0
right = len(array)
while left < right:
mid = (left + right) // 2
        if array[mid] < target:
left = mid + 1
else:
right = mid
return left
def upper_bound(array, target):
"""
Finds the first index at which a value greater than `target`
can be inserted in `array` to maintain sorted order (i.e., the upper bound).
Returns the index of the first element greater than `target`.
If all elements are less than or equal to `target`, returns `len(array)`.
Equivalent to bisect.bisect_right in the Python standard library.
Parameters:
array (List[int]): A sorted list of integers.
target (int): The value to search for.
Returns:
int: The index of the upper bound for `target`.
"""
left = 0
right = len(array)
while left < right:
mid = (left + right) // 2
        if array[mid] <= target:
left = mid + 1
else:
right = mid
return left
Note (Duplicate Elements)
`binary_search()` doesn't work if `array` contains duplicates, but `lower_bound()` and `upper_bound()` do allow duplicates in `array`.
Note (Descending Elements)
If the input array is sorted in descending order, simply invert the inequality in the if condition.
On a Solution Space
Use Case
You're trying to find a maximum or minimum value, and:
- You can verify (usually with a greedy algorithm) in \(O(n)\) time (or faster) whether a given candidate `x` is a valid solution.
- If `x` is a valid solution:
  - For a maximum, \(\text{all values} \le x\) are also valid.
  - For a minimum, \(\text{all values} \ge x\) are also valid.
- If `x` is not a valid solution:
  - For a maximum, \(\text{all values} \gt x\) are also invalid.
  - For a minimum, \(\text{all values} \lt x\) are also invalid.
Usage
def binary_search_minimum(array):
def is_valid(x):
# Some O(n) algorithm (usually also a greedy algorithm)
return <boolean>
left = <min possible answer>
right = <max possible answer>
while left <= right:
mid = (left + right) // 2
if is_valid(mid):
right = mid - 1
else:
left = mid + 1
return left
def binary_search_maximum(array):
def is_valid(x):
# Some O(n) algorithm (usually also a greedy algorithm)
return <boolean>
left = <min possible answer>
right = <max possible answer>
while left <= right:
mid = (left + right) // 2
if is_valid(mid):
left = mid + 1
else:
right = mid - 1
return right
Note (Safely Calculating `mid`)
For most languages, to avoid integer overflow, calculate `mid` using `left + (right - left) / 2` instead.
Backtracking (Complete Search)
Unconstrained Backtracking
Use Case
Generating permutations, subsets, or combinations
Template
class Solution:
def permute(self, nums: List[int]) -> List[List[int]]:
result = []
def backtrack(path):
if len(path) == len(nums):
result.append(path.copy())
return
for num in nums:
if num not in path:
path.append(num)
backtrack(path)
path.pop()
backtrack([])
return result
class Solution:
def subsets(self, nums: List[int]) -> List[List[int]]:
result = []
def backtrack(path, start):
if start > len(nums):
return
result.append(path.copy())
for i in range(start, len(nums)):
                path.append(nums[i])
backtrack(path, i + 1)
path.pop()
backtrack([], 0)
return result
class Solution:
def combine(self, n: int, k: int) -> List[List[int]]:
result = []
def backtrack(path, start):
if len(path) == k:
result.append(path.copy())
return
for num in range(start, n + 1):
path.append(num)
backtrack(path, num + 1)
path.pop()
backtrack([], 1)
return result
Backtracking with pruning
Use Case
Problems with constraints that let you prune branches which can no longer lead to a valid solution.
Usage
TO DO
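A minimal sketch (assuming a combination-sum-style problem: sorting the candidates lets us prune a branch as soon as the remaining target is smaller than the next candidate):
def combination_sum_with_pruning(candidates, target):
    candidates = sorted(candidates)
    result = []
    def backtrack(path, start, remaining):
        if remaining == 0:
            result.append(path.copy())
            return
        for i in range(start, len(candidates)):
            if candidates[i] > remaining:
                break  # prune: every later candidate is at least as large
            path.append(candidates[i])
            backtrack(path, i, remaining - candidates[i])
            path.pop()
    backtrack([], 0, target)
    return result

# combination_sum_with_pruning([2, 3, 5], 8) -> [[2, 2, 2, 2], [2, 3, 3], [3, 5]]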
Note (Backtracking as a Decision Tree)
A backtracking solution can be visualized as a tree, where each node represents the current state of the path during a recursive function call. Thebacktrack()
calls explore different branches of this tree, building potential solutions along the way. The leaves of the tree correspond to base cases—often representing complete solutions, though not necessarily in every problem.
Bit Manipulation
Basic Bitwise Operators
Bitwise NOT
Syntax
!a (in Rust; C and Python use ~a instead)
Behaviour
The output is the inverted input (i.e. `0` becomes `1` and `1` becomes `0`).
Input | Output
------|-------
1 | 0
0 | 1
Bitwise AND
Syntax
a & b
Behaviour
If both inputs are `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 0
0 | 1 | 0
1 | 1 | 1
Bitwise OR
Syntax
a | b
Behaviour
If at least one input is `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 1
Bitwise XOR
Syntax
a ^ b
Behaviour
If exactly one input is `1`, the output is `1`.
Input 1 | Input 2 | Output
--------|---------|-------
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
Bitwise Left Shift
Syntax
`n << k`, where `n` and `k` are integers.
Behaviour
Shifts the bits of an integer \(n\) to the left by \(k\) places, filling the bottom \(k\) bits with zeroes.
Mathematically, `n << k` is equivalent to \(n \cdot 2^k\).
Bitwise Right Shift
Syntax
`n >> k`, where `n` and `k` are integers.
Behaviour
Shifts the bits of an integer \(n\) to the right by \(k\) places, filling the top \(k\) bits with zeroes (for unsigned values; an arithmetic right shift on a signed value instead fills them with copies of the sign bit).
Mathematically, `n >> k` is equivalent to \(\lfloor\frac{n}{2^k}\rfloor\).
Bit Mask
A bit mask is a sequence of bits used to set, clear, or isolate specific bits in another value using bitwise operators.
The bit mask is typically a constant integer matching the same length as the value which we're masking on.
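A minimal illustration (Python is used here for brevity; the literals and the low-nibble mask are arbitrary):
value = 0b0101_0110
mask = 0b0000_1111       # mask selecting the low nibble

isolated = value & mask  # isolate the masked bits -> 0b0000_0110
set_bits = value | mask  # set the masked bits     -> 0b0101_1111
cleared = value & ~mask  # clear the masked bits   -> 0b0101_0000
toggled = value ^ mask   # toggle the masked bits  -> 0b0101_1001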
Data Representation
Number Systems
Decimal System
Decimal is a number system that uses base 10, characterized by two fundamental properties:
- Each digit position can hold one of 10 unique values (0 through 9). Values greater than 9 require carrying to an additional digit position to the left.
- Each digit's position determines its contribution to the overall value through positional notation. Digits are labeled from right to left as \(d_0, d_1, d_2, \dots, d_{N - 1}\) where each successive position represents ten times the value of the position to its right.
More formally, any \(N\)-digit decimal number can be expressed using positional notation as:
$$ (d_{N-1} \cdot 10^{N-1}) + (d_{N-2} \cdot 10^{N-2}) + \dots + (d_2 \cdot 10^2) + (d_1 \cdot 10^1) + (d_0 \cdot 10^0) $$
Decimal numbers are typically written without prefixes, though they may be denoted as \(N_{10}\) when clarification is needed.
Binary System
Binary is a number system that uses base 2, characterized by:
- Each digit position (called a bit) can hold one of two unique values: 0 or 1. Values greater than 1 require carrying to an additional bit position to the left.
- Each bit's position determines its contribution through powers of 2: \(2^{N-1}, \dots, 2^1, 2^0\).
Binary numbers are typically prefixed with `0b`.
Binary Terminology and Groupings
Since individual bits represent limited information, they are commonly grouped:
- Byte: 8 bits (the smallest addressable unit of memory)
- Nibble: 4 bits (half a byte)
- Word: The natural data width of a CPU (typically 32 or 64 bits)
In any binary sequence, we identify:
- Most Significant Bit (MSB): The leftmost bit
- Least Significant Bit (LSB): The rightmost bit
Hexadecimal System
Hexadecimal uses base 16, characterized by:
- Each digit position can hold one of 16 unique values. Since we only have 10 decimal digits (0 to 9), letters A to F represent values 10 to 15 respectively.
- Each digit's position determines its contribution through powers of 16: \(16^{N-1}, \dots, 16^1, 16^0\).
Hexadecimal numbers are prefixed with `0x`.
Hexadecimal-Binary-Decimal Conversion Table
Hex | Binary | Decimal |
---|---|---|
0 | 0000 | 0 |
1 | 0001 | 1 |
2 | 0010 | 2 |
3 | 0011 | 3 |
4 | 0100 | 4 |
5 | 0101 | 5 |
6 | 0110 | 6 |
7 | 0111 | 7 |
8 | 1000 | 8 |
9 | 1001 | 9 |
A | 1010 | 10 |
B | 1011 | 11 |
C | 1100 | 12 |
D | 1101 | 13 |
E | 1110 | 14 |
F | 1111 | 15 |
Converting Between Number Systems
Decimal to Any Base (Division method)
- Repeatedly divide the decimal number by the target base
- Record the remainder at each step
- Continue until the quotient becomes 0
- Read the remainders from bottom to top to get the result
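A small sketch of the division method (the function name is illustrative; digits beyond 9 use the letters A to F):
def decimal_to_base(n, base):
    if n == 0:
        return "0"
    digits = "0123456789ABCDEF"
    result = []
    while n > 0:
        n, remainder = divmod(n, base)
        result.append(digits[remainder])
    return "".join(reversed(result))  # remainders read from bottom to top

# decimal_to_base(12, 2) -> "1100", decimal_to_base(255, 16) -> "FF"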
Any Base to Decimal (Addition of Positional Notation)
For a number with digits \(d_{N-1}, d_{N-2}, \dots, d_1, d_0\) in base \(B\), its decimal value is calculated using the following formula:
$$ (d_{N-1} \cdot B^{N-1}) + (d_{N-2} \cdot B^{N-2}) + \dots + (d_1 \cdot B^1) + (d_0 \cdot B^0) $$
Binary to Hexadecimal
- Group binary digits into sets of 4 bits from right to left
- Pad the leftmost group with leading zeros if necessary
- Convert each 4-bit group to its hexadecimal equivalent
Hexadecimal to Binary
- Convert each hexadecimal digit to its 4-bit binary equivalent
- Concatenate all binary groups
Computer Data Units
Memory and Storage Capacities
Unit | Value |
---|---|
1 KB (Kilobyte) | \(2^{10}\) bytes (1,024 bytes) |
1 MB (Megabyte) | \(2^{20}\) bytes (1,048,576 bytes) |
1 GB (Gigabyte) | \(2^{30}\) bytes (1,073,741,824 bytes) |
1 TB (Terabyte) | \(2^{40}\) bytes (1,099,511,627,776 bytes) |
1 PB (Petabyte) | \(2^{50}\) bytes (1,125,899,906,842,624 bytes) |
1 EB (Exabyte) | \(2^{60}\) bytes (1,152,921,504,606,846,976 bytes) |
Data Transfer Units
Unit | Value |
---|---|
1 Kilobit (Kb) | 1,000 bits |
1 Megabit (Mb) | 1,000,000 bits |
1 Gigabit (Gb) | 1,000,000,000 bits |
1 Terabit (Tb) | 1,000,000,000,000 bits |
Character Representation
ASCII
ASCII is a character encoding standard that defines 7-bit codes for characters; characters are typically stored in an 8-bit byte, where the eighth bit historically served as a parity bit for error detection, ensuring either an even number of 1s (even parity) or an odd number of 1s (odd parity) in the complete byte.
Types of ASCII Characters
- Non-printable characters
  - Control characters, from `0x00` to `0x1F` (i.e. \(0_{10}\) to \(31_{10}\))
  - Delete character, `0x7F` (i.e. \(127_{10}\))
- Printable characters, from `0x20` to `0x7E` (\(32_{10}\) to \(126_{10}\)).
ASCII Table
Hex | Char | Description |
---|---|---|
00 | NUL | Null character |
01 | SOH | Start of Heading |
02 | STX | Start of Text |
03 | ETX | End of Text |
04 | EOT | End of Transmission |
05 | ENQ | Enquiry |
06 | ACK | Acknowledge |
07 | BEL | Bell |
08 | BS | Backspace |
09 | HT | Horizontal Tab |
0A | LF | Line Feed |
0B | VT | Vertical Tab |
0C | FF | Form Feed |
0D | CR | Carriage Return |
0E | SO | Shift Out |
0F | SI | Shift In |
10 | DLE | Data Link Escape |
11 | DC1 | Device Control 1 (XON) |
12 | DC2 | Device Control 2 |
13 | DC3 | Device Control 3 (XOFF) |
14 | DC4 | Device Control 4 |
15 | NAK | Negative Acknowledge |
16 | SYN | Synchronous Idle |
17 | ETB | End of Transmission Block |
18 | CAN | Cancel |
19 | EM | End of Medium |
1A | SUB | Substitute |
1B | ESC | Escape |
1C | FS | File Separator |
1D | GS | Group Separator |
1E | RS | Record Separator |
1F | US | Unit Separator |
20 | SP | Space |
21 | ! | Exclamation mark |
22 | " | Double quote |
23 | # | Number sign |
24 | $ | Dollar sign |
25 | % | Percent sign |
26 | & | Ampersand |
27 | ' | Apostrophe |
28 | ( | Left parenthesis |
29 | ) | Right parenthesis |
2A | * | Asterisk |
2B | + | Plus sign |
2C | , | Comma |
2D | - | Hyphen-minus |
2E | . | Period |
2F | / | Slash |
30 | 0 | Digit 0 |
31 | 1 | Digit 1 |
32 | 2 | Digit 2 |
33 | 3 | Digit 3 |
34 | 4 | Digit 4 |
35 | 5 | Digit 5 |
36 | 6 | Digit 6 |
37 | 7 | Digit 7 |
38 | 8 | Digit 8 |
39 | 9 | Digit 9 |
3A | : | Colon |
3B | ; | Semicolon |
3C | < | Less than |
3D | = | Equal sign |
3E | > | Greater than |
3F | ? | Question mark |
40 | @ | At sign |
41 | A | Uppercase A |
42 | B | Uppercase B |
43 | C | Uppercase C |
44 | D | Uppercase D |
45 | E | Uppercase E |
46 | F | Uppercase F |
47 | G | Uppercase G |
48 | H | Uppercase H |
49 | I | Uppercase I |
4A | J | Uppercase J |
4B | K | Uppercase K |
4C | L | Uppercase L |
4D | M | Uppercase M |
4E | N | Uppercase N |
4F | O | Uppercase O |
50 | P | Uppercase P |
51 | Q | Uppercase Q |
52 | R | Uppercase R |
53 | S | Uppercase S |
54 | T | Uppercase T |
55 | U | Uppercase U |
56 | V | Uppercase V |
57 | W | Uppercase W |
58 | X | Uppercase X |
59 | Y | Uppercase Y |
5A | Z | Uppercase Z |
5B | [ | Left square bracket |
5C | \ | Backslash |
5D | ] | Right square bracket |
5E | ^ | Caret |
5F | _ | Underscore |
60 | ` | Grave accent |
61 | a | Lowercase a |
62 | b | Lowercase b |
63 | c | Lowercase c |
64 | d | Lowercase d |
65 | e | Lowercase e |
66 | f | Lowercase f |
67 | g | Lowercase g |
68 | h | Lowercase h |
69 | i | Lowercase i |
6A | j | Lowercase j |
6B | k | Lowercase k |
6C | l | Lowercase l |
6D | m | Lowercase m |
6E | n | Lowercase n |
6F | o | Lowercase o |
70 | p | Lowercase p |
71 | q | Lowercase q |
72 | r | Lowercase r |
73 | s | Lowercase s |
74 | t | Lowercase t |
75 | u | Lowercase u |
76 | v | Lowercase v |
77 | w | Lowercase w |
78 | x | Lowercase x |
79 | y | Lowercase y |
7A | z | Lowercase z |
7B | { | Left curly brace |
7C | \| | Vertical bar |
7D | } | Right curly brace |
7E | ~ | Tilde |
7F | DEL | Delete |
Unicode
Unicode is a character set standard that assigns a unique hexadecimal identifier, called a code point, to every character across all writing systems. These code points follow the format `U+<....>`.
Code Point Categories
- Surrogate code points: Code points in the range `U+D800` to `U+DFFF`, which are reserved for use in UTF-16 encoding
- Scalar values: All other Unicode code points (excluding surrogate code points)
Character Encoding Standards
Unicode defines three primary character encoding standards for converting code points into bytes for storage and transmission. These standards use code units as their basic storage elements to represent characters.
- UTF-8
- Code unit size: 8 bits (1 byte)
- Character representation: 1 to 4 code units per character
- Encoding type: Variable-length encoding
- UTF-16
- Code unit size: 16 bits (2 bytes)
- Character representation: 1 code unit for characters in the Basic Multilingual Plane, or 2 code units for characters outside it; these 2 code units are built from surrogate code points and are called a surrogate pair
- UTF-32
- Code unit size: 32 bits (4 bytes)
- Character representation: Exactly 1 code unit per character
Grapheme clusters
A grapheme cluster represents what users typically think of as a "character", that is, the smallest unit of written language that has semantic meaning.
Example
The grapheme clusters in the Hindi word "क्षत्रिय"
are ["क्ष", "त्रि", "य"]
, where each cluster can comprise multiple Unicode code points. While "य"
is a single code point, "क्ष"
is a conjunct consonant formed from three code points: the base character "क"
(U+0915
), the virama "्"
(U+094D
), and "ष"
(U+0937
), which combine to create one visual unit. The cluster "त्रि"
is even more complex, consisting of four code points: "त"
(U+0924
), "्"
(U+094D
), "र"
(U+0930
), and the vowel sign "ि"
(U+093F
), where the vowel mark appears visually before the consonant cluster despite following it in the Unicode sequence.
The example above demonstrates why simply counting Unicode code points doesn't always correspond to what users perceive as individual characters, making grapheme cluster boundaries crucial for correct string processing.
Integer Representation
Unsigned Integer Representation
Unsigned integers use all available bits to represent positive values (including zero). No bit is reserved for a sign, allowing for a larger range of positive values compared to signed integers of the same bit width.
An \(N\)-bit unsigned integer can take up values from \(0\) to \(2^N - 1\).
Signed Integer Representation
Signed integers reserve one bit (typically the MSB) to indicate sign, reducing the range of representable positive values but enabling representation of negative numbers.
Two's Complement
Two's complement is the most common encoding system to represent signed integers. It treats the MSB exclusively as a sign bit:
- MSB = 0: positive number (or zero)
- MSB = 1: negative number
An \(N\)-bit signed integer using two's complement can take values from \(-2^{N-1}\) to \(2^{N-1} - 1\).
To represent a number using two's complement:
- Start with the binary representation of the positive number
- Toggle all the bits
- Add 1 to the result
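A worked example (representing \(-5\) as an 8-bit two's complement value; Python integers are arbitrary-precision, so the result is masked to 8 bits):
positive = 0b0000_0101                  # step 1: binary representation of +5
toggled = positive ^ 0b1111_1111        # step 2: toggle all 8 bits -> 0b1111_1010
twos_complement = (toggled + 1) & 0xFF  # step 3: add 1             -> 0b1111_1011

# Interpreting the bit pattern as a signed 8-bit value gives back -5:
# twos_complement - 2**8 == -5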
Note
When converting from binary to decimal, if the MSB is a 1, treat that digit as \(-1\) (so it contributes \(-2^{N-1}\)), then proceed as discussed in the "Any Base to Decimal" section.
Floating-Point Representation
IEEE 754 Structure
Single Precision (32-bit)
| Sign | Biased Exponent | Normalized Mantissa |
| 1 | 8 | 23 |
Double Precision (64-bit)
| Sign | Biased Exponent | Normalized Mantissa |
| 1 | 11 | 52 |
Fields
- Sign: Represents the sign of the floating-point number using signed magnitude encoding (0 = positive, 1 = negative)
- Normalized Mantissa (Significand): Represents the significant digits of the number. The mantissa is normalized, meaning we move the decimal point so it falls in the range \([1.0, 2.0)\), but we implicitly store the 1, and explicitly store the digits after the decimal point (i.e., the fractional part).
- Biased Exponent: Determines the power of 2 by which to scale the mantissa. The exponent is biased, meaning we add a constant bias to the actual exponent before storing it. For single precision, the bias is \(127_{10}\); for double precision, the bias is \(1023_{10}\)
Note
The mantissa is actually 24 bits total in single precision, if we count the implicit number before the decimal point, and 53 bits total in double precision due to the implicit leading number.
Converting Decimal to IEEE 754
To convert \(12.375_{10}\) to single precision IEEE 754:
Step 1: Convert to binary
\(12.375_{10} = 1100.011_2\)
Step 2: Determine the sign
Since it's a positive number, the sign bit is 0.
Step 3: Normalize the mantissa
\(1100.011_2 = 1.100011_2 \times 2^3\)
The normalized mantissa (fractional part only), padded to 23 bits, is: \(10001100000000000000000_2\)
Step 4: Calculate the biased exponent
Biased exponent = actual exponent + bias = \(3 + 127 = 130 = 10000010_2\)
Step 5: Combine all fields
| Sign | Exponent | Mantissa |
| 0 | 10000010 | 10001100000000000000000 |
Final result: \(01000001010001100000000000000000_2\)
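A quick way to sanity-check this result (using Python's `struct` module to reinterpret the bit pattern of the 32-bit float):
import struct

bits, = struct.unpack(">I", struct.pack(">f", 12.375))
assert bits == 0b01000001010001100000000000000000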
Special Numbers
Denormalized Numbers
When calculations produce results that are too small for normal representation (i.e., below \(1.0 \times 2^{1 - 127}\) for single-precision and \(1.0 \times 2^{1 -1023}\) for double-precision) but are non-zero, we use denormalized representation. This is achieved by:
- Setting the exponent field to all zeros (which corresponds to using the fixed exponent \(1 -127\) for single-precision or \(1 - 1023\) for double-precision)
- Using an implicit leading 0 instead of 1 before the decimal point in the mantissa
- Calculating the mantissa such that \(0.\text{mantissa} \times 2^{1 -127}\) (or \(2^{1 - 1023}\) for double-precision) equals the initial number's binary scientific notation.
Infinity
When calculations exceed the representable range (overflow) or a non-zero value is divided by zero, the result is infinity. The exponent field becomes all ones and the mantissa is all zeros. The sign bit determines positive or negative infinity.
This allows programs to continue running with meaningful results rather than crashing on overflow.
NaN (Not a Number)
Represents mathematically undefined results. Like infinity, the exponent is all ones, but now the mantissa is non-zero. Any operation with NaN produces NaN, and NaN is never equal to anything, including itself. It also has two types:
- Quiet NaN: Silently propagates through calculations
- Signaling NaN: Triggers an exception/error when encountered in operations
Integer Overflow
Integer overflow occurs when you try to store a number that exceeds the maximum value that can be represented with the available number of bits, causing the result to wrap around to an unexpected value.
The overflowed result can be found as follows:
- Unsigned integer: \(\text{(expected value)} \bmod 2^N\)
- Signed integer:
  - Positive overflow (i.e., \(\text{expected value} > 2^{N-1} - 1\)): \(\text{expected value} - 2^N\)
  - Negative overflow (i.e., \(\text{expected value} < -2^{N-1}\)): \(\text{expected value} + 2^N\)
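A small sketch of the wraparound rules (simulating 8-bit integers in Python, whose native integers never overflow):
def wrap_unsigned_8(expected):
    return expected % 2**8

def wrap_signed_8(expected):
    wrapped = expected % 2**8
    return wrapped - 2**8 if wrapped > 2**7 - 1 else wrapped

# wrap_unsigned_8(255 + 1) -> 0
# wrap_signed_8(127 + 1) -> -128 (positive overflow wraps around to the minimum)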
Integer Byte Order
Byte order, also known as the endianness of a system, defines how the bytes of multi-byte values are assigned to memory addresses.
- Big endian Byte Order: The most significant byte is stored at the lowest memory address.
- Little endian Byte Order: The least significant byte is stored at the lowest memory address.
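A small illustration (using Python's `struct` module; the value `0x12345678` is arbitrary):
import struct

value = 0x12345678
struct.pack(">I", value).hex()  # big endian:    '12345678'
struct.pack("<I", value).hex()  # little endian: '78563412'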
struct
Packing
Background
Alignment refers to the set of memory addresses where a value is permitted to start. A type's natural alignment is the default alignment, determined by the platform, that enables efficient memory access (i.e. single-instruction loads & stores). Without natural alignment, a value might be placed such that it straddles machine-word boundaries. As a result, the CPU may need to perform multiple memory accesses and combine the pieces in registers, which increases the cost of reading or writing the value. In Rust, you can check a type's alignment using std::mem::align_of
.
For data structures, notably structs or unions, the compiler often inserts padding — extra unused bytes — to ensure each field begins at an address satisfying its own alignment and/or the alignment of the entire data structure. Padding added between fields to align them properly is called internal field padding, while padding added at the end to satisfy overall alignment requirements is called trailing padding.
The data structure size is the total number of bytes the structure occupies in memory, including both actual data and any inserted padding. You can retrieve this size using std::mem::size_of
.
We say a type is self-aligned when its alignment equals its size.
The following table lists common types along with their natural alignment and size (in bytes) on a x86-64 system.
Type | Natural Alignment for x86-64 systems (bytes) | Size (bytes) |
---|---|---|
Primitive | Memory address divisible by the type's size | Depends on the type ( bool = 1, u8 / i8 = 1, u16 / i16 = 2, char = 4, f32 / i32 / u32 = 4, f64 / i64 / u64 = 8, u128 / i128 = 16, usize / isize = determined by the maximum addressable memory on a given architecture (i.e. 8 bytes on 64-bit systems, 4 bytes on 32-bit systems)) |
Array / Slice | Memory address divisible by the element type's natural alignment | $\text{element size} \cdot \text{number of elements in the array}$ |
Pointer | Memory address divisible by the pointer size | Determined by the maximum addressable memory on a given architecture (i.e. 8 bytes on 64-bit systems, 4 bytes on 32-bit systems) |
Struct | Maximum natural alignment among all fields | $\text{size of fields} + \text{internal field padding} + \text{trailing padding}$ |
Note
In C, the order of fields of a struct is preserved exactly as written, while in Rust, the compiler may reorder fields to reduce padding unless you use the `#[repr(C)]` attribute.
Field Re-Ordering
A common technique to pack a struct is re-ordering fields by decreasing natural alignment, since a field with larger alignment guarantees that any following smaller-alignment field will already be properly aligned:
#[repr(C)]
struct DefaultStruct {
    c: u8,         // 1 byte
                   // 7 bytes of field padding
    p: *const u8,  // 8 bytes
    x: i16,        // 2 bytes
                   // 6 bytes of trailing padding
} // total size: 24 bytes
#[repr(C)]
struct PackedStruct {
    p: *const u8,  // 8 bytes
    x: i16,        // 2 bytes
    c: u8,         // 1 byte
                   // 5 bytes of trailing padding
} // total size: 16 bytes
Another common trick is to group fields of the same alignment and same size together (on top of applying the trick above):
#[repr(C)]
struct DefaultStruct {
    a: i32,   // 4 bytes
    c1: u8,   // 1 byte
              // 3 bytes of field padding
    b: i32,   // 4 bytes
    c2: u8,   // 1 byte
              // 1 byte of field padding
    s1: i16,  // 2 bytes
    s2: i16,  // 2 bytes
              // 2 bytes of trailing padding
} // total size: 20 bytes
#[repr(C)]
struct PackedStruct {
    a: i32,   // 4 bytes
    b: i32,   // 4 bytes
    s1: i16,  // 2 bytes
    s2: i16,  // 2 bytes
    c1: u8,   // 1 byte
    c2: u8,   // 1 byte
              // 2 bytes of trailing padding
} // total size: 16 bytes
Compiler Struct Packing
In Rust, you can add an attribute to force the compiler to lay out struct fields back-to-back with no padding, thereby breaking natural alignment:
#[repr(packed)]
struct PackedStruct {
    a: u8,
    b: i32,
    c: f64,
}
Note
Accessing unaligned fields in packed structs may lead to significantly slower code or even cause undefined behaviour on some platforms which can't handle unaligned access at all.
Bit Packing
Bit Fields
Bit fields represent a technique for using integers as structured data containers by partitioning them into consecutive segments of bits, with each segment storing a distinct field. This approach involves a predetermined number of fields arranged within the integer's bit pattern.
pub struct BitField(u32);

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum FieldId {
    Field1,
    Field2,
    Field3,
    Field4,
    Field5,
}

#[derive(Debug, PartialEq)]
pub enum BitFieldError {
    InvalidField,
    ValueTooLarge(u32),
}

impl BitField {
    // Overall visualization:
    // | 31 30 29 28 | 27 ... 18 | 17 ... 12 | 11 10 9 8 | 7 ... 0 |
    // +-------------+-----------+-----------+-----------+---------+
    // |   Field5    |  Field4   |  Field3   |  Field2   | Field1  |
    // |   4 bits    |  10 bits  |  6 bits   |  4 bits   | 8 bits  |

    // bits 0 to 7, possible values from 0 to 255
    const FIELD1_START: u32 = 0;
    const FIELD1_WIDTH: u32 = 8;
    // bits 8 to 11, possible values from 0 to 15
    const FIELD2_START: u32 = 8;
    const FIELD2_WIDTH: u32 = 4;
    // bits 12 to 17, possible values from 0 to 63
    const FIELD3_START: u32 = 12;
    const FIELD3_WIDTH: u32 = 6;
    // bits 18 to 27, possible values from 0 to 1023
    const FIELD4_START: u32 = 18;
    const FIELD4_WIDTH: u32 = 10;
    // bits 28 to 31, possible values from 0 to 15
    const FIELD5_START: u32 = 28;
    const FIELD5_WIDTH: u32 = 4;

    pub const fn new() -> Self {
        Self(0)
    }

    pub fn set_field(&mut self, field: FieldId, value: u32) -> Result<(), BitFieldError> {
        let (start, width) = Self::get_field_info(field);
        let max_value = Self::get_max_value(width);
        if value > max_value {
            return Err(BitFieldError::ValueTooLarge(value));
        }
        let mask = Self::get_field_mask(start, width);
        self.0 = (self.0 & !mask) | ((value << start) & mask);
        Ok(())
    }

    pub fn get_field(&self, field: FieldId) -> u32 {
        let (start, width) = Self::get_field_info(field);
        let mask = Self::get_field_mask(start, width);
        (self.0 & mask) >> start
    }

    pub fn clear_field(&mut self, field: FieldId) -> Result<(), BitFieldError> {
        self.set_field(field, 0)
    }

    fn get_field_info(field: FieldId) -> (u32, u32) {
        match field {
            FieldId::Field1 => (Self::FIELD1_START, Self::FIELD1_WIDTH),
            FieldId::Field2 => (Self::FIELD2_START, Self::FIELD2_WIDTH),
            FieldId::Field3 => (Self::FIELD3_START, Self::FIELD3_WIDTH),
            FieldId::Field4 => (Self::FIELD4_START, Self::FIELD4_WIDTH),
            FieldId::Field5 => (Self::FIELD5_START, Self::FIELD5_WIDTH),
        }
    }

    fn get_field_mask(start: u32, width: u32) -> u32 {
        ((1u32 << width) - 1) << start
    }

    fn get_max_value(width: u32) -> u32 {
        (1u32 << width) - 1
    }
}
Bit Flags
Bit Flags is a technique where individual bits within an integer act as separate boolean toggles, each indicating whether a specific condition, feature, or option is active or inactive. The number of flags is generally predetermined and typically occupies only a portion of the available integer bits.
pub struct BitFlags(u8);

#[derive(Debug, PartialEq)]
pub enum BitFlagsError {
    InvalidFlag(u8),
}

impl BitFlags {
    pub const FLAG1: u8 = 0x01;
    pub const FLAG2: u8 = 0x02;
    pub const FLAG3: u8 = 0x04;
    pub const FLAG4: u8 = 0x08;

    const VALID_FLAGS: u8 = Self::FLAG1 | Self::FLAG2 | Self::FLAG3 | Self::FLAG4;

    pub const fn new() -> Self {
        Self(0)
    }

    pub fn set(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 |= flags;
        Ok(())
    }

    pub fn clear(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 &= !flags;
        Ok(())
    }

    pub fn toggle(&mut self, flags: u8) -> Result<(), BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        self.0 ^= flags;
        Ok(())
    }

    pub fn has_any(&self, flags: u8) -> Result<bool, BitFlagsError> {
        if (Self::VALID_FLAGS & flags) != flags {
            return Err(BitFlagsError::InvalidFlag(!Self::VALID_FLAGS & flags));
        }
        Ok((self.0 & flags) != 0)
    }
}
Struct of Arrays (SoA)
Background
Struct of Arrays is a data layout where you store each field in its own contiguous array (can be as the struct's field, a global variable, or a local variable), which provides:
- Low memory usage, since it eliminates the per-element padding of an Array of Structs (AoS) layout; in a SoA layout you only pay padding once per array field
- Better cache locality
- Good memory bandwidth utilization for batched code (i.e. if you have a loop that processes several objects but only accesses a few of the fields, then the SoA layout reduces the amount of data that needs to be loaded).
struct MySoA {
    field1: Vec<Type1>,
    field2: Vec<Type2>,
    // ...
    fieldN: Vec<TypeN>,
}
Example: Polymorphism using SoA and AoS
AoS with Enum-based Polymorphism
struct Shape {
    x: f32,
    y: f32,
    kind: ShapeKind,
}

enum ShapeKind {
    Circle { radius: f32 },
    Rectangle { width: f32, height: f32 },
    Triangle { base: f32, height: f32 },
}
AoS with Subclass-based (via Composition) Polymorphism
struct Shape {
    tag: Tag,
    x: f32,
    y: f32,
}

enum Tag {
    Circle,
    Rectangle,
    Triangle,
}

struct Circle {
    base: Shape,
    radius: f32,
}

struct Rectangle {
    base: Shape,
    width: f32,
    height: f32,
}

struct Triangle {
    base: Shape,
    base_len: f32,
    height: f32,
}
Hybrid AoS and SoA Variant-Encoding-based Polymorphism
struct Shape {
    tag: Tag,
    x: f32,
    y: f32,
    extra_payload_idx: u32,
}

enum Tag {
    Circle,
    Rectangle,
    Triangle,
}

struct CirclePayload {
    radius: f32,
}

struct RectanglePayload {
    width: f32,
    height: f32,
}

struct TrianglePayload {
    base: f32,
    height: f32,
}

// Layout at runtime:
// - One Vec<Shape>
// - Additional Vec<CirclePayload>, Vec<RectanglePayload>, and/or Vec<TrianglePayload>
//   indexed by Shape.extra_payload_idx as required

// Example of accessing the extra data:
match shapes[i].tag {
    Tag::Circle => circle_payloads[shapes[i].extra_payload_idx].radius,
    Tag::Rectangle => rectangle_payloads[shapes[i].extra_payload_idx],
    Tag::Triangle => triangle_payloads[shapes[i].extra_payload_idx],
}
SIMD Data Parallelism
Background
Basics
Single Instruction, Multiple Data (SIMD) is a form of data parallelism in which a single instruction operates on multiple data values simultaneously. This is achieved using special registers called vector registers, or vectors for short, which hold multiple values at once. Operations performed on these registers are called vector operations. Each individual value within a vector register is referred to as a lane.
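As a small illustration of lanes and vector operations (a sketch using the nightly-only portable SIMD API discussed later in this section), adding two 4-lane vectors performs four element-wise additions with a single vector operation:

```rust
#![feature(portable_simd)]

use std::simd::f32x4;

fn main() {
    // Each vector holds 4 lanes of f32.
    let a = f32x4::from_array([1.0, 2.0, 3.0, 4.0]);
    let b = f32x4::splat(10.0);

    // One vector operation adds all 4 lanes at once.
    let c = a + b;
    assert_eq!(c.to_array(), [11.0, 12.0, 13.0, 14.0]);
}
```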
SIMD Instruction Set Extensions
SIMD instruction set extensions provide additional support for vector operations using vector registers. The instruction set architecture defines which SIMD instruction set extensions are possible, but it is up to the specific processor implementation to decide which of those extensions to support, and which vector widths of the chosen SIMD extension it will implement. As a result, different CPUs using the same ISA may support different SIMD extensions and vector widths. The most common SIMD instruction set extensions for x86-64 are SSE (SSE 1 to 4), AVX, AVX2, and the most recent, AVX-512, while for AArch64 it is NEON, with more advanced optional extensions like SVE and SVE2. Note, however, that native-width vector registers are not always the right choice; some algorithms are better suited to non-native vector widths (see Vector Width and Performance below).
ISA | SIMD Instruction Set Extension | First-Class Native Vector Widths |
---|---|---|
x86-64 | SSE, SSE2, SSE3, SSE4.1, SSE4.2 | 128 bits (fixed) |
x86-64 | AVX | 256 bits (fixed, float only) |
x86-64 | AVX2 | 256 bits (fixed, int + float) |
x86-64 | AVX-512 | 512 bits (fixed) |
AArch64 | NEON | 64 bits (fixed), 128 bits (fixed) |
AArch64 | SVE | 128–2048 bits (scalable in 128-bit increments) |
AArch64 | SVE2 | 128–2048 bits (scalable in 128-bit increments, more types) |
Note
Each generation of x86-64 SIMD extensions builds upon the previous one. When using narrower vector operations from older instruction sets, they operate on the lower portion of the wider registers introduced in newer extensions.
Note
To check what SIMD instruction set extensions your CPU supports, run `lscpu` and look at the `Flags` output.
Vector Width and Performance
When writing vectorized code, you have two main approaches: use a fixed, non-native vector width, or adapt to the CPU's native vector width. If you choose a fixed vector width that's smaller than the native registers by using older instruction sets, the hardware will still use its full-width registers but leave some lanes empty, wasting the processor's parallel potential. If you choose a fixed vector width that's larger than the native registers, the system must combine multiple native registers to simulate the larger width. When too many registers are needed simultaneously, values may spill (i.e., the compiler runs out of physical registers and must store data in much slower memory instead).
Rust API
Portable SIMD
Note
Portable SIMD in Rust requires the Rust nightly compiler. To install it, run `rustup toolchain install nightly`, then temporarily set it as the default toolchain via `rustup default nightly`.
Vendor SIMD Intrinsics
Note
Consider using Dynamic CPU Feature Detection instead, since this approach allows:
- The same binary to work on all CPUs
- Automatic use of the fastest available instructions
- Users to avoid worrying about compatibility issues
In contrast, the static CPU feature detection approach (using `RUSTFLAGS`) bakes specific SIMD instruction set extensions into the entire binary, which will crash on CPUs that don't support those features.
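Below is a minimal sketch of what dynamic CPU feature detection can look like on x86-64, using the standard library's `is_x86_feature_detected!` macro and the `#[target_feature]` attribute; the `sum`, `sum_avx2`, and `sum_scalar` functions are illustrative names, not part of any particular API.

```rust
fn sum(values: &[f32]) -> f32 {
    // Runtime check: only take the AVX2 path if this CPU supports it.
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: only called after confirming AVX2 support at runtime.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}

fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().sum()
}

// AVX2 is enabled for this function only; the rest of the binary
// remains compatible with older CPUs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    // The body may use AVX2 intrinsics or rely on autovectorization;
    // a plain sum keeps the sketch short.
    values.iter().sum()
}
```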
Example
Description
Given a slice of bytes `haystack` and a byte `needle`, find the first occurrence of `needle` in `haystack`.
Scalar Solution
```rust
fn scalar_find(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}
```
SIMD Solution
Portable SIMD Solution
```rust
#![feature(portable_simd)]

use std::simd::cmp::SimdPartialEq;
use std::simd::u8x32;

pub fn portable_simd_find(haystack: &[u8], needle: u8) -> Option<usize> {
    const VECTOR_WIDTH: usize = 32;

    // For short strings (less than 32 bytes), use simple iteration instead of SIMD
    if haystack.len() < VECTOR_WIDTH {
        return haystack.iter().position(|&b| b == needle);
    }

    // Perform SIMD on strings of at least 32 bytes.
    //
    // For every 32-byte chunk of the haystack, we create two SIMD vectors: one for the 32-byte
    // chunk of the haystack we're working on, `haystack_vec`, and one 32-byte SIMD vector
    // containing 32 copies of `needle`, `needle_vec`, so that we can compare each byte against
    // the needle in parallel using `simd_eq`, producing a mask.
    //
    // This SIMD mask is a vector of booleans, like:
    //   [false, false, false, true, false, ...]
    //
    // We convert this mask into a compact bitmask with `to_bitmask()`, which gives us
    // a u64 where each bit corresponds to an entry in the SIMD mask:
    //
    // - A `true` in the SIMD mask becomes a `1` in the bitmask.
    // - A `false` becomes a `0`.
    //
    // Importantly: the first element of the mask (index 0) maps to the least significant bit
    // (bit 0) in the bitmask, mask[1] maps to bit 1, and so on. This means
    // `bitmask.trailing_zeros()` directly gives us the index of the first `true` in the mask.
    //
    // So if the bitmask is, say, 0b00001000, then the first match was at index 3.
    //
    // We then add the chunk offset to get the actual index in the full haystack.
    let needle_vec = u8x32::splat(needle);
    let mut offset = 0;

    while offset + VECTOR_WIDTH <= haystack.len() {
        let chunk = &haystack[offset..offset + VECTOR_WIDTH];
        let haystack_vec = u8x32::from_slice(chunk);

        let mask = haystack_vec.simd_eq(needle_vec);
        let bitmask = mask.to_bitmask();

        if bitmask != 0 {
            return Some(offset + bitmask.trailing_zeros() as usize);
        }

        offset += VECTOR_WIDTH;
    }

    // Handle remaining bytes (less than 32) that couldn't be processed with SIMD
    haystack[offset..]
        .iter()
        .position(|&b| b == needle)
        .map(|pos| offset + pos)
}
```
Intrinsics (NEON) SIMD Solution
```rust
// NEON intrinsics live in std::arch::aarch64 and are only available when
// compiling for AArch64.
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*;

#[cfg(target_arch = "aarch64")]
pub fn intrinsics_simd_find(haystack: &[u8], needle: u8) -> Option<usize> {
    const VECTOR_WIDTH: usize = 16;

    // For short strings (less than 16 bytes), use simple iteration instead of SIMD
    if haystack.len() < VECTOR_WIDTH {
        return haystack.iter().position(|&b| b == needle);
    }

    // Process strings of 16+ bytes using SIMD instructions
    //
    // SIMD Strategy:
    // 1. Load 16 bytes of haystack into a SIMD vector
    // 2. Create a SIMD vector filled with 16 copies of the needle byte
    // 3. Compare all 16 positions simultaneously using vceqq_u8
    // 4. This produces a mask where matching positions become 0xFF, non-matches become 0x00
    //
    // Example mask: [0x00, 0x00, 0xFF, 0x00, 0xFF, 0x00, ...]
    //               (no match, no match, MATCH, no match, MATCH, no match, ...)
    //
    // Fast Detection:
    // - Use vmaxvq_u8 to find the maximum value in the mask
    // - If result is 0, no matches found (all bytes were 0x00)
    // - If result is non-zero, at least one match exists (at least one 0xFF present)
    //
    // Finding Exact Position:
    // When matches are found, we need to determine their exact locations:
    // 1. Reinterpret the 16-byte mask as two 64-bit integers for easier processing
    // 2. Check the lower 64 bits (bytes 0-7) and upper 64 bits (bytes 8-15) separately
    // 3. Within each 64-bit section, examine each byte by shifting and masking
    //    - Right-shift by (i * 8) bits to move byte i to the lowest position
    //    - Apply mask 0xFF to isolate just that byte
    //    - Check if the result equals 0xFF (indicating a match)
    //
    // Importantly: we use right bit shifts because of little-endian byte ordering: the byte
    // from haystack position 0 gets packed into the least significant (rightmost) bits of the
    // 64-bit integer, position 1 goes into the next 8 bits, and so on. By shifting right by
    // `i * 8` bits, we move the byte from haystack position `i` into the least significant
    // position where we can extract it with a simple mask.
    let needle_vec = unsafe { vdupq_n_u8(needle) };
    let mut offset = 0;

    while offset + VECTOR_WIDTH <= haystack.len() {
        unsafe {
            let haystack_ptr = haystack.as_ptr().add(offset);
            let haystack_vec = vld1q_u8(haystack_ptr);
            let byte_mask = vceqq_u8(haystack_vec, needle_vec);

            if vmaxvq_u8(byte_mask) != 0 {
                let mask_as_u64x2 = vreinterpretq_u64_u8(byte_mask);
                let low_bits = vgetq_lane_u64::<0>(mask_as_u64x2);
                let high_bits = vgetq_lane_u64::<1>(mask_as_u64x2);

                if low_bits != 0 {
                    for i in 0..8 {
                        if ((low_bits >> (i * 8)) & 0xFF) == 0xFF {
                            return Some(offset + i);
                        }
                    }
                } else {
                    for i in 0..8 {
                        if ((high_bits >> (i * 8)) & 0xFF) == 0xFF {
                            return Some(offset + 8 + i);
                        }
                    }
                }
            }
        }

        offset += VECTOR_WIDTH;
    }

    // Handle remaining bytes (less than 16) that couldn't be processed with SIMD
    haystack[offset..]
        .iter()
        .position(|&b| b == needle)
        .map(|pos| offset + pos)
}
```
Inline Assembly Solution
TO DO
Git
Setup
`git init <directory>`
Create an empty Git repository in `<directory>`.
Cloning & Remote
`git clone <repo-url>`
Clone the repository at `<repo-url>` onto the local machine.
`git remote -v`
Show current remotes and their fetch/push URLs.
`git remote add <remote> <repo-url>`
Add a new remote named `<remote>` pointing to `<repo-url>`.
Staging & Committing
`git add <file-or-directory>`
Stage changes in `<file-or-directory>` for the next commit.
`git commit -m "<message>"`
Commit staged changes using `<message>` as the commit message.
`git status`
List which files are staged, unstaged, and untracked.
Stashing
`git stash`
Temporarily save uncommitted changes.
`git stash pop`
Re-apply the most recently stashed changes and remove them from the stash.
Branching
`git branch`
List all local branches and highlight the current one.
`git branch <branch>`
Create a new branch named `<branch>`.
`git switch <branch>`
Switch to the branch `<branch>`.
`git switch -c <branch>`
Create and switch to a new branch named `<branch>`.
Pushing & Pulling
`git push <remote> <branch>`
Push committed changes to `<remote>` on `<branch>`. Creates `<branch>` on the remote if it doesn't exist.
`git pull <remote> <branch>`
Fetch and merge changes from `<branch>` on `<remote>` into the current branch.
`git pull --rebase <remote> <branch>`
Fetch changes from `<branch>` on `<remote>` and rebase the current branch on top of them.
Viewing History
`git log --oneline --graph --all`
Show a condensed visual commit history of all branches.
`git diff`
Show unstaged changes between the working directory and the index.
Non-destructive Changes
TO DO
Destructive Changes
`git commit --amend`
With nothing staged, edit the last commit's message.