Several bug fixes and improvements (#1178)
* Update pythontutor block with the latest code
* Move docs-en to en/docs
* Move mkdocs.yml and README to en folder
* Fix en/mkdocs.yml
* Update the landing page
* Fix the glossary
* Reduce the font size of the code block tabs
* Add Kotlin blocks to en/docs
* Fix the code link in en/.../deque.md
* Fix the EN README link
New binary files (chapter cover images):

- en/docs/assets/covers/chapter_appendix.jpg
- en/docs/assets/covers/chapter_array_and_linkedlist.jpg
- en/docs/assets/covers/chapter_backtracking.jpg
- en/docs/assets/covers/chapter_complexity_analysis.jpg
- en/docs/assets/covers/chapter_data_structure.jpg
- en/docs/assets/covers/chapter_divide_and_conquer.jpg
- en/docs/assets/covers/chapter_dynamic_programming.jpg
- en/docs/assets/covers/chapter_graph.jpg
- en/docs/assets/covers/chapter_greedy.jpg
- en/docs/assets/covers/chapter_hashing.jpg
- en/docs/assets/covers/chapter_heap.jpg
- en/docs/assets/covers/chapter_introduction.jpg
- en/docs/assets/covers/chapter_preface.jpg
- en/docs/assets/covers/chapter_searching.jpg
- en/docs/assets/covers/chapter_sorting.jpg
- en/docs/assets/covers/chapter_stack_and_queue.jpg
- en/docs/assets/covers/chapter_tree.jpg

en/docs/chapter_array_and_linkedlist/array.md (new executable file, +216)
@@ -0,0 +1,216 @@

# Arrays

An "array" is a linear data structure that stores elements of the same type in contiguous memory spaces. The position of each element in this sequence is identified by a unique "index". The figure below illustrates how arrays work and the key terms involved.

![array definition](array.assets/)

## Common Operations on Arrays

### Initializing Arrays

Arrays can be initialized in two ways depending on the needs: either without initial values or with specified initial values. When initial values are not specified, most programming languages set the array elements to $0$:

=== "Python"

    ```python title="array.py"
    # Initialize array
    arr: list[int] = [0] * 5  # [ 0, 0, 0, 0, 0 ]
    nums: list[int] = [1, 3, 2, 5, 4]
    ```

=== "C++"

    ```cpp title="array.cpp"
    /* Initialize array */
    // Stored on stack
    int arr[5];
    int nums[5] = { 1, 3, 2, 5, 4 };
    // Stored on heap (manual memory release needed)
    int* arr1 = new int[5];
    int* nums1 = new int[5] { 1, 3, 2, 5, 4 };
    ```

=== "Java"

    ```java title="array.java"
    /* Initialize array */
    int[] arr = new int[5]; // { 0, 0, 0, 0, 0 }
    int[] nums = { 1, 3, 2, 5, 4 };
    ```

=== "C#"

    ```csharp title="array.cs"
    /* Initialize array */
    int[] arr = new int[5]; // [ 0, 0, 0, 0, 0 ]
    int[] nums = [1, 3, 2, 5, 4];
    ```

=== "Go"

    ```go title="array.go"
    /* Initialize array */
    var arr [5]int
    // In Go, specifying the length ([5]int) denotes an array, while not specifying it ([]int) denotes a slice.
    // Since Go's arrays are designed to have a compile-time fixed length, only constants can be used to specify the length.
    // For convenience in implementing the extend() method, the slice will be treated as an array here.
    nums := []int{1, 3, 2, 5, 4}
    ```

=== "Swift"

    ```swift title="array.swift"
    /* Initialize array */
    let arr = Array(repeating: 0, count: 5) // [0, 0, 0, 0, 0]
    let nums = [1, 3, 2, 5, 4]
    ```

=== "JS"

    ```javascript title="array.js"
    /* Initialize array */
    var arr = new Array(5).fill(0);
    var nums = [1, 3, 2, 5, 4];
    ```

=== "TS"

    ```typescript title="array.ts"
    /* Initialize array */
    let arr: number[] = new Array(5).fill(0);
    let nums: number[] = [1, 3, 2, 5, 4];
    ```

=== "Dart"

    ```dart title="array.dart"
    /* Initialize array */
    List<int> arr = List.filled(5, 0); // [0, 0, 0, 0, 0]
    List<int> nums = [1, 3, 2, 5, 4];
    ```

=== "Rust"

    ```rust title="array.rs"
    /* Initialize array */
    let arr: Vec<i32> = vec![0; 5]; // [0, 0, 0, 0, 0]
    let nums: Vec<i32> = vec![1, 3, 2, 5, 4];
    ```

=== "C"

    ```c title="array.c"
    /* Initialize array */
    int arr[5] = { 0 }; // { 0, 0, 0, 0, 0 }
    int nums[5] = { 1, 3, 2, 5, 4 };
    ```

=== "Kotlin"

    ```kotlin title="array.kt"

    ```

=== "Zig"

    ```zig title="array.zig"
    // Initialize array
    var arr = [_]i32{0} ** 5; // { 0, 0, 0, 0, 0 }
    var nums = [_]i32{ 1, 3, 2, 5, 4 };
    ```

### Accessing Elements

Elements in an array are stored in contiguous memory spaces, which makes computing an element's memory address straightforward. As shown in the figure below, an element's address can be derived from the array's memory address (that is, the first element's address) and the element's index, enabling direct access to the desired element.

![memory address calculation](array.assets/)

As observed in the figure above, array indexing conventionally begins at $0$. While this might appear counterintuitive given that counting usually starts at $1$, within the address calculation formula **an index is essentially an offset from the starting memory address**. The first element's offset is $0$, so its index being $0$ is consistent.

Accessing elements in an array is highly efficient: we can randomly access any element in $O(1)$ time.

```src
[file]{array}-[class]{}-[func]{random_access}
```
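
Blocks marked `src` are placeholders that the book's build pipeline replaces with the actual source files. As an illustrative Python sketch (not necessarily the repository's exact code), the referenced `random_access` function could look like this:

```python
import random

def random_access(nums: list[int]) -> int:
    """Randomly access an element of the array"""
    # Pick a random index in [0, len(nums) - 1]
    random_index = random.randint(0, len(nums) - 1)
    # Retrieve and return the element; indexing takes O(1) time
    return nums[random_index]
```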

### Inserting Elements

Array elements are tightly packed in memory, with no room to accommodate additional data between them. As illustrated in the figure below, inserting an element in the middle of an array requires shifting all subsequent elements back by one position to create room for the new element.

![array insertion](array.assets/)

It's important to note that, because the length of an array is fixed, inserting an element unavoidably results in the loss of the last element in the array. Solutions to this issue are explored in the "List" chapter.

```src
[file]{array}-[class]{}-[func]{insert}
```
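
An illustrative Python sketch of the referenced `insert` (elements from `index` onward are shifted back one position, so the original last element is overwritten):

```python
def insert(nums: list[int], num: int, index: int):
    """Insert num into the array at index"""
    # Move all elements at and after index back by one position
    for i in range(len(nums) - 1, index, -1):
        nums[i] = nums[i - 1]
    # Assign num to the element at index
    nums[index] = num
```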

### Deleting Elements

Similarly, as depicted in the figure below, to delete an element at index $i$, all elements after index $i$ must be moved forward by one position.

![array deletion](array.assets/)

Note that after deletion, the former last element becomes "meaningless", so it requires no specific modification.

```src
[file]{array}-[class]{}-[func]{remove}
```
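
An illustrative Python sketch of the referenced `remove`:

```python
def remove(nums: list[int], index: int):
    """Remove the element at index from the array"""
    # Move all elements after index forward by one position
    for i in range(index, len(nums) - 1):
        nums[i] = nums[i + 1]
```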

In summary, the insertion and deletion operations in arrays have the following disadvantages:

- **High Time Complexity**: Both insertion and deletion in an array have an average time complexity of $O(n)$, where $n$ is the length of the array.
- **Loss of Elements**: Due to the fixed length of arrays, elements that exceed the array's capacity are lost during insertion.
- **Waste of Memory**: Initializing a longer array and using only the front part leaves "meaningless" end elements during insertion, wasting some memory space.

### Traversing Arrays

In most programming languages, we can traverse an array either by using indices or by directly iterating over each element:

```src
[file]{array}-[class]{}-[func]{traverse}
```
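
An illustrative Python sketch of the referenced `traverse`, showing both styles:

```python
def traverse(nums: list[int]):
    """Traverse the array"""
    count = 0
    # Traverse the array by index
    for i in range(len(nums)):
        count += nums[i]
    # Traverse the elements directly
    for num in nums:
        count += num
```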

### Finding Elements

Locating a specific element in an array involves iterating through it, checking each element to determine whether it matches the target value.

Because arrays are linear data structures, this operation is commonly referred to as "linear search".

```src
[file]{array}-[class]{}-[func]{find}
```
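
An illustrative Python sketch of the referenced `find`:

```python
def find(nums: list[int], target: int) -> int:
    """Search for target in the array; return its index, or -1 if absent"""
    for i in range(len(nums)):
        if nums[i] == target:
            return i
    return -1
```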

### Expanding Arrays

In complex system environments, it is hard to guarantee that the memory space following an array remains available, which makes extending an array's capacity in place unsafe. Consequently, in most programming languages, **the length of an array is immutable**.

To expand an array, we need to create a larger array and copy over the elements from the original one. This operation has a time complexity of $O(n)$ and can be time-consuming for large arrays. The code is as follows:

```src
[file]{array}-[class]{}-[func]{extend}
```
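
An illustrative Python sketch of the referenced `extend` (Python lists resize automatically, so this exists purely to demonstrate the copy-based mechanism):

```python
def extend(nums: list[int], enlarge: int) -> list[int]:
    """Extend the array by enlarge positions"""
    # Initialize a longer array, filled with 0
    res = [0] * (len(nums) + enlarge)
    # Copy all elements from the original array into the new one: O(n)
    for i in range(len(nums)):
        res[i] = nums[i]
    # Return the extended array
    return res
```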

## Advantages and Limitations of Arrays

Arrays are stored in contiguous memory spaces and consist of elements of the same type. This provides substantial prior information that systems can leverage to optimize the efficiency of data structure operations.

- **High Space Efficiency**: Arrays allocate a contiguous block of memory for data, eliminating the need for additional structural overhead.
- **Support for Random Access**: Arrays allow $O(1)$ time access to any element.
- **Cache Locality**: When accessing array elements, the computer not only loads them but also caches the surrounding data, using the high-speed cache to enhance the speed of subsequent operations.

However, contiguous space storage is a double-edged sword, with the following limitations:

- **Low Efficiency in Insertion and Deletion**: As arrays accumulate many elements, inserting or deleting elements requires shifting a large number of elements.
- **Fixed Length**: The length of an array is fixed after initialization. Expanding an array requires copying all data to a new array, incurring significant costs.
- **Space Wastage**: If the allocated array size exceeds what is necessary, the extra space is wasted.

## Typical Applications of Arrays

Arrays are fundamental and widely used data structures. They appear frequently in various algorithms and are used to implement complex data structures.

- **Random Access**: Arrays are ideal for storing data when random sampling is required. By generating a random sequence of indices, we can achieve random sampling efficiently.
- **Sorting and Searching**: Arrays are the most commonly used data structure for sorting and searching algorithms. Techniques like quick sort, merge sort, and binary search primarily operate on arrays.
- **Lookup Tables**: Arrays serve as efficient lookup tables for quick element or relationship retrieval. For instance, mapping characters to ASCII codes becomes seamless by using the ASCII code values as indices and storing the corresponding elements in the array.
- **Machine Learning**: Within the domain of neural networks, arrays play a pivotal role in executing crucial linear algebra operations involving vectors, matrices, and tensors. Arrays serve as the primary and most extensively used data structure in neural network programming.
- **Data Structure Implementation**: Arrays serve as the building blocks for implementing various data structures like stacks, queues, hash tables, heaps, and graphs. For instance, the adjacency matrix representation of a graph is essentially a two-dimensional array.

en/docs/chapter_array_and_linkedlist/index.md (new file, +9)
@@ -0,0 +1,9 @@

# Arrays and Linked Lists

![arrays and linked lists](../assets/covers/chapter_array_and_linkedlist.jpg)

!!! abstract

    The world of data structures resembles a sturdy brick wall.

    In arrays, the bricks sit snugly side by side, each resting seamlessly against the next in a unified formation. In linked lists, the bricks are scattered freely, with vines gracefully knitting connections between them.

en/docs/chapter_array_and_linkedlist/linked_list.md (new executable file, +686)
@@ -0,0 +1,686 @@

# Linked Lists

Memory space is a shared resource among all programs. In a complex system environment, available memory can be dispersed throughout the memory space. We know that the memory allocated for an array must be contiguous; for very large arrays, however, finding a sufficiently large contiguous block of memory can be challenging. This is where the flexibility of linked lists becomes evident.

A "linked list" is a linear data structure in which each element is a node object, and the nodes are interconnected through "references". A reference holds the memory address of the next node, enabling navigation from one node to the next.

The design of linked lists allows their nodes to be distributed across memory locations without requiring contiguous memory addresses.

![linked list definition](linked_list.assets/)

As shown in the figure, the basic building block of a linked list is the "node" object. Each node comprises two key components: the node's "value" and a "reference" to the next node.

- The first node in a linked list is the "head node", and the final one is the "tail node".
- The tail node points to "null", designated as `null` in Java, `nullptr` in C++, and `None` in Python.
- In languages that support pointers, like C, C++, Go, and Rust, this "reference" is typically implemented as a "pointer".

As the code below illustrates, a `ListNode` in a linked list, besides holding a value, must also maintain an additional reference (or pointer). Therefore, **a linked list occupies more memory space than an array when storing the same quantity of data**.

=== "Python"

    ```python title=""
    class ListNode:
        """Linked List Node Class"""
        def __init__(self, val: int):
            self.val: int = val                # Node value
            self.next: ListNode | None = None  # Reference to the next node
    ```

=== "C++"

    ```cpp title=""
    /* Linked List Node Structure */
    struct ListNode {
        int val;        // Node value
        ListNode *next; // Pointer to the next node
        ListNode(int x) : val(x), next(nullptr) {} // Constructor
    };
    ```

=== "Java"

    ```java title=""
    /* Linked List Node Class */
    class ListNode {
        int val;       // Node value
        ListNode next; // Reference to the next node
        ListNode(int x) { val = x; } // Constructor
    }
    ```

=== "C#"

    ```csharp title=""
    /* Linked List Node Class */
    class ListNode(int x) { // Constructor
        int val = x;        // Node value
        ListNode? next;     // Reference to the next node
    }
    ```

=== "Go"

    ```go title=""
    /* Linked List Node Structure */
    type ListNode struct {
        Val  int       // Node value
        Next *ListNode // Pointer to the next node
    }

    // NewListNode Constructor, creates a new linked list node
    func NewListNode(val int) *ListNode {
        return &ListNode{
            Val:  val,
            Next: nil,
        }
    }
    ```

=== "Swift"

    ```swift title=""
    /* Linked List Node Class */
    class ListNode {
        var val: Int        // Node value
        var next: ListNode? // Reference to the next node

        init(x: Int) { // Constructor
            val = x
        }
    }
    ```

=== "JS"

    ```javascript title=""
    /* Linked List Node Class */
    class ListNode {
        constructor(val, next) {
            this.val = (val === undefined ? 0 : val);       // Node value
            this.next = (next === undefined ? null : next); // Reference to the next node
        }
    }
    ```

=== "TS"

    ```typescript title=""
    /* Linked List Node Class */
    class ListNode {
        val: number;
        next: ListNode | null;
        constructor(val?: number, next?: ListNode | null) {
            this.val = val === undefined ? 0 : val;       // Node value
            this.next = next === undefined ? null : next; // Reference to the next node
        }
    }
    ```

=== "Dart"

    ```dart title=""
    /* Linked List Node Class */
    class ListNode {
        int val;        // Node value
        ListNode? next; // Reference to the next node
        ListNode(this.val, [this.next]); // Constructor
    }
    ```

=== "Rust"

    ```rust title=""
    use std::rc::Rc;
    use std::cell::RefCell;

    /* Linked List Node Class */
    #[derive(Debug)]
    struct ListNode {
        val: i32,                            // Node value
        next: Option<Rc<RefCell<ListNode>>>, // Pointer to the next node
    }
    ```

=== "C"

    ```c title=""
    /* Linked List Node Structure */
    typedef struct ListNode {
        int val;               // Node value
        struct ListNode *next; // Pointer to the next node
    } ListNode;

    /* Constructor */
    ListNode *newListNode(int val) {
        ListNode *node;
        node = (ListNode *) malloc(sizeof(ListNode));
        node->val = val;
        node->next = NULL;
        return node;
    }
    ```

=== "Kotlin"

    ```kotlin title=""

    ```

=== "Zig"

    ```zig title=""
    // Linked List Node Class
    pub fn ListNode(comptime T: type) type {
        return struct {
            const Self = @This();

            val: T = 0,          // Node value
            next: ?*Self = null, // Pointer to the next node

            // Constructor
            pub fn init(self: *Self, x: i32) void {
                self.val = x;
                self.next = null;
            }
        };
    }
    ```

## Common Operations on Linked Lists

### Initializing a Linked List

Constructing a linked list is a two-step process: first initializing each node object, then forming the reference links between the nodes. After initialization, we can traverse all nodes sequentially from the head node by following the `next` references.

=== "Python"

    ```python title="linked_list.py"
    # Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4
    # Initialize each node
    n0 = ListNode(1)
    n1 = ListNode(3)
    n2 = ListNode(2)
    n3 = ListNode(5)
    n4 = ListNode(4)
    # Build references between nodes
    n0.next = n1
    n1.next = n2
    n2.next = n3
    n3.next = n4
    ```

=== "C++"

    ```cpp title="linked_list.cpp"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    ListNode* n0 = new ListNode(1);
    ListNode* n1 = new ListNode(3);
    ListNode* n2 = new ListNode(2);
    ListNode* n3 = new ListNode(5);
    ListNode* n4 = new ListNode(4);
    // Build references between nodes
    n0->next = n1;
    n1->next = n2;
    n2->next = n3;
    n3->next = n4;
    ```

=== "Java"

    ```java title="linked_list.java"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    ListNode n0 = new ListNode(1);
    ListNode n1 = new ListNode(3);
    ListNode n2 = new ListNode(2);
    ListNode n3 = new ListNode(5);
    ListNode n4 = new ListNode(4);
    // Build references between nodes
    n0.next = n1;
    n1.next = n2;
    n2.next = n3;
    n3.next = n4;
    ```

=== "C#"

    ```csharp title="linked_list.cs"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    ListNode n0 = new(1);
    ListNode n1 = new(3);
    ListNode n2 = new(2);
    ListNode n3 = new(5);
    ListNode n4 = new(4);
    // Build references between nodes
    n0.next = n1;
    n1.next = n2;
    n2.next = n3;
    n3.next = n4;
    ```

=== "Go"

    ```go title="linked_list.go"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    n0 := NewListNode(1)
    n1 := NewListNode(3)
    n2 := NewListNode(2)
    n3 := NewListNode(5)
    n4 := NewListNode(4)
    // Build references between nodes
    n0.Next = n1
    n1.Next = n2
    n2.Next = n3
    n3.Next = n4
    ```

=== "Swift"

    ```swift title="linked_list.swift"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    let n0 = ListNode(x: 1)
    let n1 = ListNode(x: 3)
    let n2 = ListNode(x: 2)
    let n3 = ListNode(x: 5)
    let n4 = ListNode(x: 4)
    // Build references between nodes
    n0.next = n1
    n1.next = n2
    n2.next = n3
    n3.next = n4
    ```

=== "JS"

    ```javascript title="linked_list.js"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    const n0 = new ListNode(1);
    const n1 = new ListNode(3);
    const n2 = new ListNode(2);
    const n3 = new ListNode(5);
    const n4 = new ListNode(4);
    // Build references between nodes
    n0.next = n1;
    n1.next = n2;
    n2.next = n3;
    n3.next = n4;
    ```

=== "TS"

    ```typescript title="linked_list.ts"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    const n0 = new ListNode(1);
    const n1 = new ListNode(3);
    const n2 = new ListNode(2);
    const n3 = new ListNode(5);
    const n4 = new ListNode(4);
    // Build references between nodes
    n0.next = n1;
    n1.next = n2;
    n2.next = n3;
    n3.next = n4;
    ```

=== "Dart"

    ```dart title="linked_list.dart"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    ListNode n0 = ListNode(1);
    ListNode n1 = ListNode(3);
    ListNode n2 = ListNode(2);
    ListNode n3 = ListNode(5);
    ListNode n4 = ListNode(4);
    // Build references between nodes
    n0.next = n1;
    n1.next = n2;
    n2.next = n3;
    n3.next = n4;
    ```

=== "Rust"

    ```rust title="linked_list.rs"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    let n0 = Rc::new(RefCell::new(ListNode { val: 1, next: None }));
    let n1 = Rc::new(RefCell::new(ListNode { val: 3, next: None }));
    let n2 = Rc::new(RefCell::new(ListNode { val: 2, next: None }));
    let n3 = Rc::new(RefCell::new(ListNode { val: 5, next: None }));
    let n4 = Rc::new(RefCell::new(ListNode { val: 4, next: None }));

    // Build references between nodes
    n0.borrow_mut().next = Some(n1.clone());
    n1.borrow_mut().next = Some(n2.clone());
    n2.borrow_mut().next = Some(n3.clone());
    n3.borrow_mut().next = Some(n4.clone());
    ```

=== "C"

    ```c title="linked_list.c"
    /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */
    // Initialize each node
    ListNode* n0 = newListNode(1);
    ListNode* n1 = newListNode(3);
    ListNode* n2 = newListNode(2);
    ListNode* n3 = newListNode(5);
    ListNode* n4 = newListNode(4);
    // Build references between nodes
    n0->next = n1;
    n1->next = n2;
    n2->next = n3;
    n3->next = n4;
    ```

=== "Kotlin"

    ```kotlin title="linked_list.kt"

    ```

=== "Zig"

    ```zig title="linked_list.zig"
    // Initialize linked list
    // Initialize each node
    var n0 = inc.ListNode(i32){.val = 1};
    var n1 = inc.ListNode(i32){.val = 3};
    var n2 = inc.ListNode(i32){.val = 2};
    var n3 = inc.ListNode(i32){.val = 5};
    var n4 = inc.ListNode(i32){.val = 4};
    // Build references between nodes
    n0.next = &n1;
    n1.next = &n2;
    n2.next = &n3;
    n3.next = &n4;
    ```

An array as a whole is a single variable: for instance, the array `nums` includes elements like `nums[0]` and `nums[1]`. A linked list, by contrast, is made up of several distinct node objects. **We typically refer to a linked list by its head node**; for example, the linked list in the previous code snippet is referred to as `n0`.

### Inserting a Node

Inserting a node into a linked list is very easy. As shown in the figure below, assume we want to insert a new node `P` between two adjacent nodes `n0` and `n1`. **This requires modifying only two node references (pointers)**, with a time complexity of $O(1)$.

By comparison, inserting an element into an array has a time complexity of $O(n)$, which becomes less efficient with large data volumes.

![linked list insertion](linked_list.assets/)

```src
[file]{linked_list}-[class]{}-[func]{insert}
```
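
An illustrative Python sketch of the referenced `insert`:

```python
def insert(n0: ListNode, P: ListNode):
    """Insert node P right after node n0 in the linked list"""
    n1 = n0.next
    P.next = n1
    n0.next = P
```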

### Deleting a Node

As shown in the figure below, deleting a node from a linked list is also very easy, **requiring the modification of only a single node's reference (pointer)**.

Note that even though node `P` still points to `n1` after being deleted, it is no longer reachable when traversing the linked list. This effectively means that `P` is no longer part of the list.

![linked list deletion](linked_list.assets/)

```src
[file]{linked_list}-[class]{}-[func]{remove}
```
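
An illustrative Python sketch of the referenced `remove`:

```python
def remove(n0: ListNode):
    """Remove the first node after node n0 in the linked list"""
    if not n0.next:
        return
    # n0 -> P -> n1
    P = n0.next
    n1 = P.next
    n0.next = n1
```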

### Accessing Nodes

**Accessing nodes in a linked list is less efficient**. As mentioned above, any element of an array can be accessed in $O(1)$ time. In a linked list, however, the program must start from the head node and traverse nodes one by one until it reaches the desired node. In other words, accessing the $i$-th node of a linked list requires iterating through $i - 1$ nodes, giving a time complexity of $O(n)$.

```src
[file]{linked_list}-[class]{}-[func]{access}
```
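
An illustrative Python sketch of the referenced `access`:

```python
def access(head: ListNode, index: int) -> ListNode | None:
    """Access the node at index in the linked list, counting from head"""
    for _ in range(index):
        if not head:
            return None
        head = head.next
    return head
```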

### Finding Nodes

Traverse the linked list to find a node whose value matches `target`, then output that node's index within the list. This procedure is another example of linear search. The corresponding code is as follows:

```src
[file]{linked_list}-[class]{}-[func]{find}
```
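
An illustrative Python sketch of the referenced `find`:

```python
def find(head: ListNode, target: int) -> int:
    """Search the linked list for the first node with value target"""
    index = 0
    while head:
        if head.val == target:
            return index
        head = head.next
        index += 1
    return -1
```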

## Arrays vs. Linked Lists

The table below summarizes the characteristics of arrays and linked lists and compares their efficiency in various operations. Because they use opposite storage strategies, their properties and operational efficiencies contrast sharply.

<p align="center"> Table <id> &nbsp; Efficiency Comparison of Arrays and Linked Lists </p>

|                    | Arrays                                           | Linked Lists            |
| ------------------ | ------------------------------------------------ | ----------------------- |
| Storage            | Contiguous Memory Space                          | Dispersed Memory Space  |
| Capacity Expansion | Fixed Length                                     | Flexible Expansion      |
| Memory Efficiency  | Less Memory per Element, Potential Space Wastage | More Memory per Element |
| Accessing Elements | $O(1)$                                           | $O(n)$                  |
| Adding Elements    | $O(n)$                                           | $O(1)$                  |
| Deleting Elements  | $O(n)$                                           | $O(1)$                  |

## Common Types of Linked Lists

As shown in the figure below, there are three common types of linked lists.

- **Singly Linked List**: This is the standard linked list described earlier. Nodes in a singly linked list include a value and a reference to the next node. The first node is known as the head node, and the last node, which points to null (`None`), is the tail node.
- **Circular Linked List**: This is formed when the tail node of a singly linked list points back to the head node, creating a loop. In a circular linked list, any node can function as the head node.
- **Doubly Linked List**: In contrast to a singly linked list, a doubly linked list maintains references in two directions. Each node contains a reference (pointer) to both its successor (the next node) and its predecessor (the previous node). Although doubly linked lists offer more flexibility for traversing in either direction, they also consume more memory space.

=== "Python"

    ```python title=""
    class ListNode:
        """Bidirectional linked list node class"""
        def __init__(self, val: int):
            self.val: int = val                # Node value
            self.next: ListNode | None = None  # Reference to the successor node
            self.prev: ListNode | None = None  # Reference to the predecessor node
    ```

=== "C++"

    ```cpp title=""
    /* Bidirectional linked list node structure */
    struct ListNode {
        int val;        // Node value
        ListNode *next; // Pointer to the successor node
        ListNode *prev; // Pointer to the predecessor node
        ListNode(int x) : val(x), next(nullptr), prev(nullptr) {} // Constructor
    };
    ```

=== "Java"

    ```java title=""
    /* Bidirectional linked list node class */
    class ListNode {
        int val;       // Node value
        ListNode next; // Reference to the successor node
        ListNode prev; // Reference to the predecessor node
        ListNode(int x) { val = x; } // Constructor
    }
    ```

=== "C#"

    ```csharp title=""
    /* Bidirectional linked list node class */
    class ListNode(int x) { // Constructor
        int val = x;        // Node value
        ListNode? next;     // Reference to the successor node
        ListNode? prev;     // Reference to the predecessor node
    }
    ```

=== "Go"

    ```go title=""
    /* Bidirectional linked list node structure */
    type DoublyListNode struct {
        Val  int             // Node value
        Next *DoublyListNode // Pointer to the successor node
        Prev *DoublyListNode // Pointer to the predecessor node
    }

    // NewDoublyListNode initialization
    func NewDoublyListNode(val int) *DoublyListNode {
        return &DoublyListNode{
            Val:  val,
            Next: nil,
            Prev: nil,
        }
    }
    ```

=== "Swift"

    ```swift title=""
    /* Bidirectional linked list node class */
    class ListNode {
        var val: Int        // Node value
        var next: ListNode? // Reference to the successor node
        var prev: ListNode? // Reference to the predecessor node

        init(x: Int) { // Constructor
            val = x
        }
    }
    ```

=== "JS"

    ```javascript title=""
    /* Bidirectional linked list node class */
    class ListNode {
        constructor(val, next, prev) {
            this.val = val === undefined ? 0 : val;        // Node value
            this.next = next === undefined ? null : next;  // Reference to the successor node
            this.prev = prev === undefined ? null : prev;  // Reference to the predecessor node
        }
    }
    ```

=== "TS"

    ```typescript title=""
    /* Bidirectional linked list node class */
    class ListNode {
        val: number;
        next: ListNode | null;
        prev: ListNode | null;
        constructor(val?: number, next?: ListNode | null, prev?: ListNode | null) {
            this.val = val === undefined ? 0 : val;        // Node value
            this.next = next === undefined ? null : next;  // Reference to the successor node
            this.prev = prev === undefined ? null : prev;  // Reference to the predecessor node
        }
    }
    ```

=== "Dart"

    ```dart title=""
    /* Bidirectional linked list node class */
    class ListNode {
        int val;        // Node value
        ListNode? next; // Reference to the successor node
        ListNode? prev; // Reference to the predecessor node
        ListNode(this.val, [this.next, this.prev]); // Constructor
    }
    ```

=== "Rust"

    ```rust title=""
    use std::rc::Rc;
    use std::cell::RefCell;

    /* Bidirectional linked list node type */
    #[derive(Debug)]
    struct ListNode {
        val: i32,                            // Node value
        next: Option<Rc<RefCell<ListNode>>>, // Pointer to the successor node
        prev: Option<Rc<RefCell<ListNode>>>, // Pointer to the predecessor node
    }

    /* Constructor */
    impl ListNode {
        fn new(val: i32) -> Self {
            ListNode {
                val,
                next: None,
                prev: None,
            }
        }
    }
    ```

=== "C"

    ```c title=""
    /* Bidirectional linked list node structure */
    typedef struct ListNode {
        int val;               // Node value
        struct ListNode *next; // Pointer to the successor node
        struct ListNode *prev; // Pointer to the predecessor node
    } ListNode;

    /* Constructor */
    ListNode *newListNode(int val) {
        ListNode *node;
        node = (ListNode *) malloc(sizeof(ListNode));
        node->val = val;
        node->next = NULL;
        node->prev = NULL;
        return node;
    }
    ```

=== "Kotlin"

    ```kotlin title=""

    ```

=== "Zig"

    ```zig title=""
    // Bidirectional linked list node class
    pub fn ListNode(comptime T: type) type {
        return struct {
            const Self = @This();

            val: T = 0,          // Node value
            next: ?*Self = null, // Pointer to the successor node
            prev: ?*Self = null, // Pointer to the predecessor node

            // Constructor
            pub fn init(self: *Self, x: i32) void {
                self.val = x;
                self.next = null;
                self.prev = null;
            }
        };
    }
    ```

![common types of linked lists](linked_list.assets/)

## Typical Applications of Linked Lists

Singly linked lists are frequently used to implement stacks, queues, hash tables, and graphs.

- **Stacks and Queues**: When insertions and deletions occur at the same end of a singly linked list, it behaves like a stack (last-in-first-out). When insertions happen at one end and deletions at the other, it functions like a queue (first-in-first-out).
- **Hash Tables**: Linked lists are used in chaining, a popular method for resolving hash collisions, in which all colliding elements are grouped into a linked list.
- **Graphs**: Adjacency lists, a standard method for graph representation, associate each vertex with a linked list whose elements represent the vertices connected to that vertex.

Doubly linked lists are ideal for scenarios that require quick access to both the preceding and succeeding elements.

- **Advanced Data Structures**: In structures like red-black trees and B-trees, accessing a node's parent is essential. This is achieved by storing a reference to the parent node in each node, akin to a doubly linked list.
- **Browser History**: In web browsers, doubly linked lists support navigating back and forward through the history of visited pages.
- **LRU Algorithm**: Doubly linked lists suit Least Recently Used (LRU) cache eviction algorithms, enabling swift identification of the least recently used data and fast node addition and removal.

Circular linked lists are ideal for applications requiring periodic operations, such as resource scheduling in operating systems.

- **Round-Robin Scheduling Algorithm**: In operating systems, round-robin is a common CPU scheduling method that cycles through a group of processes. Each process is assigned a time slice; when it expires, the CPU rotates to the next process. This cyclical operation maps naturally onto a circular linked list, enabling a fair, time-shared system among all processes.
- **Data Buffers**: Circular linked lists are also used in data buffers, as in audio and video players, where the data stream is divided into multiple buffer blocks arranged in a circular fashion for seamless playback.

en/docs/chapter_array_and_linkedlist/list.md (new executable file, +906)
@@ -0,0 +1,906 @@

# List

A "list" is an abstract data structure concept representing an ordered collection of elements. It supports element access, modification, addition, deletion, and traversal, without requiring users to consider capacity limitations. Lists can be implemented based on linked lists or arrays.

- A linked list inherently serves as a list: it supports adding, deleting, searching, and modifying elements, and can dynamically adjust its size.
- Arrays also support these operations, but because their length is immutable, they can only serve as a list with a length limit.

When implementing a list with an array, **the immutability of the length reduces the list's practicality**. Predicting how much data will be stored is often difficult, making it hard to choose an appropriate length: too small, and the list may not meet the requirements; too large, and memory space is wasted.

To solve this problem, we can implement lists using a "dynamic array". It inherits the advantages of arrays and can dynamically expand while the program is running.

In fact, **many programming languages' standard libraries implement lists using dynamic arrays**, such as Python's `list`, Java's `ArrayList`, C++'s `vector`, and C#'s `List`. In the following discussion, we will consider "list" and "dynamic array" as synonymous concepts.
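
To make the expansion mechanism concrete, here is a minimal, illustrative Python sketch of a dynamic array that doubles its capacity when full. The class name `SimpleList` and the initial capacity of $10$ are assumptions made for this sketch, not code from the book:

```python
class SimpleList:
    """Minimal dynamic array sketch: grows by copying into a larger array"""

    def __init__(self):
        self._capacity: int = 10  # Assumed initial capacity
        self._arr: list[int] = [0] * self._capacity
        self._size: int = 0       # Number of elements currently stored

    def append(self, num: int):
        """Append an element; expand first if the underlying array is full"""
        if self._size == self._capacity:
            # Copy all elements into an array twice as long: O(n)
            self._capacity *= 2
            new_arr = [0] * self._capacity
            for i in range(self._size):
                new_arr[i] = self._arr[i]
            self._arr = new_arr
        # The append itself is amortized O(1)
        self._arr[self._size] = num
        self._size += 1
```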

## Common List Operations

### Initializing a List

We typically use two initialization methods: "without initial values" and "with initial values".

=== "Python"

    ```python title="list.py"
    # Initialize list
    # Without initial values
    nums1: list[int] = []
    # With initial values
    nums: list[int] = [1, 3, 2, 5, 4]
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Initialize list */
    // Note: in C++, vector is the equivalent of the nums described here
    // Without initial values
    vector<int> nums1;
    // With initial values
    vector<int> nums = { 1, 3, 2, 5, 4 };
    ```

=== "Java"

    ```java title="list.java"
    /* Initialize list */
    // Without initial values
    List<Integer> nums1 = new ArrayList<>();
    // With initial values (note: the element type must be the wrapper class Integer[], not int[])
    Integer[] numbers = new Integer[] { 1, 3, 2, 5, 4 };
    List<Integer> nums = new ArrayList<>(Arrays.asList(numbers));
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Initialize list */
    // Without initial values
    List<int> nums1 = [];
    // With initial values
    int[] numbers = [1, 3, 2, 5, 4];
    List<int> nums = [.. numbers];
    ```

=== "Go"

    ```go title="list_test.go"
    /* Initialize list */
    // Without initial values
    nums1 := []int{}
    // With initial values
    nums := []int{1, 3, 2, 5, 4}
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Initialize list */
    // Without initial values
    let nums1: [Int] = []
    // With initial values
    var nums = [1, 3, 2, 5, 4]
    ```

=== "JS"

    ```javascript title="list.js"
    /* Initialize list */
    // Without initial values
    const nums1 = [];
    // With initial values
    const nums = [1, 3, 2, 5, 4];
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Initialize list */
    // Without initial values
    const nums1: number[] = [];
    // With initial values
    const nums: number[] = [1, 3, 2, 5, 4];
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Initialize list */
    // Without initial values
    List<int> nums1 = [];
    // With initial values
    List<int> nums = [1, 3, 2, 5, 4];
    ```

=== "Rust"

    ```rust title="list.rs"
    /* Initialize list */
    // Without initial values
    let nums1: Vec<i32> = Vec::new();
    // With initial values
    let nums: Vec<i32> = vec![1, 3, 2, 5, 4];
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Initialize list
    var nums = std.ArrayList(i32).init(std.heap.page_allocator);
    defer nums.deinit();
    try nums.appendSlice(&[_]i32{ 1, 3, 2, 5, 4 });
    ```

### Accessing Elements

Lists are essentially arrays, so they can access and update elements in $O(1)$ time, which is very efficient.

=== "Python"

    ```python title="list.py"
    # Access elements
    num: int = nums[1]  # Access the element at index 1

    # Update elements
    nums[1] = 0  # Update the element at index 1 to 0
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Access elements */
    int num = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "Java"

    ```java title="list.java"
    /* Access elements */
    int num = nums.get(1);  // Access the element at index 1

    /* Update elements */
    nums.set(1, 0);  // Update the element at index 1 to 0
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Access elements */
    int num = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "Go"

    ```go title="list_test.go"
    /* Access elements */
    num := nums[1]  // Access the element at index 1

    /* Update elements */
    nums[1] = 0  // Update the element at index 1 to 0
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Access elements */
    let num = nums[1]  // Access the element at index 1

    /* Update elements */
    nums[1] = 0  // Update the element at index 1 to 0
    ```

=== "JS"

    ```javascript title="list.js"
    /* Access elements */
    const num = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Access elements */
    const num: number = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Access elements */
    int num = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "Rust"

    ```rust title="list.rs"
    /* Access elements */
    let num: i32 = nums[1];  // Access the element at index 1

    /* Update elements */
    nums[1] = 0;  // Update the element at index 1 to 0
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Access elements
    var num = nums.items[1]; // Access the element at index 1

    // Update elements
    nums.items[1] = 0; // Update the element at index 1 to 0
    ```

### Inserting and Removing Elements

Compared to arrays, lists offer more flexibility in adding and removing elements. While adding elements to the end of a list is an $O(1)$ operation, inserting and removing elements elsewhere in the list remains as costly as in arrays, with a time complexity of $O(n)$.

=== "Python"

    ```python title="list.py"
    # Clear list
    nums.clear()

    # Append elements at the end
    nums.append(1)
    nums.append(3)
    nums.append(2)
    nums.append(5)
    nums.append(4)

    # Insert element in the middle
    nums.insert(3, 6)  # Insert number 6 at index 3

    # Remove elements
    nums.pop(3)  # Remove the element at index 3
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Clear list */
    nums.clear();

    /* Append elements at the end */
    nums.push_back(1);
    nums.push_back(3);
    nums.push_back(2);
    nums.push_back(5);
    nums.push_back(4);

    /* Insert element in the middle */
    nums.insert(nums.begin() + 3, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.erase(nums.begin() + 3);  // Remove the element at index 3
    ```

=== "Java"

    ```java title="list.java"
    /* Clear list */
    nums.clear();

    /* Append elements at the end */
    nums.add(1);
    nums.add(3);
    nums.add(2);
    nums.add(5);
    nums.add(4);

    /* Insert element in the middle */
    nums.add(3, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.remove(3);  // Remove the element at index 3
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Clear list */
    nums.Clear();

    /* Append elements at the end */
    nums.Add(1);
    nums.Add(3);
    nums.Add(2);
    nums.Add(5);
    nums.Add(4);

    /* Insert element in the middle */
    nums.Insert(3, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.RemoveAt(3);  // Remove the element at index 3
    ```

=== "Go"

    ```go title="list_test.go"
    /* Clear list */
    nums = nil

    /* Append elements at the end */
    nums = append(nums, 1)
    nums = append(nums, 3)
    nums = append(nums, 2)
    nums = append(nums, 5)
    nums = append(nums, 4)

    /* Insert element in the middle */
    nums = append(nums[:3], append([]int{6}, nums[3:]...)...)  // Insert number 6 at index 3

    /* Remove elements */
    nums = append(nums[:3], nums[4:]...)  // Remove the element at index 3
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Clear list */
    nums.removeAll()

    /* Append elements at the end */
    nums.append(1)
    nums.append(3)
    nums.append(2)
    nums.append(5)
    nums.append(4)

    /* Insert element in the middle */
    nums.insert(6, at: 3)  // Insert number 6 at index 3

    /* Remove elements */
    nums.remove(at: 3)  // Remove the element at index 3
    ```

=== "JS"

    ```javascript title="list.js"
    /* Clear list */
    nums.length = 0;

    /* Append elements at the end */
    nums.push(1);
    nums.push(3);
    nums.push(2);
    nums.push(5);
    nums.push(4);

    /* Insert element in the middle */
    nums.splice(3, 0, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.splice(3, 1);  // Remove the element at index 3
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Clear list */
    nums.length = 0;

    /* Append elements at the end */
    nums.push(1);
    nums.push(3);
    nums.push(2);
    nums.push(5);
    nums.push(4);

    /* Insert element in the middle */
    nums.splice(3, 0, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.splice(3, 1);  // Remove the element at index 3
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Clear list */
    nums.clear();

    /* Append elements at the end */
    nums.add(1);
    nums.add(3);
    nums.add(2);
    nums.add(5);
    nums.add(4);

    /* Insert element in the middle */
    nums.insert(3, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.removeAt(3);  // Remove the element at index 3
    ```

=== "Rust"

    ```rust title="list.rs"
    /* Clear list */
    nums.clear();

    /* Append elements at the end */
    nums.push(1);
    nums.push(3);
    nums.push(2);
    nums.push(5);
    nums.push(4);

    /* Insert element in the middle */
    nums.insert(3, 6);  // Insert number 6 at index 3

    /* Remove elements */
    nums.remove(3);  // Remove the element at index 3
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Clear list
    nums.clearRetainingCapacity();

    // Append elements at the end
    try nums.append(1);
    try nums.append(3);
    try nums.append(2);
    try nums.append(5);
    try nums.append(4);

    // Insert element in the middle
    try nums.insert(3, 6); // Insert number 6 at index 3

    // Remove elements
    _ = nums.orderedRemove(3); // Remove the element at index 3
    ```

### Iterating the List

Similar to arrays, lists can be iterated either by using indices or by directly iterating through each element.

=== "Python"

    ```python title="list.py"
    # Iterate through the list by index
    count = 0
    for i in range(len(nums)):
        count += nums[i]

    # Iterate directly through list elements
    for num in nums:
        count += num
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Iterate through the list by index */
    int count = 0;
    for (int i = 0; i < nums.size(); i++) {
        count += nums[i];
    }

    /* Iterate directly through list elements */
    count = 0;
    for (int num : nums) {
        count += num;
    }
    ```

=== "Java"

    ```java title="list.java"
    /* Iterate through the list by index */
    int count = 0;
    for (int i = 0; i < nums.size(); i++) {
        count += nums.get(i);
    }

    /* Iterate directly through list elements */
    for (int num : nums) {
        count += num;
    }
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Iterate through the list by index */
    int count = 0;
    for (int i = 0; i < nums.Count; i++) {
        count += nums[i];
    }

    /* Iterate directly through list elements */
    count = 0;
    foreach (int num in nums) {
        count += num;
    }
    ```

=== "Go"

    ```go title="list_test.go"
    /* Iterate through the list by index */
    count := 0
    for i := 0; i < len(nums); i++ {
        count += nums[i]
    }

    /* Iterate directly through list elements */
    count = 0
    for _, num := range nums {
        count += num
    }
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Iterate through the list by index */
    var count = 0
    for i in nums.indices {
        count += nums[i]
    }

    /* Iterate directly through list elements */
    count = 0
    for num in nums {
        count += num
    }
    ```

=== "JS"

    ```javascript title="list.js"
    /* Iterate through the list by index */
    let count = 0;
    for (let i = 0; i < nums.length; i++) {
        count += nums[i];
    }

    /* Iterate directly through list elements */
    count = 0;
    for (const num of nums) {
        count += num;
    }
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Iterate through the list by index */
    let count = 0;
    for (let i = 0; i < nums.length; i++) {
        count += nums[i];
    }

    /* Iterate directly through list elements */
    count = 0;
    for (const num of nums) {
        count += num;
    }
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Iterate through the list by index */
    int count = 0;
    for (var i = 0; i < nums.length; i++) {
        count += nums[i];
    }

    /* Iterate directly through list elements */
    count = 0;
    for (var num in nums) {
        count += num;
    }
    ```

=== "Rust"

    ```rust title="list.rs"
    // Iterate through the list by index
    let mut _count = 0;
    for i in 0..nums.len() {
        _count += nums[i];
    }

    // Iterate directly through list elements
    _count = 0;
    for num in &nums {
        _count += num;
    }
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Iterate through the list by index
    var count: i32 = 0;
    var i: usize = 0;
    while (i < nums.items.len) : (i += 1) {
        count += nums.items[i];
    }

    // Iterate directly through list elements
    count = 0;
    for (nums.items) |num| {
        count += num;
    }
    ```
|
||||
|
||||
### Concatenating Lists
|
||||
|
||||
Given a new list `nums1`, we can append it to the end of the original list.
|
||||
|
||||
=== "Python"

    ```python title="list.py"
    # Concatenate two lists
    nums1: list[int] = [6, 8, 7, 10, 9]
    nums += nums1  # Concatenate nums1 to the end of nums
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Concatenate two lists */
    vector<int> nums1 = { 6, 8, 7, 10, 9 };
    // Concatenate nums1 to the end of nums
    nums.insert(nums.end(), nums1.begin(), nums1.end());
    ```

=== "Java"

    ```java title="list.java"
    /* Concatenate two lists */
    List<Integer> nums1 = new ArrayList<>(Arrays.asList(new Integer[] { 6, 8, 7, 10, 9 }));
    nums.addAll(nums1); // Concatenate nums1 to the end of nums
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Concatenate two lists */
    List<int> nums1 = [6, 8, 7, 10, 9];
    nums.AddRange(nums1); // Concatenate nums1 to the end of nums
    ```

=== "Go"

    ```go title="list_test.go"
    /* Concatenate two lists */
    nums1 := []int{6, 8, 7, 10, 9}
    nums = append(nums, nums1...) // Concatenate nums1 to the end of nums
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Concatenate two lists */
    let nums1 = [6, 8, 7, 10, 9]
    nums.append(contentsOf: nums1) // Concatenate nums1 to the end of nums
    ```

=== "JS"

    ```javascript title="list.js"
    /* Concatenate two lists */
    const nums1 = [6, 8, 7, 10, 9];
    nums.push(...nums1); // Concatenate nums1 to the end of nums
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Concatenate two lists */
    const nums1: number[] = [6, 8, 7, 10, 9];
    nums.push(...nums1); // Concatenate nums1 to the end of nums
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Concatenate two lists */
    List<int> nums1 = [6, 8, 7, 10, 9];
    nums.addAll(nums1); // Concatenate nums1 to the end of nums
    ```

=== "Rust"

    ```rust title="list.rs"
    /* Concatenate two lists */
    let nums1: Vec<i32> = vec![6, 8, 7, 10, 9];
    nums.extend(nums1);
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Concatenate two lists
    var nums1 = std.ArrayList(i32).init(std.heap.page_allocator);
    defer nums1.deinit();
    try nums1.appendSlice(&[_]i32{ 6, 8, 7, 10, 9 });
    try nums.insertSlice(nums.items.len, nums1.items); // Concatenate nums1 to the end of nums
    ```

### Sorting the List

Once the list is sorted, we can employ algorithms commonly used in array-related algorithm problems, such as "binary search" and "two-pointer" algorithms.

=== "Python"

    ```python title="list.py"
    # Sort the list
    nums.sort()  # After sorting, the list elements are in ascending order
    ```

=== "C++"

    ```cpp title="list.cpp"
    /* Sort the list */
    sort(nums.begin(), nums.end()); // After sorting, the list elements are in ascending order
    ```

=== "Java"

    ```java title="list.java"
    /* Sort the list */
    Collections.sort(nums); // After sorting, the list elements are in ascending order
    ```

=== "C#"

    ```csharp title="list.cs"
    /* Sort the list */
    nums.Sort(); // After sorting, the list elements are in ascending order
    ```

=== "Go"

    ```go title="list_test.go"
    /* Sort the list */
    sort.Ints(nums) // After sorting, the list elements are in ascending order
    ```

=== "Swift"

    ```swift title="list.swift"
    /* Sort the list */
    nums.sort() // After sorting, the list elements are in ascending order
    ```

=== "JS"

    ```javascript title="list.js"
    /* Sort the list */
    nums.sort((a, b) => a - b); // After sorting, the list elements are in ascending order
    ```

=== "TS"

    ```typescript title="list.ts"
    /* Sort the list */
    nums.sort((a, b) => a - b); // After sorting, the list elements are in ascending order
    ```

=== "Dart"

    ```dart title="list.dart"
    /* Sort the list */
    nums.sort(); // After sorting, the list elements are in ascending order
    ```

=== "Rust"

    ```rust title="list.rs"
    /* Sort the list */
    nums.sort(); // After sorting, the list elements are in ascending order
    ```

=== "C"

    ```c title="list.c"
    // C does not provide built-in dynamic arrays
    ```

=== "Kotlin"

    ```kotlin title="list.kt"

    ```

=== "Zig"

    ```zig title="list.zig"
    // Sort the list
    std.sort.sort(i32, nums.items, {}, comptime std.sort.asc(i32));
    ```

## List Implementation

Many programming languages come with built-in lists, including Java, C++, Python, etc. Their implementations tend to be intricate, featuring carefully considered settings for various parameters, like initial capacity and expansion factors. Readers who are curious can delve into the source code for further learning.

To enhance our understanding of how lists work, we will attempt to implement a simplified version of a list, focusing on three crucial design aspects:

- **Initial Capacity**: Choose a reasonable initial capacity for the array. In this example, we choose 10 as the initial capacity.
- **Size Recording**: Declare a variable `size` to record the current number of elements in the list, updating in real time with element insertion and deletion. With this variable, we can locate the end of the list and determine whether expansion is needed.
- **Expansion Mechanism**: If the list reaches full capacity upon an element insertion, an expansion process is required. This involves creating a larger array based on the expansion factor and then transferring all elements from the current array to the new one. In this example, we stipulate that the array size should double with each expansion.

```src
[file]{my_list}-[class]{my_list}-[func]{}
```
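
To make these design aspects concrete, here is a minimal Python sketch of such a simplified list. The class name `SimpleList` and its details are illustrative assumptions, not the book's exact implementation:

```python
class SimpleList:
    """A dynamic-array sketch: initial capacity 10, doubling expansion"""

    def __init__(self):
        self._capacity = 10               # initial capacity
        self._arr = [0] * self._capacity  # underlying array
        self._size = 0                    # current number of elements

    def append(self, num: int):
        """Add an element at the end, expanding first if the array is full"""
        if self._size == self._capacity:
            self._extend_capacity()
        self._arr[self._size] = num
        self._size += 1

    def _extend_capacity(self):
        """Create an array twice as large and transfer the elements over"""
        self._capacity *= 2
        new_arr = [0] * self._capacity
        new_arr[: self._size] = self._arr[: self._size]
        self._arr = new_arr
```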

en/docs/chapter_array_and_linkedlist/ram_and_cache.md

# Memory and Cache *

In the first two sections of this chapter, we explored arrays and linked lists, two fundamental and important data structures, representing "continuous storage" and "dispersed storage" respectively.

In fact, **the physical structure largely determines the efficiency of a program's use of memory and cache**, which in turn affects the overall performance of the algorithm.

## Computer Storage Devices

There are three types of storage devices in computers: "hard disk," "random-access memory (RAM)," and "cache memory." The following table shows their different roles and performance characteristics in computer systems.

<p align="center"> Table <id> Computer Storage Devices </p>

|            | Hard Disk                                                      | Memory                                                                    | Cache                                                                                            |
| ---------- | -------------------------------------------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------ |
| Usage      | Long-term storage of data, including OS, programs, files, etc. | Temporary storage of currently running programs and data being processed  | Stores frequently accessed data and instructions, reducing the number of CPU accesses to memory  |
| Volatility | Data is not lost after power off                               | Data is lost after power off                                               | Data is lost after power off                                                                      |
| Capacity   | Larger, TB level                                               | Smaller, GB level                                                          | Very small, MB level                                                                              |
| Speed      | Slower, several hundred to thousands of MB/s                   | Faster, several tens of GB/s                                               | Very fast, several tens to hundreds of GB/s                                                       |
| Price      | Cheaper, several cents to yuan / GB                            | More expensive, tens to hundreds of yuan / GB                              | Very expensive, priced with CPU                                                                   |

We can imagine the computer storage system as the pyramid structure shown in the figure below. The storage devices closer to the top of the pyramid are faster, have smaller capacity, and are more costly. This multi-level design is not accidental, but the result of careful consideration by computer scientists and engineers.

- **Hard disks are difficult to replace with memory**. Firstly, data in memory is lost after power off, making it unsuitable for long-term data storage; secondly, the cost of memory is dozens of times that of hard disks, making it difficult to popularize in the consumer market.
- **It is difficult for caches to have both large capacity and high speed**. As the capacity of L1, L2, and L3 caches gradually increases, their physical size becomes larger, increasing the physical distance from the CPU core, leading to increased data transfer time and higher element access latency. Under current technology, a multi-level cache structure is the best balance between capacity, speed, and cost.

![computer storage system](ram_and_cache.assets/storage_pyramid.png)

!!! note

    The storage hierarchy of computers reflects a delicate balance between speed, capacity, and cost. In fact, this kind of trade-off is common in all industrial fields, requiring us to find the best balance between different advantages and limitations.

Overall, **hard disks are used for long-term storage of large amounts of data, memory is used for temporary storage of data being processed during program execution, and cache is used to store frequently accessed data and instructions** to improve program execution efficiency. Together, they ensure the efficient operation of computer systems.

As shown in the figure below, during program execution, data is read from the hard disk into memory for CPU computation. The cache can be considered a part of the CPU, **smartly loading data from memory** to provide fast data access to the CPU, significantly enhancing program execution efficiency and reducing reliance on slower memory.

![data flow between hard disk, memory, and cpu](ram_and_cache.assets/computer_storage_devices.png)

## Memory Efficiency of Data Structures

In terms of memory space utilization, arrays and linked lists have their advantages and limitations.

On one hand, **memory is limited and cannot be shared by multiple programs**, so we hope that data structures can use space as efficiently as possible. The elements of an array are tightly packed without extra space for storing references (pointers) between linked list nodes, making them more space-efficient. However, arrays require allocating sufficient continuous memory space at once, which may lead to memory waste, and array expansion also requires additional time and space costs. In contrast, linked lists allocate and reclaim memory dynamically on a per-node basis, providing greater flexibility.

On the other hand, during program execution, **as memory is repeatedly allocated and released, the degree of fragmentation of free memory becomes higher**, leading to reduced memory utilization efficiency. Arrays, due to their continuous storage method, are relatively less likely to cause memory fragmentation. In contrast, the elements of a linked list are dispersedly stored, and frequent insertion and deletion operations make memory fragmentation more likely.

## Cache Efficiency of Data Structures

Although caches are much smaller in space capacity than memory, they are much faster and play a crucial role in program execution speed. Since the cache's capacity is limited and can only store a small part of frequently accessed data, when the CPU tries to access data not in the cache, a "cache miss" occurs, forcing the CPU to load the needed data from slower memory.

Clearly, **the fewer the cache misses, the higher the CPU's data read-write efficiency**, and the better the program performance. The proportion of successful data retrieval from the cache by the CPU is called the "cache hit rate," a metric often used to measure cache efficiency.

To achieve higher efficiency, caches adopt the following data loading mechanisms.

- **Cache Lines**: Caches don't store and load data byte by byte but in units of cache lines. Compared to byte-by-byte transfer, the transmission of cache lines is more efficient.
- **Prefetch Mechanism**: Processors try to predict data access patterns (such as sequential access, fixed-stride jumping access, etc.) and load data into the cache according to specific patterns to improve the hit rate.
- **Spatial Locality**: If data is accessed, data nearby is likely to be accessed in the near future. Therefore, when loading certain data, the cache also loads nearby data to improve the hit rate.
- **Temporal Locality**: If data is accessed, it's likely to be accessed again in the near future. Caches use this principle to retain recently accessed data to improve the hit rate.

In fact, **arrays and linked lists have different cache utilization efficiencies**, mainly reflected in the following aspects.

- **Occupied Space**: Linked list elements occupy more space than array elements, resulting in less effective data volume in the cache.
- **Cache Lines**: Linked list data is scattered throughout memory, and since caches load "by line," the proportion of loading invalid data is higher.
- **Prefetch Mechanism**: The data access pattern of arrays is more "predictable" than that of linked lists, meaning the system is more likely to guess which data will be loaded next.
- **Spatial Locality**: Arrays are stored in concentrated memory spaces, so the data near the loaded data is more likely to be accessed next.

Overall, **arrays have a higher cache hit rate and are generally more efficient in operation than linked lists**. This makes data structures based on arrays more popular in solving algorithmic problems.

It should be noted that **high cache efficiency does not mean that arrays are always better than linked lists**. Which data structure to choose in actual applications should be based on specific requirements. For example, both arrays and linked lists can implement the "stack" data structure (which will be detailed in the next chapter), but they are suitable for different scenarios.

- In algorithm problems, we tend to choose stacks based on arrays because they provide higher operational efficiency and random access capabilities, with the only cost being the need to pre-allocate a certain amount of memory space for the array.
- If the data volume is very large, highly dynamic, and the expected size of the stack is difficult to estimate, then a stack based on a linked list is more appropriate. Linked lists can disperse a large amount of data in different parts of memory and avoid the additional overhead of array expansion.

en/docs/chapter_array_and_linkedlist/summary.md

# Summary

### Key Review

- Arrays and linked lists are two basic data structures, representing two storage methods in computer memory: contiguous space storage and non-contiguous space storage. Their characteristics complement each other.
- Arrays support random access and use less memory; however, they are inefficient in inserting and deleting elements and have a fixed length after initialization.
- Linked lists implement efficient node insertion and deletion through changing references (pointers) and can flexibly adjust their length; however, they have lower node access efficiency and consume more memory.
- Common types of linked lists include singly linked lists, circular linked lists, and doubly linked lists, each with its own application scenarios.
- Lists are ordered collections of elements that support addition, deletion, and modification, typically implemented based on dynamic arrays, retaining the advantages of arrays while allowing flexible length adjustment.
- The advent of lists significantly enhanced the practicality of arrays but may lead to some memory space wastage.
- During program execution, data is mainly stored in memory. Arrays provide higher memory space efficiency, while linked lists are more flexible in memory usage.
- Caches provide fast data access to CPUs through mechanisms like cache lines, prefetching, spatial locality, and temporal locality, significantly enhancing program execution efficiency.
- Due to higher cache hit rates, arrays are generally more efficient than linked lists. When choosing a data structure, the appropriate choice should be made based on specific needs and scenarios.

### Q & A

**Q**: Does storing arrays on the stack versus the heap affect time and space efficiency?

Arrays stored on both the stack and the heap are stored in contiguous memory spaces, and data operation efficiency is essentially the same. However, stacks and heaps have their own characteristics, leading to the following differences.

1. Allocation and release efficiency: The stack is a smaller memory block, allocated automatically by the compiler; heap memory is relatively larger and can be dynamically allocated in code, making it more prone to fragmentation. Therefore, allocation and release operations on the heap are generally slower than on the stack.
2. Size limitation: Stack memory is relatively small, while the heap size is generally limited by available memory. Therefore, the heap is more suitable for storing large arrays.
3. Flexibility: The size of arrays on the stack needs to be determined at compile time, while the size of arrays on the heap can be dynamically determined at runtime.

**Q**: Why do arrays require elements of the same type, while linked lists do not emphasize same-type elements?

Linked lists consist of nodes connected by references (pointers), and each node can store data of different types, such as int, double, string, object, etc.

In contrast, array elements must be of the same type, allowing the calculation of offsets to access the corresponding element positions. For example, if an array contained both int and long types, with elements occupying 4 bytes and 8 bytes respectively, it could not use the following formula to calculate offsets, as the array would contain elements of two different lengths.

```shell
# Element memory address = Array memory address + Element length * Element index
```
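
As a purely illustrative calculation, with made-up numbers rather than anything from the book's code, the formula works like this:

```python
# Hypothetical values: a 4-byte int array assumed to start at address 1000
array_address = 1000
element_length = 4  # bytes per int element
index = 3
# Element memory address = Array memory address + Element length * Element index
element_address = array_address + element_length * index
print(element_address)  # 1012
```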

**Q**: After deleting a node, is it necessary to set `P.next` to `None`?

Not modifying `P.next` is also acceptable. From the perspective of the linked list, traversing from the head node to the tail node will no longer encounter `P`. This means that node `P` has been effectively removed from the list, and where `P` points no longer affects the list.

From a garbage collection perspective, for languages with automatic garbage collection mechanisms like Java, Python, and Go, whether node `P` is collected depends on whether there are still references pointing to it, not on the value of `P.next`. In languages like C and C++, we need to manually free the node's memory.

**Q**: In linked lists, the time complexity for insertion and deletion operations is `O(1)`. But searching for the element before insertion or deletion takes `O(n)` time, so why isn't the time complexity `O(n)`?

If an element is searched first and then deleted, the time complexity is indeed `O(n)`. However, the `O(1)` advantage of linked lists in insertion and deletion can be realized in other applications. For example, when implementing a double-ended queue with a linked list, we maintain pointers that always point to the head and tail nodes, making each insertion and deletion operation `O(1)`.

**Q**: In the figure "Linked List Definition and Storage Method", do the light blue storage nodes occupy a single memory address, or do they share half with the node value?

The diagram is just a qualitative representation; quantitative analysis depends on the specific situation.

- Different types of node values occupy different amounts of space, such as int, long, double, and object instances.
- The memory space occupied by pointer variables depends on the operating system and compilation environment used, usually 8 bytes or 4 bytes.

**Q**: Is adding elements to the end of a list always `O(1)`?

If adding an element would exceed the list's capacity, the list needs to be expanded first. The system will request a new memory block and move all elements of the original list over, in which case the time complexity becomes `O(n)`.

**Q**: The statement "The emergence of lists greatly improves the practicality of arrays, but may lead to some memory space wastage" - does this refer to the memory occupied by additional variables like capacity, length, and expansion multiplier?

The space wastage here mainly refers to two aspects: on the one hand, lists are set with an initial length, which we may not always need; on the other hand, to prevent frequent expansion, expansion usually multiplies by a coefficient, such as $\times 1.5$. This results in many empty slots, which we typically cannot fully fill.

**Q**: In Python, after initializing `n = [1, 2, 3]`, the addresses of these 3 elements are contiguous, but initializing `m = [2, 1, 3]` shows that each element's `id` is not consecutive but identical to those in `n`. If the addresses of these elements are not contiguous, is `m` still an array?

If we replace the list elements with linked list nodes `n = [n1, n2, n3, n4, n5]`, these 5 node objects are also typically dispersed throughout memory. However, given a list index, we can still access the node's memory address in `O(1)` time, thereby accessing the corresponding node. This is because the array stores references to the nodes, not the nodes themselves.

Unlike many languages, in Python numbers are also wrapped as objects, and lists store references to these numbers, not the numbers themselves. Therefore, we find that the same number in two arrays has the same `id`, and these numbers' memory addresses need not be contiguous.

**Q**: The `std::list` in C++ STL has already implemented a doubly linked list, but it seems that some algorithm books don't use it directly. Is there any limitation?

On the one hand, we often prefer to use arrays to implement algorithms, only using linked lists when necessary, mainly for two reasons.

- Space overhead: Since each element requires two additional pointers (one for the previous element and one for the next), `std::list` usually occupies more space than `std::vector`.
- Cache unfriendliness: As the data is not stored continuously, `std::list` has a lower cache utilization rate. Generally, `std::vector` performs better.

On the other hand, linked lists are primarily necessary for binary trees and graphs. Stacks and queues are often implemented using the programming language's `stack` and `queue` classes, rather than linked lists.

**Q**: Does initializing a list `res = [0] * self.size()` result in each element of `res` referencing the same address?

No. However, this issue does arise with two-dimensional arrays: for example, initializing a two-dimensional list as `res = [[0]] * self.size()` would reference the same list `[0]` multiple times.
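
A quick Python check of this behavior (illustrative only):

```python
size = 3
res = [0] * size  # each slot holds an immutable int, so assignment is safe
res[0] = 9
print(res)        # [9, 0, 0]

res2 = [[0]] * size  # three references to the same inner list
res2[0][0] = 9
print(res2)          # [[9], [9], [9]]

safe = [[0] for _ in range(size)]  # independent inner lists
safe[0][0] = 9
print(safe)                        # [[9], [0], [0]]
```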

**Q**: When deleting a node, is it necessary to break the reference to its successor node?

From the perspective of data structures and algorithms (problem-solving), it's okay not to break the link, as long as the program's logic is correct. From the perspective of standard libraries, breaking the link is safer and more logically clear. If the link is not broken and the deleted node is not properly recycled, it could affect the recycling of the successor node's memory.

en/docs/chapter_computational_complexity/index.md

# Complexity Analysis

<div class="center-table" markdown>

![complexity analysis](../assets/covers/chapter_complexity_analysis.jpg)

</div>

!!! abstract

    Complexity analysis is like a space-time navigator in the vast universe of algorithms.

    It guides us in exploring deeper within the dimensions of time and space, seeking more elegant solutions.

# Iteration and Recursion

In algorithms, the repeated execution of a task is quite common and is closely related to the analysis of complexity. Therefore, before delving into the concepts of time complexity and space complexity, let's first explore how to implement repetitive tasks in programming. This involves understanding two fundamental programming control structures: iteration and recursion.

## Iteration

"Iteration" is a control structure for repeatedly performing a task. In iteration, a program repeats a block of code as long as a certain condition is met, until this condition is no longer satisfied.

### For Loops

The `for` loop is one of the most common forms of iteration, and **it's particularly suitable when the number of iterations is known in advance**.

The following function uses a `for` loop to perform a summation of $1 + 2 + \dots + n$, with the sum being stored in the variable `res`. It's important to note that in Python, `range(a, b)` creates an interval that is inclusive of `a` but exclusive of `b`, meaning it iterates over the range from $a$ up to $b - 1$.

```src
[file]{iteration}-[class]{}-[func]{for_loop}
```
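
The `src` marker above pulls the implementation from the book's repository. As a rough sketch of what such a function looks like in Python (illustrative, not necessarily the book's exact code):

```python
def for_loop(n: int) -> int:
    """Sum 1 + 2 + ... + n using a for loop"""
    res = 0
    # range(1, n + 1) covers 1 through n inclusive
    for i in range(1, n + 1):
        res += i
    return res
```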

The flowchart below represents this sum function.

![flowchart of the sum function](iteration_and_recursion.assets/iteration.png)

The number of operations in this summation function is proportional to the size of the input data $n$, or in other words, it has a "linear relationship." This "linear relationship" is what time complexity describes. This topic will be discussed in more detail in the next section.

### While Loops

Similar to `for` loops, `while` loops are another approach for implementing iteration. In a `while` loop, the program checks a condition at the beginning of each iteration; if the condition is true, the execution continues; otherwise, the loop ends.

Below we use a `while` loop to implement the sum $1 + 2 + \dots + n$.

```src
[file]{iteration}-[class]{}-[func]{while_loop}
```

**`While` loops provide more flexibility than `for` loops**, especially since they allow for custom initialization and modification of the condition variable at each step.

For example, in the following code, the condition variable $i$ is updated twice each round, which would be inconvenient to implement with a `for` loop.

```src
[file]{iteration}-[class]{}-[func]{while_loop_ii}
```

Overall, **`for` loops are more concise, while `while` loops are more flexible**. Both can implement iterative structures. Which one to use should be determined based on the specific requirements of the problem.

### Nested Loops

We can nest one loop structure within another. Below is an example using `for` loops:

```src
[file]{iteration}-[class]{}-[func]{nested_for_loop}
```

The flowchart below represents this nested loop.

![flowchart of the nested loop](iteration_and_recursion.assets/nested_iteration.png)

In such cases, the number of operations of the function is proportional to $n^2$, meaning the algorithm's runtime and the size of the input data $n$ have a "quadratic relationship."

We can further increase the complexity by adding more nested loops, each level of nesting effectively "increasing the dimension," which raises the time complexity to "cubic," "quartic," and so on.

## Recursion

"Recursion" is an algorithmic strategy where a function solves a problem by calling itself. It primarily involves two phases:

1. **Calling**: This is where the program repeatedly calls itself, often with progressively smaller or simpler arguments, moving towards the "termination condition."
2. **Returning**: Upon triggering the "termination condition," the program begins to return from the deepest recursive function, aggregating the results of each layer.

From an implementation perspective, recursive code mainly includes three elements.

1. **Termination Condition**: Determines when to switch from "calling" to "returning."
2. **Recursive Call**: Corresponds to "calling," where the function calls itself, usually with smaller or more simplified parameters.
3. **Return Result**: Corresponds to "returning," where the result of the current recursion level is returned to the previous layer.

Observe the following code, where simply calling the function `recur(n)` can compute the sum of $1 + 2 + \dots + n$:

```src
[file]{recursion}-[class]{}-[func]{recur}
```
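
As a rough Python sketch of such a recursive summation (illustrative, not necessarily the book's exact code):

```python
def recur(n: int) -> int:
    """Recursive summation of 1 + 2 + ... + n"""
    # Termination condition
    if n == 1:
        return 1
    # Calling: decompose into a smaller sub-problem
    res = recur(n - 1)
    # Returning: aggregate the result of this layer
    return n + res
```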

The figure below shows the recursive process of this function.

![recursive process of the sum function](iteration_and_recursion.assets/recursion_sum.png)

Although iteration and recursion can achieve the same results from a computational standpoint, **they represent two entirely different paradigms of thinking and problem-solving**.

- **Iteration**: Solves problems "from the bottom up." It starts with the most basic steps, and then repeatedly adds or accumulates these steps until the task is complete.
- **Recursion**: Solves problems "from the top down." It breaks down the original problem into smaller sub-problems, each of which has the same form as the original problem. These sub-problems are then further decomposed into even smaller sub-problems, stopping at the base case whose solution is known.

Let's take the earlier example of the summation function, defined as $f(n) = 1 + 2 + \dots + n$.

- **Iteration**: In this approach, we simulate the summation process within a loop. Starting from $1$ and traversing to $n$, we perform the summation operation in each iteration to eventually compute $f(n)$.
- **Recursion**: Here, the problem is broken down into a sub-problem: $f(n) = n + f(n-1)$. This decomposition continues recursively until reaching the base case, $f(1) = 1$, at which point the recursion terminates.

### Call Stack

Every time a recursive function calls itself, the system allocates memory for the newly initiated function to store local variables, the return address, and other relevant information. This leads to two primary outcomes.

- The function's context data is stored in a memory area called "stack frame space" and is only released after the function returns. Therefore, **recursion generally consumes more memory space than iteration**.
- Recursive calls introduce additional overhead. **Hence, recursion is usually less time-efficient than loops.**

As shown in the figure below, there are $n$ unreturned recursive functions before triggering the termination condition, indicating a **recursion depth of $n$**.

![recursion call depth](iteration_and_recursion.assets/recursion_sum_depth.png)

In practice, the depth of recursion allowed by programming languages is usually limited, and excessively deep recursion can lead to stack overflow errors.

### Tail Recursion

Interestingly, **if a function performs its recursive call as the very last step before returning**, it can be optimized by the compiler or interpreter to be as space-efficient as iteration. This scenario is known as "tail recursion."

- **Regular Recursion**: In standard recursion, when the function returns to the previous level, it continues to execute more code, requiring the system to save the context of the previous call.
- **Tail Recursion**: Here, the recursive call is the final operation before the function returns. This means that upon returning to the previous level, no further actions are needed, so the system does not need to save the context of the previous level.

For example, in calculating $1 + 2 + \dots + n$, we can make the result variable `res` a parameter of the function, thereby achieving tail recursion:

```src
[file]{recursion}-[class]{}-[func]{tail_recur}
```
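
A Python sketch of the tail-recursive form; the default parameter `res=0` is an assumption made here for self-containment:

```python
def tail_recur(n: int, res: int = 0) -> int:
    """Tail-recursive summation; res carries the running total"""
    # Termination condition
    if n == 0:
        return res
    # The recursive call is the last operation before returning
    return tail_recur(n - 1, res + n)
```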

The execution process of tail recursion is shown in the following figure. Comparing regular recursion and tail recursion, the point at which the summation operation happens is different.

- **Regular Recursion**: The summation operation occurs during the "returning" phase, requiring another summation after each layer returns.
- **Tail Recursion**: The summation operation occurs during the "calling" phase, and the "returning" phase only involves returning through each layer.

![tail recursion process](iteration_and_recursion.assets/tail_recursion_sum.png)

!!! tip

    Note that many compilers or interpreters do not support tail recursion optimization. For example, Python does not support tail recursion optimization by default, so even if the function is in the form of tail recursion, it may still encounter stack overflow issues.

### Recursion Tree

When dealing with algorithms related to "divide and conquer," recursion often offers a more intuitive approach and more readable code than iteration. Take the "Fibonacci sequence" as an example.

!!! question

    Given a Fibonacci sequence $0, 1, 1, 2, 3, 5, 8, 13, \dots$, find the $n$th number in the sequence.

Let the $n$th number of the Fibonacci sequence be $f(n)$. It's easy to deduce two conclusions:

- The first two numbers of the sequence are $f(1) = 0$ and $f(2) = 1$.
- Each number in the sequence is the sum of the two preceding ones, that is, $f(n) = f(n - 1) + f(n - 2)$.

Using the recursive relation, and considering the first two numbers as termination conditions, we can write the recursive code. Calling `fib(n)` will yield the $n$th number of the Fibonacci sequence:

```src
[file]{recursion}-[class]{}-[func]{fib}
```
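
A Python sketch consistent with the conventions above, where $f(1) = 0$ and $f(2) = 1$ (illustrative):

```python
def fib(n: int) -> int:
    """The nth Fibonacci number"""
    # Termination conditions f(1) = 0, f(2) = 1
    if n == 1 or n == 2:
        return n - 1
    # Recursive relation f(n) = f(n-1) + f(n-2)
    return fib(n - 1) + fib(n - 2)
```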

Observing the above code, we see that it recursively calls two functions within itself, **meaning that one call generates two branching calls**. As illustrated below, this continuous recursive calling eventually creates a "recursion tree" with a depth of $n$.

![fibonacci sequence recursion tree](iteration_and_recursion.assets/recursion_tree.png)

Fundamentally, recursion embodies the paradigm of "breaking down a problem into smaller sub-problems." This divide-and-conquer strategy is crucial.

- From an algorithmic perspective, many important strategies like searching, sorting, backtracking, divide-and-conquer, and dynamic programming directly or indirectly use this way of thinking.
- From a data structure perspective, recursion is naturally suited for dealing with linked lists, trees, and graphs, as they are well suited for analysis using the divide-and-conquer approach.

## Comparison

Summarizing the above content, the following table shows the differences between iteration and recursion in terms of implementation, performance, and applicability.

<p align="center"> Table: Comparison of Iteration and Recursion Characteristics </p>

|                   | Iteration                                                   | Recursion                                                                                                                         |
| ----------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| Approach          | Loop structure                                              | Function calls itself                                                                                                               |
| Time Efficiency   | Generally higher efficiency, no function call overhead      | Each function call generates overhead                                                                                               |
| Memory Usage      | Typically uses a fixed size of memory space                 | Accumulative function calls can use a substantial amount of stack frame space                                                       |
| Suitable Problems | Suitable for simple loop tasks, intuitive and readable code | Suitable for problem decomposition, like trees, graphs, divide-and-conquer, backtracking, etc., concise and clear code structure    |

!!! tip

    If you find the following content difficult to understand, consider revisiting it after reading the "Stack" chapter.

So, what is the intrinsic connection between iteration and recursion? Taking the above recursive function as an example, the summation operation occurs during the recursion's "returning" phase. This means that the initially called function is the last to complete its summation operation, **mirroring the "last in, first out" principle of a stack**.

Recursive terms like "call stack" and "stack frame space" hint at the close relationship between recursion and stacks.

1. **Calling**: When a function is called, the system allocates a new stack frame on the "call stack" for that function, storing local variables, parameters, return addresses, and other data.
2. **Returning**: When a function completes execution and returns, the corresponding stack frame is removed from the "call stack," restoring the execution environment of the previous function.

Therefore, **we can use an explicit stack to simulate the behavior of the call stack**, thus transforming recursion into an iterative form:

```src
[file]{recursion}-[class]{}-[func]{for_loop_recur}
```
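
A Python sketch of this transformation (illustrative):

```python
def for_loop_recur(n: int) -> int:
    """Simulate the call stack with an explicit stack to sum 1 + 2 + ... + n"""
    stack = []
    res = 0
    # Calling phase: push each layer onto the stack
    for i in range(n, 0, -1):
        stack.append(i)
    # Returning phase: pop each layer and accumulate, last in first out
    while stack:
        res += stack.pop()
    return res
```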

Observing the above code, when recursion is transformed into iteration, the code becomes more complex. Although iteration and recursion can often be transformed into each other, it's not always advisable to do so, for two reasons:

- The transformed code may become more challenging to understand and less readable.
- For some complex problems, simulating the behavior of the system's call stack can be quite challenging.

In conclusion, **whether to choose iteration or recursion depends on the specific nature of the problem**. In programming practice, it's crucial to weigh the pros and cons of both and choose the most suitable approach for the situation at hand.

# Algorithm Efficiency Assessment

In algorithm design, we pursue the following two objectives in sequence.

1. **Finding a Solution to the Problem**: The algorithm should reliably find the correct solution within the stipulated range of inputs.
2. **Seeking the Optimal Solution**: For the same problem, multiple solutions might exist, and we aim to find the most efficient algorithm possible.

In other words, under the premise of being able to solve the problem, algorithm efficiency has become the main criterion for evaluating the merits of an algorithm, which includes the following two dimensions.

- **Time Efficiency**: The speed at which an algorithm runs.
- **Space Efficiency**: The size of the memory space occupied by an algorithm.

In short, **our goal is to design data structures and algorithms that are both fast and memory-efficient**. Effectively assessing algorithm efficiency is crucial because only then can we compare various algorithms and guide the process of algorithm design and optimization.

There are mainly two methods of efficiency assessment: actual testing and theoretical estimation.

## Actual Testing

Suppose we have algorithms `A` and `B`, both capable of solving the same problem, and we need to compare their efficiencies. The most direct method is to use a computer to run these two algorithms while monitoring and recording their runtime and memory usage. This assessment method reflects the actual situation but has significant limitations.

On one hand, **it's difficult to eliminate interference from the testing environment**. Hardware configurations can affect algorithm performance. For example, algorithm `A` might run faster than `B` on one computer, but the opposite result may occur on another computer with different configurations. This means we would need to test on a variety of machines to calculate average efficiency, which is impractical.

On the other hand, **conducting a full test is very resource-intensive**. As the volume of input data changes, the efficiency of the algorithms may vary. For example, with smaller data volumes, algorithm `A` might run faster than `B`, but the opposite might be true with larger data volumes. Therefore, to draw convincing conclusions, we need to test a wide range of input data sizes, which requires significant computational resources.

## Theoretical Estimation

Due to the significant limitations of actual testing, we can consider evaluating algorithm efficiency solely through calculations. This estimation method is known as "asymptotic complexity analysis," or simply "complexity analysis."

Complexity analysis reflects the relationship between the time and space resources required for algorithm execution and the size of the input data. **It describes the trend of growth in the time and space required by the algorithm as the size of the input data increases**. This definition might sound complex, but we can break it down into three key points to understand it better.

- "Time and space resources" correspond to "time complexity" and "space complexity," respectively.
- "As the size of input data increases" means that complexity reflects the relationship between algorithm efficiency and the volume of input data.
- "The trend of growth in time and space" indicates that complexity analysis focuses not on the specific values of runtime or space occupied but on the "rate" at which time or space grows.

**Complexity analysis overcomes the disadvantages of actual testing methods**, reflected in the following aspects:

- It is independent of the testing environment and applicable to all operating platforms.
- It can reflect algorithm efficiency under different data volumes, especially in the performance of algorithms with large data volumes.

!!! tip

    If you're still confused about the concept of complexity, don't worry. We will introduce it in detail in subsequent chapters.

Complexity analysis provides us with a "ruler" to measure the time and space resources needed to execute an algorithm and compare the efficiency between different algorithms.

Complexity is a mathematical concept and may be abstract and challenging for beginners. From this perspective, complexity analysis might not be the best content to introduce first. However, when discussing the characteristics of a particular data structure or algorithm, it's hard to avoid analyzing its speed and space usage.

In summary, it's recommended that you establish a preliminary understanding of complexity analysis before diving deep into data structures and algorithms, **so that you can carry out simple complexity analyses of algorithms**.

en/docs/chapter_computational_complexity/space_complexity.md

# Space Complexity

"Space complexity" is used to measure the growth trend of the memory space occupied by an algorithm as the amount of data increases. This concept is very similar to time complexity, except that "running time" is replaced with "occupied memory space".

## Space Related to Algorithms

The memory space used by an algorithm during its execution mainly includes the following types.

- **Input Space**: Used to store the input data of the algorithm.
- **Temporary Space**: Used to store variables, objects, function contexts, and other data during the algorithm's execution.
- **Output Space**: Used to store the output data of the algorithm.

Generally, the scope of space complexity statistics includes both "Temporary Space" and "Output Space".

Temporary space can be further divided into three parts.

- **Temporary Data**: Used to save various constants, variables, objects, etc., during the algorithm's execution.
- **Stack Frame Space**: Used to save the context data of the called function. The system creates a stack frame at the top of the stack each time a function is called, and the stack frame space is released after the function returns.
- **Instruction Space**: Used to store compiled program instructions, which are usually negligible in actual statistics.

When analyzing the space complexity of a program, **we typically count the Temporary Data, Stack Frame Space, and Output Data**, as shown in the figure below.

![space types used in algorithms](space_complexity.assets/space_types.png)

The relevant code is as follows:

=== "Python"

    ```python title=""
    class Node:
        """Classes"""
        def __init__(self, x: int):
            self.val: int = x              # node value
            self.next: Node | None = None  # reference to the next node

    def function() -> int:
        """Functions"""
        # Perform certain operations...
        return 0

    def algorithm(n) -> int:  # input data
        A = 0                 # temporary data (constant, usually in uppercase)
        b = 0                 # temporary data (variable)
        node = Node(0)        # temporary data (object)
        c = function()        # stack frame space (call function)
        return A + b + c      # output data
    ```

=== "C++"

    ```cpp title=""
    /* Structures */
    struct Node {
        int val;
        Node *next;
        Node(int x) : val(x), next(nullptr) {}
    };

    /* Functions */
    int func() {
        // Perform certain operations...
        return 0;
    }

    int algorithm(int n) {        // input data
        const int a = 0;          // temporary data (constant)
        int b = 0;                // temporary data (variable)
        Node* node = new Node(0); // temporary data (object)
        int c = func();           // stack frame space (call function)
        return a + b + c;         // output data
    }
    ```

=== "Java"

    ```java title=""
    /* Classes */
    class Node {
        int val;
        Node next;
        Node(int x) { val = x; }
    }

    /* Functions */
    int function() {
        // Perform certain operations...
        return 0;
    }

    int algorithm(int n) {       // input data
        final int a = 0;         // temporary data (constant)
        int b = 0;               // temporary data (variable)
        Node node = new Node(0); // temporary data (object)
        int c = function();      // stack frame space (call function)
        return a + b + c;        // output data
    }
    ```

=== "C#"

    ```csharp title=""
    /* Classes */
    class Node {
        int val;
        Node next;
        Node(int x) { val = x; }
    }

    /* Functions */
    int Function() {
        // Perform certain operations...
        return 0;
    }

    int Algorithm(int n) {  // input data
        const int a = 0;    // temporary data (constant)
        int b = 0;          // temporary data (variable)
        Node node = new(0); // temporary data (object)
        int c = Function(); // stack frame space (call function)
        return a + b + c;   // output data
    }
    ```

=== "Go"

    ```go title=""
    /* Structures */
    type node struct {
        val  int
        next *node
    }

    /* Create node structure */
    func newNode(val int) *node {
        return &node{val: val}
    }

    /* Functions */
    func function() int {
        // Perform certain operations...
        return 0
    }

    func algorithm(n int) int { // input data
        const a = 0             // temporary data (constant)
        b := 0                  // temporary data (variable)
        newNode(0)              // temporary data (object)
        c := function()         // stack frame space (call function)
        return a + b + c        // output data
    }
    ```

=== "Swift"

    ```swift title=""
    /* Classes */
    class Node {
        var val: Int
        var next: Node?

        init(x: Int) {
            val = x
        }
    }

    /* Functions */
    func function() -> Int {
        // Perform certain operations...
        return 0
    }

    func algorithm(n: Int) -> Int { // input data
        let a = 0                   // temporary data (constant)
        var b = 0                   // temporary data (variable)
        let node = Node(x: 0)       // temporary data (object)
        let c = function()          // stack frame space (call function)
        return a + b + c            // output data
    }
    ```

=== "JS"

    ```javascript title=""
    /* Classes */
    class Node {
        val;
        next;
        constructor(val) {
            this.val = val === undefined ? 0 : val; // node value
            this.next = null;                       // reference to the next node
        }
    }

    /* Functions */
    function constFunc() {
        // Perform certain operations
        return 0;
    }

    function algorithm(n) {       // input data
        const a = 0;              // temporary data (constant)
        let b = 0;                // temporary data (variable)
        const node = new Node(0); // temporary data (object)
        const c = constFunc();    // stack frame space (call function)
        return a + b + c;         // output data
    }
    ```

=== "TS"

    ```typescript title=""
    /* Classes */
    class Node {
        val: number;
        next: Node | null;
        constructor(val?: number) {
            this.val = val === undefined ? 0 : val; // node value
            this.next = null;                       // reference to the next node
        }
    }

    /* Functions */
    function constFunc(): number {
        // Perform certain operations
        return 0;
    }

    function algorithm(n: number): number { // input data
        const a = 0;                        // temporary data (constant)
        let b = 0;                          // temporary data (variable)
        const node = new Node(0);           // temporary data (object)
        const c = constFunc();              // stack frame space (call function)
        return a + b + c;                   // output data
    }
    ```

=== "Dart"

    ```dart title=""
    /* Classes */
    class Node {
        int val;
        Node next;
        Node(this.val, [this.next]);
    }

    /* Functions */
    int function() {
        // Perform certain operations...
        return 0;
    }

    int algorithm(int n) {   // input data
        const int a = 0;     // temporary data (constant)
        int b = 0;           // temporary data (variable)
        Node node = Node(0); // temporary data (object)
        int c = function();  // stack frame space (call function)
        return a + b + c;    // output data
    }
    ```

=== "Rust"

    ```rust title=""
    use std::rc::Rc;
    use std::cell::RefCell;

    /* Structures */
    struct Node {
        val: i32,
        next: Option<Rc<RefCell<Node>>>,
    }

    /* Creating a Node structure */
    impl Node {
        fn new(val: i32) -> Self {
            Self { val: val, next: None }
        }
    }

    /* Functions */
    fn function() -> i32 {
        // Perform certain operations...
        return 0;
    }

    fn algorithm(n: i32) -> i32 { // input data
        const a: i32 = 0;         // temporary data (constant)
        let mut b = 0;            // temporary data (variable)
        let node = Node::new(0);  // temporary data (object)
        let c = function();       // stack frame space (call function)
        return a + b + c;         // output data
    }
    ```

=== "C"

    ```c title=""
    /* Functions */
    int func() {
        // Perform certain operations...
        return 0;
    }

    int algorithm(int n) { // input data
        const int a = 0;   // temporary data (constant)
        int b = 0;         // temporary data (variable)
        int c = func();    // stack frame space (call function)
        return a + b + c;  // output data
    }
    ```

=== "Kotlin"

    ```kotlin title=""

    ```

=== "Zig"

    ```zig title=""

    ```

## Calculation Method

The method for calculating space complexity is roughly similar to that of time complexity, with the only change being the shift of the statistical object from "number of operations" to "size of used space".

However, unlike time complexity, **we usually only focus on the worst-case space complexity**. This is because memory space is a hard requirement, and we must ensure that there is enough memory space reserved under all input data.

Consider the following code; the term "worst-case" in worst-case space complexity has two meanings.

1. **Based on the worst input data**: When $n < 10$, the space complexity is $O(1)$; but when $n > 10$, the initialized array `nums` occupies $O(n)$ space, thus the worst-case space complexity is $O(n)$.
2. **Based on the peak memory used during the algorithm's execution**: For example, before executing the last line, the program occupies $O(1)$ space; when initializing the array `nums`, the program occupies $O(n)$ space, hence the worst-case space complexity is $O(n)$.

=== "Python"

    ```python title=""
    def algorithm(n: int):
        a = 0               # O(1)
        b = [0] * 10000     # O(1)
        if n > 10:
            nums = [0] * n  # O(n)
    ```

=== "C++"

    ```cpp title=""
    void algorithm(int n) {
        int a = 0;               // O(1)
        vector<int> b(10000);    // O(1)
        if (n > 10)
            vector<int> nums(n); // O(n)
    }
    ```

=== "Java"

    ```java title=""
    void algorithm(int n) {
        int a = 0;                   // O(1)
        int[] b = new int[10000];    // O(1)
        if (n > 10)
            int[] nums = new int[n]; // O(n)
    }
    ```

=== "C#"

    ```csharp title=""
    void Algorithm(int n) {
        int a = 0;                   // O(1)
        int[] b = new int[10000];    // O(1)
        if (n > 10) {
            int[] nums = new int[n]; // O(n)
        }
    }
    ```

=== "Go"

    ```go title=""
    func algorithm(n int) {
        a := 0                    // O(1)
        b := make([]int, 10000)   // O(1)
        var nums []int
        if n > 10 {
            nums = make([]int, n) // O(n)
        }
        fmt.Println(a, b, nums)
    }
    ```

=== "Swift"

    ```swift title=""
    func algorithm(n: Int) {
        let a = 0                                    // O(1)
        let b = Array(repeating: 0, count: 10000)    // O(1)
        if n > 10 {
            let nums = Array(repeating: 0, count: n) // O(n)
        }
    }
    ```

=== "JS"

    ```javascript title=""
    function algorithm(n) {
        const a = 0;                   // O(1)
        const b = new Array(10000);    // O(1)
        if (n > 10) {
            const nums = new Array(n); // O(n)
        }
    }
    ```

=== "TS"

    ```typescript title=""
    function algorithm(n: number): void {
        const a = 0;                   // O(1)
        const b = new Array(10000);    // O(1)
        if (n > 10) {
            const nums = new Array(n); // O(n)
        }
    }
    ```

=== "Dart"

    ```dart title=""
    void algorithm(int n) {
        int a = 0;                              // O(1)
        List<int> b = List.filled(10000, 0);    // O(1)
        if (n > 10) {
            List<int> nums = List.filled(n, 0); // O(n)
        }
    }
    ```

=== "Rust"

    ```rust title=""
    fn algorithm(n: i32) {
        let a = 0;                          // O(1)
        let b = [0; 10000];                 // O(1)
        if n > 10 {
            let nums = vec![0; n as usize]; // O(n)
        }
    }
    ```

=== "C"

    ```c title=""
    void algorithm(int n) {
        int a = 0;             // O(1)
        int b[10000];          // O(1)
        if (n > 10)
            int nums[n] = {0}; // O(n)
    }
    ```

=== "Kotlin"

    ```kotlin title=""

    ```

=== "Zig"

    ```zig title=""

    ```

**In recursive functions, stack frame space must be taken into account**. Consider the following code:

=== "Python"

    ```python title=""
    def function() -> int:
        # Perform certain operations
        return 0

    def loop(n: int):
        """Loop O(1)"""
        for _ in range(n):
            function()

    def recur(n: int):
        """Recursion O(n)"""
        if n == 1:
            return
        return recur(n - 1)
    ```

=== "C++"

    ```cpp title=""
    int func() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    void loop(int n) {
        for (int i = 0; i < n; i++) {
            func();
        }
    }
    /* Recursion O(n) */
    void recur(int n) {
        if (n == 1) return;
        return recur(n - 1);
    }
    ```

=== "Java"

    ```java title=""
    int function() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    void loop(int n) {
        for (int i = 0; i < n; i++) {
            function();
        }
    }
    /* Recursion O(n) */
    void recur(int n) {
        if (n == 1) return;
        recur(n - 1);
    }
    ```

=== "C#"

    ```csharp title=""
    int Function() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    void Loop(int n) {
        for (int i = 0; i < n; i++) {
            Function();
        }
    }
    /* Recursion O(n) */
    int Recur(int n) {
        if (n == 1) return 1;
        return Recur(n - 1);
    }
    ```

=== "Go"

    ```go title=""
    func function() int {
        // Perform certain operations
        return 0
    }

    /* Loop O(1) */
    func loop(n int) {
        for i := 0; i < n; i++ {
            function()
        }
    }

    /* Recursion O(n) */
    func recur(n int) {
        if n == 1 {
            return
        }
        recur(n - 1)
    }
    ```

=== "Swift"

    ```swift title=""
    @discardableResult
    func function() -> Int {
        // Perform certain operations
        return 0
    }

    /* Loop O(1) */
    func loop(n: Int) {
        for _ in 0 ..< n {
            function()
        }
    }

    /* Recursion O(n) */
    func recur(n: Int) {
        if n == 1 {
            return
        }
        recur(n: n - 1)
    }
    ```

=== "JS"

    ```javascript title=""
    function constFunc() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    function loop(n) {
        for (let i = 0; i < n; i++) {
            constFunc();
        }
    }
    /* Recursion O(n) */
    function recur(n) {
        if (n === 1) return;
        return recur(n - 1);
    }
    ```

=== "TS"

    ```typescript title=""
    function constFunc(): number {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    function loop(n: number): void {
        for (let i = 0; i < n; i++) {
            constFunc();
        }
    }
    /* Recursion O(n) */
    function recur(n: number): void {
        if (n === 1) return;
        return recur(n - 1);
    }
    ```

=== "Dart"

    ```dart title=""
    int function() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    void loop(int n) {
        for (int i = 0; i < n; i++) {
            function();
        }
    }
    /* Recursion O(n) */
    void recur(int n) {
        if (n == 1) return;
        return recur(n - 1);
    }
    ```

=== "Rust"

    ```rust title=""
    fn function() -> i32 {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    /* `loop` is a reserved keyword in Rust, so a raw identifier is used */
    fn r#loop(n: i32) {
        for _ in 0..n {
            function();
        }
    }
    /* Recursion O(n) */
    fn recur(n: i32) {
        if n == 1 {
            return;
        }
        recur(n - 1);
    }
    ```

=== "C"

    ```c title=""
    int func() {
        // Perform certain operations
        return 0;
    }
    /* Loop O(1) */
    void loop(int n) {
        for (int i = 0; i < n; i++) {
            func();
        }
    }
    /* Recursion O(n) */
    void recur(int n) {
        if (n == 1) return;
        recur(n - 1);
    }
    ```

=== "Kotlin"

=== "Zig"

    ```zig title=""

    ```

The time complexity of both the `loop()` and `recur()` functions is $O(n)$, but their space complexities differ.

- The `loop()` function calls `function()` $n$ times in a loop, where each iteration's `function()` returns and releases its stack frame space, so the space complexity remains $O(1)$.
- During its execution, the recursive function `recur()` has $n$ instances of unreturned `recur()` existing simultaneously, thus occupying $O(n)$ stack frame space.

## Common Types

Let the size of the input data be $n$; the figure below displays common types of space complexities (arranged from low to high).

$$
\begin{aligned}
O(1) < O(\log n) < O(n) < O(n^2) < O(2^n) \newline
\text{Constant Order} < \text{Logarithmic Order} < \text{Linear Order} < \text{Quadratic Order} < \text{Exponential Order}
\end{aligned}
$$

![Common Types of Space Complexity](space_complexity.assets/space_complexity_common_types.png)

### Constant Order $O(1)$ {data-toc-label="Constant Order"}

Constant order is common in constants, variables, and objects whose number is independent of the input data size $n$.

Note that memory occupied by initializing variables or calling functions inside a loop is released upon entering the next iteration, so it does not accumulate, and the space complexity remains $O(1)$:

```src
[file]{space_complexity}-[class]{}-[func]{constant}
```
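
For readers without the companion code at hand, a minimal Python sketch of what such a `constant()` function might look like (illustrative only, not the book's actual source file):

```python
def constant(n: int):
    """Constant order O(1)"""
    a = 0               # a constant number of variables
    nums = [0] * 10000  # fixed length, independent of n
    for _ in range(n):
        c = 0           # released each iteration; does not accumulate
```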

### Linear Order $O(n)$ {data-toc-label="Linear Order"}

Linear order is common in arrays, linked lists, stacks, queues, and similar structures, where the number of elements is proportional to $n$:

```src
[file]{space_complexity}-[class]{}-[func]{linear}
```
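
A minimal Python sketch of such a `linear()` function (illustrative only):

```python
def linear(n: int):
    """Linear order O(n)"""
    nums = [0] * n                        # a list of length n occupies O(n) space
    hmap = {i: str(i) for i in range(n)}  # a hash table with n entries also occupies O(n) space
```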

As shown below, this function's recursive depth is $n$, meaning there are $n$ instances of the unreturned `linear_recur()` function at the same time, using $O(n)$ stack frame space:

```src
[file]{space_complexity}-[class]{}-[func]{linear_recur}
```
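
A sketch of the recursive variant (illustrative only):

```python
def linear_recur(n: int):
    """Linear order O(n), due to recursion depth"""
    print("recursion n =", n)
    if n == 1:
        return
    linear_recur(n - 1)
```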

![Recursive Function Generating Linear Order Space Complexity](space_complexity.assets/space_complexity_recursive_linear.png)

### Quadratic Order $O(n^2)$ {data-toc-label="Quadratic Order"}

Quadratic order is common in matrices and graphs, where the number of elements is quadratic in $n$:

```src
[file]{space_complexity}-[class]{}-[func]{quadratic}
```
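
A minimal Python sketch (illustrative only):

```python
def quadratic(n: int):
    """Quadratic order O(n^2)"""
    # An n x n matrix occupies O(n^2) space
    num_matrix = [[0] * n for _ in range(n)]
```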

As shown below, the recursive depth of this function is $n$, and each recursive call initializes an array of length $n$, $n-1$, $\dots$, $2$, $1$, averaging $n/2$, thus occupying $O(n^2)$ space overall:

```src
[file]{space_complexity}-[class]{}-[func]{quadratic_recur}
```
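
A sketch of the recursive variant (illustrative only):

```python
def quadratic_recur(n: int) -> int:
    """Quadratic order O(n^2): recursion depth n, with an O(n) array per level"""
    if n <= 0:
        return 0
    nums = [0] * n  # arrays of length n, n-1, ..., 2, 1
    return quadratic_recur(n - 1)
```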

![Recursive Function Generating Quadratic Order Space Complexity](space_complexity.assets/space_complexity_recursive_quadratic.png)

### Exponential Order $O(2^n)$ {data-toc-label="Exponential Order"}

Exponential order is common in binary trees. Observe the figure below: a "full binary tree" with $n$ levels has $2^n - 1$ nodes, occupying $O(2^n)$ space:

```src
[file]{space_complexity}-[class]{}-[func]{build_tree}
```
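
A minimal Python sketch, assuming a simple `TreeNode` class defined here for illustration:

```python
class TreeNode:
    """Minimal binary tree node, defined for this sketch"""
    def __init__(self, val: int = 0):
        self.val = val
        self.left = None
        self.right = None

def build_tree(n: int):
    """Exponential order O(2^n): builds a full binary tree with n levels"""
    if n == 0:
        return None
    root = TreeNode(0)
    root.left = build_tree(n - 1)
    root.right = build_tree(n - 1)
    return root
```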

![Full Binary Tree Generating Exponential Order Space Complexity](space_complexity.assets/space_complexity_exponential.png)

### Logarithmic Order $O(\log n)$ {data-toc-label="Logarithmic Order"}

Logarithmic order is common in divide-and-conquer algorithms. For example, in merge sort, an array of length $n$ is recursively divided in half each round, forming a recursion tree of height $\log n$ and using $O(\log n)$ stack frame space.

Another example is converting a number to a string. Given a positive integer $n$, its number of digits is $\lfloor \log_{10} n \rfloor + 1$, which is the length of the resulting string; thus the space complexity is $O(\log_{10} n + 1) = O(\log n)$.

## Balancing Time and Space

Ideally, we would like both the time complexity and the space complexity of an algorithm to be optimal. However, in practice, optimizing both simultaneously is often difficult.

**Lowering time complexity usually comes at the cost of increased space complexity, and vice versa**. The approach of sacrificing memory space to improve algorithm speed is known as the "space-time tradeoff"; the reverse is known as the "time-space tradeoff".

The choice depends on which aspect we value more. In most cases, time is more precious than space, so the "space-time tradeoff" is the more common strategy. Of course, controlling space complexity is also very important when dealing with large volumes of data.
49
en/docs/chapter_computational_complexity/summary.md
Normal file
@ -0,0 +1,49 @@
# Summary

### Key Review

**Algorithm Efficiency Assessment**

- Time efficiency and space efficiency are the two main criteria for assessing the merits of an algorithm.
- We can assess algorithm efficiency through actual testing, but it's challenging to eliminate the influence of the test environment, and it consumes substantial computational resources.
- Complexity analysis overcomes the disadvantages of actual testing. Its results are applicable across all operating platforms and can reveal the efficiency of algorithms at different data scales.

**Time Complexity**

- Time complexity measures the trend of an algorithm's running time as the data volume grows, effectively assessing algorithm efficiency. However, it can fail in certain cases, such as with small input data volumes or when time complexities are the same, making it challenging to precisely compare the efficiency of algorithms.
- Worst-case time complexity is denoted using big-$O$ notation, representing the asymptotic upper bound, reflecting the growth level of the number of operations $T(n)$ as $n$ approaches infinity.
- Calculating time complexity involves two steps: first counting the number of operations, then determining the asymptotic upper bound.
- Common time complexities, arranged from low to high, include $O(1)$, $O(\log n)$, $O(n)$, $O(n \log n)$, $O(n^2)$, $O(2^n)$, and $O(n!)$, among others.
- The time complexity of some algorithms is not fixed and depends on the distribution of input data. Time complexities are divided into worst, best, and average cases. The best case is rarely used because input data generally needs to meet strict conditions to achieve it.
- Average time complexity reflects the efficiency of an algorithm under random data inputs, closely resembling the algorithm's performance in actual applications. Calculating average time complexity requires accounting for the distribution of input data and the resulting mathematical expectation.

**Space Complexity**

- Space complexity, similar to time complexity, measures the trend of memory space occupied by an algorithm as the data volume grows.
- The relevant memory space used during the algorithm's execution can be divided into input space, temporary space, and output space. Generally, input space is not included in space complexity calculations. Temporary space can be divided into temporary data, stack frame space, and instruction space, where stack frame space usually affects space complexity only in recursive functions.
- We usually focus only on the worst-case space complexity, which means calculating the space complexity of the algorithm under the worst input data and at the peak moment of memory usage during execution.
- Common space complexities, arranged from low to high, include $O(1)$, $O(\log n)$, $O(n)$, $O(n^2)$, and $O(2^n)$, among others.

### Q & A

**Q**: Is the space complexity of tail recursion $O(1)$?

Theoretically, the space complexity of a tail-recursive function can be optimized to $O(1)$. However, most programming languages (such as Java, Python, C++, Go, and C#) do not support automatic tail-call optimization, so the space complexity is generally considered to be $O(n)$.

**Q**: What is the difference between the terms "function" and "method"?

A "function" can be executed independently, with all parameters passed explicitly. A "method" is associated with an object: the object it is called on is passed to it implicitly, allowing it to operate on the data contained within an instance of a class.

Here are some examples from common programming languages:

- C is a procedural programming language without object-oriented concepts, so it only has functions. However, we can simulate object-oriented programming by creating structures (structs), and functions associated with these structures are equivalent to methods in other programming languages.
- Java and C# are object-oriented programming languages where code blocks (methods) are typically part of a class. Static methods behave like functions because they are bound to the class and cannot access specific instance variables.
- C++ and Python support both procedural programming (functions) and object-oriented programming (methods).

**Q**: Does the "Common Types of Space Complexity" figure reflect the absolute size of occupied space?

No, the figure shows space complexities, which reflect growth trends, not the absolute size of the occupied space.

If you take $n = 8$, you might find that the values of each curve don't correspond to their functions. This is because each curve includes a constant term, intended to compress the value range into a visually comfortable range.

In practice, since we usually don't know each method's "constant term" complexity, it's generally not possible to choose the best solution for $n = 8$ based solely on complexity. But for $n = 8^5$ the choice is much easier, as the growth trend becomes dominant.
After Width: | Height: | Size: 20 KiB |
After Width: | Height: | Size: 19 KiB |
After Width: | Height: | Size: 16 KiB |
After Width: | Height: | Size: 19 KiB |
After Width: | Height: | Size: 21 KiB |
After Width: | Height: | Size: 19 KiB |
After Width: | Height: | Size: 22 KiB |
After Width: | Height: | Size: 12 KiB |
1112
en/docs/chapter_computational_complexity/time_complexity.md
Normal file
170
en/docs/chapter_data_structure/basic_data_types.md
Normal file
@ -0,0 +1,170 @@
# Basic Data Types

When discussing data in computers, various forms like text, images, videos, voice, and 3D models come to mind. Despite their different organizational forms, they are all composed of various basic data types.

**Basic data types are those that the CPU can directly operate on** and are directly used in algorithms, mainly including the following.

- Integer types: `byte`, `short`, `int`, `long`.
- Floating-point types: `float`, `double`, used to represent decimals.
- Character type: `char`, used to represent letters, punctuation, and even emojis in various languages.
- Boolean type: `bool`, used to represent "yes" or "no" decisions.

**Basic data types are stored in computers in binary form**. One binary digit is 1 bit. In most modern operating systems, 1 byte consists of 8 bits.

The range of values for basic data types depends on the size of the space they occupy. Below, we take Java as an example.

- The integer type `byte` occupies 1 byte = 8 bits and can represent $2^8$ numbers.
- The integer type `int` occupies 4 bytes = 32 bits and can represent $2^{32}$ numbers.

The following table lists the space occupied, value range, and default values of various basic data types in Java. While memorizing this table isn't necessary, having a general understanding of it and referencing it when required is recommended.

<p align="center"> Table <id> Space Occupied and Value Range of Basic Data Types </p>

| Type    | Symbol   | Space Occupied | Minimum Value            | Maximum Value           | Default Value  |
| ------- | -------- | -------------- | ------------------------ | ----------------------- | -------------- |
| Integer | `byte`   | 1 byte         | $-2^7$ ($-128$)          | $2^7 - 1$ ($127$)       | 0              |
|         | `short`  | 2 bytes        | $-2^{15}$                | $2^{15} - 1$            | 0              |
|         | `int`    | 4 bytes        | $-2^{31}$                | $2^{31} - 1$            | 0              |
|         | `long`   | 8 bytes        | $-2^{63}$                | $2^{63} - 1$            | 0              |
| Float   | `float`  | 4 bytes        | $1.175 \times 10^{-38}$  | $3.403 \times 10^{38}$  | $0.0\text{f}$  |
|         | `double` | 8 bytes        | $2.225 \times 10^{-308}$ | $1.798 \times 10^{308}$ | 0.0            |
| Char    | `char`   | 2 bytes        | 0                        | $2^{16} - 1$            | 0              |
| Boolean | `bool`   | 1 byte         | $\text{false}$           | $\text{true}$           | $\text{false}$ |

Please note that the above table is specific to Java's basic data types. Every programming language has its own data type definitions, which might differ in space occupied, value ranges, and default values.

- In Python, the integer type `int` can be of any size, limited only by available memory; the floating-point type `float` is double-precision 64-bit; there is no `char` type, as a single character is actually a string `str` of length 1.
- C and C++ do not specify the size of basic data types; it varies with the implementation and platform. The above table follows the LP64 [data model](https://en.cppreference.com/w/cpp/language/types#Properties), used for 64-bit Unix operating systems, including Linux and macOS.
- The size of `char` in C and C++ is 1 byte, while in most other programming languages it depends on the specific character encoding method, as detailed in the "Character Encoding" chapter.
- Even though representing a boolean only requires 1 bit (0 or 1), it is usually stored in memory as 1 byte. This is because modern computer CPUs typically use 1 byte as the smallest addressable memory unit.

So, what is the connection between basic data types and data structures? We know that data structures are ways to organize and store data in computers. The focus here is on "structure" rather than "data".

If we want to represent "a row of numbers", we naturally think of using an array. This is because the linear structure of an array can represent the adjacency and the ordering of the numbers, but whether the stored content is an integer `int`, a decimal `float`, or a character `char` is irrelevant to the "data structure".

In other words, **basic data types provide the "content type" of data, while data structures provide the "way of organizing" data**. For example, in the following code, we use the same data structure (array) to store and represent different basic data types, including `int`, `float`, `char`, `bool`, etc.

=== "Python"

    ```python title=""
    # Using various basic data types to initialize arrays
    numbers: list[int] = [0] * 5
    decimals: list[float] = [0.0] * 5
    # Python's characters are actually strings of length 1
    characters: list[str] = ['0'] * 5
    bools: list[bool] = [False] * 5
    # Python's lists can freely store various basic data types and object references
    data = [0, 0.0, 'a', False, ListNode(0)]
    ```

=== "C++"

    ```cpp title=""
    // Using various basic data types to initialize arrays
    int numbers[5];
    float decimals[5];
    char characters[5];
    bool bools[5];
    ```

=== "Java"

    ```java title=""
    // Using various basic data types to initialize arrays
    int[] numbers = new int[5];
    float[] decimals = new float[5];
    char[] characters = new char[5];
    boolean[] bools = new boolean[5];
    ```

=== "C#"

    ```csharp title=""
    // Using various basic data types to initialize arrays
    int[] numbers = new int[5];
    float[] decimals = new float[5];
    char[] characters = new char[5];
    bool[] bools = new bool[5];
    ```

=== "Go"

    ```go title=""
    // Using various basic data types to initialize arrays
    var numbers = [5]int{}
    var decimals = [5]float64{}
    var characters = [5]byte{}
    var bools = [5]bool{}
    ```

=== "Swift"

    ```swift title=""
    // Using various basic data types to initialize arrays
    let numbers = Array(repeating: 0, count: 5)
    let decimals = Array(repeating: 0.0, count: 5)
    let characters: [Character] = Array(repeating: "a", count: 5)
    let bools = Array(repeating: false, count: 5)
    ```

=== "JS"

    ```javascript title=""
    // JavaScript's arrays can freely store various basic data types and objects
    const array = [0, 0.0, 'a', false];
    ```

=== "TS"

    ```typescript title=""
    // Using various basic data types to initialize arrays
    const numbers: number[] = [];
    const characters: string[] = [];
    const bools: boolean[] = [];
    ```

=== "Dart"

    ```dart title=""
    // Using various basic data types to initialize arrays
    List<int> numbers = List.filled(5, 0);
    List<double> decimals = List.filled(5, 0.0);
    List<String> characters = List.filled(5, 'a');
    List<bool> bools = List.filled(5, false);
    ```

=== "Rust"

    ```rust title=""
    // Using various basic data types to initialize arrays
    let numbers: Vec<i32> = vec![0; 5];
    let decimals: Vec<f32> = vec![0.0; 5];
    let characters: Vec<char> = vec!['0'; 5];
    let bools: Vec<bool> = vec![false; 5];
    ```

=== "C"

    ```c title=""
    // Using various basic data types to initialize arrays
    int numbers[10];
    float decimals[10];
    char characters[10];
    bool bools[10];
    ```

=== "Kotlin"

=== "Zig"

    ```zig title=""
    // Using various basic data types to initialize arrays
    var numbers: [5]i32 = undefined;
    var decimals: [5]f32 = undefined;
    var characters: [5]u8 = undefined;
    var bools: [5]bool = undefined;
    ```

After Width: | Height: | Size: 63 KiB |
After Width: | Height: | Size: 17 KiB |
After Width: | Height: | Size: 21 KiB |
87
en/docs/chapter_data_structure/character_encoding.md
Normal file
@ -0,0 +1,87 @@
# Character Encoding *

In computer systems, all data is stored in binary form, and characters (represented by `char`) are no exception. To represent characters, we need a "character set" that defines a one-to-one mapping between each character and a binary number. With the character set, computers can convert binary numbers to characters by looking up the table.

## ASCII Character Set

The "ASCII code" is one of the earliest character sets, officially known as the American Standard Code for Information Interchange. It uses 7 binary digits (the lower 7 bits of a byte) to represent a character, allowing for a maximum of 128 different characters. As shown in the figure below, ASCII includes uppercase and lowercase English letters, the digits 0 ~ 9, various punctuation marks, and certain control characters (such as newline and tab).

![ASCII code](character_encoding.assets/ascii_table.png)

However, **ASCII can only represent English characters**. With the globalization of computing, a character set called "EASCII" was developed to represent more languages. It expands the 7-bit structure of ASCII to 8 bits, enabling the representation of 256 characters.

Globally, various region-specific EASCII character sets have been introduced. The first 128 characters of these sets are consistent with ASCII, while the remaining 128 characters are defined differently to accommodate the requirements of different languages.

## GBK Character Set

Later, it was found that **EASCII still could not meet the character requirements of many languages**. For instance, there are nearly a hundred thousand Chinese characters, with several thousand in regular use. In 1980, the Standardization Administration of China released the "GB2312" character set, which included 6763 Chinese characters, essentially fulfilling the computer processing needs for the Chinese language.

However, GB2312 could not handle some rare and traditional characters. The "GBK" character set expands GB2312 and includes 21886 Chinese characters. In the GBK encoding scheme, ASCII characters are represented with one byte, while Chinese characters use two bytes.

## Unicode Character Set

With the rapid evolution of computer technology and a plethora of character sets and encoding standards, numerous problems arose. On the one hand, these character sets generally only defined characters for specific languages and could not function properly in multilingual environments. On the other hand, the existence of multiple character set standards for the same language caused garbled text when information was exchanged between computers using different encoding standards.

Researchers of that era wondered: **What if a comprehensive character set encompassing all global languages and symbols were developed? Wouldn't this resolve the issues associated with cross-linguistic environments and garbled text?** Inspired by this idea, the extensive character set Unicode was born.

"Unicode" is referred to as "统一码" (Unified Code) in Chinese, and is theoretically capable of accommodating over a million characters. It aims to incorporate characters from all over the world into a single set, providing a universal character set for processing and displaying various languages and reducing the issues of garbled text caused by different encoding standards.

Since its release in 1991, Unicode has continually expanded to include new languages and characters. As of September 2022, Unicode contains 149,186 characters, including characters, symbols, and even emojis from various languages. In the vast Unicode character set, commonly used characters occupy 2 bytes, while some rare characters may occupy 3 or even 4 bytes.

Unicode is a universal character set that assigns a number (called a "code point") to each character, **but it does not specify how these code points should be stored in a computer system**. One might ask: how does a system interpret Unicode code points of varying lengths within a text? For example, given a 2-byte code, how does the system determine whether it represents a single 2-byte character or two 1-byte characters?

A straightforward solution to this problem is to store all characters as equal-length encodings. As shown in the figure below, each character in "Hello" occupies 1 byte, while each character in "算法" (algorithm) occupies 2 bytes. We could encode all characters in "Hello 算法" as 2 bytes by padding the higher bits with zeros. This method would enable the system to interpret a character every 2 bytes, recovering the content of the phrase.

![Unicode encoding example](character_encoding.assets/unicode_hello_algo.png)

However, as ASCII has shown us, encoding English requires only 1 byte. Using the above approach would double the space occupied by English text compared to ASCII encoding, which wastes memory. Therefore, a more efficient Unicode encoding method is needed.

## UTF-8 Encoding

Currently, UTF-8 has become the most widely used Unicode encoding method internationally. **It is a variable-length encoding**, using 1 to 4 bytes to represent a character, depending on the character's code point. ASCII characters need only 1 byte, Latin and Greek letters require 2 bytes, commonly used Chinese characters need 3 bytes, and some other rare characters need 4 bytes.

The encoding rules for UTF-8 are not complex and can be divided into two cases:

- For 1-byte characters, set the highest bit to $0$ and the remaining 7 bits to the Unicode code point. Notably, ASCII characters occupy the first 128 code points in the Unicode set. This means that **UTF-8 encoding is backward compatible with ASCII**, so UTF-8 can be used to parse legacy ASCII text.
- For characters of length $n$ bytes (where $n > 1$), set the highest $n$ bits of the first byte to $1$ and the $(n + 1)^{\text{th}}$ bit to $0$; starting from the second byte, set the highest 2 bits of each byte to $10$; the rest of the bits are used to fill in the Unicode code point.

The figure below shows the UTF-8 encoding for "Hello算法". It can be observed that since the highest $n$ bits are set to $1$, the system can determine the length of a character as $n$ by counting the number of leading bits set to $1$.

But why set the highest 2 bits of the remaining bytes to $10$? Actually, this $10$ serves as a kind of checksum. If the system starts parsing text from an incorrect byte, the $10$ at the beginning of the byte helps the system quickly detect the anomaly.

The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it's impossible for the highest two bits of a 1-byte character to be $10$. This can be proven by contradiction: if the highest two bits of a character were $10$, it would indicate that the character's length is $1$, corresponding to ASCII. However, the highest bit of an ASCII character should be $0$, which contradicts the assumption.



Apart from UTF-8, other common encoding methods include:

- **UTF-16 encoding**: Uses 2 or 4 bytes to represent a character. All ASCII characters and commonly used non-English characters are represented with 2 bytes; a few characters require 4 bytes. For 2-byte characters, the UTF-16 encoding equals the Unicode code point.
- **UTF-32 encoding**: Every character uses 4 bytes. This means UTF-32 occupies more space than UTF-8 and UTF-16, especially for texts with a high proportion of ASCII characters.

From the perspective of storage space, using UTF-8 to represent English characters is very efficient because it only requires 1 byte; using UTF-16 to encode some non-English characters (such as Chinese) can be more efficient because it only requires 2 bytes, while UTF-8 might need 3 bytes.
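
This difference is easy to verify with any language's encoder; for example, in Python (illustrative only):

```python
s = "Hello算法"
print(len(s.encode("utf-8")))      # 11 bytes: 5 ASCII characters x 1 byte + 2 Chinese characters x 3 bytes
print(len(s.encode("utf-16-le")))  # 14 bytes: all 7 characters are in the BMP, 2 bytes each
```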

From a compatibility perspective, UTF-8 is the most versatile, with many tools and libraries supporting UTF-8 as a priority.

## Character Encoding in Programming Languages

Historically, many programming languages utilized fixed-length encodings such as UTF-16 or UTF-32 for processing strings during program execution. This allows strings to be handled as arrays, offering several advantages:

- **Random access**: Strings encoded in UTF-16 can be accessed randomly with ease. For UTF-8, which is a variable-length encoding, locating the $i^{th}$ character requires traversing the string from the start to the $i^{th}$ position, taking $O(n)$ time.
- **Character counting**: Similar to random access, counting the number of characters in a UTF-16 encoded string is an $O(1)$ operation. However, counting characters in a UTF-8 encoded string requires traversing the entire string.
- **String operations**: Many string operations like splitting, concatenating, inserting, and deleting are easier on UTF-16 encoded strings. These operations generally require additional computation on UTF-8 encoded strings to ensure the validity of the UTF-8 encoding.

The design of character encoding schemes in programming languages is an interesting topic involving various factors:

- Java's `String` type uses UTF-16 encoding, with each character occupying 2 bytes. This was based on the initial belief that 16 bits were sufficient to represent all possible characters, which was later proven incorrect. As the Unicode standard expanded beyond 16 bits, characters in Java may now be represented by a pair of 16-bit values, known as "surrogate pairs".
- JavaScript and TypeScript use UTF-16 encoding for similar reasons as Java. When JavaScript was first introduced by Netscape in 1995, Unicode was still in its early stages, and 16-bit encoding was sufficient to represent all Unicode characters.
- C# uses UTF-16 encoding, largely because the .NET platform was designed by Microsoft, and many Microsoft technologies, including the Windows operating system, extensively use UTF-16 encoding.

Due to the underestimation of character counts, these languages had to use "surrogate pairs" to represent Unicode characters exceeding 16 bits. This approach has its drawbacks: strings containing surrogate pairs may have characters occupying 2 or 4 bytes, losing the advantage of fixed-length encoding. Additionally, handling surrogate pairs adds complexity and debugging difficulty to programming.

Addressing these challenges, some languages have adopted alternative encoding strategies:

- Python's `str` type uses Unicode encoding with a flexible representation where the storage length of characters depends on the largest Unicode code point in the string. If all characters are ASCII, each character occupies 1 byte; characters within the Basic Multilingual Plane (BMP) occupy 2 bytes; and characters beyond the BMP occupy 4 bytes.
- Go's `string` type internally uses UTF-8 encoding. Go also provides the `rune` type for representing individual Unicode code points.
- Rust's `str` and `String` types use UTF-8 encoding internally. Rust also offers the `char` type for individual Unicode code points.

It's important to note that the above discussion pertains to how strings are stored in programming languages, **which is different from how strings are stored in files or transmitted over networks**. For file storage or network transmission, strings are usually encoded in UTF-8 format for optimal compatibility and space efficiency.
After Width: | Height: | Size: 24 KiB |
After Width: | Height: | Size: 26 KiB |
After Width: | Height: | Size: 65 KiB |
@ -0,0 +1,48 @@
# Classification of Data Structures

Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be classified from two perspectives: "logical structure" and "physical structure".

## Logical Structure: Linear and Non-Linear

**Logical structures reveal the logical relationships between data elements**. In arrays and linked lists, data are arranged in a specific sequence, demonstrating the linear relationship between data; in trees, data are arranged hierarchically from the top down, showing the derived relationship between "ancestors" and "descendants"; and graphs are composed of nodes and edges, reflecting intricate network relationships.

As shown in the figure below, logical structures can be divided into two major categories: "linear" and "non-linear". Linear structures are more intuitive, indicating that data is arranged linearly in its logical relationships; non-linear structures, conversely, are arranged non-linearly.

- **Linear data structures**: Arrays, Linked Lists, Stacks, Queues, Hash Tables.
- **Non-linear data structures**: Trees, Heaps, Graphs, Hash Tables.

![Linear and non-linear data structures](classification_of_data_structure.assets/classification_logic_structure.png)

Non-linear data structures can be further divided into tree structures and network structures.

- **Linear structures**: Arrays, linked lists, queues, stacks, and hash tables, where elements have a one-to-one sequential relationship.
- **Tree structures**: Trees, heaps, and hash tables, where elements have a one-to-many relationship.
- **Network structures**: Graphs, where elements have a many-to-many relationship.

## Physical Structure: Contiguous and Dispersed

**During the execution of an algorithm, the data being processed is stored in memory**. The figure below shows a computer memory stick, where each black square is a physical memory space. We can think of memory as a vast Excel spreadsheet, with each cell capable of storing a certain amount of data.

**The system accesses the data at a target location by means of a memory address**. As shown in the figure below, the computer assigns a unique identifier to each cell in the table according to specific rules, ensuring that each memory space has a unique memory address. With these addresses, the program can access the data stored in memory.

![Memory stick, memory spaces, memory addresses](classification_of_data_structure.assets/computer_memory_location.png)

!!! tip

    It's worth noting that comparing memory to an Excel spreadsheet is a simplified analogy. The actual working mechanism of memory is more complex, involving concepts like address space, memory management, cache mechanisms, virtual memory, and physical memory.

Memory is a shared resource for all programs. When a block of memory is occupied by one program, it cannot be used by other programs at the same time. **Therefore, considering memory resources is crucial in designing data structures and algorithms**. For instance, the algorithm's peak memory usage should not exceed the system's remaining free memory; if contiguous memory blocks are scarce, the chosen data structure must be able to be stored in non-contiguous memory blocks.

As illustrated in the figure below, **the physical structure reflects the way data is stored in computer memory**, and it can be divided into contiguous space storage (arrays) and non-contiguous space storage (linked lists). The two types of physical structures exhibit complementary characteristics in terms of time efficiency and space efficiency.

![Contiguous space storage and dispersed space storage](classification_of_data_structure.assets/classification_phisical_structure.png)

**It is worth noting that all data structures are implemented based on arrays, linked lists, or a combination of both**. For example, stacks and queues can be implemented using either arrays or linked lists, while implementations of hash tables may involve both arrays and linked lists.

- **Array-based implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, Matrices, Tensors (arrays with dimensions $\geq 3$).
- **Linked-list-based implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, etc.

Data structures implemented based on arrays are also called "static data structures", meaning their length cannot be changed after initialization. Conversely, those based on linked lists are called "dynamic data structures", which can still adjust their size during program execution.

!!! tip

    If you find it challenging to comprehend the physical structure, it is recommended that you read the next chapter, "Arrays and Linked Lists", and then revisit this section.
13
en/docs/chapter_data_structure/index.md
Normal file
@ -0,0 +1,13 @@
# Data Structures

<div class="center-table" markdown>

![Data structures](../assets/covers/chapter_data_structure.jpg)

</div>

!!! abstract

    Data structures serve as a robust and diverse framework.

    They offer a blueprint for the orderly organization of data, upon which algorithms come to life.
After Width: | Height: | Size: 20 KiB |
After Width: | Height: | Size: 21 KiB |
150
en/docs/chapter_data_structure/number_encoding.md
Normal file
@ -0,0 +1,150 @@
# Number Encoding *

!!! note

    In this book, chapters marked with an asterisk '*' are optional readings. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters.

## Integer Encoding

In the table from the previous section, we observed that all integer types can represent one more negative number than positive numbers, such as the `byte` range of $[-128, 127]$. This phenomenon seems counterintuitive, and its underlying reason involves knowledge of sign-magnitude, one's complement, and two's complement encoding.

Firstly, it's important to note that **numbers are stored in computers in two's complement form**. Before analyzing why this is the case, let's define these three encoding methods:

- **Sign-magnitude**: The highest bit of the binary representation of a number is considered the sign bit, where $0$ represents a positive number and $1$ represents a negative number. The remaining bits represent the value of the number.
- **One's complement**: The one's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by inverting all bits except the sign bit.
- **Two's complement**: The two's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by adding $1$ to their one's complement.

The following diagram illustrates the conversions among sign-magnitude, one's complement, and two's complement:

![Conversions between sign-magnitude, one's complement, and two's complement](number_encoding.assets/1s_2s_complement.png)

Although sign-magnitude is the most intuitive, it has limitations. For one, **negative numbers in sign-magnitude cannot be used directly in calculations**. For example, calculating $1 + (-2)$ in sign-magnitude gives $-3$, which is incorrect.

$$
\begin{aligned}
& 1 + (-2) \newline
& \rightarrow 0000 \; 0001 + 1000 \; 0010 \newline
& = 1000 \; 0011 \newline
& \rightarrow -3
\end{aligned}
$$

To address this, computers introduced the **one's complement**. If we convert to one's complement, calculate $1 + (-2)$, and then convert the result back to sign-magnitude, we get the correct result of $-1$.

$$
\begin{aligned}
& 1 + (-2) \newline
& \rightarrow 0000 \; 0001 \; \text{(Sign-magnitude)} + 1000 \; 0010 \; \text{(Sign-magnitude)} \newline
& = 0000 \; 0001 \; \text{(One's complement)} + 1111 \; 1101 \; \text{(One's complement)} \newline
& = 1111 \; 1110 \; \text{(One's complement)} \newline
& = 1000 \; 0001 \; \text{(Sign-magnitude)} \newline
& \rightarrow -1
\end{aligned}
$$

Additionally, **there are two representations of zero in sign-magnitude**: $+0$ and $-0$. This means two different binary encodings for zero, which could lead to ambiguity. For example, in conditional checks, failing to differentiate between positive and negative zero might produce incorrect outcomes. Addressing this ambiguity would require additional checks, potentially reducing computational efficiency.

$$
\begin{aligned}
+0 & \rightarrow 0000 \; 0000 \newline
-0 & \rightarrow 1000 \; 0000
\end{aligned}
$$

Like sign-magnitude, one's complement also suffers from the positive and negative zero ambiguity. Therefore, computers further introduced the **two's complement**. Let's observe the conversion process for negative zero from sign-magnitude to one's complement to two's complement:

$$
\begin{aligned}
-0 \rightarrow \; & 1000 \; 0000 \; \text{(Sign-magnitude)} \newline
= \; & 1111 \; 1111 \; \text{(One's complement)} \newline
= 1 \; & 0000 \; 0000 \; \text{(Two's complement)} \newline
\end{aligned}
$$

Adding $1$ to the one's complement of negative zero produces a carry, but with a `byte` length of only 8 bits, the $1$ carried into the 9th bit is discarded. Therefore, **the two's complement of negative zero is $0000 \; 0000$**, the same as positive zero, thus resolving the ambiguity.

One last puzzle is the $[-128, 127]$ range of `byte`, with the additional negative number $-128$. We observe that for the interval $[-127, +127]$, all integers have corresponding sign-magnitude, one's complement, and two's complement representations, allowing for mutual conversion between them.

However, **the two's complement $1000 \; 0000$ is an exception without a corresponding sign-magnitude**. According to the conversion method, its sign-magnitude would be $0000 \; 0000$, indicating zero. This presents a contradiction because its two's complement should represent itself. Computers designate this special two's complement $1000 \; 0000$ as representing $-128$. In fact, the calculation of $(-1) + (-127)$ in two's complement results in $-128$.

$$
\begin{aligned}
& (-127) + (-1) \newline
& \rightarrow 1111 \; 1111 \; \text{(Sign-magnitude)} + 1000 \; 0001 \; \text{(Sign-magnitude)} \newline
& = 1000 \; 0000 \; \text{(One's complement)} + 1111 \; 1110 \; \text{(One's complement)} \newline
& = 1000 \; 0001 \; \text{(Two's complement)} + 1111 \; 1111 \; \text{(Two's complement)} \newline
& = 1000 \; 0000 \; \text{(Two's complement)} \newline
& \rightarrow -128
\end{aligned}
$$
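
We can reproduce this mechanical behavior in a few lines of Python by emulating an 8-bit machine with a bit mask (an illustrative sketch, not from the book's source):

```python
MASK = 0xFF  # emulate 8-bit storage

def to_twos_complement(x: int) -> int:
    """Return the 8-bit two's complement bit pattern of x as a non-negative int"""
    return x & MASK

print(f"{to_twos_complement(-1):08b}")    # 11111111
print(f"{to_twos_complement(-128):08b}")  # 10000000
# Addition works uniformly on the raw bit patterns:
total = (to_twos_complement(-127) + to_twos_complement(-1)) & MASK
print(total == to_twos_complement(-128))  # True
```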

As you might have noticed, all these calculations are additions, hinting at an important fact: **computers' internal hardware circuits are primarily designed around addition operations**. This is because addition is simpler to implement in hardware than operations like multiplication, division, and subtraction, allowing for easier parallelization and faster computation.

It's important to note that this doesn't mean computers can only perform addition. **By combining addition with basic logical operations, computers can execute a variety of other mathematical operations**. For example, the subtraction $a - b$ can be translated into $a + (-b)$; multiplication and division can be translated into multiple additions or subtractions.

We can now summarize the reason for using two's complement in computers: with two's complement representation, computers can use the same circuits and operations to handle the addition of both positive and negative numbers, eliminating the need for special hardware circuits for subtraction and avoiding the ambiguity of positive and negative zero. This greatly simplifies hardware design and enhances computational efficiency.

The design of two's complement is quite ingenious, and due to space constraints, we'll stop here. Interested readers are encouraged to explore further.

## Floating-Point Number Encoding

You might have noticed something intriguing: despite having the same length of 4 bytes, why does a `float` have a much larger range of values than an `int`? This seems counterintuitive, as one would expect the range to shrink for `float` since it needs to represent fractions.

In fact, **this is due to the different representation method used by floating-point numbers (`float`)**. Let's consider a 32-bit binary number as:

$$
b_{31} b_{30} b_{29} \ldots b_2 b_1 b_0
$$

According to the IEEE 754 standard, a 32-bit `float` consists of the following three parts:

- Sign bit $\mathrm{S}$: Occupies 1 bit, corresponding to $b_{31}$.
- Exponent bits $\mathrm{E}$: Occupy 8 bits, corresponding to $b_{30} b_{29} \ldots b_{23}$.
- Fraction bits $\mathrm{N}$: Occupy 23 bits, corresponding to $b_{22} b_{21} \ldots b_0$.

The value of a binary `float` number is calculated as:

$$
\text{val} = (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \ldots b_{23}\right)_2 - 127} \times \left(1 . b_{22} b_{21} \ldots b_0\right)_2
$$

Converted to a decimal formula, this becomes:

$$
\text{val} = (-1)^{\mathrm{S}} \times 2^{\mathrm{E} - 127} \times (1 + \mathrm{N})
$$

The range of each component is:

$$
\begin{aligned}
\mathrm{S} \in & \{ 0, 1\}, \quad \mathrm{E} \in \{ 1, 2, \dots, 254 \} \newline
(1 + \mathrm{N}) = & (1 + \sum_{i=1}^{23} b_{23-i} \times 2^{-i}) \subset [1, 2 - 2^{-23}]
\end{aligned}
$$

![Example calculation of a float in IEEE 754 standard](number_encoding.assets/ieee_754_float.png)

Observing the figure, given example data $\mathrm{S} = 0$, $\mathrm{E} = 124$, and $\mathrm{N} = 2^{-2} + 2^{-3} = 0.375$, we have:

$$
\text{val} = (-1)^0 \times 2^{124 - 127} \times (1 + 0.375) = 0.171875
$$
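
This decomposition can be verified directly by constructing the bit pattern and reinterpreting it as a `float`; for example, in Python (illustrative only):

```python
import struct

# S = 0, E = 124 (0b01111100), N = 0b01100000000000000000000 (i.e., 2^-2 + 2^-3)
bits = 0b0_01111100_01100000000000000000000
(val,) = struct.unpack(">f", bits.to_bytes(4, "big"))
print(val)  # 0.171875
```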

Now we can answer the initial question: **the representation of `float` includes exponent bits, leading to a much larger range than `int`**. Based on the above calculation, the maximum positive number representable by `float` is approximately $2^{254 - 127} \times (2 - 2^{-23}) \approx 3.4 \times 10^{38}$, and the minimum negative number is obtained by switching the sign bit.

**However, the trade-off for `float`'s expanded range is a sacrifice in precision**. The integer type `int` uses all 32 bits to represent the number, with values evenly distributed; but due to the exponent bits, the larger the value of a `float`, the greater the difference between adjacent numbers.

As shown in the table below, exponent bits $E = 0$ and $E = 255$ have special meanings, **used to represent zero, infinity, $\mathrm{NaN}$, etc.**

<p align="center"> Table <id> Meaning of Exponent Bits </p>

| Exponent Bit E     | Fraction Bit $\mathrm{N} = 0$ | Fraction Bit $\mathrm{N} \ne 0$ | Calculation Formula                                                     |
| ------------------ | ----------------------------- | ------------------------------- | ----------------------------------------------------------------------- |
| $0$                | $\pm 0$                       | Subnormal Numbers               | $(-1)^{\mathrm{S}} \times 2^{-126} \times (0.\mathrm{N})$               |
| $1, 2, \dots, 254$ | Normal Numbers                | Normal Numbers                  | $(-1)^{\mathrm{S}} \times 2^{(\mathrm{E} -127)} \times (1.\mathrm{N})$  |
| $255$              | $\pm \infty$                  | $\mathrm{NaN}$                  |                                                                          |

It's worth noting that subnormal numbers significantly improve the precision of floating-point numbers. The smallest positive normal number is $2^{-126}$, while the smallest positive subnormal number is $2^{-126} \times 2^{-23}$.

Double-precision `double` uses a similar representation method to `float`, which is not elaborated here for brevity.
33
en/docs/chapter_data_structure/summary.md
Normal file
@ -0,0 +1,33 @@
# Summary

### Key Review

- Data structures can be categorized from two perspectives: logical structure and physical structure. Logical structure describes the logical relationships between data elements, while physical structure describes how data is stored in computer memory.
- Common logical structures include linear, tree-like, and network structures. We generally classify data structures as linear (arrays, linked lists, stacks, queues) or non-linear (trees, graphs, heaps) based on their logical structure. The implementation of hash tables may involve both linear and non-linear data structures.
- When a program runs, data is stored in computer memory. Each memory space has a corresponding memory address, and the program accesses data through these addresses.
- Physical structures are primarily divided into contiguous space storage (arrays) and dispersed space storage (linked lists). All data structures are implemented using arrays, linked lists, or a combination of both.
- Basic data types in computers include integers (`byte`, `short`, `int`, `long`), floating-point numbers (`float`, `double`), characters (`char`), and booleans (`bool`). Their range depends on the size of the space occupied and the representation method.
- Sign-magnitude, one's complement, and two's complement are three methods of encoding integers in computers, and they can be converted into each other. The highest bit of the sign-magnitude representation of an integer is the sign bit, and the remaining bits represent the value of the number.
- Integers are stored in computers in two's complement form. In this representation, the computer can treat the addition of positive and negative numbers uniformly, without the need for special hardware circuits for subtraction, and without the ambiguity of positive and negative zero.
- The encoding of a floating-point number consists of 1 sign bit, 8 exponent bits, and 23 fraction bits. Due to the exponent bits, the range of floating-point numbers is much greater than that of integers, at the cost of sacrificing precision.
- ASCII is the earliest English character set, 1 byte in length, and includes 127 characters. The GBK character set is a commonly used Chinese character set, including more than 20,000 Chinese characters. Unicode strives to provide a complete character set standard, including characters from various languages worldwide, thus solving the problem of garbled characters caused by inconsistent character encoding methods.
- UTF-8 is the most popular Unicode encoding method, with excellent universality. It is a variable-length encoding method with good scalability that effectively improves the efficiency of space usage. UTF-16 and UTF-32 are fixed-length encoding methods. When encoding Chinese characters, UTF-16 occupies less space than UTF-8. Programming languages like Java and C# use UTF-16 encoding by default.

### Q & A

**Q**: Why does a hash table contain both linear and non-linear data structures?

The underlying structure of a hash table is an array. To resolve hash collisions, we may use "chaining": each bucket in the array points to a linked list, which, when it exceeds a certain threshold, might be transformed into a tree (usually a red-black tree).
From a storage perspective, the foundation of a hash table is an array, where each bucket slot might contain a value, a linked list, or a tree. Therefore, hash tables may contain both linear data structures (arrays, linked lists) and non-linear data structures (trees).

**Q**: Is the length of the `char` type 1 byte?

The length of the `char` type is determined by the encoding method used by the programming language. For example, Java, JavaScript, TypeScript, and C# all use UTF-16 encoding (to store Unicode code points), so the length of the `char` type is 2 bytes.

**Q**: Is there ambiguity in calling data structures based on arrays "static data structures"? After all, operations like push and pop on stacks are "dynamic".

While stacks indeed allow for dynamic data operations, the data structure itself remains "static" (with unchangeable length). Even though data structures based on arrays can dynamically add or remove elements, their capacity is fixed. If the data volume exceeds the pre-allocated size, a new, larger array needs to be created, and the contents of the old array copied into it.

**Q**: When building stacks (queues) without specifying their size, why are they considered "static data structures"?

In high-level programming languages, we don't need to manually specify the initial capacity of stacks (queues); this task is handled automatically inside the class. For example, the initial capacity of Java's `ArrayList` is usually 10, and the expansion operation is also implemented automatically. See the subsequent "List" chapter for details.
After Width: | Height: | Size: 26 KiB |
366
en/docs/chapter_hashing/hash_algorithm.md
Normal file
@ -0,0 +1,366 @@
# Hash Algorithms

The previous two sections introduced the working principle of hash tables and the methods for handling hash collisions. However, both open addressing and chaining can **only ensure that the hash table functions normally when collisions occur; they cannot reduce the frequency of hash collisions**.

If hash collisions occur too frequently, the performance of the hash table deteriorates drastically. As shown in the figure below, for a chaining hash table, in the ideal case the key-value pairs are evenly distributed across the buckets, achieving optimal query efficiency; in the worst case, all key-value pairs are stored in the same bucket, degrading the time complexity to $O(n)$.

![Ideal and worst cases of hash collisions](hash_algorithm.assets/hash_collision_best_worst_condition.png)

**The distribution of key-value pairs is determined by the hash function**. Recall the steps for computing a hash table index: first compute the hash value, then take it modulo the array length:

```shell
index = hash(key) % capacity
```

Observing the above formula, when the hash table capacity `capacity` is fixed, **the hash algorithm `hash()` determines the output value**, thereby determining the distribution of key-value pairs in the hash table.

This means that, to reduce the probability of hash collisions, we should focus on the design of the hash algorithm `hash()`.

## Goals of Hash Algorithms

To achieve a "fast and stable" hash table data structure, hash algorithms should have the following characteristics:

- **Determinism**: For the same input, the hash algorithm should always produce the same output. Only then can the hash table be reliable.
- **High efficiency**: The process of computing the hash value should be fast enough. The smaller the computational overhead, the more practical the hash table.
- **Uniform distribution**: The hash algorithm should ensure that key-value pairs are evenly distributed in the hash table. The more uniform the distribution, the lower the probability of hash collisions.

In fact, hash algorithms are not only used to implement hash tables but are also widely applied in other fields.

- **Password storage**: To protect the security of user passwords, systems usually do not store the plaintext passwords but rather the hash values of the passwords. When a user enters a password, the system calculates the hash value of the input and compares it with the stored hash value. If they match, the password is considered correct.
- **Data integrity checks**: The data sender can calculate the hash value of the data and send it along; the receiver can recalculate the hash value of the received data and compare it with the received hash value. If they match, the data is considered intact.

For cryptographic applications, to prevent reverse engineering such as deducing the original password from the hash value, hash algorithms need higher-level security features.

- **Unidirectionality**: It should be impossible to deduce any information about the input data from the hash value.
- **Collision resistance**: It should be extremely difficult to find two different inputs that produce the same hash value.
- **Avalanche effect**: Minor changes in the input should lead to significant and unpredictable changes in the output.

Note that **"uniform distribution" and "collision resistance" are two separate concepts**. Satisfying uniform distribution does not necessarily imply collision resistance. For example, under random input `key`, the hash function `key % 100` can produce a uniformly distributed output. However, this hash algorithm is too simple: all `key` values with the same last two digits will have the same output, making it easy to deduce a usable `key` from the hash value, thereby cracking the password.

## Design of Hash Algorithms

The design of hash algorithms is a complex issue that requires consideration of many factors. However, for some less demanding scenarios, we can also design simple hash algorithms.

- **Additive Hash**: Add up the ASCII codes of each character in the input and use the total sum as the hash value.
|
||||
- **Multiplicative Hash**: Utilize the non-correlation of multiplication, multiplying each round by a constant, accumulating the ASCII codes of each character into the hash value.
|
||||
- **XOR Hash**: Accumulate the hash value by XORing each element of the input data.
|
||||
- **Rotating Hash**: Accumulate the ASCII code of each character into a hash value, performing a rotation operation on the hash value before each accumulation.
|
||||
|
||||
```src
|
||||
[file]{simple_hash}-[class]{}-[func]{rot_hash}
|
||||
```
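
The `src` marker above pulls in the book's reference implementations. As a rough, self-contained sketch of what these four designs look like in Python (identifier names here are illustrative rather than taken verbatim from the repository):

```python
MODULUS = 1_000_000_007  # a large prime; see the discussion below

def add_hash(key: str) -> int:
    """Additive hash: sum the character codes"""
    h = 0
    for c in key:
        h = (h + ord(c)) % MODULUS
    return h

def mul_hash(key: str) -> int:
    """Multiplicative hash: multiply by a constant each round"""
    h = 0
    for c in key:
        h = (31 * h + ord(c)) % MODULUS
    return h

def xor_hash(key: str) -> int:
    """XOR hash: fold each character in with exclusive-or"""
    h = 0
    for c in key:
        h ^= ord(c)
    return h % MODULUS

def rot_hash(key: str) -> int:
    """Rotating hash: rotate the hash before each accumulation"""
    h = 0
    for c in key:
        h = ((h << 4) ^ (h >> 16) ^ ord(c)) % MODULUS
    return h
```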

It is observed that the last step of each hash algorithm is to take the modulus of the large prime number $1000000007$ to ensure that the hash value is within an appropriate range. It is worth pondering why emphasis is placed on modulo a prime number, or what the disadvantages of modulo a composite number are. This is an interesting question.

To conclude: **using a large prime number as the modulus can maximize the uniform distribution of hash values**. Since a prime number does not share common factors with other numbers, it can reduce the periodic patterns caused by the modulo operation, thus avoiding hash collisions.

For example, suppose we choose the composite number $9$ as the modulus, which is divisible by $3$; then all `key` values divisible by $3$ will be mapped to the hash values $0$, $3$, $6$.

$$
\begin{aligned}
\text{modulus} & = 9 \newline
\text{key} & = \{ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, \dots \} \newline
\text{hash} & = \{ 0, 3, 6, 0, 3, 6, 0, 3, 6, 0, 3, 6, \dots \}
\end{aligned}
$$

If the input `key` happens to have this kind of arithmetic sequence distribution, then the hash values will cluster, thereby exacerbating hash collisions. Now, suppose we replace `modulus` with the prime number $13$; since there are no common factors between `key` and `modulus`, the uniformity of the output hash values will be significantly improved.

$$
\begin{aligned}
\text{modulus} & = 13 \newline
\text{key} & = \{ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, \dots \} \newline
\text{hash} & = \{ 0, 3, 6, 9, 12, 2, 5, 8, 11, 1, 4, 7, \dots \}
\end{aligned}
$$

It is worth noting that if the `key` is guaranteed to be randomly and uniformly distributed, then choosing a prime number or a composite number as the modulus can both produce uniformly distributed hash values. However, when the distribution of `key` has some periodicity, modulo a composite number is more likely to result in clustering.

In summary, we usually choose a prime number as the modulus, and this prime number should be large enough to eliminate periodic patterns as much as possible, enhancing the robustness of the hash algorithm.
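
To make this concrete, the short snippet below (purely illustrative) reproduces the two hash sequences above:

```python
keys = list(range(0, 36, 3))  # 0, 3, 6, ..., 33 — an arithmetic sequence

# Composite modulus 9: hash values cluster on {0, 3, 6}
print([key % 9 for key in keys])   # [0, 3, 6, 0, 3, 6, 0, 3, 6, 0, 3, 6]

# Prime modulus 13: hash values spread across all buckets
print([key % 13 for key in keys])  # [0, 3, 6, 9, 12, 2, 5, 8, 11, 1, 4, 7]
```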

## Common Hash Algorithms

It is not hard to see that the simple hash algorithms mentioned above are quite "fragile" and far from reaching the design goals of hash algorithms. For example, since addition and XOR obey the commutative law, additive hash and XOR hash cannot distinguish strings with the same content but in different order, which may exacerbate hash collisions and cause security issues.

In practice, we usually use some standard hash algorithms, such as MD5, SHA-1, SHA-2, and SHA-3. They can map input data of any length to a fixed-length hash value.

Over the past century, hash algorithms have been in a continuous process of upgrading and optimization. Some researchers strive to improve the performance of hash algorithms, while others, including hackers, are dedicated to finding security issues in hash algorithms. The table below shows hash algorithms commonly used in practical applications.

- MD5 and SHA-1 have been successfully attacked multiple times and are thus abandoned in various security applications.
- The SHA-2 series, especially SHA-256, is one of the most secure hash algorithms to date, with no successful attacks reported, hence commonly used in various security applications and protocols.
- SHA-3 has lower implementation costs and higher computational efficiency compared to SHA-2, but its current usage coverage is not as extensive as the SHA-2 series.

<p align="center"> Table <id> Common Hash Algorithms </p>

|                 | MD5                                             | SHA-1                               | SHA-2                                                              | SHA-3                        |
| --------------- | ----------------------------------------------- | ----------------------------------- | ------------------------------------------------------------------ | ---------------------------- |
| Release Year    | 1992                                            | 1995                                | 2002                                                               | 2008                         |
| Output Length   | 128 bit                                         | 160 bit                             | 256/512 bit                                                        | 224/256/384/512 bit          |
| Hash Collisions | Frequent                                        | Frequent                            | Rare                                                               | Rare                         |
| Security Level  | Low, has been successfully attacked             | Low, has been successfully attacked | High                                                               | High                         |
| Applications    | Abandoned, still used for data integrity checks | Abandoned                           | Cryptocurrency transaction verification, digital signatures, etc. | Can be used to replace SHA-2 |

## Hash Values in Data Structures

We know that the keys in a hash table can be of various data types such as integers, decimals, or strings. Programming languages usually provide built-in hash algorithms for these data types to calculate the bucket indices in the hash table. Taking Python as an example, we can use the `hash()` function to compute the hash values for various data types.

- The hash values of integers and booleans are their own values.
- The calculation of hash values for floating-point numbers and strings is more complex; interested readers are encouraged to study this on their own.
- The hash value of a tuple is a combination of the hash values of each of its elements, resulting in a single hash value.
- The hash value of an object is generated based on its memory address. By overriding the hash method of an object, hash values can be generated based on content.

!!! tip

    Be aware that the definition and methods of the built-in hash value calculation functions in different programming languages vary.

=== "Python"
|
||||
|
||||
```python title="built_in_hash.py"
|
||||
num = 3
|
||||
hash_num = hash(num)
|
||||
# Hash value of integer 3 is 3
|
||||
|
||||
bol = True
|
||||
hash_bol = hash(bol)
|
||||
# Hash value of boolean True is 1
|
||||
|
||||
dec = 3.14159
|
||||
hash_dec = hash(dec)
|
||||
# Hash value of decimal 3.14159 is 326484311674566659
|
||||
|
||||
str = "Hello 算法"
|
||||
hash_str = hash(str)
|
||||
# Hash value of string "Hello 算法" is 4617003410720528961
|
||||
|
||||
tup = (12836, "小哈")
|
||||
hash_tup = hash(tup)
|
||||
# Hash value of tuple (12836, '小哈') is 1029005403108185979
|
||||
|
||||
obj = ListNode(0)
|
||||
hash_obj = hash(obj)
|
||||
# Hash value of ListNode object at 0x1058fd810 is 274267521
|
||||
```
|
||||
|
||||
=== "C++"
|
||||
|
||||
```cpp title="built_in_hash.cpp"
|
||||
int num = 3;
|
||||
size_t hashNum = hash<int>()(num);
|
||||
// Hash value of integer 3 is 3
|
||||
|
||||
bool bol = true;
|
||||
size_t hashBol = hash<bool>()(bol);
|
||||
// Hash value of boolean 1 is 1
|
||||
|
||||
double dec = 3.14159;
|
||||
size_t hashDec = hash<double>()(dec);
|
||||
// Hash value of decimal 3.14159 is 4614256650576692846
|
||||
|
||||
string str = "Hello 算法";
|
||||
size_t hashStr = hash<string>()(str);
|
||||
// Hash value of string "Hello 算法" is 15466937326284535026
|
||||
|
||||
// In C++, built-in std::hash() only provides hash values for basic data types
|
||||
// Hash values for arrays and objects need to be implemented separately
|
||||
```
|
||||
|
||||
=== "Java"
|
||||
|
||||
```java title="built_in_hash.java"
|
||||
int num = 3;
|
||||
int hashNum = Integer.hashCode(num);
|
||||
// Hash value of integer 3 is 3
|
||||
|
||||
boolean bol = true;
|
||||
int hashBol = Boolean.hashCode(bol);
|
||||
// Hash value of boolean true is 1231
|
||||
|
||||
double dec = 3.14159;
|
||||
int hashDec = Double.hashCode(dec);
|
||||
// Hash value of decimal 3.14159 is -1340954729
|
||||
|
||||
String str = "Hello 算法";
|
||||
int hashStr = str.hashCode();
|
||||
// Hash value of string "Hello 算法" is -727081396
|
||||
|
||||
Object[] arr = { 12836, "小哈" };
|
||||
int hashTup = Arrays.hashCode(arr);
|
||||
// Hash value of array [12836, 小哈] is 1151158
|
||||
|
||||
ListNode obj = new ListNode(0);
|
||||
int hashObj = obj.hashCode();
|
||||
// Hash value of ListNode object utils.ListNode@7dc5e7b4 is 2110121908
|
||||
```
|
||||
|
||||
=== "C#"
|
||||
|
||||
```csharp title="built_in_hash.cs"
|
||||
int num = 3;
|
||||
int hashNum = num.GetHashCode();
|
||||
// Hash value of integer 3 is 3;
|
||||
|
||||
bool bol = true;
|
||||
int hashBol = bol.GetHashCode();
|
||||
// Hash value of boolean true is 1;
|
||||
|
||||
double dec = 3.14159;
|
||||
int hashDec = dec.GetHashCode();
|
||||
// Hash value of decimal 3.14159 is -1340954729;
|
||||
|
||||
string str = "Hello 算法";
|
||||
int hashStr = str.GetHashCode();
|
||||
// Hash value of string "Hello 算法" is -586107568;
|
||||
|
||||
object[] arr = [12836, "小哈"];
|
||||
int hashTup = arr.GetHashCode();
|
||||
// Hash value of array [12836, 小哈] is 42931033;
|
||||
|
||||
ListNode obj = new(0);
|
||||
int hashObj = obj.GetHashCode();
|
||||
// Hash value of ListNode object 0 is 39053774;
|
||||
```
|
||||
|
||||
=== "Go"
|
||||
|
||||
```go title="built_in_hash.go"
|
||||
// Go does not provide built-in hash code functions
|
||||
```
|
||||
|
||||
=== "Swift"
|
||||
|
||||
```swift title="built_in_hash.swift"
|
||||
let num = 3
|
||||
let hashNum = num.hashValue
|
||||
// Hash value of integer 3 is 9047044699613009734
|
||||
|
||||
let bol = true
|
||||
let hashBol = bol.hashValue
|
||||
// Hash value of boolean true is -4431640247352757451
|
||||
|
||||
let dec = 3.14159
|
||||
let hashDec = dec.hashValue
|
||||
// Hash value of decimal 3.14159 is -2465384235396674631
|
||||
|
||||
let str = "Hello 算法"
|
||||
let hashStr = str.hashValue
|
||||
// Hash value of string "Hello 算法" is -7850626797806988787
|
||||
|
||||
let arr = [AnyHashable(12836), AnyHashable("小哈")]
|
||||
let hashTup = arr.hashValue
|
||||
// Hash value of array [AnyHashable(12836), AnyHashable("小哈")] is -2308633508154532996
|
||||
|
||||
let obj = ListNode(x: 0)
|
||||
let hashObj = obj.hashValue
|
||||
// Hash value of ListNode object utils.ListNode is -2434780518035996159
|
||||
```
|
||||
|
||||
=== "JS"
|
||||
|
||||
```javascript title="built_in_hash.js"
|
||||
// JavaScript does not provide built-in hash code functions
|
||||
```
|
||||
|
||||
=== "TS"
|
||||
|
||||
```typescript title="built_in_hash.ts"
|
||||
// TypeScript does not provide built-in hash code functions
|
||||
```
|
||||
|
||||
=== "Dart"
|
||||
|
||||
```dart title="built_in_hash.dart"
|
||||
int num = 3;
|
||||
int hashNum = num.hashCode;
|
||||
// Hash value of integer 3 is 34803
|
||||
|
||||
bool bol = true;
|
||||
int hashBol = bol.hashCode;
|
||||
// Hash value of boolean true is 1231
|
||||
|
||||
double dec = 3.14159;
|
||||
int hashDec = dec.hashCode;
|
||||
// Hash value of decimal 3.14159 is 2570631074981783
|
||||
|
||||
String str = "Hello 算法";
|
||||
int hashStr = str.hashCode;
|
||||
// Hash value of string "Hello 算法" is 468167534
|
||||
|
||||
List arr = [12836, "小哈"];
|
||||
int hashArr = arr.hashCode;
|
||||
// Hash value of array [12836, 小哈] is 976512528
|
||||
|
||||
ListNode obj = new ListNode(0);
|
||||
int hashObj = obj.hashCode;
|
||||
// Hash value of ListNode object Instance of 'ListNode' is 1033450432
|
||||
```
|
||||
|
||||
=== "Rust"
|
||||
|
||||
```rust title="built_in_hash.rs"
|
||||
use std::collections::hash_map::DefaultHasher;
|
||||
use std::hash::{Hash, Hasher};
|
||||
|
||||
let num = 3;
|
||||
let mut num_hasher = DefaultHasher::new();
|
||||
num.hash(&mut num_hasher);
|
||||
let hash_num = num_hasher.finish();
|
||||
// Hash value of integer 3 is 568126464209439262
|
||||
|
||||
let bol = true;
|
||||
let mut bol_hasher = DefaultHasher::new();
|
||||
bol.hash(&mut bol_hasher);
|
||||
let hash_bol = bol_hasher.finish();
|
||||
// Hash value of boolean true is 4952851536318644461
|
||||
|
||||
let dec: f32 = 3.14159;
|
||||
let mut dec_hasher = DefaultHasher::new();
|
||||
dec.to_bits().hash(&mut dec_hasher);
|
||||
let hash_dec = dec_hasher.finish();
|
||||
// Hash value of decimal 3.14159 is 2566941990314602357
|
||||
|
||||
let str = "Hello 算法";
|
||||
let mut str_hasher = DefaultHasher::new();
|
||||
str.hash(&mut str_hasher);
|
||||
let hash_str = str_hasher.finish();
|
||||
// Hash value of string "Hello 算法" is 16092673739211250988
|
||||
|
||||
let arr = (&12836, &"小哈");
|
||||
let mut tup_hasher = DefaultHasher::new();
|
||||
arr.hash(&mut tup_hasher);
|
||||
let hash_tup = tup_hasher.finish();
|
||||
// Hash value of tuple (12836, "小哈") is 1885128010422702749
|
||||
|
||||
let node = ListNode::new(42);
|
||||
let mut hasher = DefaultHasher::new();
|
||||
node.borrow().val.hash(&mut hasher);
|
||||
let hash = hasher.finish();
|
||||
// Hash value of ListNode object RefCell { value: ListNode { val: 42, next: None } } is 15387811073369036852
|
||||
```
|
||||
|
||||
=== "C"
|
||||
|
||||
```c title="built_in_hash.c"
|
||||
// C does not provide built-in hash code functions
|
||||
```
|
||||
|
||||
=== "Kotlin"
|
||||
|
||||
```kotlin title="built_in_hash.kt"
|
||||
|
||||
```
|
||||
|
||||
=== "Zig"
|
||||
|
||||
```zig title="built_in_hash.zig"
|
||||
|
||||
```
|
||||
|
||||
??? pythontutor "Code Visualization"

    https://pythontutor.com/render.html#code=class%20ListNode%3A%0A%20%20%20%20%22%22%22%E9%93%BE%E8%A1%A8%E8%8A%82%E7%82%B9%E7%B1%BB%22%22%22%0A%20%20%20%20def%20__init__%28self,%20val%3A%20int%29%3A%0A%20%20%20%20%20%20%20%20self.val%3A%20int%20%3D%20val%20%20%23%20%E8%8A%82%E7%82%B9%E5%80%BC%0A%20%20%20%20%20%20%20%20self.next%3A%20ListNode%20%7C%20None%20%3D%20None%20%20%23%20%E5%90%8E%E7%BB%A7%E8%8A%82%E7%82%B9%E5%BC%95%E7%94%A8%0A%0A%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20num%20%3D%203%0A%20%20%20%20hash_num%20%3D%20hash%28num%29%0A%20%20%20%20%23%20%E6%95%B4%E6%95%B0%203%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%203%0A%0A%20%20%20%20bol%20%3D%20True%0A%20%20%20%20hash_bol%20%3D%20hash%28bol%29%0A%20%20%20%20%23%20%E5%B8%83%E5%B0%94%E9%87%8F%20True%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201%0A%0A%20%20%20%20dec%20%3D%203.14159%0A%20%20%20%20hash_dec%20%3D%20hash%28dec%29%0A%20%20%20%20%23%20%E5%B0%8F%E6%95%B0%203.14159%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20326484311674566659%0A%0A%20%20%20%20str%20%3D%20%22Hello%20%E7%AE%97%E6%B3%95%22%0A%20%20%20%20hash_str%20%3D%20hash%28str%29%0A%20%20%20%20%23%20%E5%AD%97%E7%AC%A6%E4%B8%B2%E2%80%9CHello%20%E7%AE%97%E6%B3%95%E2%80%9D%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%204617003410720528961%0A%0A%20%20%20%20tup%20%3D%20%2812836,%20%22%E5%B0%8F%E5%93%88%22%29%0A%20%20%20%20hash_tup%20%3D%20hash%28tup%29%0A%20%20%20%20%23%20%E5%85%83%E7%BB%84%20%2812836,%20'%E5%B0%8F%E5%93%88'%29%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%201029005403108185979%0A%0A%20%20%20%20obj%20%3D%20ListNode%280%29%0A%20%20%20%20hash_obj%20%3D%20hash%28obj%29%0A%20%20%20%20%23%20%E8%8A%82%E7%82%B9%E5%AF%B9%E8%B1%A1%20%3CListNode%20object%20at%200x1058fd810%3E%20%E7%9A%84%E5%93%88%E5%B8%8C%E5%80%BC%E4%B8%BA%20274267521&cumulative=false&curInstr=19&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false

In many programming languages, **only immutable objects can serve as the `key` in a hash table**. If we use a list (dynamic array) as a `key`, when the contents of the list change, its hash value also changes, and we would no longer be able to find the original `value` in the hash table.

Although the member variables of a custom object (such as a linked list node) are mutable, the object is hashable. **This is because the hash value of an object is usually generated based on its memory address**, and even if the contents of the object change, the memory address remains the same, so the hash value remains unchanged.

You might have noticed that the hash values output in different consoles are different. **This is because the Python interpreter adds a random salt to the string hash function each time it starts up**. This approach effectively prevents HashDoS attacks and enhances the security of the hash algorithm.
After Width: | Height: | Size: 30 KiB |
After Width: | Height: | Size: 20 KiB |
After Width: | Height: | Size: 17 KiB |
108
en/docs/chapter_hashing/hash_collision.md
Normal file
@ -0,0 +1,108 @@
# Hash Collision

As mentioned in the previous section, **usually the input space of a hash function is much larger than its output space**, making hash collisions theoretically inevitable. For example, if the input space consists of all integers and the output space is the size of the array capacity, multiple integers will inevitably map to the same bucket index.

Hash collisions can lead to incorrect query results, severely affecting the usability of hash tables. To solve this problem, one approach is to expand the hash table whenever a hash collision occurs, until the collision disappears. This method is simple and effective but inefficient due to the extensive data transfer and hash value computation involved in resizing the hash table. To improve efficiency, we can adopt the following strategies:

1. Improve the data structure of the hash table, **allowing it to function normally in the event of a hash collision**.
2. Only perform resizing when necessary, i.e., when hash collisions are severe.

There are mainly two methods for improving the structure of hash tables: "Separate Chaining" and "Open Addressing".

## Separate Chaining

In the original hash table, each bucket can store only one key-value pair. "Separate chaining" transforms individual elements into a linked list, with key-value pairs as list nodes, storing all colliding key-value pairs in the same list. The figure below shows an example of a hash table with separate chaining.

![Separate chaining hash table](hash_collision.assets/hash_table_chaining.png)

The operations of a hash table implemented with separate chaining have changed as follows:

- **Querying Elements**: Input `key`, pass through the hash function to obtain the bucket index, access the head node of the list, then traverse the list and compare `key` to find the target key-value pair.
- **Adding Elements**: First access the list head node via the hash function, then add the node (key-value pair) to the list.
- **Deleting Elements**: Access the list head based on the hash function's result, then traverse the list to find and remove the target node.

Separate chaining has the following limitations:

- **Increased Space Usage**: The linked list contains node pointers, which consume more memory space than arrays.
- **Reduced Query Efficiency**: Due to the need for linear traversal of the list to find the corresponding element.

The code below provides a simple implementation of a separate chaining hash table, with two things to note:

- Lists (dynamic arrays) are used instead of linked lists for simplicity. In this setup, the hash table (array) contains multiple buckets, each of which is a list.
- This implementation includes a method for resizing the hash table. When the load factor exceeds $\frac{2}{3}$, we resize the hash table to twice its original size.

```src
[file]{hash_map_chaining}-[class]{hash_map_chaining}-[func]{}
```
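
For orientation before reading the reference code, here is a pared-down sketch of such a chaining hash table — bucket lists of `[key, value]` pairs with load-factor-triggered doubling. Class and method names are our own illustration, not the repository's:

```python
class HashMapChaining:
    """Separate chaining hash table (minimal sketch)"""

    def __init__(self):
        self.size = 0                 # number of key-value pairs
        self.capacity = 4             # number of buckets
        self.load_thres = 2.0 / 3.0   # resize trigger
        self.buckets = [[] for _ in range(self.capacity)]

    def hash_func(self, key: int) -> int:
        return key % self.capacity

    def get(self, key: int):
        """Return the value for key, or None if absent"""
        for pair in self.buckets[self.hash_func(key)]:
            if pair[0] == key:
                return pair[1]
        return None

    def put(self, key: int, val):
        """Insert or update a key-value pair"""
        if self.size / self.capacity >= self.load_thres:
            self.extend()
        bucket = self.buckets[self.hash_func(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = val  # key exists: update in place
                return
        bucket.append([key, val])
        self.size += 1

    def remove(self, key: int):
        """Delete a key-value pair if present"""
        bucket = self.buckets[self.hash_func(key)]
        for pair in bucket:
            if pair[0] == key:
                bucket.remove(pair)
                self.size -= 1
                return

    def extend(self):
        """Double the capacity and rehash all pairs"""
        old_buckets = self.buckets
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        for bucket in old_buckets:
            for key, val in bucket:
                self.put(key, val)
```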

It's worth noting that when the list is very long, the query efficiency $O(n)$ is poor. **At this point, the list can be converted to an "AVL tree" or "Red-Black tree"** to optimize the time complexity of the query operation to $O(\log n)$.

## Open Addressing

"Open addressing" does not introduce additional data structures but uses "multiple probes" to handle hash collisions. The probing methods mainly include linear probing, quadratic probing, and double hashing.

Let's use linear probing as an example to introduce the mechanism of open addressing hash tables.

### Linear Probing

Linear probing uses a fixed-step linear search to resolve collisions; its operations differ from those of an ordinary hash table as follows:

- **Inserting Elements**: Calculate the bucket index using the hash function. If the bucket already contains an element, linearly traverse forward from the conflict position (usually with a step size of $1$) until an empty bucket is found, then insert the element.
- **Searching for Elements**: If a hash collision is found, use the same step size to linearly traverse forward until the corresponding element is found and return `value`; if an empty bucket is encountered, it means the target element is not in the hash table, so return `None`.

The figure below shows the distribution of key-value pairs in an open addressing (linear probing) hash table. According to this hash function, keys with the same last two digits will be mapped to the same bucket. Through linear probing, they are stored consecutively in that bucket and the buckets below it.

![Distribution of key-value pairs under open addressing (linear probing)](hash_collision.assets/hash_table_linear_probing.png)

However, **linear probing tends to create "clustering"**. Specifically, the longer a continuous position in the array is occupied, the more likely these positions are to encounter hash collisions, further promoting the growth of these clusters and eventually leading to deterioration in the efficiency of operations.

It's important to note that **we cannot directly delete elements in an open addressing hash table**. Deleting an element creates an empty bucket `None` in the array. When searching for elements, if linear probing encounters this empty bucket, it stops and returns, making the elements below this bucket inaccessible. The program may incorrectly assume these elements do not exist, as shown in the figure below.

![Query issue caused by deletion in open addressing](hash_collision.assets/hash_table_open_addressing_deletion.png)

To solve this problem, we can use a "lazy deletion" mechanism: instead of directly removing elements from the hash table, **use a constant `TOMBSTONE` to mark the bucket**. In this mechanism, both `None` and `TOMBSTONE` represent empty buckets and can hold key-value pairs. However, when linear probing encounters `TOMBSTONE`, it should continue traversing, since there may still be key-value pairs below it.

However, **lazy deletion may accelerate the degradation of hash table performance**. Every deletion operation produces a delete mark, and as `TOMBSTONE` marks increase, so does the search time, as linear probing may have to skip multiple `TOMBSTONE` marks to find the target element.

Therefore, consider recording the index of the first `TOMBSTONE` encountered during linear probing and swapping the target element found with this `TOMBSTONE`. The advantage of this is that each time a query or addition is performed, the element is moved to a bucket closer to the ideal position (the starting point of probing), thereby optimizing query efficiency.

The code below implements an open addressing (linear probing) hash table with lazy deletion. To make fuller use of the hash table space, we treat the hash table as a "circular array," continuing to traverse from the beginning when the end of the array is passed.

```src
[file]{hash_map_open_addressing}-[class]{hash_map_open_addressing}-[func]{}
```
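
The heart of that implementation is the probing loop. The following condensed sketch (names and structure are illustrative; it assumes the load factor keeps at least one truly empty bucket so the loop terminates) shows how lazy deletion and the swap-forward optimization interact:

```python
TOMBSTONE = ("<removed>", None)  # sentinel marking a lazily deleted bucket

def find_bucket(buckets: list, capacity: int, key: int) -> int:
    """Linear probing with lazy deletion: return the index holding key,
    or the index of the first free bucket where it could be inserted."""
    index = key % capacity  # hash function: hash(key) = key
    first_tombstone = -1
    while buckets[index] is not None:
        if buckets[index] is not TOMBSTONE and buckets[index][0] == key:
            # Found the key: move it into an earlier tombstone, if any,
            # so future lookups terminate sooner
            if first_tombstone != -1:
                buckets[first_tombstone] = buckets[index]
                buckets[index] = TOMBSTONE
                return first_tombstone
            return index
        if buckets[index] is TOMBSTONE and first_tombstone == -1:
            first_tombstone = index          # remember first reusable slot
        index = (index + 1) % capacity       # treat the array as circular
    # Key absent: prefer reusing a tombstone over a fresh empty bucket
    return index if first_tombstone == -1 else first_tombstone
```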

### Quadratic Probing

Quadratic probing is similar to linear probing and is one of the common strategies of open addressing. When a collision occurs, quadratic probing does not simply skip a fixed number of steps but skips "the square of the number of probes," i.e., $1, 4, 9, \dots$ steps.

Quadratic probing has the following advantages:

- Quadratic probing attempts to alleviate the clustering effect of linear probing by skipping the distance of the square of the number of probes.
- Quadratic probing skips larger distances to find empty positions, helping to distribute data more evenly.

However, quadratic probing is not perfect:

- Clustering still exists, i.e., some positions are more likely to be occupied than others.
- Due to the growth of squares, quadratic probing may not probe the entire hash table, meaning it might not access empty buckets even if they exist in the hash table.

### Double Hashing

As the name suggests, the double hashing method uses multiple hash functions $f_1(x)$, $f_2(x)$, $f_3(x)$, $\dots$ for probing.

- **Inserting Elements**: If hash function $f_1(x)$ encounters a conflict, try $f_2(x)$, and so on, until an empty position is found and the element is inserted.
- **Searching for Elements**: Search in the same order of hash functions until the target element is found and returned; if an empty position is encountered or all hash functions have been tried, it indicates the element is not in the hash table, then return `None`.

Compared to linear probing, double hashing is less prone to clustering but involves additional computation for multiple hash functions.

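As a quick illustration of how the three probe sequences differ, the snippet below uses the standard textbook formulations (for double hashing, the common `h1(key) + i * h2(key)` form; the second hash function here is an assumption for illustration, not code from this book):

```python
capacity = 13
key = 37  # 37 % 13 = 11, the starting bucket

# Linear probing: fixed step size 1
linear = [(key + i) % capacity for i in range(5)]

# Quadratic probing: skip the square of the probe count
quadratic = [(key + i * i) % capacity for i in range(5)]

# Double hashing: step size given by a second hash function
def hash2(key: int) -> int:
    return 1 + key % (capacity - 1)

double = [(key + i * hash2(key)) % capacity for i in range(5)]

print(linear)     # [11, 12, 0, 1, 2]
print(quadratic)  # [11, 12, 2, 7, 1]
print(double)     # [11, 0, 2, 4, 6]
```
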
!!! tip

    Please note that open addressing (linear probing, quadratic probing, and double hashing) hash tables all have the issue of "not being able to directly delete elements."

## Choice of Programming Languages

Various programming languages have adopted different hash table implementation strategies; here are a few examples:

- Python uses open addressing. The `dict` dictionary uses pseudo-random numbers for probing.
- Java uses separate chaining. Since JDK 1.8, when the array length in `HashMap` reaches 64 and the length of a linked list reaches 8, the linked list is converted to a red-black tree to improve search performance.
- Go uses separate chaining. Go stipulates that each bucket can store up to 8 key-value pairs, and if the capacity is exceeded, an overflow bucket is connected; when there are too many overflow buckets, a special equal-size expansion operation is performed to ensure performance.
BIN
en/docs/chapter_hashing/hash_map.assets/hash_collision.png
Normal file
After Width: | Height: | Size: 30 KiB |
BIN
en/docs/chapter_hashing/hash_map.assets/hash_function.png
Normal file
After Width: | Height: | Size: 33 KiB |
BIN
en/docs/chapter_hashing/hash_map.assets/hash_table_lookup.png
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
en/docs/chapter_hashing/hash_map.assets/hash_table_reshash.png
Normal file
After Width: | Height: | Size: 27 KiB |
537
en/docs/chapter_hashing/hash_map.md
Executable file
@ -0,0 +1,537 @@
# Hash Table

A "hash table", also known as a "hash map", achieves efficient element querying by establishing a mapping between keys and values. Specifically, when we input a `key` into the hash table, we can retrieve the corresponding `value` in $O(1)$ time.

As shown in the figure below, given $n$ students, each with two pieces of data: "name" and "student number". If we want to implement a query feature that returns the corresponding name when given a student number, we can use the hash table shown in the figure below.

![Abstract representation of a hash table](hash_map.assets/hash_table_lookup.png)

Apart from hash tables, arrays and linked lists can also be used to implement querying functions. Their efficiency is compared in the table below.

- **Adding Elements**: Simply add the element to the end of the array (or linked list), using $O(1)$ time.
- **Querying Elements**: Since the array (or linked list) is unordered, it requires traversing all the elements, using $O(n)$ time.
- **Deleting Elements**: First, locate the element, then delete it from the array (or linked list), using $O(n)$ time.

<p align="center"> Table <id> Comparison of Element Query Efficiency </p>

|                | Array  | Linked List | Hash Table |
| -------------- | ------ | ----------- | ---------- |
| Find Element   | $O(n)$ | $O(n)$      | $O(1)$     |
| Add Element    | $O(1)$ | $O(1)$      | $O(1)$     |
| Delete Element | $O(n)$ | $O(n)$      | $O(1)$     |

Observations reveal that **the time complexity for adding, deleting, and querying in a hash table is $O(1)$**, which is highly efficient.

## Common Operations of Hash Table

Common operations of a hash table include initialization, querying, adding key-value pairs, and deleting key-value pairs, etc. Example code is as follows:

=== "Python"
|
||||
|
||||
```python title="hash_map.py"
|
||||
# Initialize hash table
|
||||
hmap: dict = {}
|
||||
|
||||
# Add operation
|
||||
# Add key-value pair (key, value) to the hash table
|
||||
hmap[12836] = "Xiao Ha"
|
||||
hmap[15937] = "Xiao Luo"
|
||||
hmap[16750] = "Xiao Suan"
|
||||
hmap[13276] = "Xiao Fa"
|
||||
hmap[10583] = "Xiao Ya"
|
||||
|
||||
# Query operation
|
||||
# Input key into hash table, get value
|
||||
name: str = hmap[15937]
|
||||
|
||||
# Delete operation
|
||||
# Delete key-value pair (key, value) from hash table
|
||||
hmap.pop(10583)
|
||||
```
|
||||
|
||||
=== "C++"
|
||||
|
||||
```cpp title="hash_map.cpp"
|
||||
/* Initialize hash table */
|
||||
unordered_map<int, string> map;
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map[12836] = "Xiao Ha";
|
||||
map[15937] = "Xiao Luo";
|
||||
map[16750] = "Xiao Suan";
|
||||
map[13276] = "Xiao Fa";
|
||||
map[10583] = "Xiao Ya";
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
string name = map[15937];
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.erase(10583);
|
||||
```
|
||||
|
||||
=== "Java"
|
||||
|
||||
```java title="hash_map.java"
|
||||
/* Initialize hash table */
|
||||
Map<Integer, String> map = new HashMap<>();
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map.put(12836, "Xiao Ha");
|
||||
map.put(15937, "Xiao Luo");
|
||||
map.put(16750, "Xiao Suan");
|
||||
map.put(13276, "Xiao Fa");
|
||||
map.put(10583, "Xiao Ya");
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
String name = map.get(15937);
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.remove(10583);
|
||||
```
|
||||
|
||||
=== "C#"
|
||||
|
||||
```csharp title="hash_map.cs"
|
||||
/* Initialize hash table */
|
||||
Dictionary<int, string> map = new() {
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
{ 12836, "Xiao Ha" },
|
||||
{ 15937, "Xiao Luo" },
|
||||
{ 16750, "Xiao Suan" },
|
||||
{ 13276, "Xiao Fa" },
|
||||
{ 10583, "Xiao Ya" }
|
||||
};
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
string name = map[15937];
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.Remove(10583);
|
||||
```
|
||||
|
||||
=== "Go"
|
||||
|
||||
```go title="hash_map_test.go"
|
||||
/* Initialize hash table */
|
||||
hmap := make(map[int]string)
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
hmap[12836] = "Xiao Ha"
|
||||
hmap[15937] = "Xiao Luo"
|
||||
hmap[16750] = "Xiao Suan"
|
||||
hmap[13276] = "Xiao Fa"
|
||||
hmap[10583] = "Xiao Ya"
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
name := hmap[15937]
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
delete(hmap, 10583)
|
||||
```
|
||||
|
||||
=== "Swift"
|
||||
|
||||
```swift title="hash_map.swift"
|
||||
/* Initialize hash table */
|
||||
var map: [Int: String] = [:]
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map[12836] = "Xiao Ha"
|
||||
map[15937] = "Xiao Luo"
|
||||
map[16750] = "Xiao Suan"
|
||||
map[13276] = "Xiao Fa"
|
||||
map[10583] = "Xiao Ya"
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
let name = map[15937]!
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.removeValue(forKey: 10583)
|
||||
```
|
||||
|
||||
=== "JS"
|
||||
|
||||
```javascript title="hash_map.js"
|
||||
/* Initialize hash table */
|
||||
const map = new Map();
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map.set(12836, 'Xiao Ha');
|
||||
map.set(15937, 'Xiao Luo');
|
||||
map.set(16750, 'Xiao Suan');
|
||||
map.set(13276, 'Xiao Fa');
|
||||
map.set(10583, 'Xiao Ya');
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
let name = map.get(15937);
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.delete(10583);
|
||||
```
|
||||
|
||||
=== "TS"
|
||||
|
||||
```typescript title="hash_map.ts"
|
||||
/* Initialize hash table */
|
||||
const map = new Map<number, string>();
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map.set(12836, 'Xiao Ha');
|
||||
map.set(15937, 'Xiao Luo');
|
||||
map.set(16750, 'Xiao Suan');
|
||||
map.set(13276, 'Xiao Fa');
|
||||
map.set(10583, 'Xiao Ya');
|
||||
console.info('\nAfter adding, the hash table is\nKey -> Value');
|
||||
console.info(map);
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
let name = map.get(15937);
|
||||
console.info('\nInput student number 15937, query name ' + name);
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.delete(10583);
|
||||
console.info('\nAfter deleting 10583, the hash table is\nKey -> Value');
|
||||
console.info(map);
|
||||
```
|
||||
|
||||
=== "Dart"
|
||||
|
||||
```dart title="hash_map.dart"
|
||||
/* Initialize hash table */
|
||||
Map<int, String> map = {};
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map[12836] = "Xiao Ha";
|
||||
map[15937] = "Xiao Luo";
|
||||
map[16750] = "Xiao Suan";
|
||||
map[13276] = "Xiao Fa";
|
||||
map[10583] = "Xiao Ya";
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
String name = map[15937];
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
map.remove(10583);
|
||||
```
|
||||
|
||||
=== "Rust"
|
||||
|
||||
```rust title="hash_map.rs"
|
||||
use std::collections::HashMap;
|
||||
|
||||
/* Initialize hash table */
|
||||
let mut map: HashMap<i32, String> = HashMap::new();
|
||||
|
||||
/* Add operation */
|
||||
// Add key-value pair (key, value) to the hash table
|
||||
map.insert(12836, "Xiao Ha".to_string());
|
||||
map.insert(15937, "Xiao Luo".to_string());
|
||||
map.insert(16750, "Xiao Suan".to_string());
|
||||
map.insert(13279, "Xiao Fa".to_string());
|
||||
map.insert(10583, "Xiao Ya".to_string());
|
||||
|
||||
/* Query operation */
|
||||
// Input key into hash table, get value
|
||||
let _name: Option<&String> = map.get(&15937);
|
||||
|
||||
/* Delete operation */
|
||||
// Delete key-value pair (key, value) from hash table
|
||||
let _removed_value: Option<String> = map.remove(&10583);
|
||||
```
|
||||
|
||||
=== "C"
|
||||
|
||||
```c title="hash_map.c"
|
||||
// C does not provide a built-in hash table
|
||||
```
|
||||
|
||||
=== "Kotlin"
|
||||
|
||||
```kotlin title="hash_map.kt"
|
||||
|
||||
```
|
||||
|
||||
=== "Zig"
|
||||
|
||||
```zig title="hash_map.zig"
|
||||
|
||||
```
|
||||
|
||||
??? pythontutor "Code Visualization"
|
||||
|
||||
https://pythontutor.com/render.html#code=%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20hmap%20%3D%20%7B%7D%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%B7%BB%E5%8A%A0%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E6%B7%BB%E5%8A%A0%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap%5B12836%5D%20%3D%20%22%E5%B0%8F%E5%93%88%22%0A%20%20%20%20hmap%5B15937%5D%20%3D%20%22%E5%B0%8F%E5%95%B0%22%0A%20%20%20%20hmap%5B16750%5D%20%3D%20%22%E5%B0%8F%E7%AE%97%22%0A%20%20%20%20hmap%5B13276%5D%20%3D%20%22%E5%B0%8F%E6%B3%95%22%0A%20%20%20%20hmap%5B10583%5D%20%3D%20%22%E5%B0%8F%E9%B8%AD%22%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%9F%A5%E8%AF%A2%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%90%91%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E8%BE%93%E5%85%A5%E9%94%AE%20key%20%EF%BC%8C%E5%BE%97%E5%88%B0%E5%80%BC%20value%0A%20%20%20%20name%20%3D%20hmap%5B15937%5D%0A%20%20%20%20%0A%20%20%20%20%23%20%E5%88%A0%E9%99%A4%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E5%88%A0%E9%99%A4%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap.pop%2810583%29&cumulative=false&curInstr=2&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false
|
||||
There are three common ways to traverse a hash table: traversing key-value pairs, keys, and values. Example code is as follows:

=== "Python"

    ```python title="hash_map.py"
    # Traverse hash table
    # Traverse key-value pairs key->value
    for key, value in hmap.items():
        print(key, "->", value)
    # Traverse keys only
    for key in hmap.keys():
        print(key)
    # Traverse values only
    for value in hmap.values():
        print(value)
    ```

=== "C++"

    ```cpp title="hash_map.cpp"
    /* Traverse hash table */
    // Traverse key-value pairs key->value
    for (auto kv: map) {
        cout << kv.first << " -> " << kv.second << endl;
    }
    // Traverse using iterator key->value
    for (auto iter = map.begin(); iter != map.end(); iter++) {
        cout << iter->first << "->" << iter->second << endl;
    }
    ```

=== "Java"

    ```java title="hash_map.java"
    /* Traverse hash table */
    // Traverse key-value pairs key->value
    for (Map.Entry<Integer, String> kv: map.entrySet()) {
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
    // Traverse keys only
    for (int key: map.keySet()) {
        System.out.println(key);
    }
    // Traverse values only
    for (String val: map.values()) {
        System.out.println(val);
    }
    ```

=== "C#"

    ```csharp title="hash_map.cs"
    /* Traverse hash table */
    // Traverse key-value pairs Key->Value
    foreach (var kv in map) {
        Console.WriteLine(kv.Key + " -> " + kv.Value);
    }
    // Traverse keys only
    foreach (int key in map.Keys) {
        Console.WriteLine(key);
    }
    // Traverse values only
    foreach (string val in map.Values) {
        Console.WriteLine(val);
    }
    ```

=== "Go"

    ```go title="hash_map_test.go"
    /* Traverse hash table */
    // Traverse key-value pairs key->value
    for key, value := range hmap {
        fmt.Println(key, "->", value)
    }
    // Traverse keys only
    for key := range hmap {
        fmt.Println(key)
    }
    // Traverse values only
    for _, value := range hmap {
        fmt.Println(value)
    }
    ```

=== "Swift"

    ```swift title="hash_map.swift"
    /* Traverse hash table */
    // Traverse key-value pairs Key->Value
    for (key, value) in map {
        print("\(key) -> \(value)")
    }
    // Traverse keys only
    for key in map.keys {
        print(key)
    }
    // Traverse values only
    for value in map.values {
        print(value)
    }
    ```

=== "JS"

    ```javascript title="hash_map.js"
    /* Traverse hash table */
    console.info('\nTraverse key-value pairs Key->Value');
    for (const [k, v] of map.entries()) {
        console.info(k + ' -> ' + v);
    }
    console.info('\nTraverse keys only Key');
    for (const k of map.keys()) {
        console.info(k);
    }
    console.info('\nTraverse values only Value');
    for (const v of map.values()) {
        console.info(v);
    }
    ```

=== "TS"

    ```typescript title="hash_map.ts"
    /* Traverse hash table */
    console.info('\nTraverse key-value pairs Key->Value');
    for (const [k, v] of map.entries()) {
        console.info(k + ' -> ' + v);
    }
    console.info('\nTraverse keys only Key');
    for (const k of map.keys()) {
        console.info(k);
    }
    console.info('\nTraverse values only Value');
    for (const v of map.values()) {
        console.info(v);
    }
    ```

=== "Dart"

    ```dart title="hash_map.dart"
    /* Traverse hash table */
    // Traverse key-value pairs Key->Value
    map.forEach((key, value) {
      print('$key -> $value');
    });

    // Traverse keys only Key
    map.keys.forEach((key) {
      print(key);
    });

    // Traverse values only Value
    map.values.forEach((value) {
      print(value);
    });
    ```

=== "Rust"

    ```rust title="hash_map.rs"
    /* Traverse hash table */
    // Traverse key-value pairs Key->Value
    for (key, value) in &map {
        println!("{key} -> {value}");
    }

    // Traverse keys only Key
    for key in map.keys() {
        println!("{key}");
    }

    // Traverse values only Value
    for value in map.values() {
        println!("{value}");
    }
    ```

=== "C"

    ```c title="hash_map.c"
    // C does not provide a built-in hash table
    ```

=== "Kotlin"

    ```kotlin title="hash_map.kt"

    ```

=== "Zig"

    ```zig title="hash_map.zig"
    // Zig example is not provided
    ```

??? pythontutor "Code Visualization"

    https://pythontutor.com/render.html#code=%22%22%22Driver%20Code%22%22%22%0Aif%20__name__%20%3D%3D%20%22__main__%22%3A%0A%20%20%20%20%23%20%E5%88%9D%E5%A7%8B%E5%8C%96%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20hmap%20%3D%20%7B%7D%0A%20%20%20%20%0A%20%20%20%20%23%20%E6%B7%BB%E5%8A%A0%E6%93%8D%E4%BD%9C%0A%20%20%20%20%23%20%E5%9C%A8%E5%93%88%E5%B8%8C%E8%A1%A8%E4%B8%AD%E6%B7%BB%E5%8A%A0%E9%94%AE%E5%80%BC%E5%AF%B9%20%28key,%20value%29%0A%20%20%20%20hmap%5B12836%5D%20%3D%20%22%E5%B0%8F%E5%93%88%22%0A%20%20%20%20hmap%5B15937%5D%20%3D%20%22%E5%B0%8F%E5%95%B0%22%0A%20%20%20%20hmap%5B16750%5D%20%3D%20%22%E5%B0%8F%E7%AE%97%22%0A%20%20%20%20hmap%5B13276%5D%20%3D%20%22%E5%B0%8F%E6%B3%95%22%0A%20%20%20%20hmap%5B10583%5D%20%3D%20%22%E5%B0%8F%E9%B8%AD%22%0A%20%20%20%20%0A%20%20%20%20%23%20%E9%81%8D%E5%8E%86%E5%93%88%E5%B8%8C%E8%A1%A8%0A%20%20%20%20%23%20%E9%81%8D%E5%8E%86%E9%94%AE%E5%80%BC%E5%AF%B9%20key-%3Evalue%0A%20%20%20%20for%20key,%20value%20in%20hmap.items%28%29%3A%0A%20%20%20%20%20%20%20%20print%28key,%20%22-%3E%22,%20value%29%0A%20%20%20%20%23%20%E5%8D%95%E7%8B%AC%E9%81%8D%E5%8E%86%E9%94%AE%20key%0A%20%20%20%20for%20key%20in%20hmap.keys%28%29%3A%0A%20%20%20%20%20%20%20%20print%28key%29%0A%20%20%20%20%23%20%E5%8D%95%E7%8B%AC%E9%81%8D%E5%8E%86%E5%80%BC%20value%0A%20%20%20%20for%20value%20in%20hmap.values%28%29%3A%0A%20%20%20%20%20%20%20%20print%28value%29&cumulative=false&curInstr=8&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=311&rawInputLstJSON=%5B%5D&textReferences=false

## Simple Implementation of Hash Table

First, let's consider the simplest case: **implementing a hash table using just an array**. In the hash table, each empty slot in the array is called a "bucket", and each bucket can store one key-value pair. Therefore, the query operation involves finding the bucket corresponding to the `key` and retrieving the `value` from it.

So, how do we locate the appropriate bucket based on the `key`? This is achieved through a "hash function". The role of the hash function is to map a larger input space to a smaller output space. In a hash table, the input space is all possible keys, and the output space is all buckets (array indices). In other words, given a `key`, **we can use the hash function to determine the storage location of the corresponding key-value pair in the array**.

The calculation process of the hash function for a given `key` is divided into the following two steps:

1. Calculate the hash value using a certain hash algorithm `hash()`.
2. Take the modulus of the hash value with the number of buckets (array length) `capacity` to obtain the array index `index`.

```shell
index = hash(key) % capacity
```

Afterward, we can use `index` to access the corresponding bucket in the hash table and thereby retrieve the `value`.

Assuming array length `capacity = 100` and hash algorithm `hash(key) = key`, the hash function is `key % 100`. The figure below uses `key` as the student number and `value` as the name to demonstrate the working principle of the hash function.

![Working principle of the hash function](hash_map.assets/hash_function.png)

The following code implements a simple hash table. Here, we encapsulate `key` and `value` into a class `Pair` to represent the key-value pair.

```src
[file]{array_hash_map}-[class]{array_hash_map}-[func]{}
```
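
As a minimal sketch of what that reference code boils down to (class and method names are illustrative), the whole structure fits in a few lines of Python:

```python
class Pair:
    """Key-value pair"""
    def __init__(self, key: int, val: str):
        self.key = key
        self.val = val

class ArrayHashMap:
    """Hash table based on a plain array (minimal sketch)"""
    def __init__(self):
        self.buckets: list[Pair | None] = [None] * 100  # 100 buckets

    def hash_func(self, key: int) -> int:
        """Hash function: hash(key) = key, then modulo the capacity"""
        return key % 100

    def get(self, key: int) -> str | None:
        """Query: locate the bucket and read its value"""
        pair = self.buckets[self.hash_func(key)]
        return pair.val if pair is not None else None

    def put(self, key: int, val: str):
        """Add: place the pair into its bucket"""
        self.buckets[self.hash_func(key)] = Pair(key, val)

    def remove(self, key: int):
        """Delete: empty the bucket"""
        self.buckets[self.hash_func(key)] = None
```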

## Hash Collision and Resizing

Fundamentally, the role of the hash function is to map the entire input space of all keys to the output space of all array indices. However, the input space is often much larger than the output space. Therefore, **theoretically, there must be situations where "multiple inputs correspond to the same output"**.

For the hash function in the above example, if the last two digits of the input `key` are the same, the output of the hash function will also be the same. For example, when querying for students with student numbers 12836 and 20336, we find:

```shell
12836 % 100 = 36
20336 % 100 = 36
```

As shown in the figure below, both student numbers point to the same name, which is obviously incorrect. This situation where multiple inputs correspond to the same output is known as "hash collision".

![Example of a hash collision](hash_map.assets/hash_collision.png)

It is easy to understand that the larger the capacity $n$ of the hash table, the lower the probability of multiple keys being allocated to the same bucket, and the fewer the collisions. Therefore, **expanding the capacity of the hash table can reduce hash collisions**.

As shown in the figure below, before expansion, key-value pairs `(136, A)` and `(236, D)` collided; after expansion, the collision is resolved.

![Resizing the hash table](hash_map.assets/hash_table_reshash.png)

Similar to array expansion, resizing a hash table requires migrating all key-value pairs from the original hash table to the new one, which is time-consuming. Furthermore, since the capacity `capacity` of the hash table changes, we need to recalculate the storage positions of all key-value pairs using the hash function, which adds to the computational overhead of the resizing process. Therefore, programming languages often reserve a sufficiently large capacity for the hash table to prevent frequent resizing.

The "load factor" is an important concept for hash tables. It is defined as the ratio of the number of elements in the hash table to the number of buckets. It is used to measure the severity of hash collisions and **is often used as a trigger for resizing the hash table**. For example, in Java, when the load factor exceeds $0.75$, the system will resize the hash table to twice its original size.
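
A quick numeric check of the resizing idea, assuming (for illustration) that the capacity doubles from $100$ to $200$:

```python
# capacity 100: keys 136 and 236 collide in bucket 36
print(136 % 100, 236 % 100)  # 36 36

# after doubling the capacity to 200, the collision is resolved
print(136 % 200, 236 % 200)  # 136 36
```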
13
en/docs/chapter_hashing/index.md
Normal file
@ -0,0 +1,13 @@
# Hash Table

<div class="center-table" markdown>

![Hash Table](../assets/covers/chapter_hashing.jpg){ class="cover-image" }

</div>

!!! abstract

    In the world of computing, a hash table is akin to an intelligent librarian.

    It understands how to compute index numbers, enabling swift retrieval of the desired book.
47
en/docs/chapter_hashing/summary.md
Normal file
@ -0,0 +1,47 @@
|
||||
# Summary
|
||||
|
||||
### Key Review
|
||||
|
||||
- Given an input `key`, a hash table can retrieve the corresponding `value` in $O(1)$ time, which is highly efficient.
|
||||
- Common hash table operations include querying, adding key-value pairs, deleting key-value pairs, and traversing the hash table.
|
||||
- The hash function maps a `key` to an array index, allowing access to the corresponding bucket to retrieve the `value`.
|
||||
- Two different keys may end up with the same array index after hashing, leading to erroneous query results. This phenomenon is known as hash collision.
|
||||
- The larger the capacity of the hash table, the lower the probability of hash collisions. Therefore, hash table resizing can mitigate hash collisions. Similar to array resizing, hash table resizing is costly.
|
||||
- Load factor, defined as the ratio of the number of elements to the number of buckets in the hash table, reflects the severity of hash collisions and is often used as a trigger for resizing the hash table.
|
||||
- Chaining addresses hash collisions by converting each element into a linked list, storing all colliding elements in the same list. However, excessively long lists can reduce query efficiency, which can be improved by converting the lists into red-black trees.
|
||||
- Open addressing handles hash collisions through multiple probes. Linear probing uses a fixed step size but cannot delete elements and is prone to clustering. Multiple hashing uses several hash functions for probing, making it less susceptible to clustering but increasing computational load.
|
||||
- Different programming languages adopt various hash table implementations. For example, Java's `HashMap` uses chaining, while Python's `dict` employs open addressing.
|
||||
- In hash tables, we desire hash algorithms with determinism, high efficiency, and uniform distribution. In cryptography, hash algorithms should also possess collision resistance and the avalanche effect.
|
||||
- Hash algorithms typically use large prime numbers as moduli to ensure uniform distribution of hash values and reduce hash collisions.
|
||||
- Common hash algorithms include MD5, SHA-1, SHA-2, and SHA-3. MD5 is often used for file integrity checks, while SHA-2 is commonly used in secure applications and protocols.
|
||||
- Programming languages usually provide built-in hash algorithms for data types to calculate bucket indices in hash tables. Generally, only immutable objects are hashable.
|
||||
|
||||
### Q & A
|
||||
|
||||
**Q**: When does the time complexity of a hash table degrade to $O(n)$?
|
||||
|
||||
The time complexity of a hash table can degrade to $O(n)$ when hash collisions are severe. When the hash function is well-designed, the capacity is set appropriately, and collisions are evenly distributed, the time complexity is $O(1)$. We usually consider the time complexity to be $O(1)$ when using built-in hash tables in programming languages.
|
||||
|
||||
**Q**: Why not use the hash function $f(x) = x$? This would eliminate collisions.
|
||||
|
||||
Under the hash function $f(x) = x$, each element corresponds to a unique bucket index, which is equivalent to an array. However, the input space is usually much larger than the output space (array length), so the last step of a hash function is often to take the modulo of the array length. In other words, the goal of a hash table is to map a larger state space to a smaller one while providing $O(1)$ query efficiency.
|
||||
|
||||
**Q**: Why can hash tables be more efficient than arrays, linked lists, or binary trees, even though they are implemented using these structures?

Firstly, hash tables have higher time efficiency but lower space efficiency. A significant portion of memory in hash tables remains unused.

Secondly, they are only more efficient in specific use cases. If a feature can be implemented with the same time complexity using an array or a linked list, it's usually faster than using a hash table. This is because the computation of the hash function incurs overhead, making the constant factor in the time complexity larger.

Lastly, the time complexity of hash tables can degrade. For example, in chaining, we perform search operations in a linked list or red-black tree, which still risks degrading to $O(n)$ time.

**Q**: Does multiple hashing also have the flaw of not being able to delete elements directly? Can space marked as deleted be reused?

Multiple hashing is a form of open addressing, and all open addressing methods share the drawback of not being able to delete elements directly; they require marking elements as deleted. Marked spaces can be reused: when a new element is inserted and the hash function points to a position marked as deleted, that position can be used by the new element. This preserves the probing sequence of the hash table while ensuring efficient use of space.

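A minimal sketch of lazy deletion under linear probing (the module-level globals, the `TOMBSTONE` sentinel, and the fixed capacity are simplifying assumptions for illustration only):

```python
# Linear probing with lazy deletion: a simplified sketch
TOMBSTONE = object()             # sentinel marking a deleted slot
capacity = 8
slots: list = [None] * capacity  # each slot holds (key, val), TOMBSTONE, or None

def probe(key: int):
    """Yield slot indices starting from the hashed position, with step size 1"""
    start = key % capacity
    for step in range(capacity):
        yield (start + step) % capacity

def get(key: int):
    """Probe until the key is found or an empty slot ends the chain"""
    for i in probe(key):
        if slots[i] is None:
            return None          # empty slot: the key is absent
        if slots[i] is not TOMBSTONE and slots[i][0] == key:
            return slots[i][1]
    return None

def put(key: int, val: str):
    """Insert or update, reusing the first tombstone met on the probe path"""
    target = None
    for i in probe(key):
        if slots[i] is None:
            if target is None:
                target = i       # first empty slot ends the search
            break
        if slots[i] is TOMBSTONE:
            if target is None:
                target = i       # remember a reusable deleted slot
        elif slots[i][0] == key:
            target = i           # key exists: update in place
            break
    if target is not None:
        slots[target] = (key, val)

def remove(key: int):
    """Mark the slot as deleted rather than clearing it, keeping probe chains intact"""
    for i in probe(key):
        if slots[i] is None:
            return
        if slots[i] is not TOMBSTONE and slots[i][0] == key:
            slots[i] = TOMBSTONE
            return

put(1, "a"); put(9, "b")  # 9 % 8 == 1, so "b" probes onward to slot 2
remove(1)                 # slot 1 becomes a tombstone
print(get(9))             # "b": the search walks past the tombstone
put(17, "c")              # 17 % 8 == 1: the tombstone at slot 1 is reused
```
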
**Q**: Why do hash collisions occur during the search process in linear probing?

During a search, the hash function points to the corresponding bucket and key-value pair. If the `key` doesn't match, it indicates a hash collision. Therefore, linear probing searches onward at a predetermined step size until the correct key-value pair is found or the search fails (as in the `get` function of the sketch above).

**Q**: Why can resizing a hash table alleviate hash collisions?

The last step of a hash function often involves taking the result modulo the array length $n$, to keep the output within the array index range. When resizing, the array length $n$ changes, and the indices corresponding to the keys may also change. Keys that were previously mapped to the same bucket might be distributed across multiple buckets after resizing, thereby mitigating hash collisions.

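A quick numeric illustration (with arbitrary example keys): two keys that collide when $n = 4$ land in different buckets when $n = 8$:

```python
# Keys colliding under one modulus may separate under a larger one
keys = [6, 10]
print([k % 4 for k in keys])  # [2, 2] -> both map to bucket 2 when n = 4
print([k % 8 for k in keys])  # [6, 2] -> spread across buckets when n = 8
```
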
After Width: | Height: | Size: 11 KiB |
After Width: | Height: | Size: 16 KiB |
After Width: | Height: | Size: 16 KiB |
After Width: | Height: | Size: 16 KiB |
After Width: | Height: | Size: 14 KiB |
After Width: | Height: | Size: 25 KiB |
After Width: | Height: | Size: 58 KiB |
56
en/docs/chapter_introduction/algorithms_are_everywhere.md
Normal file
@ -0,0 +1,56 @@
# Algorithms are Everywhere

When we hear the word "algorithm," we naturally think of mathematics. However, many algorithms do not involve complex mathematics but rely more on basic logic, which can be seen everywhere in our daily lives.

Before formally discussing algorithms, there's an interesting fact worth sharing: **you have already unconsciously learned many algorithms and have become accustomed to applying them in your daily life**. Here, I will give a few specific examples to prove this point.

**Example 1: Looking Up a Dictionary**. In an English dictionary, words are listed alphabetically. Suppose we're searching for a word that starts with the letter $r$. This is typically done in the following way:

1. Open the dictionary to about halfway and check the first letter on the page, let's say the letter is $m$.
2. Since $r$ comes after $m$ in the alphabet, we can ignore the first half of the dictionary and focus on the latter half.
3. Repeat steps `1.` and `2.` until you find the page where the word starts with $r$.

=== "<1>"
|
||||

|
||||
|
||||
=== "<2>"
|
||||

|
||||
|
||||
=== "<3>"
|
||||

|
||||
|
||||
=== "<4>"
|
||||

|
||||
|
||||
=== "<5>"
|
||||

|
||||
|
||||
This essential skill for elementary students, looking up a dictionary, is actually the famous "Binary Search" algorithm. From a data structure perspective, we can consider the dictionary as a sorted "array"; from an algorithmic perspective, the series of actions taken to look up a word in the dictionary can be viewed as "Binary Search."

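Here is a minimal sketch of that idea in code (the word list and the function and variable names are illustrative assumptions; binary search is treated formally later in the book):

```python
# Binary search for the first word starting with a target letter (a sketch)
def find_first_page(words: list[str], letter: str) -> int:
    """Return the index of the first word starting with `letter`, or -1"""
    left, right = 0, len(words) - 1
    result = -1
    while left <= right:
        mid = (left + right) // 2
        if words[mid][0] < letter:
            left = mid + 1       # ignore the first half, like flipping forward
        else:
            if words[mid][0] == letter:
                result = mid     # candidate found; keep searching to the left
            right = mid - 1
    return result

words = ["apple", "banana", "mango", "rain", "river", "sun"]
print(find_first_page(words, "r"))  # 3
```
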
**Example 2: Organizing Playing Cards**. When playing cards, we need to arrange the cards in our hand in ascending order, as shown in the following process.

1. Divide the playing cards into "ordered" and "unordered" sections, assuming initially the leftmost card is already in order.
2. Take out a card from the unordered section and insert it into the correct position in the ordered section; after this, the leftmost two cards are in order.
3. Continue to repeat step `2.` until all cards are in order.

![Playing cards sorting process](algorithms_are_everywhere.assets/playing_cards_sorting.png)

The above method of organizing playing cards is essentially the "Insertion Sort" algorithm, which is very efficient for small datasets. The sorting functions of many programming languages incorporate insertion sort.

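A rough sketch of the card-organizing process as code (illustrative only; insertion sort is covered in depth later in the book):

```python
# Insertion sort: grow an ordered section on the left, one card at a time
def insertion_sort(cards: list[int]) -> list[int]:
    for i in range(1, len(cards)):
        base = cards[i]              # next card from the unordered section
        j = i - 1
        while j >= 0 and cards[j] > base:
            cards[j + 1] = cards[j]  # shift larger cards to the right
            j -= 1
        cards[j + 1] = base          # insert the card into its correct position
    return cards

print(insertion_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```
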
**Example 3: Making Change**. Suppose we buy goods worth $69$ yuan at a supermarket and give the cashier $100$ yuan, then the cashier needs to give us $31$ yuan in change. They would naturally complete the thought process as shown below.

1. The options are denominations smaller than $31$, including $1$, $5$, $10$, and $20$.
2. Take out the largest $20$ from the options, leaving $31 - 20 = 11$.
3. Take out the largest $10$ from the remaining options, leaving $11 - 10 = 1$.
4. Take out the largest $1$ from the remaining options, leaving $1 - 1 = 0$.
5. Complete the change-making, with the solution being $20 + 10 + 1 = 31$.

![Change making process](algorithms_are_everywhere.assets/greedy_change.png)

In the above steps, we make the best choice at each step (using the largest denomination possible), ultimately resulting in a feasible change-making plan. From the perspective of data structures and algorithms, this method is essentially a "Greedy" algorithm.

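The same thought process as a short code sketch (the denominations are those assumed above; note that greedy change-making happens to be optimal for this coin system, but not for every one):

```python
# Greedy change-making: always take the largest usable denomination
def make_change(amount: int, denominations: list[int]) -> list[int]:
    change = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            change.append(coin)  # take the best (largest) choice at each step
            amount -= coin
    return change

print(make_change(31, [1, 5, 10, 20]))  # [20, 10, 1]
```
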
From cooking a meal to interstellar travel, almost all problem-solving involves algorithms. The advent of computers allows us to store data structures in memory and write code to call the CPU and GPU to execute algorithms. In this way, we can transfer real-life problems to computers, solving various complex issues more efficiently.

!!! tip

    If concepts such as data structures, algorithms, arrays, and binary search still seem somewhat obscure, I encourage you to continue reading. This book will gently guide you into the realm of understanding data structures and algorithms.

13
en/docs/chapter_introduction/index.md
Normal file
@ -0,0 +1,13 @@
# Introduction to Algorithms

<div class="center-table" markdown>

![Introduction to Algorithms](../assets/covers/chapter_introduction.jpg)

</div>

!!! abstract

    A graceful maiden dances, intertwined with the data, her skirt swaying to the melody of algorithms.

    She invites you to a dance; follow her steps, and enter the world of algorithms full of logic and beauty.

9
en/docs/chapter_introduction/summary.md
Normal file
@ -0,0 +1,9 @@
# Summary

- Algorithms are ubiquitous in daily life and are not as inaccessible and complex as they might seem. In fact, we have already unconsciously learned many algorithms to solve various problems in life.
- The principle of looking up a word in a dictionary is consistent with the binary search algorithm. The binary search algorithm embodies the important algorithmic concept of divide and conquer.
- The process of organizing playing cards is very similar to the insertion sort algorithm. The insertion sort algorithm is suitable for sorting small datasets.
- The steps of making change in currency essentially follow the greedy algorithm, where each step involves making the best possible choice at the moment.
- An algorithm is a set of instructions or steps used to solve a specific problem within a finite amount of time, while a data structure is the way data is organized and stored in a computer.
- Data structures and algorithms are closely linked. Data structures are the foundation of algorithms, and algorithms are the stage on which data structures come into play.
- We can liken data structures and algorithms to building blocks. The blocks represent data, the shape and connection method of the blocks represent data structures, and the steps of assembling the blocks correspond to algorithms.

After Width: | Height: | Size: 346 KiB |
After Width: | Height: | Size: 13 KiB |
53
en/docs/chapter_introduction/what_is_dsa.md
Normal file
@ -0,0 +1,53 @@
# What is an Algorithm

## Definition of an Algorithm

An "algorithm" is a set of instructions or steps to solve a specific problem within a finite amount of time. It has the following characteristics:

- The problem is clearly defined, including unambiguous definitions of input and output.
- The algorithm is feasible, meaning it can be completed within a finite number of steps, time, and memory space.
- Each step has a definitive meaning. The output is consistently the same under the same inputs and conditions.

## Definition of a Data Structure

A "data structure" is a way of organizing and storing data in a computer, with the following design goals:

- Minimize space occupancy to save computer memory.
- Make data operations as fast as possible, covering data access, addition, deletion, updating, etc.
- Provide concise data representation and logical information to enable efficient algorithm execution.

**Designing data structures is a balancing act, often requiring trade-offs**. If you want to improve in one aspect, you often need to compromise in another. Here are two examples:

- Compared to arrays, linked lists offer more convenience in data addition and deletion but sacrifice data access speed.
- Graphs, compared to linked lists, provide richer logical information but require more memory space.

## Relationship Between Data Structures and Algorithms

As shown in the figure below, data structures and algorithms are highly related and closely integrated, specifically in the following three aspects:

- Data structures are the foundation of algorithms. They provide structured data storage and methods for manipulating data for algorithms.
- Algorithms are the stage where data structures come into play. The data structure alone only stores data information; it is through the application of algorithms that specific problems can be solved.
- Algorithms can often be implemented based on different data structures, but their execution efficiency can vary greatly. Choosing the right data structure is key.

![Relationship between data structures and algorithms](what_is_dsa.assets/relationship_between_data_structure_and_algorithm.png)

Data structures and algorithms can be likened to a set of building blocks, as illustrated in the figure below. A building block set includes numerous pieces, accompanied by detailed assembly instructions. Following these instructions step by step allows us to construct an intricate block model.

![Assembling blocks](what_is_dsa.assets/assembling_blocks.jpg)

The detailed correspondence between the two is shown in the table below.

<p align="center"> Table <id> Comparing Data Structures and Algorithms to Building Blocks </p>

| Data Structures and Algorithms | Building Blocks                                                 |
| ------------------------------ | --------------------------------------------------------------- |
| Input data                     | Unassembled blocks                                              |
| Data structure                 | Organization of blocks, including shape, size, connections, etc |
| Algorithm                      | A series of steps to assemble the blocks into the desired shape |
| Output data                    | Completed block model                                           |

It's worth noting that data structures and algorithms are independent of programming languages. For this reason, this book is able to provide implementations in multiple programming languages.

!!! tip "Conventional Abbreviation"

    In real-life discussions, we often refer to "Data Structures and Algorithms" simply as "Algorithms". For example, the well-known LeetCode algorithm problems actually test both data structure and algorithm knowledge.

After Width: | Height: | Size: 154 KiB |
50
en/docs/chapter_preface/about_the_book.md
Normal file
@ -0,0 +1,50 @@
# About This Book

This open-source project aims to create a free and beginner-friendly crash course on data structures and algorithms.

- Using animated illustrations, it delivers structured insights into data structures and algorithmic concepts, ensuring comprehensibility and a smooth learning curve.
- Run code with just one click, supporting Java, C++, Python, Go, JS, TS, C#, Swift, Rust, Dart, Zig and other languages.
- Readers are encouraged to engage with each other in the discussion area for each section; questions and comments are usually answered within two days.

## Target Audience

If you are new to algorithms with limited exposure, or you have accumulated some experience in algorithms but only have a vague understanding of data structures and algorithms, constantly jumping between "yep" and "hmm", then this book is for you!

If you have already accumulated a certain amount of problem-solving experience and are familiar with most types of problems, then this book can help you review and organize your algorithm knowledge system. The repository's source code can be used as a "problem-solving toolkit" or an "algorithm cheat sheet".

If you are an algorithm expert, we look forward to receiving your valuable suggestions, or [join us and collaborate](https://www.hello-algo.com/chapter_appendix/contribution/).

!!! success "Prerequisites"

    You should know how to write and read simple code in at least one programming language.

## Content Structure

The main content of the book is shown in the following figure.

- **Complexity Analysis**: explores aspects and methods for evaluating data structures and algorithms. Covers methods of deriving time complexity and space complexity, along with common types and examples.
- **Data Structures**: focuses on fundamental data types, classification methods, definitions, pros and cons, common operations, types, applications, and implementation methods of data structures such as array, linked list, stack, queue, hash table, tree, heap, graph, etc.
- **Algorithms**: defines algorithms, discusses their pros and cons, efficiency, application scenarios, problem-solving steps, and includes sample questions for various algorithms such as search, sorting, divide and conquer, backtracking, dynamic programming, greedy algorithms, and more.

![Main content of the book](about_the_book.assets/hello_algo_mindmap.jpg)

## Acknowledgements

This book is continuously improved with the joint efforts of many contributors from the open-source community. Thanks to each writer who invested their time and energy, listed in the order generated by GitHub: krahets, codingonion, nuomi1, Gonglja, Reanon, justin-tse, danielsss, hpstory, S-N-O-R-L-A-X, night-cruise, msk397, gvenusleo, RiverTwilight, gyt95, zhuoqinyue, Zuoxun, Xia-Sang, mingXta, FangYuan33, GN-Yu, IsChristina, xBLACKICEx, guowei-gong, Cathay-Chen, mgisr, JoseHung, qualifier1024, pengchzn, Guanngxu, longsizhuo, L-Super, what-is-me, yuan0221, lhxsm, Slone123c, WSL0809, longranger2, theNefelibatas, xiongsp, JeffersonHuang, hongyun-robot, K3v123, yuelinxin, a16su, gaofer, malone6, Wonderdch, xjr7670, DullSword, Horbin-Magician, NI-SW, reeswell, XC-Zero, XiaChuerwu, yd-j, iron-irax, huawuque404, MolDuM, Nigh, KorsChen, foursevenlove, 52coder, bubble9um, youshaoXG, curly210102, gltianwen, fanchenggang, Transmigration-zhou, FloranceYeh, FreddieLi, ShiMaRing, lipusheng, Javesun99, JackYang-hellobobo, shanghai-Jerry, 0130w, Keynman, psychelzh, logan-qiu, ZnYang2018, MwumLi, 1ch0, Phoenix0415, qingpeng9802, Richard-Zhang1019, QiLOL, Suremotoo, Turing-1024-Lee, Evilrabbit520, GaochaoZhu, ZJKung, linzeyan, hezhizhen, ZongYangL, beintentional, czruby, coderlef, dshlstarr, szu17dmy, fbigm, gledfish, hts0000, boloboloda, iStig, jiaxianhua, wenjianmin, keshida, kilikilikid, lclc6, lwbaptx, liuxjerry, lucaswangdev, lyl625760, chadyi, noobcodemaker, selear, siqyka, syd168, 4yDX3906, tao363, wangwang105, weibk, yabo083, yi427, yishangzhang, zhouLion, baagod, ElaBosak233, xb534, luluxia, yanedie, thomasq0, YangXuanyi and th1nk3r-ing.

The code review work for this book was completed by codingonion, Gonglja, gvenusleo, hpstory, justin-tse, krahets, night-cruise, nuomi1, and Reanon (listed in alphabetical order). Thanks to them for their time and effort, ensuring the standardization and uniformity of the code in various languages.

Throughout the creation of this book, numerous individuals provided invaluable assistance, including but not limited to:

- Thanks to my mentor at the company, Dr. Xi Li, who encouraged me in a conversation to "get moving fast," which solidified my determination to write this book;
- Thanks to my girlfriend Bubble, as the first reader of this book, for offering many valuable suggestions from the perspective of a beginner in algorithms, making this book more suitable for newbies;
- Thanks to Tengbao, Qibao, and Feibao for coming up with a creative name for this book, evoking everyone's fond memories of writing their first line of code "Hello World!";
- Thanks to Xiaoquan for providing professional help in intellectual property, which has played a significant role in the development of this open-source book;
- Thanks to Sutong for designing a beautiful cover and logo for this book, and for patiently making multiple revisions under my insistence;
- Thanks to @squidfunk for providing writing and typesetting suggestions, as well as his developed open-source documentation theme [Material for MkDocs](https://github.com/squidfunk/mkdocs-material/tree/master).

Throughout the writing journey, I delved into numerous textbooks and articles on data structures and algorithms. These works served as exemplary models, ensuring the accuracy and quality of this book's content. I extend my gratitude to all who preceded me for their invaluable contributions!

This book advocates a combination of hands-on and minds-on learning, inspired in this regard by ["Dive into Deep Learning"](https://github.com/d2l-ai/d2l-zh). I highly recommend this excellent book to all readers.

**Heartfelt thanks to my parents, whose ongoing support and encouragement have allowed me to do this interesting work**.
