diff --git a/docs-en/chapter_array_and_linkedlist/array.md b/docs-en/chapter_array_and_linkedlist/array.md new file mode 100755 index 000000000..e9ccf6ced --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/array.md @@ -0,0 +1,1230 @@ +--- +comments: true +--- + +# 4.1   Arrays + +The "array" is a linear data structure that stores elements of the same type in contiguous memory locations. We refer to the position of an element in the array as its "index". The following image illustrates the main terminology and concepts of an array. + +![Array Definition and Storage Method](array.assets/array_definition.png){ class="animation-figure" } + +

Figure 4-1   Array Definition and Storage Method

+ +## 4.1.1   Common Operations on Arrays + +### 1.   Initializing Arrays + +There are two ways to initialize arrays depending on the requirements: without initial values and with given initial values. In cases where initial values are not specified, most programming languages will initialize the array elements to $0$: + +=== "Python" + + ```python title="array.py" + # Initialize array + arr: list[int] = [0] * 5 # [ 0, 0, 0, 0, 0 ] + nums: list[int] = [1, 3, 2, 5, 4] + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Initialize array */ + // Stored on stack + int arr[5]; + int nums[5] = { 1, 3, 2, 5, 4 }; + // Stored on heap (manual memory release needed) + int* arr1 = new int[5]; + int* nums1 = new int[5] { 1, 3, 2, 5, 4 }; + ``` + +=== "Java" + + ```java title="array.java" + /* Initialize array */ + int[] arr = new int[5]; // { 0, 0, 0, 0, 0 } + int[] nums = { 1, 3, 2, 5, 4 }; + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Initialize array */ + int[] arr = new int[5]; // { 0, 0, 0, 0, 0 } + int[] nums = [1, 3, 2, 5, 4]; + ``` + +=== "Go" + + ```go title="array.go" + /* Initialize array */ + var arr [5]int + // In Go, specifying the length ([5]int) denotes an array, while not specifying it ([]int) denotes a slice. + // Since Go's arrays are designed to have compile-time fixed length, only constants can be used to specify the length. + // For convenience in implementing the extend() method, the Slice will be considered as an Array here. 
+ nums := []int{1, 3, 2, 5, 4} + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Initialize array */ + let arr = Array(repeating: 0, count: 5) // [0, 0, 0, 0, 0] + let nums = [1, 3, 2, 5, 4] + ``` + +=== "JS" + + ```javascript title="array.js" + /* Initialize array */ + var arr = new Array(5).fill(0); + var nums = [1, 3, 2, 5, 4]; + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Initialize array */ + let arr: number[] = new Array(5).fill(0); + let nums: number[] = [1, 3, 2, 5, 4]; + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Initialize array */ + List<int> arr = List.filled(5, 0); // [0, 0, 0, 0, 0] + List<int> nums = [1, 3, 2, 5, 4]; + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Initialize array */ + let arr: Vec<i32> = vec![0; 5]; // [0, 0, 0, 0, 0] + let nums: Vec<i32> = vec![1, 3, 2, 5, 4]; + ``` + +=== "C" + + ```c title="array.c" + /* Initialize array */ + int arr[5] = { 0 }; // { 0, 0, 0, 0, 0 } + int nums[5] = { 1, 3, 2, 5, 4 }; + ``` + +=== "Zig" + + ```zig title="array.zig" + // Initialize array + var arr = [_]i32{0} ** 5; // { 0, 0, 0, 0, 0 } + var nums = [_]i32{ 1, 3, 2, 5, 4 }; + ``` + +### 2.   Accessing Elements + +Elements in an array are stored in contiguous memory locations, which makes it easy to compute the memory address of any element. Given the memory address of the array (the address of the first element) and the index of an element, we can calculate the memory address of that element using the formula shown in the following image, allowing direct access to the element. + +![Memory Address Calculation for Array Elements](array.assets/array_memory_location_calculation.png){ class="animation-figure" } + +

Figure 4-2   Memory Address Calculation for Array Elements
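
The address calculation can be tried out directly. The following is a minimal sketch, not part of the book's code: `element_address` is a hypothetical helper name, and the check assumes CPython, where `array('i')` keeps C `int` values in one contiguous buffer that `ctypes` can read back by address.

```python
import ctypes
from array import array


def element_address(base_address: int, element_size: int, index: int) -> int:
    """Hypothetical helper: base address plus index times element size."""
    return base_address + element_size * index


# array('i') stores C ints in a single contiguous buffer
nums = array("i", [1, 3, 2, 5, 4])
# buffer_info() returns (address of the first element, number of elements)
base_address, length = nums.buffer_info()

for i in range(length):
    address = element_address(base_address, nums.itemsize, i)
    # Read the C int stored at the computed address and compare it with nums[i]
    value = ctypes.c_int.from_address(address).value
    print(i, hex(address), value, value == nums[i])
```

Each computed address lands exactly on the corresponding element, which is why a lookup by index needs no traversal.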

+ +As observed in the above image, the index of the first element of an array is $0$, which may seem counterintuitive since counting usually starts from $1$. However, from the perspective of the address calculation formula, **an index is essentially an offset from the array's base memory address**. The first element sits at offset $0$, so an index of $0$ is logical. + +Accessing elements in an array is highly efficient, allowing us to randomly access any element in $O(1)$ time. + +=== "Python" + + ```python title="array.py" + def random_access(nums: list[int]) -> int: + """Random access to elements""" + # Randomly select a number from the interval [0, len(nums)-1] + random_index = random.randint(0, len(nums) - 1) + # Retrieve and return the random element + random_num = nums[random_index] + return random_num + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Random access to elements */ + int randomAccess(int *nums, int size) { + // Randomly select a number from the interval [0, size) + int randomIndex = rand() % size; + // Retrieve and return the random element + int randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +=== "Java" + + ```java title="array.java" + /* Random access to elements */ + int randomAccess(int[] nums) { + // Randomly select a number from the interval [0, nums.length) + int randomIndex = ThreadLocalRandom.current().nextInt(0, nums.length); + // Retrieve and return the random element + int randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Random access to elements */ + int RandomAccess(int[] nums) { + Random random = new(); + // Randomly select a number from the interval [0, nums.Length) + int randomIndex = random.Next(nums.Length); + // Retrieve and return the random element + int randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +=== "Go" + + ```go title="array.go" + /* Random access to elements */ + func randomAccess(nums []int) (randomNum int) { + // Randomly select a number from the interval [0, len(nums)) + randomIndex := rand.Intn(len(nums)) + // Retrieve and return the random element + randomNum = nums[randomIndex] + return + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Random access to elements */ + func randomAccess(nums: [Int]) -> Int { + // Randomly select a number from the interval [0, nums.count) + let randomIndex = nums.indices.randomElement()! 
+ // Retrieve and return the random element + let randomNum = nums[randomIndex] + return randomNum + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* Random access to elements */ + function randomAccess(nums) { + // Randomly select a number from the interval [0, nums.length) + const random_index = Math.floor(Math.random() * nums.length); + // Retrieve and return the random element + const random_num = nums[random_index]; + return random_num; + } + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Random access to elements */ + function randomAccess(nums: number[]): number { + // Randomly select a number from the interval [0, nums.length) + const random_index = Math.floor(Math.random() * nums.length); + // Retrieve and return the random element + const random_num = nums[random_index]; + return random_num; + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Random access to elements */ + int randomAccess(List<int> nums) { + // Randomly select a number from the interval [0, nums.length) + int randomIndex = Random().nextInt(nums.length); + // Retrieve and return the random element + int randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Random access to elements */ + fn random_access(nums: &[i32]) -> i32 { + // Randomly select a number from the interval [0, nums.len()) + let random_index = rand::thread_rng().gen_range(0..nums.len()); + // Retrieve and return the random element + let random_num = nums[random_index]; + random_num + } + ``` + +=== "C" + + ```c title="array.c" + /* Random access to elements */ + int randomAccess(int *nums, int size) { + // Randomly select a number from the interval [0, size) + int randomIndex = rand() % size; + // Retrieve and return the random element + int randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Random access to elements + fn randomAccess(nums: []i32) i32 { + // Randomly select an integer from the interval [0, nums.len) + var randomIndex = std.crypto.random.intRangeLessThan(usize, 0, nums.len); + // Retrieve and return the random element + var randomNum = nums[randomIndex]; + return randomNum; + } + ``` + +### 3.   Inserting Elements + +As shown in the image below, to insert an element in the middle of an array, the element at the insertion point and all elements after it must be moved one position back to make room for the new element. 
+ +![Array Element Insertion Example](array.assets/array_insert_element.png){ class="animation-figure" } + +

Figure 4-3   Array Element Insertion Example
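
To make the shift concrete, here is a small trace. It is a sketch that treats a Python list as a fixed-length array and uses illustrative values; because the length never changes, the value that was in the last slot is overwritten.

```python
def insert(nums: list[int], num: int, index: int) -> None:
    """Insert num at index in a fixed-length array; the last element is overwritten."""
    # Walk from the end down to index + 1, copying each element one slot backward
    for i in range(len(nums) - 1, index, -1):
        nums[i] = nums[i - 1]
    # The slot at index is now free to take the new value
    nums[index] = num


nums = [1, 3, 2, 5, 4]
insert(nums, 6, 2)
print(nums)  # [1, 3, 6, 2, 5]
```

The original last element `4` no longer appears in the result.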

+ +It's important to note that since the length of an array is fixed, inserting an element will inevitably lead to the loss of the last element in the array. We will discuss solutions to this problem in the "List" chapter. + +=== "Python" + + ```python title="array.py" + def insert(nums: list[int], num: int, index: int): + """Insert element num into the array at index""" + # Move all elements at and after index one position backward + for i in range(len(nums) - 1, index, -1): + nums[i] = nums[i - 1] + # Assign num to the element at index + nums[index] = num + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Insert element num into the array at index */ + void insert(int *nums, int size, int num, int index) { + // Move all elements at and after index one position backward + for (int i = size - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "Java" + + ```java title="array.java" + /* Insert element num into the array at index */ + void insert(int[] nums, int num, int index) { + // Move all elements at and after index one position backward + for (int i = nums.length - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Insert element num into the array at index */ + void Insert(int[] nums, int num, int index) { + // Move all elements at and after index one position backward + for (int i = nums.Length - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "Go" + + ```go title="array.go" + /* Insert element num into the array at index */ + func insert(nums []int, num int, index int) { + // Move all elements at and after index one position backward + for i := len(nums) - 1; i > index; i-- { + nums[i] = nums[i-1] + } + // Assign num to the element at index + nums[index] = num + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Insert element num into the array at index */ + func insert(nums: inout [Int], num: Int, index: Int) { + // Move all elements at and after index one position backward + for i in nums.indices.dropFirst(index + 1).reversed() { + nums[i] = nums[i - 1] + } + // Assign num to the element at index + nums[index] = num + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* 
Insert element num into the array at index */ + function insert(nums, num, index) { + // Move all elements at and after index one position backward + for (let i = nums.length - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Insert element num into the array at index */ + function insert(nums: number[], num: number, index: number): void { + // Move all elements at and after index one position backward + for (let i = nums.length - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Insert element _num into the array at index */ + void insert(List<int> nums, int _num, int index) { + // Move all elements at and after index one position backward + for (var i = nums.length - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign _num to the element at index + nums[index] = _num; + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Insert element num into the array at index */ + fn insert(nums: &mut Vec<i32>, num: i32, index: usize) { + // Move all elements at and after index one position backward + for i in (index + 1..nums.len()).rev() { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "C" + + ```c title="array.c" + /* Insert element num into the array at index */ + void insert(int *nums, int size, int num, int index) { + // Move all elements at and after index one position backward + for (int i = size - 1; i > index; i--) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Insert element num into the array at index + fn insert(nums: []i32, num: i32, index: usize) void { + // Move all elements at and after index one position backward + var i = nums.len - 1; + while (i > index) : (i -= 1) { + nums[i] = nums[i - 1]; + } + // Assign num to the element at index + nums[index] = num; + } + ``` + +### 4.   Deleting Elements + +Similarly, as illustrated below, to delete an element at index $i$, all elements following index $i$ must be moved forward by one position. + +![Array Element Deletion Example](array.assets/array_remove_element.png){ class="animation-figure" } + +

Figure 4-4   Array Element Deletion Example
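
A short trace of the forward shift, again as a sketch on a Python list treated as a fixed-length array with illustrative values. The last slot keeps a stale copy of its old value after the shift.

```python
def remove(nums: list[int], index: int) -> None:
    """Remove the element at index by shifting later elements one slot forward."""
    # Each element after index overwrites its left neighbor
    for i in range(index, len(nums) - 1):
        nums[i] = nums[i + 1]


nums = [1, 3, 2, 5, 4]
remove(nums, 2)
print(nums)  # [1, 3, 5, 4, 4]
```

Only the first four values are meaningful afterwards; the trailing `4` is a leftover duplicate.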

+ +Note that after deletion, the last element becomes "meaningless", so we do not need to specifically modify it. + +=== "Python" + + ```python title="array.py" + def remove(nums: list[int], index: int): + """Remove the element at index""" + # Move all elements after index one position forward + for i in range(index, len(nums) - 1): + nums[i] = nums[i + 1] + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Remove the element at index */ + void remove(int *nums, int size, int index) { + // Move all elements after index one position forward + for (int i = index; i < size - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "Java" + + ```java title="array.java" + /* Remove the element at index */ + void remove(int[] nums, int index) { + // Move all elements after index one position forward + for (int i = index; i < nums.length - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Remove the element at index */ + void Remove(int[] nums, int index) { + // Move all elements after index one position forward + for (int i = index; i < nums.Length - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "Go" + + ```go title="array.go" + /* Remove the element at index */ + func remove(nums []int, index int) { + // Move all elements after index one position forward + for i := index; i < len(nums)-1; i++ { + nums[i] = nums[i+1] + } + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Remove the element at index */ + func remove(nums: inout [Int], index: Int) { + // Move all elements after index one position forward + for i in nums.indices.dropFirst(index).dropLast() { + nums[i] = nums[i + 1] + } + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* Remove the element at index */ + function remove(nums, index) { + // Move all elements after index one position forward + for (let i = index; i < nums.length - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Remove the element at index */ + function remove(nums: number[], index: number): void { + // Move all elements after index one position forward + for (let i = index; i < nums.length - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Remove the element at index */ + void 
remove(List<int> nums, int index) { + // Move all elements after index one position forward + for (var i = index; i < nums.length - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Remove the element at index */ + fn remove(nums: &mut Vec<i32>, index: usize) { + // Move all elements after index one position forward + for i in index..nums.len() - 1 { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "C" + + ```c title="array.c" + /* Remove the element at index */ + // Note: stdio.h already declares a function named remove, so a different name is used + void removeItem(int *nums, int size, int index) { + // Move all elements after index one position forward + for (int i = index; i < size - 1; i++) { + nums[i] = nums[i + 1]; + } + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Remove the element at index + fn remove(nums: []i32, index: usize) void { + // Move all elements after index one position forward + var i = index; + while (i < nums.len - 1) : (i += 1) { + nums[i] = nums[i + 1]; + } + } + ``` + +Overall, the insertion and deletion operations in arrays have the following disadvantages: + +- **High Time Complexity**: Both insertion and deletion in an array have an average time complexity of $O(n)$, where $n$ is the length of the array. +- **Loss of Elements**: Due to the fixed length of arrays, elements that exceed the array's capacity are lost during insertion. +- **Waste of Memory**: We can initialize a longer array and use only the front part, so that the elements "lost" off the end during insertion are "meaningless" ones, but this wastes some memory space. + +### 5.   
Traversing Arrays + +In most programming languages, we can traverse an array either by indices or by directly iterating over each element: + +=== "Python" + + ```python title="array.py" + def traverse(nums: list[int]): + """Traverse array""" + count = 0 + # Traverse array by index + for i in range(len(nums)): + count += nums[i] + # Traverse array elements directly + for num in nums: + count += num + # Traverse both index and element at the same time + for i, num in enumerate(nums): + count += nums[i] + count += num + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Traverse array */ + void traverse(int *nums, int size) { + int count = 0; + // Traverse array by index + for (int i = 0; i < size; i++) { + count += nums[i]; + } + } + ``` + +=== "Java" + + ```java title="array.java" + /* Traverse array */ + void traverse(int[] nums) { + int count = 0; + // Traverse array by index + for (int i = 0; i < nums.length; i++) { + count += nums[i]; + } + // Traverse array elements directly + for (int num : nums) { + count += num; + } + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Traverse array */ + void Traverse(int[] nums) { + int count = 0; + // Traverse array by index + for (int i = 0; i < nums.Length; i++) { + count += nums[i]; + } + // Traverse array elements directly + foreach (int num in nums) { + count += num; + } + } + ``` + +=== "Go" + + ```go title="array.go" + /* Traverse array */ + func traverse(nums []int) { + count := 0 + // Traverse array by index + for i := 0; i < len(nums); i++ { + count += nums[i] + } + count = 0 + // Traverse array elements directly + for _, num := range nums { + count += num + } + // Traverse both index and element at the same time + for i, num := range nums { + count += nums[i] + count += num + } + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Traverse array */ + func traverse(nums: [Int]) { + var count = 0 + // Traverse array by index + for i in nums.indices { + count += nums[i] + } + // Traverse array elements directly + for num in nums { + count += num + } + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* Traverse array */ + function traverse(nums) { + let count = 0; + // Traverse array by index + for (let i = 0; i < nums.length; i++) { + count += nums[i]; + } + // Traverse array elements directly + for (const num of nums) { + count += num; + } + } + ``` + +=== "TS" + + 
```typescript title="array.ts" + /* Traverse array */ + function traverse(nums: number[]): void { + let count = 0; + // Traverse array by index + for (let i = 0; i < nums.length; i++) { + count += nums[i]; + } + // Traverse array elements directly + for (const num of nums) { + count += num; + } + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Traverse array elements */ + void traverse(List<int> nums) { + int count = 0; + // Traverse array by index + for (var i = 0; i < nums.length; i++) { + count += nums[i]; + } + // Traverse array elements directly + for (int _num in nums) { + count += _num; + } + // Traverse array with the forEach method + nums.forEach((_num) { + count += _num; + }); + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Traverse array */ + fn traverse(nums: &[i32]) { + let mut _count = 0; + // Traverse array by index + for i in 0..nums.len() { + _count += nums[i]; + } + // Traverse array elements directly + for num in nums { + _count += num; + } + } + ``` + +=== "C" + + ```c title="array.c" + /* Traverse array */ + void traverse(int *nums, int size) { + int count = 0; + // Traverse array by index + for (int i = 0; i < size; i++) { + count += nums[i]; + } + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Traverse array + fn traverse(nums: []i32) void { + var count: i32 = 0; + // Traverse array by index (slice indices must be usize) + var i: usize = 0; + while (i < nums.len) : (i += 1) { + count += nums[i]; + } + count = 0; + // Traverse array elements directly + for (nums) |num| { + count += num; + } + } + ``` + +### 6.   Finding Elements + +To find a specific element in an array, we need to iterate through it, checking each element to see if it matches. + +Since arrays are linear data structures, this operation is known as "linear search". 
+ +=== "Python" + + ```python title="array.py" + def find(nums: list[int], target: int) -> int: + """Search for a specified element in the array""" + for i in range(len(nums)): + if nums[i] == target: + return i + return -1 + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Search for a specified element in the array */ + int find(int *nums, int size, int target) { + for (int i = 0; i < size; i++) { + if (nums[i] == target) + return i; + } + return -1; + } + ``` + +=== "Java" + + ```java title="array.java" + /* Search for a specified element in the array */ + int find(int[] nums, int target) { + for (int i = 0; i < nums.length; i++) { + if (nums[i] == target) + return i; + } + return -1; + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Search for a specified element in the array */ + int Find(int[] nums, int target) { + for (int i = 0; i < nums.Length; i++) { + if (nums[i] == target) + return i; + } + return -1; + } + ``` + +=== "Go" + + ```go title="array.go" + /* Search for a specified element in the array */ + func find(nums []int, target int) (index int) { + index = -1 + for i := 0; i < len(nums); i++ { + if nums[i] == target { + index = i + break + } + } + return + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Search for a specified element in the array */ + func find(nums: [Int], target: Int) -> Int { + for i in nums.indices { + if nums[i] == target { + return i + } + } + return -1 + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* Search for a specified element in the array */ + function find(nums, target) { + for (let i = 0; i < nums.length; i++) { + if (nums[i] === target) return i; + } + return -1; + } + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Search for a specified element in the array */ + function find(nums: number[], target: number): number { + for (let i = 0; i < nums.length; i++) { + if (nums[i] === target) { + return i; + } + } + return -1; + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Search for a specified element in the array */ + int find(List<int> nums, int target) { + for (var i = 0; i < nums.length; i++) { + if (nums[i] == target) return i; + } + return -1; + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Search for a specified element in the array */ + fn find(nums: &[i32], target: i32) -> 
Option<usize> { + for i in 0..nums.len() { + if nums[i] == target { + return Some(i); + } + } + None + } + ``` + +=== "C" + + ```c title="array.c" + /* Search for a specified element in the array */ + int find(int *nums, int size, int target) { + for (int i = 0; i < size; i++) { + if (nums[i] == target) + return i; + } + return -1; + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Search for a specified element in the array + fn find(nums: []i32, target: i32) i32 { + for (nums, 0..) |num, i| { + if (num == target) return @intCast(i); + } + return -1; + } + ``` + +### 7.   Expanding Arrays + +In complex system environments, it's challenging to ensure that the memory space following an array is available, making it unsafe to extend the array's capacity. Therefore, in most programming languages, **the length of an array is immutable**. + +To expand an array, we need to create a larger array and then copy the elements from the original array. This operation has a time complexity of $O(n)$ and can be time-consuming for large arrays. The code is as follows: + +=== "Python" + + ```python title="array.py" + def extend(nums: list[int], enlarge: int) -> list[int]: + """Extend array length""" + # Initialize an array with the extended length + res = [0] * (len(nums) + enlarge) + # Copy all elements from the original array to the new array + for i in range(len(nums)): + res[i] = nums[i] + # Return the new, extended array + return res + ``` + +=== "C++" + + ```cpp title="array.cpp" + /* Extend array length */ + int *extend(int *nums, int size, int enlarge) { + // Initialize an array with the extended length + int *res = new int[size + enlarge]; + // Copy all elements from the original array to the new array + for (int i = 0; i < size; i++) { + res[i] = nums[i]; + } + // Free memory + delete[] nums; + // Return the new, extended array + return res; + } + ``` + +=== "Java" + + ```java title="array.java" + /* Extend array length */ + int[] extend(int[] nums, int enlarge) { + // Initialize an array with the extended length + int[] res = new int[nums.length + enlarge]; + // Copy all elements from the original array to the new array + for (int i = 0; i < nums.length; i++) { + res[i] = nums[i]; + } + // Return the new, extended array + return res; + } + ``` + +=== "C#" + + ```csharp title="array.cs" + /* Extend array length */ + int[] Extend(int[] nums, int enlarge) { + // 
Initialize an array with the extended length + int[] res = new int[nums.Length + enlarge]; + // Copy all elements from the original array to the new array + for (int i = 0; i < nums.Length; i++) { + res[i] = nums[i]; + } + // Return the new, extended array + return res; + } + ``` + +=== "Go" + + ```go title="array.go" + /* Extend array length */ + func extend(nums []int, enlarge int) []int { + // Initialize an array with the extended length + res := make([]int, len(nums)+enlarge) + // Copy all elements from the original array to the new array + for i, num := range nums { + res[i] = num + } + // Return the new, extended array + return res + } + ``` + +=== "Swift" + + ```swift title="array.swift" + /* Extend array length */ + func extend(nums: [Int], enlarge: Int) -> [Int] { + // Initialize an array with the extended length + var res = Array(repeating: 0, count: nums.count + enlarge) + // Copy all elements from the original array to the new array + for i in nums.indices { + res[i] = nums[i] + } + // Return the new, extended array + return res + } + ``` + +=== "JS" + + ```javascript title="array.js" + /* Extend array length */ + // Note: JavaScript's Array is a dynamic array and can be extended directly + // For learning purposes, this function treats Array as a fixed-length array + function extend(nums, enlarge) { + // Initialize an array with the extended length + const res = new Array(nums.length + enlarge).fill(0); + // Copy all elements from the original array to the new array + for (let i = 0; i < nums.length; i++) { + res[i] = nums[i]; + } + // Return the new, extended array + return res; + } + ``` + +=== "TS" + + ```typescript title="array.ts" + /* Extend array length */ + // Note: TypeScript's Array is a dynamic array and can be extended directly + // For learning purposes, this function treats Array as a fixed-length array + function extend(nums: number[], enlarge: number): number[] { + // Initialize an array with the extended length + const res = new Array(nums.length + enlarge).fill(0); + // Copy all elements from the original array to the new array + for (let i = 0; i < nums.length; i++) { + res[i] = nums[i]; + } + // Return the new, extended array + return res; + } + ``` + +=== "Dart" + + ```dart title="array.dart" + /* Extend array length */ + List<int> extend(List<int> nums, int enlarge) { + // Initialize an array with the extended length + List<int> res = List.filled(nums.length + enlarge, 0); + // Copy all elements from the original array to the new array + for (var i = 0; i < nums.length; i++) { + res[i] = nums[i]; + } + // Return the new, extended array + return res; + } + ``` + +=== "Rust" + + ```rust title="array.rs" + /* Extend array length */ + fn extend(nums: Vec<i32>, enlarge: usize) -> Vec<i32> { + // Initialize an array with the extended length + let mut res: Vec<i32> = vec![0; nums.len() + 
enlarge]; + // Copy all elements from the original array to the new array + for i in 0..nums.len() { + res[i] = nums[i]; + } + // Return the new, extended array + res + } + ``` + +=== "C" + + ```c title="array.c" + /* Extend array length */ + int *extend(int *nums, int size, int enlarge) { + // Initialize an array with the extended length + int *res = (int *)malloc(sizeof(int) * (size + enlarge)); + // Copy all elements from the original array to the new array + for (int i = 0; i < size; i++) { + res[i] = nums[i]; + } + // Initialize the extended space + for (int i = size; i < size + enlarge; i++) { + res[i] = 0; + } + // Return the new, extended array + return res; + } + ``` + +=== "Zig" + + ```zig title="array.zig" + // Extend array length + fn extend(mem_allocator: std.mem.Allocator, nums: []i32, enlarge: usize) ![]i32 { + // Initialize an array with the extended length + var res = try mem_allocator.alloc(i32, nums.len + enlarge); + @memset(res, 0); + // Copy all elements from the original array to the new array + std.mem.copy(i32, res, nums); + // Return the new, extended array + return res; + } + ``` + +## 4.1.2   Advantages and Limitations of Arrays + +Arrays are stored in contiguous memory spaces and consist of elements of the same type. This layout carries a wealth of prior information that the system can use to optimize the efficiency of operations on the data structure. + +- **High Space Efficiency**: Arrays allocate a contiguous block of memory for data, eliminating the need for additional structural overhead. +- **Support for Random Access**: Arrays allow $O(1)$ time access to any element. +- **Cache Locality**: When accessing array elements, the computer not only loads them but also caches the surrounding data, leveraging high-speed cache to improve the speed of subsequent operations. + +However, contiguous storage is a double-edged sword, with the following limitations: + +- **Low Efficiency in Insertion and Deletion**: When there are many elements in an array, insertion and deletion operations require moving a large number of elements. +- **Fixed Length**: The length of an array is fixed after initialization. Expanding an array requires copying all data to a new array, which is costly. 
+- **Space Wastage**: If the allocated size of an array exceeds the actual need, the extra space is wasted. + +## 4.1.3   Typical Applications of Arrays + +Arrays are a fundamental and common data structure, frequently used in various algorithms and in implementing complex data structures. + +- **Random Access**: If we want to randomly sample some data, we can use an array for storage and generate a random sequence to implement random sampling based on indices. +- **Sorting and Searching**: Arrays are the most commonly used data structure for sorting and searching algorithms. Quick sort, merge sort, binary search, etc., are primarily conducted on arrays. +- **Lookup Tables**: Arrays can be used as lookup tables for fast element or relationship retrieval. For instance, if we want to implement a mapping from characters to ASCII codes, we can use the ASCII code value of a character as the index, with the corresponding element stored in the corresponding position in the array. +- **Machine Learning**: Arrays are extensively used in neural networks for linear algebra operations between vectors, matrices, and tensors. Arrays are the most commonly used data structure in neural network programming. +- **Data Structure Implementation**: Arrays can be used to implement stacks, queues, hash tables, heaps, graphs, etc. For example, the adjacency matrix representation of a graph is essentially a two-dimensional array. diff --git a/docs-en/chapter_array_and_linkedlist/index.md b/docs-en/chapter_array_and_linkedlist/index.md new file mode 100644 index 000000000..ee0df5240 --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/index.md @@ -0,0 +1,22 @@ +--- +comments: true +icon: material/view-list-outline +--- + +# Chapter 4.   Arrays and Linked Lists + +![Arrays and Linked Lists](../assets/covers/chapter_array_and_linkedlist.jpg){ class="cover-image" } + +!!! abstract + + The world of data structures is like a solid brick wall. 
+ + The bricks of an array are neatly arranged, each closely connected to the next. In contrast, the bricks of a linked list are scattered, with vines of connections freely weaving through the gaps between bricks. + +## Chapter Contents + +- [4.1   Array](https://www.hello-algo.com/chapter_array_and_linkedlist/array/) +- [4.2   Linked List](https://www.hello-algo.com/chapter_array_and_linkedlist/linked_list/) +- [4.3   List](https://www.hello-algo.com/chapter_array_and_linkedlist/list/) +- [4.4   Memory and Cache](https://www.hello-algo.com/chapter_array_and_linkedlist/ram_and_cache/) +- [4.5   Summary](https://www.hello-algo.com/chapter_array_and_linkedlist/summary/) diff --git a/docs-en/chapter_array_and_linkedlist/linked_list.md b/docs-en/chapter_array_and_linkedlist/linked_list.md new file mode 100755 index 000000000..125e9dcec --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/linked_list.md @@ -0,0 +1,1338 @@ +--- +comments: true +--- + +# 4.2   Linked Lists + +Memory space is a common resource for all programs. In a complex system environment, free memory space can be scattered throughout memory. We know that the memory space for storing an array must be contiguous, and when the array is very large, it may not be possible to provide such a large contiguous space. This is where the flexibility advantage of linked lists becomes apparent. + +A "linked list" is a linear data structure where each element is a node object, and the nodes are connected via "references". A reference records the memory address of the next node, allowing access to the next node from the current one. + +The design of a linked list allows its nodes to be scattered throughout memory, with no need for contiguous memory addresses. + +![Linked List Definition and Storage Method](linked_list.assets/linkedlist_definition.png){ class="animation-figure" } + +

Figure 4-5   Linked List Definition and Storage Method

+ +Observing the image above, the fundamental unit of a linked list is the "node" object. Each node contains two pieces of data: the "value" of the node and the "reference" to the next node. + +- The first node of a linked list is known as the "head node", and the last one is called the "tail node". +- The tail node points to "null", which is represented as $\text{null}$ in Java, $\text{nullptr}$ in C++, and $\text{None}$ in Python. +- In languages that support pointers, like C, C++, Go, and Rust, the aforementioned "reference" should be replaced with a "pointer". + +As shown in the following code, a linked list node `ListNode`, apart from containing a value, also needs to store a reference (pointer). Therefore, **a linked list consumes more memory space than an array for the same amount of data**. + +=== "Python" + + ```python title="" + class ListNode: + """Linked List Node Class""" + def __init__(self, val: int): + self.val: int = val # Node value + self.next: ListNode | None = None # Reference to the next node + ``` + +=== "C++" + + ```cpp title="" + /* Linked List Node Structure */ + struct ListNode { + int val; // Node value + ListNode *next; // Pointer to the next node + ListNode(int x) : val(x), next(nullptr) {} // Constructor + }; + ``` + +=== "Java" + + ```java title="" + /* Linked List Node Class */ + class ListNode { + int val; // Node value + ListNode next; // Reference to the next node + ListNode(int x) { val = x; } // Constructor + } + ``` + +=== "C#" + + ```csharp title="" + /* Linked List Node Class */ + class ListNode(int x) { // Constructor + int val = x; // Node value + ListNode? 
next; // Reference to the next node + } + ``` + +=== "Go" + + ```go title="" + /* Linked List Node Structure */ + type ListNode struct { + Val int // Node value + Next *ListNode // Pointer to the next node + } + + // NewListNode is the constructor; it creates a new linked list node + func NewListNode(val int) *ListNode { + return &ListNode{ + Val: val, + Next: nil, + } + } + ``` + +=== "Swift" + + ```swift title="" + /* Linked List Node Class */ + class ListNode { + var val: Int // Node value + var next: ListNode? // Reference to the next node + + init(x: Int) { // Constructor + val = x + } + } + ``` + +=== "JS" + + ```javascript title="" + /* Linked List Node Class */ + class ListNode { + constructor(val, next) { + this.val = (val === undefined ? 0 : val); // Node value + this.next = (next === undefined ? null : next); // Reference to the next node + } + } + ``` + +=== "TS" + + ```typescript title="" + /* Linked List Node Class */ + class ListNode { + val: number; + next: ListNode | null; + constructor(val?: number, next?: ListNode | null) { + this.val = val === undefined ? 0 : val; // Node value + this.next = next === undefined ? null : next; // Reference to the next node + } + } + ``` + +=== "Dart" + + ```dart title="" + /* Linked List Node Class */ + class ListNode { + int val; // Node value + ListNode? 
next; // Reference to the next node + ListNode(this.val, [this.next]); // Constructor + } + ``` + +=== "Rust" + + ```rust title="" + use std::rc::Rc; + use std::cell::RefCell; + /* Linked List Node Class */ + #[derive(Debug)] + struct ListNode { + val: i32, // Node value + next: Option<Rc<RefCell<ListNode>>>, // Pointer to the next node + } + ``` + +=== "C" + + ```c title="" + /* Linked List Node Structure */ + typedef struct ListNode { + int val; // Node value + struct ListNode *next; // Pointer to the next node + } ListNode; + + /* Constructor */ + ListNode *newListNode(int val) { + ListNode *node; + node = (ListNode *) malloc(sizeof(ListNode)); + node->val = val; + node->next = NULL; + return node; + } + ``` + +=== "Zig" + + ```zig title="" + // Linked List Node Class + pub fn ListNode(comptime T: type) type { + return struct { + const Self = @This(); + + val: T = 0, // Node value + next: ?*Self = null, // Pointer to the next node + + // Constructor + pub fn init(self: *Self, x: i32) void { + self.val = x; + self.next = null; + } + }; + } + ``` + +## 4.2.1   Common Operations on Linked Lists + +### 1.   Initializing a Linked List + +Building a linked list involves two steps: initializing each node object and then establishing the references between nodes. Once initialized, we can access all nodes sequentially from the head node via the `next` reference. 
+ +=== "Python" + + ```python title="linked_list.py" + # Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 + # Initialize each node + n0 = ListNode(1) + n1 = ListNode(3) + n2 = ListNode(2) + n3 = ListNode(5) + n4 = ListNode(4) + # Build references between nodes + n0.next = n1 + n1.next = n2 + n2.next = n3 + n3.next = n4 + ``` + +=== "C++" + + ```cpp title="linked_list.cpp" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + ListNode* n0 = new ListNode(1); + ListNode* n1 = new ListNode(3); + ListNode* n2 = new ListNode(2); + ListNode* n3 = new ListNode(5); + ListNode* n4 = new ListNode(4); + // Build references between nodes + n0->next = n1; + n1->next = n2; + n2->next = n3; + n3->next = n4; + ``` + +=== "Java" + + ```java title="linked_list.java" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + ListNode n0 = new ListNode(1); + ListNode n1 = new ListNode(3); + ListNode n2 = new ListNode(2); + ListNode n3 = new ListNode(5); + ListNode n4 = new ListNode(4); + // Build references between nodes + n0.next = n1; + n1.next = n2; + n2.next = n3; + n3.next = n4; + ``` + +=== "C#" + + ```csharp title="linked_list.cs" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + ListNode n0 = new(1); + ListNode n1 = new(3); + ListNode n2 = new(2); + ListNode n3 = new(5); + ListNode n4 = new(4); + // Build references between nodes + n0.next = n1; + n1.next = n2; + n2.next = n3; + n3.next = n4; + ``` + +=== "Go" + + ```go title="linked_list.go" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + n0 := NewListNode(1) + n1 := NewListNode(3) + n2 := NewListNode(2) + n3 := NewListNode(5) + n4 := NewListNode(4) + // Build references between nodes + n0.Next = n1 + n1.Next = n2 + n2.Next = n3 + n3.Next = n4 + ``` + +=== "Swift" + + ```swift title="linked_list.swift" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + let n0 = ListNode(x: 1) + let n1 
= ListNode(x: 3) + let n2 = ListNode(x: 2) + let n3 = ListNode(x: 5) + let n4 = ListNode(x: 4) + // Build references between nodes + n0.next = n1 + n1.next = n2 + n2.next = n3 + n3.next = n4 + ``` + +=== "JS" + + ```javascript title="linked_list.js" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + const n0 = new ListNode(1); + const n1 = new ListNode(3); + const n2 = new ListNode(2); + const n3 = new ListNode(5); + const n4 = new ListNode(4); + // Build references between nodes + n0.next = n1; + n1.next = n2; + n2.next = n3; + n3.next = n4; + ``` + +=== "TS" + + ```typescript title="linked_list.ts" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + const n0 = new ListNode(1); + const n1 = new ListNode(3); + const n2 = new ListNode(2); + const n3 = new ListNode(5); + const n4 = new ListNode(4); + // Build references between nodes + n0.next = n1; + n1.next = n2; + n2.next = n3; + n3.next = n4; + ``` + +=== "Dart" + + ```dart title="linked_list.dart" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + ListNode n0 = ListNode(1); + ListNode n1 = ListNode(3); + ListNode n2 = ListNode(2); + ListNode n3 = ListNode(5); + ListNode n4 = ListNode(4); + // Build references between nodes + n0.next = n1; + n1.next = n2; + n2.next = n3; + n3.next = n4; + ``` + +=== "Rust" + + ```rust title="linked_list.rs" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + let n0 = Rc::new(RefCell::new(ListNode { val: 1, next: None })); + let n1 = Rc::new(RefCell::new(ListNode { val: 3, next: None })); + let n2 = Rc::new(RefCell::new(ListNode { val: 2, next: None })); + let n3 = Rc::new(RefCell::new(ListNode { val: 5, next: None })); + let n4 = Rc::new(RefCell::new(ListNode { val: 4, next: None })); + + // Build references between nodes + n0.borrow_mut().next = Some(n1.clone()); + n1.borrow_mut().next = Some(n2.clone()); + n2.borrow_mut().next = Some(n3.clone()); + 
n3.borrow_mut().next = Some(n4.clone()); + ``` + +=== "C" + + ```c title="linked_list.c" + /* Initialize linked list: 1 -> 3 -> 2 -> 5 -> 4 */ + // Initialize each node + ListNode* n0 = newListNode(1); + ListNode* n1 = newListNode(3); + ListNode* n2 = newListNode(2); + ListNode* n3 = newListNode(5); + ListNode* n4 = newListNode(4); + // Build references between nodes + n0->next = n1; + n1->next = n2; + n2->next = n3; + n3->next = n4; + ``` + +=== "Zig" + + ```zig title="linked_list.zig" + // Initialize linked list + // Initialize each node + var n0 = inc.ListNode(i32){.val = 1}; + var n1 = inc.ListNode(i32){.val = 3}; + var n2 = inc.ListNode(i32){.val = 2}; + var n3 = inc.ListNode(i32){.val = 5}; + var n4 = inc.ListNode(i32){.val = 4}; + // Build references between nodes + n0.next = &n1; + n1.next = &n2; + n2.next = &n3; + n3.next = &n4; + ``` + +An array is a single variable, such as the array `nums` containing elements `nums[0]`, `nums[1]`, etc., while a linked list is composed of multiple independent node objects. **We usually refer to the linked list by its head node**, as in the linked list `n0` in the above code. + +### 2.   Inserting a Node + +Inserting a node in a linked list is very easy. As shown in the image below, suppose we want to insert a new node `P` between two adjacent nodes `n0` and `n1`. **This requires changing only two node references (pointers)**, with a time complexity of $O(1)$. + +In contrast, the time complexity of inserting an element in an array is $O(n)$, which is less efficient with large data volumes. + +![Linked List Node Insertion Example](linked_list.assets/linkedlist_insert_node.png){ class="animation-figure" } + +

Figure 4-6   Linked List Node Insertion Example

+ +=== "Python" + + ```python title="linked_list.py" + def insert(n0: ListNode, P: ListNode): + """Insert node P after node n0 in the linked list""" + n1 = n0.next + P.next = n1 + n0.next = P + ``` + +=== "C++" + + ```cpp title="linked_list.cpp" + /* Insert node P after node n0 in the linked list */ + void insert(ListNode *n0, ListNode *P) { + ListNode *n1 = n0->next; + P->next = n1; + n0->next = P; + } + ``` + +=== "Java" + + ```java title="linked_list.java" + /* Insert node P after node n0 in the linked list */ + void insert(ListNode n0, ListNode P) { + ListNode n1 = n0.next; + P.next = n1; + n0.next = P; + } + ``` + +=== "C#" + + ```csharp title="linked_list.cs" + /* Insert node P after node n0 in the linked list */ + void Insert(ListNode n0, ListNode P) { + ListNode? n1 = n0.next; + P.next = n1; + n0.next = P; + } + ``` + +=== "Go" + + ```go title="linked_list.go" + /* Insert node P after node n0 in the linked list */ + func insertNode(n0 *ListNode, P *ListNode) { + n1 := n0.Next + P.Next = n1 + n0.Next = P + } + ``` + +=== "Swift" + + ```swift title="linked_list.swift" + /* Insert node P after node n0 in the linked list */ + func insert(n0: ListNode, P: ListNode) { + let n1 = n0.next + P.next = n1 + n0.next = P + } + ``` + +=== "JS" + + ```javascript title="linked_list.js" + /* Insert node P after node n0 in the linked list */ + function insert(n0, P) { + const n1 = n0.next; + P.next = n1; + n0.next = P; + } + ``` + +=== "TS" + + ```typescript title="linked_list.ts" + /* Insert node P after node n0 in the linked list */ + function insert(n0: ListNode, P: ListNode): void { + const n1 = n0.next; + P.next = n1; + n0.next = P; + } + ``` + +=== "Dart" + + ```dart title="linked_list.dart" + /* Insert node P after node n0 in the linked list */ + void insert(ListNode n0, ListNode P) { + ListNode? 
n1 = n0.next; + P.next = n1; + n0.next = P; + } + ``` + +=== "Rust" + + ```rust title="linked_list.rs" + /* Insert node P after node n0 in the linked list */ + #[allow(non_snake_case)] + pub fn insert<T>(n0: &Rc<RefCell<ListNode<T>>>, P: Rc<RefCell<ListNode<T>>>) { + let n1 = n0.borrow_mut().next.take(); + P.borrow_mut().next = n1; + n0.borrow_mut().next = Some(P); + } + ``` + +=== "C" + + ```c title="linked_list.c" + /* Insert node P after node n0 in the linked list */ + void insert(ListNode *n0, ListNode *P) { + ListNode *n1 = n0->next; + P->next = n1; + n0->next = P; + } + ``` + +=== "Zig" + + ```zig title="linked_list.zig" + // Insert node P after node n0 in the linked list + fn insert(n0: ?*inc.ListNode(i32), P: ?*inc.ListNode(i32)) void { + var n1 = n0.?.next; + P.?.next = n1; + n0.?.next = P; + } + ``` + +### 3.   Deleting a Node + +As shown below, deleting a node in a linked list is also very convenient, **requiring a change to only one node's reference (pointer)**. + +Note that although node `P` still points to `n1` after the deletion operation is completed, it is no longer accessible when traversing the list, meaning `P` is no longer part of the list. + +![Linked List Node Deletion](linked_list.assets/linkedlist_remove_node.png){ class="animation-figure" } + +

Figure 4-7   Linked List Node Deletion
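The note above, that `P` keeps its reference to `n1` yet drops out of the list, can be checked with a minimal sketch. The helper names `remove_after` and `to_values` are illustrative assumptions and do not appear in the original code; unlike the `remove` functions in the tabs, `remove_after` returns the unlinked node so that it can be inspected.

```python
from __future__ import annotations


class ListNode:
    """Minimal singly linked list node, same shape as the node class defined earlier."""

    def __init__(self, val: int):
        self.val: int = val
        self.next: ListNode | None = None


def remove_after(n0: ListNode) -> ListNode | None:
    """Unlink and return the node that follows n0 (hypothetical helper)."""
    P = n0.next
    if P is None:
        return None
    n0.next = P.next  # The only reference that changes
    return P


def to_values(head: ListNode | None) -> list[int]:
    """Collect node values by walking from head to the tail."""
    values = []
    while head is not None:
        values.append(head.val)
        head = head.next
    return values


# Build n0 -> P -> n1
n0, P, n1 = ListNode(1), ListNode(3), ListNode(2)
n0.next, P.next = P, n1

removed = remove_after(n0)
values_after = to_values(n0)  # Traversal from n0 no longer reaches P
p_still_points_to_n1 = removed is P and P.next is n1
```

In garbage-collected languages the unlinked node is reclaimed once nothing else refers to it, which is why the C and C++ versions free it explicitly.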

+ +=== "Python" + + ```python title="linked_list.py" + def remove(n0: ListNode): + """Remove the first node after node n0 in the linked list""" + if not n0.next: + return + # n0 -> P -> n1 + P = n0.next + n1 = P.next + n0.next = n1 + ``` + +=== "C++" + + ```cpp title="linked_list.cpp" + /* Remove the first node after node n0 in the linked list */ + void remove(ListNode *n0) { + if (n0->next == nullptr) + return; + // n0 -> P -> n1 + ListNode *P = n0->next; + ListNode *n1 = P->next; + n0->next = n1; + // Free memory + delete P; + } + ``` + +=== "Java" + + ```java title="linked_list.java" + /* Remove the first node after node n0 in the linked list */ + void remove(ListNode n0) { + if (n0.next == null) + return; + // n0 -> P -> n1 + ListNode P = n0.next; + ListNode n1 = P.next; + n0.next = n1; + } + ``` + +=== "C#" + + ```csharp title="linked_list.cs" + /* Remove the first node after node n0 in the linked list */ + void Remove(ListNode n0) { + if (n0.next == null) + return; + // n0 -> P -> n1 + ListNode P = n0.next; + ListNode? 
n1 = P.next; + n0.next = n1; + } + ``` + +=== "Go" + + ```go title="linked_list.go" + /* Remove the first node after node n0 in the linked list */ + func removeItem(n0 *ListNode) { + if n0.Next == nil { + return + } + // n0 -> P -> n1 + P := n0.Next + n1 := P.Next + n0.Next = n1 + } + ``` + +=== "Swift" + + ```swift title="linked_list.swift" + /* Remove the first node after node n0 in the linked list */ + func remove(n0: ListNode) { + if n0.next == nil { + return + } + // n0 -> P -> n1 + let P = n0.next + let n1 = P?.next + n0.next = n1 + P?.next = nil + } + ``` + +=== "JS" + + ```javascript title="linked_list.js" + /* Remove the first node after node n0 in the linked list */ + function remove(n0) { + if (!n0.next) return; + // n0 -> P -> n1 + const P = n0.next; + const n1 = P.next; + n0.next = n1; + } + ``` + +=== "TS" + + ```typescript title="linked_list.ts" + /* Remove the first node after node n0 in the linked list */ + function remove(n0: ListNode): void { + if (!n0.next) { + return; + } + // n0 -> P -> n1 + const P = n0.next; + const n1 = P.next; + n0.next = n1; + } + ``` + +=== "Dart" + + ```dart title="linked_list.dart" + /* Remove the first node after node n0 in the linked list */ + void remove(ListNode n0) { + if (n0.next == null) return; + // n0 -> P -> n1 + 
ListNode P = n0.next!; + ListNode? n1 = P.next; + n0.next = n1; + } + ``` + +=== "Rust" + + ```rust title="linked_list.rs" + /* Remove the first node after node n0 in the linked list */ + #[allow(non_snake_case)] + pub fn remove<T>(n0: &Rc<RefCell<ListNode<T>>>) { + if n0.borrow().next.is_none() {return}; + // n0 -> P -> n1 + let P = n0.borrow_mut().next.take(); + if let Some(node) = P { + let n1 = node.borrow_mut().next.take(); + n0.borrow_mut().next = n1; + } + } + ``` + +=== "C" + + ```c title="linked_list.c" + /* Remove the first node after node n0 in the linked list */ + // Note: stdio.h already declares a function named remove + void removeItem(ListNode *n0) { + if (!n0->next) + return; + // n0 -> P -> n1 + ListNode *P = n0->next; + ListNode *n1 = P->next; + n0->next = n1; + // Free memory + free(P); + } + ``` + +=== "Zig" + + ```zig title="linked_list.zig" + // Remove the first node after node n0 in the linked list + fn remove(n0: ?*inc.ListNode(i32)) void { + if (n0.?.next == null) return; + // n0 -> P -> n1 + var P = n0.?.next; + var n1 = P.?.next; + n0.?.next = n1; + } + ``` + +### 4.   Accessing Nodes + +**Accessing nodes in a linked list is less efficient**. As mentioned earlier, any element in an array can be accessed in $O(1)$ time. However, in a linked list, the program needs to start from the head node and traverse each node sequentially until it finds the target node. That is, accessing the $i$-th node of a linked list requires $i - 1$ iterations, with a time complexity of $O(n)$. 
+ +=== "Python" + + ```python title="linked_list.py" + def access(head: ListNode, index: int) -> ListNode | None: + """Access the node at index index in the linked list""" + for _ in range(index): + if not head: + return None + head = head.next + return head + ``` + +=== "C++" + + ```cpp title="linked_list.cpp" + /* Access the node at index index in the linked list */ + ListNode *access(ListNode *head, int index) { + for (int i = 0; i < index; i++) { + if (head == nullptr) + return nullptr; + head = head->next; + } + return head; + } + ``` + +=== "Java" + + ```java title="linked_list.java" + /* Access the node at index index in the linked list */ + ListNode access(ListNode head, int index) { + for (int i = 0; i < index; i++) { + if (head == null) + return null; + head = head.next; + } + return head; + } + ``` + +=== "C#" + + ```csharp title="linked_list.cs" + /* Access the node at index index in the linked list */ + ListNode? Access(ListNode? head, int index) { + for (int i = 0; i < index; i++) { + if (head == null) + return null; + head = head.next; + } + return head; + } + ``` + +=== "Go" + + ```go title="linked_list.go" + /* Access the node at index index in the linked list */ + func access(head *ListNode, index int) *ListNode { + for i := 0; i < index; i++ { + if head == nil { + return nil + } + head = head.Next + } + return head + } + ``` + +=== "Swift" + + ```swift title="linked_list.swift" + /* Access the node at index index in the linked list */ + func access(head: ListNode, index: Int) -> ListNode? { + var head: ListNode? 
= head + for _ in 0 ..< index { + if head == nil { + return nil + } + head = head?.next + } + return head + } + ``` + +=== "JS" + + ```javascript title="linked_list.js" + /* Access the node at index index in the linked list */ + function access(head, index) { + for (let i = 0; i < index; i++) { + if (!head) { + return null; + } + head = head.next; + } + return head; + } + ``` + +=== "TS" + + ```typescript title="linked_list.ts" + /* Access the node at index index in the linked list */ + function access(head: ListNode | null, index: number): ListNode | null { + for (let i = 0; i < index; i++) { + if (!head) { + return null; + } + head = head.next; + } + return head; + } + ``` + +=== "Dart" + + ```dart title="linked_list.dart" + /* Access the node at index index in the linked list */ + ListNode? access(ListNode? head, int index) { + for (var i = 0; i < index; i++) { + if (head == null) return null; + head = head.next; + } + return head; + } + ``` + +=== "Rust" + + ```rust title="linked_list.rs" + /* Access the node at index index in the linked list */ + pub fn access<T>(head: Rc<RefCell<ListNode<T>>>, index: i32) -> Rc<RefCell<ListNode<T>>> { + if index <= 0 {return head}; + if let Some(node) = &head.borrow_mut().next { + return access(node.clone(), index - 1); + } + return head; + } + ``` + +=== "C" + + ```c title="linked_list.c" + /* Access the node at index index in the linked list */ + ListNode *access(ListNode *head, int index) { + for (int i = 0; i < index; i++) { + if (head == NULL) + return NULL; + head = head->next; + } + return head; + } + ``` + +=== "Zig" + + ```zig title="linked_list.zig" + // Access the node at index index in the linked list + fn access(node: ?*inc.ListNode(i32), index: i32) ?*inc.ListNode(i32) { + var head = node; + var i: i32 = 0; + while (i < index) : (i += 1) { + head = head.?.next; + if (head == null) return null; + } + return head; + } + ``` + +### 5.   Finding Nodes + +Traverse the linked list to find a node with a value equal to `target`, and output the index of that node in the linked list. This process also falls under linear search. 
The code is as follows: + +=== "Python" + + ```python title="linked_list.py" + def find(head: ListNode, target: int) -> int: + """Find the first node with value target in the linked list""" + index = 0 + while head: + if head.val == target: + return index + head = head.next + index += 1 + return -1 + ``` + +=== "C++" + + ```cpp title="linked_list.cpp" + /* Find the first node with value target in the linked list */ + int find(ListNode *head, int target) { + int index = 0; + while (head != nullptr) { + if (head->val == target) + return index; + head = head->next; + index++; + } + return -1; + } + ``` + +=== "Java" + + ```java title="linked_list.java" + /* Find the first node with value target in the linked list */ + int find(ListNode head, int target) { + int index = 0; + while (head != null) { + if (head.val == target) + return index; + head = head.next; + index++; + } + return -1; + } + ``` + +=== "C#" + + ```csharp title="linked_list.cs" + /* Find the first node with value target in the linked list */ + int Find(ListNode? head, int target) { + int index = 0; + while (head != null) { + if (head.val == target) + return index; + head = head.next; + index++; + } + return -1; + } + ``` + +=== "Go" + + ```go title="linked_list.go" + /* Find the first node with value target in the linked list */ + func findNode(head *ListNode, target int) int { + index := 0 + for head != nil { + if head.Val == target { + return index + } + head = head.Next + index++ + } + return -1 + } + ``` + +=== "Swift" + + ```swift title="linked_list.swift" + /* Find the first node with value target in the linked list */ + func find(head: ListNode, target: Int) -> Int { + var head: ListNode? 
= head + var index = 0 + while head != nil { + if head?.val == target { + return index + } + head = head?.next + index += 1 + } + return -1 + } + ``` + +=== "JS" + + ```javascript title="linked_list.js" + /* Find the first node with value target in the linked list */ + function find(head, target) { + let index = 0; + while (head !== null) { + if (head.val === target) { + return index; + } + head = head.next; + index += 1; + } + return -1; + } + ``` + +=== "TS" + + ```typescript title="linked_list.ts" + /* Find the first node with value target in the linked list */ + function find(head: ListNode | null, target: number): number { + let index = 0; + while (head !== null) { + if (head.val === target) { + return index; + } + head = head.next; + index += 1; + } + return -1; + } + ``` + +=== "Dart" + + ```dart title="linked_list.dart" + /* Find the first node with value target in the linked list */ + int find(ListNode? head, int target) { + int index = 0; + while (head != null) { + if (head.val == target) { + return index; + } + head = head.next; + index++; + } + return -1; + } + ``` + +=== "Rust" + + ```rust title="linked_list.rs" + /* Find the first node with value target in the linked list */ + pub fn find<T: PartialEq>(head: Rc<RefCell<ListNode<T>>>, target: T, index: i32) -> i32 { + if head.borrow().val == target {return index}; + if let Some(node) = &head.borrow_mut().next { + return find(node.clone(), target, index + 1); + } + return -1; + } + ``` + +=== "C" + + ```c title="linked_list.c" + /* Find the first node with value target in the linked list */ + int find(ListNode *head, int target) { + int index = 0; + while (head) { + if (head->val == target) + return index; + head = head->next; + index++; + } + return -1; + } + ``` + +=== "Zig" + + ```zig title="linked_list.zig" + // Find the first node with value target in the linked list + fn find(node: ?*inc.ListNode(i32), target: i32) i32 { + var head = node; + var index: i32 = 0; + while (head != null) { + if (head.?.val == target) return index; + head = head.?.next; + index += 1; + } + return -1; + } + ``` + +## 4.2.2   Arrays vs. 
Linked Lists + +The following table summarizes the characteristics of arrays and linked lists and compares their operational efficiencies. Since they employ two opposite storage strategies, their properties and operational efficiencies also show contrasting features. + +

Table 4-1   Efficiency Comparison of Arrays and Linked Lists

+ +
+ +| | Arrays | Linked Lists | +| ------------------ | ------------------------------------------------ | ----------------------- | +| Storage | Contiguous Memory Space | Dispersed Memory Space | +| Capacity Expansion | Fixed Length | Flexible Expansion | +| Memory Efficiency | Less Memory per Element, Potential Space Wastage | More Memory per Element | +| Accessing Elements | $O(1)$ | $O(n)$ | +| Adding Elements | $O(n)$ | $O(1)$ | +| Deleting Elements | $O(n)$ | $O(1)$ | + +
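To make the "Adding Elements" row concrete, here is a rough sketch that counts the work done by each structure. The function names `array_insert` and `list_insert_after` are hypothetical and exist only for this illustration. Note that the $O(1)$ figure for linked lists assumes the node at the insertion position is already in hand; locating it by index still takes $O(n)$ time.

```python
from __future__ import annotations


class ListNode:
    """Minimal singly linked list node."""

    def __init__(self, val: int):
        self.val: int = val
        self.next: ListNode | None = None


def array_insert(nums: list[int], size: int, index: int, val: int) -> int:
    """Insert val at index in a fixed-capacity array that holds `size` elements.

    Returns how many elements had to shift. Assumes size < len(nums).
    """
    moves = 0
    for i in range(size, index, -1):  # Shift the tail one slot to the right
        nums[i] = nums[i - 1]
        moves += 1
    nums[index] = val
    return moves


def list_insert_after(n0: ListNode, val: int) -> int:
    """Insert a new node after n0. Returns how many references were changed."""
    node = ListNode(val)
    node.next = n0.next
    n0.next = node
    return 2


# Array: inserting at the front shifts every existing element
nums = [1, 3, 2, 5, 4, 0]  # Five elements plus one spare slot
array_moves = array_insert(nums, 5, 0, 9)

# Linked list: the cost does not depend on the list length
head = ListNode(1)
list_changes = list_insert_after(head, 9)
```

The array cost grows with the number of elements after the insertion point, while the linked list cost stays constant.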
+ +## 4.2.3   Common Types of Linked Lists + +As shown in the following image, there are three common types of linked lists. + +- **Singly Linked List**: This is the regular linked list introduced earlier. The nodes of a singly linked list contain the value and a reference to the next node. The first node is called the head node, and the last node, pointing to null (`None`), is the tail node. +- **Circular Linked List**: If the tail node of a singly linked list points back to the head node (forming a loop), it becomes a circular linked list. In a circular linked list, any node can be considered the head node. +- **Doubly Linked List**: Compared to a singly linked list, a doubly linked list stores references in two directions. Its nodes contain references to both the next (successor) and the previous (predecessor) nodes. Doubly linked lists are more flexible as they allow traversal in both directions but require more memory space. + +=== "Python" + + ```python title="" + class ListNode: + """Doubly linked list node class""" + def __init__(self, val: int): + self.val: int = val # Node value + self.next: ListNode | None = None # Reference to the successor node + self.prev: ListNode | None = None # Reference to the predecessor node + ``` + +=== "C++" + + ```cpp title="" + /* Doubly linked list node structure */ + struct ListNode { + int val; // Node value + ListNode *next; // Pointer to the successor node + ListNode *prev; // Pointer to the predecessor node + ListNode(int x) : val(x), next(nullptr), prev(nullptr) {} // Constructor + }; + ``` + +=== "Java" + + ```java title="" + /* Doubly linked list node class */ + class ListNode { + int val; // Node value + ListNode next; // Reference to the successor node + ListNode prev; // Reference to the predecessor node + ListNode(int x) { val = x; } // Constructor + } + ``` + +=== "C#" + + ```csharp title="" + /* Doubly linked list node class */ + class ListNode(int x) { // Constructor + int val = x; // 
Node value + ListNode? next; // Reference to the successor node + ListNode? prev; // Reference to the predecessor node + } + ``` + +=== "Go" + + ```go title="" + /* Doubly linked list node structure */ + type DoublyListNode struct { + Val int // Node value + Next *DoublyListNode // Pointer to the successor node + Prev *DoublyListNode // Pointer to the predecessor node + } + + // NewDoublyListNode initialization + func NewDoublyListNode(val int) *DoublyListNode { + return &DoublyListNode{ + Val: val, + Next: nil, + Prev: nil, + } + } + ``` + +=== "Swift" + + ```swift title="" + /* Doubly linked list node class */ + class ListNode { + var val: Int // Node value + var next: ListNode? // Reference to the successor node + var prev: ListNode? // Reference to the predecessor node + + init(x: Int) { // Constructor + val = x + } + } + ``` + +=== "JS" + + ```javascript title="" + /* Doubly linked list node class */ + class ListNode { + constructor(val, next, prev) { + this.val = val === undefined ? 0 : val; // Node value + this.next = next === undefined ? null : next; // Reference to the successor node + this.prev = prev === undefined ? null : prev; // Reference to the predecessor node + } + } + ``` + +=== "TS" + + ```typescript title="" + /* Doubly linked list node class */ + class ListNode { + val: number; + next: ListNode | null; + prev: ListNode | null; + constructor(val?: number, next?: ListNode | null, prev?: ListNode | null) { + this.val = val === undefined ? 0 : val; // Node value + this.next = next === undefined ? null : next; // Reference to the successor node + this.prev = prev === undefined ? 
null : prev; // Reference to the predecessor node + } + } + ``` + +=== "Dart" + + ```dart title="" + /* Doubly linked list node class */ + class ListNode { + int val; // Node value + ListNode? next; // Reference to the successor node + ListNode? prev; // Reference to the predecessor node + ListNode(this.val, [this.next, this.prev]); // Constructor + } + ``` + +=== "Rust" + + ```rust title="" + use std::rc::Rc; + use std::cell::RefCell; + + /* Doubly linked list node type */ + #[derive(Debug)] + struct ListNode { + val: i32, // Node value + next: Option<Rc<RefCell<ListNode>>>, // Pointer to the successor node + prev: Option<Rc<RefCell<ListNode>>>, // Pointer to the predecessor node + } + + /* Constructor */ + impl ListNode { + fn new(val: i32) -> Self { + ListNode { + val, + next: None, + prev: None, + } + } + } + ``` + +=== "C" + + ```c title="" + /* Doubly linked list node structure */ + typedef struct ListNode { + int val; // Node value + struct ListNode *next; // Pointer to the successor node + struct ListNode *prev; // Pointer to the predecessor node + } ListNode; + + /* Constructor */ + ListNode *newListNode(int val) { + ListNode *node; + node = (ListNode *) malloc(sizeof(ListNode)); + node->val = val; + node->next = NULL; + node->prev = NULL; + return node; + } + ``` + +=== "Zig" + + ```zig title="" + // Doubly linked list node class + pub fn ListNode(comptime T: type) type { + return struct { + const Self = @This(); + + val: T = 0, // Node value + next: ?*Self = null, // Pointer to the successor node + prev: ?*Self = null, // Pointer to the predecessor node + + // Constructor + pub fn init(self: *Self, x: i32) void { + self.val = x; + self.next = null; + self.prev = null; + } + }; + } + ``` + +![Common Types of Linked Lists](linked_list.assets/linkedlist_common_types.png){ class="animation-figure" } + +

Figure 4-8   Common Types of Linked Lists
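The node definitions above cover the doubly linked case. For the circular variant, the following is a minimal sketch of how the loop might be closed and traversed. The helpers `make_circular` and `traverse_once` are hypothetical names chosen for this example. Because the tail never points to `None`, traversal has to stop when it returns to the starting node.

```python
from __future__ import annotations


class ListNode:
    """Minimal singly linked list node."""

    def __init__(self, val: int):
        self.val: int = val
        self.next: ListNode | None = None


def make_circular(values: list[int]) -> ListNode | None:
    """Build a circular singly linked list and return its first node."""
    if not values:
        return None
    head = ListNode(values[0])
    tail = head
    for val in values[1:]:
        tail.next = ListNode(val)
        tail = tail.next
    tail.next = head  # Close the loop: the tail points back to the head
    return head


def traverse_once(start: ListNode | None) -> list[int]:
    """Visit every node exactly once, starting from any node in the ring."""
    values = []
    node = start
    while node is not None:
        values.append(node.val)
        node = node.next
        if node is start:  # Back at the starting node, so stop
            break
    return values


ring = make_circular([1, 3, 2])
from_head = traverse_once(ring)         # Start at the original head
from_second = traverse_once(ring.next)  # Any node can serve as the head
```

Starting from a different node yields the same elements in rotated order, which matches the statement that any node of a circular linked list can be treated as the head.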

+ +## 4.2.4   Typical Applications of Linked Lists + +Singly linked lists are commonly used to implement stacks, queues, hash tables, and graphs. + +- **Stacks and Queues**: When insertion and deletion operations are performed at one end of the linked list, it exhibits last-in-first-out characteristics, corresponding to a stack. When insertion is at one end and deletion is at the other, it shows first-in-first-out characteristics, corresponding to a queue. +- **Hash Tables**: Chaining is one of the mainstream solutions to hash collisions, where all colliding elements are placed in a linked list. +- **Graphs**: Adjacency lists are a common way to represent graphs, where each vertex is associated with a linked list. Each element in the list represents other vertices connected to that vertex. + +Doubly linked lists are commonly used in scenarios that require quick access to the previous and next elements. + +- **Advanced Data Structures**: For example, in red-black trees and B-trees, we need to access a node's parent, which can be achieved by storing a reference to the parent node in each node, similar to a doubly linked list. +- **Browser History**: In web browsers, when a user clicks the forward or backward button, the browser needs to know the previously and next visited web pages. The properties of a doubly linked list make this operation simple. +- **LRU Algorithm**: In Least Recently Used (LRU) cache eviction algorithms, we need to quickly find the least recently used data and support rapid addition and deletion of nodes. Here, using a doubly linked list is very appropriate. + +Circular linked lists are commonly used in scenarios requiring periodic operations, such as resource scheduling in operating systems. + +- **Round-Robin Scheduling Algorithm**: In operating systems, the round-robin scheduling algorithm is a common CPU scheduling algorithm that cycles through a group of processes. 
Each process is assigned a time slice, and when it expires, the CPU switches to the next process. This circular operation can be implemented using a circular linked list. +- **Data Buffers**: Circular linked lists may also be used in some data buffer implementations. For instance, in audio and video players, the data stream might be divided into multiple buffer blocks placed in a circular linked list to achieve seamless playback. diff --git a/docs-en/chapter_array_and_linkedlist/list.md b/docs-en/chapter_array_and_linkedlist/list.md new file mode 100755 index 000000000..25f111409 --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/list.md @@ -0,0 +1,2120 @@ +--- +comments: true +--- + +# 4.3   List + +A "list" is an abstract data structure concept, representing an ordered collection of elements. It supports operations like element access, modification, addition, deletion, and traversal, without requiring users to consider capacity limitations. Lists can be implemented based on linked lists or arrays. + +- A linked list naturally functions as a list, supporting operations for adding, deleting, searching, and modifying elements, and can dynamically adjust its size. +- Arrays also support these operations, but due to their fixed length, they can be considered as a list with a length limit. + +When using arrays to implement lists, **the fixed length property reduces the practicality of the list**. This is because we often cannot determine in advance how much data needs to be stored, making it difficult to choose an appropriate list length. If the length is too small, it may not meet the requirements; if too large, it may waste memory space. + +To solve this problem, we can use a "dynamic array" to implement lists. It inherits the advantages of arrays and can dynamically expand during program execution. 
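As a rough illustration of the dynamic array idea, the sketch below wraps a fixed-length backing array and replaces it with a larger one when it fills up. The class name `DynamicArray`, the initial capacity of 4, and the growth factor of 2 are all assumptions made for this example, not details taken from any particular standard library.

```python
class DynamicArray:
    """Hypothetical minimal dynamic array; the growth policy is an assumption."""

    def __init__(self, capacity: int = 4):
        self._capacity = capacity
        self._arr = [0] * capacity  # Fixed-length backing array
        self._size = 0              # Number of elements actually stored

    def __len__(self) -> int:
        return self._size

    def capacity(self) -> int:
        return self._capacity

    def get(self, index: int) -> int:
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._arr[index]

    def append(self, val: int) -> None:
        if self._size == self._capacity:
            self._grow()  # Backing array is full: expand before writing
        self._arr[self._size] = val
        self._size += 1

    def _grow(self) -> None:
        """Allocate a backing array twice as large and copy the elements over."""
        self._capacity = max(1, self._capacity * 2)
        new_arr = [0] * self._capacity
        for i in range(self._size):
            new_arr[i] = self._arr[i]
        self._arr = new_arr


nums = DynamicArray(capacity=4)
for value in range(5):  # The fifth append triggers one expansion
    nums.append(value)
```

Element access stays $O(1)$ because the data still lives in one contiguous array; the price is an occasional $O(n)$ copy when the capacity is exceeded.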
+ +In fact, **many programming languages' standard libraries implement lists using dynamic arrays**, such as Python's `list`, Java's `ArrayList`, C++'s `vector`, and C#'s `List`. In the following discussion, we will consider "list" and "dynamic array" as synonymous concepts. + +## 4.3.1   Common List Operations + +### 1.   Initializing a List + +We typically use two methods of initialization: "without initial values" and "with initial values". + +=== "Python" + + ```python title="list.py" + # Initialize list + # Without initial values + nums1: list[int] = [] + # With initial values + nums: list[int] = [1, 3, 2, 5, 4] + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Initialize list */ + // Note: in C++, vector is the list type described in this section + // Without initial values + vector<int> nums1; + // With initial values + vector<int> nums = { 1, 3, 2, 5, 4 }; + ``` + +=== "Java" + + ```java title="list.java" + /* Initialize list */ + // Without initial values + List<Integer> nums1 = new ArrayList<>(); + // With initial values (note that the array elements must use the wrapper type Integer rather than int) + Integer[] numbers = new Integer[] { 1, 3, 2, 5, 4 }; + List<Integer> nums = new ArrayList<>(Arrays.asList(numbers)); + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Initialize list */ + // Without initial values + List<int> nums1 = []; + // With initial values + int[] numbers = [1, 3, 2, 5, 4]; + List<int> nums = [.. 
numbers]; + ``` + +=== "Go" + + ```go title="list_test.go" + /* Initialize list */ + // Without initial values + nums1 := []int{} + // With initial values + nums := []int{1, 3, 2, 5, 4} + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Initialize list */ + // Without initial values + let nums1: [Int] = [] + // With initial values + var nums = [1, 3, 2, 5, 4] + ``` + +=== "JS" + + ```javascript title="list.js" + /* Initialize list */ + // Without initial values + const nums1 = []; + // With initial values + const nums = [1, 3, 2, 5, 4]; + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Initialize list */ + // Without initial values + const nums1: number[] = []; + // With initial values + const nums: number[] = [1, 3, 2, 5, 4]; + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Initialize list */ + // Without initial values + List<int> nums1 = []; + // With initial values + List<int> nums = [1, 3, 2, 5, 4]; + ``` + +=== "Rust" + + ```rust title="list.rs" + /* Initialize list */ + // Without initial values + let nums1: Vec<i32> = Vec::new(); + // With initial values (declared mut because later examples modify it) + let mut nums: Vec<i32> = vec![1, 3, 2, 5, 4]; + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Initialize list + var nums = std.ArrayList(i32).init(std.heap.page_allocator); + defer nums.deinit(); + try nums.appendSlice(&[_]i32{ 1, 3, 2, 5, 4 }); + ``` + +### 2.   Accessing Elements + +Lists are essentially arrays, so accessing and updating elements can be done in $O(1)$ time, which is very efficient. 
+ +=== "Python" + + ```python title="list.py" + # Access elements + num: int = nums[1] # Access the element at index 1 + + # Update elements + nums[1] = 0 # Update the element at index 1 to 0 + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Access elements */ + int num = nums[1]; // Access the element at index 1 + + /* Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "Java" + + ```java title="list.java" + /* Access elements */ + int num = nums.get(1); // Access the element at index 1 + + /* Update elements */ + nums.set(1, 0); // Update the element at index 1 to 0 + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Access elements */ + int num = nums[1]; // Access the element at index 1 + + /* Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "Go" + + ```go title="list_test.go" + /* Access elements */ + num := nums[1] // Access the element at index 1 + + /* Update elements */ + nums[1] = 0 // Update the element at index 1 to 0 + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Access elements */ + let num = nums[1] // Access the element at index 1 + + /* Update elements */ + nums[1] = 0 // Update the element at index 1 to 0 + ``` + +=== "JS" + + ```javascript title="list.js" + /* Access elements */ + const num = nums[1]; // Access the element at index 1 + + /* Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Access elements */ + const num: number = nums[1]; // Access the element at index 1 + + /* Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Access elements */ + int num = nums[1]; // Access the element at index 1 + + /* Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "Rust" + + ```rust title="list.rs" + /* Access elements */ + let num: i32 = nums[1]; // Access the element at index 1 + /* 
Update elements */ + nums[1] = 0; // Update the element at index 1 to 0 + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Access elements + var num = nums.items[1]; // Access the element at index 1 + + // Update elements + nums.items[1] = 0; // Update the element at index 1 to 0 + ``` + +### 3.   Inserting and Deleting Elements + +Compared to arrays, lists can freely add and remove elements. Adding elements at the end of the list has a time complexity of $O(1)$, but the efficiency of inserting and deleting elements is still the same as in arrays, with a time complexity of $O(n)$. + +=== "Python" + + ```python title="list.py" + # Clear list + nums.clear() + + # Append elements at the end + nums.append(1) + nums.append(3) + nums.append(2) + nums.append(5) + nums.append(4) + + # Insert element in the middle + nums.insert(3, 6) # Insert number 6 at index 3 + + # Remove elements + nums.pop(3) # Remove the element at index 3 + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Clear list */ + nums.clear(); + + /* Append elements at the end */ + nums.push_back(1); + nums.push_back(3); + nums.push_back(2); + nums.push_back(5); + nums.push_back(4); + + /* Insert element in the middle */ + nums.insert(nums.begin() + 3, 6); // Insert number 6 at index 3 + + /* Remove elements */ + nums.erase(nums.begin() + 3); // Remove the element at index 3 + ``` + +=== "Java" + + ```java title="list.java" + /* Clear list */ + nums.clear(); + + /* Append elements at the end */ + nums.add(1); + nums.add(3); + nums.add(2); + nums.add(5); + nums.add(4); + + /* Insert element in the middle */ + nums.add(3, 6); // Insert number 6 at index 3 + + /* Remove elements */ + nums.remove(3); // Remove the element at index 3 + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Clear list */ + nums.Clear(); + + /* Append elements at the end */ + nums.Add(1); + nums.Add(3); + nums.Add(2); + nums.Add(5); + nums.Add(4); 
+ + /* Insert element in the middle */ + nums.Insert(3, 6); + + /* Remove elements */ + nums.RemoveAt(3); + ``` + +=== "Go" + + ```go title="list_test.go" + /* Clear list */ + nums = nil + + /* Append elements at the end */ + nums = append(nums, 1) + nums = append(nums, 3) + nums = append(nums, 2) + nums = append(nums, 5) + nums = append(nums, 4) + + /* Insert element in the middle */ + nums = append(nums[:3], append([]int{6}, nums[3:]...)...) // Insert number 6 at index 3 + + /* Remove elements */ + nums = append(nums[:3], nums[4:]...) // Remove the element at index 3 + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Clear list */ + nums.removeAll() + + /* Append elements at the end */ + nums.append(1) + nums.append(3) + nums.append(2) + nums.append(5) + nums.append(4) + + /* Insert element in the middle */ + nums.insert(6, at: 3) // Insert number 6 at index 3 + + /* Remove elements */ + nums.remove(at: 3) // Remove the element at index 3 + ``` + +=== "JS" + + ```javascript title="list.js" + /* Clear list */ + nums.length = 0; + + /* Append elements at the end */ + nums.push(1); + nums.push(3); + nums.push(2); + nums.push(5); + nums.push(4); + + /* Insert element in the middle */ + nums.splice(3, 0, 6); + + /* Remove elements */ + nums.splice(3, 1); + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Clear list */ + nums.length = 0; + + /* Append elements at the end */ + nums.push(1); + nums.push(3); + nums.push(2); + nums.push(5); + nums.push(4); + + /* Insert element in the middle */ + nums.splice(3, 0, 6); + + /* Remove elements */ + nums.splice(3, 1); + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Clear list */ + nums.clear(); + + /* Append elements at the end */ + nums.add(1); + nums.add(3); + nums.add(2); + nums.add(5); + nums.add(4); + + /* Insert element in the middle */ + nums.insert(3, 6); // Insert number 6 at index 3 + + /* Remove elements */ + nums.removeAt(3); // Remove the element at index 3 + ``` + +=== "Rust" + + ```rust 
title="list.rs" + /* Clear list */ + nums.clear(); + + /* Append elements at the end */ + nums.push(1); + nums.push(3); + nums.push(2); + nums.push(5); + nums.push(4); + + /* Insert element in the middle */ + nums.insert(3, 6); // Insert number 6 at index 3 + + /* Remove elements */ + nums.remove(3); // Remove the element at index 3 + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Clear list + nums.clearRetainingCapacity(); + + // Append elements at the end + try nums.append(1); + try nums.append(3); + try nums.append(2); + try nums.append(5); + try nums.append(4); + + // Insert element in the middle + try nums.insert(3, 6); // Insert number 6 at index 3 + + // Remove elements + _ = nums.orderedRemove(3); // Remove the element at index 3 + ``` + +### 4.   Traversing the List + +Like arrays, lists can be traversed based on index, or by directly iterating over each element. + +=== "Python" + + ```python title="list.py" + # Iterate through the list by index + count = 0 + for i in range(len(nums)): + count += nums[i] + + # Iterate directly through list elements + for num in nums: + count += num + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Iterate through the list by index */ + int count = 0; + for (int i = 0; i < nums.size(); i++) { + count += nums[i]; + } + + /* Iterate directly through list elements */ + count = 0; + for (int num : nums) { + count += num; + } + ``` + +=== "Java" + + ```java title="list.java" + /* Iterate through the list by index */ + int count = 0; + for (int i = 0; i < nums.size(); i++) { + count += nums.get(i); + } + + /* Iterate directly through list elements */ + for (int num : nums) { + count += num; + } + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Iterate through the list by index */ + int count = 0; + for (int i = 0; i < nums.Count; i++) { + count += nums[i]; + } + + /* Iterate directly through list elements */ + count = 0; + foreach 
(int num in nums) { + count += num; + } + ``` + +=== "Go" + + ```go title="list_test.go" + /* Iterate through the list by index */ + count := 0 + for i := 0; i < len(nums); i++ { + count += nums[i] + } + + /* Iterate directly through list elements */ + count = 0 + for _, num := range nums { + count += num + } + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Iterate through the list by index */ + var count = 0 + for i in nums.indices { + count += nums[i] + } + + /* Iterate directly through list elements */ + count = 0 + for num in nums { + count += num + } + ``` + +=== "JS" + + ```javascript title="list.js" + /* Iterate through the list by index */ + let count = 0; + for (let i = 0; i < nums.length; i++) { + count += nums[i]; + } + + /* Iterate directly through list elements */ + count = 0; + for (const num of nums) { + count += num; + } + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Iterate through the list by index */ + let count = 0; + for (let i = 0; i < nums.length; i++) { + count += nums[i]; + } + + /* Iterate directly through list elements */ + count = 0; + for (const num of nums) { + count += num; + } + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Iterate through the list by index */ + int count = 0; + for (var i = 0; i < nums.length; i++) { + count += nums[i]; + } + + /* Iterate directly through list elements */ + count = 0; + for (var num in nums) { + count += num; + } + ``` + +=== "Rust" + + ```rust title="list.rs" + // Iterate through the list by index + let mut _count = 0; + for i in 0..nums.len() { + _count += nums[i]; + } + + // Iterate directly through list elements + _count = 0; + for num in &nums { + _count += num; + } + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Iterate through the list by index + var count: i32 = 0; + var i: usize = 0; // Slice indices are of type usize + while (i < nums.items.len) : (i += 1) { + count += nums.items[i]; + } + + // Iterate directly 
through list elements + count = 0; + for (nums.items) |num| { + count += num; + } + ``` + +### 5.   Concatenating Lists + +Given a new list `nums1`, we can append it to the end of the original list. + +=== "Python" + + ```python title="list.py" + # Concatenate two lists + nums1: list[int] = [6, 8, 7, 10, 9] + nums += nums1 # Concatenate nums1 to the end of nums + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Concatenate two lists */ + vector<int> nums1 = { 6, 8, 7, 10, 9 }; + // Concatenate nums1 to the end of nums + nums.insert(nums.end(), nums1.begin(), nums1.end()); + ``` + +=== "Java" + + ```java title="list.java" + /* Concatenate two lists */ + List<Integer> nums1 = new ArrayList<>(Arrays.asList(new Integer[] { 6, 8, 7, 10, 9 })); + nums.addAll(nums1); // Concatenate nums1 to the end of nums + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Concatenate two lists */ + List<int> nums1 = [6, 8, 7, 10, 9]; + nums.AddRange(nums1); // Concatenate nums1 to the end of nums + ``` + +=== "Go" + + ```go title="list_test.go" + /* Concatenate two lists */ + nums1 := []int{6, 8, 7, 10, 9} + nums = append(nums, nums1...) 
// Concatenate nums1 to the end of nums + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Concatenate two lists */ + let nums1 = [6, 8, 7, 10, 9] + nums.append(contentsOf: nums1) // Concatenate nums1 to the end of nums + ``` + +=== "JS" + + ```javascript title="list.js" + /* Concatenate two lists */ + const nums1 = [6, 8, 7, 10, 9]; + nums.push(...nums1); // Concatenate nums1 to the end of nums + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Concatenate two lists */ + const nums1: number[] = [6, 8, 7, 10, 9]; + nums.push(...nums1); // Concatenate nums1 to the end of nums + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Concatenate two lists */ + List<int> nums1 = [6, 8, 7, 10, 9]; + nums.addAll(nums1); // Concatenate nums1 to the end of nums + ``` + +=== "Rust" + + ```rust title="list.rs" + /* Concatenate two lists */ + let nums1: Vec<i32> = vec![6, 8, 7, 10, 9]; + nums.extend(nums1); // Concatenate nums1 to the end of nums + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Concatenate two lists + var nums1 = std.ArrayList(i32).init(std.heap.page_allocator); + defer nums1.deinit(); + try nums1.appendSlice(&[_]i32{ 6, 8, 7, 10, 9 }); + try nums.insertSlice(nums.items.len, nums1.items); // Concatenate nums1 to the end of nums + ``` + +### 6.   Sorting the List + +Once the list is sorted, we can apply algorithms that commonly appear in array problems, such as "binary search" and "two-pointer" techniques. 
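As a small illustration of why sorting matters, the sketch below runs a binary search on a sorted list using Python's standard `bisect` module. The helper name `contains_sorted` is hypothetical and exists only for this example.

```python
from bisect import bisect_left


def contains_sorted(nums: list[int], target: int) -> bool:
    """Binary search on an ascending list; requires nums to be sorted."""
    i = bisect_left(nums, target)  # Leftmost position where target could be inserted
    return i < len(nums) and nums[i] == target


nums = [1, 3, 2, 5, 4]
nums.sort()  # Binary search is only valid after sorting
found = contains_sorted(nums, 4)
missing = contains_sorted(nums, 6)
```

Each lookup takes $O(\log n)$ comparisons, whereas searching the unsorted list would require a linear scan.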
+ +=== "Python" + + ```python title="list.py" + # Sort the list + nums.sort() # After sorting, the list elements are in ascending order + ``` + +=== "C++" + + ```cpp title="list.cpp" + /* Sort the list */ + sort(nums.begin(), nums.end()); // After sorting, the list elements are in ascending order + ``` + +=== "Java" + + ```java title="list.java" + /* Sort the list */ + Collections.sort(nums); // After sorting, the list elements are in ascending order + ``` + +=== "C#" + + ```csharp title="list.cs" + /* Sort the list */ + nums.Sort(); // After sorting, the list elements are in ascending order + ``` + +=== "Go" + + ```go title="list_test.go" + /* Sort the list */ + sort.Ints(nums) // After sorting, the list elements are in ascending order + ``` + +=== "Swift" + + ```swift title="list.swift" + /* Sort the list */ + nums.sort() // After sorting, the list elements are in ascending order + ``` + +=== "JS" + + ```javascript title="list.js" + /* Sort the list */ + nums.sort((a, b) => a - b); // After sorting, the list elements are in ascending order + ``` + +=== "TS" + + ```typescript title="list.ts" + /* Sort the list */ + nums.sort((a, b) => a - b); // After sorting, the list elements are in ascending order + ``` + +=== "Dart" + + ```dart title="list.dart" + /* Sort the list */ + nums.sort(); // After sorting, the list elements are in ascending order + ``` + +=== "Rust" + + ```rust title="list.rs" + /* Sort the list */ + nums.sort(); // After sorting, the list elements are in ascending order + ``` + +=== "C" + + ```c title="list.c" + // C does not provide built-in dynamic arrays + ``` + +=== "Zig" + + ```zig title="list.zig" + // Sort the list + std.sort.sort(i32, nums.items, {}, comptime std.sort.asc(i32)); + ``` + +## 4.3.2   List Implementation + +Many programming languages have built-in lists, such as Java, C++, Python, etc. Their implementations are quite complex, with very meticulous settings for parameters such as initial capacity and expansion multiplier. 
Interested readers can consult the source code to learn more. + +To deepen our understanding of how lists work, let's implement a simplified list, focusing on three key design points. + +- **Initial Capacity**: Choose a reasonable initial capacity for the array. In this example, we choose 10 as the initial capacity. +- **Size Recording**: Declare a variable `size` to record the current number of elements in the list, updated as elements are inserted and deleted. With this variable, we can locate the end of the list and determine whether expansion is needed. +- **Expansion Mechanism**: If the list is at full capacity when an element is inserted, it must be expanded. First, create a larger array based on the expansion multiplier, then move all elements of the current array to the new array. In this example, each expansion doubles the array's capacity. + +=== "Python" + + ```python title="my_list.py" + class MyList: + """List class""" + + def __init__(self): + """Constructor""" + self._capacity: int = 10 # List capacity + self._arr: list[int] = [0] * self._capacity # Array (stores list elements) + self._size: int = 0 # List length (current number of elements) + self._extend_ratio: int = 2 # Multiple by which the capacity grows on each expansion + + def size(self) -> int: + """Get list length (current number of elements)""" + return self._size + + def capacity(self) -> int: + """Get list capacity""" + return self._capacity + + def get(self, index: int) -> int: + """Access element""" + # Raise an exception if the index is out of bounds (same below) + if index < 0 or index >= self._size: + raise IndexError("Index out of bounds") + return self._arr[index] + + def set(self, num: int, index: int): + """Update element""" + if index < 0 or index >= self._size: + raise IndexError("Index out of bounds") + self._arr[index] = num + + def add(self, num: int): + """Append an element at the end""" + # When the number of elements reaches capacity, trigger expansion + if self.size() == self.capacity(): + self.extend_capacity() + self._arr[self._size] = num + self._size += 1 + + def insert(self, num: int, index: int): + """Insert an element in the middle""" + if index < 0 or index >= self._size: + raise IndexError("Index out of bounds") + # When the number of elements reaches capacity, trigger expansion + if self._size == 
self.capacity(): + self.extend_capacity() + # Shift the elements at and after index back by one position + for j in range(self._size - 1, index - 1, -1): + self._arr[j + 1] = self._arr[j] + self._arr[index] = num + # Update the number of elements + self._size += 1 + + def remove(self, index: int) -> int: + """Remove element""" + if index < 0 or index >= self._size: + raise IndexError("Index out of bounds") + num = self._arr[index] + # Shift the elements after index forward by one position + for j in range(index, self._size - 1): + self._arr[j] = self._arr[j + 1] + # Update the number of elements + self._size -= 1 + # Return the removed element + return num + + def extend_capacity(self): + """Expand list capacity""" + # Create a new array _extend_ratio times the length of the original array, and copy the original array into it + self._arr = self._arr + [0] * self.capacity() * (self._extend_ratio - 1) + # Update list capacity + self._capacity = len(self._arr) + + def to_array(self) -> list[int]: + """Return the list elements within the valid length""" + return self._arr[: self._size] + ``` + +=== "C++" + + ```cpp title="my_list.cpp" + /* List class */ + class MyList { + private: + int *arr; // Array (stores list elements) + int arrCapacity = 10; // List capacity + int arrSize = 0; // List length (current number of elements) + int extendRatio = 2; // Multiple by which the capacity grows on each expansion + + public: + /* Constructor */ + MyList() { + arr = new int[arrCapacity]; + } + + /* Destructor */ + ~MyList() { + delete[] arr; + } + + /* Get list length (current number of elements) */ + int size() { + return arrSize; + } + + /* Get list capacity */ + int capacity() { + return arrCapacity; + } + + /* Access element */ + int get(int index) { + // Throw an exception if the index is out of bounds (same below) + if (index < 0 || index >= size()) + throw out_of_range("Index out of bounds"); + return arr[index]; + } + + /* Update element */ + void set(int index, int num) { + if (index < 0 || index >= size()) + throw out_of_range("Index out of bounds"); + arr[index] = num; + } + + /* Append an element at the end */ + void add(int num) { + // When the number of elements reaches capacity, trigger expansion + if (size() == capacity()) + extendCapacity(); + arr[size()] = num; + // Update the number of elements + arrSize++; + } + + /* Insert an element in the middle */ + void insert(int index, int num) { + if (index < 0 || index >= size()) + throw out_of_range("Index out of bounds"); + // When the number of elements reaches capacity, trigger expansion + if (size() == capacity()) + extendCapacity(); + // Shift the elements at and after index back by one position + for (int j = size() - 1; j >= index; j--) { + arr[j + 1] = 
arr[j]; + } + arr[index] = num; + // Update the number of elements + arrSize++; + } + + /* Remove element */ + int remove(int index) { + if (index < 0 || index >= size()) + throw out_of_range("Index out of bounds"); + int num = arr[index]; + // Shift the elements after index forward by one position + for (int j = index; j < size() - 1; j++) { + arr[j] = arr[j + 1]; + } + // Update the number of elements + arrSize--; + // Return the removed element + return num; + } + + /* Expand list capacity */ + void extendCapacity() { + // Create a new array extendRatio times the length of the original array + int newCapacity = capacity() * extendRatio; + int *tmp = arr; + arr = new int[newCapacity]; + // Copy all elements from the original array to the new array + for (int i = 0; i < size(); i++) { + arr[i] = tmp[i]; + } + // Free memory + delete[] tmp; + arrCapacity = newCapacity; + } + + /* Convert the list to a vector for printing */ + vector<int> toVector() { + // Only convert the elements within the valid length + vector<int> vec(size()); + for (int i = 0; i < size(); i++) { + vec[i] = arr[i]; + } + return vec; + } + }; + ``` + +=== "Java" + + ```java title="my_list.java" + /* List class */ + class MyList { + private int[] arr; // Array (stores list elements) + private int capacity = 10; // List capacity + private int size = 0; // List length (current number of elements) + private int extendRatio = 2; // Multiple by which the capacity grows on each expansion + + /* Constructor */ + public MyList() { + arr = new int[capacity]; + } + + /* Get list length (current number of elements) */ + public int size() { + return size; + } + + /* Get list capacity */ + public int capacity() { + return capacity; + } + + /* Access element */ + public int get(int index) { + // Throw an exception if the index is out of bounds (same below) + if (index < 0 || index >= size) + throw new IndexOutOfBoundsException("Index out of bounds"); + return arr[index]; + } + + /* Update element */ + public void set(int index, int num) { + if (index < 0 || index >= size) + throw new IndexOutOfBoundsException("Index out of bounds"); + arr[index] = num; + } + + /* Append an element at the end */ + public void add(int num) { + // When the number of elements reaches capacity, trigger expansion + if (size == capacity()) + extendCapacity(); + arr[size] = num; + // Update the number of elements + size++; + } + + /* Insert an element in the middle */ + public void insert(int index, int num) { + if (index < 0 || index >= size) + throw new IndexOutOfBoundsException("Index out of bounds"); + // When the number of elements reaches capacity, trigger expansion + if (size == capacity()) + extendCapacity(); + // Shift the elements at and after index back by one position + 
for (int j = size - 1; j >= index; j--) { + arr[j + 1] = arr[j]; + } + arr[index] = num; + // Update the number of elements + size++; + } + + /* Remove element */ + public int remove(int index) { + if (index < 0 || index >= size) + throw new IndexOutOfBoundsException("Index out of bounds"); + int num = arr[index]; + // Shift the elements after index forward by one position + for (int j = index; j < size - 1; j++) { + arr[j] = arr[j + 1]; + } + // Update the number of elements + size--; + // Return the removed element + return num; + } + + /* Expand list capacity */ + public void extendCapacity() { + // Create a new array extendRatio times the length of the original array, and copy the original array into it + arr = Arrays.copyOf(arr, capacity() * extendRatio); + // Update list capacity + capacity = arr.length; + } + + /* Convert the list to an array */ + public int[] toArray() { + int size = size(); + // Only convert the elements within the valid length + int[] arr = new int[size]; + for (int i = 0; i < size; i++) { + arr[i] = get(i); + } + return arr; + } + } + ``` + +=== "C#" + + ```csharp title="my_list.cs" + /* List class */ + class MyList { + private int[] arr; // Array (stores list elements) + private int arrCapacity = 10; // List capacity + private int arrSize = 0; // List length (current number of elements) + private readonly int extendRatio = 2; // Multiple by which the capacity grows on each expansion + + /* Constructor */ + public MyList() { + arr = new int[arrCapacity]; + } + + /* Get list length (current number of elements) */ + public int Size() { + return arrSize; + } + + /* Get list capacity */ + public int Capacity() { + return arrCapacity; + } + + /* Access element */ + public int Get(int index) { + // Throw an exception if the index is out of bounds (same below) + if (index < 0 || index >= arrSize) + throw new IndexOutOfRangeException("Index out of bounds"); + return arr[index]; + } + + /* Update element */ + public void Set(int index, int num) { + if (index < 0 || index >= arrSize) + throw new IndexOutOfRangeException("Index out of bounds"); + arr[index] = num; + } + + /* Append an element at the end */ + public void Add(int num) { + // When the number of elements reaches capacity, trigger expansion + if (arrSize == arrCapacity) + ExtendCapacity(); + arr[arrSize] = num; + // Update the number of elements + arrSize++; + } + + /* Insert an element in the middle */ + public void Insert(int index, int num) { + if (index < 0 || index >= arrSize) + throw new IndexOutOfRangeException("Index out of bounds"); + // When the number of elements reaches capacity, trigger expansion + if (arrSize == arrCapacity) + ExtendCapacity(); + // Shift the elements at and after index 
back by one position + for (int j = arrSize - 1; j >= index; j--) { + arr[j + 1] = arr[j]; + } + arr[index] = num; + // Update the number of elements + arrSize++; + } + + /* Remove element */ + public int Remove(int index) { + if (index < 0 || index >= arrSize) + throw new IndexOutOfRangeException("Index out of bounds"); + int num = arr[index]; + // Shift the elements after index forward by one position + for (int j = index; j < arrSize - 1; j++) { + arr[j] = arr[j + 1]; + } + // Update the number of elements + arrSize--; + // Return the removed element + return num; + } + + /* Expand list capacity */ + public void ExtendCapacity() { + // Create a new array of length arrCapacity * extendRatio, and copy the original array into it + Array.Resize(ref arr, arrCapacity * extendRatio); + // Update list capacity + arrCapacity = arr.Length; + } + + /* Convert the list to an array */ + public int[] ToArray() { + // Only convert the elements within the valid length + int[] arr = new int[arrSize]; + for (int i = 0; i < arrSize; i++) { + arr[i] = Get(i); + } + return arr; + } + } + ``` + +=== "Go" + + ```go title="my_list.go" + /* List class */ + type myList struct { + arrCapacity int + arr []int + arrSize int + extendRatio int + } + + /* Constructor */ + func newMyList() *myList { + return &myList{ + arrCapacity: 10, // List capacity + arr: make([]int, 10), // Array (stores list elements) + arrSize: 0, // List length (current number of elements) + extendRatio: 2, // Multiple by which the capacity grows on each expansion + } + } + + /* Get list length (current number of elements) */ + func (l *myList) size() int { + return l.arrSize + } + + /* Get list capacity */ + func (l *myList) capacity() int { + return l.arrCapacity + } + + /* Access element */ + func (l *myList) get(index int) int { + // Panic if the index is out of bounds (same below) + if index < 0 || index >= l.arrSize { + panic("Index out of bounds") + } + return l.arr[index] + } + + /* Update element */ + func (l *myList) set(num, index int) { + if index < 0 || index >= l.arrSize { + panic("Index out of bounds") + } + l.arr[index] = num + } + + /* Append an element at the end */ + func (l *myList) add(num int) { + // When the number of elements reaches capacity, trigger expansion + if l.arrSize == l.arrCapacity { + l.extendCapacity() + } + l.arr[l.arrSize] = num + // Update the number of elements + l.arrSize++ + } + + /* Insert an element in the middle */ + func (l *myList) insert(num, index int) { + if index < 0 || index >= l.arrSize { + panic("Index out of bounds") + } + // When the number of elements reaches capacity, trigger expansion + if l.arrSize == l.arrCapacity { 
+ l.extendCapacity() + } + // Shift the elements at and after index back by one position + for j := l.arrSize - 1; j >= index; j-- { + l.arr[j+1] = l.arr[j] + } + l.arr[index] = num + // Update the number of elements + l.arrSize++ + } + + /* Remove element */ + func (l *myList) remove(index int) int { + if index < 0 || index >= l.arrSize { + panic("Index out of bounds") + } + num := l.arr[index] + // Shift the elements after index forward by one position + for j := index; j < l.arrSize-1; j++ { + l.arr[j] = l.arr[j+1] + } + // Update the number of elements + l.arrSize-- + // Return the removed element + return num + } + + /* Expand list capacity */ + func (l *myList) extendCapacity() { + // Create a new array extendRatio times the length of the original array, and copy the original array into it + l.arr = append(l.arr, make([]int, l.arrCapacity*(l.extendRatio-1))...) + // Update list capacity + l.arrCapacity = len(l.arr) + } + + /* Return the list elements within the valid length */ + func (l *myList) toArray() []int { + // Only convert the elements within the valid length + return l.arr[:l.arrSize] + } + ``` + +=== "Swift" + + ```swift title="my_list.swift" + /* List class */ + class MyList { + private var arr: [Int] // Array (stores list elements) + private var _capacity = 10 // List capacity + private var _size = 0 // List length (current number of elements) + private let extendRatio = 2 // Multiple by which the capacity grows on each expansion + + /* Constructor */ + init() { + arr = Array(repeating: 0, count: _capacity) + } + + /* Get list length (current number of elements) */ + func size() -> Int { + _size + } + + /* Get list capacity */ + func capacity() -> Int { + _capacity + } + + /* Access element */ + func get(index: Int) -> Int { + // Raise an error if the index is out of bounds (same below) + if index < 0 || index >= _size { + fatalError("Index out of bounds") + } + return arr[index] + } + + /* Update element */ + func set(index: Int, num: Int) { + if index < 0 || index >= _size { + fatalError("Index out of bounds") + } + arr[index] = num + } + + /* Append an element at the end */ + func add(num: Int) { + // When the number of elements reaches capacity, trigger expansion + if _size == _capacity { + extendCapacity() + } + arr[_size] = num + // Update the number of elements + _size += 1 + } + + /* Insert an element in the middle */ + func insert(index: Int, num: Int) { + if index < 0 || index >= _size { + fatalError("Index out of bounds") + } + // When the number of elements reaches capacity, trigger expansion + if _size == _capacity { + extendCapacity() + } + // Shift the elements at and after index back by one position + for j in sequence(first: _size - 1, next: { $0 >= index + 1 ? 
$0 - 1 : nil }) { + arr[j + 1] = arr[j] + } + arr[index] = num + // Update the number of elements + _size += 1 + } + + /* Remove element */ + @discardableResult + func remove(index: Int) -> Int { + if index < 0 || index >= _size { + fatalError("Index out of bounds") + } + let num = arr[index] + // Shift the elements after index forward by one position + for j in index ..< (_size - 1) { + arr[j] = arr[j + 1] + } + // Update the number of elements + _size -= 1 + // Return the removed element + return num + } + + /* Expand list capacity */ + func extendCapacity() { + // Create a new array extendRatio times the length of the original array, and copy the original array into it + arr = arr + Array(repeating: 0, count: _capacity * (extendRatio - 1)) + // Update list capacity + _capacity = arr.count + } + + /* Convert the list to an array */ + func toArray() -> [Int] { + var arr = Array(repeating: 0, count: _size) + for i in 0 ..< _size { + arr[i] = get(index: i) + } + return arr + } + } + ``` + +=== "JS" + + ```javascript title="my_list.js" + /* List class */ + class MyList { + #arr = new Array(); // Array (stores list elements) + #capacity = 10; // List capacity + #size = 0; // List length (current number of elements) + #extendRatio = 2; // Multiple by which the capacity grows on each expansion + + /* Constructor */ + constructor() { + this.#arr = new Array(this.#capacity); + } + + /* Get list length (current number of elements) */ + size() { + return this.#size; + } + + /* Get list capacity */ + capacity() { + return this.#capacity; + } + + /* Access element */ + get(index) { + // Throw an exception if the index is out of bounds (same below) + if (index < 0 || index >= this.#size) throw new Error('Index out of bounds'); + return this.#arr[index]; + } + + /* Update element */ + set(index, num) { + if (index < 0 || index >= this.#size) throw new Error('Index out of bounds'); + this.#arr[index] = num; + } + + /* Append an element at the end */ + add(num) { + // If the length equals the capacity, expansion is needed + if (this.#size === this.#capacity) { + this.extendCapacity(); + } + // Append the new element to the end of the list + this.#arr[this.#size] = num; + this.#size++; + } + + /* Insert an element in the middle */ + insert(index, num) { + if (index < 0 || index >= this.#size) throw new Error('Index out of bounds'); + // When the number of elements reaches capacity, trigger expansion + if (this.#size === this.#capacity) { + this.extendCapacity(); + } + // Shift the elements at and after index back by one position + for (let j = this.#size - 1; j >= index; j--) { + this.#arr[j + 1] = this.#arr[j]; + } + this.#arr[index] = num; + // Update the number of elements + this.#size++; + } 
+ + /* Remove element */ + remove(index) { + if (index < 0 || index >= this.#size) throw new Error('Index out of bounds'); + let num = this.#arr[index]; + // Shift the elements after index forward by one position + for (let j = index; j < this.#size - 1; j++) { + this.#arr[j] = this.#arr[j + 1]; + } + // Update the number of elements + this.#size--; + // Return the removed element + return num; + } + + /* Expand list capacity */ + extendCapacity() { + // Create a new array extendRatio times the length of the original array, and copy the original array into it + this.#arr = this.#arr.concat( + new Array(this.capacity() * (this.#extendRatio - 1)) + ); + // Update list capacity + this.#capacity = this.#arr.length; + } + + /* Convert the list to an array */ + toArray() { + let size = this.size(); + // Only convert the elements within the valid length + const arr = new Array(size); + for (let i = 0; i < size; i++) { + arr[i] = this.get(i); + } + return arr; + } + } + ``` + +=== "TS" + + ```typescript title="my_list.ts" + /* List class */ + class MyList { + private arr: Array<number>; // Array (stores list elements) + private _capacity: number = 10; // List capacity + private _size: number = 0; // List length (current number of elements) + private extendRatio: number = 2; // Multiple by which the capacity grows on each expansion + + /* Constructor */ + constructor() { + this.arr = new Array(this._capacity); + } + + /* Get list length (current number of elements) */ + public size(): number { + return this._size; + } + + /* Get list capacity */ + public capacity(): number { + return this._capacity; + } + + /* Access element */ + public get(index: number): number { + // Throw an exception if the index is out of bounds (same below) + if (index < 0 || index >= this._size) throw new Error('Index out of bounds'); + return this.arr[index]; + } + + /* Update element */ + public set(index: number, num: number): void { + if (index < 0 || index >= this._size) throw new Error('Index out of bounds'); + this.arr[index] = num; + } + + /* Append an element at the end */ + public add(num: number): void { + // If the length equals the capacity, expansion is needed + if (this._size === this._capacity) this.extendCapacity(); + // Append the new element to the end of the list + this.arr[this._size] = num; + this._size++; + } + + /* Insert an element in the middle */ + public insert(index: number, num: number): void { + if (index < 0 || index >= this._size) throw new Error('Index out of bounds'); + // When the number of elements reaches capacity, trigger expansion + if (this._size === this._capacity) { + this.extendCapacity(); + } + // Shift the elements at and after index back by one position + for (let 
j = this._size - 1; j >= index; j--) { + this.arr[j + 1] = this.arr[j]; + } + // Insert the element and update the number of elements + this.arr[index] = num; + this._size++; + } + + /* Remove element */ + public remove(index: number): number { + if (index < 0 || index >= this._size) throw new Error('Index out of bounds'); + let num = this.arr[index]; + // Move all elements after index one position forward + for (let j = index; j < this._size - 1; j++) { + this.arr[j] = this.arr[j + 1]; + } + // Update the number of elements + this._size--; + // Return the removed element + return num; + } + + /* Extend list capacity */ + public extendCapacity(): void { + // Create a new array extendRatio times the length of the original array and copy the original array into it + this.arr = this.arr.concat( + new Array(this.capacity() * (this.extendRatio - 1)) + ); + // Update the list capacity + this._capacity = this.arr.length; + } + + /* Convert the list to an array */ + public toArray(): number[] { + let size = this.size(); + // Only convert the elements within the valid length range + const arr = new Array(size); + for (let i = 0; i < size; i++) { + arr[i] = this.get(i); + } + return arr; + } + } + ``` + +=== "Dart" + + ```dart title="my_list.dart" + /* List class */ + class MyList { + late List<int> _arr; // Array (stores list elements) + int _capacity = 10; // List capacity + int _size = 0; // List length (current number of elements) + int _extendRatio = 2; // Multiple by which the capacity grows on each expansion + + /* Constructor */ + MyList() { + _arr = List.filled(_capacity, 0); + } + + /* Get list length (current number of elements) */ + int size() => _size; + + /* Get list capacity */ + int capacity() => _capacity; + + /* Access element */ + int get(int index) { + if (index >= _size) throw RangeError('Index out of bounds'); + return _arr[index]; + } + + /* Update element */ + void set(int index, int _num) { + if (index >= _size) throw RangeError('Index out of bounds'); + _arr[index] = _num; + } + + /* Add element at the end */ + void add(int _num) { + // When the number of elements reaches capacity, trigger the expansion mechanism + if (_size == _capacity) extendCapacity(); + _arr[_size] = _num; + // Update the number of elements + _size++; + } + + /* Insert element in the middle */ + void insert(int index, int _num) { + if (index >= _size) throw RangeError('Index out of bounds'); + // When the number of elements reaches capacity, trigger the expansion mechanism + if (_size == _capacity) extendCapacity(); + // Move the element at index and all elements after it one position backward + for (var j = _size - 1; j >= index; j--) { + _arr[j + 1] = _arr[j]; + } + _arr[index] = _num; + // Update the number of elements + _size++; + } + + /* Remove element */ + 
int remove(int index) { + if (index >= _size) throw RangeError('Index out of bounds'); + int _num = _arr[index]; + // Move all elements after index one position forward + for (var j = index; j < _size - 1; j++) { + _arr[j] = _arr[j + 1]; + } + // Update the number of elements + _size--; + // Return the removed element + return _num; + } + + /* Extend list capacity */ + void extendCapacity() { + // Create a new array _extendRatio times the length of the original array + final _newNums = List.filled(_capacity * _extendRatio, 0); + // Copy the original array into the new array + List.copyRange(_newNums, 0, _arr); + // Update the reference held by _arr + _arr = _newNums; + // Update the list capacity + _capacity = _arr.length; + } + + /* Convert the list to an array */ + List<int> toArray() { + List<int> arr = []; + for (var i = 0; i < _size; i++) { + arr.add(get(i)); + } + return arr; + } + } + ``` + +=== "Rust" + + ```rust title="my_list.rs" + /* List class */ + #[allow(dead_code)] + struct MyList { + arr: Vec<i32>, // Array (stores list elements) + capacity: usize, // List capacity + size: usize, // List length (current number of elements) + extend_ratio: usize, // Multiple by which the capacity grows on each expansion + } + + #[allow(unused,unused_comparisons)] + impl MyList { + /* Constructor */ + pub fn new(capacity: usize) -> Self { + let mut vec = Vec::new(); + vec.resize(capacity, 0); + Self { + arr: vec, + capacity, + size: 0, + extend_ratio: 2, + } + } + + /* Get list length (current number of elements) */ + pub fn size(&self) -> usize { + return self.size; + } + + /* Get list capacity */ + pub fn capacity(&self) -> usize { + return self.capacity; + } + + /* Access element */ + pub fn get(&self, index: usize) -> i32 { + // Panic if the index is out of bounds (same below) + if index >= self.size {panic!("Index out of bounds")}; + return self.arr[index]; + } + + /* Update element */ + pub fn set(&mut self, index: usize, num: i32) { + if index >= self.size {panic!("Index out of bounds")}; + self.arr[index] = num; + } + + /* Add element at the end */ + pub fn add(&mut self, num: i32) { + // When the number of elements reaches capacity, trigger the expansion mechanism + if self.size == self.capacity() { + self.extend_capacity(); + } + self.arr[self.size] = num; + // Update the number of elements + self.size += 1; + } + + /* Insert element in the middle */ + pub fn insert(&mut self, index: usize, num: i32) { + if index >= self.size() {panic!("Index out of bounds")}; + // When the number of elements reaches capacity, trigger the expansion mechanism + if self.size == self.capacity() { + self.extend_capacity(); + } + // Move the element at index 
and all later elements one position backward + for j in (index..self.size).rev() { + self.arr[j + 1] = self.arr[j]; + } + self.arr[index] = num; + // Update the number of elements + self.size += 1; + } + + /* Remove element */ + pub fn remove(&mut self, index: usize) -> i32 { + if index >= self.size() {panic!("Index out of bounds")}; + let num = self.arr[index]; + // Move all elements after index one position forward + for j in (index..self.size - 1) { + self.arr[j] = self.arr[j + 1]; + } + // Update the number of elements + self.size -= 1; + // Return the removed element + return num; + } + + /* Extend list capacity */ + pub fn extend_capacity(&mut self) { + // Resize the array to extend_ratio times its original length; existing elements are preserved + let new_capacity = self.capacity * self.extend_ratio; + self.arr.resize(new_capacity, 0); + // Update the list capacity + self.capacity = new_capacity; + } + + /* Convert the list to an array */ + pub fn to_array(&mut self) -> Vec<i32> { + // Only convert the elements within the valid length range + let mut arr = Vec::new(); + for i in 0..self.size { + arr.push(self.get(i)); + } + arr + } + } + ``` + +=== "C" + + ```c title="my_list.c" + /* List class */ + typedef struct { + int *arr; // Array (stores list elements) + int capacity; // List capacity + int size; // List size + int extendRatio; // Multiple by which the capacity grows on each expansion + } MyList; + + /* Constructor */ + MyList *newMyList() { + MyList *nums = malloc(sizeof(MyList)); + nums->capacity = 10; + nums->arr = malloc(sizeof(int) * nums->capacity); + nums->size = 0; + nums->extendRatio = 2; + return nums; + } + + /* Destructor */ + void delMyList(MyList *nums) { + free(nums->arr); + free(nums); + } + + /* Get list length */ + int size(MyList *nums) { + return nums->size; + } + + /* Get list capacity */ + int capacity(MyList *nums) { + return nums->capacity; + } + + /* Access element */ + int get(MyList *nums, int index) { + assert(index >= 0 && index < nums->size); + return nums->arr[index]; + } + + /* Update element */ + void set(MyList *nums, int index, int num) { + assert(index >= 0 && index < nums->size); + nums->arr[index] = num; + } + + /* Add element at the end */ + void add(MyList *nums, int num) { + if (size(nums) == capacity(nums)) { + extendCapacity(nums); // Extend capacity + } + nums->arr[size(nums)] = num; + nums->size++; + } + + /* Insert element in the middle */ + void insert(MyList *nums, int index, 
int num) { + assert(index >= 0 && index < size(nums)); + // When the number of elements reaches capacity, trigger the expansion mechanism + if (size(nums) == capacity(nums)) { + extendCapacity(nums); // Extend capacity + } + for (int i = size(nums); i > index; --i) { + nums->arr[i] = nums->arr[i - 1]; + } + nums->arr[index] = num; + nums->size++; + } + + /* Remove element */ + // Note: stdio.h already declares remove, so this function is named removeItem + int removeItem(MyList *nums, int index) { + assert(index >= 0 && index < size(nums)); + int num = nums->arr[index]; + for (int i = index; i < size(nums) - 1; i++) { + nums->arr[i] = nums->arr[i + 1]; + } + nums->size--; + return num; + } + + /* Extend list capacity */ + void extendCapacity(MyList *nums) { + // Allocate the new space first + int newCapacity = capacity(nums) * nums->extendRatio; + int *extend = (int *)malloc(sizeof(int) * newCapacity); + int *temp = nums->arr; + + // Copy the old data into the new space + for (int i = 0; i < size(nums); i++) + extend[i] = nums->arr[i]; + + // Free the old data + free(temp); + + // Point the list at the new data + nums->arr = extend; + nums->capacity = newCapacity; + } + + /* Convert the list to an array for printing */ + int *toArray(MyList *nums) { + return nums->arr; + } + ``` + +=== "Zig" + + ```zig title="my_list.zig" + // List class + fn MyList(comptime T: type) type { + return struct { + const Self = @This(); + + arr: []T = undefined, // Array (stores list elements) + arrCapacity: usize = 10, // List capacity + numSize: usize = 0, // List length (current number of elements) + extendRatio: usize = 2, // Multiple by which the capacity grows on each expansion + mem_arena: ?std.heap.ArenaAllocator = null, + mem_allocator: std.mem.Allocator = undefined, // Memory allocator + + // Constructor (allocate memory and initialize the list) + pub fn init(self: *Self, allocator: std.mem.Allocator) !void { + if (self.mem_arena == null) { + self.mem_arena = std.heap.ArenaAllocator.init(allocator); + self.mem_allocator = self.mem_arena.?.allocator(); + } + self.arr = try self.mem_allocator.alloc(T, self.arrCapacity); + @memset(self.arr, @as(T, 0)); + } + + // Destructor (free memory) + pub fn deinit(self: *Self) void { + if (self.mem_arena == null) return; + self.mem_arena.?.deinit(); + } + + // Get list length (current number of elements) + pub fn size(self: *Self) usize { + return self.numSize; + } + + // Get list capacity + pub fn 
capacity(self: *Self) usize { + return self.arrCapacity; + } + + // Access element + pub fn get(self: *Self, index: usize) T { + // Panic if the index is out of bounds (same below) + if (index < 0 or index >= self.size()) @panic("Index out of bounds"); + return self.arr[index]; + } + + // Update element + pub fn set(self: *Self, index: usize, num: T) void { + // Panic if the index is out of bounds (same below) + if (index < 0 or index >= self.size()) @panic("Index out of bounds"); + self.arr[index] = num; + } + + // Add element at the end + pub fn add(self: *Self, num: T) !void { + // When the number of elements reaches capacity, trigger the expansion mechanism + if (self.size() == self.capacity()) try self.extendCapacity(); + self.arr[self.size()] = num; + // Update the number of elements + self.numSize += 1; + } + + // Insert element in the middle + pub fn insert(self: *Self, index: usize, num: T) !void { + if (index < 0 or index >= self.size()) @panic("Index out of bounds"); + // When the number of elements reaches capacity, trigger the expansion mechanism + if (self.size() == self.capacity()) try self.extendCapacity(); + // Move the element at index and all elements after it one position backward + var j = self.size() - 1; + while (j >= index) : (j -= 1) { + self.arr[j + 1] = self.arr[j]; + } + self.arr[index] = num; + // Update the number of elements + self.numSize += 1; + } + + // Remove element + pub fn remove(self: *Self, index: usize) T { + if (index < 0 or index >= self.size()) @panic("Index out of bounds"); + var num = self.arr[index]; + // Move all elements after index one position forward + var j = index; + while (j < self.size() - 1) : (j += 1) { + self.arr[j] = self.arr[j + 1]; + } + // Update the number of elements + self.numSize -= 1; + // Return the removed element + return num; + } + + // Extend list capacity + pub fn extendCapacity(self: *Self) !void { + // Create a new array of length capacity * extendRatio and copy the original array into it + var newCapacity = self.capacity() * self.extendRatio; + var extend = try self.mem_allocator.alloc(T, newCapacity); + @memset(extend, @as(T, 0)); + // Copy all elements of the original array into the new array + std.mem.copy(T, extend, self.arr); + self.arr = extend; + // Update the list capacity + self.arrCapacity = newCapacity; + } + + // Convert the list to an array + pub fn toArray(self: *Self) ![]T { + // Only convert the elements within the valid length range + var arr = try self.mem_allocator.alloc(T, self.size()); + @memset(arr, @as(T, 0)); + for (arr, 0..) 
|*num, i| { + num.* = self.get(i); + } + return arr; + } + }; + } + ``` diff --git a/docs-en/chapter_array_and_linkedlist/ram_and_cache.md b/docs-en/chapter_array_and_linkedlist/ram_and_cache.md new file mode 100644 index 000000000..cab66850e --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/ram_and_cache.md @@ -0,0 +1,83 @@ +--- +comments: true +--- + +# 4.4   Memory and Cache * + +In the first two sections of this chapter, we explored arrays and linked lists, two fundamental and important data structures, representing "continuous storage" and "dispersed storage" respectively. + +In fact, **the physical structure largely determines the efficiency of a program's use of memory and cache**, which in turn affects the overall performance of the algorithm. + +## 4.4.1   Computer Storage Devices + +There are three types of storage devices in computers: "hard disk," "random-access memory (RAM)," and "cache memory." The following table shows their different roles and performance characteristics in computer systems. + +

Table 4-2   Computer Storage Devices

+ +
+ +| | Hard Disk | Memory | Cache | +| ---------- | -------------------------------------------------------------- | ------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------- | +| Usage | Long-term storage of data, including OS, programs, files, etc. | Temporary storage of currently running programs and data being processed | Stores frequently accessed data and instructions, reducing the number of CPU accesses to memory | +| Volatility | Data is not lost after power off | Data is lost after power off | Data is lost after power off | +| Capacity | Larger, TB level | Smaller, GB level | Very small, MB level | +| Speed | Slower, several hundred to thousands MB/s | Faster, several tens of GB/s | Very fast, several tens to hundreds of GB/s | +| Price | Cheaper, several cents to yuan / GB | More expensive, tens to hundreds of yuan / GB | Very expensive, priced with CPU | + +
+ +We can picture the computer storage system as the pyramid shown in Figure 4-9. The closer a storage device is to the top of the pyramid, the faster it is, the smaller its capacity, and the higher its cost. This multi-level design is not accidental, but the result of careful consideration by computer scientists and engineers. + +- **It is difficult to replace hard disks with memory**. Firstly, data in memory is lost after power off, making it unsuitable for long-term data storage; secondly, memory costs dozens of times as much as hard disk storage, which makes it hard to popularize in the consumer market. +- **It is difficult for caches to have both large capacity and high speed**. As the capacity of the L1, L2, and L3 caches increases, their physical size grows and so does their distance from the CPU core, which lengthens data transfer time and raises element access latency. Under current technology, a multi-level cache structure is the best balance between capacity, speed, and cost. + +![Computer Storage System](ram_and_cache.assets/storage_pyramid.png){ class="animation-figure" } + +

Figure 4-9   Computer Storage System

+ +!!! note + + The storage hierarchy of computers reflects a delicate balance between speed, capacity, and cost. In fact, this kind of trade-off is common in all industrial fields, requiring us to find the best balance between different advantages and limitations. + +Overall, **hard disks are used for long-term storage of large amounts of data, memory is used for temporary storage of data being processed during program execution, and cache is used to store frequently accessed data and instructions** to improve program execution efficiency. Together, they ensure the efficient operation of computer systems. + +As shown in Figure 4-10, during program execution, data is read from the hard disk into memory for CPU computation. The cache can be considered a part of the CPU; it **intelligently loads data from memory** to give the CPU fast data access, significantly improving program execution efficiency and reducing reliance on slower memory. + +![Data Flow Between Hard Disk, Memory, and Cache](ram_and_cache.assets/computer_storage_devices.png){ class="animation-figure" } + +

Figure 4-10   Data Flow Between Hard Disk, Memory, and Cache

+ +## 4.4.2   Memory Efficiency of Data Structures + +In terms of memory space utilization, arrays and linked lists each have their advantages and limitations. + +On one hand, **memory is limited, and the same block of memory cannot be shared by multiple programs**, so we hope that data structures can use space as efficiently as possible. The elements of an array are tightly packed and need no extra space for the references (pointers) that connect linked list nodes, making arrays more space-efficient. However, arrays require allocating sufficient contiguous memory space at once, which may lead to memory waste, and array expansion also incurs additional time and space costs. In contrast, linked lists allocate and reclaim memory dynamically on a per-node basis, providing greater flexibility. + +On the other hand, during program execution, **as memory is repeatedly allocated and released, free memory becomes increasingly fragmented**, which reduces memory utilization efficiency. Arrays, due to their contiguous storage, are relatively less likely to cause memory fragmentation. In contrast, the elements of a linked list are stored in scattered locations, and frequent insertion and deletion operations make memory fragmentation more likely. + +## 4.4.3   Cache Efficiency of Data Structures + +Although caches have far less capacity than memory, they are much faster and play a crucial role in program execution speed. Since the cache's capacity is limited and it can only hold a small part of the frequently accessed data, when the CPU tries to access data that is not in the cache, a "cache miss" occurs, forcing the CPU to load the needed data from slower memory. + +Clearly, **the fewer the cache misses, the higher the CPU's data read-write efficiency**, and the better the program performance. The proportion of data accesses that the CPU satisfies from the cache is called the "cache hit rate," a metric often used to measure cache efficiency.
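
To make the hit rate concrete, the following toy model counts hits and misses for a stream of memory addresses. It is only a sketch under stated assumptions: the cache loads data in fixed-size blocks, holds a handful of blocks, and evicts the least recently used one. The function name `hit_rate` and the parameters `line_size` and `num_lines` are made up for illustration and do not describe any real CPU.

```python
from collections import OrderedDict


def hit_rate(addresses: list[int], line_size: int = 8, num_lines: int = 4) -> float:
    """Simulate a tiny LRU cache that loads whole blocks and return its hit rate."""
    cache = OrderedDict()  # keys are the block numbers currently cached
    hits = 0
    for addr in addresses:
        line = addr // line_size  # the block that contains this address
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark the block as most recently used
        else:
            cache[line] = None  # cache miss: load the whole block
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(addresses)


# Array-like access: 64 consecutive addresses
sequential = list(range(64))
# Linked-list-like access: 64 addresses spaced 64 apart
scattered = [i * 64 for i in range(64)]

print(hit_rate(sequential))  # 0.875
print(hit_rate(scattered))   # 0.0
```

In this model, consecutive addresses miss once per block and then hit for the rest of it, while widely spaced addresses miss on every access.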
+ +To achieve higher efficiency, caches adopt the following data loading mechanisms. + +- **Cache Lines**: Caches don't store and load data byte by byte but in units of cache lines. Compared to byte-by-byte transfer, the transmission of cache lines is more efficient. +- **Prefetch Mechanism**: Processors try to predict data access patterns (such as sequential access, fixed stride jumping access, etc.) and load data into the cache according to specific patterns to improve the hit rate. +- **Spatial Locality**: If data is accessed, data nearby is likely to be accessed in the near future. Therefore, when loading certain data, the cache also loads nearby data to improve the hit rate. +- **Temporal Locality**: If data is accessed, it's likely to be accessed again in the near future. Caches use this principle to retain recently accessed data to improve the hit rate. + +In fact, **arrays and linked lists have different cache utilization efficiencies**, mainly reflected in the following aspects. + +- **Occupied Space**: Linked list elements occupy more space than array elements, resulting in less effective data volume in the cache. +- **Cache Lines**: Linked list data is scattered throughout memory, and since caches load "by line," the proportion of loading invalid data is higher. +- **Prefetch Mechanism**: The data access pattern of arrays is more "predictable" than that of linked lists, meaning the system is more likely to guess which data will be loaded next. +- **Spatial Locality**: Arrays are stored in concentrated memory spaces, so the data near the loaded data is more likely to be accessed next. + +Overall, **arrays have a higher cache hit rate and are generally more efficient in operation than linked lists**. This makes data structures based on arrays more popular in solving algorithmic problems. + +It should be noted that **high cache efficiency does not mean that arrays are always better than linked lists**. 
Which data structure to choose in actual applications should be based on specific requirements. For example, both arrays and linked lists can implement the "stack" data structure (which will be detailed in the next chapter), but they are suitable for different scenarios. + +- In algorithm problems, we tend to choose stacks based on arrays because they provide higher operational efficiency and random access capabilities, with the only cost being the need to pre-allocate a certain amount of memory space for the array. +- If the data volume is very large, highly dynamic, and the expected size of the stack is difficult to estimate, then a stack based on a linked list is more appropriate. Linked lists can disperse a large amount of data in different parts of the memory and avoid the additional overhead of array expansion. diff --git a/docs-en/chapter_array_and_linkedlist/summary.md b/docs-en/chapter_array_and_linkedlist/summary.md new file mode 100644 index 000000000..865e4d2fe --- /dev/null +++ b/docs-en/chapter_array_and_linkedlist/summary.md @@ -0,0 +1,85 @@ +--- +comments: true +--- + +# 4.5   Summary + +### 1.   Key Review + +- Arrays and linked lists are two fundamental data structures, representing two storage methods in computer memory: continuous space storage and dispersed space storage. Their characteristics complement each other. +- Arrays support random access and use less memory; however, they are inefficient in inserting and deleting elements and have a fixed length after initialization. +- Linked lists implement efficient node insertion and deletion through changing references (pointers) and can flexibly adjust their length; however, they have lower node access efficiency and use more memory. +- Common types of linked lists include singly linked lists, circular linked lists, and doubly linked lists, each with its own application scenarios. 
+- Lists are ordered collections of elements that support addition, deletion, and modification, typically implemented based on dynamic arrays, retaining the advantages of arrays while allowing flexible length adjustment. +- The advent of lists significantly enhanced the practicality of arrays but may lead to some memory space wastage. +- During program execution, data is mainly stored in memory. Arrays provide higher memory space efficiency, while linked lists are more flexible in memory usage. +- Caches provide fast data access to CPUs through mechanisms like cache lines, prefetching, spatial locality, and temporal locality, significantly enhancing program execution efficiency. +- Due to higher cache hit rates, arrays are generally more efficient than linked lists. When choosing a data structure, the appropriate choice should be made based on specific needs and scenarios. + +### 2.   Q & A + +!!! question "Does storing arrays on the stack versus the heap affect time and space efficiency?" + + Arrays stored on both the stack and heap are stored in continuous memory spaces, and data operation efficiency is essentially the same. However, stacks and heaps have their own characteristics, leading to the following differences. + + 1. Allocation and release efficiency: The stack is a smaller memory block, allocated automatically by the compiler; the heap memory is relatively larger and can be dynamically allocated in the code, more prone to fragmentation. Therefore, allocation and release operations on the heap are generally slower than on the stack. + 2. Size limitation: Stack memory is relatively small, while the heap size is generally limited by available memory. Therefore, the heap is more suitable for storing large arrays. + 3. Flexibility: The size of arrays on the stack needs to be determined at compile-time, while the size of arrays on the heap can be dynamically determined at runtime. + +!!! 
question "Why do arrays require elements of the same type, while linked lists do not emphasize same-type elements?" + + Linked lists consist of nodes connected by references (pointers), and each node can store data of different types, such as int, double, string, object, etc. + + In contrast, array elements must be of the same type, allowing the calculation of offsets to access the corresponding element positions. For example, an array containing both int and long types, with single elements occupying 4 bytes and 8 bytes respectively, cannot use the following formula to calculate offsets, as the array contains elements of two different lengths. + + ```shell + # Element memory address = Array memory address + Element length * Element index + ``` + +!!! question "After deleting a node, is it necessary to set `P.next` to `None`?" + + Not modifying `P.next` is also acceptable. From the perspective of the linked list, traversing from the head node to the tail node will no longer encounter `P`. This means that node `P` has been effectively removed from the list, and where `P` points no longer affects the list. + + From a garbage collection perspective, for languages with automatic garbage collection mechanisms like Java, Python, and Go, whether node `P` is collected depends on whether there are still references pointing to it, not on the value of `P.next`. In languages like C and C++, we need to manually free the node's memory. + +!!! question "In linked lists, the time complexity for insertion and deletion operations is `O(1)`. But searching for the element before insertion or deletion takes `O(n)` time, so why isn't the time complexity `O(n)`?" + + If an element is searched first and then deleted, the time complexity is indeed `O(n)`. However, the `O(1)` advantage of linked lists in insertion and deletion can be realized in other applications. 
For example, in the implementation of double-ended queues using linked lists, we maintain pointers always pointing to the head and tail nodes, making each insertion and deletion operation `O(1)`. + +!!! question "In the image 'Linked List Definition and Storage Method', do the light blue storage nodes occupy a single memory address, or do they share half with the node value?" + + The diagram is just a qualitative representation; quantitative analysis depends on specific situations. + + - Different types of node values occupy different amounts of space, such as int, long, double, and object instances. + - The memory space occupied by pointer variables depends on the operating system and compilation environment used, usually 8 bytes or 4 bytes. + +!!! question "Is adding elements to the end of a list always `O(1)`?" + + If adding an element exceeds the list length, the list needs to be expanded first. The system will request a new memory block and move all elements of the original list over, in which case the time complexity becomes `O(n)`. + +!!! question "The statement 'The emergence of lists greatly improves the practicality of arrays, but may lead to some memory space wastage' - does this refer to the memory occupied by additional variables like capacity, length, and expansion multiplier?" + + The space wastage here mainly refers to two aspects: on the one hand, lists are set with an initial length, which we may not always need; on the other hand, to prevent frequent expansion, expansion usually multiplies by a coefficient, such as $\times 1.5$. This results in many empty slots, which we typically cannot fully fill. + +!!! question "In Python, after initializing `n = [1, 2, 3]`, the addresses of these 3 elements are contiguous, but initializing `m = [2, 1, 3]` shows that each element's `id` is not consecutive but identical to those in `n`. If the addresses of these elements are not contiguous, is `m` still an array?" 
+ + If we replace list elements with linked list nodes `n = [n1, n2, n3, n4, n5]`, these 5 node objects are also typically dispersed throughout memory. However, given a list index, we can still access the node's memory address in `O(1)` time, thereby accessing the corresponding node. This is because the array stores references to the nodes, not the nodes themselves. + + Unlike many languages, in Python, numbers are also wrapped as objects, and lists store references to these numbers, not the numbers themselves. Therefore, we find that the same number in two arrays has the same `id`, and these numbers' memory addresses need not be contiguous. + +!!! question "The `std::list` in C++ STL has already implemented a doubly linked list, but it seems that some algorithm books don't directly use it. Is there any limitation?" + + On the one hand, we often prefer to use arrays to implement algorithms, only using linked lists when necessary, mainly for two reasons. + + - Space overhead: Since each element requires two additional pointers (one for the previous element and one for the next), `std::list` usually occupies more space than `std::vector`. + - Cache unfriendly: As the data is not stored contiguously, `std::list` has a lower cache utilization rate. Generally, `std::vector` performs better. + + On the other hand, linked lists are primarily necessary for binary trees and graphs. Stacks and queues are often implemented using the programming language's `stack` and `queue` classes, rather than linked lists. + +!!! question "Does initializing a list `res = [0] * self.size()` result in each element of `res` referencing the same address?" + + No. However, this issue arises with two-dimensional arrays. For example, initializing a two-dimensional list with `res = [[0]] * self.size()` would reference the same list `[0]` multiple times. + +!!! question "In deleting a node, is it necessary to break the reference to its successor node?"
+ + From the perspective of data structures and algorithms (problem-solving), it's okay not to break the link, as long as the program's logic is correct. From the perspective of standard libraries, breaking the link is safer and more logically clear. If the link is not broken, and the deleted node is not properly recycled, it could affect the recycling of the successor node's memory. diff --git a/docs-en/chapter_computational_complexity/index.md b/docs-en/chapter_computational_complexity/index.md index 5a1fd7fcb..db3800931 100644 --- a/docs-en/chapter_computational_complexity/index.md +++ b/docs-en/chapter_computational_complexity/index.md @@ -19,7 +19,7 @@ icon: material/timer-sand ## 本章内容 -- [2.1   Evaluating Algorithm Efficiency](https://www.hello-algo.com/chapter_computational_complexity/performance_evaluation/) +- [2.1   Algorithm Efficiency Assessment](https://www.hello-algo.com/chapter_computational_complexity/performance_evaluation/) - [2.2   Iteration and Recursion](https://www.hello-algo.com/chapter_computational_complexity/iteration_and_recursion/) - [2.3   Time Complexity](https://www.hello-algo.com/chapter_computational_complexity/time_complexity/) - [2.4   Space Complexity](https://www.hello-algo.com/chapter_computational_complexity/space_complexity/) diff --git a/docs-en/chapter_data_structure/basic_data_types.md b/docs-en/chapter_data_structure/basic_data_types.md new file mode 100644 index 000000000..c93d6f0d6 --- /dev/null +++ b/docs-en/chapter_data_structure/basic_data_types.md @@ -0,0 +1,172 @@ +--- +comments: true +--- + +# 3.2   Fundamental Data Types + +When we think of data in computers, we imagine various forms like text, images, videos, voice, 3D models, etc. Despite their different organizational forms, they are all composed of various fundamental data types. + +**Fundamental data types are those that the CPU can directly operate on** and are directly used in algorithms, mainly including the following. 
+ +- Integer types: `byte`, `short`, `int`, `long`. +- Floating-point types: `float`, `double`, used to represent decimals. +- Character type: `char`, used to represent letters, punctuation, and even emojis in various languages. +- Boolean type: `bool`, used for "yes" or "no" decisions. + +**Fundamental data types are stored in computers in binary form**. One binary digit is equal to 1 bit. In most modern operating systems, 1 byte consists of 8 bits. + +The range of values for fundamental data types depends on the size of the space they occupy. Below, we take Java as an example. + +- The integer type `byte` occupies 1 byte = 8 bits and can represent $2^8$ numbers. +- The integer type `int` occupies 4 bytes = 32 bits and can represent $2^{32}$ numbers. + +The following table lists the space occupied, value range, and default values of various fundamental data types in Java. This table does not need to be memorized, but understood roughly and referred to when needed. + +

Table 3-1   Space Occupied and Value Range of Fundamental Data Types

+ +
+ +| Type | Symbol | Space Occupied | Minimum Value | Maximum Value | Default Value | +| ------- | -------- | -------------- | ------------------------ | ----------------------- | -------------- | +| Integer | `byte` | 1 byte | $-2^7$ ($-128$) | $2^7 - 1$ ($127$) | 0 | +| | `short` | 2 bytes | $-2^{15}$ | $2^{15} - 1$ | 0 | +| | `int` | 4 bytes | $-2^{31}$ | $2^{31} - 1$ | 0 | +| | `long` | 8 bytes | $-2^{63}$ | $2^{63} - 1$ | 0 | +| Float | `float` | 4 bytes | $1.175 \times 10^{-38}$ | $3.403 \times 10^{38}$ | $0.0\text{f}$ | +| | `double` | 8 bytes | $2.225 \times 10^{-308}$ | $1.798 \times 10^{308}$ | 0.0 | +| Char | `char` | 2 bytes | 0 | $2^{16} - 1$ | 0 | +| Boolean | `bool` | 1 byte | $\text{false}$ | $\text{true}$ | $\text{false}$ | + +
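
The integer rows of the table follow directly from the bit width: Java integers use two's complement, so a type with $n$ bits covers $-2^{n-1}$ to $2^{n-1} - 1$. The following is a minimal sketch of that arithmetic; the helper name `signed_range` is hypothetical and used only for illustration.

```python
def signed_range(num_bytes: int) -> tuple[int, int]:
    """Return (min, max) of a two's-complement integer occupying num_bytes bytes."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Sizes in bytes, taken from the table
for name, size in [("byte", 1), ("short", 2), ("int", 4), ("long", 8)]:
    low, high = signed_range(size)
    print(f"{name}: {low} to {high}")
```

For `byte` this yields $-128$ to $127$, matching the first row of the table.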
+ +Please note that the above table is specific to Java's fundamental data types. Each programming language has its own data type definitions, and their space occupied, value ranges, and default values may differ. + +- In Python, the integer type `int` can be of any size, limited only by available memory; the floating-point `float` is double precision 64-bit; there is no `char` type, as a single character is actually a string `str` of length 1. +- C and C++ do not specify the size of fundamental data types, which varies with implementation and platform. The above table follows the LP64 [data model](https://en.cppreference.com/w/cpp/language/types#Properties), used for Unix 64-bit operating systems including Linux and macOS. +- The size of `char` in C and C++ is 1 byte, while in most programming languages, it depends on the specific character encoding method, as detailed in the "Character Encoding" chapter. +- Even though representing a boolean only requires 1 bit (0 or 1), it is usually stored in memory as 1 byte. This is because modern computer CPUs typically use 1 byte as the smallest addressable memory unit. + +So, what is the connection between fundamental data types and data structures? We know that data structures are ways to organize and store data in computers. The focus here is on "structure" rather than "data". + +If we want to represent "a row of numbers", we naturally think of using an array. This is because the linear structure of an array can represent the adjacency and order of numbers, but whether the stored content is an integer `int`, a decimal `float`, or a character `char`, is irrelevant to the "data structure". + +In other words, **fundamental data types provide the "content type" of data, while data structures provide the "way of organizing" data**. For example, in the following code, we use the same data structure (array) to store and represent different fundamental data types, including `int`, `float`, `char`, `bool`, etc. 
+ +=== "Python" + + ```python title="" + # Using various fundamental data types to initialize arrays + numbers: list[int] = [0] * 5 + decimals: list[float] = [0.0] * 5 + # Python's characters are actually strings of length 1 + characters: list[str] = ['0'] * 5 + bools: list[bool] = [False] * 5 + # Python's lists can freely store various fundamental data types and object references + data = [0, 0.0, 'a', False, ListNode(0)] + ``` + +=== "C++" + + ```cpp title="" + // Using various fundamental data types to initialize arrays + int numbers[5]; + float decimals[5]; + char characters[5]; + bool bools[5]; + ``` + +=== "Java" + + ```java title="" + // Using various fundamental data types to initialize arrays + int[] numbers = new int[5]; + float[] decimals = new float[5]; + char[] characters = new char[5]; + boolean[] bools = new boolean[5]; + ``` + +=== "C#" + + ```csharp title="" + // Using various fundamental data types to initialize arrays + int[] numbers = new int[5]; + float[] decimals = new float[5]; + char[] characters = new char[5]; + bool[] bools = new bool[5]; + ``` + +=== "Go" + + ```go title="" + // Using various fundamental data types to initialize arrays + var numbers = [5]int{} + var decimals = [5]float64{} + var characters = [5]byte{} + var bools = [5]bool{} + ``` + +=== "Swift" + + ```swift title="" + // Using various fundamental data types to initialize arrays + let numbers = Array(repeating: Int(), count: 5) + let decimals = Array(repeating: Double(), count: 5) + let characters = Array(repeating: Character("a"), count: 5) + let bools = Array(repeating: Bool(), count: 5) + ``` + +=== "JS" + + ```javascript title="" + // JavaScript's arrays can freely store various fundamental data types and objects + const array = [0, 0.0, 'a', false]; + ``` + +=== "TS" + + ```typescript title="" + // Using various fundamental data types to initialize arrays + const numbers: number[] = []; + const characters: string[] = []; + const bools: boolean[] = []; + ``` + +=== 
"Dart" + + ```dart title="" + // Using various fundamental data types to initialize arrays + List<int> numbers = List.filled(5, 0); + List<double> decimals = List.filled(5, 0.0); + List<String> characters = List.filled(5, 'a'); + List<bool> bools = List.filled(5, false); + ``` + +=== "Rust" + + ```rust title="" + // Using various fundamental data types to initialize arrays + let numbers: Vec<i32> = vec![0; 5]; + let decimals: Vec<f32> = vec![0.0; 5]; + let characters: Vec<char> = vec!['0'; 5]; + let bools: Vec<bool> = vec![false; 5]; + ``` + +=== "C" + + ```c title="" + // Using various fundamental data types to initialize arrays + int numbers[10]; + float decimals[10]; + char characters[10]; + bool bools[10]; + ``` + +=== "Zig" + + ```zig title="" + // Using various fundamental data types to initialize arrays + var numbers: [5]i32 = undefined; + var decimals: [5]f32 = undefined; + var characters: [5]u8 = undefined; + var bools: [5]bool = undefined; + ``` diff --git a/docs-en/chapter_data_structure/character_encoding.md b/docs-en/chapter_data_structure/character_encoding.md new file mode 100644 index 000000000..b3f23ddc1 --- /dev/null +++ b/docs-en/chapter_data_structure/character_encoding.md @@ -0,0 +1,97 @@ +--- +comments: true +--- + +# 3.4   Character Encoding * + +In computers, all data is stored in binary form, and the character `char` is no exception. To represent characters, we need to establish a "character set" that defines a one-to-one correspondence between each character and binary numbers. With a character set, computers can convert binary numbers to characters by looking up a table. + +## 3.4.1   ASCII Character Set + +The "ASCII code" is one of the earliest character sets, officially known as the American Standard Code for Information Interchange. It uses 7 binary digits (the lower 7 bits of a byte) to represent a character, allowing for a maximum of 128 different characters. 
As shown in Figure 3-6, ASCII includes uppercase and lowercase English letters, the digits 0 to 9, some punctuation marks, and some control characters (such as newline and tab). + +![ASCII Code](character_encoding.assets/ascii_table.png){ class="animation-figure" } + +

Figure 3-6   ASCII Code

+ +However, **ASCII can only represent English characters**. With the globalization of computers, a character set called "EASCII" was developed to represent more languages. It expands on the 7-bit basis of ASCII to 8 bits, enabling the representation of 256 different characters. + +Globally, a series of EASCII character sets for different regions emerged. The first 128 characters of these sets are uniformly ASCII, while the remaining 128 characters are defined differently to cater to various language requirements. + +## 3.4.2   GBK Character Set + +Later, it was found that **EASCII still could not meet the character requirements of many languages**. For instance, there are nearly a hundred thousand Chinese characters, with several thousand used in everyday life. In 1980, China's National Standards Bureau released the "GB2312" character set, which included 6763 Chinese characters, essentially meeting the computer processing needs for Chinese. + +However, GB2312 could not handle some rare and traditional characters. The "GBK" character set, an expansion of GB2312, includes a total of 21886 Chinese characters. In the GBK encoding scheme, ASCII characters are represented with one byte, while Chinese characters use two bytes. + +## 3.4.3   Unicode Character Set + +With the rapid development of computer technology and a plethora of character sets and encoding standards, numerous problems arose. On one hand, these character sets generally only defined characters for specific languages and could not function properly in multilingual environments. On the other hand, the existence of multiple character set standards for the same language caused garbled text when information was exchanged between computers using different encoding standards. 
+ +Researchers of that era thought: **What if we introduced a comprehensive character set that included all languages and symbols worldwide? Wouldn't that solve the problems of cross-language environments and garbled text?** Driven by this idea, the extensive character set, Unicode, was born. + +Unicode, known in Chinese as "统一码" (Unified Code), can theoretically accommodate over a million characters. It aims to incorporate characters from all over the world into a single set, providing a universal character set for processing and displaying various languages and reducing the issues of garbled text due to different encoding standards. + +Since its release in 1991, Unicode has continually expanded to include new languages and characters. As of September 2022, Unicode contains 149,186 characters, including characters, symbols, and even emojis from various languages. In the vast Unicode character set, commonly used characters occupy 2 bytes, while some rare characters take up 3 or even 4 bytes. + +Unicode is a universal character set that assigns a number (called a "code point") to each character, **but it does not specify how these character code points should be stored in a computer**. One might ask: When Unicode code points of varying lengths appear in a text, how does the system parse the characters? For example, given a 2-byte code, how does the system determine if it represents a single 2-byte character or two 1-byte characters? + +A straightforward solution to this problem is to store all characters as equal-length encodings. As shown in Figure 3-7, each character in "Hello" occupies 1 byte, while each character in "算法" (algorithm) occupies 2 bytes. We could encode all characters in "Hello 算法" as 2 bytes by padding the higher bits with zeros. This way, the system can parse a character every 2 bytes, recovering the content of the phrase. 
+ +![Unicode Encoding Example](character_encoding.assets/unicode_hello_algo.png){ class="animation-figure" } + +

Figure 3-7   Unicode Encoding Example

+ +However, as ASCII has shown us, encoding English only requires 1 byte. Using the above approach would double the space occupied by English text compared to ASCII encoding, which is a waste of memory space. Therefore, a more efficient Unicode encoding method is needed. + +## 3.4.4   UTF-8 Encoding + +Currently, UTF-8 has become the most widely used Unicode encoding method internationally. **It is a variable-length encoding**, using 1 to 4 bytes to represent a character, depending on the complexity of the character. ASCII characters need only 1 byte, Latin letters with diacritics and Greek letters require 2 bytes, commonly used Chinese characters need 3 bytes, and some other rare characters need 4 bytes. + +The encoding rules for UTF-8 are not complex and can be divided into two cases: + +- For 1-byte characters, set the highest bit to $0$, and the remaining 7 bits to the Unicode code point. Notably, ASCII characters occupy the first 128 code points in the Unicode set. This means that **UTF-8 encoding is backward compatible with ASCII**, so UTF-8 can be used to parse legacy ASCII text. +- For characters of length $n$ bytes (where $n > 1$), set the highest $n$ bits of the first byte to $1$, and the $(n + 1)^{\text{th}}$ bit to $0$; starting from the second byte, set the highest 2 bits of each byte to $10$; the rest of the bits are used to fill the Unicode code point. + +Figure 3-8 shows the UTF-8 encoding for "Hello算法". It can be observed that since the highest $n$ bits are set to $1$, the system can determine the length of the character as $n$ by counting the number of highest bits set to $1$. + +But why set the highest 2 bits of the remaining bytes to $10$? Actually, this $10$ serves as a kind of checksum. If the system starts parsing text from an incorrect byte, the $10$ at the beginning of the byte can help the system quickly detect an anomaly. 
+ +The reason for using $10$ as a checksum is that, under UTF-8 encoding rules, it's impossible for the highest two bits of a character to be $10$. This can be proven by contradiction: If the highest two bits of a character are $10$, it indicates that the character's length is $1$, corresponding to ASCII. However, the highest bit of an ASCII character should be $0$, contradicting the assumption. + +![UTF-8 Encoding Example](character_encoding.assets/utf-8_hello_algo.png){ class="animation-figure" } + +

Figure 3-8   UTF-8 Encoding Example
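
The two encoding rules can be sketched in Python. This is a minimal illustration rather than a production encoder: `utf8_encode` is a hypothetical helper name, and it assumes a valid code point (it does not reject surrogates or values above U+10FFFF). The result is compared with Python's built-in `str.encode`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point using the two UTF-8 rules."""
    # Rule 1: 1-byte characters keep the highest bit at 0 (ASCII range)
    if code_point < 0x80:
        return bytes([code_point])

    # Rule 2: choose the length n from the number of bits to store
    if code_point < 0x800:
        n = 2
    elif code_point < 0x10000:
        n = 3
    else:
        n = 4

    encoded = []
    # Continuation bytes: prefix 10, followed by 6 bits of the code point
    for _ in range(n - 1):
        encoded.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6
    # First byte: n ones, then a zero, then the remaining high bits
    prefix = (0xFF << (8 - n)) & 0xFF
    encoded.append(prefix | code_point)
    return bytes(reversed(encoded))


text = "Hello算法"
manual = b"".join(utf8_encode(ord(ch)) for ch in text)
print(manual == text.encode("utf-8"))
print([format(b, "08b") for b in utf8_encode(ord("算"))])
```

Printing the bytes in binary makes the `1110`, `10`, `10` prefixes of a 3-byte character visible, which is the pattern a decoder relies on to find character boundaries.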

+ +Apart from UTF-8, other common encoding methods include: + +- **UTF-16 Encoding**: Uses 2 or 4 bytes to represent a character. All ASCII characters and commonly used non-English characters are represented with 2 bytes; a few characters require 4 bytes. For 2-byte characters, the UTF-16 encoding is equal to the Unicode code point. +- **UTF-32 Encoding**: Every character uses 4 bytes. This means UTF-32 occupies more space than UTF-8 and UTF-16, especially for texts with a high proportion of ASCII characters. + +From the perspective of storage space, UTF-8 is highly efficient for representing English characters, requiring only 1 byte; UTF-16 might be more efficient for encoding some non-English characters (like Chinese), as it requires only 2 bytes, while UTF-8 might need 3 bytes. + +From a compatibility standpoint, UTF-8 is the most versatile, with many tools and libraries supporting UTF-8 as a priority. + +## 3.4.5   Character Encoding in Programming Languages + +In many classic programming languages, strings during program execution are encoded using fixed-length encodings like UTF-16 or UTF-32. This allows strings to be treated as arrays, offering several advantages: + +- **Random Access**: Strings encoded in UTF-16 can be accessed randomly with ease. For UTF-8, which is a variable-length encoding, locating the $i^{th}$ character requires traversing the string from the start to the $i^{th}$ position, taking $O(n)$ time. +- **Character Counting**: Similar to random access, counting the number of characters in a UTF-16 encoded string is an $O(1)$ operation. However, counting characters in a UTF-8 encoded string requires traversing the entire string. +- **String Operations**: Many string operations like splitting, concatenating, inserting, and deleting are easier on UTF-16 encoded strings. These operations generally require additional computation on UTF-8 encoded strings to ensure the validity of the UTF-8 encoding. 
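
The storage and random-access trade-offs described above can be observed with a short Python sketch. The helper names (`encoded_sizes`, `utf8_char_len`, `nth_char_utf8`, `nth_char_utf16`) are hypothetical; the code assumes well-formed input, and the UTF-16 helper additionally assumes that every character lies in the Basic Multilingual Plane, so each one occupies exactly 2 bytes.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte length of text under each encoding (the -le codecs add no byte order mark)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def utf8_char_len(first_byte: int) -> int:
    """Length of a UTF-8 character, read from the leading bits of its first byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    return 4


def nth_char_utf8(data: bytes, i: int) -> str:
    """Find the i-th character by walking from the start: O(n) time."""
    pos = 0
    for _ in range(i):
        pos += utf8_char_len(data[pos])
    return data[pos:pos + utf8_char_len(data[pos])].decode("utf-8")


def nth_char_utf16(data: bytes, i: int) -> str:
    """Find the i-th character by direct offset: O(1), valid for BMP-only text."""
    return data[2 * i:2 * i + 2].decode("utf-16-le")


for sample in ("Hello", "算法", "Hello算法"):
    print(sample, encoded_sizes(sample))

text = "Hello算法"
print(nth_char_utf8(text.encode("utf-8"), 5))
print(nth_char_utf16(text.encode("utf-16-le"), 5))
```

Both lookups return the same character, but the UTF-8 version has to inspect every preceding character, while the UTF-16 version computes the byte offset directly.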
+ +The design of character encoding schemes in programming languages is an interesting topic involving various factors: + +- Java’s `String` type uses UTF-16 encoding, with each character occupying 2 bytes. This was based on the initial belief that 16 bits were sufficient to represent all possible characters, a judgment later proven incorrect. As the Unicode standard expanded beyond 16 bits, characters in Java may now be represented by a pair of 16-bit values, known as “surrogate pairs.” +- JavaScript and TypeScript use UTF-16 encoding for similar reasons as Java. When JavaScript was first introduced by Netscape in 1995, Unicode was still in its early stages, and 16-bit encoding was sufficient to represent all Unicode characters. +- C# uses UTF-16 encoding, largely because the .NET platform, designed by Microsoft, and many Microsoft technologies, including the Windows operating system, extensively use UTF-16 encoding. + +Due to the underestimation of character counts, these languages had to resort to using "surrogate pairs" to represent Unicode characters exceeding 16 bits. This approach has its drawbacks: strings containing surrogate pairs may have characters occupying 2 or 4 bytes, losing the advantage of fixed-length encoding, and handling surrogate pairs adds to the complexity and debugging difficulty of programming. + +Owing to these reasons, some programming languages have adopted different encoding schemes: + +- Python’s `str` type uses Unicode encoding with a flexible representation where the storage length of characters depends on the largest Unicode code point in the string. If all characters are ASCII, each character occupies 1 byte; if characters exceed ASCII but are within the Basic Multilingual Plane (BMP), each occupies 2 bytes; if characters exceed the BMP, each occupies 4 bytes. +- Go’s `string` type internally uses UTF-8 encoding. Go also provides the `rune` type for representing individual Unicode code points. 
+- Rust’s `str` and `String` types use UTF-8 encoding internally. Rust also offers the `char` type for individual Unicode code points. + +It’s important to note that the above discussion pertains to how strings are stored in programming languages, **which is a different issue from how strings are stored in files or transmitted over networks**. For file storage or network transmission, strings are usually encoded in UTF-8 format for optimal compatibility and space efficiency. diff --git a/docs-en/chapter_data_structure/classification_of_data_structure.md b/docs-en/chapter_data_structure/classification_of_data_structure.md index 4ec268b1d..766813b39 100644 --- a/docs-en/chapter_data_structure/classification_of_data_structure.md +++ b/docs-en/chapter_data_structure/classification_of_data_structure.md @@ -1,49 +1,58 @@ -# Classification Of Data Structures +--- +comments: true +--- -Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be divided into two categories: logical structure and physical structure. +# 3.1   Classification of Data Structures -## Logical Structures: Linear And Non-linear +Common data structures include arrays, linked lists, stacks, queues, hash tables, trees, heaps, and graphs. They can be classified into two dimensions: "Logical Structure" and "Physical Structure". -**Logical structures reveal logical relationships between data elements**. In arrays and linked lists, data are arranged in sequential order, reflecting the linear relationship between data; while in trees, data are arranged hierarchically from the top down, showing the derived relationship between ancestors and descendants; and graphs are composed of nodes and edges, reflecting the complex network relationship. +## 3.1.1   Logical Structure: Linear and Non-Linear -As shown in the figure below, logical structures can further be divided into "linear data structure" and "non-linear data structure". 
Linear data structures are more intuitive, meaning that the data are arranged linearly in terms of logical relationships; non-linear data structures, on the other hand, are arranged non-linearly. +**The logical structure reveals the logical relationships between data elements**. In arrays and linked lists, data is arranged in a certain order, reflecting a linear relationship between them. In trees, data is arranged from top to bottom in layers, showing an "ancestor-descendant" hierarchical relationship. Graphs, consisting of nodes and edges, represent complex network relationships. -- **Linear data structures**: arrays, linked lists, stacks, queues, hash tables. -- **Nonlinear data structures**: trees, heaps, graphs, hash tables. +As shown in Figure 3-1, logical structures can be divided into two major categories: "Linear" and "Non-linear". Linear structures are more intuitive, indicating data is arranged linearly in logical relationships; non-linear structures, conversely, are arranged non-linearly. -![Linear and nonlinear data structures](classification_of_data_structure.assets/classification_logic_structure.png) +- **Linear Data Structures**: Arrays, Linked Lists, Stacks, Queues, Hash Tables. +- **Non-Linear Data Structures**: Trees, Heaps, Graphs, Hash Tables. -Non-linear data structures can be further divided into tree and graph structures. +![Linear and Non-Linear Data Structures](classification_of_data_structure.assets/classification_logic_structure.png){ class="animation-figure" } -- **Linear structures**: arrays, linked lists, queues, stacks, hash tables, with one-to-one sequential relationship between elements. -- **Tree structure**: tree, heap, hash table, with one-to-many relationship between elements. -- **Graph**: graph with many-to-many relationship between elements. +

Figure 3-1   Linear and Non-Linear Data Structures

-## Physical Structure: Continuous vs. Dispersed +Non-linear data structures can be further divided into tree structures and network structures. -**When an algorithm is running, the data being processed is stored in memory**. The figure below shows a computer memory module where each black square represents a memory space. We can think of the memory as a giant Excel sheet in which each cell can store data of a certain size. +- **Tree Structures**: Trees, Heaps, Hash Tables, where elements have one-to-many relationships. +- **Network Structures**: Graphs, where elements have many-to-many relationships. -**The system accesses the data at the target location by means of a memory address**. As shown in the figure below, the computer assigns a unique identifier to each cell in the table according to specific rules, ensuring that each memory space has a unique memory address. With these addresses, the program can access the data in memory. +## 3.1.2   Physical Structure: Contiguous and Dispersed -![memory_strip, memory_space, memory_address](classification_of_data_structure.assets/computer_memory_location.png) +**When an algorithm program runs, the data being processed is mainly stored in memory**. The following figure shows a computer memory stick, each black block containing a memory space. We can imagine memory as a huge Excel spreadsheet, where each cell can store a certain amount of data. + +**The system accesses data at the target location through memory addresses**. As shown in the Figure 3-2 , the computer allocates numbers to each cell in the table according to specific rules, ensuring each memory space has a unique memory address. With these addresses, programs can access data in memory. + +![Memory Stick, Memory Spaces, Memory Addresses](classification_of_data_structure.assets/computer_memory_location.png){ class="animation-figure" } + +

Figure 3-2   Memory Stick, Memory Spaces, Memory Addresses

!!! tip - It is worth noting that comparing memory to the Excel sheet is a simplified analogy. The actual memory working mechanism is more complicated, involving the concepts of address, space, memory management, cache mechanism, virtual and physical memory. + It's worth noting that comparing memory to an Excel spreadsheet is a simplified analogy. The actual working mechanism of memory is more complex, involving concepts like address space, memory management, cache mechanisms, virtual memory, and physical memory. -Memory is a shared resource for all programs, and when a block of memory is occupied by one program, it cannot be used by other programs at the same time. **Therefore, considering memory resources is crucial in designing data structures and algorithms**. For example, the algorithm's peak memory usage should not exceed the remaining free memory of the system; if there is a lack of contiguous memory blocks, then the data structure chosen must be able to be stored in non-contiguous memory blocks. +Memory is a shared resource for all programs. When a block of memory is occupied by one program, it cannot be used by others simultaneously. **Therefore, memory resources are an important consideration in the design of data structures and algorithms**. For example, the peak memory usage of an algorithm should not exceed the system's remaining free memory. If there is a lack of contiguous large memory spaces, the chosen data structure must be able to store data in dispersed memory spaces. -As shown in the figure below, **Physical structure reflects the way data is stored in computer memory and it can be divided into consecutive space storage (arrays) and distributed space storage (linked lists)**. The physical structure determines how data is accessed, updated, added, deleted, etc. Logical and physical structure complement each other in terms of time efficiency and space efficiency. 
+As shown in Figure 3-3, **the physical structure reflects how data is stored in computer memory**, which can be divided into contiguous space storage (arrays) and dispersed space storage (linked lists). The physical structure determines, at the lowest level, how data is accessed, updated, added, or deleted. The two types of physical structure have complementary characteristics in terms of time efficiency and space efficiency. -![continuous vs. decentralized spatial storage](classification_of_data_structure.assets/classification_phisical_structure.png) +![Contiguous Space Storage and Dispersed Space Storage](classification_of_data_structure.assets/classification_phisical_structure.png){ class="animation-figure" } -**It is worth stating that all data structures are implemented based on arrays, linked lists, or a combination of the two**. For example, stacks and queues can be implemented using both arrays and linked lists; and implementations of hash tables may contain both arrays and linked lists. +

Figure 3-3   Contiguous Space Storage and Dispersed Space Storage

-- **Array-based structures**: stacks, queues, hash tables, trees, heaps, graphs, matrices, tensors (arrays of dimension $\geq 3$), and so on. -- **Linked list-based structures**: stacks, queues, hash tables, trees, heaps, graphs, etc. +It's important to note that **all data structures are implemented based on arrays, linked lists, or a combination of both**. For example, stacks and queues can be implemented using either arrays or linked lists; while hash tables may include both arrays and linked lists. -Data structures based on arrays are also known as "static data structures", which means that such structures' length remains constant after initialization. In contrast, data structures based on linked lists are called "dynamic data structures", meaning that their length can be adjusted during program execution after initialization. +- **Array-based Implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, Matrices, Tensors (arrays with dimensions $\geq 3$). +- **Linked List-based Implementations**: Stacks, Queues, Hash Tables, Trees, Heaps, Graphs, etc. + +Data structures implemented based on arrays are also called “Static Data Structures,” meaning their length cannot be changed after initialization. Conversely, those based on linked lists are called “Dynamic Data Structures,” which can still adjust their size during program execution. !!! tip - If you find it difficult to understand the physical structure, it is recommended that you read the next chapter, "Arrays and Linked Lists," before reviewing this section. + If you find it difficult to understand the physical structure, it's recommended to read the next chapter first and then revisit this section. 
diff --git a/docs-en/chapter_data_structure/index.md b/docs-en/chapter_data_structure/index.md index 147eee85c..7966e7227 100644 --- a/docs-en/chapter_data_structure/index.md +++ b/docs-en/chapter_data_structure/index.md @@ -1,13 +1,26 @@ -# Data Structure +--- +comments: true +icon: material/shape-outline +--- + +# Chapter 3.   Data Structures
-![data structure](../assets/covers/chapter_data_structure.jpg) +![Data Structures](../assets/covers/chapter_data_structure.jpg){ class="cover-image" }
!!! abstract - Data structures resemble a stable and diverse framework. - - They serve as a blueprint for organizing data orderly, enabling algorithms to come to life upon this foundation. + Data structures serve as a robust and diverse framework. + + They offer a blueprint for the orderly organization of data, upon which algorithms come to life. + +## Chapter Contents + +- [3.1   Classification of Data Structures](https://www.hello-algo.com/chapter_data_structure/classification_of_data_structure/) +- [3.2   Fundamental Data Types](https://www.hello-algo.com/chapter_data_structure/basic_data_types/) +- [3.3   Number Encoding *](https://www.hello-algo.com/chapter_data_structure/number_encoding/) +- [3.4   Character Encoding *](https://www.hello-algo.com/chapter_data_structure/character_encoding/) +- [3.5   Summary](https://www.hello-algo.com/chapter_data_structure/summary/) diff --git a/docs-en/chapter_data_structure/number_encoding.md b/docs-en/chapter_data_structure/number_encoding.md new file mode 100644 index 000000000..7022b3f63 --- /dev/null +++ b/docs-en/chapter_data_structure/number_encoding.md @@ -0,0 +1,162 @@ +--- +comments: true +--- + +# 3.3   Number Encoding * + +!!! note + + In this book, chapters marked with an * symbol are optional reads. If you are short on time or find them challenging, you may skip these initially and return to them after completing the essential chapters. + +## 3.3.1   Integer Encoding + +In the table from the previous section, we noticed that all integer types can represent one more negative number than positive numbers, such as the `byte` range of $[-128, 127]$. This phenomenon, somewhat counterintuitive, is rooted in the concepts of sign-magnitude, one's complement, and two's complement encoding. + +Firstly, it's important to note that **integers are stored in computers in two's complement form**. 
Before analyzing why this is the case, let's define these three encoding methods: + +- **Sign-magnitude**: The highest bit of a binary representation of a number is considered the sign bit, where $0$ represents a positive number and $1$ represents a negative number. The remaining bits represent the value of the number. +- **One's complement**: The one's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by inverting all bits except the sign bit. +- **Two's complement**: The two's complement of a positive number is the same as its sign-magnitude. For negative numbers, it's obtained by adding $1$ to their one's complement. + +The following diagram illustrates the conversions among sign-magnitude, one's complement, and two's complement: + +![Conversions between Sign-Magnitude, One's Complement, and Two's Complement](number_encoding.assets/1s_2s_complement.png){ class="animation-figure" } + +

Figure 3-4   Conversions between Sign-Magnitude, One's Complement, and Two's Complement
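
The three definitions can be turned into a short Python sketch for 8-bit values. The function names are hypothetical, and the code assumes inputs in the interval $[-127, 127]$, where all three encodings exist. As a cross-check, the two's complement result is compared with Python's own masking expression `x & 0xFF`.

```python
BITS = 8  # width of a `byte`


def sign_magnitude(x: int) -> str:
    """Highest bit is the sign; the remaining bits hold the absolute value."""
    sign = 1 if x < 0 else 0
    return format((sign << (BITS - 1)) | abs(x), f"0{BITS}b")


def ones_complement(x: int) -> str:
    """Same as sign-magnitude for x >= 0; otherwise invert all bits except the sign bit."""
    code = int(sign_magnitude(x), 2)
    if x < 0:
        code ^= (1 << (BITS - 1)) - 1
    return format(code, f"0{BITS}b")


def twos_complement(x: int) -> str:
    """Same as sign-magnitude for x >= 0; otherwise one's complement plus 1."""
    code = int(ones_complement(x), 2)
    if x < 0:
        code = (code + 1) & ((1 << BITS) - 1)
    return format(code, f"0{BITS}b")


for value in (1, -1, -2, -127):
    print(value, sign_magnitude(value), ones_complement(value), twos_complement(value))
```

For example, $-2$ yields `10000010`, `11111101`, and `11111110` in the three encodings respectively.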

+ +Although sign-magnitude is the most intuitive, it has limitations. For one, **negative numbers in sign-magnitude cannot be directly used in calculations**. For example, in sign-magnitude, calculating $1 + (-2)$ results in $-3$, which is incorrect. + +$$ +\begin{aligned} +& 1 + (-2) \newline +& \rightarrow 0000 \; 0001 + 1000 \; 0010 \newline +& = 1000 \; 0011 \newline +& \rightarrow -3 +\end{aligned} +$$ + +To address this, computers introduced the **one's complement**. If we convert to one's complement and calculate $1 + (-2)$, then convert the result back to sign-magnitude, we get the correct result of $-1$. + +$$ +\begin{aligned} +& 1 + (-2) \newline +& \rightarrow 0000 \; 0001 \; \text{(Sign-magnitude)} + 1000 \; 0010 \; \text{(Sign-magnitude)} \newline +& = 0000 \; 0001 \; \text{(One's complement)} + 1111 \; 1101 \; \text{(One's complement)} \newline +& = 1111 \; 1110 \; \text{(One's complement)} \newline +& = 1000 \; 0001 \; \text{(Sign-magnitude)} \newline +& \rightarrow -1 +\end{aligned} +$$ + +Additionally, **there are two representations of zero in sign-magnitude**: $+0$ and $-0$. This means two different binary encodings for zero, which could lead to ambiguity. For example, in conditional checks, not differentiating between positive and negative zero might result in incorrect outcomes. Addressing this ambiguity would require additional checks, potentially reducing computational efficiency. + +$$ +\begin{aligned} ++0 & \rightarrow 0000 \; 0000 \newline +-0 & \rightarrow 1000 \; 0000 +\end{aligned} +$$ + +Like sign-magnitude, one's complement also suffers from the positive and negative zero ambiguity. Therefore, computers further introduced the **two's complement**. 
Let's observe the conversion process for negative zero in sign-magnitude, one's complement, and two's complement: + +$$ +\begin{aligned} +-0 \rightarrow \; & 1000 \; 0000 \; \text{(Sign-magnitude)} \newline += \; & 1111 \; 1111 \; \text{(One's complement)} \newline += 1 \; & 0000 \; 0000 \; \text{(Two's complement)} \newline +\end{aligned} +$$ + +Adding $1$ to the one's complement of negative zero produces a carry, but with `byte` length being only 8 bits, the carried-over $1$ to the 9th bit is discarded. Therefore, **the two's complement of negative zero is $0000 \; 0000$**, the same as positive zero, thus resolving the ambiguity. + +One last puzzle is the $[-128, 127]$ range for `byte`, with an additional negative number, $-128$. We observe that for the interval $[-127, +127]$, all integers have corresponding sign-magnitude, one's complement, and two's complement, and these can be converted between each other. + +However, **the two's complement $1000 \; 0000$ is an exception without a corresponding sign-magnitude**. According to the conversion method, its sign-magnitude would be $0000 \; 0000$, which is a contradiction since this represents zero, and its two's complement should be itself. Computers designate this special two's complement $1000 \; 0000$ as representing $-128$. In fact, the calculation of $(-1) + (-127)$ in two's complement results in $-128$. 
+ +$$ +\begin{aligned} +& (-127) + (-1) \newline +& \rightarrow 1111 \; 1111 \; \text{(Sign-magnitude)} + 1000 \; 0001 \; \text{(Sign-magnitude)} \newline +& = 1000 \; 0000 \; \text{(One's complement)} + 1111 \; 1110 \; \text{(One's complement)} \newline +& = 1000 \; 0001 \; \text{(Two's complement)} + 1111 \; 1111 \; \text{(Two's complement)} \newline +& = 1000 \; 0000 \; \text{(Two's complement)} \newline +& \rightarrow -128 +\end{aligned} +$$ + +As you might have noticed, all these calculations are additions, hinting at an important fact: **computers' internal hardware circuits are primarily designed around addition operations**. This is because addition is simpler to implement in hardware compared to other operations like multiplication, division, and subtraction, allowing for easier parallelization and faster computation. + +It's important to note that this doesn't mean computers can only perform addition. **By combining addition with basic logical operations, computers can execute a variety of other mathematical operations**. For example, the subtraction $a - b$ can be translated into $a + (-b)$; multiplication and division can be translated into multiple additions or subtractions. + +We can now summarize the reason for using two's complement in computers: with two's complement representation, computers can use the same circuits and operations to handle both positive and negative number addition, eliminating the need for special hardware circuits for subtraction and avoiding the ambiguity of positive and negative zero. This greatly simplifies hardware design and enhances computational efficiency. + +The design of two's complement is quite ingenious, and due to space constraints, we'll stop here. Interested readers are encouraged to explore further. + +## 3.3.2   Floating-Point Number Encoding + +You might have noticed something intriguing: despite having the same length of 4 bytes, why does a `float` have a much larger range of values compared to an `int`? 
This seems counterintuitive, as one would expect the range to shrink for `float` since it needs to represent fractions. + +In fact, **this is due to the different representation method used by floating-point numbers (`float`)**. Let's consider a 32-bit binary number as: + +$$ +b_{31} b_{30} b_{29} \ldots b_2 b_1 b_0 +$$ + +According to the IEEE 754 standard, a 32-bit `float` consists of the following three parts: + +- Sign bit $\mathrm{S}$: Occupies 1 bit, corresponding to $b_{31}$. +- Exponent bit $\mathrm{E}$: Occupies 8 bits, corresponding to $b_{30} b_{29} \ldots b_{23}$. +- Fraction bit $\mathrm{N}$: Occupies 23 bits, corresponding to $b_{22} b_{21} \ldots b_0$. + +The value of a binary `float` number is calculated as: + +$$ +\text{val} = (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \ldots b_{23}\right)_2 - 127} \times \left(1 . b_{22} b_{21} \ldots b_0\right)_2 +$$ + +Converted to a decimal formula, this becomes: + +$$ +\text{val} = (-1)^{\mathrm{S}} \times 2^{\mathrm{E} - 127} \times (1 + \mathrm{N}) +$$ + +The range of each component is: + +$$ +\begin{aligned} +\mathrm{S} \in & \{ 0, 1\}, \quad \mathrm{E} \in \{ 1, 2, \dots, 254 \} \newline +(1 + \mathrm{N}) = & (1 + \sum_{i=1}^{23} b_{23-i} \times 2^{-i}) \subset [1, 2 - 2^{-23}] +\end{aligned} +$$ + +![Example Calculation of a float in IEEE 754 Standard](number_encoding.assets/ieee_754_float.png){ class="animation-figure" } + +

Figure 3-5   Example Calculation of a float in IEEE 754 Standard

+ +In the example shown in the figure, $\mathrm{S} = 0$, $\mathrm{E} = 124$, and $\mathrm{N} = 2^{-2} + 2^{-3} = 0.375$, so we have: + +$$ +\text{val} = (-1)^0 \times 2^{124 - 127} \times (1 + 0.375) = 0.171875 +$$ + +Now we can answer the initial question: **the representation of `float` includes exponent bits, which gives it a much larger range than `int`**. Based on the formula above, the largest positive number representable by `float` is $2^{254 - 127} \times (2 - 2^{-23}) \approx 3.4 \times 10^{38}$, and the most negative number is obtained by flipping the sign bit. + +**However, the trade-off for `float`'s expanded range is a sacrifice in precision**. The integer type `int` uses all 32 bits to represent the number, so its values are evenly spaced; because of the exponent bits, the larger the magnitude of a `float`, the greater the gap between adjacent representable numbers. + +As shown in Table 3-2, the exponent values $\mathrm{E} = 0$ and $\mathrm{E} = 255$ have special meanings: **they are used to represent zero, infinity, $\mathrm{NaN}$, etc.** + +

Table 3-2   Meaning of Exponent Bits

+ +
+ +| Exponent Bit E | Fraction Bit $\mathrm{N} = 0$ | Fraction Bit $\mathrm{N} \ne 0$ | Calculation Formula | +| ------------------ | ----------------------------- | ------------------------------- | ---------------------------------------------------------------------- | +| $0$ | $\pm 0$ | Subnormal Numbers | $(-1)^{\mathrm{S}} \times 2^{-126} \times (0.\mathrm{N})$ | +| $1, 2, \dots, 254$ | Normal Numbers | Normal Numbers | $(-1)^{\mathrm{S}} \times 2^{(\mathrm{E} -127)} \times (1.\mathrm{N})$ | +| $255$ | $\pm \infty$ | $\mathrm{NaN}$ | | + +
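To make the formula and the special cases in Table 3-2 concrete, here is a minimal Python sketch that splits a 32-bit pattern into $\mathrm{S}$, $\mathrm{E}$, and $\mathrm{N}$ and evaluates it by hand. The function name `decode_float32` is an assumption made for this illustration; the standard `struct` module is used only to cross-check the result.

```python
import math
import struct


def decode_float32(bits: int) -> float:
    """Evaluate a 32-bit IEEE 754 pattern from its S, E, and N fields."""
    s = (bits >> 31) & 0x1          # sign bit b31
    e = (bits >> 23) & 0xFF         # exponent bits b30..b23
    n = (bits & 0x7FFFFF) / 2**23   # fraction bits b22..b0 as a value in [0, 1)
    sign = -1.0 if s else 1.0
    if e == 0:                      # zero (N = 0) or subnormal number (N != 0)
        return sign * 2.0**-126 * n
    if e == 255:                    # infinity (N = 0) or NaN (N != 0)
        return sign * math.inf if n == 0 else math.nan
    return sign * 2.0 ** (e - 127) * (1 + n)  # normal number


# S = 0, E = 124, N = 0.375: the example from Figure 3-5
bits = (0 << 31) | (124 << 23) | int(0.375 * 2**23)
print(decode_float32(bits))  # 0.171875
# Cross-check against Python's own float32 decoding
print(struct.unpack(">f", bits.to_bytes(4, "big"))[0])  # 0.171875
```

The three branches mirror the three rows of the table: $\mathrm{E} = 0$, $\mathrm{E} = 255$, and the normal range in between.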
+ +It's worth noting that subnormal numbers significantly improve the precision of floating-point numbers. The smallest positive normal number is $2^{-126}$, and the smallest positive subnormal number is $2^{-126} \times 2^{-23}$. + +Double-precision `double` also uses a similar representation method to `float`, which is not elaborated here for brevity. diff --git a/docs-en/chapter_data_structure/summary.md b/docs-en/chapter_data_structure/summary.md new file mode 100644 index 000000000..1eec400ed --- /dev/null +++ b/docs-en/chapter_data_structure/summary.md @@ -0,0 +1,37 @@ +--- +comments: true +--- + +# 3.5   Summary + +### 1.   Key Review + +- Data structures can be categorized from two perspectives: logical structure and physical structure. Logical structure describes the logical relationships between data elements, while physical structure describes how data is stored in computer memory. +- Common logical structures include linear, tree-like, and network structures. We generally classify data structures into linear (arrays, linked lists, stacks, queues) and non-linear (trees, graphs, heaps) based on their logical structure. The implementation of hash tables may involve both linear and non-linear data structures. +- When a program runs, data is stored in computer memory. Each memory space has a corresponding memory address, and the program accesses data through these addresses. +- Physical structures are primarily divided into contiguous space storage (arrays) and dispersed space storage (linked lists). All data structures are implemented using arrays, linked lists, or a combination of both. +- Basic data types in computers include integers (`byte`, `short`, `int`, `long`), floating-point numbers (`float`, `double`), characters (`char`), and booleans (`boolean`). Their range depends on the size of the space occupied and the representation method. 
+- Sign-magnitude, one's complement, and two's complement are three methods of encoding numbers in computers, and they can be converted into one another. The highest bit of the sign-magnitude of an integer is the sign bit, and the remaining bits represent the value of the number. +- Integers are stored in computers in the form of two's complement. In this representation, the computer can treat the addition of positive and negative numbers uniformly, without the need for special hardware circuits for subtraction, and there is no ambiguity of positive and negative zero. +- A 32-bit `float` is encoded with 1 sign bit, 8 exponent bits, and 23 fraction bits. Because of the exponent bits, the range of floating-point numbers is much greater than that of integers of the same length, but at the cost of precision. +- ASCII is the earliest English character set; each character fits in 1 byte, and it defines 128 characters. The GBK character set is a commonly used Chinese character set, including more than 20,000 Chinese characters. Unicode strives to provide a complete character set standard, including characters from various languages worldwide, thus solving the problem of garbled characters caused by inconsistent character encoding methods. +- UTF-8 is the most popular Unicode encoding method, with excellent universality. It is a variable-length encoding method with good scalability and effectively improves the efficiency of space usage. UTF-32 is a fixed-length encoding method, while UTF-16 uses 2 or 4 bytes per character. When encoding Chinese characters, UTF-16 occupies less space than UTF-8. Programming languages like Java and C# use UTF-16 encoding by default. + +### 2.   Q & A + +!!! question "Why does a hash table contain both linear and non-linear data structures?" + + The underlying structure of a hash table is an array. 
To resolve hash collisions, we may use "chaining": each bucket in the array points to a linked list, which, when exceeding a certain threshold, might be transformed into a tree (usually a red-black tree). + From a storage perspective, the foundation of a hash table is an array, where each bucket slot might contain a value, a linked list, or a tree. Therefore, hash tables may contain both linear data structures (arrays, linked lists) and non-linear data structures (trees). + +!!! question "Is the length of the `char` type 1 byte?" + + The length of the `char` type is determined by the encoding method used by the programming language. For example, Java, JavaScript, TypeScript, and C# all use UTF-16 encoding (to store Unicode code points), so the length of the `char` type is 2 bytes. + +!!! question "Is there ambiguity in calling data structures based on arrays 'static data structures'? Because operations like push and pop on stacks are 'dynamic.'" + + While stacks indeed allow for dynamic data operations, the data structure itself remains "static" (with unchangeable length). Even though data structures based on arrays can dynamically add or remove elements, their capacity is fixed. If the data volume exceeds the pre-allocated size, a new, larger array needs to be created, and the contents of the old array copied into it. + +!!! question "When building stacks (queues) without specifying their size, why are they considered 'static data structures'?" + + In high-level programming languages, we don't need to manually specify the initial capacity of stacks (queues); this task is automatically handled internally by the class. For example, the initial capacity of Java's `ArrayList` is usually 10. Furthermore, the expansion operation is also implemented automatically. See the subsequent "List" chapter for details. 
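The expansion behavior mentioned in the answers above can be sketched as follows. This is a simplified illustration, not the implementation used by any particular language: the class name `ArrayStack`, the initial capacity of 4, and the doubling factor are all assumptions.

```python
class ArrayStack:
    """Stack backed by a fixed-capacity array that is replaced when full."""

    def __init__(self, capacity: int = 4):
        # Assumes capacity >= 1; the backing array never changes length in place
        self._data: list[int] = [0] * capacity
        self._size = 0  # number of slots currently in use

    def capacity(self) -> int:
        return len(self._data)

    def push(self, value: int) -> None:
        if self._size == len(self._data):
            # Backing array is full: allocate a larger one and copy the old contents
            bigger = [0] * (2 * len(self._data))
            bigger[: self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def pop(self) -> int:
        if self._size == 0:
            raise IndexError("pop from empty stack")
        self._size -= 1
        return self._data[self._size]


stack = ArrayStack()
for i in range(5):
    stack.push(i)
print(stack.capacity())  # 8
print(stack.pop())       # 4
```

From the caller's side the stack looks dynamic, while each backing array keeps a fixed length and is replaced only when it fills up.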
diff --git a/docs-en/chapter_introduction/index.md b/docs-en/chapter_introduction/index.md index f7c583613..87736a1cd 100644 --- a/docs-en/chapter_introduction/index.md +++ b/docs-en/chapter_introduction/index.md @@ -19,6 +19,6 @@ icon: material/calculator-variant-outline ## 本章内容 -- [1.1   Algorithms Are Everywhere](https://www.hello-algo.com/chapter_introduction/algorithms_are_everywhere/) -- [1.2   What Is Algorithms](https://www.hello-algo.com/chapter_introduction/what_is_dsa/) +- [1.1   Algorithms are Everywhere](https://www.hello-algo.com/chapter_introduction/algorithms_are_everywhere/) +- [1.2   What is an Algorithm](https://www.hello-algo.com/chapter_introduction/what_is_dsa/) - [1.3   Summary](https://www.hello-algo.com/chapter_introduction/summary/) diff --git a/docs-en/chapter_preface/index.md b/docs-en/chapter_preface/index.md index 52239826f..1f7ef8679 100644 --- a/docs-en/chapter_preface/index.md +++ b/docs-en/chapter_preface/index.md @@ -19,6 +19,6 @@ icon: material/book-open-outline ## 本章内容 -- [0.1   The Book](https://www.hello-algo.com/chapter_preface/about_the_book/) +- [0.1   About This Book](https://www.hello-algo.com/chapter_preface/about_the_book/) - [0.2   How to Read](https://www.hello-algo.com/chapter_preface/suggestions/) - [0.3   Summary](https://www.hello-algo.com/chapter_preface/summary/) diff --git a/docs-en/chapter_preface/suggestions.md b/docs-en/chapter_preface/suggestions.md index 762633e18..dbe7b8e23 100644 --- a/docs-en/chapter_preface/suggestions.md +++ b/docs-en/chapter_preface/suggestions.md @@ -2,7 +2,7 @@ comments: true --- -# 0.2   How To Read +# 0.2   How to Read !!! 
tip diff --git a/docs/chapter_data_structure/character_encoding.md b/docs/chapter_data_structure/character_encoding.md index 8fff382e0..a86876558 100644 --- a/docs/chapter_data_structure/character_encoding.md +++ b/docs/chapter_data_structure/character_encoding.md @@ -92,6 +92,6 @@ UTF-8 的编码规则并不复杂,分为以下两种情况。 - Python 中的 `str` 使用 Unicode 编码,并采用一种灵活的字符串表示,存储的字符长度取决于字符串中最大的 Unicode 码点。若字符串中全部是 ASCII 字符,则每个字符占用 1 字节;如果有字符超出了 ASCII 范围,但全部在基本多语言平面(BMP)内,则每个字符占用 2 字节;如果有超出 BMP 的字符,则每个字符占用 4 字节。 - Go 语言的 `string` 类型在内部使用 UTF-8 编码。Go 语言还提供了 `rune` 类型,它用于表示单个 Unicode 码点。 -- Rust 语言的 str 和 String 类型在内部使用 UTF-8 编码。Rust 也提供了 `char` 类型,用于表示单个 Unicode 码点。 +- Rust 语言的 `str` 和 `String` 类型在内部使用 UTF-8 编码。Rust 也提供了 `char` 类型,用于表示单个 Unicode 码点。 需要注意的是,以上讨论的都是字符串在编程语言中的存储方式,**这和字符串如何在文件中存储或在网络中传输是不同的问题**。在文件存储或网络传输中,我们通常会将字符串编码为 UTF-8 格式,以达到最优的兼容性和空间效率。 diff --git a/docs/chapter_searching/binary_search.md b/docs/chapter_searching/binary_search.md index a804aa166..b08ef149d 100755 --- a/docs/chapter_searching/binary_search.md +++ b/docs/chapter_searching/binary_search.md @@ -569,7 +569,7 @@ comments: true if nums[m as usize] < target { // 此情况说明 target 在区间 [m+1, j) 中 i = m + 1; } else if nums[m as usize] > target { // 此情况说明 target 在区间 [i, m) 中 - j = m - 1; + j = m; } else { // 找到目标元素,返回其索引 return m; }