Merge pull request #138 from youyun/english

issue 137 translation of think_like_computer/前缀和技巧.md
This commit is contained in:
labuladong
2020-03-04 08:44:46 +08:00
committed by GitHub
17 changed files with 349 additions and 355 deletions

BIN
pictures/prefix_sum/2.jpg Normal file


@ -1,221 +0,0 @@
# FloodFill算法详解及应用
啥是 FloodFill 算法呢,最直接的一个应用就是「颜色填充」,就是 Windows 绘画本中那个小油漆桶的标志,可以把一块被圈起来的区域全部染色。
![floodfill](../pictures/floodfill/floodfill.gif)
这种算法思想还在许多其他地方有应用。比如说扫雷游戏,有时候你点一个方格,会一下子展开一片区域,这个展开过程,就是 FloodFill 算法实现的。
![扫雷](../pictures/floodfill/扫雷.png)
类似的,像消消乐这类游戏,相同方块积累到一定数量,就全部消除,也是 FloodFill 算法的功劳。
![xiaoxiaole](../pictures/floodfill/xiaoxiaole.jpg)
通过以上的几个例子,你应该对 FloodFill 算法有个概念了,现在我们要抽象问题,提取共同点。
### 一、构建框架
以上几个例子,都可以抽象成一个二维矩阵(图片其实就是像素点矩阵),然后从某个点开始向四周扩展,直到无法再扩展为止。
矩阵,可以抽象为一幅「图」,这就是一个图的遍历问题,也就类似一个 N 叉树遍历的问题。几行代码就能解决,直接上框架吧:
```java
// (x, y) 为坐标位置
void fill(int x, int y) {
fill(x - 1, y); // 上
fill(x + 1, y); // 下
fill(x, y - 1); // 左
fill(x, y + 1); // 右
}
```
这个框架可以解决所有在二维矩阵中遍历的问题说得高端一点这就叫深度优先搜索Depth First Search简称 DFS说得简单一点这就叫四叉树遍历框架。坐标 (x, y) 就是 root四个方向就是 root 的四个子节点。
下面看一道 LeetCode 题目,其实就是让我们来实现一个「颜色填充」功能。
![title](../pictures/floodfill/leetcode.png)
根据上篇文章,我们讲了「树」算法设计的一个总路线,今天就可以用到:
```java
int[][] floodFill(int[][] image,
int sr, int sc, int newColor) {
int origColor = image[sr][sc];
fill(image, sr, sc, origColor, newColor);
return image;
}
void fill(int[][] image, int x, int y,
int origColor, int newColor) {
// 出界:超出边界索引
if (!inArea(image, x, y)) return;
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return;
image[x][y] = newColor;
fill(image, x, y + 1, origColor, newColor);
fill(image, x, y - 1, origColor, newColor);
fill(image, x - 1, y, origColor, newColor);
fill(image, x + 1, y, origColor, newColor);
}
boolean inArea(int[][] image, int x, int y) {
return x >= 0 && x < image.length
&& y >= 0 && y < image[0].length;
}
```
只要你能够理解这段代码,一定要给你鼓掌,给你 99 分,因为你对「框架思维」的掌控已经炉火纯青,此算法已经 cover 了 99% 的情况,仅有一个细节问题没有解决,就是当 origColor 和 newColor 相同时,会陷入无限递归。
### 二、研究细节
为什么会陷入无限递归呢,很好理解,因为每个坐标都要搜索上下左右,那么对于一个坐标,一定会被上下左右的坐标搜索。**被重复搜索时,必须保证递归函数能够能正确地退出,否则就会陷入死循环。**
为什么 newColor 和 origColor 不同时可以正常退出呢?把算法流程画个图理解一下:
![ppt1](../pictures/floodfill/ppt1.PNG)
可以看到fill(1, 1) 被重复搜索了,我们用 fill(1, 1)* 表示这次重复搜索。fill(1, 1)* 执行时,(1, 1) 已经被换成了 newColor所以 fill(1, 1)* 会在这个 if 语句被怼回去,正确退出了。
```java
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return;
```
![ppt2](../pictures/floodfill/ppt2.PNG)
但是,如果说 origColor 和 newColor 一样,这个 if 语句就无法让 fill(1, 1)* 正确退出,而是开启了下面的重复递归,形成了死循环。
![ppt3](../pictures/floodfill/ppt3.PNG)
### 三、处理细节
如何避免上述问题的发生,最容易想到的就是用一个和 image 一样大小的二维 bool 数组记录走过的地方,一旦发现重复立即 return。
```java
// 出界:超出边界索引
if (!inArea(image, x, y)) return;
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return;
// 不走回头路
if (visited[x][y]) return;
visited[x][y] = true;
image[x][y] = newColor;
```
完全 OK这也是处理「图」的一种常用手段。不过对于此题不用开数组我们有一种更好的方法那就是回溯算法。
前文「回溯算法详解」讲过,这里不再赘述,直接套回溯算法框架:
```java
void fill(int[][] image, int x, int y,
int origColor, int newColor) {
// 出界:超出数组边界
if (!inArea(image, x, y)) return;
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return;
// 已探索过的 origColor 区域
if (image[x][y] == -1) return;
// choose打标记以免重复
image[x][y] = -1;
fill(image, x, y + 1, origColor, newColor);
fill(image, x, y - 1, origColor, newColor);
fill(image, x - 1, y, origColor, newColor);
fill(image, x + 1, y, origColor, newColor);
// unchoose将标记替换为 newColor
image[x][y] = newColor;
}
```
这种解决方法是最常用的,相当于使用一个特殊值 -1 代替 visited 数组的作用,达到不走回头路的效果。为什么是 -1因为题目中说了颜色取值在 0 - 65535 之间,所以 -1 足够特殊,能和颜色区分开。
### 四、拓展延伸:自动魔棒工具和扫雷
大部分图片编辑软件一定有「自动魔棒工具」这个功能:点击一个地方,帮你自动选中相近颜色的部分。如下图,我想选中老鹰,可以先用自动魔棒选中蓝天背景,然后反向选择,就选中了老鹰。我们来分析一下自动魔棒工具的原理。
![抠图](../pictures/floodfill/抠图.jpg)
显然,这个算法肯定是基于 FloodFill 算法的但有两点不同首先背景色是蓝色但不能保证都是相同的蓝色毕竟是像素点可能存在肉眼无法分辨的深浅差异而我们希望能够忽略这种细微差异。第二FloodFill 算法是「区域填充」,这里更像「边界填充」。
对于第一个问题,很好解决,可以设置一个阈值 threshold在阈值范围内波动的颜色都视为 origColor
```java
if (Math.abs(image[x][y] - origColor) > threshold)
return;
```
对于第二个问题,我们首先明确问题:不要把区域内所有 origColor 的都染色,而是只给区域最外圈染色。然后,我们分析,如何才能仅给外围染色,即如何才能找到最外围坐标,最外围坐标有什么特点?
![ppt4](../pictures/floodfill/ppt4.PNG)
可以发现,区域边界上的坐标,至少有一个方向不是 origColor而区域内部的坐标四面都是 origColor这就是解决问题的关键。保持框架不变使用 visited 数组记录已搜索坐标,主要代码如下:
```java
int fill(int[][] image, int x, int y,
int origColor, int newColor) {
// 出界:超出数组边界
if (!inArea(image, x, y)) return 0;
// 已探索过的 origColor 区域
if (visited[x][y]) return 1;
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return 0;
visited[x][y] = true;
int surround =
fill(image, x - 1, y, origColor, newColor)
+ fill(image, x + 1, y, origColor, newColor)
+ fill(image, x, y - 1, origColor, newColor)
+ fill(image, x, y + 1, origColor, newColor);
if (surround < 4)
image[x][y] = newColor;
return 1;
}
```
这样,区域内部的坐标探索四周后得到的 surround 是 4而边界的坐标会遇到其他颜色或超出边界索引surround 会小于 4。如果你对这句话不理解我们把逻辑框架抽象出来看
```java
int fill(int[][] image, int x, int y,
int origColor, int newColor) {
// 出界:超出数组边界
if (!inArea(image, x, y)) return 0;
// 已探索过的 origColor 区域
if (visited[x][y]) return 1;
// 碰壁:遇到其他颜色,超出 origColor 区域
if (image[x][y] != origColor) return 0;
// 未探索且属于 origColor 区域
if (image[x][y] == origColor) {
// ...
return 1;
}
}
```
这 4 个 if 判断涵盖了 (x, y) 的所有可能情况surround 的值由四个递归函数相加得到,而每个递归函数的返回值就这四种情况的一种。借助这个逻辑框架,你一定能理解上面那句话了。
这样就实现了仅对 origColor 区域边界坐标染色的目的,等同于完成了魔棒工具选定区域边界的功能。
这个算法有两个细节问题,一是必须借助 visited 来记录已探索的坐标,而无法使用回溯算法;二是开头几个 if 顺序不可打乱。读者可以思考一下原因。
同理,思考扫雷游戏,应用 FloodFill 算法展开空白区域的同时,也需要计算并显示边界上雷的个数,如何实现的?其实也是相同的思路,遇到雷就返回 true这样 surround 变量存储的就是雷的个数。当然,扫雷的 FloodFill 算法不能只检查上下左右,还得加上四个斜向。
![](../pictures/floodfill/ppt5.PNG)
以上详细讲解了 FloodFill 算法的框架设计,**二维矩阵中的搜索问题,都逃不出这个算法框架**。
坚持原创高质量文章,致力于把算法问题讲清楚,欢迎关注我的公众号 labuladong 获取最新文章:
![labuladong](../pictures/labuladong.jpg)


@ -0,0 +1,217 @@
# Analysis and Application of FloodFill Algorithm
**Translator: [youyun](https://github.com/youyun)**
**Author: [labuladong](https://github.com/labuladong)**
What is the FloodFill algorithm? A real-life example is color filling. In the default Windows application _Paint_, using the bucket icon, we can fill the selected area with a color.
![floodfill](../pictures/floodfill/floodfill.gif)
There are other applications of the FloodFill algorithm. Another example would be Minesweeper. Sometimes when you click on a tile, an area will expand out. The process of expansion is implemented through the FloodFill algorithm.
![Minesweeper](../pictures/floodfill/minesweeper.png)
Similarly, those puzzle-matching games such as Candy Crush also use the FloodFill algorithm to remove blocks of the same color.
![xiaoxiaole](../pictures/floodfill/xiaoxiaole.jpg)
Now you should have some idea about the FloodFill algorithm. Let's abstract out the problems and find out what is common.
### 1. Build Framework
All of the above examples can be abstracted as a 2D array. In fact, a picture is an array of pixels. We take one element as the starting point and expand outward in four directions until we cannot expand any further.
An array can be further abstracted as a graph. Hence, the problem becomes about traversing a graph, similar to traversing an N-ary tree. A few lines of code are enough to resolve the problem. Here is the framework:
```java
// (x, y) represents the coordinate
void fill(int x, int y) {
    fill(x - 1, y); // up
    fill(x + 1, y); // down
    fill(x, y - 1); // left
    fill(x, y + 1); // right
}
```
Using this framework, we can solve any problem that traverses a 2D array. The technique is called Depth First Search (DFS); put more simply, it is a quaternary (4-ary) tree traversal. The coordinate (x, y) is the root, and its four neighbors are the root's four child nodes.
Let's take a look at [a LeetCode problem](https://leetcode.com/problems/flood-fill/). It's actually just a color fill function.
![title](../pictures/floodfill/leetcode_en.jpg)
In [another article](), we discussed a generic design of tree related algorithms. We can apply the concept here:
```java
int[][] floodFill(int[][] image,
        int sr, int sc, int newColor) {
    int origColor = image[sr][sc];
    fill(image, sr, sc, origColor, newColor);
    return image;
}

void fill(int[][] image, int x, int y,
        int origColor, int newColor) {
    // OUT: out of index
    if (!inArea(image, x, y)) return;
    // CLASH: meet other colors, beyond the area of origColor
    if (image[x][y] != origColor) return;
    image[x][y] = newColor;

    fill(image, x, y + 1, origColor, newColor);
    fill(image, x, y - 1, origColor, newColor);
    fill(image, x - 1, y, origColor, newColor);
    fill(image, x + 1, y, origColor, newColor);
}

boolean inArea(int[][] image, int x, int y) {
    return x >= 0 && x < image.length
        && y >= 0 && y < image[0].length;
}
```
If you can understand this block of code, you are almost there! It means that you have a solid grasp of the framework mindset. This block of code covers 99% of cases. Only one detail remains unresolved: if `origColor` is the same as `newColor`, the recursion never terminates.
### 2. Pay Attention to Details
Why does the recursion never terminate? Each coordinate visits its 4 neighbors, so each coordinate is in turn visited by its 4 neighbors. __When we reach a coordinate that has already been visited, the recursive function must detect this and return. Otherwise the recursion goes on forever.__
Why can the code exit properly when `newColor` and `origColor` are different? Let's draw a diagram of the algorithm's execution:
![ppt1](../pictures/floodfill/ppt1.PNG)
As we can see from the diagram, `fill(1, 1)` is visited twice. Let's use `fill(1, 1)*` to represent this duplicated visit. When `fill(1, 1)*` is executed, `(1, 1)` has already been replaced with `newColor`. So `fill(1, 1)*` will return the control directly at the _CLASH_, i.e. exit as expected.
```java
// CLASH: meet other colors, beyond the area of origColor
if (image[x][y] != origColor) return;
```
![ppt2](../pictures/floodfill/ppt2.PNG)
However, if `origColor` is the same as `newColor`, `fill(1, 1)*` will not exit at the _CLASH_. Instead, the recursion repeats endlessly, as shown below.
![ppt3](../pictures/floodfill/ppt3.PNG)
### 3. Handling Details
How do we avoid the endless recursion? The most intuitive answer is to use a 2D boolean array of the same size as `image` to record whether each coordinate has been visited. If it has, return immediately.
```java
// OUT: out of index
if (!inArea(image, x, y)) return;
// CLASH: meet other colors, beyond the area of origColor
if (image[x][y] != origColor) return;
// VISITED: don't visit a coordinate twice
if (visited[x][y]) return;
visited[x][y] = true;
image[x][y] = newColor;
```
This works, and it is a common technique for graph-related problems. For this particular problem, though, there is a better way that needs no extra array: the backtracking algorithm.
Refer to the article [Backtracking Algorithm in Depth]() for details. We directly apply the backtracking algorithm framework here:
```java
void fill(int[][] image, int x, int y,
        int origColor, int newColor) {
    // OUT: out of index
    if (!inArea(image, x, y)) return;
    // CLASH: meet other colors, beyond the area of origColor
    if (image[x][y] != origColor) return;
    // VISITED: visited origColor
    if (image[x][y] == -1) return;

    // choose: mark a flag as visited
    image[x][y] = -1;
    fill(image, x, y + 1, origColor, newColor);
    fill(image, x, y - 1, origColor, newColor);
    fill(image, x - 1, y, origColor, newColor);
    fill(image, x + 1, y, origColor, newColor);
    // unchoose: replace the mark with newColor
    image[x][y] = newColor;
}
```
This is the most common approach: the special value -1 takes the place of the `visited` 2D array and serves the same purpose. Because the problem states that colors fall in the range `[0, 65535]`, -1 is special enough to be distinguished from actual colors.
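To see the pieces working together, here is a minimal, self-contained sketch that wraps the marker-based `fill` in a class. The class name `FloodFillSketch` and the early return for equal colors are assumptions added for illustration and are not part of the original solution. The guard is there because, when the two colors are equal, the unchoose step writes `origColor` back into the cell, so cells could be explored again and again.

```java
class FloodFillSketch {
    static int[][] floodFill(int[][] image, int sr, int sc, int newColor) {
        int origColor = image[sr][sc];
        // Assumption (not in the original): nothing to repaint if the colors match
        if (origColor == newColor) return image;
        fill(image, sr, sc, origColor, newColor);
        return image;
    }

    static void fill(int[][] image, int x, int y,
            int origColor, int newColor) {
        // OUT: out of index
        if (!inArea(image, x, y)) return;
        // CLASH: a different color, or a cell already marked -1
        // (-1 is never a valid color, so it can never equal origColor)
        if (image[x][y] != origColor) return;

        // choose: mark the cell as visited
        image[x][y] = -1;
        fill(image, x, y + 1, origColor, newColor);
        fill(image, x, y - 1, origColor, newColor);
        fill(image, x - 1, y, origColor, newColor);
        fill(image, x + 1, y, origColor, newColor);
        // unchoose: replace the mark with newColor
        image[x][y] = newColor;
    }

    static boolean inArea(int[][] image, int x, int y) {
        return x >= 0 && x < image.length
            && y >= 0 && y < image[0].length;
    }
}
```

Calling `FloodFillSketch.floodFill(image, 1, 1, 2)` on the grid `{{1,1,1},{1,1,0},{1,0,1}}` repaints the connected region of 1s that contains the center and leaves the isolated 1 in the bottom-right corner untouched.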
### 4. Extension: Magic Wand Tool and Minesweeper
Most image editing software has a "Magic Wand Tool". When you click a point, the application automatically selects the surrounding region of similar colors. In the picture below, if we want to select the eagle, we can use the Magic Wand Tool to select the blue sky and then invert the selection. Let's analyze how the Magic Wand Tool works.
![CutOut](../pictures/floodfill/cutout.jpg)
Obviously, the algorithm must be based on the FloodFill algorithm. However, there are two differences:
1. Though the background color is blue, we can't guarantee that all the blue pixels are exactly the same. There could be minor differences in shade that the naked eye cannot distinguish, and we want to ignore these minor differences.
2. FloodFill fills a region, while the Magic Wand Tool is more about filling the edges of a region.
It's easy to resolve the first problem by setting a `threshold`. All colors within the threshold from the `origColor` can be recognized as `origColor`.
```java
if (Math.abs(image[x][y] - origColor) > threshold)
    return;
```
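The threshold check can be dropped into the same DFS framework. The sketch below is an illustration under stated assumptions: pixels are single integer intensities (as in the snippet above), and the names `MagicWandSelect`, `select`, and `grow` are hypothetical. It uses a `visited` array, because once similar colors are accepted the color comparison alone can no longer tell whether a cell was already processed.

```java
class MagicWandSelect {
    // Returns a mask of the cells connected to (sr, sc) whose color is
    // within `threshold` of the starting cell's color.
    static boolean[][] select(int[][] image, int sr, int sc, int threshold) {
        boolean[][] visited = new boolean[image.length][image[0].length];
        grow(image, sr, sc, image[sr][sc], threshold, visited);
        return visited;
    }

    static void grow(int[][] image, int x, int y,
            int origColor, int threshold, boolean[][] visited) {
        // OUT: out of index
        if (x < 0 || x >= image.length || y < 0 || y >= image[0].length) return;
        // VISITED: already part of the selection
        if (visited[x][y]) return;
        // CLASH: color differs from origColor by more than the threshold
        if (Math.abs(image[x][y] - origColor) > threshold) return;

        visited[x][y] = true;
        grow(image, x - 1, y, origColor, threshold, visited);
        grow(image, x + 1, y, origColor, threshold, visited);
        grow(image, x, y - 1, origColor, threshold, visited);
        grow(image, x, y + 1, origColor, threshold, visited);
    }
}
```

Only cells that pass the threshold check are marked, so the returned array doubles as the selection mask.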
As for the second problem, let's first define it clearly: _"do not color all `origColor` coordinates in the region; color only the edges."_ Next, let's analyze how to color only the edges, i.e. how to find the coordinates on the edges. What special property do these coordinates have?
![ppt4](../pictures/floodfill/ppt4.PNG)
From the diagram above, we can see that every coordinate on the edge has at least one direction that is not `origColor`, while every inner coordinate has `origColor` in all 4 directions. This is the key to the solution. We keep the same framework and use a `visited` array to record traversed coordinates:
```java
int fill(int[][] image, int x, int y,
        int origColor, int newColor) {
    // OUT: out of index
    if (!inArea(image, x, y)) return 0;
    // VISITED: visited origColor
    if (visited[x][y]) return 1;
    // CLASH: meet other colors, beyond the area of origColor
    if (image[x][y] != origColor) return 0;

    visited[x][y] = true;

    int surround =
          fill(image, x - 1, y, origColor, newColor)
        + fill(image, x + 1, y, origColor, newColor)
        + fill(image, x, y - 1, origColor, newColor)
        + fill(image, x, y + 1, origColor, newColor);

    if (surround < 4)
        image[x][y] = newColor;

    return 1;
}
```
In this way, every inner coordinate gets `surround` equal to 4 after traversing its four directions, while every edge coordinate hits OUT or CLASH in at least one direction, which makes `surround` less than 4. If this is still not clear, let's look at just the logic flow of the framework:
```java
int fill(int[][] image, int x, int y,
        int origColor, int newColor) {
    // OUT: out of index
    if (!inArea(image, x, y)) return 0;
    // VISITED: visited origColor
    if (visited[x][y]) return 1;
    // CLASH: meet other colors, beyond the area of origColor
    if (image[x][y] != origColor) return 0;
    // UNKNOWN: unvisited area that is origColor
    if (image[x][y] == origColor) {
        // ...
        return 1;
    }
}
```
These 4 `if`s cover all possible scenarios of (x, y). The value of `surround` is the sum of the return values of the 4 recursive functions. And each recursive function will fall into one of the 4 scenarios. You should be much clearer now after looking at this framework.
This implementation colors only the edge coordinates of the `origColor` region, which is what the Magic Wand Tool does.
Pay attention to 2 details in this algorithm:
1. We must use `visited` to record traversed coordinates; the backtracking approach cannot be used here.
2. The order of the `if` clauses can't be modified. (Why?)
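For readers who want to run the edge-coloring variant, the following sketch wraps the `fill` function in a class and passes `visited` explicitly. The names `EdgeFillSketch` and `outline` are hypothetical, and passing `visited` as a parameter is a choice made for this sketch rather than something the original specifies.

```java
class EdgeFillSketch {
    // Colors only the boundary of the origColor region containing (sr, sc).
    static int[][] outline(int[][] image, int sr, int sc, int newColor) {
        boolean[][] visited = new boolean[image.length][image[0].length];
        fill(image, sr, sc, image[sr][sc], newColor, visited);
        return image;
    }

    // Returns 1 if (x, y) belongs to the origColor region, 0 otherwise.
    static int fill(int[][] image, int x, int y,
            int origColor, int newColor, boolean[][] visited) {
        // OUT: out of index
        if (x < 0 || x >= image.length || y < 0 || y >= image[0].length) return 0;
        // VISITED: part of the region, even if it has already been recolored
        if (visited[x][y]) return 1;
        // CLASH: meet other colors, beyond the area of origColor
        if (image[x][y] != origColor) return 0;

        visited[x][y] = true;

        int surround =
              fill(image, x - 1, y, origColor, newColor, visited)
            + fill(image, x + 1, y, origColor, newColor, visited)
            + fill(image, x, y - 1, origColor, newColor, visited)
            + fill(image, x, y + 1, origColor, newColor, visited);

        // Fewer than 4 region neighbors means this cell is on the edge
        if (surround < 4)
            image[x][y] = newColor;

        return 1;
    }
}
```

On a 3x3 grid filled with 1s, `EdgeFillSketch.outline(image, 1, 1, 2)` recolors the eight outer cells to 2 and leaves the center at 1, since the center is the only cell with four in-region neighbors.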
Similarly, for Minesweeper, when we use the FloodFill algorithm to expand empty areas, we also need to show the number of mines next to each tile on the boundary. How is this implemented? The idea is the same: return 1 when we meet a mine, so that `surround` stores the number of mines nearby. Of course, in Minesweeper we must check 8 directions instead of 4, because the diagonals count as well.
![](../pictures/floodfill/ppt5.PNG)
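As a rough illustration of the Minesweeper case, the sketch below expands an empty area in 8 directions and stops at tiles that have mines next to them. It is a simplified variant, not the implementation described above: instead of summing the return values of the recursive calls, it counts neighboring mines directly with a helper. All names (`MinesweeperSketch`, `reveal`, `countMines`, `HIDDEN`) and the board representation are assumptions made for this example.

```java
class MinesweeperSketch {
    // Offsets of the 8 neighbors, diagonals included
    static final int[][] DIRS = {
        {-1, -1}, {-1, 0}, {-1, 1},
        {0, -1},           {0, 1},
        {1, -1},  {1, 0},  {1, 1}
    };
    // Marker for a tile that has not been revealed yet
    static final int HIDDEN = -1;

    // Reveals (x, y); if it has no adjacent mines, keeps expanding.
    // shown[x][y] ends up holding the number of adjacent mines.
    static void reveal(boolean[][] mines, int[][] shown, int x, int y) {
        if (!inArea(mines, x, y)) return;      // OUT
        if (shown[x][y] != HIDDEN) return;     // VISITED
        if (mines[x][y]) return;               // never auto-reveal a mine

        shown[x][y] = countMines(mines, x, y);
        // A numbered tile is a boundary: show the count and stop here
        if (shown[x][y] > 0) return;

        for (int[] d : DIRS)
            reveal(mines, shown, x + d[0], y + d[1]);
    }

    static int countMines(boolean[][] mines, int x, int y) {
        int count = 0;
        for (int[] d : DIRS) {
            int nx = x + d[0], ny = y + d[1];
            if (inArea(mines, nx, ny) && mines[nx][ny]) count++;
        }
        return count;
    }

    static boolean inArea(boolean[][] grid, int x, int y) {
        return x >= 0 && x < grid.length
            && y >= 0 && y < grid[0].length;
    }
}
```

Here the `shown` array plays the role of `visited`: a tile that is no longer `HIDDEN` has already been processed and is never entered twice.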
We've discussed the design and framework of the FloodFill algorithm. __All searching problems in a 2D array can be fit into this framework.__


@ -0,0 +1,132 @@
# Prefix Sum
**Translator: [youyun](https://github.com/youyun)**
**Author: [labuladong](https://github.com/labuladong)**
Let's talk about a simple but interesting algorithm problem today: find the number of subarrays that sum to k.
![](../pictures/prefix_sum/title_en.jpg)
The most intuitive way is using brute force - find all the subarrays, sum up and compare with k.
The tricky part is, __how to find the sum of a subarray fast?__ For example, you're given an array `nums`, and asked to implement API `sum(i, j)` which returns the sum of `nums[i..j]`. Furthermore, the API will be very frequently used. How do you plan to implement this API?
Because the API is called so frequently, traversing `nums[i..j]` on every call is very inefficient. Is there a quick method that finds the sum in O(1) time? There is: a technique called __Prefix Sum__.
### 1. What is Prefix Sum
The idea of Prefix Sum goes like this: for a given array `nums`, create another array that stores the prefix sums as a pre-processing step:
```java
int n = nums.length;
// array of prefix sum
int[] preSum = new int[n + 1];
preSum[0] = 0;
for (int i = 0; i < n; i++)
    preSum[i + 1] = preSum[i] + nums[i];
```
![](../pictures/prefix_sum/1.jpg)
The meaning of `preSum` is easy to understand. `preSum[i]` is the sum of `nums[0..i-1]`. If we want to calculate the sum of `nums[i..j]`, we just need to perform `preSum[j+1] - preSum[i]` instead of traversing the whole subarray.
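The `sum(i, j)` API described earlier can be sketched as a small class that builds `preSum` once and then answers every query with a single subtraction. The class name `RangeSum` is an assumption made for this illustration.

```java
class RangeSum {
    // preSum[i] holds the sum of nums[0..i-1]; preSum[0] is 0
    private final int[] preSum;

    RangeSum(int[] nums) {
        preSum = new int[nums.length + 1];
        for (int i = 0; i < nums.length; i++)
            preSum[i + 1] = preSum[i] + nums[i];
    }

    // Sum of nums[i..j], both ends inclusive, in O(1) time
    int sum(int i, int j) {
        return preSum[j + 1] - preSum[i];
    }
}
```

For `nums = {3, 5, 2, -2, 4, 1}`, `sum(1, 3)` is `preSum[4] - preSum[1] = 8 - 3 = 5`, which matches `5 + 2 + (-2)`.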
Coming back to the original problem: if we want to find the number of subarrays that sum to k, it's straightforward to implement with the Prefix Sum technique:
```java
int subarraySum(int[] nums, int k) {
    int n = nums.length;
    // initialize prefix sum
    int[] sum = new int[n + 1];
    sum[0] = 0;
    for (int i = 0; i < n; i++)
        sum[i + 1] = sum[i] + nums[i];

    int ans = 0;
    // loop through all subarrays by brute force
    for (int i = 1; i <= n; i++)
        for (int j = 0; j < i; j++)
            // sum of nums[j..i-1]
            if (sum[i] - sum[j] == k)
                ans++;

    return ans;
}
```
The time complexity of this solution is O(N^2), while the space complexity is O(N). This is not the optimal solution yet. However, we can apply some cool techniques to reduce the time complexity further, after understanding how Prefix Sum and arrays can work together through this solution.
### 2. Optimized Solution
The solution in part 1 has a nested `for` loop:
```java
for (int i = 1; i <= n; i++)
    for (int j = 0; j < i; j++)
        if (sum[i] - sum[j] == k)
            ans++;
```
What does the inner `for` loop actually do? It __counts how many values of `j` make the difference between `sum[i]` and `sum[j]` equal to k.__ Whenever we find such a `j`, we increment the result by 1.
We can reorganize the condition of `if` clause:
```java
if (sum[j] == sum[i] - k)
    ans++;
```
The idea of the optimization is __to record how many `sum[j]` are equal to `sum[i] - k`, so that we can update the result directly without the inner loop.__ We can use a hash table to record each prefix sum together with the number of times it has occurred.
```java
int subarraySum(int[] nums, int k) {
    int n = nums.length;
    // map: prefix sum -> frequency
    HashMap<Integer, Integer>
        preSum = new HashMap<>();
    // base case: the empty prefix has sum 0
    preSum.put(0, 1);

    int ans = 0, sum0_i = 0;
    for (int i = 0; i < n; i++) {
        sum0_i += nums[i];
        // the prefix sum of nums[0..j] that we are looking for
        int sum0_j = sum0_i - k;
        // if it exists, we'll just update the result
        if (preSum.containsKey(sum0_j))
            ans += preSum.get(sum0_j);
        // record the prefix sum of nums[0..i] and its frequency
        preSum.put(sum0_i,
            preSum.getOrDefault(sum0_i, 0) + 1);
    }
    return ans;
}
```
In the following case, we need a prefix sum of 8 to find a subarray that sums to k. The brute force solution in part 1 has to traverse the array to count how many prefix sums equal 8. The optimized solution gets that count directly from the hash table.
![](../pictures/prefix_sum/2.jpg)
This is the optimal solution with time complexity of O(N).
### 3. Summary
Prefix Sum is not hard, yet very useful, especially in dealing with differences of array intervals.
For example, if we were asked to calculate the percentage of each score interval among all students in the class, we can apply Prefix Sum technique:
```java
int[] scores; // to store all students' scores
// the full score is 150 points
int[] count = new int[150 + 1];
// to record how many students at each score
for (int score : scores)
    count[score]++;
// construct prefix sum
for (int i = 1; i < count.length; i++)
    count[i] = count[i] + count[i-1];
```
Afterwards, for any given score interval, we can find how many students fall in this interval by calculating the difference of prefix sums quickly. Hence, the percentage will be calculated easily.
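The interval query itself can be sketched as follows. The class name `ScoreIntervals` and its methods are hypothetical, and the sketch assumes every score is an integer between 0 and the full score.

```java
class ScoreIntervals {
    // prefix[s] = number of students whose score is <= s
    private final int[] prefix;
    private final int total;

    ScoreIntervals(int[] scores, int fullScore) {
        prefix = new int[fullScore + 1];
        // count how many students got each score
        for (int score : scores)
            prefix[score]++;
        // turn the counts into prefix sums
        for (int i = 1; i < prefix.length; i++)
            prefix[i] += prefix[i - 1];
        total = scores.length;
    }

    // Number of students with lo <= score <= hi
    int countInRange(int lo, int hi) {
        return prefix[hi] - (lo > 0 ? prefix[lo - 1] : 0);
    }

    // Percentage of students with lo <= score <= hi
    double percentInRange(int lo, int hi) {
        return total == 0 ? 0.0 : 100.0 * countInRange(lo, hi) / total;
    }
}
```

With scores `{0, 90, 90, 120, 150}` and a full score of 150, `countInRange(90, 120)` is 3, so `percentInRange(90, 120)` gives 60.0.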
However, for more complex problems, the simple Prefix Sum technique is not enough. Even the problem discussed in this article needs one further step of optimization: we used a hash table to eliminate an unnecessary nested loop. To reach an optimal solution, it is important to understand the problem thoroughly and analyze its details.


@ -1,134 +0,0 @@
# 前缀和技巧
今天来聊一道简单却十分巧妙的算法问题:算出一共有几个和为 k 的子数组。
![](../pictures/%E5%89%8D%E7%BC%80%E5%92%8C/title.png)
那我把所有子数组都穷举出来,算它们的和,看看谁的和等于 k 不就行了。
关键是,**如何快速得到某个子数组的和呢**,比如说给你一个数组 `nums`,让你实现一个接口 `sum(i, j)`,这个接口要返回 `nums[i..j]` 的和,而且会被多次调用,你怎么实现这个接口呢?
因为接口要被多次调用,显然不能每次都去遍历 `nums[i..j]`,有没有一种快速的方法在 O(1) 时间内算出 `nums[i..j]` 呢?这就需要**前缀和**技巧了。
### 一、什么是前缀和
前缀和的思路是这样的,对于一个给定的数组 `nums`,我们额外开辟一个前缀和数组进行预处理:
```java
int n = nums.length;
// 前缀和数组
int[] preSum = new int[n + 1];
preSum[0] = 0;
for (int i = 0; i < n; i++)
preSum[i + 1] = preSum[i] + nums[i];
```
![](../pictures/%E5%89%8D%E7%BC%80%E5%92%8C/1.jpg)
这个前缀和数组 `preSum` 的含义也很好理解,`preSum[i]` 就是 `nums[0..i-1]` 的和。那么如果我们想求 `nums[i..j]` 的和,只需要一步操作 `preSum[j+1]-preSum[i]` 即可,而不需要重新去遍历数组了。
回到这个子数组问题,我们想求有多少个子数组的和为 k借助前缀和技巧很容易写出一个解法
```java
int subarraySum(int[] nums, int k) {
int n = nums.length;
// 构造前缀和
int[] sum = new int[n + 1];
sum[0] = 0;
for (int i = 0; i < n; i++)
sum[i + 1] = sum[i] + nums[i];
int ans = 0;
// 穷举所有子数组
for (int i = 1; i <= n; i++)
for (int j = 0; j < i; j++)
// sum of nums[j..i-1]
if (sum[i] - sum[j] == k)
ans++;
return ans;
}
```
这个解法的时间复杂度 $O(N^2)$ 空间复杂度 $O(N)$,并不是最优的解法。不过通过这个解法理解了前缀和数组的工作原理之后,可以使用一些巧妙的办法把时间复杂度进一步降低。
### 二、优化解法
前面的解法有嵌套的 for 循环:
```java
for (int i = 1; i <= n; i++)
for (int j = 0; j < i; j++)
if (sum[i] - sum[j] == k)
ans++;
```
第二层 for 循环在干嘛呢?翻译一下就是,**在计算,有几个 `j` 能够使得 `sum[i]``sum[j]` 的差为 k。**毎找到一个这样的 `j`,就把结果加一。
我们可以把 if 语句里的条件判断移项,这样写:
```java
if (sum[j] == sum[i] - k)
ans++;
```
优化的思路是:**我直接记录下有几个 `sum[j]``sum[i] - k` 相等,直接更新结果,就避免了内层的 for 循环**。我们可以用哈希表,在记录前缀和的同时记录该前缀和出现的次数。
```java
int subarraySum(int[] nums, int k) {
int n = nums.length;
// map前缀和 -> 该前缀和出现的次数
HashMap<Integer, Integer>
preSum = new HashMap<>();
// base case
preSum.put(0, 1);
int ans = 0, sum0_i = 0;
for (int i = 0; i < n; i++) {
sum0_i += nums[i];
// 这是我们想找的前缀和 nums[0..j]
int sum0_j = sum0_i - k;
// 如果前面有这个前缀和,则直接更新答案
if (preSum.containsKey(sum0_j))
ans += preSum.get(sum0_j);
// 把前缀和 nums[0..i] 加入并记录出现次数
preSum.put(sum0_i,
preSum.getOrDefault(sum0_i, 0) + 1);
}
return ans;
}
```
比如说下面这个情况,需要前缀和 8 就能找到和为 k 的子数组了,之前的暴力解法需要遍历数组去数有几个 8而优化解法借助哈希表可以直接得知有几个前缀和为 8。
![](../pictures/%E5%89%8D%E7%BC%80%E5%92%8C/2.jpg)
这样,就把时间复杂度降到了 $O(N)$,是最优解法了。
### 三、总结
前缀和不难,却很有用,主要用于处理数组区间的问题。
比如说,让你统计班上同学考试成绩在不同分数段的百分比,也可以利用前缀和技巧:
```java
int[] scores; // 存储着所有同学的分数
// 试卷满分 150 分
int[] count = new int[150 + 1]
// 记录每个分数有几个同学
for (int score : scores)
count[score]++
// 构造前缀和
for (int i = 1; i < count.length; i++)
count[i] = count[i] + count[i-1];
```
这样,给你任何一个分数段,你都能通过前缀和相减快速计算出这个分数段的人数,百分比也就很容易计算了。
但是,稍微复杂一些的算法问题,不止考察简单的前缀和技巧。比如本文探讨的这道题目,就需要借助前缀和的思路做进一步的优化,借助哈希表去除不必要的嵌套循环。可见对题目的理解和细节的分析能力对于算法的优化是至关重要的。
希望本文对你有帮助。
坚持原创高质量文章,致力于把算法问题讲清楚,欢迎关注我的公众号 labuladong 获取最新文章:
![labuladong](../pictures/labuladong.jpg)