Merge pull request #208 from realism0331/english

Shuffle_Algorithm
2025-07-08 22:36:38 +08:00 · 2020-03-13 10:30:50 +08:00
parent a5d0075267 11e53aac90
commit 8fe29dfec0
8 changed files with 189 additions and 186 deletions
--- a/pictures/Shuffle_Algorithm/1.png
+++ b/pictures/Shuffle_Algorithm/1.png
--- a/pictures/Shuffle_Algorithm/2.png
+++ b/pictures/Shuffle_Algorithm/2.png
--- a/pictures/Shuffle_Algorithm/3.png
+++ b/pictures/Shuffle_Algorithm/3.png
--- a/pictures/Shuffle_Algorithm/4.png
+++ b/pictures/Shuffle_Algorithm/4.png
--- a/pictures/Shuffle_Algorithm/5.jpg
+++ b/pictures/Shuffle_Algorithm/5.jpg
--- a/pictures/Shuffle_Algorithm/6.png
+++ b/pictures/Shuffle_Algorithm/6.png
--- a/think_like_computer/Shuffle_Algorithm.md
+++ b/think_like_computer/Shuffle_Algorithm.md
@@ -0,0 +1,189 @@
 # Shuffle Algorithm
 **Translator: [realism0331](https://github.com/realism0331)**
 **Author: [labuladong](https://github.com/labuladong)**
 I know everyone is comfortable with all kinds of fancy sorting algorithms, but if you were asked to shuffle an array, would you be confident you could do it? Even if you come up with an algorithm off the top of your head, how do you prove that it is correct? A shuffle algorithm is not like a sorting algorithm, whose unique result is easy to check: a "shuffled" array can come out in many different orders, so how can you prove that your algorithm is "truly random"?
 So we face two questions:
 1. What does "truly random" mean?
 2. How should we design an algorithm that shuffles an array so that the result is "truly random"?
 Such an algorithm is called a "random permutation algorithm" or a "shuffle algorithm".
 This article has two parts. The first part explains the most commonly used shuffle algorithm. Its details are easy to get wrong, and it comes in several variants that differ slightly but are all correct, so this article introduces a simple, general way of thinking that ensures you write a correct shuffle algorithm. The second part explains how to use the "Monte Carlo method" to check whether our shuffled results are truly random. The idea behind the Monte Carlo method is not difficult, but there are several ways to implement it, each with its own characteristics.
 ### 1. Shuffle Algorithm
 Algorithms of this kind obtain randomness by swapping randomly selected elements. Let's go straight to the code (pseudocode). The algorithm comes in four forms, all of which are correct:
 ```java
 // Returns a random integer in the closed interval [min, max]
 int randInt(int min, int max);
 // Exchanges arr[i] and arr[j] (Java passes ints by value, so swap needs the array)
 void swap(int[] arr, int i, int j);
 // First case
 void shuffle(int[] arr) {
     int n = arr.length;
     /*** The only difference is these two lines ***/
     for (int i = 0; i < n; i++) {
         // Randomly pick an index from i to the last one
         int rand = randInt(i, n - 1);
     /**********************************************/
         swap(arr, i, rand);
     }
 }
 // Second case
     for (int i = 0; i < n - 1; i++)
         int rand = randInt(i, n - 1);
 // Third case
     for (int i = n - 1; i >= 0; i--)
         int rand = randInt(0, i);
 // Fourth case
     for (int i = n - 1; i > 0; i--)
         int rand = randInt(0, i);
 ```
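For readers who want to run the four cases, here is a minimal, self-contained Java sketch. It assumes `randInt` is built on `java.util.Random.nextInt(bound)` and uses an index-based `swap` helper; the class name `ShuffleVariants` and the method names `shuffle1` to `shuffle4` are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.Random;

class ShuffleVariants {
    private static final Random RNG = new Random();

    // Returns a random integer in the closed interval [min, max]
    static int randInt(int min, int max) {
        return min + RNG.nextInt(max - min + 1);
    }

    // Exchanges arr[i] and arr[j]
    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    // First case: front to back, pick from [i, n-1] for every i
    static void shuffle1(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n; i++) swap(arr, i, randInt(i, n - 1));
    }

    // Second case: same, but skip i = n-1 (it could only swap arr[n-1] with itself)
    static void shuffle2(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) swap(arr, i, randInt(i, n - 1));
    }

    // Third case: back to front, pick from [0, i] for every i
    static void shuffle3(int[] arr) {
        for (int i = arr.length - 1; i >= 0; i--) swap(arr, i, randInt(0, i));
    }

    // Fourth case: same, but skip i = 0 (it could only swap arr[0] with itself)
    static void shuffle4(int[] arr) {
        for (int i = arr.length - 1; i > 0; i--) swap(arr, i, randInt(0, i));
    }

    public static void main(String[] args) {
        int[] arr = {1, 3, 5, 7, 9};
        shuffle1(arr);
        // Prints some permutation of {1, 3, 5, 7, 9}; it differs from run to run
        System.out.println(Arrays.toString(arr));
    }
}
```

Each method shuffles in place, so the caller's array is modified; pass a copy (`arr.clone()`) if the original order is still needed.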
 **The criterion for analyzing the correctness of a shuffle algorithm: it must produce n! possible outcomes, otherwise it is wrong.** This is easy to explain: an array of length n has n! permutations, so there are n! possible shuffled results in total. An algorithm must reflect this fact in order to be correct.
 Let's use this rule to analyze the correctness of **the first one**:
 ```java
 // Suppose an arr is passed in like this
 int[] arr = {1,3,5,7,9};
 void shuffle(int[] arr) {
    int n = arr.length; // 5
    for (int i = 0; i < n; i++) {
        int rand = randInt(i, n - 1);
        swap(arr, i, rand); // exchange arr[i] and arr[rand]
    }
 }
 ```
 In the first iteration of the for loop, `i = 0`, so `rand` ranges over `[0, 4]`: 5 possible values.
 ![First iteration](../pictures/Shuffle_Algorithm/1.png)
 In the second iteration, `i = 1`, so `rand` ranges over `[1, 4]`: 4 possible values.
 ![Second iteration](../pictures/Shuffle_Algorithm/2.png)
 And so on, until the last iteration, where `i = 4` and `rand` ranges over `[4, 4]`: only 1 possible value.
 ![Last iteration](../pictures/Shuffle_Algorithm/3.png)
 As you can see, the whole process has `n! = 5! = 5*4*3*2*1` possible outcomes, so the algorithm is correct.
 In **the second case**, the iterations are the same as in the first, except that there is one fewer. So in the last iteration, `i = 3` and `rand` ranges over `[3, 4]`: 2 possible values.
 ```java
 // Second case
 // arr = {1,3,5,7,9}, n = 5
    for (int i = 0 ; i < n - 1; i++)
        int rand = randInt(i, n - 1);
 ```
 So the whole process still has `5*4*3*2 = 5! = n!` possible outcomes, since multiplying by 1 makes no difference (the skipped iteration, `i = 4`, could only have swapped `arr[4]` with itself). So this version is also correct.
 If you understand all of the above, you will see that **the third case** is just the first case iterating the array from back to front, and **the fourth case** is likewise the second case run from back to front. So both are correct.
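The counting argument can also be checked mechanically. The Java sketch below (the names `EnumerateShuffleCases`, `steps` and `outcomes` are hypothetical) does not call a random number generator at all: it follows every possible sequence of `randInt` results for each of the four cases on `{1, 3, 5, 7, 9}` and counts the arrays they produce. If the reasoning above is right, each case should have 120 paths landing on 120 distinct permutations, so every permutation is reached by exactly one path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EnumerateShuffleCases {
    // Loop steps of each case: {i, lo, hi} means "swap arr[i] with arr[rand], rand in [lo, hi]"
    static List<int[]> steps(int whichCase, int n) {
        List<int[]> s = new ArrayList<>();
        switch (whichCase) {
            case 1: for (int i = 0; i < n; i++) s.add(new int[]{i, i, n - 1}); break;
            case 2: for (int i = 0; i < n - 1; i++) s.add(new int[]{i, i, n - 1}); break;
            case 3: for (int i = n - 1; i >= 0; i--) s.add(new int[]{i, 0, i}); break;
            case 4: for (int i = n - 1; i > 0; i--) s.add(new int[]{i, 0, i}); break;
            default: throw new IllegalArgumentException("case must be 1..4");
        }
        return s;
    }

    // Tries every possible randInt result at step k and counts the final arrays
    static void enumerate(int[] arr, List<int[]> steps, int k, Map<String, Integer> counts) {
        if (k == steps.size()) {
            counts.merge(Arrays.toString(arr), 1, Integer::sum);
            return;
        }
        int[] st = steps.get(k);
        for (int rand = st[1]; rand <= st[2]; rand++) {
            int[] next = arr.clone();
            int tmp = next[st[0]];
            next[st[0]] = next[rand];
            next[rand] = tmp;
            enumerate(next, steps, k + 1, counts);
        }
    }

    // Maps each reachable permutation to the number of choice sequences that produce it
    static Map<String, Integer> outcomes(int whichCase, int[] arr) {
        Map<String, Integer> counts = new TreeMap<>();
        enumerate(arr, steps(whichCase, arr.length), 0, counts);
        return counts;
    }

    public static void main(String[] args) {
        int[] arr = {1, 3, 5, 7, 9};
        for (int c = 1; c <= 4; c++) {
            Map<String, Integer> counts = outcomes(c, arr);
            int paths = 0;
            for (int v : counts.values()) paths += v;
            // Each line reads "case c: 120 paths, 120 distinct permutations"
            System.out.println("case " + c + ": " + paths + " paths, "
                    + counts.size() + " distinct permutations");
        }
    }
}
```

Assuming each `randInt` call is uniform, every path is equally likely, so one path per permutation means each permutation has probability 1/120.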
 If you have thought about shuffle algorithms before, you may have come up with the following, but **this version is wrong**:
 ```java
 void shuffle(int[] arr) {
    int n = arr.length;
    for (int i = 0; i < n; i++) {
        // Every time, pick an index at random from the
        // closed interval [0, n-1] and swap with it
        int rand = randInt(0, n - 1);
        swap(arr, i, rand); // exchange arr[i] and arr[rand]
    }
 }
 ```
 Now you can see why this is wrong: this version has $n^n$ possible outcomes, not $n!$, and for $n \ge 3$, $n^n$ is never an integer multiple of $n!$ (the factor $n-1$ divides $n!$ but shares no prime factor with $n$, so it cannot divide $n^n$).
 For example, with `arr = {1,2,3}` there should be $3! = 6$ equally likely results, but this version has $3^3 = 27$ equally likely outcomes. Since 27 is not divisible by 6, some permutations must be "favored", that is, they occur with higher probability, so the shuffle is not truly random.
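The same exhaustive replay makes the bias of the wrong version visible. This Java sketch (the class name `NaiveShuffleOutcomes` is hypothetical) tries all 27 choice sequences of the `randInt(0, n - 1)` loop on `{1, 2, 3}` and tallies the final arrays.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class NaiveShuffleOutcomes {
    // Replays the wrong loop: at step i, swap arr[i] with arr[rand] for every rand in [0, n-1]
    static void enumerate(int[] arr, int i, Map<String, Integer> counts) {
        int n = arr.length;
        if (i == n) {
            counts.merge(Arrays.toString(arr), 1, Integer::sum);
            return;
        }
        for (int rand = 0; rand < n; rand++) {
            int[] next = arr.clone();
            int tmp = next[i];
            next[i] = next[rand];
            next[rand] = tmp;
            enumerate(next, i + 1, counts);
        }
    }

    // Maps each permutation to how many of the n^n choice sequences produce it
    static Map<String, Integer> outcomes(int[] arr) {
        Map<String, Integer> counts = new TreeMap<>();
        enumerate(arr, 0, counts);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(outcomes(new int[]{1, 2, 3}));
        // prints {[1, 2, 3]=4, [1, 3, 2]=5, [2, 1, 3]=5, [2, 3, 1]=5, [3, 1, 2]=4, [3, 2, 1]=4}
    }
}
```

Three permutations come up 5 times out of 27 and the other three only 4 times, so their probabilities are 5/27 and 4/27 instead of 1/6.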
 Above, we explained the correctness criterion for shuffle algorithms intuitively, without a mathematical proof; I suspect most readers wouldn't bother with one anyway. For probability questions like this, we can use the Monte Carlo method for a simple check.
 ### 2. Verifying Correctness with the Monte Carlo Method
 The **correctness criterion** for a shuffle algorithm, or random permutation algorithm, is that **every possible outcome must occur with equal probability, that is, the result must be random enough.**
 If we don't prove rigorously that the probabilities are equal, we can use the Monte Carlo method to estimate approximately whether they are equal and whether the result is random enough.
 I remember a high-school math problem: dots are dropped at random into a square with a circle inscribed in it; given the total number of dots and the number that land inside the circle, compute Pi.
 ![Square](../pictures/Shuffle_Algorithm/4.png)
 This is exactly the Monte Carlo method: with enough dots, the number of dots in a region approximates its area. For a circle of radius $r$ inscribed in a square of side $2r$, the area ratio is $\pi r^2 / (2r)^2 = \pi / 4$, so Pi is roughly 4 times the fraction of dots inside the circle. Of course, the more dots you drop, the more accurate the estimate, a fine illustration of the saying that brute force works miracles.
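A minimal Java sketch of the dot-dropping experiment, assuming a square spanning [-1, 1] on both axes with the unit circle inscribed in it and `java.util.Random` as the source of random points (the class name `MonteCarloPi` and the fixed seed are illustrative choices):

```java
import java.util.Random;

class MonteCarloPi {
    // Drops `points` random dots into the square [-1, 1] x [-1, 1] and
    // returns 4 * (fraction of dots inside the inscribed unit circle)
    static double estimatePi(long points, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long k = 0; k < points; k++) {
            double x = 2 * rng.nextDouble() - 1;
            double y = 2 * rng.nextDouble() - 1;
            if (x * x + y * y <= 1) inside++;
        }
        // area(circle) / area(square) = (pi * 1 * 1) / (2 * 2) = pi / 4
        return 4.0 * inside / points;
    }

    public static void main(String[] args) {
        for (long points : new long[]{1_000L, 100_000L, 10_000_000L}) {
            System.out.println(points + " dots -> pi ~ " + estimatePi(points, 42L));
        }
    }
}
```

The exact digits depend on the seed, but the typical error shrinks roughly in proportion to 1 / sqrt(points), which is why more dots give a better estimate.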
 Similarly, we can shuffle the same array one million times, count how often each result occurs, and treat the frequencies as probabilities; that makes it easy to see whether the shuffle algorithm is correct. The overall idea is simple, but implementing it takes a few tricks, so below we briefly analyze several approaches.
 **The first idea** is to enumerate all the permutations of the array arr and make a histogram (assuming `arr = {1,2,3}`):
 ![Histogram](../pictures/Shuffle_Algorithm/5.jpg)
 After each run of the shuffle algorithm, add one to the count of the resulting permutation, and repeat this 1 million times. If every permutation occurs about the same number of times, each should be equally likely. Here is a sketch of this idea:
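 ```java
 import java.util.*;
 void shuffle(int[] arr); // the shuffle algorithm under test
 // Monte Carlo
 int N = 1000000;
 Map<String, Integer> count = new HashMap<>(); // used as the histogram
 for (int i = 0; i < N; i++) {
     int[] arr = {1, 2, 3};
     shuffle(arr);
     // arr is now shuffled; its string form, e.g. "[2, 1, 3]", is the key
     count.merge(Arrays.toString(arr), 1, Integer::sum);
 }
 for (int freq : count.values())
     System.out.print((double) freq / N + " "); // frequency
 ```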
 This test works, though some readers may ask: arr has n! permutations in total (n being the length of arr), so if n is large, won't the space complexity explode?
 Yes, but as a verification method we don't need a large n; trying an arr of length 5 or 6 is usually enough, because we only want to compare the probabilities to check correctness.
 **The second idea**: let arr be all 0s except for a single 1. Shuffle arr a million times and record how often the 1 lands at each index. If every index gets about the same count, that also suggests every shuffled result is equally likely. (Strictly speaking, this checks only where one element ends up, which is a necessary condition for a uniform shuffle rather than a sufficient one, but it makes a cheap sanity check.)
 ```java
 void shuffle(int[] arr); // the shuffle algorithm under test
 // Monte Carlo method
 int N = 1000000;
 int[] arr = {1, 0, 0, 0, 0};
 int[] count = new int[arr.length];
 for (int i = 0; i < N; i++) {
     shuffle(arr); // shuffle arr in place
     // find the index where the single 1 landed
     for (int j = 0; j < arr.length; j++)
         if (arr[j] == 1) {
             count[j]++;
             break;
         }
 }
 for (int freq : count)
     System.out.print((double) freq / N + " "); // frequency
 ```
 ![Histogram](../pictures/Shuffle_Algorithm/6.png)
 This idea also works. It avoids factorial space complexity at the cost of a nested for loop and slightly higher time complexity, but since our test data is small, these issues can be ignored.
 Careful readers may also notice that the two ideas declare arr in different places: one inside the for loop, the other outside it. The effect is the same: a correct shuffle produces every permutation with equal probability whatever order it starts from, so the order of arr doesn't matter as long as its elements stay the same.
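To see the Monte Carlo check catch a real bug, the following self-contained Java sketch runs the first idea (a histogram over whole permutations of `{1, 2, 3}`) against both the correct first case and the wrong version that always picks from `[0, n-1]`. The class and method names, the seed, and the use of `java.util.Random` are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class ShuffleMonteCarlo {
    static final Random RNG = new Random(2020); // fixed seed so runs are reproducible

    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    // Correct (first case): at step i, pick uniformly from [i, n-1]
    static void shuffle(int[] arr) {
        for (int i = 0; i < arr.length; i++) swap(arr, i, i + RNG.nextInt(arr.length - i));
    }

    // Wrong version: at every step, pick uniformly from [0, n-1]
    static void naiveShuffle(int[] arr) {
        for (int i = 0; i < arr.length; i++) swap(arr, i, RNG.nextInt(arr.length));
    }

    // First idea: histogram over whole permutations of {1, 2, 3}, returned as frequencies
    static Map<String, Double> frequencies(boolean naive, int trials) {
        Map<String, Integer> count = new TreeMap<>();
        for (int t = 0; t < trials; t++) {
            int[] arr = {1, 2, 3};
            if (naive) naiveShuffle(arr); else shuffle(arr);
            count.merge(Arrays.toString(arr), 1, Integer::sum);
        }
        Map<String, Double> freq = new TreeMap<>();
        for (Map.Entry<String, Integer> e : count.entrySet())
            freq.put(e.getKey(), (double) e.getValue() / trials);
        return freq;
    }

    public static void main(String[] args) {
        int N = 1_000_000;
        System.out.println("correct: " + frequencies(false, N));
        System.out.println("wrong:   " + frequencies(true, N));
    }
}
```

With the correct shuffle, each of the six frequencies should sit near 1/6, about 0.167; the wrong version should show some permutations noticeably more frequent than others (exactly 5/27 versus 4/27 for this input). Note that the second idea, tracking only a 1 that starts at index 0, would not catch this particular bias for n = 3: under the wrong version that element still lands on each index with probability 1/3.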
 ### 3. Final summary
 The first part of this article introduced the shuffle algorithm (random permutation algorithm), used a simple analysis technique to show that its four forms are correct, and analyzed a common incorrect version. You should now be able to write a correct shuffle algorithm.
 The second part gave the correctness criterion for shuffle algorithms: every random outcome must occur with equal probability. Without a rigorous mathematical proof, we can let the Monte Carlo method's brute force roughly verify that the algorithm is correct. There are different ways to apply the Monte Carlo method, but they need not be very strict, since we only want a simple check.
--- a/think_like_computer/洗牌算法.md
+++ b/think_like_computer/洗牌算法.md
@@ -1,186 +0,0 @@
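 # Shuffle Algorithm
 I know everyone is comfortable with all kinds of fancy sorting algorithms, but if you were asked to shuffle an array, would you be confident you could do it? Even if you come up with an algorithm off the top of your head, how do you prove that it is correct? A shuffle algorithm is not like a sorting algorithm, whose unique result is easy to check: a "shuffled" array can come out in many different orders, so how can you prove that your algorithm is "truly random"?
 So we face two questions:
 1. What does "truly random" mean?
 2. How should we design an algorithm that shuffles an array so that the result is "truly random"?
 Such an algorithm is called a "random permutation algorithm" or a "shuffle algorithm".
 This article has two parts. The first part explains the most commonly used shuffle algorithm in detail. Its details are easy to get wrong, and it comes in several variants that differ slightly but are all correct, so this article introduces a simple, general way of thinking that guarantees you write a correct shuffle algorithm. The second part explains how to use the "Monte Carlo method" to check whether our shuffled results are truly random. The idea behind the Monte Carlo method is not difficult, but there are several ways to implement it, each with its own features.
 ### 1. Shuffle Algorithm
 Algorithms of this kind all obtain randomness by swapping randomly selected elements. Let's go straight to the code (pseudocode). The algorithm comes in 4 forms, all of which are correct: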
 ```java
 // Returns a random integer in the closed interval [min, max]
 int randInt(int min, int max);
 // First version
 void shuffle(int[] arr) {
    int n = arr.length();
    /**** The only difference is these two lines ****/
    for (int i = 0 ; i < n; i++) {
        // Randomly pick an element from i to the last one
        int rand = randInt(i, n - 1);
        /*************************/
        swap(arr[i], arr[rand]);
    }
 }
 // Second version
    for (int i = 0 ; i < n - 1; i++)
        int rand = randInt(i, n - 1);
 // Third version
    for (int i = n - 1 ; i >= 0; i--)
        int rand = randInt(0, i);
 // Fourth version
    for (int i = n - 1 ; i > 0; i--)
        int rand = randInt(0, i);
 ```
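 **The criterion for analyzing the correctness of a shuffle algorithm: the results it produces must have n! possibilities, otherwise it is wrong.** This is easy to explain: an array of length n has n! permutations, which means there are n! possible shuffled results in total. An algorithm must reflect this fact to be correct.
 Let's first use this criterion to analyze the correctness of **the first version**: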
 ```java
 // Suppose an arr like this is passed in
 int[] arr = {1,3,5,7,9};
 void shuffle(int[] arr) {
    int n = arr.length(); // 5
    for (int i = 0 ; i < n; i++) {
        int rand = randInt(i, n - 1);
        swap(arr[i], arr[rand]);
    }
 }
 ```
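 In the first iteration of the for loop, `i = 0`, and `rand` ranges over `[0, 4]`: 5 possible values.
 ![First iteration](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/1.png)
 In the second iteration of the for loop, `i = 1`, and `rand` ranges over `[1, 4]`: 4 possible values.
 ![Second iteration](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/2.png)
 And so on, until the last iteration, where `i = 4` and `rand` ranges over `[4, 4]`: only 1 possible value.
 ![Last iteration](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/3.png)
 As you can see, the whole process produces `n! = 5! = 5*4*3*2*1` possible results in total, so the algorithm is correct.
 Now analyze **the second version**: the iterations are the same as before, just one fewer. So in the last iteration, `i = 3` and `rand` ranges over `[3, 4]`: 2 possible values.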
 ```java
 // Second version
 // arr = {1,3,5,7,9}, n = 5
    for (int i = 0 ; i < n - 1; i++)
        int rand = randInt(i, n - 1);
 ```
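 So the whole process still produces `5*4*3*2 = 5! = n!` possible results, since multiplying by 1 makes no difference. This version is therefore also correct.
 If you understand all of the above, you will see that **the third version** is just the first version iterating the array from back to front, and **the fourth version** is the second version run from back to front. So they are both correct.
 If you have thought about shuffle algorithms before, you may have come up with the following algorithm, but **this version is wrong**: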
 ```java
 void shuffle(int[] arr) {
    int n = arr.length();
    for (int i = 0 ; i < n; i++) {
        // Each time, randomly pick an element from the
        // closed interval [0, n-1] to swap with
        int rand = randInt(0, n - 1);
        swap(arr[i], arr[rand]);
    }
 }
 ```
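 Now you should see why this version is wrong: it produces $n^n$ possible results rather than $n!$, and (for $n \ge 3$) $n^n$ cannot be an integer multiple of $n!$.
 For example, with `arr = {1,2,3}` there should be $3! = 6$ possible results, but this version produces $3^3 = 27$ possible results in total. Since 27 is not divisible by 6, some cases must be "favored", that is, some results occur with higher probability, so this shuffle does not count as "truly random".
 Above, we explained the correctness criterion of the shuffle algorithm intuitively, without a mathematical proof; I suspect nobody feels like proving it anyway. For probability problems, we can use the "Monte Carlo method" for a simple check.
 ### 2. Verifying Correctness with the Monte Carlo Method
 The **correctness criterion** of a shuffle algorithm, or random permutation algorithm, is: **every possible result must occur with equal probability, that is, the result must be random enough.**
 If we don't prove rigorously that the probabilities are equal, we can use the Monte Carlo method to estimate approximately whether they are equal and whether the result is random enough.
 I remember a high-school math problem: dots are dropped at random into a square with a circle inscribed in it; given the total number of dots and the number that land inside the circle, compute Pi.
 ![Square](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/4.png)
 This is actually an application of the Monte Carlo method: when enough dots are dropped, the number of dots approximates the area of a shape. Using the area formulas, Pi is easy to derive from the ratio of the square's area to the circle's. Of course, the more dots you drop, the more accurate the computed Pi, which nicely illustrates the saying that brute force works miracles.
 Similarly, we can shuffle the same array a million times, count how often each result occurs, and treat the frequencies as probabilities; that makes it easy to see whether the shuffle algorithm is correct. The overall idea is simple, but implementing it takes a few tricks; below we briefly analyze several approaches.
 **The first approach**: list all permutations of the array arr and make a histogram (assume `arr = {1,2,3}`):
 ![Histogram](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/5.jpg)
 After each run of the shuffle algorithm, add one to the count of the resulting permutation, and repeat 1 million times. If every result occurs about the same number of times, each result should be equally likely. Here is pseudocode for this approach: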
 ```java
 void shuffle(int[] arr);
 // Monte Carlo
 int N = 1000000;
 HashMap count; // used as the histogram
 for (i = 0; i < N; i++) {
    int[] arr = {1,2,3};
    shuffle(arr);
    // arr has now been shuffled
    count[arr] += 1;
 }
 for (int feq : count.values())
    print(feq / N + " "); // frequency
 ```
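 This check works, though some readers may ask: arr has n! permutations in total (n being the length of arr), so if n is large, won't the space complexity explode?
 Yes, but as a verification method we don't need a large n; trying an arr of length 5 or 6 is generally enough, because we only want to compare probabilities to check correctness.
 **The second approach**: suppose the arr array is all 0s except for a single 1. Shuffle arr 1 million times and record how many times the 1 appears at each index. If every index gets about the same count, that also suggests every shuffle result is equally likely.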
 ```java
 void shuffle(int[] arr);
 // Monte Carlo method
 int N = 1000000;
 int[] arr = {1,0,0,0,0};
 int[] count = new int[arr.length];
 for (int i = 0; i < N; i++) {
    shuffle(arr); // shuffle arr
    for (int j = 0; j < arr.length; j++)
        if (arr[j] == 1) {
            count[j]++;
            break;
        }
 }
 for (int feq : count)
    print(feq / N + " "); // frequency
 ```
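 ![Histogram](../pictures/%E6%B4%97%E7%89%8C%E7%AE%97%E6%B3%95/6.png)
 This approach also works, and it avoids factorial space complexity, at the cost of a nested for loop and slightly higher time complexity. Since our test data is small, though, these issues can be ignored.
 Careful readers may also notice that the two approaches declare arr in different places: one inside the for loop, the other outside it. The effect is the same, because our algorithm always shuffles arr, so the order of arr doesn't matter as long as its elements stay the same.
 ### 3. Final Summary
 The first part of this article introduced the shuffle algorithm (random permutation algorithm), used a simple analysis technique to show that its four forms are correct, and analyzed a common incorrect version. You should now be able to write a correct shuffle algorithm.
 The second part gave the correctness criterion for shuffle algorithms: every random result must occur with equal probability. If we don't use a rigorous mathematical proof, we can let the Monte Carlo method work its brute-force miracles and roughly verify that the algorithm is correct. There are different ways to apply the Monte Carlo method, but they need not be very strict, since we only want a simple check.
 I'm committed to writing original, high-quality articles that explain algorithm problems clearly. Follow my WeChat official account labuladong for the latest articles: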
 ![labuladong](../pictures/labuladong.jpg)