mirror of
https://github.com/labuladong/fucking-algorithm.git
synced 2025-07-05 03:36:39 +08:00
issue 177 find duplicate and missing elements translation
@ -32,7 +32,7 @@ This command specifies the `english` branch and limit the depth of clone, get ri
* [Union-Find Application](think_like_computer/Union-Find-Application.md)
* [Find Subsequence With Binary Search](interview/findSebesquenceWithBinarySearch.md)
* [Problems that can be solved in one line](interview/one-line-code-puzzles.md)
* [How to Find Missing and Duplicate Elements](interview/缺失和重复的元素.md)
* [How to Find Duplicate and Missing Elements](interview/Find-Duplicate-and-Missing-Element.md.md)
* [How to Check for a Palindrome Linked List](interview/判断回文链表.md)

* II. Data Structure
interview/Find-Duplicate-and-Missing-Element.md
Normal file, 115 lines
@ -0,0 +1,115 @@
# How to Find Duplicate and Missing Elements

**Translator: [bryceustc](https://github.com/bryceustc)**

**Author: [labuladong](https://github.com/labuladong)**

Today we are going to talk about a problem that looks simple but takes some skill: finding the duplicate and missing elements. It is similar to the previous problem [How to Find Missing Elements](./missing_elements.md), but the two problems call for different techniques.

Here is the detailed description of the problem (LeetCode 645: Set Mismatch):

The set ``S`` originally contains the numbers from 1 to ``n``. Unfortunately, due to a data error, one of the numbers in the set got duplicated to **another** number in the set, which results in the repetition of one number and the loss of another.

You are given an array ``nums`` representing the state of this set after the error. Your task is to find the number that occurs twice and the number that is missing, and return them, in that order, in the form of an array.

**Example 1:**

```
Input: nums = [1,2,2,4]
Output: [2,3]
```

Actually, this problem is easy to solve. First, traverse the whole `nums` array and use a HashMap to record how many times each element occurs. Then go through every number from `1` to `n` and check its count in the map: the number that appears twice is the duplicate, and the number that does not appear is the missing one.

But there is a catch. This solution requires a HashMap, which means the space complexity is O(n). Look at the conditions again: among the numbers from `1` to `n`, exactly one is duplicated and exactly one is missing. When the conditions are this neat, there is usually a trick hiding behind them.

Traversing the `nums` array takes O(n) time, and that is unavoidable. So let's think about how to save space: can we find the duplicate and missing elements with O(1) space complexity?
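
For reference, the HashMap baseline described above might look like the following. This is only a minimal sketch: the function name `findErrorNumsWithMap` is made up for illustration, and it assumes the input really contains the numbers `1` to `n` with exactly one duplicate and one missing value.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical baseline: O(n) time, O(n) extra space.
std::vector<int> findErrorNumsWithMap(const std::vector<int>& nums) {
    int n = nums.size();
    // Count how many times each value occurs in nums
    std::unordered_map<int, int> count;
    for (int v : nums)
        count[v]++;

    int dup = -1, missing = -1;
    // Check every value that should be present
    for (int v = 1; v <= n; v++) {
        auto it = count.find(v);
        if (it == count.end())
            missing = v;        // never seen
        else if (it->second == 2)
            dup = v;            // seen twice
    }
    return {dup, missing};
}
```

Calling `findErrorNumsWithMap({1, 2, 2, 4})` yields `{2, 3}`, matching Example 1. The map is exactly the O(n) space that the rest of the article tries to avoid.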
### Analysis

The characteristic of this problem is that each element has a certain correspondence with an array index.

Let's change the conditions of the problem temporarily: suppose the elements in the ``nums`` array are ``[0..N-1]``. Then each element corresponds exactly to an array index, which makes things easier to understand.

If there were no duplicate or missing elements in the array, each element would correspond to a unique index.

But now one number is repeated, which results in the loss of another number. What happens then? Two elements will correspond to the same index, and one index will have no element corresponding to it.

So if we can somehow find the index that is mapped twice, we have found the duplicate element; and if we can find the index that no element maps to, we have found the missing element.

So how do we determine how many elements correspond to an index without using extra space? This is the subtle part of the problem.

**By turning the element at each mapped index into a negative number, we record that this index has been mapped once.**

![experiment](../pictures/dupmissing/1.gif)

If there is a duplicate element `4`, the visible result is that the element at index `4` is already negative when we reach it the second time.

![experiment](../pictures/dupmissing/2.jpg)

For the missing element `3`, the visible result is that the element at index `3` is still positive.

![experiment](../pictures/dupmissing/3.jpg)

Therefore, we can write the code as follows:

```c++
#include <cstdlib>
#include <vector>
using namespace std;

// Simplified version: assumes the elements of nums are in [0..N-1]
vector<int> findErrorNums(vector<int>& nums) {
    int n = nums.size();
    int dup = -1;
    for (int i = 0; i < n; i++) {
        // nums[i] may already have been negated, so take its absolute value
        int index = abs(nums[i]);
        // nums[index] < 0 means this index was mapped before: duplicate found
        if (nums[index] < 0)
            dup = abs(nums[i]);
        else
            nums[index] *= -1;
    }

    int missing = -1;
    for (int i = 0; i < n; i++)
        // nums[i] > 0 means index i was never mapped: missing element found
        if (nums[i] > 0)
            missing = i;

    return {dup, missing};
}
```

Now the problem is basically solved. But don't forget that we assumed the elements in the ``nums`` array range from `0` to `N-1`. In the actual problem they range from `1` to `N`, so we need to modify two places to get the right answer to the original question.

```c++
#include <cstdlib>
#include <vector>
using namespace std;

vector<int> findErrorNums(vector<int>& nums) {
    int n = nums.size();
    int dup = -1;
    for (int i = 0; i < n; i++) {
        // Now elements start at 1, so shift by one to get the index
        int index = abs(nums[i]) - 1;
        // nums[index] < 0 means this index was mapped before: duplicate found
        if (nums[index] < 0)
            dup = abs(nums[i]);
        else
            nums[index] *= -1;
    }

    int missing = -1;
    for (int i = 0; i < n; i++)
        // nums[i] > 0 means index i was never mapped
        if (nums[i] > 0)
            // Convert the index back to an element
            missing = i + 1;

    return {dup, missing};
}
```

In fact, it makes sense for the elements to start from `1`; they must start from a non-zero number. If the elements started from `0`, negating `0` would still give `0`, so a slot that holds `0` could never be marked as visited, and the algorithm could not handle the case where `0` is repeated or missing. Our previous assumption was only meant to simplify the problem and make it easier to understand.
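
To make the zero issue concrete, here is a small sketch that repeats the zero-based version under a hypothetical name, `findErrorNumsZeroBased`, taking the array by value so the caller's data is left untouched. The inputs are made-up examples, not part of the original problem.

```cpp
#include <cstdlib>
#include <vector>

// Zero-based variant (elements assumed to be in [0..N-1]); illustration only.
std::vector<int> findErrorNumsZeroBased(std::vector<int> nums) {
    int n = nums.size();
    int dup = -1;
    for (int i = 0; i < n; i++) {
        int index = std::abs(nums[i]);
        if (nums[index] < 0)
            dup = std::abs(nums[i]);
        else
            nums[index] *= -1;   // has no effect when nums[index] == 0
    }

    int missing = -1;
    for (int i = 0; i < n; i++)
        if (nums[i] > 0)         // a slot holding 0 never passes this test
            missing = i;

    return {dup, missing};
}
```

With the input `{0, 0, 2}` (where `0` is duplicated and `1` is missing), this sketch returns `{-1, -1}`: index `0` holds a `0`, negating it changes nothing, so the second visit is not detected, and index `1` also holds a `0`, so it is never reported as unvisited. With `{1, 2, 2}`, where no slot holds `0`, it returns the expected `{2, 0}`.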
### Summary

**For this kind of problem, the key point is that elements and indexes appear in pairs. Common methods include Sorting, XOR, and Map.**

The Map idea is the analysis above: map each element to an index, and use the sign to record whether an index has been mapped.

The Sorting method is also easy to understand. Imagine that all the elements are sorted from smallest to largest: wherever an element does not match its index, we can find the duplicate and missing elements.
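
One possible way to turn the Sorting idea into code is sketched below. The name `findErrorNumsBySorting` is hypothetical, and the sketch assumes `nums` has at least two elements and follows the problem's constraints. Instead of comparing against indexes directly, it compares neighbours after sorting, which amounts to the same check.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sorting-based variant: O(n log n) time.
// Takes a copy so the caller's array is not reordered.
std::vector<int> findErrorNumsBySorting(std::vector<int> nums) {
    std::sort(nums.begin(), nums.end());
    int n = nums.size();
    int dup = -1;
    int missing = 1;  // stays 1 if no gap is found and the last element is n
    for (int i = 1; i < n; i++) {
        if (nums[i] == nums[i - 1])
            dup = nums[i];               // equal neighbours: the duplicate
        else if (nums[i] > nums[i - 1] + 1)
            missing = nums[i - 1] + 1;   // a gap between neighbours: the missing value
    }
    if (nums[n - 1] != n)
        missing = n;                     // the largest value is the one that is missing
    return {dup, missing};
}
```

For `{4, 2, 1, 2}` the sorted copy is `{1, 2, 2, 4}`, and the function returns `{2, 3}`. Sorting costs more time than the sign-marking approach, and this sketch also copies the array, so it serves mainly as a comparison point.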

XOR is also commonly used. The XOR operation (`^`) has a special property: a number XOR itself is 0, and a number XOR 0 is itself, that is, ``a ^ a = 0, a ^ 0 = a``. If we XOR all the indexes and elements together, every paired index and element cancels out, and what remains comes from the duplicate and missing elements. See the previous article [Find Missing Elements](./missing_elements.md), which introduces this method.
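
The article does not spell out the XOR version for this particular problem, so the following is only one possible sketch, with a hypothetical name `findErrorNumsByXor`. XOR-ing every element with every value in `1..n` leaves `dup ^ missing`. The sketch then splits the numbers into two groups by the lowest bit in which the two differ, and finally scans the array to decide which of the two results is the duplicate.

```cpp
#include <vector>

// Hypothetical XOR-based variant: O(n) time, O(1) extra space, input not modified.
std::vector<int> findErrorNumsByXor(const std::vector<int>& nums) {
    int n = nums.size();

    // Paired values cancel out, leaving dup ^ missing
    int x = 0;
    for (int i = 0; i < n; i++)
        x ^= nums[i] ^ (i + 1);

    // Lowest set bit of x: a bit where dup and missing differ
    int bit = x & -x;

    // XOR each group separately; each group ends up holding dup or missing
    int a = 0, b = 0;
    for (int i = 0; i < n; i++) {
        if (nums[i] & bit) a ^= nums[i]; else b ^= nums[i];
        if ((i + 1) & bit) a ^= (i + 1); else b ^= (i + 1);
    }

    // Whichever of the two actually occurs in nums is the duplicate
    for (int v : nums)
        if (v == a)
            return {a, b};
    return {b, a};
}
```

For `{1, 2, 2, 4}` the first pass gives `2 ^ 3 = 1`, the two groups reduce to `3` and `2`, and since `3` does not occur in the array the function returns `{2, 3}`.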

_We can stop here for now._
@ -1,112 +0,0 @@
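Today let's talk about a problem that looks simple but is quite ingenious: finding the missing and duplicate elements. An earlier article, "Finding the Missing Element", covered a similar problem, but this one uses a different technique.

This is LeetCode problem 645. Let me describe it:

You are given an array `nums` of length `N` that originally held the `N` elements `[1..N]` in no particular order. An error has occurred, and one element in `nums` is now duplicated, which also means that another element is missing. Write an algorithm that finds the values of the duplicate element and the missing element in `nums`.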

```cpp
// Returns two numbers: {dup, missing}
vector<int> findErrorNums(vector<int>& nums);
```
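
For example, given the input `nums = [1,2,2,4]`, the algorithm returns `[2,3]`.

Actually, this problem is easy to solve: traverse the array once and use a hash table to record how many times each number appears, then go through `[1..N]` once to see which element appears twice and which does not appear at all, and that's it.

The catch is that this conventional solution needs a hash table, that is, O(N) space complexity. Look at how neat the given conditions are: among the numbers `[1..N]`, exactly one is duplicated and exactly one is missing. **When something looks this unusual, there must be a trick behind it**, right?

Traversing the array in O(N) time is unavoidable, so let's think about how to reduce the space complexity: can we find the duplicate and missing elements in O(1) space?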
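### Analysis

The characteristic of this problem is that each element has a certain correspondence with an array index.

Let's reshape the problem ourselves: **for now, suppose the elements in `nums` are `[0..N-1]`, so that each element corresponds exactly to an array index, which is easier to understand**.

If `nums` had no duplicate or missing elements, each element would correspond to exactly one index, right?

The problem now is that one element is duplicated, which also causes one element to be missing. What does that lead to? **Two elements will correspond to the same index, and one index will have no element corresponding to it**.

So if I can somehow find the index that is mapped twice, haven't I found the duplicate element? And if I find the index that no element maps to, haven't I found the missing element?

So how can we tell how many elements correspond to an index without using extra space? This is the clever part of the problem:

**By turning the element at each mapped index into a negative number, we record that this index has been mapped once**:

![experiment](../pictures/缺失元素/1.gif)

If there is a duplicate element `4`, the visible result is that the element at index `4` is already negative:

![experiment](../pictures/缺失元素/2.jpg)

For the missing element `3`, the visible result is that the element at index `3` is positive:

![experiment](../pictures/缺失元素/3.jpg)

We can translate this observation into code: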

```cpp
vector<int> findErrorNums(vector<int>& nums) {
    int n = nums.size();
    int dup = -1;
    for (int i = 0; i < n; i++) {
        int index = abs(nums[i]);
        // nums[index] < 0 means the index has been visited before
        if (nums[index] < 0)
            dup = abs(nums[i]);
        else
            nums[index] *= -1;
    }

    int missing = -1;
    for (int i = 0; i < n; i++)
        // nums[i] > 0 means the index was never visited
        if (nums[i] > 0)
            missing = i;

    return {dup, missing};
}
```
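
The problem is now basically solved. But don't forget that, to simplify the analysis, we assumed the elements are `[0..N-1]`, while the problem requires `[1..N]`. We only need to change two places to get the answer to the original problem: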

```cpp
vector<int> findErrorNums(vector<int>& nums) {
    int n = nums.size();
    int dup = -1;
    for (int i = 0; i < n; i++) {
        // Elements now start from 1
        int index = abs(nums[i]) - 1;
        if (nums[index] < 0)
            dup = abs(nums[i]);
        else
            nums[index] *= -1;
    }

    int missing = -1;
    for (int i = 0; i < n; i++)
        if (nums[i] > 0)
            // Convert the index back to an element
            missing = i + 1;

    return {dup, missing};
}
```
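
In fact, it makes sense for the elements to start from 1; they must start from a non-zero number. If the elements started from 0, the negation of 0 would still be 0, so if the number 0 were duplicated or missing, the algorithm could not tell whether 0 had been visited. Our earlier assumption was only meant to simplify the problem and make it easier to follow.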
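### Final Summary

For this kind of array problem, **the key point is that elements and indexes come in pairs; the common methods are sorting, XOR, and mapping**.

The mapping idea is the analysis we just went through: map each index to an element, and use the sign to record whether an element has been mapped.

The sorting method is also easy to understand. For this problem, imagine that the elements are sorted from smallest to largest: wherever an element does not match its index, we can find the duplicate and missing elements.

XOR is also commonly used. Because of the XOR properties `a ^ a = 0, a ^ 0 = a`, XOR-ing the indexes and elements together eliminates every paired index and element, and what is left comes from the duplicate or missing element. See the earlier article "Finding the Missing Element", which introduces this method.

Committed to original, high-quality articles that explain algorithm problems clearly. Follow my WeChat official account labuladong for the latest articles:

![labuladong](../pictures/labuladong.jpg)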