Find Duplicate Number and Missing Number

In this post, we will discuss two famous interview problems, Duplicate Number and Missing Number. From the first problem we will learn some concepts, and then we will apply it to solve a Leetcode hard problem. So get ready to nail some concepts.

Leetcode 287: Find Duplicate Number

Given an array nums[] containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]
Output: 2

Example 2:

Input: [3,1,3,4,2]
Output: 3

Note:
(i) You must not modify the array (assume the array is read only).
(ii) You must use only constant, O(1) extra space.
(iii) Your runtime complexity should be less than O(n^2).
(iv) There is only one duplicate number in the array, but it could be repeated more than once.

Analysis of Problem Statement

The Problem Statement is very simple, you are given a dataset with (n+1) integers where each integer is between 1 and n (both inclusive). Also there is one duplicate integer present in the dataset and you have to find the duplicate one.

This problem seems cakewalk but within constraints it is a bit tricky one.

We will discuss each and every solution of it but it is highly recommended that please try it once. Don’t think of constraints much. Just go for any solution and from that we will learn efficient one.

Naive Method

The easiest method is to sort the dataset and return the consecutive repeated integer. Let’s see the code,

``````int findDuplicate(vector<int>& nums) {
sort(nums.begin(), nums.end());
int res;
for(int i = 0; i < nums.size()-1; i++)
{
if(nums[i] == nums[i+1]) {
res = nums[i];
break;
}
}
return res;
}``````

The above method takes O(log n) time complexity and O(1) space complexity.

Hash map based Solution

The idea is to use a hash map to store the occurrence of the integers, the integer which occurred twice is our desired result.

Let’s see it’s code too,

``````int findDuplicate(vector<int>& nums) {
unordered_map<int,int> m;
int res;
for(int i = 0; i < nums.size(); i++)
{
if(m.count(nums[i]) != 0)
{
res = nums[i];
break;
}
m[nums[i]] = 1;
}
return res;
}``````

The above solution takes O(n) time complexity and O(n) space complexity.

Can you solve it by taking constant space complexity ?

Efficient Solution O(1) space complexity

In the previous Solution, we used hash-map to store the occurrence of the integers and since we are using a hash-map, the space complexity increases to O(n).

For constant space complexity solution, we have to perform the same task without using a hash-map or any other data structures.

First of all we have to understand the actual role of a hash-map.

What is the main role of hash-map in the above solution..?

Hash-Map is actually giving us the information whether we have visited that integer before or not.

If we have visited it before, then that integer is our desired result. Hash-map is actually helping us for finding this information.

For constant space complexity solution, we have to get this information without using any extra space.

For that, we have to modify the given array such that it can give us that information.

Let us note it what are the information we have currently,

(i) Elements are between 1 and n.
(ii) Index Number of the array.

Yes only these information are available to us.

I will also give you a hint, use the index number as your weapon in constant space complexity solution.

So, I would recommend you to think of a way to modify the array such that we can know that whether we have visited that element before or not. Use the hint wisely.

Logic Behind constant space complexity Solution

The main logic behind constant space complexity solution is to transform the element present at index of current element to its negative form, so that whenever we find any duplicate of that element again, the negativity form of that element present at index of current element will in sure us that we have visited that element before and in this way we will get our desired result.

Let’s understand this concept more clearly by taking an example,

This was the array we have in the first example. We will start traversing from 0th index.

Since element at 0th index is 1, we will update index(1-1) to its negative form (1 to -1) and we will move ahead.

Since element at 0th index is in negative form, whenever we visit 1 again, the negativity form element at index (1-1) will insure us that this is not the first time we have visited 1, we have visited it before and hence it is the duplicate element.

Now element at 1st index is 3, we will transform element at index (3-1) to its negative form (4 to -4) as well and we will move ahead.

The conversion to negative form insures us that we have visited (index+1) and if we visit it again, that (index+1) will be our desired result.

Like in above example, element at index 2 is negative which makes us sure that we have visited (2+1) = 3 (index+1) before.

Hope this makes you understand what I mean to say. Now let’s move ahead.

Now we are at index 2, element is -4. So now we will again transform element at index (4-1) to its negative form (2 at index 3 to -2).

NOTE element at index 2 is -4, but we will transform element at index (+4) – 1, we will take positive part of it.

Now we are index 3 and element is -2, we will again transform the element at index (2-1) to its negative form (3 to -3) and we will move ahead.

Now we are at index 5, and element is 2, we will again try to transform the element at index (2-1) but this time element at index (2-1) is already negative (at index 1, 3 is already in negative form) which verifies us that we have already visited element 2 before, and this is the duplicate element which we were searching for.

Understand the logic, when we have visited element 2 at index 3, we updated the element at index (2-1) to its negative for keeping a proof that we have visited 2.

Now when we again visit 2 at index 4, we again wanted to update the element at index (2-1) to its negative form but the number at index (2-1) is already negative which insures us that we have visited (2) before.

Hope this above explanation is clear to you, now let’s see the code for better understanding.

``````int findDuplicate(vector<int>& nums) {

int n = nums.size();
int res = 0;

for(int i = 0; i < n; i++)
{
if(nums[abs(nums[i])-1] < 0)
{
res = abs(nums[i]);
break;
}
nums[abs(nums[i]) - 1] = (-1) * nums[abs(nums[i]) - 1];
}
return res;
}``````

The code is simple, I have written the same logic which we have discussed earlier. Hope this code and concept is clear to you. If it is not cleared please read it again. This is the main part of our post.

Remember the hashing based solution, there also we store visited element in a map as a proof that we have already visited it.

Here, we are keeping the negativity form of the element at index (current – 1) as a proof that we have visited it.

Time Complexity of this efficient solution is O(n) and space complexity is O(1).

Great, you have learnt a new concept today and I hope you have understand the intuition behind it. Now tight your hand guys because we are going to solve our next Question based on the logic we have just discussed.

Leetcode 21: First Missing Positive

Given an unsorted integer array, find the smallest missing positive integer.

Example 1:

Input: [1,2,0]
Output: 3

Example 2:

Input: [3,4,-1,1]
Output: 2

Example 3:

Input: [7,8,9,11,12]
Output: 1

Note: Your algorithm should run in O(n) time and uses constant extra space.

Analysis of Problem Statement

You are given a set of unsorted integers consisting of both positive and negative integers and you have to find the smallest missing positive integer.

NOTE the datasets may contain duplicate values.

The Problem is as simple as that and believes me this problem is a HARD Leetcode Problem and this is also one of the most frequently asked interview Problems.

Let us discuss the examples,

The output of the above example is 1. And it’s obvious because smallest positive number is 1 and it is not present in the above dataset.

NOTE positive number starts from 1, don’t consider 0 as a positive number.

The output of above example is 3 because 3 is the smallest missing positive number.

Examples are very simple and I hope it is clear to you.

It is highly recommended please try to solve this problem and try to solve within given constraints. We have already discussed one problem, try to solve it with that intuition.

Go for it and don’t worry, even I couldn’t able to solve this problem on the first try.

Naive Method

The first thing which comes to our mind is to sort the array and search for the first missing positive integer. And yes, it is one of the way to solve this problem.

NOTE that we can neglect negative numbers because we don’t care about them.

Let’s discuss its code,

``````int firstMissingPositive(vector<int>& nums) {

int n = nums.size(), k = 1;
sort(nums.begin(), nums.end());

for(int i = 0; i < n; ++i)
{
if(nums[i] > 0 && nums[i] < k) continue;

else if(nums[i] > 0 && nums[i] != k) return k;

else if(nums[i] > 0 && nums[i] == k) k++;
}

return k;
}``````

Hope the above code is clear to you. NOTE dataset may contain duplicate values, so you should consider this case before writing any code.

The time complexity of the above code is O(log n) and space complexity is constant.

But the problem asks us to run the algorithm in O(n) time complexity.

Hashing based Solution

We can even declare a data structure for storing all positive values up to size n, so that later we can iterate over that hash data structure to find the smallest missing positive integer.

Let’s see the code,

``````int firstMissingPositive(vector<int>& nums) {

int n = nums.size();
vector<int> res(nums.size(), -1);

for(int i = 0; i < n; ++i)
{
if(nums[i] > 0 && nums[i] <= n) res[nums[i]-1] = 1;
}

for(int i = 0; i < n; ++i)
{
if(res[i] == -1) return (i+1);
}

return n + 1;
}``````

Here we used a vector for hashing. I think code doesn’t need any explanation. It’s a simple hashing based solution.

Time Complexity of the above solution is O(n) and space complexity is O(n) as well.

But the Problems ask us to solve the problem in O(n) time and constant space complexity.

Can you reduce the space complexity of the above solution ??

Efficient Solution for First Missing Number (MasterStroke)

The approach of this solution is similar to the efficient solution of Find Duplicate Number problem which we have discussed earlier.

In this problem the elements of the dataset can be negative, it can have duplicate elements as well but we are only concerned about positive elements of the dataset because we have to find smallest positive missing number.

Before that let me ask you one question, what can be the range of the missing number ?

Yes, if you observe carefully, the missing number will be always between the range [1,n+1] where n is the size of the dataset. Isn’t it ?

Let us take us example,

This was our first example, here missing number is 2 which is in the range [1, 5].

In this example missing number is 1, which is also in the range [1, 6].

Here also missing number 3 is in range [1, 6].

This is obvious that missing number will be in the range [1, n+1]. But why ??

Let think of a worst case scenario, in the example below all positive integers up to n are present in the dataset, that’s why the output of it will be (n+1) which is 6.

So, in worst case always (n+1) will be the answer and this is the reason, missing number will always be in the range [1,n+1].

Hope this logic is clear to you. This will definitely help you a lot solving this problem. Now let’s move ahead.

Now since we don’t bother about negative integer, we can safely convert it into (n+2) because we only cares about the range [1, n+1].

Remaining steps are same as what we discussed in the earlier problem.

So now let’s see the code first,

``````int firstMissingPositive(vector<int>& nums) {

int n = nums.size();

for(int i = 0; i < n; ++i) {
if(nums[i] <= 0) nums[i] = n + 2;
}

for(int i = 0; i < n; ++i) {
if(abs(nums[i]) <= n)
{
nums[abs(nums[i]) - 1] = -abs(nums[abs(nums[i]) - 1]);
}
}

for (int i = 0; i < n; ++i) {
if (nums[i] > 0) return i + 1;
}
return n + 1;
}``````

Remaining steps are same, we will run a loop and we will check whether any integer is smaller than ‘n’ or not. If found, let say x then we will transform the element present at index (x-1) to its negative form.

Let’s discuss some example to understand,

After 1st step, this will array will get change to,

Now initially, element at 0th index is 3, so we will update element at index (3-1) to its negative form.

Element at 1st index is 4, so we will update element at index (4-1) to its negative form and then we will move ahead.

Now, Element at 2nd index is 5 (we will consider positive part of it), 5 is greater than 4, so we will not do anything in this iteration, we will simply move to the next index.

Element at the 3rd index is 1, so we will update the
element at index (1-1) to its negative form.

This will be our final array, at last step we will check whether any integer is positive or not, if found its (index+1) will be the our desired result because in our previous iteration we didn’t counter that element in the array.

If we have countered 2, then definitely the element at index (2-1) would have been negative.

Hope you are getting the logic. This way we get our desired result.

Let’s take another example,

As you can see in the above example, every element is in negative form, it says that we countered every integer from [1,n]. In such cases, (n+1) will be our desired solution.

Hope this explanation is clear to you. This intuition is very much similar to previous problem.

We have discussed two problems based on same concept. In both the problems we tried to use the property of hashing without using any hashing data structures.

This is the concept which I wanted you to learn.

Now I hope whenever you are asked to solve a problem based on hashing in O(n) time and constant space complexity, you can definitely think about this approach.

That’s all mates..!!!

Abhishek is currently pursuing CSE from Heritage Institute of Technology, Kolkata. He has a great interest in Data Structures and Algorithms, C++, Language, Competitive Coding, Android Development. His hobbies are Learning new skills, Content Writing, Competitive Coding, Teaching contents to Beginners.