dynammic programming

Longest Subsequence Problem

Leetcode Problem 1143

Given two strings text1 and text2, return the length of their longest common subsequence. It is known as longest subsequence problem.

A subsequence of a string is a new string generated from the original string with some characters(can be none) deleted without changing the relative order of the remaining characters. (eg, “ace” is a subsequence of “abcde” while “aec” is not). A common subsequence of two strings is a subsequence that is common to both strings.

If there is no common subsequence, return 0.

Example 1:
Input: text1 = “abcde”, text2 = “ace”
Output: 3
Explanation: The longest common subsequence is “ace” and its length is 3.

Example 2:
Input: text1 = “abc”, text2 = “abc”
Output: 3
Explanation: The longest common subsequence is “abc” and its length is 3.

Example 3:
Input: text1 = “abc”, text2 = “def”
Output: 0
Explanation: There is no such common subsequence, so the result is 0.


Constraints:
1 <= text1.length <= 1000
1 <= text2.length <= 1000
The input strings consist of lowercase English characters only.

Analysis of the Problem Statement

The longest subsequence problem statement says that we are given two strings and we have to find the longest common subsequence in this problem.

Let us understand the problem statement more clearly.

Let’s say text1 = “abcdhe” and text2 = “aedfhr”, what is the longest common subsequence in these two strings ??

Longest Common Subsequence : “adh”
Longest Common Subsequence Length : 3

For the above problem, your function should return 3.

NOTE that the relative order of a subsequence remains the same, although ‘e’ was present in both the strings it will not be considered as the part of the subsequence because their relative order is not the same.

“ae” is one of the common subsequences but “adhe” is not a common subsequence.

I hope the Problem Statement is clear now. It is highly recommended to try this problem once before moving to its solution.

Naive Method

The naive method is to run a recursion and start comparing the string characters from backward.

Two cases will arise – 

(i) If both the character matches, we will simply increment the count by 1. We will remove the last character of both the strings and then we will again call the recursion to the modified strings.

(ii)  If  both the character doesn’t match, again two cases will arise –

(a) We will first search that character in the second string, by removing the last character of it and calling recursion to the modified strings.

(b) Then, we will also search that character in the first string, by removing the last character of it and calling recursion to the modified strings.

At last whichever case gives us maximum LCS, we will take that case.

Let us see the code to understand it clearly.

int LCS(string text1, string text2, int m, int n)
{
	if(m==0 || n ==0) return 0;
	
	if(text1[m-1] == text2[n-1]) {
		return 1 + LCS(text1, text2, m-1, n-1);
	}
	else return max(LCS(text1, text2, m-1, n),LCS(text1, text2, m, n-1);
}
	
int longestCommonSubsequence(string text1, string text2) {
	int m = text1.length();
	int n = text 2.length();
	
	return LCS(text1, text2, m, n);
}

Let us understand the code more clearly,  Let take another example –
text1 = “abc” and text2 = “act”

dynammic programming

This is how recursion will take place, Hope this diagram makes your concept clear and make you visualize what actually is happening.

Analysis of Naive Solution

The above algorithm computes all possible subproblems and hence time complexity of the above algorithm is O(2^n).

Can we reduce its Complexity?

If you observe carefully, there are overlapping subproblems present in the above recursion tree. The Circular mark shows that we are calculating subproblem (ab, a) twice.

It is of no point to calculate the same thing again and again. It increases our overall time complexity. In such a situation, Dynamic Programming comes into play.

Dynamic Programming Solution

In the above solution, we observe that we are calculating the repeated things again and again (Overlapping Subproblems).

We can solve this overlapping problem by storing the solution of sub problems into some array or hash map, so that whenever we need the solution of that subproblem again, we can use this data instead of calculating it again. This technique is called Memoization.

See the code for better explanation.

int dp[MAX][MAX];
int LCS(string text1, string text2, int m, int n)
{
        if(dp[m][n] != -1) return dp[m][n];
        
        if(m==0 || n ==0) dp[m][n] = 0;

        else if(text1[m-1] == text2[n-1]) {
            dp[m][n] = 1 + LCS(text1, text2, m-1, n-1);
        }
        else dp[m][n] = max(LCS(text1,text2,m-1,n),LCS(text1,text2,m,n-1));
        return dp[m][n];
}

This above algorithm takes simply O(m*n) time complexity.

This is a top-down Solution of longest subsequence problem, here we have initialized a 2-D Array of size MAX, we have also initialized it to -1 initially, and later we store the solution of subproblems in it so that whenever we need it later, instead of again computing it, we can use it again.

Bottom-Up Approach (Tabulation)

We have already discussed Top-Down Approach for Solving this above problem. Now we will discuss its Tabulation Solution.

In Tabulation Solution, we make a 2-D array of size [m+1][n+1], (m and n is the size of string text1 and text2 respectively. We fill this array in the bottom up-approach.

One thing to note here is we are taking one extra space in the 2-D array, the reason for taking an extra space is to store the base condition.

Let’s see the code.

int longestCommonSubsequence(string text1, string text2) {
        
        int m = text1.length(), n = text2.length();
        int dp[m+1][n+1];
        
       	memset(dp,0,sizeof(dp));
       	for(int i = 1; i <= m; i++)
       	{
           	for(int j = 1; j <= n; j++)
            	{
                	if(text1[i-1] == text2[j-1]) {
                                dp[i][j] = 1 + dp[i-1][j-1];
                        }
                	else {
                    		dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
                	}
            	}
        }
        return dp[m][n];
}

Let us understand the code more clearly,

The idea is similar to the top-down approach. In the top-down approach or memoized based solution, we were comparing the last character of both the strings and if it is found the same, we were incrementing the count by 1 and recursively calling the modified string by removing its last character.

For function LCS(string text1, string text2, m, n) , we were recursively calling for LCS(text1,text2,m-1,n-1). Here LCS means longest common subsequence

Similarly in the bottom-up approach, if both the last character matches we just add 1 to dp[i-1][j-1].

For the second case, when the last character of both strings were not matching, in the  top down approach we were recursively calling LCS(text1,text2,m,n-1) and LCS(text2,text2,m-1,n) and storing its maximum in dp[][]  array.

In the same way in bottom up approach, when both the last characters don’t match we find max(dp[i][j-1], dp[i-1][j]) and store it in current dp[i][j].

Both solutions look quite similar, the only difference is that in the top-down approach, we start comparing characters of both strings from the end, and in top-down approach, we start comparing it from the beginning.

Let us understand it more clearly by taking an example,

text1 = “abc” and text2 = “act”

dynammic programming

We will create a 2D array of size 4*4. Initially, we will initialize all the elements of the 2-D array to 0.

We will start iterating from i = 1 and j = 1, since both the characters matches (‘a’ from “abc” and ‘a’ from “act”) , we will increment dp[i][j]=1+dp[i-1][j-1].

Similarly, we will move ahead by comparing every character of both the strings and fill the 2D array by following the bottom-up Algorithm. At last, dp[m][n] will give us the required solution.


I hope both the solutions are clear, both the solutions (Memoised and Tabular) takes O(m.n) time complexity which is far better than exponential time complexity.

This is how Dynamic Programming works, first of all you have to come up with a recursion based solution and then you have to write a DP solution to avoid overlapping subproblems by any of the two way discussed above.

That’s all folks..!!!(longest subsequence problem)

Abhishek is currently pursuing CSE from Heritage Institute of Technology, Kolkata. He has a great interest in Data Structures and Algorithms, C++, Language, Competitive Coding, Android Development. His hobbies are Learning new skills, Content Writing, Competitive Coding, Teaching contents to Beginners.

2 Comments

  1. Hello ABHISHEK!!
    Really very helpful and interesting.
    In fact, when we see first time coding, it is very amazing.

    Best luck for your next Goal 👍
    Regards
    Vishal Meena.

Leave a Comment

Your email address will not be published. Required fields are marked *