 # Sub-tree Problems in Tree Data Structure

Today I am going to discuss two Tree Data Structure Problems, from the first sub tree sum problem we will learn some new concepts, and then we will use that concept to solve another problem. Basically, these problems are related to sub-tree in Tree Data Structures. So get ready to nail some concepts.

## Leetcode 508: Most Frequent Subtree Sum

Given the root of a tree, you are asked to find the most frequent sub tree sum in problem. The subtree sum of a node is defined as the sum of all the node values formed by the subtree rooted at that node (including the node itself). So what is the most frequent subtree sum value?

If there is a tie, return all the values with the highest frequency in any order.

Examples 1

Input:
[5, 2, -3]

return [2, -3, 4], since all the values happen only once, return all of them in any order.

Examples 2

Input:
[5, 2, -5]

return , since 2 happens twice, however -5 only occurs once.

Note: You may assume the sum of values in any sub tree is in the range of 32-bit signed integers.

### Analysis of Problem Statement

Let us understand the problem statement clearly, this is a tree problem where you are given the root of the tree and you have to return a vector with most frequent subtree sum which means we have to find subtree sum which occurred maximum time.

#### What is a subtree ??

I know it is a basic question but most of you might don’t know what actual definition of a subtree.

A subtree of a tree T is a tree S consisting of a node in T and all of its descendants in T.

Like for ex,

In the above example, [1,2] is a subtree because all the descendent of [1, 2] is actually the descendent of actual tree also.

[2, 5] is not a subtree because it is violating the rule.

Now let us move back to original problem statement now.

We have to find most frequent subtree sum. If there is only one subtree sum value which has occurred the most then vector which you are returning will have only one element or else you have to return all the values with the highest frequency in any order.

Let us understand it by some examples,

Output : 

In this above example, the subtree sum 9 has occurred twice. First in subtree [2, 4, 3] and second in the subtree .

NOTE a subtree with a leaf node is also the subtree of the main tree.

There are also subtree sums like sum of subtree [-3, 2, 9, 5, 2, 4, 3] is  but subtree sum  has occurred twice, since frequency is more therefore  is our desired result.

Let’s take one more example,

Output : [7, -1, 1, 3]

In the above example no subtree sum has occurred more than once. You can figure it out. Subtree sum of [2, 5, -1, 7] is  and subtree sum of [1,2] is . So, there is no subtree sum present which has occurred more than once.

Since 1 is the highest frequency, we have to return the vector of all subtree sums.

In the output you can see all subtree sums are unique so we have to return the vector containing all the subtree sums.

Hope problem Statement is clear.

### Approach to the Problem

For finding most frequent subtree sum, you need to find all sub-trees sum.

We can find it by both recursive and iterative way . Iterative process will be lengthy, so in this post we will try to solve this problem recursively.

Also iterative algorithm is a bit easier than recursive algorithm. So, I will discuss how to visualise recursion while solving this problem.

We will use a hash-map to store the sum of the sub-tree’s and then later we will try to find the most occurred sum from that map.

Let’s discuss how to find sum of sub-trees of a tree.

Suppose, above tree is given to you and you have to find sum of all sub-trees present there.

Since we know sub-trees reduces from top to bottom.

For example : [2,-3,1] get reduced from [5,2,-3,1,7,3,-2]. Also [-3] get reduced from [2,-3,1].

We will use inorder traversal to calculate the sum of sub-trees and we will store it in the hash – map.

Let’s see the code,

### Code in C/C++

``````unordered_map<int, int> map1;
int F(TreeNode *root)
{
if(root == NULL) return 0;

int r = root->val;

r += F(root->left);
r += F(root->right);

map1[r]++;

return r;
}

vector<int> findFrequentTreeSum(TreeNode* root) {

vector<int> v;
if(root == NULL) return v;

F(root);

map<int, vector<int>> m;
for(auto i = map1.begin(); i != map1.end(); i++)
{
m[(*i).second].push_back((*i).first);
}

auto itr = m.rbegin();
return (*itr).second;
}  ``````

Here, we stored sum of sub-trees and stored it in hash-map. Hash-map is also counting the occurrence of the sub-trees sum.

In the second step we again create a map and this time we are storing the sum values as a vector for particular frequency. In this map, frequency is the key.

Since map is always in sorted order we will return the vector whose key value will be the maximum and hence we get our desire result.

Now let’s run the code for above example,

We are initially at root. The pictorial representation below will show how actually the recursion will work.

This is how the recursion will look like. First, we will go to leftmost node until it is NULL. When it is NULL, the function will return 0. Similarly, for that parent will solve for it’s right node until it’s NULL.

We will calculate the subtree sum value for all the parents and we will store it in the hash-map. This way we will move ahead.

Try to understand and visualize recursion with the help of pen and paper. Try to take help from the above pictorial diagram.

This will be the hash-map for the above example, here highest value or frequency is 1, so we have to return a vector having sum value of every subtree present.

NOTE order of the subtree sum doesn’t matter. Hope, the explanation to this problem and the code is clear to you all. Now let’s move to our next problem.

## Leetcode 652 : Find Duplicate Subtrees

Given a binary tree, return all duplicate sub-trees. For each kind of duplicate sub-trees, you only need to return the root node of any one of them.

Two trees are duplicate if they have the same structure with same node values.

Example 1:

[1, 2, 3, 4, null, 2, 4, null, null, 4]

The following are two duplicate sub-trees: [2, 4] and 

Therefore, you need to return above trees’ root in the form of a list. For above example you should return [2, 4].

### Analysis of Problem Statement

I think problem statement is clear enough. You are given a Binary tree and you have to find all its duplicate subtree. Again definition of subtree is written above.

Your task to find all duplicate sub-trees and you have to return all root node of all duplicate sub-trees in form of a vector.

For each kind of duplicate sub-trees, you only need to return the root node of any one of them.

Let’s understand it clearly by an example,

Output : [2, 4]

In above example, duplicate sub-trees are marked. There are two kinds of duplicate sub-trees are present.

The first one is [2,4] and second one is . We have to return the vector having only the root node of the duplicate sub-trees. So, output is [2,4], 2 being the root node of first sub-tree and 4 being the root of second sub-tree.

Hope Problem Statement is clear to you.

### Approach to the Problem

The approach is similar to previous problem we discuss, here also we will store every subtree in a hash-map and if we find any subtree which is already present in the hash-map, then that subtree is the duplicate one.

But here there is one problem.

#### How will you store a sub-tree in a hash-map ?

Yes, problem arises here, how we can store a sub-tree in a hash-map.

The answer to this question is string. We can form a string of all the elements present in a subtree.

Yes it is as simple as that. Now let us see the code for the above logic.

``````unordered_map<string, int> m;

vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {

vector<TreeNode*> res;
find(root, res);

return res;
}

string find(TreeNode* root, vector<TreeNode*> &res){

if(root == NULL){
return "";
}

string str = to_string(root->val);

str += find(root->left, res);
str += find(root->right, res);

m[str]++;

if(m[str] == 2) res.push_back(root);

return str;
}``````

Yes, we have written the same logic which we have discussed earlier. We have converted integer to string by to_string() inbuilt function. After that, we store that string in a map.

If we find the same string already being present in the map, we store its root in the resultant vector.

Now let us understand it by an example,

We have taken the same example. The pictorial diagram below represents how recursion will take place.

This is how actual recursion will take place, we will store every subtree in a map. Since “24” and “4” is repeated, we will store it’s root node in our resultant vector.

Hope this explanation and concept is clear to you. If you couldn’t understand it please go through it again.

But wait there is still a small flaw present in the above code. Try to identify it. It is a very small logical error.

Let’s take another example,

Please try to apply the above algorithm in the above example and try to figure out the logical error.

#### Importance of a comma (,)

You will realise the importance of comma if you analyse the above example.

Please draw the recursion tree yourself for the above example.

What are the strings present in the map ?

If you draw the recursion tree, you will find strings present in the map are –

“11”
“111”
“1”
“111”
“2111111”

Yes, Although no duplicate sub-tree is present in the above example but we are actually getting a duplicate sub-tree that is “111” and yes this is the flaw.

The reason behind the flaw is that the string formed by taking [1,11] as a subtree is same as the string formed by taking [11,1] as a subtree.

Hope, now you have understood the logic behind the error behind it. The only change we have to make is to add a comma whenever you insert a number in the string.

Let’s see the updated code,

### Updated Code in C/C++

``````unordered_map<string, int> m;

vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {

vector<TreeNode*> res;
find(root, res);

return res;
}

string find(TreeNode* root, vector<TreeNode*> &res){

if(root == NULL){
return "";
}

string str = to_string(root->val) + ",";

str += find(root->left, res) + ",";
str += find(root->right, res) + ",";

m[str]++;

if(m[str] == 2) res.push_back(root);

return str;
}``````

Now, we have resolve that problem. Hope you have understood the code and logic behind this Question.

The strings now present in the map are –
“11,,,”
“1,11,,,,,”
“1,,,”
“11,1,,,,,”
“2,1,11,,,,,,11,1,,,,,,”

Now no duplicate string is present and that’s how we resolve our error.

Congrats by going through these problems, you have understood new concept today. I hope you enjoyed this post.

That’s all folks..!!! 