Today I am going to discuss two **Tree Data Structure** Problems, from the first sub tree sum problem we will learn some new concepts, and then we will use that concept to solve another problem. Basically, these problems are related to **sub-tree in Tree Data Structures**. So get ready to nail some concepts.

**Leetcode 508: Most Frequent Subtree Sum**

Given the root of a tree, you are asked to find the **most frequent sub tree sum** in problem. The subtree sum of a node is defined as the sum of all the node values formed by the subtree rooted at that node (including the node itself). So what is the most frequent subtree sum value?

If there is a **tie**, return all the values with the **highest frequency in any order**.**Examples 1****Input:**

[5, 2, -3]**return [2, -3, 4]**, since all the values happen only once, return all of them in any order.**Examples 2****Input:**

[5, 2, -5]**return [2]**, since 2 happens twice, however -5 only occurs once.**Note:** You may assume the sum of values in any sub tree is in the range of 32-bit signed integers.

**Analysis of Problem Statement**

Let us understand the **problem statement** clearly, this is a tree problem where you are given the **root** of the tree and you have to **return a vector** with **most frequent subtree sum** which means we have to find subtree sum which occurred maximum time.

**What is a subtree ??**

I know it is a basic question but most of you might don’t know what actual definition of a subtree.

A subtree of a **tree T** is a **tree S** consisting of a node in** T and all of its descendants in T**.

Like for ex,

In the above example, **[1,2]** is a subtree because all the descendent of **[1, 2]** is actually the descendent of a**ctual tree** also.**[2, 5]** is not a subtree because it is **violating the rule**.

Now let us move back to original problem statement now.

We have to find **most frequent subtree sum**. If there is only one subtree** sum value** which has occurred the most then vector which you are returning will have only **one element** or else you have to return all the values with the **highest frequency in any order**.

Let us understand it by some examples,

**Output :** [9]

In this above example, the subtree **sum 9** has occurred twice. First in subtree **[2, 4, 3] **and second in the subtree **[9]**.**NOTE** a subtree with a **leaf node** is also the subtree of the main tree.

There are also subtree sums like sum of subtree **[-3, 2, 9, 5, 2, 4, 3]** is **[22]** but subtree sum **[9]** has occurred **twice**, since frequency is more therefore **[9]** is our **desired result**.

Let’s take one more example,

**Output** : [7, -1, 1, 3]

In the above example no **subtree sum** has occurred more than **once**. You can figure it out. Subtree sum of **[2, 5, -1, 7]** is **[7]** and subtree sum of **[1,2]** is **[3]**. So, there is no subtree sum present which has occurred more than **once**.

Since **1** is the **highest frequency**, we have to return the vector of **all subtree sums**.

In the output you can see all subtree sums are **unique** so we have to return the vector containing all the **subtree sums.**

Hope problem Statement is clear.

**Approach to the Problem**

For finding **most frequent subtree sum**, you need to find all sub-trees sum.

We can find it by both **recursive** and **iterative** way . Iterative process will be **lengthy**, so in this post we will try to solve this problem **recursively**.

Also iterative algorithm is a bit easier than recursive algorithm. So, I will discuss how to **visualise recursion** while solving this problem.

We will use a **hash-map** to store the sum of the sub-tree’s and then later we will try to find the most occurred sum from that map.

Let’s discuss how to find sum of sub-trees of a tree.

Suppose, above tree is given to you and you have to find sum of all sub-trees present there.

Since we know **sub-trees** reduces from **top to bottom**.

For example : **[2,-3,1]** get reduced from** [5,2,-3,1,7,3,-2]**. Also **[-3]** get reduced from **[2,-3,1]**.

We will use **inorder traversal** to calculate the sum of sub-trees and we will store it in the hash – map.

Let’s see the code,

**Code in C/C++**

```
unordered_map<int, int> map1;
int F(TreeNode *root)
{
if(root == NULL) return 0;
int r = root->val;
r += F(root->left);
r += F(root->right);
map1[r]++;
return r;
}
vector<int> findFrequentTreeSum(TreeNode* root) {
vector<int> v;
if(root == NULL) return v;
F(root);
map<int, vector<int>> m;
for(auto i = map1.begin(); i != map1.end(); i++)
{
m[(*i).second].push_back((*i).first);
}
auto itr = m.rbegin();
return (*itr).second;
}
```

Here, we stored sum of sub-trees and stored it in hash-map. **Hash-map** is also counting the **occurrence** of the sub-trees sum.

In the second step we again create a map and this time we are storing the **sum values** as a vector for particular frequency. In this map, **frequency is the key.**

Since map is always in sorted order we will return the

**vector**whose key value will be the maximum and hence we get our desire result.

Now let’s run the code for above example,

We are initially at **root.** The pictorial representation below will show how actually the recursion will work.

This is how the** recursion** will look like. First, we will go to leftmost node until it is **NULL**. When it is NULL, the function will **return 0**. Similarly, for that parent will solve for it’s right node until it’s **NULL**.

We will calculate the **subtree sum value** for all the parents and we will store it in the **hash-map**. This way we will move ahead.

Try to understand and **visualize** recursion with the help of **pen and paper**. Try to take help from the above pictorial diagram.

This will be the **hash-map** for the above example, here highest value or frequency is **1**, so we have to return a vector having sum value of every subtree present.

**NOTE** order of the subtree sum doesn’t matter. Hope, the explanation to this problem and the code is clear to you all. Now let’s move to our **next problem**.

**Leetcode 652 : Find Duplicate Subtrees**

Given a binary tree, return **all duplicate sub-trees**. For each kind of duplicate sub-trees, you only need to return the root node of any one of them.

Two trees are **duplicate** if they have the **same structure** with same node values.**Example 1:**

[1, 2, 3, 4, null, 2, 4, null, null, 4]

The following are two **duplicate** sub-trees: **[2, 4]** and **[4]****Therefore**, you need to return above **trees’ root **in the form of a **list**. For above example you should return **[2, 4]**.

**Analysis of Problem Statement**

I think **problem statement** is clear enough. You are given a Binary tree and you have to find all its duplicate subtree. Again definition of subtree is written above.

Your task to find all **duplicate sub-trees** and you have to return all **root node** of all duplicate sub-trees in form of a vector.

For each kind of **duplicate sub-trees**, you only need to return the root node of **any one of them**.

Let’s understand it clearly by an example,

**Output** : [2, 4]

In above example, **duplicate sub-trees** are marked. There are two kinds of duplicate sub-trees are present.

The first one is **[2,4]** and second one is **[4]**. We have to return the vector having only the **root node** of the duplicate sub-trees. So, output is **[2,4]**, **2** being the root node of first sub-tree and **4** being the root of second sub-tree.

Hope Problem Statement is clear to you.

**Approach to the Problem**

The **approach** is similar to previous problem we discuss, here also we will **store** every subtree in a **hash-map** and if we find any subtree which is **already present** in the hash-map, then that subtree is the **duplicate one**.

But here there is one problem.

**How will you store a sub-tree in a hash-map ?**

Yes, problem arises here, how we can store a **sub-tree** in a **hash-map**.

The answer to this question is **string**. We can form a string of all the elements present in a subtree.

Yes it is as simple as that. Now let us see the code for the above **logic**.

```
unordered_map<string, int> m;
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
vector<TreeNode*> res;
find(root, res);
return res;
}
string find(TreeNode* root, vector<TreeNode*> &res){
if(root == NULL){
return "";
}
string str = to_string(root->val);
str += find(root->left, res);
str += find(root->right, res);
m[str]++;
if(m[str] == 2) res.push_back(root);
return str;
}
```

Yes, we have written the same logic which we have discussed earlier. We have converted integer to string by **to_string()** inbuilt function. After that, we store that string in a map.

If we find the **same string** already being present in the map, we store its root in the resultant vector.

Now let us understand it by an example,

We have taken the same example. The pictorial diagram below represents how **recursion** will take place.

This is how actual **recursion** will take place, we will store every subtree in a map. Since **“24”** and **“4”** is repeated, we will store it’s root node in our **resultant vector**.

Hope this explanation and concept is clear to you. If you couldn’t understand it please **go through it again**.

But wait there is still a small flaw present in the above code. Try to identify it. It is a very **small logical error**.

Let’s take another example,

Please try to apply the above **algorithm** in the above example and try to figure out the **logical error**.

**Importance of a comma (,)**

You will realise the importance of comma if you analyse the above example. **Please** draw the** recursion tree** yourself for the above example.**What are the strings present in the map ?**

If you draw the recursion tree, you will find strings present in the map are –

“11”

“111”

“1”

“111”

“2111111”

Yes, Although **no duplicate sub-tree** is present in the above example but we are actually getting a duplicate sub-tree that is **“111”** and yes this is the **flaw**.

The reason behind the **flaw** is that the string formed by taking** [1,11]** as a subtree is same as the string formed by taking** [11,1]** as a subtree.

Hope, now you have understood the logic behind the error behind it. The only change we have to make is to **add a comma **whenever you insert a number in the string.

Let’s see the updated code,

**Updated Code in C/C++**

```
unordered_map<string, int> m;
vector<TreeNode*> findDuplicateSubtrees(TreeNode* root) {
vector<TreeNode*> res;
find(root, res);
return res;
}
string find(TreeNode* root, vector<TreeNode*> &res){
if(root == NULL){
return "";
}
string str = to_string(root->val) + ",";
str += find(root->left, res) + ",";
str += find(root->right, res) + ",";
m[str]++;
if(m[str] == 2) res.push_back(root);
return str;
}
```

Now, we have resolve that problem. Hope you have understood the code and logic behind this Question.

The strings now present in the map are –

“11,,,”

“1,11,,,,,”

“1,,,”

“11,1,,,,,”

“2,1,11,,,,,,11,1,,,,,,”

Now no duplicate string is present and that’s how we resolve our **error**.

**Congrats** by going through these problems, you have understood new concept today. I hope you enjoyed this post.

That’s all folks..!!!