Design an algorithm and write code to serialize and deserialize a binary tree. Writing the tree to a file is called ‘serialization’ and reading back from the file to reconstruct the exact same binary tree is ‘deserialization’.

Being able to store a binary tree to a file presents lots of benefits. First, we are able to save the binary tree to a file and restore it at a later time. We are also able to transmit the binary tree representation via network and load it into another computer. Without doubt, serialization/deserialization of a binary tree is important and an algorithm to represent a binary tree in a compact way is very desirable.

Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be “resurrected” later in the same or another computer environment.

**Hint:**

If you have not read my previous article about Saving a Binary Search Tree to a File, you should read it now. Since we are dealing with a binary tree, not a binary search tree (BST), our previous method will not work. However, this will get you started.

**Solution:**

Our previous method will not work in the case of Binary Tree, because binary trees are not bound with the same rule as BST. In order for the nodes to be inserted at the correct place, we would need to output the NULL nodes using some kind of sentinel (Here, we use ‘**#**‘ as the sentinel) as we are doing pre-order traversal.

Assume we have a binary tree below:

_30_ / \ 10 20 / / \ 50 45 35

Using pre-order traversal, the algorithm should write the following to a file:

`30 10 50 # # # 20 45 # # 35 # #`

The pre-order traversal code below does all the job to serialize a binary tree, believe it or not!

1 2 3 4 5 6 7 8 9 |
void writeBinaryTree(BinaryTree *p, ostream &out) { if (!p) { out << "# "; } else { out << p->data << " "; writeBinaryTree(p->left, out); writeBinaryTree(p->right, out); } } |

**Deserializing a Binary Tree:**

Reading the binary tree from the file is similar. We read tokens one at a time using pre-order traversal. If the token is a sentinel, we ignore it. If the token is a number, we insert it to the current node, and traverse to its left child, then its right child.

1 2 3 4 5 6 7 8 9 10 11 |
void readBinaryTree(BinaryTree *&p, ifstream &fin) { int token; bool isNumber; if (!readNextToken(token, fin, isNumber)) return; if (isNumber) { p = new BinaryTree(token); readBinaryTree(p->left, fin); readBinaryTree(p->right, fin); } } |

**Alternative Solution:**

We may also use level-order traversal to write/read binary tree. Level-order traversal works because like pre-order traversal, we see the parent node before its child nodes. The implementation of it using level-order traversal is left as an exercise to the reader. Read my post: Printing a Binary Tree in Level Order on how to traverse a binary tree in level-order.

**Further Thoughts:**

There is an obvious shortcoming in this method, that is: a sentinel is required to represent empty nodes. What if we need to store strings that can contain any characters (including the sentinel) in the binary tree? Could you come up with a solution to overcome this shortcoming?

Please note that this is not the only way to serialize a binary tree, and also probably not the most compact way. But this is good enough for a standard interview question. If you would like to explore deeper, and even apply serialization/deserialization to general trees, you may read the Binary Tree article’s encoding section on Wikipedia.

Serialization/Deserialization of a Binary Tree,
Your analysis of Binary Tree traversal is really good. I just saw a new one, maybe you can give me some thought. (Especially for the non-complete BST)

Print all edge nodes of a complete binary tree anti-clockwise.

That is all the left most nodes starting at root, then the leaves left to right and finally all the rightmost nodes.

In other words, print the boundary of the tree.

Variant: Print the same for a tree that is not complete.

Thanks.

+2Hint: Consider using depth-first traversal. You can detect if a node is a leaf node easily and print it out.

For printing the left/right most nodes, you would need to pass some variables in to hold the state. Basically, if the current node is leftmost node, when you traverse its left child node, it is also a leftmost node.

Note that the rightmost nodes are printed from bottom to top. You know how recursion works as a stack already if you had read some of my other posts. It's a very handy trick to know.

0did you miss the '!'? should it be:

if (!p) {

out << "# ";

}

+1Thanks for catching that! My reader's eyes are sharp 🙂

Fixed.

+1Is the function "readNextToken" in STL? can't find it…

0it is not STL function. This function can get a word from string. It is easy so author may ask you to write it by yourself.

0I have some question about the calling of “if (!readNextToken(token, fin, isNumber))” in your code.

Since all function arguments are passed by value in C, shouldn’t it be written as “readNextToken(&token, fin, &isNumber)”?

0On second thought, if your prototype for readNextToken(token, fin, isNumber) is “bool readNextToken(int &token, ifstream &fin, bool &isNumber)” in line with C++ syntax (call by reference), then nothing is wrong in your calling.

0Pingback: Serialization/Deserialization of a Binary Tree | 口├人人│의 Blog

How can the recursion of the function writeBinaryTree be stopped?

0Sorry for the misunderstanding. It should be stopped at the last leaf node.

0What is the time and space complexity guys?

O(nlogn) – time

O(height of the tree) – stack space.

And extra space for sentinels?

0I have a question – has anyone considered building a heap array out of a binary tree. For example, a node goes in position i, its left child goes in 2i+1 and right child goes in 2i+2. If either child does not exist then the place remains empty in the array. Some people have indicated that this is not the most efficient solution (once an interviewer). And, I am unable to understand why. May be I am missing something, can someone please explain? In my opinion, space wise, one node eats up 3 places in the array irrespective of the number of children – one for itself, one each for its left child and right child. But even in the pre-order traversal method, we save some sentinel for every null node, so we are allocating space for non-existent children even there. Time complexity wise, O(n) to serialize and O(nlogn) to build the tree back. So, what is inefficient about this method?

0If the tree is quite unbalanced, you will be wasting huge amount of space and time traversing these null nodes. Consider the extreme case of inorder insertion of an ordered array to a binary tree. On the other hand, sentinels terminates those null links at the very beginning.

+1Binary search trees do come with a performance hit.

The problem is – no one in the last 50 years has been able to come up with a solution to address the performance issues with using binary trees.

Hash table is better choice over BST.

0I actually though the same and agree with you. It seems absolutely the same with the sentinel as the missing children in the heap. Also since its not going to be a real constructing it back will only be O(n) and not nlog(n). Creating a real heap takes O(nlogn) :). I think your suggestion should work unless someone can explain why not?

0Just understand refractions space and time

0An angel of the arch.

01337c0d3r:

if (isNumber) {

p = new BinaryTree(token);

readBinaryTree(p->left, fin);

readBinaryTree(p->right, fin);

}

Shouldn’t you be setting p->left and p->right to null before making the call… Because say for example with the call on p->left , the character in the stream turns out to be “#”, then nothing is returned! shouldn’t we explicitly set it to null before the call is made.

0@Balaji: I think you can initialize p->left and p->right to null in BinaryTree’s constructor function. It is done when “new BinaryTree(token)”

0Pingback: Amazon Interview | Set 5 | codingfan

http://www.cs.usfca.edu/~galles/cs245/lecture/lecture9.pdf

Printing out nodes, in order that they would appear

in a PREORDER traversal does not work, because

we don’t know when we’ve hit a null pointer

0Sry. My Bad!!

0Pingback: Miscellaneous | Annotary

i think there are many possibilites of trees for a given preorder traversal

0I am trying to write a java version:

+5hey,

I really like your posts. Thanks for all the great solutions and detailed explanation.

I just have a question how to serialize/deserialize a generic tree (a.ka n-ary tree or k-ary tree). can you please provide a solution.

Any help will be appreciated.

0Pingback: Serialization/Deserialization of a Binary Tree | Parabolic Evidence

Further Thoughts:

There is an obvious shortcoming in this method, that is: a sentinel is required to represent empty nodes. What if we need to store strings that can contain any characters (including the sentinel) in the binary tree? Could you come up with a solution to overcome this shortcoming?

Just a thought:

If we have a preorder/postorder alongwith inorder traversal, any binary tree can be uniquely identified and we would not need to use sentinel here. The only thing is we will have to keep two arrays in place of one.

Even a preorder and postorder can uniquely identify a tree if none of the internal node has one child, i.e., every node has two children (internal node) or no child (leaf).

Let me know if this sounds good to you. I can post the implementation after that.

0Pingback: Serialize and deserialize binary tree using Haskell | e2718281828459045

I have a question, since we are keeping sentinels for leaf which will take O(n) space for any binary tree, why not just store preorder and inorder and uniquely determine the tree. After all these two traversal will take O(n) space in fact it will take exactly 2n space where n is number of node, while if we put sentinels we will have n+1 leaves for n node binary tree so total space in that case will be 2n+1.

so keeping two traversal have better benifits in terms of storage as well as simple deserilization of tree.

0Maybe we can get around this following shortcoming. Because a binary tree can be serialized with their unique keys(integer). And each key is can be mapped to a string in a file or memory.

“There is an obvious shortcoming in this method, that is: a sentinel is required to represent empty nodes. What if we need to store strings that can contain any characters (including the sentinel) in the binary tree? “

0Could you help explain how to implement the readNextToken with ifstream? I have no good idea so far.

00-20Further thought:

There is an obvious shortcoming in this method, that is: a sentinel is required to represent empty nodes. What if we need to store strings that can contain any characters (including the sentinel) in the binary tree? Could you come up with a solution to overcome this shortcoming?

Solution:

when we serialize the binary tree, we output the content to an array first. And put the serial number of array into file to represent the content of node. Then we output that array to another file (For instance, one item one line). By using this way, we can still use sentinel in tree structure file.

When we do deserialize, we read the file which contains array first and construct this array. Then we use normal deserializing process to construct tree. The only difference is we use the number we get from file to find real content we want to put in node.

0Another solution for further thought:

Put a status flag with value together in the file. This flag will tell us if this symbol is a sentinel or not.

For instance, we can output content as below:

30,1 aa,1 |,0 bb,1 |,1

so in the above sample, when we do deserialize, we will know the third value is a sentinel while the last one is a real node.

In real implementation, we can simply output to save space:

301 aa1 |0 bb1 |1

But no matter how we do, we will use some space to store flag. The previous solution also need to use space to store serial number. For large tree, the latter solution can save more space.

0Can the same method be used for Binary Search Tree?

0When reading the file to deserialize the tree, how do we know if the next item in the file is int (node data) or char (# or space)?

0Pingback: Serialization of Binary Tree | xiangcaohello

Pingback: Binary Tree Algorithms » Tech Blog

Hi, just read your article. I stumbled onto a similar problem and me & my friend came up with a simple solution to represent the tree structure (we don’t need the values when rebuilding the tree), by using 2 bits per node to indicate whether that branch continues or is a leaf.

00 10 11 10 11 would mean a tree like this:

(root)

/ \

/\ / \

o /\ o /\

o o o o

This is independent meta data with own finishing rules (can be scanned without additional stop markers or anything).

If you want values to be saved too, you can just read in the next n-bytes, the leaf-cont is the number of “1”s in the tree-metadata.

I wouldn’t save strings directly, instead use a dictionary and then the index as value.

This modular approach also allows for some other neat things (different/shared dictionaries, smaller tree-checksum, etc.)

0Pingback: Serialize DeSerialize BinaryTree | moonstonelin