Data anchoring and verification using zero-knowledge proof

Find out what data anchoring is and why it is beneficial. This article teaches you the basic principles and simple usage, so you can start thinking of ways to apply data anchoring in your business.

February 21, 2022

Introduction

In today's world, trust is everything, but, more often than not, different parties do not trust each other. It is therefore valuable for one party to be able to prove a claim to another party. With such a proof, trust is no longer required; instead, the parties rely on cryptographic mechanisms that make a false proof practically impossible to produce.

When implemented correctly, data anchoring with a Merkle tree can serve as a zero-knowledge proof. To see how, you first need to be familiar with three terms: data anchoring, Merkle tree, and zero-knowledge proof.

Data anchoring

Data anchoring consists of creating a timestamped proof of existence for data and storing that proof on a tamper-resistant blockchain.

The key point is that the proof can be recreated from the data, but the data cannot be recovered from the proof. This way, whoever possesses the data can verify whether it has been tampered with since it was anchored on the blockchain.
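To make the idea concrete, here is a minimal Python sketch of anchoring a single piece of data. It assumes SHA-256 as the hash function and replaces the blockchain with a hypothetical in-memory `ledger` list; `make_proof`, `anchor` and `verify` are illustrative names, not part of any real blockchain API.

```python
import hashlib
import time
from typing import Optional

# Hypothetical stand-in for a tamper-resistant blockchain: an append-only
# list of (proof, timestamp) records. A real system would publish the proof
# in a blockchain transaction instead.
ledger: list[tuple[str, float]] = []


def make_proof(data: bytes) -> str:
    """The proof is the SHA-256 digest of the data; the data cannot be derived from it."""
    return hashlib.sha256(data).hexdigest()


def anchor(data: bytes) -> str:
    """Store the proof together with the time it was anchored."""
    proof = make_proof(data)
    ledger.append((proof, time.time()))
    return proof


def verify(data: bytes) -> Optional[float]:
    """Recreate the proof from the data; return the anchoring timestamp, or None if not found."""
    proof = make_proof(data)
    for stored_proof, timestamp in ledger:
        if stored_proof == proof:
            return timestamp
    return None


anchor(b"invoice #42: 100 EUR")
print(verify(b"invoice #42: 100 EUR") is not None)  # True: data unchanged since anchoring
print(verify(b"invoice #42: 900 EUR") is not None)  # False: data was tampered with
```

Only the digest and a timestamp are stored, so the ledger reveals nothing about the data itself, yet any change to the data produces a different proof.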

Merkle tree

A Merkle tree, or hash tree, is a binary tree in which every leaf is labeled with the cryptographic hash of a data block, and every non-leaf node is labeled with the cryptographic hash of the labels of its child nodes.

Figure: a Merkle tree over data blocks L1 to L4 (source: Wikipedia)

As with data anchoring, anyone who knows all the data blocks can easily construct the Merkle tree and compute the label of its root node, called the root hash (or top hash), but the data blocks cannot be recovered from the root hash alone. That is why Merkle trees are used in data anchoring: the root hash is the proof that gets stored on the blockchain.
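The construction described above can be sketched in Python as follows. This is an illustrative sketch, not a production implementation: it assumes SHA-256, hashes each data block to get the leaf labels, and, as an assumption for levels with an odd number of nodes, pairs the last node with a copy of itself (the convention Bitcoin uses).

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, assumed here as the tree's cryptographic hash function."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Compute the root hash of a Merkle tree built over the given data blocks."""
    if not blocks:
        raise ValueError("at least one data block is required")
    # Leaves are labeled with the hash of each data block.
    level = [h(block) for block in blocks]
    while len(level) > 1:
        # Assumption: an odd node out is paired with a copy of itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Each parent is labeled with the hash of its two children's labels.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"L1", b"L2", b"L3", b"L4"]
root = merkle_root(blocks)
print(root.hex())
# Changing any single block changes the root hash.
print(merkle_root([b"L1", b"L2", b"L3", b"L4 (edited)"]) != root)  # True
```

Production designs often also prefix leaf and internal-node inputs with different bytes (as RFC 6962 does) so that a leaf can never be mistaken for an internal node; the sketch omits this for readability.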

Another key feature of a Merkle tree is that it can be partially rebuilt from only a subset of the data blocks plus a receipt for that subset, and this is enough to calculate the root hash. Note that only those who possess all the data can generate such receipts, because a receipt contains hashes derived from the blocks that are not being shared. For example, if we only have data blocks L1 and L2 from the figure above and the receipt contains "Hash 1", we can construct the left side of the Merkle tree from the data blocks and then combine its top with "Hash 1" from the receipt to calculate the root hash.
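The example with L1, L2 and "Hash 1" can be traced in a few lines of Python. The block contents and the `receipt` dictionary layout are hypothetical, and SHA-256 is assumed as the hash function; the point is only that the verifier reaches the same top hash without ever seeing L3 or L4.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, assumed as the tree's hash function."""
    return hashlib.sha256(data).digest()


# Prover: holds all four data blocks (hypothetical contents).
L1, L2, L3, L4 = b"block 1", b"block 2", b"block 3", b"block 4"
hash_0 = h(h(L1) + h(L2))      # "Hash 0": top of the left subtree
hash_1 = h(h(L3) + h(L4))      # "Hash 1": top of the right subtree
top_hash = h(hash_0 + hash_1)  # the root hash anchored on the blockchain

# Receipt for the subset {L1, L2}: only "Hash 1" is needed.
receipt = {"hash_1": hash_1}


# Verifier: knows L1, L2, the receipt and the anchored top hash, but not L3 or L4.
def recompute_top_hash(l1: bytes, l2: bytes, receipt: dict[str, bytes]) -> bytes:
    left = h(h(l1) + h(l2))             # rebuild the left side from the data blocks
    return h(left + receipt["hash_1"])  # combine it with "Hash 1" from the receipt


print(recompute_top_hash(L1, L2, receipt) == top_hash)         # True
print(recompute_top_hash(b"forged", L2, receipt) == top_hash)  # False
```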

Zero-knowledge proof

Zero-knowledge proof is a method by which one party (the prover) can prove to another party (the verifier) that a given statement is true while the prover avoids conveying any additional information apart from the fact that the statement is indeed true. The essence of zero-knowledge proofs is that it is trivial to prove that one possesses knowledge of certain information by simply revealing it; the challenge is to prove such possession without revealing the information itself. (Source: Wikipedia)

How is it done?

Now that you are familiar with these terms, you may already see how a Merkle tree can be used to achieve data anchoring and zero-knowledge proof. First, to anchor your data, you order your data blocks and construct a Merkle tree from them. The data blocks must be arranged in a deterministic order, because whenever the tree is rebuilt later, the blocks have to be ordered exactly the same way; otherwise, the root hash will not match. Finally, the root hash of this tree is stored on a public blockchain, and that is it: you have anchored your data.
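Below is a minimal sketch of the anchoring step, assuming each data block is a record (a Python dict) and that the deterministic order is obtained by serializing every record to canonical JSON and sorting the resulting bytes. The helper names and record fields are hypothetical, and the actual publication of the root hash on a blockchain is left out.

```python
import hashlib
import json


def h(data: bytes) -> bytes:
    """SHA-256, assumed as the tree's hash function."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list[bytes]) -> bytes:
    """Root hash over the blocks; an odd last node is paired with a copy of itself."""
    if not blocks:
        raise ValueError("at least one data block is required")
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def to_blocks(records: list[dict]) -> list[bytes]:
    """Serialize each record canonically, then sort, so the block order is deterministic."""
    serialized = [
        json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
        for record in records
    ]
    return sorted(serialized)


def anchor(records: list[dict]) -> str:
    """Return the root hash (hex) that would be stored on a public blockchain."""
    return merkle_root(to_blocks(records)).hex()


records = [{"id": 2, "amount": 50}, {"id": 1, "amount": 100}, {"id": 3, "amount": 75}]
shuffled = [records[2], records[0], records[1]]
print(anchor(records) == anchor(shuffled))  # True: same data, same order, same root
```

Sorting by the serialized bytes is just one possible rule; sorting by a stable record identifier works equally well, as long as the party creating receipts and the verifier apply the same rule.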

Second, you need to be able to recreate the Merkle tree, and therefore its root hash, from only a subset of your data blocks and a corresponding receipt. This requires two algorithms. The first takes all data blocks and the subset to be shared, and creates the receipt for that subset. The second takes the subset and its receipt, and recomputes the root hash. The first algorithm is used internally whenever you want to share a portion of your data. The second should be open source, so that the receiver of the data can use it to compare the recomputed root hash with the one stored on the blockchain and thereby verify whether the data has been tampered with since it was anchored.
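The two algorithms could look roughly like the Python sketch below. It is one possible design under stated assumptions, not the author's implementation: SHA-256 is the hash function, odd levels duplicate their last hash, a receipt is a hypothetical dictionary mapping (level, position) to the sibling hashes the receiver cannot compute, and the receiver is assumed to learn the positions of the shared blocks and the total number of blocks alongside the receipt. `create_receipt` plays the role of the internal algorithm; `recompute_root` is the one that would be open-sourced.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, assumed as the tree's hash function."""
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Every level of the tree, leaves first; odd levels are padded by duplicating the last hash."""
    if not blocks:
        raise ValueError("at least one data block is required")
    level = [h(b) for b in blocks]
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # the last level holds only the root hash
    return levels


def create_receipt(blocks: list[bytes], subset: list[int]) -> dict[tuple[int, int], bytes]:
    """Algorithm 1 (internal): collect the hashes the receiver cannot compute from the subset."""
    levels = build_levels(blocks)
    known = set(subset)  # node positions the receiver can compute at the current level
    receipt = {}
    width = len(blocks)  # number of real (unpadded) nodes at the current level
    for depth, level in enumerate(levels[:-1]):
        if width % 2 == 1 and width - 1 in known:
            known.add(width)  # the padding copy is known whenever the last node is
        for i in known:
            sibling = i ^ 1  # pairs are (0, 1), (2, 3), ...
            if sibling not in known:
                receipt[(depth, sibling)] = level[sibling]
        known = {i // 2 for i in known}
        width = len(level) // 2
    return receipt


def recompute_root(subset: dict[int, bytes], receipt: dict[tuple[int, int], bytes],
                   leaf_count: int) -> bytes:
    """Algorithm 2 (open source): rebuild the root hash from the shared blocks and the receipt."""
    if not subset or any(not 0 <= i < leaf_count for i in subset):
        raise ValueError("subset must contain valid leaf positions")
    nodes = {i: h(block) for i, block in subset.items()}
    width, depth = leaf_count, 0
    while width > 1:
        if width % 2 == 1:
            if width - 1 in nodes:
                nodes[width] = nodes[width - 1]  # same padding rule as build_levels
            width += 1
        parents = {}
        for i in sorted(nodes):
            parent = i // 2
            if parent in parents:
                continue
            left, right = 2 * parent, 2 * parent + 1
            left_hash = nodes[left] if left in nodes else receipt[(depth, left)]
            right_hash = nodes[right] if right in nodes else receipt[(depth, right)]
            parents[parent] = h(left_hash + right_hash)
        nodes, width, depth = parents, width // 2, depth + 1
    return nodes[0]


blocks = [b"L1", b"L2", b"L3", b"L4", b"L5"]
anchored_root = build_levels(blocks)[-1][0]  # stored on the blockchain at anchoring time

receipt = create_receipt(blocks, [0, 1])     # share only L1 and L2
shared = {0: blocks[0], 1: blocks[1]}
print(recompute_root(shared, receipt, len(blocks)) == anchored_root)    # True

tampered = {0: b"L1 (edited)", 1: blocks[1]}
print(recompute_root(tampered, receipt, len(blocks)) == anchored_root)  # False
```

If the receipt lacks a required hash, `recompute_root` raises a `KeyError`; any mismatch with the anchored root means the shared data (or the receipt) was altered. One caveat: a leaf hash in a receipt can be brute-forced when the hidden block is easy to guess, so adding a random salt to each block before hashing is a common mitigation.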

Conclusion

Data anchoring is a useful mechanism that finds applications in various industries. There are many ways to implement data anchoring; in our case, we used a Merkle tree. A Merkle tree is a binary tree in which the leaf nodes are labeled with the hashes of data blocks, and each non-leaf node is labeled with the hash of its child nodes. Hence, the Merkle tree algorithm can create a single hash (the root hash) out of multiple data blocks. Moreover, the root hash of a Merkle tree can be recreated either from all the data blocks it contains, or from a subset of the data blocks and a corresponding receipt.

Data anchoring using the Merkle tree algorithm makes it possible to prove that certain data is valid without sharing all of the data. This is what makes it a zero-knowledge proof.


Written by

Andrej Ceraj

Software engineer
