MLens is Knowledge Lens’ one-stop solution for enterprise big data requirements. With Big Data Backup, Automated Disaster Recovery, Data Ingestion, Compression, Encryption, and Archival capabilities, MLens gives enterprises full control over their data: its storage, analysis, and recovery.
This seven-part blog series walks through scenarios in which MLens was deployed in our clients’ organizations to solve their business challenges. In this post, we look at one client’s requirement for HDFS snapshot backup and recovery.
Snapshots are read-only, point-in-time copies of HDFS data, created to protect sensitive data from user modifications. Although no user can modify the data inside a snapshot, a snapshot as a whole can still be deleted. Snapshot data also lives in the same HDFS cluster as the actual data, so if a disaster takes down the entire cluster, all snapshot data is lost along with it.
It is common practice to take snapshots of production data to protect it against accidental modification by users or programs. This creates the need to back up those snapshots to storage outside the cluster.
HDFS cluster data can be backed up to external storage such as Amazon S3 with DistCp, a tool provided by Apache Hadoop. But because of the way snapshots are stored, DistCp cannot back them up as snapshots: it copies files, but it does not capture snapshot metadata or recreate the snapshots, in their original order, on restore. As a result, backing up and restoring snapshots became a major challenge, and doing it manually was practically impossible.
The Knowledge Lens Solution:
MLens provides a snapshot disaster recovery tool that automates the backup and restore of snapshots, using Amazon S3 as backup storage. Its efficient snapshot listing algorithm searches for all available snapshots within a given HDFS path and backs up both the data and the metadata of each snapshot.
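The MLens listing algorithm itself is not described in this post, but the basic idea of finding every snapshot under a given path can be sketched in Python. This is a minimal, hypothetical sketch, not MLens code: it assumes the inputs have already been gathered, for example with hdfs lsSnapshottableDir (which lists snapshottable directories) and hdfs dfs -ls <dir>/.snapshot (which lists a directory's snapshots). The names SnapshotRef, is_within, and list_snapshots are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SnapshotRef:
    directory: str     # snapshottable directory, e.g. /data/sales
    name: str          # snapshot name, e.g. s1
    created: datetime  # creation time reported for the snapshot

    @property
    def path(self) -> str:
        # HDFS exposes each snapshot read-only under <dir>/.snapshot/<name>
        return f"{self.directory.rstrip('/')}/.snapshot/{self.name}"


def is_within(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies underneath it."""
    root = root.rstrip("/") or "/"
    if path == root:
        return True
    prefix = root if root.endswith("/") else root + "/"
    return path.startswith(prefix)


def list_snapshots(
    root: str, snapshottable: dict[str, list[tuple[str, datetime]]]
) -> list[SnapshotRef]:
    """Collect every snapshot whose snapshottable directory lies within `root`.

    Hypothetical input: `snapshottable` maps each snapshottable directory to
    its (snapshot name, creation time) pairs, gathered beforehand.
    """
    found = [
        SnapshotRef(directory, name, created)
        for directory, snaps in snapshottable.items()
        if is_within(directory, root)
        for name, created in snaps
    ]
    # Oldest first, so the same list can later drive an in-order restore.
    return sorted(found, key=lambda s: (s.created, s.directory, s.name))
```

Sorting by creation time at listing time is a design choice in this sketch: it lets one list serve both the backup pass and an ordered restore.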
The most important feature of this tool is that it can also restore all snapshot data and metadata in the original order in which the snapshots were created.
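Creation order matters because each HDFS snapshot records its directory’s state at the moment it was taken. The hypothetical Python sketch below (not the MLens implementation) plans one possible restore. It assumes each snapshot’s full contents were backed up to their own S3 location. Working oldest first, it mirrors that state into the directory with DistCp and then recreates the snapshot under its original name. The backup URIs and the names BackedUpSnapshot and restore_plan are illustrative assumptions. The hadoop distcp, hdfs dfsadmin -allowSnapshot, and hdfs dfs -createSnapshot commands are standard Hadoop CLI commands.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackedUpSnapshot:
    directory: str     # original snapshottable directory, e.g. /data/sales
    name: str          # original snapshot name, reused on restore
    created: datetime  # original creation time, recorded at backup time
    backup_uri: str    # hypothetical location of this snapshot's full copy in S3


def restore_plan(backups: list[BackedUpSnapshot]) -> list[str]:
    """Return shell commands that recreate the snapshots oldest-first.

    A snapshot captures its directory at creation time, so the directory must
    hold exactly that snapshot's data when -createSnapshot runs.
    """
    plan: list[str] = []
    snapshottable: set[str] = set()
    for snap in sorted(backups, key=lambda s: s.created):
        # Mirror the backed-up state into the live directory; -delete removes
        # files that did not exist when this snapshot was taken.
        plan.append(f"hadoop distcp -update -delete {snap.backup_uri} {snap.directory}")
        if snap.directory not in snapshottable:
            # A directory must be made snapshottable (once) before snapshots can be taken.
            plan.append(f"hdfs dfsadmin -allowSnapshot {snap.directory}")
            snapshottable.add(snap.directory)
        plan.append(f"hdfs dfs -createSnapshot {snap.directory} {snap.name}")
    return plan
```

If the snapshots were restored out of order, a snapshot named for January could end up containing February’s files. This sketch copies a full image per snapshot for simplicity. A production tool would more likely transfer only the differences between consecutive snapshots.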
The tool uses MLens’ distributed framework for high-speed, parallel data transfer, which makes the snapshot backup and restore process fast and robust. The MLens edit log parser has also been extended to detect snapshot operations such as Create, Delete, and Rename, and is integrated with this tool so that incremental changes to snapshots can be captured.
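The Hadoop offline edits viewer (hdfs oev) can convert a NameNode edit log into XML, which offers a simple way to illustrate how snapshot operations might be picked out of the log. The Python sketch below is not the MLens edit log parser. It assumes the XML layout we understand hdfs oev to produce: RECORD elements containing an OPCODE and a DATA block, with fields such as TXID, SNAPSHOTROOT, SNAPSHOTNAME, SNAPSHOTOLDNAME, and SNAPSHOTNEWNAME. Check this layout against your Hadoop version. The after_txid parameter is a hypothetical addition that skips transactions already handled by an earlier incremental run.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Edit log opcodes for snapshot changes (assumed names; verify for your version).
SNAPSHOT_OPS = {"OP_CREATE_SNAPSHOT", "OP_DELETE_SNAPSHOT", "OP_RENAME_SNAPSHOT"}


@dataclass(frozen=True)
class SnapshotEvent:
    txid: int
    op: str
    root: Optional[str]             # snapshottable directory
    name: Optional[str] = None      # set for create/delete
    old_name: Optional[str] = None  # set for rename
    new_name: Optional[str] = None  # set for rename


def snapshot_events(edits_xml: str, after_txid: int = 0) -> list[SnapshotEvent]:
    """Extract snapshot create/delete/rename records newer than `after_txid`."""
    events: list[SnapshotEvent] = []
    for record in ET.fromstring(edits_xml).iter("RECORD"):
        op = record.findtext("OPCODE")
        data = record.find("DATA")
        if op not in SNAPSHOT_OPS or data is None:
            continue  # not a snapshot operation
        txid = int(data.findtext("TXID", default="0"))
        if txid <= after_txid:
            continue  # already handled by a previous incremental run
        events.append(SnapshotEvent(
            txid=txid,
            op=op,
            root=data.findtext("SNAPSHOTROOT"),
            name=data.findtext("SNAPSHOTNAME"),
            old_name=data.findtext("SNAPSHOTOLDNAME"),
            new_name=data.findtext("SNAPSHOTNEWNAME"),
        ))
    # Replay order must follow transaction order.
    return sorted(events, key=lambda e: e.txid)
```

For example, hdfs oev -i <edits file> -o edits.xml writes XML whose contents could be read and passed to snapshot_events. Remembering the highest TXID seen is what lets the next run pick up only new snapshot operations.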
Why Knowledge Lens?
“MLens provided the unique advantage of supporting the incremental backup of snapshot data. This eliminated the need for cluster downtime and the copying of the same huge data sets, over and over again.”
Thus, the MLens Snapshot Disaster Recovery tool provided a unique solution for high speed distributed backup and restore of HDFS snapshots.
At Knowledge Lens, we constantly work towards improving our Lenses, so your business can do more for you. Visit us here to learn how you can grow your business operations through data-driven decision making, starting today.
Contributors: Rupak Das, Technology Lead, Knowledge Lens