Gartner’s recent survey of 3,000 CIOs reported that Analytics and Business Intelligence (BI) are the top differentiating factors of an enterprise. As a result, organizations are increasingly implementing Big Data and Big Data Analytics to gain maximum insight and value.

In light of an increasingly data-driven culture, enterprise-level businesses are now focusing their energies on custom solutions founded on Big Data insights to solve their unique challenges.

MLens is Knowledge Lens’s one-stop solution for your enterprise’s Big Data requirements. With Big Data Backup, Automated Disaster Recovery, Data Ingestion, Compression, Encryption, and Archival capabilities, our solution has proven to deliver complete control over your enterprise data and its storage, analysis, and recovery.

This seven-part blog series will take you through the different scenarios in which MLens was deployed in our clients’ organizations to solve their business challenges. In our previous post, we discussed how our client achieved secure and scalable data masking and data transfer. In this post, we look at our client’s requirement for secure data transfer across Hadoop clusters of different distributions.

Business Challenge

Transferring data from one Hadoop cluster to another was traditionally done using DistCp, a tool provided by Apache Hadoop. However, when our client began using multiple Hadoop distributions (such as Cloudera, Hortonworks, MapR, and EMR), data transfer became complicated.

Moreover, DistCp requires the source and target clusters to reach each other through the NameNode IP address or hostname in order for copying to take place. Because the client’s Hadoop clusters were configured for high availability, this was not possible.

When triggered from a Hadoop cluster, DistCp typically uses that cluster’s DataNodes to read and write data. In this case, however, the DataNodes of the source and target clusters belonged to different Kerberos realms and could not communicate with each other.

As a result, data transfer became a real challenge, and our client resorted to copying data manually: they would download data from the source Hadoop cluster, copy it to the local file system of a node in the target cluster, and then upload it to the target Hadoop cluster. This was a highly time-consuming and tedious process.

The Knowledge Lens Solution

To overcome these complications, the MLens team deployed a Data Migrator tool. This tool performs secure data transfer between Hadoop clusters that have different distributions, versions, and Kerberos realms, even when there is no direct connectivity between them.

Why Knowledge Lens?

A key advantage of the MLens tool is that it does not use any local disk storage for intermediate writes, which enables seamless, secure, and scalable data migration.
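To illustrate the general idea of migrating data without intermediate local writes, here is a minimal Python sketch. It is not MLens’s implementation. It assumes the source and target are already-open binary streams, for example handles obtained from an HDFS client library for each cluster (not shown here, and Kerberos authentication to each realm is out of scope). The names `stream_copy` and `CHUNK_SIZE` are hypothetical. Data moves in fixed-size chunks held only in memory, unlike the manual download, copy, and upload approach described earlier.

```python
import io
from typing import BinaryIO

# Hypothetical chunk size for illustration; a real migrator would tune this.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dst through a bounded in-memory buffer.

    No temporary file is created: each chunk read from the source stream is
    written straight to the destination stream, so memory use stays at about
    one chunk regardless of file size. Assumes buffered binary streams, whose
    write() writes the whole chunk or raises. Returns the number of bytes copied.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory streams stand in for the source and target cluster handles.
    source = io.BytesIO(b"example payload " * 1024)
    target = io.BytesIO()
    print(stream_copy(source, target, chunk_size=4096))  # prints 16384
```

Python’s standard library offers `shutil.copyfileobj(fsrc, fdst, length)`, which follows the same chunked pattern. The point of the sketch is that the only intermediate storage is a single in-memory buffer.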

“Our client was able to perform large-scale, distributed migration of data in the order of petabytes with minimal effort.”

Looking for more resources? Read about our customer success stories here:

High Speed Backup and Efficient Data Recovery

Reshaping the Creation of Enterprise Data Lakes

At Knowledge Lens, we constantly work towards improving our Lenses so they can do more for your business. Visit us here to learn how you can grow your business operations through data-driven decision making, starting today.

Contributors: Rupak Das, Technology Lead, Knowledge Lens



Sneha Mary Christall

Marketing and Brand Executive, Knowledge Lens.
