MLens is Knowledge Lens’ one-stop solution for enterprises’ big data requirements. With Big Data Backup, Automated Disaster Recovery, Data Ingestion, Compression, Encryption, and Archival capabilities, our solution has proven to deliver complete control over enterprise data, its storage, analysis, and recovery.
This seven-part blog series walks through different scenarios where MLens was deployed in our clients’ organizations to solve their business challenges. In this post, we look at a client’s requirement for HDFS File Merge to handle their small files.
In HDFS, small files pose a significant challenge. A small file is one that is considerably smaller than the HDFS block size. When a cluster stores a large number of small files, its storage capacity may remain underutilized while the NameNode bears a disproportionate load, because the NameNode keeps metadata for every file and block in memory.
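To make the NameNode pressure concrete, here is a rough back-of-the-envelope sketch. It uses the commonly cited approximation of about 150 bytes of NameNode heap per file or block object; the exact figure varies by Hadoop version, so treat the numbers as illustrative only:

```python
# Rough approximation: each file and each block consumes on the order
# of 150 bytes of NameNode heap (illustrative; varies by Hadoop version).
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode heap consumed by file + block metadata."""
    # One metadata object per file, plus one per block of that file.
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10 million small files (one block each) versus the same data merged
# into 10,000 large files of roughly 8 blocks each:
small  = namenode_heap_bytes(10_000_000, 1)   # ~3.0 GB of heap
merged = namenode_heap_bytes(10_000, 8)       # ~13.5 MB of heap
```

Under these assumptions, merging shrinks the NameNode’s metadata footprint by more than two orders of magnitude, even though the amount of data on disk is unchanged.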
In our client’s case, they had millions of small files, and their teams could not load new datasets into the cluster because the NameNode’s capacity was exhausted.
The Knowledge Lens Solution:
We overcame the challenge by implementing a distributed feature in which the tool reads the small files and writes them out as a single large file. The client/user can choose the target file-size threshold as well as the level of distribution. Provided that all files under the supplied directory share the same schema, this solution resolved our client’s challenge.
Why Knowledge Lens?
Typically, teams facing this issue load their small files into a temporary table in Apache Hive and rewrite them from there. Our solution is unique in that it does not depend on such an intermediary phase.
“The MLens solution works on a distributed framework, at high speed. What’s more, the entire process is configurable by the client themselves, according to their desired parameters.”
Looking for more resources? Read our previous posts in the series here:
How a Global Payments Company achieved Secure, Scalable Data Masking and Data Transfer
How our client achieved Secure Data Transfer across multiple Hadoop Clusters
How our client achieved full and incremental Data Migration from a Live Cluster
Achieve Real-time Replication of Data across Multiple Live HBase Clusters
Achieve HDFS Snapshot Backup and Recovery in Flash Speed
Read our customer success stories here:
High Speed Backup and Efficient Data Recovery
Reshaping the Creation of Enterprise Data Lakes
At Knowledge Lens, we constantly work towards improving our Lenses, so your business can do more for you. Visit us here to learn how you can grow your business operations through data-driven decision making, starting today.
Contributors: Sayantan Ray, MLens Technology Lead