MLens is Knowledge Lens’ one-stop solution for enterprises’ big data requirements. With Big Data Backup, Automated Disaster Recovery, Data Ingestion, Compression, Encryption, and Archival capabilities, our solution has proven to deliver control over enterprise data and its storage, analysis, and recovery.
This seven-part blog series will take you through the different scenarios where MLens was deployed in our clients’ organizations, to solve their business challenges, and add value to all on board. In this post, we will look into our client’s requirement for HDFS File Merge for their small files.
In HDFS, small files are a major challenge. A small file is one that is smaller than the HDFS block size. When a cluster stores a large number of small files, its storage capacity may remain underutilized, while the NameNode, which keeps the metadata for every file and block in memory, comes under disproportionate load.
In our client’s case, millions of small files had accumulated, and the teams were unable to load new datasets into the cluster because the NameNode’s capacity was exhausted.
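To see why file count matters more than data volume here, the following rough estimate may help. It is only a sketch: the figure of about 150 bytes of NameNode heap per file or block object is a widely quoted rule of thumb, not a measurement from our client’s cluster, and the function names are hypothetical.

```python
import math

# Assumption: ~150 bytes of NameNode heap per file or block object
# (a commonly quoted rule of thumb, not a measured value).
BYTES_PER_OBJECT = 150
# 128 MiB is the default HDFS block size in Hadoop 2 and later.
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_objects(num_files, file_size, block_size=BLOCK_SIZE):
    """Count the file and block objects tracked for num_files equal-sized files."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Rough NameNode heap estimate under the rule-of-thumb assumption above."""
    return namenode_objects(num_files, file_size, block_size) * BYTES_PER_OBJECT


MIB = 1024 * 1024
# The same volume of data stored two ways:
small = estimated_heap_bytes(10_000_000, 1 * MIB)   # ten million 1 MiB files
merged = estimated_heap_bytes(78_125, 128 * MIB)    # the same data in 128 MiB files
print(f"small files:  {small / 1e9:.2f} GB of heap")
print(f"merged files: {merged / 1e6:.2f} MB of heap")
```

Under these assumptions, merging the same data into block-sized files reduces the number of objects the NameNode has to track by a factor of 128.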
The Knowledge Lens Solution:
We overcame the challenge by implementing a distributed merge feature: the tool reads the small files and writes them out as a single large file. The client or user can select the file size threshold, along with the level of distribution. The one requirement is that all the files under the supplied directory share the same schema; with that condition met, the solution resolved our client’s challenge.
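The general idea of threshold-based merging can be sketched as follows. This is not the MLens implementation, whose internals are not described here; it is a minimal illustration that uses the local filesystem as a stand-in for HDFS, assumes headerless, row-oriented files with an identical schema (so that plain concatenation is valid), and uses hypothetical function names throughout.

```python
import os
import shutil


def plan_merge_groups(paths_with_sizes, target_bytes):
    """Greedily pack (path, size) pairs into groups of at most ~target_bytes."""
    groups, current, current_size = [], [], 0
    for path, size in sorted(paths_with_sizes):
        # Close the current group once adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


def merge_group(paths, out_path):
    """Concatenate files byte for byte (assumes headerless rows, same schema)."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def merge_directory(src_dir, out_dir, target_bytes):
    """Merge every regular file in src_dir into larger files under out_dir."""
    files = []
    for name in os.listdir(src_dir):
        full = os.path.join(src_dir, name)
        if os.path.isfile(full):
            files.append((full, os.path.getsize(full)))
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    for i, group in enumerate(plan_merge_groups(files, target_bytes)):
        out_path = os.path.join(out_dir, f"merged-{i:05d}")
        outputs.append(merge_group(group, out_path))
    return outputs
```

Because each group is planned up front and merged independently of the others, the groups could in principle be handed to separate workers, which is one way a configurable level of distribution might be realized. Columnar formats such as Parquet or ORC cannot be merged by plain concatenation and would need a format-aware writer instead.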
Why Knowledge Lens?
Typically, teams that face similar issues load their small files into a temporary table in Apache Hive. Our solution stands apart by not depending on such an intermediary step.
“The MLens solution works on a distributed framework, at high speed. What’s more, the entire process is configurable by the client themselves, according to their desired parameters.”
At Knowledge Lens, we constantly work towards improving our Lenses, so your business can do more for you. Visit us to learn how you can grow your business operations through data-driven decision making, starting today.
Contributors: Sayantan Ray, MLens Technology Lead