Hadoop is now a well-established and widely used technology for analyzing data on a large scale. Many AI and data startups help companies boost online sales by providing analytics. From a small server log to large...
Blog
Formula to Calculate the Number of DataNodes
When you are designing a Hadoop project from scratch, there are several things you need to consider, for example HDFS node storage and the number of DataNodes to be used. Here in this tutorial, I will share the...
Formula to Calculate HDFS Node Storage
This formula to calculate HDFS node storage is equally important for hands-on Hadoop practice and for Hadoop interviews. So let’s get started with how to calculate HDFS node storage. Here is the formula to find the HDFS storage...
Datasets for Hadoop Practice
In this Datasets for Hadoop Practice tutorial, I am going to share a few free Hadoop data sources available for use. You can download these and start practicing Hadoop easily. I have compiled the list of datasets available and have...
Word Count in Python
This article is all about word count in Python. In our last article, I explained word count in Pig, but there are some limitations when dealing with files in Pig, and we may need to write UDFs for that. Those can be overcome in...
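As a rough illustration of the word count task this post refers to, here is a minimal sketch in plain Python. It is not the code from the article itself: the function name `word_count`, the tokenizing regex, and the sample sentence are all assumptions chosen for the example.

```python
import re
from collections import Counter


def word_count(text):
    """Return a Counter mapping each lowercased word to its frequency.

    Hypothetical helper for illustration; the tokenizing rule (letters,
    digits and apostrophes) is an assumption, not the article's method.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    # Made-up sample text, used only to exercise the function.
    sample = "Hadoop stores data. Hadoop processes data."
    for word, count in word_count(sample).most_common():
        print(word, count)
```

For a file, the same function can be applied to the file's contents read as a string; very large inputs would need line-by-line counting instead.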