MapReduce Unwinding … Reduce

Once shuffling has completed, REDUCE comes into action. Its task is to process the input given by SHUFFLE into output, so that the user can understand the result of the file processed by Hadoop. After shuffling has completed, it is clear that one word will be processed by only one DN and not […]
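To make the REDUCE step concrete, here is a minimal sketch of a reducer in Hadoop's Java MapReduce API, assuming the word-count example these posts build on; the class name WordCountReducer is illustrative, not from the original post:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal word-count reducer: because the shuffle routes every (word, count)
// pair for a given word to exactly one reducer, summing the values here
// yields the final count for that word.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();          // add up the 1s emitted by the mappers
        }
        result.set(sum);
        context.write(key, result);    // e.g. ("hadoop", 42) in the job output
    }
}
```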

MapReduce Unwinding … Sort & Shuffle

This is a continuation of MapReduce Processing… This output will be the input for the next process, which is SORT. Sort takes this [L<K,V>] and sorts all the words in alphabetical order (a to z) on each DN. Sorted arrangement on DNs: DN-1: NODE-1 [L(K,V)] PKT-1(K) V […]
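The sort behaviour can be illustrated outside Hadoop with plain Java; this is not Hadoop's own code, only a sketch of what SORT does to a mapper's [L<K,V>] output on a DN:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration (not Hadoop code) of the SORT phase on one DN:
// the mapper's list of (word, 1) pairs is ordered by key, and equal keys
// end up adjacent, ready to be grouped and shuffled to a reducer.
public class SortIllustration {
    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = List.of(
                new SimpleEntry<>("world", 1),
                new SimpleEntry<>("hello", 1),
                new SimpleEntry<>("hello", 1));

        // TreeMap keeps keys in alphabetical (a to z) order, like the sort phase.
        TreeMap<String, List<Integer>> sorted = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            sorted.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                  .add(pair.getValue());
        }
        System.out.println(sorted); // {hello=[1, 1], world=[1]}
    }
}
```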

MapReduce Unwinding … Map

In the last discussion on MapReduce, we discussed the algorithm used by Hadoop for data processing with MapReduce. Now it's time to understand this in detail with the help of an example. Let's consider our scenario: we have a 7-node cluster, where 1 node is the Name Node (NN) and the rest of the 6 nodes are […]
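Assuming the usual word-count example running on such a cluster, a minimal sketch of the mapper each DN would execute over its own splits (the class name WordCountMapper is illustrative):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal word-count mapper: each DN runs this over its own splits in
// parallel, emitting a (word, 1) pair for every word it sees.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);  // builds the [L(K,V)] list the later posts refer to
        }
    }
}
```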

MapReduce Unwinding … Algorithm

Following the discussion in my last blog about "How Hadoop manages Fault Tolerance" within its cluster while processing data, it is now time to discuss the algorithm which MapReduce uses to process this data. It is the Name Node (NN) where a user submits his request to process data and submits his data files. As soon as the NN receives data […]
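A minimal driver sketch of how such a request is submitted as a Hadoop job; it reuses the illustrative mapper and reducer classes sketched above, and the input/output paths come from the command line:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: this is the "request" the user submits; the framework then
// locates the input blocks via the NN and schedules map tasks on the
// DNs that hold them.
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // data file in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // result directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```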

MapReduce : Fault Tolerance

The Fault Tolerance: Before we see the intermediate data produced by the mapper, it would be quite interesting to look at the fault-tolerance aspects of Hadoop with respect to MapReduce processing. The Replication Factor: Once the Name Node (NN) receives the data files that have to be processed, it splits the data files to assign them to Data […]
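A small sketch, assuming a reachable HDFS cluster, of how the replication factor behind this fault tolerance is controlled; the file path here is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// dfs.replication controls how many DNs hold a copy of each block, which
// is what lets a failed map task be re-run against another replica.
public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");        // default is 3 copies per block

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/input.txt"); // hypothetical HDFS path
        // Raise replication for an existing file, e.g. for a hot data set.
        fs.setReplication(file, (short) 4);
    }
}
```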

MapReduce Internals: Philosophy

The Philosophy: The philosophy of MapReduce's internal workings is straightforward and can be summarized in 6 steps. The smaller, the better, the quicker: Whatever data we provide as input to Hadoop, it first splits this data into a number of smaller pieces. Typical size of data: Typically, the size of a data split is limited to […]
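A sketch of the "smaller, the better" idea in Hadoop's Java API: capping the split size so a large file breaks into more, smaller pieces and therefore more parallel map tasks. The 128 MB figure is an assumption matching a common HDFS block size, not a quote from this post:

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Cap the input split size so the framework creates more, smaller splits,
// each handled by its own map task.
public class SplitSizing {
    public static void configure(Job job) {
        long splitBytes = 128L * 1024 * 1024; // 128 MB per split (assumed)
        FileInputFormat.setMaxInputSplitSize(job, splitBytes);

        // A 1 GB file then yields ceil(1024 / 128) = 8 splits / map tasks.
        long fileMb = 1024, splitMb = 128;
        long numSplits = (fileMb + splitMb - 1) / splitMb;
        System.out.println("splits: " + numSplits); // splits: 8
    }
}
```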

MapReduce : Internals

The MapReduce Framework: MapReduce is a programming paradigm that provides an interface for developers to map end-user requirements (any type of analysis on data) to code. This framework is one of the core components of Hadoop. The capabilities: The way it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster […]

MAGIC OF HADOOP

Disadvantage of DWH: Because of the limitations of currently available enterprise data warehousing tools, organizations were not able to consolidate their data in one place to achieve faster data processing. Here comes the magic of Hadoop to their rescue. Traditional ETL tools may take hours, days and sometimes even weeks, and because of this, performance […]

Big Data: An Introduction

Innovations in technology have made resources cheaper than before. This enables organizations to store more data at lower cost, thus increasing the size of their data. Gradually the data grows bigger, moving from megabytes (MB) to petabytes (1 PB = 1e+9 MB). This huge increase in data requires a different kind of processing. […]
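As a quick check of the quoted conversion, in round decimal units:

```latex
1~\text{PB} = 10^{15}~\text{bytes}, \qquad 1~\text{MB} = 10^{6}~\text{bytes}
\;\Longrightarrow\; 1~\text{PB} = 10^{15-6}~\text{MB} = 10^{9}~\text{MB}.
```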

Tum Laut Jao Priye