Sample Hadoop MapReduce Interview Questions and Answers
Here I have made a list of the most commonly asked Hadoop MapReduce Interview Questions and Answers for you to read before your interview. These are basic Hadoop interview questions and answers for freshers and experienced candidates. If you are applying for a job that requires knowledge of Big Data Hadoop, then go through this list of Sample Hadoop MapReduce Interview Questions and Answers.
The Context object is used to help the mapper interact with the rest of the Hadoop system. It can be used to update counters, report progress, and provide application-level status updates. The Context object also carries the configuration details for the job, along with the interfaces that allow the mapper to emit its output.
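A minimal sketch of typical Context usage inside a mapper is shown below; the class name, the counter group/name, and the "myapp.separator" property are made up for illustration.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().isEmpty()) {
            // Update an application-level counter through the Context
            context.getCounter("MyApp", "RECORDS_SKIPPED").increment(1);
            return;
        }
        // Read a job configuration value through the Context
        String separator = context.getConfiguration().get("myapp.separator", ",");
        for (String token : value.toString().split(separator)) {
            // Emit intermediate output through the Context
            context.write(new Text(token), new LongWritable(1));
        }
        // Report status/progress back to the framework
        context.setStatus("Processed record at offset " + key.get());
    }
}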
The 3 core methods of a reducer are listed below; a short sketch combining all three follows the list.
1) setup() – This method is called once before the reduce task starts and is used for configuring task parameters, for example reading settings from the job configuration or files from the distributed cache.
Function Definition - protected void setup(Context context)
2) reduce() – This is the heart of the reducer; it is called once per key with the list of values associated with that key.
Function Definition - protected void reduce(Key key, Iterable<Value> values, Context context)
3) cleanup() – This method is called only once, at the end of the reduce task, for clearing temporary files and releasing resources.
Function Definition - protected void cleanup(Context context)
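A minimal sketch combining all three methods is shown below; the class name, the summing logic, and the "myapp.threshold" parameter are made up for illustration.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private long threshold;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Called once before any reduce() call: read job parameters,
        // open side files from the distributed cache, etc.
        threshold = context.getConfiguration().getLong("myapp.threshold", 0L);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key with the list of values associated with that key
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        if (sum >= threshold) {
            context.write(key, new LongWritable(sum));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once at the end of the reduce task: delete temporary files,
        // close connections, release resources, etc.
    }
}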
Sort Phase - Hadoop MapReduce automatically sorts the set of intermediate keys on a single node before they are given as input to the reducer.
Partitioning Phase - The process that determines which intermediate keys and values will be received by each reducer instance is referred to as partitioning. The destination partition is the same for a given key, irrespective of the mapper instance that generated it.
Steps to write a Custom Partitioner for a Hadoop MapReduce Job-
- A new class must be created that extends the pre-defined Partitioner Class.
- getPartition method of the Partitioner class must be overridden.
- The custom partitioner can be added to the job either as a configuration setting in the wrapper that runs the Hadoop MapReduce job, or programmatically by calling the setPartitionerClass() method on the Job object (see the sketch below).
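A hedged sketch of both steps is shown below, assuming a made-up FirstCharPartitioner that routes each key to a reducer based on its first character.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, LongWritable> {

    @Override
    public int getPartition(Text key, LongWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        // Mask the sign bit so the partition index is always non-negative
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}

In the driver, the partitioner is then registered on the job with job.setPartitionerClass(FirstCharPartitioner.class).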
A single job can be broken down into one or many tasks in Hadoop.
It is not necessary to write Hadoop MapReduce jobs in Java; through the Hadoop Streaming API, users can write MapReduce jobs in any desired programming language such as Ruby, Perl, Python, R, or Awk.
If there is limited storage space on the commodity hardware, the split size can be changed by implementing a custom splitter (a custom InputFormat/InputSplit). The custom splitter is invoked from the job's main (driver) method.
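As a related, hedged sketch (using the standard FileInputFormat split-size settings rather than a fully custom splitter), the driver's main method can cap the size of each input split; the class name and the 64 MB / 32 MB figures below are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split size example");
        job.setJarByClass(SplitSizeDriver.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Cap each input split at 64 MB and force at least 32 MB per split
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        FileInputFormat.setMinInputSplitSize(job, 32L * 1024 * 1024);

        // Mapper, reducer, and output types would be set here as usual
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}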
The 3 primary phases of a reducer are –
1)Shuffle
2)Sort
3)Reduce
The actual Hadoop MapReduce tasks that run on each slave node are referred to as task instances. By default, a new JVM process is spawned for every task instance, so each task instance runs in its own JVM.
Reducers always run in isolation and they can never communicate with each other as per the Hadoop MapReduce programming paradigm.