1. Number of mapper/reducer tasks
-
The number of mapper and reducer tasks on
each node of a cluster depends directly on the total available memory and
on the other services running on the node (e.g., HBase).
-
By default, the memory allocated to each task JVM is
512MB, so every mapper and reducer task gets the same amount of memory.
-
The JVM memory for a Hadoop job can be
controlled through any of the following three options:
o Hadoop Cluster Configuration
It is accomplished by changing the mapred.map.child.java.opts and mapred.reduce.child.java.opts properties. The change affects every job executed in the cluster unless a job overrides these properties in its own configuration.
o Job Arguments
The most efficient way of allocating the necessary memory for a mapper/reducer is to pass the required value as a Hadoop job argument. (The Driver class of each job should implement the Tool interface and be launched through ToolRunner.)
For example, if the job requires each map task to be allocated 2GB of memory, the following is the argument that needs to be passed:
§ -Dmapred.map.child.java.opts=-Xmx2048m
o Set the configuration properties inside
the driver class of the Job
If the amount of memory required for the map/reduce tasks is expected to be consistent and is known beforehand, then setting it on the job configuration object inside the driver class is a good approach.
-
Blindly increasing the memory can have
adverse effects if the node running the task is unable to allocate the requested
memory. This causes 'out of memory' errors, in turn resulting in job
failures.
-
Always consider the nature of the job, the size
of the data flowing in, and the resources available in the cluster before deciding
on the memory to be used.
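For reference, the cluster-level option above corresponds to entries in mapred-site.xml such as the following (property names from the older mapred API; the heap sizes are illustrative, not recommendations):

```xml
<!-- mapred-site.xml: cluster-wide JVM heap for map and reduce tasks.
     Individual jobs can still override these properties. -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```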
2. Oversubscription
-
Oversubscription of resources is driven by
the fact that the tasks running in the cluster will rarely all require their
maximum allocated memory at the same time. This can be very helpful in that
we get a larger number of tasks running (with fewer resources than nominally required). It works
well when we have a large number of tasks performing less memory-intensive work.
-
However, oversubscription can cause memory
issues too. If the memory actually required by the running tasks exceeds the memory that
the system can allocate, tasks will be killed and jobs will fail with exit
code 137 (128 + 9, i.e. the process was killed with SIGKILL, typically by the
operating system when memory runs out).
-
The workaround for the above issue is to
ensure that the memory requested by the tasks does not exceed the total available
memory. This can be accomplished by any of the following:
o Reduce
the number of inputs. (Execute multiple jobs by splitting the input, if possible.)
o Control
the number of mapper/reducer tasks. (This can be achieved either by reducing the
amount of input or by limiting the number of tasks spawned by the task queues,
a cluster-level configuration.)
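The arithmetic behind oversubscription can be sketched as follows; every figure here is an assumed example value for a hypothetical node, not a Hadoop default:

```java
// Back-of-envelope check: does a node's slot count oversubscribe its memory?
// All numbers below are illustrative assumptions.
public class OversubscriptionCheck {
    public static void main(String[] args) {
        long nodeMemoryMb = 8192;   // memory the node can devote to tasks
        long heapPerTaskMb = 2048;  // -Xmx configured per task
        int taskSlots = 6;          // configured map + reduce slots

        // Worst case: every slot is busy and every task fills its heap.
        long worstCaseMb = heapPerTaskMb * taskSlots;
        System.out.println("Worst-case demand: " + worstCaseMb + " MB");
        System.out.println("Oversubscribed: " + (worstCaseMb > nodeMemoryMb));
    }
}
```

Here the worst case (12288 MB) exceeds the node's 8192 MB, which is safe only as long as tasks rarely use their full heap; if they do, expect kills with exit code 137.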
3. Hadoop job arguments and code compatibility
-
The most efficient way of setting job-specific
configuration is to pass it via Hadoop arguments, using the
'-D' option.
-
The important point to remember is that the
job must cooperate by actually taking in the Hadoop arguments. It is a common mistake
to create a new configuration object inside the job, which overrides the
configuration passed through the arguments.
-
The Tool interface needs to be implemented (and the
job launched through ToolRunner) for the Hadoop arguments to be read.
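A minimal driver along these lines might look like the following sketch (class name and job setup are hypothetical; the old mapred API is used to match the property names in this post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch of a driver that honours Hadoop generic options. ToolRunner
// parses arguments such as -Dmapred.map.child.java.opts=-Xmx2048m into
// the Configuration before run() is invoked.
public class ExampleDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Reuse the configuration populated by ToolRunner. Creating a
        // fresh JobConf() here would silently drop the -D options.
        JobConf job = new JobConf(getConf(), ExampleDriver.class);
        // ... set input/output formats and paths for the job here ...
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ExampleDriver(), args));
    }
}
```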
4. Speculative execution
-
If a particular node is taking a long time
to complete a task, Hadoop will launch a duplicate attempt of the same task on
another node. The attempt that finishes first is retained, and the attempts that
do not finish first are killed. The slow node will continue to be used for other
tasks; Hadoop does not have the ability to 'fail' a node here, it just stops
using it for that particular task.
-
Speculative execution is a smart idea in
that, if a task is running on a degraded node, the time taken for
completion might be longer, while a parallel attempt running on the same data
on a different node might succeed faster. The Hadoop framework is
intelligent enough to keep the faster-running attempt and kill the slower ones.
This helps improve overall job performance.
-
However, on the other hand, speculative
execution uses more cluster resources than are actually required for the completion
of a job. This may in turn cause slower performance and memory issues. As a
rule of thumb, it is good to have speculative execution ON for a large number of
small jobs and OFF for large jobs.
-
Always make sure that speculative
execution is turned OFF for any HBase-intensive jobs. If it is ON, it will
burden HBase with double the actual number of requests, which creates a
good chance of the region servers exceeding their allocated memory (by default 4GB)
and going into a dead state.
-
How to disable Speculative execution in
Hadoop
-
Speculative execution is enabled by default
in Hadoop. You can disable speculative execution for the mappers and reducers
in mapred-site.xml:
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
or set it in the job configuration (note: Configuration.set() takes string values, so use setBoolean instead of passing a boolean to set()):
jobconf.setBoolean("mapred.map.tasks.speculative.execution", false);
jobconf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
or by passing the same through the
Hadoop job arguments:
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
mapreduce.map.speculative and mapreduce.reduce.speculative are the newer alternatives to mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution.
5. Multiple Outputs, Task attempts and Speculative Execution
-
The MultipleOutputs class simplifies writing
output data to multiple outputs. It helps in writing to additional outputs
other than the job's default output, and in writing data to different files
specified by the user.
-
When the JobTracker is informed that a
particular task has failed, the task is rescheduled for another attempt. By default,
4 attempts will be made.
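The attempt limit is controlled by the following properties (4 is the usual default, shown here only for reference):

```xml
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>
```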
-
In cases where the multiple outputs point
outside the default job output, only one task attempt can succeed: all
the subsequent attempts will fail with a 'File Exists' error. This is because
a failed attempt does not automatically clean up the files it has created.
-
A workaround is to write the multiple
outputs to separate directories inside the default job output and, when the job
completes successfully, move them to the required location.
-
Always take care to turn OFF speculative
execution when multiple outputs are written to directories other than the
default job output folder. Otherwise, it will cause failed tasks due to 'File
Exists' errors and might eventually lead to job failures.
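The workaround above can be sketched with the old mapred API as follows (class and named-output names are hypothetical). The named output is declared in the driver and written from the reducer; its files are created inside the job output directory, from where they can be moved after a successful run:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Driver side (sketch): declare a named output in addition to the default:
//   MultipleOutputs.addNamedOutput(jobConf, "summary",
//       TextOutputFormat.class, Text.class, Text.class);

// Reducer side (sketch): write to both the default and the named output.
public class SummaryReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {
    private MultipleOutputs mos;

    @Override
    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    @Override
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            Text value = values.next();
            output.collect(key, value);                                // default output
            mos.getCollector("summary", reporter).collect(key, value); // named output
        }
    }

    @Override
    public void close() throws IOException {
        mos.close(); // flush and close all named outputs
    }
}
```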