In the standard word count MapReduce algorithm, why might using a combiner reduce the overall Job running time?
A. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
B. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
C. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
Answer: D
Explanation: (Only visible to Pass4Test members)
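The combiner's effect can be sketched in plain Python (this is a simulation, not Hadoop code): the combiner pre-aggregates each mapper's output locally, so far fewer key-value pairs are shuffled across the network to the reducers.

```python
# Minimal sketch of local aggregation by a word-count combiner.
from collections import Counter

def map_words(line):
    # Standard word-count mapper: emit (word, 1) for every token.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: sum counts locally, over one mapper's own output.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

line = "the cat sat on the mat the end"
mapped = map_words(line)    # 8 pairs would be shuffled without a combiner
combined = combine(mapped)  # only 6 pairs, one per distinct word

print(len(mapped), len(combined))
```

The map output and the combined output carry the same information, but the combined form is smaller; on real clusters that difference in shuffle volume is where the job-time saving comes from.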
Question 2:
You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat and the IdentityReducer: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. What is the difference between setting the number of reducers to one and setting the number of reducers to zero?
A. There is no difference in output between the two settings.
B. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.
C. With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
D. With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
Answer: B
Explanation: (Only visible to Pass4Test members)
質問 3:
Given a Mapper, Reducer, and Driver class packaged into a jar, which is the correct way of submitting the job to the cluster?
A. hadoop jar class MyJar.jar MyDriverClass inputdir outputdir
B. jar MyJar.jar
C. hadoop jar MyJar.jar MyDriverClass inputdir outputdir
D. jar MyJar.jar MyDriverClass inputdir outputdir
正解:C
解説: (Pass4Test メンバーにのみ表示されます)
質問 4:
In a MapReduce job, you want each of you input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A. Write a custom FileInputFormat and override the method isSplittable to always return false.
B. Increase the parameter that controls minimum split size in the job configuration.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
正解:A
解説: (Pass4Test メンバーにのみ表示されます)
質問 5:
Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
A. Hadoop Streaming
B. Flume
C. Sqoop
D. Oozie
正解:A
解説: (Pass4Test メンバーにのみ表示されます)
質問 6:
Which of the Following best describes the lifecycle of a Mapper?
A. The TaskTracker spawns a new Mapper to process each key-value pair.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The JobTracker spawns a new Mapper to process all records in a single file.
D. The JobTracker calls the FastTracker's configure () method, then its map () method and finally its closer ()
正解:A
解説: (Pass4Test メンバーにのみ表示されます)
Nishikata -
CCD-333試験に一発で合格したい人にはピッタリだと思う