CORRECT TEXT
Problem Scenario 63 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4,lion), (3,dogcat), (7,panther), (5,tigereagle))
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.reduceByKey(_ + _).collect
reduceByKey [Pair] : This function provides the well-known reduce functionality in Spark.
Please note that any function f you provide should be commutative in order to generate reproducible results.
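For reference, a minimal sketch of the full session, assuming spark-shell (where sc is already defined):
// Runs in spark-shell; sc (SparkContext) is predefined there
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
// reduceByKey merges all values sharing a key (the word length),
// here by string concatenation: "dog" + "cat" -> "dogcat"
b.reduceByKey(_ + _).collect
// Array[(Int, String)] = Array((4,lion), (3,dogcat), (7,panther), (5,tigereagle))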
Question 2:
CORRECT TEXT
Problem Scenario 35 : You have been given a file named spark7/EmployeeName.csv (id,name).
EmployeeName.csv
E01,Lokesh
E02,Bhupesh
E03,Amit
E04,Ratan
E05,Dinesh
E06,Pavan
E07,Tejas
E08,Sheela
E09,Kumar
E10,Venkat
1. Load this file from HDFS, sort it by name, and save it back as (id,name) in the results directory. However, make sure that while saving, the output is written to a single file.
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution:
Step 1 : Create the file in HDFS (we will do this using Hue). However, you can first create it in the local filesystem and then upload it to HDFS.
Step 2 : Load the EmployeeName.csv file from HDFS and create a PairRDD.
val name = sc.textFile("spark7/EmployeeName.csv")
val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))
Step 3 : Now swap the elements of namePairRDD so that the name becomes the key.
val swapped = namePairRDD.map(item => item.swap)
Step 4 : Now sort the RDD by key.
val sortedOutput = swapped.sortByKey()
Step 5 : Now swap the result back
val swappedBack = sortedOutput.map(item => item.swap)
Step 6 : Save the output as a text file; the output must be written to a single file.
swappedBack.repartition(1).saveAsTextFile("spark7/result.txt")
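As an aside, the double swap can be avoided by sorting on the value directly with sortBy. A minimal alternative sketch, assuming the same input file (the output path spark7/result_sortBy is illustrative):
val name = sc.textFile("spark7/EmployeeName.csv")
val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))
// sortBy takes a key-extraction function, so no swap is needed;
// numPartitions = 1 produces a single, correctly ordered part file
namePairRDD.sortBy(_._2, true, 1).saveAsTextFile("spark7/result_sortBy")
Sorting into a single partition up front also sidesteps the repartition step, which shuffles and is not guaranteed to preserve the sorted order.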
Question 3:
CORRECT TEXT
Problem Scenario 65 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.toInt, 2)
val c = a.zip(b)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(String, Int)] = Array((owl,3), (gnu,4), (dog,1), (cat,2), (ant,5))
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution : c.sortByKey(false).collect
sortByKey [Ordered] : This function sorts the input RDD's data and stores it in a new RDD.
The output RDD is a shuffled RDD because it stores data that is output by a reducer which has been shuffled. The implementation of this function is actually very clever.
First, it uses a range partitioner to partition the data in ranges within the shuffled RDD.
Then it sorts these ranges individually with mapPartitions using standard sort mechanisms.
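For reference, a minimal sketch of the full session, assuming spark-shell (sc in scope):
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.toInt, 2)   // 1,2,3,4,5 in the same partition layout as a
val c = a.zip(b)                                // pairs each word with its 1-based index
// false requests descending order on the String key
c.sortByKey(false).collect
// Array[(String, Int)] = Array((owl,3), (gnu,4), (dog,1), (cat,2), (ant,5))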
Question 4:
CORRECT TEXT
Problem Scenario 42 : You have been given a file (spark10/sales.txt), with the content given below.
spark10/sales.txt
Department,Designation,costToCompany,State
Sales,Trainee,12000,UP
Sales,Lead,32000,AP
Sales,Lead,32000,LA
Sales,Lead,32000,TN
Sales,Lead,32000,AP
Sales,Lead,32000,TN
Sales,Lead,32000,LA
Sales,Lead,32000,LA
Marketing,Associate,18000,TN
Marketing,Associate,18000,TN
HR,Manager,58000,TN
You want to produce the output as a CSV, grouped by Department, Designation, and State, with additional columns for sum(costToCompany) and TotalEmployeeCount.
You should get a result like:
Dept,Desg,state,empCount,totalCost
Sales,Lead,AP,2,64000
Sales,Lead,LA,3,96000
Sales,Lead,TN,2,64000
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create the file first in HDFS using Hue.
Step 2 : Load the file as an RDD.
val rawlines = sc.textFile("spark10/sales.txt")
Step 3 : Create a case class which can represent the column fields.
case class Employee(dep: String, des: String, cost: Double, state: String)
Step 4 : Split the data and create RDD of all Employee objects.
val employees = rawlines.map(_.split(",")).map(row => Employee(row(0), row(1), row(2).toDouble, row(3)))
Step 5 : Create the rows we need: all group-by fields as the key, and for each employee a count of 1 together with its cost as the value.
val keyVals = employees.map(em => ((em.dep, em.des, em.state), (1, em.cost)))
Step 6 : Aggregate the records using the reduceByKey method, since we want the summation of both the number of employees and their total cost.
val results = keyVals.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // (a.count + b.count, a.cost + b.cost)
Step 7 : Save the results in a text file as below.
results.repartition(1).saveAsTextFile("spark10/group.txt")
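Note that saving the tuples directly prints records like ((Sales,Lead,AP),(2,64000.0)) rather than plain CSV. A minimal sketch of an extra formatting step that emits the requested Dept,Desg,state,empCount,totalCost layout (the header filter and the spark10/group_csv path are illustrative additions, not part of the answer above):
val rawlines = sc.textFile("spark10/sales.txt")
case class Employee(dep: String, des: String, cost: Double, state: String)
// Drop the header row before parsing
val employees = rawlines.filter(!_.startsWith("Department"))
  .map(_.split(","))
  .map(row => Employee(row(0), row(1), row(2).toDouble, row(3)))
val results = employees.map(em => ((em.dep, em.des, em.state), (1, em.cost)))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
// Flatten each ((dept, desg, state), (count, cost)) pair into one CSV line
val csvLines = results.map { case ((dep, des, state), (count, cost)) =>
  s"$dep,$des,$state,$count,${cost.toLong}"
}
csvLines.repartition(1).saveAsTextFile("spark10/group_csv")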
Question 5:
CORRECT TEXT
Problem Scenario 57 : You have been given the below code snippet.
val a = sc.parallelize(1 to 9, 3)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(String, Seq[Int])] = Array((even,ArrayBuffer(2, 4, 6, 8)), (odd,ArrayBuffer(1, 3, 5, 7, 9)))
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
a.groupBy(x => {if (x % 2 == 0) "even" else "odd" }).collect
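groupBy passes every element through the supplied function and collects the elements under each returned key. For reference, a minimal sketch of the full session, assuming spark-shell (newer Spark versions print the grouped values as CompactBuffer instead of ArrayBuffer):
val a = sc.parallelize(1 to 9, 3)
// Each number is routed to the "even" or "odd" group by the key function
a.groupBy(x => if (x % 2 == 0) "even" else "odd").collect
// e.g. Array((even,CompactBuffer(2, 4, 6, 8)), (odd,CompactBuffer(1, 3, 5, 7, 9)))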
Question 6:
CORRECT TEXT
Problem Scenario 62 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
Correct Answer:
See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.mapValues("x" + _ + "x").collect
mapValues [Pair] : Takes the values of an RDD that consists of two-component tuples and applies the provided function to transform each value. Then, it forms new two-component tuples using the key and the transformed value and stores them in a new RDD.
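A minimal end-to-end sketch, assuming a spark-shell session (sc in scope):
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
// The key (the word length) is untouched; only the value is wrapped in "x"
b.mapValues("x" + _ + "x").collect
// Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))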