How to process tab-separated files in Spark?


I have a tab-separated file. The third column should be the key and the entire record should be the value (as per the MapReduce concept).

val cefFile = sc.textFile("c:\\text1.txt")
val cefDim1 = cefFile.filter { line => line.startsWith("1") }
val joinedRDD = cefFile.map(x => x.split("\\t"))
joinedRDD.first().foreach { println }

I am able to get the value of the first column but not the third. Can anyone suggest how to accomplish this?

After you've done the split with x.split("\\t"), the RDD (which in your example is called joinedRDD, though I'm going to call it parsedRDD since you haven't joined anything yet) is going to be an RDD of arrays. You can turn that into an RDD of key/value tuples by doing parsedRDD.map(r => (r(2), r)). That being said, you aren't limited to map & reduce operations in Spark, and a different data structure might be better suited to your problem. For tab-separated files, you could use spark-csv along with Spark DataFrames if they fit the eventual problem you're looking to solve.
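As a concrete illustration, here is a minimal sketch of the RDD approach described above. The file path and the split come from the question; the SparkContext setup and the variable names keyedRDD, key, and record are assumptions added for the example:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("TsvKeyByThirdColumn")
    val sc = new SparkContext(conf)

    // Split every line on tabs, giving an RDD of arrays of fields.
    val cefFile = sc.textFile("c:\\text1.txt")
    val parsedRDD = cefFile.map(line => line.split("\\t"))

    // Pair each record with its third column (index 2) as the key;
    // the value is the whole record, per the MapReduce-style layout.
    val keyedRDD = parsedRDD.map(r => (r(2), r))

    // Sanity check: print the key and all fields of the first record.
    val (key, record) = keyedRDD.first()
    println(s"key = $key, record = ${record.mkString("\t")}")

If DataFrames fit the problem better, a sketch using the spark-csv package might look like the following (Spark 1.x era; the format and option names follow the Databricks spark-csv documentation, and the sqlContext is assumed to exist):

    // Assumes the spark-csv package is on the classpath.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("delimiter", "\t")
      .option("header", "false")
      .load("c:\\text1.txt")
    df.show(5)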

