apache spark - Why does reading a CSV file with empty values lead to an IndexOutOfBoundsException?


I have a CSV file with the following structure:

name  | val1 | val2 | val3 | val4 | val5
john     1      2
joe      1      2
david    1      2              10     11

I am able to load it into an RDD fine. When I tried to create a schema and a DataFrame from it, I got an IndexOutOfBoundsException.

The code:

val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

When I try to perform an action on rowRDD, it gives the error.
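For context, a minimal sketch of one likely cause, assuming fileRDD is built by splitting each line on a comma (the split code is not shown above): by default, String.split drops trailing empty fields, so rows with trailing empty values come back shorter than the number of columns.

// Hypothetical reconstruction; "john,1,2,,," stands in for a row with trailing empty values
val parts = "john,1,2,,,".split(",")       // Array(john, 1, 2) -- trailing empty fields dropped
// parts(5) throws java.lang.ArrayIndexOutOfBoundsException

// Keeping empty fields (split limit = -1) or padding the row avoids this:
val kept   = "john,1,2,,,".split(",", -1)  // Array(john, 1, 2, "", "", "")
val padded = parts.padTo(6, "")            // pad short rows out to the column count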

Any help is appreciated.

Answer

This does not answer your question directly, but it may solve your problem.

From your question I see that you are trying to create a DataFrame from a CSV.

Creating a DataFrame from a CSV can be done using the spark-csv package.
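Note that spark-csv is an external package, so in Spark 1.x it has to be put on the classpath, for example with spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 (the version and Scala suffix here are examples; match them to your setup).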

With spark-csv, the below Scala code can be used to read a CSV:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)

For your sample data I got the following result:

+-----+----+----+----+----+----+
| name|val1|val2|val3|val4|val5|
+-----+----+----+----+----+----+
| john|   1|   2|    |    |    |
|  joe|   1|   2|    |    |    |
|david|   1|   2|    |  10|  11|
+-----+----+----+----+----+----+
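If you would rather have those empty cells come back as nulls instead of empty strings, spark-csv also has a treatEmptyValuesAsNulls option, set the same way as header above (available in recent versions of the package).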

You can also have the schema inferred automatically with the latest version. See this answer.
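A minimal sketch of that, using spark-csv's documented inferSchema option (it costs an extra pass over the data):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(csvFilePath)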

