apache spark - Why does reading a CSV file with empty values lead to an IndexOutOfBoundsException?
I have a CSV file with the following structure:
name  | val1 | val2 | val3 | val4 | val5
john  | 1    | 2    |      |      |
joe   | 1    | 2    |      |      |
david | 1    | 2    |      | 10   | 11
I am able to load it into an RDD fine, but when I tried to create a schema and then a DataFrame, I got an IndexOutOfBoundsException error.
My code:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
When I try to perform an action on rowRDD, it gives the error.
Any help is appreciated.
This does not answer the question directly, but it may solve your problem.
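That said, a likely reason for the exception is worth noting: Scala's String.split (which delegates to Java's) drops trailing empty strings by default, so a line like "john,1,2,,," splits into only Array("john", "1", "2"), and indexing p(3) throws. A minimal sketch of a map that avoids this by passing a split limit of -1, assuming a comma delimiter (the file path and variable names are placeholders):

import org.apache.spark.sql.Row

val fileRDD = sc.textFile("/path/to/file.csv")   // placeholder path
val rowRDD = fileRDD.map { line =>
  // A limit of -1 keeps trailing empty fields, so a line like
  // "john,1,2,,," still yields six elements instead of three.
  val p = line.split(",", -1)
  Row(p(0), p(1), p(2), p(3), p(4), p(5))
}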
As for a workaround: from the question, I see that you are trying to create a DataFrame from a CSV.
Creating a DataFrame from a CSV can be done using the spark-csv package.
With spark-csv, the Scala code below can be used to read a CSV:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)
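For context, here is a more self-contained sketch, assuming Spark 1.x with the spark-csv package on the classpath (the app name and file path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("csv-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// The CSV parser handles empty fields itself, so short rows do not
// lead to missing array elements and an IndexOutOfBoundsException.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/path/to/file.csv")   // placeholder path

df.show()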
For your sample data, I got the following result:
+-----+----+----+----+----+----+
| name|val1|val2|val3|val4|val5|
+-----+----+----+----+----+----+
| john|   1|   2|    |    |    |
|  joe|   1|   2|    |    |    |
|david|   1|   2|    |  10|  11|
+-----+----+----+----+----+----+
You can also use inferSchema with the latest version; see this answer.
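As a sketch of what that looks like, using the same reader as above (inferSchema is a spark-csv option; csvFilePath is a placeholder):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")   // infer column types instead of defaulting everything to string
  .load(csvFilePath)

df.printSchema()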