apache spark - Why does reading a CSV file with empty values lead to an IndexOutOfBoundException?


I have a CSV file with the following structure:

name  | val1 | val2 | val3 | val4 | val5
john  | 1    | 2    |      |      |
joe   | 1    | 2    |      |      |
david | 1    | 2    |      | 10   | 11

I am able to load it into an RDD just fine. When I tried to create a schema and then a DataFrame, I got an IndexOutOfBoundException.

The code ...

val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))

When I try to perform an action on rowRDD, it gives the error.

Any help is appreciated.
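A note on the likely cause (an assumption, since the post does not show how the lines are split): Scala's String.split, like Java's, drops trailing empty strings when called with the default limit, so a row whose last columns are empty parses to fewer fields than the header suggests, and indexing past the end of that array throws the exception. Passing a negative limit preserves the trailing empties. A minimal sketch, using hypothetical comma-separated lines that mirror the sample data:

object SplitDemo extends App {
  // Interior empty fields survive the default split
  println("david,1,2,,10,11".split(",").length)  // prints 6
  // Trailing empty fields are dropped by the default (limit = 0) overload
  println("john,1,2,,,".split(",").length)       // prints 3
  // A negative limit keeps trailing empty fields
  println("john,1,2,,,".split(",", -1).length)   // prints 6
}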

This does not answer the question directly, but it may solve your problem.

From the question, I see you are trying to create a DataFrame from a CSV file.

Creating a DataFrame from a CSV file can be done easily using the spark-csv package.

With spark-csv, the Scala code below can be used to read the CSV:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)
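For completeness, a minimal end-to-end sketch (assumptions: Spark 1.4+ with the com.databricks:spark-csv package on the classpath, and a placeholder path data.csv):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CsvToDataFrame extends App {
  val conf = new SparkConf().setAppName("CsvToDataFrame").setMaster("local[*]")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // "header" = "true" takes column names from the first line;
  // missing values come back as empty cells instead of shifting columns
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("data.csv")

  df.show()
  sc.stop()
}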

For your sample data, I got the following result:

+-----+----+----+----+----+----+
| name|val1|val2|val3|val4|val5|
+-----+----+----+----+----+----+
| john|   1|   2|    |    |    |
|  joe|   1|   2|    |    |    |
|david|   1|   2|    |  10|  11|
+-----+----+----+----+----+----+

You can also use inferSchema with the latest version; see this answer.
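For reference, a sketch of schema inference with spark-csv's inferSchema option (without it, every column is read as a string; inference costs an extra pass over the data):

// Opt in to type inference when reading
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(csvFilePath)

df.printSchema()  // val1 ... val5 should now come back as numeric types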

