apache spark - Why does reading a CSV file with empty values lead to an IndexOutOfBoundsException?
I have a CSV file with the following structure:
name  | val1 | val2 | val3 | val4 | val5
john  | 1    | 2    |      |      |
joe   | 1    | 2    |      |      |
david | 1    | 2    |      | 10   | 11
I am able to load it into an RDD fine, but when I tried to create a schema and then a DataFrame, I got an IndexOutOfBoundsException error.
My code:
val rowRDD = fileRDD.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5), p(6)))
When I try to perform an action on rowRDD, it gives the error.
Any help is appreciated.
This does not answer the question directly, but it may solve your problem.
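That said, a likely reason for the exception is worth noting: Scala's String.split (which delegates to Java's) drops trailing empty strings by default, so a line like "john,1,2,,," splits into only Array("john", "1", "2"), and indexing p(3) throws. A minimal sketch of a map that avoids this by passing a split limit of -1, assuming a comma delimiter (the file path and variable names are placeholders):

import org.apache.spark.sql.Row

val fileRDD = sc.textFile("/path/to/file.csv")   // placeholder path
val rowRDD = fileRDD.map { line =>
  // A limit of -1 keeps trailing empty fields, so a line like
  // "john,1,2,,," still yields six elements instead of three.
  val p = line.split(",", -1)
  Row(p(0), p(1), p(2), p(3), p(4), p(5))
}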
As for a workaround: from the question, I see that you are trying to create a DataFrame from a CSV.
Creating a DataFrame from a CSV can be done using the spark-csv package.
With spark-csv, the Scala code below can be used to read a CSV:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load(csvFilePath)
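For context, here is a more self-contained sketch, assuming Spark 1.x with the spark-csv package on the classpath (the app name and file path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("csv-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// The CSV parser handles empty fields itself, so short rows do not
// lead to missing array elements and an IndexOutOfBoundsException.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/path/to/file.csv")   // placeholder path

df.show()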
For your sample data, I got the following result:
+-----+----+----+----+----+----+
| name|val1|val2|val3|val4|val5|
+-----+----+----+----+----+----+
| john|   1|   2|    |    |    |
|  joe|   1|   2|    |    |    |
|david|   1|   2|    |  10|  11|
+-----+----+----+----+----+----+
You can also use inferSchema with the latest version; see this answer.
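As a sketch of what that looks like, using the same reader as above (inferSchema is a spark-csv option; csvFilePath is a placeholder):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")   // infer column types instead of defaulting everything to string
  .load(csvFilePath)

df.printSchema()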