Python - Parsing CSV to list of tuples without using CSV module
I'm working on an assignment in a Python class at the moment, and one particular part is asking me to import a CSV file (with data in the format "text, number, number, ..., number, number") without the use of the csv module (or any modules at all, in fact), and return the data as a list of tuples, in the format:
[('text', [number, number, ..., number, number]), ('text', [number, number, ..., number, number]), .....]
I think I've got the actual process of opening the file and beginning to read it line by line correct (see snippet below), but I'm not quite sure how to proceed with regard to parsing each line into the format needed.
def load_data(filename):
    with open(filename) as file:
        for line in file:
            pass  # parsing of each line would go here

I've tried searching, but all I can seem to find either says to use the csv module (which isn't particularly helpful because we're not allowed to import any modules bar the math library) or has the data being input and/or output in a different format. If anyone could give me pointers on what I should be doing or where I can start, that would be super helpful. Thanks!
Edit: as per the suggestion made by @dotancohen, here is some sample data:
slow loris, 21.72, 29.3, 20.08, 29.98, 29.85, 26.22, 29.68
ocelot, 57.51, 47.59, 55.89, 47.15, 46.71, 51.7, 46.68, 54.54
tiger, 75.0, 82.43, 112.11, 89.93, 103.19, 80.6, 113.44, 75.55, 102.29, 108.1, 98.84, 101.48, 77.75, 98.57, 70.31, 78.28, 80.18

Also, below is what I have at the moment as a potential solution:
def load_data(filename):
    output = []
    with open(filename) as file:
        for line in file:  # iterate over the open file, not the filename string
            temp_list = line.split(',')
            temp_item = temp_list.pop(0)
            tup = (temp_item, temp_list)
            output.append(tup)
    return output

CSV files have fields delimited by either a comma or a tab, so in the naive case splitting each line on that delimiter gives the different fields:
for line in file:  # file is the open file object
    fields = line.split(',')   # comma-delimited files
    # - or -
    fields = line.split('\t')  # tab-delimited files

However, we can seldom allow ourselves to be naive. CSV files have, among others, the following caveats:
- Quoted values: this is a legal field in CSV: "I think, therefore I am". We need to be careful not to split on a comma inside quotes. Newlines can also appear in quoted values, so we cannot reliably use for line in file naively.
- Escaped quotes in quoted values: this is a legal field in CSV: "she said \"I think so\"". This means that the state machine tracking, character by character, whether we are inside or outside quotes needs a lookback mechanism as well.
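To make the first caveat concrete, here is a small illustration of how a naive split mangles a quoted field. The sample line is made up for this example and is not taken from the asker's data.

```python
# Hypothetical sample line: two real fields, the first quoted and containing a comma.
line = '"I think, therefore I am",42'

# Naive splitting cuts the quoted field in two.
naive_fields = line.split(',')

print(naive_fields)       # ['"I think', ' therefore I am"', '42']
print(len(naive_fields))  # 3 pieces, although the line holds only 2 fields
```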
Thus, to reliably parse CSV files we need a state machine that saves its state across lines in the file. There are horrible surprises along the way, such as dealing with Unicode CSV files in Python 2 (hint: if you have non-ASCII text, use Python 3). There are also small surprises, such as applications putting a space after the comma delimiter or not adding commas for blank fields at the end of a line.
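As a rough idea of what such a state machine could look like, here is a minimal sketch that walks the text character by character. The function name parse_csv_text is hypothetical, and the sketch assumes that quotes inside a quoted value are escaped with a backslash and that rows end with a newline character. It is an illustration only, not a replacement for the csv module.

```python
def parse_csv_text(text, delimiter=','):
    """Split CSV text into rows of fields, honouring double-quoted values.

    Assumptions for this sketch: a quote inside a quoted value is escaped
    with a backslash, and each row ends with a newline character.
    """
    rows, row, field = [], [], []
    in_quotes = False  # state: are we inside a quoted value?
    escaped = False    # state: was the previous character a backslash?

    for ch in text:
        if escaped:
            field.append(ch)  # take the escaped character literally
            escaped = False
        elif in_quotes and ch == '\\':
            escaped = True    # remember the backslash for the next character
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            row.append(''.join(field))  # end of field
            field = []
        elif ch == '\n' and not in_quotes:
            row.append(''.join(field))  # end of field and end of row
            rows.append(row)
            row, field = [], []
        else:
            field.append(ch)

    # Flush the last row if the text does not end with a newline.
    if field or row:
        row.append(''.join(field))
        rows.append(row)
    return rows


# The state survives line breaks, so a quoted newline stays inside its field.
print(parse_csv_text('"I think, therefore I am",1\n"two\nlines",3\n'))
```

Because the whole text is processed in one pass rather than line by line, the in-quotes state carries over from one physical line to the next.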
Therefore, if you accept CSV files as input from users, use the csv module. However, if you can control the input (i.e. you produce it with your own script), then you can use the naive line.split('\t') method.
As per the sample data posted by the OP, we see that not only do we not have to worry about quoted fields, but the CSV source is in fact adding erroneous spaces after the comma delimiters. Thus, code specific to the OP's situation:
for line in file:  # file is the open file object
    fields = line.split(',')
    fields = [x.strip() for x in fields]  # remove whitespace
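Putting this together for the format asked about in the question, one possible sketch is shown below. The helper name parse_line is hypothetical, and the conversion of the numeric fields with float() is an assumption based on the requested output format of (text, [number, ...]) tuples.

```python
def parse_line(line):
    """Turn 'text, number, ..., number' into ('text', [number, ..., number])."""
    fields = [x.strip() for x in line.split(',')]  # split and remove whitespace
    name = fields[0]
    # Assumption: every remaining non-empty field is a number.
    numbers = [float(x) for x in fields[1:] if x]
    return (name, numbers)


def load_data(filename):
    """Read the file line by line and return a list of (text, [numbers]) tuples."""
    output = []
    with open(filename) as file:
        for line in file:
            if line.strip():  # ignore blank lines
                output.append(parse_line(line))
    return output


print(parse_line('slow loris, 21.72, 29.3\n'))  # ('slow loris', [21.72, 29.3])
```

This relies on the input having no quoted fields, as in the sample data; with quoted values the naive split would no longer be safe.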