python - Passing objects to Spark
I'm trying to understand the capabilities of Spark, but I fail to see whether the following is possible in Python.

I have objects that are not picklable (they wrap C++ code via SWIG). I have a list of those objects, obj_list = [obj1, obj2, ...], and the objects have a member function called .dostuff.

I'd like to parallelize the following loop with Spark (in order to run it on AWS, since I don't have a big architecture internally; I could use multiprocessing, but I don't think I can send the objects over the network):

[x.dostuff() for x in obj_list]

Any pointers would be appreciated.
If your objects aren't picklable, your options are pretty limited. If you can create them on the executor side, though (frequently a useful option for things like database connections), you can parallelize a regular list (e.g. maybe a list of constructor parameters) and then use map if your dostuff function returns (picklable) values you want to use, or foreach if your dostuff function is called for its side effects (like updating a database or similar).
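A minimal sketch of the construct-on-the-executor pattern. `MyObj` here is a hypothetical stand-in for your SWIG-wrapped class, and `build_and_run` is an assumed helper name; only the picklable constructor parameters cross the network, never the objects themselves:

```python
class MyObj:
    """Stand-in for the non-picklable SWIG-wrapped C++ object."""
    def __init__(self, param):
        self.param = param

    def dostuff(self):
        # Placeholder work; assumed to return a picklable value.
        return self.param * 2

def build_and_run(param):
    # Runs on the executor: construct the object locally from its
    # (picklable) parameters, call dostuff(), return the result.
    return MyObj(param).dostuff()

params = [1, 2, 3]  # a regular, picklable list of constructor parameters

# Local equivalent of the Spark job, for illustration:
results = list(map(build_and_run, params))  # [2, 4, 6]

# On a cluster (requires pyspark; sc is a SparkContext):
# results = sc.parallelize(params).map(build_and_run).collect()
# or, if dostuff() is called only for its side effects:
# sc.parallelize(params).foreach(build_and_run)
```

Because `build_and_run` is a top-level function closing over nothing unpicklable, Spark can serialize it and ship it to the executors, where each object is created fresh.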