Python 2.7: Remove subdomains from a list


I have a list of 1,300,000 items, for example ['.a', '.b.a', '.c.b', '.f.c.b'].

I'd like to remove the subdomains (e.g. '.b.a' and '.f.c.b' in the list above).

I'm a newbie and am trying to learn about speed. The following attempts seem slow. Any suggestions?

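```python
# create separate lists by number of dots, perhaps faster?
a1 = []
b2 = []
c3 = []
d4 = []
e5 = []
f6 = []
for i in dupesgone:
    j = i.count('.')
    if j == 1:
        a1.append(i)
    elif j == 2:
        b2.append(i)
    elif j == 3:
        c3.append(i)
    elif j == 4:
        d4.append(i)
    elif j == 5:
        e5.append(i)
    else:
        f6.append(i)

for a in a1:
    la = -len(a)
    # loop over a copy ([:]): removing from a list while
    # iterating over it silently skips elements
    for b in b2[:]:
        if a == b[la:]:
            b2.remove(b)
    for c in c3[:]:
        if a == c[la:]:
            c3.remove(c)
    for d in d4[:]:
        if a == d[la:]:
            d4.remove(d)
    # --snip--

# how about this, faster?
# (filter into new lists instead of calling remove() inside a comprehension)
b2 = [b for b in b2 if not any(a == b[-len(a):] for a in a1)]
c3 = [c for c in c3 if not any(a == c[-len(a):] for a in a1)]
d4 = [d for d in d4 if not any(a == d[-len(a):] for a in a1)]
e5 = [e for e in e5 if not any(a == e[-len(a):] for a in a1)]
f6 = [f for f in f6 if not any(a == f[-len(a):] for a in a1)]
```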

Should I create a dictionary? Would that be faster?
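On the dictionary question: a `set` (essentially a dictionary with keys only) is one reasonable option, because membership tests take constant time on average. The sketch below is a different technique from the sorting answer further down: for each item it builds every shorter parent suffix (for '.f.c.b' those are '.c.b' and '.b') and drops the item if any of them is in the set. It assumes every item starts with a dot, as in the example, and that duplicates have already been removed; `remove_subdomains_with_set` is a hypothetical name.

```python
def remove_subdomains_with_set(domains):
    """Keep only items that have no parent domain in the list.

    Assumes every item starts with '.', e.g. '.f.c.b', and that
    duplicates were removed beforehand.
    """
    domain_set = set(domains)  # O(1) average membership tests
    result = []
    for d in domains:
        labels = d.split('.')  # '.f.c.b' -> ['', 'f', 'c', 'b']
        # Every proper parent suffix: '.c.b', then '.b'
        parents = ('.' + '.'.join(labels[i:]) for i in range(2, len(labels)))
        if not any(p in domain_set for p in parents):
            result.append(d)
    return result

print(remove_subdomains_with_set(['.a', '.b.a', '.c.b', '.f.c.b']))
```

Each item costs roughly one set lookup per label, instead of a comparison against every shorter item in the list, and the original order of the surviving items is preserved.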

Thanks for your help.

As a practical matter, I think the fastest algorithm would be:

  1. Reverse every item (so ".b.c" becomes "c.b.").
  2. Sort the list.
  3. Loop through the list, keeping track of a "current" item. If the next item on the list does not start with the current item (i.e. it is not a subdomain of it), add the next item to the output list and make it the current item.
  4. Reverse each item on the output list.
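Steps 1 and 2 are what make the single pass in step 3 possible. Here is a minimal illustration on the example list from the question (`reverse_and_sort` is a hypothetical helper name): after reversing, each parent domain becomes a prefix of its subdomains, and sorting places all strings sharing a prefix in one contiguous run, directly after the prefix itself. Because the leading dot becomes a trailing dot, '.a' reversed ('a.') is not a prefix of '.ba' reversed ('ab.').

```python
def reverse_and_sort(domains):
    # Step 1: reverse every item, so '.b.a' becomes 'a.b.'
    reversed_items = [d[::-1] for d in domains]
    # Step 2: sort; each parent is immediately followed by its subdomains
    reversed_items.sort()
    return reversed_items

print(reverse_and_sort(['.a', '.b.a', '.c.b', '.f.c.b']))
# ['a.', 'a.b.', 'b.c.', 'b.c.f.']
```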

Here is an untested sketch of the code:

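```python
def reverse(s):
    return s[::-1]

# wrapped in a function (name chosen here) so that `return` is valid
def remove_subdomains(dupesgone):
    r = sorted(map(reverse, dupesgone))  # steps 1 and 2
    ci = None                            # current item
    out = []
    for ni in r:                         # step 3
        if not ci or not ni.startswith(ci):
            out.append(ni)
            ci = ni
    return list(map(reverse, out))       # step 4
```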
