Python 2.7: Remove subdomains from list

I have a list of 1,300,000 items, for example ['.a', '.b.a', '.c.b', '.f.c.b'].

I'd like to remove the subdomains (e.g. '.b.a' and '.f.c.b' in the list above).

I'm a newbie trying to learn about speed. The following attempts seem slow. Any suggestions?

# create separate lists, perhaps faster
a1 = []
b2 = []
c3 = []
d4 = []
e5 = []
f6 = []
for i in dupesgone:
    j = i.count('.')
    if j == 1:
        a1.append(i)
    elif j == 2:
        b2.append(i)
    elif j == 3:
        c3.append(i)
    elif j == 4:
        d4.append(i)
    elif j == 5:
        e5.append(i)
    else:
        f6.append(i)

for a in a1:
    la = -len(a)
    for b in b2:
        if a == b[la:]:
            b2.remove(b)
    for c in c3:
        if a == c[la:]:
            c3.remove(c)
    for d in d4:
        if a == d[la:]:
            d4.remove(d)
    # --snip--

# how about this, is it faster?
[b2.remove(b) for b in b2 for a in a1 if a == b[-len(a):]]
[c3.remove(c) for c in c3 for a in a1 if a == c[-len(a):]]
[d4.remove(d) for d in d4 for a in a1 if a == d[-len(a):]]
[e5.remove(e) for e in e5 for a in a1 if a == e[-len(a):]]
[f6.remove(f) for f in f6 for a in a1 if a == f[-len(a):]]

Should I create a dictionary? Would that be faster?

Thanks for any help.

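As a practical matter, I think the fastest algorithm is to:

Reverse every item (so ".b.c" becomes "c.b.").
Sort the list. After reversing, each parent domain sorts immediately before its own subdomains.
Loop through the list with the idea of a "current" item. If the next item on the list starts with (i.e. is a subdomain of) the current item, skip it. Otherwise, the next item is added to the output list and becomes the current item.
Reverse each item on the output list.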

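Here is an untested sketch of the code (wrapped in a function, here called remove_subdomains, so that the final return is valid):

def reverse(s):
    return s[::-1]

def remove_subdomains(dupesgone):
    # Reversed names put the top-level part first, so after sorting
    # every parent sits directly before its subdomains.
    r = sorted(map(reverse, dupesgone))
    ci = None  # current item: the last name that was kept (reversed)
    out = []
    for ni in r:
        # Keep ni unless it starts with the current item,
        # i.e. unless it is a subdomain of it.
        if ci is None or not ni.startswith(ci):
            out.append(ni)
            ci = ni
    return [reverse(s) for s in out]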
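On the dictionary question: a set (a dictionary without values) is another option. The sketch below is only an illustration and was not part of the answer above. The function name remove_subdomains_with_set is made up, and it assumes every entry starts with a '.', as in the sample data. For each name it checks whether any parent (a shorter suffix starting at a dot) is present in the set, and keeps the name only if none is.

```python
def remove_subdomains_with_set(domains):
    """Keep only the entries that have no parent domain in the list.

    Hypothetical helper; assumes every entry starts with '.'.
    """
    seen = set(domains)  # hashed membership tests instead of list scans
    kept = []
    for name in domains:
        # Candidate parents are the proper suffixes that start at a dot,
        # e.g. '.f.c.b' -> '.c.b', then '.b'.
        pos = name.find('.', 1)
        while pos != -1 and name[pos:] not in seen:
            pos = name.find('.', pos + 1)
        if pos == -1:  # ran out of suffixes without finding a parent
            kept.append(name)
    return kept


sample = ['.a', '.b.a', '.c.b', '.f.c.b']
result = remove_subdomains_with_set(sample)  # ['.a', '.c.b']
```

Unlike the sort-based approach, this keeps the surviving items in their original order and needs no sorting, at the cost of holding the extra set in memory. Which one is faster on 1,300,000 items is something to measure rather than assume.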
