Python 2.7: Remove subdomains from list

I have a list of 1,300,000 items, for example ['.a', '.b.a', '.c.b', '.f.c.b'].

I'd like to remove the subdomains (e.g. '.b.a' and '.f.c.b' in the list above).

I'm a newbie trying to learn about speed. My attempts follow, but they seem slow. Any suggestions?

# create separate lists, perhaps faster
a1 = []
b2 = []
c3 = []
d4 = []
e5 = []
f6 = []
for i in dupesgone:
    j = i.count('.')
    if j == 1:
        a1.append(i)
    elif j == 2:
        b2.append(i)
    elif j == 3:
        c3.append(i)
    elif j == 4:
        d4.append(i)
    elif j == 5:
        e5.append(i)
    else:
        f6.append(i)

for a in a1:
    la = -len(a)
    for b in b2:
        if a == b[la:]:
            b2.remove(b)
    for c in c3:
        if a == c[la:]:
            c3.remove(c)
    for d in d4:
        if a == d[la:]:
            d4.remove(d)
    # --snip--

# how about this, is it faster?
[b2.remove(b) for b in b2 for a in a1 if a == b[-len(a):]]
[c3.remove(c) for c in c3 for a in a1 if a == c[-len(a):]]
[d4.remove(d) for d in d4 for a in a1 if a == d[-len(a):]]
[e5.remove(e) for e in e5 for a in a1 if a == e[-len(a):]]
[f6.remove(f) for f in f6 for a in a1 if a == f[-len(a):]]

Should I create a dictionary instead? Would that be faster?

Thanks for your help.

As a practical matter, I think the fastest algorithm is:

1. Reverse every item (so ".b.c" becomes "c.b.").
2. Sort the list.
3. Loop through the list with the idea of a "current" item. If the next item on the list does not start with the current item (i.e. it is not a subdomain of it), the next item is added to the output list and becomes the current item. Otherwise it is skipped.
4. Reverse each item on the output list.
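
To see why the reverse-then-sort steps help, here is a small illustration (not part of the original answer) using the sample list from the question. The variable names are made up for the example, and it assumes every entry starts with a '.' as in the sample. Once reversed, a parent domain is a string prefix of each of its subdomains, so after sorting, all subdomains sit in one contiguous run right behind their parent.

```python
# Sample list from the question; assumes every entry starts with a '.'
domains = ['.a', '.b.a', '.c.b', '.f.c.b']

# Reverse each string, then sort lexicographically
reversed_sorted = sorted(d[::-1] for d in domains)

print(reversed_sorted)
# ['a.', 'a.b.', 'b.c.', 'b.c.f.']
```

Here 'a.b.' lands directly after its parent 'a.', and 'b.c.f.' directly after 'b.c.', which is what lets a single pass over the sorted list drop them.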

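Here is an untested sketch of the code:

def reverse(s):
    return s[::-1]

def remove_subdomains(domains):
    # reversed, a parent domain is a prefix of each of its subdomains
    r = sorted(map(reverse, domains))
    ci = None   # current item: the last domain that was kept
    out = []
    for ni in r:
        # keep ni only if it does not extend the current kept item
        if not ci or not ni.startswith(ci):
            out.append(ni)
            ci = ni
    return map(reverse, out)

result = remove_subdomains(dupesgone)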
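
On the dictionary question: a hash-based lookup is another option. The following is only a sketch of that alternative, not the sort-based approach described above, and the function name `remove_subdomains_with_set` is made up for illustration. It assumes every entry starts with a '.' (as in the sample) and that duplicates have already been removed. For each entry it checks whether any proper parent suffix is also in the input.

```python
def remove_subdomains_with_set(domains):
    """Return the entries that have no parent domain in the input."""
    seen = set(domains)  # membership tests take constant time on average
    kept = []
    for d in domains:
        # Walk the proper parent suffixes, e.g. '.f.c.b' -> '.c.b' -> '.b'
        pos = d.find('.', 1)
        is_subdomain = False
        while pos != -1:
            if d[pos:] in seen:
                is_subdomain = True
                break
            pos = d.find('.', pos + 1)
        if not is_subdomain:
            kept.append(d)
    return kept


sample = ['.a', '.b.a', '.c.b', '.f.c.b']
print(remove_subdomains_with_set(sample))
# ['.a', '.c.b']
```

This keeps the input order and does one set lookup per label in each entry. Whether it beats the sort-based approach on 1,300,000 items is something to measure rather than assume.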
