python - How to regex a hashtag followed by digits in Python 2?


I want to find a hashtag followed by 1 to 6 digits in Python 2.7, but my regex doesn't match properly.

Here is an example:

    import re

    chaine = "[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<uid>]http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<uid>]"
    regex = re.compile('http://forum.darkgyver.fr/(.*)\#(\d{1-6})')
    match = regex.search(chaine)
    if match:
        pos1 = match.start()
        pos2 = match.end()
    else:
        pos1 = -1
        pos2 = -1

    print "pos1 %d" % pos1
    print "pos2 %d" % pos2
    url_tempo = chaine[pos1:pos2]
    print "url_tempo %s" % url_tempo
    pospost = pos1 + url_tempo.find('#') + 1
    numpost = chaine[pospost:pos2]
    print "numpost %s" % numpost

This first regex returns "no match". Perhaps the hashtag is not declared properly.

So I changed the regex as follows:

    regex = re.compile('http://forum.darkgyver.fr/(.*)\#([0-9]+(:| |    |\n|\[|$))')

This matches at the wrong position: it gives pos2=161 when it should be pos2=80.

How can I fix the regex to match a hashtag with 1 to 6 digits after it?

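You are attempting to extract the hashtag from a URL. With the string you have given, it would seem more logical to try to extract the digits between the # and : characters. If you had a hashtag with 7 digits, would you want all 7 digits, or would you want it not to match at all? In any case, I guess you would not want just the first 6 digits.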
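As a side note on the first attempt, {1-6} is not a valid repetition quantifier, so Python's re module treats it as literal text; the range form is {1,6}. The following is a minimal sketch, not the only way to do it, of how the 7-digit question could be settled by refusing to match. The helper name find_hashtag and the sample URLs are made up for illustration, and print is called with parentheses so the snippet runs on both Python 2.7 and Python 3.

```python
import re

# Hypothetical helper: returns the digits after '#', or None when there is no
# run of 1 to 6 digits that ends exactly there.
def find_hashtag(text):
    # (?!\d) rejects a match when another digit follows, so longer numbers
    # are skipped instead of being cut down to their first 6 digits.
    match = re.search(r"#(\d{1,6})(?!\d)", text)
    return match.group(1) if match else None

print(find_hashtag("http://example.com/topic#265132:abc"))   # 265132
print(find_hashtag("http://example.com/topic#2651327:abc"))  # None

# Without the lookahead, the same 7-digit input is truncated to 6 digits.
print(re.search(r"#(\d{1,6})", "topic#2651327").group(1))    # 265132
```

If 7-digit numbers should be accepted in full instead, the (\d+) form used further down already does that.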

By using the grouping operator (), if there is a match, the hashtag can be retrieved using group(1), avoiding the need to extract it using string slicing.

The following shows one possible way to extract the hashtag:
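To make the point about groups concrete, here is a small sketch showing that a match object exposes both the text and the position of each group, so no slicing arithmetic is needed. The sample string is shortened and invented for illustration.

```python
import re

# Shortened, made-up sample in the same shape as the forum string.
sample = "[url=http://example.com/topic#265132:uid]"

match = re.search(r"#(\d+)", sample)
if match:
    numpost = match.group(1)   # text captured by the first ( ) group
    span = match.span(1)       # (start, end) indexes of that group
    print(numpost)
    # Slicing with the group's own span gives back the same text.
    print(sample[span[0]:span[1]] == numpost)
```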

    import re

    chaine = "[url=http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132:<uid>]http://forum.darkgyver.fr/t27142-probleme-boitier-papillon-faisceau#265132[/url:<uid>]"

    # Escape the literal URL prefix, skip lazily to the first '#',
    # then capture the digits that come before ':'
    re_hashtag = re.search(re.escape("http://forum.darkgyver.fr") + r".*?#(\d+):", chaine)

    print re_hashtag.start()
    print re_hashtag.end()
    print re_hashtag.group(1)

This will display the following:

    5
    80
    265132

The start position is 5 because the match starts at the http of the URL.
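The pos2=161 result in the question comes from the greedy (.*), which runs on to the last # in the string, whereas .*? stops at the first one. A minimal sketch of the difference, using a shortened made-up string rather than the full forum text:

```python
import re

# Made-up string containing the same URL twice, like the [url=...]...[/url] tag.
text = "[url=http://example.com/a#111:x]http://example.com/a#111[/url]"

greedy = re.search(r"http://example\.com/(.*)#(\d+)", text)
lazy = re.search(r"http://example\.com/(.*?)#(\d+)", text)

# Greedy .* swallows everything up to the last '#', so the match ends late.
print(greedy.group(1))
# Lazy .*? stops at the first '#', so the match ends after the first number.
print(lazy.group(1))
print(lazy.end() < greedy.end())
```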

Note, I have used the re.escape() function to make sure the URL is correctly escaped. If you print the following, you will see how the initial regular expression should have been written:

    print re.escape("http://forum.darkgyver.fr")

Giving:

    http\:\/\/forum\.darkgyver\.fr
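The exact output of re.escape depends on the Python version: 2.7 escapes every non-alphanumeric character, as shown above, while Python 3.7 and later escape only characters that are special in a regex, so : and / are left alone. Either form matches the same literal text. A small sketch of why the escaping matters, using a made-up look-alike string:

```python
import re

prefix = "http://forum.darkgyver.fr"

unescaped = re.compile(prefix)            # '.' matches any character
escaped = re.compile(re.escape(prefix))   # '.' matches only a literal dot

# Made-up look-alike host where the dots are replaced by other characters.
lookalike = "http://forumXdarkgyverYfr"

print(bool(unescaped.search(lookalike)))  # True: the unescaped dots match X and Y
print(bool(escaped.search(lookalike)))    # False: literal dots are required
print(bool(escaped.search(prefix)))       # True: the real prefix still matches
```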
