jsoup - Java - Get every webpage associated with a domain name programmatically


I want to make a program where the user enters a URL and the program responds with every web page under that domain name. Right now I'm using jsoup to collect every <a href> link, but that does not cover every page on the site if the site changes pages through AngularJS or something similar. Any advice on how best to do this?

You don't need jsoup for this. Navigate to the host's robots.txt:

https://stackoverflow.com/robots.txt

and find the sitemap entry:

sitemap: /sitemap.xml 
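A minimal sketch of the robots.txt step, using only the Java 11+ standard library. The class name RobotsSitemaps, the findSitemaps helper, and the default URL are illustrative assumptions, not part of any library. It matches "Sitemap:" lines case-insensitively and resolves relative values such as /sitemap.xml against the site root.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

public class RobotsSitemaps {

    /**
     * Extracts sitemap URLs from the text of a robots.txt file.
     * Field names are compared case-insensitively, and relative values
     * (e.g. "/sitemap.xml") are resolved against the given base URI.
     */
    public static List<String> findSitemaps(String robotsTxt, URI base) {
        List<String> result = new ArrayList<>();
        for (String rawLine : robotsTxt.split("\\R")) {
            // Strip trailing comments and surrounding whitespace.
            String line = rawLine.replaceFirst("#.*", "").trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "field: value" line
            }
            String field = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (field.equalsIgnoreCase("sitemap") && !value.isEmpty()) {
                result.add(base.resolve(value).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default; pass a site root as the first argument instead.
        URI base = URI.create(args.length > 0 ? args[0] : "https://stackoverflow.com/");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(base.resolve("/robots.txt")).GET().build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        findSitemaps(body, base).forEach(System.out::println);
    }
}
```

The parsing is kept separate from the HTTP call so it can be checked against a saved robots.txt without touching the network.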

In the case of SO, theirs is cached on Google:

cache:https://stackoverflow.com/sitemap.xml

This will have all of the links the website wants publicly available, or, in the case of SO, a list of additional sitemaps to scan through:

https://stackoverflow.com/sitemap-questions-0.xml
https://stackoverflow.com/sitemap-questions-1.xml
https://stackoverflow.com/sitemap-questions-2.xml
https://stackoverflow.com/sitemap-questions-3.xml
https://stackoverflow.com/sitemap-questions-4.xml
https://stackoverflow.com/sitemap-questions-5.xml
https://stackoverflow.com/sitemap-questions-6.xml
....
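Following a sitemap index down to the page URLs can be sketched as a small breadth-first walk. This hypothetical SitemapWalker assumes the standard sitemap protocol layout: a <sitemapindex> root lists child sitemaps and a <urlset> root lists pages, each URL inside a <loc> element. The fetcher is passed in as a function so it can be swapped or stubbed; the sketch does not handle gzip-compressed sitemaps (.xml.gz), rate limiting, or servers that reject requests without a User-Agent header.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SitemapWalker {

    /**
     * Collects every page URL reachable from a starting sitemap.
     * A <sitemapindex> file adds more sitemaps to the queue; a <urlset>
     * file adds page URLs. The fetcher maps a URL to its XML body.
     */
    public static Set<String> collectPages(String startSitemap, Function<String, String> fetcher) throws Exception {
        Set<String> pages = new LinkedHashSet<>(); // keeps discovery order
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startSitemap);
        while (!queue.isEmpty()) {
            String sitemapUrl = queue.poll();
            if (!seen.add(sitemapUrl)) {
                continue; // already visited this sitemap
            }
            Element root = parse(fetcher.apply(sitemapUrl)).getDocumentElement();
            boolean isIndex = "sitemapindex".equals(root.getLocalName());
            // "*" matches <loc> in any namespace, or in no namespace at all.
            NodeList locs = root.getElementsByTagNameNS("*", "loc");
            for (int i = 0; i < locs.getLength(); i++) {
                String loc = locs.item(i).getTextContent().trim();
                if (isIndex) {
                    queue.add(loc);
                } else {
                    pages.add(loc);
                }
            }
        }
        return pages;
    }

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so remote XML cannot trigger external entity (XXE) expansion.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        Function<String, String> fetcher = url -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
                return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            } catch (Exception e) {
                throw new RuntimeException("Failed to fetch " + url, e);
            }
        };
        // Hypothetical default; pass a sitemap URL found via robots.txt instead.
        String start = args.length > 0 ? args[0] : "https://stackoverflow.com/sitemap.xml";
        collectPages(start, fetcher).forEach(System.out::println);
    }
}
```

The seen set guards against an index that lists the same sitemap twice, and a LinkedHashSet drops duplicate page URLs while keeping them in the order they were found.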
