jsoup - Java - Get every webpage associated with a domain name programmatically


I want to make a program where the user enters a URL and the program responds with every web page under that domain name. Right now I'm using jsoup to collect every <a href> link, but that does not cover every page on the site if the site changes pages through AngularJS or something else. Any advice on how best to do this?

You don't need jsoup for this. Navigate to the host's robots.txt:

https://stackoverflow.com/robots.txt

and find the sitemap.xml entry:

sitemap: /sitemap.xml 
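As a rough sketch of that first step, the sitemap lines can be pulled out of a robots.txt body that has already been downloaded. The class name `RobotsSitemaps`, the method `sitemapUrls`, and the `baseUrl` parameter are hypothetical names chosen for illustration. Fetching the file is left out, and resolving relative paths such as /sitemap.xml against the robots.txt URL is an assumption based on the entry shown above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup or any other library.
class RobotsSitemaps {

    /**
     * Returns the sitemap URLs declared in a robots.txt body.
     * baseUrl is the URL the robots.txt was fetched from; relative
     * values such as "/sitemap.xml" are resolved against it (an assumption).
     */
    public static List<String> sitemapUrls(String robotsTxt, String baseUrl) {
        String directive = "sitemap:";
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        for (String line : robotsTxt.split("\\R")) {
            String trimmed = line.trim();
            // Match the directive name case-insensitively ("Sitemap:" or "sitemap:").
            if (trimmed.regionMatches(true, 0, directive, 0, directive.length())) {
                String value = trimmed.substring(directive.length()).trim();
                if (!value.isEmpty()) {
                    // Throws IllegalArgumentException if the value is not a valid URI.
                    urls.add(base.resolve(value).toString());
                }
            }
        }
        return urls;
    }
}
```

Calling `RobotsSitemaps.sitemapUrls(body, "https://stackoverflow.com/robots.txt")` on a body containing the line above would resolve the entry against the host, so the next request goes to an absolute URL.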

In the case of SO, theirs is cached on Google:

cache:https://stackoverflow.com/sitemap.xml

This will have all of the links the website wants publicly available. Or, in the case of SO, a list of additional sitemaps to scan through:

https://stackoverflow.com/sitemap-questions-0.xml
https://stackoverflow.com/sitemap-questions-1.xml
https://stackoverflow.com/sitemap-questions-2.xml
https://stackoverflow.com/sitemap-questions-3.xml
https://stackoverflow.com/sitemap-questions-4.xml
https://stackoverflow.com/sitemap-questions-5.xml
https://stackoverflow.com/sitemap-questions-6.xml
....
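Scanning through a sitemap, or a list of further sitemaps, could look roughly like the sketch below. It uses the JDK's built-in DOM parser and assumes the file follows the usual sitemaps.org layout: a `sitemapindex` root listing child sitemaps, or a `urlset` root listing pages, each entry carrying a `loc` element. The class name `SitemapLocs` and its methods are hypothetical, and the XML is expected to be downloaded already.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper built on the JDK's DOM parser.
class SitemapLocs {

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the XML comes from an untrusted host.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse sitemap XML", e);
        }
    }

    /** True if the root element is <sitemapindex>, i.e. a list of further sitemaps. */
    public static boolean isIndex(String xml) {
        return "sitemapindex".equals(parse(xml).getDocumentElement().getLocalName());
    }

    /**
     * Returns the text of every <loc> element, in document order.
     * Note: this also picks up <loc> elements from extension namespaces
     * (for example image sitemaps), which a stricter version would filter out.
     */
    public static List<String> locs(String xml) {
        NodeList nodes = parse(xml).getElementsByTagNameNS("*", "loc");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }
}
```

A crawler built on this would fetch each URL returned by `locs` again whenever `isIndex` is true, and treat the results as page links otherwise.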
