jsoup - Java - Get every webpage associated with a domain name programmatically
I want to make a program where the user enters a URL, and the program responds with every web page under that domain name. Right now, I'm using jsoup to collect every <a href>
link, but that does not cover every web page on the site if the site changes pages through AngularJS or something else. Any advice on how best to do this?
You don't need jsoup for this. Navigate to the host's robots.txt:
https://stackoverflow.com/robots.txt
and find the sitemap.xml entry:
sitemap: /sitemap.xml
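As a rough sketch of that step in Java: the snippet below scans the text of a robots.txt file for `Sitemap:` lines and resolves each value against the URL the file came from. The class name `RobotsSitemaps` and the method `sitemapUrls` are made up for illustration, and downloading the file is left out, so the method simply takes the robots.txt body as a string.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class RobotsSitemaps {
    /**
     * Returns the sitemap URLs declared in a robots.txt body.
     * baseUrl is the URL the robots.txt was fetched from; relative
     * values such as "/sitemap.xml" are resolved against it.
     */
    static List<String> sitemapUrls(String baseUrl, String robotsTxt) {
        List<String> result = new ArrayList<>();
        URI base = URI.create(baseUrl);
        for (String line : robotsTxt.split("\\R")) {
            String trimmed = line.trim();
            // Match the directive name case-insensitively ("Sitemap:" or "sitemap:").
            if (trimmed.regionMatches(true, 0, "sitemap:", 0, 8)) {
                String value = trimmed.substring(8).trim();
                if (!value.isEmpty()) {
                    result.add(base.resolve(value).toString());
                }
            }
        }
        return result;
    }
}
```

Calling `RobotsSitemaps.sitemapUrls("https://stackoverflow.com/robots.txt", body)` on a body containing the line above should give back `https://stackoverflow.com/sitemap.xml`.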
In the case of SO, theirs is cached on Google:
cache:https://stackoverflow.com/sitemap.xml
This will have all of the links the website wants publicly available. Or, in the case of SO, a list of additional sitemaps to scan through:
https://stackoverflow.com/sitemap-questions-0.xml
https://stackoverflow.com/sitemap-questions-1.xml
https://stackoverflow.com/sitemap-questions-2.xml
https://stackoverflow.com/sitemap-questions-3.xml
https://stackoverflow.com/sitemap-questions-4.xml
https://stackoverflow.com/sitemap-questions-5.xml
https://stackoverflow.com/sitemap-questions-6.xml
....
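Scanning through those nested sitemaps could look roughly like this. It is only a sketch using the JDK's built-in XML parser: `SitemapWalker`, `collectPages`, and the `fetch` parameter are hypothetical names, and `fetch` stands in for whatever you use to download a URL and return its body as a string. A sitemap index (root element `sitemapindex`) is followed recursively, while a plain sitemap (`urlset`) contributes its `<loc>` values as pages. There is no guard against sitemaps that reference each other in a loop, so treat it as a starting point.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SitemapWalker {
    /**
     * Collects page URLs starting from one sitemap URL.
     * fetch is an assumed hook: given a URL, it returns the XML body as text.
     */
    static List<String> collectPages(String sitemapUrl, Function<String, String> fetch) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the XML comes from an untrusted site.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fetch.apply(sitemapUrl))));

            // <sitemapindex> lists further sitemaps; <urlset> lists actual pages.
            boolean isIndex = "sitemapindex".equals(doc.getDocumentElement().getTagName());
            NodeList locs = doc.getElementsByTagName("loc");

            List<String> pages = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                String loc = locs.item(i).getTextContent().trim();
                if (isIndex) {
                    pages.addAll(collectPages(loc, fetch));
                } else {
                    pages.add(loc);
                }
            }
            return pages;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read sitemap " + sitemapUrl, e);
        }
    }
}
```

In a real program, `fetch` might wrap `java.net.http.HttpClient` (Java 11+) or `URL.openStream()`. Passing it in as a function also makes it easy to try the parsing against canned XML without touching the network.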