🔍 Website Crawler
This tool finds all URLs on a page.
The crawler is configured to stay within the boundaries of the specified domain,
making a complete index of your site.
A great alternative to offline apps like Screaming Frog,
or to other site crawlers that are severely limited or that ask you to register, solve captchas, or interfere in other ways.
We keep it simple here.
The site will gently crawl starting from the first URL you specify, and will stop when either the page limit or the time limit is reached. See complete details of how this crawler works below. By confirming, you state that you own this site or have permission to crawl it.
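As an illustration of staying within the boundaries of the specified domain, here is a minimal sketch of how links on a page could be collected and filtered. It is not the crawler's actual code. The names `LinkCollector` and `same_domain_links` are hypothetical, and the sketch assumes "same domain" means an exact host match, so subdomains are treated as external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute http(s) links in html whose host equals page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc.lower()
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Assumption: only exact host matches count as "the same site".
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == host:
            if absolute not in links:
                links.append(absolute)
    return links
```

Links to other hosts and non-web schemes such as mailto: are dropped, and duplicates are reported only once.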
ProxyNova Crawler
Ignores sitemap.xml. Will respect robots.txt (optionally). US IP address; user agent: proxynova crawler. Crawls slowly
(adjustable, 1-3 pages per second). Crawls Vue apps without problems.
Stops when any of the following happens:
- website fully crawled (no more new pages found)
- 1000 pages found. We will not crawl more than 500 (Screaming Frog caps at 500)
- running for 10 minutes (lambda limit).
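The stop conditions listed above can be sketched as a simple breadth-first loop. This is only an illustration under stated assumptions: `crawl` and `fetch_links` are hypothetical names, `fetch_links` stands in for whatever downloads a page and returns its same-site links, and the defaults of 500 pages and 600 seconds are taken loosely from the limits mentioned here.

```python
import time
from collections import deque


def crawl(start_url, fetch_links, max_pages=500, max_seconds=600,
          clock=time.monotonic):
    """Breadth-first crawl that stops on the first of three conditions.

    fetch_links is a hypothetical callable: url -> iterable of same-site URLs.
    Returns (crawled_urls, reason).
    """
    deadline = clock() + max_seconds
    queue = deque([start_url])
    seen = {start_url}
    crawled = []
    while queue:
        if len(crawled) >= max_pages:
            return crawled, "page limit"
        if clock() >= deadline:
            return crawled, "time limit"
        url = queue.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    # The queue ran dry: no more new pages were found.
    return crawled, "fully crawled"
```

For example, with a tiny in-memory site `{"a": ["b", "c"], "b": ["a"], "c": []}` and `fetch_links=lambda u: site.get(u, [])`, the loop visits a, b, and c and then reports "fully crawled".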
Crawler identification. The crawler identifies itself through the User-Agent header as: proxynova.com/robot
It can appear from any US-based IP address, and may even use different IP addresses during the same crawl.
Otherwise, it appears as a normal user browsing with a variant of Google Chrome.
Does not currently respect robots.txt. In the future, you will be able to block the crawler by putting this into your robots.txt file:
User-agent: proxynova
Disallow: /
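To show how a rule like the one above would be evaluated, here is a small sketch using Python's standard `urllib.robotparser`. It assumes the rule is written with the standard `User-agent` field name, and `is_allowed` is a hypothetical helper, not part of this crawler.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed robots.txt content blocking only the proxynova agent.
RULES = "User-agent: proxynova\nDisallow: /\n"
```

With these rules, `is_allowed(RULES, "proxynova", "https://example.com/page")` is False, while the same call for a different agent name is True because no rule applies to it.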
Very gentle. No more than 1 request per second, which is only a few times faster than a human visitor.
If a single crawler is performing multiple requests per second and/or downloading large files,
a server can have a hard time keeping up with requests from multiple crawlers.
1 to 3 requests per second.
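A request rate like this can be enforced with a small throttle that spaces out requests. The sketch below is an assumption about how such pacing might work, not the crawler's own code. `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time


class Throttle:
    """Spaces out calls so that at most `rate` requests start per second."""

    def __init__(self, rate=1.0, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay
```

Calling `throttle.wait()` before each request keeps a crawl at or below the chosen rate, for example `Throttle(rate=1.0)` for one request per second or `Throttle(rate=3.0)` for three.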
☑ To-Do List
Suggest more using the feedback form on this site.
- Allow crawler results to be exported in a JSON-like format that lets anyone build their own search engine for their own website.
- Allow following only links that contain a given string. Example: follow only links containing /dp/. followRules: /product
- Allow parsing of custom data via customDataSelector -> customSelectorCaptureText (will extract first 255 characters)
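As a rough idea of what the link-following rule and the 255-character capture could look like, here is a minimal sketch. These features are only planned, so everything here is an assumption: `should_follow` and `capture_text` are hypothetical helpers, the rule is treated as a plain substring match, and the selector step itself is left out.

```python
def should_follow(url, follow_rules):
    """Follow every link when no rules are given, else require a substring match."""
    return not follow_rules or any(rule in url for rule in follow_rules)


def capture_text(text, limit=255):
    """Collapse runs of whitespace and keep only the first `limit` characters."""
    return " ".join(text.split())[:limit]
```

With `follow_rules=["/dp/"]`, a URL such as https://example.com/dp/123 would be followed and https://example.com/about would be skipped.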