WebScraper uses the Integrity v8 engine to quickly scan a website and can output the extracted data as CSV or JSON. It can also download images to a folder.
Easy to scan a site - just enter the starting URL and press "Go"
Easy to export - choose the columns you want
Plenty of extraction options, including HTML elements with certain classes or IDs, regular expressions, or the entire content in a number of formats (HTML, plain text, Markdown)
'Helper' utilities within the app make it easy to find a suitable class / ID or produce a regular expression (regex) to extract the data you want
Since v4.1, can download all discovered images to a folder
Configuration of various limits on the crawl and the output file size
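To illustrate the two extraction styles listed above (elements with a certain class, and regular expressions) and the CSV / JSON output, here is a minimal Python sketch. It is not WebScraper's own code and says nothing about how the Integrity engine works internally. The names ClassTextExtractor, extract_rows, to_csv and to_json, the sample page and the column names are all assumptions made up for the example.

```python
import csv
import io
import json
import re
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Hypothetical helper: collects the text of every element with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0       # greater than 0 while inside a matching element
        self.current = []    # text pieces of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.wanted_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Made-up page content, used only to exercise the sketch.
SAMPLE_HTML = """
<div class="product"><h2 class="title">Blue Mug</h2><span>Price: 7.50 GBP</span></div>
<div class="product"><h2 class="title">Red <b>Teapot</b></h2><span>Price: 18.00 GBP</span></div>
"""


def extract_rows(html, wanted_class="title", pattern=r"Price:\s*(\d+\.\d{2})"):
    """Pair each class-matched text with each regex match, in page order."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    prices = re.findall(pattern, html)
    return list(zip(parser.results, prices))


def to_csv(rows, columns=("title", "price")):
    """Write the chosen columns as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows, columns=("title", "price")):
    """Write the same rows as a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


if __name__ == "__main__":
    rows = extract_rows(SAMPLE_HTML)
    print(to_csv(rows))
    print(to_json(rows))
```

The class-based pass suits data that sits in consistently marked-up elements. The regex pass is the fallback for values that are only recognisable by their text pattern.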
What's new in WebScraper
Adds a timeout control under Advanced Scan Settings. Previously the timeout was fixed internally at 30s. If you experience timeouts, use the threads slider and/or 'limit requests to X per minute' to throttle the crawl; this will usually cure the problem. Context help to this effect has been added beside the timeout field.
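The idea behind 'limit requests to X per minute' combined with a per-request timeout can be sketched in Python as follows. This is only an illustration of the general technique, assuming a simple fixed spacing of 60 / X seconds between requests. It is not how WebScraper implements the setting, and the names RequestsPerMinuteLimiter and fetch are hypothetical. The threads slider is not modelled here.

```python
import time
import urllib.request


class RequestsPerMinuteLimiter:
    """Hypothetical limiter: spaces request starts at least 60 / X seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self.clock = clock          # injectable so the timing can be tested
        self.sleep = sleep
        self.next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return the seconds waited."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay


def fetch(url, limiter, timeout_seconds=30.0):
    """Fetch one URL, throttled by the limiter, giving up after the timeout."""
    limiter.wait()
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()
```

Slowing the crawl in this way gives the server time to answer each request, which is why throttling tends to help more than simply raising the timeout.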