Post by @ed

@acdha@code4lib.social for single-page captures I really like Harvard LIL's Scoop, which I believe is used by their flagship perma.cc service via a celery job that talks to scoop-api, a (closed source?) web service that wraps Scoop:

https://github.com/harvard-lil/scoop#readme

For multi-page crawls I think that @webrecorder@digipres.club's browsertrix-crawler is still the best thing out there. I helped write this howto, which I stand by:

https://sciop.net/docs/scraping/webpages/

PS. thanks for asking me about my favorite $work thing :-)