(#momapxa) @mckinley I can confirm the library "does what it says on the tin" 👌 I'll put my little CLI tool up for you to play with. It's pretty damn stupid and basic right now, as I'm not yet sure how to flesh it out. I'll need you to guide me on this; there are probably a fair few nuances to writing a decent web mirroring tool. (At least it does the right thing and handles dynamic content rendered with JavaScript, which I tested by hitting my files.mills.io web app, which has a pure JS frontend using MithrilJS.)

(#momapxa) If I can get a proper static copy of MDN, I'll make a torrent and share a magnet link here. I know I'm not the only one who wants something like this. I don't think the file sizes will be too bad. My current "build" of the entire site is sitting at 1.36 GiB. (Only a little more than double the size of `node_modules`!) So, with browser compatibility data and such, I think it'll still be less than 2 GiB. Aggressively compressed with `bzip2 -9`, it's only 114.29 MiB, a compression ratio of 0.08. That blows my mind.
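
For anyone checking the arithmetic, here is a quick sketch of where the 0.08 comes from, assuming the ratio means compressed size divided by uncompressed size and that 1 GiB = 1024 MiB. The variable names are just for illustration.

```python
# Sizes quoted in the post above.
uncompressed_gib = 1.36    # current "build" of the site
compressed_mib = 114.29    # after `bzip2 -9`

# Convert to a common unit (MiB) before dividing.
uncompressed_mib = uncompressed_gib * 1024

# Assumption: "compression ratio" here means compressed / uncompressed.
ratio = compressed_mib / uncompressed_mib

print(f"{ratio:.2f}")
```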

(#momapxa) @prologic What I need it to do is crawl a website, execute JavaScript along the way, and save the resulting DOMs to HTML files. It isn't necessary to save the files downloaded via XHR and the like, but I would need it to save page requisites: CSS, JavaScript, favicons, etc. Something that I'd like to have, but isn't required, is mirroring of content (+ page requisites) in frames. (Example) This would involve spanning hosts, but I only need to span hosts for this specific purpose. It would also be nice if the program could resolve absolute paths to relative paths (`/en-US/docs/Web/HTML/Global_attributes` -> `../../Global_attributes`), but this isn't required either. I think I'm going to have to have a local Web server running anyway, because just about all the links are to directories with an `index.html`. (i.e. the actual file referenced by `/en-US/docs/Web/HTML/Global_attributes` is `/en-US/docs/Web/HTML/Global_attributes/index.html`.)
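
A rough sketch of the two path rules described above, in case it helps pin down the behaviour. It is not taken from any existing tool: the function names `local_file_for` and `relative_link` are made up, the linking page `/en-US/docs/Web/HTML/Element/a` is a hypothetical example, and "no dot in the last segment means it's a directory" is only an assumption about how the site is laid out.

```python
import posixpath


def local_file_for(url_path: str) -> str:
    """Map a site path to the file a mirror might store it in.

    Assumption: a path with a trailing slash, or whose last segment has
    no dot, is a directory served through its index.html.
    """
    trimmed = url_path.rstrip("/")
    last_segment = posixpath.basename(trimmed)
    if url_path.endswith("/") or "." not in last_segment:
        return posixpath.join(trimmed or "/", "index.html")
    return url_path


def relative_link(from_page: str, to_path: str) -> str:
    """Rewrite an absolute link so it is relative to the linking page."""
    # Links resolve against the directory that holds the page's file.
    base_dir = posixpath.dirname(local_file_for(from_page))
    return posixpath.relpath(to_path, start=base_dir)


target = "/en-US/docs/Web/HTML/Global_attributes"
page = "/en-US/docs/Web/HTML/Element/a"  # hypothetical linking page

print(local_file_for(target))
print(relative_link(page, target))
```

The rewritten link still points at a directory rather than at its `index.html`, so a local Web server (or an extra rewrite step) would still be needed to open it, which matches the caveat in the post.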
