Investigate what would be needed to include crawl-sites visualization #146
Comments
ruebot self-assigned this Jul 3, 2018
Here's an example of the output for others following along: http://lintool.github.io/warcbase/vis/crawl-sites/. FWIW, I ran this on all the WALK collections and, if I remember correctly, was able to do the full thing in a few minutes on a laptop. Here's one of our 4-5TB ones: https://web-archive-group.github.io/WALK-CrawlVis/crawl-sites/ALBERTA_government_information_all_urls.html.
FYI, I dug back into our past workflow and am glad I did, as it's a bit janky. Here's the latest workflow I was using to do this: https://github.com/web-archive-group/WALK-CrawlVis/blob/master/WORKFLOW.md. Note that the major problem is that the output from the domain count is different than what
ruebot added the question and discussion labels Aug 20, 2018
If we add an additional spark sub-job:
We'll get output like this:
Then we'll probably need to adapt After that, it'd just be following the path of the Sigmajs visualization for this implementation.
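For anyone following along, here is a minimal sketch of the kind of per-date domain count a crawl-sites style visualization might consume. It is plain Python rather than Spark and is not the sub-job referred to above. The input shape (pairs of a YYYYMMDD crawl date and a URL), the monthly grouping, the `www.` stripping, the function names, and the CSV layout are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains_by_date(records):
    """Count captures per (crawl month, domain).

    `records` is an iterable of (crawl_date, url) pairs, where crawl_date
    is a YYYYMMDD string. Both the input shape and the monthly grouping
    are assumptions for this sketch.
    """
    counts = Counter()
    for crawl_date, url in records:
        host = urlparse(url).netloc.lower()
        # Treat www.example.com and example.com as the same site (assumption).
        if host.startswith("www."):
            host = host[4:]
        counts[(crawl_date[:6], host)] += 1
    return counts


def to_csv_lines(counts):
    """Render the counts as sorted 'YYYYMM,domain,count' lines."""
    return [
        f"{month},{domain},{n}"
        for (month, domain), n in sorted(counts.items())
    ]


# Hypothetical sample records, for illustration only.
sample = [
    ("20180703", "http://www.example.com/a"),
    ("20180715", "http://example.com/b"),
    ("20180820", "http://example.org/"),
]
lines = to_csv_lines(count_domains_by_date(sample))
```

The same grouping could presumably be expressed as a Spark aggregation; the point here is only the shape of the intermediate data (month, domain, count) that a front-end script would then read.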
This sounds promising! I'd defer to you on the implementation, but would creating this file and then possibly adding it to the clean-up job be a good route forward?
added a commit that referenced this issue Feb 6, 2019
ruebot added the Background jobs, feature, in progress, and Rails labels and removed the discussion and question labels Feb 6, 2019
Easy part done. Now I have to port
ruebot commented Jul 3, 2018
See what is needed to add crawl-sites.
graph.js