Homepage for New Docs Site #70
Here is my suggested content. I couldn't find a good way to link to the "undraw" images I selected, but they do seem to have unique identifiers in their names.

Homepage

Analyze Large Web Archives
Undraw image: "Analysis"
The Archives Unleashed Toolkit is an open-source platform for analyzing web archives built on Apache Spark, which provides powerful tools for analytics and data processing.

What will you find here?
Undraw image: "Onboarding"
This documentation is based on a cookbook approach, providing a series of "recipes" for addressing a number of common analytics tasks to provide inspiration for your own analysis. We generally provide examples for resilient distributed datasets (RDD) in Scala, and DataFrames in both Scala and Python. We leave it up to you to choose Scala or Python flavours of Spark.

I want to pitch in! Can I help?
Undraw image: "Proud Coder"
If you would like to contribute code or help with the development of the Archives Unleashed Cloud or Toolkit, we'd love to have you as part of the project! Visit our GitHub organization and repositories, open issues, submit pull requests, or make any suggestion in our Slack group.

What else is going on with your project?
Undraw image: "Newsletter"
Interested in the project? Subscribe to our newsletter! Or you can visit our website here for more information.
@ianmilligan1 the other three sound like callouts for the Archives Unleashed site. What I'm looking for here is to highlight 2-4
Oh, I see. Something like?
Yeah, totally!
How about something like?

Analyze Large Web Archives
Undraw image: "Analysis"
The Archives Unleashed Toolkit is an open-source platform for analyzing web archives built on Apache Spark, which provides powerful tools for analytics and data processing.

This documentation is based on a cookbook approach, providing a series of "recipes" for addressing a number of common analytics tasks to provide inspiration for your own analysis. We generally provide examples for resilient distributed datasets (RDD) in Scala, and DataFrames in both Scala and Python. We leave it up to you to choose Scala or Python flavours of Spark.

What can you do with the Toolkit?

Extract text from your web archives
Undraw image: "Body text"
Do you have WARCs or ARCs? Want text? With the Archives Unleashed Toolkit, you can run jobs to extract all the plain text from a web archive. You can also use a variety of filters, including filtering by date, language, keyword, domain, or URL pattern. Soon you'll be mining text to your heart's content.

Explore hyperlink networks within a web archive
Undraw image: "Nakamoto"
Hyperlinking practice can tell us a lot about web archives: where did people link to for their information? How did these links change over time? Which websites, based on their hyperlinks, were the most influential? The toolkit allows you to extract site link structures, and organize them by URL pattern or crawl date. We also support seamless exportation to Gephi.

Learn about your collections in many other ways
Undraw image: "Instant analysis"
That's not all. We support collection analysis (what can you find within the collection, from URLs to content type), image analysis, as well as the extraction of binary files from PowerPoint files to spreadsheets to PDFs. Don't see something that you wish we did? Let us know in a GitHub issue.
If we go with that, the links would be to the specific parts of the documentation above (I didn't put those in yet as everything's in flux).
I'm going to yank those buttons. Not 100% on them. We can put them back if there is an overall demand.
Yeah, sounds good on tweaking the images, but it otherwise looks really great @ruebot (and sounds good on yanking those buttons; I think they would confuse and busy it up more than they'd add).
I only see this when I shrink my window, but the white on green (?) text is unreadable to my colour-deficient eyes. Also, as I mentioned in Slack, maybe we should have an "Enter the Docs" or "Read the Docs" button somewhere prominent that brings us into the current docs. I am finding it a bit hard to navigate.
Good call. I had that in the header originally, and removed it. I'll get that added.
Good to know where the secondary colour setting finally comes into play! I'll get that taken care of too.
Homepage is live at https://aut.docs.archivesunleashed.org/ - I think we can close this omnibus ticket as the homepage is working nicely.
ianmilligan1 commented Jun 1, 2020
This ticket is to hold information for the homepage of the new docs site, i.e. what is currently at https://ruebot.github.io/aut-docs-redux/.