Homepage for New Docs Site #70

Closed
ianmilligan1 opened this issue Jun 1, 2020 · 12 comments

ianmilligan1 (Member) commented Jun 1, 2020

This ticket is to hold information for the homepage of the new docs site, i.e. what is currently at https://ruebot.github.io/aut-docs-redux/.

[Screenshot: Screen Shot 2020-06-01 at 2 08 07 PM]

ianmilligan1 (Member, Author) commented Jun 1, 2020

Here is my suggested content. I couldn't seem to find a good way to get links to the "undraw" images I selected, but they do seem to have unique identifiers in their names. 🤷‍♂️ @ruebot

Homepage

Analyze Large Web Archives

Undraw Image: "Analysis"

The Archives Unleashed Toolkit is an open-source platform for analyzing web archives built on Apache Spark, which provides powerful tools for analytics and data processing.

What will you find here?

Undraw Image: "Onboarding"

This documentation is based on a cookbook approach, providing a series of "recipes" for addressing a number of common analytics tasks to provide inspiration for your own analysis. We generally provide examples for resilient distributed datasets (RDD) in Scala, and DataFrames in both Scala and Python. We leave it up to you to choose Scala or Python flavours of Spark.

I want to pitch in! Can I help?

Undraw image: "Proud Coder"

If you would like to contribute code or help with the development of the Archives Unleashed Cloud or Toolkit, we'd love to have you as part of the project! Visit our GitHub organization and repositories, open issues, submit pull requests, or make any suggestion in our Slack group.

What else is going on with your project?

Undraw image: "Newsletter"

Interested in the project? Subscribe to our newsletter! Or you can visit our website for more information.

ruebot (Member) commented Jun 1, 2020

@ianmilligan1 the other three sound like callouts for the Archives Unleashed site. What I'm looking for here is to highlight 2-4 aut features to draw folks in. That make sense?

ianmilligan1 (Member, Author) commented Jun 1, 2020

Oh, I see. Something like?

  • Extract the plain text out of your web archive
  • Explore changing link relationships
  • Unlock your collections in many other ways

ruebot (Member) commented Jun 1, 2020

Yeah, totally!

ianmilligan1 (Member, Author) commented Jun 1, 2020

How about something like?

Analyze Large Web Archives

Undraw Image: "Analysis"

The Archives Unleashed Toolkit is an open-source platform for analyzing web archives built on Apache Spark, which provides powerful tools for analytics and data processing.

This documentation is based on a cookbook approach, providing a series of "recipes" for addressing a number of common analytics tasks to provide inspiration for your own analysis. We generally provide examples for resilient distributed datasets (RDD) in Scala, and DataFrames in both Scala and Python. We leave it up to you to choose Scala or Python flavours of Spark.

What can you do with the Toolkit?

Extract text from your web archives

Undraw Image: "Body text"

Do you have WARCs or ARCs? Want text? With the Archives Unleashed Toolkit, you can run jobs to extract all the plain text from a web archive. You can also use a variety of filters, including filtering by date, language, keyword, domain, or URL pattern. Soon you'll be mining text to your heart's content.

Explore hyperlink networks within a web archive

Undraw image: "Nakamoto"

Hyperlinking practice can tell us a lot about web archives: where did people link to for their information? How did these links change over time? Which websites, based on their hyperlinks, were the most influential? The toolkit allows you to extract site link structures and organize them by URL pattern or crawl date. We also support seamless export to Gephi.

Learn about your collections in many other ways

Undraw image: "Instant analysis"

That's not all. We support collection analysis (what can you find within the collection, from URLs to content type), image analysis, and the extraction of binary files, from PowerPoint files to spreadsheets to PDFs. Don't see something that you wish we did? Let us know in a GitHub issue.

ianmilligan1 (Member, Author) commented Jun 1, 2020

If we go with that, the links would be to the specific parts of the documentation above (I didn't put those in yet as everything's in flux).

ruebot (Member) commented Jun 1, 2020

I'm going to yank those buttons. Not 100% on them. We can put them back if there is an overall demand.

ruebot (Member) commented Jun 1, 2020

[Screenshot: Screenshot_2020-06-01 Archives Unleashed Toolkit · An open-source platform for analyzing web archives with Apache Spark]

I tweaked the first one a bit.

I'm gonna see if I can play with the people in the image. Right now they all appear to be Caucasian.

ianmilligan1 (Member, Author) commented Jun 1, 2020

Yeah sounds good on tweaking the images - but it otherwise looks really great @ruebot (and sounds good on yanking those buttons, I think they would confuse and busy it up more than they'd add).

ruebot added a commit that referenced this issue Jun 1, 2020

ianmilligan1 (Member, Author) commented Jun 1, 2020

I only see this when I shrink my window, but the white on green (?) text is unreadable to my colour-deficient eyes:

[Screenshot: Screen Shot 2020-06-01 at 4 41 45 PM]

Also, as I mentioned in Slack, maybe we should have an "Enter the Docs" or "Read the Docs" button somewhere prominent that brings us into the current docs. I am finding it a bit hard to navigate.

ruebot (Member) commented Jun 1, 2020

docs

Good call. I had that in the header originally, and removed it. I'll get that added.

Green on white

Good to know where the secondary colour setting finally comes into play! I'll get that taken care of too.

ruebot added a commit that referenced this issue Jun 1, 2020
ruebot added a commit that referenced this issue Jun 1, 2020
ruebot added a commit that referenced this issue Jun 3, 2020

ianmilligan1 (Member, Author) commented Jun 6, 2020

Homepage is live at https://aut.docs.archivesunleashed.org/ - I think we can close this omnibus ticket as the homepage is working nicely.

2 participants