Copy Editing for Documentation #71
Comments
hey @SamFritz all our DataFrame fields have been renamed to lowercase: but in lots of cases, field names are still upper case, e.g., here:
import io.archivesunleashed._
import io.archivesunleashed.udfs._
RecordLoader.loadArchives("/path/to/warcs", sc).webpages()
  .select(extractDomain($"Url").as("Domain"))
  .groupBy("Domain").count().orderBy(desc("count"))
  .show(20, false)
hey @ruebot can you confirm that
@lintool if you're going to join in on reviewing, you'll want to review these docs: https://github.com/archivesunleashed/aut-docs/tree/docusaurus/docs. I'll check for column name issues now, though.
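For anyone sweeping the docs for the column name issue, here is a rough sketch of a check, not part of aut or its tooling. It assumes columns are referenced in Scala snippets as `$"Name"`; the function name `find_uppercase_columns` is hypothetical, and aliases such as `.as("Domain")` or `groupBy("Domain")` are not covered.

```python
import re

# Matches Scala column references such as $"Url"; the group captures the column name.
# Assumption: the docs reference columns with the $"..." interpolator.
COLUMN_REF = re.compile(r'\$"([^"]+)"')


def find_uppercase_columns(snippet):
    """Return column names in `snippet` that contain an upper-case letter."""
    return [name for name in COLUMN_REF.findall(snippet) if name != name.lower()]


# The line quoted earlier in this thread.
example = '.select(extractDomain($"Url").as("Domain"))'
print(find_uppercase_columns(example))  # prints ['Url']
```

Running this over each code snippet in the docs would give a list of references to lowercase by hand.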
Edits for @ruebot
AUT Documentation: Suggested Changes
Home
Dependencies
Usage
The Toolkit at Scale
DataFrame Schemas
Toolkit Walkthrough
Collection Analysis
@SamFritz I think I got most of it. Just pushed things up locally, and have a preview here: https://ruebot.github.io/aut-docs-redux/
I don't follow. Can you link to an example?
Maybe? @ianmilligan1, thoughts?
Yep, for now.
Re:
A few catches:
The rest are one-offs:
I also got a "page not found" when clicking forward to https://ruebot.github.io/aut-docs-redux/docs/rdd-filters. Somewhere there's a pointer to
Otherwise things are looking good to me!
Python line continuations with backslash should have a space, per PEP8:
from aut import *
WebArchive(sc, sqlContext, "/path/to/warcs")\
  .webpages() \
  .select("url") \
  .show(20, False)
So, the backslash on the first line needs an extra space; the others are fine. This is an issue throughout the docs.
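Since this comes up throughout the docs, a small script could flag the offending lines. This is a minimal sketch under my own assumptions, not an existing tool: the helper name `missing_space_before_backslash` is made up for illustration, and it only looks at whether the character before a trailing backslash is whitespace.

```python
def missing_space_before_backslash(code):
    """Return 1-based line numbers whose trailing backslash has no space before it."""
    flagged = []
    for number, line in enumerate(code.splitlines(), start=1):
        stripped = line.rstrip()
        # A continuation line ends with a backslash; flag it when the
        # preceding character is not whitespace.
        if stripped.endswith("\\") and len(stripped) > 1 and not stripped[-2].isspace():
            flagged.append(number)
    return flagged


# The snippet quoted above, with the first continuation missing its space.
snippet = (
    'WebArchive(sc, sqlContext, "/path/to/warcs")\\\n'
    '  .webpages() \\\n'
    '  .select("url") \\\n'
    '  .show(20, False)\n'
)
print(missing_space_before_backslash(snippet))  # prints [1]
```

Only the first line is reported, which matches the note above that the other continuations are fine.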
Only remaining TODO is #72.
BTW I have an ongoing draft PR for typo fixes/clean ups etc. at #76.
SamFritz commented Jun 1, 2020
Going through Documentation for copy editing and prose suggestions/clean up
Areas for Review: