Copy Editing for Documentation #71
Hey @SamFritz, all our DataFrame fields have been renamed to lowercase, but in lots of cases field names in the docs are still uppercase, e.g., here:
import io.archivesunleashed._
import io.archivesunleashed.udfs._
RecordLoader.loadArchives("/path/to/warcs", sc).webpages()
.select(extractDomain($"Url").as("Domain"))
.groupBy("Domain").count().orderBy(desc("count"))
.show(20, false)
Hey @ruebot, can you confirm that?
@lintool, if you're going to join in on reviewing, you'll want to review these docs: https://github.com/archivesunleashed/aut-docs/tree/docusaurus/docs
I'll check through for column name issues now.
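For checking column names in bulk, a rough regex pass over the doc snippets could flag candidates. This is a minimal sketch, not part of aut: `find_uppercase_columns` and `scan_docs` are hypothetical helpers, and the pattern assumes column references appear as Scala `$"Name"` or as string arguments to `.select`, `.groupBy`, `.orderBy`, or `.as`. It will miss other forms and may flag intentional capitalized aliases, so treat hits as things to eyeball.

```python
import re
from pathlib import Path

# Hypothetical check: a column reference whose name starts with an uppercase
# letter, either Scala $"Name" syntax or a quoted argument to a few common
# DataFrame methods, e.g. .groupBy("Domain").
COLUMN_REF = re.compile(
    r'\$"([A-Z]\w*)"'
    r'|\.(?:select|groupBy|orderBy|as)\("([A-Z]\w*)"'
)


def find_uppercase_columns(text):
    """Return (line_number, column_name) pairs for capitalized column names."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in COLUMN_REF.finditer(line):
            # Exactly one of the two alternatives matched.
            hits.append((lineno, match.group(1) or match.group(2)))
    return hits


def scan_docs(root):
    """Print file:line: name for every hit in Markdown files under root.

    Assumes the docs are .md files, as in a local checkout of aut-docs.
    """
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, name in find_uppercase_columns(text):
            print(f"{path}:{lineno}: {name}")


# The snippet quoted in this thread.
sample = "\n".join([
    'RecordLoader.loadArchives("/path/to/warcs", sc).webpages()',
    '.select(extractDomain($"Url").as("Domain"))',
    '.groupBy("Domain").count().orderBy(desc("count"))',
    '.show(20, false)',
])
print(find_uppercase_columns(sample))
# [(2, 'Url'), (2, 'Domain'), (3, 'Domain')]
```

Running `scan_docs("docs")` from a checkout of the repo would list every candidate with its file and line.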
Edits for @ruebot:
AUT Documentation Suggested Changes
Home
Dependencies
Usage
The Toolkit at Scale
DataFrame Schemas
Toolkit Walkthrough
Collection Analysis
@SamFritz, I think I got most of it. I just pushed things up locally and have a preview here: https://ruebot.github.io/aut-docs-redux/
I don't follow. Can you link to an example?
Maybe? @ianmilligan1, thoughts?
Yep, for now.
Re:
A few catches:
The rest are one-offs:
I also got a "page not found" error when clicking forward to https://ruebot.github.io/aut-docs-redux/docs/rdd-filters. Somewhere there's a pointer to
Otherwise, things are looking good to me!
Python line continuations with a backslash should have a space before the backslash, per PEP 8:
from aut import *
WebArchive(sc, sqlContext, "/path/to/warcs")\
.webpages() \
.select("url") \
.show(20, False)
So, the backslash on the first line needs an extra space; the others are fine. This issue appears throughout the docs.
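Since this appears throughout the docs, a quick line-based pass could find and fix the offending lines. A minimal sketch: `find_unspaced_continuations` and `fix_unspaced_continuations` are hypothetical helpers, and the check is plain text matching rather than a Python parser.

```python
import re

# A non-space character immediately followed by a trailing backslash,
# e.g. `WebArchive(...)\`. Rough check: a line ending in an escaped
# backslash inside a string literal would also be flagged.
UNSPACED_BACKSLASH = re.compile(r"(\S)\\$")


def find_unspaced_continuations(code):
    """Return 1-based line numbers whose trailing backslash has no space before it."""
    return [
        lineno
        for lineno, line in enumerate(code.splitlines(), start=1)
        if UNSPACED_BACKSLASH.search(line.rstrip())
    ]


def fix_unspaced_continuations(code):
    """Insert one space before each unspaced trailing backslash.

    Trailing whitespace is stripped from every line as a side effect.
    """
    return "\n".join(
        UNSPACED_BACKSLASH.sub(r"\1 \\", line.rstrip())
        for line in code.splitlines()
    )


# The PySpark example from this thread: only line 2 lacks the space.
example = "\n".join([
    "from aut import *",
    'WebArchive(sc, sqlContext, "/path/to/warcs")\\',
    ".webpages() \\",
    '.select("url") \\',
    ".show(20, False)",
])
print(find_unspaced_continuations(example))  # [2]
print(fix_unspaced_continuations(example))
```

PEP 8 itself prefers implied continuation inside parentheses over backslashes, so wrapping the whole chain in parentheses would sidestep the spacing question, at the cost of a larger change to the examples.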
The only remaining TODO is #72.
BTW, I have an ongoing draft PR for typo fixes, cleanups, etc. at #76.
Updating the thread here to include the final copy editing changes I've found (most are pretty minor, but I did raise a question about an occurrence I found throughout the Binary Analysis section). @ruebot, I know this final week is a bit busy for you, so I'm happy to help implement changes where you need support. Noting the following copyedits for the documentation:
Generation Results
Text Analysis
Link Analysis
Image Analysis
Binary Analysis
Standard Derivatives
Thanks for ushering in this amazing documentation, Nick, and for all the testing, @ianmilligan1!
@SamFritz, these are great catches! I can put these into a pull request tomorrow.
Thanks, @ianmilligan1!
Will implement all except:
The bracket is part of the output, so let's leave it in as a code snippet here. Otherwise, I'm just staging the PR and will have it up momentarily.
Awesome, this can be closed with the PR. I caught a few extra ones in the binary section!
SamFritz commented Jun 1, 2020
Going through the documentation for copy editing and prose suggestions/cleanup.
Areas for Review: