Add DataFrame schemas; resolves #45. #46
Conversation
We'll probably want to delete this too? https://github.com/archivesunleashed/aut-docs/blob/master/current/collection-analysis.md#view-all-fields-available
This is exactly where I would have put it...
@@ -0,0 +1,157 @@
# Archives Unleashed Toolkit DataFrames
ianmilligan1
Feb 10, 2020
Member
Yeah we should add a tiny bit of descriptive text here. I assume this is for advanced users?
Something like:
Below you can find all of the DataFrame schemas found in each object. For example, if you extract `.all` from WARCs, you will see the fields below. Some of the most popular ones include `all` (which includes content, URLs, and file types); `webpages` (which includes full-text content and language); and `webgraph` (which includes hyperlink information).
- `mime_type_tika` (string)
- `content` (string)
- `language` (string)
- `content` (string)
@@ -0,0 +1,157 @@
# Archives Unleashed Toolkit DataFrames
ianmilligan1
Feb 10, 2020
Member
Add text here too (same as whatever language we coalesce around above).
- `mime_type_tika` (string)
- `content` (string)
- `language` (string)
- `content` (string)
ruebot commented Feb 10, 2020
@ianmilligan1 @lintool if you're good with this path, let me know, and I'll add one for 0.50.0 so we have that.
...I'm guessing there is a bit of prose we can add too. Feel free to comment, or just push to the branch.