Tree: 45de55b06b
Commits on Jan 15, 2020
Commits on Jan 13, 2020
  1. More DF filter updates. (#37)

    ruebot authored and ianmilligan1 committed Jan 13, 2020
Commits on Jan 8, 2020
  1. Add extract-simple-site-link-structure DF example. (#35)

    ruebot committed Jan 8, 2020
    * Add extract-simple-site-link-structure DF example.
Commits on Dec 18, 2019
  1. Update filters documentation for https://github.com/archivesunleashed… (#33)

    ruebot authored and ianmilligan1 committed Dec 18, 2019
    
    * Update filters documentation for archivesunleashed/aut#391
    
    - Add ToC
    - Add Scala RDD, Scala DF, and Python DF sections
    
    * review
Commits on Dec 17, 2019
Commits on Dec 5, 2019
  1. Updates for archivesunleashed/aut#387 (#30)

    ruebot authored and ianmilligan1 committed Dec 5, 2019
    * Updates for archivesunleashed/aut#387
    
    * Missed some in #24
Commits on Nov 26, 2019
  1. Add "Find Images Shared Between Domains" section. (#27)

    ruebot authored and ianmilligan1 committed Nov 26, 2019
    * Add "Find Images Shared Between Domains" section.
    
    - Resolves archivesunleashed/aut#237
    
    * review
Commits on Nov 22, 2019
  1. Add example for Scala DF version of "Extract Most Frequent Images MD5… (#28)

    ruebot authored and ianmilligan1 committed Nov 22, 2019
    
    * Add example for Scala DF version of "Extract Most Frequent Images MD5 Hash".
    
    - See archivesunleashed/aut#382
    
    * rename
Commits on Nov 21, 2019
Commits on Nov 19, 2019
  1. Move cookbook to standard derivatives guide (#21)

    ruebot committed Nov 19, 2019
    - Update all current cookbook examples to follow documentation style
    guide
    - Add Parquet, CSV, S3, and Python DF examples
    - Update index
Commits on Nov 12, 2019
Commits on Nov 7, 2019
  1. Updates for changing RemoveHttpHeader to RemoveHTTPHeader. (#19)

    SinghGursimran authored and ruebot committed Nov 7, 2019
    - Add ScalaDF example for: Extract Plain Text Without HTTP Headers
    - See also:
       - archivesunleashed/aut#368
       - archivesunleashed/aut#374
       - archivesunleashed/aut#370
Commits on Nov 6, 2019
Commits on Nov 5, 2019
Commits on Oct 28, 2019
  1. Incorporate PySpark setup into overall documentation. (#16)

    ruebot authored and ianmilligan1 committed Oct 28, 2019
    * Incorporate PySpark setup into overall documentation.
    
    - Removes standalone PySpark documentation.
    - Incorporates PySpark setup into getting started documentation.
    - Incorporates PySpark examples into overall documentation.
    - Breaks out scaling documentation to its own documentation.
    - Removes cruft.
    - Renames files so they're all lowercase now.
    - Updates README ToC
Commits on Oct 26, 2019
  1. Add binary analysis (#11)

    ruebot committed Oct 26, 2019
    - Add documentation for binary analysis and extraction in Scala DF and
    Python DF
    - Add Scala DF and Python DF version of extractImageLinks
    - Update main ToC
Commits on Oct 25, 2019
  1. Fix link-analysis ToC links. (#12)

    ruebot authored and ianmilligan1 committed Oct 25, 2019
Commits on Oct 23, 2019
  1. Fixed Table of Content on Current Doc README (#10)

    ianmilligan1 committed Oct 23, 2019
    * Fixing table of content links
    
    * Adding more relative links
    
    * Adding image (fixing existing markdown)
    
    * Removing image on seeing it rendered
Commits on Oct 21, 2019
  1. Changed text-analysis.md to use consistent phrasing (#8)

    lintool committed Oct 21, 2019
    Headings changed to verb phrases so they fit with "How do I..."
  2. Delete unneeded files (#7)

    lintool committed Oct 21, 2019
  3. Refactoring Documentation for Explanations and Consistent Structure (#5)

    ianmilligan1 authored and ruebot committed Oct 21, 2019
    - Flesh out root README with a site-wide table of contents;
    - Provide some basic introduction;
    - Provide some context on RDD/DF; and
    - Break the large "getting started and overview" document into at least two parts.
Commits on Oct 20, 2019
  1. Documentation reorg (#2)

    lintool authored and ruebot committed Oct 20, 2019
    * Beginning of doc reorg.
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update index.md
    
    * Create link-analysis.md
    
    * Update index.md
    
    * Create text-analysis.md
    
    * Create image-analysis.md
    
    * Update index.md
    
    * Update link-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
    
    * Update collection-analysis.md
Commits on Oct 19, 2019
  1. Wiping the slate clean (removing all files), but preserving all previous history of aut-docs.

    lintool committed Oct 19, 2019
Commits on Oct 17, 2019
  1. Updated Cookbook (markdown)

    ruebot committed Oct 17, 2019
  2. Updated Cookbook (markdown)

    ruebot committed Oct 17, 2019
Commits on Oct 10, 2019
Commits on Oct 8, 2019
  1. Updated 0.18.0 (markdown)

    ruebot committed Oct 8, 2019
Commits on Oct 1, 2019
  1. Updated _Sidebar (markdown)

    ruebot committed Oct 1, 2019