Tree: 36d86baa59
Commits on Feb 10, 2020
  1. Update to 0.50.0 and DataFrames where applicable. (#44)

    ruebot committed Feb 10, 2020
    * Update to 0.50.0 and DataFrames where applicable.
  2. Add DataFrame schemas; resolves #45. (#46)

    ruebot committed Feb 10, 2020
    * Add DataFrame schemas; resolves #45.
    * Add 0.50.0 version, and rename to imagegraph in current
    * remove redundant doc
Commits on Feb 6, 2020
  1. Document keepValidPagesDF() (#43)

    ruebot committed Feb 6, 2020
    * Document keepValidPagesDF()
    * Words are hard.
Commits on Feb 5, 2020
  1. 0.50.0 release

    ruebot committed Feb 5, 2020
  2. Code formatting and code consistency review. (#42)

    ruebot committed Feb 5, 2020
    * Code formatting and code consistency review.
    
    - Resolves #29
    
    * review
Commits on Feb 3, 2020
Commits on Jan 20, 2020
  1. Update documentation for archivesunleashed/aut#372 (#39)

    ruebot committed Jan 20, 2020
    * Add DF results Python section
    * Add won't implement language to binary analysis.
    * Add won't implement language to standard derivatives.
    * Remove index, fix ToC in setting up.
    * Update README, add Scala DF to link analysis, add TSV RDD results
    * text-analysis scala df
    * Add to be implemented
    * Resolves #22
Commits on Jan 15, 2020
Commits on Jan 13, 2020
  1. More DF filter updates. (#37)

    ruebot authored and ianmilligan1 committed Jan 13, 2020
Commits on Jan 8, 2020
  1. Add extract-simple-site-link-structure DF example. (#35)

    ruebot committed Jan 8, 2020
    * Add extract-simple-site-link-structure DF example.
Commits on Dec 18, 2019
  1. Update filters documentation for https://github.com/archivesunleashed… (#33)

    ruebot authored and ianmilligan1 committed Dec 18, 2019
    
    * Update filters documentation for archivesunleashed/aut#391
    
    - Add ToC
    - Add Scala RDD, Scala DF, and Python DF sections
    
    * review
Commits on Dec 17, 2019
Commits on Dec 5, 2019
  1. Updates for archivesunleashed/aut#387 (#30)

    ruebot authored and ianmilligan1 committed Dec 5, 2019
    * Updates for archivesunleashed/aut#387
    
    * Missed some in #24
Commits on Nov 26, 2019
  1. Add "Find Images Shared Between Domains" section. (#27)

    ruebot authored and ianmilligan1 committed Nov 26, 2019
    * Add "Find Images Shared Between Domains" section.
    
    - Resolves archivesunleashed/aut#237
    
    * review
Commits on Nov 22, 2019
  1. Add example for Scala DF version of "Extract Most Frequent Images MD5… (#28)

    ruebot authored and ianmilligan1 committed Nov 22, 2019
    
    * Add example for Scala DF version of "Extract Most Frequent Images MD5 Hash".
    
    - See archivesunleashed/aut#382
    
    * rename
Commits on Nov 21, 2019
Commits on Nov 19, 2019
  1. Move cookbook to standard derivatives guide (#21)

    ruebot committed Nov 19, 2019
    - Update all current cookbook examples to follow documentation style
    guide
    - Add Parquet, CSV, S3, and Python DF examples
    - Update index
Commits on Nov 12, 2019
Commits on Nov 7, 2019
  1. Updates for changing RemoveHttpHeader to RemoveHTTPHeader. (#19)

    SinghGursimran authored and ruebot committed Nov 7, 2019
    - Add ScalaDF example for: Extract Plain Text Without HTTP Headers
    - See also:
       - archivesunleashed/aut#368
       - archivesunleashed/aut#374
       - archivesunleashed/aut#370
Commits on Nov 6, 2019
Commits on Nov 5, 2019
Commits on Oct 28, 2019
  1. Incorporate PySpark setup into overall documentation. (#16)

    ruebot authored and ianmilligan1 committed Oct 28, 2019
    * Incorporate PySpark setup into overall documentation.
    
    - Removes standalone PySpark documentation.
    - Incorporates PySpark setup into getting started documentation.
    - Incorporates PySpark examples into overall documentation.
    - Breaks out scaling documentation into its own document.
    - Removes cruft.
    - Renames files so they're all lowercase now.
    - Updates README ToC
Commits on Oct 26, 2019
  1. Add binary analysis (#11)

    ruebot committed Oct 26, 2019
    - Add documentation for binary analysis and extraction in Scala DF and
    Python DF
    - Add Scala DF and Python DF version of extractImageLinks
    - Update main ToC
Commits on Oct 25, 2019
  1. Fix link-analysis ToC links. (#12)

    ruebot authored and ianmilligan1 committed Oct 25, 2019
Commits on Oct 23, 2019
  1. Fixed Table of Content on Current Doc README (#10)

    ianmilligan1 committed Oct 23, 2019
    * Fixing table of content links
    
    * Adding more relative links
    
    * Adding image (fixing existing markdown)
    
    * Removing image on seeing it rendered
Commits on Oct 21, 2019
  1. Changed text-analysis.md to use consistent phrasing (#8)

    lintool committed Oct 21, 2019
    Headings changed to verb phrases so they fit with "How do I..."
  2. Delete unneeded files (#7)

    lintool committed Oct 21, 2019
  3. Refactoring Documentation for Explanations and Consistent Structure (#5)

    ianmilligan1 authored and ruebot committed Oct 21, 2019
    - Flesh out root README with a site-wide table of contents;
    - Provide some basic introduction;
    - Provide some context on RDD/DF; and
    - Break the large "getting started and overview" document into at least two parts.