Tree: 4a3a22482a
Commits on Jun 26, 2020
  1. Commit changes

    ruebot committed Jun 26, 2020
  2. Commit changes

    ruebot committed Jun 26, 2020
  3. cleanup

    ruebot committed Jun 26, 2020
  4. Add Python formatter GitHub Action.

    ruebot committed Jun 26, 2020
    - Set up Python formatter that runs isort and black.
    - Rename Scala formatter so all three are now consistent.
    - Resolves #488
Commits on Jun 25, 2020
  1. Add Google Java Formatter as an action, and apply it. (#485)

    ruebot committed Jun 25, 2020
    - Apply Google Java Formatter to all Java files.
    - Add GitHub Action for the formatter
    - Resolves #484
    - Update TravisCI to use Bionic
  2. Add scalafmt GitHub action and apply it to scala code. (#487)

    ruebot committed Jun 25, 2020
    - Add scalafmt GitHub action
    - Apply scalafmt to Scala codebase
    - Resolves #486
Commits on Jun 18, 2020
  1. Update documentation link.

    ruebot committed Jun 18, 2020
  2. Spark 3.0.0 + Java 11 support. (#375)

    ruebot committed Jun 18, 2020
    - Update to Spark 3.0.0
    - Update to Java 11
    - Update README
    - Remove Java8 support
    - Resolves #375
Commits on Jun 17, 2020
  1. Add Python implementation of SaveBytes. (#482)

    ruebot committed Jun 17, 2020
    - Resolves #478
    - Tweak formatting in DataFrameLoader
Commits on Jun 15, 2020
  1. Bump xercesImpl from 2.11.0 to 2.12.0 (#481)

    dependabot committed Jun 15, 2020
    Bumps xercesImpl from 2.11.0 to 2.12.0.
    
    Signed-off-by: dependabot[bot] <support@github.com>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commits on Jun 8, 2020
Commits on Jun 3, 2020
  1. [maven-release-plugin] prepare for next development iteration

    ruebot committed Jun 3, 2020
Commits on Jun 1, 2020
  1. Remove RDD suffixes on file, class, and object names. (#479)

    ruebot committed Jun 1, 2020
    - Remove all the RDD suffixes added previously
    - Rename image_graph to imagegraph (Python)
    - Rename GetExtensionMime to GetExtensionMIME (Scala)
    - Remove textFiles (Scala)
    - Remove text_files (Python)
    - Remove TextFilesInformationExtractor
    - Rename all affected files as needed
    - Update tests as needed
Commits on May 29, 2020
  1. PEP8 Python app method names. (#477)

    ruebot committed May 29, 2020
    - Resolves #468
  2. Move Python UDF methods out of their own class. (#475)

    ruebot committed May 29, 2020
    - Resolves #467
    - README button colour tweak for UserDocs
Commits on May 28, 2020
  1. Add DataFrame udf tests. (#474)

    ruebot committed May 28, 2020
    - Resolves #473
Commits on May 27, 2020
  1. Remove tabDelimit. (#472)

    ruebot committed May 27, 2020
    - Resolves #471
    - Resolves #59
    - Remove TupleFormatter (tabDelimit functionality)
    - Remove shapeless from pom
  2. Remove NER functionality. (#470)

    ruebot committed May 27, 2020
    - Resolves #48
    - Resolves #52
    - Resolves #53
    - Resolves #469
    - Remove all NER associated functionality
    - Tweak pom.xml to handle the removal
  3. Add ExtractPopularImages, WriteGEXF, and WriteGraphML to Python. (#466)

    ruebot committed May 27, 2020
    - Resolves #409
    - Add Python implementations of
      - ExtractPopularImages
      - WriteGraphML
      - WriteGEXF
    - Clean up formatting in app.py, and udfs
    - Clean up doc comments on the Scala side
Commits on May 24, 2020
  1. Remove ExtractImageDetailsDF; resolves #464. (#465)

    ruebot committed May 24, 2020
Commits on May 19, 2020
  1. Implement Scala Matchbox UDFs in Python. (#463)

    ruebot committed May 19, 2020
    - Resolves #408
    - Alphabetizes DataFrameLoader functions
    - Alphabetizes UDF functions
    - Move DataFrameLoader to df packages
    - Move UDFs out of df into their own package
    - Rename UDFs (no more DF tagged to the end).
    - Update tests as necessary
    - Partially addresses #410, #409
    - Supersedes #412.
Commits on May 10, 2020
  1. Import clean-up for df package. (#462)

    ruebot committed May 10, 2020
Commits on May 4, 2020
  1. [maven-release-plugin] prepare for next development iteration

    ruebot committed May 4, 2020
  2. [skip travis] README updates (#460)

    ruebot committed May 4, 2020
    - `$` should only be used if output is also shown (mdl)
    - Add UserDoc badge, and yank buried documentation section
    - Additional formatting and typo fixes
  3. Set spark-submit app name to be "aut - extractorName". (#459)

    ruebot committed May 4, 2020
    - Resolves #458
Commits on Apr 27, 2020
  1. Add RemovePrefixWWWDF to DomainFrequencyExtractor. (#457)

    ruebot committed Apr 27, 2020
    - Resolves #456
    - Update test
Commits on Apr 23, 2020
Commits on Apr 22, 2020
  1. Add option to save to Parquet for app. (#454)

    ruebot committed Apr 22, 2020
    - Resolves #448
    - Update test
    - Add CSV headers to coalesce CSV output
    - Update README
  2. Update PlainTextExtractor to output a single column; text. (#453)

    ruebot committed Apr 22, 2020
    - Resolves #452
    - PlainTextExtractor runs ExtractBoilerplate on `content`
    - Update test
Commits on Apr 21, 2020
  1. Add a number of additional app extractors. (#451)

    ruebot committed Apr 21, 2020
    - Resolves #447
    - Add AudioInformationExtractor, ImageInformationExtractor,
    PDFInformationExtractor, PresentationProgramInformationExtractor,
    SpreadsheetInformationExtractor, TextFilesInformationExtractor,
    VideoInformationExtractor, WebGraphExtractor,
    WordProcessorInformationExtractor
    - Add tests for the new extractors
    - Update CommandLineApp to use new extractors
    - Add domain and language columns to WebPagesExtractor
    - Change "TEXT" to "csv"
    - Lower case "GEXF" and "GRAPHML"
Commits on Apr 20, 2020
  1. Remove RDD option in app; DataFrame only now. (#450)

    ruebot committed Apr 20, 2020
    - Resolves #449
    - Update and rename tests where applicable
    - Update README to reflect updates
Commits on Apr 15, 2020
  1. [skip-travis] Add spark-submit option to README; resolves #444. (#446)

    ruebot committed Apr 15, 2020
  2. [maven-release-plugin] prepare for next development iteration

    ruebot committed Apr 15, 2020