Tree: 00e816629a
Commits on Jun 26, 2020
  1. Add Python formatter GitHub Action. (#489)

    ruebot committed Jun 26, 2020
    - Set up Python formatter that runs isort and black.
    - Rename Scala formatter so all three formatters are named consistently.
    - Resolves #488
    - Normalize all the formatters
    - Set all the formatters to run without applying changes. Applying the
    changes automatically would be helpful, but the action can't push to a
    person's PR.
    - Apply isort/black
    - Make sure isort and black are consistent in how they are opinionated
    about imports
    - Use scalafmt 2.3.2 (same as GH action)
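
The "run, but not apply changes" behaviour described in this commit can be sketched roughly as below. This is a minimal illustration, not the project's actual GitHub Action: the function names `check_commands` and `run_checks` and the `paths` argument are hypothetical, and the sketch only assumes isort's `--check-only`/`--diff` options and black's `--check`/`--diff` options.

```python
import subprocess
import sys


def check_commands(paths):
    """Build isort and black invocations that report problems without rewriting files.

    `paths` is a hypothetical list of files or directories to check.
    """
    return [
        [sys.executable, "-m", "isort", "--check-only", "--diff", *paths],
        [sys.executable, "-m", "black", "--check", "--diff", *paths],
    ]


def run_checks(paths):
    """Run each formatter in check mode; return True only if every one exits with 0."""
    ok = True
    for cmd in check_commands(paths):
        # A non-zero exit status means the formatter would have changed something.
        result = subprocess.run(cmd)
        ok = ok and result.returncode == 0
    return ok
```

Under these assumptions, a CI step could call `run_checks(["src"])` and fail the build when it returns `False`, leaving the contributor to apply the formatting locally.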
Commits on Jun 25, 2020
  1. Add Google Java Formatter as an action, and apply it. (#485)

    ruebot committed Jun 25, 2020
    - Apply Google Java Formatter to all Java files.
    - Add GitHub Action for the formatter
    - Resolves #484
    - Update TravisCI to use Bionic
  2. Add scalafmt GitHub action and apply it to scala code. (#487)

    ruebot committed Jun 25, 2020
    - Add scalafmt GitHub action
    - Apply scalafmt to Scala codebase
    - Resolves #486
Commits on Jun 18, 2020
  1. Update documentation link.

    ruebot committed Jun 18, 2020
  2. Spark 3.0.0 + Java 11 support. (#375)

    ruebot committed Jun 18, 2020
    - Update to Spark 3.0.0
    - Update to Java 11
    - Update README
    - Remove Java 8 support
    - Resolves #375
Commits on Jun 17, 2020
  1. Add Python implementation of SaveBytes. (#482)

    ruebot committed Jun 17, 2020
    - Resolves #478
    - Tweak formatting in DataFrameLoader
Commits on Jun 15, 2020
  1. Bump xercesImpl from 2.11.0 to 2.12.0 (#481)

    dependabot committed Jun 15, 2020
    Bumps xercesImpl from 2.11.0 to 2.12.0.
    
    Signed-off-by: dependabot[bot] <support@github.com>
    
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Commits on Jun 8, 2020
Commits on Jun 3, 2020
  1. [maven-release-plugin] prepare for next development iteration

    ruebot committed Jun 3, 2020
Commits on Jun 1, 2020
  1. Remove RDD suffixes on file, class, and object names. (#479)

    ruebot committed Jun 1, 2020
    - Remove all the RDD suffixes added previously
    - Rename image_graph to imagegraph (Python)
    - Rename GetExtensionMime to GetExtensionMIME (Scala)
    - Remove textFiles (Scala)
    - Remove text_files (Python)
    - Remove TextFilesInformationExtractor
    - Rename all affected files as needed
    - Update tests as needed
Commits on May 29, 2020
  1. PEP8 Python app method names. (#477)

    ruebot committed May 29, 2020
    - Resolves #468
  2. Move Python UDF methods out of their own class. (#475)

    ruebot committed May 29, 2020
    - Resolve #467
    - README button colour tweak for UserDocs
Commits on May 28, 2020
  1. Add DataFrame udf tests. (#474)

    ruebot committed May 28, 2020
    - Resolves #473
Commits on May 27, 2020
  1. Remove tabDelimit. (#472)

    ruebot committed May 27, 2020
    - Resolves #471
    - Resolves #59
    - Remove TupleFormatter (tabDelimit functionality)
    - Remove shapeless from pom
  2. Remove NER functionality. (#470)

    ruebot committed May 27, 2020
    - Resolves #48
    - Resolves #52
    - Resolves #53
    - Resolves #469
    - Remove all NER associated functionality
    - Tweak pom.xml to handle the removal
  3. Add ExtractPopularImages, WriteGEXF, and WriteGraphML to Python. (#466)

    ruebot committed May 27, 2020
    - Resolves #409
    - Add Python implementations of
      - ExtractPopularImages
      - WriteGraphML
      - WriteGEXF
    - Clean up formatting in app.py, and udfs
    - Cleanup doc comments on the Scala side
Commits on May 24, 2020
  1. Remove ExtractImageDetailsDF; resolves #464. (#465)

    ruebot committed May 24, 2020
Commits on May 19, 2020
  1. Implement Scala Matchbox UDFs in Python. (#463)

    ruebot committed May 19, 2020
    - Resolves #408
    - Alphabetizes DataFrameLoader functions
    - Alphabetizes UDF functions
    - Move DataFrameLoader to df package
    - Move UDFs out of df into their own package
    - Rename UDFs (no more DF suffix).
    - Update tests as necessary
    - Partially addresses #410, #409
    - Supersedes #412.
Commits on May 10, 2020
  1. Import clean-up for df package. (#462)

    ruebot committed May 10, 2020
Commits on May 4, 2020
  1. [maven-release-plugin] prepare for next development iteration

    ruebot committed May 4, 2020
  2. [skip travis] README updates (#460)

    ruebot committed May 4, 2020
    - `$` should only be used if output is also shown (mdl)
    - Add UserDoc badge, and yank buried documentation section
    - Additional formatting and typo fixes
  3. Set spark-submit app name to be "aut - extractorName". (#459)

    ruebot committed May 4, 2020
    - Resolves #458
Commits on Apr 27, 2020
  1. Add RemovePrefixWWWDF to DomainFrequencyExtractor. (#457)

    ruebot committed Apr 27, 2020
    - Resolves #456
    - Update test
Commits on Apr 23, 2020
Commits on Apr 22, 2020
  1. Add option to save to Parquet for app. (#454)

    ruebot committed Apr 22, 2020
    - Resolves #448
    - Update test
    - Add CSV headers to coalesce CSV output
    - Update README
  2. Update PlainTextExtractor to output a single column; text. (#453)

    ruebot committed Apr 22, 2020
    - Resolves #452
    - PlainTextExtractor runs ExtractBoilerplate on `content`
    - Update test
Commits on Apr 21, 2020
  1. Add a number of additional app extractors. (#451)

    ruebot committed Apr 21, 2020
    - Resolves #447
    - Add AudioInformationExtractor, ImageInformationExtractor,
    PDFInformationExtractor, PresentationProgramInformationExtractor,
    SpreadsheetInformationExtractor, TextFilesInformationExtractor,
    VideoInformationExtractor, WebGraphExtractor,
    WordProcessorInformationExtractor
    - Add tests for the new extractors
    - Update CommandLineApp to use new extractors
    - Add domain and language columns to WebPagesExtractor
    - Change "TEXT" to "csv"
    - Lower case "GEXF" and "GRAPHML"
Commits on Apr 20, 2020
  1. Remove RDD option in app; DataFrame only now. (#450)

    ruebot committed Apr 20, 2020
    - Resolves #449
    - Update and rename tests where applicable
    - Update README to reflect updates
Commits on Apr 15, 2020
  1. [skip-travis] Add spark-submit option to README; resolves #444. (#446)

    ruebot committed Apr 15, 2020
  2. [maven-release-plugin] prepare for next development iteration

    ruebot committed Apr 15, 2020
Commits on Apr 14, 2020
  1. Remove WriteGraph; resolves #439. (#441)

    ruebot committed Apr 14, 2020
    * Cleanup WriteGraphML doc comments.
Commits on Apr 13, 2020
  1. Remove GraphX support; resolves #442. (#443)

    ruebot committed Apr 13, 2020
    - Remove graphx dependencies from pom
    - Remove ExtractGraphX and related tests
    - Remove WriteGraphXML and related tests