Tree: 2ec06b3335
Commits on Jun 29, 2020
  1. Don't pull in Spark conf settings via application.yml.

    ruebot committed Jun 29, 2020
    - Use Spark conf to configure Spark jobs instead of application.yml
    because delayed jobs needs to be restarted in order to tweak Spark
    job settings
    
    - Add Spark conf example
    
    - Don't email user on failed jobs anymore. Just email Nick.
    
    - Move from master branch to main branch
    
    - Add Page count for seed jobs for troubleshooting wasapi issues
Commits on Jun 18, 2020
  1. Update parallel to version 1.19.2 (#408)

    depfu committed Jun 18, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
  2. Update loofah to version 2.6.0 (#407)

    depfu committed Jun 18, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Jun 16, 2020
  1. Update rack to version 2.2.3 (#406)

    depfu committed Jun 16, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Jun 11, 2020
  1. Update codecov to version 0.1.17 (#405)

    depfu committed Jun 11, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Jun 10, 2020
  1. Update ffi to version 1.13.1 (#404)

    depfu committed Jun 10, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Jun 5, 2020
  1. Update websocket-extensions to version 0.1.5 (#403)

    depfu committed Jun 5, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
  2. Update sitemap_generator to version 6.1.2 (#402)

    depfu committed Jun 5, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Jun 2, 2020
  1. Update ffi to version 1.13.0 (#401)

    depfu committed Jun 2, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 28, 2020
  1. Update kaminari to version 1.2.1 (#400)

    depfu committed May 28, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 27, 2020
  1. Update sitemap_generator to version 6.1.1 (#399)

    depfu committed May 27, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 21, 2020
  1. Update puma to version 3.12.6 (#398)

    depfu committed May 21, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 19, 2020
  1. Update all of rails to version 5.2.4.3 (#397)

    depfu committed May 19, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 12, 2020
  1. Update figaro to version 1.2.0 (#396)

    depfu committed May 12, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on May 9, 2020
  1. Update eslint to version 7.0.0 (#395)

    depfu committed May 9, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
  2. Update jquery-rails to version 4.4.0 (#394)

    depfu committed May 9, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Apr 30, 2020
  1. Get date finished from domains file.

    ruebot committed Apr 30, 2020
  2. Get date finished from filtered_text directory.

    ruebot committed Apr 30, 2020
  3. Add crawl date frequency visualization, and parallelize graphpass job.

    ruebot committed Apr 30, 2020
    - Add sub-job to textfilter job to create crawl frequency count data
    - Add helper method to prepare data for visualization
    - Update controller and view to show visualization
    - Add route for crawl date visualization data
    - Update Graphpass job to run two parallelized sub-jobs
      - Graphpass + combining part files
      - Removing output directories
Commits on Apr 28, 2020
  1. Start logging before the jobs are executed.

    ruebot committed Apr 28, 2020
  2. Make sure we only download arc/warc files, not wats or wanes.

    ruebot committed Apr 28, 2020
    - Update Spark job to run on directory, since wildcard will break things
    on directories with MANY files.
  3. Update Spark job to run auk jobs via spark-submit in parallel. (#393)

    ruebot committed Apr 28, 2020
    * Update Spark job to run auk jobs via spark-submit in parallel.
    
    - Update Rubocop config
    - Remove AUK Notebooks link
    - Update application config example
    - Tweak analyzed date helper to use filtered text to get date (last item
    that is run in the pipeline)
    - Change name of "Full Text" derivative to "Web Page Text" since full
    text is misleading
    - Multiply data analyzed total by 3 since that's what we're doing in
    reality
Commits on Apr 24, 2020
  1. Update byebug to version 11.1.3 (#392)

    depfu committed Apr 24, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Apr 18, 2020
  1. Update byebug to version 11.1.2 (#390)

    depfu committed Apr 18, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
Commits on Apr 17, 2020
  1. We should remove www prefixes on the domain count job.

    ruebot committed Apr 17, 2020
  2. We should remove www prefixes on the domain count job.

    ruebot committed Apr 17, 2020
Commits on Apr 16, 2020
  1. Another language tweak for the download tooltips.

    ruebot committed Apr 16, 2020
  2. tweak toolkit language

    ruebot committed Apr 16, 2020
  3. [ImgBot] Optimize images (#389)

    imgbot and ImgBotApp committed Apr 16, 2020
    /app/assets/images/Tutorial_domain_derivative_file.png -- 168.85kb -> 149.15kb (11.66%)
    
    Signed-off-by: ImgBotApp <ImgBotHelp@gmail.com>
    
    Co-authored-by: ImgBotApp <ImgBotHelp@gmail.com>
  4. Minor tweaks to domain page (follow up on #387) (#388)

    ianmilligan1 committed Apr 16, 2020
    * Minor tweaks to domain page (follow up on #387)
    * Suggest using "Import"; text -> CSV
    * Updating image
  5. Update jobs to use aut-0.60.0. (#387)

    ruebot committed Apr 16, 2020
    - Resolves #386
    - Move the faux txt derivatives to what they actually are; csv.
    - Update Spark job to use DataFrames
    - Update auk documentation and lessons with correct file extension
    (s/txt/csv)
    - Data migration needs to be completed on prod
      - rename full-text and full-domains
        - s/.txt/.csv/g
      - on all -fullurls.txt
        - remove the first and last character on each line. ( )
    - TravisCI should only test Ruby 2.6.5
    - Update tests to reflect changes
    - Rename text fixtures
Commits on Apr 14, 2020
  1. Tooooooooooooooooo many escapes.

    ruebot committed Apr 14, 2020
Commits on Apr 6, 2020
  1. Update loofah to version 2.5.0 (#385)

    depfu committed Apr 6, 2020
    Co-authored-by: depfu[bot] <23717796+depfu[bot]@users.noreply.github.com>
  2. Update aut version in use.

    ruebot committed Apr 6, 2020