Update Spark jobs to use 0.60.0 of aut, and DataFrames instead of RDD #386

Closed
ruebot opened this issue Apr 16, 2020 · 1 comment

@ruebot ruebot commented Apr 16, 2020

No description provided.

@ruebot ruebot self-assigned this Apr 16, 2020

@ruebot ruebot commented Apr 16, 2020

data migration:

  • rename the full-text and full-domains derivatives: s/.txt/.csv/g
  • on all -fullurls.txt, remove the first and last character on each line, i.e. the wrapping ( and )
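
For anyone doing the same migration by hand, here is a rough sketch of the two steps in Python. It is not the script that was run on prod. The function names are made up for illustration, and it assumes each line in a -fullurls.txt file is wrapped in a single pair of parentheses. Unlike the note above, it only strips a line when it really starts with ( and ends with ), so running it twice should not eat real data.

```python
from pathlib import Path


def strip_wrapping_parens(line: str) -> str:
    """Drop the wrapping ( ) from one line (hypothetical helper)."""
    text = line.rstrip("\n")
    # Assumption: only touch lines that are actually wrapped in parentheses.
    if len(text) >= 2 and text[0] == "(" and text[-1] == ")":
        return text[1:-1]
    return text


def migrate_fullurls(path: Path) -> Path:
    """Rewrite one -fullurls.txt file without the parentheses, saved as .csv."""
    lines = path.read_text(encoding="utf-8").splitlines()
    target = path.with_suffix(".csv")
    cleaned = "".join(strip_wrapping_parens(line) + "\n" for line in lines)
    target.write_text(cleaned, encoding="utf-8")
    path.unlink()  # remove the old .txt once the .csv is written
    return target


def rename_txt_to_csv(path: Path) -> Path:
    """Rename a derivative from .txt to .csv, changing only the extension."""
    return path.rename(path.with_suffix(".csv"))
```

Using `with_suffix` rather than a literal `s/.txt/.csv/g` means only the trailing extension changes, so a `.txt` that appears elsewhere in a file name is left alone.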
ruebot added a commit that referenced this issue Apr 16, 2020
- Resolves #386
- Move the faux txt derivatives to what they actually are; csv.
- Update Spark job to use DataFrames
- Update auk documentation and lessons with correct file extension
(s/txt/csv)
- Data migration needs to be completed on prod
  - rename full-text and full-domains
    - s/.txt/.csv/g
  - on all -fullurls.txt
    - remove the first and last character on each line. ( )
ruebot added a commit that referenced this issue Apr 16, 2020
- Resolves #386
- Move the faux txt derivatives to what they actually are; csv.
- Update Spark job to use DataFrames
- Update auk documentation and lessons with correct file extension
(s/txt/csv)
- Data migration needs to be completed on prod
  - rename full-text and full-domains
    - s/.txt/.csv/g
  - on all -fullurls.txt
    - remove the first and last character on each line. ( )
- TravisCI should only test Ruby 2.6.5
- Update tests to reflect changes
- Rename text fixtures
@ruebot ruebot closed this in #387 Apr 16, 2020