Convert RecordLoader.loadArchives to a Spark Data Source #371
ruebot commented Nov 5, 2019 (edited)
Since we're pivoting to full DataFrame support (#223, #190), we should convert/migrate RecordLoader.loadArchives, and any other related functions, to a Spark Data Source. That way we could load archives through Spark's standard DataFrame reader. Then, since it's an open issue (#147), we could write WARCs that way too? 🤷‍♂️
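As a rough sketch of what reading and writing through a data source might look like: the snippet below uses only Spark's generic DataFrameReader and DataFrameWriter calls. The format name "warc" and the object and method names are hypothetical, chosen for illustration. No such data source is registered today, so this compiles against Spark but would fail at run time until one is implemented.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ArchiveDataSourceSketch {
  // Hypothetical short name for the proposed data source (an assumption,
  // not an existing format). Spark resolves it at run time, so calls below
  // fail until a matching implementation is on the classpath.
  val ArchiveFormat = "warc"

  // Read archives through the generic DataFrameReader rather than
  // calling RecordLoader.loadArchives directly.
  def readArchives(spark: SparkSession, path: String): DataFrame =
    spark.read.format(ArchiveFormat).load(path)

  // Write side: only possible if the data source also supports writes,
  // which is the open question tracked in #147.
  def writeArchives(df: DataFrame, path: String): Unit =
    df.write.format(ArchiveFormat).save(path)
}
```

The appeal of this shape is that the same reader and writer calls work from Scala, Python, and SQL without toolkit-specific loader functions.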
These are the Spark core data sources:
Community implemented data sources: