Update aut documentation for https://github.com/archivesunleashed/aut/pull/292 #74

Closed
ruebot opened this issue Nov 23, 2018 · 1 comment

@ruebot ruebot commented Nov 23, 2018

Placeholder ticket for updating the aut documentation for archivesunleashed/aut#292 when the next release happens.

@greebie greebie commented Nov 27, 2018

The syntax for getting just the filename (not the full path) from the path, using @ruebot's code from archivesunleashed/aut#292 (comment), is:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import org.apache.commons.io.FilenameUtils

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getArchiveFilename, r.getHttpStatus, FilenameUtils.getName(r.getArchiveFilename)))
  .saveAsTextFile("/home/nruest/tmp/292_final_test")
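
For anyone who wants to see what the filename step does without a Spark shell, here is a minimal sketch that uses only the JDK. It is an approximation, not the aut API: `BasenameSketch`, `basename`, and the sample path are hypothetical names made up for illustration, and `java.nio.file.Paths` stands in for `FilenameUtils.getName`, which may treat edge cases such as trailing slashes or Windows-style separators differently.

```scala
import java.nio.file.Paths

// Hypothetical helper, not part of aut or Commons IO.
object BasenameSketch {
  // Return only the last component of a path, or "" if there is none (e.g. "/").
  def basename(path: String): String =
    Option(Paths.get(path).getFileName).map(_.toString).getOrElse("")

  def main(args: Array[String]): Unit = {
    // Made-up path standing in for the value returned by r.getArchiveFilename.
    val fullPath = "/tmp/example-warcs/example-0001.warc.gz"
    // Mirror the shape of the tuple above: (full path, filename only).
    println((fullPath, basename(fullPath)))
  }
}
```

In the Spark job above, the same idea is what lets each record carry both the full archive path and the bare filename side by side.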
ianmilligan1 added a commit that referenced this issue Nov 28, 2018
getArchiveFilename and getHttpStatus