
Feature: option to exclude URL patterns #12

Closed
adam3smith opened this issue Jan 17, 2019 · 3 comments
Comments

@adam3smith
Contributor

@adam3smith adam3smith commented Jan 17, 2019

Mostly we can assume DOI links to be permanent, so there is no need to archive them separately.

(this needn't be in the first release; just keeping track of ideas)

@greebie

Collaborator

@greebie greebie commented Jan 18, 2019

Sounds good. Basically, it would just require a flag, some conditional logic and a new regex.

@adam3smith adam3smith changed the title from "Feature: option to exclude DOI links" to "Feature: option to exclude URL patterns" on Feb 5, 2019
@adam3smith

Contributor Author

@adam3smith adam3smith commented Feb 5, 2019

I've thought about this more. The way to do this is to add an exclude parameter to the extract and archiv functions. It takes a regular expression, and URLs that match it are not archived. For example,
archiv.fromText("filepath", exclude="^https?:\\/\\/doi\\.org\\/") would exclude DOIs.
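A minimal sketch of the proposed exclude behaviour, written in Python purely for illustration. The function name `extract_urls` and its simplified URL regex are hypothetical and are not the project's actual API; the only idea taken from the comment above is "extract URLs, then drop any that match a user-supplied regex".

```python
import re
from typing import List, Optional

# Hypothetical, simplified URL matcher for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def extract_urls(text: str, exclude: Optional[str] = None) -> List[str]:
    """Return URLs found in text, skipping any that match the exclude regex.

    exclude=None keeps every URL (i.e. archive everything).
    """
    # Strip trailing sentence punctuation that the simple regex picks up.
    urls = [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]
    if exclude is None:
        return urls
    pattern = re.compile(exclude)
    # re.search honours a leading "^", so an anchored pattern only drops
    # URLs that start with the given prefix (e.g. https://doi.org/).
    return [u for u in urls if not pattern.search(u)]
```

Usage: `extract_urls(text, exclude=r"^https?://doi\.org/")` keeps every URL except DOI links. The escaped-slash form from the comment (`"^https?:\\/\\/doi\\.org\\/"`) is equivalent, since `\/` simply matches `/`. Using `re.search` rather than `re.match` lets callers decide for themselves whether a pattern is anchored.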

@adam3smith

Contributor Author

@adam3smith adam3smith commented Dec 4, 2019

Implemented in #31.

@adam3smith adam3smith closed this Dec 4, 2019