Patch for #269: Replace backslash with forward slash in URL #276
borislin added some commits Oct 10, 2018
codecov-io commented Oct 16, 2018
Codecov Report
Merging #276 into master will not change coverage.
The diff coverage is 100%.
@@ Coverage Diff @@
## master #276 +/- ##
=======================================
Coverage 70.36% 70.36%
=======================================
Files 41 41
Lines 1046 1046
Branches 192 192
=======================================
Hits 736 736
Misses 244 244
Partials 66 66
Impacted Files | Coverage Δ | |
---|---|---|
.../io/archivesunleashed/matchbox/ExtractDomain.scala | 87.5% <100%> (ø) |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fe05a5...bf458e9. Read the comment docs.
borislin requested review from greebie, lintool and ruebot Oct 16, 2018
ruebot (Member), Oct 17, 2018
@ianmilligan1 you want to test this one out since it is for #269?
@borislin can you update your branch?
@ruebot yep, will do!
ianmilligan1 approved these changes Oct 17, 2018
Tested and works well – thanks @borislin!
borislin commented Oct 16, 2018 (edited)
This PR improves ExtractDomain by replacing backslashes with forward slashes in a URL before passing it to the Java URL class.
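The idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual ExtractDomain code: the object name ExtractDomainSketch and the helper hostOf are hypothetical, and the http:// scheme on the sample URL is an assumption added so that java.net.URL accepts it.

```scala
import java.net.URL
import scala.util.Try

object ExtractDomainSketch {
  // Hypothetical helper (not the project's ExtractDomain): replace every
  // backslash with a forward slash, then let java.net.URL pick out the host.
  // Returns None when the string is not a parseable URL or has no host.
  def hostOf(url: String): Option[String] = {
    val normalized = url.replace('\\', '/')
    Try(new URL(normalized).getHost).toOption
      .flatMap(Option(_))   // guard against a null host
      .filter(_.nonEmpty)   // treat an empty host as "no domain"
  }
}
```

Under these assumptions, calling ExtractDomainSketch.hostOf on "http://seetorontonow.canada-booknow.com\booking_results.php" would yield the host seetorontonow.canada-booknow.com, because the backslash becomes a path separator before parsing.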
GitHub issue(s):
What does this Pull Request do?
This PR improves URL parsing in ExtractDomain by replacing backslashes with forward slashes before passing the URL to the Java URL class, allowing ExtractDomain to capture the true domain of a URL.
How should this be tested?
git fetch --all
git checkout fix-url
mvn clean install
mkdir -p path/to/where/ever/you/can/write/output/all-text path/to/where/ever/you/can/write/output/all-domains path/to/where/ever/you/can/write/output/gephi path/to/where/ever/you/can/write/spark-jobs
Current Results
With this PR patch:
/tuna1/scratch/aut-issue-269/derivatives/all-domains or /tuna1/scratch/aut-issue-269/derivatives/all-domains.txt (a combined version of all files in /tuna1/scratch/aut-issue-269/derivatives/all-domains)
/tuna1/scratch/aut-issue-269/derivatives/gephi (no longer contains backslashes; the proper domain seetorontonow.canada-booknow.com has been extracted from the URL)
Without this PR patch (master branch):
/tuna1/scratch/aut-issue-269/derivatives/all-domains-without-patch or /tuna1/scratch/aut-issue-269/derivatives/all-domains-without-patch.txt (combined version)
/tuna1/scratch/aut-issue-269/derivatives/gephi-without-patch (contains backslashes, as in the URL seetorontonow.canada-booknow.com\booking_results.php)
Interested parties
@lintool @ianmilligan1 @ruebot @greebie