Add img alt text to imagegraph(); resolves #420. #422

Merged
merged 1 commit into from Feb 10, 2020

Conversation

ruebot commented Feb 10, 2020

GitHub issue(s): #420

What does this Pull Request do?

  • Update ExtractImageLinksRDD to grab alt text
  • Add alt_text column to imagegraph
  • Update tests
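
For readers who want to see the shape of this change outside of Spark, here is a minimal, standalone sketch of collecting the `alt` attribute alongside each image URL. It is not the code in this PR: it uses a plain standard-library regular expression as a stand-in for a real HTML parser, and the names `ImgAltSketch` and `extractImageLinks` are hypothetical. Returning an empty string when an `<img>` has no `alt` is also an assumption.

```scala
import java.net.URI
import scala.util.matching.Regex

// Hypothetical, simplified stand-in; NOT the ExtractImageLinksRDD implementation.
object ImgAltSketch {
  // Matches a whole <img ...> tag, case-insensitively.
  private val ImgTag: Regex = """(?i)<img\b[^>]*>""".r
  // Matches name="value" or name='value' attribute pairs inside a tag.
  private val Attr: Regex =
    """(?i)([a-z][a-z0-9_-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""".r

  /** Returns (page URL, resolved image URL, alt text) for each <img> that has a src. */
  def extractImageLinks(pageUrl: String, html: String): Seq[(String, String, String)] = {
    val base = new URI(pageUrl)
    ImgTag.findAllIn(html).toList.flatMap { tag =>
      // Collect the tag's attributes into a lower-cased name -> value map.
      val attrs = Attr.findAllMatchIn(tag).map { m =>
        m.group(1).toLowerCase -> Option(m.group(2)).getOrElse(m.group(3))
      }.toMap
      // Skip tags without src; default a missing alt to "" (an assumption).
      attrs.get("src").map { src =>
        (pageUrl, base.resolve(src).toString, attrs.getOrElse("alt", ""))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val html = """<img src="/images/logo.jpg" alt="(logo)"><img src="/images/logoc.jpg">"""
    extractImageLinks("http://www.archive.org/index.php", html).foreach(println)
  }
}
```

A regex will miss unquoted attributes and tags whose attribute values contain `>`, so treat this only as an illustration of the (source page, image URL, alt text) triple, not as a parser to reuse.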

How should this be tested?

  • The Travis CI build should cover it.
  • To check manually, paste the following into spark-shell:
import io.archivesunleashed._

RecordLoader.loadArchives("/home/nruest/Projects/au/aut/src/test/resources/warc", sc).imagegraph()
  .show(25, false)

// Exiting paste mode, now interpreting.

+----------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+
|crawl_date|src                             |image_url                                                                                                                                                                      |alt_text                   |
+----------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+
|20080430  |http://www.archive.org/         |http://www.archive.org/images/logoc.jpg                                                                                                                                        |                           |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/logo.jpg                                                                                                                                         |(logo)                     |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/main-header.jpg                                                                                                                                  |(navigation image)         |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/wayback_logo-sm.gif                                                                                                                              |(wayback logo)             |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/rss.png                                                                                                                                          |See recent additions in RSS|
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/mediatype_movies.gif                                                                                                                             |movies icon                |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/mediatype_etree.gif                                                                                                                              |etree icon                 |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/mediatype_audio.gif                                                                                                                              |audio icon                 |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/mediatype_texts.gif                                                                                                                              |texts icon                 |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/services/get-item-image.php?identifier=a_few_good_gmen&collection=machinima&mediatype=movies                                                            |(movies pick)              |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/services/get-item-image.php?identifier=gd1978-12-16.sonyecm250-no-dolby.walker-scotton.miller.82212.sbeok.flac16&collection=GratefulDead&mediatype=etree|(etree pick)               |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/services/get-item-image.php?identifier=zh27814&collection=zh27&mediatype=audio                                                                          |(audio pick)               |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/services/get-item-image.php?identifier=secretarmiesb00spivrich&collection=americana&mediatype=texts                                                     |(texts pick)               |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/star.png                                                                                                                                         |5.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/star.png                                                                                                                                         |5.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/star.png                                                                                                                                         |5.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/star.png                                                                                                                                         |5.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/star.png                                                                                                                                         |5.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
|20080430  |http://www.archive.org/index.php|http://www.archive.org/images/no_star.png                                                                                                                                      |0.00 out of 5 stars        |
+----------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------+
only showing top 25 rows

import io.archivesunleashed._
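
One thing the new column makes easy is checking how many images actually carry alt text. The sketch below does that on plain Scala collections rather than on the DataFrame, so it runs without Spark. The `ImageRow` case class, `altCoverage`, and `topAltTexts` are hypothetical names that only mirror the four columns in the output above; the sample rows are copied from that output.

```scala
// Hypothetical helper; operates on plain collections, not on the imagegraph DataFrame.
object AltTextCoverage {
  // Row shape mirrors the four imagegraph columns: crawl_date, src, image_url, alt_text.
  final case class ImageRow(crawlDate: String, src: String, imageUrl: String, altText: String)

  /** Fraction of rows whose alt text is non-blank; 0.0 for an empty input. */
  def altCoverage(rows: Seq[ImageRow]): Double =
    if (rows.isEmpty) 0.0
    else rows.count(_.altText.trim.nonEmpty).toDouble / rows.size

  /** The n most frequent non-blank alt strings with their counts, most common first. */
  def topAltTexts(rows: Seq[ImageRow], n: Int): Seq[(String, Int)] =
    rows.map(_.altText.trim).filter(_.nonEmpty)
      .groupBy(identity).map { case (alt, xs) => alt -> xs.size }
      .toSeq.sortBy { case (alt, count) => (-count, alt) }
      .take(n)

  def main(args: Array[String]): Unit = {
    val page = "http://www.archive.org/index.php"
    val rows = Seq(
      ImageRow("20080430", "http://www.archive.org/", "http://www.archive.org/images/logoc.jpg", ""),
      ImageRow("20080430", page, "http://www.archive.org/images/logo.jpg", "(logo)"),
      ImageRow("20080430", page, "http://www.archive.org/images/star.png", "5.00 out of 5 stars"),
      ImageRow("20080430", page, "http://www.archive.org/images/star.png", "5.00 out of 5 stars")
    )
    println(altCoverage(rows))
    topAltTexts(rows, 2).foreach(println)
  }
}
```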
@ruebot ruebot requested review from lintool and ianmilligan1 Feb 10, 2020
ruebot added a commit to archivesunleashed/aut-docs that referenced this pull request Feb 10, 2020

codecov bot commented Feb 10, 2020

Codecov Report

Merging #422 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #422      +/-   ##
==========================================
+ Coverage   77.96%   77.97%   +0.01%     
==========================================
  Files          41       41              
  Lines        1570     1571       +1     
  Branches      292      293       +1     
==========================================
+ Hits         1224     1225       +1     
  Misses        218      218              
  Partials      128      128
@ianmilligan1 ianmilligan1 merged commit 8f1a9f1 into master Feb 10, 2020
3 checks passed
  • codecov/patch: 100% of diff hit (target 77.96%)
  • codecov/project: 77.97% (+0.01%) compared to 87c9734
  • continuous-integration/travis-ci/pr: The Travis CI build passed
@ianmilligan1 ianmilligan1 deleted the issue-420 branch Feb 10, 2020
ianmilligan1 pushed a commit to archivesunleashed/aut-docs that referenced this pull request Feb 10, 2020