Add example for Scala DF version of "Extract Most Frequent Images MD5… (#28)

* Add example for Scala DF version of "Extract Most Frequent Images MD5 Hash".

- See archivesunleashed/aut#382

* rename
ruebot authored and ianmilligan1 committed Nov 22, 2019
1 parent e162384 commit 9cffe13f8215da549fb71a2932b10c1499c25cf2
Showing with 10 additions and 2 deletions.
  1. +10 −2 current/image-analysis.md
````diff
@@ -261,14 +261,22 @@ import io.archivesunleashed.app._
 import io.archivesunleashed.matchbox._
 val r = RecordLoader.loadArchives("example.arc.gz",sc).persist()
-ExtractPopularImages(r, 500, sc).saveAsTextFile("500-Popular-Images")
+ExtractPopularImagesRDD(r, 500, sc).saveAsTextFile("500-Popular-Images")
 ```
 
 Will save the 500 most popular URLs to an output directory.
 
 ### Scala DF
 
-TODO
+```scala
+import io.archivesunleashed._
+import io.archivesunleashed.app._
+val df = RecordLoader.loadArchives("example.arc.gz",sc)
+.images()
+ExtractPopularImagesDF(df,10,30,30).show()
+```
 
 ### Python DF
 
````
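The `ExtractPopularImagesDF(df,10,30,30).show()` call in the new Scala DF example ranks images by how often the same content recurs in the archive. Below is a minimal plain-Scala sketch of that idea. It assumes the three numeric arguments are a result limit, a minimum width and a minimum height in pixels; the diff does not document them, so treat that reading as an assumption. The sketch runs on an in-memory `Seq` rather than a Spark DataFrame, and `ImageRecord`, `PopularImagesSketch` and `popularImages` are hypothetical names, not part of aut.

```scala
// Hypothetical, simplified record for one image found in an archive.
// Field names are illustrative and not taken from aut's schema.
final case class ImageRecord(url: String, md5: String, width: Int, height: Int)

object PopularImagesSketch {
  /** Returns up to `limit` (firstUrl, md5, count) triples, most frequent first.
    * Images smaller than minWidth x minHeight are skipped (assumed semantics,
    * mirroring the guessed meaning of ExtractPopularImagesDF(df, 10, 30, 30)). */
  def popularImages(
      images: Seq[ImageRecord],
      limit: Int,
      minWidth: Int = 30,
      minHeight: Int = 30
  ): Seq[(String, String, Int)] =
    images
      .filter(img => img.width >= minWidth && img.height >= minHeight)
      .groupBy(_.md5) // identical image content shares one MD5 hash
      .toSeq
      .map { case (md5, copies) => (copies.head.url, md5, copies.size) }
      .sortBy { case (_, md5, count) => (-count, md5) } // highest count first; ties ordered by hash
      .take(limit)

  // Made-up sample data for illustration only.
  val sample: Seq[ImageRecord] = Seq(
    ImageRecord("http://a.example/logo.png", "aaa", 120, 40),
    ImageRecord("http://b.example/logo.png", "aaa", 120, 40),
    ImageRecord("http://a.example/photo.jpg", "bbb", 800, 600),
    ImageRecord("http://a.example/spacer.gif", "ccc", 1, 1),
    ImageRecord("http://b.example/spacer.gif", "ccc", 1, 1),
    ImageRecord("http://c.example/spacer.gif", "ccc", 1, 1)
  )

  def main(args: Array[String]): Unit =
    popularImages(sample, limit = 10).foreach(println)
}
```

Grouping on the MD5 hash rather than the URL lets copies of one image served from different URLs add up to a single total. In this sample the 1×1 spacer images are the most frequent, but the assumed size filter drops them before ranking.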