Add discardMimeTypesTika, and tweak other MIME type examples. (#38)

ruebot authored and ianmilligan1 committed Jan 15, 2020
1 parent 8ed292b commit 45de55b06b09135393755bb10cd90c1e10a9dce7
Showing with 17 additions and 6 deletions.
  1. +17 −6 current/filters.md
@@ -343,9 +343,9 @@ Filters out detected MIME Types (identified by the web server).
 ```scala
 import io.archivesunleashed._
-val mimetypes = Set("text/html", "text/plain")
+val mimeTypes = Set("text/html", "text/plain")
 val r = RecordLoader.loadArchives("example.warc.gz",sc)
-r.discardMimeTypes(mimetypes)
+r.discardMimeTypes(mimeTypes)
 ```

 ### Scala DF
@@ -354,9 +354,11 @@ r.discardMimeTypes(mimetypes)
 import io.archivesunleashed._
 import io.archivesunleashed.df._
+val mimeTypes = Set("text/html", "text/plain")
 RecordLoader.loadArchives("example.warc.gz",sc)
 .webpages()
-.discardMimeTypesDF(Set("text/html"))
+.discardMimeTypesDF(mimeTypes)
 ```

 ### Python DF
@@ -372,14 +374,23 @@ Filters out detected MIME Types (identified by [Apache Tika](https://tika.apache
 ```scala
 import io.archivesunleashed._
-val mimetypes = Set("text/html", "text/plain")
+val mimeTypes = Set("text/html", "text/plain")
 val r = RecordLoader.loadArchives("example.warc.gz",sc)
-r.discardMimeTypesTika(mimetypes)
+r.discardMimeTypesTika(mimeTypes)
 ```

 ### Scala DF

-TODO
+```scala
+import io.archivesunleashed._
+import io.archivesunleashed.df._
+val mimeTypes = Set("text/html", "text/plain")
+RecordLoader.loadArchives("example.warc.gz",sc)
+.webpages()
+.discardMimeTypesTikaDF(mimeTypes)
+```

 ### Python DF
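
As a reading aid for this diff, the difference between the two filters it documents (MIME types reported by the web server versus MIME types detected by Apache Tika) can be illustrated with a minimal sketch in plain Scala. It does not use the toolkit: `Record`, its fields, `discardBy`, and the sample URLs are all hypothetical stand-ins, and the toolkit's own implementation may work differently.

```scala
// Hypothetical stand-in for an archive record; NOT the toolkit's record type.
final case class Record(url: String, serverMimeType: String, detectedMimeType: String)

// Drop every record whose MIME type, as selected by `mimeOf`, is in `mimeTypes`.
def discardBy(records: Seq[Record], mimeTypes: Set[String])(mimeOf: Record => String): Seq[Record] =
  records.filterNot(r => mimeTypes.contains(mimeOf(r)))

// Made-up sample data. The second record is served as text/html,
// but its content is assumed to be detected as a PDF.
val records = Seq(
  Record("http://example.com/", "text/html", "text/html"),
  Record("http://example.com/report", "text/html", "application/pdf"),
  Record("http://example.com/logo.png", "image/png", "image/png")
)

val mimeTypes = Set("text/html", "text/plain")

// Analogous in spirit to filtering on the server-reported MIME type.
val keptByServer = discardBy(records, mimeTypes)(_.serverMimeType)

// Analogous in spirit to filtering on the content-detected MIME type.
val keptByDetected = discardBy(records, mimeTypes)(_.detectedMimeType)
```

Under these assumptions the two filters disagree only on the mislabelled record: filtering on the server-reported type discards it, while filtering on the detected type keeps it.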
