Update Doc for Latest Changes #19
+22 −12
Just a couple of changes; other than that, good to go.
~~~markdown
.saveAsTextFile("plain-text-noheaders/")
```

As most plain text use cases do not require HTTP headers to be in the output, we are removing headers in the following examples.

### Scala DF

TODO
```scala
~~~
ruebot (Member), Nov 7, 2019:
Let's change this so it is consistent with the others:
```scala
import io.archivesunleashed._
import io.archivesunleashed.df._

RecordLoader.loadArchives("example.warc.gz", sc)
  .extractValidPagesDF()
  .select(RemoveHTML($"content"))
  .write
  .option("header","true")
  .csv("plain-text-noheaders/")
```
`current/text-analysis.md` (outdated)
```scala
import io.archivesunleashed._
import io.archivesunleashed.df._
RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc).extractValidPagesDF()
```
3 commits added, Nov 7, 2019.
SinghGursimran commented on Nov 7, 2019:
No description provided.