Add DomainGraphExtractor examples #52

Merged
merged 3 commits into from Apr 8, 2020

ruebot commented Apr 8, 2020

No description provided.

ruebot requested review from lintool, ianmilligan1 and SamFritz Apr 8, 2020

ianmilligan1 left a comment

The RDD commands throw an error; the DF ones work.
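As a rough illustration of the RDD/DF split described in this review, the sketch below prints a `spark-submit` command and appends `--df` whenever the extractor is not one of the three that the error log in this thread reports as supported with RDD. `build_aut_command` is a hypothetical helper, not part of aut; the extractor list is taken only from that log (Apr 8, 2020) and may differ in other versions, and the paths are the placeholders used in the examples under review.

```shell
#!/bin/sh
# Sketch only: build_aut_command is a hypothetical helper, not part of aut.
# It prints a spark-submit command instead of running it.
# Usage: build_aut_command EXTRACTOR [extra flags...]
build_aut_command() {
  extractor="$1"
  shift

  # Extractors the error log in this thread lists as supported with RDD.
  # Assumption: any other extractor needs the --df flag.
  case "$extractor" in
    DomainFrequencyExtractor|DomainGraphExtractor|PlainTextExtractor)
      df_flag="" ;;
    *)
      df_flag=" --df" ;;
  esac

  # Pass any remaining arguments (e.g. --output-format GEXF) through unchanged.
  extra=""
  if [ "$#" -gt 0 ]; then
    extra=" $*"
  fi

  # Placeholder paths are copied from the examples under review.
  printf '%s\n' "spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-fatjar.jar --extractor ${extractor} --input /path/to/warcs/* --output output/path${df_flag}${extra}"
}

# Prints the DataFrame GEXF command for ImageGraphExtractor, with --df added.
build_aut_command ImageGraphExtractor --output-format GEXF
```

Printing rather than executing keeps the sketch safe to try without a Spark installation; the output can be inspected before it is run by hand.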

Text output:

```shell
spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-fatjar.jar --extractor ImageGraphExtractor --input /path/to/warcs/* --output output/path --output-format TEXT
```

ianmilligan1 Apr 8, 2020

I couldn't get this to work without the --df flag (i.e. the commands below worked). Running the above command leads to:

20/04/08 16:09:28 ERROR CommandLineApp: ImageGraphExtractor not supported with RDD. The following extractors are supported:
20/04/08 16:09:28 ERROR CommandLineApp: DomainFrequencyExtractor
20/04/08 16:09:28 ERROR CommandLineApp: DomainGraphExtractor
20/04/08 16:09:28 ERROR CommandLineApp: PlainTextExtractor

GEXF output:

```shell
spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-fatjar.jar --extractor ImageGraphExtractor --input /path/to/warcs/* --output output/path --output-format GEXF
```

ianmilligan1 Apr 8, 2020

Same error here

current/aut-spark-submit-app.md

```shell
spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-fatjar.jar --extractor ImageGraphExtractor --input /path/to/warcs/* --output output/path --df --partition 1
spark-submit --class io.archivesunleashed.app.CommandLineAppRunner path/to/aut-fatjar.jar --extractor ImageGraphExtractor --input /path/to/warcs/* --output output/path --df --output-format GEXF
```

ianmilligan1 Apr 8, 2020

👍 same here


ianmilligan1 left a comment

Looks good (of course, GEXF results are off as per aut/#436).

ianmilligan1 merged commit a377479 into master Apr 8, 2020
2 checks passed
ianmilligan1 deleted the issue-14-follow-up branch Apr 8, 2020