Quit graphpass on graph with > 1000000 edges in the graph. #61

Merged
merged 5 commits into from Sep 10, 2018

3 participants
@greebie
Collaborator

greebie commented Sep 5, 2018

The title of this pull-request should be a brief description of what the pull-request fixes/improves/changes. Ideally 50 characters or less.

#60

What does this Pull Request do?

Drops the graphpass job if the edges are greater than 1,000,000.

How should this be tested?

Unfortunately, this may take some testing to be sure that we are happy with the 1,000,000 edge limit. For example, are there gexfs in our collection that do not potato browsers but yet have 1,000,000 or more edges?

You can use graphpass to get the edge count using the no-save and verbose flags (n & v):
./graphpass -nv -f {FILENAME}
I will do a check myself, but thought I'd push this one in case someone else has time to check a few as well.

Additional Notes:

This should cover the vast majority of problems with files being too large. The only other way we could end up with huge files in future is if labels or other data components were very large.

I do not think we need to resolve this now, but it's worth mentioning for the future.

Interested parties

@ianmilligan1 @ruebot

Member

ianmilligan1 commented Sep 5, 2018

Sure. We're not in a big hurry, so let me just test this on all our graphml files. I'll record the output.

Collaborator

greebie commented Sep 5, 2018

Tested on all the graphs in auk_graphml over 100MB. So far only 2402 gets cancelled.

Add optional max-nodes and max-edges flags.
Some minor clean up of flags for alpha order.
Collaborator

greebie commented Sep 5, 2018

The latest commit adds the --max-edges (-x) and --max-nodes (-y) flag options. If neither flag is given, the limits default to 50,000 nodes and 1,000,000 edges.

Test using:
./graphpass -nv -f {FILENAME} --max-edges 90 --max-nodes 60000

You should see output showing the node and edge counts. The run should fail if the graph has more than 90 edges.
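
For readers unfamiliar with how such flags are usually wired up in C, here is a minimal sketch using getopt_long. It is not the code from this commit: the struct size_limits type and the parse_size_limits function are hypothetical names, and only the flag names and the defaults (50,000 nodes, 1,000,000 edges) come from the comment above.

```c
#include <getopt.h>
#include <stdlib.h>

/* Defaults as described in the comment above. */
#define DEFAULT_MAX_NODES 50000L
#define DEFAULT_MAX_EDGES 1000000L

/* Hypothetical container for the two limits (not a graphpass type). */
struct size_limits {
  long max_nodes;
  long max_edges;
};

/* Parse --max-edges (-x) and --max-nodes (-y) into *out.
   Returns 0 on success, -1 on an unknown option or a bad number. */
int parse_size_limits(int argc, char *argv[], struct size_limits *out) {
  static const struct option long_opts[] = {
    {"max-edges", required_argument, NULL, 'x'},
    {"max-nodes", required_argument, NULL, 'y'},
    {NULL, 0, NULL, 0}
  };
  int c;

  out->max_nodes = DEFAULT_MAX_NODES;
  out->max_edges = DEFAULT_MAX_EDGES;
  optind = 1; /* restart scanning so the function can be called again */

  while ((c = getopt_long(argc, argv, "x:y:", long_opts, NULL)) != -1) {
    char *end;
    long value;

    if (c != 'x' && c != 'y') {
      return -1; /* getopt_long reported an unknown option */
    }
    value = strtol(optarg, &end, 10);
    if (end == optarg || *end != '\0' || value < 0) {
      return -1; /* not a non-negative whole number */
    }
    if (c == 'x') {
      out->max_edges = value;
    } else {
      out->max_nodes = value;
    }
  }
  return 0;
}
```

The sketch declares only the two new flags, so it would reject the other graphpass options (such as -n, -v and -f); a real parser would list those in the same option table.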

Member

ianmilligan1 commented Sep 5, 2018

Thanks for the update @greebie. FWIW I'm running this on all our WALK graphmls (~117 of them). I'd like to get a table of nodes/edges so we can make sure to draw the line accordingly.

Collaborator

greebie commented Sep 5, 2018

Thanks @ianmilligan1. I'm good with whatever default values you'd like to work with.

@ianmilligan1

This is good, but let's lower the default edge count.

As you recall, we came into this issue with two problematic GEXF files - one that was 217MB and one that was 618MB.

The 279-gephi.gexf file is 217MB but has 36,478 nodes and 834,922 edges, so this PR would still process it, whereas 2402-gephi.gexf is 618MB with 37,296 nodes and 2,581,125 edges, so it would be caught here.

They are rare edge cases. After reviewing the node/edge counts of all the files, lowering the limit to 500,000 would make more sense.
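
To make the effect of the threshold concrete, here is a minimal C sketch of the comparison, using the counts quoted above. The function name graph_within_limits is hypothetical, and treating a count exactly equal to the limit as acceptable is an assumption; the actual check in graphpass may differ at that boundary.

```c
#include <stdbool.h>

/* Hypothetical helper (not graphpass's own function): returns true when
   the graph is small enough to analyse.
   Assumption: a graph is rejected only when a count exceeds its limit. */
bool graph_within_limits(long nodes, long edges,
                         long max_nodes, long max_edges) {
  return nodes <= max_nodes && edges <= max_edges;
}
```

With a 1,000,000 edge limit, graph_within_limits(36478, 834922, 50000, 1000000) is true for 279-gephi.gexf, while graph_within_limits(37296, 2581125, 50000, 1000000) is false for 2402-gephi.gexf. With the limit lowered to 500,000, both files are rejected.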

Member

ianmilligan1 commented Sep 10, 2018

Can you update the branch, @greebie?

@ruebot

ruebot approved these changes Sep 10, 2018

[nruest@wombat:graphpass] (git)-[issue-60]-$ ./graphpass --file 2402-gephi.graphml --output /tmp --dir /tmp -g -q
FAIL >>> Graphpass can only conduct analysis on graphs with fewer than 50000 nodes and 500000 edges.
FAIL >>> Exiting...

@ruebot ruebot merged commit bbde012 into master Sep 10, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@ruebot ruebot deleted the issue-60 branch Sep 10, 2018
