Quit graphpass on graphs with > 1,000,000 edges #61
Conversation
ianmilligan1
Sep 5, 2018
Member
Sure. We're not in a big hurry, so let me just test this on all our graphml files. I'll record the output.
greebie
Sep 5, 2018
Collaborator
Tested on all the graphs in auk_graphml over 100MB. So far only 2402 gets cancelled.
greebie
Sep 5, 2018
Collaborator
The latest commit adds the --max-edges (-x) and --max-nodes (-y) flag options. If the flags are not set, the limits default to 1,000,000 edges and 50,000 nodes.
Test using:
./graphpass -nv -f {FILENAME} --max-edges 90 --max-nodes 60000
You should see a result that shows the edge and node counts. The graph should not pass if the edge count is more than 90.
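Roughly, the option handling works like this. This is a simplified sketch using getopt_long, not the exact graphpass source; parse_limits and the variable names are illustrative:

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

/* Defaults described above: 1,000,000 edges and 50,000 nodes. */
static long max_edges = 1000000;
static long max_nodes = 50000;

/* Parse -x/--max-edges and -y/--max-nodes, overriding the defaults. */
static void parse_limits(int argc, char **argv) {
  static struct option longopts[] = {
    {"max-edges", required_argument, 0, 'x'},
    {"max-nodes", required_argument, 0, 'y'},
    {0, 0, 0, 0}
  };
  int c;
  while ((c = getopt_long(argc, argv, "x:y:", longopts, NULL)) != -1) {
    switch (c) {
      case 'x': max_edges = strtol(optarg, NULL, 10); break;
      case 'y': max_nodes = strtol(optarg, NULL, 10); break;
    }
  }
}

int main(int argc, char **argv) {
  parse_limits(argc, argv);
  printf("Limits: %ld edges, %ld nodes\n", max_edges, max_nodes);
  return 0;
}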
ianmilligan1
Sep 5, 2018
Member
Thanks for the update @greebie. FWIW I'm running this on all our WALK graphmls (~117 of them). I'd like to get a table of nodes/edges so we can make sure to draw the line accordingly.
greebie
Sep 5, 2018
Collaborator
Thanks @ianmilligan1. I'm good with whatever default values you'd like to work with.
ianmilligan1
requested changes
Sep 10, 2018
This is good, but let's lower the default edge count.
As you recall, we came into this issue with two problematic GEXF files - one that was 217MB and one that was 618MB.
The 279-gephi.gexf file is 217MB but has only 36,478 nodes and 834,922 edges, so this change would still process it. Whereas 2402-gephi.gexf is 618MB with 37,296 nodes and 2,581,125 edges, so it would be caught here.
These are rare edge cases. After reviewing the node/edge counts of all the files, lowering the default edge limit to 500,000 would make more sense.
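For example, with the flags this PR adds, the lowered limit could be exercised explicitly on any file (placeholder file name as before):
./graphpass -nv -f {FILENAME} --max-edges 500000 --max-nodes 50000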
Can you update the branch, @greebie?
ruebot
approved these changes
Sep 10, 2018
[nruest@wombat:graphpass] (git)-[issue-60]-$ ./graphpass --file 2402-gephi.graphml --output /tmp --dir /tmp -g -q
FAIL >>> Graphpass can only conduct analysis on graphs with fewer than 50000 nodes and 500000 edges.
FAIL >>> Exiting...
greebie commented Sep 5, 2018
#60
What does this Pull Request do?
Drops the graphpass job if the edge count is greater than 1,000,000.
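In outline, the change is a size guard run after the graph is loaded. Here is a simplified sketch, assuming the counts come from igraph (which graphpass builds on); check_graph_size and the variable names are illustrative, not the actual source:

#include <stdio.h>
#include <igraph.h>

/* Limits as set by the defaults or the -x/-y flags. */
static long max_edges = 1000000;
static long max_nodes = 50000;

/* Report the graph size and refuse to continue if it exceeds the limits.
   Returns 1 when the graph is too large, 0 otherwise. */
static int check_graph_size(const igraph_t *graph) {
  long nodes = (long) igraph_vcount(graph);
  long edges = (long) igraph_ecount(graph);
  printf("Nodes: %ld  Edges: %ld\n", nodes, edges);
  if (nodes > max_nodes || edges > max_edges) {
    fprintf(stderr,
      "FAIL >>> Graphpass can only conduct analysis on graphs with fewer "
      "than %ld nodes and %ld edges.\nFAIL >>> Exiting...\n",
      max_nodes, max_edges);
    return 1;
  }
  return 0;
}

int main(void) {
  igraph_t g;
  igraph_empty(&g, 10, IGRAPH_UNDIRECTED);  /* tiny placeholder graph */
  int too_big = check_graph_size(&g);
  igraph_destroy(&g);
  return too_big;
}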
How should this be tested?
Unfortunately, this may take some testing to be sure that we are happy with the 1,000,000 edge limit. For example, are there GEXF files in our collection that do not bog down browsers but still have 1,000,000 or more edges?
You can use graphpass to get the edge count using the no-save and verbose flags (n & v):
./graphpass -nv -f {FILENAME}
I will do a check myself, but thought I'd push this one in case someone else has time to check a few as well.
Additional Notes:
This should cover the vast majority of problems with files being too large. The only other way we could end up with huge files in the future is if labels or other data components were extremely large.
I do not think we need to resolve this now, but it's worth mentioning for the future.
Interested parties
@ianmilligan1 @ruebot