Missing nodes in GEXF output files (on AUK) #25

ianmilligan1 · Aug 23, 2018

Describe the bug
On a few collections, our sigma.js network visualizations have misplaced nodes: edges point to blank space, and nodes hover in arbitrary positions. Here's an example from sigma:

The same thing appears when the GEXF file is opened in Gephi:

The original GraphML file (before the GraphPass transformation), however, has edges properly connected to nodes.

Something is going awry with GraphPass, presumably in the x/y placement of nodes.

To Reproduce
Find a broken collection. There doesn't seem to be a universal rhyme or reason for why this happens.

@greebie I will send you the before/after file of the collection above, so you can work on GraphPass to fix it.

Expected behavior
Edges and nodes should connect. 😄

Desktop/Laptop (please complete the following information):
@ruebot and I have reproduced this on Safari, Chrome, and Firefox on both Linux and Windows. The Gephi test pretty firmly indicates that it is a GraphPass-related issue.

greebie · Aug 27, 2018

These appear to be nodes that only link to themselves. For example, search for "hermanleonard.com" in the test example. Gephi handles these by drawing a loop back to the node (shown above); sigma does not support self-links. However, switching the edge type to "curved" apparently looks more attractive and may resolve the problem.

There are also plugins available to support self-referential links.

It is possible to remove self-referential links in GraphPass completely if desired.
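For illustration only, here is a minimal Python sketch of what dropping self-referential links could look like, assuming the edges are available as a list of (source, target) pairs. The function name `drop_self_loops` and the edge-list representation are hypothetical; this is not GraphPass's code.

```python
def drop_self_loops(edges):
    """Keep only edges whose source and target are different nodes.

    `edges` is assumed to be a list of (source, target) pairs; this is a
    hypothetical representation, not GraphPass's internal data structure.
    """
    return [(src, dst) for src, dst in edges if src != dst]


edges = [
    ("hermanleonard.com", "hermanleonard.com"),  # self-link
    ("example.org", "hermanleonard.com"),
    ("example.org", "example.org"),              # self-link
]
print(drop_self_loops(edges))  # [('example.org', 'hermanleonard.com')]
```

Note that a node whose only edge is a self-link would be left with no edges at all after this filtering, so it would still need to be kept as an isolated node or removed separately.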

ianmilligan1 · Aug 27, 2018

Why are the edges pointing at blank space then?

greebie · Aug 27, 2018

Okay - after further exploration, it looks as if some nodes are being assigned negative sizes. This probably happens when GraphPass tries to work out a reasonable sizing pattern for the nodes. I will try to resolve it tomorrow.
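One way to confirm the negative-size theory on a given output file is to scan the GEXF for nodes whose `viz:size` value is below zero. The sketch below uses Python's standard `xml.etree.ElementTree`; the function names `local_name` and `negative_size_nodes` and the sample document are hypothetical, and tags are matched by local name so the check should not depend on which GEXF namespace version the file declares.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def negative_size_nodes(gexf_data):
    """Return (node id, size) pairs whose viz:size value is below zero.

    `gexf_data` may be a str or bytes holding a GEXF document. Tags are
    matched by local name, so the GEXF namespace URI does not matter.
    """
    root = ET.fromstring(gexf_data)
    bad = []
    for elem in root.iter():
        if local_name(elem.tag) != "node":
            continue
        for child in elem:
            value = child.get("value")
            if local_name(child.tag) == "size" and value is not None:
                if float(value) < 0:
                    bad.append((elem.get("id"), float(value)))
    return bad


sample = """<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="0" label="example.org"><viz:size value="4.5"/></node>
      <node id="1" label="hermanleonard.com"><viz:size value="-1.2"/></node>
    </nodes>
    <edges/>
  </graph>
</gexf>"""

print(negative_size_nodes(sample))  # [('1', -1.2)]
```

To check a real output file, the raw bytes can be passed in directly, e.g. `negative_size_nodes(open(path, "rb").read())`.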

greebie · Aug 28, 2018

Looks like I am using the node count as the graph size, and that creates problems when the max node size minus the min node size equals the node count. Will switch to the number of edges instead.

greebie · Aug 28, 2018

Okay -- here is the full explanation of the bug, complete with mathematics. :)

Because our network outputs in aut contain some websites with lots and lots of links and others with very few, it can be difficult to visualise them in Gephi or elsewhere when node size is the raw number of links. For instance, it's common to get this:

In order to make it possible to view the nodes together in a more visually appealing way, it's common to use a scale of some sort. You could take the square root of each number, for instance, or halve each node's size, so that a node of size 1000 goes to 500 while a node of size 2 goes to 1.
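A quick check with the two example sizes shows how these options differ: halving shrinks both nodes but keeps the 500:1 ratio between them, while a square root compresses it to roughly 22:1, which is what lets large and small nodes sit together on one canvas. This is plain arithmetic on the example numbers, not GraphPass code.

```python
import math

sizes = [1000, 2]  # the two example node sizes from the comment

halved = [s / 2 for s in sizes]
rooted = [math.sqrt(s) for s in sizes]

# Ratio between the largest and smallest node under each transform.
print("halved:", halved, "ratio", halved[0] / halved[1])
# halved: [500.0, 1.0] ratio 500.0
print("sqrt:", [round(r, 2) for r in rooted], "ratio", round(rooted[0] / rooted[1], 1))
# sqrt: [31.62, 1.41] ratio 22.4
```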

The calculation I used was to multiply every node's degree by log10(total number of nodes in the graph / (degree of largest node - degree of smallest node)). This worked fine when the denominator was smaller than the total number of nodes, but failed when it was equal or larger: the ratio inside the log is then 1 or less, so the scale factor is zero or negative, and every scaled node size comes out zero or negative. This meant that sigma had no basis on which to scale.
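To make the failure concrete, the sketch below evaluates the old factor, log10(total nodes / (largest degree - smallest degree)), for three made-up graphs. The function name `old_scale_factor` and the numbers are hypothetical; only the formula comes from the comment. Once the degree spread reaches the node count, the factor drops to zero and then below it.

```python
import math


def old_scale_factor(node_count, max_degree, min_degree):
    """The old factor: log10(total nodes / (largest degree - smallest degree))."""
    return math.log10(node_count / (max_degree - min_degree))


# (node count, largest degree, smallest degree) for three made-up graphs
for nodes, hi, lo in [(500, 51, 1), (100, 101, 1), (50, 201, 1)]:
    factor = old_scale_factor(nodes, hi, lo)
    print(f"{nodes} nodes, spread {hi - lo}: factor {round(factor, 3)}")
# 500 nodes, spread 50: factor 1.0
# 100 nodes, spread 100: factor 0.0
# 50 nodes, spread 200: factor -0.602
```

Multiplying every degree by the last factor makes every node size negative, which matches the behaviour seen in the broken collections.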

Basically, I was fooled into believing this approach worked generally because sigma did its own massaging.

The new approach will be more correct and will give an attractive output in both sigma and Gephi (or any other visualisation tool).

It uses the following:

$$\mathrm{MAX\_SCALE\_VALUE} \cdot \frac{\log(x + 1) - \log(\mathrm{minimum} + 1)}{\log(\mathrm{maximum} + 1) - \log(\mathrm{minimum} + 1)}$$

where

x = the actual degree value for each node.
minimum = the lowest degree value
maximum = the highest degree value

Each of x, minimum and maximum is increased by one to avoid log(0), which is undefined.
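A minimal Python sketch of the new formula follows. `MAX_SCALE_VALUE = 100.0` is a placeholder, since the thread does not say what value GraphPass uses, and the function name `scaled_sizes` is hypothetical. The all-equal-degrees guard is an assumption added here: when every node has the same degree the denominator is zero, so the formula as written would divide by zero. Because the formula is a ratio of logarithms, the choice of base (natural log here, log10 in the old approach) does not change the result.

```python
import math

MAX_SCALE_VALUE = 100.0  # placeholder; the thread does not state GraphPass's value


def scaled_sizes(degrees, max_scale=MAX_SCALE_VALUE):
    """Normalized log scaling: map each degree onto [0, max_scale].

    Adding 1 to every value avoids log(0) for nodes with degree 0.
    """
    lo = math.log(min(degrees) + 1)
    hi = math.log(max(degrees) + 1)
    if hi == lo:
        # All nodes share one degree, so the denominator is zero. Giving every
        # node max_scale is an assumption made for this sketch only.
        return [max_scale for _ in degrees]
    return [max_scale * (math.log(d + 1) - lo) / (hi - lo) for d in degrees]


print([round(s, 6) for s in scaled_sizes([0, 9, 99])])  # [0.0, 50.0, 100.0]
```

The smallest-degree node always maps to 0 and the largest to MAX_SCALE_VALUE, with everything else in between, so (for non-negative degrees) no size can go negative regardless of how the node count compares to the degree spread.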

greebie · Aug 28, 2018

Using the new formula, this is what the same graph looks like in Gephi.

However, we should do some serious testing in sigma to make sure it works properly.

ianmilligan1 added the bug label Aug 23, 2018

ianmilligan1 assigned greebie Aug 23, 2018

ianmilligan1 referenced this issue Aug 24, 2018: Blank Sigma Visualization (on AUK) #26 (closed)

This was referenced Aug 28, 2018:

Better scaling method to determine node size. #45 (closed)

Take a better approach to scaling using normalized (log(x)) formula. #46 (merged)

ruebot closed this in ceda07c Sep 2, 2018

archivesunleashed/graphpass
