
Clarification around network diagrams #275

Closed
ruebot opened this Issue Mar 25, 2019 · 20 comments

ruebot commented Mar 25, 2019

During the Team Kompromat presentation at the DC Datathon, @edsu noted that the network diagrams can be misleading. One could assume that the network diagram represents what is in the archive itself that was analyzed. We should clarify that this is not the case. So, what's the best place to do it? A note on the diagram, something in the documentation? Something else?

@ruebot ruebot added the ux label Mar 25, 2019


ianmilligan1 commented Mar 25, 2019

Hmm. Maybe a note in the documentation as well as a hover-over question mark icon to display some help text like we do with the derivatives?


greebie commented Mar 25, 2019

Could we be specific about what is the case (we capture every domain and create an edge for every link we find in the web pages)? That is a limitation of the network graphs, since I think people imagine the archives contain everything in the Wayback Machine. (That would be really nice, of course!)


ianmilligan1 commented Mar 25, 2019

Just that what is being visualized is the domains that are captured as well as the domains that they link to (which may or may not be in the actual web archived collection).


ianmilligan1 commented Mar 25, 2019

What about something like this?

[Screenshot: proposed explanatory text for the network diagram (Screen Shot 2019-03-25 at 11.04.28 AM)]


greebie commented Mar 25, 2019

That works for me!


ruebot commented Mar 25, 2019

@ianmilligan1 I like that!

@edsu does that work?


edsu commented Mar 25, 2019

Thanks for hearing this part of the presentation, and dropping it in here. You guys are awesome. I like the explanation.

I guess I was imagining (at least) two different types of users of this view.

  • Archivists might like to see what was linked to but not crawled, because it could help them build their collections.
  • Researchers who are trying to understand the content might not care too much about what was archived, and are more interested in seeing the relationships regardless of whether they were crawled. Although I guess seeing what was not crawled could help inform other visualizations, like text analysis, etc.

Maybe it would need to be two views? It would be nice if the underlying derivative Gephi file had a property indicating whether it was crawled or not. Then it could be easy for people to examine...


greebie commented Mar 25, 2019

Adding a "crawled" or "domain"=1 attribute to the gexf would not be too expensive or difficult. Might be worth considering something in the sigmaJS to indicate a crawl as well (change the text size and/or colour? or the node shape?).
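For reference, GEXF 1.2 expresses per-node attributes by declaring them once and then referencing them from each node. A hand-written sketch of what a crawled flag could look like (ids, labels, and values are illustrative, not AUT's actual output):

```xml
<graph defaultedgetype="directed">
  <!-- Declare the boolean attribute once for the node class. -->
  <attributes class="node">
    <attribute id="0" title="crawled" type="boolean"/>
  </attributes>
  <nodes>
    <node id="a" label="archived.example">
      <attvalues><attvalue for="0" value="true"/></attvalues>
    </node>
    <node id="b" label="linked-only.example">
      <attvalues><attvalue for="0" value="false"/></attvalues>
    </node>
  </nodes>
  <edges>
    <edge id="0" source="a" target="b"/>
  </edges>
</graph>
```

Gephi reads declared attributes like this into its data laboratory, so users could then filter or recolour on the flag without any SigmaJS changes.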


ianmilligan1 commented Mar 27, 2019

My inclination is to not overcomplicate the sigmaJS, as we're fairly limited in what we can add there.

In terms of a crawled attribute or something, that makes sense. Or maybe just noting which nodes are origins, in which case they're part of the crawl, as opposed to destinations?


edsu commented Mar 27, 2019

It makes sense not to overcomplicate the sigmaJS. Maybe I'm going out on a limb, but I think most archivists would want to see what is actually in the collection, rather than a mixture of what is there and what isn't. We have similar quandaries in DocNow, where we have vis elements that ought to behave slightly differently based on the audience (researcher vs. archivist).

I think the edge already has a source and a target in the gexf file. A target could be the source of another edge though. Perhaps this isn't simple, and would require post-processing the graph...

Just out of curiosity, does the SigmaJS data get created as an artifact of the processing pipeline? Or are the Gephi files in some way used to generate it?
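The post-processing idea above (deriving a crawled flag from edge sources) could be sketched with Python's stdlib XML parser. The sample graph, the function names, and the "edge source implies crawled" rule are all illustrative assumptions, not AUT's actual output or logic:

```python
# Sketch: flag each node in a GEXF file as "crawled" if it appears
# as the source of at least one edge. Illustrative only.
import xml.etree.ElementTree as ET

NS = "http://www.gexf.net/1.2draft"

def q(tag: str) -> str:
    # Qualify a tag name with the GEXF namespace for ElementTree lookups.
    return f"{{{NS}}}{tag}"

def crawled_flags(gexf_text: str) -> dict:
    # Map node id -> True if the node is an edge source (i.e. was crawled).
    graph = ET.fromstring(gexf_text).find(q("graph"))
    sources = {e.get("source") for e in graph.find(q("edges")).findall(q("edge"))}
    return {n.get("id"): n.get("id") in sources
            for n in graph.find(q("nodes")).findall(q("node"))}

sample = """<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="a" label="archived.example"/>
      <node id="b" label="linked-only.example"/>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b"/>
    </edges>
  </graph>
</gexf>"""

print(crawled_flags(sample))  # {'a': True, 'b': False}
```

Writing the flag back into the file would additionally mean declaring a boolean node attribute and appending an attvalue per node, per the GEXF schema.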


ianmilligan1 commented Mar 27, 2019

I think that's probably true, but as you've noted, researchers will also want to see what isn't there as well. So I think providing options in the Gephi file might be a good compromise here?


ruebot commented Mar 27, 2019

@edsu #146 is a long running issue I've been working on porting from Python to Ruby. It's awful. I might just make exec calls out to the Python script and call it a day. Anyway, I think that visualization would be what you're looking for.


ruebot commented Mar 27, 2019

Just out of curiosity, does the SigmaJS data get created as an artifact of the processing pipeline? Or are the Gephi files in some way used to generate it?

The SigmaJS viz comes from the gexf file that is created by GraphPass during the derivative generation pipeline. If you want me to point you to some of the code, let me know.


ianmilligan1 commented Mar 27, 2019

Ahhh, good point @ruebot re #146, that would give a good sense of what's in the collection (as opposed to relying on the network diagram to see).


edsu commented Mar 27, 2019

I think y'all should feel like you can close this ticket. Especially since it seems like #146 covers a known issue.


greebie commented Mar 27, 2019

#146 does seem like the better option. After looking at AUT a bit, it may be slightly more difficult to include the crawled=True attributes than I thought. The main issue is that the many approaches to networks we take in aut mean we'd have to make changes in multiple places (hashed vs. proper ids; GraphX vs. flatmap over tuples; gexf vs. GraphML, etc.) or risk inconsistent outputs.


ianmilligan1 commented Mar 27, 2019

Heh, thanks @edsu, but I wouldn't sell this issue short. I think at a minimum we should add some helper text explaining the visualization, and I do like the idea of letting people filter in the Gephi file.


ruebot commented Mar 27, 2019

Thanks @edsu!

@ianmilligan1 want to put in a PR with your work when you get a chance, then I'll start working on #146 again.


ianmilligan1 commented Mar 27, 2019

Sounds good! (Timeline all depends on whether I make the standby list on a flight this afternoon or not... heh.) Thanks @edsu @ruebot @greebie for your thoughts on this important ticket.

@ruebot ruebot closed this in #277 Mar 27, 2019

ruebot added a commit that referenced this issue Mar 27, 2019

Explaining graph visualization, partially resolves #275 (#277)
* Explaining graph viz, partially resolves #275
* Fleshes out the Gephi files documentation as well

ruebot commented Mar 27, 2019

Deployed #277 to production.
