Web Science and Digital Libraries Research Group: 2019-05-29: In The Battle of the Surrogates: Social Cards Probably Win

Wednesday, May 29, 2019


Web archive collections provide meaning by sampling specific resources from the web. We want to summarize these resources by sampling mementos from those collections and visualizing them as a social media story.

On Tuesday, we released our latest pre-print "Social Cards Probably Provide Better Understanding of Web Archive Collections". My work builds on AlNoamany's use of social media storytelling to visualize and summarize web archive collections. In previous blog posts I discussed different storytelling services. A key component of their ability to convey understanding is the surrogate, a small visualization of a web page that provides a summary of that page, like the surrogate within the tweet shown below. However, there are many types of surrogates. We want to use a group of surrogates together as a story to summarize a web archive collection. Which type of surrogate works best for helping users understand the underlying collection?

An annotated tweet containing a surrogate referring to one of my prior blog posts.

Dr. Nelson, Dr. Weigle, and I iterated for several months to produce this study. Using Mechanical Turk, we evaluated six different surrogate types and discovered that the social card, as produced by our MementoEmbed project, probably provides better understanding of the overall collection than the surrogate currently employed in the Archive-It interface.

How Much Information Do We Get From the Surrogates on the Archive-It Collection Page?

As seen in this screenshot, each Archive-It collection page contains surrogates of its seeds. For most collections, how much information do the surrogates provide the user about the collection? (link to collection in screenshot)

Archive-It allows curators to supply optional metadata on seeds. We analyzed how much information might be available to a user viewing such metadata and found that 54.60% of all Archive-It seeds have no metadata. As shown in the scatter plot below, we discovered that, as the number of seeds in a collection increases, the average number of metadata fields decreases.

As the number of seeds increases, we see a decrease in the mean number of metadata fields per collection.
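The two quantities discussed above (the share of seeds with no metadata, and the mean number of metadata fields per collection against collection size) can be computed with a few lines of Python. This is only a minimal sketch: the function name, the input layout (a dictionary of collections, each holding one metadata dictionary per seed), and the toy data are assumptions for illustration, not the code or data used in the paper.

```python
from statistics import mean


def metadata_summary(collections):
    """Summarize seed metadata.

    `collections` is a hypothetical structure: it maps a collection id to a
    list of per-seed metadata dicts (an empty dict means no metadata).
    """
    all_seeds = [seed for seeds in collections.values() for seed in seeds]
    without_metadata = sum(1 for seed in all_seeds if len(seed) == 0)
    # Percentage of all seeds that carry no metadata fields at all.
    share_without = 100.0 * without_metadata / len(all_seeds)

    # For each collection: (number of seeds, mean metadata fields per seed).
    per_collection = {
        cid: (len(seeds), mean(len(seed) for seed in seeds))
        for cid, seeds in collections.items()
        if seeds
    }
    return share_without, per_collection


# Toy data, invented purely to exercise the function.
toy = {
    "small": [{"title": "A", "description": "B"}, {"title": "C"}],
    "large": [{}, {}, {"title": "D"}, {}],
}
share, stats = metadata_summary(toy)
print(share)
print(stats)
```

Plotting the first element of each tuple against the second would give a scatter plot of the same kind as the one described above.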

Without this metadata, an Archive-It surrogate consists of the seed URL, the number of mementos, and the first and last memento-datetimes, as shown below. Is this enough for a user to glean meaning about the underlying documents?

A minimal Archive-It surrogate

We adapted some of Lulwah Alkwai's recent work (link forthcoming) and determined that seed URLs still contain some information that may help with collection understanding. An Euler diagram counting the URLs that contain some of this information is shown below.

An Euler diagram showing the number of Archive-It seed URLs that contain different categories of information.
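To give a feel for the kind of information a bare URL can carry, here is a small Python sketch that pulls a few clues out of a URL. The three clues it extracts (host name, words in the path, and year-like numbers) are hypothetical choices made for this example; they are not the information classes defined in the paper, and the example URL is made up.

```python
import re
from urllib.parse import urlparse


def url_clues(url):
    """Extract a few human-readable clues from a URL (illustrative only)."""
    parsed = urlparse(url)
    # Alphabetic runs of three or more letters in the path, lowercased.
    path_words = [w.lower() for w in re.findall(r"[A-Za-z]{3,}", parsed.path)]
    # Four-digit numbers in the path that look like years (19xx or 20xx).
    years = re.findall(r"(?<!\d)(?:19|20)\d{2}(?!\d)", parsed.path)
    return {"host": parsed.hostname, "path_words": path_words, "years": years}


# A made-up URL used only to exercise the function.
clues = url_clues("https://example.com/news/2019/05/social-cards-study.html")
print(clues)
```

Even with no curator-supplied metadata, a URL like this one hints at a topic and a date, which is the intuition behind counting how many seed URLs contain such information.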

In the paper, we also highlight the top 10 metadata fields in use and define the different information classes found in seed URLs.

In a Story, Which Surrogate Best Supports Collection Understanding?

Brief Methodology

The figures below show the different types of surrogates that we displayed to participants. Each story consisted of a set of mementos visualized as surrogates in a given order. We varied the surrogates but did not change the order of the mementos. The mementos for each story were chosen by human curators in AlNoamany's previous work and are available as a Figshare dataset. In our pre-print, we chose stories from four different collections to display to participants.

Our first surrogate type is the de facto Archive-It interface that users would encounter when trying to understand a web archive collection. We used our own Archive-It Utilities to gather the metadata from the Archive-It collection to generate these surrogates.

A screenshot of part of an example story using surrogates from the Archive-It interface.

Our second is the browser thumbnail, commonly used by web archives. We employed MementoEmbed to generate these thumbnails.

A screenshot of an example story using browser thumbnails.

Next was the social card, as produced by MementoEmbed.

A screenshot of an example story using social cards.

The next three surrogates we displayed to users were combinations of social cards and browser thumbnails.

A screenshot of an example story using social cards next to browser thumbnails.

A screenshot of an example story using social cards, but with thumbnails instead of striking images

A screenshot of an example story using social cards, but where thumbnails appear when the user hovers over the striking image.

We showed each participant a story using a given surrogate type for 30 seconds. We then refreshed the web page and presented six surrogates of the same type as the story that they had just viewed. Two of these surrogates represented pages from the same collection, and the other four were drawn from different collections. We asked participants to select the two surrogates, out of the six, that they believed belonged to the same collection. We recorded all mouse hovers and clicks over links and images.
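The selection task described above can be sketched in Python as follows. This is a hedged illustration rather than our actual experiment code: the function names, the shuffling of the six options, the scoring rule (number of correct picks), and the placeholder memento identifiers are all assumptions made for this example.

```python
import random


def build_question(same_collection, other_collections, rng):
    """Pick 2 surrogates from the story's collection and 4 from elsewhere."""
    correct = rng.sample(same_collection, 2)
    distractors = rng.sample(other_collections, 4)
    options = correct + distractors
    rng.shuffle(options)  # assumed: display order is randomized
    return options, set(correct)


def score_answer(selected, correct):
    """Count how many of the participant's selections are correct (0 to 2)."""
    return len(set(selected) & correct)


# Placeholder identifiers standing in for mementos (URI-Ms).
rng = random.Random(42)
options, correct = build_question(
    ["urim-a1", "urim-a2", "urim-a3"],
    ["urim-b1", "urim-c1", "urim-d1", "urim-e1", "urim-f1"],
    rng,
)
print(options)
print(score_answer(list(correct), correct))
```

Seeding the random generator keeps a given question reproducible, which matters if the same six options must be shown under every surrogate type.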

Brief Results

Our results show no significant difference in response times at p < 0.05. They do show a difference in answer accuracy for social cards vs. the Archive-It interface at p = 0.0569 and for social cards side-by-side with thumbnails at p = 0.0770; both values are above 0.05 but below 0.1. The paper further details these results overall and per collection. Even though our use case is different, our results are similar to those in a 2013 IR study performed by Capra et al.
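To make the thresholds concrete, the short Python sketch below checks each reported p-value against two conventional significance levels. The helper function and the choice of 0.05 and 0.1 as the levels to check are assumptions for illustration; the statistical tests themselves are described in the paper and are not reproduced here.

```python
def significance_level(p_value, thresholds=(0.05, 0.1)):
    """Return the strictest threshold that p_value falls below, or None."""
    for alpha in sorted(thresholds):
        if p_value < alpha:
            return alpha
    return None


# The two accuracy p-values reported in this post.
reported = {
    "social cards vs. Archive-It interface": 0.0569,
    "social cards side-by-side with thumbnails": 0.0770,
}
levels = {name: significance_level(p) for name, p in reported.items()}
print(levels)
```

Both comparisons clear the 0.1 level but not the 0.05 level.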

More users interacted with thumbnails than with any other surrogate element. We assume that these users were attempting to zoom in and see the thumbnail better. Also, more users clicked on thumbnails to read the web page behind the surrogate than they did for social cards. In fact, social cards had the fewest participants interacting with them compared to other surrogate types. We assume that this means that most users were satisfied with the information provided by the social card and did not feel the need to interact as much.

The Future

In this post, I briefly summarized our recent pre-print "Social Cards Probably Provide Better Understanding of Web Archive Collections." This is not the end, however. We are planning more studies to further examine different types of storytelling with future participants. Our work has implications not only for our own web archive summarization efforts, but for any storytelling tool that employs surrogates.

-- Shawn M. Jones

Thank you @assertpub for letting us know that this pre-print was the #1 paper on arXiv in the Digital Libraries category for May 29, 2019.

"Social Cards Probably Provide For Better Understanding Of Web Archive Collections" is the #1 paper on Arxiv today in digital libraries. Congrats @phonedude_mln @weiglemc. See it at -> https://t.co/sFprFLIGvr and https://t.co/p2JQdoRMpD. Please retweet.
— Assert Arxiv (@assertpub_) May 29, 2019
