Leverage links with the "cite-as" relation type #58

Open
hvdsomp opened this issue Dec 20, 2019 · 6 comments


@hvdsomp hvdsomp commented Dec 20, 2019

RFC 8574 defines the "cite-as" link relation type, which can, among other things, be used to point from a landing page URI to the URI that should be used for citation purposes. A major motivation for defining the relation type was research showing that a very significant number of resources that have a DOI are cited not by their DOI but by their landing page URI [1].

A link with the "cite-as" relation type would typically be provided in the HTTP Link header of the landing page. For example, using the landing page https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891 for a dataset in the DANS EASY collection:

curl -I https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891

HTTP/1.1 200 OK
Date: Fri, 20 Dec 2019 08:28:07 GMT
Link: <https://doi.org/10.17026/dans-zm3-nkk4> ; rel="cite-as"
Content-Type: text/html;charset=UTF-8
Content-Language: en-US
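A client can pick the citation URI out of such a header with very little code. The following is a minimal sketch using only the Python standard library; the function names `parse_link_header` and `cite_as_targets` are hypothetical, and the parser assumes that parameter values contain no commas or semicolons, which holds for the header shown above but not for every valid Link header.

```python
import re

def parse_link_header(value):
    """Split a Link header value into a list of dicts, one per link.

    Simplification (assumption): parameter values contain no ',' or ';'.
    """
    links = []
    # Each link is <target> followed by zero or more "; name=value" parameters.
    for target, params in re.findall(r'<([^>]*)>((?:\s*;\s*[^;,<]+)*)', value):
        link = {"url": target}
        for param in params.split(";"):
            if "=" in param:
                key, val = param.split("=", 1)
                link[key.strip().lower()] = val.strip().strip('"')
        links.append(link)
    return links

def cite_as_targets(value):
    """Return the targets of all links whose rel includes "cite-as"."""
    return [
        link["url"]
        for link in parse_link_header(value)
        # rel may hold several space-separated relation types.
        if "cite-as" in link.get("rel", "").lower().split()
    ]

header = '<https://doi.org/10.17026/dans-zm3-nkk4> ; rel="cite-as"'
print(cite_as_targets(header))
# prints ['https://doi.org/10.17026/dans-zm3-nkk4']
```

Returning a list rather than a single URL leaves room for landing pages that advertise more than one "cite-as" link.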

The "cite-as" approach is part of the broader Signposting effort aimed at clarifying common landing page patterns to machines by using utterly simple REST/HATEOAS techniques. Early Signposting adopters typically implement "cite-as" with priority. As such:

  • It would be nice if the CiteAs service took "cite-as" links provided on landing page URIs into account
  • It would be great if the CiteAs effort promoted the use of "cite-as" links

[1] Van de Sompel, H., Klein, M., and Jones, S. M. (2016) Persistent URIs Must Be Used To Be Persistent. Poster at WWW 2016; arXiv preprint at http://arxiv.org/abs/1602.09102

Author

@hvdsomp hvdsomp commented Dec 20, 2019

Note that other Signposting patterns can also be leveraged by the citeas.org service. Links with the "describedby" relation type provided by the landing page URI lead to bibliographic information about the artifact the landing page is about. Using the same landing page as above, the complete HTTP Link header is as follows:

curl -I https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891

HTTP/1.1 200 OK
Date: Fri, 20 Dec 2019 08:28:07 GMT
Link: <https://doi.org/10.17026/dans-zm3-nkk4> ; rel="cite-as",
 <https://doi.org/10.17026/dans-zm3-nkk4> ; rel="describedby" ; type="application/vnd.datacite.datacite+xml",
 <https://doi.org/10.17026/dans-zm3-nkk4> ; rel="describedby" ; type="application/vnd.citationstyles.csl+json"
Content-Type: text/html;charset=UTF-8
Content-Language: en-US

The bibliographic information that is linked to is - in this case - obviously also available from the DOI API. But that requires the client to have out-of-band knowledge about that API. With the above "describedby" links, a client can just follow its nose to the metadata.
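To illustrate the follow-your-nose idea, here is a rough sketch of how a client might turn the "describedby" links into a metadata request. It assumes the links have already been parsed into dicts with "url", "rel" and "type" keys (the shape that requests.utils.parse_header_links produces), and that the target honours the Accept header, since both "describedby" links above share one URI. The helper name `metadata_request` is hypothetical.

```python
from urllib.request import Request

CSL_JSON = "application/vnd.citationstyles.csl+json"

def metadata_request(links, media_type=CSL_JSON):
    """Build a request for the first "describedby" link of the given type.

    Returns None when the landing page advertises no such link.
    """
    for link in links:
        rels = link.get("rel", "").split()
        if "describedby" in rels and link.get("type") == media_type:
            # Send the advertised type as Accept, because one target URI
            # may serve several representations (assumption).
            return Request(link["url"], headers={"Accept": media_type})
    return None

# The three links from the header above, already parsed.
links = [
    {"url": "https://doi.org/10.17026/dans-zm3-nkk4", "rel": "cite-as"},
    {"url": "https://doi.org/10.17026/dans-zm3-nkk4", "rel": "describedby",
     "type": "application/vnd.datacite.datacite+xml"},
    {"url": "https://doi.org/10.17026/dans-zm3-nkk4", "rel": "describedby",
     "type": CSL_JSON},
]

req = metadata_request(links)
print(req.full_url, req.get_header("Accept"))
# Fetching is left out here: urllib.request.urlopen(req) needs network access.
```

The client only needs to know the media type it wants; the location of the metadata comes from the landing page itself.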

Collaborator

@caseydm caseydm commented Jan 6, 2020

This is great! I am close to implementing it within CiteAs. One issue I have found with the above example is that there are actually two cite-as relations returned for that link:

<https://doi.org/10.17026/dans-zm3-nkk4> ; rel="cite-as",
<http://www.persistent-identifier.nl?identifier=urn%3Anbn%3Anl%3Aui%3A13-z7og-v8> ; rel="cite-as"

Requests has good support for these headers, but in this case it returns the second link:

import requests
r = requests.get('https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891')
print(r.links['cite-as']['url'])
http://www.persistent-identifier.nl?identifier=urn%3Anbn%3Anl%3Aui%3A13-z7og-v8

I think the DOI may be better but that would involve manual parsing of the headers. I prefer to use requests as it will provide consistent results. Do you think the second result is bad? It's still pointing to a cite-as reference that the author apparently intended.


@hvdsomp hvdsomp commented Jan 6, 2020

To me this seems like a bug in Requests: it looks like it is unable to handle multiple links with the same link relation type. That means a similar problem would occur for Link headers with, e.g., multiple “alternate” links (instead of “cite-as” links), which is rather common. Maybe an issue should be posted for Requests in this regard?

BTW, in most current implementations there will only be a single “cite-as” link, typically pointing at a DOI. But having multiple persistent identifiers for a resource is not really uncommon. So, it would be better to tackle the issue at the source, which - I speculate - is Requests.
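One plausible explanation for the observed behaviour, sketched here as an assumption about the cause rather than a statement about Requests internals: if parsed links are stored in a dict keyed by relation type, a later link silently replaces an earlier one with the same "rel". The helper names `index_by_rel` and `group_by_rel` are hypothetical; the second shows the list-valued alternative.

```python
def index_by_rel(links):
    """Index parsed links by relation type, one link per key."""
    indexed = {}
    for link in links:
        key = link.get("rel") or link.get("url")
        # A later link with the same rel overwrites the earlier one.
        indexed[key] = link
    return indexed

def group_by_rel(links):
    """Index parsed links by relation type, keeping every link."""
    grouped = {}
    for link in links:
        # rel may hold several space-separated relation types.
        for rel in link.get("rel", "").split():
            grouped.setdefault(rel, []).append(link)
    return grouped

# The two "cite-as" links reported for the DANS EASY landing page.
links = [
    {"url": "https://doi.org/10.17026/dans-zm3-nkk4", "rel": "cite-as"},
    {"url": "http://www.persistent-identifier.nl?identifier=urn%3Anbn%3Anl%3Aui%3A13-z7og-v8",
     "rel": "cite-as"},
]

# Only the second link survives in the one-link-per-key index.
print(index_by_rel(links)["cite-as"]["url"])
# Both links survive when values are lists.
print([link["url"] for link in group_by_rel(links)["cite-as"]])
```

With list values, the same approach would also cover headers that carry several “alternate” links.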


@caseydm caseydm commented Jan 7, 2020

Yes I'm with you. Requests should be returning a list rather than a single item. I'll implement this with Requests but also submit an issue so that it hopefully gets fixed. Or maybe they have a suggestion on how to handle these situations. It would be nice to iterate through the list and look for a DOI.


@caseydm caseydm commented Jan 13, 2020

The first version of this is implemented on the site:

http://citeas.org/cite/https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891

I would love to see some additional examples if you have any. After using it a few times I think it needs to be refactored into its own provenance section, so I am planning to do that next. But I wanted to get the initial feature tested and working.

As for looking at multiple links, I went to submit an issue for requests but found a way to parse the links with their utility function called parse_header_links. So this works to get the list of cite-as links so the software can find the preferred DOI link:

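import requests
url = 'https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:61891'
r = requests.get(url)
# parse_header_links returns one dict per link, so repeated relation types are all kept
header_links = requests.utils.parse_header_links(r.headers.get('link', ''))
cite_as_links = [link for link in header_links if link.get('rel') == 'cite-as']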
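
Picking the preferred DOI link out of cite_as_links could then look roughly like the sketch below. It is not the CiteAs implementation: the helper names `is_doi_url` and `preferred_cite_as` are hypothetical, and the set of hosts treated as DOI resolvers is an assumption.

```python
from urllib.parse import urlparse

# Assumption: hosts treated as DOI resolvers.
DOI_HOSTS = {"doi.org", "dx.doi.org"}

def is_doi_url(url):
    """True if the URL points at a DOI resolver and its path looks like a DOI."""
    parts = urlparse(url)
    return (parts.hostname or "").lower() in DOI_HOSTS and parts.path.startswith("/10.")

def preferred_cite_as(cite_as_links):
    """Return the first DOI target if any, else the first link, else None."""
    urls = [link["url"] for link in cite_as_links]
    for url in urls:
        if is_doi_url(url):
            return url
    return urls[0] if urls else None

# Same two links as in the thread, with the non-DOI link listed first.
cite_as_links = [
    {"url": "http://www.persistent-identifier.nl?identifier=urn%3Anbn%3Anl%3Aui%3A13-z7og-v8",
     "rel": "cite-as"},
    {"url": "https://doi.org/10.17026/dans-zm3-nkk4", "rel": "cite-as"},
]
print(preferred_cite_as(cite_as_links))
# prints https://doi.org/10.17026/dans-zm3-nkk4
```

Falling back to the first link keeps non-DOI persistent identifiers usable when no DOI is advertised.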

@hvdsomp hvdsomp commented Jan 13, 2020

Cool! More examples are available via the Early Adopters page of Signposting, see http://signposting.org/adopters/
