
Who is Asking - Humans and Machines Experience a Different Scholarly Web


Who is Asking? Humans and Machines Experience a Different Scholarly Web
Presentation at iPres 2019

Published in: Internet

  1. Who is Asking? Humans and Machines Experience a Different Scholarly Web. iPres, Amsterdam, The Netherlands, September 17 2019. Martin Klein, Los Alamos National Laboratory (@mart1nkle1n), with Lyudmila Balakireva (LANL) and Harihar Shankar (98point6). [Figures: response code classes (2xx, 3xx, 4xx, 5xx) for HEAD, GET, GET+, Chrome, and IA Crawl]
  2. Imagine this is your phone…
  3. …and you are calling 112…
  4. …this person responds…
  5. …and you are getting the help you need!
  6. What if this is your phone…
  7. …and you are calling 112…
  8. …this other person responds…
  9. …and some “help” is coming!
  10. But what if this is your phone…
  11. …and you are calling 112…
  12. …no one responds…
  13. …and you don’t get any help!
  14. No more scary 112 calls!
      • Phones are web clients
      • 112 calls are HTTP requests against DOIs
      • Regardless of the web client you use, would you not expect the same response from a web server responding to the request against a DOI?
  15. Idea…
      • Comparative study investigating scholarly publishers’ responses
        • To common HTTP requests
        • Against DOIs
        • Using multiple different web clients, resembling
          • Machines browsing
          • Humans browsing
  16. Why is this relevant?
      • Archival use case
        • Libraries, archives, preservation orgs capturing/archiving scholarly resources on the web
      • Dynamic nature of the web
        • Requires continuous updating of crawling frameworks
      • If we can discover and learn patterns
        • Crawling and archiving frameworks could be “smarter”
  17. How does this work? 10.1007/978-3-540-87599-4_38
  18. How does this not work? 10.1007/978-3-540-87599-4_38
  19. How does this work?
  25. DOI dataset
      • Gathering a representative sample is not trivial!
      • Internet Archive conducts crawls of the scholarly domain
        • June 2018: 93 million DOIs
      • Obtained WARC files and extracted DOI redirect chain
      • Investigate publisher distribution
        • Final link of redirect chain and extract host, e.g.: [example URL and domain not preserved]
      • Randomly pick 100 DOIs from the 100 most frequent domains
        • 10,000 DOIs
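The sampling step on the DOI dataset slide could be sketched roughly as follows. This is an illustration only, not the authors' code: the function name `sample_dois_by_domain`, its parameters, and the input shape (a dict from DOI to the final URL of its redirect chain) are assumptions, and the host is taken directly from the URL rather than reduced to a registered domain.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def sample_dois_by_domain(final_urls, n_domains=100, per_domain=100, seed=0):
    """Pick up to `per_domain` random DOIs from each of the `n_domains` most frequent hosts.

    final_urls: dict mapping a DOI to the final URL of its redirect chain
    (hypothetical input shape, assumed for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_host = defaultdict(list)
    for doi, url in final_urls.items():
        host = urlsplit(url).hostname  # host of the last link in the chain
        if host:
            by_host[host].append(doi)
    # Rank hosts by how many DOIs end up there, most frequent first.
    ranked = sorted(by_host, key=lambda h: len(by_host[h]), reverse=True)
    sample = {}
    for host in ranked[:n_domains]:
        dois = sorted(by_host[host])
        sample[host] = rng.sample(dois, min(per_domain, len(dois)))
    return sample
```

With 100 hosts and 100 DOIs per host, this yields the 10,000 DOIs mentioned on the slide, provided every selected host has at least 100 DOIs.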
  26. Domain distribution [Figure: axes Hosts and Frequency]
  27. Web clients and HTTP requests 1/4
      • HEAD request
        • Server responds with response headers
        • *but no* response body
      • Client: cURL
  29. Web clients and HTTP requests 2/4
      • GET request
        • Server responds with response headers
        • *and* response body
      • Client: cURL
  31. Web clients and HTTP requests 3/4
      • GET+
        • GET request with request headers
        • User Agent (desktop Chrome browser)
        • Specified connection timeout
        • Specified maximum number of redirects
        • Cookies accepted and stored
        • Insecure connections allowed
      • Client: cURL
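The three cURL-based request variants above (HEAD, GET, GET+) might be expressed as command lines along these lines. This is a sketch under stated assumptions: the helper `curl_command`, the User-Agent string, the timeout and redirect limits, the cookie file name, and the choice to follow redirects through `https://doi.org/` are all illustrative, since the slides do not give the exact options used. The cURL flags themselves are standard.

```python
import os

# Assumed desktop Chrome User-Agent string; the slides do not state the exact value.
CHROME_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36")


def curl_command(doi, method, cookie_jar="cookies.txt"):
    """Return the argument list for one request variant (hypothetical helper)."""
    url = "https://doi.org/" + doi
    base = ["curl", "--silent", "--location"]  # follow the DOI redirect chain
    if method == "HEAD":
        # HEAD: response headers only, no body.
        return base + ["--head", url]
    # For GET variants, print headers to stdout and discard the body.
    get = base + ["--dump-header", "-", "--output", os.devnull]
    if method == "GET":
        return get + [url]
    if method == "GET+":
        return get + [
            "--user-agent", CHROME_UA,   # look like a desktop Chrome browser
            "--connect-timeout", "30",   # assumed value
            "--max-redirs", "20",        # assumed value
            "--cookie-jar", cookie_jar,  # store cookies set along the chain
            "--cookie", cookie_jar,      # send stored cookies back
            "--insecure",                # allow insecure connections
            url,
        ]
    raise ValueError("unknown method: " + method)
```

A list built this way can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to issue the request and collect the headers of every hop.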
  33. Web clients and HTTP requests 4/4
      • Chrome:
        • GET request via Selenium Webdriver controlled browser
      • Client: Chrome
  35. Regarding response headers, RFC 7231 states: “The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET...”.
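One way to check a server against the RFC 7231 expectation quoted above is to compare the header field names returned for HEAD and for GET on the same URL. The following is a minimal sketch; `header_field_diff` is a hypothetical helper, and it compares field names only (case-insensitively), not values. The same RFC sentence goes on to allow payload header fields to be omitted from a HEAD response, so a difference is not automatically a violation.

```python
def header_field_diff(head_headers, get_headers):
    """Compare header field names from a HEAD and a GET response.

    Both arguments are dicts of header name -> value (assumed input shape).
    Field names are case-insensitive in HTTP, so they are lowercased first.
    """
    head = {name.lower() for name in head_headers}
    get = {name.lower() for name in get_headers}
    return {
        "only_in_get": sorted(get - head),
        "only_in_head": sorted(head - get),
    }
```

For example, a server that leaves out `Content-Length` on HEAD would show up under `only_in_get`.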
  36. HTTP response codes
      • 2xx: Success
      • 3xx: Redirection
      • 4xx: Client error
      • 5xx: Server error
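Grouping observed status codes into the classes listed above is a one-line computation. The sketch below assumes a hypothetical input of final status codes per client; the names `status_class` and `tally_by_class` are illustrative.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    if not 100 <= code <= 599:
        raise ValueError("not an HTTP status code: %r" % code)
    return "%dxx" % (code // 100)


def tally_by_class(final_codes):
    """final_codes: dict mapping a client name to a list of final status codes."""
    return {client: Counter(status_class(c) for c in codes)
            for client, codes in final_codes.items()}
```

Applied to the final response of each redirect chain, this gives one count per class and client.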
  37. Response codes of last link in redirect chain [Figure: frequency of response codes 200, 301, 302, 303, 400, 401, 403, 404, 405, 406, 500, 502, 503, 509, 520 for HEAD, GET, GET+, Chrome, and IA Crawl]
  38. Response codes of last link in redirect chain by DOI [Figure: 2xx, 3xx, 4xx, 5xx for HEAD, GET, GET+, Chrome, and IA Crawl]
  39. Frequency of number of redirects [Figure: frequency of 1, 2, 3, 4, 5, 6, 7, 8, 14, and 21 redirects for HEAD, GET, GET+, Chrome, and IA Crawl]
  40. Frequency of number of redirects for final 200s [Figure: frequency of 2, 3, 4, 5, 6, 7, 8, and 14 redirects for HEAD, GET, GET+, Chrome, and IA Crawl]
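The redirect frequencies shown on the two slides above could be computed from recorded redirect chains roughly like this. The sketch assumes each chain is a list of `(url, status)` hops in order, and counts the number of redirects as the number of hops minus one; both the input shape and that definition are assumptions, not taken from the talk.

```python
from collections import Counter


def redirect_histogram(chains, only_final_200=False):
    """Count how many chains have each number of redirects.

    chains: list of redirect chains, each a list of (url, status) tuples
    (hypothetical input shape). A chain of n hops counts as n - 1 redirects.
    """
    hist = Counter()
    for chain in chains:
        if not chain:
            continue  # nothing was recorded for this DOI
        if only_final_200 and chain[-1][1] != 200:
            continue  # keep only chains that end in a 200 response
        hist[len(chain) - 1] += 1
    return hist
```

Running it once per client, with and without `only_final_200=True`, would give the data behind both charts.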
  41. Take-aways & next steps
      • Scholarly publishers respond differently to requests against DOIs
        • Depending on HTTP client and request method
      • Implications for crawlers:
        • Test different combinations of clients and request methods
        • Pretend to be as human as possible
      • Repeat from within LANL network with subscriptions to publishers’ content
      • Repeat at a later point in time, check for changes in redirection chains
