Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes and Updates - M #1200

Merged
merged 14 commits into from Jan 7, 2017

Conversation

@zuphilip
Copy link
Collaborator

commented Dec 27, 2016

Some comments are also in the individual commit messages.

zuphilip added some commits Dec 27, 2016

Complete rewrite of Mainichi Daily News.js
 * I also simplified the target expression, because I couldn't find examples for the other patterns. CC @fbennett
 * I couldn't find the examples anymore, but I add new examples.
 * It seems that the Japenese websites are sometimes (every second time?) not working during the test, which might be just some problems in loading the page completely...
Update Musee du Louvre.js
This is a minimalistic update which just
ensures that the translator is working again
and the tests will not fail anymore.
Rewrite and rename Microsoft Academic (Search)
* Renaming: The original Microsoft Academic Search has been completely
decommissioned and the website suggest to use the
Microsoft Academic instead: http://academic.research.microsoft.com/
* The test cases are not working when called from
Scaffold but everything seems to work fine when called
inside the browser.
* CC @aurimasv
@adam3smith
Copy link
Collaborator

left a comment

a couple of small issues. Looks good otherwise.

}
Z.debug("Somehow during automatic testing this does not work");
Z.debug("Setting the correct type therefore here manually for the test cases")
return 'journalArticle';

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

in its current form, I believe this will return false positives for new searches
https://academic.microsoft.com/#/news/41008148?nte=5&nts=0
as well as the homepage https://academic.microsoft.com/
I'd suggest restricting thetarget to
^https?://academic\\.microsoft\\.com/#/(search|detail)

(note also the missing ^ in your version)

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

Okay, I changed the target as you suggested.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

But the dynamic loading seems to be a larger problem. Because when I comment this line out (which was only there for testing) the detection on the single item pages are not working. As far as I can see all the data is delivered via JSON into the HTML template (?) which is loaded first, e.g.
dyn-loading

Any idea how to deal with this? I tried to use Z.monitorDOMChanges but couldn't get it working and actually I am not sure that the DOM really changes, or maybe they use some other mechanism...

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

not really. I do trial and error with monitor DOMChanges, too, so I don't think I'd be able to do any better than you.

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

I might have a solution now... Stay tuned.

{
"title": "Snapshot"
},
{

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

I don't think the proliferation of attachments is good or even acceptable, even if they're just links. Could we just take the first one?

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

We could save all these links in one note with some HTML, then they would still be clickable? Technically we can also just take the first one, but I don't know if the ordering is anything than arbitrarily. I don't have strong feelings about this, but maybe one of the reason to scrape info from a search engine are these different links or other additional info.

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

I think putting the links wrapped into a href= into a note makes sense (and they'd be clickable that way). I see that they're useful, but I don't think users will be pleased with 10+ links for items.

"title": "Détails des activités <i>La Dentellière</i> de Vermeer",
"extra": "Visites guidées",
"url": "http://www.louvre.fr/llv/activite/detail_evenement.jsp?CONTENT%3C%3Ecnt_id=10134198673390879&CURRENT_LLV_ACTIVITE%3C%3Ecnt_id=10134198673390879&FOLDER%3C%3Efolder_id=9852723696500927",
"abstractNote": "Stèle : déesse Ishtar . Provenance : Tell Ahmar, antique Til Barsip. Epoque néo-assyrienne, VIIIe siècle av. J.-C. - Département des Antiquités orientales",

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

fix spacing in abstracts

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

This comes directly from the corresponding meta tag:

<meta name="description" content="Stèle : déesse Ishtar . Provenance : Tell Ahmar, antique Til Barsip. Epoque néo-assyrienne, VIIIe siècle av. J.-C. - Département des Antiquités orientales" />

Moreover, I am not sure how exactly the correct spacing would look like here and if there is a general pattern for that... I guess we could update and improve more things in this translator, but currently I wanted to concentrate that it works again. Can we maybe postpone this and make in issue for it to look at it later?

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

Continued in #1206

item.attachments.push({
title: "Snapshot",
document: doc
});

This comment has been minimized.

Copy link
@zuphilip

zuphilip Dec 29, 2016

Author Collaborator

The saved snapshots are not really useful, because they don't contain any information.

This comment has been minimized.

Copy link
@adam3smith

adam3smith Dec 29, 2016

Collaborator

Fine to remove then.

Update Microsoft Academic.js
* Save links to source now in one note instead
of multiple attachments
* Restrict target expression
* Tweak the detection code for dynamic loading
of the website (this seems to work now but
it is quite fragile).
* The test cases will still fail.
* Update some spacing
@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 29, 2016

Please have a look at the new version.

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Dec 31, 2016

It looks good, but the MS Academic translator doesn't work correctly for me on Linux (tested in Chrome and Firefox): It works fine in Scaffold and it works when _re_loading the page, but it doesn't work when first going to MS Academic. In other words, going to
academic.microsoft.com and searching for "test" doesn't detect.
But then reloading the page or directly loading https://academic.microsoft.com/#/search?iq=%2540test%2540&q=test&filters=&from=0&sort=0 does.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 31, 2016

What happens if you directly start with https://academic.microsoft.com/#/detail/2084324324 and then click around the linked pages which stay on Microsoft Academic, i.e. journal, author pages, some other publications, reference, field of interests? Does this work for you as expected with the correct symbol detected?

When you start at https://academic.microsoft.com then the target will exclude the translator to do anything and the monitorDOMChanges will not be called in the first place. If this is the "only" problem, then we could try to tweak the target and maybe check more details in the detectWeb.

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Dec 31, 2016

What doesn't seem to work is going from /detail/ to /search/ e.g. when I start on
https://academic.microsoft.com/#/detail/2084324324 and then search for "test" at the top, the DOM changes aren't detected, i.e. it keeps detecting and importing the single item.
Clicking on a journal such as https://academic.microsoft.com/#/detail/161027966 does work, i.e. detects and shows multiples.

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 31, 2016

Okay, I see. In this case the monitored DOM part is not changing but a parent node is set to invisible display : none and the node containing the search result is set to display : block. It seems that another monitorDOMChanges can handle that...

@zuphilip

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 1, 2017

Okay, I extensively tested and adjusted the monitorDOMChanges. Everything works now smoothly for me. Please have a look at the new version.

@adam3smith adam3smith merged commit c9a994d into zotero:master Jan 7, 2017

@adam3smith

This comment has been minimized.

Copy link
Collaborator

commented Jan 7, 2017

Yes! Nicely done on the MS Academic site!

@zuphilip zuphilip deleted the zuphilip:fixes-M branch Jan 8, 2017

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Fixes and Updates - M (zotero#1200)
* Complete rewrite of Mainichi Daily News.js
** I also simplified the target expression, because I couldn't find examples for the other patterns. CC @fbennett
 * I couldn't find the examples anymore, but I add new examples.
 * It seems that the Japenese websites are sometimes (every second time?) not working during the test, which might be just some problems in loading the page completely...

* Update test in Max Planck Institute for the History of Science Virtual Laboratory Library.js
* Update MDPI Journals.js
* Update medes.js
* Rewrite Medium.js
* Update tests in mEDRA.js
* Update tests in Microbiology Society Journals.js
* Update Musee du Louvre.js
**This is a minimalistic update which just ensures that the translator is working again
and the tests will not fail anymore.
* Rewrite and rename Microsoft Academic (Search)
** Renaming: The original Microsoft Academic Search has been completely
decommissioned and the website suggest to use the
Microsoft Academic instead: http://academic.research.microsoft.com/
** The test cases are not working when called from
Scaffold but everything seems to work fine when called
inside the browser.

zuphilip added a commit to zuphilip/translators that referenced this pull request Mar 28, 2018

Fixes and Updates - M (zotero#1200)
* Complete rewrite of Mainichi Daily News.js
** I also simplified the target expression, because I couldn't find examples for the other patterns. CC @fbennett
 * I couldn't find the examples anymore, but I add new examples.
 * It seems that the Japenese websites are sometimes (every second time?) not working during the test, which might be just some problems in loading the page completely...

* Update test in Max Planck Institute for the History of Science Virtual Laboratory Library.js
* Update MDPI Journals.js
* Update medes.js
* Rewrite Medium.js
* Update tests in mEDRA.js
* Update tests in Microbiology Society Journals.js
* Update Musee du Louvre.js
**This is a minimalistic update which just ensures that the translator is working again
and the tests will not fail anymore.
* Rewrite and rename Microsoft Academic (Search)
** Renaming: The original Microsoft Academic Search has been completely
decommissioned and the website suggest to use the
Microsoft Academic instead: http://academic.research.microsoft.com/
** The test cases are not working when called from
Scaffold but everything seems to work fine when called
inside the browser.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.