Automatic translator testing as part of the continuous integration #1310

zuphilip · May 28, 2017

It would be really good to have automatic translator testing as part of the continuous integration. This doesn't have to be a strict test, but the information could be just shown for all participants of a PR.

I imagine something like this:

There is a testing instance of the translation server running online somewhere, e.g.
a. The same instance we are using now for the daily testing.
b. The same machine but different instance we are using now for the daily testing.
c. Some service like heroku should be easy to set up with the Dockerfile
Some script is called by some CI
a. Travis CI which we already use
b. CircleCI or anything else
The script will on the testing instance checkout the suggested code in the PR
The tests are run (possibly only the ones from the changed translators) and compared to the test results from the master branch.
Some diff feedback is given.

retorquere · Feb 1, 2019

I'm mostly done with a test runner that runs import tests and know how I'm going to implement export tests. web tests should also be feasible. Not all tests pass yet, but the import tests in all translators that have them run in a second or so.

retorquere · Feb 1, 2019

(and it does indeed show diffs)

dstillman · Feb 1, 2019

To be clear, though, there's already a test runner — we don't need a new one (but the existing one can be improved).

retorquere · Feb 1, 2019

That's cool, then it will be just for me. Is the existing one planned to be ran on every checkin/PR?

dstillman · Feb 1, 2019

That's what this ticket is for, yes. As I mentioned in the other thread, we already have daily translator tests using the same test runner used in Scaffold. For CI tests, we'd test just the modified translator. We could consider also testing translators that depend on the modified one, but that could mean hundreds of translators for something like RIS, so it probably makes more sense just to make sure those translators have appropriate coverage in their own tests.

retorquere · Feb 1, 2019

I'm still seeing some unexpected things assuming the existing test already run daily. I'm not sure about any of this though, please be gentle -- these are what I found, but the way I found it could be way off base.

testcase 2 (and a similar issue in testcase 6) in RIS.js: the input has Y2 - 1986/6/23 but the items array in that test case doesn't have an accessDate field, so the test should fail (I think?), and in Zotero, the date is imported as 1986-06-22

maybe more a question than anything else, but in testcase 3 and 5 of RIS.js, the items array has a version field but I've been going over the code and I don't see where it is set from the IS input field

in testcase 7 of RIS.js there's a "volume": "7" field in the items array that I think should be edition (if I'm interpreting line 6403 of RIS.js correctly)

BibTeX.js test case 8 misses an "accessDate": "2012-07-27"

Frieze.js has a syntax error (#1820)

dstillman · Feb 2, 2019

I'm still seeing some unexpected things assuming the existing test already run daily.

The existing translator tests are purely informational, and as you can see in the results, a large number are currently failing, for various reasons: because of genuine problems, because the tests are out of date (the tests were down for a long time until fairly recently), or simply because the translator tester doesn't run JS and therefore doesn't have access to the same initial page DOM that exists when web translators are run in the browser.

the items array in that test case doesn't have an accessDate field, so the test should fail

It looks like accessDate is stripped when the item is sanitized. Presumably that's because the access date is always different, but clearly that's not the right thing to do. We should normalize to a specific value (e.g., "CURRENT_TIMESTAMP") when present and otherwise leave it out.

the items array has a version field but I've been going over the code and I don't see where it is set from the IS input field

IS is changed to ET for computer programs, and that imports as the version.

there's a "volume": "7" field in the items array that I think should be edition (if I'm interpreting line 6403 of RIS.js correctly)

6403 is the last line of the file, so not sure which line you mean, but VL seems correct for volume for a book.

retorquere · Feb 2, 2019

I thought the accessDate didn't change for imports? Maybe web, but straight imports should just believe the accessDate being passed right?

I meant line 867, not 6403, sorry.

dstillman · Feb 2, 2019

Maybe web, but straight imports should just believe the accessDate being passed right?

Yes. I noted that on the linked ticket.

I meant line 867, not 6403, sorry.

Looks like that's only in ProCite mode, but I don't know the details here.

retorquere · Feb 2, 2019

But accessDate isn't always stripped -- Bookmarks.js has an accessdate in the item.

In any case, found a few things, cool, on to other stuff.

adam3smith · Feb 2, 2019

Just to be clear @retorquere -- getting the test runner working via CI on builds would be a huge help to @zuphilip and me, so I don't think anyone wants to discourage you from helping with that -- very much on the contrary. The way I'm understanding Dan was only that he wanted to save you from duplicating existing work.

retorquere · Feb 2, 2019

Just tell me where I can help, I'm big on automated tests and still a little shook that I PRed a breaking change -- getting tests running is an atonement for that in part. But with regard to duplicating existing work, there's this thing I'm working on you may know that duplicates the BibTeX exporter 🤣

retorquere · Feb 2, 2019

IS is changed to ET for computer programs, and that imports as the version.

That's right and I can now see that happening (thanks for the pointer), but looking at https://translator-tests.zotero.org/ though I see two mentions of

Invalid field 'version' for item type 'computerProgram'.

in the log for RIS, and from my understanding, fields that trigger such warnings are not imported, right? If that is right, those two testcases in the RIS file shouldn't have the version field (I'm on thin ice here -- the RIS translator is pretty complex and I don't understand most of it).

zuphilip · Feb 2, 2019

For local testing Zotero translators there is also the built-in page chrome://zotero/content/tools/testTranslators/testTranslators.html. You could open that one in v4.0 in Firefox, but now in v5.0 you have to open it in Zotero itself, e.g. I open it in the browser tab of Scaffold. The code for that page should be found in the Zotero repository.

Anyway, running the test cases automatically for each PR would indeed be very appreciated!

However, I am not so sure that the plan I outlined here at the very beginning is still the way to go. I would imagine more that we now can install translation-server on-the-fly at the Travis build, then make sure that we use the updated code within translation-server, and run the test cases for the new/modified translators, compare + output.

retorquere · Feb 2, 2019

Can I do a shallow clone of translation-server including submodules and expect it to work? I'm on a fairly slow link and the recursive clone is taking ages.

dstillman · Feb 2, 2019

I've made a few tweaks to translation-server's test runner (which is what's used for the daily full tests) so that it can be used for this. Here's what a Travis config will need to do:

Parse the modified translators out of the commit range (TRAVIS_COMMIT_RANGE, etc.)
git clone https://github.com/zotero/translation-server
cd translation-server
git submodule update --init modules/zotero (non-recursive, since we don't need styles, translators, or CSL locales)
TRANSLATORS_DIR=.. node test/testTranslators/testTranslators.js --num-concurrent-tests=1 -g '(translatorID1|translatorID2|...)

Or something like that. Basically TRANSLATORS_DIR will need to be set to the root directory containing the current zotero/translators commit that Travis makes available, instead of using modules/translators or modules/zotero/translators.

If any tests fail, it will exit with a non-0 exit code.

Later, we'll probably want to improve/colorize the output from that script.

retorquere · Feb 3, 2019

There are also web tests that request URLs that either don't exist (404) or require auth (403). What should I make of tests of type artwork? They look like web test to me.

adam3smith · Feb 3, 2019

404 is a test failure -- often that means that the site has moved or closed down. 403 is trickier because whoever is testing may not have the same access as the person creating the test.

No idea where tests of type artwork come from but I agree that sounds like web.

zuphilip · Feb 3, 2019

Maybe, we should have a possibility to classify how openly a test case is accessible. At least for automated testing like in CI we would need to skip them.

No idea where tests of type artwork come from but I agree that sounds like web.

I see this type in ARTstor.js and yes there is also an URL given there, such that one needs to fix this into web.

retorquere · Feb 3, 2019

How about mocking out web access by saving the pages as part of the test suite? That would remove one source of spurious errors and also speed up the tests - and I'm seeing lots more of these, and also 500 errors and redirect loops.

In more general terms I'd argue against tests that require login access; I realize some sites only have content when logged in but again mocked GET requests would get around that.

retorquere · Feb 3, 2019

What to make of translators that have import tests but that do not provide detectImport or doImport?

dstillman · Feb 3, 2019

How about mocking out web access by saving the pages as part of the test suite?

That would defeat the main purpose of the tests, which is to make sure they’re still working on the sites where they’re supposed to work. (It also wouldn’t work for many tests, which require follow-up requests.)

We do try to mock web requests in client tests, where we’re interested in testing the client architecture itself.

dstillman · Feb 3, 2019

Marking tests that aren’t expected to work publicly makes sense to me, though.

retorquere · Feb 3, 2019

Fair enough - there's two ways to see these tests I guess; protecting against code breakage, and making sure they track the current version of the site. My mind went to the other kind.

But for tests that require login, who will ever run these?

retorquere · Feb 3, 2019

(in my local tests I had doGet mocked to cache responses, so followup requests would work)

zuphilip · Feb 3, 2019

But for tests that require login, who will ever run these?

They are useful to run, when changing the code. Whoever will work on the translator will also have access to the database and thereby can prove that the results are either the same, or improving, or getting worse. Thereby we can maintain sites and help these collaborators to work on them even when we don't have access to the sites ourselves.

dstillman · Feb 3, 2019

What to make of translators that have import tests but that do not provide detectImport or doImport?

Example?

retorquere · Feb 3, 2019

My bad -- I had picked up RefWorks Tagged.js and Web of Science Tagged.js because they don't have doImport() but rather doImport(variable). What kind of input is doImport provided with?

dstillman · Feb 3, 2019

Zotero.Translate.Import.prototype._getParameters() isn't defined (like it is for Zotero.Translate.Web) and grepping for doImport in the translators directory doesn't show any arguments being passed for the direct calls, so I'd guess none.

@adam3smith might remember why that was originally added in 2012, though. The variable in Web of Science isn't used.

retorquere · Feb 4, 2019

Not as bad as I thought with broken tests -- 49 broken test cases across all import/web testcases, and 17 requiring auth.

How will we mark these? Add "auth": true for cases which require authentication, and "error": "some explanation of what went wrong" for failing tests (404s, 416s, etc)?

retorquere · Feb 4, 2019

Uh forget about marking reasons for error -- I just did that locally to speed up sussing out which cases were failing.

adam3smith · Feb 4, 2019

@adam3smith might remember why that was originally added in 2012, though. The variable in Web of Science isn't used.

No, sorry. Especially for RefWorks Tagged, this doesn't make any sense since that's not called from any web translator, nor would we expect it to be. I was relatively new to writing import translators in 2012, so I'm guessing this was just a mistake on my part, likely based on some sloppy copy&paste job.

adam3smith · Feb 4, 2019

@retorquere is your work on this visible already? Just curious to take a look.

retorquere · Feb 4, 2019

It's visible but only half-finished as I mean to focus on the translation-server based tests (but I still do use it for quicky tests).

A patch that marks translators as broken or requiring auth so my test runner skips them is at https://gist.github.com/8fc7abd258dda9ac7f848ccbc5f1989d; I've not checked this in as I don't mean to submit bulk PRs without discussion first. The test runner (unfinished as it is) lives at https://github.com/retorquere/translators/blob/eslint/.ci/index.js

retorquere · Feb 4, 2019

https://github.com/retorquere/translators/blob/ci/.ci/index.js implements the steps from #1310 (comment) but even when I intentionally introduce errors, I get a 0 exit status.

dstillman referenced this issue Feb 1, 2019
Merged
Re-introduce changes from #1814 #1818

dstillman referenced this issue Feb 2, 2019
Open
Normalize access date in translator tests #1632

zotero/translators

Join GitHub today

Automatic translator testing as part of the continuous integration #1310

Comments

zuphilip commented May 28, 2017

dstillman referenced this issue Feb 1, 2019

Re-introduce changes from #1814 #1818

This comment has been minimized.

retorquere commented Feb 1, 2019

This comment has been minimized.

retorquere commented Feb 1, 2019

This comment has been minimized.

dstillman commented Feb 1, 2019 • edited

This comment has been minimized.

retorquere commented Feb 1, 2019

This comment has been minimized.

dstillman commented Feb 1, 2019

This comment has been minimized.

retorquere commented Feb 1, 2019

This comment has been minimized.

dstillman commented Feb 2, 2019

dstillman referenced this issue Feb 2, 2019

Normalize access date in translator tests #1632

This comment has been minimized.

retorquere commented Feb 2, 2019

This comment has been minimized.

dstillman commented Feb 2, 2019

This comment has been minimized.

retorquere commented Feb 2, 2019

This comment has been minimized.

adam3smith commented Feb 2, 2019

This comment has been minimized.

retorquere commented Feb 2, 2019

This comment has been minimized.

retorquere commented Feb 2, 2019

This comment has been minimized.

zuphilip commented Feb 2, 2019

This comment has been minimized.

retorquere commented Feb 2, 2019

dstillman added a commit to zotero/translation-server that referenced this issue Feb 2, 2019

This comment has been minimized.

dstillman commented Feb 2, 2019 • edited

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

adam3smith commented Feb 3, 2019

This comment has been minimized.

zuphilip commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

dstillman commented Feb 3, 2019 • edited

This comment has been minimized.

dstillman commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

zuphilip commented Feb 3, 2019

This comment has been minimized.

dstillman commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 3, 2019

This comment has been minimized.

dstillman commented Feb 3, 2019

This comment has been minimized.

retorquere commented Feb 4, 2019 • edited

This comment has been minimized.

retorquere commented Feb 4, 2019

This comment has been minimized.

adam3smith commented Feb 4, 2019

This comment has been minimized.

adam3smith commented Feb 4, 2019

This comment has been minimized.

retorquere commented Feb 4, 2019

This comment has been minimized.

retorquere commented Feb 4, 2019

dstillman commented Feb 1, 2019 •

edited

dstillman commented Feb 2, 2019 •

edited

dstillman commented Feb 3, 2019 •

edited

retorquere commented Feb 4, 2019 •

edited