NER Learning Guide #248

Open
ianmilligan1 opened this Issue Jan 11, 2019 · 1 comment

ianmilligan1 commented Jan 11, 2019

In #246 we explored using named-entity recognition (NER) within AUK to generate new derivatives containing named entities, and concluded that it was too computationally intensive (by several orders of magnitude) to justify adding to the platform.

I suggested:

We could write a tutorial about how to do NER locally on your own full text files, so then a user could sample accordingly?

So let's add a new learning guide under full text.
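
On the sampling point, here is a minimal sketch of how a user might pull a random sample from a large full-text file before running NER on it. It assumes the file holds one record per line, which may not match every derivative. The names `sample_lines` and `sample_file` and the paths in the usage comment are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random (reservoir sampling).

    The input is read once and never held in memory as a whole, so this
    works on files that are too large to load.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Keep the new line with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = line
    return reservoir


def sample_file(in_path: str, out_path: str, k: int, seed: Optional[int] = None) -> int:
    """Write a random sample of k lines from in_path to out_path; return the count."""
    with open(in_path, encoding="utf-8", errors="replace") as src:
        sample = sample_lines(src, k, seed)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(sample)
    return len(sample)


# Hypothetical usage (paths are placeholders):
# sample_file("/path/to/extracted/text", "sample.txt", k=1000, seed=42)
```

Fixing `seed` makes the sample reproducible, which could matter if the guide wants users to compare NER output across runs.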

Some Questions

What platform should we use? Option One, and the simplest, is to point users to the Archives Unleashed Toolkit and have them use this script, which is found here:

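import io.archivesunleashed._
import io.archivesunleashed.app._
import io.archivesunleashed.matchbox._

// Make the Stanford NER classifier file available on every Spark node.
sc.addFile("/path/to/classifier")

// Run NER over the extracted text and write the results to output-ner/.
ExtractEntities.extractFromScrapeText("english.all.3class.distsim.crf.ser.gz", "/path/to/extracted/text", "output-ner/", sc)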

Users only need to supply the classifier and point the script at their extracted text files to get NER output.

Option Two would be to build on the earlier NLTK learning guide and have users run NER in a Python environment.
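
For the Python route, a rough sketch of what an NLTK-based step might look like. It uses NLTK's built-in `ne_chunk` chunker, which is a different model from the Stanford classifier in the AUT script, so results will not match. It also assumes NLTK is installed and that the tokenizer, tagger, and chunker data have already been fetched with `nltk.download(...)`; the exact resource names depend on the NLTK version. The helper names `group_entities` and `extract_entities` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def group_entities(iob_triples: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Merge (token, POS, IOB-label) triples into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, _pos, iob in iob_triples:
        starts_new = iob.startswith("B-") or (iob.startswith("I-") and label != iob[2:])
        if starts_new:
            # Close any open entity, then start a new one.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], iob[2:]
        elif iob.startswith("I-"):
            # Continue the current entity.
            tokens.append(token)
        else:
            # "O": outside any entity, so close whatever was open.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Run NLTK's tokenizer, POS tagger, and NE chunker over text."""
    # Imported lazily so group_entities can be used without NLTK installed.
    import nltk
    from nltk.chunk import tree2conlltags

    entities: List[Tuple[str, str]] = []
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        tree = nltk.ne_chunk(tagged)
        entities.extend(group_entities(tree2conlltags(tree)))
    return entities


# Hypothetical usage on a sampled full-text file (path is a placeholder):
# with open("sample.txt", encoding="utf-8") as f:
#     for name, kind in extract_entities(f.read()):
#         print(kind, name, sep="\t")
```

Keeping the grouping logic separate from the NLTK calls means the same helper could be reused if the guide later swaps in a different tagger that emits IOB labels.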

Any thoughts? I don't have too much experience with NER.

ianmilligan1 commented Feb 15, 2019

If we have NLTK in the Jupyter Notebooks that @greebie is working on, I say we just add a cell there to run NER on the text. If not, I think we should just point users to AUT and use that.
