NER Learning Guide #248

Open
ianmilligan1 opened this Issue Jan 11, 2019 · 0 comments

ianmilligan1 commented Jan 11, 2019

In #246 we explored using named-entity recognition (NER) within AUK to generate new derivatives containing named entities, and concluded that it was too computationally intensive (by several orders of magnitude) to justify adding to the platform.

I suggested:

We could write a tutorial about how to do NER locally on your own full text files, so then a user could sample accordingly?

So let's add a new learning guide under full text.

Some Questions

What platform should we use? Option One, and the simplest, is to point them to the Archives Unleashed Toolkit and have them use this script, which is found here:

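import io.archivesunleashed._
import io.archivesunleashed.app._
import io.archivesunleashed.matchbox._

// Make the NER classifier file available to every Spark worker.
sc.addFile("/path/to/classifier")

// Arguments: classifier file name, path to the extracted text, output directory, SparkContext.
ExtractEntities.extractFromScrapeText("english.all.3class.distsim.crf.ser.gz", "/path/to/extracted/text", "output-ner/", sc)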

They just add the classifier, find the extracted text files, and then get NER output.

Option Two would be to build on the earlier learning guide on NLTK and point them to that in a Python environment.
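
Either way, the guide would probably need a sampling step first, given how computationally intensive NER turned out to be in #246. Here is a rough sketch of what that could look like in plain Python. It assumes the full-text file is plain text with one record per line, which may not hold for every derivative; the function name `sample_lines` and its parameters are made up for illustration and are not part of AUT or AUK.

```python
import random


def sample_lines(in_path, out_path, k, seed=None):
    """Write a uniform random sample of up to k lines from in_path to out_path.

    Uses reservoir sampling, so the input is read once and is never
    loaded into memory in full. Returns the number of lines written.
    Assumption: one record per line in the input file.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(in_path, encoding="utf-8", errors="replace") as src:
        for i, line in enumerate(src):
            # Make sure every kept line ends with a newline, including
            # a final line that has no trailing newline in the source.
            if not line.endswith("\n"):
                line += "\n"
            if i < k:
                # Fill the reservoir with the first k lines.
                reservoir.append(line)
            else:
                # Keep this line with probability k / (i + 1),
                # overwriting a randomly chosen earlier pick.
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = line
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(reservoir)
    return len(reservoir)
```

A user could then call something like `sample_lines("full-text.txt", "full-text-sample.txt", k=1000, seed=42)` (placeholder file names) and run NER on the smaller file with whichever option we choose. Fixing the seed makes the sample reproducible.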

Any thoughts? I don't have too much experience with NER.
