NER Learning Guide #248

ianmilligan1 · Jan 11, 2019

In #246 we explored using named-entity recognition (NER) within AUK to generate new derivatives with named entities, and concluded that it was too computationally intensive (by several orders of magnitude) to justify adding to the platform.

I suggested:

We could write a tutorial about how to do NER locally on your own full text files, so then a user could sample accordingly?

So let's add a new learning guide under full text.

Some Questions

What platform should we use? The simplest option is to point users to the Archives Unleashed Toolkit and have them run this script:

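import io.archivesunleashed._
import io.archivesunleashed.app._
import io.archivesunleashed.matchbox._

// Distribute the Stanford NER classifier file to every Spark node.
// Replace the path with the location of your downloaded classifier.
sc.addFile("/path/to/classifier")

// Run NER over the extracted full text. Arguments: the classifier's file name
// (should match the file added above), the directory of extracted text,
// the output directory for the NER results, and the SparkContext.
ExtractEntities.extractFromScrapeText("english.all.3class.distsim.crf.ser.gz", "/path/to/extracted/text", "output-ner/", sc)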

Users only need to add the classifier, point the script at their extracted full-text files, and collect the NER output from the output directory.

Option two would be to build on the earlier NLTK learning guide and point users to it so they can run NER in a Python environment.

Any thoughts? I don't have too much experience with NER.

ianmilligan1 · Feb 15, 2019

If NLTK is available in the Jupyter notebooks that @greebie is working on, I say we just add a cell there to run NER on the text. If not, I think we should just point users to AUT and have them use that.

ianmilligan1 added the question label Jan 11, 2019

ianmilligan1 self-assigned this Jan 11, 2019

ianmilligan1 referenced this issue Jan 11, 2019
Closed
Discussion: Do we want to implement LGA as a derivative in AUK? #245

archivesunleashed/auk