Align NER output to WANE format; addresses #297 (#361)
- Update Stanford CoreNLP
- Format NER output as JSON
- Add getPayloadDigest to ArchiveRecord
- Add test for getPayloadDigest
- Add payload digest to NER output
- Remove extractFromScrapeText
- Remove extractFromScrapeText test
- TODO: map PERSON -> persons, LOCATION -> locations, ORGANIZATION -> organizations (involves writing a new class or overriding NER output 🤢)
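The commit message does not say how getPayloadDigest computes its value. As a hedged illustration only, the Java sketch below assumes the common WARC convention of a labelled digest: "sha1:" followed by the RFC 4648 Base32 encoding of the payload's SHA-1 hash, the form often seen in WARC-Payload-Digest headers. The class PayloadDigest and its methods are hypothetical names, not part of the project, and the real method may read an existing header instead of hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not the project's getPayloadDigest.
// Builds a WARC-style labelled digest: "sha1:" + Base32 (RFC 4648) of SHA-1.
final class PayloadDigest {
    private static final char[] BASE32_ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    private PayloadDigest() {}

    // Unpadded RFC 4648 Base32; a 20-byte SHA-1 hash needs no padding anyway.
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0;   // pending bits not yet emitted
        int bitsLeft = 0; // how many low bits of buffer are pending
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsLeft += 8;
            while (bitsLeft >= 5) {
                out.append(BASE32_ALPHABET[(buffer >> (bitsLeft - 5)) & 0x1F]);
                bitsLeft -= 5;
            }
            buffer &= (1 << bitsLeft) - 1; // drop bits already emitted
        }
        if (bitsLeft > 0) {
            // Right-pad the final partial 5-bit group with zeros.
            out.append(BASE32_ALPHABET[(buffer << (5 - bitsLeft)) & 0x1F]);
        }
        return out.toString();
    }

    static String payloadDigest(byte[] payload) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-1").digest(payload);
            return "sha1:" + base32(hash);
        } catch (NoSuchAlgorithmException e) {
            // Java SE requires every platform to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Empty payload: prints sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
        System.out.println(payloadDigest(new byte[0]));
        System.out.println(payloadDigest("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The empty payload makes a convenient sanity check for a test like the one added to ArchiveRecordTest, because its SHA-1 in this encoding is a fixed, well-known value.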
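The TODO asks for CoreNLP's upper-case class labels to be renamed to the plural keys persons, locations, and organizations. One way to do that without overriding the classifier is a separate post-processing step, sketched below in Java under stated assumptions: WaneEntities and Tagged are hypothetical names, and the output layout (one JSON array per key, every key present, labels outside the three mapped classes dropped) is an illustrative guess, not the confirmed WANE schema or the actual NERCombinedJson output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the TODO's label renaming; not the project's NER code.
final class WaneEntities {
    // One (label, text) pair as a tagger might emit it, e.g. ("PERSON", "Ada Lovelace").
    static final class Tagged {
        final String label;
        final String text;
        Tagged(String label, String text) { this.label = label; this.text = text; }
    }

    // Upper-case class labels mapped to the plural keys named in the TODO.
    private static final Map<String, String> LABEL_TO_KEY = new LinkedHashMap<>();
    static {
        LABEL_TO_KEY.put("PERSON", "persons");
        LABEL_TO_KEY.put("ORGANIZATION", "organizations");
        LABEL_TO_KEY.put("LOCATION", "locations");
    }

    private WaneEntities() {}

    // Groups entity texts under the renamed keys; unmapped labels (e.g. DATE) are skipped.
    static Map<String, List<String>> group(List<Tagged> entities) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String key : LABEL_TO_KEY.values()) {
            grouped.put(key, new ArrayList<>());
        }
        for (Tagged t : entities) {
            String key = LABEL_TO_KEY.get(t.label);
            if (key != null) {
                grouped.get(key).add(t.text);
            }
        }
        return grouped;
    }

    // Minimal JSON string quoting: escapes quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Renders the grouped entities as one compact JSON object.
    static String toJson(Map<String, List<String>> grouped) {
        StringBuilder sb = new StringBuilder("{");
        boolean firstKey = true;
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            if (!firstKey) sb.append(',');
            firstKey = false;
            sb.append(quote(e.getKey())).append(":[");
            List<String> values = e.getValue();
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(quote(values.get(i)));
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }
}
```

The JSON is written by hand only to keep the sketch dependency-free; code inside the project would more likely reuse whatever JSON library it already depends on.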
Showing 8 changed files with 48 additions and 49 deletions.
- +1 −1 pom.xml
- +1 −1 src/main/java/io/archivesunleashed/data/WarcRecordUtils.java
- +12 −1 src/main/scala/io/archivesunleashed/ArchiveRecord.scala
- +18 −28 src/main/scala/io/archivesunleashed/app/ExtractEntities.scala
- +1 −1 src/main/scala/io/archivesunleashed/app/NERCombinedJson.scala
- +2 −2 src/main/scala/io/archivesunleashed/matchbox/NERClassifier.scala
- +13 −0 src/test/scala/io/archivesunleashed/ArchiveRecordTest.scala
- +0 −15 src/test/scala/io/archivesunleashed/app/ExtractEntitiesTest.scala
Commit 379cc68