Make stopword list for hc report terms configurable #14
See "Automatically Building a Stopword List for an Information Retrieval System" for an idea on how we might automatically compute stopwords. I suspect that we need to include stopwords elsewhere to improve the results of DSA1. With this realization, we might want to give this a little more thought before just testing and releasing the recent code changes.
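As a rough illustration of what "automatically compute stopwords" could mean, here is a minimal sketch that flags terms appearing in a large share of documents. This is a plain document-frequency heuristic, not the method from the paper mentioned above, and the function name `candidate_stopwords` and the `min_doc_fraction` threshold are assumptions made for this example only.

```python
from collections import Counter
from typing import Iterable, List, Set


def candidate_stopwords(
    documents: Iterable[List[str]], min_doc_fraction: float = 0.8
) -> Set[str]:
    """Return lowercase terms found in at least min_doc_fraction of documents.

    Hypothetical helper for illustration; not part of hypercane.
    """
    doc_freq: Counter = Counter()
    total = 0
    for tokens in documents:
        total += 1
        # Count each term once per document (document frequency).
        doc_freq.update({token.lower() for token in tokens})
    if total == 0:
        return set()
    return {
        term for term, count in doc_freq.items()
        if count / total >= min_doc_fraction
    }
```

Terms that this flags would still need a manual review before being added to any shipped list, since collection-specific words can be frequent and still meaningful.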
shawnmjones commented Jun 25, 2020
The stopwords for `hc report terms` are currently hardcoded. Even worse, they are hardcoded only in the sumgram code and not in the general n-gram code.
hypercane/hypercane/report/sumgrams.py
Lines 53 to 162 in 2e071d7
The generic terms report will need to accept the same stopword list at `get_document_tokens`:
hypercane/hypercane/report/terms.py
Lines 6 to 28 in 2e071d7
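One possible shape for the change, sketched under assumptions: a single loader that both reports could share, reading one stopword per line from a user-supplied file and falling back to a built-in list. The names `load_stopwords`, `filter_tokens`, and `DEFAULT_STOPWORDS` are hypothetical and do not reflect the existing code or the actual signature of `get_document_tokens`; the short default set here is a placeholder, not the list currently in sumgrams.py.

```python
from pathlib import Path
from typing import FrozenSet, Iterable, List, Optional

# Placeholder fallback for illustration; not the list hardcoded in sumgrams.py.
DEFAULT_STOPWORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "of", "the", "to"}
)


def load_stopwords(path: Optional[str] = None) -> FrozenSet[str]:
    """Load stopwords from a one-term-per-line file, or return the default.

    Blank lines and lines starting with '#' are skipped (an assumed file format).
    """
    if path is None:
        return DEFAULT_STOPWORDS
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return frozenset(
        line.strip().lower()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def filter_tokens(tokens: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Drop tokens whose lowercase form appears in the stopword set."""
    stopset = frozenset(stopwords)
    return [token for token in tokens if token.lower() not in stopset]
```

With something like this, the sumgram report and the generic terms report could both call the same loader and pass the result along, so the two code paths would not drift apart.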