Basic Token Frequency #4
Comments
The tokenization we do now is per row, which will include a
This is what I was thinking, but
might be more effective? Curious if @lintool has any suggestions...
How's this now? Does it cover the spirit of the issue? https://github.com/archivesunleashed/notebooks/blob/master/parquet_text_analyis.ipynb
Looks good!
ianmilligan1 commented Nov 11, 2019
Could we put some basic token frequency after tokens are generated? Most popular words, etc. If it could be broken down by date, that would perhaps also be interesting (most popular words in year1 vs. year2). I think that could flow nicely into the word cloud.
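For illustration, here is a minimal sketch of what the request above might look like, using only the Python standard library. It is not the notebook's actual code. The sample `rows`, the `YYYYMMDD` date format, and the helper names `token_frequency` and `token_frequency_by_year` are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (crawl_date as a YYYYMMDD string, tokens for that row).
# A real notebook would take these from its own tokenized rows instead.
rows = [
    ("20180101", ["web", "archive", "web"]),
    ("20180615", ["archive", "canada"]),
    ("20190301", ["web", "canada", "canada"]),
]


def token_frequency(rows):
    """Count how often each token appears across all rows."""
    counts = Counter()
    for _, tokens in rows:
        counts.update(tokens)
    return counts


def token_frequency_by_year(rows):
    """Count tokens separately per year (first four characters of the date)."""
    by_year = defaultdict(Counter)
    for crawl_date, tokens in rows:
        by_year[crawl_date[:4]].update(tokens)
    return dict(by_year)


overall = token_frequency(rows)
per_year = token_frequency_by_year(rows)

# Most popular words overall, then year by year.
print(overall.most_common(2))
for year in sorted(per_year):
    print(year, per_year[year].most_common(2))
```

Either `Counter` can be passed on as a word-to-count mapping, which is the kind of input a word cloud step typically expects.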