Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upAdd user stop words Madlib. #45 #48
Conversation
ianmilligan1
requested changes
Mar 25, 2019
Diff was too large to provide inline comments. Looks good. The suggestions I'd make are:
Otherwise, works quite well. I'm fine with removing the internationalization code here – it is straightforward and I'll defer to you and the datathon participants findings – but I think in the future we should try to keep PRs to one thing in scope. |
This comment has been minimized.
This comment has been minimized.
Okay. Will fix. I knew you were going to say that about international(), but I just couldn't bear seeing it just be wrong and out there in the open for such a small change. Also, I knew that #42 was still in the issues, so the tracking is still available. |
greebie
added some commits
Mar 25, 2019
This comment has been minimized.
This comment has been minimized.
Latest commit should resolve all issues stated above. |
ianmilligan1
requested changes
Mar 25, 2019
This comment has been minimized.
This comment has been minimized.
I think that's a different issue which might be fixed more generally in another branch, especially if we look at making a module to cover the functions. Essentially we have general madlibs and more specific madlibs and it's not always clear which is which. I don't think that it's that helpful to tell people to scroll up to change something. Let's look at the overall explanations / docs / comments together as a separate issue. (Maybe after receiving feedback from the datathon as well.) |
This comment has been minimized.
This comment has been minimized.
OK. I still don't think
is clear? Filter what? Filter in? Filter out? Can you try to take a kick at the can to make it a bit more usable? |
This comment has been minimized.
This comment has been minimized.
No. That's in scope for this PR. Ian's suggestion makes sense. Can you update your PR to include that, and we can continue to iterate from there. |
This comment has been minimized.
This comment has been minimized.
Now that I've been able to look at this more closely, I see your point. I have some follow-up work this morning, but will get this fixed in the afternoon. |
ianmilligan1
requested changes
Mar 27, 2019
@@ -645,7 +645,7 @@ | |||
"source": [ | |||
"## Overall Collection Characteristics\n", | |||
"\n", | |||
"Change the variables in the following cell to manipulate the analysis you'll be running to understand overall collection characteristics. In this case, they mostly affect the visualization that we generate.\n" | |||
"Change the variables in the following cell to manipulate the analysis you'll be running to understand overall collection characteristics. In this case, they mostly affect the visualization that we generate. You may wish to add or remove words to `STOP_WORDS_USER`in the user configuration cell to fine-tune the results.\n" |
This comment has been minimized.
This comment has been minimized.
ianmilligan1
Mar 27, 2019
Member
maybe fine-tune the results
-> remove words that may be overwhelming your analysis
greebie commentedMar 25, 2019
This PR gives the user the options to use NLTK stopwords and/or add their own. This was a common demand from the datathon.
I also removed the International () function call because it is not doing what I originally thought it was doing.