Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Python module. Issue #47 #49

Merged
merged 5 commits into from Apr 11, 2019

Conversation

Projects
None yet
3 participants
@greebie
Copy link
Collaborator

greebie commented Apr 4, 2019

This PR replaces the function cell in the notebook with a python module and revises the function to work with the Notebook class.

A few notes:

  • If you make changes to the class, the notebook will not recognize the changes until the notebook is restarted. (Ask me about the fun of debugging in this environment.) I assume this will not be a problem once the module is finalized.
  • The network graphs currently use very little of the new module (only the gephi filename). I would like some advice on whether we should resolve that here or in a different PR (e.g. when we split the notebooks).
  • The module contains a class called Notebook that takes a collection ID, a folder name and **kwargs that sets the various variables. One potential issue is if a config has a var that's not in the class. Could use some advice about whether an error is fine or if we should handle things more gracefully.
  • For the most part, this just means that where before we said get_text(), instead we now use nb.get_text().
  • The previous set-up used default parameters i.e. fun(x, min_length="2") - now it just uses the MINIMUM_WORD_LENGTH variable, so
nb.MINIMUM_WORD_LENGTH = 2
nb.fun(x)
  • There is some ambiguity re: style as we have ALL_CAPS in the notebook to handle madlibs, but inside a class this is poor Python style for non-constants. An easy solution is just to use lower() on instantiation for the class, but whatever other advice anyone has is welcome.
Add Python module.
Fix notebook to call from Python module.

Remove notebook functions.
@greebie

This comment has been minimized.

Copy link
Collaborator Author

greebie commented Apr 4, 2019

Marked this draft for general comments first. Once the approach is okay, I'll mark it ready for review and we can work through some of the details.

@ruebot
Copy link
Member

ruebot left a comment

Couple style changes here to to start. You'll also want to run it through something that'll validate PEP-8. http://pep8online.com/checkresult

You'll also need to update the Dockerfile to include the library. Otherwise, it's missing from the build and won't run. See here: https://mybinder.org/v2/gh/archivesunleashed/auk-notebooks/issue-47?filepath=auk-notebook.ipynb

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-8eb457572b52> in <module>
----> 1 from auknb import Notebook

ModuleNotFoundError: No module named 'auknb'

You can put a COPY auknb.py $HOME on line 18 https://github.com/archivesunleashed/auk-notebooks/blob/master/Dockerfile#L18

Notebook itself:

  • Probably want to change "# Change to alter path." to something a bit more descriptive. Maybe "Change value to full path to your data, including trailing slash."
  • # Setup Archives Unleashed Cloud data. you can get rid of that comment in the required packages section.
  • Put the library import in with all the others.
  • Probably need to change the intro language and section header for "User Configuration"
auknb.py Outdated
return(result)

if __name__ == "__main__":
print ("This module is intended for use with Archives Unleashed Cloud")

This comment has been minimized.

Copy link
@ruebot

ruebot Apr 4, 2019

Member

You can put all this in a header comment.

auknb.py Outdated
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.corpus import stopwords

class Notebook():

This comment has been minimized.

Copy link
@ruebot

ruebot Apr 4, 2019

Member

Let's change the class name to AuNotebook.

greebie added some commits Apr 5, 2019

Run class through PEP8 Lint.
Remove print statements in favor of docstrings.
Provide fixes as per review:
- class variables are lower case as per PEP8
- add auknb.py to Dockerfile
- fix explanations for function calls etc.
- add instructions to view functions and class documentation.
x
@greebie

This comment has been minimized.

Copy link
Collaborator Author

greebie commented Apr 5, 2019

I think the latest commit fixes the above review fixes. I also included mention that help(nb.function_name()) will provide instructions. Once I get word from @ianmilligan1 that the approach is okay, I will take this out of draft and work on fine tuning.

@ianmilligan1
Copy link
Member

ianmilligan1 left a comment

Works on my end as a user, but I'd want to make sure @ruebot green-lights this as he knows exponentially more than I do.

@greebie

This comment has been minimized.

Copy link
Collaborator Author

greebie commented Apr 7, 2019

Thanks @ianmilligan1 @ruebot what are your thoughts on unit tests here? I can build them, but am not sure if they belong in the auk-notebooks folder.

@ruebot

This comment has been minimized.

Copy link
Member

ruebot commented Apr 8, 2019

Out of scope for now. We'll cross that bridge when we publish it in it's own repo.

@ruebot
Copy link
Member

ruebot left a comment

Notebook review:

  • AUK_PATH comment has two full-stops.
  • What's the purpose of displaying: help(AuNotebook.clean_domain)?
  • Change: The following cell sets up AuNotebook to The following cell sets up the notebook
  • Cell 13 has an output error # Get a list of the top words in the collection (regardless of year). Did you clear the notebook and re-run it?
  • You should update the library filename to reflect the class name change. Make sure you update the Dockerfile as well.

greebie added some commits Apr 8, 2019

Edits per review.
- remove help cell (was only for testing).
- change module name to aunb.
- fix output errors (i.e. be more patient when waiting for notebook to run.)
@greebie

This comment has been minimized.

Copy link
Collaborator Author

greebie commented Apr 8, 2019

Latest commit should be ready to go. Moving this out of draft.

@greebie greebie marked this pull request as ready for review Apr 8, 2019

@ruebot

ruebot approved these changes Apr 11, 2019

@ruebot ruebot merged commit 4b8938b into master Apr 11, 2019

1 check passed

ci/dockercloud Your tests passed in Docker Cloud
Details

@ruebot ruebot deleted the issue-47 branch Apr 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.