Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP8 Naming - UDFs, App method names, DataFrame names, and filters. #468

Closed
ruebot opened this issue May 27, 2020 · 10 comments
Closed

PEP8 Naming - UDFs, App method names, DataFrame names, and filters. #468

ruebot opened this issue May 27, 2020 · 10 comments
Assignees
Labels

Comments

@ruebot
Copy link
Member

@ruebot ruebot commented May 27, 2020

Let's evaluate how we've named everything; UDFs, App method names, DataFrame names, and filters.

Are we all good with how we've named everything?

Do we need to change anything before a 1.0.0 release?

See also #467

@lintool
Copy link
Member

@lintool lintool commented May 29, 2020

My thoughts: In Python, I would default to PEP8 unless there's a convincing reason otherwise: https://www.python.org/dev/peps/pep-0008/

Apps are most like Classes, so I would go CamelCase. Note, it says:

Note: When using acronyms in CapWords, capitalize all the letters of the acronym. Thus HTTPServerError is better than HttpServerError.

UDFs are like methods, so should follow standard snake_case.

@ruebot
Copy link
Member Author

@ruebot ruebot commented May 29, 2020

@lintool anything on the Scala side? If not, I'll get some PRs ready for aut, documentation, and notebooks.

@ruebot
Copy link
Member Author

@ruebot ruebot commented May 29, 2020

  • ExtractPopularImages
  • WriteGexf
  • WriteGraphml
@ruebot ruebot changed the title [DISCUSSION] Naming - UDFs, App method names, DataFrame names, and filters. PEP8 Naming - UDFs, App method names, DataFrame names, and filters. May 30, 2020
@ruebot
Copy link
Member Author

@ruebot ruebot commented May 30, 2020

Resolved with 8037745

@ruebot ruebot closed this May 30, 2020
1.0.0 Release of AUT automation moved this from In Progress to Done May 30, 2020
@lintool
Copy link
Member

@lintool lintool commented Jun 1, 2020

Feel free to vote down if you think this ship has already sailed... Currently, we have removeHTML and remove_html for DF UDFs in Scala and Python, respective. This is good.

However, for RDDs, we still have RemoveHTMLRDD. I suppose we can rename those to just RemoveHTML. The casing differences make it clear that small camel is DF and regular camel is RDD?

@lintool lintool reopened this Jun 1, 2020
@lintool
Copy link
Member

@lintool lintool commented Jun 1, 2020

And also: https://github.com/archivesunleashed/aut-docs/blob/master/current/dataframe-schemas.md

Thoughts about Python/Scala naming conventions? Right now we use small camel case. In Python DF, this goes against the best practice of downcased snake case.

There are three relevant cases:

  • .presentationProgramFiles()
  • .wordProcessorFiles()
  • .textFiles()

If we change to:

  • .presentations()
  • .documents()
  • .textfiles()

We would completely resolve any of these issues. .documents() might be slightly confusing; .textfiles() seems fine to me as all lower cased since we have .imagegraph().


And with this, I think I've said my peace on naming.

@ruebot
Copy link
Member Author

@ruebot ruebot commented Jun 1, 2020

Good call on the Scala side. It'll be nice to remove those RDD suffixes. I'll get the Scala side sorted out here in a bit.

For the three additional cases, these two -- presentationProgramFiles and wordProcessorFiles -- their collective names come directly from Wikipedia, and I'm fairly certain I outlined all this the last time it came up in an issue. As you say, documents is too confusing. I'd rather be fairly explicit with this, rather than vague. On the Python side, word_processor and presentation_program, so I could see changing the Scala side to wordProcessor and presentationProgram. I think that's a good compromise.

As for textFiles vs textfiles, I'm happy to make that change, and even happier to just yank it completely since they're so hard to nail now. We get way too many false positives on textFiles() extraction jobs. But, then again, it could be helpful for research questions 🤷‍♂️

@lintool
Copy link
Member

@lintool lintool commented Jun 1, 2020

Good call on the Scala side. It'll be nice to remove those RDD suffixes. I'll get the Scala side sorted out here in a bit.

👍

For the three additional cases, these two -- presentationProgramFiles and wordProcessorFiles -- their collective names come directly from Wikipedia, and I'm fairly certain I outlined all this the last time it came up in an issue. As you say, documents is too confusing. I'd rather be fairly explicit with this, rather than vague. On the Python side, word_processor and presentation_program, so I could see changing the Scala side to wordProcessor and presentationProgram. I think that's a good compromise.

I think I'm fine with presentationProgramFiles and wordProcessorFiles on the Scala side and presentation_program_files and word_processor_files on the Python side. I think this is a bit clearer - we mean files from those application, not those applications themselves.

As for textFiles vs textfiles, I'm happy to make that change, and even happier to just yank it completely since they're so hard to nail now. We get way too many false positives on textFiles() extraction jobs. But, then again, it could be helpful for research questions 🤷‍♂️

Let's yank.

@ruebot
Copy link
Member Author

@ruebot ruebot commented Jun 1, 2020

@lintool #479 should address all of this.

@ianmilligan1
Copy link
Member

@ianmilligan1 ianmilligan1 commented Jun 1, 2020

Should be closed with 1ac97ef (#479).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.