Skip to content
Please note that GitHub no longer supports your web browser.

We recommend upgrading to the latest Google Chrome or Firefox.

Learn more
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW]: pyHoops: A Python package for advanced basketball data analytics #1845

Open
whedon opened this issue Oct 30, 2019 · 28 comments
Assignees
Labels

Comments

@whedon
Copy link
Collaborator

@whedon whedon commented Oct 30, 2019

Submitting author: @alessandroBombelli (Alessandro Bombelli)
Repository: https://github.com/alessandroBombelli/pyHoops
Version: v1.5
Editor: @danielskatz
Reviewer: @mado89, @ctsilva
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/18856808e77f3bd157c0ea099c9668f0"><img src="https://joss.theoj.org/papers/18856808e77f3bd157c0ea099c9668f0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/18856808e77f3bd157c0ea099c9668f0/status.svg)](https://joss.theoj.org/papers/18856808e77f3bd157c0ea099c9668f0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@mado89 & @ctsilva, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please try and complete your review in the next two weeks

Review checklist for @mado89

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
    As of now there is no clear LICENSE file in the repository
  • Contribution and authorship: Has the submitting author (@alessandroBombelli) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
    No real installation instructions given. For me the software worked out of the box, but I assume it is only because I had coincidentally all required software installed.
  • Functionality: Have the functional claims of the software been confirmed?
    No claims given so I cannot answer this. However, I noticed some problems and opened issues for them.
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
    No real statement given
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
    The
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
    The provided test script provides some sample usage. However, the input and output could be explained better.
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
    No thorough documentation of the script.
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
    No tests. However, I think due to the nature of the software they might not be necessary. Hence, I ticked this one off.
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
    The Readme contains the following statement "We welcome all feedback that might help improve the package.". I think this could be expanded to incorporate 1,2 and 3.

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
    Not really. See comment below
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
    Not really. See comment below
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
    Not really. See comment below
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
    see below

Review checklist for @ctsilva

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@alessandroBombelli) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
@whedon

This comment has been minimized.

Copy link
Collaborator Author

@whedon whedon commented Oct 30, 2019

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @mado89, @ctsilva it looks like you're currently assigned to review this paper 🎉.

⭐️ Important ⭐️

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
@whedon

This comment has been minimized.

Copy link
Collaborator Author

@whedon whedon commented Oct 30, 2019

Attempting PDF compilation. Reticulating splines etc...
@whedon

This comment has been minimized.

Copy link
Collaborator Author

@whedon whedon commented Oct 30, 2019

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Oct 30, 2019

👋 @mado89 & @ctsilva, Thanks for agreeing to review this submission.

As stated above (please read the comments above), please carry out your review in this issue by updating your checklist.

If you see things that block you from checking an item and they can be described briefly, feel free to do so here, but if they need more words, or even preferentially, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread here. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

If you have any questions or problems, please let me know.

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 6, 2019

👋 @mado89 & @ctsilva - I just wanted to check on how you are doing on your reviews, and to make sure that you are able to check boxes (and have accepted the invitation at https://github.com/openjournals/joss-reviews/invitations)

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Nov 7, 2019

@danielskatz sorry for the delay. In the next days and on the weekend ill have some free time to do the review. I expect to have it done by 11/12 of november

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 7, 2019

👋 @mado89 & @ctsilva - I just wanted to check on how you are doing on your reviews, and to make sure that you are able to check boxes (and have accepted the invitation at https://github.com/openjournals/joss-reviews/invitations)

Dear @mado89 , @ctsilva, @danielskatz, I just realized the website I am using to web-scrape data for my examples changed the html structure of their play-by-play tables. This is retroactive, in the sense that is also applied to the webpages containing the scores I am using as examples. This means the examples will probably not work because I need to adjust the web-scraping issue. I sincerely apologize for this. I am planning to fix the problem over the weekend (best case scenario). I am not expecting the html structure to be changed anytime soon again, but I will find a different strategy for my examples so that if this situation occurs again, the code does not need to be fixed as it concerns the examples. Unfortunately, this is a problem that occurs with every web-scraping routine. If the original html webpage changes, the code must be adapted as well. If this is not consistent with your timeline or requirements, I totally understand. You can reject the paper and I will submit an improved version that circumvents this issue. My apologies

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Nov 9, 2019

@alessandroBombelli or @danielskatz Can you clarrify: Shall we proceed with the review or pause this?

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 9, 2019

Sorry, I missed the notification from @alessandroBombelli 2 days ago.

Let's just pause for now and see if this can be resolved fairly quickly.

@danielskatz danielskatz added the paused label Nov 9, 2019
@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Nov 9, 2019

Ok. Meanwhile, could you help me/us out over here? alessandroBombelli/pyHoops#49
Question is whether its ok to state in the readme to be under MIT license or whether a LICENSE file is required

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 9, 2019

I replied there - thanks for also bringing this to my attention

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 12, 2019

Sorry, I missed the notification from @alessandroBombelli 2 days ago.

Let's just pause for now and see if this can be resolved fairly quickly.

Dear @mado89, @danielskatz, @ctsilva just a quick update. I should be done with my fixes in a couple of days tops. I learnt from this issue, and will generate a .csv of python pickle file that can be processed independently from the initial web-scraping routing, should the html structure of the webpage change anytime soon. Sorry again for the inconvenience

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 13, 2019

Sorry, I missed the notification from @alessandroBombelli 2 days ago.
Let's just pause for now and see if this can be resolved fairly quickly.

Dear @mado89, @danielskatz, @ctsilva just a quick update. I should be done with my fixes in a couple of days tops. I learnt from this issue, and will generate a .csv of python pickle file that can be processed independently from the initial web-scraping routing, should the html structure of the webpage change anytime soon. Sorry again for the inconvenience

Dear @mado89, @danielskatz, @ctsilva, I updated the code to reflect the changes in the original website I web-scrape. Tomorrow, I will update the github files so that they be downloaded and run. In the menawhile, I updated the readme.md file as well, and added the MIT license as requested. Thanks for the patience

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 13, 2019

Thanks - let us know when this is all done and I'll remove the paused label.

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 14, 2019

Thanks - let us know when this is all done and I'll remove the paused label.

Dear @mado89, @danielskatz, @ctsilva, I updated the Github repo. As mentioned in the Installation section, for now the easiest thing is to download the tests folder as provided, which already contains the pyHoops.py file. I will be soon working on uploading the file in PyPi, so a classic (and more standard) installation of the package is possible. I also updated the readme.md to better show why this package could be relevant for research purposes. The NBA recently launched a competition to use machine learning algorithms to predict game scores using historical data (https://www.nbadatachallenge.com/). This package can be used to extract "real-time" historical data in an easy way (provided the web-scraping part is customized with respect to the actual web-page. Unfortunately there is not free lunch here) for the NBA and other basketball leagues. In the meanwhile, thanks for the patience and do not hesitate to contact me if you have issues.
Cheers,
Alessandro

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Nov 15, 2019

Dear @mado89, @danielskatz, @ctsilva, I updated the code to reflect the changes in the original website I web-scrape. Tomorrow, I will update the github files so that they be downloaded and run. In the menawhile, I updated the readme.md file as well, and added the MIT license as requested. Thanks for the patience

I am sorry but I can't see the license file. I am looking at the master branch. I guess my problems are also related to the issue I have just created (alessandroBombelli/pyHoops#56). When I inspect the tag "1.5" the project looks more like a python package (although LICENSE is located in pyHoops and not in the root directory).

In order to not spend too much time digging around and looking for the code to review, could I ask you to clean up the repository and notify us once this issue is fixed?
To be precise I'd like to open the repository using https://github.com/alessandroBombelli/pyHoops and see the branch to be reviewed displayed (in case this is not the master branch you can change that somewhere in github afaik). Sorry for all the maintenance extra work but I feel your work will benefit from this.

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 15, 2019

Dear @mado89, @danielskatz, @ctsilva, I updated the code to reflect the changes in the original website I web-scrape. Tomorrow, I will update the github files so that they be downloaded and run. In the menawhile, I updated the readme.md file as well, and added the MIT license as requested. Thanks for the patience

I am sorry but I can't see the license file. I am looking at the master branch. I guess my problems are also related to the issue I have just created (alessandroBombelli/pyHoops#56). When I inspect the tag "1.5" the project looks more like a python package (although LICENSE is located in pyHoops and not in the root directory).

In order to not spend too much time digging around and looking for the code to review, could I ask you to clean up the repository and notify us once this issue is fixed?
To be precise I'd like to open the repository using https://github.com/alessandroBombelli/pyHoops and see the branch to be reviewed displayed (in case this is not the master branch you can change that somewhere in github afaik). Sorry for all the maintenance extra work but I feel your work will benefit from this.

Hi @mado89, I updated yesterday the repository https://github.com/alessandroBombelli/pyHoops and, more specifically,
(i) I modified the "Installation" section specifying that, for now, the only way to download the file is to download the "tests" folder which contains the file itself
(ii) I added the MIT license in the "License" section (on top of the link to the MIT license website)

As example, my "Installation" section now says

Installation:
The most updated version of pyHoops is directly accessible via the folder tests in this repository. The package pyHoops.py would need to be saved in the same folder as the main code calling it (see Section "Example"). pyHoops will be soon added to PyPi, so that an automatic installation via pip is also possible.

If you are not seeing this, we might be checking different branches. That is my master branch as updated yesterday.
Going back to your advice when it comes to PyPi, you are totally right. I actually followed those very same instructions when I initially created the package, and it worked the first time. I modified the package and wanted to update the version, and all of a sudden the same command I was running from the terminal to upload/update the package in PyPi stopped working. After an unsuccessful afternoon, I gave up. I will follow your advice and try again. If it worked the first time, it should work again. Again, I agree with you in the sense that the current layout is not consistent with what a good package should look like, and I apologize for that

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Nov 15, 2019

This is what I see:

Screenshot from 2019-11-15 21-58-11

@alessandroBombelli

This comment has been minimized.

Copy link

@alessandroBombelli alessandroBombelli commented Nov 15, 2019

This is what I see:

Screenshot from 2019-11-15 21-58-11

That looks right. The "Installation" and "License" sections have been updated though. As example, now I mention that pyHoops will be uploaded to PyPi in the next future and advise the (not so ideal) direct download of the "tests" file. If that is what you see, we are looking at the same thing.

As a side note, there is already a PyPi version of pyHoops (https://pypi.org/project/pyHoops/), which I created following the tutorial you suggested. It is a deprecated version I wanted to update, and that is when I started having issues. I will try to solve the issue during the weekend.

In the meanwhile, thanks for the help and the suggestions.

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 27, 2019

👋 @mado89 - How is your review proceeding? Specifically, what is blocking you at this point?

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Nov 27, 2019

👋 @ctsilva - Have you been able to start your review? I don't see any checked boxes in your checklist.

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Dec 2, 2019

wave @mado89 - How is your review proceeding? Specifically, what is blocking you at this point?

For me its the general structure of the project. For now it looks more like a utility script that needs to be copied. Although there is a defunct "setup.py" on a different branch. The missing LICENSE file is also on a different branch. So the review would need to consider different branches with different problems.

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Dec 2, 2019

For me its the general structure of the project.

What do you suggest for the author to do to improve this?
Can this be reviewed in its current state? (An option is to pause the review until some changes are made.)

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Dec 6, 2019

@mado89 - Can you make some specific suggestions? (see my previous comment)

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Dec 7, 2019

Review:

I now - finally - had time to give the software a thorough review.
As mentioned in my previous comments I have the feeling that the author needs to polish the structure of the project. Some (vital) aspects such as the LICENSE file and setup.py are spread over different branches.
Moreover, the main software is duplicated (once in root, once in tests).
The generated graphs in the test directory need some work in my opinion. In almost every graph the logo is overlapping with the graph. Sometimes even with the title (I opened an issue for this, alessandroBombelli/pyHoops#57).

Paper:

  • The paper describing the software lacks some scientific rigour. The need of the software is only justified by references to popculture or interviews. Especially in sport science what people tell you in interviews differs quite a lot to what is actually done behind the scences (At least in my experience). This is not to say that there is no need for pyHoops.
  • The architecture of the software is not described thoroughly. Moreover, I was unable to identify the 'two major blocks' mentioned in the paper when looking at the software.
  • The paper mentiones 'indices' but does not describe which indices/metrics are computed and whether and how additional ones could be added to the software.

Software qualitiy:
In my opinion the software could benefit from a refactoring.
As of now it seems that it does not adhere to the principles of object oriented programming.
There are a couple of functions/methods but no checks on input types etc.
Moreover, it seems that webparsing and data processing is tied together which should not be the case.
I'd suggest to have different utility classes for extracting the information from the web and processing it.
This would also solve your problem of changing websites. Moreover, someone could write additional parsers for different websites (if they exist. I haven't checked up on this)

The software seems to be a useful tool. Although, I have to disclose that I do not work in the field of basketball. However, as of now it looks more like a utility script rather than a complete software. I'd suggest improving on the comments made and resubmitting it.

I hope my comments make sense to you @alessandroBombelli If not please let me know so that I can assist in clarifying.

@mado89

This comment has been minimized.

Copy link
Collaborator

@mado89 mado89 commented Dec 7, 2019

@mado89 - Can you make some specific suggestions? (see my previous comment)

@danielskatz: I made some comments above and opened some issues. My main concern is that there are some problems with using git / version control in this project (see alessandroBombelli/pyHoops#56 and alessandroBombelli/pyHoops#49).
If I am too strict on this matter please let me know!

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Dec 8, 2019

Thanks @mado89 - I don't think you are being too strict

One of the requirements for JOSS is

The software should be feature-complete (no half-baked solutions) and designed for maintainable extension (not one-off modifications). Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable

I think the things you are asking for here fall into the

designed for maintainable extension (not one-off modifications)

category

@danielskatz

This comment has been minimized.

Copy link

@danielskatz danielskatz commented Dec 8, 2019

@mado89 - I also appreciate your time and thoroughness in the review and feedback so far

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.