[PRE REVIEW]: Hierdenc: Retrieval and clustering of large categorical data sets with locality-sensitive hashing #693
Comments
whedon added the pre-review label Apr 23, 2018
Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. For a list of things I can do to help you, just type:
PDF failed to compile for issue #693 with the following error: Can't find any papers to compile :-(
@wandreopoulos - do you know why the source code is not showing up here? https://sourceforge.net/p/hierdenc/code/ref/master/
wandreopoulos commented Apr 23, 2018
Sorry, I hadn't pushed the files with git. The files are available here.
Should I push with git?
https://sourceforge.net/projects/hierdenc/files/
On Mon, Apr 23, 2018 at 3:42 AM, Arfon Smith wrote:
> @wandreopoulos - do you know why the source code is not showing up here? https://sourceforge.net/p/hierdenc/code/ref/master/
@wandreopoulos - yes, please push to git too.
wandreopoulos commented Apr 25, 2018
I did.
kyleniemeyer commented May 1, 2018
@arfon I don't think I'm the best person to edit this; someone more data-sciency might be better.
@whedon generate pdf
PDF failed to compile for issue #693 with the following error: /app/vendor/ruby-2.3.4/lib/ruby/2.3.0/find.rb:43:in
@whedon generate pdf
whedon added the Python and TeX labels May 4, 2018
@whedon list reviewers
Here's the current list of reviewers: https://bit.ly/joss-reviewers
@wandreopoulos - could you take a look at the list above?
wandreopoulos commented May 25, 2018
I can suggest these potential reviewers:
dhhagan
benjamin-lee
d-chambers
jrbourbeau
mlgill
highlando
cadair
zhampel
On Sun, May 13, 2018 at 5:57 PM, Arfon Smith wrote:
> @wandreopoulos - could you take a look at the list above
I can take it.
@wandreopoulos It looks like your list is missing some newlines, so it renders as prose instead of as a list. Here is a recently accepted example with a list that formats correctly: https://github.com/jrbourbeau/pyunfold/blob/master/docs/joss_paper/paper.md
Please also consider addressing community guidelines and examples.
@arfon Correct me if I'm wrong, but I would expect the "repository" link to be viewable in a web browser, rather than a URL that can only be cloned by the Git client.
OK, the editor is @jedbrown
whedon assigned jedbrown Jun 5, 2018
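This is a Python package with no install process (no setup.py; it is meant to be run out of the source directory), no tests, and no examples. It requires modifying source code just to run. The tool is a total of 480 lines of Python and there are no docstrings. I was hoping to hear from @wandreopoulos before reviewer assignment because it appears at first glance to be either "not feature complete" or a "minor utility", though it has potential.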
wandreopoulos commented Jun 16, 2018
Sorry about this. I checked the community guidelines you sent me and I need
to follow them more closely. So, the submission is "not feature complete".
I do have tests and examples, but I need to push them to a new branch. I
will also make a setup.py and the proper docstrings, following the example
you sent me.
I hope it is ok to wait before reviewer assignment, until I push a new
branch.
Thank you for your feedback to improve this software.
On Fri, Jun 15, 2018 at 1:40 PM, Jed Brown wrote:
> This is a Python package with no install process (no setup.py; it is meant to be run out of the source directory), no tests, and no examples. It requires modifying source code just to run. The tool is a total of 480 lines of Python and there are no docstrings. I was hoping to hear from @wandreopoulos before reviewer assignment because it appears at first glance to be either "not feature complete" or a "minor utility", though it has potential.
jedbrown added the paused label Sep 21, 2018
Hi @wandreopoulos - how are you getting along making these changes?
wandreopoulos commented Nov 5, 2018
Dear editors,
I made the changes to follow the community guidelines. I replied to the thread a couple of weeks ago, but the message didn't go through because the thread was paused.
I have pushed the updates to the sourceforge repository.
git clone ssh://billandreo@git.code.sf.net/p/hierdenc/code hierdenc-code
I am sorry for the delay. It has been challenging to find time lately.
The project is a database-backed tool for locality-sensitive hashing of
large categorical datasets. It can be used for search or clustering of
categorical datasets of any size.
Looking forward to hearing back from you.
Thank you for any feedback,
Bill
On Thu, Nov 1, 2018 at 6:13 PM Arfon Smith wrote:
> Hi @wandreopoulos - how are you getting along making these changes?
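Have you tested with Python-3 (up to 3.6, as setup.py claims to support)? The dependency on MySQL-python (https://github.com/farcepest/MySQLdb1), which has been abandoned for years, seems to prevent it.
Please also double-check the review criteria (https://joss.readthedocs.io/en/latest/review_criteria.html), especially community guidelines (I don't see anything about contributions or support) and documentation. I'm perplexed by the software design to use classes with no members or data as a wrapper around a single run method that does not access self. The docstrings are then associated with the class but meant to apply to the method. Seems like all of these would be better as plain functions, but documentation should go in the right place in any case.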
wandreopoulos commented Nov 29, 2018
Thank you.
I have replaced most of the classes with plain functions, except for 2
classes (Insert_Pairwise_Sims and Insert_Radius_Sim_Density) that inherit
from multiprocessing.Process. That is a simpler design.
I have added community guidelines about contributions and support in the
README.txt.
I have replaced MySQL-python with mysqlclient in the requirements.txt.
At the moment I am running the code with python2.7. Therefore I show only
python2.7 compatibility in setup.py.
Even though I do follow the py3 syntax through the code, the mysql
connectivity will need some more testing with python3 in the future.
Thanks,
Bill
On Sat, Nov 10, 2018 at 3:00 PM Jed Brown wrote:
> Have you tested with Python-3 (up to 3.6, as setup.py claims to support)? The dependency on MySQL-python <https://github.com/farcepest/MySQLdb1> (which has been abandoned for years) seems to prevent it.
> Please also double-check the review criteria <https://joss.readthedocs.io/en/latest/review_criteria.html>, especially community guidelines (I don't see anything about contributions or support) and documentation. I'm perplexed by the software design to use classes with no members or data as a wrapper around a single run method that does not access self. The docstrings are then associated with the class but meant to apply to the method. Seems like all of these would be better as plain functions, but documentation should go in the right place in any case.
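The design point in the quoted review can be illustrated with a minimal sketch. The names `BandHasherTask` and `hash_bands` are hypothetical and are not taken from the hierdenc source; the sketch only contrasts a stateless class wrapping a `run` method with an equivalent plain function whose docstring sits on the callable itself.

```python
class BandHasherTask:
    """Hash each band of a signature (this docstring lands on the class)."""

    def run(self, signature, rows_per_band):
        # `self` is never used: the class carries no members or data.
        return [hash(tuple(signature[i:i + rows_per_band]))
                for i in range(0, len(signature), rows_per_band)]


def hash_bands(signature, rows_per_band):
    """Split `signature` into bands of `rows_per_band` values and hash each band.

    Returns a list with one integer hash per band.
    """
    return [hash(tuple(signature[i:i + rows_per_band]))
            for i in range(0, len(signature), rows_per_band)]


if __name__ == "__main__":
    sig = [3, 1, 4, 1, 5, 9]
    # Both forms compute the same thing; only the function documents itself.
    assert BandHasherTask().run(sig, 2) == hash_bands(sig, 2)
    print(BandHasherTask.run.__doc__)
    print(hash_bands.__doc__.splitlines()[0])
```

With the plain function, `help(hash_bands)` shows the documentation directly, whereas the class form leaves `run` itself undocumented.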
Thanks for the updates and sorry about my slow reply in this thread. Please take a look at Packaging Python Projects and consider following the recommendations strictly unless you have a compelling reason not to (current directory structure is unusual). The paper should also include some discussion of related software. Under what circumstances is hierdenc a better choice for researchers than other software? Can you also check whether the requirements are accurate? I presume If one installs the package as is, they get
wandreopoulos commented Feb 13, 2019
Hi Lorena,
I am really sorry for the delay. I just had an extremely busy couple of months.
I will be improving this software and I plan to work on it within the next week or two.
I have been using the software for some data analysis in the meantime, but I need to return to the earlier comments and see how I can address them.
Thank you very much for your email and consideration,
Bill
On Wed, Feb 13, 2019 at 7:53 AM Lorena A. Barba wrote:
--
Thanks,
Bill
______________
William B. Andreopoulos, Ph.D.
Joint Genome Institute
LBNL
wandreopoulos commented Feb 23, 2019
Hello: Thank you for all your feedback. I am sorry for my terribly slow response. I just went through a hectic period.
I worked on setup.py and the directory structure, according to the link you sent me.
I added to paper.md:
I have checked the requirements and fixed them. I removed SciPy and Pandas.
The scripts/create_db.py functionality is now provided in a Python script that sets up the schema of the database. It is not installed under the bin directory; it can be run from the hierdenc/scripts directory.
This is fixed, and the Python package hierdenc is now installed as described in the README.md file (see sections 4.1-4.2). I used a virtualenv and then imported the package in Python to check that it gets installed.
@jedbrown -- Are you planning to move this paper into review?
@wandreopoulos Thanks for your work on this. The install currently doesn't actually install anything, and if I
wandreopoulos commented Feb 24, 2019
Sorry. hierdenc is provided as a package under a directory with an __init__.py file.
wandreo-lm:hierdenc andreopo$ python
I also tried
Please remember to pull or clone the latest code:
git clone ssh://billandreo@git.code.sf.net/p/hierdenc/code hierdenc-code
I tried again and it looks like an environment issue. Please try this to reproduce in a clean environment.
We change directory to force the import to use the installed version rather than the current directory.
wandreopoulos commented Mar 10, 2019
Hi Jed,
I ran those commands on my mac and it finds the hierdenc package. I did
this in a new Terminal window. I ran the install with sudo though, which
added the directory to my $PATH. Was your $PATH updated?
(TESTENV) $ sudo python setup.py install
(TESTENV) $ cd /
(TESTENV) $ python -c 'from hierdenc import *'
(TESTENV) $ python
Python 2.7.15 (default, Aug 22 2018, 16:36:18)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from hierdenc import *
>>>
(TESTENV) $ echo $PATH
/.../hierdenc/TESTENV/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin
Thanks,
Bill
On Mon, Mar 4, 2019 at 3:10 PM Jed Brown wrote:
> I tried again and it looks like an environment issue. Please try this to reproduce in a clean environment.
> $ virtualenv TESTENV
> $ . TESTENV/bin/activate
> $ python setup.py install
> $ cd /
> $ python -c 'from hierdenc import *'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/home/jed/joss/hierdenc/hierdenc/TESTENV/lib/python3.7/site-packages/Hierdenc_billandreo-1.96-py3.7.egg/hierdenc/BandHasher.py", line 12, in <module>
>     from Constants import *
> ModuleNotFoundError: No module named 'Constants'
> We change directory to force the import to use the installed version rather than the current directory.
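The traceback in the quoted message is consistent with Python 3 having removed implicit relative imports (PEP 328): inside a package, `from Constants import *` looks for a top-level module named `Constants`, not a sibling module. The sketch below builds two throwaway packages in a temporary directory to reproduce the failure and show the explicit relative form. The package names `pkg_implicit` and `pkg_explicit` and the constant `NUM_BANDS` are made up for illustration and are not hierdenc's layout.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, import_line):
    """Create package `name` holding Constants.py and a BandHasher.py that runs `import_line`."""
    pkg = os.path.join(root, name)
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("")
    with open(os.path.join(pkg, "Constants.py"), "w") as f:
        f.write("NUM_BANDS = 4\n")
    with open(os.path.join(pkg, "BandHasher.py"), "w") as f:
        f.write(import_line + "\n")


def try_import(module_name):
    """Return None if the import succeeds, else the name of the exception class."""
    try:
        importlib.import_module(module_name)
        return None
    except ImportError as exc:
        return type(exc).__name__


root = tempfile.mkdtemp()
# Implicit relative import: valid in Python 2, looked up as top-level in Python 3.
make_package(root, "pkg_implicit", "from Constants import *")
# Explicit relative import: resolves the sibling module on Python 2.5+ and 3.
make_package(root, "pkg_explicit", "from .Constants import *")
sys.path.insert(0, root)
importlib.invalidate_caches()

implicit_result = try_import("pkg_implicit.BandHasher")
explicit_result = try_import("pkg_explicit.BandHasher")
print("implicit:", implicit_result)
print("explicit:", explicit_result)
```

On Python 3 the first import is expected to fail and the second to succeed. If Python 2.7 must remain supported, adding `from __future__ import absolute_import` at the top of each module makes both interpreters resolve imports the same way.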
Oh, Python3 was the issue. It works with Python2, though that language will be unmaintained in 9 months and the libraries you use (numpy and sklearn) are dropping support sooner. What do you think about supporting Python-3?
wandreopoulos commented Mar 12, 2019
Hi Jed,
arfon removed the TeX label May 11, 2019
wandreopoulos commented May 13, 2019
Hello:
I just pushed the new code on a new "sqlite3" branch. The tool works with
Python3. I used Python 3.6.8.
I replaced MySQL with sqlite3. With sqlite3 the installation on Macs will
be much easier.
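For context on why this can simplify installation: `sqlite3` is part of the Python standard library, so no database server or separate client driver has to be installed. The sketch below is illustrative only. The table name `band_hashes`, its columns, and the sample rows are assumptions, not hierdenc's actual schema or data. It stores one row per (item, band) hash and retrieves candidate neighbours as the items that share at least one band hash with a query item, which is the usual lookup step in banded locality-sensitive hashing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path here would persist the index
conn.execute(
    "CREATE TABLE band_hashes ("
    " item_id TEXT NOT NULL,"
    " band INTEGER NOT NULL,"
    " hash_value INTEGER NOT NULL,"
    " PRIMARY KEY (item_id, band))"
)
# Index the lookup key so collisions are found without scanning the table.
conn.execute("CREATE INDEX idx_band_hash ON band_hashes (band, hash_value)")

rows = [
    ("a", 0, 11), ("a", 1, 22),
    ("b", 0, 11), ("b", 1, 99),  # collides with "a" in band 0
    ("c", 0, 55), ("c", 1, 66),  # collides with nothing
]
conn.executemany("INSERT INTO band_hashes VALUES (?, ?, ?)", rows)


def candidates(conn, item_id):
    """Return ids of other items that collide with `item_id` in at least one band."""
    cur = conn.execute(
        "SELECT DISTINCT other.item_id"
        " FROM band_hashes AS q"
        " JOIN band_hashes AS other"
        "   ON other.band = q.band AND other.hash_value = q.hash_value"
        " WHERE q.item_id = ? AND other.item_id != ?"
        " ORDER BY other.item_id",
        (item_id, item_id),
    )
    return [row[0] for row in cur.fetchall()]


print(candidates(conn, "a"))
print(candidates(conn, "c"))
```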
Thank you for your patience,
Bill
On Sat, Mar 9, 2019 at 4:20 PM Jed Brown wrote:
> Oh, Python3 was the issue. It works with Python2, though that language will be unmaintained in 9 months and the libraries you use (numpy and sklearn) are dropping support sooner. What do you think about supporting Python-3?
danielskatz commented May 20, 2019
@wandreopoulos I've tried installing with your branch and the virtualenv procedure above does not work. Some issues
Also:
whedon commented Apr 23, 2018 (edited)
Submitting author: @wandreopoulos (William Andreopoulos)
Repository: https://git.code.sf.net/p/hierdenc/code
Version: v1.0
Editor: @jedbrown
Reviewer: Pending
Author instructions
Thanks for submitting your paper to JOSS @wandreopoulos. The JOSS editor (shown at the top of this issue) will work with you on this issue to find a reviewer for your submission before creating the main review issue.
@wandreopoulos if you have any suggestions for potential reviewers then please mention them here in this thread. In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission.
Editor instructions
The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type: