max candidates parameter? #25
schymane commented Feb 25, 2019
There is no explicit max candidates parameter at this stage, as far as I am aware. This is also somewhat problematic because candidates are retrieved in effectively random order, so if you only retrieve a portion, you may be missing “the one”. My gut feeling is that, in general, it is difficult to pick the right answer once there are more than 1,000 candidates, and above 10,000 candidates the chances of a good result are minimal. However, a lot of this depends on your metadata and context, as well as, of course, on the quality and distinctiveness of the fragmentation spectrum.
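To put numbers on the truncation problem: if retrieval order is effectively random, keeping only the first k of n candidates retains the correct structure with probability k/n. The sketch below assumes the correct structure is in the full list at all; `max_candidates` is a hypothetical cap (the comment above notes there is no such parameter), and the 1,000 / 10,000 labels simply encode the rule of thumb above, not fixed limits.

```python
def chance_true_candidate_kept(n_candidates: int, max_candidates: int) -> float:
    """Probability that the correct structure survives a hypothetical cap.

    Assumes the correct structure is in the full list and that retrieval
    order is effectively random, so every position is equally likely.
    """
    if n_candidates <= 0 or max_candidates < 0:
        raise ValueError("need n_candidates > 0 and max_candidates >= 0")
    kept = min(max_candidates, n_candidates)
    return kept / n_candidates


def triage_candidate_count(n_candidates: int) -> str:
    """Label a candidate count with the rough 1,000 / 10,000 rule of thumb."""
    if n_candidates < 0:
        raise ValueError("candidate count cannot be negative")
    if n_candidates <= 1_000:
        return "manageable"
    if n_candidates <= 10_000:
        return "difficult to pick the right answer"
    return "good result unlikely"


if __name__ == "__main__":
    # 5,000 candidates truncated to a hypothetical cap of 1,000.
    print(chance_true_candidate_kept(5_000, 1_000))  # prints 0.2
    print(triage_candidate_count(5_000))
```

For example, capping 5,000 candidates at 1,000 keeps the correct structure only one time in five.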
If you search with very wide mass windows, you will retrieve far too many candidates and will need to narrow the search window, if possible. Through the web interface, you retrieve the candidates first, can see how many there are, and then decide whether to proceed. If you use the command line, I would recommend running your queries on a small database (KEGG, HMDB, etc.) before moving on to PubChem, so you get a feel for the likely candidate numbers first (candidate numbers on PubChem and ChemSpider are orders of magnitude greater than on the smaller databases).
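To see why the window width matters so much, here is the arithmetic for a relative (ppm) window, assuming the search window is expressed as a symmetric ppm deviation around the neutral monoisotopic mass. The function name and the 300 Da / 5 ppm / 50 ppm figures are illustrative only, not values taken from this thread.

```python
from typing import Tuple


def mass_window(neutral_mass: float, ppm: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds of a symmetric ppm search window.

    Assumption: window half-width = mass * ppm / 1e6 around the neutral
    monoisotopic mass; check how your own MetFrag settings define it.
    """
    if neutral_mass <= 0 or ppm < 0:
        raise ValueError("mass must be positive and ppm non-negative")
    half_width = neutral_mass * ppm / 1e6
    return neutral_mass - half_width, neutral_mass + half_width


if __name__ == "__main__":
    for ppm in (5, 50):
        lower, upper = mass_window(300.0, ppm)
        # Print the window bounds and its total width in Da.
        print(f"{ppm} ppm: {lower:.4f} to {upper:.4f} Da (width {upper - lower:.4f} Da)")
```

For a 300 Da compound, 5 ppm gives a window of roughly ±0.0015 Da, while 50 ppm gives roughly ±0.015 Da. The window is ten times wider, so on a large database such as PubChem it can take in many more candidates.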
Note that because the scoring is normalized, and MetFrag now has very many scoring options, giving a general score cut-off is challenging, as it depends on your settings every time. If you score on fragmentation alone, you can still get a score of 1 with a very poor match, as long as all the other candidates are even worse.
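A toy illustration of why a normalized score of 1 says little on its own: if each score is divided by the best score among the retrieved candidates (an assumption made for this sketch, consistent with the behaviour described above but not taken from MetFrag's code), the top candidate always ends up at 1.0, however weak its raw match is. Candidate IDs and raw scores here are made up.

```python
from typing import Dict


def normalize_scores(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Divide each candidate's raw score by the best raw score in the list.

    Assumption: max-normalization over the retrieved candidates, which
    reproduces the behaviour described above; not copied from MetFrag.
    """
    if not raw_scores:
        return {}
    best = max(raw_scores.values())
    if best <= 0:
        # No candidate scored at all; avoid dividing by zero.
        return {cid: 0.0 for cid in raw_scores}
    return {cid: score / best for cid, score in raw_scores.items()}


if __name__ == "__main__":
    good_match = {"A": 0.80, "B": 0.20}
    poor_match = {"A": 0.05, "B": 0.01}
    # Candidate "A" gets 1.0 in both lists, although its raw score
    # differs sixteen-fold between them.
    print(normalize_scores(good_match)["A"], normalize_scores(poor_match)["A"])  # prints 1.0 1.0
```

Because the top score is pinned to 1.0 in every query, a fixed cut-off on the normalized value cannot separate a good match from the "least bad" one.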
Some reading:
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-016-0115-9
http://www.chemconnector.com/2017/12/08/guest-post-by-emma-schymanski-suspect-screening-with-metfrag-and-the-comptox-chemistry-dashboard/
Hope that helps for now…
Thanks for your answer! I understand that everything depends on the fragmentation spectrum. What I would like to do is run a full dataset directly from the command line. I saw that there are some pre-processing filters, and I can also add some post-processing filters, but to reduce computing time only the pre-processing filters look interesting. Thanks again for your answer!
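The reasoning about pre-processing filters can be illustrated with a toy pipeline: pre-filters remove candidates before the expensive scoring step, whereas post-filters only trim the output after the scoring cost has been paid. The candidate fields, the `connected` flag, `is_connected` and the scoring function are all hypothetical stand-ins, not MetFrag's actual filter classes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate record; real candidates carry many more fields.
Candidate = Dict[str, object]
CandidateFilter = Callable[[Candidate], bool]


def run_pipeline(
    candidates: List[Candidate],
    pre_filters: List[CandidateFilter],
    score_fn: Callable[[Candidate], float],
    post_filters: List[CandidateFilter],
) -> Tuple[List[Tuple[Candidate, float]], int]:
    """Pre-filter, score the survivors, then post-filter.

    Returns the final (candidate, score) pairs and how many candidates
    went through the expensive scoring step.
    """
    # Pre-processing: candidates dropped here are never scored.
    survivors = [c for c in candidates if all(f(c) for f in pre_filters)]
    # Stand-in for the costly in silico fragmentation and scoring.
    scored = [(c, score_fn(c)) for c in survivors]
    # Post-processing: trims the output, but the scoring cost is already paid.
    final = [(c, s) for c, s in scored if all(f(c) for f in post_filters)]
    return final, len(survivors)


def is_connected(candidate: Candidate) -> bool:
    """Hypothetical filter: keep only single connected structures."""
    return bool(candidate["connected"])


if __name__ == "__main__":
    cands = [{"id": i, "connected": i % 2 == 0} for i in range(10)]

    def score(c: Candidate) -> float:
        return float(c["id"])  # dummy score

    _, scored_pre = run_pipeline(cands, [is_connected], score, [])
    _, scored_post = run_pipeline(cands, [], score, [is_connected])
    print(scored_pre, scored_post)  # prints 5 10
```

Both runs return the same five candidates, but the post-filter run scores all ten, which is why the pre-processing filters are the ones that save computing time.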
jsaintvanne closed this Mar 1, 2019
schymane commented Mar 1, 2019
You may (or may not!) be interested in some of the functions I’ve played around with here:
https://github.com/schymane/ReSOLUTION
https://github.com/schymane/ReSOLUTION/blob/master/R/MetFragConfigR.R
In my latest scripts (not yet online) I ended up hard-coding a limit based on file size. However, I ran all the MetFrag queries in advance; the problem was rather that the function I used for post-processing couldn’t cope with the size of the generated Excel files. Someone has pointed me to a better one, but I have not yet had a chance to try it.
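The comment does not say how the file-size limit was implemented. One simple approach, sketched here as an assumption, is to check each result file's size with `os.path.getsize` and set aside anything too large for the post-processing step. The 50 MB cap, the file names and `split_by_size` are hypothetical; choose a limit that suits your own tooling.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

# Hypothetical cap: tune it to what your post-processing tool can handle.
MAX_RESULT_BYTES = 50 * 1024 * 1024


def split_by_size(
    paths: Iterable[str], max_bytes: int = MAX_RESULT_BYTES
) -> Tuple[List[str], List[str]]:
    """Split result files into (small enough to process, too large)."""
    small, large = [], []
    for path in paths:
        if os.path.getsize(path) <= max_bytes:
            small.append(path)
        else:
            large.append(path)
    return small, large


if __name__ == "__main__":
    # Demo with two throwaway files and a tiny 100-byte cap.
    with tempfile.TemporaryDirectory() as tmp:
        tiny = os.path.join(tmp, "tiny_results.csv")
        big = os.path.join(tmp, "big_results.csv")
        with open(tiny, "wb") as fh:
            fh.write(b"x" * 10)
        with open(big, "wb") as fh:
            fh.write(b"x" * 1_000)
        ok, too_large = split_by_size([tiny, big], max_bytes=100)
        print([os.path.basename(p) for p in ok], [os.path.basename(p) for p in too_large])
```

Files in the "too large" list can then be handled separately, for example with a more memory-efficient reader, rather than crashing the whole post-processing run.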
I basically run my dataset on a small database first before trying PubChem, and just leave the long jobs running for as long as they need. I have more than enough to keep me busy while waiting for results, so I have not really optimized it time-wise ;-)
If you are using the command line with PubChem and will be doing this a lot, it may be worth setting up PubChem as a local database to increase speed (that is what the web interface does). However, this will not work if you need patent and reference information, as that has to be retrieved via the web services and is not in the downloaded mirror. I am not the right person to ask about running PubChem as a local database, as I have not done this myself ;-) since I need the references and patents, but the experts in Halle can help you there!
jsaintvanne commented Feb 20, 2019
Hello!
I'm searching for a max candidates parameter. Does it exist?
If not, how can I set some kind of threshold on the score, or something similar, to reduce processing time?
Thank you for your answers!
Bests,
Julien