
How to retrieve all candidates in PubChem using MetFragR or MetFragCL #24

Open
JustinZZW opened this issue May 20, 2019 · 8 comments

@JustinZZW

commented May 20, 2019

Hi,
I would like to know how to retrieve all possible candidates from PubChem for a given NeutralPrecursorMass and DatabaseSearchRelativeMassDeviation using MetFragR. I noticed that CASMI 2017 states: "The candidates were retrieved as InChI structures from PubChem (mirror dated 2017-02-03) using MetFrag 2.4.2". This is easy to do in the MetFrag web server, but I do not know how to do it in MetFragR or MetFragCL.
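For context, the relative mass deviation defines a window around the neutral mass, and every database entry inside that window becomes a candidate. A minimal base-R sketch of that arithmetic, assuming the deviation is given in ppm as in MetFrag parameter files. The helper name ppm_window and the 5 ppm value are illustrative assumptions, not MetFrag code or defaults; 201.0776 is the simazine neutral mass used later in this thread.

```r
# Illustrative sketch: search window for a neutral monoisotopic mass with a
# relative mass deviation in ppm (unit assumed, as in MetFrag parameter files).
ppm_window <- function(neutral_mass, ppm) {
  delta <- neutral_mass * ppm / 1e6   # absolute tolerance in Da
  c(lower = neutral_mass - delta, upper = neutral_mass + delta)
}

# 5 ppm is only an example value
w <- ppm_window(201.0776, 5)
print(w)
```

At this mass, 5 ppm corresponds to roughly ±0.001 Da.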

Thanks very much!

@JustinZZW JustinZZW changed the title How to retrieve all candidates using MetFragR or MetFragCL How to retrieve all candidates in PubChem using MetFragR or MetFragCL May 20, 2019

@schymane


commented May 20, 2019

I have example functions, data and documentation here:
https://github.com/schymane/ReSOLUTION/
https://github.com/schymane/ReSOLUTION/blob/master/R/MetFragConfigR.R

We are looking at improving this on our side and streamlining the documentation. In the meantime, if you can install this package from GitHub, it should get you started. The IPB team also has other workflows that may be useful for you.

@JustinZZW


commented May 20, 2019

Thanks, Emma,
I tried the examples in the ReSOLUTION package as follows:

library(ReSOLUTION)

# example MS/MS peak list shipped with ReSOLUTION
peaklist_path <- system.file("extdata","EA026206_Simazine_peaks.txt",package="ReSOLUTION")
test_dir <- "I:/software/MetFrag/test_190520"
# write a MetFragCL config file for the neutral mass of simazine, searching PubChem
config_file <- MetFragConfig(201.0776, 
                             "[M+H]+", 
                             "Simazine_neutralMass_PubChem", 
                             peaklist_path, 
                             test_dir, 
                             DB="PubChem", 
                             neutralPrecursorMass=TRUE)
MetFragAdductTypes <- read.csv(system.file("extdata","MetFrag_AdductTypes.csv",package="ReSOLUTION"))
metfrag_dir <- "I:/software/MetFrag/"

MetFragCL_name <- "MetFrag2.4.3-CL.jar"
# run the MetFragCL jar on the generated config file
runMetFrag(config_file, metfrag_dir, MetFragCL_name)

It finds 1022 candidates but returns a final result with only 682. Which function should I use to export all 1022 candidates? Or how do I modify the parameters to download all 1022 candidates?

This is the log:

INFO de.ipbhalle.metfraglib.database.OnlineExtendedPubChemDatabase - Fetching candidates from PubChem
INFO de.ipbhalle.metfraglib.database.OnlineExtendedPubChemDatabase - Fetching PubMed references
INFO de.ipbhalle.metfraglib.database.OnlineExtendedPubChemDatabase - Fetching patents
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Got 1022 candidate(s)
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 10 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 20 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 30 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 40 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 50 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 60 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 70 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 80 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 90 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 100 %
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Processed 951 candidate(s)
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 71 candidate(s) were discarded before processing due to pre-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded during processing due to errors
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 269 candidate(s) discarded after processing due to post-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Stored 682 candidate(s)
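The counts in this log reconcile: 1022 retrieved = 71 pre-filtered + 951 processed, and 951 processed = 269 post-filtered + 682 stored (with 0 errors). A small base-R sketch that pulls these numbers out of MetFrag log lines and checks that arithmetic. summarise_metfrag_log is a hypothetical helper (not part of MetFragR or ReSOLUTION), the regular expressions assume the exact message wording shown above, and counting error candidates inside "Processed" is an assumption that makes no difference here because there were none.

```r
# Hypothetical helper: extract the first integer matched by a log pattern.
extract_count <- function(lines, pattern) {
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_real_)
  matched <- regmatches(hit[1], regexpr(pattern, hit[1]))
  as.numeric(gsub("[^0-9]", "", matched))   # keep only the digits
}

# Patterns assume the MetFrag 2.4.3-CL message wording quoted above.
summarise_metfrag_log <- function(lines) {
  patterns <- c(
    retrieved    = "Got [0-9]+ candidate",
    prefiltered  = "[0-9]+ candidate\\(s\\) were discarded before processing",
    processed    = "Processed [0-9]+ candidate",
    errors       = "[0-9]+ candidate\\(s\\) discarded during processing",
    postfiltered = "[0-9]+ candidate\\(s\\) discarded after processing",
    stored       = "Stored [0-9]+ candidate"
  )
  vapply(patterns, function(p) extract_count(lines, p), numeric(1))
}

log_lines <- c(
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Got 1022 candidate(s)",
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Processed 951 candidate(s)",
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 71 candidate(s) were discarded before processing due to pre-filtering",
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded during processing due to errors",
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 269 candidate(s) discarded after processing due to post-filtering",
  "INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Stored 682 candidate(s)"
)
counts <- summarise_metfrag_log(log_lines)

# retrieved = pre-filtered + processed
stopifnot(counts[["retrieved"]] == counts[["prefiltered"]] + counts[["processed"]])
# processed = errors + post-filtered + stored (error handling assumed)
stopifnot(counts[["processed"]] == counts[["errors"]] + counts[["postfiltered"]] + counts[["stored"]])
print(counts)
```

For a complete log file, read it with readLines() first and pass the result to summarise_metfrag_log().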

@schymane


commented May 20, 2019

The log indicates that your post-processing settings are reducing the number of candidates:
de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 269 candidate(s) discarded after processing due to post-filtering

I highly recommend choosing some examples and trying them on the web interface: check which settings end up in the parameter files (which you can download) and use these to choose the options that suit what you want to do.

@JustinZZW


commented May 20, 2019

Thanks, I can set "filter_by_InChIKey" and "filter_isotopes" to FALSE to turn off the post-filtering.
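Applied to the call from the earlier comment, that change could look like the sketch below. It assumes, as the sentence above implies, that filter_by_InChIKey and filter_isotopes are arguments of ReSOLUTION's MetFragConfig(); check the function's documentation for your installed version. The paths are the ones from the original post and need adapting.

```r
library(ReSOLUTION)

# Same inputs as the earlier example; only the two filter arguments are new.
# Assumption: filter_by_InChIKey and filter_isotopes are MetFragConfig() arguments.
peaklist_path <- system.file("extdata", "EA026206_Simazine_peaks.txt", package = "ReSOLUTION")
test_dir <- "I:/software/MetFrag/test_190520"   # output directory from the original post
config_file <- MetFragConfig(201.0776, "[M+H]+", "Simazine_neutralMass_PubChem",
                             peaklist_path, test_dir,
                             DB = "PubChem", neutralPrecursorMass = TRUE,
                             filter_by_InChIKey = FALSE,  # switched off as described above
                             filter_isotopes = FALSE)     # switched off as described above
runMetFrag(config_file, "I:/software/MetFrag/", "MetFrag2.4.3-CL.jar")
```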

However, the log still shows that 70 candidates were discarded before processing due to pre-filtering. How do I turn this off?

Besides, when I try the same neutral mass in the web server with the same settings, it only retrieves 700 candidates. Why is there a difference? On my local computer I use MetFrag2.4.3-CL.

I have attached the config and log from my local run, and the parameter file from the web server:
MetFragWeb_Parameters.txt

Simazine_neutralMass_PubChem_new_2_log.txt

Simazine_neutralMass_PubChem_new_2_config.txt

@schymane


commented May 20, 2019

These candidates are salts / mixtures / disconnected structures that cannot possibly be observed at the mass of interest, so they are excluded entirely. This is a consequence of the way the data are retrieved from PubChem. Since they will not be observed at the mass you have given, it makes no sense to include them, so we have not added an "on/off" option for this case.
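Such records consist of more than one disconnected component, and in SMILES notation the components are separated by ".". A minimal base-R sketch of that check, as an illustration only and not MetFrag's actual pre-filter. n_components and is_disconnected are hypothetical helpers, and the "salt form" entry is a made-up example record.

```r
# Illustrative sketch (not MetFrag's implementation): count the disconnected
# components in SMILES strings; components are separated by "." in SMILES.
n_components <- function(smiles) {
  lengths(strsplit(smiles, ".", fixed = TRUE))
}

# A record is disconnected if it has more than one component.
is_disconnected <- function(smiles) n_components(smiles) > 1

candidates <- c(
  simazine  = "CCNC1=NC(=NC(=N1)Cl)NCC",
  salt_form = "CCNC1=NC(=NC(=N1)Cl)NCC.Cl"   # hypothetical salt/mixture record
)
print(is_disconnected(candidates))
```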

@schymane


commented May 20, 2019

To check compatibility between the web and CL versions in detail, you have to compare the parameter files. Some discrepancies can arise if one version uses the local PubChem mirror while the other uses a live online PubChem query.
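One way to make that comparison systematic is to load both files and list every key whose value differs or that appears in only one file. The sketch below assumes the plain `Key = Value` line format of MetFrag parameter files; parse_params and diff_params are hypothetical helpers, and the example values are made up for illustration.

```r
# Parse "Key = Value" lines into a named character vector (format assumed).
parse_params <- function(lines) {
  lines <- trimws(lines)
  # keep non-empty, non-comment lines that contain "="
  keep <- nzchar(lines) & !startsWith(lines, "#") & grepl("=", lines, fixed = TRUE)
  lines <- lines[keep]
  keys <- trimws(sub("=.*$", "", lines))       # text before the first "="
  vals <- trimws(sub("^[^=]*=", "", lines))    # text after the first "="
  setNames(vals, keys)
}

# List keys whose values differ, or that are present in only one of the files.
diff_params <- function(a, b) {
  keys <- sort(union(names(a), names(b)))
  va <- unname(a[keys])   # NA where a key is missing from a
  vb <- unname(b[keys])   # NA where a key is missing from b
  differs <- is.na(va) | is.na(vb) | va != vb
  data.frame(key = keys[differs], first = va[differs], second = vb[differs],
             stringsAsFactors = FALSE)
}

# Made-up values, only to show the output shape
web <- parse_params(c("NeutralPrecursorMass = 201.0776",
                      "DatabaseSearchRelativeMassDeviation = 5"))
cl  <- parse_params(c("NeutralPrecursorMass = 201.0776",
                      "DatabaseSearchRelativeMassDeviation = 10"))
d <- diff_params(web, cl)
print(d)
```

With the files attached above, the call would be diff_params(parse_params(readLines("MetFragWeb_Parameters.txt")), parse_params(readLines("Simazine_neutralMass_PubChem_new_2_config.txt"))); the database-related keys are the ones to look at first.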

@JustinZZW


commented May 20, 2019

Thanks a lot for the patient explanation.

I agree that the difference may be caused by the PubChem mirror versus the online query. But how can I tell which one the web server and the CL version use? I compared both parameter files and could not find a parameter that clarifies this.

In addition, if the CL version uses the online query, how can I make the results reproducible?

Thanks again.

@schymane


commented May 20, 2019

You have to check the database parameters to see. On the web, if you tick the references box, it automatically uses the online version. I can't seem to download your parameter file from here. I gave (and posted) a talk on this today; maybe the slides help a little: https://zenodo.org/record/3046373
