Merge pull request #3 from CDK-R/master

latest updates
schymane committed Nov 11, 2018
2 parents 6b01c3b + a9332c6 commit 5d074e5031a2b04c99d58364b0a59ed20143f738
Showing with 244 additions and 137 deletions.
  1. +1 −1 README.md
  2. +53 −0 rcdk/R/fingerprint.R
  3. +32 −0 rcdk/R/frags.R
  4. +0 −49 rcdk/man/frags.Rd
  5. +36 −0 rcdk/man/get.exhaustive.fragments.Rd
  6. +72 −0 rcdk/man/get.fingerprint.Rd
  7. +50 −0 rcdk/man/get.murcko.fragments.Rd
  8. +0 −87 rcdk/man/getfp.Rd
@@ -54,4 +54,4 @@ sudo R CMD javareconf
install.packages('rJava', type="source")
```
Further information about R's use of Java can be [found here](https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Java-support).
@@ -1,3 +1,56 @@
#' Generate molecular fingerprints
#'
#' `get.fingerprint` returns a `fingerprint` object representing the molecular fingerprint of
#' the input molecule.
#'
#' @param molecule A \code{jobjRef} object to an \code{IAtomContainer}
#' @param type The type of fingerprint. Possible values are:
#' \itemize{
#' \item standard - Considers paths of a given length. The default length is set by
#' the `depth` argument (6) but can be changed. These are hashed fingerprints, with a
#' default length of 1024
#' \item extended - Similar to the standard type, but takes rings and
#' atomic properties into account
#' \item graph - Similar to the standard type, but simply considers connectivity
#' \item hybridization - Similar to the standard type, but only considers hybridization state
#' \item maccs - The popular 166 bit MACCS keys described by MDL
#' \item estate - 79 bit fingerprints corresponding to the E-State atom types described by Hall and Kier
#' \item pubchem - 881 bit fingerprints defined by PubChem
#' \item kr - 4860 bit fingerprint defined by Klekota and Roth
#' \item shortestpath - A fingerprint based on the shortest paths between pairs of atoms that takes into account ring systems, charges, etc.
#' \item signature - A feature,count type of fingerprint, similar in nature to circular fingerprints, but based on the signature
#' descriptor
#' \item circular - An implementation of the ECFP6 fingerprint
#' }
#' @param fp.mode The style of fingerprint. Specifying "`bit`" will return a binary fingerprint,
#' "`raw`" returns the original representation (usually a sequence of integers) and
#' "`count`" returns the fingerprint as a sequence of counts.
#' @param depth The search depth. This argument is ignored for the
#' `pubchem`, `maccs`, `kr` and `estate` fingerprints
#' @param size The final length of the fingerprint.
#' This argument is ignored for the `pubchem`, `maccs`, `kr`, `signature`, `circular` and
#' `estate` fingerprints
#' @param verbose Verbose output if \code{TRUE}
#' @return an S4 object of class \code{\link{fingerprint-class}} or \code{\link{featvec-class}},
#' which can be manipulated with the fingerprint package.
#' @export
#' @author Rajarshi Guha (\email{rajarshi.guha@@gmail.com})
#' @examples
#' ## get some molecules
#' sp <- get.smiles.parser()
#' smiles <- c('CCC', 'CCN', 'CCN(C)(C)', 'c1ccccc1Cc1ccccc1','C1CCC1CC(CN(C)(C))CC(=O)CC')
#' mols <- parse.smiles(smiles)
#'
#' ## get a single fingerprint using the standard
#' ## (hashed, path based) fingerprinter
#' fp <- get.fingerprint(mols[[1]])
#'
#' ## get MACCS keys for all the molecules
#' fps <- lapply(mols, get.fingerprint, type='maccs')
#'
#' ## get Signature fingerprint
#' ## feature, count fingerprinter
#' fps <- lapply(mols, get.fingerprint, type='signature', fp.mode='raw')
get.fingerprint <- function(molecule, type = 'standard', fp.mode = 'bit', depth=6, size=1024, verbose=FALSE) {
if (is.null(attr(molecule, 'jclass'))) stop("Must supply an IAtomContainer or something coercible to it")
if (attr(molecule, "jclass") != "org/openscience/cdk/interfaces/IAtomContainer") {
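
The `standard` entry above describes hashed fingerprints folded to a fixed `size` (1024 by default). The sketch below illustrates that general idea in base R only. It is a toy: `hash_feature`, `fold_features`, `tanimoto` and the path strings are hypothetical names and inputs chosen for illustration, and the hash is not the one CDK uses. With rcdk installed, the real call would be `get.fingerprint(mol, type = 'standard', size = 1024)`.

```r
# Toy illustration of a hashed, fixed-length fingerprint.
# Assumption: this is NOT the CDK hashing scheme, only the general idea.

# Map one feature string (e.g. a path such as "C-C-N") to a bit position in 1..size.
hash_feature <- function(feature, size = 1024) {
  h <- 0
  for (code in utf8ToInt(feature)) {
    h <- (h * 31 + code) %% size  # reduce at every step so values stay small
  }
  h + 1  # R vectors are 1-indexed
}

# Fold a set of features into the sorted, unique positions of the "on" bits.
fold_features <- function(features, size = 1024) {
  positions <- vapply(features, hash_feature, numeric(1),
                      size = size, USE.NAMES = FALSE)
  sort(unique(positions))
}

# Tanimoto similarity between two sets of "on" bit positions.
tanimoto <- function(a, b) {
  n_union <- length(union(a, b))
  if (n_union == 0) return(0)
  length(intersect(a, b)) / n_union
}

# Hand-written path features for two small molecules (illustrative only).
paths_a <- c("C", "C-C", "C-C-C")
paths_b <- c("C", "N", "C-C", "C-N", "C-C-N")

bits_a <- fold_features(paths_a)
bits_b <- fold_features(paths_b)
sim <- tanimoto(bits_a, bits_b)
```

A smaller `size` makes it more likely that distinct features collide on the same bit, which is the trade-off behind the `size` argument for the hashed fingerprint types.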
@@ -1,3 +1,32 @@
#' Generate Bemis-Murcko Fragments
#'
#' Fragment the input molecule using the Bemis-Murcko scheme
#'
#' A variety of methods for fragmenting molecules are available, ranging from
#' exhaustive fragmentation and ring extraction to more specific methods such as
#' Murcko frameworks. Fragmenting a collection of molecules can be useful for a
#' variety of analyses. In addition, fragment-based analysis can be a useful and
#' faster alternative to traditional clustering of the whole collection, especially
#' when it is large.
#'
#' Note that exhaustive fragmentation of large molecules (with many single bonds) can become
#' time consuming.
#'
#' @param mols A list of `jobjRef` objects of Java class `IAtomContainer`
#' @param min.frag.size The smallest fragment to consider (in terms of heavy atoms)
#' @param as.smiles If `TRUE` return the fragments as SMILES strings. If not, then fragments
#' are returned as `jobjRef` objects
#' @param single.framework If `TRUE`, then a single framework (i.e., the framework consisting of the
#' union of all ring systems and linkers) is returned for each molecule. Otherwise, all combinations
#' of ring systems and linkers are returned
#' @return Returns a list with each element being a list with two elements: `rings` and
#' `frameworks`. Each of these elements is either a character vector of SMILES strings or a list of
#' `IAtomContainer` objects.
#' @author Rajarshi Guha (\email{rajarshi.guha@@gmail.com})
#' @seealso [get.exhaustive.fragments()]
#' @examples
#' mol <- parse.smiles('c1ccc(cc1)CN(c2cc(ccc2[N+](=O)[O-])c3c(nc(nc3CC)N)N)C')[[1]]
#' mf1 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=TRUE)
#' mf2 <- get.murcko.fragments(mol, as.smiles=TRUE, single.framework=FALSE)
get.murcko.fragments <- function(mols, min.frag.size = 6, as.smiles = TRUE, single.framework = FALSE) {
if (!is.list(mols)) mols <- list(mols)
klasses <- unlist(lapply(mols, function(x) attr(x, "jclass")))
@@ -22,6 +51,9 @@ get.murcko.fragments <- function(mols, min.frag.size = 6, as.smiles = TRUE, sing
return(ret)
}
#' @inherit get.murcko.fragments
#' @return returns a list of length equal to the number of input molecules. Each
#' element is a character vector of SMILES strings or a list of `jobjRef` objects.
get.exhaustive.fragments <- function(mols, min.frag.size = 6, as.smiles = TRUE) {
if (!is.list(mols)) mols <- list(mols)
klasses <- unlist(lapply(mols, function(x) attr(x, "jclass")))
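
The documentation above says `get.murcko.fragments` returns one element per molecule, each holding `rings` and `frameworks`, and suggests fragments as a cheaper alternative to clustering. A minimal base R sketch of that kind of grouping follows. The `mock_result` list is hand-written to mimic the documented return shape with `as.smiles = TRUE`; it is not actual rcdk output, and the SMILES strings are placeholders.

```r
# Hand-written stand-in for the documented return shape of
# get.murcko.fragments(mols, as.smiles = TRUE). NOT real rcdk output.
mock_result <- list(
  list(rings = c("c1ccccc1"), frameworks = c("c1ccccc1")),
  list(rings = c("c1ccccc1", "c1ccncc1"),
       frameworks = c("c1ccc(cc1)Cc2ccncc2")),
  list(rings = c("c1ccccc1"), frameworks = c("c1ccccc1"))
)

# Pull out the frameworks of each molecule.
frameworks_per_mol <- lapply(mock_result, function(x) x$frameworks)

# Count how often each framework occurs across the collection.
framework_counts <- sort(table(unlist(frameworks_per_mol)), decreasing = TRUE)

# Group molecule indices by shared framework.
mol_index <- rep(seq_along(frameworks_per_mol), lengths(frameworks_per_mol))
groups <- split(mol_index, unlist(frameworks_per_mol))
```

Here `groups` maps each framework string to the indices of the molecules that contain it, so molecules sharing a scaffold end up together without computing any pairwise similarity.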

This file was deleted.

