Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize function names (removePeaks, clean) #89

Open
schymane opened this issue Jan 21, 2020 · 14 comments
Open

optimize function names (removePeaks, clean) #89

schymane opened this issue Jan 21, 2020 · 14 comments

Comments

@schymane
Copy link

@schymane schymane commented Jan 21, 2020

Discussion at metaRbolomics, these two functions are a bit of a misnoma:

https://github.com/rformassspectrometry/Spectra/blob/master/R/Spectra.R#L401
"removePeaks" is rather "markPeaks" or something (it doesn't actually remove peaks)

`#' - `removePeaks`: *removes* peaks lower or equal to a threshold intensity
#'   value `threshold` by setting their intensity to `0`. With the default
#'   `threshold = "min"` all peaks with an intensity smaller or equal to the
#'   minimal non-zero intensity is set to `0`. If the spectrum is in
#'   profile mode, ranges of successive non-0 peaks <= `threshold` are set to 0.
#'   If the spectrum is centroided, then individual peaks <= `threshold` are set
#'   to 0. Note that the number of peaks is not changed unless `clean` is
#'   called after `removePeaks`.`

Whereas "clean" is not really clean but more analogous to trim

https://github.com/rformassspectrometry/Spectra/blob/master/R/Spectra.R#L412

`#' @param all for `clean`: `logical(1)` whether all 0 intensity peaks should be
#'     removed (`TRUE`) or whether 0-intensity peaks directly adjacent to a
#'     non-zero intensity peak should be kept (`FALSE`).`
sneumann added a commit to sneumann/Spectra that referenced this issue Jan 21, 2020
Related to discussions here at the RFMF workshop (see rformassspectrometry#89 )
Yours, Steffen
@jorainer jorainer added this to the bioc_submission milestone Jun 18, 2020
@jorainer
Copy link
Member

@jorainer jorainer commented Jun 18, 2020

I totally agree that the current clean and removePeaks have misleading names.

I'm OK with renaming clean to trim.

For removePeaks instead of renaming, what if we make the function actually remove the peaks?

@lgatto @sgibb, any thoughts on that?

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 18, 2020

Indeed, very confusing, and I take the blame for this many years ago.

I am not convinced about renaming clean() to trim(). The latter evokes more filterMz and is used as such in MALDIquant.

I would suggest the following changes, including in the signature of removePeaks:

  • change removePeaks(object, threshold = "min", msLevel. = unique(msLevel(object))) to setIntsToZero(object, threshold = minIntensity(), msLevel. = unique(msLevel(object))) (or setToZero) where minIntensity() is a function that returns a vector minimum intensities of a Spectra object. This will simplify the code since threshold will always be numeric.

  • change clean(object, all = FALSE, msLevel. = unique(msLevel(object))) to removeZeroInts (or rmZeroInts) with the same signature.

These names are a bit longer but avoid any ambiguity.

@jorainer
Copy link
Member

@jorainer jorainer commented Jun 18, 2020

Do we need both functions? What are the use cases for them? I used them only in combinations, i.e. first removePeaks followed by a clean (with all = TRUE) to remove peaks which are below a certain threshold. For that we could even have a filterIntensity function instead.

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 18, 2020

  1. I think keeping a trace of peaks that are were set to zero is important. Especially those around non-zero peaks (the all argument).

  2. I like your suggestion to create a filter function for intensities. What about

filterIntensities(object, 
                  threshold = minIntensity(), 
                  all = TRUE, 
                  msLevel. = unique(msLevel(object)))

that filters (i.e. completely remove) peaks and

setIntsToZero(object, 
              threshold = minIntensity(), 
              msLevel. = unique(msLevel(object)))

that sets them to zero.

@jorainer
Copy link
Member

@jorainer jorainer commented Jun 18, 2020

I like the filterIntensities function.

For setIntsToZero, what about using instead:

replaceIntensityBelow(object, 
                                     threshold = min,
                                     value = 0, 
                                     msLevel. = unique(msLevel(object))

with value the user has then the possibility to choose the replacement value.

In addition (don't know if that's too much then) we could also have a filterEmptyPeaks (along the filterEmptySpectra that does then what the clean did. filterIntensity could then be defined as:

filterIntensity(object,
                       intensity = c(min, Inf),
                       msLevel. = unique(msLevel(object)))

This function would then match the filterRt and filterMz functions that also take ranges of values.

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 18, 2020

  • Ok for replaceIntensityBelow as suggested.

  • Not sure what the c(min, Inf) means - we want to filter Intensities <= min.

  • Would filterEmptyPeaks be a shortcut for filterIntensity(object, intensity = 0)?

@jorainer
Copy link
Member

@jorainer jorainer commented Jun 19, 2020

the c(min, Inf) was to support also ranges of values instead of a single cut-off to be consistent with filterRt and filterMz that both allow to specify a lower and upper limit - not sure if that would be of interest also for intensities, but you never know...

And yes, filterEmptyPeaks would be a shortcut for intensity = 0. Also, filterIntensity (and filterEmptyPeaks) should remove NA intensities too.

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 19, 2020

  • filterRt(c(min, max)) means keep retention times between min and max.
  • filterMz(c(min, max))` means keep m/z between min and max.
  • filterIntensity(c(min, max) would mean keep intensities min and max instead of filterIntensity(min) that would keep intensities > min. Given that removing high intensities isn't something I came across, using a range feels confusing and will lead to more unnecessary typing. But if it's something you feel is actually useful and used, then ok.
@schymane
Copy link
Author

@schymane schymane commented Jun 19, 2020

I could see plausible use cases for filtering max I as well (e.g. signal suppression / saturation was encountered, or you already analysed the Top X peaks and want to look at the middle ground) and would be pro leaving this in for enhanced flexibility. One could always design the function so that if only one value is given it is implicitly min but if two are provided it is c(min,max)?

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 19, 2020

Ok, thank you @schymane.

@jorainer - do you think it's worth having such an ambiguity in the intensity parameter to handle lengths 1 and 2, or should we simply go for a length of 2 with the extra typing?

@schymane
Copy link
Author

@schymane schymane commented Jun 19, 2020

Hmm can you do it so that it asks for both parameters officially, but provides the implicit shortcut of setting max=Inf if only one value is provided, for those that don't want to type?

@lgatto
Copy link
Member

@lgatto lgatto commented Jun 19, 2020

  • We could either have two separate min and max parameters, but that would be different from the other filter functions.
  • We could have a single intensities parameter and in the function body check its lengths, something along the lines of
if (length(intensities) == 1) {
   min_int <- intensities
   max_int <- Inf
} else if (length(intensities) == 2) {
   min_int <- intensities[1]
   max_int <- intensities[2]
} else {
   stop("intensities must be of length 1 or 2. See man page for details."
}

I am in favour of the second option.

@schymane
Copy link
Author

@schymane schymane commented Jun 19, 2020

+1 for the latter

@jorainer
Copy link
Member

@jorainer jorainer commented Jun 19, 2020

OK, summarizing: we agreed on:

  • replaceIntensitiesBelow
  • filterIntensity with parameter intensity either length 1 or 2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.