diff --git a/DESCRIPTION b/DESCRIPTION
index af973548e..fbbfc59c3 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -14,7 +14,6 @@ License: Artistic-2.0
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.3
 biocViews: MassSpectrometry, Proteomics, Software, DataImport, QualityControl
 Depends:
     R (>= 4.0)
@@ -82,3 +81,4 @@ Collate:
     'utils_logging.R'
     'utils_shared_peptides.R'
 VignetteBuilder: knitr
+Config/roxygen2/version: 8.0.0
diff --git a/R/utils_documentation.R b/R/utils_documentation.R
index 3bf208f71..59b64b3d7 100644
--- a/R/utils_documentation.R
+++ b/R/utils_documentation.R
@@ -5,7 +5,11 @@
 #' @param removeFewMeasurements TRUE (default) will remove the features that have 1 or 2 measurements across runs.
 #' @param useUniquePeptide TRUE (default) removes peptides that are assigned for more than one proteins.
 #' We assume to use unique peptide for each protein.
-#' @param summaryforMultipleRows max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.
+#' @param summaryforMultipleRows max or sum - when multiple PSMs identify
+#' the same feature within a single MS run (duplicate PSMs), use the
+#' highest (max) or sum of the duplicate intensities. Default is max for
+#' label-free converters and sum for TMT converters. Note that this parameter
+#' does NOT control collapsing across fractions of the same biological mixture.
 #' @param removeProtein_with1Feature TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.
 #' @param removeProtein_with1Peptide TRUE will remove the proteins which have only 1 peptide and charge. FALSE is default.
 #' @param removeOxidationMpeptides TRUE will remove the peptides including 'oxidation (M)' in modification. FALSE is default.
diff --git a/man/DIAUmpiretoMSstatsFormat.Rd b/man/DIAUmpiretoMSstatsFormat.Rd
index 60ddb5623..19f8d2339 100644
--- a/man/DIAUmpiretoMSstatsFormat.Rd
+++ b/man/DIAUmpiretoMSstatsFormat.Rd
@@ -38,7 +38,11 @@ DIAUmpiretoMSstatsFormat(
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/FragPipetoMSstatsFormat.Rd b/man/FragPipetoMSstatsFormat.Rd
index f2b2e86f4..3553e61c7 100644
--- a/man/FragPipetoMSstatsFormat.Rd
+++ b/man/FragPipetoMSstatsFormat.Rd
@@ -27,7 +27,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/MSstatsConvert.Rd b/man/MSstatsConvert.Rd
index e87142964..1a2e9ca25 100644
--- a/man/MSstatsConvert.Rd
+++ b/man/MSstatsConvert.Rd
@@ -23,6 +23,7 @@ signal processing tools to a format suitable for statistical analysis with the M
 
 Authors:
 \itemize{
+  \item Anthony Wu \email{wu.anthon@northeastern.edu}
   \item Mateusz Staniak \email{mtst@mstaniak.pl}
   \item Devon Kohler \email{kohler.d@northeastern.edu}
   \item Meena Choi \email{mnchoi67@gmail.com}
diff --git a/man/MaxQtoMSstatsFormat.Rd b/man/MaxQtoMSstatsFormat.Rd
index dd22234e9..3e154cbac 100644
--- a/man/MaxQtoMSstatsFormat.Rd
+++ b/man/MaxQtoMSstatsFormat.Rd
@@ -34,7 +34,11 @@ MaxQtoMSstatsFormat(
 \item{useUniquePeptide}{TRUE (default) removes peptides that are assigned for more than one proteins.
 We assume to use unique peptide for each protein.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{removeFewMeasurements}{TRUE (default) will remove the features that have 1 or 2 measurements across runs.}
 
diff --git a/man/MaxQtoMSstatsTMTFormat.Rd b/man/MaxQtoMSstatsTMTFormat.Rd
index af761b6be..aa468e64e 100644
--- a/man/MaxQtoMSstatsTMTFormat.Rd
+++ b/man/MaxQtoMSstatsTMTFormat.Rd
@@ -39,7 +39,11 @@ We assume to use unique peptide for each protein.}
 
 \item{rmProtein_with1Feature}{TRUE will remove the proteins which have only 1 peptide and charge. Default is FALSE.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/MetamorpheusToMSstatsFormat.Rd b/man/MetamorpheusToMSstatsFormat.Rd
index 1e8851be7..55333e30f 100644
--- a/man/MetamorpheusToMSstatsFormat.Rd
+++ b/man/MetamorpheusToMSstatsFormat.Rd
@@ -36,7 +36,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/OpenMStoMSstatsFormat.Rd b/man/OpenMStoMSstatsFormat.Rd
index 2f8c39710..f72239444 100644
--- a/man/OpenMStoMSstatsFormat.Rd
+++ b/man/OpenMStoMSstatsFormat.Rd
@@ -31,7 +31,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/OpenSWATHtoMSstatsFormat.Rd b/man/OpenSWATHtoMSstatsFormat.Rd
index 74ed2a3e2..a37e507a9 100644
--- a/man/OpenSWATHtoMSstatsFormat.Rd
+++ b/man/OpenSWATHtoMSstatsFormat.Rd
@@ -37,7 +37,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/PDtoMSstatsFormat.Rd b/man/PDtoMSstatsFormat.Rd
index 40dbe8c05..d220d9ecf 100644
--- a/man/PDtoMSstatsFormat.Rd
+++ b/man/PDtoMSstatsFormat.Rd
@@ -34,7 +34,11 @@ Run information. 'Run' will be matched with 'Spectrum.File'.}
 \item{useUniquePeptide}{TRUE (default) removes peptides that are assigned for more than one proteins.
 We assume to use unique peptide for each protein.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{removeFewMeasurements}{TRUE (default) will remove the features that have 1 or 2 measurements across runs.}
 
diff --git a/man/PDtoMSstatsTMTFormat.Rd b/man/PDtoMSstatsTMTFormat.Rd
index 46a4cc561..e26438085 100644
--- a/man/PDtoMSstatsTMTFormat.Rd
+++ b/man/PDtoMSstatsTMTFormat.Rd
@@ -37,7 +37,11 @@ We assume to use unique peptide for each protein.}
 
 \item{rmProtein_with1Feature}{TRUE will remove the proteins which have only 1 peptide and charge. Default is FALSE.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/PhilosophertoMSstatsTMTFormat.Rd b/man/PhilosophertoMSstatsTMTFormat.Rd
index d2c8225a7..fa2ae66a7 100644
--- a/man/PhilosophertoMSstatsTMTFormat.Rd
+++ b/man/PhilosophertoMSstatsTMTFormat.Rd
@@ -50,7 +50,11 @@ We assume to use unique peptide for each protein.}
 
 \item{rmProtein_with1Feature}{TRUE will remove the proteins which have only 1 peptide and charge. Default is FALSE.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/ProgenesistoMSstatsFormat.Rd b/man/ProgenesistoMSstatsFormat.Rd
index 3d4cc7caf..aa3a8307f 100644
--- a/man/ProgenesistoMSstatsFormat.Rd
+++ b/man/ProgenesistoMSstatsFormat.Rd
@@ -27,7 +27,11 @@ ProgenesistoMSstatsFormat(
 \item{useUniquePeptide}{TRUE (default) removes peptides that are assigned for more than one proteins.
 We assume to use unique peptide for each protein.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{removeFewMeasurements}{TRUE (default) will remove the features that have 1 or 2 measurements across runs.}
 
diff --git a/man/ProteinProspectortoMSstatsTMTFormat.Rd b/man/ProteinProspectortoMSstatsTMTFormat.Rd
index 9e9d49db2..e2032fcac 100644
--- a/man/ProteinProspectortoMSstatsTMTFormat.Rd
+++ b/man/ProteinProspectortoMSstatsTMTFormat.Rd
@@ -32,7 +32,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/SpectroMinetoMSstatsTMTFormat.Rd b/man/SpectroMinetoMSstatsTMTFormat.Rd
index cca8cb167..efbf4bec3 100644
--- a/man/SpectroMinetoMSstatsTMTFormat.Rd
+++ b/man/SpectroMinetoMSstatsTMTFormat.Rd
@@ -36,7 +36,11 @@ We assume to use unique peptide for each protein.}
 
 \item{rmProtein_with1Feature}{TRUE will remove the proteins which have only 1 peptide and charge. Defaut is FALSE.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{use_log_file}{logical. If TRUE, information about data processing
 will be saved to a file.}
diff --git a/man/SpectronauttoMSstatsFormat.Rd b/man/SpectronauttoMSstatsFormat.Rd
index dea2e4b99..91caca283 100644
--- a/man/SpectronauttoMSstatsFormat.Rd
+++ b/man/SpectronauttoMSstatsFormat.Rd
@@ -75,7 +75,11 @@ We assume to use unique peptide for each protein.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{calculateAnomalyScores}{Default is FALSE. If TRUE, will run anomaly detection model and calculate anomaly scores for each feature. Used downstream to weigh measurements in differential analysis.}
 
diff --git a/man/dot-getFullDesign.Rd b/man/dot-getFullDesign.Rd
index a9f4ecbfc..f7b2720c3 100644
--- a/man/dot-getFullDesign.Rd
+++ b/man/dot-getFullDesign.Rd
@@ -13,12 +13,12 @@
 \code{feature_col} and \code{measurement_col} will be created within each unique value of this column}
 
-\item{is_tmt}{if TRUE, data will be treated as coming from TMT experiment.}
-
-\item{`feature_col`}{name of the column that labels features}
+\item{feature_col}{name of the column that labels features}
 
-\item{`measurement_col`}{name of a column with measurement labels - Runs in
+\item{measurement_col}{name of a column with measurement labels - Runs in
 label-free case, Channels in TMT case.}
+
+\item{is_tmt}{if TRUE, data will be treated as coming from TMT experiment.}
 }
 \value{
 data.table
diff --git a/man/dot-sharedParametersAmongConverters.Rd b/man/dot-sharedParametersAmongConverters.Rd
index c9e7c6cd7..9bf0ab9cc 100644
--- a/man/dot-sharedParametersAmongConverters.Rd
+++ b/man/dot-sharedParametersAmongConverters.Rd
@@ -12,7 +12,11 @@
 \item{useUniquePeptide}{TRUE (default) removes peptides that are assigned for more than one proteins.
 We assume to use unique peptide for each protein.}
 
-\item{summaryforMultipleRows}{max or sum - when there are multiple measurements for certain feature and certain run, use highest or sum of multiple intensities. Default is max for label-free converters and sum for TMT converters.}
+\item{summaryforMultipleRows}{max or sum - when multiple PSMs identify
+the same feature within a single MS run (duplicate PSMs), use the
+highest (max) or sum of the duplicate intensities. Default is max for
+label-free converters and sum for TMT converters. Note that this parameter
+does NOT control collapsing across fractions of the same biological mixture.}
 
 \item{removeProtein_with1Feature}{TRUE will remove the proteins which have only 1 feature, which is the combination of peptide, precursor charge, fragment and charge. FALSE is default.}
 
diff --git a/vignettes/msstats_data_format.Rmd b/vignettes/msstats_data_format.Rmd
index dbf77916a..17bec0d53 100644
--- a/vignettes/msstats_data_format.Rmd
+++ b/vignettes/msstats_data_format.Rmd
@@ -304,13 +304,29 @@ should consists of elements named
 
 # Fractions and balanced design
 
-Finally, after preprocessing, `MSstatsBalancedDesign` function can be applied to
-handle fractions and create balanced design.
-For label-free and SRM data, it means that fractionation or technical replicates will be detected if these information is not provided. Features measured in multiple fractions (overlapped) will be assigned to a unique fraction. Then, the data will be adjusted so that within each fraction, every feature has a row for certain run. If the intensity value is missing, it will be denoted by `NA`.
-
-For TMT data, a unique fraction will be selected for each overlapped feature and the
-data will adjusted so that within each run, every feature has a row for each channel.
-If the intensity is missing for a channel, it will be denoted by `NA`.
+Finally, after preprocessing, the `MSstatsBalancedDesign` function can be
+applied to handle fractions and create a balanced design.
+
+For label-free data, fractionation or technical replicates are
+detected if this information is not provided. Features that overlap
+across multiple fractions of the same sample are assigned to a single
+fraction by the following rule: for each feature, the fraction with the
+**largest number of MS runs containing a non-missing measurement** is
+kept. If multiple fractions tie on that count, the tie is broken by
+choosing the fraction with the **highest mean intensity**. The
+remaining fractions' rows for that feature are dropped. The data are
+then adjusted so that within each fraction, every feature has a row
+for each run. If the intensity value is missing, it is denoted by `NA`.
+
+For TMT data, a unique fraction is selected for each overlapped feature
+as well. After fraction selection, the data are adjusted so
+that within each run, every feature has a row for each channel. If
+the intensity is missing for a channel, it is denoted by `NA`.
+
+Note also that this fraction-collapsing logic is distinct from the
+`summaryforMultipleRows` argument on each converter, which only
+combines duplicate PSMs identifying the same feature within a single
+MS run.
 
 ```{r }
 maxquant_balanced = MSstatsBalancedDesign(maxquant_processed, feature_columns)