- OBOBrowseA - OBO Browse and Annotate
The software allows users to load and display OBO files in tree or graph representation. It further enables the user to interactively browse through the ontology, search for ontology classes and annotate textual data.
- A new data mining approach for profiling and categorizing kinetic patterns of metabolic biomarkers after myocardial injury
The discovery of new and unexpected biomarkers in cardiovascular disease is a highly data-driven process that requires the complementary power of modern metabolite profiling technologies, bioinformatics and biostatistics. Clinical biomarkers of early myocardial injury are lacking. A prospective biomarker cohort study was carried out to identify, categorize, and profile kinetic patterns of early metabolic biomarkers of planned (PMI) and spontaneous (SMI) myocardial infarction. We applied a targeted MS-based metabolite profiling platform to serial blood samples drawn from carefully phenotyped patients undergoing alcohol septal ablation for hypertrophic obstructive cardiomyopathy serving as a human model of PMI. Patients with SMI and patients undergoing catheterization without induction of myocardial infarction served as positive and negative controls to assess generalizability of markers identified in PMI.
To identify metabolites of high predictive value in MS/MS data, we introduced a new feature selection method that categorizes metabolic signatures into three classes of weak, moderate and strong predictors and can be easily applied to both paired and unpaired samples. Our paradigm outperformed standard null-hypothesis significance testing and other popular methods for feature selection in terms of the area under the ROC curve and the product of sensitivity and specificity. Our results emphasize that this new method was able to identify, classify and validate alterations in the levels of multiple metabolites participating in pathways associated with myocardial injury as early as 10 minutes after PMI.
Baumgartner et al., Bioinformatics, 2010; in press.
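The two evaluation measures named in the abstract above, the area under the ROC curve and the product of sensitivity and specificity, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the threshold and the example values are assumptions made for demonstration only.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts the fraction of (case, control) pairs in which the case has
    the higher value; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec_product(pos_scores, neg_scores, threshold):
    """Sensitivity * specificity when values >= threshold are called positive."""
    sensitivity = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    specificity = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sensitivity * specificity


# Hypothetical metabolite levels (arbitrary units), not data from the study.
cases = [2.1, 2.8, 3.0, 3.4]
controls = [1.0, 1.5, 2.1, 2.5]

print(auc(cases, controls))                     # 0.90625
print(sens_spec_product(cases, controls, 2.6))  # 0.75
```

The product of sensitivity and specificity rewards a cut-off only when both error types are low, which is why it complements the threshold-free AUC.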
- Improving Phosphopeptide/Protein Identification Using a New Mining Framework for MS/MS Spectra Preprocessing
Phosphopeptide/protein identification using tandem mass spectrometry (MS/MS) is a challenging issue in proteomics research. In particular, phosphopeptides typically exhibit low intensity peaks of b and y ions in spectra when serine or threonine is phosphorylated. Consequently, the existing algorithms for peptide and protein identification generate a high false discovery rate when coping with phosphopeptide spectra. In order to increase the number of correct phosphopeptide identifications using database search, a new data mining approach for spectra preprocessing is proposed. A support vector machine classifier is used to calculate the probability of each peak representing a b or y ion. Next, low-probability peaks are removed from spectra, while remaining peaks have their intensities enhanced. As a result, the signal-to-noise ratio increases greatly and the chances of detecting important peaks improve significantly. Experiments using MASCOT and SEQUEST along with Peptide/ProteinProphet and a decoy database approach showed a significant improvement in the sensitivity of phosphopeptide identification without compromising specificity, demonstrating that our new strategy for MS/MS spectra preprocessing is a powerful proteomics tool for enhancing phosphopeptide identifications.
Cerqueira et al., J Proteomics Bioinform 2009;2:150-164.
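The filtering step described above (drop low-probability peaks, enhance the rest) can be sketched as below. This is an illustration under stated assumptions, not the published implementation: the per-peak probabilities are taken as given (in the paper they come from a support vector machine classifier), and the cut-off `min_prob` and the enhancement rule `intensity * (1 + p)` are hypothetical choices, since the abstract does not specify them.

```python
def preprocess_spectrum(peaks, probabilities, min_prob=0.5):
    """Filter and re-weight the peaks of one MS/MS spectrum.

    peaks:         list of (mz, intensity) tuples
    probabilities: estimated probability that each peak is a b or y ion
    min_prob:      hypothetical cut-off; peaks below it are discarded
    """
    if len(peaks) != len(probabilities):
        raise ValueError("need exactly one probability per peak")
    kept = []
    for (mz, intensity), p in zip(peaks, probabilities):
        if p < min_prob:
            continue  # treated as noise and removed
        # Assumed enhancement rule: scale intensity by (1 + probability).
        kept.append((mz, intensity * (1.0 + p)))
    return kept


# Hypothetical spectrum and classifier output.
spectrum = [(100.0, 10.0), (200.0, 50.0), (300.0, 5.0)]
ion_probs = [0.1, 0.5, 1.0]

print(preprocess_spectrum(spectrum, ion_probs))
# [(200.0, 75.0), (300.0, 10.0)]
```

The cleaned peak list would then be passed to a database search engine in place of the raw spectrum.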
- A new ensemble-based algorithm for identifying breath gas marker candidates in liver disease using ion molecule reaction mass spectrometry (IMR-MS)
Alcoholic fatty liver disease (AFLD) and nonalcoholic fatty liver disease (NAFLD) can progress to severe liver diseases such as steatohepatitis, cirrhosis and cancer. Thus, the detection of early liver disease is essential; however, minimally invasive diagnostic methods in clinical hepatology still lack specificity.
Ion molecule reaction mass spectrometry (IMR-MS) was applied to a total of 126 human breath gas samples comprising 91 cases (AFLD, NAFLD and cirrhosis) and 35 healthy controls. A new feature selection modality termed Stacked Feature Ranking (SFR) was developed to identify potential liver disease marker candidates in breath gas samples. It combines different entropy-, correlation- and t-test-based feature ranking methods in a two-level architecture with a suggestion layer and a decision layer. We benchmarked SFR against four single feature selection methods, a wrapper and a recently described ensemble method. The SFR-selected gas compounds showed a significantly higher discriminatory ability, by up to 10-15%, with an area under the ROC curve of AUC=0.85-0.95. Using this approach, we were able to identify unexpected breath gas marker candidates of high predictive value in liver disease. A literature study further supports the association of the top-ranked markers with liver disease. We propose SFR as a powerful tool for biomarker search in breath gas and other biological samples using mass spectrometry.
Netzer et al., Bioinformatics, 2009;25(7):941-947.
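The idea of combining several feature rankers in a suggestion layer and a decision layer can be sketched as follows. This is a simplified illustration, not the published SFR algorithm: here each ranker "suggests" its top-k features, and the decision layer keeps features suggested by at least `min_votes` rankers. The function name, the voting rule and all scores are assumptions.

```python
def stacked_selection(score_tables, top_k=3, min_votes=2):
    """Two-level feature selection over several ranking methods.

    score_tables: list of dicts mapping feature name -> score
                  (higher is better), one dict per ranking method.
    """
    # Suggestion layer: every ranker proposes its top_k features.
    votes = {}
    for scores in score_tables:
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        for position, feature in enumerate(ranked[:top_k], start=1):
            votes.setdefault(feature, []).append(position)

    # Decision layer (assumed rule): keep features with enough votes,
    # ordered by number of votes, then by mean suggested rank.
    selected = [f for f, pos in votes.items() if len(pos) >= min_votes]
    selected.sort(key=lambda f: (-len(votes[f]), sum(votes[f]) / len(votes[f]), f))
    return selected


# Hypothetical scores from an entropy-, a correlation- and a t-test-based ranker.
entropy_scores = {"m1": 0.9, "m2": 0.4, "m3": 0.7, "m4": 0.1}
correlation_scores = {"m1": 0.8, "m2": 0.6, "m3": 0.2, "m4": 0.5}
ttest_scores = {"m1": 5.0, "m2": 1.0, "m3": 4.0, "m4": 0.5}

tables = [entropy_scores, correlation_scores, ttest_scores]
print(stacked_selection(tables, top_k=2, min_votes=2))  # ['m1', 'm3']
```

Requiring agreement between rankers built on different statistical principles is one way to reduce the chance that a single method's bias drives the selection.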
- A new rule-based algorithm for identifying metabolic markers in prostate cancer using tandem mass spectrometry
Prostate cancer is the most prevalent tumor in males and its incidence is expected to increase as the population ages. Prostate cancer is treatable by excision if detected at an early enough stage. The challenges of early diagnosis require the discovery of novel biomarkers and tools for prostate cancer management. A novel feature selection algorithm termed associative voting (AV) was developed for identifying biomarker candidates in prostate cancer data measured via targeted metabolite profiling MS/MS analysis. We benchmarked our algorithm against two standard entropy-based and correlation-based feature selection methods (Information Gain and ReliefF) and observed that, on a variety of classification tasks in prostate cancer diagnosis, our algorithm identified subsets of biomarker candidates that are both smaller and show higher discriminatory power than the subsets identified by Information Gain and ReliefF. A literature study confirms that the highest-ranked biomarker candidates identified by AV have independently been identified as important factors in prostate cancer development.
Osl et al., Bioinformatics, 2008;24(24):2908-2914.
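For readers unfamiliar with the entropy-based baseline mentioned above, Information Gain for a single thresholded feature can be computed as in the sketch below. It illustrates only the baseline, not the associative voting algorithm; the function names, the fixed threshold and the example data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Reduction in class entropy after splitting a feature at threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    # Weighted entropy of the two partitions (empty partitions contribute 0).
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical metabolite concentrations and class labels (1 = cancer).
concentrations = [1.0, 2.0, 3.0, 4.0]
classes = [0, 0, 1, 1]

print(information_gain(concentrations, classes, 2.5))  # 1.0
```

Features are then ranked by their gain; a feature that splits the classes perfectly, as in this toy example, reaches the maximum of one bit for two balanced classes.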
- SeMoP: A New Computational Strategy for the Unrestricted Search for Modified Peptides Using LC-MS/MS Data
The SeMoP strategy enables the unrestricted discovery and verification of peptide modifications using LC-MS/MS data. SeMoP relies on coupling standard database searching with a new algorithm for an unrestricted search of peptide modifications. Interesting modifications found in the unrestricted search are then targeted in a standard database search to verify the modified peptides. Various modifications, including post-translational modifications, sequence polymorphisms, as well as sample handling-induced changes, can be identified using this approach.
Baumgartner et al., J Proteome Res, 2008;7(9):4199-208.
- LCF: Instance based classification with local density
Classification is an important data mining task in biomedicine. In particular, classification of biomedical data often requires separating pathological and healthy samples with the highest possible discriminatory performance for diagnostic purposes. Even more important than the overall accuracy is the balance of a classifier, particularly if data sets of unbalanced class size are examined. A novel instance-based classification technique was developed which takes into account both the different local densities of data objects and local cluster structures. Our method, which adopts the basic ideas of density-based outlier detection, determines the local point density in the neighborhood of an object to be classified and of all clusters in the corresponding region. A data object is assigned to the class where it fits best into the local cluster structure. The experimental evaluation on biomedical data demonstrates that our approach outperforms most popular classification methods.
Plant et al., Bioinformatics, 2006;22(8):981-8.
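The principle of assigning an object to the class whose local density it matches best can be sketched as follows. This is a simplified, outlier-factor-style illustration and not the published LCF method: for each class, the query's mean distance to its k nearest class members is divided by the mean k-nearest-neighbor distance among those members, and the class with the smallest ratio wins. The function names, the ratio rule and the toy data are assumptions.

```python
import math


def knn_mean_dist(point, others, k):
    """Mean Euclidean distance from point to its k nearest points in others."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k]) / min(k, len(dists))


def classify(query, samples, k=2):
    """samples: dict mapping class label -> list of points (>= 2 per class)."""
    best_label, best_ratio = None, math.inf
    for label, points in samples.items():
        if len(points) < 2:
            raise ValueError("each class needs at least two points")
        # How far the query lies from its nearest members of this class.
        query_d = knn_mean_dist(query, points, k)
        # How densely those nearest members are packed among themselves.
        nearest = sorted(range(len(points)),
                         key=lambda i: math.dist(query, points[i]))[:k]
        local_d = sum(knn_mean_dist(points[i], points[:i] + points[i + 1:], k)
                      for i in nearest) / len(nearest)
        ratio = query_d / local_d if local_d > 0 else math.inf
        if ratio < best_ratio:
            best_label, best_ratio = label, ratio
    return best_label


# Hypothetical data: one dense class and one sparse class.
training = {
    "healthy": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    "disease": [(5.0, 5.0), (5.0, 9.0), (9.0, 5.0), (9.0, 9.0)],
}

print(classify((3.0, 3.0), training))  # disease
print(classify((0.5, 0.5), training))  # healthy
```

In this toy data the point (3, 3) is on average closer to its two nearest "healthy" neighbors, yet it is assigned to "disease" because that distance is unremarkable for the sparse class and clearly atypical for the dense one.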