Compound Discoverer Software

Turn data into knowledge

Transform your small molecule data into insights, whether you have a small or large dataset of full-scan and MSn data from liquid chromatography (LC), gas chromatography (GC), or ion chromatography (IC). Thermo Scientific Compound Discoverer software offers a fully integrated suite of advanced software tools for known-parent and unknown data processing and interpretation. Compound Discoverer software streamlines compound identification and comparative analyses, and provides extensive filtering and data visualization capabilities in easy-to-use, powerful software workflows that drive rapid insights from your valuable data.

No matter what your small molecule research application, from metabolomics to stable isotope labeling, environmental and food safety, pharma metabolite or impurity identification, extractables and leachables to forensic or clinical toxicology, and more, Compound Discoverer software offers an unparalleled toolbox to enable you to transform data into insights.



Key benefits of Compound Discoverer software
Reduce the number of mouse clicks

Take control of your data analysis and processing with custom workflows, flexible visualization, and grouping tools. Share results with customizable reporting, or transfer your results directly to Thermo Scientific TraceFinder software for targeted analyses.

Know your unknowns

Rapidly and confidently identify your unknowns with mass spectral library searching against both the online mzCloud spectral library, in-house Thermo Scientific mzVault spectral libraries, and numerous built-in annotation tools.

Find real differences in your sample sets

Quickly find significant statistical differences between sample sets. See trends in compounds across a study or identify the key compounds of interest between multiple sample groups using interactively linked displays, including volcano plots, PCA, PLS-DA, and hierarchical clustering.

Understand biological pathways

Perform fully untargeted stable isotope labeling experiments, view pathways using Thermo Scientific Metabolika, KEGG, and BioCyc databases, and map detected compounds and associated information directly onto pathways.


Creating workflows with Compound Discoverer software

Compound Discoverer software provides an extensive, flexible, and customizable toolkit for processing your data. It includes pre-defined workflow templates, so you can be up and running instantly, or quickly adapt a template into a processing workflow designed specifically for your experiment.

Compound Discoverer software benefits from the power of Thermo Scientific Orbitrap-based mass spectrometers, coupled to LC, GC, or IC separations, which deliver consistent, accurate, high-resolution data. This data enables the software to align components across samples, determine elemental compositions, make library matches, and identify unknowns confidently, with built-in quality control processes.

The consistent mass accuracy and high-resolution spectral data from Orbitrap-based MS systems enable fine isotopic information to be obtained, as shown above for the compound darunavir. The resolution and accuracy provide confidence in elemental composition assignments and subsequent library and database matching, which can be further confirmed using MSn fragmentation information.

Interpreting results and delivering insights

Studies, whether simple or extensive, produce complex data that contains a wealth of information. To get the most valuable insights from that data, Compound Discoverer software offers careful data processing, followed by insightful data reviewing and linking capabilities.

Whether you are conducting single-sample analysis or extensive large-sample studies, Compound Discoverer software provides everything you need for small-molecule unknown data processing, including:

  • Unknown peak detection using multi-factorial Peak Quality factors to identify and quantify with confidence
  • Advanced statistical tools
  • Interactive data visualization capabilities
  • Compound annotation tools
  • Integrated database and mass spectral libraries
  • Biochemical pathway mapping
  • Untargeted stable isotope labelling analysis
  • Normalization tools for large studies

Regardless of study sample size, each sample contains a wealth of raw data points. Some of those data points are related to one another and many are not. Making sense of this complex but high-quality MS, MS/MS, and MSn information requires data reduction to reach meaningful insights.

Workflows can be set up with drag-and-drop tools, either by using one of the multiple application-specific templates or by editing one of those templates, to make data processing quick and easy. Each processing step is represented by a ‘node’ within a given workflow tree, and nodes can be connected to drive data processing and interpretation based upon your study requirements. New nodes can be created using a software development kit, and custom scripts, such as those written in R or Python, can be used with the Scripting Node, tailoring workflows to your needs.

An example of a pre-defined workflow template for untargeted metabolomics. This template is designed to find and identify differences between samples. Each node is linked and performs a specific task. Here, retention time alignment is performed before unknown compounds are detected and grouped across all samples within the study. Elemental compositions are predicted using the accurate mass data, with compounds identified using the mzCloud mass spectral library and MS/MS information. Where there is no match from mzCloud, ChemSpider is used. For results with a ChemSpider match, mzLogic is used to rank results by likelihood of a match. Resulting compounds are then mapped to biological pathways using Metabolika. If QC samples are present, then normalization is performed, and subsequent differential analysis calculated (t-test or ANOVA).
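The differential analysis step at the end of this template can be illustrated with a minimal Python sketch. It computes a log2 fold change and Welch's t statistic for one compound across two sample groups, which are the quantities that sit behind a volcano plot before the p-value lookup. The function name and the peak areas are hypothetical, and how the software itself implements the test is not described here.

```python
import math
from statistics import mean, variance

def differential_stats(group_a, group_b):
    """Return (log2 fold change of B versus A, Welch t statistic)."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    log2_fc = math.log2(mean_b / mean_a)
    # Welch's t: difference of means divided by the combined standard error,
    # without assuming equal variances in the two groups.
    std_err = math.sqrt(variance(group_a) / len(group_a)
                        + variance(group_b) / len(group_b))
    t_stat = (mean_b - mean_a) / std_err
    return log2_fc, t_stat

# Hypothetical peak areas for one compound in control and treated samples.
control = [100.0, 110.0, 90.0]
treated = [400.0, 420.0, 380.0]
log2_fc, t_stat = differential_stats(control, treated)
print(f"log2 fold change = {log2_fc:.2f}, t = {t_stat:.2f}")
```

Converting the t statistic into a p-value needs the t distribution, which a statistics package would normally supply.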

The Compound Discoverer software interface streamlines review of results by showing the information most relevant to the questions being asked; each plot and table is linked so that your view is instantly updated to reflect the compound or sample(s) that you are reviewing.

From volcano plots from differential analysis (left), S-Plots from partial least squares discriminant analysis (middle), and hierarchical clustering analysis (right), it is easy to visualize complex data sets and determine what is statistically different using Compound Discoverer software. Each plot is active, so data points selected in the plot can be marked in the results tables and vice versa, helping determine the cause of observed differences or similarities and tracking compounds in complex data sets.

Applications for Compound Discoverer software

Compound Discoverer software can be used for a variety of applications from metabolomics to environmental and food safety and drug development to forensic toxicology.

Metabolomic studies can be very complex, so ensuring acquisition of high-quality, comprehensive data is challenging, as is analyzing that data to gain insights. Ensuring complete sample coverage typically requires extensive manual work to create inclusion and exclusion lists for Data Dependent Acquisition (DDA) experiments.

AcquireX, an automated workflow, allows direct interrogation of all sample components through improved MS/MS sampling with automated background ion exclusion and data acquisition that focuses on true sample components.

Combining AcquireX with other enabling tools for Compound Discoverer software dramatically reduces the number of compounds without MS/MS spectra and significantly increases the number of compounds with confident identification and ranked putative identifications.

Stable isotope labelling can assist with untargeted metabolomic studies, and Compound Discoverer software provides a range of data review and visualization tools to support this workflow. Compound Discoverer software automatically detects labelled compounds (isotopologues) based on formulas of unlabelled compounds found in reference file(s). Once processed, the exchange rate (or rate of incorporation) can be plotted to see the response across multiple files or overlaid onto Metabolika pathways.
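As a rough illustration of an exchange rate, the sketch below computes the fraction of labelable positions that carry a label from a set of isotopologue intensities. This weighted-average definition is one common convention and is an assumption here; the formula the software uses is not stated, and natural-abundance correction is left out. The function name and intensities are hypothetical.

```python
def exchange_rate(isotopologue_intensities):
    """Fractional label incorporation from intensities of M+0, M+1, ..., M+n.

    Assumes index i is the number of labelled atoms and n = len - 1 is the
    number of labelable positions.
    """
    n = len(isotopologue_intensities) - 1
    total = sum(isotopologue_intensities)
    if n < 1 or total <= 0:
        raise ValueError("need at least two isotopologues with non-zero signal")
    labelled = sum(i * intensity
                   for i, intensity in enumerate(isotopologue_intensities))
    return labelled / (n * total)

# Hypothetical 13C isotopologue intensities (M+0..M+3) for a three-carbon compound.
intensities = [50.0, 20.0, 20.0, 10.0]
rate = exchange_rate(intensities)
print(f"exchange rate = {rate:.1%}")
```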

Compound Discoverer software includes structurally intelligent dealkylation/dearylation and general metabolism prediction capabilities that allow you to find, identify, and report metabolites of interest. Identification of impurities and degradation products follows similar workflows and relies on a range of software tools and customizable approaches to enable confident detection of related components in complex samples.

Used for structural annotation of fragmentation spectra, Fragment Ion Search (FISh) can localize the site of potential transformations in addition to enabling structural elucidation for unknowns.

The Compound Class Scoring node provides another tool to ensure nothing is missed. It uses a set of representative fragments, created from one or more known molecules in a compound class, to identify other components that could be related or are from the same compound class.
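The idea of scoring a spectrum against representative class fragments can be sketched as follows. This is a simplified illustration, not the node's actual algorithm: it reports the percentage of class fragments found in a spectrum within a ppm tolerance. The function name, tolerance, and m/z values are all hypothetical.

```python
def class_score(spectrum_mz, class_fragments, tol_ppm=5.0):
    """Percentage of representative class fragments found in a spectrum."""
    def is_matched(target):
        window = target * tol_ppm / 1e6  # absolute tolerance in Da
        return any(abs(mz - target) <= window for mz in spectrum_mz)

    hits = sum(1 for fragment in class_fragments if is_matched(fragment))
    return 100.0 * hits / len(class_fragments)

# Hypothetical fragment list for a compound class and one measured MS/MS spectrum.
class_fragments = [91.0542, 119.0855, 135.0441]
spectrum = [65.0386, 91.0543, 135.0440, 163.0754]
score = class_score(spectrum, class_fragments)
print(f"class coverage score = {score:.1f}%")
```

A spectrum that contains most of the class fragments scores high even when the compound itself has no library entry, which is the point of screening by class.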

Compound Discoverer software reduces the complexity of samples by reducing matrix interferences, as well as targeting specific compound classes through their related mass defects, so you can detect, identify, and review complex datasets faster.

Upper left shows the Total Ion Chromatogram (TIC) for a sample in bile matrix, illustrating the potential complexity and matrix interferences present; bottom left shows the resulting trace following the use of Multiple Mass Defect Filtering (MMDF) and how it can be used to effectively simplify complex matrix samples such as bile, feces, blood, and plasma. The plot on the right demonstrates how the mass defect plot can be used to visualize and mine data using Kendrick formulas, for example for unknown polymer identification. All data is interactively linked between plots and data tables within Compound Discoverer software to streamline data review.
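The two ideas in this figure, mass defect filtering and Kendrick mass defects, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the mass defect is taken relative to the nearest integer mass, the Kendrick base is CH2, and the filter window of 40 mDa and all m/z values are hypothetical. A full MMDF implementation would typically also restrict the mass range around each reference.

```python
CH2_EXACT = 14.01565  # exact mass of CH2 (12C + 2 x 1H), in Da

def mass_defect(mz):
    """Difference between an m/z value and its nearest integer mass."""
    return mz - round(mz)

def kendrick_mass_defect(mz, base_exact=CH2_EXACT, base_nominal=14):
    """Kendrick mass defect for a repeat unit (CH2 by default)."""
    kendrick_mass = mz * base_nominal / base_exact
    return round(kendrick_mass) - kendrick_mass

def passes_mmdf(mz, reference_mzs, window=0.040):
    """Keep an ion if its mass defect is within the window of any reference."""
    return any(abs(mass_defect(mz) - mass_defect(ref)) <= window
               for ref in reference_mzs)

# Members of a CH2 homologous series share the same Kendrick mass defect.
kmd_a = kendrick_mass_defect(200.1000)
kmd_b = kendrick_mass_defect(200.1000 + CH2_EXACT)
print(f"KMD: {kmd_a:.5f} and {kmd_b:.5f}")

# Hypothetical references: a parent drug and one conjugate.
references = [285.1460, 461.1781]
print(passes_mmdf(301.1410, references), passes_mmdf(301.9000, references))
```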

Compound Discoverer software can be used to analyze the metabolic fate and structural composition of food impurities and degradation products as well as detect environmental contaminants in soil and water. Once unknown compounds are identified in environmental and food safety studies, they often require high-throughput screening using either quadrupole or high-resolution MS-based techniques. Compound Discoverer software allows you to export your data directly to a new or existing mzVault library or targeted list to be used with Thermo Scientific TraceFinder software for screening and quantitation to reduce the burden of method transfer within your organization.

Compound Discoverer software detects unknown metabolites of drugs of abuse and structurally related designer drugs. For example, many new drugs contain similar structures, and the Compound Class Scoring Node can be used to score detected compounds against common fragment ions, making it easier to find new drugs based upon characteristic fragments. This information can be transferred to screening methods to help you keep up with an ever-expanding array of new drugs and their metabolites. Once unknown compounds have been identified using any of the multiple workflows available within Compound Discoverer software, the data can be exported directly to a new or existing mzVault library, or a targeted list that can be used with Thermo Scientific TraceFinder software for screening and quantitation using either quadrupole or high-resolution MS-based techniques.

For the analysis of data acquired using Thermo Scientific GC-Orbitrap based mass spectrometers, there are two primary workflows, enabled using specific workflow nodes such as Electron Impact (EI) and Chemical Ionisation (CI) deconvolution nodes. GC-Orbitrap data can be analyzed using the extensive tools within Compound Discoverer to enable confident compound identification, or statistical analysis, for example.

Examples of two GC-based workflow trees; the first is an EI workflow that can be used to find biomarkers through statistical analysis and identify unknown compounds via library search, and the second is a CI workflow that can identify unknown compounds of interest through molecular formula determination and structural elucidation of MS/MS spectra.

When using GC-EI or -CI data within Compound Discoverer, the data analysis tools and relevant fields are easily accessed to ensure simple data review, and access to results.

The above shows GC-EI compound identification in the result view. On the upper right-hand side, a mirror plot can be seen between the deconvolved spectrum and the library spectrum. Highlighted in the second level table under Library Search Results are total score, delta mass of molecular ion and RI delta: Total score is a composite score that includes contribution from the HRF score and SI score; delta mass is the mass accuracy of the molecular ion if it is present in the deconvoluted spectrum; RI delta is the difference between the library RI and calculated RI. Based on the total score “94.9”, the less than 1 ppm delta mass of the molecular ion, and the RI delta value of one, there is very high confidence in this identification.
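Two of the values highlighted here are simple arithmetic, shown below as a small Python sketch. The delta mass is expressed in ppm relative to the theoretical m/z, and the RI delta is taken as library RI minus calculated RI; the sign convention and all numbers are assumptions for illustration, not values from the figure.

```python
def delta_mass_ppm(measured_mz, theoretical_mz):
    """Mass error of the molecular ion in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def ri_delta(library_ri, calculated_ri):
    """Difference between the library and the calculated retention index."""
    return library_ri - calculated_ri

# Hypothetical molecular ion and retention index values.
ppm = delta_mass_ppm(measured_mz=180.09334, theoretical_mz=180.09340)
delta = ri_delta(library_ri=1501, calculated_ri=1500)
print(f"delta mass = {ppm:.2f} ppm, RI delta = {delta}")
```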


Enabling tools for Compound Discoverer software

Several tools come into play when it comes to understanding and interpreting comprehensive data sets. Compound Discoverer software can access numerous online and offline resources, as well as use intelligent algorithms when there is no direct spectral match to help identify an unknown compound.

  • mzCloud, an extensive online advanced mass spectral fragmentation database
  • mzLogic, a data analysis algorithm that combines the millions of structures available in structural databases with the extensive mass spectral fragmentation library of mzCloud to rank order putative structures for unknowns when there is no direct mass spectral match
  • mzVault, a repository that can be used when you do not have online access or need to use your own proprietary libraries. It provides access to the MS/MS-level content from mzCloud, or the ability to create custom, local libraries.
  • Statistical analysis and data normalization tools for uni- and multi-variate statistical analysis

All identified compounds can be linked through these tools, making it easy to select and export data to multiple different sources for use in the next stage of analysis.


GC data acquired using electron impact (EI) and chemical ionization (CI) techniques can be processed using the same tools as IC- and LC-MS data, such as unknown compound identification and statistical analysis. Understanding and interpreting comprehensive GC EI and CI data sets requires the ability to confidently deconvolute spectral data, accounting for extensive fragmentation (from EI) or potential multiple molecular ions (from CI), followed by identification and analysis.

First, accurate deconvolution of EI data (above) is performed to identify all the features and bin them based on the apex retention time to form compounds. Second, the user can (optionally) calculate retention indices based on the retention times of n-alkanes adjacent to the analytical peaks, to help identify compounds when performing library searches. Deconvolved spectra are searched either against unit mass libraries, such as NIST, or high-resolution accurate mass libraries, such as GC Orbitrap libraries. Cross-sample peak grouping allows grouping of the same compound across multiple samples in a batch to enable subsequent statistical or comparative analyses.
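The retention index step can be illustrated with a short Python sketch that interpolates between the two n-alkanes bracketing a peak. It uses the linear formula commonly applied to temperature-programmed GC; whether the software uses exactly this form is an assumption, and the function name and retention times are hypothetical.

```python
def retention_index(rt, alkanes):
    """Linear retention index from the two n-alkanes that bracket a peak.

    alkanes: list of (carbon_number, retention_time), sorted by retention time.
    """
    for (n_lo, t_lo), (n_hi, t_hi) in zip(alkanes, alkanes[1:]):
        if t_lo <= rt <= t_hi:
            fraction = (rt - t_lo) / (t_hi - t_lo)
            return 100.0 * (n_lo + (n_hi - n_lo) * fraction)
    raise ValueError("retention time is not bracketed by two n-alkanes")

# Hypothetical n-alkane ladder: C10, C11, and C12 with retention times in minutes.
ladder = [(10, 5.0), (11, 6.0), (12, 7.2)]
ri = retention_index(6.3, ladder)
print(f"retention index = {ri:.0f}")
```

Comparing this calculated value with the library RI gives an orthogonal check on a spectral match, since retention does not depend on the fragmentation pattern.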

Similar to EI Deconvolution, the first step performs chromatographic peak deconvolution, which is followed by molecular ion identification. The algorithm looks for [M+H]+ pseudo-molecular ions for each deconvoluted compound, with each compound being assigned additional pseudo-molecular ions, such as [M+C2H5]+, [M+C3H5]+ and [M-H]+ for methane PCI; adduct patterns help to confirm molecular ion identification.
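How adduct patterns support a molecular ion assignment can be sketched as follows. The code computes the expected m/z of the adducts named above from a candidate [M+H]+ peak and reports which of them are present in a peak list. The mass constants are standard monoisotopic values; the function names, the tolerance, and the peak list are hypothetical, and this is not the software's algorithm.

```python
HYDROGEN = 1.007825  # monoisotopic mass of 1H, Da
CARBON = 12.0        # monoisotopic mass of 12C, Da
ELECTRON = 0.000549  # electron mass, Da

# Mass shift from the neutral molecule M to each singly charged ion.
ADDUCT_SHIFTS = {
    "[M+H]+": HYDROGEN - ELECTRON,
    "[M+C2H5]+": 2 * CARBON + 5 * HYDROGEN - ELECTRON,
    "[M+C3H5]+": 3 * CARBON + 5 * HYDROGEN - ELECTRON,
    "[M-H]+": -HYDROGEN - ELECTRON,
}

def supporting_adducts(candidate_mh, peaks, tol=0.002):
    """List the additional adducts that agree with a candidate [M+H]+ peak."""
    neutral = candidate_mh - ADDUCT_SHIFTS["[M+H]+"]
    found = []
    for name, shift in ADDUCT_SHIFTS.items():
        if name == "[M+H]+":
            continue
        expected = neutral + shift
        if any(abs(peak - expected) <= tol for peak in peaks):
            found.append(name)
    return found

# Hypothetical deconvoluted peak list for one compound under methane PCI.
peaks = [120.0500, 148.9917, 151.0073, 179.0386]
supported = supporting_adducts(151.0073, peaks)
print(supported)
```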

mzCloud

Covering a wide range of small molecule applications, the extensive structural and chemical diversity of mzCloud ensures absolute confidence in any unknown identifications.

Making use of exhaustive high-resolution MS/MS and multi-stage MSn spectra, combined with extensive metadata, the world's largest and most extensively curated LC-MSn reference spectral library delivers powerful unknown identification capabilities.

Identify more unknowns with MSn and SubTree search

More unknowns can be confidently identified with MSn and substructure spectral matching, utilizing the full power of structure retrieval from online databases or user provided structures.

How was the world's largest mass spectral fragmentation library, mzCloud, created?

The many precursor and MSn fragmentation spectra are logically organized into Spectral Trees for each compound within mzCloud. Each level of a spectral tree symbolizes an MSn stage, where the top level starts at n=1, or the precursor spectra. Each level can contain numerous spectra, as data are acquired using various different experimental conditions to ensure a broad and representative coverage of subsequent fragments, increasing the likelihood of high-quality search results.


A schematic representation of a spectral tree from mzCloud. The MS spectra are acquired for a given compound in multiple polarities (ESI +/-), and for a range of adducts. Each precursor is exhaustively fragmented using different fragmentation techniques (CID, HCD) and at multiple collision energies to produce collections of fragmentation spectra at each fragmentation level (MS2, MS3, MS4 etc.), generating a comprehensive spectral tree of information for each library entry.
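The spectral tree described here maps naturally onto a small recursive data structure. The Python sketch below is only a way to picture it, not mzCloud's internal format: each node holds the spectra acquired for one precursor at one MSn stage, and its children hold the next stage. The class name, fields, m/z values, and collision energies are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SpectrumNode:
    """One node of a spectral tree: a precursor and the spectra acquired for it."""
    precursor_mz: float
    ms_level: int                                  # 1 = precursor level, 2 = MS2, ...
    spectra: list = field(default_factory=list)    # (technique, energy) labels
    children: list = field(default_factory=list)   # nodes at the next MSn stage

    def add_child(self, precursor_mz):
        """Attach a node one MSn stage deeper and return it."""
        child = SpectrumNode(precursor_mz, self.ms_level + 1)
        self.children.append(child)
        return child

    def depth(self):
        """Deepest MSn stage reachable from this node."""
        return max([self.ms_level] + [child.depth() for child in self.children])

# Illustrative tree: precursor level, MS2 at two conditions, then one MS3 branch.
root = SpectrumNode(precursor_mz=195.0877, ms_level=1)
ms2 = root.add_child(195.0877)
ms2.spectra += [("HCD", 20), ("CID", 35)]
ms3 = ms2.add_child(138.0662)
ms3.spectra.append(("HCD", 30))
print(root.depth())
```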

The extensive data for each library entry is critical for accurate compound identifications, matching experimentally obtained data to that of the library contents, with fit confidence and data visualization provided in the Compound Discoverer and Mass Frontier data analysis software packages. Additional tools include mzLogic, which uses the extensive fragmentation information to confidently identify unknowns that cannot be identified based upon the spectral library compound entries alone.

mzLogic

What happens when you don't get a match from your library search? You can still utilize the comprehensive fragmentation information contained within mzCloud! Through spectral similarity and sub-structural information (precursor ion fingerprinting), mzLogic can take all of this information and provide you with the best candidates for your true unknowns.

When small molecule unknowns don't provide a spectral hit, how can we still identify them?

Maximize your real fragmentation data by combining spectral library similarity searching with chemical database searching.

Create, edit and search reaction pathways with Metabolika. With publication-quality graphical functionality to create and edit reaction pathways, and more than 370 curated and annotated biochemical pathways for a range of organisms included, you can easily share your pathway knowledge.

The information in Metabolika is also used for fragmentation prediction and mzLogic, further increasing the chances of unknown compound identification.

Additionally, for stable isotope labeling analyses, you can include your exchange rate (or rate of incorporation) in Metabolika to give a more comprehensive view of your pathway.

In addition to Metabolika, Compound Discoverer software supports both KEGG and BioCyc biological pathway databases. Compound mapping can be shown in two different ways: Context-specific, i.e., looking at a specific compound, you can see what pathways this compound was mapped to, or you can use the global view where you start from the list of pathways and visualize all compounds that were mapped to a given pathway. Detected compounds can be confirmed using mzCloud, for example, with the resulting data color-coded on the embedded pathways.

Your data has inherent value, as it is the knowledge that you acquire. mzVault provides you the capabilities to access and search the MS2-level spectral data from mzCloud off-line, or to store your own spectral library information. Spectral information, and your knowledge, can be automatically sent from Compound Discoverer into a new, or existing library, which can then be searched using Compound Discoverer or TraceFinder software, or edited using Thermo Scientific Mass Frontier software.

Even with extensive online structural databases, and mzLogic to propose a structure or sub-structure, unknowns may sometimes remain unknown. It can be useful to store this information alongside your libraries of previously identified, proprietary compounds, and use it to answer the question, “Have I seen this before?”

For many applications, Compound Discoverer software provides the means to confidently identify unknowns from novel environmental contaminants to designer drugs and metabolites. The next step for some of these applications can be higher throughput identification and/or quantitation using quadrupole or high-resolution MS with TraceFinder software, or further analysis with third party packages.

Good experimental design is critically important for any analysis, especially for statistical studies, to ensure that any trends observed arise from real changes rather than from experimental effects. For this reason, protocols for large-scale studies use pooled quality control (QC) samples to normalize the data.

Using pooled QC samples, which are analyzed throughout data acquisition, allows for the correction of batch effects over time. Correction is performed individually for each sample: the upper plot shows a curve fitted to the QC samples, and the bottom plot shows the resulting data set after correction. This capability is based upon a peer-reviewed methodology published by Dunn et al. in Nature Protocols. Compound Discoverer software also lets you view the impact of any changes to the data pre- and post-normalization according to this protocol.
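The principle of QC-based correction can be sketched in Python as follows. The caption describes a curve fitted to the QC samples; this sketch swaps in piecewise-linear interpolation between neighboring QC injections as a dependency-free stand-in for that fitted curve, so it is a simplification rather than the published protocol. The function name and intensities are hypothetical.

```python
from statistics import median

def qc_correct(samples, qcs):
    """Correct sample intensities for drift estimated from pooled QC injections.

    samples, qcs: lists of (injection_order, intensity); qcs sorted by order.
    """
    target = median(intensity for _, intensity in qcs)

    def drift_at(order):
        # Interpolate the QC response between the two bracketing QC injections.
        for (x0, y0), (x1, y1) in zip(qcs, qcs[1:]):
            if x0 <= order <= x1:
                return y0 + (y1 - y0) * (order - x0) / (x1 - x0)
        raise ValueError("sample is not bracketed by QC injections")

    return [(order, intensity * target / drift_at(order))
            for order, intensity in samples]

# Hypothetical compound whose response falls steadily across the run.
qcs = [(1, 1000.0), (5, 800.0), (9, 600.0)]
samples = [(3, 900.0), (7, 700.0)]
corrected = qc_correct(samples, qcs)
print(corrected)
```

In this example both samples contain the same true amount, so after correction they return the same intensity despite the drift.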

An extensive suite of powerful statistical tools within Compound Discoverer software is fully linked to help you understand what compounds/groups of data change and by how much.

Statistical analysis can be used across a range of different analyses from metabolomic, environmental, food safety and adulteration, forensics, clinical, impurities, and extractable and leachable studies; when using GC-EI and -CI data, the statistical plots chart individual compounds to aid the easy identification of features that are relevant to the analysis. The capability to perform a range of univariate and multivariate analyses from differential analysis, ANOVA, PCA through to PLS-DA, and combining the output from these tools with the results from compound identification through workflows in a highly graphical and interactive way provides deep insights into your data which can easily be reported and shared.

Demonstrating the connectivity of data within Compound Discoverer software, the data points highlighted by blue circles in the Volcano Plot (bottom right) are selected within the Compound Table (bottom left). Selecting any compound in any plot automatically updates all plots to show the relevant data. The interconnected tools enable you to rapidly identify differences and the compounds or groups of compounds responsible for those differences. Additionally, it streamlines follow-on confirmation by giving you the ability to filter and review the relevant data.

Compound Discoverer software offers multiple ways to visualize complex data sets and relationships, giving you the ability to add multiple plots across monitors to track and view these relationships and better understand your data.

Principal Components Analysis (PCA) provides an unbiased review of data, while supervised techniques such as Partial Least Squares Discriminant Analysis (PLS-DA) and S Plots (left) identify compounds that give rise to any observed grouping of samples. Hierarchical Clustering (center) not only shows the clustering of samples along the x-axis and of compounds along the y-axis, but also provides user-configurable heat mapping to visualize any clustering. Box Whisker Charts (right) allow visualization by groupings, time points, and more.

Once complex data sets are thoroughly reviewed, and the components that give rise to differences are evaluated, more substantial analyses may be required in order to verify that the changes/differences are caused by the identified compounds. Checked compounds can easily be exported from Compound Discoverer software to a range of different outputs to facilitate additional analysis. For more information see the “Custom, local libraries and data transfer” section.

Exploring the relationships among your compounds can reveal additional information and insights into your data sets. With Molecular Networks, you can interactively explore relationships between compounds in your analysis based on transformation and spectral similarity, for example a range of Phase I and Phase II transformations.

The fully interactive Molecular Networks visualization browser allows you to view your data in a different way. Identified compounds are shown by nodes (circles) and when a relationship is identified, the nodes are connected. Selecting a node (compound) or connection (transformation) displays pertinent information (right) about the identified compound and the relevant transformation(s). All of the visualized data can be interactively filtered using thresholds, data quality information or text search for specific compounds or transformations.
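The transformation-based part of such a network can be sketched in Python: two compounds are connected when their mass difference matches a known transformation. This covers only mass differences, not the spectral similarity that Molecular Networks also uses, and it is not the software's implementation. The mass shifts are standard monoisotopic values; the compound names, masses, and tolerance are hypothetical.

```python
from itertools import combinations

# Monoisotopic mass shifts (Da) for a few well-known biotransformations.
TRANSFORMATIONS = {
    "oxidation (+O)": 15.994915,
    "demethylation (-CH2)": -14.015650,
    "sulfation (+SO3)": 79.956815,
    "glucuronidation (+C6H8O6)": 176.032088,
}

def build_edges(compounds, tol=0.002):
    """Link compounds whose mass difference matches a known transformation.

    compounds: dict mapping a compound name to its neutral monoisotopic mass.
    Returns (source, product, transformation) tuples.
    """
    edges = []
    for (name_a, mass_a), (name_b, mass_b) in combinations(compounds.items(), 2):
        for label, shift in TRANSFORMATIONS.items():
            if abs((mass_b - mass_a) - shift) <= tol:
                edges.append((name_a, name_b, label))
            elif abs((mass_a - mass_b) - shift) <= tol:
                edges.append((name_b, name_a, label))
    return edges

# Hypothetical parent compound and two detected metabolites.
compounds = {"parent": 300.1000, "M1": 316.0949, "M2": 476.1321}
edges = build_edges(compounds)
for source, product, label in edges:
    print(f"{source} -> {product}: {label}")
```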

Fragment Ion Search, or FISh, provides fast screening of structurally similar compounds based on the fragmentation pattern of the parent compound, obtained either by theoretical fragment prediction or from experimental MSn data. The parent compound structure and its potential metabolites are used to filter out the majority of matrix-related background ions, to make identification of relevant compounds quick and easy. FISh provides extensive lists of Phase I and Phase II biotransformations as well as the ability to build customized lists.

With the inclusion of the HighChem Fragmentation Library, which contains information from more than 52,000 fragmentation schemes, 217,000 individual reactions, 256,000 chemical structures and 216,000 decoded mechanisms from peer-reviewed literature, FISh is a powerful tool that helps make structural assignments for putative metabolites or other potential structures. FISh uses real data to provide greater confidence when proposing fragmentation structures for putative structures and calculates a score to describe how well the fragmentation data can be explained by a given structural candidate.



