National Health Research Institutes


MamPhEA: a Web tool for Mammalian Phenotype Enrichment Analysis

1. Introduction
Mutant phenotypes characterize the consequences of disturbing the information output of a gene and are thus ideal for understanding how a gene functions at the systems level. To facilitate knowledge discovery in mammalian biology, MamPhEA allows users to perform enrichment analysis on large mammalian gene sets based on the mutant phenotypes of the genes (or of their mouse orthologs). By default, the mutant phenotypes searched for enrichment are those predefined by MGI (Mouse Genome Informatics; http://www.informatics.jax.org/); however, they can also be user-defined for the study of complex traits. Different types of mutations affect protein function differently. Therefore, to make fair comparisons, MamPhEA enables users to restrict the analysis to phenotypes derived exclusively from loss-of-function (null) mutations. Typically, MamPhEA compares two given gene sets by Fisher’s exact test, with Bonferroni correction for multiple tests. When only one gene set is given, it is compared to the rest of the genes in the same genome. To give easily comprehensible results, MamPhEA generates graphical output displaying enriched or depleted phenotypes according to the hierarchical structure of the phenotypic classification. MamPhEA supports analysis of genes of all mammalian species with a complete or draft genome sequence.

2. Data Input
Prior to running an analysis, data must be entered in the following four sections.

(1) SOURCE OF GENE LISTS
Select the organism that possesses the input genes.

(2) GENE SETS TO BE ANALYZED
Supply MamPhEA with target gene set(s).

(3) ALTERNATIVE HYPOTHESIS
Choose the alternative hypothesis for the statistical tests.

(4) MUTANT PHENOTYPES
Specify either MGI-predefined or user-defined phenotypes for the subsequent analysis; choose whether to restrict the analysis to phenotypes derived from loss-of-function mutations; and choose which M. musculus orthologs are used for the analysis (when the input genes are from a species other than M. musculus).

NOTE: Please launch a new window (not a new tab) of your Web browser before initiating a new submission to ensure the analysis runs smoothly.

(1) SOURCE OF GENE LISTS
To identify enriched or depleted mutant phenotypes of given gene sets, the organism (i.e. species) of the gene set(s) to be analyzed must first be specified. In the current version of MamPhEA, the drop-down menu (Figure 1) contains numerous insect and mammalian species, each of which has a complete or draft genome sequence. When a species is chosen, enrichment analysis based on the mutant phenotypes of the phylogenetically closest model organism is recommended: MamPhEA for mammalian gene sets and DroPhEA for insect gene sets. Users can decide whether or not to follow the recommendation.


Figure 1. Specify the organism of gene set(s) to be analyzed.

(2) GENE SETS TO BE ANALYZED

Types of Analyses

Two types of enrichment analyses are available in MamPhEA. The first type detects differentially enriched phenotypes by comparing two gene sets. Users must name the two gene sets (A in Figure 2) and fill the text fields with the two lists of genes (B in Figure 2). The second type identifies enriched or depleted phenotypes of a single gene set, using the remaining genes in the genome as reference genes. In this case, the left text field (B in Figure 2) is filled with the list of genes for gene set 1, and “Rest of genome” is selected as gene set 2 (C in Figure 2). Please note that MamPhEA recognizes genes of organisms other than the selected species (A in Figure 1) and automatically filters them out of the gene list. For example, the gene with the ID “NM_001033515” belongs to “Homo sapiens”. If “Mus musculus” is selected as the organism to be analyzed and NM_001033515 is included in the input gene sets, it will be discarded from the analysis.
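
The two analysis types and the species filter described above can be sketched as follows. This is only an illustration under stated assumptions, not MamPhEA's actual implementation: the function name `resolve_gene_sets`, the toy genome, and the gene IDs `G1` to `G4` are hypothetical, and how the tool handles genes shared by both sets is not stated in this tutorial.

```python
def resolve_gene_sets(set1, set2, genome):
    """Return the two gene sets that would enter the comparison.

    set1, set2 : iterables of gene IDs; pass set2=None to mimic
                 selecting "Rest of genome" as gene set 2.
    genome     : all gene IDs known for the selected organism
                 (a hypothetical stand-in for the tool's database).
    """
    genome = set(genome)
    # IDs that do not belong to the selected organism are discarded.
    kept1 = set(set1) & genome
    if set2 is None:
        # Second analysis type: the remaining genes are the reference.
        kept2 = genome - kept1
    else:
        # First analysis type: two user-supplied gene sets.
        kept2 = set(set2) & genome
    return kept1, kept2


# Toy example: NM_001033515 is not in the toy genome, so it is dropped.
toy_genome = {"G1", "G2", "G3", "G4"}
target, reference = resolve_gene_sets(["G1", "G2", "NM_001033515"], None, toy_genome)
print(sorted(target), sorted(reference))
```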

Data input format

Users must use official gene IDs to input gene lists. Currently, DroPhEA and MamPhEA support Ensembl gene IDs only. Other identifiers, e.g. HGNC gene IDs, will be reported as having no entry associated with mutant phenotypes. To convert non-Ensembl gene IDs, please follow the conversion instructions given later in this section.

Genes in the input list must be separated by (a) a tab, (b) a return, (c) a comma, (d) a semicolon, or (e) a single space. Example gene lists are entered automatically by clicking the link (D in Figure 2).
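
For preparing a gene list offline, the separator rules above can be sketched in a few lines of Python. This is a minimal sketch, not the parser MamPhEA uses: the function name `parse_gene_list` is hypothetical, the IDs are illustrative strings in Ensembl's mouse gene ID format, and treating a run of consecutive separators as one (and dropping repeated IDs) is an assumption.

```python
import re


def parse_gene_list(text):
    """Split pasted text on tab, return, comma, semicolon, or space."""
    tokens = re.split(r"[\t\r\n,; ]+", text.strip())
    # Drop empty tokens and repeated IDs while keeping the input order.
    return list(dict.fromkeys(tok for tok in tokens if tok))


raw = "ENSMUSG00000000001, ENSMUSG00000000028;ENSMUSG00000000031\tENSMUSG00000000037\nENSMUSG00000000001"
for gene_id in parse_gene_list(raw):
    print(gene_id)
```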

Figure 2. Input gene set(s).

Instructions for converting gene IDs to Ensembl gene IDs


Step 1. Click the hyperlink at the top of the Ensembl home page.


Step 2.1. Select the “Ensembl Gene” database
Step 2.2. Select the source of your gene set(s)


Step 3.1. Click “Filters”
Step 3.2. Toggle the “GENE” option


Step 4.1. Tick “Input external reference ID list”
Step 4.2. Select your gene ID source from the drop-down menu
Step 4.3. Copy and paste your gene list
Step 4.4. After completing the form, press the “Result” button at the top; the conversion result will be downloaded automatically

(3) ALTERNATIVE HYPOTHESIS
The alternative hypothesis to be tested in the enrichment analysis is specified in this section (Figure 3). MamPhEA first maps the genes in the given gene sets (data in A and B; Figure 2) to their corresponding mouse genes (or orthologs) and identifies the mutant phenotypes associated with each gene. Based on the numbers of genes associated or not associated with a specific phenotype in each gene set, a 2×2 contingency table is constructed for each phenotype. Fisher’s exact test is then conducted to obtain a P-value for each phenotype under the null hypothesis. The option “differentially enriched” indicates a two-sided test against the null hypothesis of no enrichment or depletion for the examined phenotype. The options “Set 1 Enriched” and “Set 1 Depleted” each indicate a one-sided test in the stated direction. The P-values can be Bonferroni corrected for multiple tests.
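
The test described above can be illustrated with a small standard-library sketch. It is not MamPhEA's code: the function names, the toy gene IDs, and the option labels "two-sided", "greater", and "less" (standing in for “differentially enriched”, “Set 1 Enriched”, and “Set 1 Depleted”) are assumptions. The two-sided P-value here sums the probabilities of all tables no more likely than the observed one, which is a common convention; this tutorial does not say which convention the tool uses.

```python
from math import comb


def contingency_table(set1, set2, genes_with_phenotype):
    """Count genes with/without the phenotype in each gene set."""
    a = len(set1 & genes_with_phenotype)
    c = len(set2 & genes_with_phenotype)
    return a, len(set1) - a, c, len(set2) - c


def fisher_exact(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test for the table [[a, b], [c, d]].

    Rows are gene set 1 and gene set 2; columns are genes with and
    without the phenotype.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric probability of each possible top-left cell value.
    probs = {
        k: comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
        for k in range(lowest, highest + 1)
    }
    if alternative == "greater":    # set 1 enriched
        p = sum(pr for k, pr in probs.items() if k >= a)
    elif alternative == "less":     # set 1 depleted
        p = sum(pr for k, pr in probs.items() if k <= a)
    else:                           # tables no more likely than observed
        cutoff = probs[a] * (1 + 1e-7)
        p = sum(pr for pr in probs.values() if pr <= cutoff)
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


set1 = {"g1", "g2", "g3", "g4"}
set2 = {"g5", "g6", "g7", "g8"}
table = contingency_table(set1, set2, {"g1", "g2", "g3", "g5"})
print(table, fisher_exact(*table, alternative="greater"))
```

With one phenotype tested per MP ID, `bonferroni` would be applied to the full list of per-phenotype P-values, so the number of phenotypes examined directly affects the corrected values.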


Figure 3. Specify hypothesis for enrichment analysis.

(4) MUTANT PHENOTYPES
Select the model organism from which the mutant phenotypes are derived (“mouse” to use MamPhEA; “fruit fly” to use DroPhEA) (A in Figure 4).

In MamPhEA, select “Use loss-of-function phenotypes only” if you wish to restrict the analysis to phenotypes derived from loss-of-function mutations (B in Figure 4); otherwise, phenotypes derived from all mutation types will be used in subsequent analyses. Select “One-to-one orthologs only” if you wish to restrict the analysis to genes with one-to-one orthologs in the mouse genome (C in Figure 4). This option is provided only when the organism of interest is not M. musculus.

The phenotypes of interest must be specified to perform enrichment analyses using MamPhEA. The simplest way is to conduct a comprehensive search on phenotypic terms predefined by MGI. Alternatively, users can customize their own phenotypes by combining existing MGI phenotypic terms. See below for instructions.

Enrichment Analysis on MGI Phenotypes

The mutant phenotypic terms (MP IDs) in the MGI Mammalian Phenotype Ontology are hierarchically structured. That is, one parent MP ID (e.g., MP:0002102, abnormal ear morphology) represents a phenotype lineage that may include several child MP IDs, each describing a more detailed phenotype (e.g., MP:0000026, abnormal inner ear morphology; MP:0002177, abnormal outer ear morphology). When “Enrichment Analysis on MGI-predefined Phenotypes” is chosen (D in Figure 4), users can specify the range of phenotypic levels over which the gene lists are investigated (E in Figure 4). A higher-level MP ID represents a phenotype at a broader scale; a lower-level MP ID describes a more specific phenotype, or phenotypes at a smaller scale. Analyzing more detailed phenotypes, or phenotypes at multiple levels, takes longer to return results.
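
The idea of selecting a range of phenotypic levels can be sketched with a toy hierarchy built from the three MP IDs named above. Several things here are assumptions made for illustration: the level numbers are counted from the top of this toy tree rather than taken from MamPhEA's interface, each term is given a single parent, and the function names are hypothetical.

```python
# Toy child -> parent map using only the MP IDs mentioned in the text.
PARENT = {
    "MP:0000026": "MP:0002102",  # abnormal inner ear morphology
    "MP:0002177": "MP:0002102",  # abnormal outer ear morphology
}
ALL_TERMS = {"MP:0002102", "MP:0000026", "MP:0002177"}


def level(term, parent=PARENT):
    """Depth of a term in the toy tree; the top term is level 1."""
    depth = 1
    while term in parent:
        term = parent[term]
        depth += 1
    return depth


def terms_in_level_range(terms, shallowest, deepest):
    """Keep the terms whose level lies within the requested range."""
    return sorted(t for t in terms if shallowest <= level(t) <= deepest)


print(terms_in_level_range(ALL_TERMS, 1, 1))  # broad phenotype only
print(terms_in_level_range(ALL_TERMS, 2, 2))  # more specific phenotypes
```

Widening the range increases the number of MP IDs tested, which is consistent with the longer running time noted above.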


Figure 4. Perform analysis on phenotypic terms predefined by MGI (MP IDs).

Enrichment Analysis on User-defined Phenotypes

MamPhEA enables users to evaluate complex traits by defining their own phenotypes of interest. To do so, select “Enrichment Analysis on User-defined Phenotypes” (D in Figure 4) and combine existing MP IDs.
When “Enrichment Analysis on User-defined Phenotypes” is chosen, four sections will appear: B, C, D and E in Figure 5. The user-defined phenotypes can be customized by merging existing MP IDs by manual input, browsing phenotype ontologies, or keyword search (B in Figure 5). C in Figure 5 represents the primary MP ID selection. Users can directly fill in the text field with MP IDs, or alternatively, search the database by keywords. D is designated for temporary storage of selected phenotypes, allowing users to arrange phenotypes selected from C. To customize your own phenotypes: (1) collect and store a primary set of term IDs in “Selected Phenotype” (D in Figure 5); and (2) highlight and merge a subset of term IDs with a new given name for your defined phenotype (E in Figure 5). The procedures are shown in Figures 6 and 7.
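
Conceptually, merging MP IDs under a new name can be pictured as below. This is a hedged sketch only: it assumes that a gene counts as having the user-defined phenotype when it is annotated with at least one of the merged MP IDs, which this tutorial does not state explicitly, and the function name and the gene labels `geneA` to `geneC` are hypothetical.

```python
def genes_with_merged_phenotype(gene_to_terms, merged_terms):
    """Genes annotated with any MP ID in the user-defined phenotype."""
    merged = set(merged_terms)
    return {gene for gene, terms in gene_to_terms.items() if merged & set(terms)}


# Hypothetical annotations using MP IDs mentioned in this tutorial.
annotations = {
    "geneA": {"MP:0000026"},
    "geneB": {"MP:0002177"},
    "geneC": set(),
}
# A user-defined phenotype merging two MP IDs under one new name.
custom = {"inner or outer ear defect": ["MP:0000026", "MP:0002177"]}

for name, term_ids in custom.items():
    print(name, sorted(genes_with_merged_phenotype(annotations, term_ids)))
```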


Figure 5. User interface to customize complex phenotype(s).


Figure 6. Collect phenotypic terms from Mammalian Phenotype Ontology.


Figure 7. Create and name a customized phenotype.

3. Results
After completing data input, click the “submit” button to initiate the phenotype enrichment analysis. A new page will load, indicating that the data are being processed. Once processing is complete, the results are shown in two frames. The top frame provides the figure legend, the parameters selected by the user, and the user-control interface (Figure 8). The bottom frame shows the phenotype enrichment analysis results, with two display options. The default mode displays the enriched or depleted phenotypes according to the hierarchical structure of term IDs (Figure 9). Selecting “as plain text” for “Display result” in item B displays the enriched/depleted phenotypes as a list with statistical details. Additional output options include labeling phenotypes and screening out the phenotypes that are not differentially enriched (Figure 10). Please note that the hierarchical display is not available for user-defined phenotypes, because customized phenotypes are not formally recognized and classified by MGI. MamPhEA also allows you to download the processed data and results: select the requested file and click the download button. Your browser will then prompt you to save the file to a local directory.


Figure 8. Parameters controlling the user interface.


Figure 9. Results of differentially enriched phenotypes in a hierarchical structure.


Figure 10. Listed results of differentially enriched phenotypes.

4. Citation
Meng-Pin Weng and Ben-Yang Liao. 2010. MamPhEA: a web tool for mammalian phenotype enrichment analysis. Bioinformatics 26: 2212-2213.


Division of Biostatistics and Bioinformatics
Institute of Population Health Sciences
National Health Research Institutes, Taiwan, ROC
All Rights Reserved

Last update 01/28/2016