National Health Research Institutes

DroPhEA: Drosophila phenotype enrichment analysis for insect functional genomics

1. Introduction
Mutant phenotypes represent the consequences of altering the information output of a gene, and are therefore ideal for understanding how a gene functions at the organismal level. DroPhEA enables users to perform enrichment analysis on large gene sets based on two major types of mutant phenotypes of fruit fly (Drosophila melanogaster) genes (or orthologs): (i) a lexicon of ‘phenotypic class’, which describes the pathology, i.e. the effect of the mutation on the whole organism (e.g., lethal, sterile), and (ii) a lexicon of ‘anatomy’, which describes the body part affected by the mutation (e.g., eye, antenna). These hierarchically structured phenotypic descriptions are represented in FlyBase as FBcv terms (‘phenotypic class’) and FBbt terms (‘anatomy’). In addition to the phenotypes predefined by FlyBase, DroPhEA allows users to define their own phenotypes for the study of complex traits. Different types of mutations affect protein function in different ways, so DroPhEA lets users exclude phenotypes derived from gain-of-function mutations to keep comparisons unbiased. Typically, DroPhEA compares two given gene sets by Fisher’s exact test, with Bonferroni correction for multiple tests; when only one gene set is given, it is compared with the remaining genes of the same genome. DroPhEA presents its results graphically, displaying enriched or depleted phenotypes according to the hierarchical structure of the phenotypic classification. It supports the analysis of genes from any insect species with a complete or draft genome sequence.

2. Data Input
Prior to running an analysis, data must be entered in the following four sections.

Select the organism that possesses the input genes.

Supply DroPhEA with target gene set(s).

Choose the alternative hypothesis for the statistical tests.

Specify either FlyBase-predefined phenotypes or user-defined phenotypes for the analysis. Optionally, exclude phenotypes derived from gain-of-function mutations. If the input genes come from a species other than D. melanogaster, choose which D. melanogaster orthologs to use.

NOTE: Please launch a new window (not a new tab) of your Web browser before initiating a new submission to ensure the analysis runs smoothly.

To identify enriched or depleted mutant phenotypes of given gene sets, you must first specify the organism (i.e., species) of the gene set(s) to be analyzed. In the current version of DroPhEA, the drop-down menu (Figure 1) lists numerous insect and mammalian species, each with a complete or draft genome sequence. When a species is chosen, the site recommends enrichment analysis based on the mutant phenotypes of the phylogenetically closest model organism: DroPhEA for insect gene sets and MamPhEA for mammalian gene sets. Users can follow this recommendation or override it.

Figure 1. Specify the organism of gene set(s) to be analyzed.


Types of Analyses

DroPhEA offers two types of enrichment analysis. The first type detects differentially enriched phenotypes by comparing two gene sets. Users name the two gene sets (A in Figure 2) and fill the text fields with the two gene lists (B in Figure 2). The second type identifies enriched or depleted phenotypes of a single gene set, using the remaining genes in the genome as reference genes. Users fill the left text field (B in Figure 2) with the genes of gene set 1 and select “Rest of genome” as gene set 2 (C in Figure 2). Note that DroPhEA recognizes genes of organisms other than the selected species (A in Figure 1) and automatically removes them from the gene lists. For example, the gene ID “BGIBMGA006855” belongs to the silkworm Bombyx mori. If D. melanogaster is selected in A of Figure 1, BGIBMGA006855 is discarded from the input gene sets before the analysis.
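The two analysis types differ only in how gene set 2 is built. The Python sketch below is a minimal illustration under stated assumptions and is not DroPhEA's implementation. The function name `build_gene_sets` is hypothetical, the selected genome is assumed to be available as a plain list of gene IDs, and the IDs in the usage example (apart from BGIBMGA006855) are placeholders.

```python
def build_gene_sets(set1_ids, genome_ids, set2_ids=None):
    """Return (set1, set2) restricted to genes of the selected genome.

    Hypothetical sketch: IDs not found in genome_ids (e.g. genes of another
    species) are dropped. If set2_ids is None, set 2 is the "Rest of genome",
    i.e. every gene of the selected genome that is not in set 1.
    """
    genome = set(genome_ids)
    # Discard input IDs that do not belong to the selected organism.
    set1 = {g for g in set1_ids if g in genome}
    if set2_ids is None:
        set2 = genome - set1  # reference: the remaining genes of the genome
    else:
        set2 = {g for g in set2_ids if g in genome}
    return set1, set2
```

For example, `build_gene_sets(["fly1", "BGIBMGA006855"], ["fly1", "fly2", "fly3"])` drops the silkworm ID and uses `{"fly2", "fly3"}` as the reference set.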

Data input format

Gene lists must be entered as official gene IDs. Currently, DroPhEA and MamPhEA support Ensembl gene IDs only. Genes entered with other identifiers, e.g. HGNC gene IDs, are not recognized and will be reported as having no entry associated with mutant phenotypes. To convert non-Ensembl gene IDs, follow the instructions below.

Genes in an input list must be separated by (a) a tab, (b) a line break, (c) a comma, (d) a semicolon, or (e) a single space. Clicking the links (D in Figure 2) fills in example gene lists automatically.

Figure 2. Input gene set(s).
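The Python sketch below shows a parser that accepts the five separators listed above. It illustrates the input rules and is not DroPhEA's own parser. Two behaviours are assumptions of this sketch: consecutive separators (such as ", ") are treated as one break, and duplicate IDs are dropped.

```python
import re

# Separators accepted by the input form: tab, line break, comma, semicolon, space.
SEPARATORS = re.compile(r"[\t\r\n,; ]+")


def parse_gene_list(text):
    """Split pasted text into gene IDs.

    Runs of separators count as a single break and empty tokens are dropped.
    Duplicate IDs are removed, keeping the order in which they first appear
    (both behaviours are assumptions of this sketch).
    """
    tokens = [tok for tok in SEPARATORS.split(text) if tok]
    return list(dict.fromkeys(tokens))
```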

Instructions for converting gene IDs to Ensembl gene IDs

Step 1. Click the “BioMart” link at the top of the Ensembl home page.

Step 2.1. Select the “Ensembl Gene” database
Step 2.2. Select the source of your gene set(s)

Step 3.1. Click “Filters”
Step 3.2. Toggle the “GENE” option

Step 4.1. Tick “Input external reference ID list”
Step 4.2. Select your gene source from the drop-down menu
Step 4.3. Copy and paste your gene list
Step 4.4. After completing the form, press the “Result” button at the top; the conversion result will be downloaded automatically
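After you download the conversion table from Ensembl, you still need to apply it to your original gene list. The Python sketch below does this, assuming a hypothetical export layout: a tab-separated file with a header row, the Ensembl gene ID in the first column and the external ID in the second. Check the column order of your own export before adapting it. The function names and the IDs in the test data are placeholders.

```python
import csv
import io


def load_id_map(tsv_text):
    """Build {external ID: [Ensembl gene IDs]} from a two-column export.

    Assumed layout (hypothetical): a header row, then one tab-separated
    pair "Ensembl gene ID, external ID" per line.
    """
    mapping = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2 or not row[0] or not row[1]:
            continue  # skip blank lines and genes without an external ID
        ensembl_id, external_id = row[0], row[1]
        # One external ID may correspond to several Ensembl genes.
        mapping.setdefault(external_id, []).append(ensembl_id)
    return mapping


def convert_ids(external_ids, mapping):
    """Return (Ensembl gene IDs found, external IDs that had no match)."""
    converted, unmatched = [], []
    for ext in external_ids:
        if ext in mapping:
            converted.extend(mapping[ext])
        else:
            unmatched.append(ext)
    return converted, unmatched
```

Reporting unmatched IDs separately lets you see which genes would otherwise appear to have no associated phenotypes.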

This section (Figure 3) specifies the alternative hypothesis for the enrichment tests. DroPhEA first maps the genes in the given gene sets (A and B in Figure 2) to their corresponding fly genes (or orthologs) and identifies the mutant phenotypes associated with each gene. For each phenotype, it then builds a 2×2 contingency table from the numbers of genes in each set that are or are not associated with that phenotype. Fisher’s exact test gives a P-value for each phenotype under the null hypothesis of no association between gene set membership and the phenotype. The option “differentially enriched” performs a two-sided test, which detects either enrichment or depletion of the phenotype in set 1 relative to set 2. The options “Set 1 Enriched” and “Set 1 Depleted” perform the corresponding one-sided tests. The P-values can be Bonferroni corrected for multiple tests.

Figure 3. Specify hypothesis for enrichment analysis.
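The per-phenotype test can be sketched in Python using only the standard library. This is a minimal illustration and not DroPhEA's code. The function names are hypothetical. The two-sided P-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, which R's `fisher.test` also uses. Mapping “Set 1 Enriched” and “Set 1 Depleted” to the upper and lower tails is our reading of the text.

```python
from math import comb


def contingency(set1, set2, annotated):
    """Counts (a, b, c, d) for one phenotype: set 1 with / without it,
    set 2 with / without it. annotated = genes linked to the phenotype."""
    a = len(set1 & annotated)
    c = len(set2 & annotated)
    return a, len(set1) - a, c, len(set2) - c


def fisher_exact_2x2(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test P-value for the table [[a, b], [c, d]].

    alternative: "two-sided" (differentially enriched), "greater"
    (set 1 enriched) or "less" (set 1 depleted).
    """
    r1, r2, c1 = a + b, c + d, a + c
    # With all margins fixed, cell a follows a hypergeometric distribution;
    # weights[k] is the integer numerator of P(a = k).
    lo, hi = max(0, c1 - r2), min(r1, c1)
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(lo, hi + 1)}
    total = comb(r1 + r2, c1)  # common denominator (sum of all weights)
    if alternative == "greater":
        num = sum(w for k, w in weights.items() if k >= a)
    elif alternative == "less":
        num = sum(w for k, w in weights.items() if k <= a)
    elif alternative == "two-sided":
        # Sum over all tables no more probable than the observed one.
        num = sum(w for w in weights.values() if w <= weights[a])
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return num / total


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Suppose 8 of 10 set 1 genes and 1 of 6 set 2 genes carry a phenotype. Then `fisher_exact_2x2(8, 2, 1, 5)` gives a two-sided P of about 0.035 before any correction. Working with exact integer weights avoids floating-point ties in the two-sided comparison.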

Select the model organism from which the mutant phenotypes are derived: “fruit fly” to use DroPhEA or “mouse” to use MamPhEA (A in Figure 4).

In DroPhEA, select “Remove phenotypic entries caused by gain-of-function mutation(s)” to exclude traits derived from gain-of-function mutations from the analysis (B in Figure 4). Otherwise, phenotypes derived from all mutation types are used. Select “One-to-one orthologs only” to restrict the analysis to genes that have one-to-one orthologs in the fly genome (C in Figure 4). This option is shown only when the organism of interest is not D. melanogaster.
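The two filtering options can be pictured as operations on an ortholog table and on the phenotype annotations. The Python sketch below is hypothetical throughout. The data layouts, the `"gain_of_function"` label and the function names are assumptions for illustration, not FlyBase's or DroPhEA's actual formats. In this sketch, an ortholog pair counts as one-to-one when neither gene has any other ortholog in the other species.

```python
from collections import defaultdict


def one_to_one(orthologs):
    """Keep only input genes whose fly ortholog relationship is one-to-one.

    orthologs: dict mapping an input gene ID to a list of fly gene IDs
    (hypothetical layout). Returns {input gene: fly gene}.
    """
    back = defaultdict(set)  # fly gene -> input genes that map to it
    for gene, fly_genes in orthologs.items():
        for fly in fly_genes:
            back[fly].add(gene)
    return {
        gene: fly_genes[0]
        for gene, fly_genes in orthologs.items()
        if len(fly_genes) == 1 and len(back[fly_genes[0]]) == 1
    }


def phenotype_genes(annotations, exclude_gain_of_function=False):
    """Group fly genes by phenotype term.

    annotations: iterable of (fly_gene, term_id, mutation_type) tuples;
    entries labelled "gain_of_function" (a hypothetical label) are skipped
    when exclude_gain_of_function is True.
    """
    by_term = defaultdict(set)
    for fly_gene, term_id, mutation_type in annotations:
        if exclude_gain_of_function and mutation_type == "gain_of_function":
            continue
        by_term[term_id].add(fly_gene)
    return dict(by_term)
```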

You must specify the phenotypes of interest before running an enrichment analysis in DroPhEA. The simplest option is a comprehensive analysis of all phenotypic terms predefined by FlyBase (FBcv or FBbt terms). Alternatively, users can define their own phenotypes by combining existing FBcv or FBbt terms. See below for instructions.

Enrichment Analysis on FlyBase Phenotypes

FBcv and FBbt terms are hierarchically structured. A parent ID (e.g., FBbt:00000006, head segment) represents a phenotypic lineage that may include several descendant IDs describing more detailed phenotypes (e.g., FBbt:00000007, procephalic segment; FBbt:00000011, gnathal segment; FBbt:00000157, embryonic head segment; FBbt:00001732, larval head segment; FBbt:00003009, adult head segment). When “Enrichment Analysis on FlyBase Phenotypes” is chosen (D in Figure 4), users can specify the range of phenotypic levels to investigate (E in Figure 4). A higher-level FBcv or FBbt term represents a phenotype at a broader scale, whereas a lower-level term describes a more specific phenotype or a phenotype at a smaller scale. Analyses of more detailed phenotypes, or of several phenotypic levels at once, take longer to complete.

Figure 4. Perform analysis on phenotypes predefined by the FlyBase control lexicons.
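The text above does not say exactly how a term's level is determined. One plausible reading is the distance from the top of the ontology. The Python sketch below assumes exactly that: level 1 is a root term, and a term reachable by several paths takes the shortest one. It uses hypothetical term IDs. Because FlyBase ontologies allow a term to have several parents, the hierarchy is treated as a graph rather than a tree.

```python
from collections import deque


def term_levels(parents):
    """Assign each term a level by breadth-first search from the roots.

    parents: dict mapping every term ID to a list of its parent term IDs
    (roots map to []). Level 1 = root; otherwise level = 1 + the length of
    the shortest path up to any root (an assumption of this sketch).
    """
    children = {t: [] for t in parents}
    for term, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(term)
    level = {t: 1 for t, ps in parents.items() if not ps}
    queue = deque(level)
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in level:  # first visit = shortest distance
                level[child] = level[term] + 1
                queue.append(child)
    return level


def terms_in_range(levels, highest, lowest):
    """Terms whose level lies between two chosen levels (inclusive)."""
    return sorted(t for t, lv in levels.items() if highest <= lv <= lowest)
```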

Enrichment Analysis on User-defined Phenotypes

DroPhEA enables users to evaluate complex traits by defining their own phenotypes of interest as combinations of existing FBcv or FBbt terms. To do so, select “Enrichment Analysis on User-defined Phenotypes” (D in Figure 4).
When this option is chosen, four sections appear: B, C, D and E in Figure 5. User-defined phenotypes are built by merging existing FBcv or FBbt terms, which can be entered manually, found by browsing the phenotype ontologies, or found by keyword search (B in Figure 5). Section C is the primary interface for selecting phenotypic class (FBcv) and anatomy (FBbt) term IDs. Users can fill in the text field directly with term IDs or search the database by keywords. Section D temporarily stores the selected phenotypes, so users can arrange the terms selected in C. To customize your own phenotypes: (1) collect and store a primary set of term IDs in “Selected Phenotype” (D in Figure 5); and (2) highlight a subset of these term IDs and merge them under a new name for your defined phenotype (E in Figure 5). Figures 6 and 7 show the procedures.

Figure 5. User interface to customize complex phenotype(s).

Figure 6. Collect “anatomy” FBbt terms into “Selected Phenotype”.

Figure 7. Create and name a customized phenotype.
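The two-step workflow can be modelled with a small class: first collect terms in “Selected Phenotype”, then merge a subset under a new name. In this Python sketch, the class and method names are hypothetical. It also assumes that a gene carries a merged phenotype if it is annotated with any of the merged terms (a set union). The tutorial does not state how DroPhEA combines terms.

```python
class PhenotypeBuilder:
    """Collect FBcv/FBbt terms, then merge subsets into named phenotypes.

    genes_by_term: dict mapping a term ID to the set of fly genes annotated
    with it (hypothetical input layout).
    """

    def __init__(self, genes_by_term):
        self.genes_by_term = genes_by_term
        self.selected = []  # plays the role of the "Selected Phenotype" list
        self.custom = {}    # user-defined phenotype name -> set of genes

    def select(self, *term_ids):
        """Add term IDs to the selection, ignoring ones already present."""
        for term in term_ids:
            if term not in self.selected:
                self.selected.append(term)

    def merge(self, name, term_ids):
        """Define a phenotype as the union of genes of the given selected terms."""
        missing = [t for t in term_ids if t not in self.selected]
        if missing:
            raise ValueError(f"not in the selection: {missing}")
        genes = set()
        for term in term_ids:
            genes |= self.genes_by_term.get(term, set())
        self.custom[name] = genes
        return genes
```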

3. Results
After completing data input, click the “submit” button to start the phenotype enrichment analysis. A new page will load, indicating that the data are being processed. Once processing is complete, the results are shown in two frames. The top frame provides the figure legend, the parameters selected by the user, and the user-control interface (Figure 8). The bottom frame shows the phenotype enrichment results in one of two display modes. The default mode shows the enriched or depleted phenotypes arranged according to the hierarchical structure of the term IDs (Figure 9). Selecting “as plain text” for “Display result” in item B instead lists the enriched/depleted phenotypes together with their statistical details. Additional output options include labeling phenotypes and screening out phenotypes that are not differentially enriched (Figure 10). Note that the hierarchical display is not available for user-defined phenotypes, because FlyBase does not formally recognize or classify customized phenotypes.

DroPhEA also allows you to download the processed data and results. Select the desired file and click the download button. Your browser will then ask whether you want to save the file to a local directory.

Figure 8. Parameters controlling the user interface.

Figure 9. Results of differentially enriched phenotypes in a hierarchical structure.

Figure 10. Listed results of differentially enriched phenotypes.
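The plain-text view and the option to screen out phenotypes that are not differentially enriched amount to sorting and filtering the per-phenotype statistics. The Python sketch below assumes a hypothetical row layout (term ID, name, hits in set 1, hits in set 2, corrected P-value) and a significance cutoff of 0.05. Neither is taken from DroPhEA's actual output.

```python
def format_results(rows, alpha=0.05, only_significant=True):
    """Render per-phenotype results as tab-separated plain text.

    rows: iterable of (term_id, name, hits_set1, hits_set2, p_corrected)
    tuples (hypothetical layout). Rows are sorted by corrected P-value;
    alpha is an assumed cutoff for "differentially enriched".
    """
    lines = ["term_id\tname\tset1_hits\tset2_hits\tP_corrected"]
    for term_id, name, hits1, hits2, p_adj in sorted(rows, key=lambda r: r[4]):
        if only_significant and p_adj >= alpha:
            continue  # screen out phenotypes that are not differentially enriched
        lines.append(f"{term_id}\t{name}\t{hits1}\t{hits2}\t{p_adj:.3g}")
    return "\n".join(lines)
```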

4. Citation
Weng MP and Liao BY (2011) DroPhEA: Drosophila phenotype enrichment analysis for insect functional genomics, submitted

Division of Biostatistics and Bioinformatics
Institute of Population Health Sciences
National Health Research Institutes, Taiwan, ROC
All Rights Reserved

Last update 01/28/2016