modMine Fly Guide

March 19th, 2010

Introduction

The aim of this document is to help researchers access and use the data in modMine for the detailed analysis papers. It describes where to find specific sets of data, how to view and download them, and how to access and query specific subsets of the data. NOTE: this document is still a draft. Templates and lists can be added to modMine at any time, so if there is a query or list that you would like included, please let us know and we will try to add it for you. Please send comments, or anything you would like added to this document, to rachel@flymine.org.

NOTE: This document mentions several public lists.  Unfortunately there was a problem porting these lists between modMine release-14 and release-15. We hope to rectify this soon.

Section 1: Elements: Annotating functional elements

Classes of elements and their genomic / epigenomic / comparative signatures:


Protein Coding Genes:


Project: The Drosophila Transcriptome

PI: Sue Celniker

Experiment: Gene model prediction: http://intermine.modencode.org/query/experiment.do?experiment=Gene%20Model%20Prediction

Features generated:

  • NaturalTransposableElement
  • Exon
  • Gene
  • MRNA
  • OverlappingESTSet
  • ExonRegion
  • CDS
  • FivePrimeUTR
  • StartCodon
  • ThreePrimeUTR
  • CDNA
  • StopCodon
  • TranscriptRegion
  • EST

The experiment page, accessible from the modMine home page and via the link above, allows you to download the above features either for the entire experiment or for each individual submission. Here you can also access links to GBrowse tracks and to the original raw data files.
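The experiment links in this guide share a pattern: the experiment name is URL-encoded and passed as the `experiment` parameter. As a minimal sketch, assuming that pattern also holds for other experiments, a link can be built in Python as follows. The helper name `experiment_url` is ours and is not part of modMine.

```python
from urllib.parse import quote

# Base address copied from the links in this guide. Release-specific links
# use ".../release-15/experiment.do" instead of ".../query/experiment.do".
BASE_URL = "http://intermine.modencode.org/query/experiment.do"


def experiment_url(experiment_name, base_url=BASE_URL):
    """Return an experiment page link for an experiment name (hypothetical helper)."""
    # quote() percent-encodes spaces as %20, matching the links shown in this guide.
    return base_url + "?experiment=" + quote(experiment_name)


print(experiment_url("Gene Model Prediction"))
```

Passing a different `base_url` lets the same helper point at a specific release.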

Further analysis:

Using lists to analyze this data:

To carry out analysis of this data within modMine it is useful to be able to use lists of subsets of the data. Lists enable you to run queries (templates or custom query builder queries) on the particular subset you are interested in.

From the experiments page it is possible to create lists of any of the subsets of data from each submission, for example the set of gene models from each submission. To create a list, navigate to the experiments page. The green box for each submission has the option ‘Create list’ for each type of feature. This creates a list of that subset of data with a default name. If you are logged in, this list will be permanently saved to your account. You can now analyse this list in the following ways:

  • The list analysis page gives you additional information about your list
  • The list can be used to run template queries and query builder queries
  • From the lists ‘view’ page you can carry out list operations with other lists – union, intersect and subtract.
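The list operations named above behave like ordinary set operations. The sketch below uses made-up identifiers and plain Python sets to show what each operation returns. Exactly how modMine defines ‘subtract’ is not stated in this guide, so both common readings are shown; check the result in modMine before relying on either.

```python
# Hypothetical identifiers standing in for the contents of two saved lists.
list_a = {"feature_1", "feature_2", "feature_3"}
list_b = {"feature_2", "feature_3", "feature_4"}

union = list_a | list_b        # items in either list
intersect = list_a & list_b    # items in both lists

# Two possible readings of "subtract" (an assumption, see the note above):
a_minus_b = list_a - list_b          # items in list_a that are not in list_b
unique_to_either = list_a ^ list_b   # items in exactly one of the two lists

print(sorted(union))
print(sorted(intersect))
print(sorted(a_minus_b))
print(sorted(unique_to_either))
```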

How do I find overlapping and nearby features?

For features generated by each submission it is possible to find other features which either overlap them or lie within a specified distance (in kb) upstream or downstream. Such queries can be run from the ‘green box’ on each submission page (accessed by clicking on the submission title or dccid on the experiments page). In addition, some of the templates listed below allow you to find overlaps and nearby features.
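To make the two kinds of query concrete, here is a minimal sketch of the underlying coordinate tests. It assumes 1-based, inclusive coordinates and treats "nearby" as falling within a window extended by the given number of kb on both sides. The `Feature` type and both function names are illustrative only and do not come from modMine.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chromosome: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def overlaps(a, b):
    """True if the two features share at least one base on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def within_distance(a, b, max_kb):
    """True if b overlaps a, or lies within max_kb kilobases of either end of a."""
    flank = max_kb * 1000
    window = Feature(a.chromosome, max(1, a.start - flank), a.end + flank)
    return overlaps(window, b)


exon = Feature("2L", 10000, 12000)
site = Feature("2L", 13500, 13800)
print(overlaps(exon, site), within_distance(exon, site, 2))
```

The window is widened symmetrically here; a strand-aware upstream-only or downstream-only search would extend just one side.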

How do I find FlyBase genes which overlap the features generated from a particular submission?

To find overlapping gene models use the template ‘Gene models + chromosome –> Overlapping Flybase genes’, where the input is a particular gene model submission and the chromosome you are interested in:

If you have created lists of features, as described above, you can use the template ‘Features + chromosome –> Overlapping Flybase genes’ to find FlyBase genes which overlap those features, again limited to a particular chromosome:

How do I find features which overlap the same type of feature from FlyBase (e.g. exons which overlap FlyBase exons)?

A set of templates allows you to find overlaps between features generated by a modENCODE submission and features from FlyBase (please let us know if you would like more combinations of overlap queries added):

For a specific submission or set of submissions use:

If you do not want to specify a submission or list of submissions use:

Long Non-coding RNAs:


Input: Level2 dataset of confirmed transcripts that lack protein-coding signatures. This will come from the analysis of protein-coding genes above, so there is no data directly from modMine.

Full gene structures for these (invPCR/RNAseq)? (Celniker) :

The Celniker group has a number of submissions characterizing the Drosophila melanogaster transcriptome using RNA-seq, which could be used to analyse the confirmed transcripts with no protein-coding signature:

Project: The Drosophila Transcriptome

PI: Sue Celniker

Experiments:

Features generated: These experiments produce no features in modMine.

GBrowse tracks and Downloads: Each experiment page (links above) provides links to GBrowse tracks and to the raw data files for each submission for download.


Small Regulatory RNAs (small non-coding RNAs)


Project: Small and microRNAs

PI: Eric Lai

Experiment: Small RNA identification: http://intermine.modencode.org/release-15/experiment.do?experiment=Small%20RNA%20identification

Experiment type: RNA-seq

Factors: developmental stage, strain, cell line

Features generated: This experiment has no features in modMine.

GBrowse tracks and Downloads: The experiment page (link above) provides links to GBrowse tracks and to the raw data files for each submission for download.

Transcription start sites and promoter regions


Input: inferred TSSs from RNAseq / invPCR / CAGE data (Celniker) :

RNA-seq data:

Project: The Drosophila Transcriptome

PI: Sue Celniker

Experiments:

Features generated: These experiments produce no features in modMine.

GBrowse tracks and Downloads: Each experiment page (links above) provides links to GBrowse tracks and to the raw data files for each submission for download.


invPCR / CAGE data -??


Distant-acting enhancer regions.


Input: Genome-wide maps of histone marks (Karpen, White)

Two projects characterize the genomic distribution of histone modifications:

Project: Chromosomal Proteins

PI: Gary Karpen

Experiment: Genomic Distributions of Histone Modifications (Karpen): http://intermine.modencode.org/release-15/experiment.do?experiment=Genomic%20Distributions%20of%20Histone%20Modifications

Experiment Type: ChIP-chip

Features: Binding site

Factors: Cell line

Project: Regulatory Elements in Drosophila

PI: Kevin White

Experiment: Chromatin Binding Site Mapping (White): http://intermine.modencode.org/release-15/experiment.do?experiment=Chromatin%20Binding%20Site%20Mapping

Experiment Type: ChIP-chip and ChIP-seq

Features: Protein Binding site (and 1 histone binding site: submission 771).

Factors: Developmental Stage, Cell line

1. Karpen data: Genomic Distributions of Histone Modifications:

The Karpen data looks at histone modifications in cell lines. In addition, binding sites of a set of histone modification enzymes have been mapped (see below). The experiment page lists all the submissions; listed below is a summary of the histone modifications that have been mapped in the indicated cell lines. The easiest way to identify the data associated with each type of histone modification is to look at the submission lists within modMine (accessed from the ‘Lists view’ page). There you will find several lists labelled Karpen_xxxx, where xxxx refers to the type of histone modification or the cell line. These lists enable you to work with subsets of the data, i.e. all the data associated with a particular type of histone modification or a particular cell line. Any of these lists can be used in template queries and in queries constructed using the query builder. At the bottom of each list page is a set of template queries that have been run automatically for the list you are viewing; for example, the template “Submissions –> Binding sites” gives a table of all the binding sites for that set of submissions.

  • H4AcTetra
  • H4K16ac(L)
  • H4K16ac(M)
  • H4K5ac
  • H4K8ac
  • H2B-ubiq
  • H3K18Ac
  • H3K23ac
  • H3K27Me3
  • H3K27Ac
  • H3K36me1
  • H3K36me3
  • H3K4me1
  • H3K4me2
  • H3K4Me3(LP)
  • H3K4me3
  • H3K79Me1
  • H3K79Me2
  • H3K9ac
  • H3K9me2
  • H3K9me3

Histone modification enzymes:

  • BRE1_Q2539
  • JIL1_Q3433
  • NURF301_Q2602
  • PCL Q3412
  • Su(var)3-9
  • Trx-C
  • dMi-2_Q2626
  • dRING Q3200
  • Ez

2. White data: Chromatin Binding Site Mapping:

The White data looks at the following histone modifications in various developmental stages. All of these have also been mapped by the Karpen experiment in cell lines. In addition, H3K4me3 histone modifications have been mapped in S2-DRSC and Kc167 cells by the White group. The modMine ‘lists view’ page provides sets of lists of submissions for each type of histone modification or each developmental stage. As for the Karpen data, a set of template queries at the bottom of each list page enables you to query for data relating to all the submissions in the list you are viewing. Binding sites for each type of histone modification have been mapped using both ChIP-chip and ChIP-seq experiments. Lists dividing submissions according to histone modification and experiment type are available on the lists page.

  • H3K4me1
  • H3K4me3
  • H3K27me3
  • H3K27Ac
  • H3K36me3
  • H3K9me3
  • H3K9Ac

Further Analysis:

How do I find all the histone binding sites at a particular developmental stage?

How do I find all the histone binding sites in a particular cell line?

How do I find histone binding sites at a particular chromosome location?

How do I find all the binding sites for a particular histone modification?

There are three possible templates depending on your starting point:

How do I find overlapping binding sites from two different submissions or lists of submissions (for example, overlapping binding sites from related Karpen and White submissions, or from different developmental stages/cell lines)?

Other template queries:

The following templates allow you to find binding sites according to various criteria, using either the name of the antibody to the histone or the antibody target as input. All of these templates can be run using either single values or lists of values:

Insulators and Boundary elements:


Input: Predicted set of insulator regions from Karpen and White groups

White data:

Project: Regulatory Elements in Drosophila

Experiment: Chromatin binding site mapping: http://intermine.modencode.org/release-15/experiment.do?experiment=Chromatin%20Binding%20Site%20Mapping

Experiment type: ChIP-chip and ChIP-seq

Factors: Developmental stage, Cell line

Binding sites for the following insulators were mapped in 0-12 hour embryos. CTCF binding sites were also mapped in S2-DRSC and Kc167 cells:

  • CTCF
  • BEAF-32
  • Cp190
  • su(Hw) ChIP-chip
  • su(Hw) ChIP-seq
  • Trl
  • mod(mdg4)

Lists:

A list containing this set of submissions is available from the modMine lists ‘view’ page and is called White_insulators. The templates at the bottom of this list page show results for queries on all submissions in this list. For example, if you want to download all the binding sites for all insulators, use the template ‘submission(s) –> binding sites’. See below for further relevant template queries.

Karpen data:

Project: Chromosomal Proteins

Experiment: Genomic distribution of histone modifications: http://intermine.modencode.org/release-15/experiment.do?experiment=Genomic%20Distributions%20of%20Histone%20Modifications

Experiment type: ChIP-chip

Factors: Cell line

Binding sites for the following insulators were mapped in cell lines:

  • BEAF32
  • CP190
  • CTCF
  • mod(mdg4)
  • su(Hw)
  • Chro / GAF

A list containing this set of submissions is available from the modMine lists ‘view’ page and is called Karpen_insulators. The templates at the bottom of this list page show results for queries on all submissions in this list. For example, if you want to download all the binding sites for all insulators, use the template ‘submission(s) –> binding sites’. See below for further relevant template queries.

Further Analysis:

The further analysis and template queries described above for ‘Distant-acting enhancer regions’ are also relevant to insulators and boundary elements.

Origins of Replication:

Project: Origins of replication

Early origins:

Project: Origins of Replication

PI: David MacAlpine

Experiment: MacAlpine Early Origin of Replication Identification: http://intermine.modencode.org/release-15/experiment.do?experiment=MacAlpine%20Early%20Origin%20of%20Replication%20Identification

Experiment type: ChIP-chip

Factors: Cell Line

Features: Origin of replication

Origins of replication were mapped in the following cell lines:

  • Submission 711: ML-DmBG3-c2
  • Submission 709: Kc167 (NOTE: this submission also lists S2-DRSC as a cell line, which appears to be an error).
  • Submission 710: S2-DRSC

Further analysis:

How do I find the origins of replication for each submission?

The easiest way to view and download the origins of replication for each submission is via the experiments page (http://intermine.modencode.org/release-15/experiment.do?experiment=MacAlpine%20Early%20Origin%20of%20Replication%20Identification). Here you will find links to download or view data from all submissions or for each individual submission. Links include options to view origins of replication as a results table or in GBrowse, or to download them in tab, csv or gff3 format. Sequences can also be downloaded. There is also the option to ‘Create list’ of all the origins of replication from each submission. If you are logged in, such a list will be permanently saved to your account. You can now analyse this list in the following ways:

  • The list analysis page gives you additional information about your list
  • The list can be used to run template queries and query builder queries
  • From the lists ‘view’ page you can carry out list operations with other lists – union, intersect and subtract.
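If you download features in gff3 format for analysis outside modMine, each feature line has nine tab-separated columns (sequence, source, type, start, end, score, strand, phase, attributes). The sketch below parses one such line using only the Python standard library. The example line is made up, and the source, type and attribute values in real modMine exports may differ.

```python
from typing import NamedTuple


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    strand: str
    attributes: dict


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(columns))
    seqid, source, ftype, start, end, _score, strand, _phase, attrs = columns
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return GffFeature(seqid, source, ftype, int(start), int(end), strand, attributes)


# Made-up example line, not taken from a real modMine download.
example = "2L\tmodENCODE\torigin_of_replication\t1000\t2500\t.\t.\t.\tID=example_origin_1"
feature = parse_gff3_line(example)
print(feature.seqid, feature.start, feature.end, feature.attributes)
```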

In addition, the template query ‘Submission –> origins of replication’ gives all origins of replication for a particular submission and provides a good starting point if you want to add additional information to a query:

How do I find the origins of replication identified in a particular cell line?

This can also be limited to a particular chromosome or chromosome region using the following templates:

How do I find binding sites that overlap origins of replication?

The following template allows you to find particular binding sites that overlap the origins of replication on a particular chromosome. The binding sites are constrained by submission, so you can limit the query to a particular type of binding site (e.g. histone binding sites) by choosing the appropriate submission or list of submissions.
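For downloaded data, the same kind of comparison can be sketched outside modMine. The function below pairs every binding site with every origin of replication it overlaps on one chromosome. It assumes features are plain `(chromosome, start, end)` tuples with inclusive coordinates; the function name and the example coordinates are hypothetical, and this is not how the modMine template itself is implemented.

```python
def overlapping_pairs(binding_sites, origins, chromosome):
    """Return (binding_site, origin) pairs that overlap on the given chromosome.

    Both inputs are iterables of (chromosome, start, end) tuples.
    """
    sites = sorted(s for s in binding_sites if s[0] == chromosome)
    oris = sorted(o for o in origins if o[0] == chromosome)
    pairs = []
    for site in sites:
        for ori in oris:
            if ori[1] > site[2]:
                # Origins are sorted by start, so no later origin can overlap this site.
                break
            if ori[2] >= site[1]:
                pairs.append((site, ori))
    return pairs


# Made-up coordinates for illustration only.
sites = [("2L", 100, 200), ("2L", 900, 950), ("3R", 100, 200)]
origins = [("2L", 150, 400), ("2L", 500, 800), ("3R", 150, 300)]
print(overlapping_pairs(sites, origins, "2L"))
```

To restrict the comparison to one type of binding site, filter `binding_sites` to the relevant submissions before calling the function.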

Section 2: Dynamics: Defining dynamics of transcription and chromatin state

1. Transcription factors in modMine:


Project: Regulatory Elements in Drosophila

PI: Kevin White

Experiment: Chromatin Binding Site Mapping: http://intermine.modencode.org/release-15/experiment.do?experiment=Chromatin%20Binding%20Site%20Mapping

Experiment type: ChIP-chip and ChIP-seq

Factors: Developmental stage

Features: TFBinding site and Protein Binding site

Binding sites for the following transcription factors were mapped by this project:

  • D
  • Dll
  • Stat92E
  • Trl
  • Ubx
  • bab1
  • cnc
  • en
  • ftz-f1
  • gsb-n
  • inv
  • run
  • su(Hw)
  • ttk
  • zfh1
  • CtBP
  • kn
  • sbb

(Not included: Mod(mdg4) and gro, because according to FlyTF and FlyBase these do not appear to be transcription factors.)

Further analysis:

As submissions for these transcription factors are part of a larger experiment which maps other chromosome binding proteins, it is useful to make use of a list of the submissions that just map the transcription factor binding sites. Such a list is available on the lists ‘view’ page and is called ‘White_transcriptionFactors’. This list enables you to analyse the properties of just that set of submissions using the list analysis page and by running template and query builder queries.

How do I find all the binding sites for all the transcription factors mapped?

Use the following template and run it with the above list, White_transcriptionFactors:

How do I find all the binding sites for a particular transcription factor?

The easiest way to view and download the binding sites for each transcription factor is via the experiments page, http://intermine.modencode.org/query/experiment.do?experiment=Chromatin%20Binding%20Site%20Mapping. Here you will find links to download or view data for each individual submission. Links include options to view binding sites as a results table or in GBrowse, or to download them in tab, csv or gff3 format. Sequences can also be downloaded. There is also the option to ‘Create list’ of all the binding sites from each submission. If you are logged in, such a list will be permanently saved to your account. You can now analyse this list in the following ways:

  • The list analysis page gives you additional information about your list
  • The list can be used to run template queries and query builder queries
  • From the lists ‘view’ page you can carry out list operations with other lists – union, intersect and subtract.

NOTE: The experiments page has a link to download all features of each type (Binding site, Protein binding site, Histone binding site and TFBinding site). However, the submissions mapping transcription factor binding sites and histone binding sites have not been consistently annotated as ‘TFBinding site’ and ‘Histone binding site’; in fact, most are annotated as ‘Protein binding site’. Using these links will therefore not give you the complete set of each type. This will hopefully be rectified in a future release of modMine.

How do I find all binding sites at a particular developmental stage?

The following template query can be run with the list of all transcription factor submissions described above (White_transcriptionFactors).

How do I find all binding sites for a particular transcription factor at a particular developmental stage?

There are four possible templates depending on your starting point:

If you know the antibody to the transcription factor you are interested in, use the following template. NOTE: some submissions use a different antibody to the same transcription factor, or give an antibody to the same antigen different names in different submissions. If you are not sure, it is usually safer to use the antibody target gene as your starting point (see below).
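The reason the target gene is the safer starting point can be seen with a small sketch. The records below are invented for illustration (the submission labels and antibody names are not real modMine values): grouping by antibody name would split one transcription factor across several groups, whereas grouping by target gene keeps its submissions together.

```python
from collections import defaultdict

# Hypothetical (submission, antibody name, target gene) records.
records = [
    ("sub_A", "anti-Trl antibody 1", "Trl"),
    ("sub_B", "Trl-ab2", "Trl"),
    ("sub_C", "anti-en", "en"),
]


def submissions_by_target(rows):
    """Group submission labels by antibody target gene, ignoring the antibody name."""
    grouped = defaultdict(list)
    for submission, _antibody, target in rows:
        grouped[target].append(submission)
    return dict(grouped)


print(submissions_by_target(records))
```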

To start your query from the antibody target gene use the following template:

Alternatively, if you have a specific submission or list of submissions, use this template:

or if your list of submissions is already constrained by developmental stage:

How do I find all transcription factor binding sites in a particular chromosomal location?

The following template query can be run with the list of all transcription factor submissions described above (White_transcriptionFactors).

How do I find the binding sites for a particular transcription factor in a particular chromosomal location?

There are three possible template queries depending on your starting point:

If you know the antibody to the transcription factor you are interested in, use the following template. NOTE: some submissions use a different antibody to the same transcription factor, or give an antibody to the same antigen different names in different submissions. If you are not sure, it is usually safer to use the antibody target gene as your starting point (see below).

To start your query from the antibody target gene use the following template:

Alternatively, if you have a particular submission or list of submissions, use the following template:

How do I find all binding sites at a particular developmental stage in a particular chromosomal location?

To start your query from the antibody target gene use the following template:

Alternatively, if you have a particular submission or list of submissions, use the following template:

Or, for all binding sites regardless of transcription factor or submission:

How do I find all binding sites upstream of a particular gene or list of genes?

To find all binding sites use the template query:

To find specifically transcription factor binding sites use the following template, and set the submission to be either a list of transcription factor submissions (White_transcriptionFactors) or a specific submission (to find binding sites for a specific transcription factor).
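When checking upstream results by hand, remember that "upstream" depends on the strand of the gene. The sketch below shows one common definition, assuming 1-based inclusive coordinates and a fixed search distance in base pairs. Whether the modMine templates define the upstream region in exactly this way is an assumption, and both function names are illustrative.

```python
def upstream_region(gene_start, gene_end, strand, distance):
    """Return (start, end) of the region `distance` bp upstream of a gene.

    Coordinates are 1-based and inclusive; strand is "+" or "-".
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies before its start coordinate.
        return max(1, gene_start - distance), gene_start - 1
    if strand == "-":
        # Upstream of a reverse-strand gene lies after its end coordinate.
        return gene_end + 1, gene_end + distance
    raise ValueError("strand must be '+' or '-'")


def is_upstream(site_start, site_end, gene_start, gene_end, strand, distance):
    """True if the binding site overlaps the gene's upstream region."""
    region_start, region_end = upstream_region(gene_start, gene_end, strand, distance)
    return site_start <= region_end and region_start <= site_end


print(upstream_region(5000, 8000, "+", 1000))
print(upstream_region(5000, 8000, "-", 1000))
```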

