Archive Ensembl Home The Wellcome Trust Sanger Institute The European Bioinformatics Institute

Pre Ensembl Release of Opossum genome 3rd Dec 2004

We are pleased to announce the pre-Ensembl site for the first preliminary assembly for Monodelphis domestica (the opossum genome).

The Pre-site is available at: http://pre.ensembl.org/Monodelphis_domestica/.

Project coordination, genome sequencing and assembly are provided by the Broad Institute.

The assembly has a base coverage of approximately 7.19X and was constructed from 19348 supercontigs with an N50 length of 4047488 bases. The total contig length is 3492108230 bases, spanning 3559101070 bases (including gaps). The pre-site offers BLAST and SSAHA access and a limited raw compute showing Genscan ab initio predictions, raw BLAST hits, Eponine hits, CpG islands and repeats. In addition, the M. domestica pre-site presents preliminary protein-based gene models built by a cut-down Ensembl genebuild pipeline. As this is a preliminary site and the first marsupial assembly, some programs have been run with default parameters and so may give unpredictable results on opossum.
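
The difference between the total contig length and the span is the amount of gap sequence in the supercontigs. As a rough illustration, the arithmetic can be sketched as follows; the variable names are ours, and only the two totals come from the announcement.

```python
# Totals quoted in the announcement above.
contig_bases = 3_492_108_230  # sum of all contig lengths
span_bases = 3_559_101_070    # contigs plus the gaps between them

# Bases in the span that are gaps rather than sequenced contig.
gap_bases = span_bases - contig_bases
gap_fraction = gap_bases / span_bases

print(f"{gap_bases:,} gap bases ({gap_fraction:.2%} of the span)")
# prints: 66,992,840 gap bases (1.88% of the span)
```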

Ensembl pre-release: Xenopus tropicalis 5th Nov 2004

We are pleased to announce the first pre-Ensembl site for a preliminary assembly of Xenopus tropicalis. The pre-site is available here.

The Xenopus tropicalis genome assembly 3.0 is the third of a series of preliminary assembly releases by the JGI that are planned as part of the ongoing X. tropicalis genome project. The current assembly includes approximately 7X in small insert end-sequence coverage.

The assembly was constructed with the JGI assembler, Jazz, using paired end sequencing reads. After trimming for vector and quality, 19.1 million reads assembled into 27,064 scaffolds totaling 1.63 Gbp. Roughly half of the genome is contained in 392 scaffolds, all at least 1.2 Mb in length.

The assembly can be downloaded directly from JGI at: http://genome.jgi-psf.org/frog4x1/frog4x1.home.html

Ensembl version 26 released 3rd Nov 2004

The Ensembl team are pleased to announce the release of version 26 of Ensembl. This release includes a new human assembly and gene build in addition to fixes/updates in other species.

New Data

Human
Ensembl Human

Human NCBI build 35 is the latest version of the human genome which has a number of small gaps and rearrangements with respect to the previous build (34), mainly in pericentromeric regions. Ensembl 26 contains a complete new gene build on this assembly, in which the automated predictions are supplemented by some gene structures drawn from manually-annotated resources such as Vega.

The new gene build has been carefully assessed against the previous build. The assessment shows fewer entirely missing genes (only 85 cases missing from Swissprot), more complete (Met-to-STOP) predictions and more UTR-containing transcripts. It also produced a list of genes that can be improved, and we will be releasing a patched set of gene predictions in December.

We expect to progressively improve the gene set over this year, and we are interested in all reports of missing genes or incomplete structures where there is data for the complete structure. Please send a report via our helpdesk (helpdesk@ensembl.org).

  • Core
    • New NCBI35 assembly
    • New gene build (including pseudo-, ncRNA, and mitochondrial genes)
  • SNP
    • dbSNP121 mapped to the new assembly

Note that there are currently no EST or ESTgene databases for NCBI35.

The human database version reflects the new assembly version, e.g. homo_sapiens_core_26_35.

Mouse
Ensembl Mouse
  • Core
    • marker_feature/marker_map_location fixed
    • supercontig names fixed
    • New Affymetrix probe mapping
    • GO terms have been mapped to transcripts via UniProt

The mouse database version has been bumped to 33b, e.g. mus_musculus_core_26_33b

Chicken
Ensembl chicken
  • Core
    • ncRNA genes added

The chicken database version has been bumped to 1c, e.g. gallus_gallus_core_26_1c

Rat
Ensembl Rat
  • Core
    • New Affymetrix probe mapping

The rat database version has been bumped to 3d, e.g. rattus_norvegicus_core_26_3d

Zebrafish
Ensembl Zebrafish
  • Core
    • New Affymetrix probe mapping

The zebrafish database version has been bumped to 4a, e.g. danio_rerio_core_26_4a

Tetraodon
Ensembl Tetraodon
  • Core
    • "Monkey" exons removed

The Tetraodon database version has been bumped to 1a, e.g. tetraodon_nigroviridis_core_26_1a

Multi-species

New comparative data (ensembl_compara_26_1)

  • Human/Chimp BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in-house)
  • Human/Mouse BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in-house)
  • Human/Rat BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in-house)
  • Human/Chicken BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in-house)
  • Human/fugu TRANSLATED_BLAT
  • Human/tetraodon TRANSLATED_BLAT
  • Human/chicken TRANSLATED_BLAT
  • Human/Zebrafish TRANSLATED_BLAT
  • New synteny for Human/Chimp, Human/Mouse, Human/Rat, Human/Chicken
  • Orthologues rebuilt
  • Human paralogues rebuilt
  • Protein clusters rebuilt. Multiple alignments were run with MUSCLE on each family, except family ENSF00000000041, which was run with CLUSTALW.

Mart database (ensembl_mart_26_1)

  • New build

Schema changes

Core

Probeset data has been moved from the misc_feature table to three new tables. This both extends the data associated with probes and greatly improves retrieval speed for other misc_features.

  • 3 new tables: affy_feature, affy_probe, affy_array
  • external_db table: db_name is now varchar(27)
  • density_feature: added an index on seq_region_id

The 25 to 26 patch file is available in CVS at ensembl/sql/patch_25_26.sql. In addition, the script ensembl/sql/transfer_misc_affy.pl can be used to move the affy data from misc_feature table to the new tables.

Compara
  • synteny_region table: new column 'method_link_species_set_id'
  • dnafrag_region table: 2 columns renamed from seq_start, seq_end to dnafrag_start and dnafrag_end respectively.
  • Where possible, primary keys are now UNSIGNED
  • genomic_align_block_id changed to UNSIGNED BIGINT in genomic_align and genomic_align_block tables
  • genomic_align_id changed to UNSIGNED BIGINT in genomic_align and genomic_align_group tables
  • perc_id changed to UNSIGNED TINYINT in genomic_align_block table
  • level_id changed to UNSIGNED TINYINT in genomic_align table

In ensembl-compara/sql/table.sql all foreign-key constraints are now explicitly defined (even though MySQL ignores them).

Website Changes

ExportView
  • A new tab, "Pip", has been added. This exports a sequence and annotation file for use in comparative sequence analysis tools like PipMaker, Vista or zPicture.
FeatureView
  • FeatureView displays the location of all alignments of the selected feature against the genome. Currently the display works for probe sets, DNA and protein sequences. You can get to FeatureView from the alignment tracks in ContigView, from TextView and from the Affymetrix probes in GeneView.

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

The databases included in this release are:

anopheles_gambiae_core_26_2b
anopheles_gambiae_estgene_26_2b
anopheles_gambiae_lite_26_2b
anopheles_gambiae_snp_26_2b
apis_mellifera_core_26_1
caenorhabditis_briggsae_core_26_25
caenorhabditis_briggsae_estgene_26_25
caenorhabditis_elegans_core_26_116a
danio_rerio_core_26_4a
danio_rerio_est_26_4a
danio_rerio_estgene_26_4a
danio_rerio_lite_26_4a
danio_rerio_snp_26_4a
drosophila_melanogaster_core_26_3b
ensembl_compara_26_1
ensembl_go_26_1
ensembl_mart_26_1
fugu_rubripes_core_26_2c
fugu_rubripes_est_26_2c
fugu_rubripes_estgene_26_2c
gallus_gallus_core_26_1c
gallus_gallus_est_26_1c
gallus_gallus_estgene_26_1c
gallus_gallus_lite_26_1c
gallus_gallus_snp_26_1c
homo_sapiens_core_26_35
homo_sapiens_disease_26_35
homo_sapiens_haplotype_26_35
homo_sapiens_lite_26_35
homo_sapiens_snp_26_35
homo_sapiens_vega_26_35
mus_musculus_core_26_33b
mus_musculus_est_26_33b
mus_musculus_estgene_26_33b
mus_musculus_lite_26_33b
mus_musculus_snp_26_33b
pan_troglodytes_core_26_1
rattus_norvegicus_core_26_3d
rattus_norvegicus_est_26_3d
rattus_norvegicus_estgene_26_3d
rattus_norvegicus_lite_26_3d
rattus_norvegicus_snp_26_3d
tetraodon_nigroviridis_core_26_1a
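
The database names above appear to follow a regular layout: a species (or "ensembl" for the multi-species databases), a database type, the Ensembl release number and a data or assembly version. A minimal sketch of splitting a name into those parts is shown below; the field names and the parse_db_name helper are our own illustration, inferred from the names listed here, and are not part of the Ensembl API.

```python
from typing import NamedTuple


class DbName(NamedTuple):
    species: str   # e.g. "homo_sapiens", or "ensembl" for multi-species databases
    db_type: str   # e.g. "core", "snp", "compara"
    release: int   # Ensembl release number, e.g. 26
    version: str   # assembly/data version suffix, e.g. "35" or "33b"


def parse_db_name(name: str) -> DbName:
    """Split a database name such as homo_sapiens_core_26_35 into its parts."""
    # The last three underscore-separated fields are type, release and version;
    # everything before them is the species (which may itself contain "_").
    parts = name.rsplit("_", 3)
    if len(parts) != 4 or not parts[2].isdigit():
        raise ValueError(f"unexpected database name: {name!r}")
    species, db_type, release, version = parts
    return DbName(species, db_type, int(release), version)


print(parse_db_name("mus_musculus_core_26_33b"))
print(parse_db_name("ensembl_compara_26_1"))
```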

Ensembl pre-release: Cow Genome 6th Oct 2004

Btau_1.0 is a preliminary 3x assembly of the draft genome sequence of cow (Bos taurus), Hereford breed, using whole genome shotgun (WGS) reads from small insert clones. Project coordination, genome sequencing and assembly are provided by the Human Genome Sequencing Center at Baylor College of Medicine.

The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 of the contigs is 4.2 kb. The N50 of the scaffolds is 13.5 kb. The total length of all contigs is 2.26 Gb. When the gaps between contigs in scaffolds are included, the total span of the assembly is 2.34 Gb.
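
The N50 definition above can be sketched in a few lines. The n50 function and the toy lengths are our own illustration, not data from Btau_1.0.

```python
def n50(lengths):
    """Return the N50 of a collection of block (contig or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    # Walk from the longest block down; the N50 is the length of the block
    # at which the running total first reaches half of the assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths, made up for illustration.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 is half of the 300 total
```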

As this is a pre-release, the database does not contain any genes. Subsequent annotations, including the Ensembl genebuild, are ongoing and will be added as soon as they are completed. Future assemblies will include WGS sequences with larger insert sizes, BAC end sequences, BAC sequences, and marker information for a more contiguous assembly, better scaffolding, and chromosome assignment.

Ensembl version 25 released 4th Oct 2004

The Ensembl team are pleased to announce the release of version 25 of Ensembl. The main data updates in this release are in the Compara database, which has both new data and some schema changes.

New Data

There are no new assemblies in Ensembl v25.

Chicken
Ensembl chicken
  • SNP
    • New database from dbSNP 122

The chicken database version has been bumped to 1b, e.g. gallus_gallus_core_25_1b

Mouse
Ensembl Mouse
  • Core
    • Added FPC BAC map data to misc_features
    • Added clone map data to misc_features
    • Added accessioned clone map data to misc_features (subset of clone map)
    • Added 1Mbase clone set to misc_features (see Chung et al, Genome Research 2004 14:188-196)

The mouse database version has been bumped to 33a, e.g. mus_musculus_core_25_33a

Multi-species
  • Comparative genomics (ensembl_compara_25_1)
    • Mouse/Rat BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in house)
    • Mouse/Chicken BLASTZ_NET from UCSC (as well as BLASTZ_NET_TIGHT generated in house)
    • C.elegans/C.briggsae BLASTZ_GROUP_TIGHT now in
    • Tetraodon/Fugu BLASTZ_GROUP_TIGHT now in
    • Mouse/Rat Synteny added
    • New synteny for Mouse/Chicken
  • Data bugs fixed:
    • Some cigar_line inversions have been corrected
    • Added N, S, dN, dS, LnL, threshold_on_ds values to the human paralogues
  • Mart database (ensembl_mart_25_1)
    • New build
    • Can now filter on Uniprot ID lists and Uniprot Accession lists, as well as SWProt and SPTrembl. These ID/accessions are also available as attributes.

Schema changes

Core

Tables gene_stable_id, exon_stable_id, transcript_stable_id & translation_stable_id

  • "created_date" and "modified_date" columns added (back). Both have been set to '2004-09-20 00:00:00'.
Compara
  • Deleted Tables
    • 'source'
    • 'method_link_species' (replaced by the new 'method_link_species_set' table)
  • New tables
    • 'method_link_species_set'
    • 'genomic_align'
    • 'genomic_align_group'
  • Modified Tables
    • 'genomic_align_block' has lost some data, now transferred to the new tables 'genomic_align' and 'genomic_align_group'
    • 'member' now has 'source_name' instead of 'source_id'
    • 'family' now has 'method_link_species_set_id' instead of 'source_id'
    • 'homology' now has 'method_link_species_set_id' instead of 'source_id'
  • The changes were necessary
    • to enable storage and querying of whole genome multiple alignments
    • to improve the consistency of the method_link_species_set data (formerly in method_link_species) with the data actually present in compara

Full details of the Compara schema and these changes can be found in CVS in ensembl-compara/docs/schema_doc.html

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next couple of days.

Ensembl version 24 released 10th Sep 2004

The Ensembl team are pleased to announce the release of version 24 of the Ensembl website. This release sees the inclusion of two new species into Ensembl - Honey Bee (Apis mellifera) and a Fresh Water Pufferfish (Tetraodon nigroviridis), and new assemblies for Zebrafish (Danio rerio) and Mouse (Mus musculus).

New Species Data

Honeybee (Apis mellifera)
Ensembl Honeybee

Ensembl 24 presents an annotation of release 1.1 of the Apis mellifera genome assembly. The honeybee genome sequence was determined by whole genome shotgun at the Human Genome Sequencing Center at Baylor College of Medicine.

The honeybee release includes:

  • Core database

The data comprises:

Repeats and low complexity sequence identified with RepeatMasker (using the Drosophila melanogaster repeat library) and Dust.

Ab initio gene predictions generated with Genscan.

Blast features showing similarities to entries in Swall from a sensitive search. There are also similarities to the Drosophila melanogaster proteins and proteins from Anopheles gambiae (est) gene predictions.

Gene predictions generated from a combination of evidence sources: honeybee-specific peptides, Drosophila melanogaster-specific peptides, Anopheles gambiae (est) gene predictions, honeybee-specific ESTs and UniProt/Swiss-Prot and UniProt/TrEMBL. This set is incomplete due to a lack of honeybee-specific evidence.

New genes compared to the pre-site; no new assembly. No stable ID mapping required.

Danio rerio
Ensembl Zebrafish

This release includes the zebrafish assembly version 4 (Zv4), as released on the 12th July 2003. This assembly was produced by integrating the whole genome shotgun assembly with data from the physical map.

There are new core, EST and EST gene databases, new SNP and lite databases.

Drosophila melanogaster
Ensembl Fruitfly

Updated core translation table to include stop codon in translation (protein and transcript sequences unaffected). Gene set still based on FlyBase release 3.1.

Database renamed to drosophila_melanogaster_core_24_3b

Chicken
Ensembl chicken

New SNP database. New lite database.

Mouse
Ensembl Mouse

This release provides a full Ensembl gene build for the NCBI m33 mouse assembly (freeze May 27, 2004). After extensive QC, principally from the Sanger Institute, most artefactual assembly issues introduced in build m32 have been removed. The whole genome N50 is 22.3 Mb. (Build m32 was 17.7 Mb).

New software systems have improved the gene set. More than 85% of genes from build m32 retain the same Ensembl gene ids in this release. New gene identifiers were assigned where a many-to-one or many-to-many mapping of old genes to new gene structures was detected.

The interpolated mouse map will be included in the next release and patches to the build will be provided regularly as more detailed analysis is performed.

New core, est and estgene databases built on the NCBIM33 assembly. New SNP and lite databases.

Tetraodon nigroviridis
Ensembl Tetraodon

First release of the Tetraodon nigroviridis genome project sequence data from Genoscope and the Broad Institute (MIT).

The genome assembly was performed using Arachne (Jaffe D.B. et al. 2003. Gen. Res. 13, 91-96). This site presents version 7 of the assembly.

Genes were annotated by Genoscope, combining evidence from Geneid, Genscan, Genewise and Exofish predictions with alignments of Tetraodon cDNAs to the genome. This was done automatically using GAZE (Howe K., Chothia T. and Durbin R. 2002. Gen. Res. 12, 1418-27) with a custom-designed configuration and gene structure model.

The annotation also includes 87 manually curated structures of a number of HOX and Cytokine genes.

New core database. The assembly, gene set and other annotation features have been provided by Genoscope.

Data Changes

Compara
  • New whole genome alignments for Danio, Mouse, Tetraodon (as these are new assemblies).
  • New homology and family data
    • Family rebuilt to incorporate new genomes (Honeybee, Danio, Mouse, Tetraodon). MUSCLE was used for the family multiple sequence alignments rather than ClustalW. Families 1 and 17 could not be run with MUSCLE and were run with ClustalW instead. All families have multiple alignment CIGAR lines defined for their peptide members.
  • In-house production of BLASTZ
    • mouse vs human
    • mouse vs rat (will be updated with UCSC data in the October release)
    • mouse vs chicken (will be updated with UCSC data in the October release)
    • C. elegans vs C. briggsae
  • In-house production of translated BLAT
    • mouse vs zebrafish
    • mouse vs Fugu rubripes
    • mouse vs Tetraodon nigroviridis
    • mouse vs chicken
    • zebrafish vs Fugu rubripes
    • zebrafish vs Tetraodon nigroviridis
    • zebrafish vs chicken
    • zebrafish vs rat
    • Tetraodon nigroviridis vs chicken
    • Tetraodon nigroviridis vs Fugu rubripes
    • Tetraodon nigroviridis vs human
    • Tetraodon nigroviridis vs rat

Schema changes

Core database
  • Added 'display_label' column to prediction_transcript.
  • Changed indices on align feature tables to improve performance of range queries.
  • SQL has been provided to enable schema 23 databases to be patched to schema 24 without the need to re-download the data.
Compara database
  • New tables: peptide_align_feature, analysis
  • Changed tables:
    • added NOT NULL to dnafrag.dnafrag_type, sequence.length, and sequence.sequence (backwards compatible)
    • homology: added column subtype varchar(40)
    • homology_member : added column peptide_align_feature_id int(10)
    • The homology.subtype is a more detailed classification of the nature of the homology.

      Changes are transparent to both MART and the web.

  • Extended Protein/Gene homology algorithm:
    • Adapted for cases where there are equal 'best' hits (same query peptide hits multiple target peptides with same score, evalue, %identity, %positivity). Usually caused by target peptides having identical sequence.
    • Extended BRH labeling
    • New homology description naming to correspond with algorithm changes. The old naming from schema 23 was BRH and RHS. BRH is now divided into 2 different naming categories:

    1. UBRH - (Unique Best Reciprocal Hit) These are BRHs where there is only one uniquely best hit in both directions, i.e. a simple 1-to-1 BRH.
    2. MBRH - (Multiple Best Reciprocal Hit) These are BRHs where there were multiple but identical best hits in one or both directions. This can occur when there is perfect protein sequence duplication of translated genes within a species. The old algorithm picked a random BRH from the equally best hits; now they are all reported.

RHS - (Reciprocal Hit based on Synteny): unchanged from schema 23
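
The UBRH/MBRH distinction can be illustrated with a small sketch. It assumes the equally best hits for each peptide have already been collected into sets for both search directions; the classify_brh function, its inputs and the gene names are hypothetical, and the sketch does not reproduce the scoring (score, evalue, %identity, %positivity) used by the actual Compara pipeline.

```python
def classify_brh(best_ab, best_ba):
    """Label reciprocal best-hit pairs as UBRH or MBRH.

    best_ab maps each gene in species A to the set of its equally best hits
    in species B; best_ba is the same mapping in the other direction.
    """
    labels = {}
    for gene_a, targets in best_ab.items():
        for gene_b in targets:
            # A pair is reciprocal only if each gene is a best hit of the other.
            if gene_a not in best_ba.get(gene_b, set()):
                continue
            # Unique in both directions gives UBRH; ties on either side give MBRH.
            unique = len(targets) == 1 and len(best_ba[gene_b]) == 1
            labels[(gene_a, gene_b)] = "UBRH" if unique else "MBRH"
    return labels


# Hypothetical best-hit sets: h2 ties between m2 and m3; h3/m4 is not reciprocal.
best_ab = {"h1": {"m1"}, "h2": {"m2", "m3"}, "h3": {"m4"}}
best_ba = {"m1": {"h1"}, "m2": {"h2"}, "m3": {"h2"}, "m4": {"h9"}}
for pair, label in sorted(classify_brh(best_ab, best_ba).items()):
    print(pair, label)
```

All tied pairs are kept, which mirrors the change described above: no random choice is made among equally best hits.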

  • New homology subtype
  • For MBRH types there are the following subtypes defined:

    1. MBRH, subtype = 'DUP 1.#' (e.g. 1.3 or 1.5)
      These MBRHs are defined such that, across the whole web of interconnected BRHs, a single gene in one species aligns equally best with # genes in the other species, and all of those genes lie on the same chromosome within 1.5 megabases of each other. They therefore correspond to a highly probable recent 1-to-many gene duplication event, and the duplicated genes are paralogs. In the previous release one of these would have been a BRH and the rest would have been RHS.
    2. MBRH, subtype = 'SYN'
      A homology of this subtype is one pair from a complex MBRH graph that is syntenic (like an RHS) with a UBRH or an MBRH/DUP. In the previous release this may have been a BRH or an RHS.
    3. MBRH, subtype = 'complex'
      This is one pair from a complex MBRH graph that can't be easily classified. In the previous release this may have appeared as a BRH or may have been skipped.

For descriptions/types UBRH and RHS there are no subtypes defined yet.
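
As a rough sketch of the 'DUP 1.#' test described above: given the genes in the second species that tie as best hits for a single gene in the first, check that they share a chromosome and lie close together. The dup_subtype function is hypothetical, and measuring "within 1.5 megabases of each other" as the distance from the smallest start to the largest end is our assumption; the announcement does not say how the distance is computed.

```python
MAX_SPAN = 1_500_000  # 1.5 megabases, as quoted in the subtype definition


def dup_subtype(genes):
    """Return 'DUP 1.#' for a tight same-chromosome cluster, else None.

    genes is a list of (chromosome, start, end) tuples for the # genes that
    tie as best hits of one gene in the other species.
    """
    if len(genes) < 2:
        return None  # a single hit is not a 1-to-many case
    chromosomes = {chrom for chrom, _, _ in genes}
    if len(chromosomes) != 1:
        return None  # hits are spread over several chromosomes
    # Assumption: the cluster extent runs from the smallest start to the largest end.
    span = max(end for _, _, end in genes) - min(start for _, start, _ in genes)
    if span > MAX_SPAN:
        return None
    return f"DUP 1.{len(genes)}"


cluster = [("7", 100_000, 120_000), ("7", 900_000, 950_000), ("7", 1_400_000, 1_450_000)]
print(dup_subtype(cluster))  # prints DUP 1.3
```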

Website Changes

SNPView
  • The SNP neighbourhood image now has a Features drop-down menu, similar to the menus on ContigView. This menu provides options for displaying all SNPs, just genotyped SNPs and different transcripts on the image.
  • The selected SNP is highlighted in the neighbourhood image.
Sitemap

Updated sitemap for www.ensembl.org

Drawing code

Code simplified to allow more tracks to be created without writing additional modules - by just using the drawing code configuration.

Configuration-only drawing code simplified by addition of "add_tracks" family of calls to the EnsEMBL::Web::UserConfig.

GeneView

Ab-initio predictions now shown on the transcript neighbourhood image.

API

Registry added: a central static hash for storage and retrieval of all the adaptors. Adaptor calls are backwards compatible and all old code should work exactly as before, but the underlying code now uses the Registry. New methods allow easier access to adaptors via the Registry.
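
The idea of a central static hash of adaptors can be sketched as follows. This is a Python illustration of the pattern only: the class, method names and key layout are hypothetical and do not reproduce the Ensembl Perl API.

```python
class Registry:
    """Toy registry: one class-level dictionary shared by all callers."""

    _adaptors = {}  # (species, group, kind) -> adaptor object

    @classmethod
    def add_adaptor(cls, species, group, kind, adaptor):
        # Keys are lower-cased so lookups do not depend on capitalisation.
        cls._adaptors[(species.lower(), group.lower(), kind.lower())] = adaptor

    @classmethod
    def get_adaptor(cls, species, group, kind):
        # Raises KeyError if nothing was registered under this key.
        return cls._adaptors[(species.lower(), group.lower(), kind.lower())]


class GeneAdaptor:
    """Placeholder standing in for an object that fetches genes."""


adaptor = GeneAdaptor()
Registry.add_adaptor("Human", "core", "Gene", adaptor)
# Any code can now retrieve the same object without passing it around.
print(Registry.get_adaptor("human", "core", "gene") is adaptor)  # prints True
```

Keeping the store at class level means existing code that builds adaptors directly can keep working while new code looks them up by name.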

FTP Site Changes

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

Ensembl version 23 released 26th Jul 2004

We are pleased to announce the release of Ensembl v23. This release sees a number of data additions and improvements, including human paralogues, ncRNAs, new SNPs, and improved Affy mappings. Also in this release are improvements to the website such as sequence markup and coverage graphs in BlastView, display of Compara DNA-DNA and gene homology alignments in AlignView, and configurable markup of gene sequence from GeneView.

New Data

There are no new assemblies in Ensembl v23.

Ensembl Human
Human

Human release 23 contains some data types new to Ensembl: ncRNAs and selenocysteine proteins have been added for the first time.

The ncRNA mappings come from Sean Eddy and Tom Jones, and include micro-RNA sets. We are investigating with Rfam how to extend ncRNA annotation to other vertebrate species.

The human dataset now contains 23 selenocysteine proteins with the correct recoding of the TGA codon to selenocysteine (U). These data are modelled in the schema as translation attributes.
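
To illustrate what the recoding means for a protein sequence, the sketch below assumes a naive translation that emits "*" for every in-frame TGA and a list of 1-based peptide positions recorded as selenocysteines. The apply_selenocysteines helper and the example peptide are hypothetical; how the positions are actually stored in the translation attributes is not described here.

```python
def apply_selenocysteines(peptide, positions):
    """Replace the stop symbol at each 1-based position with U."""
    residues = list(peptide)
    for pos in positions:
        index = pos - 1
        # Only a TGA that was read as a stop ("*") can be recoded.
        if residues[index] != "*":
            raise ValueError(f"position {pos} is {residues[index]!r}, not a stop")
        residues[index] = "U"
    return "".join(residues)


# Made-up peptide with an in-frame TGA at residue 3.
print(apply_selenocysteines("MK*LQ", [3]))  # prints MKULQ
```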

In addition, the human gene set has had a number of small changes to improve some otherwise troublesome gene structures and reannotate some starting codons to more realistic ATG positions than those submitted from cDNA projects.

  • Core
    • Improved gene set
    • Addition of ncRNAs to gene set
    • Addition of selenocysteines to translation model
    • Fixed misc_attrib for 32K BACs, changed 'non_ref' to 'name'
    • Addition of ENCODE regions to misc_features
    • New Affy mappings
  • EST
    • New database
  • ESTgene
    • New database
  • Core, EST, ESTgene, Vega:
    • Fixed PAR coordinates
  • Lite
    • Updated with the new SNP and gene data
  • SNP
    • New database from dbSNP 121 & schema change (see schema changes below)

As a result of these changes, the human database version has been bumped to 34e, e.g. homo_sapiens_core_23_34e

Mouse
Ensembl Mouse
  • Core
    • GO mappings added.
    • Duplicate Exon stable IDs fixed
    • New Affy mappings
  • SNP
    • New database from dbSNP 121 & schema change (see schema changes below)
  • Lite
    • Updated with the new SNP data

The mouse database version has been bumped to 32c, e.g. mus_musculus_core_23_32c

Rat
Ensembl Rat
  • Core
    • GO mappings added
    • RGD symbols mapped to genes
    • New QTLs
    • New Affy mappings
  • SNP
    • New database from dbSNP 121 & schema change (see schema changes below)
  • Lite
    • Updated with the new SNP data

The rat database version has been bumped to 3c, e.g. rattus_norvegicus_core_23_3c

Zebrafish
Ensembl Zebrafish
  • Core
    • New Affy mappings
  • SNP
    • schema change

The zebrafish database version has been bumped to 3c, e.g. danio_rerio_core_23_3c

Chicken
Ensembl chicken
  • Core
    • Add BACend data to misc_features
  • SNP
    • New database of BGI SNP data
  • Lite
    • New database with the new SNP data

The chicken database version has been bumped to 1a, e.g. gallus_gallus_core_23_1a

Mosquito
Ensembl Mosquito
  • Core
    • Fixed exon rank of prediction transcripts. They should begin at 1, but began at 0.
  • SNP
    • schema change (see schema changes below)

The anopheles database version has not changed.

Multi-species

Comparative genomics (ensembl_compara_23_1)

  • Addition of human recent paralogues (see below)
  • Complete rebuild of all orthologues
  • New protein clustering
  • Schema change (see schema changes section below)

The new Compara data release now includes information on recently duplicated human genes. A raw set of gene homologies was derived by performing all-against-all blast against a dataset of Ensembl predicted genes (including mouse and rat genes for outgroups). The resultant homolog pairs were then filtered and ranked according to genetic distance and gene coverage. Groups of duplicated human genes were determined by clustering genes that shared common reciprocal matches. The genetic distance cut-off that defined the extent of a gene group was dynamically set as the distance to the most-related rodent gene. Hence, groups of recently duplicated genes identified in this manner have phylogenetic meaning and can be formally defined as being paralogous human genes that have arisen since the human/rodent divergence.

Mart database (ensembl_mart_23_1)

  • New build

Schema changes

SNP
  • Table Freq
    • count column smallint(5) unsigned changed to float
  • Table SubSNP
    • added a column strand_to_rs tinyint(4)
Compara
  • Table method_link_species
    • changed index from UNIQUE method_link_id (method_link_id,species_set,genome_db_id) to KEY method_link_id (method_link_id,species_set,genome_db_id) to allow intra-species data set such as the new human paralogues set.

Website Changes

MultiContigView
  • Can now locate homologous regions by gene homology, as well as the original DNA-DNA alignment method. Orthologues on GeneView are now linked into MultiContigView.
  • Can show links between homologous transcripts (select "Join transcripts" from the "Compara" menu).
  • Simple features (SNPs, Eponine, tRNA, etc) can now be shown on MultiContigView
GeneView
  • Now has a link ("Sequence Markup") that displays the genomic sequence of the gene, optionally marked-up with exons, SNPs, and line numbers.
  • The gene neighbourhood image at the top of GeneView now has a Features drop-down menu, similar to the menus on ContigView. This menu provides options for displaying SNPs and different transcripts on the image.
BlastView
  • Addition of an ncRNA BLAST database for human
  • The SETUP page has been extended:
    • Query sequences can be loaded via ID (EMBL/Uniprot/RefSeq) e.g. NM_002931
    • Searches can be run using one of five pre-set sensitivity levels; exact, near-exact, near-exact (oligo), local mismatch, and distant homology.
  • New features on the DISPLAY page:
    • The top 'n' alignments (various sort options) to display can now be specified.
    • There is a new graph that displays the location of matches on the length of the query sequence.
    • For the alignment summary table, there are new options to allow the alignment location to be displayed in any coordinate system.
    • A new page (follow the "[G]" link) shows genomic sequence (with user-definable coordinate system, orientation, and length of flanking sequence) with the following features highlighted:
      • Exons (Ensembl, VEGA, ESTGene etc)
      • SNPs
AlignView
  • This page has been extended to display Compara DNA-DNA and gene homology alignments, in a variety of different formats. These alignments can be reached via GeneView (for homologues) or from the Compara tracks in ContigView.
GeneSNPView
  • The table at the bottom of the page now displays the SNP type, AA change, and AA position for all transcripts of the gene.
FastaView
  • Can display data from the core database for mapped Affy identifiers, including description, locations, and other members of a composite group. These data are linked to from the mapped Affys on ContigView and GeneView.
ContigView
  • New ncRNA tracks for human
  • New ENCODE region track
  • New tracks for Affy hg_u95b, c, d, and e probesets
Drawing code
  • The gene and match tracks have been combined into generic GlyphSets (generic_gene and generic_match). All genes, and most similarity features are drawn using these tracks. Adding a new gene/match track is now just a matter of providing the appropriate configuration.

FTP Changes

  • a fasta/RNA directory has been added to hold RNA dumps
  • the gene ID has been added to the FASTA headers, e.g.:
    >ENSP00000317931 pep:known chromosome:NCBI34:1:801456:802749:-1 gene:ENSG00000177750 transcript:ENST00000326725
    >ENST00000327169 cdna:known chromosome:NCBI34:1:407522:408460:1 gene:ENSG00000177799
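
The two example headers suggest a whitespace-separated layout: a stable ID, a sequence type with a status, a colon-separated location, and then key:value fields such as the new gene ID. A minimal parsing sketch under that assumption follows; the parse_header function and the dictionary keys are our own naming, inferred from these two examples only.

```python
def parse_header(line):
    """Parse a FASTA header line like the examples above into a dictionary."""
    fields = line.lstrip(">").split()
    stable_id, seq_type, location = fields[0], fields[1], fields[2]
    kind, status = seq_type.split(":", 1)
    # Location layout as seen in the examples:
    # coordinate system, assembly, region name, start, end, strand.
    coord_system, assembly, region, start, end, strand = location.split(":")
    record = {
        "id": stable_id,
        "seq_type": kind,
        "status": status,
        "coord_system": coord_system,
        "assembly": assembly,
        "region": region,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }
    # Remaining fields are key:value pairs such as gene:... and transcript:...
    for field in fields[3:]:
        key, value = field.split(":", 1)
        record[key] = value
    return record


header = (">ENSP00000317931 pep:known chromosome:NCBI34:1:801456:802749:-1 "
          "gene:ENSG00000177750 transcript:ENST00000326725")
print(parse_header(header)["gene"])  # prints ENSG00000177750
```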

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

Ensembl pre-release: Mouse NCBIm33 15th Jul 2004

We are pleased to announce the release of the NCBI m33 assembly of the mouse genome.

Build 33 (freeze May 27, 2004) has undergone extensive QC, principally from the Sanger Institute. Most of the artefactual assembly issues introduced in build 32 have been removed. The whole genome N50 is 22.3 Mb (compared to 17.7 Mb from Build 32).

Mouse build 33 represents a composite assembly made by merging HTGS phase 3 sequence with the Mouse Genome Sequence Consortium v3 Whole Genome Shotgun Assembly (MGSCv3). The assembly was performed by NCBI using a 'combined' tiling path that was largely created automatically, but was manually curated in places. This facilitated placing finished sequence in the context of the MGSCv3. Draft sequence was not included in this build as the slight increase in coverage one gains by using this is offset by the increase in build errors.

As this is a pre-release, the database only contains repeat analysis, ab initio gene predictions, and BLAST comparisons. The Ensembl gene prediction pipeline is in progress, and no complete Ensembl gene predictions are available yet. The annotated assembly will be released on the main Ensembl site (http://www.ensembl.org/), currently planned for the start of September 2004.

Ensembl pre-release: Dog genome 14th Jul 2004

We are pleased to announce the availability of the assembly for the Dog (Canis familiaris) genome.

The Dog genome was sequenced by a consortium led by the Broad Institute and funded by NIH-NHGRI. It is a 7.6x assembly with a super-contig N50 of 41.6 Mb and a contig N50 of 123 kb. For more information, please go to the Broad web site at http://www.broad.mit.edu/

Ensembl "Pre" sites give early access to assemblies at mainly the DNA level for searching, along with some gene structures. In this release Dog cDNAs have been mapped onto the genome. It is expected that a fully featured Ensembl Dog site will be available this autumn.

More

Ensembl version 22 released 3rd Jun 2004

We are pleased to announce the release of Ensembl v22. This release contains the first release of MultiContigView, a new comparative genomics display which lets you view simultaneously two or more genomes which share local order (e.g. Human, Mouse, Rat). For example: click here. Release 22 also sees the arrival of the first annotated draft chicken assembly in Ensembl.

New Data

Chicken (Gallus gallus)
Ensembl chicken

Ensembl 22 presents an annotation of the first draft chicken genome assembly. The chicken genome sequence was determined by whole genome shotgun at the Genome Sequencing Center at Washington University, St Louis. The analysis of the chicken sequence involves an international group of scientists including individuals from the US, UK, Europe and China.

A slightly modified Ensembl gene build was run for chicken, resulting in 17784 genes with 185326 exons. Continuing analysis suggests that about 10% of the gene content of chicken is absent from this gene build. Around half of this missing content can be attributed to representation issues in the whole genome shotgun, probably due to high GC content regions not being well represented. The other half of the missing set is poorly represented as one- or two-exon assemblies (in particular in chromosome Un) which did not pass Ensembl's quality threshold for gene structures. This QC level has been set to avoid spurious pseudogene structures being called as genes.

We are working with our colleagues in the chicken community to analyse these data further and the analysis group expects to submit a paper this summer in addition to providing improved data resources.

The chicken release includes:

  • Core database
  • EST database
  • ESTgene database
Human
Ensembl Human

The human SNP database has additional information about RefSNPs that are part of the HapMap project. The schema of the database has changed slightly to accommodate this data (see schema changes below).

Multi-species
  • Affy probe hits

    These data have been added to the core database as "misc features". A new method of mapping Affy probe hits to translations has changed some mappings. This applies to the following databases:

    • homo_sapiens_core_22_34d
    • mus_musculus_core_22_32b
    • rattus_norvegicus_core_22_3b
    • danio_rerio_core_22_3b
  • RefSeq links

    RefSeq mRNA links (NM_ identifiers) have been added for each of the protein links (NP identifiers) in the following core databases:

    • homo_sapiens_core_22_34d
    • mus_musculus_core_22_32b
    • rattus_norvegicus_core_22_3b
    • drosophila_melanogaster_core_22_3a
    • danio_rerio_core_22_3b
  • Comparative genomics (ensembl_compara_22_1)
    • New chicken genebuild was added to compara for homology and family analysis
    • Honeybee sequence was added to compara and assigned genome_db_id=12. Honeybee sequence was queried against both mosquito and fruitfly DNA via translated BLAT. The results are stored in the genomic_align_block table.
    • Family was recalculated so as to include chicken genes and the latest SWISSPROT and SPTREMBL
    • Orthologue analysis was extended so that now all species pairs have putative orthologues. For cross-phylum analyses (e.g. mosquito vs C.elegans), only BRH (best reciprocal hit) were calculated.
    • Schema changes (see schema changes below)
  • Mart database (ensembl_mart_22_1)
    • New build, including chicken
    • New table-naming convention

Schema changes

Core
  • 2 new tables (translation_attrib & transcript_attrib) added
    • These tables will be used for handling exceptional cases in transcripts/translations, e.g. selenocysteines and RNA edits. Data to populate these tables is still in preparation.
  • misc_set table
    • The "code" column was expanded from varchar(15) to varchar(25).
SNP
  • RefSNP table
    • added column "hapmap_snp" to provide a boolean flag indicating whether this RefSNP has been typed in the HapMap project or not.
Compara
  • member table
    • added column "chr_strand" which copies Gene and Transcript strandedness (1 or -1) from the core databases
  • genome_db table
    • added column "locator" which stores a locator string which describes how to get a DBAdaptor for the corresponding core database. It is used in pipeline production, but set to an empty string for release.

Website Changes

MultiContigView
  • We are pleased to announce a new comparative genomics view for Ensembl: MultiContigView. This page displays simultaneous contigviews for multiple species, aligned by compara genomic alignment blocks. e.g. click here
  • You can enter the page on a location (from one species), along with the name of one or more additional species. The initial alignment is selected as the best available between the two species, interpolated from the DNA align features in the compara database. MultiContigView is linked to from ContigView (via Compara DNA align features), and GeneView.
  • MultiContigView shows lines connecting the dna alignment blocks along with the gene predictions for each species.
  • Navigation is currently available to:
  • Change the location/size of all the sequence regions shown by
    • clicking on either the top display or overview
    • using the input boxes at the top of "Detailed view"
    • using the buttons above the detailed view menu
  • Change individual species sequence region
    • flip species region
    • nudge species region up/down stream
    • zoom in/out on region
    • realign the sequence to the focus sequence
  • Change the primary (focus) species
    • any of the species shown can be changed to be the primary species, by clicking on the "P" button.
  • This is the initial release of this page, and we are planning a number of improvements. We would be appreciative of any feedback you might have, good or bad, about this display.
SNPView
  • Now displays a link to the International HapMap Project (http://www.hapmap.org/) if the SNP has been typed in HapMap.
Site Maps
  • The site maps have been reworked, and are now generated dynamically from the available data. This means they are more up-to-date, more accurate, and more useful for navigating the site. See, e.g. http://www.ensembl.org/Homo_sapiens/sitemap/
ContigView
  • Affy probe track. The Affymetrix probe hits stored in the core database are now displayed as a track on ContigView. This track can be switched on and off via the "Features" menu.
  • Transcript tracks are now collapsible. Transcript tracks can be collapsed down into genes by clicking the red "-" symbol to the left of the track name.
Ab initio predictions
  • Ab initio predictions, such as Genscan, SNAP, etc., are now stored in the same way as Ensembl genes. This means that these ab initio transcripts can now be viewed in TransView, ProtView and ExonView, and exported from ExportView, like Ensembl transcripts.

FTP Site Changes

In this release of Ensembl we have slightly reorganised some of the files on the FTP site. The "golden_path" data directory has been merged into the fasta/dna directory, and these files have been given more consistent and useful names, as follows:

<species>.<version>.<seqtype>.<idtype>.<id>.fa.gz

e.g. for Human, the old 'golden_path' files become:

Homo_sapiens.NCBI34.dna.chromosome.1.fa.gz etc, for unmasked sequence

Homo_sapiens.NCBI34.dna_rm.chromosome.1.fa.gz etc, for repeatmasked sequence

In addition, we have added all the "non-chromosomal" sequence to the dumps; i.e., all the sequence which has not been mapped into the assembled chromosomes: NT contigs, Unknown chromosomes, etc. This sequence has been grouped into single files, e.g.

Homo_sapiens.NCBI34.dna.nonchromosomal.fa.gz
Homo_sapiens.NCBI34.dna_rm.nonchromosomal.fa.gz

Finally, the files that used to be in the fasta/dna directory, which were dumps of the assembly at the sequence level, have also been renamed. For example:

Homo_sapiens.NCBI34.dna.contig.fa.gz
Anopheles_gambiae.MOZ2a.dna.chunk.fa.gz
Fugu_rubripes.FUGU2.dna.scaffold.fa.gz

Note that the sequence "container" may be different in different species: contigs in human, chunks in Anopheles, scaffolds in Fugu.

These changes mean we provide a more complete set of sequence dumps, with more useful names.
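The naming pattern described above can be sketched in a few lines of Python. This is only an illustration of the convention: the function name `dump_filename` and its argument names are hypothetical, and the real dumps are produced by Ensembl's own scripts.

```python
def dump_filename(species, version, seqtype, idtype, seq_id=None):
    """Build a name of the form <species>.<version>.<seqtype>.<idtype>.<id>.fa.gz.

    The <id> field is left out for grouped files such as the
    "nonchromosomal", "contig", "chunk" and "scaffold" dumps.
    """
    parts = [species, version, seqtype, idtype]
    if seq_id is not None:
        parts.append(str(seq_id))
    return ".".join(parts) + ".fa.gz"


# Unmasked human chromosome 1
print(dump_filename("Homo_sapiens", "NCBI34", "dna", "chromosome", 1))
# Repeatmasked sequence that is not placed on a chromosome
print(dump_filename("Homo_sapiens", "NCBI34", "dna_rm", "nonchromosomal"))
```

Both calls reproduce file names listed in this section.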

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

More

Ensembl version 21 released 11th May 2004

We are pleased to announce the release of Ensembl v21. This release includes: a major update to the gene build on human (on the same assembly, NCBI34), a new gene build on Fugu (again without assembly change), the first appearance of a gene build on chimpanzee, and improvements to orthologs in key model organisms. We also now provide an experimental text mining resource and links to papers of disease association studies from the genetic association database.

New Data

Homo sapiens
Ensembl Human

Human 21.34d contains a new gene build on the NCBI34 assembly, with a modest increase in the number of protein coding genes (from 23,531 to 23,758) but a more significant increase in the number of protein coding transcripts (from 29,802 to 34,091).

This build is best described as a semi-automated build; it takes advantage of the Vega manual annotation and makes more effective use of Uniprot. We have tracked a number of statistics for improvement and have 3,681 more Met-to-STOP protein coding predictions, presumed to be complete coding sequence (an increase of 18%) and 1,789 more predictions with both 5' and 3' UTRs and a complete Met-to-STOP prediction (an increase of 13%). These statistics are actually underestimates of the improvements, as we have also removed around 1,000 3' UTR clones (probably genomic contamination cloning errors) which gave rise to single-exon open-reading-frames 3' of well-annotated genes.

More extensive statistics and discussion of future gene building plans can be found here.

  • Core database
    • New genebuild
    • PAR/Haplotype data
  • SNP database
    • Updated to dbSNP120
  • Lite database
    • Updated with new SNPs

As a result of these changes, the Human database version has been bumped to 34d, e.g. homo_sapiens_core_21_34d.
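Species database names such as the one above appear to follow the shape species, database type, Ensembl release, data version. The Python sketch below splits a name on that assumption. The field labels (`species`, `group`, `release`, `version`) are our own, and multi-species databases such as ensembl_compara_21_1 have a different shape and are deliberately rejected.

```python
import re

# Assumed layout: <genus>_<species>_<type>_<release>_<version>
DB_NAME = re.compile(
    r"^(?P<species>[a-z]+_[a-z]+)_(?P<group>[a-z]+)"
    r"_(?P<release>\d+)_(?P<version>[0-9a-z]+)$"
)


def parse_db_name(name):
    """Split a species database name into its assumed components."""
    match = DB_NAME.match(name)
    if match is None:
        raise ValueError(f"not a species database name: {name!r}")
    fields = match.groupdict()
    fields["release"] = int(fields["release"])
    return fields


print(parse_db_name("homo_sapiens_core_21_34d"))
```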

Chimpanzee (Pan troglodytes)
Ensembl Chimp

This release presents the first annotated chimp assembly in Ensembl. The genome used is the 4x shotgun assembly from the chimpanzee Genome Consortium. This was then aligned to human by UCSC using blastz. The resulting alignments were then used to transfer human gene structures (Human Build 34d) to chimpanzee.

The transfer process had to cope with many complications in the alignment, which are primarily due to problems in the chimp sequence. The quality of the sequence itself is a product of the low coverage of the chimpanzee genome (4x shotgun is very low, in particular for an outbred organism such as chimp: 4x therefore represents only 2x on each haplotype). This means that there are missing areas of the chimp genome, misassemblies, misplacements and small insertions, deletions and substitutions.

With better data it is expected that nearly every human gene will have an almost identical chimpanzee gene. However, the areas which are significantly different between chimpanzee and human will be precisely the areas where there is the largest amount of uncertainty about the alignment.

This gene build attempted to transfer across human genes wherever possible. For coding regions the following applies:

  • If there was an insertion or deletion error that preserved frame up to 10 amino acids then this was kept.
  • If there was an insertion or deletion error which did not preserve frame up to 10 base pairs then a small "intron" was added (in effect modelling a sequence insertion or deletion).
  • If a significant part of an exon (or an entire exon) was missing then the transcript was broken into two separate transcripts within the same gene.
  • If the final transcript had a small ORF at the end of this process then it was discarded.
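The first two rules above can be read as a small decision procedure on the length of an insertion or deletion. The Python sketch below is one reading of them, not the gene build code: the function name and return labels are hypothetical, and larger events (missing exons, small final ORFs) are left out because the announcement gives no thresholds for them.

```python
def classify_indel(length_bp):
    """Classify a coding-region indel of length_bp bases under the stated rules."""
    if length_bp <= 0:
        raise ValueError("indel length must be positive")
    frame_preserving = length_bp % 3 == 0
    # Frame-preserving, up to 10 amino acids (30 bases): kept as is.
    if frame_preserving and length_bp // 3 <= 10:
        return "keep"
    # Frame-breaking, up to 10 base pairs: modelled with a small "intron".
    if not frame_preserving and length_bp <= 10:
        return "add_small_intron"
    # Anything else is outside the two rules sketched here.
    return "not_covered"


for size in (3, 4, 30, 11):
    print(size, classify_indel(size))
```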

It is anticipated that improvements to this system in the future will be largely driven by work on the chimpanzee genome sequence.

Fugu rubripes build 2c
Ensembl Fugu

Fugu build 2c is a rebuild on the Singapore assembly, which utilises the considerable increase in cDNA and vertebrate protein evidence which has accumulated since the original gene build. This improvement in data has allowed the new build to employ more stringent quality control, in particular focusing more on longer gene structures. As a consequence the overall gene number has dropped to 22,089, while the number of orthologs which we can find to mammals and Zebrafish has remained reasonably constant. We believe this is a more useful and complete dataset than the previous build.

  • Core database
    • New genebuild
  • EST database
  • ESTgene database
    • New genebuild
Multi-species
  • Comparative genomics (ensembl_compara_21_1)

Blastz and BLAT now have scores and percent identities calculated for each alignment. BLAT is also now available for: Fugu and Danio with mouse, rat and chicken, and chicken with mouse and rat.

The homologue data have been extended to include human, mouse and rat with all genomes except chimp. However, deep homologues only have BRH (best reciprocal hit) and not RHS (Reciprocal hit based on synteny). This means that Ensembl now contains probable orthologous relationships between mammalian genes and those from the model organisms Drosophila, Anopheles and the nematode worms.
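As a rough illustration of the best reciprocal hit (BRH) idea mentioned above: two genes form a BRH pair when each is the other's top-scoring match. The Python sketch below assumes similarity scores are already available as nested dictionaries. The gene names and scores are invented, and this is not the Compara pipeline.

```python
def best_reciprocal_hits(a_to_b, b_to_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    a_to_b maps each gene in genome A to {gene_in_B: score}; b_to_a is the
    reverse search. Ties are broken by dictionary order in this sketch.
    """
    def best(hits):
        return {query: max(targets, key=targets.get)
                for query, targets in hits.items() if targets}

    best_ab = best(a_to_b)
    best_ba = best(b_to_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: A1 and B1 choose each other; A2 also prefers B1,
# but B1 prefers A1, so A2 has no reciprocal partner.
a_to_b = {"A1": {"B1": 90, "B2": 40}, "A2": {"B1": 80, "B2": 30}}
b_to_a = {"B1": {"A1": 88, "A2": 79}, "B2": {"A2": 35, "A1": 20}}
print(best_reciprocal_hits(a_to_b, b_to_a))  # prints [('A1', 'B1')]
```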

Chimp orthologues were derived from whole genome alignments (DWGA) rather than by BRH.

  • Mart database (ensembl_mart_21_1)
    • New build, including chimp

Schema changes

Compara
  • New sequence table, and the addition of a sequence_id in member (and removal of member.sequence).
  • New version column for peptide/gene versions in the member table.
  • New genebuild column in the genome_db table, to allow for different gene builds off the same assembly.

Website Changes

ContigView
  • Haplotype and Pseudo Autosomal Regions (PAR) are now displayed on ContigView. The chromosome-level display indicates the long-range position of these assembly features, while the detailed view colours these regions differently (red for Haps, blue for PARs). Clicking on the HAP/PAR track allows you to switch between the different versions of the assembly.
    e.g. http://www.ensembl.org/Homo_sapiens/contigview?l=6:32437499-32637500
  • FirstEF track added. This track shows features produced by FirstEF: a first-exon and promoter prediction program for human DNA (http://rulai.cshl.org/tools/FirstEF/)
GeneView
  • Displays alternative location for genes (e.g. in a PAR/HAP)
  • New GeneDAS sources
    • HUGO_text : an experimental source of Medline text-mining for HUGO symbols
    • GAD: the Genetic Association Database tracks papers reporting association studies to around 2,000 disease genes.

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

More

Ensembl version 20 released 1st Apr 2004

The Ensembl Developers are pleased to announce the release of Ensembl 20.

The main focus of this release is an improvement in the underlying technology of Ensembl, with only minor data updates and visualisation improvements. One exception is the addition of allele frequency and genotype data for typed SNPs, which is displayed in SNPView.

This release includes significant changes to the API and schema. Both the database and API have been extended, with several goals in mind:

  • To generalise the way in which assembly and sequence information is stored in the database so that a variety of genomes can more easily be accommodated.
  • To improve the efficiency of the API and reduce some of the complexity of using it.
  • To allow for the inclusion of some genome anomalies such as pseudo autosomal regions and structural haplotypes.

The following are the most significant alterations to the perl API:

  • The concept of coordinate systems has been introduced. A CoordSystem object represents a coordinate system and the CoordSystemAdaptor can be used to retrieve available coordinate systems from the database.
  • The Slice class has been generalised. Formerly a Slice object was restricted to representing chromosomal regions. A Slice object may now represent a region in any coordinate system which is in the database. Accordingly the SliceAdaptor has also been extended so that it can be used to obtain Slices in any coordinate system.
  • The RawContig, Chromosome and Clone classes are deprecated. These have been replaced by the generalised Slice class. Similarly the RawContigAdaptor, CloneAdaptor and ChromosomeAdaptor classes have been replaced by the SliceAdaptor.
  • The SeqFeature class is deprecated. This has been replaced by a simpler Feature class.
  • The Protein and ProteinAdaptor classes have been merged with and replaced by the Translation and TranslationAdaptor classes.
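To make the generalisation concrete, here is a toy Python model of the idea that one Slice type can describe a region in any coordinate system. It is not the Ensembl Perl API: the class layout, field names and the contig name are assumptions made for illustration, as is the use of 1-based inclusive coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordSystem:
    name: str  # e.g. "chromosome", "contig", "scaffold"


@dataclass(frozen=True)
class Slice:
    coord_system: CoordSystem
    seq_region_name: str
    start: int
    end: int
    strand: int = 1

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")
        if self.strand not in (1, -1):
            raise ValueError("strand must be 1 or -1")

    @property
    def length(self):
        # Assumes 1-based, inclusive coordinates.
        return self.end - self.start + 1


# The same class describes a chromosomal region and a contig region.
on_chromosome = Slice(CoordSystem("chromosome"), "6", 1, 1000)
on_contig = Slice(CoordSystem("contig"), "hypothetical_contig", 51, 150, strand=-1)
print(on_chromosome.length, on_contig.length)  # prints 1000 100
```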

A considerable amount of effort has been made so that many old programs will continue to work against the new API, albeit with deprecation messages.

The database schema has undergone similar changes to the API. The following are the most significant:

  • The chromosome, contig and clone tables have been replaced by a single general table named seq_region.
  • The assembly table has been generalised so that it describes the relationship between arbitrary coordinate systems rather than just contigs and clones.
  • Feature table columns contig_id, contig_start, contig_end and contig_strand have been replaced by seq_region_id, seq_region_start, seq_region_end and seq_region_strand columns respectively. Features may now be stored with coordinates in any coordinate system which is in the database and are no longer restricted to the contig coordinate system.
  • Most data has been removed from the denormalised lite database. The performance of the core database has been improved and most aspects of the lite database are no longer needed.

The hope is that all of these alterations will result in long term benefits for the users of the schema and API, and that they have helped to make Ensembl a more powerful and flexible system.

For more details of the changes, see: http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblCore.html.

Data

There are no new assemblies or gene builds in release 20, but every species' core database has been updated to the new schema.

Anopheles gambiae
Ensembl Mosquito
  • Core database
    • 343 proteins removed and their genes re-typed as "bacterial_contamination"
  • SNP database
    • updated to use chromosome names 2L/2R/3L/3R instead of 2/3
  • Lite database
    • now only contains SNP data, see below

As a result of these changes, the Anopheles database version has been bumped to 2b, e.g. anopheles_gambiae_core_20_2b

Caenorhabditis briggsae

No data updates, other than port to new schema. Lite database removed.

Caenorhabditis elegans
Ensembl C. elegans
  • Core database
    • missing wormpep_protein and pseudogene xrefs fixed
  • Lite database
    • removed, see below

As a result of these changes, the C.elegans database version has been bumped to 116a, e.g. caenorhabditis_elegans_core_20_116a

Danio rerio
Ensembl Zebrafish
  • EST database
    • new for this release
  • ESTgene database
    • new for this release
  • Lite database
    • now only contains SNP data, see below

As a result of these changes, the Danio database version has been bumped to 3b, e.g. danio_rerio_core_20_3b

Drosophila melanogaster

No data updates, other than port to new schema. Lite database removed.

Fugu rubripes
Ensembl Fugu
  • Core database
    • prediction transcript strands fixed (from 0 to -1)
  • Lite database
    • removed

As a result of these changes, the Fugu database version has been bumped to 2b, e.g. fugu_rubripes_core_20_2b

Homo sapiens
Ensembl Human
  • SNP database
    • Updated to dbSNP 119
  • Vega database
    • Updated with new Vega release data
    • new chromosomes: 9 & 10
    • updated chromosomes 13 & 20
  • Lite database
    • now only contains SNP data, see below

As a result of these changes, the Human database version has been bumped to 34c, e.g. homo_sapiens_core_20_34c

Mus musculus
Ensembl Mouse
  • Core database
    • tmhmm protein features added
  • EST database
    • new for this release
  • ESTgene database
    • new for this release
  • SNP database
    • Genotype data added
  • Lite database
    • now only contains SNP data, see below

As a result of these changes, the Mouse database version has been bumped to 32b, e.g. mus_musculus_core_20_32b

Rattus norvegicus

No data updates, other than port to new schema. Lite database now only contains SNPs.

Multi-species
  • GO database (ensembl_go_20_1)
    • Updated to the Feb 2004 release from geneontology.org
  • Compara database (ensembl_compara_20_1)
    • New protein clustering and families
    • New BLAT alignments from UCSC
    • Addition of N, S and LnL data and schema
    • Storage of dS cut-off value in the db
    • Updated blastz alignments to take into account GroupId and LevelId
    • Corrected synteny data
    • Schema changes (see below)
  • Mart database (ensembl_mart_20_1)
    • New build

Schema changes

Core

As mentioned above, details of the new core schema and API can be found here.

Lite

As a result of the core schema and API changes, the lite database now only contains SNP information. If a species has no SNP database (e.g. C.briggsae) that species will also have no lite database.

SNP
  • GTInd table added
    • holds individual genotype data
  • Strain table
    • added columns ssid and ind_id
Compara
  • genomic_align_block table
    • added columns group_id, level_id and flip_alignment
  • homology table added

Website Changes

Nearly all of the work on the webcode for this release has been modifying it to work with the new v20 API. Other changes include:

ContigView

BLAT tracks added for the new Compara data

SNPView

Where available, the following data have been added to the display:

  • Allele frequencies, per SubSNP/population/assay
  • Strain genotypes

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

More

Ensembl pre-release: Chicken genome 2nd Mar 2004

We are pleased to announce the release of the first draft assembly of the chicken genome.

The Red Jungle Fowl, Gallus gallus, is the ancestor of the domestic chicken (Gallus domesticus) and is the first avian genome released. The genome, with a haploid size of 1.1 Gigabases, was determined by whole genome shotgun at the Genome Sequencing Center at Washington University, St Louis. The analysis of the chicken sequence involves an international group of scientists including individuals from the US, UK, Europe and China.

http://pre.ensembl.org/ provides displays of genomes that are in the process of being annotated. Genomes displayed here have undergone initial BLAST analysis on the assembly but have not gone through a complete gene build. These data are provided as an "early access" site for our users.

Repeats have been identified with RepeatMasker (using the latest set of chicken repeats) and Dust. A set of ab initio gene predictions has been generated with Genscan and similarities to entries in Swall, Unigene and EMBL VertRNA identified with Blast. Markers from UniSTS have been placed onto the assembly with EPCR. Chicken proteins and cDNAs have been mapped onto the assembly and preliminary gene models created for them.

http://pre.ensembl.org/Gallus_gallus offers browsable chromosomes, and BLAST and SSAHA functionality will be available within twenty-four hours.

Chromosomal sequences, both masked and unmasked, can be downloaded from Washington University (http://genome.wustl.edu/projects/chicken/).

More

Ensembl version 19.2 released 13th Feb 2004

The Ensembl Developers are pleased to announce the second release of Ensembl on schema 19.

New Data

Mouse NCBI m32 release
Ensembl Mouse

Mouse 19.32.2 presents the first Ensembl genebuild on the NCBI build 32 composite mouse assembly. Chromosomes were assembled using slightly different algorithms depending upon available mapping data. Chromosomes 2, 4, 5, 7, 11, 15, 18, 19, X and Y were assembled using a clone based tiling path, with whole genome shotgun sequence used to fill gaps. Chromosomes 1, 3, 6, 8, 9, 10, 12, 13, 14, 16 and 17 were assembled using the MGSCv3 as a tiling path and integrating HTGS sequence (both finished and draft) as appropriate.

Over 90% of the genes with cDNA evidence maintained stable IDs between the previous release of Ensembl and m32. This number would have been higher but for some complex duplications, in particular in olfactory receptor clusters. In these cases the duplication structures (some of which are known to be due to artefacts in the mixed clone and WGS assembly) differ between the two assemblies in ways that are difficult to reconcile. In these areas we have had to be conservative and assign new stable IDs. Elsewhere the Ensembl gene predictions have mapped very consistently from the old assembly to the new assembly.

In addition to the new assembly and genebuild, Mouse 19.32.2 contains:

  • SNP set from dbSNP 118
  • Additional homology data from the Compara database
Zebrafish WGS assembly 3 Release
Ensembl Zebrafish

Zebrafish 19.3.2 features the zebrafish whole genome shotgun assembly sequence version 3, as released on the 27th November 2003.

In addition to the new assembly and genebuild, Zebrafish 19.3.2 contains:

  • SNP set from dbSNP 118
  • New mapped EST database
  • Additional homology data from the Compara database
C. elegans Wormbase 116 dataset
Ensembl C. elegans

This release of Ensembl features a direct import of the C.elegans 116 dataset from Wormbase. As usual, no additional genebuild was carried out, but a series of blast runs was performed to provide ESTs, SwissProt hits, and other similarity data.

The canonical data for C. elegans is managed at http://www.wormbase.org/.

Compara

The Compara database has been updated as follows:

  • New blastz alignments from UCSC (details at the end of this release note) and calculation of synteny based on them, for:
    • human/mouse
    • human/rat
    • mouse/rat
    • human/chimp
  • New phusion/blastn alignments and calculation of synteny based on them, for:
    • C.elegans/C.briggsae
  • New homology data:
    • mouse/human -> new dN/dS
    • mouse/rat -> new dN/dS
    • mouse/fugu
    • mouse/zebrafish
    • zebrafish/human
    • zebrafish/rat
    • zebrafish/fugu
    • C.elegans/C.briggsae -> new dN/dS
  • New family clustering.
New Rat SNPs

The Rattus norvegicus SNP database has been updated to dbSNP 118.

Updated Data
  • Fugu rubripes core database has been updated with the addition of 11782 scaffolds (<2kb) that were previously missing. There is no new genebuild.
  • Homo sapiens core database has been updated with new Affymetrix xref data. There is no new genebuild.

Schema changes

There are no schema changes in this release.

Website Changes

GeneDAS

With release 19.2 of the web code, GeneDAS and ProteinDAS sources can be added and removed from GeneView and ProtView pages.

See, e.g. http://www.ensembl.org/Homo_sapiens/geneview?gene=BRCA2

Webcode redesign

Release 19.2 sees the continued rollout of the redesigned webcode. Updated pages in this release are HaploView and MarkerView.

HaploView update

The 19.2 webcode incorporates an improved version of HaploView, contributed by Pedro Gomez-Fabre of GSK. The improvements include being able to select SNPs involved in a haplotype block and calculate the minimal set required to distinguish the haplotypes.

See, e.g. http://www.ensembl.org/Homo_sapiens/haploview?haplotype=CHR22_A_10

Many thanks to Pedro & GSK for this contribution.
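The "minimal set" idea can be illustrated with a brute-force search: try ever larger subsets of SNP positions until every haplotype has a distinct signature. The Python sketch below is not HaploView's algorithm, which the announcement does not describe. The function name and the example haplotypes are invented, and exhaustive search is only practical for small blocks.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Return the smallest list of SNP indices that tells all haplotypes apart.

    haplotypes is a list of equal-length strings, one allele character per SNP.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_snps = len(haplotypes[0])
    if any(len(h) != n_snps for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    if len(set(haplotypes)) != len(haplotypes):
        raise ValueError("identical haplotypes cannot be distinguished")
    for size in range(n_snps + 1):
        for columns in combinations(range(n_snps), size):
            signatures = {tuple(h[i] for i in columns) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return list(columns)


# No single SNP separates all three, but SNPs 0 and 2 together do.
print(minimal_tag_snps(["AAGT", "AACT", "GAGC"]))  # prints [0, 2]
```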

FTP dump change

From this release, pseudogene cDNAs are dumped to a separate FASTA file from the known and novel cDNA files.

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

Please note

The mouse/human, mouse/rat and human/rat blastz alignments stored in Compara originated from UCSC:

mouse NCBI m32 vs human NCBI34

Downloaded from http://genome.ucsc.edu/goldenPath/mm4/vsHg16/

  • axtNet directory for "blastz net" track description at UCSC here
  • axtTight directory for "blastz net tight" track description at UCSC here
mouse NCBI m32 vs rat RGSC3.1

Downloaded from http://genome.ucsc.edu/goldenPath/mm4/vsRn3/

  • axtNet directory for "blastz net" track description at UCSC here
  • axtTight directory for "blastz net tight" track description at UCSC here
human NCBI34 vs rat RGSC3.1

Downloaded from http://genome.ucsc.edu/goldenPath/hg16/vsRn3/

  • axtNet directory for "blastz net" track description at UCSC here
  • axtTight directory for "blastz net tight": no track description at UCSC (basically the same as for mouse/human axtTight)
human NCBI34 vs chimp BROAD1

Downloaded from http://genome.ucsc.edu/goldenPath/hg16/vsPt0/

  • axtBest directory for "blastz recip net" track description at UCSC here

More

Ensembl version 19 released 17th Dec 2003

The Ensembl Developers are pleased to announce the release of Ensembl 19.

New Data

Human Build 34a Gene Update

Ensembl v19 contains an updated gene build on the NCBI34 assembly released in v18. The majority of gene structures from build 34 are unchanged, but 200-300 are improved predictions produced as a result of corrections in our gene building software. The build contains 23531 gene predictions with 31609 transcripts, including 1744 pseudogenes.

We are continuing to improve our prediction methods and expect to produce another NCBI34 rebuild early in 2004.

The 34a version increment indicates a data update without a change in assembly.

In addition to the updated genebuild, Human 19.34a.1 contains:

  • Improved archived identifier data in the core database
  • Filtered SNP set from dbSNP 117. The new SNP database contains 276000 fewer SNPs, as these were mapped to the alternate HSC_TCAG assembly, which is not represented in Ensembl
  • Additional homology data from the Compara database
Rat 3a ESTGene update

Due to an uncaught error in database production, the Rat ESTgenes in the last (v18) release of Ensembl erroneously shared IDs with Ensembl genes. This error has been corrected for release v19 which contains updated estgene and lite databases. The Ensembl Mart database also contains the corrected data.

Compara

The v19 Compara database has been updated using the new human geneset data. This includes new protein family clustering and new homologous gene pairs involving human genes.

v19 Compara also contains the new addition of dN and dS values for each homologous gene pair. These data are displayed on the GeneView pages, and can be retrieved via EnsMart.

Schema changes

  • Compara
    • addition of two columns, "dn" and "ds" in the homology table

These columns are filled only for some paired species, i.e. human/mouse, human/rat, mouse/rat and elegans/briggsae. For the other paired species, the ds values obtained were saturated and were viewed as unreliable. For those cases, the ds and dn columns are NULL.

See the end of this release note for details of how dN and dS values were calculated.
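Because "dn" and "ds" may be NULL, any code that derives a dN/dS ratio from the homology table has to handle missing values. A minimal Python sketch, assuming NULL arrives as `None` (as it does with common Python database drivers). The function name is hypothetical.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS, or None when either value is missing or dS is zero."""
    if dn is None or ds is None:
        return None  # species pair for which the values were not stored
    if ds == 0:
        return None  # ratio is undefined
    return dn / ds


print(dn_ds_ratio(0.25, 0.5))   # prints 0.5
print(dn_ds_ratio(None, None))  # prints None
```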

Website Changes

GeneDAS

This release sees a further extension and incorporation of the DAS protocol into Ensembl. For a long time DAS has been used to include external annotations, including user data, on ContigView displays. This concept has now been applied to the GeneView display, enabling external annotations on specific genes to be incorporated into the page.

The first dataset to be included is SwissProt literature references for genes. This is only the beginning for GeneDAS, and we are working on providing a number of data sources, as well as enabling users to display their own gene annotations.

See, e.g. http://www.ensembl.org/Homo_sapiens/geneview?gene=BRCA2

GeneSNPView

GeneSNPView is a new gene-centric SNP display. This page, linked from GeneView, shows details of SNPs and Pfam domains in, or close to, the exons of the transcripts of a particular gene. The SNP data includes their location, alleles, classification, and effects on the different transcripts.

See, e.g. http://www.ensembl.org/Homo_sapiens/genesnpview?gene=ENSG00000006756

ChromoView data display

A new "tool" page has been added this release. ChromoView enables the customisable display of feature frequency data against a single chromosome or a karyotype. The data can be provided by cut-and-paste, by file upload, or by providing a URL of a datafile. ChromoView accepts a variety of formats, including simple tab- or space-delimited, PSL, BED and GFF.

ChromoView is available for all species with an assembly mapped to chromosomes - e.g. http://www.ensembl.org/Homo_sapiens/chromoview

Please let us know if you find it useful, or have suggestions for improvements.

GeneView Orthologue display

GeneView now displays further details from the Compara database. Orthologues now show how they were selected (Best Reciprocal Hit or Reciprocal Hit based on Synteny around BRH) and give a value for dN/dS where available.
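
As a rough illustration of the Best Reciprocal Hit idea, the sketch below pairs two genes only when each is the other's top-scoring hit. The gene names, scores and the function name are hypothetical, and the actual Compara pipeline (including the synteny-based step) is not described here, so this should be read as a sketch of the general technique only.

```python
def best_reciprocal_hits(scores_ab, scores_ba):
    """Return (a, b) pairs where b is a's top hit and a is b's top hit.

    scores_ab maps each gene in species A to {gene_in_B: score};
    scores_ba holds the same information in the other direction.
    """
    def best(hits):
        # Highest-scoring target, or None when there are no hits.
        return max(hits, key=hits.get) if hits else None

    pairs = []
    for a, hits in scores_ab.items():
        b = best(hits)
        if b is not None and best(scores_ba.get(b, {})) == a:
            pairs.append((a, b))
    return pairs


# Hypothetical gene names and similarity scores, for illustration only.
human_to_mouse = {"hA": {"mA": 950, "mB": 400}, "hB": {"mA": 500, "mB": 300}}
mouse_to_human = {"mA": {"hA": 940, "hB": 510}, "mB": {"hA": 410, "hB": 290}}
brh_pairs = best_reciprocal_hits(human_to_mouse, mouse_to_human)
print(brh_pairs)
```

In this toy input, hB's best hit is mA, but mA's best hit is hA, so hB is left without a reciprocal partner.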

ContigView Tracks
  • New tracks
    • Anopheles Gap track (Decorations menu, Gaps). This track displays the location of gaps both between and within scaffolds.
  • Updated tracks
    • Ensembl & Vega transcript tracks now include the type of transcript in the labels below each transcript. This can be switched off by selecting "Concise labels" in the Decorations menu.
    • The BLAST hit track now shows additional match details.
  • New DAS sources
    • OMIM Disease Phenotypes
    • Chimpanzee Contig alignments
    • Chimpanzee BAC pairs
    • NCBI Gnomon gene predictions
    • Rfam RNA gene predictions
Webcode redesign

Release 19 sees the continued rollout of the redesigned webcode (http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/WebcodeRedesign.html). Updated pages in this release are FASTAView, ExonView, MapView, AnchorView and SNPView.

MartView updates
  • Number of transcripts per gene is available for export and filtering.
  • Exons are flagged as constitutive or alternative depending on their presence or not in all the transcripts of a gene.
  • New data available on orthologous gene relationships
    • protein stable ids involved in the match
    • %identity, %coverage, and %positivity of the match
    • In addition for human<->mouse<->rat and elegans<->briggsae comparisons the dn and ds values are shown where ds < median*2 based on the whole paired species set.
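
The constitutive/alternative exon flag described in the list above can be sketched as follows. This is a minimal illustration assuming each transcript is given as a list of exon identifiers; the identifiers and the function name are made up, and this is not the Mart code itself.

```python
def classify_exons(transcripts):
    """Label each exon 'constitutive' if it occurs in every transcript
    of the gene, otherwise 'alternative'.

    transcripts maps a transcript ID to the collection of its exon IDs.
    """
    exon_sets = [set(exons) for exons in transcripts.values()]
    if not exon_sets:
        return {}
    in_all = set.intersection(*exon_sets)  # exons shared by all transcripts
    in_any = set.union(*exon_sets)         # every exon seen in the gene
    return {
        exon: "constitutive" if exon in in_all else "alternative"
        for exon in sorted(in_any)
    }


# Hypothetical gene with three transcripts.
gene = {
    "T1": ["E1", "E2", "E3"],
    "T2": ["E1", "E3"],
    "T3": ["E1", "E3", "E4"],
}
flags = classify_exons(gene)
print(flags)
```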

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day or so. Your patience is appreciated during this process.

The databases will also be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

As is customary, there will be no release at the beginning of January. As a result, the next release of Ensembl is scheduled for 2 February 2004.

dN/dS calculation details

dN and dS values were generated using the codeml program included in the PAML package (http://abacus.gene.ucl.ac.uk/software/paml.html) (Ref 1). With the parameters we have used, codeml performs pairwise Maximum Likelihood calculations of dN and dS for each set of orthologs. We have used the F3x4 codon evolution model (Ref 2). This takes into account both the bias deriving from the different probabilities of transition (T<->C and A<->G) versus transversion (T/C<->A/G) mutations, and the bias due to different nucleotide frequencies at the three codon positions.

dN and dS values are only provided for orthologues from some species pairs, i.e. human/mouse, human/rat, mouse/rat and elegans/briggsae. Orthologues for other species pairs are too divergent for dS to be an accurate measure. Most synonymous sites will have been subjected to more than one mutation, and ancestral changes cannot be reliably inferred from extant sequences (i.e. dS is saturated).

Orthology predictions for human/mouse, human/rat, mouse/rat and elegans/briggsae may not be perfect. Incorrect assignments will manifest anomalously high dS values. We have, therefore, applied a cutoff of twice the median value of all dS for each species pair as the criterion for displaying the dN/dS ratio. Predicted orthology relationships with dS above this threshold are likely to be errors. (This filter has been used successfully for the mouse and rat genome analysis papers and was suggested by Chris Ponting's group in Oxford).

Here are the dS threshold values used

                        dS threshold
human/mouse             1.26775
human/rat               1.27342
mouse/rat               0.41278
elegans/briggsae        4.53168

For example, for human/mouse orthologues, the dN/dS ratio is displayed only when dS<=1.26775
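
The cutoff rule can be sketched in a few lines. The (dN, dS) values below are made up, the function names are hypothetical, and the guard against dS = 0 is an assumption added here; only the "twice the median" rule and the elegans/briggsae figures come from this note.

```python
from statistics import median


def ds_cutoff(ds_values):
    """Twice the median of all dS values for one species pair."""
    return 2 * median(ds_values)


def displayed_ratios(pairs):
    """Return (cutoff, ratios) for a list of (dN, dS) tuples.

    A ratio is None when dS exceeds the cutoff. Skipping dS == 0 is an
    extra guard added here to avoid division by zero; the release note
    does not say how such cases are handled.
    """
    cutoff = ds_cutoff([ds for _, ds in pairs])
    ratios = [dn / ds if 0 < ds <= cutoff else None for dn, ds in pairs]
    return cutoff, ratios


# Made-up (dN, dS) values for one hypothetical species pair.
example_pairs = [(0.05, 0.5), (0.12, 0.6), (0.07, 0.7), (0.16, 0.8), (0.90, 3.0)]
cutoff, ratios = displayed_ratios(example_pairs)
print(cutoff, ratios)

# The published elegans/briggsae figures follow the same rule:
# median 2.26584, threshold 4.53168.
assert abs(2 * 2.26584 - 4.53168) < 1e-9
```

In the made-up data the median dS is 0.7, so the pair with dS = 3.0 falls above the cutoff and gets no ratio.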

NB: some may consider the elegans/briggsae dS values to be saturated (median = 2.26584, much greater than 1). After applying the dS threshold (4.53168), the median and average of the remaining set were 1.93962 and 2.062 respectively, comparable with the average of 1.78 published in the C. briggsae genome paper (Ref 3; no median was provided in the paper).

  1. Yang, Z. (1997) "PAML: a program package for phylogenetic analysis by maximum likelihood." Comput. Appl. Biosci. 13, 555-556.
  2. Goldman, N. & Yang, Z. (1994) "A codon-based model of nucleotide substitution for protein-coding DNA sequences." Mol. Biol. Evol. 11, 725-736.
  3. Stein, LD et al. (2003) "The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics." PLoS Biol. 1, 166-192.

More

Chimp Pre Ensembl site available 11th Dec 2003

We have released the recently deposited chimpanzee assembly on the Pre Ensembl site.

The pre site provides browsability of the genome with initial gene structures determined by matching Swissprot and RefSeq to the chimpanzee. BLAST and SSAHA searches and download of specific regions are also enabled. However, a full gene build (with peptide dumps etc) is not yet available.

We hope to release a fully annotated chimpanzee assembly in early 2004, but the timeline will depend on how the data processing goes. As this is the first time we have processed such a closely related species pair, we can't set a firm schedule.

The sequence of the chimpanzee, Pan troglodytes, was assembled by NHGRI-funded teams led by Eric Lander, Ph.D., at The Eli & Edythe L. Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, Mass., USA; and Richard K. Wilson, Ph.D., at the Genome Sequencing Center, Washington University School of Medicine, Saint Louis, USA.

We are using the nominated Arachne assembly from the sequencing group. From their release notes:

This assembly is a merge of a "modified de novo" (MDN) assembly from the ARACHNE group, which used the human genome to establish that particular inserts were not chimeric, and a separate "validated chimp-on-human" (VCH) assembly which took the chimp reads which align uniquely to human, formed them into contigs via this alignment, and removed contigs which failed a two-haplotype consistency check.

Shared reads were used to align the two assemblies to each other, and where they were consistent, the VCH sequence was transferred to the MDN, resulting in the released merged assembly.

This release of the assembly has the following properties: 361782 contigs, having N50 length 15.7 kb; contig length total 2.73 Gb, spanning 3.02 Gb; 37849 supercontigs, having N50 length 8.6 Mb (not including gaps).
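
For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch using the common definition, which the release notes do not spell out: the length L such that sequences of length L or longer account for at least half of the total. The contig lengths are made up and the function name is hypothetical.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half
    of the summed length. Returns 0 for an empty input."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Made-up contig lengths (total 290, so half is 145).
contig_lengths = [80, 70, 50, 40, 30, 20]
print(n50(contig_lengths))
```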

More

Zv3 pre-ensembl 27th Nov 2003

A Zv3 pre-ensembl has been released today. It comprises the sequence of the first zebrafish assembly that could be tied to the fpc map. The full ensembl database including a gene build is scheduled for release in February 2004.

More

Mus musculus pre-release website is now available 21st Nov 2003

The Mus musculus pre-release website shows only the DNA sequence, initial gene placement, repeatmasking and raw BLAST hits on this genome. We are currently working on providing a fully-featured Ensembl build, including a full gene build, cross-references to other datasets and a data mining interface. The annotated assembly will be released on the main Ensembl site (http://www.ensembl.org/) at the start of Feb 2004.

Browsable annotations include
  • hits to EMBL vertebrate mRNA sequences
  • hits to GenBank Unigene clusters
  • hits to SWISS-PROT, TrEMBL and RefSeq proteins
  • ab-initio gene predictions from genscan analyses
Other functionality includes
  • BLAST/SSAHA sequence similarity searches for assembly and genscan predictions
  • export of genomic regions in Fasta, EMBL, GenBank, GFF, text and image formats

More

Ensembl version 18 released 5th Nov 2003

The Ensembl Developers are pleased to announce the release of Ensembl 18.

New Data

Human Build 34

This release contains the Ensembl gene build on human assembly 34. Build 34 is an update to the finished human genome, with a number of small improvements in genome sequence on a number of chromosomes.

This release has 22,184 genes comprising 27,941 coding transcripts and 1853 pseudogenes which are easily confused with genes. It is expected that there are between 20,000 and 30,000 pseudogenes in the genome; Ensembl only currently annotates those which confuse the gene prediction process.

Depending on the precise estimates of total protein gene number, Ensembl has annotated between 80% and 90% of all protein coding genes. 92% of genes from the previous build transferred across to the new build, with the missing 8% of genes predominantly being inappropriate protein coding genes (e.g. coming from large scale cDNA projects which have a number of artefactual errors, or from chimeric cDNA clones from cancer cell lines). However, a very small number (around 1%) were clearly "correct" genes which were misclassified as artefactual errors. The next release of Ensembl (due December 15th) will update the data for this very small number of genes.

In addition to the new assembly and genebuild, Human 18.34.1 contains:

  • new EST database
  • new Vega annotation on NCBI34
  • new ESTgene database
  • new SNP data from dbSNP 117 mapped to NCBI34
Rat assembly 3

This release also contains the Ensembl gene build on rat build 3.1. Build 3.1 is a draft genome assembly covering more than 90% of the estimated 2.8 Gb genome.

This release has 22,159 genes comprising 28,545 transcripts, and 1,592 pseudogenes. Using an estimate of 26,000-29,000 protein coding genes, Ensembl has annotated around 75-85% of the total.

76% of genes from the previous build transferred across to the new build; most of the genes that were not transferred were missed because of assembly changes.

In addition to the new assembly and genebuild, Rat 18.3.1 contains:

  • new EST database
  • new ESTgene database
  • new SNP data from dbSNP 117 mapped to the new assembly
Compara
  • Updated for the above new species data. In addition, in this release the peptide family data has been merged into compara, and no longer has its own database. This release of compara therefore contains newly computed peptide families and multiple alignments of family members, using the latest SWISSPROT, SPTREMBL and EnsEMBL metazoan peptide sets.
GO
  • The Ensembl GO database has been updated to the latest, October 2003, release.

Schema & API Changes

  • Compara
    • database contains new tables for family data
    • compara API includes family objects
  • Core
    • two columns renamed in identity_xref table
    • hit_start -> query_start
    • hit_end -> query_end

Website Improvements

Webcode redesign

In order to improve reusability and maintainability, the webteam have redesigned the general scheme for the webcode. This release sees the first pages (GeneView, TransView, ProtView) implemented under the new design. There should be little or no visible difference to the pages; all the changes are behind the scenes.

For more details on the new design, see: here

KaryoView data display

A new "tool" page has been added this release. KaryoView enables the customisable display of user data on a karyotype, and is intended to provide images for use in presentations, papers, etc. KaryoView will work on all species with an assembly and a karyotype - e.g. http://www.ensembl.org/Homo_sapiens/karyoview

Please let us know if you find it useful, or have suggestions for improvements.

MartView updates
  • Genes can now be filtered on whether they have a 5' or 3' UTR

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day. Your patience is appreciated during this process.

The databases will be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

More

Ensembl version 17 released 3rd Oct 2003

The Ensembl Developers are pleased to announce the release of Ensembl 17.

Updated Data

This release is mostly a data/schema update. There are no new assemblies, and only one new gene-build.

Anopheles gambiae

There has been a re-annotation and gene-build of the version 2 assembly. The new data are:

  • New SNAP gene predictions which have been specifically trained for the Anopheles data set.
  • New EST mappings including a new gene build with ESTs.
  • Added tRNA using tRNAscan-SE 1.23, CpG islands, Tandem repeats.
  • An improved set of repeats from RepeatMasker.
  • New BlastX hits of SWALL and Drosophila peptides.
  • Preliminary version of the transposon submission tool.
Homo sapiens
  • Import of dbSNP 116 SNPs
Mus musculus
  • Import of dbSNP 116 SNPs

Schema Changes

  • Core database
    • cigar line and associated information added to identity_xref table
  • SNP database (mouse/human only)
    • additional moltype column added to RefSNP table

Website Improvements

MartView updates
  • New Multispecies focus
  • Simplified Affymetrix IDs
  • Prototype of the command line interface to Mart - MartShell

See the EnsMart homepage for further details of these updates.

Availability

The Ensembl FTP site has been updated with new copies of all databases and flatfiles.

The databases will be copied to the public MySQL server, ensembldb.ensembl.org, within the next few days.

More

Ensembl version 16 released 4th Aug 2003

The Ensembl Developers are pleased to announce the release of Ensembl 16.

Updated Data

This release is mostly a data/schema update. There are no new assemblies, and no new genebuilds.

Homo sapiens
  • New (correct) Karyotype lengths obtained from UCSC and imported (core & Vega)
  • New GO evidence tags associated with go xrefs (core)
  • Gene descriptions loaded into Vega (Vega)
  • Many superfluous features referencing non-existent contigs removed (core & Vega)
Danio rerio
  • missing gene_descriptions added
  • corrected analysis.gff_feature and analysis.gff_source values for some protein features
Fugu rubripes
  • Many incorrect locuslink xrefs are now correctly labelled as SwissProt xrefs
Compara
  • New compara database generated with improved blastp parameters and improved best-reciprocal-hit putative orthologue analysis

Schema Changes

  • Core database
    • go_xref table definition altered
    • type column added to stable_id_event table

Website Improvements

New BLAST/SSAHA interface

A new interface, along the lines of Martview, has been implemented for this release. The new page provides access to both BLAST and SSAHA, can search against multiple species, and accepts multiple query sequences. Under the new interface is a complete redesign of the Blast/SSAHA submission and parsing code, intended to make the page readily extensible to new search tools and output formats, and eventually easier to install locally.

Multiple Protein Alignments

The Ensembl protein family database contains alignments for members of all but the largest protein families. These can now be exported from the familyview pages in a variety of formats (FASTA, MSF, ClustalW, etc).

MartView updates

See the EnsMart homepage for details of Mart updates.

Availability

The Ensembl FTP site has been updated with new copies of all databases and flatfiles. The databases will be copied to kaka within the next few days.

More

Ensembl version 15 released 4th Jul 2003

The Ensembl Developers are pleased to announce the release of Ensembl 15.

New Data

Finished Human Assembly

Ensembl human release 15.33.1 is built around the NCBI 33 genome assembly. This is the first 'essentially complete' assembly of the human genome and covers about 99 percent of the euchromatic sequence with fewer than 400 gaps. The average size of contiguous sequence is now over 27Mb, which is more than 300 times longer than the working draft produced in 2000. The Ensembl annotation consists of 23,299 protein coding genes (30,035 transcripts) and, for the first time, automatic annotation of 962 pseudogenes.

In addition to the new assembly and genebuild, Human 15.33.1 contains:

  • new EST database on NCBI33
  • new Vega annotation on NCBI33
  • new ESTgene database on NCBI33
  • new SNP data from dbSNP 115 mapped to NCBI33
  • new stable_id archive data
Caenorhabditis elegans

New assembly and annotation imported from wormbase (release 102)

Drosophila melanogaster

New assembly information and annotation imported from flybase (v. 3.1)

Danio rerio
  • new interpro data
  • three more analyses: trf, eponine, BAC-end matches
  • improved xref mapping
  • trimmed out low-scoring blast hits
  • new SNP database containing data from dbSNP 115
  • new stable_id archive data
Compara

Updated for the above new species data

GO

The Ensembl GO database has been updated to the latest, May 2003, release.

Family

Newly computed peptide families and multiple alignments of family members, using the latest SWISSPROT, SPTREMBL and EnsEMBL metazoan peptide sets.

Schema & API Changes

The core database has a new table: go_xref

Website Improvements

New BLAST/SSAHA interface

A new interface, along the lines of Martview, has been implemented for this release. The new page provides access to both BLAST and SSAHA, can search against multiple species, and accepts multiple query sequences. Under the new interface is a complete redesign of the Blast/SSAHA submission and parsing code, intended to make the page readily extensible to new search tools and output formats, and eventually easier to install locally.

Stable ID archive

Ensembl identifiers which are no longer active should now be recognised by web pages such as geneview.

Multiple Protein Alignments

The Ensembl protein family database contains alignments for members of all but the largest protein families. These can now be displayed from familyview pages, using the JalView java multiple alignment editor.

URL-based data upload tracks

This release sees an implementation of UCSC-style URL-based remote data annotation, allowing custom data tracks to be displayed without the need to set up or configure a DAS server.

Searchable Mail Archive

The Ensembl mailing lists are now archived at mailarchive.sanger.ac.uk, with a new search interface.

MartView updates

See the EnsMart homepage for details of Mart updates

Availability

The Ensembl FTP site is currently being updated with new copies of all databases and flatfiles. This should be complete within a day. Your patience is appreciated during this process.

The databases are copied to kaka. Please note that this release also sees a new fasta file naming convention as follows:

<species_name>.<assembly_name>.<file_content_type>.fa

For example the files:

Homo_sapiens.cdna.known.fa
Mus_musculus.latestgp.fa
Ratus_norvegicus.pep.genscan.fa

have now become:

Homo_sapiens.NCBI33.cdna_known.fa
Mus_musculus.NCBIM30.contig.fa
Rattus_norvegicus.RGSC2.pep_genscan.fa
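
A small sketch of how the new convention could be built and parsed in a script. The function names are hypothetical, and the parser assumes that none of the three fields contains a dot, which holds for the examples above but is not stated as a rule.

```python
def fasta_name(species, assembly, content_type):
    """Build <species_name>.<assembly_name>.<file_content_type>.fa"""
    return f"{species}.{assembly}.{content_type}.fa"


def parse_fasta_name(filename):
    """Split a new-style name into (species, assembly, content_type).

    Assumes the individual fields contain no dots.
    """
    parts = filename.split(".")
    if len(parts) != 4 or parts[-1] != "fa":
        raise ValueError(f"not a new-style fasta file name: {filename}")
    species, assembly, content_type, _ = parts
    return species, assembly, content_type


print(fasta_name("Homo_sapiens", "NCBI33", "cdna_known"))
print(parse_fasta_name("Rattus_norvegicus.RGSC2.pep_genscan.fa"))
```

Old-style names such as Mus_musculus.latestgp.fa have only three dot-separated parts, so the parser rejects them rather than guessing the assembly.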

More

Ensembl version 14 released 3rd Jun 2003

We are pleased to announce the release of Ensembl 14.

New Data
  • New ZFISH2 whole genome shotgun assembly, annotation, and estgenes