

List of biological databases



Meta Databases

Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism.

  • BioGraph (University of Antwerp, Vlaams Instituut voor Biotechnologie): a knowledge discovery service based on the integration of more than 20 heterogeneous databases.
  • ConsensusPathDB – A molecular functional interaction database, integrating information from 12 other databases.
  • mGen: contains four of the world's biggest databases (GenBank, RefSeq, EMBL and DDBJ) for easy, simple and program-friendly gene extraction.
  • SOURCE (Stanford University): encapsulates the genetics and molecular biology of genes from the genomes of Homo sapiens, Mus musculus, and Rattus norvegicus into easy-to-navigate GeneReports.
  • iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID
  • Nowomics: tracks changes in several biological databases; users ‘follow’ genes and keywords to see a news feed of new data and papers.
  • The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups to build a comprehensive parts list of functional elements in the human genome. The corresponding data is available for download and analysis from UCSC Genome Browser.
  • Human Epigenome Atlas, a collection of normal epigenomes of different tissues produced by Roadmap Epigenomics Project. Data types include histone modifications, DNA methylation, chromatin accessibility, gene expression, and small RNA expression.
  • Metascape provides click-to-extract access to gene-centric function annotations compiled from dozens of databases including NCBI (Entrez, OMIM, ClinVar), GO, KEGG, MSigDB, UniProt, Protein Atlas, Ensembl, JAX, DrugBank, NHGRI-EBI, DDG2P.
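The merging step that meta databases perform can be illustrated with a minimal sketch. It assumes two small in-memory sources keyed by gene symbol; the source names, field names, and records are toy placeholders, not the schema of any database listed above.

```python
from collections import defaultdict

# Toy records from two hypothetical sources; field names are illustrative only.
SOURCE_A = [
    {"gene": "TP53", "organism": "Homo sapiens"},
    {"gene": "BRCA1", "organism": "Homo sapiens"},
]
SOURCE_B = [
    {"gene": "TP53", "disease": "Li-Fraumeni syndrome"},
    {"gene": "CFTR", "disease": "cystic fibrosis"},
]


def merge_by_gene(sources):
    """Combine records from named sources into one entry per gene symbol."""
    merged = defaultdict(lambda: {"sources": []})
    for name, records in sources.items():
        for record in records:
            entry = merged[record["gene"]]
            entry["sources"].append(name)
            for key, value in record.items():
                if key != "gene":
                    # Keep the first value seen; later sources do not overwrite it.
                    entry.setdefault(key, value)
    return dict(merged)


def filter_by_field(merged, field):
    """Return only the genes that carry the given field, e.g. a disease focus."""
    return {gene: entry for gene, entry in merged.items() if field in entry}


if __name__ == "__main__":
    merged = merge_by_gene({"source_a": SOURCE_A, "source_b": SOURCE_B})
    for gene, entry in sorted(merged.items()):
        print(gene, entry)
```

Keeping the list of contributing sources on each entry preserves provenance, and `filter_by_field` stands in for the "emphasis on a particular disease or organism" that some meta databases offer.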




Nucleic Acid Databases


DNA Databases

Primary Databases

International Nucleotide Sequence Database (INSD) consists of the following databases.

DNA Data Bank of Japan (National Institute of Genetics)




The three databases, DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe), are repositories for nucleotide sequence data from all organisms. All three databases accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
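The daily exchange described above can be pictured as a simple reconciliation step. The sketch below is a toy model only: it assumes each archive is an in-memory dictionary mapping an accession to a (version, sequence) pair, and the accessions and sequences are made up. It does not reflect how the archives actually implement their exchange.

```python
def synchronise(archives):
    """Propagate the newest version of every record to all archives.

    `archives` maps an archive name to a dict of accession -> (version, sequence).
    The dicts are updated in place and also returned for convenience.
    """
    newest = {}
    for records in archives.values():
        for accession, (version, sequence) in records.items():
            # Remember the highest version seen for each accession.
            if accession not in newest or version > newest[accession][0]:
                newest[accession] = (version, sequence)
    for records in archives.values():
        records.update(newest)
    return archives


if __name__ == "__main__":
    # Hypothetical accessions and sequences, for illustration only.
    archives = {
        "DDBJ": {"X00001": (1, "ACGT")},
        "GenBank": {"X00001": (2, "ACGTT"), "X00002": (1, "GGCC")},
        "ENA": {},
    }
    synchronise(archives)
    for name, records in archives.items():
        print(name, sorted(records.items()))
```

After one pass every archive holds the same set of records, which is the outcome the daily exchange aims for.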

Secondary Databases

  • RefSeq
  • SNP / Disease Databases
    • OMIM (Online Mendelian Inheritance in Man): inherited diseases




Gene Expression Databases (mostly Microarray data)


  • ArrayExpress (European Bioinformatics Institute)
  • GPX (Scottish Centre for Genomic Technology and Informatics)
  • Bgee: a database to retrieve and compare gene expression patterns between species. It only contains wild-type and manually curated microarray/RNA-Seq/in situ experiments.
  • The European Genome-phenome Archive (EGA)
  • The Genotype-Tissue Expression (GTEx) Project: aims to provide the scientific community with a resource for studying human gene expression and regulation and their relationship to genetic variation. The project will collect and analyze multiple human tissues from donors who are also densely genotyped, to assess genetic variation within their genomes.
  • Expression Atlas: Differential and Baseline Expression (https://www.ebi.ac.uk/gxa/home). The Expression Atlas provides information on gene expression patterns under different biological conditions. Gene expression data is re-analysed in-house to detect genes showing interesting baseline and differential expression patterns.
  • The Human Protein Atlas (http://www.proteinatlas.org). The Human Protein Atlas contains information for a large majority of all human protein-coding genes regarding the expression and localization of the corresponding proteins, based on both RNA and protein data. The atlas consists of three subparts: cell, normal tissue, and cancer, with each subpart containing images and data based on antibody-based proteomics and transcriptomics.



Genome Databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species, or the genome of a single model organism.

  • Bioinformatic Harvester
  • SNPedia
  • FeatSNP.org: Functional Epigenetic Annotation Tool of SNPs (FeatSNP) is an online tool and a curated database for exploring the potential functional impact of common SNPs on the human brain. FeatSNP supplies functional and epigenetic annotations of human SNPs.
  • CAMERA Resource for microbial genomics and metagenomics
  • Corn, the Maize Genetics and Genomics Database
  • EcoCyc a database that describes the genome and the biochemical machinery of the model organism E. coli K-12
  • Ensembl Genomes provides genome-scale data for bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces (using the Ensembl software platform).
  • PATRIC, the PathoSystems Resource Integration Center
  • RegulonDB: a model of the complex regulation of transcription initiation, or regulatory network, of the E. coli K-12 cell.
  • Repbase: the most commonly used database for repetitive elements (transposons).
  • The SEED platform for microbial genome analysis includes all complete microbial genomes, and most partial genomes. The platform is used to annotate microbial genomes using subsystems.
  • TAIR, The Arabidopsis Information Resource.
  • INTEGRALL [4]: database dedicated to integrons, bacterial genetic elements involved in antibiotic resistance
  • EzGenome, comprehensive information about manually curated genome projects of prokaryotes (archaea and bacteria)[3]
  • GeneDB for Apicomplexan Protozoa, Kinetoplastid Protozoa, Parasitic Helminths, Parasite Vectors + several bacteria and viruses
  • EuPathDB: eukaryotic pathogen database resources, including amoeba, fungi, plasmodium, trypanosomatids, etc.
  • SNiPhunter SNP search engine: search for SNPs in PubMed open access literature using SNP IDs.
  • The 1000 Genomes Project was launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
  • PeanutBase: genetic and genomic data to enable more rapid crop improvement in peanut.[5]
  • Legume Federation: A consortium of scientists working to support robust agriculture for a substantially legume-fed world.[4]


Phenotype Databases

  • PhenCode: linking human mutations with phenotype
  • PhenomicDB multi-organism database linking genotype to phenotype
  • PHI-base Pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer reviewed literature.


  • Planform: planarian formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published planarian regeneration experiments.
  • Limbform: limb formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published multi-organism limb regeneration experiments.




RNA Databases

  • C-It-Loci [5] – A database of RNA expression and conserved loci for studying lncRNAs across species.
  • LncRNAWiki [6], a wiki-based database for community curation of known human long non-coding RNAs
  • Rfam [7], a database of RNA families
  • DASHR The DAtabase of Small Human non-coding RNAs: integrated annotation and sequencing-based expression data for all major classes of human small non-coding RNAs (sncRNAs) for both full sncRNA transcripts and mature sncRNA products derived from these larger RNAs.
  • MONOCLdb The MOuse NOnCode Lung database: Annotations and expression profiles of mouse long non-coding RNAs (lncRNAs) involved in Influenza and SARS-CoV infections.
  • MINTbase, a framework for the interactive exploration of mitochondrial and nuclear tRNA fragments
  • RDP, the Ribosomal Database Project


Amino Acid / Protein Databases


Protein Sequence Databases

  • UniProt Universal Protein Resource (EBI, Swiss Institute of Bioinformatics, PIR)
  • PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
  • SUPERFAMILY Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
  • neXtProt – a human protein centric knowledge resource
  • InterPro Classifies proteins into families and predicts the presence of domains and sites.
  • ProteomeScout – Includes graphical exports of protein annotations, including domains, secondary structure, and post-translational modifications


Protein Structure Databases

Primary databases

  • Protein Data Bank (PDB) comprising:



Secondary databases

  • SCOP Structural Classification of Proteins
  • CATH Protein Structure Classification





Protein Model Databases

  • SWISS-MODEL [9]: server and repository for protein structure models
  • Protein Model Portal [11] (PMP): meta database that combines several databases of protein structure models (Biozentrum, Basel, Switzerland)
  • Similarity Matrix of Proteins (SIMAP) is a database of protein similarities computed using FASTA.




Protein-Protein and Other Molecular Interactions

  • BIND Biomolecular Interaction Network Database
  • IntAct molecular interaction database: a central, standards-compliant repository of molecular interactions, including protein–protein, protein–small molecule and protein–nucleic acid interactions.
  • iRefIndex: provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID.




Proteomics Databases

  • Proteomics Identifications Database (PRIDE): a public repository for proteomics data, containing protein and peptide identifications and their associated supporting evidence as well as details of post-translational modifications. (European Bioinformatics Institute)
  • ProteomeScout – A public repository of processed proteomics datasets concerning post-translational modifications, including quantification across conditions (if applicable). Also includes graphical exports of protein annotations.
  • OWL – A public non-redundant database for protein search, derived from SWISS-PROT, PIR, GenBank (translation) and NRL-3D
  • ProteomeXchange provides a coordinated submission of mass spectrometry proteomics data to the main existing proteomics repositories. It includes repositories such as PRIDE, Tranche, and PeptideAtlas.


Additional Databases


Carbohydrate Structure Databases

  • EuroCarbDB [13]: a repository for both carbohydrate sequences/structures and experimental data.


Signal Transduction Pathway Databases


Metabolic Pathway and Protein Function Databases

  • MetaboLights [18] Metabolomics experiments and derived information: metabolite structures, reference spectra, biological roles, locations and concentrations. (European Bioinformatics Institute)
  • MetaNetX Automated Model Construction and Genome Annotation for Large-Scale Metabolic Networks




Metabolomic Databases


Exosomal Databases


Mathematical Model Databases

  • The Cell Collective: build and simulate large-scale models in real-time and in a highly collaborative fashion



PCR and Quantitative PCR Primer Databases


Taxonomic Databases


  • EzTaxon-e, database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
  • BacDive is a bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information.


Radiologic Databases


Specialized Databases (Alphabetically Ordered)

  • BETYdb (betydb.org): a database of plant traits, yields, and ecosystem services.
  • Bgee: a database to retrieve and compare gene expression patterns between species.
  • BioNumbers a database of useful biological numbers
  • Colorectal Cancer Atlas catalogs multiple genomic and proteomic data types from 13,711 tissue samples to identify sequence variants in more than 165 colorectal cancer cell lines.
  • Connectivity map Transcriptional expression data and correlation tools for drugs
  • DisGeNET: a database that integrates information on gene-disease associations
  • DiProDB A database to collect and analyse thermodynamic, structural and other dinucleotide properties.
  • Drug2Gene Provides integrated information for identified and reported relations between genes/proteins and drugs/compounds
  • Dryad a repository of data underlying scientific publications in the basic and applied biosciences.
  • EpimiRBase A comprehensive database of microRNA-epilepsy associations.
  • FunSecKB The fungal secretome knowledgebase.
  • FunSecKB2 The fungal secretome and subcellular proteome knowledgebase (version 2)
  • GreenPhylDB (A phylogenomic database for plant comparative genomics)
  • HUGO (Official Human Genome Database: HUGO Gene Nomenclature Committee)
  • HvrBase++ Human and primate mitochondrial DNA
  • IEDB Immune Epitope Database
  • IMGT The international ImMunoGeneTics information system
  • INTERFEROME The Database of Interferon Regulated Genes
  • MetazSecKB The metazoa [human/animal] secretome and subcellular proteome knowledgebase
  • NCBI-UniGene (National Center for Biotechnology Information)
  • OrthoMaM (A database of Orthologous Mammalian Markers)
  • OrthoMCL Ortholog Groups of Protein Sequences from Multiple Genomes including Archaea, Bacteria and Eukaryotes.
  • p53 The p53 Knowledgebase
  • PASD The plant alternative splicing database
  • PlantSecKB The plant secretome and subcellular proteome knowledgebase
  • SABIO-RK: a curated database that contains information about biochemical reactions, their kinetic rate equations with parameters and experimental conditions.
  • SciClyc: an open-access database for sharing antibodies, cell cultures, and documents for biomedical research.
  • Selectome: a database of positive selection based on a rigorous branch-site specific likelihood test. Positive selection is detected using CODEML on all branches of animal gene trees.
  • SHMPD The Singapore Human Mutation and Polymorphism Database
  • SNPSTR database A database of SNPSTRs – compound genetic markers consisting of a microsatellite (STR) and one tightly linked SNP – in human, mouse, rat, dog and chicken.


  • The Cancer Genome Atlas (TCGA) provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes.
  • TDR Targets A chemogenomics database focused on drug discovery in tropical diseases.
  • TRANSFAC A database about eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles.
  • TreeBASE An open-access database of phylogenetic trees and the data behind them
  • TreeFam (Tree families database): a database of phylogenetic trees of animal genes
  • XTractor: discovering newer scientific relations across PubMed abstracts. A tool to obtain manually annotated relationships for proteins, diseases, drugs and biological processes as they get published in PubMed.



Wiki-Style Databases



  • PubMed (references and abstracts on life sciences and biomedical topics)
  • FINDbase (the Frequency of INherited Disorders database)

