
Swiss Institute of Bioinformatics

 

Who are we

The Swiss Institute of Bioinformatics (SIB) is an academic not-for-profit foundation, established on March 30, 1998, whose mission is to promote research, the development of databanks and computer technologies, and teaching and service activities in the field of bioinformatics, in Switzerland and through international collaborations.

The Swiss-Prot group of the SIB, headed by Prof. Amos Bairoch, develops and improves the Swiss-Prot/UniProt protein knowledgebase, the most widely used protein information resource. This activity is carried out in close collaboration with the European Bioinformatics Institute (EBI) in Hinxton (UK). The goal is to provide the worldwide life science community with protein-related information of the highest quality. In addition, the Swiss-Prot group maintains and distributes related biological databases: PROSITE, a database of protein families and domains; ENZYME, a repository of information on the nomenclature of enzymes; and NEWT, a taxonomy database. The Swiss-Prot group also helps disseminate these resources to the international community through the ExPASy proteomics server, which is maintained by the SIB.

Controlled vocabularies

Developing a database that is regarded as a reference in the genomic and proteomic domain means playing a crucial role in promoting a standard vocabulary for the description of biological entities. We are working towards implementing controlled terms in many fields of Swiss-Prot database entries:

  • Protein names are carefully assigned by curators according to the published information about a protein. Synonyms are kept, but spurious names are discarded. The same procedure is applied to the names of the genes coding for the protein.
  • Protein family names are given, which make it possible to link proteins from different species and to distinguish paralogous from orthologous evolutionary relationships between proteins. The hierarchical organisation of these families is given at three levels at most: superfamily, family and subfamily.
  • Chemical reactions and pathways in which the protein is involved are described using a controlled vocabulary based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). The various protein isoforms coded by a single gene are also described with a controlled vocabulary.
  • Protein sequence features, such as domains or post-translational modifications, are displayed in a table format.
  • Keywords are provided, which qualify the protein in order to help query the database. These terms are precisely defined, including information about attribution rules and hierarchical relationships.
  • Gene Ontology terms are manually attributed to Swiss-Prot entries in the framework of the GOA project.

These various resources are listed and described on the Swiss-Prot/UniProt server.
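As an illustration of how hierarchically organised keywords can help query a database, here is a minimal Python sketch. It assumes the hierarchy is available as a simple child-to-parent mapping. The mapping, the function name `expand_keyword` and the sample terms are illustrative assumptions, not the actual Swiss-Prot keyword list or tooling.

```python
# Hypothetical excerpt of a keyword hierarchy: child term -> parent term.
# The terms are examples only, not an extract of the real keyword list.
KEYWORD_PARENT = {
    "Serine protease": "Protease",
    "Metalloprotease": "Protease",
    "Protease": "Hydrolase",
}


def expand_keyword(term, parent_of):
    """Return the given term together with all of its descendant terms."""
    # Invert the child -> parent mapping into parent -> set of children.
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, set()).add(child)

    # Walk down the hierarchy from the requested term.
    result = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, ()))
    return result


if __name__ == "__main__":
    # A query on a general term also matches entries that carry
    # one of its more specific child terms.
    print(sorted(expand_keyword("Protease", KEYWORD_PARENT)))
```

With such an expansion, a search on a general keyword also retrieves entries annotated only with a more specific term.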

Metabolic pathway ontology

We are currently developing an ontology to represent metabolic pathways in the context of the Swiss-Prot database. Information on metabolic pathways is presented as comment lines (topic PATHWAY) in Swiss-Prot entries. This information, currently given as free or semi-structured text, should be standardised using a controlled vocabulary in order to check consistency and to facilitate computational analysis. The controlled vocabulary is organised in three levels: super-pathway (the parent in the pathway classification), pathway, and the reaction catalysed by the protein. A database dedicated to metabolic pathways has been developed, which aims to represent the set of actors involved in metabolism as well as their relationships. Each component (super-pathway, pathway, reaction) has a label that can be used to generate the new PATHWAY lines automatically. These labels are based on the controlled vocabulary and, when possible, are defined by rules in order to minimise human intervention. Based on this explicit representation, it is possible to give access to additional information (definitions, synonyms, graphical representations of pathways, reactions, compounds, etc.) and to facilitate exchanges with other metabolic pathway databases.
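The three-level organisation described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the class names, the `pathway_line` function, the sample labels and the exact layout of the generated line (labels joined by semicolons after a `CC   -!- PATHWAY:` prefix) are hypothetical and do not describe the actual database schema or the final line format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    label: str  # controlled-vocabulary label of one catalysed reaction


@dataclass
class Pathway:
    label: str
    reactions: list[Reaction] = field(default_factory=list)


@dataclass
class SuperPathway:
    label: str  # parent in the pathway classification
    pathways: list[Pathway] = field(default_factory=list)


def pathway_line(super_pathway, pathway, reaction):
    """Build a PATHWAY comment line from the labels of the three levels.

    Raises ValueError if the components do not belong together, which is
    the kind of consistency check a controlled vocabulary makes possible.
    """
    if pathway not in super_pathway.pathways:
        raise ValueError(f"{pathway.label!r} is not part of {super_pathway.label!r}")
    if reaction not in pathway.reactions:
        raise ValueError(f"{reaction.label!r} is not part of {pathway.label!r}")
    # Assumed output layout: labels separated by semicolons.
    return (
        f"CC   -!- PATHWAY: {super_pathway.label}; "
        f"{pathway.label}; {reaction.label}."
    )


if __name__ == "__main__":
    # Illustrative labels only.
    step = Reaction("step 1")
    histidine = Pathway("L-histidine biosynthesis", [step])
    amino_acids = SuperPathway("Amino-acid biosynthesis", [histidine])
    print(pathway_line(amino_acids, histidine, step))
```

Because the line is generated from labels stored once in the hierarchy, a label only needs to be corrected in one place, and an inconsistent combination is rejected before it reaches an entry.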

Future projects

We plan to increase the medical annotation of proteins in Swiss-Prot, i.e. information about the involvement of specific proteins in disease development. For that purpose, we need to map the corresponding protein entries to the main medical resources, such as UMLS and other medical dictionaries or ontologies. In collaboration with IFOMIS, we have started a preliminary study on the knowledge representation of colon cancer pathology. A formal ontology model has been created to describe the pathological stages of cancer by representing the relationships between entities at different levels of granularity, e.g. organ, cell, subcellular location, substances. The next step is to instantiate this ontology.

Participants

Swiss-Prot ontology and controlled vocabulary group

Lina Yip, PhD, senior scientist
Anne Morgat, PhD, senior scientist
Serenella Ferro, PhD, senior curator
Livia Famiglietti, PhD, senior curator
Kristian Axelsen, PhD, senior curator

Text mining group

Anne-Lise Veuthey, PhD, senior scientist
Violaine Pillet, PhD, post-doc
Marc Zehnder, PhD, post-doc
Pavel Dobrokhotov, PhD student

Publications
Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N., Yeh L.S.; UniProt: the Universal Protein knowledgebase; Nucleic Acids Res. 32:D115-D119(2004).
Hulo N., Sigrist C.J., Le Saux V., Langendijk-Genevaux P.S., Bordoli L., Gattiker A., De Castro E., Bucher P., Bairoch A.; Recent improvements to the PROSITE database; Nucleic Acids Res. 32:D134-D137(2004).
Phan I.Q.H., Pilbout S.F., Fleischmann W., Bairoch A.; Newt, a new taxonomy portal; Nucleic Acids Res. 31:3822-3823(2003).
Fleischmann A., Darsow M., Degtyarenko K., Fleischmann W., Boyce S., Axelsen K.B., Bairoch A., Schomburg D., Tipton K.F., Apweiler R.; IntEnz, the integrated relational enzyme database; Nucleic Acids Res. 32:D434-D437(2004).
Yip L., Famiglietti L.M., Gasteiger E., Bairoch A.; Protein variations: resources and tools; (In) Biomedical application of proteomics, Hochstrasser D.F., Corthals G., Sanchez J.-C., Eds, pp.389-421, Wiley-VCH (2004).
Yip L., Scheib H., Diemand A.V., Gattiker A., Famiglietti L.M., Gasteiger E., Bairoch A.; The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants; Hum. Mutat. 23:464-470(2004).
Bairoch A., Boeckmann B., Ferro S., Gasteiger E.; Swiss-Prot: juggling between evolution and stability; Briefings Bioinform. 5:39-55(2004).
Farriol-Mathis N., Garavelli J.S., Boeckmann B., Duvaud S., Gasteiger E., Gateau A., Veuthey A.-L., Bairoch A.; Annotation of post-translational modifications in the Swiss-Prot knowledgebase; Proteomics 4:1537-1550(2004).
Kumar A., Yip L., Smith B., Grenon P.; Bridging the gap between medical and Bioinformatics: An ontological case study in colon carcinoma; Computers in Biology and Medicine, submitted.
Pillet V., Zehnder M., Seewald A., Veuthey A.-L., Petrack J.; GPSDB: a new database for synonyms expansion of gene and protein names; Bioinformatics, in press.
 
Last update : 13.06.2006