|
| Who are we |
 |
The Swiss
Institute of Bioinformatics (SIB) is an academic not-for-profit
foundation established on March 30, 1998. Its mission is to promote
research, the development of databanks and computer technologies,
teaching and service activities in the field of bioinformatics
in Switzerland, with international collaborations.
The Swiss-Prot group of the SIB, headed by Prof. Amos Bairoch, develops
and improves the Swiss-Prot/UniProt
protein knowledgebase, the most widely used protein information
resource. This activity is carried out in close collaboration
with the European
Bioinformatics Institute (EBI), in Hinxton (UK). The goal
is to provide the worldwide life science community with protein-related
information of the highest quality. In addition, the
Swiss-Prot group maintains and distributes related biological
databases: PROSITE,
a database of protein families and domains; ENZYME,
a repository of information on the nomenclature of enzymes; and
NEWT, a taxonomy database. The Swiss-Prot group also helps
disseminate resources to the international community through
the ExPASy
proteomics server maintained by the SIB.
|
| |
| Controlled
vocabularies |
| |
Developing a database that is considered a reference in the
genomic/proteomic domain means playing a crucial role in promoting
the use of a standard vocabulary for the description of biological
entities. We are working towards implementing controlled terms
in many fields of Swiss-Prot database entries:
- Protein names are assigned by curators according to
the published information about a protein. Synonyms are kept,
but spurious names are discarded. The same procedure is applied
to the names of genes coding for the protein.
- Protein family names are assigned, which makes it possible to link
proteins from different species and to distinguish paralogous from
orthologous evolutionary relationships between proteins. These
families are organised hierarchically in at most three levels:
superfamily, family and subfamily.
- Chemical reactions and pathways in which the protein is involved
are described using a controlled vocabulary based on the recommendations
of the Nomenclature Committee of the International Union of Biochemistry
and Molecular Biology (IUBMB). The various protein isoforms coded
by a single gene are also described with a controlled vocabulary.
- Protein sequence features, such as domains or post-translational
modifications, are displayed in a table format.
- Keywords are provided, which qualify the protein in order to
help query the database. These terms are precisely defined, including
information about attribution rules and hierarchical relationships.
- Gene Ontology terms are manually assigned to Swiss-Prot entries
in the framework of the GOA
project.
These various resources are listed and described on the server
of Swiss-Prot/UniProt. |
| |
| Metabolic
pathway ontology |
| |
We are currently developing an ontology to represent metabolic
pathways in the context of the Swiss-Prot database. Information on
metabolic pathways is presented as comment lines (topic PATHWAY)
in Swiss-Prot entries. This information, currently given as free
or semi-structured text, should be standardised using a controlled
vocabulary in order to check consistency and to facilitate computational
analysis. This controlled vocabulary is organised in three levels:
super-pathway (the parent in the pathway classification), pathway, and
the reaction catalysed by the protein. A database dedicated to metabolic pathways
has been developed. This database aims to represent the set of
actors involved in metabolism as well as their relationships.
Each component (super-pathway, pathway, reaction) has a label
that can be used to generate the new PATHWAY lines automatically.
These labels are based on the controlled vocabulary and, when possible,
are defined by rules in order to minimise human intervention.
This explicit representation makes it possible to give
access to additional information (definitions, synonyms, graphical
representations of pathways, reactions, compounds, etc.) and to
facilitate exchanges with other metabolic pathway databases.
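As an illustration only, the generation of a PATHWAY line from labelled
components could be sketched as follows. The class name, the function
name, the example labels and the semicolon-separated layout of the line
are assumptions made for this sketch; they are not the group's actual
database schema or the final PATHWAY syntax. Only the "CC   -!- TOPIC:"
prefix follows the usual form of Swiss-Prot comment lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathwayTerm:
    """One controlled-vocabulary term (hypothetical structure)."""
    label: str
    # Parent term: the super-pathway of a pathway, or the pathway of a reaction.
    parent: Optional["PathwayTerm"] = None


def pathway_line(term: PathwayTerm) -> str:
    """Build a PATHWAY comment line from a term and all of its ancestors."""
    labels = []
    current = term
    while current is not None:
        labels.append(current.label)
        current = current.parent
    # Labels were collected from the most specific term upwards;
    # write them from the most general term downwards.
    return "CC   -!- PATHWAY: " + "; ".join(reversed(labels)) + "."


# Illustrative labels only, not taken from the actual vocabulary.
super_pathway = PathwayTerm("Amino-acid biosynthesis")
pathway = PathwayTerm("L-arginine biosynthesis", parent=super_pathway)
reaction = PathwayTerm("carbamoyl phosphate from bicarbonate", parent=pathway)

line = pathway_line(reaction)
print(line)
```

Because each line is derived from the stored labels rather than typed by
hand, two entries annotated with the same reaction would receive identical
text, which is what makes consistency checks straightforward.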
|
| |
| Future projects |
| |
We plan to increase the medical annotation of proteins in Swiss-Prot,
i.e. information about the involvement of specific proteins in
disease development. For that purpose we need to map the corresponding
protein entries to the main medical resources, such as UMLS and
other medical dictionaries or ontologies. In collaboration with
IFOMIS, we started a preliminary study on the knowledge representation
of colon cancer pathology. A formal ontology model has been
created in order to describe the pathological stages of cancer by
representing the relationships between entities at different levels
of granularity, e.g. organ, cell, subcellular location and substances.
We now need to instantiate this ontology.
|
| |
| Participants |
| |
Swiss-Prot ontology and controlled vocabulary group
Lina Yip, PhD, senior scientist
Anne Morgat, PhD, senior scientist
Serenella Ferro, PhD, senior curator
Livia Famiglietti, PhD, senior curator
Kristian Axelsen, PhD, senior curator
Text mining group
Anne-Lise Veuthey, PhD, senior scientist
Violaine Pillet, PhD, post-doc
Marc Zehnder, PhD, post-doc
Pavel Dobrokhotov, PhD student
|
| |
| Publications |
| |
| |
Apweiler R., Bairoch A., Wu C.H., Barker W.C.,
Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane
M., Martin M.J., Natale D.A., O'Donovan C., Redaschi N., Yeh L.S.;
UniProt: the Universal Protein knowledgebase; Nucleic Acids Res. 32:D115-D119(2004). |
| |
Hulo N., Sigrist C.J., Le Saux V., Langendijk-Genevaux
P.S., Bordoli L., Gattiker A., De Castro E., Bucher P., Bairoch A.;
Recent improvements to the PROSITE database; Nucleic Acids Res. 32:D134-D137(2004). |
| |
Phan I.Q.H., Pilbout S.F., Fleischmann W., Bairoch
A.; Newt, a new taxonomy portal; Nucleic Acids Res. 31:3822-3823(2003). |
| |
Fleischmann A., Darsow M., Degtyarenko K., Fleischmann
W., Boyce S., Axelsen K.B., Bairoch A., Schomburg D., Tipton K.F.,
Apweiler R.; IntEnz, the integrated relational enzyme database; Nucleic
Acids Res. 32:D434-D437(2004). |
| |
Yip L., Famiglietti L.M., Gasteiger E., Bairoch
A.; Protein variations: resources and tools; (In) Biomedical application
of proteomics, Hochstrasser D.F., Corthals G., Sanchez J.-C., Eds,
pp.389-421, Wiley-VCH (2004). |
| |
Yip L., Scheib H., Diemand A.V., Gattiker A., Famiglietti
L.M., Gasteiger E., Bairoch A.; The Swiss-Prot variant page and the
ModSNP database: a resource for sequence and structure information
on human protein variants; Hum. Mutat. 23:464-470(2004). |
| |
Bairoch A., Boeckmann B., Ferro S., Gasteiger E.;
Swiss-Prot: juggling between evolution and stability; Briefings Bioinform.
5:39-55(2004). |
| |
Farriol-Mathis N., Garavelli J.S., Boeckmann B.,
Duvaud S., Gasteiger E., Gateau A., Veuthey A.-L., Bairoch A.; Annotation
of post-translational modifications in the Swiss-Prot knowledgebase;
Proteomics 4:1537-1550(2004). |
| |
Kumar A., Yip L., Smith B., Grenon P.; Bridging the
gap between medical and bioinformatics: an ontological case study
in colon carcinoma; Computers in Biology and Medicine, submitted. |
| |
Pillet V., Zehnder M., Seewald A., Veuthey A.-L.,
Petrak J.; GPSDB: a new database for synonyms expansion of gene and
protein names; Bioinformatics, in press. |
| |
|
|