
Medical linguistics

Who we are

The Service d'Informatique Médicale (SIM) is part of the Radiology and Medical Informatics Department of the University Hospitals of Geneva. This entity is in charge of developing medical applications such as the patient record, medical orders and other knowledge-based applications.

A group within the SIM has long specialized in Natural Language Processing. Under the leadership of Robert Baud, several scientists have spent months or years as active participants in this group: Anne-Marie Rassinoux, Christian Lovis, Judith Wagner, Laurence Alpay, Patrick Ruch and Paul Fabry. See the list of publications for more details.

The Patient Process Ontology

The Patient Record is the main source of information about patients and the main source for knowledge extraction. Because the Patient Record consists mainly of free text, one has to concentrate on the main axes governing its content. Two axes appear to account for up to 70% of the whole content: body part and process. In other words, the story of a patient is composed of a set of statements of the form "Process has_location BodyPart". Fortunately, a satisfactory model of anatomy is available in the form of the Foundational Model of Anatomy (FMA). The situation is more difficult for a model of Patient Process: numerous terminologies may help, but none has reached the level of a well-formed ontology, and the specific aspects of a process as found in a narrative about the patient have not been sufficiently considered. There is clearly a need for a Patient Process Ontology (PPO).

Unlike the many ontologies oriented principally toward endurant objects, this ontology deals with processes, which are occurrent objects: they occur at some point in time, which means they have a start time and a stop time. The Patient Record can be seen as a set of co-occurring processes concerning the patient. Simultaneously, the patient is recovering from pneumonia, is following a care process, complicated by diabetes, is taking a prescribed drug, is subject to an allergic reaction, and is becoming older: this single-sentence description already contains six different parallel processes, not necessarily connected by causal links. This is typically the essence of the Patient Record.
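
One way to picture this view of the Patient Record is as a set of time-bounded processes, some of which overlap. The following Python sketch is illustrative only: the `PatientProcess` class, its fields and all dates are hypothetical and are not taken from the PPO itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PatientProcess:
    """A hypothetical occurrent: something that happens over an interval."""
    name: str
    start: date
    stop: Optional[date] = None          # None means the process is still ongoing
    has_location: Optional[str] = None   # body part label (assumed, free text here)

    def is_active_on(self, day: date) -> bool:
        """True if the process has started and has not yet stopped on `day`."""
        return self.start <= day and (self.stop is None or day <= self.stop)


def co_occurring(processes, day):
    """Return the names of all processes active on the given day."""
    return [p.name for p in processes if p.is_active_on(day)]


# Invented example data, loosely following the six processes named above.
record = [
    PatientProcess("aging", date(1950, 3, 1)),
    PatientProcess("diabetes", date(1998, 6, 1)),
    PatientProcess("pneumonia recovery", date(2006, 1, 10), date(2006, 2, 20), "lung"),
    PatientProcess("care process", date(2006, 1, 10), date(2006, 2, 20)),
    PatientProcess("drug intake", date(2006, 1, 11), date(2006, 1, 25)),
    PatientProcess("allergic reaction", date(2006, 1, 12), date(2006, 1, 14)),
]

print(co_occurring(record, date(2006, 1, 13)))  # all six processes overlap here
print(co_occurring(record, date(2006, 3, 1)))   # only the open-ended ones remain
```

Note that the sketch records no causal links between processes: co-occurrence is derived purely from the start and stop times.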

When an adverse event is reported in the patient story, the important point is generally not the event itself but the recovery from the newly created situation. From the point of view of the Patient Record, a patient with a broken leg is experiencing a process of recovery, started by an accident and ending when the patient is healthy again. The same is true when a drug is prescribed: as long as the medical order is active, the patient is in a process influenced or guided by this prescription; it is not the trigger event that is important, but the follow-up. Even the age of the patient is considered as an aging process starting at birth. On the basis of this argument, several aspects of the Patient Record may be considered as processes.

The top objects of this new ontology are presented and documented. There are a number of intrinsic difficulties to be solved, and the presentation will emphasize some possible solutions. To match reality, a set of real patient letters has been manually analysed to extract actual processes and compare them with the ontology. The objective is to annotate the available medical lexicons for all entries pointing to objects of this ontology, preparing for a sound representation of the Patient Record and opening the way to new intelligent applications.

From Terminologia Anatomica to the Foundational Model of Anatomy

The Terminologia Anatomica (TA) is the result of a consensus among anatomists working under the umbrella of the Federative Committee on Anatomical Terminology. In 1998, they published a reference terminology on gross anatomy. More recently, another effort has been made in the form of the Foundational Model of Anatomy (FMA), which is compatible with the TA but has the formal qualities necessary for handling such a terminology by computer.

The SIM has been active since 2003 in Natural Language Processing (NLP) in relation to the TA and in conjunction with the FMA. The goal is to develop a database representation of the TA especially tailored to the needs of NLP. First, a relational implementation has been developed to accommodate the structure of the TA, the links with the FMA, and the numerous synonym terms, past, present and future. Second, since the TA was originally available in Latin and English, a translation into French has been completed. Third, a bridge to the relevant MeSH terms for each TA entry is being prepared.
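
To make the idea of a relational representation more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, the entry code `DEMO-001` and the FMA identifier `FMA-DEMO-1` are all hypothetical placeholders; the actual SIM schema is not described here and may differ substantially.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not the SIM design.
SCHEMA = """
CREATE TABLE ta_entry (
    ta_code TEXT PRIMARY KEY,   -- identifier of the TA entry
    fma_id  TEXT                -- link to the corresponding FMA entity, if any
);
CREATE TABLE ta_term (
    ta_code  TEXT NOT NULL REFERENCES ta_entry(ta_code),
    language TEXT NOT NULL,     -- 'la', 'en', 'fr', ...
    label    TEXT NOT NULL,
    status   TEXT NOT NULL      -- 'preferred' or 'synonym'
);
CREATE TABLE mesh_link (
    ta_code   TEXT NOT NULL REFERENCES ta_entry(ta_code),
    mesh_term TEXT NOT NULL     -- bridge to a related MeSH term
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ta_entry VALUES (?, ?)", ("DEMO-001", "FMA-DEMO-1"))
    conn.executemany(
        "INSERT INTO ta_term VALUES (?, ?, ?, ?)",
        [
            ("DEMO-001", "la", "femur", "preferred"),
            ("DEMO-001", "la", "os femoris", "synonym"),
            ("DEMO-001", "en", "femur", "preferred"),
            ("DEMO-001", "en", "thigh bone", "synonym"),
            ("DEMO-001", "fr", "fémur", "preferred"),
        ],
    )
    return conn


def lookup(conn, label):
    """Return the TA codes whose terms match `label` (ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT DISTINCT ta_code FROM ta_term WHERE lower(label) = lower(?)",
        (label,),
    )
    return [row[0] for row in rows]


def terms_for(conn, ta_code, language):
    """List the labels of one entry in one language, preferred term first."""
    rows = conn.execute(
        "SELECT label FROM ta_term WHERE ta_code = ? AND language = ? "
        "ORDER BY status, label",
        (ta_code, language),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(lookup(conn, "Thigh Bone"))
print(terms_for(conn, "DEMO-001", "la"))
```

Keeping terms in a separate table from entries is what lets one entry carry any number of synonyms and languages, which is the property an NLP lookup needs most.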

The TA is a universal consensus, but its success is strongly dependent on its usage. To favour the dissemination of the TA, a number of accompanying measures are necessary. The main one is translation into several languages. Such translations should reach a good level of quality and should be validated by the relevant recognized committees. Such initiatives are underway at least for French and Spanish.

Another important accompanying measure is the release of different services and tools. The most basic ones should be available in the public domain. A website dedicated to the TA is certainly needed.

The Lexical Suite

The SIM has been involved in Natural Language Processing of medical texts for two decades now. In the eighties, during a sabbatical year, Naomi Sager, once named the mother of medical NLP, was the trigger of new developments. Since then, the SIM has been involved in numerous research projects such as Helios and Galen.

The cumulative development of several tools has resulted in a package of NLP utilities, known as the Lexical Suite, tailored for French, English and German. Data resources have been set up, resulting in a French lexicon with more than 46'000 entries, a relevant lexicon for the French medical domain. This lexicon is intended to be made available in the public domain through an effort known as UMLF (Unified Medical Language for French).

Retrieval and Categorization Tools

The SIM is also active in information retrieval, and we participate in the main competitions related to the biomedical domain (TREC Genomics, BioCreative). Our approaches combine general-purpose retrieval tools, implementing advanced retrieval models such as Divergence from Randomness, with knowledge-driven modules based on the UMLS and tailored to improve navigation in biomedical text repositories. Because application areas range from literature articles in medicine and bioinformatics to clinical content, our tools are largely language- and genre-independent. Recently, in cooperation with the Swiss-Prot team of the SIB and the EBI, we have started to investigate tools that help annotate proteins in Swiss-Prot using automatic categorization based on the Gene Ontology.

Gene Ontology

Formally, the Gene Ontology is a controlled vocabulary organized as a directed acyclic graph (DAG). It merges three structured vocabularies that describe gene products in terms of their associated biological process, cellular component (about 1400 terms) and molecular function, in a species-independent manner. The molecular function terms describe activities at the molecular level. A biological process is accomplished by one or more ordered assemblies of molecular functions. A cellular component is a component of a cell that is part of some larger object, for example an anatomical structure or a gene product group.
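
The DAG organization means that a term may have several parents, unlike in a tree. The Python sketch below illustrates this with a toy fragment: the term names are used for illustration only, the edges are an assumption and are not guaranteed to match any Gene Ontology release, and the helper functions `ancestors` and `is_acyclic` are hypothetical.

```python
from collections import deque

# Toy fragment (assumed edges): each term maps to the list of its parents.
PARENTS = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "cellular process": ["biological_process"],
    "cellular metabolic process": ["metabolic process", "cellular process"],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def is_acyclic(parents=PARENTS):
    """A graph is acyclic if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("cellular metabolic process")))
print(is_acyclic())
```

In this picture, assigning a specific term to a gene product implicitly assigns all of its ancestors as well, which is one reason why the choice of the most precise applicable term matters during annotation.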

Because the Gene Ontology contains more than 15000 concepts, annotating proteins with the full ontology is a rather difficult task for humans; hence the importance of categorization tools that assist curators and help maintain the consistency of the curation process.

Publications

by Robert Baud

(as first author only, limited to the period 2000 to 2004)

  • Baud RH, Ruch P, Gaudinat A, Fabry P, Lovis C, Geissbuhler A. Coping with the variability of medical terms. IOS Press, Medinfo 2004;2004:322-6.
  • Baud RH. A natural language based search engine for ICD10 diagnosis encoding. Med Arh. 2004;58(1 Suppl 2):79-80.
  • Baud RH, Ruch P, Lovis C, Rassinoux A-M, Geissbuhler A. De la composition des mots français du domaine médical par des entités signifiantes. JFIM Journées Francophones d'Informatique Médicale, September 2003, Tunis.
  • Baud RH, Ruch P. The future of Natural Language Processing for Biomedical Applications. IJMI 67 (2002) p1-5.
  • Baud RH, Lovis C, Rassinoux A-M, Ruch P, Geissbuhler A. Controlling the Vocabulary for Anatomy. Proc AMIA Symp, 2002, p26-30.
  • Baud RH, Lovis C, Weber P, Geissbuhler A. Multilingual approach to ICD 10: On the need for a source reference database. Medical Informatics Europe, Budapest 2002, MIE'2002, IOS Press, Stud Health Technol Inform. 2002;90:406-10.
  • Baud RH, Lovis C, Ruch P, Rassinoux AM. Conceptual Search in Electronic Patient Record. Proc MEDINFO 2001.
  • Baud RH, Weber P, Lovis C. Coding in context. PCS/E-EFMI-WG1 Special Topic Conference, Bruges, 10-13 October, 2001.
  • Baud RH, Lovis C, Ruch P, Rassinoux A-M. A Light Knowledge Model for Linguistic Applications. J Am Med Inform Assoc 2001; (Symposium Suppl).
  • Baud RH, Ruch P, Lovis C, Rassinoux AM. Recherche conceptuelle dans les textes médicaux. 8ème Journées Francophones d'Informatique médicale, Informatique et santé, Springer-Verlag France, Volume 9.
  • Baud RH, Lovis C, Ruch P, Rassinoux AM. A Toolset for Medical Text Processing. Medical Informatics Europe, Hannover 2000, MIE'2000, IOS Press.
  • (A search on http://www.ncbi.nlm.nih.gov/entrez/query.fcgi with "Baud R" will display a complete list of 72 publications by the author).

by Patrick Ruch

(Related to Ontology)

  • P Ruch. Query Translation by Text Categorization, COLING 2004, 2004.
  • P Ruch, R Baud, and A Geissbühler. Learning-free Text Categorization, AIME 2003, LNCS/LNAI 2780, Dojat M; Keravnou E; Barahona P (Eds.).
  • P Ruch, R Baud, and A Geissbühler. Using Lexical Disambiguation and Named-Entity Recognition to Improve Spelling Correction in the Electronic Patient Record. Art Intell Med, Volume 29, Issues 1-2, September-October, Pages 169-184, 2003.
  • P Ruch, R Baud, A Geissbuhler, and AM Rassinoux. Comparing general and medical texts for information retrieval based on natural language processing: an inquiry into lexical disambiguation. Proceedings of Medinfo'2001, pages 261-5, 2001.
  • P Ruch, J Wagner, P Bouillon, and R Baud. Tag-like Semantics for Medical Document Indexing. J Am Med Inform Assoc (Symposium Suppl), pages 137-141, 1999.

(more than 20 papers in MEDLINE…)

Last update : 13.06.2006