A protocol for constructing a domain-specific ontology for use in biomedical information extraction using lexical-chaining analysis
In order to do more semantics-based information extraction, we require specialized domain models. We develop a hybrid approach for constructing such a domain-specific ontology, which integrates key concepts from the protein-protein–interaction domain with the Gene Ontology. In addition, we present a method for using the domain-specific ontology in a discourse-based analysis module for analyzing full-text articles on protein interactions. The analysis module uses a lexical chaining technique to extract strings of semantically related words that represent the topic structure of the text. We show that the domain-specific ontology improved the performance of the lexical-chaining module. As well the topic structure as represented by the lexical chains contains important information on protein-protein interactions appearing in the same textual context.