Sangam: A Confluence of Knowledge Streams

Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching

dc.contributor Technical Communication
dc.creator Gabbay, Igal
dc.date 2016-06-27T19:03:43Z
dc.date 2004-01
dc.date.accessioned 2023-03-03T18:51:43Z
dc.date.available 2023-03-03T18:51:43Z
dc.identifier eprint:287
dc.identifier http://hdl.handle.net/10919/71562
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/282045
dc.description While an information retrieval system takes as input a user query and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC), has focused mainly on answering ‘factoid’ questions concerned with names, places, dates, etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built which accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and produced as output phrases explaining the term. Four experiments were carried out which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) Definitions were diverse despite the documents’ homogeneity and were found not only in the Introduction and Abstract sections but also in the Methods and References; (2) Nevertheless, syntactic patterns were a useful starting point in extracting them; (3) Three patterns accounted for 90% of candidate phrases; (4) Statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length, or the number of sentences in the document.
Next steps include classifying terms, using information extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process.
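The pipeline described in the abstract (take a term, scan retrieved sentences with a set of patterns, return candidate definition phrases, score by precision) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three regular expressions, the function names build_patterns, find_definitions and precision, and the sample sentences are hypothetical and are not taken from the thesis, whose actual pattern vocabulary, heuristics and "appropriate form of precision" are not given in this record. Plain precision (correct phrases divided by returned phrases) is used here as a stand-in.

```python
import re


def build_patterns(term):
    """Return hypothetical lexical patterns for one term (illustrative only)."""
    t = re.escape(term)
    return {
        # "<term> is/are [defined as] <definition>"
        "copula": re.compile(
            rf"\b{t}s?\s+(?:is|are)\s+(?:defined as\s+)?(?P<definition>[^.;]+)",
            re.IGNORECASE,
        ),
        # "<term> (<definition>)"
        "parenthetical": re.compile(
            rf"\b{t}s?\s*\((?P<definition>[^)]+)\)", re.IGNORECASE
        ),
        # "<term>, a/an/the <definition>,"
        "appositive": re.compile(
            rf"\b{t}s?\s*,\s+(?P<definition>(?:a|an|the)\s+[^,.;]+)[,.;]",
            re.IGNORECASE,
        ),
    }


def find_definitions(term, sentences):
    """Scan sentences and collect (pattern_name, candidate_phrase) pairs."""
    patterns = build_patterns(term)
    hits = []
    for sentence in sentences:
        for name, pattern in patterns.items():
            match = pattern.search(sentence)
            if match:
                hits.append((name, match.group("definition").strip()))
    return hits


def precision(num_correct, num_returned):
    """Plain precision; an assumed stand-in for the thesis's own measure."""
    return num_correct / num_returned if num_returned else 0.0


if __name__ == "__main__":
    # Invented sample sentences, not drawn from the test collection.
    sample = [
        "A smolt is a young salmon ready to migrate to sea.",
        "Redds (gravel nests dug by females) were counted.",
        "The river was sampled in May.",
    ]
    for term in ("smolt", "redd"):
        print(term, find_definitions(term, sample))
```

A real system would also need the retrieval step and the heuristics mentioned in the abstract; this sketch covers only the pattern-matching and scoring steps.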
dc.format application/pdf
dc.language en
dc.publisher University of Limerick
dc.rights In Copyright
dc.rights http://rightsstatements.org/vocab/InC/1.0/
dc.subject question answering
dc.subject definition questions
dc.subject salmon
dc.subject definitional questions
dc.subject computational linguistics
dc.subject natural language processing
dc.subject QH301
dc.subject QA75
dc.subject P1
dc.subject Q1
dc.subject AI
dc.title Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching
dc.type Thesis


Files in this item

Files       Size      Format
287_1.pdf   51.07Kb   application/pdf
287_2.pdf   266.4Kb   application/pdf
287_3.pdf   57.22Kb   application/pdf
287_4.pdf   71.69Kb   application/pdf
287_5.pdf   205.6Kb   application/pdf
287_6.pdf   38.24Kb   application/pdf
287_7.pdf   147.1Kb   application/pdf
