Basic Information

Student: Igor Shevchenko

Advisors: Alan Akbik, Christoph Boden

Degree: Master


This thesis proposes a way to build a global repository of events by exploiting topically related clusters of news articles. The automated approach is based on the hierarchical grouping of events and event mentions. First, event mentions are extracted at the sentence level from individual news articles using OpenIE techniques. Second, highly similar event mentions that refer to the same point in time are grouped within a news cluster to form unique cluster events. Finally, highly similar cluster events are grouped within the repository to form unique global events. The proposed approach is intended to extract an unrestricted set of open-domain events and all their possible textual references. Several grouping methods are implemented and evaluated on two corpora: an adapted version of the Event Coreference Bank Plus (ECB+) corpus and a manually annotated real-world corpus. The experiments indicate strong potential for the construction of a global repository of events. Cases in which the algorithm fails are illustrated, and ways of addressing the sources of error are described.
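
The two grouping levels described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: the `Mention` class, the Jaccard token overlap used as the similarity measure, the greedy first-match grouping, and the threshold of 0.5 are all assumptions chosen for brevity, and OpenIE extraction is assumed to have already produced the triples.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mention:
    """Hypothetical event mention: an OpenIE-style triple plus a date."""
    subject: str
    relation: str
    obj: str
    when: date

    def tokens(self) -> frozenset:
        text = f"{self.subject} {self.relation} {self.obj}"
        return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token overlap in [0, 1]; an assumed stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar(x: Mention, y: Mention, threshold: float) -> bool:
    """Two mentions match if they share a date and are similar enough."""
    return x.when == y.when and jaccard(x.tokens(), y.tokens()) >= threshold


def group_mentions(mentions, threshold=0.5):
    """Level 1: group the mentions of one news cluster into cluster events.

    Greedy grouping: a mention joins the first group whose representative
    (its first member) matches it; otherwise it starts a new group.
    """
    groups = []
    for m in mentions:
        for g in groups:
            if similar(g[0], m, threshold):
                g.append(m)
                break
        else:
            groups.append([m])
    return groups


def build_repository(news_clusters, threshold=0.5):
    """Level 2: group cluster events from all news clusters into global events."""
    cluster_events = []
    for mentions in news_clusters:
        cluster_events.extend(group_mentions(mentions, threshold))

    global_events = []
    for event in cluster_events:
        for g in global_events:
            # Compare representatives: first mention of the first cluster event.
            if similar(g[0][0], event[0], threshold):
                g.append(event)
                break
        else:
            global_events.append([event])
    return global_events
```

Each global event here is a list of cluster events, and each cluster event is a list of mentions, so every textual reference stays reachable from the global event it belongs to.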

