Abstract
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture information on the transactions between them, such as purchases or invoices. A major challenge is to correctly recognize real-world entities in unstructured data (e.g., documents) and associate them with those stored in structured data (e.g., enterprise databases). To address this, we propose a robust process methodology consisting of three phases: extraction of entities from documents, generation of mappings between the recognized entities and the structured data, and disambiguation of these mappings by exploiting relationships in the enterprise data and the structure of the documents.