Recent research has shown great progress on fine-grained entity typing. Most existing methods require pre-defining a set of types and training a multi-class classifier on a large labeled data set using multi-level linguistic features; they are thus limited to certain domains, genres, and languages. In this paper, we propose a novel unsupervised entity typing framework that combines symbolic and distributional semantics. We start by learning general embeddings for each entity mention, compose the embeddings of specific contexts using linguistic structures, link the mention to knowledge bases, and learn its related knowledge representations. We then develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or hand-crafted features, and can therefore be quickly adapted to a new domain, genre, or language. Furthermore, it offers great flexibility in incorporating linguistic structures (e.g., Abstract Meaning Representation (AMR), dependency relations) to improve specific context representations. Experiments on two genres (news and discussion forum) show performance comparable to state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on multiple languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
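To make the typing-by-clustering idea concrete, below is a minimal sketch, not the paper's implementation: it assumes each mention is represented by concatenating a general mention embedding, a structure-composed context embedding, and a knowledge-base embedding (all random placeholders here, and all names such as `mention_vecs` are illustrative), then groups mentions into unlabeled types via hierarchical agglomerative clustering.

```python
# Illustrative sketch only: cluster mention representations into types.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Placeholder vectors standing in for the three representation sources
# described in the abstract: general mention embeddings, context
# embeddings composed from linguistic structures (e.g., AMR edges),
# and knowledge representations of the linked KB entity.
general = rng.normal(size=(6, 50))
context = rng.normal(size=(6, 50))
kb = rng.normal(size=(6, 50))
mention_vecs = np.concatenate([general, context, kb], axis=1)

# Agglomerative clustering with average linkage over cosine distance;
# each resulting cluster is treated as one (unlabeled) entity type.
# The cut threshold t=0.6 is an arbitrary choice for this sketch.
Z = linkage(mention_vecs, method="average", metric="cosine")
cluster_ids = fcluster(Z, t=0.6, criterion="distance")
print(cluster_ids)  # cluster id per mention, e.g. [2 1 2 3 1 3]
```

The paper's actual algorithm additionally interleaves KB linking with the clustering step; the sketch above shows only the clustering half over fixed representations.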