Automatic Entity Recognition and Typing in Massive Text Data
FRIDAY, July 1, 2016 (1:30pm - 3:00pm)
Abstract: In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is important to understand the entities they mention and the relationships among those entities.
In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, products, and food) in a scalable way. Since these methods do not rely on annotated data, a predefined typing schema, or handcrafted features, they can be quickly adapted to new domains, genres, and languages. We demonstrate on real datasets covering various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba) how these typed entities aid in knowledge discovery and management.
URL for the Slides:
Xiang Ren is a Ph.D. candidate in the Department of Computer Science at the Univ. of Illinois at Urbana-Champaign. His research focuses on knowledge acquisition from text data and mining linked data. He is the recipient of the 2016 Google PhD Fellowship in Data Management and Databases, received the C. L. and Jane W.S. Liu Award and the Yahoo!-DAIS Research Excellence Award in 2015, and received the Microsoft Young Fellowship from Microsoft Research Asia in 2012.
Ahmed El-Kishky, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research interests include mining large unstructured data, text mining, and network mining. He is the recipient of both the National Science Foundation Graduate Research Fellowship and the National Defense Science and Engineering Fellowship.
Heng Ji is an Edward P. Hamilton Development Chair Associate Professor in the Computer Science Department at Rensselaer Polytechnic Institute. Her research interests focus on Natural Language Processing and its connections with Data Mining and Vision. She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013 and an NSF CAREER Award in 2009. She coordinated the NIST TAC Knowledge Base Population task in 2010, 2011, 2014, 2015 and 2016.
Jiawei Han, Abel Bliss Professor, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research areas encompass data mining, data warehousing, information network analysis, and database systems, with over 600 conference and journal publications. He is a Fellow of ACM and a Fellow of IEEE, and received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), and the IEEE Computer Society W. Wallace McDowell Award (2009). His co-authored textbook "Data Mining: Concepts and Techniques", 3rd ed., (Morgan Kaufmann, 2011) has been widely adopted around the world.