Intelligence in Wikipedia


November 11, 2008


Berners-Lee’s vision of the Semantic Web is hindered by a chicken-and-egg problem, which is best solved by a bootstrapping method: creating enough structured data to motivate the development of applications. We believe that autonomously ‘Semantifying Wikipedia’ is the best way to bootstrap. We choose Wikipedia as an initial data source because it is comprehensive, high-quality, modestly sized, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. In this talk I will present our success to date in this endeavor:

A novel approach for self-supervised learning of CRF information extractors

Automatic construction of a comprehensive ontology via statistical-relational learning

Vast improvements in extraction recall through shrinkage over this ontology and retraining

The stimulation of a virtuous feedback cycle between communal content creation and information extraction

We aim to construct a knowledge base of outstanding size to support inference, automatic question answering, and faceted browsing, and potentially to bootstrap the Semantic Web.

Speaker: Daniel S. Weld
Daniel S. Weld is Thomas J. Cable / WRF Professor of Computer Science and Engineering at the University of Washington. After formative education at Phillips Academy, he received bachelor’s degrees in both Computer Science and Biochemistry at Yale University in 1982. He earned a Ph.D. from the MIT Artificial Intelligence Lab in 1988, received a Presidential Young Investigator’s award in 1989 and an Office of Naval Research Young Investigator’s award in 1990, and was named an AAAI Fellow in 1999 and an ACM Fellow in 2005. Dan is an area editor for the Journal of the ACM and serves on the editorial board of Artificial Intelligence. He was a founding editor and member of the advisory board for the Journal of AI Research, was guest editor for Computational Intelligence and Artificial Intelligence, edited the AAAI report on the Role of Intelligent Systems in the National Information Infrastructure, and was Program Chair for AAAI-96. Dan has published two books and scads of technical papers.

Dan is an active entrepreneur with several patents and technology licenses. In May 1996, he co-founded Netbot Incorporated, the creator of Jango Shopping Search; Netbot was later acquired by Excite. In October 1998, Dan co-founded AdRelevance, a revolutionary monitoring service for internet advertising, which was acquired by Media Metrix and subsequently by Nielsen NetRatings. In June 1999, Dan co-founded the data integration company Nimble Technology, which was acquired by the Actuate Corporation. In January 2001, Dan joined the Madrona Venture Group as a Venture Partner and member of the Technical Advisory Board.