Concepts in the Lexicon: Introduction

John F. Sowa

The lexicon is the bridge between a language and the knowledge expressed in that language. Every language has a different vocabulary, but every language provides grammatical mechanisms for combining its stock of words to express an open-ended range of concepts. Languages nevertheless differ in three ways: in their grammars, in their vocabularies, and in the concepts those words and grammars express.

Grammars and words belong to the province of linguistics, but the concepts they express belong to the province of extra-linguistic knowledge about the world. For each language, the lexicon must provide the links that enable a language processor to carry messages from one province to the other.

Besides accommodating the idiosyncrasies of each language, the lexicon must support all the possible uses of language. Each use has a different purpose, which requires a different kind of information. A simple spelling checker, for example, can catch many errors with nothing but a list of words. To distinguish there from their, however, it must contain syntactic information. To distinguish sight from site, it must also contain semantic information. And to distinguish infer from imply, it must contain enough information to enable a language processor to recognize the context, the topic, and the logical inferences necessary to determine what is being inferred or implied.
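
To make that escalation concrete, the following small sketch in Python (the word lists and the rule are hypothetical toys, not taken from the original papers) shows the first step: a checker that consults only a word list accepts "Their is a problem," while one whose lexicon also records a syntactic hook for each word can reject it.

    # Toy illustration of escalating lexical information (hypothetical data).
    # Level 1: a bare word list -- catches misspellings, nothing more.
    WORD_LIST = {"their", "there", "is", "a", "problem", "book"}

    # Level 2: a syntactic hook for each word (here just a part of speech).
    SYNTAX = {
        "their": "determiner",   # should introduce a noun phrase
        "there": "adverb",
        "is": "verb",
        "a": "determiner",
        "problem": "noun",
        "book": "noun",
    }

    def spelling_ok(words):
        """Level 1: every word is correctly spelled; nothing else is checked."""
        return all(w in WORD_LIST for w in words)

    def their_there_ok(words):
        """Level 2: a crude syntactic test -- 'their' must introduce a noun."""
        for i, w in enumerate(words):
            if w == "their":
                nxt = words[i + 1] if i + 1 < len(words) else None
                if SYNTAX.get(nxt) != "noun":
                    return False
        return True

    sentence = "their is a problem".split()
    print(spelling_ok(sentence))      # True:  the word list sees no error
    print(their_there_ok(sentence))   # False: the syntactic hook catches it

Distinguishing infer from imply lies beyond anything such a table can express; it requires the semantic and pragmatic information discussed in the parts that follow.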

The demands on the lexicon also vary with the type of application: speech transcription, information retrieval, information extraction, text summarization, message classification, question answering, machine translation, and discourse understanding. Each application can be processed at levels of detail ranging from a rough approximation triggered by keywords to a deep understanding that applies all the resources of syntax, semantics, and pragmatics. As a bridge, the lexicon is partly language dependent, partly language independent, and partly domain and application dependent. It need not contain all the information about the language and domain, but it must contain the hooks that link the language-dependent words to the language-dependent grammar and to the language-independent but domain-dependent conceptual structures.
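
One way to picture those hooks is the following sketch (the field names and the tiny type hierarchy are hypothetical, chosen only for illustration): each entry pairs a language-dependent word form and its grammatical category with a pointer into a language-independent but domain-dependent hierarchy of concept types.

    # Hypothetical sketch of the "hooks" described above; none of these
    # names come from the original papers.
    from dataclasses import dataclass, field

    # Language-independent, domain-dependent side: a small concept type
    # hierarchy of the kind a knowledge base might supply.
    CONCEPT_HIERARCHY = {
        "Site": "Place",
        "Sight": "Perception",
        "Place": "Entity",
        "Perception": "Event",
    }

    @dataclass
    class LexicalEntry:
        """One hook: a word form plus its grammar and its concept type."""
        word: str                 # language-dependent spelling
        pos: str                  # link to the language-dependent grammar
        concept_type: str         # link into the concept hierarchy
        features: dict = field(default_factory=dict)

    LEXICON = [
        LexicalEntry("site", "noun", "Site"),
        LexicalEntry("sight", "noun", "Sight"),
    ]

    def supertypes(concept):
        """Walk up the hierarchy -- the domain knowledge the lexicon points to."""
        chain = [concept]
        while concept in CONCEPT_HIERARCHY:
            concept = CONCEPT_HIERARCHY[concept]
            chain.append(concept)
        return chain

    for entry in LEXICON:
        print(entry.word, "->", supertypes(entry.concept_type))
    # site  -> ['Site', 'Place', 'Entity']
    # sight -> ['Sight', 'Perception', 'Event']

In Parts II and III, links of this kind are expressed in logic, using either predicate calculus or conceptual graphs, rather than in program code.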

This document is a revised, reorganized, and updated compilation of material extracted from several papers by John Sowa. The major contributions are taken from three papers (Sowa 1988, 1992a, 1993). Additional material has been excerpted from several other papers (Sowa 1991, 1998, 1999, Sowa & Way 1986), and the terminology and notation have been revised to conform to the book Knowledge Representation (Sowa 2000). The result is organized in three parts:

  1. Problems and Issues. Part I is a survey of linguistic examples that impose requirements on the kinds of knowledge that must be represented in the lexicon. It emphasizes the problems and their implications rather than the details of any particular theory or notation.

  2. Representations. Part II addresses the structure of the lexicon and its links to syntax, semantics, and world knowledge. It uses logic as a theory-neutral representation and shows how other representations, both theoretical and computational, can be translated to logic in either the predicate calculus or conceptual graph notations.

  3. Language Processing. Part III shows how the lexicon is used in language parsing, information extraction, semantic interpretation, discourse analysis, and ambiguity resolution. It shows how the problems and issues raised in Part I can be addressed by using the lexical representations introduced in Part II.

The combined bibliography is located in the reference section.


Send comments to John F. Sowa.
