Meaning-Preserving Translations

by John F. Sowa

Informally, different statements in different languages can mean "the same thing." Formally, that "thing," called a proposition, represents abstract, language-independent, semantic content. As an abstraction, a proposition has no physical embodiment that can be written or spoken. Only its statements in particular languages can be expressed as strings of symbols.

According to Peirce (1905), "The meaning of a proposition is itself a proposition. Indeed, it is no other than the very proposition of which it is the meaning: it is a translation of it." Mathematically, Peirce's informal statement may be formalized by defining a proposition as an equivalence class of sentences that can be translated from one to another while preserving meaning. Some further criteria are necessary to specify what kinds of translations are considered to "preserve meaning." Formally, a meaning-preserving translation f from a language L1 to a language L2 may be defined as a function that satisfies the following constraints:

  1. Invertible. There is a translation g from L2 back to L1 that serves as an inverse of f.
  2. Proof preserving. For any sentence s in L1, the sentences s and g(f(s)) are provably equivalent.
  3. Vocabulary preserving. The content words or predicates of s and g(f(s)) are identical or synonymous.
  4. Structure preserving. When expressed in a common primitive notation, such as ∧, ~, and ∃, s and g(f(s)) have the same structure.

These four criteria ensure that the sentences s and g(f(s)) are highly similar, if not identical. If s is the sentence "Every farmer who owns a donkey beats it," then the sentence g(f(s)) might be "If a farmer x owns a donkey y, then x beats y." Those sentences use different logical and syntactical symbols, but they are provably equivalent, they have the same content words, and they have the same structure when expressed with only ∧, ~, and ∃.

Attempts to apply formal definitions to natural languages are fraught with pitfalls, exceptions, and controversies. To avoid such problems, the definition of meaning-preserving translation may be restricted to formal languages, like CGs and KIF. The sample sentence in Figure 5.12 could be defined as part of a formal language called stylized English, which happens to contain many sentences that look like English. Yet even for formal languages, the four criteria require further explanation and justification:

These conditions impose strong constraints on translations that are said to preserve meaning. They ensure that the content words or predicates remain identical or synonymous, they preserve the logical structure, and they prevent irrelevant content from being inserted.

Examples of Meaning-preserving Translations.  To illustrate the issues, consider meaning-preserving translations between two different notations for first-order logic. Let L1 be predicate calculus with Peano's symbols ∨, ∧, ~, ⊃, ∃, and ∀, and let L2 be predicate calculus with Peirce's symbols +, ×, -, -<, Σ, and Π. Then for any formulas or subformulas p and q in L1, let f produce the following translations in L2:

  • p∨q is translated to -(-p × -q).
  • p∧q is translated to p×q.
  • ~p is translated to -p.
  • p⊃q is translated to -(p × -q).
  • (∃x)p is translated to Σx p.
  • (∀x)p is translated to -Σx -p.

The sentences generated by f use only the operators ×, -, and Σ, but the inverse g is defined for all operators in L2: it maps each of Peirce's operators back to the corresponding Peano symbol. The functions f and g meet the criteria for meaning-preserving translations: they are invertible, proof preserving, vocabulary preserving, and structure preserving. Furthermore, the proof of equivalence can be done in linear time by showing that two sentences s and t in L1 map to the same form with the symbols ∧, ~, and ∃.
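As an illustration added here (not part of the original text), the following Python sketch implements f and its inverse g over a small tuple encoding of formulas; the encoding and the ASCII operator names ('x' for ×, 'S' for Σ, 'P' for Π, '-<' for Peirce's claw) are choices made for readability, not part of the original definition.

    # A sketch of f and g over formulas encoded as nested tuples; atoms are plain
    # strings.  ASCII stand-ins for Peirce's symbols: 'x' for the times sign, '-' for
    # negation, '-<' for the claw, 'S' for Sigma, 'P' for Pi.

    def f(s):
        """Translate an L1 formula to L2, producing only the operators 'x', '-', and 'S'."""
        if isinstance(s, str):
            return s                                      # atomic formula
        op = s[0]
        if op == 'or':                                    # p or q       ->  -(-p x -q)
            return ('-', ('x', ('-', f(s[1])), ('-', f(s[2]))))
        if op == 'and':                                   # p and q      ->  p x q
            return ('x', f(s[1]), f(s[2]))
        if op == 'not':                                   # ~p           ->  -p
            return ('-', f(s[1]))
        if op == 'implies':                               # p implies q  ->  -(p x -q)
            return ('-', ('x', f(s[1]), ('-', f(s[2]))))
        if op == 'exists':                                # (Ex)p        ->  Sx p
            return ('S', s[1], f(s[2]))
        if op == 'forall':                                # (Ax)p        ->  -Sx -p
            return ('-', ('S', s[1], ('-', f(s[2]))))
        raise ValueError(f'unknown L1 operator: {op}')

    def g(t):
        """Map every L2 operator back to the corresponding L1 operator."""
        if isinstance(t, str):
            return t
        op = t[0]
        if op == '+':  return ('or', g(t[1]), g(t[2]))
        if op == 'x':  return ('and', g(t[1]), g(t[2]))
        if op == '-':  return ('not', g(t[1]))
        if op == '-<': return ('implies', g(t[1]), g(t[2]))
        if op == 'S':  return ('exists', t[1], g(t[2]))
        if op == 'P':  return ('forall', t[1], g(t[2]))
        raise ValueError(f'unknown L2 operator: {op}')

    # The round trip g(f(s)) re-expresses s with only 'and', 'not', and 'exists'.
    s = ('implies', ('exists', 'x', 'farmer(x)'), ('exists', 'y', 'donkey(y)'))
    print(g(f(s)))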

The functions f and g in the previous example show that it is possible to find functions that meet the four criteria. They map two sentences to the same equivalence class only when the translation from one to the other can be said to "preserve meaning" in a very strict sense, but they leave many closely related sentences in different classes: permutations such as p∧q and q∧p; duplications such as p, p∧p, and p∧p∧p; and formulas with renamed variables such as (∃x)P(x) and (∃y)P(y). To include more such sentences in the same equivalence classes, a series of functions f1, f2, ..., can be defined, all of which have the same inverse g (the first three reductions are sketched in code after the following list):

  1. Sorting. The function f1 makes the same symbol replacements as f, but it also sorts conjunctions in alphabetical order. As a result, p∧q and q∧p in L1 would both be mapped to p×q in L2, which would be mapped by g back to p∧q. Therefore, f1 groups permutations in the same equivalence class. Since a list of N terms can be sorted in time proportional to N log N, the function f1 takes just slightly longer than linear time.

  2. Renaming variables. The function f2 is like f1, but it also renames the variables to a standard sequence, such as x1, x2, ... . For very long sentences with dozens of variables of the same type, the complexity of f2 could increase exponentially. A typed logic can help reduce the number of options, since the new variable names could be assigned in the same alphabetical order as their type labels. For the kinds of sentences used in human communications, most variables have different types, and the computation time for f2 would be nearly linear.

  3. Deleting duplicates. After f1 and f2 sort conjunctions and rename variables, the function f3 would eliminate duplicates by deleting any conjunct that is identical to the previous one. The deletions could be performed in linear time.
For the kinds of sentences that people speak and understand, the total computation time of all three functions would be nearly linear. Although it is possible to construct sentences whose computation time would increase exponentially, those sentences would be hopelessly unintelligible to humans. What is unnatural for humans would be inefficient for computers.
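As an illustration added here (not part of the original text), the following Python sketch applies the three reductions to the tuple encoding used in the earlier sketch, with ('pred', name, arguments) tuples assumed for atomic formulas. Flattening nested conjunctions and taking Python's default string ordering as the "alphabetical" order are simplifying assumptions, and each reduction is shown as a separate step that is composed at the end, rather than folding the earlier steps into each later function as the text describes.

    from itertools import count

    def flatten(s):
        """Flatten nested conjunctions so that and(p, and(q, r)) becomes and(p, q, r)."""
        if not isinstance(s, tuple):
            return s
        if s[0] == 'and':
            parts = []
            for c in map(flatten, s[1:]):
                parts.extend(c[1:] if isinstance(c, tuple) and c[0] == 'and' else [c])
            return ('and',) + tuple(parts)
        return (s[0],) + tuple(flatten(c) for c in s[1:])

    def f1(s):
        """Sorting: put the conjuncts of every conjunction into a standard order."""
        s = flatten(s)
        if not isinstance(s, tuple):
            return s
        if s[0] == 'and':
            return ('and',) + tuple(sorted((f1(c) for c in s[1:]), key=repr))
        return (s[0],) + tuple(f1(c) for c in s[1:])

    def f2(s, names=None, mapping=None):
        """Renaming: give quantified variables the standard names x1, x2, ... ."""
        names = names if names is not None else count(1)
        mapping = mapping or {}
        if not isinstance(s, tuple):
            return mapping.get(s, s)                      # variable occurrence or constant
        if s[0] in ('exists', 'forall'):
            new = f'x{next(names)}'
            return (s[0], new, f2(s[2], names, {**mapping, s[1]: new}))
        return (s[0],) + tuple(f2(c, names, mapping) for c in s[1:])

    def f3(s):
        """Deleting duplicates: drop any conjunct identical to the previous one."""
        if not isinstance(s, tuple):
            return s
        if s[0] == 'and':
            kept = []
            for c in (f3(x) for x in s[1:]):
                if not kept or c != kept[-1]:
                    kept.append(c)
            return kept[0] if len(kept) == 1 else ('and',) + tuple(kept)
        return (s[0],) + tuple(f3(c) for c in s[1:])

    def canonical(s):
        """Sort the conjuncts, rename the variables, then delete duplicate conjuncts."""
        return f3(f2(f1(s)))

    # q and (p and p) reduces to the same canonical form as p and q:
    print(canonical(('and', 'q', ('and', 'p', 'p'))) == canonical(('and', 'p', 'q')))
    # (Ex)P(x) and (Ey)P(y) also reduce to the same form:
    print(canonical(('exists', 'x', ('pred', 'P', 'x'))) ==
          canonical(('exists', 'y', ('pred', 'P', 'y'))))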

This series of functions shows how large numbers of closely related sentences can be reduced to a single canonical form. If two sentences express the same proposition, their canonical forms, which can usually be calculated efficiently, would be the same. The function f2 has the effect of reducing sentences to Peirce Normal Form (PNF), the result of translating a sentence from predicate calculus to an existential graph and back again. As an example, consider the following sentence, which Leibniz called the Praeclarum Theorema (splendid theorem):

  • ((p ⊃ r) ∧ (q ⊃ s)) ⊃ ((p ∧ q) ⊃ (r ∧ s)).

This formula may be read "If p implies r and q implies s, then p and q imply r and s." When translated to L2 by f3 and back to L1 by g, it has the following Peirce Normal Form:

  • ~((~(p ∧ ~r) ∧ ~(q ∧ ~s)) ∧ ~~((p ∧ q) ∧ ~(r ∧ s))).

This form is not as readable as the original, but it serves as the canonical representative of an equivalence class that contains 864 different, but highly similar, sentences. The function f3, which deletes duplicates, can reduce an infinite number of sentences to the same form. Such transformations can factor out the differences caused by the choice of symbols or syntax.
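As a quick check, added here for readers who want to verify the equivalence claim, the following brute-force comparison confirms that the theorem and the normal form above are true under all sixteen truth assignments:

    from itertools import product

    def theorem(p, q, r, s):
        """((p implies r) and (q implies s)) implies ((p and q) implies (r and s))."""
        antecedent = (not p or r) and (not q or s)
        consequent = not (p and q) or (r and s)
        return not antecedent or consequent

    def pnf(p, q, r, s):
        """~((~(p and ~r) and ~(q and ~s)) and ~~((p and q) and ~(r and s)))."""
        return not ((not (p and not r) and not (q and not s))
                    and not (not ((p and q) and not (r and s))))

    # Both formulas are tautologies: true under every assignment of truth values.
    assert all(theorem(*v) and pnf(*v) for v in product([False, True], repeat=4))
    print("The Praeclarum Theorema and its Peirce Normal Form agree on all 16 assignments.")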

    To account for synonyms and definitions, another function f4 could be used to replace terms by their defining lambda expressions. If recursions are allowed, the replacements and expansions would be equivalent in computing power to a Turing machine; they could take exponential amounts of time or even be undecidable. Therefore, f4 should only expand definitions without recursions, direct or indirect. Since the definitions may introduce permutations, duplications, and renamed variables, f4 should expand the definitions before performing the reductions computed by f3. Without recursion, the expansions would take at most polynomial time.
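A small Python sketch of the idea behind f4 follows; the dictionary DEFS and the bachelor definition are hypothetical illustrations (not from the original text), and a definition that leads back to itself, directly or indirectly, is rejected rather than expanded.

    # Each defined predicate maps to a lambda expression: a parameter list and a body.
    DEFS = {
        'bachelor': (('x',), ('and', ('pred', 'man', 'x'),
                              ('not', ('pred', 'married', 'x')))),
    }

    def substitute(body, mapping):
        """Replace the formal parameters of a definition by the actual arguments."""
        if not isinstance(body, tuple):
            return mapping.get(body, body)
        return (body[0],) + tuple(substitute(c, mapping) for c in body[1:])

    def f4(s, defs=DEFS, expanding=frozenset()):
        """Expand defined predicates by their lambda expressions, rejecting recursion."""
        if not isinstance(s, tuple):
            return s
        if s[0] == 'pred' and s[1] in defs:
            if s[1] in expanding:
                raise ValueError(f'recursive definition of {s[1]} is not expanded')
            params, body = defs[s[1]]
            expanded = substitute(body, dict(zip(params, s[2:])))
            return f4(expanded, defs, expanding | {s[1]})
        return (s[0],) + tuple(f4(c, defs, expanding) for c in s[1:])

    # bachelor(b) expands to man(b) and not married(b):
    print(f4(('pred', 'bachelor', 'b')))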

    Meaning in Natural Languages.  When functions like the fi series are extended to natural languages, they become deeply involved with the problems of syntax, semantics, and pragmatics. In his early work on transformational grammar, Noam Chomsky (1957) hoped to define transformations as meaning-preserving functions. But the transformations that moved phrases and subphrases had the effect of changing the scope of quantifiers and the binding of pronouns to their antecedents:

    To account for the implications of such transformations, Chomsky (1982) developed his theory of government and binding, which replaced all transformations by a single operator called move-α and a set of constraints on where the phrase α could be moved. In his most recent minimalist theory, Chomsky (1995) eliminated movement altogether and formulated the principles of grammar as a set of logical constraints. With that theory, both language generation and interpretation become constraint-satisfaction problems of the kind discussed in Section 4.6. The common thread running through these theories is Chomsky's search for a syntax-based characterization of the meaning-preserving translations.

    AI-based computational linguistics has also involved a search for meaning-preserving translations, but with more emphasis on semantics and pragmatics than on syntax. Roger Schank (1975), for example, developed his conceptual dependency theory as a canonical representation of meaning with an ontology of eleven primitive action types. Although Schank was strongly opposed to formalization of any kind, his method of reducing a sentence to canonical form could be viewed as a version of function f4. In his later work (Schank & Abelson 1977; Schank 1982), he went beyond the sentence to higher-level structures called scripts, memory organization packets (MOPs), and thematic organization packets (TOPs). These structures, which have been implemented in framelike and graphlike versions of EC logic, address meaning at the level of paragraphs and stories. Stuart Shapiro and his colleagues have implemented versions of propositional semantic networks, which support similar structures in a form that maps more directly to logic (Shapiro 1979; Shapiro & Maida 1982; Shapiro & Rapaport 1992). Shapiro's propositional nodes serve the same purpose as Peirce's ovals and McCarthy's contexts.

    Besides the structural forms of syntax and logic, the meaning-preserving translations for natural languages must account for the subtle interactions of many thousands of words. The next two sentences, for example, were adapted from a news report on finance:

    The first sentence implies that inflation is not increasing, but the second one implies that it is. The negation, which is critical for understanding the sentences, does not appear explicitly. Instead, it comes from an implicit negation in the meaning of the noun concern: if some agent x has a concern about y, then x hopes that some bad event does not happen to y. The concern is eased when the bad event is less likely to occur, and the concern is heightened when the bad event is more likely to occur. In the normal use of language, people understand such sentences and their implications. For a computer to understand them, it would require detailed definitions of the words, background knowledge that rising inflation is bad for the economy, and the reasoning ability to combine such information. Doug Lenat and his group in the Cyc project have been working since 1984 on the task of encoding and reasoning with the millions of rules and facts needed for such understanding.

