A system-theoretical view of EBMT


Mach Translat (2005) 19:229–249
DOI 10.1007/s10590-006-9012-8

ORIGINAL PAPER

A system-theoretical view of EBMT

Michael Carl

Received: 30 January 2006 / Accepted: 30 August 2006 / Published online: 21 December 2006
© Springer Science+Business Media B.V. 2006

Abstract  According to the system theory of von Bertalanffy (1968), a "system" is an entity that can be distinguished from its environment and that consists of several parts. System theory investigates the role of the parts, their interaction and the relation of the whole with its environment. System theory of the second order examines how an observer relates to the system. This paper traces some of the recent discussion of example-based machine translation (EBMT) and compares a number of EBMT and statistical MT systems. It is found that translation examples are linguistic systems themselves that consist of words, phrases and other constituents. Two properties of Luhmann's (2002) system theory are discussed in this context: EBMT has focussed on the properties of structures suited for translation and the design of their reentry points, while SMT develops connectivity operators which select the most likely continuations of structures. While technically the SMT and EBMT approaches complement each other, the principal distinguishing characteristic results from the different sets of values which SMT and EBMT followers prefer.

Keywords  Example-based machine translation · Statistical machine translation · System theory · Emergent behaviour · Statistical EBMT

1 Introduction

There is an ongoing discussion as to whether statistical machine translation (SMT) and example-based machine translation (EBMT) systems are different and how they could be characterised. Some of the questions are repeated here:1

1 On the MT-List in the summer of 2004 (www.mail-archive.com/[email protected]/msg00716.html), in the spring of 2005 (www.mail-archive.com/[email protected]/msg00772.html), and in various papers and conferences.

M. Carl (B) Institut für Angewandte Informationsforschung, 14 Martin-Luther-Str, 66111 Saarbrücken, Germany e-mail: [email protected]

(a) Is SMT defined by using the noisy channel model?
(b) Do EBMT and SMT differ in the amount of parallel data needed?
(c) Are there essential differences in the learning methods of SMT and EBMT?
(d) Are representations a defining criterion in distinguishing between EBMT and SMT?
(e) Is phrase-based SMT just EBMT with a statistical twist, using n-grams instead of syntactic reasoning?
(f) Does the distinction between SMT and EBMT depend on whether discontinuous or continuous phrases are used?
(g) Is how the data is used more important than how it is represented?
(h) Does EBMT store as many examples as possible, while SMT dissolves examples in statistics?

This paper first outlines classification schemas for data-driven MT in general and the research questions emerging from them. It then gives an overview of some of the definitions of EBMT and SMT held to date. As these definitions are controversial, we look into the details of some system descriptions. Different modes of processing, run-time versus preprocessing, and models of representations, rules versus statistics, are examined. We find that (a) no distinguishing properties can be determined and (b) there is a mutual dependency between the models of representation and the models of operation. That is, changing modes of operation entails a change in the representations and vice versa.

EBMT is reconsidered from a system-theoretical point of view. By tracing the development of EBMT systems we see that translation examples have been considered systems themselves, and properties of these example systems are discussed, such as their compositionality and to what extent they can be reused for translation. SMT, in contrast, has formalised the estimation and reuse of statistical properties and developed generative operators that combine the data in the most likely way. However, recently, EBMT systems have integrated statistical operators and SMT systems have made use of complex representations, so that some of the above questions are now being asked within one system architecture, rather than as defining two competing schools of data-driven MT.

EBMT and SMT approaches seem to diverge in the goals and expectations of the translation systems: while designers of SMT systems seek to optimise an average statistical translation quality, measured on a set of random sentences, followers of EBMT seem to be more interested in increasing the accuracy of the generalisation and reproduction capacities of the example systems. It is to date unclear whether and to what extent these values are complementary and compatible.

2 Three-dimensional MT classification spaces

In this section, we look at three attempts to classify MT systems in a three-dimensional space. In order to allow a general view of MT and of computational tools that help the translation process, Maegaard et al. (1999) define MT as "includ[ing] any computer-based process that transforms (or helps a user to transform) written text from one human language into another". They assume different qualities of MT system output according to whether it makes use of symbolic or statistical methods. This is shown in Table 1, where the implication is that statistical methods will dominate

Table 1  Comparing symbolic and statistical methods (based on Maegaard et al. 1999)

Quality of MT system      Symbolic   Statistical
Robustness/coverage:      Lower      Higher
Quality/fluency:          Higher     Lower
Representation:           Deeper     Shallower

where robustness and wide coverage are at issue, while symbolic methods are more relevant where translation quality is important. Supposing that all useful systems will be hybrids in the future, Maegaard et al. (1999) ask the burning, but as yet unanswered, question as to which aspects of MT systems are best approached by statistical methods, and which by traditional, linguistic ones.

Carl (2000) approaches a related question by elaborating a three-dimensional MT-model space according to the assumptions of the underlying theory of meaning that the MT system implements. Theories of meaning are characterised by three dichotomies: they can be holistic or molecular, austere or rich, and fine-grained or coarse-grained. A system with a rich theory of meaning computes complex representations including morphology, syntax and semantics, while an austere theory relies on the mere graphemic surface form of the text. A holistic theory derives meaning descriptions from a set of reference translations by examining their similarities and differences; in a molecular approach, the meaning descriptions are obtained from a finite set of predefined features. In a fine-grained theory, the minimal length of translation units is equivalent to a morpheme, while in a coarse-grained theory translation units are morpheme clusters, phrases or even sentences.

Wu (2006) classifies MT systems along three axes: statistical vs. compositional vs. example-based. His aim is to offer an analytical perspective on EBMT from an SMT standpoint. His classification largely parallels the model of Carl (2000). The dichotomy fine-grained vs. coarse-grained coincides roughly with "schema-based" vs. "example-based" in Wu's MT space: translation units are likely to be coarse-grained when using a large library of examples (Wu 2006, p 216), while they will be finer-grained the more the schemas are abstracted. The axis molecular vs. holistic is related to "logical" vs. "statistical", since statistics can be used to derive shades of meaning distinctions from corpora, while mathematical logic is required to compose larger meaning entities from finite sets of features. The dimension austere vs. rich relates to "lexical" vs. "compositional" insofar as mere lexical translations will be close to their graphemic surface forms, while rich representations are required for compositional translations.

The three-dimensional MT-model spaces discussed in this section are homomorphic in the sense of Ashby (1956): even though these spaces are differently described and conceptualised, they all seek to define a framework for finding answers about how to integrate symbolic and statistical methods with rules and corpora, how to find the "right combination of ingredients" (Wu 2006, p 223), and how to evaluate and measure the success of the techniques used. Thus, translation memories are lexical, logical and example-based in Wu's space and austere, molecular and coarse-grained in Carl's space. The more a system becomes holistic, coarse-grained and rich in Carl's terms, the more it becomes statistical, example-based and compositional in Wu's terminology.
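The homomorphism between the two classification spaces can be made concrete. The following is a minimal illustrative sketch (the dictionary and function names are mine, not the paper's): each pole of Carl's three dichotomies is mapped onto its rough counterpart in Wu's space.

```python
# Mapping of Carl's (2000) poles onto Wu's (2006) roughly corresponding poles,
# as described in the text. Illustrative only.
CARL_TO_WU = {
    "austere": "lexical",            "rich": "compositional",
    "molecular": "logical",          "holistic": "statistical",
    "fine-grained": "schema-based",  "coarse-grained": "example-based",
}

def to_wu_space(carl_point):
    """Translate a system described by Carl's poles into Wu's poles."""
    return tuple(CARL_TO_WU[pole] for pole in carl_point)

# A translation memory is austere, molecular and coarse-grained in Carl's
# space, which maps to lexical, logical and example-based in Wu's space.
tm = ("austere", "molecular", "coarse-grained")
print(to_wu_space(tm))  # ('lexical', 'logical', 'example-based')
```

The mapping is a simple bijection on poles, which is exactly what makes the spaces homomorphic in Ashby's sense: structure is preserved, only the vocabulary changes.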


However, while such spaces give a clear picture of the components of MT systems, they do not in themselves explain what the challenges and ruptures are when moving from one point to another, nor whether the points in the spaces are evenly distributed or even reachable. There also seem to be differences in how to measure the "right combination of ingredients": while Wu seeks answers which "ultimately help determine the optimum point in the MT model space" (Wu 2006, p 223), for Carl (2000) there is no such singular optimum point. Rather, it is expected that the more a system makes use of coarse-grained and holistic translation units, the higher the translation quality, and the more the theory applies rich representations, the broader the coverage of the system will be. We will come back to this issue later, but first take a closer look at other definitions of EBMT.

3 Definitions of EBMT

According to Somers (2003), there are necessary and sufficient conditions for a system to count as EBMT. Necessary is the use of a bilingual corpus. However, since many rule-based systems also use bilingual corpora, Somers (2003, p 46) introduces the term "primarily example-based" to refer to MT systems that, among other things, are data-, rather than theory-driven. They use data to cover real language constructions, and they provide relief from structure-preserving translation, which is an unsolved problem in the rule-based paradigm. The term "primarily example-based" suggests descriptions rather than definitions of EBMT. While statistical approaches are seen to be a subset of EBMT, Somers perceives EBMT as an alternative, complementing rule-based methods "to enhance and, sometimes, replace it" (ibid., p 47) rather than as a rival. With this assumption he follows Sumita et al. (1990), who suggest that some phenomena may be suited to EBMT while others are better tackled by rule-based MT (RBMT). We will give some examples in Sect. 8.

Turcato and Popowich (2003) compare a variety of EBMT systems to lexicalist MT and find that almost all of the methods used in EBMT are also used in lexicalist MT. The overriding characteristic of EBMT is thus "translation by analogy", a concept originally introduced by Nagao (1984). Translation by analogy assumes that we cannot know a priori which parts of an example are relevant for the translation of a new sentence. Any preprocessed generalisation and decomposition of the examples would undermine this analogy principle and finally resemble known MT approaches. According to this definition, a true EBMT implementation is described, for example, in Doi et al. (2005) and, based on proportional analogy, in Lepage and Denoual (2005). In conclusion, they ask MT practitioners to emphasise what the different approaches have in common, since the real challenge is "combining different approaches and insights into a comprehensive whole" (Turcato and Popowich 2003, p 60), for which determination of the real differences is a prerequisite.

Hutchins (2005) (see also Hutchins (2006) for a very similar analysis) defines EBMT in a broader sense. According to him, the core transfer process in EBMT involves the matching of source-language fragments and the extraction of equivalent target-language fragments from a bilingual corpus as partial potential translations. Whether the corpus is preprocessed or not and what structure these fragments have is not a defining criterion for EBMT. The transfer is example-based because it is "performed with reference to databases of paired [source- and target-language] sentences and phrases" (Hutchins 2005, p 68, emphasis original). He then contrasts EBMT with SMT and RBMT and finds that some EBMT approaches differ little from RBMT and others differ little from SMT. Phrase-based SMT and EBMT can be regarded "as variants of a single framework" (ibid., p 69), and EBMT systems which make use of dependency and phrase-structure trees "appear to be variants of traditional RBMT" (idem.).

Turcato and Popowich (2003) and Hutchins (2005, 2006) assume that the way knowledge is acquired does not contribute to the characterisation of EBMT; rather, "what matters is how the knowledge is used in operation" (Hutchins 2006, p 199). Similarly, for Hovy (2005) it seems that the "method of construction is less important than the method of operation". We will now investigate what might count as methods of operation and methods of construction, and conclude that we cannot draw a meaningful distinction between the methods used in EBMT and those used in SMT. We will then take a look from a system-theoretical perspective and investigate whether this leads to additional insights.
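The "proportional analogy" behind Lepage and Denoual (2005), mentioned above, solves analogical equations of the form A : B :: C : D over strings. The following is a deliberately naive sketch of the idea: it only handles the case where B differs from A in its suffix, whereas the real algorithm is far more general.

```python
def solve_analogy(a, b, c):
    """Solve the analogical equation a : b :: c : ? for the simple case
    where b differs from a only in its suffix. Returns None otherwise.
    """
    # Strip the longest common prefix of a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    tail_a, tail_b = a[i:], b[i:]
    # The analogy holds if c also ends in a's tail; swap it for b's tail.
    if c.endswith(tail_a):
        return c[:len(c) - len(tail_a)] + tail_b
    return None

print(solve_analogy("walk", "walked", "jump"))              # jumped
print(solve_analogy("il parle", "il parlera", "il mange"))  # il mangera
```

Even this toy version illustrates the analogy principle: no part of the examples is generalised or decomposed in advance; the relevant correspondence is established only when a new input arrives.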

4 Run-time versus preprocessing

This distinction assumes that we could tell SMT from EBMT by looking at whether or not the system makes use of the parallel corpus at run-time. However, we claim that it is preferable for all imaginable types of systems to preprocess a maximum amount of information, to avoid repeated computation of the same information at run-time. Once the information is computed, it can be stored, indexed and efficiently retrieved. So why would not all systems preprocess all information? The answer is that not all information can be preprocessed. Whether information can be preprocessed depends on two factors: (a) the time and space required to compute and store the data, and (b) whether new information is expected at run-time that makes preprocessing of the data impossible.

Lack of space and time prevents the generation of backoff language models above a given n. In most cases it will be inefficient to compute all n-grams for realistically sized texts, simply because there are too many. For instance, Brown et al. (1990) estimate 81 million parameters from 40,000 pairs of sentences when taking into account 9,000 word-to-word translations and a trigram language model. While they could still preprocess all the data, many more parameters will be required when phrase translations are considered and, say, 12-gram language models. If n-grams are needed beyond a given n, they need to be computed at run-time by examining the reference text. Accordingly, Zhang and Vogel (2005) compute phrase-to-phrase translations of arbitrary length at run-time in an SMT system. They use suffix arrays that allow efficient retrieval of phrases in the training data and their probabilistic translation alignment. It is interesting to note that exactly the same technology is also used for "searchable translation memories" (Callison-Burch et al. 2005), with the difference that a user is presented with the search results. We have here techniques which come close to the original idea of translation by analogy, but interestingly used in an SMT system and in a translation memory. On the other hand, even the most "pure" EBMT systems, for instance Lepage and Denoual (2005), use sentence pairs that are aligned, and thus preprocessed, in advance, and many EBMT systems use even more sophisticated (for instance linguistic) approaches to prepare a bilingual corpus in advance (Sato and Nagao 1990; Way 2003; Quirk and Menezes 2006).

In other cases, preprocessing is only partially possible due to the lack of complete information before the translation process starts. For instance, a system that is "open" to adaptation to new reference translations which become available at run-time cannot preprocess that information. In both cases, whether the system is open or whether information can only be partially preprocessed due to lack of space, the result is a mixture of run-time and compilation-time processing: whenever the choice exists to organise the data in one way or the other, the reason to choose run-time or compilation-time operations and appropriate representations is to increase the quality and/or speed of the translation and to allow it to adapt to less likely cases. Thus, we have run-time and compilation-time SMT systems, and we have run-time and compilation-time EBMT systems, which lets us conclude that the distinction run-time versus compilation-time is probably an important detail of the system architecture, but one which does not distinguish major processing paradigms.
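The run-time phrase retrieval attributed above to Zhang and Vogel (2005) can be illustrated with a word-level suffix array. This is a toy sketch over a monolingual token list; their system works over the full parallel corpus together with its probabilistic translation alignments.

```python
# Build a suffix array over a tokenised corpus and use binary search to
# locate every occurrence of an arbitrary-length phrase at run-time,
# instead of pre-extracting all phrases in advance.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# All suffix start positions, sorted by the suffix each one starts.
suffix_array = sorted(range(len(corpus)), key=lambda i: corpus[i:])

def find_occurrences(phrase):
    """Return the corpus positions where `phrase` (a token list) occurs."""
    k = len(phrase)
    lo, hi = 0, len(suffix_array)
    # Lower bound of the block of suffixes starting with `phrase`.
    while lo < hi:
        mid = (lo + hi) // 2
        if corpus[suffix_array[mid]:suffix_array[mid] + k] < phrase:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All matching suffixes are adjacent in the sorted suffix array.
    while lo < len(suffix_array) and corpus[suffix_array[lo]:suffix_array[lo] + k] == phrase:
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)

print(find_occurrences("sat on the".split()))  # [2, 9]
```

Because lookup is logarithmic in the corpus length, phrases of any length can be retrieved on demand, which is precisely why the data structure serves both run-time SMT phrase extraction and "searchable translation memories".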

5 Structure of translation unit

Within EBMT systems, words, phrases, templates and structures have all been used as translation units, yet the same holds for SMT. EBMT has used translation templates since the early 1990s (Kaji et al. 1992). Recently, Chiang (2005) has extracted very similar representations based on statistical assumptions, which he refers to as "hierarchical phrases". In both cases the extracted representation is a coupled string of source–target symbols where corresponding units (words, phrases) are replaced by variables. Kaji et al. parse the previously aligned source- and target-language sentences and use a bilingual dictionary to detect word translations. In a bottom-up inference process, they detect recursive phrase translations which are substituted by coindexed variables to generate recursive translation templates. Similar methods are also described in Brown (1999), Carl (2003) and Cicekli (2005). Chiang (2005) computes word alignments running giza++ (Och and Ney 2003). He infers "all possible differences of phrase pairs" and generates a very large number of templates.2 In order to restrict the number of templates generated, Chiang makes use of heuristics, such as allowing at most two nonterminals per phrase, prohibiting adjacent nonterminals, or limiting phrase length to 10.

While the representations are virtually identical, Kaji et al. (1992) refer to their approach as "example-based" and Chiang (2005) calls his "statistical". It may be argued that this is because for Kaji et al. the slots exemplify linguistically motivated content, while for Chiang the number and distribution of slots are based on statistical assumptions, and heuristics are introduced to cope with the complexity. Note that Chiang concludes that a better integration of parsers will be a goal for the future. It is, however, unclear whether the system will become example-based in this instance.
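The shared representation, a coupled source-target string with coindexed variables, can be sketched as follows. This is an illustrative toy (the example phrase pair and helper name are invented): a known aligned sub-phrase pair inside a larger phrase pair is replaced by a variable, yielding a translation template of the kind used by Kaji et al. (1992) and, under statistical extraction criteria, by Chiang (2005).

```python
def make_template(src, tgt, sub_src, sub_tgt, index=1):
    """Replace one known sub-phrase translation by the coindexed
    variable X<index> on both the source and the target side."""
    var = f"X{index}"
    return (src.replace(sub_src, var), tgt.replace(sub_tgt, var))

# From the phrase pair ("he reads the book", "er liest das Buch") and the
# aligned sub-phrase pair ("the book", "das Buch") ...
template = make_template("he reads the book", "er liest das Buch",
                         "the book", "das Buch")
print(template)  # ('he reads X1', 'er liest X1')
```

The resulting template generalises over noun phrases in that position; applied recursively, the same substitution step produces the recursive templates of Kaji et al. and the hierarchical phrases of Chiang.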
2 Interestingly, Chiang also refers to these generated templates as “rules”: What is the status of such rules? Are they similar to those generated by (computational) linguists? In what respects will they differ? See Sect. 6 for a discussion.


In addition, tree-based representations are used in SMT and EBMT. Yamada and Knight (2001) present an SMT model which maps a source-language parse tree into a target-language string using three operations: reordering child nodes, inserting extra words and translating leaf words. These operations are statistically trained. Quirk and Menezes (2006) describe a statistical EBMT system that is based on dependency trees. Similarly to Yamada and Knight (2001), they use statistical operations as introduced by Och and Ney (2002) to train the word alignments and reordering operations. As with templates, there is no principal difference in the tree representations used, and even the same sorts of operations are carried out on them.

So in what respect are EBMT and SMT systems different? As with the distinction run-time versus preprocessing, it seems the choice of the data structure is independent of whether the whole system is labelled "statistical" or "example-based". Rather, the reason to use more structured representations in SMT as well as in EBMT lies in the fact that "tree-structured translation models have the potential to encode more information using fewer parameters" (Burbank et al. 2005, p 6). Using tree structures is a means of distinguishing different layers of information. Only a subset of the total information has to be considered at any one time, and sets of other pieces of information are coded and evaluated in partial subtrees. Also, the design of the data structure, for instance whether it is flat, binary or n-ary, or whether it allows a type or feature system, already determines a number of parameters that would otherwise have to be statistically estimated. Using better theoretically grounded representations in conjunction with statistics extracted from corpora is a technique for both SMT and EBMT. Is the essential dichotomy, then, how these representations are generated, whether or not rules are used? The next section looks into this in more detail.
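The tree operations of Yamada and Knight (2001) can be sketched on a toy parse tree. This is an illustrative sketch only: the lexicon and the fixed reordering rule are invented, the word-insertion operation is omitted, and in the real model every choice is drawn from trained probability distributions rather than fixed.

```python
# Toy lexicon standing in for the trained leaf-translation table.
lexicon = {"he": "er", "reads": "liest", "books": "Bücher"}

def transform(node):
    """Apply two of the three operations to a tree: translate leaf
    words and reorder child nodes. A node is either a leaf word or a
    (label, children) pair."""
    if isinstance(node, str):          # leaf: translate the word
        return lexicon.get(node, node)
    label, children = node
    children = [transform(c) for c in children]
    if label == "VP":                  # fixed toy reordering: V NP -> NP V
        children = children[::-1]
    return (label, children)

tree = ("S", ["he", ("VP", ["reads", "books"])])
print(transform(tree))  # ('S', ['er', ('VP', ['Bücher', 'liest'])])
```

Reading the leaves of the output tree left to right yields the target string, which is how such models map a source parse tree to a target sentence.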

6 Rules versus statistics

While ten years ago the concepts "rule" and "statistics" were largely incompatible, there are now a number of attempts to integrate both concepts. How did this change come about? Searle (1995) distinguishes between "constitutive" and "regulative" rules. Regulative rules describe a behaviour which exists independently of the rules. For instance, rules of polite table behaviour regulate eating, but eating itself exists independently of these rules. In contrast, constitutive rules are tightly linked to the behaviour which they create and determine. Examples of constitutive rules are games: kicking a ball becomes a football game only if the rules for football games are respected. Constitutive rules are analytical statements and make explicit what is contained in the behaviour.

Rules in traditional RBMT systems are constitutive. Translations are constructed through a set of rules and would not exist without them. A transfer rule, for instance, produces and explains a representation. It determines under which conditions we obtain which translation. Statistics, as defined by Wu (2006), is a matter of collecting and analysing numerical data. Statistical methods are used for "disambiguation … ranking, scoring, parameter estimation, etc." (Wu 2006, p 215). While it seems uncontroversial what the term "statistics" means, there are several definitions for SMT.3 For Yamada and Knight, SMT is "a mathematical model in which the process of human-language translation is statistically modeled" (2001, p 523).

Now, assume you have a traditional rule-based transfer system, but you learned the transfer rules by counting tree transformations in a large corpus, and associated each transfer rule with a probability. The MT system could then either simply apply the most probable rule at each point or, alternatively, apply many transfer rules at each point, generate many translation candidates and use a pruning technique to rank the results. So, will these be statistical or rule-based systems?4 In our view, neither, in the sense defined above. Unless one considers representations to be a negligible part of the process of human-language translation, translations are not statistically modelled in the sense of Yamada and Knight: the statistical component cannot (re)produce the outcome without the representations produced by the rules, yet the availability of those representations is a prerequisite for them to be selected or ranked by statistics. Nor is it an RBMT system, because it does not make use of "constitutive" rules, and the rule-based component does not determine why and under which conditions a translation is produced.

Note that rules may be constitutive for a particular representation while being regulative for the behaviour of the entire system. For instance, a generated translation candidate or system-internal representation may be constructed by a constitutive set of rules. However, if rules describe mere possibilities of the overall system behaviour which are selected or ranked by some outside device, they become regulative. In our experience (Carl et al. 2000), rules in traditional RBMT systems are difficult to reinterpret or modify as regulative.5 This requires at least some of the decisions to be made by an external device; but rules build on each other in such a way that internal representations, if not purposely designed as such, are not ready to be communicated to an external device, and external decisions cannot usually be taken into account.

However, a number of research activities successfully integrate regulative rules and statistics: for Richardson et al. (2001), Gamon et al. (2002) and Ringger et al. (2004), linguistic rules describe what possible transformations a parse tree can undergo, but statistics decide under which conditions a particular rule is applied. Another example in this line is the research reported in Groves and Way (2005), who successfully integrate rule-induced chunk translations with a statistical decoder. Section 9 will give more examples and details of similar systems.

As Flach and Kakas (1997) point out, rule-based (that is, "logical", in terms of Wu 2006) approaches tend to concentrate on hypothesis formation, while probabilistic approaches are concerned with the evaluation and selection of hypotheses. Rule-based and statistical techniques can thus be combined in one system, but this activity goes beyond adding statistics to traditional RBMT. In fact, the "hybrid" system is a completely new entity which cannot be explained as a mere union of its original components. The representations and the way they are processed are adapted in such a way that the components interact and complement each other effectively.

3 Another definition of SMT is given in Ney (2005), which will be discussed in Sect. 9.
4 A similar question was raised by Hovy (2005).
5 The traditional RBMT system was in our case cat2 (Sharp 1988; Streiter 1996).
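The thought experiment above, transfer rules learned by counting transformations in a corpus and then selected by probability, can be sketched as follows. The counts and rule patterns are invented for illustration; the point is only the division of labour: rules supply the alternatives, while an external statistical device ranks them, which is what makes the rules regulative rather than constitutive.

```python
from collections import Counter

# Hypothetical (source pattern -> target pattern) transformations, as if
# counted from an aligned treebank.
observed = [("ADJ N", "N ADJ"), ("ADJ N", "N ADJ"), ("ADJ N", "ADJ N"),
            ("N N", "N de N")]

counts = Counter(observed)
totals = Counter(src for src, _ in observed)

# Each source pattern maps to its candidate rules with relative frequencies.
rules = {}
for (src, tgt), n in counts.items():
    rules.setdefault(src, []).append((tgt, n / totals[src]))

def best_rule(src):
    """Regulative use: an external device selects among the alternatives."""
    return max(rules[src], key=lambda r: r[1])

tgt, p = best_rule("ADJ N")
print(tgt, round(p, 2))  # N ADJ 0.67
```

Alternatively, all candidate rules in `rules[src]` could be applied to generate many translation candidates, with the probabilities used for pruning and ranking, the second option described in the text.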


However, EBMT is more than the mere combination of regulative rules and statistics. While examples exemplify regularities and the usage of rules, they also code exceptions in a context which goes beyond the conventional notion of rule. In the next sections we analyse this situation from a system-theoretical point of view.

7 What is system theory?

System theory is an interdisciplinary science that uses systems to describe and explain complex phenomena. Ludwig von Bertalanffy (1901–1972) introduced this new scientific paradigm in opposition to classical physics, criticising its deductive methods, which investigate single phenomena in isolation. Instead of looking at single phenomena, which never occur as such in reality, phenomena are better described and understood within a network. Von Bertalanffy (1968) introduced the "general system theory" as a new paradigm which should control model construction in all sciences. Its task was to deduce the universal principles which are valid for systems in general. The system concept represents a set of interrelated components, a complex entity in space and time which shows isomorphic similarities. Since such isomorphisms exist between living organisms, machines and social systems, one can simulate interdisciplinary models and transfer the data of one scientific domain to another. Complex systems emerge if the elements in a system are not linearly organised but mutually depend on each other. System theory investigates the complex exchange and dependency between these elements. A "system" is defined as follows:

(a) A system has boundaries and can be distinguished from its environment. It consists of its boundaries, its elements and the interaction between the elements. An open system interacts with its environment, while a closed system does not.
(b) The elements within the boundary of a system interact in such a way that they produce meaningful and goal-oriented behaviour.
(c) The architecture and the function of the system depend on the point of view of the observer.

A number of extensions to system theory have been proposed. For Luhmann (2002), systems (for instance psychological, social and biological systems) do not exist as such but rather emerge through "differentiation" out of a nondescript environment.
A system recursively produces new distinctions, such as new institutions, new concepts, new species and so on, thereby reproducing itself. To avoid circularity, Luhmann introduces the term "points of reentry", at which new distinctions emerge and which allow "self-creation" and "self-consciousness" to be explained. A unique operator allows for "connectivity" (Anschlussfähigkeit) and reproduction of the system at the points of reentry. Since systems consist of conjunctions of elements, structures and relations, he calls this operator "anding". Differences between the system and its environment emerge through successive and-ings.

Structures stabilise a system by avoiding a dependence of everything on everything else. Structures thus make the system more robust, since external disturbances can be absorbed locally without the need to find a new globally stable state. A system with "simple complexity" has no (or at best shallow) structures and allows the combinatorial combination of all its elements. A system has "complex complexity" if different types of elements are available and different operations are required at different levels of organisation. Complex systems code more information and are more selective in the way elements can be related, but they also risk becoming inflexible. How much and what kind of complexity (simple or complex) a system develops is determined, among other things, by the extent to which reliable decisions are made: a system which produces suboptimal but useful decisions may be fit to survive if it shows operational security and intelligible reasoning. For Luhmann, all systems are closed and there is no fixed world independent of an observer.

The theory of Luhmann is related to that of von Foerster (1973, 1993). Von Foerster introduced "system theory of the second order", which investigates what the observer of a system can theoretically know about it. According to Luhmann and von Foerster, observers are part of the very system which they are investigating. It thus becomes important to know which questions can be decided and which cannot. Von Foerster's fundamental theorem of metaphysics gives a guideline: "We can decide only those questions which are in principle undecidable", because all other questions are already solved through the framework in which the questions are asked. For instance, the question whether the number 4 can be divided by 2 without a remainder is a decidable question. Similarly, if we want to utter English sentences, we know that the subject and the predicate have to agree in number and person, that the predicate usually follows the subject and so on. We cannot decide these questions, even though it can take very long to find out whether a question is decidable or not. However, we are free to choose the words and the contents of the sentences that we want to utter. According to von Foerster, undecidable questions are, for instance:

— How did the Universe come into being?
— Am I a part of, or outside, the Universe?
According to von Foerster, we can find only personal answers to these kinds of questions. We are free to decide in the way we wish, and then we have to take responsibility for our decision. On the other hand, only systems that can decide undecidable questions have the potential to evolve.

8 A system-theoretical review of EBMT

With respect to MT there are many undecidable questions, such as whether MT is possible at all or whether MT is useful. These questions are analytically undecidable because it depends only on us whether we find MT useful or possible. In order to approach such questions in a more constructive way, we have to find a framework which provides a basis for an answer. Such questions could be:

— What are the conditions under which MT is possible and useful?
— For whom will it be useful and what are the expectations?
— What kind of resources are required to reach which level of quality?
— Is user involvement required to produce good MT output?

These and related questions imply that the MT system and the MT user form an inseparable unit: since users determine the conditions under which MT is useful, there will be a number of different answers. There will be different expectations as to whether MT is used for dissemination or for assimilation, for the translation of news or of technical documents, and so on.


Reconsidering EBMT in a system-theoretical framework, we have to ask what the parts of EBMT are, how they interact, how and when EBMT started, which questions are decidable, which questions are undecidable and so on. Obviously, the most basic ingredients in EBMT are sets of examples. So we will first look at definitions of the term “example” and then see how EBMT systems deal with them. According to one dictionary, an example is:

— a typical instance,
— a fact … that forms a particular case of a principle …
— a problem framed to illustrate a rule,
— a parallel case (Onions 1973)

An example is thus a system in itself: it is an entity and the representation of a whole. It puts a phenomenon into a context, thereby explaining and disambiguating it. An example is also a unit made up of several parts. The parts can be substituted or replaced. The impact of these substitutions, how they can be controlled, what side-effects they trigger and how the side-effects can be managed, is one of the major issues discussed in the EBMT literature (Somers 2003).

As Hutchins (2005) outlines, since its very beginning EBMT has incorporated rule-induced representations and corpora. While corpora represent a collection of translation examples, rules were used to determine and represent their compositionality and to get a handle on the parts of the examples. The design of EBMT systems is thus determined by the properties of the examples.

According to system theory, these properties depend on the point of view of the observer, and so it is not surprising to find many different descriptions. Sumita et al.
(1990) distinguish two types of EBMT systems: (a) those transferring total sentences by the example-based paradigm and (b) those finding a way of integrating EBMT with conventional RBMT.6 Katoh and Aizawa (1994, p 29) contrast “cooperative methods” which use a more or less “tightly woven combination of example-based and rule-based approaches” with their own approach where the methods are used independently. Similarly, Carl et al. (2000) distinguish weak and strong integration of rule-based and example-based methods. A weak integration implies a sequential, stratificational combination of the methods while in a strong integration, the same data structures are shared among different components. There are many EBMT implementations realising these types in different shades. Katoh and Aizawa suggest a weak integration of example-based and rule-based techniques in a three-process MT system called ENTS. The first process translates “fixed sentences” making use of translation templates. The second process is a rule-based system translating domain-specific sentences and the third process (also rule-based) translates the remaining sentences. Similarly, but more tightly combined, Furuse and Iida (1992) provide multilevel transfer knowledge on a string, pattern and grammar level in their transfer-driven MT model. Transfer takes place at the most concrete matching level in the order of string, pattern and grammar matches. If this fails, a further analysis module is interrogated and the results are returned to the transfer module. For Sato and Nagao (1990), one of the first papers on EBMT, the basic process was to find examples of target-language sentences analogous to input source-language 6 Note that Groves and Way (2005) offer a further possibility, namely integrating EBMT with SMT.


sentences, and rules were applied only when examples could not be found in the database. In Carl et al. (2000), segmentation and segment translation are decided dynamically on a subsentential level. The example-based system translates pieces which are stored in its database; the remaining pieces are handled by the RBMT system. In contrast to this, Sumita et al. (1990) suggest that RBMT is first introduced as a base system while the EBMT system comes into action as soon as suitable phrases for EBMT are recognised. They provide a list of phenomena suited for example-based translation and show that Japanese noun phrases of the form “N1 no N2” are better translated into English with EBMT techniques.

There are many more EBMT methods than can be named here, all of which tend to distinguish different kinds of translation “problems” and tackle them in different components of the system, be it slots of generalised templates, (sub)trees or rule-based components. Gough and Way (2004) make use of the so-called “marker hypothesis” to induce translation templates based on parallel sequences of words headed by closed-class vocabulary items. Brown (1999) generalises translation templates by substituting equivalence classes in translation examples to allow recursive matching and compositional translation. A similar approach is described in Kaji et al. (1992) and in Cicekli (2005), who even elaborates a type system for the generalised slots.

These systems model how units can be combined and what decisions are taken in which units. However, they work in a linear, stratificational manner. That is, one knowledge resource is interrogated at a time and information is passed from one module to the next with no attempt to “negotiate” the outcome. As a result, representations and processing steps are somewhat rigid and inflexible, and thus difficult to integrate in a different context. To give an example from our previous work, in Carl et al.
(2000) we felt that example-based and rule-based components were difficult to integrate because both have independent sets of representations. There was a need for an appropriate level of adaptability between the example-based and rule-based modules, and although communication was possible through sets of features, it was difficult to determine dependencies between the components and fix their desirable behaviour in all contexts. For instance, a sentence wrongly segmented by one component cannot be treated correctly by the other one, and vice versa. How to decide exactly which chunks must be skipped and which parts should be tackled by which component was a problem still awaiting a proper solution. Accordingly, the idea was to elaborate a formal definition of the basic system modules to “disentangle their mutual dependencies”, and a starting point was provided in Streiter et al. (2000). The main idea there was to decide automatically how and when to replace the modules dynamically. Instead of replacing the modules, recent research has investigated how the structures they produce can adapt dynamically. In the next section, we examine whether system theory can provide a coherent framework for the description of this task.
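To make the idea of generalised translation templates with typed slots more concrete, here is a minimal sketch in the spirit of Brown (1999) and Cicekli (2005). The template, the class lexicon and the language pair are invented for the illustration; real systems induce such templates from corpora rather than hand-writing them.

```python
import re
from typing import Optional

# Hypothetical class lexicon: the slot type <CITY> maps source-language
# fillers to their target-language equivalents.
CLASS_LEXICON = {"<CITY>": {"paris": "Paris", "london": "Londres"}}

# One invented English->French template: the source side is a pattern
# with a capture group, the target side carries a typed slot.
TEMPLATE = (r"i live in (\w+)", "j'habite à <CITY>")

def apply_template(sentence: str) -> Optional[str]:
    m = re.fullmatch(TEMPLATE[0], sentence.lower())
    if not m:
        return None                      # template does not match
    filler = m.group(1)
    city_map = CLASS_LEXICON["<CITY>"]
    if filler not in city_map:
        return None                      # slot filler outside its class
    # substitute the translated slot filler into the target pattern
    return TEMPLATE[1].replace("<CITY>", city_map[filler])

print(apply_template("I live in London"))   # j'habite à Londres
```

The slot is exactly the kind of “reentry point” discussed in the next section: translation of the whole recurses into translation of the slot filler.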

9 Statistical EBMT

As pointed out in Sect. 7, Luhmann (2002) suggests that points of “reentry” are required to allow self-creation and self-correction, a property also sought in Streiter et al. (2000). In many EBMT systems such reentry points are realised by slots in


templates (Kaji et al. 1992, Brown 1999, Carl 2003, Cicekli 2005), by tree-internal nodes (Watanabe et al. 2003, Langlais and Gotti 2006, Liu et al. 2006, Quirk and Menezes 2006) and/or by connecting points to rule-based components (Sumita et al. 1990, Katoh and Aizawa 1994, Carl et al. 2000). All of these approaches allow for recursive continuation and reproduction of the structure. Thus, EBMT followers have mainly been experimenting with structured representations modelling meaningful differences and properties of reentry points in data-induced structure. As we will see, SMT has developed an informed and-ing operator, which allows the most probable continuations of the structures to be selected.

During more than ten years of research to date, SMT has developed a number of models, but the basic principles were laid out in Brown et al. (1993), in which probabilities of a language model and probabilities of a translation model were multiplied to find the most probable translation. For instance, word-based SMT models as in Brown et al. (1990) make use of consistent pieces of information: probabilities of word n-grams and probabilities of word translations. A maximisation operator searches through a space of probabilities to find the most probable translations of sentences. Despite the general applicability of this operation, word-based SMT has shown that querying a corpus with questions which cannot be decided because they are language-inherent (such as agreement, word order and so on) produces too many unnecessary parameters which cannot be tackled, and in many instances introduces more noise than meaningful decisions. Unfortunately, in many cases the decidable parameters require rather abstract representations, such as constraints on word order, complementation, head switching, category change and so on (cf. Dorr 1994).
For instance, English SVO clauses become Japanese SOV clauses (Quirk and Menezes 2006), while other constraints express mere possibilities, such as direct objects in Spanish becoming prepositional objects in English. Structures are suitable for the representation of such constraints, and the need for elaborate representations is shared by Burbank et al. (2005), who claim that “In the long-term, the price of implausible models [of representation] is reduced insight, and therefore slower progress”.

Recently, EBMT systems have been developed which aim at a stronger integration of structured representations with statistical and-ing operators which select the most likely combinations of the structures; this complements current developments in SMT. There are various ways of tackling uncertainties in the dependencies of the data (Zadeh 2005), and a whole set of tools to ground decisions empirically in corpora is being developed. For instance, Jackson (2005) uses genetic algorithms for parsing and translation of arithmetic and logical expressions. Neural networks are used for MT (McLean 1992, Castaño et al. 1997) or for word-sense disambiguation (Chung et al. 2002). Gamon et al. (2002) use decision trees to learn the contexts for complex linguistic operations, and special-purpose operators are also being suggested, as for instance in Doi et al. (2005), who implement five operators to expand a state into a successor state. Statistical and probabilistic methods have thus reached a high degree of maturity and acceptance in the MT research community. According to Ney (2005), and in contrast to the definition of Yamada and Knight (2002), SMT investigates “the more or less purely algorithmic concepts of how we model the dependencies of the data” (Ney 2005, p 16).
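The word-based noisy-channel idea of Brown et al. (1990, 1993) described above can be reduced to a toy decoder: multiply a language-model probability P(e) by a translation-model probability P(f|e) and keep the argmax. The probabilities and candidate lists below are invented; real systems estimate these models from corpora and search a vast hypothesis space instead of a fixed list.

```python
# Toy language model P(e) and translation model P(f|e); all numbers
# are illustrative, not estimated from data.
LM = {"the house": 0.6, "house the": 0.1}
TM = {("das haus", "the house"): 0.5,
      ("das haus", "house the"): 0.5}

def decode(f, candidates):
    # pick the target sentence e maximising P(e) * P(f|e)
    return max(candidates, key=lambda e: LM.get(e, 0.0) * TM.get((f, e), 0.0))

print(decode("das haus", ["the house", "house the"]))  # the house
```

Note that the language model alone breaks the tie here: both word orders are equally likely under the translation model, which illustrates why word-order questions are delegated to P(e).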


Och and Ney (2002) extend the noisy-channel model of Brown et al. (1993) by adding weighting coefficients with feature functions and combining them in a log-linear fashion.7 Instead of using a language and a translation model, there are M feature functions hm(·) and M weighting coefficients λm that are taken into account. A search procedure seeks to find the target sentence ê with the highest probability (1).

$$\hat{e} = \operatorname*{argmax}_{e} \sum_{m=1}^{M} \lambda_m\, h_m(\cdot) \qquad (1)$$

While the feature functions hm(·) can be independent and trained on separate data, the weighting coefficients λm are trained on held-out data and the system can be fine-tuned to particular situations.

Thus, Langlais and Gotti (2006) use dependency parsers to analyse parallel English and French sentences. The nodes of the trees are aligned and all source- to target-language correspondences of depth 1, so-called “treelets”, are extracted and stored in a database. When generating new translations, the retrieved target-language treelets are recombined and their best combinations are searched for by evaluating the log-linear combinations of seven feature functions. The features represent, among other things, a language model, distortion features and word-alignment features which are trained on different resources. Quirk and Menezes (2006) follow a similar approach using even more features, which they classify roughly into order models, channel models and target-language models. In contrast to Langlais and Gotti, they allow treelets of arbitrary size and apply the log-linear scoring recursively when constructing the target-language derivation. Liu et al. (2006) first compute tree-to-string correspondences from aligned sentences. For generation of the translations, they make use of three feature functions: the semantic similarity between the dependency trees in the corpus and those of the new sentence to be translated, the translation probability of the words, and a statistical language model.

So, in the end, is there any difference between statistical and example-based MT?
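The log-linear scoring of equation (1) can be sketched in a few lines. The two feature functions and weights below are placeholders invented for the illustration, not those of any cited system; a real decoder would have trained weights and many more features.

```python
import math

def log_linear_score(candidate, features, weights):
    # weighted sum of feature-function values, as in equation (1)
    return sum(w * h(candidate) for h, w in zip(features, weights))

# two invented feature functions: h1 penalises repetitions of "the",
# h2 penalises deviation from an expected length of three words
features = [lambda e: math.log(1.0 / (1 + e.split().count("the"))),
            lambda e: -abs(len(e.split()) - 3)]
weights = [1.0, 0.5]

candidates = ["the the house", "the house there"]
best = max(candidates, key=lambda e: log_linear_score(e, features, weights))
print(best)  # the house there
```

The point of the formulation is visible even at this scale: feature functions can be trained independently, while the weights λm tune their relative influence on the argmax.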

10 Towards a disciplinary matrix for EBMT

In his seminal work on the nature of scientific change, Kuhn (1962) describes the development of sciences as a revolutionary process. As new facts and observations accumulate which can no longer be explained, a new scientific paradigm must replace the old one. This change is an intellectually violent act, since one conceptual world view is replaced by another.

7 Even though their origins and history are very different, Och and Ney’s (2002) log-linear combination of feature functions resembles activity propagation in artificial neural networks. The activity of a neuron si in a backpropagation network (Rumelhart et al. 1986) is modelled as the sum over the weighted activities of its incoming neurons sj, as in equation (i). The sigmoid function σ serves to smooth the summed activities and map the result to a number between 0 and 1.

$$s_i = \sigma\Big(\sum_{j=1}^{J} w_{ij}\, s_j\Big) \qquad (i)$$
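The footnote's analogy, written out in code: a single neuron's activity is the sigmoid of the weighted sum of its inputs, formally parallel to the weighted sum of feature functions in equation (1). The weights and inputs are arbitrary illustrative numbers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_activity(weights, inputs):
    # sigmoid of the weighted sum of incoming activities, equation (i)
    return sigmoid(sum(w * s for w, s in zip(weights, inputs)))

print(neuron_activity([0.5, -0.25], [1.0, 2.0]))  # 0.5, since the weighted sum is 0
```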


Central to Kuhn’s theory is the notion of a “scientific paradigm” (later referred to by Kuhn (1976) as a “disciplinary matrix”), a “collection of beliefs shared by scientists” and “a set of agreements about how problems are to be understood”. A disciplinary matrix is more than a theory and more than a number of theories, comprising, among other things:

(a) Symbolic generalisations which are used, accepted and understood by the members of the scientific community. The MT community has a long tradition and an accepted language for representing derivation trees, templates, features and so on, which are also used in data-driven approaches. Notions of statistical processing as introduced by Brown et al. (1993) or Och and Ney (2002) are also firmly established in the literature.

(b) Common metaphysical values. In the MT community, it is generally believed that translation is compositional,8 that some texts (for instance, technical texts) are better suited for automated translation and that some language pairs are easier to translate than others. These (and other) metaphysical values are helpful to decide what counts as an explanation of the level of success of solutions and what the (unresolved) problems are.

(c) Prototypical solutions. According to Kuhn, example solutions serve as analogies for researchers to solve new problems. A profound understanding is required and trained during (higher) education, not so much to understand the underlying theory but rather for the researcher to acquire an intuitive knowledge of the nature of the field which is being studied, its possible states and solutions to its problems. There are various example solutions for a number of particular problems in data-driven MT, some of which are mentioned in the previous sections.

(d) Common (empirical) values. Despite the fact that researchers can have very different values, values play a crucial role in the identity of a scientific community and include shared judgements about the accuracy, compatibility, plausibility, predictability and so on of the theories. Divergences among the members in the accepted tolerances of these values may provoke a crisis and finally lead to what Kuhn calls a “scientific revolution”.

According to the analyses in the previous sections, it is not so much the technical details which distinguish SMT from EBMT and one system from another. In this section, we will show that it is mainly divergences of the common values which have fragmented the scientific community and which have also been at the core of the putative differences between SMT and EBMT.

Let us start with a success story. Since the introduction of the commonly used evaluation metric bleu (Papineni et al. 2002), results of various MT systems have become comparable with respect to their translation accuracy and predictability. Despite the criticism of bleu and related automatic evaluation metrics (for example NIST, Doddington 2002), automated metrics in general provide consistent rankings (Popescu-Belis 2003) and correlate with the average evaluation results of humans (Akiba et al. 2003). Critics emphasise that these metrics are not sensitive to global syntactic structure, they cannot detect paraphrases, the scores are not very meaningful in themselves, subtleties in translation output are not captured, and so on. However, for the majority they seem to meet the requirements, as Koehn (2004) points out, of “a trusted experimental

8 Although Lepage and Denoual (2005) describe an EBMT system where even this value, or at least its conventional representation and the compositionality of transfer, is questioned.


framework [which] is essential for drawing conclusions on the effects of system changes” (Koehn 2004, p 394). As a result we have seen a convergence of common values and efforts in the past few years, mainly triggered by a number of MT evaluation campaigns such as the NIST MT evaluation plan since 2002, the IWSLT evaluations beginning in 2004, TC-Star 2006, and others. These evaluation campaigns are open to all “MT practitioners” and serve as a normalising factor for the standards and values in the community. While they also bring together researchers from different backgrounds, it must be asked what kinds of system changes are to be tested and, accordingly, how the training and test data should be designed. Koehn (2004) suggests selecting the test sentences from different parts of a potentially infinite set of random sentences and proving statistically significant improvements in the systems’ translation quality on these random sets. It is thus important (for SMT systems) not to test on training data, and not to use the same test set repeatedly. Underlying this is the assumption that “an SMT system may be able to produce perfect translations even when the sentence given as input does not resemble any sentence from the training corpus” (Marcu 2001, p 379), and it is to be tested how well this can be achieved on average.

This kind of “performance evaluation” is not in all cases the best means of assessing different technical implementations. For many EBMT systems the goal is not first of all to enhance the average translation quality for any random sentence but to see how translation performance can be learned and increased for restricted text. For Quirk and Menezes (2006, p 46) “an EBMT system attempts to reuse translation information from its parallel corpus, preferably reusing information in segments that are as large as possible”.
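The bleu metric invoked in these evaluation campaigns can be sketched compactly. This is a simplified single-reference version for illustration only: the full metric of Papineni et al. (2002) operates over a whole test corpus, supports multiple references and accumulates counts before computing precisions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # modified n-gram precisions, clipped by the reference counts
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0
    # geometric mean of the precisions times the brevity penalty
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1.0 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
print(bleu(hyp, hyp))  # 1.0
```

Even this sketch makes the critics' points visible: the score is built purely from local n-gram overlap, so a syntactically scrambled paraphrase and a genuine translation error are penalised in the same currency.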
Since alignments remain available in the system, EBMT systems have the potential to “‘learn’ from previously encountered data”, whereas many SMT systems process all input “in the same way as ‘unseen’ data” (Way and Gough 2005). Evaluation strategies for EBMT will thus focus on different aspects. In such learning systems, one would like to investigate how successful a learning strategy is, how reliable generalisations are and to what extent they can be reused in different contexts. It might not even be desirable to increase statistically the overall translation quality of a random set of test sentences if this implies as a side-effect that the system “forgets” how to produce a small but well-defined set of “good” translations. Rather, one would expect perfect translations for sentences which are in the reference set and a gradual decrease of translation capacities for material which it does not contain or which is composed from parts of different reference sentences. However, it has never been investigated to what extent we can expect to produce translations for sentences that have never been seen and how this relates to the capacity of reproducing translations which are in fact contained in the training set. For instance, if a noun in a reference translation is replaced by another one, unknown to the system, one would expect to obtain similar translations for the entire sentence, modulo the new word. However, even if only known word material occurs in structures and combinations never seen in the reference sentences, it is more likely that the system might not know how to produce a translation. Knowing subtle (lexical and structural) differences between the test sentences and the reference set will allow us to draw deeper conclusions about the properties and potential of the system and its learnability. On the other hand, with structured representations it becomes increasingly interesting and important to test the


translatability and reliability of the different structure parameters systematically, as this will give hints as to where the problems of the approaches lie, which structures are particularly difficult and which are easy to translate, and, as a consequence, which parts of the translations are better lexicalised, which can be tackled through statistics, and which by means of rules. Such tests will not only be interesting for “diagnostic evaluation” to identify limitations, errors and deficiencies of a system, but they might also help a potential user in an “adequacy evaluation to determine the fitness of MT systems within a specified operational context” (Hutchins 1996).

The ultimate constraining factor in data-driven MT would be a set of reference translations. Starting from this set and a toolbox of analysis, generation and recombination techniques, we would have to ask: How far can we go? What are the translation problems that can be solved given the reference translations? Which problems are hard to estimate or remain uncertain and unsolved? In order to push these boundaries further, how do we integrate regular and exceptional knowledge with structures and statistics?
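The fine-grained testing scenario argued for in this section can be sketched as a simple evaluation harness: bucket each test sentence by its similarity to the closest training sentence, so that translation quality can be reported per similarity band rather than as one average over random sentences. The similarity measure (character-level sequence similarity) and the bucketing thresholds are arbitrary choices for the illustration.

```python
import difflib

def nearest_similarity(sentence, training):
    # highest sequence similarity (0..1) to any training sentence
    return max(difflib.SequenceMatcher(None, sentence, t).ratio()
               for t in training)

def bucket(test_set, training):
    buckets = {"seen": [], "near": [], "unseen": []}
    for s in test_set:
        sim = nearest_similarity(s, training)
        if sim == 1.0:
            buckets["seen"].append(s)       # contained in the reference set
        elif sim >= 0.7:
            buckets["near"].append(s)       # e.g. one noun replaced
        else:
            buckets["unseen"].append(s)     # no close reference sentence
    return buckets

train = ["the cat sat on the mat"]
tests = ["the cat sat on the mat",          # identical to training
         "the dog sat on the mat",          # one noun replaced
         "colourless green ideas"]          # unrelated
print(bucket(tests, train))
```

Reporting scores per bucket makes exactly the distinction the text calls for: a learning system should translate the "seen" bucket perfectly, degrade gracefully on "near", and be judged separately on "unseen".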

11 Conclusion

EBMT seeks to produce (parts of) translations relying on examples as a “means of capturing and storing knowledge about language and translation” (Simard 2005). The examples are considered to be linguistic entities which consist of words, phrases and other constituents, as well as the relations between them. The process of translation is modelled in many different ways through a (sometimes recursive) process which is in line with the analyses of the examples. SMT models the dependencies in the data and has developed a number of “connectivity” operators which have recently been used in some EBMT systems (Langlais and Gotti 2006, Liu et al. 2006, Quirk and Menezes 2006). Statistical operators have the potential to smooth the continuation of generated structure at their reentry points. A major issue in this research is the generation and selection of appropriate features, and the investigation of the extent to which they are suitable for these operators. While some of the features may involve sophisticated linguistic processing, Yamada and Knight hint that features “should be carefully selected [so as] not to cause data-sparseness problems” (2002, p 304).

These newly emerging possibilities give us a chance to investigate and verify empirically, in an MT context, which questions are better decided analytically (that is, properties of the data structure and generated features) and which questions are better solved by consulting a reference corpus, in order to exploit fully the knowledge stored in the examples and, finally, better meet users’ requirements. Section 10 of this paper has suggested an evaluation framework which takes into account these advanced learning capacities by establishing a fine-grained testing scenario that contains various sets of sentences which have different degrees of similarity to those in the training set.
System theory predicts that the introduction of new elements, operators or relations in a system not only entails new behaviour of the system as a whole, but also bears on the properties of other parts of the system. Thus, a new type of element entails the creation of new types of relations or operators, and vice versa. In line with


this, we currently see an increasing number of features being explored by statistical operators while the remaining architecture adapts accordingly. In such an orthogonalisation process, new qualities are likely to emerge as the behaviour of the whole exceeds that of its components. One reason why emergent behaviour is hard to predict is that the number of interactions between the components of a system increases combinatorially, thus potentially allowing many new and subtle types of behaviour to emerge.

Acknowledgements I would like to thank Harold Somers and Andy Way for their constructive feedback on earlier versions of this paper.

References

Akiba Y, Sumita E, Nakaiwa H, Yamamoto S, Okuno HG (2003) Experimental comparison of MT evaluation methods: Red vs. bleu. In: MT Summit IX: Proceedings of the ninth machine translation summit, New Orleans, USA, pp 1–8
Ashby WR (1956) An introduction to cybernetics. Chapman & Hall, London
Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Lafferty JD, Mercer RL, Roossin PS (1990) A statistical approach to machine translation. Comput Linguist 16:79–85; repr. in Nirenburg et al. (2003), pp 355–362
Brown PF, Della Pietra VJ, Della Pietra SA, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19:263–311
Brown RD (1999) Adding linguistic knowledge to a lexical example-based translation system. In: Proceedings of the eighth international conference on theoretical and methodological issues in machine translation (TMI-99), Chester, England, pp 22–32
Burbank A, Carpuat M, Clark S, Dreyer M, Fox P, Groves D, Hall K, Hearne M, Melamed ID, Shen Y, Way A, Wellington B, Wu D (2005) Final report of the 2005 language engineering workshop on statistical machine translation by parsing. Johns Hopkins University Center for Speech and Language Processing, Baltimore, MD, http://www.clsp.jhu.edu/ws2005/groups/statistical/ [Last accessed 4 August 2006]
Callison-Burch C, Bannard C, Schroeder J (2005) A compact data structure for searchable translation memories. In: Proceedings of the 10th annual conference of the European Association for Machine Translation, Budapest, Hungary, pp 59–65
Carl M (2000) A model of competence for corpus-based machine translation. In: Proceedings of the 18th international conference on computational linguistics: COLING 2000 in Europe, Saarbrücken, Germany, pp 997–1001
Carl M (2003) Inducing translation grammars from bracketed alignments. In: Carl and Way (2003), pp 339–363
Carl M, Iomdin LL, Pease C, Streiter O (2000) Towards dynamic linkage of example-based and rule-based machine translation. Mach Translat 15:223–257
Carl M, Way A (eds) (2003) Recent advances in example-based machine translation. Kluwer Academic Publishers, Dordrecht, The Netherlands
Castaño MA, Casacuberta F, Vidal E (1997) Machine translation using neural networks and finite state models. In: Proceedings of the 7th international conference on theoretical and methodological issues in machine translation, Santa Fe, NM, pp 160–167
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: 43rd annual meeting of the Association for Computational Linguistics, Ann Arbor, MI, pp 263–270
Chung Y-J, Kang S-J, Moon K-H, Lee J-H (2002) Word sense disambiguation in a Korean-to-Japanese MT system using neural networks. In: Proceedings of the workshop on machine translation in Asia, Taipei, Taiwan, pp 74–80
Cicekli I (2005) Inducing translation templates with type constraints. Mach Translat 19:283–299
Doddington G (2002) Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In: Proceedings of the ARPA workshop on human language technology notebook proceedings, San Diego, CA, pp 139–145
Doi T, Yamamoto H, Sumita E (2005) Graph-based retrieval for example-based machine translation using edit distance. In: Proceedings of the MT Summit X workshop: Second workshop on example-based machine translation, Phuket, Thailand, pp 51–59


Dorr BJ (1994) Machine translation divergences. Comput Linguist 20:597–633
Flach P, Kakas A (1997) Workshop report: IJCAI’97 workshop on abduction and induction in AI. Nagoya, Japan. http://www.cs.bris.ac.uk/flach/IJCAI97/IJCAI97report.html [Last accessed 4 August 2006]
Furuse O, Iida H (1992) Cooperation between transfer and analysis in example-based framework. In: Proceedings of the fifteenth [sic] international conference on computational linguistics, COLING-92, Nantes, France, pp 645–651
Gamon M, Ringger E, Corston-Oliver S, Moore R (2002) Machine-learned contexts for linguistic operations in German sentence realization. In: 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, pp 25–32
Gough N, Way A (2004) Example-based controlled translation. In: Proceedings of the 9th EAMT workshop “Broadening horizons of machine translation and its applications”, Valletta, Malta, pp 73–81
Groves D, Way A (2005) Hybrid data driven models of MT. Mach Translat 19:301–323
Hovy E (2005) Re: [Mt-list] Phrasal SMT vs EBMT, posted to mt-list, 8 February 2005. http://www.mail-archive.com/[email protected]/msg00777.html [Last accessed 4 August 2006]
Hutchins J (1996) Evaluation of machine translation and translation tools. In: Mariani J (ed) Evaluation, Ch 13 of Cole RA, Mariani J, Uszkoreit H, Zaenen A, Zue V (eds) Survey of the state of the art in human language technology, Report for the National Science Foundation and European Commission. http://cslu.cse.ogi.edu/HLTsurvey/ch13node5.html [Last accessed 7 August 2006]
Hutchins J (2005) Towards a definition of example-based machine translation. In: Proceedings of the MT Summit X workshop: Second workshop on example-based machine translation, Phuket, Thailand, pp 63–70
Hutchins J (2006) Example-based machine translation: a review and commentary. Mach Translat 19:197–211
Jackson D (2005) Parsing and translation of expressions by genetic programming. In: Proceedings of the genetic and evolutionary computation conference (GECCO), Washington, DC, pp 1681–1688
Kaji H, Kida Y, Morimoto Y (1992) Learning translation templates from bilingual text. In: Proceedings of the fifteenth [sic] international conference on computational linguistics, COLING-92, Nantes, France, pp 672–678
Katoh N, Aizawa T (1994) Machine translation of sentences with fixed expressions. In: 4th conference on applied natural language processing, Stuttgart, Germany, pp 28–33
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the 2004 conference on empirical methods in natural language processing, Barcelona, Spain, pp 388–395
Kuhn T (1962, 1976) The structure of scientific revolutions, 3rd edn. University of Chicago Press, Chicago, IL
Langlais P, Gotti F (2006) EBMT by tree-phrasing. Mach Translat 20:1–25
Lepage Y, Denoual E (2005) Purest ever example-based machine translation: detailed presentation and assessment. Mach Translat 19:251–282
Liu Z, Wang H, Wu H (2006) Example-based machine translation based on TSC and statistical generation. Mach Translat 20:27–44
Luhmann N (2002) Einführung in die Systemtheorie [Introduction to system theory]. Carl-Auer-Systeme Verlag, Heidelberg, Germany
Maegaard B (ed), Bel N, Dorr B, Hovy E, Knight K, Iida H, Boitet C, Maegaard B, Wilks Y (1999) Machine translation. In: Hovy E, Ide N, Frederking R, Mariani J, Zampolli A (eds) Multilingual information management: current levels and future abilities, report commissioned by the US National Science Foundation and also delivered to the European Commission’s Language Engineering Office and the US Defense Advanced Research Projects Agency. http://www.cs.cmu.edu/ref/mlim/chapter4.html [Last accessed 4 August 2006]
Marcu D (2001) Towards a unified approach to memory- and statistical-based machine translation. In: Association for Computational Linguistics 39th annual meeting and 10th conference of the European Chapter, Toulouse, France, pp 378–385
McLean IJ (1992) Example-based machine translation using connectionist matching. In: Fourth international conference on theoretical and methodological issues in machine translation: empiricist vs. rationalist methods in MT, TMI-92, Montreal, Canada, pp 35–43
Nagao M (1984) A framework of a mechanical translation between Japanese and English by analogy principle. In: Elithorn A, Banerji R (eds) Artificial and human intelligence (Edited review papers presented at the international NATO symposium on artificial and human intelligence), North-Holland, Amsterdam, The Netherlands, pp 173–180; repr. in Nirenburg et al. (2003), pp 351–354

Ney H (2005) One decade of statistical machine translation: 1996–2005. In: MT Summit X: The tenth machine translation summit, Phuket, Thailand, pp i-12–i-17
Nirenburg S, Somers H, Wilks Y (eds) (2003) Readings in machine translation. MIT Press, Cambridge, MA
Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, pp 295–302
Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29:19–51
Onions C (1973) The shorter Oxford English dictionary. Oxford University Press, Oxford, England
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, pp 311–318
Popescu-Belis A (2003) An experiment in comparative evaluation: humans vs. computers. In: Proceedings of the ninth machine translation summit, New Orleans, USA, pp 307–314
Quirk C, Menezes A (2006) Dependency treelet translation: the convergence of statistical and example-based machine translation? Mach Translat 20:45–66
Richardson SD, Dolan WB, Menezes A, Pinkham J (2001) Achieving commercial-quality translation with example-based methods. In: MT Summit VIII: "Machine translation in the information age", Santiago de Compostela, Spain, pp 293–298
Ringger E, Gamon M, Moore RC, Rojas D, Smets M, Corston-Oliver S (2004) Lexically informed statistical models of constituent structure for ordering in sentence realization. In: Coling: 20th international conference on computational linguistics, Geneva, Switzerland, pp 673–680
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, PDP Research Group (eds) Parallel distributed processing: explorations in the microstructure of cognition, Vol 1: Foundations. MIT Press, Cambridge, MA, pp 318–363
Sato S, Nagao M (1990) Towards memory-based translation. In: COLING-90, papers presented to the 13th international conference on computational linguistics, vol 3, Helsinki, Finland, pp 247–252
Searle JR (1995) The construction of social reality. Allen Lane, London, UK
Sharp R (1988) Cat2 – Implementing a formalism for multi-lingual MT. In: Proceedings of the second international conference on theoretical and methodological issues in machine translation of natural languages, Pittsburgh, Pennsylvania, pp 76–87
Simard M (2005) [Mt-list] Phrasal SMT vs EBMT, posted to mt-list, 9 February 2005. http://www.mail-archive.com/[email protected]/msg00778.html [Last accessed 7 August 2006]
Somers H (2003) An overview of EBMT. In: Carl and Way (2003), pp 3–57
Streiter O (1996) Linguistic modeling for multilingual machine translation. Shaker Verlag, Aachen, Germany
Streiter O, Carl M, Iomdin LL (2000) A virtual translation machine for hybrid machine translation. In: Proceedings of the Dialogue 2000 international seminar in computational linguistics and applications, Tarusa, Russia, pp 1–13
Sumita E, Iida H, Kohyama H (1990) Translating with examples: a new approach to machine translation. In: Proceedings of the third international conference on theoretical and methodological issues in machine translation of natural language, Austin, Texas, pp 203–212
Turcato D, Popowich F (2003) What is example-based machine translation? In: Carl and Way (2003), pp 59–81
von Bertalanffy L (1968) General system theory: foundations, development, applications. George Braziller, New York, NY
von Foerster H (1973) On constructing a reality. In: Preiser WFE (ed) Environmental design research, vol 2. Dowden, Hutchinson & Ross, Stroudsburg, PA, pp 35–46
von Foerster H (1993) KybernEthik [CybernEthics]. Merve Verlag, Berlin, Germany
Watanabe H, Kurohashi S, Aramaki E (2003) Finding translation patterns from dependency structures. In: Carl and Way (2003), pp 397–412
Way A (2003) Translating with examples: the LFG-DOT models of translation. In: Carl and Way (2003), pp 443–472
Way A, Gough N (2005) Comparing example-based & statistical machine translation. Presentation to University of Edinburgh, ICCS/HCRC seminar series; slides available at http://www.computing.dcu.ie/~away/PUBS/2005/Edinburgh.ppt [Last accessed 30 November 2006]

Wu D (2006) MT model space: statistical vs. compositional vs. example-based machine translation. Mach Translat 19:213–228
Yamada K, Knight K (2001) A syntax-based statistical translation model. In: Association for Computational Linguistics 39th annual meeting and 10th conference of the European Chapter, Toulouse, France, pp 523–529
Yamada K, Knight K (2002) A decoder for syntax-based statistical MT. In: 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, pp 303–310
Zadeh LA (2005) Toward a generalized theory of uncertainty (GTU): an outline. Inform Sci 172:1–40
Zhang Y, Vogel S (2005) An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora. In: 10th annual conference of the European Association for Machine Translation, Budapest, Hungary, pp 294–301
