Hapax legomenon

From Wikipedia, the free encyclopedia

 
Rank-frequency plot for words in the novel Moby-Dick. About 44% of the distinct set of words in this novel, such as "matrimonial", occur only once, and so are hapax legomena (red). About 17%, such as "dexterity", appear twice (so-called dis legomena, in blue). Zipf's law predicts that the words in this plot should approximately fit a straight line.

A hapax legomenon (/ˈhæpəks lɨˈɡɒmɨnɒn/, also /ˈhæpæks/ or /ˈheɪpæks/;[1][2] pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word that occurs only once within a context, either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) said (only) once".[3]

The related terms dis legomenon, tris legomenon, and tetrakis legomenon (/ˈdɪs/, /ˈtrɪs/, /ˈtɛtrəkɨs/) refer respectively to words that occur twice, three times, or four times, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's law,[4] which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the distinct words (that is, counting by type rather than by token) are hapax legomena, and another 10% to 15% are dis legomena.[5] Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.[6]
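The type-based proportions described above can be computed directly from a frequency table. The following is a minimal sketch, not a standard implementation: the function name legomena_shares and the regular-expression tokenizer are assumptions made for illustration, and real corpus studies use more careful tokenization.

```python
import re
from collections import Counter

def legomena_shares(text):
    """Return (number of types, hapax share, dis legomena share), counting by type."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n_types = len(counts)
    if n_types == 0:
        return 0, 0.0, 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    dis = sum(1 for c in counts.values() if c == 2)      # types seen exactly twice
    return n_types, hapaxes / n_types, dis / n_types

# Toy sentence, invented for illustration only.
sample = "the whale and the sea and the ship"
n_types, hapax_share, dis_share = legomena_shares(sample)
print(n_types, hapax_share, dis_share)
```

In the toy sentence, "whale", "sea", and "ship" occur once and "and" occurs twice, so three of the five types are hapax legomena. A sample this small says nothing about the 40% to 60% range reported for large corpora.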

Note that hapax legomenon refers to a word's appearance in a body of text and to neither its origin nor its prevalence in speech. It thus differs from a nonce word, which may never be recorded, or which may find currency and may be widely recorded, or which may appear several times in the work which coins it, and so on.

Significance

[Figure: Workman's counts of hapax legomena per page in the Pauline Epistles]
[Figure: Workman's counts of hapax legomena per page in Shakespeare's plays]

Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew) hapax legomena sometimes pose difficult problems in translation. Hapax legomena also pose challenges in natural language processing.[7]

Some scholars consider hapax legomena useful in determining the authorship of written works. For example, each of Shakespeare's plays contains a roughly similar percentage of hapax legomena not found elsewhere in his work.

P.N. Harrison, in The Problem of the Pastoral Epistles (1921),[8] made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in the other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Harrison's theory has faded in significance because of a number of problems raised by other scholars. For example, in 1896, W.P. Workman found the following numbers of hapax legomena in each Pauline Epistle: Rom. 113, I Cor. 110, II Cor. 99, Gal. 34, Eph. 43, Phil. 41, Col. 38, I Thess. 23, II Thess. 11, Philem. 5, I Tim. 82, II Tim. 53, Titus 33. At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others.[9] To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarized in the diagram on the right.[9] Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison with the variation among the other Epistles. This finding was reinforced when Workman looked at several plays by Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarized in the second diagram on the right.[9]
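The length adjustment described above can be sketched in code. This is an illustration under stated assumptions rather than Workman's procedure: it normalizes per 1,000 tokens instead of per printed page, the names tokenize and hapaxes_per_work are hypothetical, and the two toy texts are invented. A word counts as a hapax here if it occurs exactly once in the whole collection.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def hapaxes_per_work(works, per=1000):
    """Map each work name to (hapax count, hapaxes per `per` tokens).

    `works` is a dict of name -> text. Hapaxes are judged against the
    combined corpus, so a word used once in each of two works is not a hapax.
    """
    tokens_by_work = {name: tokenize(text) for name, text in works.items()}
    corpus_counts = Counter(t for toks in tokens_by_work.values() for t in toks)
    result = {}
    for name, toks in tokens_by_work.items():
        n_hapax = sum(1 for t in toks if corpus_counts[t] == 1)
        rate = per * n_hapax / len(toks) if toks else 0.0
        result[name] = (n_hapax, rate)
    return result

# Toy two-work corpus, invented for illustration only.
toy = {
    "A": "grace and peace to you",
    "B": "grace to you and mercy and peace",
}
rates = hapaxes_per_work(toy)
print(rates)
```

Comparing the rate rather than the raw count is what allows a short work to be set beside a long one, which is the point of Workman's per-page figures.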

Apart from author identity, there are several other factors that can explain the number of hapax legomena in a work:

In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship (although the authorship of the Pastorals is subject to debate on other grounds).[11]

There are also subjective questions over whether two forms amount to "the same word": dog vs dogs, clue vs clueless, sign vs signature; many other gray cases also arise. The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms.[12]
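How much the choice of "same word" matters can be seen by counting hapaxes before and after merging word forms. The sketch below uses a deliberately crude, hypothetical rule (naive_stem strips a final "s"), not a real stemmer or lemmatizer, and the sentence is invented for illustration.

```python
import re
from collections import Counter

def count_hapaxes(tokens):
    """Return the sorted list of tokens that occur exactly once."""
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if c == 1)

def naive_stem(word):
    # Hypothetical, deliberately crude rule: drop a final "s" from longer words.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

tokens = re.findall(r"[a-z]+", "The dog chased the dogs past a sign".lower())

surface = count_hapaxes(tokens)                           # "dog" and "dogs" kept apart
stemmed = count_hapaxes([naive_stem(t) for t in tokens])  # "dogs" merged into "dog"

print(len(surface), surface)
print(len(stemmed), stemmed)
```

Treating "dog" and "dogs" as distinct yields six hapaxes in this sentence, while merging them leaves four. Any reported hapax count therefore depends on the normalization applied beforehand.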

It would not be especially difficult for a forger to construct a work with any percentage of hapax legomena desired. However, it seems unlikely that forgers much before the 20th century would have conceived such a ploy, much less thought it worth the effort.

A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors often show similar values. In other words, hapax legomena are not a reliable indicator. Authorship studies now usually use a wide range of measures to look for patterns rather than rely upon single measurements.

Computer science

In computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. Disregarding them has the added benefit of significantly reducing the memory use of an application, since, by Zipf's law, many words are hapaxes.[13]
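One common way to disregard rare words is to apply a minimum-count threshold when building a vocabulary and to map everything below it to a single placeholder. The sketch below assumes pre-tokenized sentences. The names build_vocab, encode, and the "<unk>" placeholder are illustrative choices, not the API of any particular library.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for all pruned words

def build_vocab(sentences, min_count=2):
    """Map each word seen at least `min_count` times to an integer id.

    With min_count=2, every hapax legomenon is dropped from the vocabulary.
    """
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate([UNK] + kept)}

def encode(sentence, vocab):
    # Words missing from the vocabulary share the id of the placeholder.
    return [vocab.get(tok, vocab[UNK]) for tok in sentence]

# Toy corpus, invented for illustration only.
corpus = [
    ["the", "whale", "swam"],
    ["the", "ship", "sank"],
    ["the", "whale", "dived"],
]
vocab = build_vocab(corpus)
print(vocab)
print(encode(["the", "ship", "sank"], vocab))
```

In this toy corpus four of the six distinct words are hapaxes, so the vocabulary shrinks from six entries to three including the placeholder. That reduction is the memory saving the paragraph above describes.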

Examples

The following are some examples of hapax legomena in languages or corpora.

Arabic examples

English examples

Greek examples

Hebrew examples

There are about 1,500 hapax legomena in the Old Testament; however, because many share Hebrew roots, suffixes, and prefixes with other attested words, only about 400 are "true" hapax legomena. A full list can be seen in the Jewish Encyclopedia entry for "Hapax Legomena".[19]

Some examples include:

Irish examples

Italian examples

Latin examples

See also

References

  1. ^ "hapax legomenon". Oxford English Dictionary (3rd ed.). Oxford University Press. September 2005. 
  2. ^ "hapax legomenon". Dictionary.com Unabridged. Random House, Inc. 
  3. ^ ἅπαξ. Liddell, Henry George; Scott, Robert; A Greek–English Lexicon at the Perseus Project
  4. ^ Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, page 81, ISBN 0-7486-2018-4.
  5. ^ András Kornai, Mathematical Linguistics, Springer, 2008, page 72, ISBN 1-84628-985-8.
  6. ^ Kirsten Malmkjær, The Linguistics Encyclopedia, 2nd ed, Routledge, 2002, ISBN 0-415-22210-9, p. 87.
  7. ^ Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, page 22, ISBN 0-262-13360-1.
  8. ^ P.N. Harrison. The Problem of the Pastoral Epistles. Oxford University Press, 1921.
  9. ^ a b c Workman, "The Hapax Legomena of St. Paul", Expository Times, 7 (1896:418), noted in The Catholic Encyclopedia, s.v. "Epistles to Timothy and Titus".
  10. ^ Steven J. DeRose. "A Statistical Analysis of Certain Linguistic Arguments Concerning the Authorship of the Pastoral Epistles." Honors thesis, Brown University, 1982; Terry L. Wilder. "A Brief Defense of the Pastoral Epistles’ Authenticity". Midwestern Journal of Theology 2.1 (Fall 2003), 38–4. (on-line)
  11. ^ Mark Harding. What are they saying about the Pastoral epistles?, Paulist Press, 2001, page 12. ISBN 0-8091-3975-8, ISBN 978-0-8091-3975-0.
  12. ^ Article on Hapax Legomena in The Jewish Encyclopedia [1]. Includes a list of all the Old Testament hapax legomena, by book.
  13. ^ D. Jurafsky and J.H. Martin (2009). Speech and Language Processing. Prentice Hall.
  14. ^ Orhan Elmaz. "Die Interpretationsgeschichte der koranischen Hapaxlegomena." Doctoral thesis, University of Vienna, 2008, page 29
  15. ^ Hibbard, G.R., ed. (1998). Hamlet (reissued pbk. ed.). Oxford: Oxford University Press. p. 163. ISBN 9780192834164.
  16. ^ e.g. Richard Bauckham The Jewish world around the New Testament: collected essays I p431 2008: "a New Testament hapax, which occurs 19 times in Hermas. . ."
  17. ^ John F. Walvoord and Roy B. Zuck, The Bible Knowledge Commentary: New Testament Edition, David C. Cook, 1983, page 860, ISBN 0-88207-812-7.
  18. ^ Pharr, Clyde (1920). Homeric Greek, a book for beginners. D. C. Heath & Co., Publishers. p. xxii. 
  19. ^ Jewish Encyclopedia entry for Hapax Legomena
  20. ^ "Ark, Design and Size" Aid to Bible Understanding, Watchtower Bible and Tract Society, 1971.
  21. ^ [2]

External links