Censoring (statistics)

From Wikipedia, the free encyclopedia

In statistics, engineering, economics, and medical research, censoring occurs when the value of a measurement or observation is only partially known.

For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known only that an individual's age at death is at least 75 years. Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is still alive at age 75 when the data are analysed.

Censoring also occurs when a value falls outside the range of a measuring instrument. For example, a bathroom scale might only measure up to 150 kg. If a 175 kg individual is weighed using the scale, the observer would only know that the individual's mass is at least 150 kg.
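The bathroom-scale case can be sketched in a few lines of Python. This is only an illustration: the function name `weigh` and the `(reading, right_censored)` pair are assumptions made for the example, not a standard API; the 150 kg limit comes from the example above.

```python
from typing import Tuple

SCALE_LIMIT_KG = 150.0  # upper limit of the scale in the example


def weigh(true_mass_kg: float, limit_kg: float = SCALE_LIMIT_KG) -> Tuple[float, bool]:
    """Return (reading, right_censored) for a scale that tops out at limit_kg."""
    if true_mass_kg > limit_kg:
        # The scale cannot show more than its limit, so the observer learns
        # only that the true mass is at least limit_kg.
        return limit_kg, True
    return true_mass_kg, False


reading, censored = weigh(175.0)
print(reading, censored)  # 150.0 True
```

The flag matters: without it, a censored reading of 150 kg would be indistinguishable from an exact measurement of 150 kg.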

Types

Censoring should not be confused with the related idea of truncation. With censoring, each observation yields either the exact value or the knowledge that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen, or are never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.
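The difference can be made concrete with a small sketch. The data and the helper names `censor` and `truncate` below are made up for illustration; the point is only that a censored sample still contains one record per subject, while a truncated sample silently loses the out-of-range subjects.

```python
from typing import List, Tuple

LIMIT = 150.0  # assumed upper limit of the measuring range

population = [62.0, 88.5, 149.0, 175.0, 201.3]  # made-up true values


def censor(values: List[float], limit: float) -> List[Tuple[float, bool]]:
    """Keep every observation; values above the limit become (limit, True)."""
    return [(min(v, limit), v > limit) for v in values]


def truncate(values: List[float], limit: float) -> List[float]:
    """Drop values above the limit; nothing records that they ever existed."""
    return [v for v in values if v <= limit]


censored_sample = censor(population, LIMIT)
truncated_sample = truncate(population, LIMIT)

print(len(censored_sample))   # 5: every subject is still counted
print(len(truncated_sample))  # 3: two subjects are missing from the data
```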

The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

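Interval censoring can occur when observing a value requires follow-ups or inspections: the value is known only to lie between two observation points. Left censoring (the value is known only to be below some limit) and right censoring (the value is known only to be above some limit) are special cases of interval censoring, with the beginning of the interval at zero or the end at infinity, respectively.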
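One way to see why left and right censoring are special cases is to store every observation as an interval. The sketch below assumes a non-negative quantity (so that zero is the natural lower bound, as in the text); the class name `CensoredObservation` and its `kind` method are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CensoredObservation:
    """A non-negative value known only to lie between lower and upper."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"   # known only to be at least `lower`
        if self.lower == 0.0:
            return "left-censored"    # known only to be at most `upper`
        return "interval-censored"


observations = [
    CensoredObservation(3.2, 3.2),        # observed exactly
    CensoredObservation(0.0, 1.5),        # below a limit of 1.5
    CensoredObservation(75.0, math.inf),  # at least 75
    CensoredObservation(4.0, 6.0),        # changed between two inspections
]
print([obs.kind() for obs in observations])
# ['exact', 'left-censored', 'right-censored', 'interval-censored']
```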

Estimation methods for left-censored data vary, and not every method is applicable to, or the most reliable for, every data set.[1]

Epidemiology

One of the earliest attempts to analyse a statistical problem involving censored data was Daniel Bernoulli's 1766 analysis of smallpox morbidity and mortality data to demonstrate the efficacy of inoculation.[2]

Operating life testing

Example of five replicate tests resulting in four failures and one suspended time.

Reliability testing often consists of conducting a test on an item (under specified conditions) to determine the time it takes for a failure to occur.

An analysis of the data from replicate tests includes both the times-to-failure for the items that failed and the time-of-test-termination for those that did not fail.
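As a rough sketch of how such replicate-test data might be recorded, each unit can carry its time on test together with a flag saying whether that time is a failure or a suspension. The record type `UnitRecord` and all the numbers below are made up for illustration.

```python
from typing import List, NamedTuple


class UnitRecord(NamedTuple):
    hours: float   # time on test when the unit failed or the test was stopped
    failed: bool   # False means the test was suspended (right-censored)


# Made-up data for illustration: four failures and one suspension.
records: List[UnitRecord] = [
    UnitRecord(410.0, True),
    UnitRecord(655.0, True),
    UnitRecord(820.0, True),
    UnitRecord(1000.0, False),  # still running when the test was terminated
    UnitRecord(930.0, True),
]

failure_times = [r.hours for r in records if r.failed]
suspension_times = [r.hours for r in records if not r.failed]
total_time_on_test = sum(r.hours for r in records)

print(len(failure_times), len(suspension_times))  # 4 1
print(total_time_on_test)                         # 3815.0
```

Keeping the suspended unit in the data set, flagged as such, is the point of the exercise: dropping it, or treating its 1000 hours as a failure time, would each tend to understate the life of the item.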

Analysis

Censored data call for special techniques, because neither discarding the censored observations nor treating their limits as exact values uses the available information correctly. Tests with specific failure times are coded as actual failures; censored data are coded with the type of censoring and the known interval or limit. Statistical software (often reliability oriented) can then use maximum likelihood estimation to produce summary statistics, confidence intervals, and so on.
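To give a feel for what a maximum likelihood fit does with censored data, here is a minimal sketch under a strong assumption that the source does not make: failure times follow an exponential distribution with a constant failure rate. Under that assumption a failure contributes its probability density to the likelihood, a suspension contributes only the probability of surviving past its time, and the maximiser has a closed form. The data and function names are made up for illustration.

```python
import math
from typing import List, Tuple

# (hours on test, failed?): made-up data; failed=False marks a suspended test.
data: List[Tuple[float, bool]] = [
    (410.0, True), (655.0, True), (820.0, True), (930.0, True), (1000.0, False),
]


def log_likelihood(rate: float, obs: List[Tuple[float, bool]]) -> float:
    """Exponential log-likelihood with right censoring.

    A failure at time t contributes the log density, log(rate) - rate * t.
    A suspension at time t contributes the log survival probability, -rate * t.
    """
    return sum((math.log(rate) if failed else 0.0) - rate * t for t, failed in obs)


def mle_rate(obs: List[Tuple[float, bool]]) -> float:
    """Closed-form maximiser: number of failures / total time on test."""
    n_failures = sum(1 for _, failed in obs if failed)
    total_time = sum(t for t, _ in obs)
    return n_failures / total_time


rate_hat = mle_rate(data)
print(round(1.0 / rate_hat, 2))  # estimated mean time to failure: 953.75 hours
```

The suspended unit adds its 1000 hours to the denominator without adding a failure to the numerator, which is how the estimate credits it for surviving. Other lifetime models, such as the Weibull distribution, generally have no closed form and are fitted numerically.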

References

  1. Helsel, D. (2010). "Much Ado About Next to Nothing: Incorporating Nondetects in Science". Ann. Occup. Hyg. 54 (3): 257–262.
  2. Bernoulli, D. (1766). "Essai d’une nouvelle analyse de la mortalité causée par la petite vérole". Mem. Math. Phy. Acad. Roy. Sci. Paris. Reprinted in Bradley (1971) 21 and Blower (2004).
