Description
The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.
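The endpoint claim holds for every algorithm selected by type, not just the default. The sketch below checks it for all nine types on a small made-up vector (the data values are illustrative only).

```r
# Illustrative data (made-up values)
x <- c(4.2, -1.5, 3.3, 10.0, 0.7, 6.1)

# For each type 1..9, compute the quantiles at p = 0 and p = 1
ends <- t(vapply(1:9,
                 function(type) unname(quantile(x, probs = c(0, 1), type = type)),
                 numeric(2)))
ends       # one row per type: column 1 should be min(x), column 2 max(x)
range(x)
```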
Usage
quantile(x, …)

# S3 method for default
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7, …)
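With the defaults above (probs = seq(0, 1, 0.25), type = 7) a call returns the minimum, the three quartiles and the maximum, labelled by percentage. A minimal sketch on the integers 1 to 9:

```r
# Defaults: probs = seq(0, 1, 0.25), type = 7, names = TRUE
q <- quantile(1:9)
q   # values 1, 3, 5, 7, 9 named "0%", "25%", "50%", "75%", "100%"
```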
Arguments
x
numeric vector whose sample quantiles are wanted, or an object of a class for which a method has been defined (see also ‘Details’). NA and NaN values are not allowed in numeric vectors unless na.rm is TRUE.
probs
numeric vector of probabilities with values in \([0,1]\). (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
na.rm
logical; if true, any NA and NaN's are removed from x before the quantiles are computed.
names
logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs.
type
an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
…
further arguments passed to or from other methods.
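The rules for x, probs, na.rm and names can be exercised with a short sketch. The data vectors are made up for illustration, and the error cases only capture that an error is raised (via conditionMessage), not its exact wording.

```r
# Made-up data containing one NA and one NaN
x <- c(2, NA, 4, 6, NaN)

# na.rm = FALSE (the default): missing values stop quantile() with an error
err_na <- tryCatch(quantile(x), error = conditionMessage)
# na.rm = TRUE: NA and NaN are dropped before the quantiles are computed
q_rm <- quantile(x, na.rm = TRUE)

# probs: a value just above 1 (within the 2e-14 tolerance) is moved to 1 ...
q_edge <- quantile(1:10, probs = 1 + 1e-15)
# ... while a clearly out-of-range value is rejected
err_probs <- tryCatch(quantile(1:10, probs = 1.01), error = conditionMessage)

# names = FALSE: the same numbers, but without a names attribute
y <- c(12, 7, 3, 18, 9, 15, 4)
p <- seq(0, 1, by = 0.1)
q_named   <- quantile(y, probs = p)
q_unnamed <- quantile(y, probs = p, names = FALSE)
```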
Types
quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.
All sample quantiles are defined as weighted averages of consecutive order statistics. Sample quantiles of type \(i\) are defined by:
$$Q_{i}(p) = (1 - \gamma)x_{j} + \gamma x_{j+1}$$
where \(1 \le i \le 9\), \(\frac{j - m}{n} \le p < \frac{j - m + 1}{n}\), \(x_{j}\) is the \(j\)th order statistic, \(n\) is the sample size, the value of \(\gamma\) is a function of \(j = \lfloor np + m\rfloor\) and \(g = np + m - j\), and \(m\) is a constant determined by the sample quantile type.
Discontinuous sample quantile types 1, 2, and 3
For types 1, 2 and 3, \(Q_i(p)\) is a discontinuous function of \(p\), with \(m = 0\) when \(i = 1\) and \(i = 2\), and \(m = -1/2\) when \(i = 3\).
Type 1
Inverse of empirical distribution function. \(\gamma = 0\) if \(g = 0\), and 1 otherwise.
Type 2
Similar to type 1 but with averaging at discontinuities. \(\gamma = 0.5\) if \(g = 0\), and 1 otherwise.
Type 3
SAS definition: nearest even order statistic. \(\gamma = 0\) if \(g = 0\) and \(j\) is even, and 1 otherwise.
Continuous sample quantile types 4 through 9
For types 4 through 9, \(Q_i(p)\) is a continuous function of \(p\), with \(\gamma = g\) and \(m\) given below. The sample quantiles can be obtained equivalently by linear interpolation between the points \((p_k, x_k)\), where \(x_k\) is the \(k\)th order statistic. Specific expressions for \(p_k\) are given below.
Type 4
\(m = 0\). \(p_k = \frac{k}{n}\). That is, linear interpolation of the empirical cdf.
Type 5
\(m = 1/2\). \(p_k = \frac{k - 0.5}{n}\). That is, a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.
Type 6
\(m = p\). \(p_k = \frac{k}{n + 1}\). Thus \(p_k = \mbox{E}[F(x_{k})]\). This is used by Minitab and by SPSS.
Type 7
\(m = 1 - p\). \(p_k = \frac{k - 1}{n - 1}\). In this case, \(p_k = \mbox{mode}[F(x_{k})]\). This is used by S.
Type 8
\(m = (p + 1)/3\). \(p_k = \frac{k - 1/3}{n + 1/3}\). Then \(p_k \approx \mbox{median}[F(x_{k})]\). The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x.
Type 9
\(m = p/4 + 3/8\). \(p_k = \frac{k - 3/8}{n + 1/4}\). The resulting quantile estimates are approximately unbiased for the expected order statistics if x is normally distributed.
Further details are provided in Hyndman and Fan (1996), who recommended type 8. The default method is type 7, as used by S and by R < 2.0.0.
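The two descriptions of the types (the general \(Q_i(p)\) formula with \(m\), \(j\), \(g\) and \(\gamma\), and for types 4 to 9 the interpolation through \((p_k, x_k)\)) can be transcribed almost literally. The sketch below is a hedged illustration, not how quantile() is implemented internally: hf_quantile and interp_quantile are hypothetical helper names, indices outside \(1, \dots, n\) are assumed to fall back to the extreme order statistics, and agreement with quantile() is only expected up to floating-point rounding (probabilities for which \(np + m\) lands within rounding error of an integer may differ for types 1 to 3).

```r
# Hypothetical helper: direct transcription of
# Q_i(p) = (1 - gamma) x_j + gamma x_{j+1} for a single probability p.
hf_quantile <- function(x, p, type) {
  x <- sort(x)
  n <- length(x)
  # m for each type, in the order listed above
  m <- switch(type,
              0, 0, -1/2,         # discontinuous types 1, 2, 3
              0, 1/2, p, 1 - p,   # continuous types 4, 5, 6, 7
              (p + 1) / 3,        # type 8
              p / 4 + 3 / 8)      # type 9
  j <- floor(n * p + m)
  g <- n * p + m - j
  gamma <- switch(type,
                  if (g == 0) 0 else 1,                 # type 1
                  if (g == 0) 0.5 else 1,               # type 2
                  if (g == 0 && j %% 2 == 0) 0 else 1,  # type 3
                  g, g, g, g, g, g)                     # types 4-9: gamma = g
  # Assumption: x_0 and x_{n+1} are replaced by the sample extremes
  xj  <- x[min(max(j, 1), n)]
  xj1 <- x[min(max(j + 1, 1), n)]
  (1 - gamma) * xj + gamma * xj1
}

# Hypothetical helper: types 4-9 as linear interpolation through (p_k, x_k)
interp_quantile <- function(x, probs, type) {
  stopifnot(type >= 4, type <= 9)
  x <- sort(x)
  n <- length(x)
  k <- seq_len(n)
  pk <- switch(type - 3,
               k / n,                  # type 4
               (k - 0.5) / n,          # type 5
               k / (n + 1),            # type 6
               (k - 1) / (n - 1),      # type 7
               (k - 1/3) / (n + 1/3),  # type 8
               (k - 3/8) / (n + 1/4))  # type 9
  # rule = 2 holds x_1 and x_n constant outside [p_1, p_n]
  approx(pk, x, xout = probs, rule = 2)$y
}

# Usage on made-up data: the three values should agree up to rounding
x <- c(7.1, 2.3, 9.8, 4.4, 5.0, 1.2, 8.6, 3.7, 6.9, 0.5)
hf_quantile(x, 0.33, type = 8)
interp_quantile(x, 0.33, type = 8)
quantile(x, 0.33, type = 8)
```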
Details
A vector of length length(probs) is returned; if names = TRUE, it has a names attribute.
NA and NaN values in probs are propagated to the result.
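A minimal sketch of both points: the result has one element per element of probs, and an NA probability yields NA in the corresponding position while the other probabilities are computed as usual (default type 7).

```r
# Two ordinary probabilities and one NA
q <- quantile(1:9, probs = c(0.1, NA, 0.9))
length(q)   # equals length(probs), i.e. 3
q           # the middle element is NA; the others are about 1.8 and 8.2
```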
The default method works with classed objects sufficiently like numeric vectors that sort and (not needed by types 1 and 3) addition of elements and multiplication by a number work correctly. Note that as this is in a namespace, the copy of sort in base will be used, not some S4 generic of that name. Also note that there is no check on the ‘correctly’, and so e.g. quantile can be applied to complex vectors which (apart from ties) will be ordered on their real parts.
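Since type 1 only needs sort, it illustrates the complex-vector case most directly. The sketch below uses made-up values whose real parts are distinct, so the ordering is determined by the real parts alone; it assumes the installed R version still accepts complex input, as this page describes.

```r
# Made-up complex values; real parts 3, 1, 2 (no ties)
z <- complex(real = c(3, 1, 2), imaginary = c(0, 5, -1))
sort(z)                                      # ordered on the real parts
med_z <- quantile(z, probs = 0.5, type = 1)  # type 1 needs only sort()
med_z
```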
There is a method for the date-time classes (see "POSIXt"). Types 1 and 3 can be used for class "Date" and for ordered factors.
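A short sketch of the Date and ordered-factor cases with type 1. The dates and factor levels are made up. For the ordered factor, only types 1 and 3 are accepted, so the call with the default type 7 is expected to fail; the sketch captures only the fact that an error occurs.

```r
# Made-up dates, 10 days apart
d <- as.Date("2024-01-01") + c(0, 10, 20, 30, 40)
med_d <- quantile(d, probs = 0.5, type = 1)   # result keeps class "Date"

# A made-up ordered factor
f <- factor(c("low", "mid", "high", "mid", "low"),
            levels = c("low", "mid", "high"), ordered = TRUE)
med_f <- quantile(f, probs = 0.5, type = 1)   # result is a level of f

# The default type 7 is not accepted for ordered factors
err_f <- tryCatch(quantile(f, probs = 0.5), error = conditionMessage)
```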
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages. American Statistician 50, 361-365. doi:10.2307/2684934.
See Also
ecdf for empirical distributions of which quantile is an inverse; boxplot.stats and fivenum for computing other versions of quartiles, etc.
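The inverse relationship with ecdf is exact for type 1: it returns the smallest order statistic \(x_{(k)}\) whose ecdf value \(k/n\) reaches \(p\), which for \(0 < p \le 1\) can be written as sort(x)[ceiling(n * p)]. That closed form is a restatement for illustration and may disagree with quantile() when \(np\) is within rounding error of an integer; the sample below is made up.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)   # made-up sample, n = 8
Fn <- ecdf(x)
p <- c(0.1, 0.25, 0.3, 0.5, 0.75, 0.9)

q1 <- unname(quantile(x, probs = p, type = 1))
# Inverse of the ecdf written directly: smallest x_(k) with k/n >= p
inv <- sort(x)[ceiling(length(x) * p)]
Fn(q1)   # each value is at least the corresponding p
```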
Examples
quantile(x <- rnorm(1001)) # Extremes & Quartiles by default
quantile(x,  probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA)/100)

### Compare different types
quantAll <- function(x, prob, ...)
  t(vapply(1:9, function(typ) quantile(x, probs = prob, type = typ, ...),
           quantile(x, probs = prob, type = 1)))
p <- c(0.1, 0.5, 1, 2, 5, 10, 50)/100
signif(quantAll(x, p), 4)

## for complex numbers:
z <- complex(real = x, imaginary = -10*x)
signif(quantAll(z, p), 4)