How to Easily Calculate Percentiles in R (With Examples) (2024)

The nth percentile of a dataset is the value that cuts off the first n percent of the data values when all of the values are sorted from least to greatest.

For example, the 90th percentile of a dataset is the value that cuts off the bottom 90% of the data values from the top 10% of data values.

One of the most commonly used percentiles is the 50th percentile, which represents the median value of a dataset: the value below which 50% of all data values fall.

Percentiles can be used to answer questions such as:

  • What score does a student need to earn on a particular test to be in the top 10% of scores? To answer this, we would find the 90th percentile of all scores, which is the value that separates the bottom 90% of values from the top 10%.
  • What heights encompass the middle 50% of heights for students at a particular school? To answer this, we would find the 75th percentile of heights and 25th percentile of heights, which are the two values that determine the upper and lower bounds for the middle 50% of heights.
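Both questions map onto R's quantile() function, which is covered in the next section. The following is a minimal sketch, assuming made-up `scores` and `heights` vectors that are not part of the original examples:

```r
# Hypothetical data for illustration only
scores  <- 1:101              # test scores 1, 2, ..., 101
heights <- seq(150, 190, 1)   # heights in cm: 150, 151, ..., 190

# Score needed to be in the top 10%: the 90th percentile
top10_cutoff <- quantile(scores, probs = 0.9)

# Bounds of the middle 50% of heights: the 25th and 75th percentiles
middle50 <- quantile(heights, probs = c(0.25, 0.75))

top10_cutoff   # 91
middle50       # 160 and 180
```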

How to Calculate Percentiles in R

We can easily calculate percentiles in R using the quantile() function, which uses the following syntax:

quantile(x, probs = seq(0, 1, 0.25))

where:

  • x: a numeric vector whose percentiles we wish to find.
  • probs: a numeric vector of probabilities in [0,1] that represent the percentiles we wish to find.
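As a quick sketch of how the two arguments behave, the example below uses a small illustrative vector (`x <- 1:9`, chosen here only because the results are easy to check by hand):

```r
x <- 1:9

# With probs left at its default, quantile() returns the
# minimum, the three quartiles, and the maximum
default_q <- quantile(x)
default_q
#   0%  25%  50%  75% 100% 
#    1    3    5    7    9 

# probs takes probabilities, so the 90th percentile is 0.9, not 90
p90 <- quantile(x, probs = 0.9)
p90   # 8.2
```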

The following examples show how to use this function in different scenarios.

Finding Percentiles of a Vector

The following code illustrates how to find various percentiles for a given vector in R:

#create vector of 100 random values uniformly distributed between 0 and 500
data <- runif(100, 0, 500)

#find the quartiles (25th, 50th, and 75th percentiles) of the vector
quantile(data, probs = c(.25, .5, .75))

#      25%       50%       75% 
# 97.78961 225.07593 356.47943 

#find the deciles (10th, 20th, 30th, ..., 90th percentiles) of the vector
quantile(data, probs = seq(.1, .9, by = .1))

#      10%       20%       30%       40%       50%       60%       70%       80% 
# 45.92510  87.16659 129.49574 178.27989 225.07593 300.79690 337.84393 386.36108 
#      90% 
#423.28070

#find the 37th, 53rd, and 87th percentiles
quantile(data, probs = c(.37, .53, .87))

#     37%      53%      87% 
#159.9561 239.8420 418.4787 
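Because runif() draws random values and the example does not set a seed, the numbers you get will differ from those shown. One way to make the results repeatable is to call set.seed() first. This is a sketch; the seed value 1 is an arbitrary choice:

```r
# Fix the random seed so runif() returns the same values on every run
set.seed(1)   # the seed value 1 is arbitrary
data <- runif(100, 0, 500)
first_run <- quantile(data, probs = c(.25, .5, .75))

# Resetting the same seed reproduces the vector and its quartiles
set.seed(1)
data_again <- runif(100, 0, 500)
second_run <- quantile(data_again, probs = c(.25, .5, .75))

identical(first_run, second_run)   # TRUE
```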

Finding Percentiles of a Data Frame Column

To illustrate how to find the percentiles of a specific data frame column, we’ll use the built-in dataset iris:

#view first six rows of iris dataset
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The following code shows how to find the 90th percentile value for the column Sepal.Length:

quantile(iris$Sepal.Length, probs = 0.9)

#90% 
#6.9 
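Real data frame columns often contain missing values, which iris does not. The sketch below appends an NA to the column (an artificial setup for illustration) to show that quantile() needs na.rm = TRUE in that case:

```r
# Append a missing value to the column to simulate incomplete data
sepal_with_na <- c(iris$Sepal.Length, NA)

# quantile() stops with an error on NA input by default
result <- tryCatch(
  quantile(sepal_with_na, probs = 0.9),
  error = function(e) "error"
)

# na.rm = TRUE drops missing values before computing the percentile
p90 <- quantile(sepal_with_na, probs = 0.9, na.rm = TRUE)
p90
```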

Finding Percentiles of Several Data Frame Columns

We can also find percentiles for several columns at once using the apply() function:

#define columns we want to find percentiles for
small_iris <- iris[ , c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')]

#use apply() function to find 90th percentile for every column
apply(small_iris, 2, function(x) quantile(x, probs = .9))

#Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#        6.90         3.61         5.80         2.20 
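As an alternative sketch, sapply() can do the same job without listing the column names by hand. The variable names `numeric_cols`, `p90`, and `p_multi` are introduced here for illustration:

```r
# Keep only the numeric columns (drops the Species factor)
numeric_cols <- iris[sapply(iris, is.numeric)]

# 90th percentile of each column; names = FALSE keeps the column
# names from being suffixed with "90%"
p90 <- sapply(numeric_cols, quantile, probs = .9, names = FALSE)

# Several percentiles at once return a matrix:
# one row per percentile, one column per variable
p_multi <- sapply(numeric_cols, quantile, probs = c(.1, .5, .9))
```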

Finding Percentiles by Group

We can also find percentiles by group in R using the group_by() function from the dplyr library.

The following code illustrates how to find the 90th percentile of Sepal.Length for each of the three species in the iris dataset:

#load dplyr library
library(dplyr)

#find 90th percentile of Sepal.Length for each of the three species
iris %>%
  group_by(Species) %>%
  summarise(percent90 = quantile(Sepal.Length, probs = .9))

# A tibble: 3 x 2
#  Species    percent90
#  <fct>          <dbl>
#1 setosa          5.41
#2 versicolor      6.7 
#3 virginica       7.61
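If dplyr is not available, base R's tapply() offers one possible alternative for a single column. This is a sketch rather than a replacement for the approach above:

```r
# Base R alternative: apply quantile() to Sepal.Length within each Species
p90_by_species <- tapply(iris$Sepal.Length, iris$Species,
                         quantile, probs = .9)
p90_by_species
```

The result is a named vector with one 90th percentile per species instead of a tibble.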

The following code illustrates how to find the 90th percentile for all of the variables by Species:

iris %>%
  group_by(Species) %>%
  summarise(percent90_SL = quantile(Sepal.Length, probs = .9),
            percent90_SW = quantile(Sepal.Width, probs = .9),
            percent90_PL = quantile(Petal.Length, probs = .9),
            percent90_PW = quantile(Petal.Width, probs = .9))

# A tibble: 3 x 5
#  Species    percent90_SL percent90_SW percent90_PL percent90_PW
#  <fct>             <dbl>        <dbl>        <dbl>        <dbl>
#1 setosa             5.41         3.9          1.7          0.4 
#2 versicolor         6.7          3.11         4.8          1.51
#3 virginica          7.61         3.31         6.31         2.4 
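Typing one line per variable gets tedious for wider data frames. As a base R sketch of the same grouped calculation, aggregate() with a formula can cover every remaining column at once:

```r
# Formula interface: "." means every column other than Species
p90_all <- aggregate(. ~ Species, data = iris,
                     FUN = quantile, probs = .9)
p90_all
```

Here the output columns keep their original names (Sepal.Length, Sepal.Width, and so on) rather than the percent90_ names used above.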

Visualizing Percentiles

There is no built-in function to visualize the percentiles of a dataset in R, but we can create a plot to visualize the percentiles relatively easily.

The following code illustrates how to create a plot of the percentiles for the data values of Sepal.Length from the iris dataset:

n = length(iris$Sepal.Length)
plot((1:n - 1)/(n - 1), sort(iris$Sepal.Length), type="l",
     main = "Visualizing Percentiles",
     xlab = "Percentile",
     ylab = "Value")

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Calculate Percentile Rank in R
How to Calculate Z-Scores in R
How to Calculate Relative Frequencies Using dplyr

FAQs

How to calculate percentiles using R?

In R, you can calculate percentiles using the quantile() function, which returns the value at a given percentile of a set of values. To go the other way and find the percentile rank of a particular value, the empirical cumulative distribution function (ecdf()) can be used.
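A brief sketch of how ecdf() and quantile() relate, using the iris column from earlier (the variable names are illustrative):

```r
# Build the empirical CDF of Sepal.Length; ecdf() returns a function
sepal_ecdf <- ecdf(iris$Sepal.Length)

# Proportion of values less than or equal to 6.9, i.e. its percentile rank
rank_69 <- sepal_ecdf(6.9)

# quantile() goes the other way: from a probability to a value
value_90 <- quantile(iris$Sepal.Length, probs = 0.9)
```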

What is the easiest way to calculate percentile?

The percentile rank of a data point is found with the equation P = n/N × 100%, where P is the percentile, n is the number of data points below the data point of interest, and N is the total number of data points in the data set.
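The equation can be sketched in R as follows. The helper name `percentile_rank` is hypothetical, and the data vector is borrowed from the 75th percentile example elsewhere in this FAQ:

```r
# Percentile rank of `value`: share of data points strictly below it
percentile_rank <- function(x, value) {
  n <- sum(x < value)   # number of data points below the value
  N <- length(x)        # total number of data points
  n / N * 100
}

x <- c(16, 25, 28, 32, 35, 38, 42)
percentile_rank(x, 38)   # 5 of 7 values are below 38: about 71.4
```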

How to calculate 95% quantile?

To calculate the 95th percentile value:
  1. Assuming that the number of data points is N in total, calculate K = N x 0.95.
  2. If K is not an integer, round it up.
  3. Arrange the data in ascending order from smallest to largest. The K-th value in the sorted list is the 95th percentile value.
This is the nearest-rank method. The quantile() function in R interpolates between values by default, so its result can differ slightly.
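The steps above can be sketched in R and compared with quantile(). The data vector is hypothetical, chosen so that K is not a whole number:

```r
x <- c(3, 8, 1, 9, 4, 7, 2, 10, 6, 5)   # hypothetical data, N = 10

N <- length(x)
K <- ceiling(N * 0.95)        # 9.5 rounded up to 10
nearest_rank <- sort(x)[K]    # K-th smallest value: 10

# type = 1 in quantile() uses the same no-interpolation rule here
type1 <- quantile(x, probs = 0.95, type = 1)

# The default (type = 7) interpolates between neighbours instead
type7 <- quantile(x, probs = 0.95)
```

With this data the manual method and type = 1 both give 10, while the default gives 9.55.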

How do you find the percentile of a normal distribution in R?

We obtain percentile values in R using the function qnorm. This function returns the value of the standard normal (by default) distribution corresponding to a given percentile. For example, qnorm(.5) returns 0, the median of the standard normal distribution.
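A short sketch of qnorm() in use. The mean of 100 and standard deviation of 15 in the last call are illustrative values, not taken from the text:

```r
# Median of the standard normal distribution
q50 <- qnorm(.5)      # 0

# 97.5th percentile of the standard normal, about 1.96
q975 <- qnorm(.975)

# mean and sd arguments select a different normal distribution;
# the values 100 and 15 are chosen only for illustration
q90_custom <- qnorm(.9, mean = 100, sd = 15)
```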

How do you work out percentages in R?

  1. Count the occurrences: Count the number of occurrences or instances within each group.
  2. Calculate the total: Find the total number of occurrences in the entire dataset.
  3. Calculate the percentage: Divide the count of each subgroup by the total count and multiply by 100 to get the percentage.
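These three steps might look like the following in base R, sketched here with the Species column of iris as the grouping variable:

```r
# Count the occurrences of each group
counts <- table(iris$Species)

# Total number of occurrences
total <- sum(counts)

# Percentage for each group
percentages <- counts / total * 100

# prop.table() performs the division in one step
percentages_alt <- prop.table(counts) * 100
```

Each species has 50 of the 150 rows, so every group comes out at one third of the total.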

Is quantile the same as percentile in R?

Percentiles are given as percent values, such as 95%, 40%, or 27%. Quantiles are given as decimal values, such as 0.95, 0.4, and 0.27. The 0.95 quantile point is exactly the same as the 95th percentile point. R's quantile() function takes its probs argument as quantile values between 0 and 1, not as percentages.

How to calculate the 75th percentile?

  1. To calculate the 75th percentile, first arrange the data set in ascending order as follows: 16, 25, 28, 32, 35, 38, 42.
  2. Calculate the position of the 75th percentile term by using the formula: P75 = (75/100) × (n + 1) ...
  3. The 6th term of the dataset is 38. So, the 75th percentile score is 38.
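The same calculation can be sketched in R. Note that quantile() does not use this (n + 1) rule by default. As far as the R documentation describes it, type = 6 corresponds to it, while the default type = 7 uses a different position and so gives a different answer for this data:

```r
x <- sort(c(16, 25, 28, 32, 35, 38, 42))
n <- length(x)

# Position of the 75th percentile under the (n + 1) rule
position <- 75 / 100 * (n + 1)   # 6
x[position]                       # 38

# quantile() type 6 is based on the same p * (n + 1) position
q_type6 <- quantile(x, probs = 0.75, type = 6)

# The default type 7 uses 1 + (n - 1) * p = 5.5, halfway between 35 and 38
q_type7 <- quantile(x, probs = 0.75)
```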

What is the general formula for percentile?

P = (n/N) × 100

where n = ordinal rank of the given value (or the number of values below it), N = number of values in the data set, and P = percentile.

How to calculate the 90th percentile?

Step 1: Place the sample results in ascending order (from lowest to highest value). Step 2: Assign each sample a number, 1 for lowest value. Step 3: Multiply the total number of samples by 0.9. The sample whose number matches this result holds the 90th percentile value.

How do you manually calculate quantile?

To calculate quartiles manually, sort the data and use the following formulas, which give the position of each quartile in the sorted list of n values:

  • First Quartile (Q1) = (n + 1) x 1/4
  • Second Quartile (Q2), or the median = (n + 1) x 2/4
  • Third Quartile (Q3) = (n + 1) x 3/4

What is the formula for quantile and percentile?

The quantile function is defined by the equation Q(p) = inf{x ∈ R : p ≤ F(x)}, where F is the cumulative distribution function. A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall.
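For sample data, one way to read this definition is to replace F with the empirical CDF and take the smallest observed value that reaches p. The sketch below does this with a hypothetical helper `Q` and compares it against quantile() with type = 1, which the R documentation describes as the inverse of the empirical distribution function:

```r
x <- c(16, 25, 28, 32, 35, 38, 42)
F_hat <- ecdf(x)   # empirical CDF standing in for F

# Q(p): smallest observed x whose CDF value is at least p
# (intended for 0 < p <= 1)
Q <- function(p) min(x[F_hat(x) >= p])

probs <- c(.1, .25, .5, .9)
manual  <- sapply(probs, Q)
builtin <- quantile(x, probs = probs, type = 1, names = FALSE)
```

For these probabilities both approaches return 16, 25, 32, and 42.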

What is the quantile function in R?

The quantile() function in R is used to compute sample quantiles of a data set for given probabilities in [0, 1]. For example, the first quartile is at 0.25 (25%), the second is at 0.50 (50%), and the third is at 0.75 (75%).

What is the percentile for dummies?

A percentile reveals the percentage of scores that a given score surpassed. For example, if you earned a 75 on your last math test and ranked in the 85th percentile, that means that your 75 was higher than 85% of the other scores.

How to get percentile from z-score in R?

To convert Z-scores to percentiles in R, use the pnorm() function, which returns the cumulative probability of a normal distribution at a given value. First calculate the Z-scores for your data points, then pass these Z-scores to pnorm() and multiply by 100 to get percentiles.
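A minimal sketch of that conversion, assuming a small made-up vector of scores and that a normal distribution is a reasonable model for them:

```r
x <- c(52, 61, 70, 75, 88)   # hypothetical scores

# Z-score: distance from the mean in standard deviations
z <- (x - mean(x)) / sd(x)

# pnorm() gives the proportion of a standard normal below each Z-score
percentiles <- pnorm(z) * 100

pnorm(0) * 100      # 50: a Z-score of 0 sits at the 50th percentile
pnorm(1.96) * 100   # about 97.5
```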

How do you generate percentile rank in R?

You can use the percent_rank() function from dplyr to get the percentile rank. In Exploratory, you can simply select 'Create Window Calculation' -> 'Rank' -> 'Percent Rank' from the menu of the column of interest ('numbers_per_k' in the source example). Once you run it, the calculation is done for each row.
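The dplyr documentation describes percent_rank() as rescaling the minimum rank to the range [0, 1]. The base R sketch below reproduces that arithmetic on hypothetical data with no missing values. It is an illustration of the idea, not dplyr's actual implementation:

```r
x <- c(10, 20, 20, 40)   # hypothetical values

# Minimum rank: tied values share the lowest rank (1, 2, 2, 4)
min_rank <- rank(x, ties.method = "min")

# Rescale ranks to the range [0, 1]
pct_rank <- (min_rank - 1) / (length(x) - 1)
pct_rank   # 0, 1/3, 1/3, 1
```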

What is the quantile command in R?

You can use the quantile() function to find quartiles in R. If your data is called “data”, then “quantile(data, prob=c(.25, .5, …

Article information

Author: Frankie Dare
