Student’s t-test for one sample

Keywords: #statistics

One of the most common statistical hypothesis tests, when we are analyzing populations, is the Student’s t-test. We can mainly use this test to evaluate a characteristic’s value in a population based on a sample. It’s important to highlight that the sample needs to follow a normal distribution for carrying out the Student’s t-test (also called normality). Let’s see three examples.

First example

Firstly, we load the data set employees_for_ttest that we can find in the package HRdatsets. This data set contains a sample with 550 observations.

library(tidyverse)

devtools::install_github("vicencfernandez/HRdatasets")
library(HRdatasets)
employees_for_ttest
## # A tibble: 550 x 6
##    employeeID gender salary20 salary21 performance20 performance21
##         <int> <fct>     <dbl>    <dbl>         <dbl>         <dbl>
##  1          1 female    1850.    1867.          58.9          63.0
##  2          2 female    2449.    2481.          73.0          73.8
##  3          3 female    2154.    2179.          91.2          87.9
##  4          4 female    2011.    2032.          55.9          60.7
##  5          5 female    1910.    1929.          69.6          71.2
##  6          6 female    2324.    2353.          72.3          73.3
##  7          7 female    1939.    1958.          79.8          79.1
##  8          8 female    1919.    1939.          67.5          69.6
##  9          9 female    1912.    1931.          96.4          91.8
## 10         10 female    2012.    2034.          68.8          70.6
## # … with 540 more rows

We want to know if the average salary of the female employees is equal to or different than 1,930 euros in 2021. So, the first step is to define the null and the alternative hypotheses:

  • Null Hypothesis: The average salary of female employees in 2021 is equal to 1,930 euros
  • Alternative Hypothesis: The average salary of female employees in 2021 is different than 1,930 euros

We can also write them mathematically.

  • H0: \(\mu_{salary} = 1930\)
  • H1: \(\mu_{salary} \neq 1930\)

The second step is to check the normality of the sample. Then, we can carry out a Shapiro-Wilk test to evaluate its distribution. Of course, when we have samples greater than 30, it’s unnecessary to analyze the normality due to the central limit theorem. However, here we are going to test it.

female_employees <- employees_for_ttest %>% filter(gender == "female")
female_employees$salary21 %>% shapiro.test()
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.99466, p-value = 0.5296

As the p-value (0.5296) is high (greater than 0.05), we cannot reject the hypothesis that the sample comes from a normal distribution population. So, we can accept that the population follows a normal distribution.

The third step is to set the maximum probability of error that we accept (significance level). In this example, we can use the 5% (it’s the most common value). Remember that we want to say something about the population based on a sample.

Finally, the fourth and last step is to carry out the Student t-test with the function t.test() and the following parameters:

  1. The data set - In this scenario, the data set is female_employess$salary21
  2. The value to compare: mu - In this case, it’s \(1930\)
  3. The type of the alternative hypothesis: alternative - We can face three types of alternative hypothesis: (1) different to (“two.sided”), (2) greater than (“greater”), and (3) less than (“less”). In this case, we have a ‘different to’ hypothesis.
  4. The confidence level of the interval: conf.level - Its value is one minus the significance level. In this case, the confidence level is \(0.95\).
female_employees$salary21 %>% t.test(mu = 1930, alternative = "two.sided", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  .
## t = 1.9721, df = 249, p-value = 0.0497
## alternative hypothesis: true mean is not equal to 1930
## 95 percent confidence interval:
##  1930.034 1980.719
## sample estimates:
## mean of x 
##  1955.376

The p-value tells us the probability of making a mistake if we reject the null hypothesis. As the p-value (0.0497) is less than our significance level (0.05), we can reject the null hypothesis, and so, we can accept the alternative hypothesis. So, the average salary of female employees in 2021 is different than 1,930 euros.

Second example

Let’s see another example with the same data set. Now, we want to test if the average salary of female employees in 2021 is greater than 1,930 euros. So, we define the following hypotheses:

  • H0: \(\mu_{salary} = 1930\)
  • H1: \(\mu_{salary} > 1930\)

Now, we can use again the function t.test(), but with the parameter alternative = "greater".

female_employees$salary21 %>% t.test(mu = 1930, alternative = "greater", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  .
## t = 1.9721, df = 249, p-value = 0.02485
## alternative hypothesis: true mean is greater than 1930
## 95 percent confidence interval:
##  1934.132      Inf
## sample estimates:
## mean of x 
##  1955.376

As the p-value (0.02485) is less than our significance level (0.05), we can reject the null hypothesis, and so, we can accept the alternative hypothesis. So, the average salary of female employees in 2021 is greater than 1,930 euros.

Third example

Let’s see the last example with the same data set. Now, we want to test if the average salary of female employees in 2021 is less than 1,930 euros. So, we define the following hypotheses:

  • H0: \(\mu_{salary} = 1930\)
  • H1: \(\mu_{salary} < 1930\)

Now, we can use again the function t.test(), but with the parameter alternative = "less".

female_employees$salary21 %>% t.test(mu = 1930, alternative = "less", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  .
## t = 1.9721, df = 249, p-value = 0.9752
## alternative hypothesis: true mean is less than 1930
## 95 percent confidence interval:
##     -Inf 1976.62
## sample estimates:
## mean of x 
##  1955.376

As the p-value (0.9752) is greater than our significance level (0.05), we can’t reject the null hypothesis, and so, we can’t accept the alternative hypothesis. So, we cannot say that the average salary of female employees in 2021 is less than 1,930 euros.