Two Sample z-test in R with examples

In this article, we will discuss how to do a two sample z-test in R with some practical examples.

What is Two sample z-test for mean?

A two sample z-test is used to determine whether there is a significant difference between the means of two populations, based on one sample drawn from each, when both population variances are known.

Conditions required to conduct two sample z-test for mean

Assumptions for Two Sample Mean z-test

  • Both samples should be drawn at random from their respective populations.
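  • The two samples should be independent of each other.
  • Both populations should follow a normal distribution.
  • Both population variances are known.
  • Both sample sizes should be greater than 30 (a common rule of thumb; with large samples the sample means are approximately normal even when the populations are not).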
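
Two of these assumptions can be checked roughly from the data. The sketch below uses only base R. The helper name check_sample, its min_n argument, and the simulated data are assumptions made for illustration and do not come from BSDA or from this article's examples. Randomness and independence depend on how the data were collected and cannot be verified this way.

```r
# Hypothetical helper: quick checks for sample size and normality.
check_sample <- function(x, min_n = 30) {
  list(
    n = length(x),
    # Rule-of-thumb check on the sample size
    large_enough = length(x) > min_n,
    # Shapiro-Wilk test: a small p-value suggests departure from normality
    shapiro_p = shapiro.test(x)$p.value
  )
}

# Simulated data for illustration only (not taken from the examples below)
set.seed(1)
sample_a <- rnorm(40, mean = 100, sd = 25)
check_sample(sample_a)
```

A normality test is only a rough guide; a histogram or a normal Q-Q plot (qqnorm) is often used alongside it.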

Hypothesis for the two sample z-test for mean

Let x̅1 denote the sample mean for a random sample from population 1.

x̅2 denotes the sample mean for a random sample from population 2.

µ1 denotes the mean for population 1

µ2 denotes the mean for population 2

Null Hypothesis:

H0 : µ1 = µ2 Both population means are equal.

Alternative Hypothesis: Three forms of alternative hypothesis are as follows:

  • Ha : µ1 – µ2 < 0 The difference between the two population means is less than 0, i.e. the mean for population 1 is less than the mean for population 2. It is called a lower tail test (left-tailed test).
  • Ha : µ1 – µ2 > 0 The difference between the two population means is greater than 0, i.e. the mean for population 1 is greater than the mean for population 2. It is called an upper tail test (right-tailed test).
  • Ha : µ1 – µ2 ≠ 0 The difference between the two population means is not equal to 0, i.e. the mean for population 1 is not equal to the mean for population 2. It is called a two tail test.

The test statistic for the two sample z-test is:

z = [ (x̅1 – x̅2) – (µ1 – µ2) ] / √(σ1²/n1 + σ2²/n2)

where :

x̅1 : sample mean of the sample from population 1

x̅2 : sample mean of the sample from population 2

µ1 : mean for population 1

µ2 : mean for population 2

n1 : size of the sample from population 1

n2 : size of the sample from population 2

σ1² : variance for population 1

σ2² : variance for population 2
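
As a rough illustration of the formula, the statistic can be computed directly in base R from summary values. The helper name two_sample_z_stat and its argument names are assumptions made for this sketch; they are not part of BSDA. The argument delta0 stands for the hypothesized difference µ1 – µ2, which is 0 under H0 : µ1 = µ2.

```r
# Hypothetical helper (not part of BSDA): two sample z statistic
# computed from summary values, following the formula above.
two_sample_z_stat <- function(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0 = 0) {
  # Standard error of (xbar1 - xbar2) when both population SDs are known
  se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)
  # Difference in sample means minus the hypothesized difference, over the SE
  ((xbar1 - xbar2) - delta0) / se
}

# Illustrative values only: se = sqrt(100/50 + 100/50) = 2, so z = 5 / 2 = 2.5
two_sample_z_stat(xbar1 = 105, xbar2 = 100, sigma1 = 10, sigma2 = 10, n1 = 50, n2 = 50)
```

Note that the helper takes standard deviations, not variances, which matches how sigma.x and sigma.y are specified in z.test().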

Function in R for z-test

The z.test() function from the BSDA package in R performs one-sample and two-sample z-tests for the mean.

Install BSDA for z-test for mean

If you don’t have the BSDA package installed, run the first command below in the R console to install it, then load it with library() before calling z.test().

install.packages("BSDA")
library(BSDA)

The z.test() function uses the following basic syntax:

z.test(x, y = NULL,
       alternative = "two.sided",   # or "greater" or "less"
       mu = 0,
       sigma.x = NULL,
       sigma.y = NULL,
       conf.level = 0.95
)

where :

x, y: Numeric vectors holding the two samples used in the test.

alternative: The alternative hypothesis for the test. It can be ‘greater’, ‘less’ or ‘two.sided’.

mu: The hypothesized value under the null hypothesis. For a two sample test, it is the hypothesized difference between the two population means (0 by default).

sigma.x: The population standard deviation for the x sample.

sigma.y: The population standard deviation for the y sample.

conf.level: The confidence level of the interval.

Summary for the Two Mean Z-test

| | Left-tailed Test | Right-tailed Test | Two-tailed Test |
|---|---|---|---|
| Null Hypothesis | H0 : µ1 – µ2 ≥ 0 | H0 : µ1 – µ2 ≤ 0 | H0 : µ1 – µ2 = 0 |
| Alternate Hypothesis | Ha : µ1 – µ2 < 0 | Ha : µ1 – µ2 > 0 | Ha : µ1 – µ2 ≠ 0 |
| Test Statistic | z = [ (x̅1 – x̅2) – (µ1 – µ2) ] / √(σ1²/n1 + σ2²/n2) | z = [ (x̅1 – x̅2) – (µ1 – µ2) ] / √(σ1²/n1 + σ2²/n2) | z = [ (x̅1 – x̅2) – (µ1 – µ2) ] / √(σ1²/n1 + σ2²/n2) |
| Decision Rule: p-value approach (where α is level of significance) | If p-value ≤ α then Reject H0 | If p-value ≤ α then Reject H0 | If p-value ≤ α then Reject H0 |
| Decision Rule: Critical-value approach | If z ≤ -zα then Reject H0 | If z ≥ zα then Reject H0 | If z ≤ -zα/2 or z ≥ zα/2 then Reject H0 |
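
The two decision rules summarized above can be sketched in base R with pnorm() and qnorm(). The helper name z_decision and the names of its list elements are assumptions made for this illustration; they are not part of BSDA.

```r
# Hypothetical helper (not part of BSDA): apply both decision rules
# to an already computed z statistic.
z_decision <- function(z, alpha = 0.05,
                       alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # p-value for the chosen tail of the standard normal distribution
  p_value <- switch(alternative,
    less      = pnorm(z),
    greater   = pnorm(z, lower.tail = FALSE),
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE)
  )
  # Critical value: z_alpha for one-tailed tests, z_(alpha/2) for two-tailed
  z_crit <- if (alternative == "two.sided") qnorm(1 - alpha / 2) else qnorm(1 - alpha)
  # Rejection region check for the critical-value approach
  in_rejection_region <- switch(alternative,
    less      = z <= -z_crit,
    greater   = z >= z_crit,
    two.sided = abs(z) >= z_crit
  )
  list(p_value = p_value, critical_value = z_crit,
       reject_by_p = p_value <= alpha, reject_by_z = in_rejection_region)
}

# Illustrative call: z = -2 in a two-tailed test at alpha = 0.05
z_decision(-2, alpha = 0.05, alternative = "two.sided")
```

Both approaches lead to the same decision for a given z and α, so either one can be used.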

How to do two sample z-test for mean in R?

We will calculate the test statistic by using a two sample z-test for the mean.

Procedure for Two Sample Z-Test for mean

Step 1: Define the Null Hypothesis and Alternate Hypothesis.

Step 2: Decide the level of significance α (alpha).

Step 3: Calculate the test statistic using the z.test() function from R.

Step 4: Interpret the two sample z-test results.

Step 5: Determine the rejection criterion for the chosen level of significance and conclude whether the test statistic lies in the rejection region or the non-rejection region.

Let’s see practical examples that show how to use the z.test() function in R.

Example for Two Sample Z-Test

Example 1: Two-tailed test in R with known equal variance.

Suppose the IQ levels of boys and girls in 10th class are known to be normally distributed, each with a population standard deviation of 25.

A teacher wants to know whether the mean IQ levels of boys and girls in the class are different, so she selects two random samples, one of boys and one of girls, each of size 40, and records their IQ levels.

Let’s perform the two sample z-test in R to determine whether the mean IQ level differs between boys and girls at the 5% level of significance.

Solution : Given data:

sample size for boys (n1) = 40

sample size for girls (n2) = 40

Population standard deviation for boys (σ1) = 25

Population standard deviation for girls (σ2) = 25

Now we will solve this example with the step-by-step procedure.

Step 1: Define the Null Hypothesis and Alternate Hypothesis.

µ1 denotes the mean for boys

µ2 denotes the mean for girls

Null Hypothesis: The mean IQ levels of boys and girls are equal.

H0 : µ1 = µ2

Alternate Hypothesis : The mean IQ levels of boys and girls are not equal.

Ha : µ1 ≠ µ2

Step 2: level of significance (α) = 0.05

Step 3: Calculate the test statistic using the z.test() function in R with the code below.

# Load the BSDA package, which provides z.test()
library(BSDA)

# Define the datasets for boys and girls

boys_dataset <- c( 79,118,99,117,98,102,112,111,102,73,114,97,114,122,90,115,84,105,84,126,
                  83,96,111,151,147,103,104,118,132,108,95,118,121,88,94,92,94,109,105,123)
girls_dataset <- c( 99,128,89,107,99,104,119,112,105,93,84,91,113,129,100,105,94,115,114,106,
                   113,116,116,131,117,123,134,128,112,101,105,89,101,118,124,72,104,119,145,133)

# Perform the two sample z-test (sigma.x and sigma.y are population standard deviations)

z.test(x = boys_dataset, y = girls_dataset, mu = 0, sigma.x = 25, sigma.y = 25, alternative = "two.sided")

Specify the alternative hypothesis as “two.sided” because we are performing a two-tailed test. The results are as follows.

#Results
Two-sample z-Test

data:  boys_dataset and girls_dataset
z = -0.68424, p-value = 0.4938
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.781532   7.131532
sample estimates:
mean of x mean of y 
  106.350   110.175 

Step 4: Interpret the two sample test results.

How to interpret two sample z-test results in R?

Let’s see the interpretation of z-test results in R.

data: This gives information about the vectors used in the z-test. x represents the data set for boys and y represents the data set for girls.

z: It is the test statistic of the z-test. In our case, test statistic = -0.68424.

p-value: This is the p-value corresponding to the test statistic. In our case, the p-value is 0.4938.

alternative: It is the alternative hypothesis used for the z-test. In our case, the alternative hypothesis is that the mean IQ levels of boys and girls are not equal, i.e. two-tailed.

95 percent confidence interval: This gives us a 95% confidence interval for the true difference in means (µ1 – µ2). Here the 95% confidence interval is [-14.781532, 7.131532].

sample estimates: It gives the sample means. In our case, the sample mean for boys = 106.350 and the sample mean for girls = 110.175.
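
As a cross-check, the reported numbers can be reproduced in base R from the sample means, the known population standard deviation, and the sample sizes. This is only a sketch of the arithmetic, not a description of how z.test() is implemented internally; the variable names are chosen for illustration.

```r
# Values taken from the example: sample means from the z.test() output,
# known population standard deviation and sample size from the problem.
xbar_boys  <- 106.350
xbar_girls <- 110.175
sigma <- 25
n <- 40

# Standard error of the difference in sample means
se <- sqrt(sigma^2 / n + sigma^2 / n)

# Test statistic under H0: mu1 - mu2 = 0
z <- (xbar_boys - xbar_girls) / se

# Two-tailed p-value
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# 95% confidence interval for mu1 - mu2
ci <- (xbar_boys - xbar_girls) + c(-1, 1) * qnorm(0.975) * se

round(z, 5)        # should agree with z = -0.68424 reported above
round(p_value, 4)  # should agree with p-value = 0.4938
round(ci, 6)       # should agree with the reported interval
```

Because the interval contains 0, it is consistent with a two-tailed test that does not reject H0 at the 5% level.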

Step 5: Determine the rejection criterion for the chosen level of significance and conclude whether the test statistic lies in the rejection region or the non-rejection region.

Conclusion:

Since the p-value [0.4938] is greater than the level of significance (α) = 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that the mean IQ levels of boys and girls in 10th class are different.

Example 2: Left-tailed test in R with known unequal variance.

Two independent populations are taken from two shops in a small town. Shop A sells “traditional” lime juice, while shop B sells “Special” Mojito. We select a random sample of 35 days of sales for each drink (shop) and record the sales to determine whether sales of “Special” Mojito outperformed sales of “traditional” lime juice at the 5% level of significance. The population standard deviation is 15 for lime juice sales and 12 for Mojito sales.

Solution : Given data:

sample size for lime juice sales (n1) = 35

sample size for Mojito sales (n2) = 35

Population standard deviation for lime juice sales (σ1) = 15

Population standard deviation for Mojito sales (σ2) = 12

Let’s perform the z-test for this example with the step-by-step procedure.

Step 1: Define the Null Hypothesis and Alternate Hypothesis.

µ1 denotes the mean for lime juice sales

µ2 denotes the mean for Mojito sales

Null Hypothesis: The mean sales of lime juice and Mojito are equal.

H0 : µ1 = µ2

Alternate Hypothesis : The mean sales of Mojito are greater than the mean sales of lime juice.

Ha : µ1 – µ2 < 0 i.e. µ2 > µ1

Step 2: level of significance (α) = 0.05

Step 3: Calculate the test statistic using the z.test() function in R with the code below.

# z.test() comes from the BSDA package; load it first with library(BSDA) if needed

# Define the datasets for both drinks

lime_juice_sales <- c(56,65,37,47,66,76,75,31,80,45,34,42,42,23,67,47,
45,50,45,42,59,34,50,48,65,41,53,41,36,39,51,69,30,52,42)

mojito_sales <- c(51,47,53,40,70,49,63,71,47,62,65,62,56,74,49,33,80,60,46,
65,48,61,54,67,65,48,46,66,52,65,62,59,63,44,50)

# Perform the two sample z-test (sigma.x and sigma.y are population standard deviations)

z.test(x = lime_juice_sales, y = mojito_sales, mu = 0, sigma.x = 15, sigma.y = 12, alternative = "less")

Specify the alternative hypothesis as “less” because we are performing a left-tailed test. The results are as follows.

#Results
Two-sample z-Test

data:  lime_juice_sales and mojito_sales
z = -2.3582, p-value = 0.009181
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
        NA -2.316342
sample estimates:
mean of x mean of y 
 49.28571  56.94286 

Step 4: Interpret the two sample test results.

How to interpret two sample z-test results in R?

Let’s see the interpretation of z-test results in R.

data: This gives information about the vectors used in the z-test. x represents the data set for lime juice sales and y represents the data set for mojito sales.

z: It is the test statistic of the z-test. In our case, test statistic = -2.3582.

p-value: This is the p-value corresponding to the test statistic. In our case, the p-value is 0.009181.

alternative: It is the alternative hypothesis used for the z-test. In our case, the alternative hypothesis is that the mean sales of Mojito are greater than the mean sales of lime juice, i.e. left-tailed.

95 percent confidence interval: This gives us a 95% confidence interval for the true difference in means (µ1 – µ2). Because the test is left-tailed, the interval is one-sided: the lower bound is reported as NA (unbounded) and the upper bound is -2.316342.

sample estimates: It gives the sample means. In our case, the sample mean for lime juice sales = 49.28571 and the sample mean for mojito sales = 56.94286.

Step 5: Determine the rejection criterion for the chosen level of significance and conclude whether the test statistic lies in the rejection region or the non-rejection region.

Conclusion:

Since the p-value [0.009181] is less than the level of significance (α) = 0.05, we reject the null hypothesis.

This means we have sufficient evidence to say that sales of the “Special” Mojito outperformed sales of the “traditional” lime juice in the town.
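
The same conclusion can be reached with the critical-value approach. The base R sketch below recomputes the statistic from the sample means shown in the output and compares it with the left-tailed critical value. The variable names are chosen for illustration, and small differences in the last digits can arise because the sample means are rounded.

```r
# Values taken from the example: sample means from the z.test() output,
# known population standard deviations and sample size from the problem.
xbar_lime    <- 49.28571
xbar_mojito  <- 56.94286
sigma_lime   <- 15
sigma_mojito <- 12
n <- 35
alpha <- 0.05

# Standard error and test statistic under H0: mu1 - mu2 = 0
se <- sqrt(sigma_lime^2 / n + sigma_mojito^2 / n)
z <- (xbar_lime - xbar_mojito) / se

# Left-tailed p-value: P(Z <= z)
p_value <- pnorm(z)

# Left-tailed critical value: reject H0 when z <= -z_alpha
z_crit <- -qnorm(1 - alpha)

# Upper bound of the one-sided 95% confidence interval for mu1 - mu2
upper <- (xbar_lime - xbar_mojito) + qnorm(1 - alpha) * se

z <= z_crit   # TRUE means z falls in the rejection region
```

Here the critical value is about -1.645, and the test statistic of about -2.358 lies below it, so z falls in the rejection region.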

Two Sample Z-test FAQ

Which test statistic is used to test a hypothesis about the difference between two population means?

We use the two sample z-test to test a hypothesis about the difference between two population means when both population variances are known.

What are the types of two sample z tests?

There are two types of two sample z-tests:
1- Two sample z hypothesis test for known equal variances
2- Two sample z hypothesis test for known unequal variances

Why do we need a two sample z-test for means?

The two sample z-test is used to compare two population means when both population variances are known.

Summary

I hope you found the above article on two sample z-test in R with Examples informative and educational.
