Background:
One of the basic concepts in science is testing a hypothesis. A hypothesis is a
proposed explanation for a phenomenon based on observations. Researchers often test a hypothesis
in order to make inferences about a population from a sample. This is called hypothesis testing, which is the focus of this exercise. Hypothesis testing uses two hypotheses:
the null hypothesis and the alternative hypothesis. The null
hypothesis states that there is no difference between the sample mean and
the hypothesized mean; it is either rejected or the researcher fails to reject it. The alternative hypothesis states that
there is a difference between the sample mean and the hypothesized mean. To test hypotheses, there are two main tests:
z tests and t tests. A z test compares a sample mean to a hypothesized
mean using the normal distribution and is used when the sample is
larger than 30. A t test makes the same comparison using the t distribution,
which depends on the degrees of freedom, and is used when the sample is smaller than 30. Degrees of freedom are the sample size
minus one, which adjusts for the extra uncertainty of a smaller sample. The goal of
both of these tests is to determine if there is a difference between the sample
and hypothesized mean. The sample mean
is the average of the sample data, and the hypothesized mean is the population average that the sample mean is compared to. Both tests merely tell the reader
whether there is a difference between the sample mean and the hypothesized
mean. This is important because if there
is a difference, further analyses can be done to explain
why the result is different, what causes the difference, or what the
implications of the difference are.
This lab has two parts. In part one question one, a basic table is filled
out to show how significance levels, the choice of a z or t test, and z or t values are
determined. The significance level is 100 minus the confidence level,
divided by 100. For a two tailed test (both sides of
the normal distribution curve tested), this value is divided by 2 to give the area in each tail; for a one tailed test it is kept as is. The value at which that tail area begins is the critical value, also called the z or t
value. The confidence interval is
the range of values the true value likely falls within. For question two, a hypothesis test is
conducted on three different crop yield means using data from a Department of
Agriculture and Live Stock Development organization in Kenya and survey results
of farmers in Kenya to determine if yields in a certain district approach the
country averages for yields. Question three
looks at levels of a particular stream pollutant using a hypothesis test.
For part two, two
shapefiles, block groups for the City of Eau Claire and block groups for all of Eau
Claire County, are used to decide if the average value of homes for the City of
Eau Claire block groups is significantly different from the average value of
homes for the Eau Claire County block groups.
Methods:
For part one question one, the significance level, the choice of a z or t
test, and the z or t value were recorded based on the information
given. The significance level is found
by subtracting the confidence level from 100 and then dividing by 100.
This is the tail area for a one tailed test. For a two tailed test, the resulting number
is divided by two because the test is conducted on both ends of the
normal distribution. To determine whether a z
or t test should be used, the n value was taken into account. If n is greater than 30, a z test is
used. If n is less than 30, a t test is
used. Finally, the z and t values, or
critical values, were found by consulting z and t tables of critical values
for each significance level.
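The steps above can be sketched in Python. This is only an illustration and not how the lab was done (the critical values were read from printed tables). The function names are made up for this post, and treating a sample of exactly 30 as a z test is an assumption, since the rule above only covers samples larger or smaller than 30. Only z critical values are computed because Python's standard library has a normal distribution but no t distribution.

```python
from statistics import NormalDist


def significance_level(confidence_level_pct):
    """Significance level from a confidence level given in percent."""
    return (100 - confidence_level_pct) / 100


def tail_area(confidence_level_pct, two_tailed):
    """Area in each rejection tail: halved for a two tailed test."""
    alpha = significance_level(confidence_level_pct)
    return alpha / 2 if two_tailed else alpha


def choose_test(n):
    """Rule of thumb from the lab: t test for small samples, z test otherwise.

    Assumption: a sample of exactly 30 is treated as a z test.
    """
    return "z" if n >= 30 else "t"


def z_critical(confidence_level_pct, two_tailed):
    """Positive z value at which the rejection tail begins."""
    return NormalDist().inv_cdf(1 - tail_area(confidence_level_pct, two_tailed))


print(choose_test(23))                   # t (sample size from the crop question)
print(round(tail_area(95, True), 3))     # 0.025
print(round(z_critical(95, True), 2))    # 1.96
print(round(z_critical(95, False), 2))   # 1.64
```

For a t test, the same tail area would be looked up in a t table using n minus one degrees of freedom.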
For part one question two, the null and alternative
hypotheses for ground nut, cassava, and bean yields were first stated to frame the
question. For each crop, the null
hypothesis states that there is no difference between the sample yield and
the country average yield for that crop, and the alternative
hypothesis states that there is a difference between the two. For all three crops, a two tailed t test with a 95% confidence level
and a significance level of 0.05 (0.025 in each tail) was used to test the hypotheses because the
sample size (23) is less than 30. The
results for all three crops are given below in figure 1.
Figure 1. Part 1 question 2: hypothesis tests of ground nuts, cassava, and beans' yields in a certain district compared to the country of Kenya.
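The t value in a test like the ones in figure 1 comes from the standard one-sample formula: the difference between the sample mean and the hypothesized mean, divided by the standard error (the sample standard deviation over the square root of n). A minimal sketch follows. The function name is made up, and the yields and standard deviation are hypothetical numbers chosen only for illustration; they are not the Kenya survey data. Only the sample size of 23 comes from the lab.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    standard_error = sample_sd / math.sqrt(n)
    t_value = (sample_mean - hypothesized_mean) / standard_error
    return t_value, n - 1


# Hypothetical inputs for illustration only (not the survey results).
t_value, df = one_sample_t(
    sample_mean=0.48,        # hypothetical district yield
    hypothesized_mean=0.50,  # hypothetical country average
    sample_sd=0.10,          # hypothetical sample standard deviation
    n=23,                    # sample size used in the lab
)
print(f"t = {t_value:.2f} with {df} degrees of freedom")
```

With this formula, a negative t value means the sample mean is below the hypothesized mean and a positive one means it is above.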
For part one question three, the null hypothesis states that there
is no difference between the allowable limit of 4.4 mg/l of a stream pollutant
and the sample mean pollutant level of 6.8 mg/l. Because the question is whether the pollutant level is higher than the limit, the alternative hypothesis states that
the sample mean pollutant level of 6.8 mg/l is higher than
the allowable limit of 4.4 mg/l. The sample size is only 17, so a one tailed t test
is used with a 95% confidence level and a significance level of
0.05. The methods and results of the test are shown
below in figure 2.
For part two, the average value of homes in the City of Eau
Claire, the average value of homes in Eau Claire County, the standard deviation of the
average value of homes in the City of Eau Claire, and the number of block
groups in the City of Eau Claire were all obtained from the shapefiles in
ArcMap. Then, all these values were used
in a two tailed z test with a 95% confidence level and a 0.05 significance level (0.025 in each tail). The results are shown in figure 2 above.
Results:
Figure 3. Part 1 question 1: significance levels, z or t determinations, and z or t values for given interval types and confidence levels.
Figure 3 shows the results from part 1 question 1 in a table
format. The last column demonstrates that the z or t value varies depending on
the significance level that is given for either test.
Figure 1 (methods section) shows the t tests for the
three crops grown in a certain district of Kenya, including the critical values for
each hypothesis test, which came out to -2.07 and +2.07. The t value for ground nuts was -0.64 and the probability of this score was 26.4%. For ground nuts, we fail to reject the null
hypothesis because the t value of -0.64 does not fall below the critical value of
-2.07 and the probability is larger than the 2.5% cutoff for the lower tail. This means that there is no significant
difference between the sample yield of ground nuts and the country average
ground nut yield. The t value for cassava was -2.59 and the probability of this score was 0.84%. For cassava, we
reject the null hypothesis because the t value of -2.59 falls below the critical
value of -2.07 and the probability of 0.84% is lower than the 2.5% cutoff. This means there is a significant
difference between the sample yield of cassava and the country average
cassava yield, and because the t value is negative, the sample mean for cassava is lower than the country average. The t value for beans was 1.84 and the probability of this score was 96.03%. For beans, we also fail
to reject the null hypothesis because the t value of 1.84 does not exceed
2.07 and the probability of 96.03% is not beyond the 97.5% cutoff for the upper tail. This means that there is no significant
difference between the sample yield of beans and the country average bean
yield. Out of the three crops, only
cassava failed to approach the country average for yield. Ground nuts and beans
both showed no significant difference between the sample mean and country mean yields, so they
adhered to the estimation of the Department of Agriculture and Live Stock
Development organization in Kenya that yields in this district should approach the country averages.
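The probabilities quoted for each t value are cumulative probabilities from the t distribution with 22 degrees of freedom. As a rough check, they can be approximated with only the Python standard library by integrating the t density numerically. This is a sketch under assumptions and not the tool used in the lab: the function names are made up, and Simpson's rule with 2000 steps is simply one convenient way to do the integration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), found by integrating the density from 0 to |t|.

    Uses Simpson's rule, so steps must be even. The distribution is
    symmetric, so half of the probability lies below zero.
    """
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


df = 23 - 1  # sample size of 23 from the lab
for crop, t_value in [("ground nuts", -0.64), ("cassava", -2.59), ("beans", 1.84)]:
    p = t_cdf(t_value, df)
    reject = p < 0.025 or p > 0.975  # two tailed test, 2.5% in each tail
    print(f"{crop}: cumulative probability {p:.4f}, reject null: {reject}")
```

The printed probabilities should land close to the percentages reported above, and only cassava should fall in a rejection tail.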
Figure 2 (methods section) shows the results of the t test to determine if
pollutant levels in a stream are significantly higher than the allowable
limit. The critical value for the
calculation is 1.75, the t value is 2.36, and the probability value for this t value is 98.6%. With a t value of 2.36 and a probability of 98.6%, we reject the null hypothesis because the t value is larger than the critical value and the probability is above the 95% cutoff for a one tailed test.
Thus there is a significant difference between the allowable limit of 4.4 mg/l of a
stream pollutant and the sample mean pollutant level of 6.8 mg/l. The sample mean of the pollutant level is higher than the allowable limit, so the researcher can advocate for
measures to be taken to reduce the level of pollutant.
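The decision rule differs slightly between the one tailed pollutant test and the two tailed crop tests. A small sketch of both rules is below, using the t values and critical values reported in this post. The function name and the tail labels are made up for illustration.

```python
def reject_null(test_value, critical_value, tail):
    """Apply the decision rule; critical_value is the positive table value."""
    if tail == "two":
        return abs(test_value) > critical_value
    if tail == "upper":
        return test_value > critical_value
    if tail == "lower":
        return test_value < -critical_value
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


print(reject_null(2.36, 1.75, "upper"))  # stream pollutant: True
print(reject_null(-2.59, 2.07, "two"))   # cassava: True
print(reject_null(1.84, 2.07, "two"))    # beans: False
```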
Figure 4. Map of the average value of homes in Eau Claire County and the City of Eau Claire block groups.
Figure 4 shows a map of the average value of homes for the
City of Eau Claire and Eau Claire County block groups.
Based on the results of the z test in figure 2, the z value of -2.57 is
smaller than the critical value of -1.96 and thus we reject the null
hypothesis. Therefore, there is a
significant difference between the average values of homes in the City of Eau
Claire and the average value of homes in Eau Claire County. Based on
the means for the city and the county, the homes in the City of Eau Claire have a lower average value than
the homes in all of Eau Claire County. The
map supports this analysis because some block groups inside the City of Eau
Claire are lighter purple than the block groups outside of the city, which
denotes a lower home value. There are
also more dark purple block groups outside of the City of Eau Claire than
inside the city. The z test provides quantitative support for this visual
trend seen in the map.
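As a quick check on the z test result, the critical value and a two tailed probability for the reported z value of -2.57 can be computed with Python's standard library. This is only a sketch: the variable names are made up, and the lab itself used a table of critical values.

```python
from statistics import NormalDist

z_value = -2.57  # z value reported for the city vs. county comparison

standard_normal = NormalDist()
critical = standard_normal.inv_cdf(0.975)              # two tailed, 2.5% in each tail
p_two_tailed = 2 * standard_normal.cdf(-abs(z_value))  # area in both tails beyond |z|

print(round(critical, 2))       # 1.96
print(round(p_two_tailed, 3))   # 0.01
print(abs(z_value) > critical)  # True, so the null hypothesis is rejected
```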
Discussion:
Z and t tests are a simple way to determine if a
sample mean differs from the population mean.
Z tests are suited to large samples (greater than 30) and t
tests are best used with samples smaller than 30.
What makes t tests work well with small samples is their dependence on
degrees of freedom. This accounts for some
of the bias that can occur with small sample sizes, such as the influence of
outliers. Although some bias remains, t
tests are still a great tool to use, especially when the sample size is
small. There are limitations to both of
these tests. Z and t tests merely
determine if a sample mean differs from the population mean. Further analyses are needed to infer more from
the data, such as why the sample mean differs, where the sample mean is different, and what factors caused it to differ. These tests are, however, a good
start for determining if results obtained from a sample are significant enough
for further analysis, and they are widely applicable, as shown by the crop yield and
stream pollution examples in part 1 and the home value question in part 2.
Sources:
Definitions of statistical concepts and shapefiles were
provided by Ryan Weichelt of the University of Wisconsin-Eau Claire.