ANOVA: Test of Normality of the Data

Introduction
Test of reliability of the instrument
Descriptive statistical test
Inferential statistical test (Analysis of Variance)
Main effect
Effect size
Post hoc tests
References

Introduction

Most statistical techniques assume that distributions of scores on dependent variables are normal. In this case, a normal distribution represents a symmetrical, bell-shaped curve, which has the greatest frequency of scores in the middle, with smaller frequencies towards the extremes (Gravetter and Wallnau, 2000; Everitt, 1996).

Skewness indicates the symmetry, or lack of it, in the distribution of scores (Kline, 1986). A distribution can be skewed to the right (positive) or to the left (negative), and right skew is the more common. Some authors have argued that skewness should fall within the range of +2 to -2 to indicate a normal distribution. However, other statisticians apply a more stringent rule of +1 to -1 to assess normality. Under the latter rule, the test for normality is stricter. Positive skewness shows scores clustering to the left, at the low values. Conversely, negative skewness shows scores clustering at the high end, on the right-hand side of the chart. Tabachnick and Fidell argue that when samples are reasonably large, skewness will not “make a substantive difference in analysis” (Tabachnick and Fidell, 2001).

Kurtosis indicates the peakedness of a distribution. Positive kurtosis shows scores clustering at the center with long, thin tails. Kurtosis values below zero show a relatively flat distribution, i.e. one with many cases in the extremes.

Statistics
Gender of respondents
N	Valid	334
	Missing	0
Mean		1.40
Median		1.40^a
Mode		1
Std. Deviation		.490
Skewness		.431
Std. Error of Skewness		.133
Kurtosis		-1.826
Std. Error of Kurtosis		.266

a. Calculated from grouped data.

The histogram indicates positive skewness: most of the respondents were male (coded 1), so scores cluster at the low value. The skewness is positive at 0.431, while kurtosis is -1.826. According to James Dean Brown, normal distributions result in a skewness value of about zero (Brown, 1997). Thus, a skewness value of 0.431 is acceptable for a normal distribution because it is close to zero, and the difference could happen by chance. It is also within the range of +2 to -2. The kurtosis of a normal distribution is zero. A kurtosis value of -1.826 shows a relatively flat distribution. However, this value is also within the acceptable range.
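The reported skewness and kurtosis for gender can be checked by hand. The sketch below is illustrative only: it assumes the coding 1 = male and 2 = female, takes the counts (202 and 132) from the gender frequency table reported later in this paper, and assumes that SPSS reports the bias-adjusted sample skewness and excess kurtosis with their usual standard errors.

```python
import math

# Assumed coding: 1 = male (202 respondents), 2 = female (132 respondents).
scores = [1] * 202 + [2] * 132
n = len(scores)
mean = sum(scores) / n

def central_moment(k):
    """k-th central moment using the divide-by-n convention."""
    return sum((x - mean) ** k for x in scores) / n

m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

# Moment-based skewness and excess kurtosis.
g1 = m3 / m2 ** 1.5
g2 = m4 / m2 ** 2 - 3

# Bias-adjusted versions (assumed to be the ones SPSS reports).
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# The standard errors depend only on the sample size.
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

print(f"skewness = {skewness:.3f} (SE {se_skew:.3f})")
print(f"kurtosis = {kurtosis:.3f} (SE {se_kurt:.3f})")
```

Under these assumptions, the results agree with the tabulated values to about three decimal places. Because the standard errors depend only on N, they are identical for all five variables.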

Statistics
Age of respondents
N	Valid	334
	Missing	0
Mean		1.92
Median		1.83^a
Mode		2
Std. Deviation		.835
Skewness		1.082
Std. Error of Skewness		.133
Kurtosis		1.676
Std. Error of Kurtosis		.266
a. Calculated from grouped data.

Statistics
Education level of respondents
N	Valid	334
	Missing	0
Mean		1.51
Median		1.49^a
Mode		1
Std. Deviation		.604
Skewness		.724
Std. Error of Skewness		.133
Kurtosis		-.439
Std. Error of Kurtosis		.266
a. Calculated from grouped data.

Statistics
Tribal affiliation of respondents
N	Valid	334
	Missing	0
Mean		2.35
Median		2.17^a
Mode		1
Std. Deviation		1.287
Skewness		.937
Std. Error of Skewness		.133
Kurtosis		.450
Std. Error of Kurtosis		.266
a. Calculated from grouped data.

Statistics
Areas of residence of respondents
N	Valid	334
	Missing	0
Mean		1.99
Median		1.98^a
Mode		1
Std. Deviation		.872
Skewness		.029
Std. Error of Skewness		.133
Kurtosis		-1.686
Std. Error of Kurtosis		.266
a. Calculated from grouped data.

Skewness and kurtosis values indicate that all five variables fall within the acceptable range of -2 to +2. This implies a normal distribution. However, the Kolmogorov-Smirnov tests for normality give significance values of .000 for every variable, as shown in the table below. The Kolmogorov-Smirnov test compares the observed distribution with the theoretical normal distribution; a significance value above .05 indicates normality, so values of .000 suggest a departure from it. Pallant argues that such results are quite common in large samples (Pallant, 2005). Because the skewness and kurtosis values deviate only slightly from the recommended values, we treat the distributions as normal. Brown insists that the interpretation of skewness and kurtosis values depends on the purpose and type of the tests under analysis.

Tests of Normality
	Kolmogorov-Smirnov^a			Shapiro-Wilk
	Statistic	df	Sig.	Statistic	df	Sig.
Gender of respondents	.395	334	.000	.620	334	.000
Age of respondents	.295	334	.000	.799	334	.000
Education level of respondents	.345	334	.000	.718	334	.000
Tribal affiliation of respondents	.193	334	.000	.853	334	.000
Areas of residence of respondents	.257	334	.000	.764	334	.000
a. Lilliefors Significance Correction
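The Kolmogorov-Smirnov statistic in the table is the largest vertical gap between the empirical distribution of the scores and a normal distribution with the same mean and standard deviation. A minimal sketch for gender follows, assuming the coding 1 = male (202) and 2 = female (132). It computes only the statistic; the Lilliefors-corrected significance value is not reproduced here.

```python
from statistics import NormalDist, mean, stdev

# Assumed coding: 1 = male (202 respondents), 2 = female (132 respondents).
scores = [1] * 202 + [2] * 132
n = len(scores)

# Normal reference distribution with the sample mean and standard deviation.
reference = NormalDist(mean(scores), stdev(scores))

d_statistic = 0.0
cumulative = 0
for value in sorted(set(scores)):
    below = cumulative / n              # empirical CDF just before the jump
    cumulative += scores.count(value)
    at = cumulative / n                 # empirical CDF at the value itself
    theoretical = reference.cdf(value)
    d_statistic = max(d_statistic,
                      abs(at - theoretical),
                      abs(theoretical - below))

print(f"D = {d_statistic:.3f}")
```

With only two distinct values, the empirical distribution rises in two large steps, so a sizeable gap from the smooth normal curve is to be expected for this variable.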

Test of reliability of the instrument

The reliability of a scale shows how free it is from random error (DeVellis, 2003; Cooper and Schindler, 2003). We can assess the reliability of a scale using test-retest reliability and internal consistency. In this case, the researcher used internal consistency to assess the reliability of the research instrument. Internal consistency demonstrates the “extent to which research items that form the scale measure the same underlying attribute i.e. the extent to which the items ‘hang together’” (Pallant, 2005). This research relies on Cronbach’s coefficient alpha in SPSS to measure the internal consistency of the instrument. This is a common method among statisticians for testing the reliability of instruments (Smithson, 2000). Cronbach’s coefficient shows the average correlation among all items that form the scale. Theoretically, the coefficient ranges from 0 to 1, where a higher value shows greater reliability.

Different statisticians have recommended different levels of reliability. However, these recommendations depend on the nature and use of the instrument. Nunnally proposed a minimum level of 0.7 (Nunnally, 1978). It is important to note that the value of Cronbach’s alpha depends on the number of items in the scale. When a scale has fewer than ten items, Cronbach’s alpha values can be quite small. Scale reliability also differs according to the research sample used.
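For readers who wish to see how the coefficient is obtained, the following is a minimal sketch of the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The function name and the small data set are hypothetical and serve only as an illustration; they are not the study data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists.

    Each inner list holds one item's scores, in the same respondent order.
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores: three items answered by five respondents.
hypothetical_items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 5, 5],
    [2, 2, 4, 5, 4],
]
alpha = cronbach_alpha(hypothetical_items)
print(f"alpha = {alpha:.3f}")
```

When the items rise and fall together across respondents, the variance of the total scores is large relative to the sum of the item variances and alpha approaches 1. When the items are unrelated, the two quantities are similar and alpha approaches 0.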

From the data, we can observe internal consistency of items of the scale as follows.

Reliability Statistics
Cronbach’s Alpha	N of Items
.313	5

Item-Total Statistics
	Scale Mean if Item Deleted	Scale Variance if Item Deleted	Corrected Item-Total Correlation	Cronbach’s Alpha if Item Deleted
Gender of respondents	7.77	4.544	.085	.313
Age of respondents	7.24	4.040	.067	.336
Education level of respondents	7.65	4.210	.156	.272
Tribal affiliation of respondents	6.82	2.360	.239	.169
Areas of residence of respondents	7.18	3.445	.234	.189

According to Nunnally, Cronbach’s alpha should be at least 0.7. In this case, however, Cronbach’s alpha is 0.313, which is well below 0.7. This means that the scale items do not have acceptable internal consistency. According to Lord and Novick, the alpha value may understate the true reliability (Lord and Novick, 1968). They argue that alpha “is actually a lower bound on the true reliability of a test under general conditions and that it will only equal the true reliability if items satisfy a property known as essential τ-equivalence” (Lord and Novick, 1968). This property requires that all items measure the same underlying attribute, so that we can convert the true score of any item into the true score of any other item by adding a constant. Thus, alpha equals the true reliability only when the items measure the same thing; otherwise, it is a lower bound.

We must also note that reliability can range from 0.00 to 1.00. This suggests that Cronbach’s alpha is flexible in capturing differences among instruments. However, reliability is not inherent in the items themselves. Instead, it estimates the internal consistency of the items as administered to particular respondents at a particular time, under particular conditions, and for a particular purpose.

Descriptive statistical test

Descriptive tests provide basic information about all research variables, such as the mean, median, standard deviation, range, maximum, minimum, and variance. These tests enable the researcher to describe the characteristics of the research sample, check for possible violations of assumptions, and answer certain research questions (Boyce, 2003; Greene and d’Oliveira, 1999).

Statistics
		Areas of residence of respondents	Age of respondents	Gender of respondents	Education level of respondents	Tribal affiliation of respondents
N	Valid	334	334	334	334	334
N	Missing	0	0	0	0	0
Mean		1.99	1.92	1.40	1.51	2.35
Std. Error of Mean		.048	.046	.027	.033	.070
Median		2.00	2.00	1.00	1.00	2.00
Mode		1	2	1	1	1
Std. Deviation		.872	.835	.490	.604	1.287
Variance		.760	.697	.240	.365	1.657
Range		2	4	1	2	5
Minimum		1	1	1	1	1
Maximum		3	5	2	3	6

Areas of residence of respondents
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Kisumu	129	38.6	38.6	38.6
	Nakuru	81	24.3	24.3	62.9
	Eldoret	124	37.1	37.1	100.0
	Total	334	100.0	100.0

Age of respondents
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	18-25 yrs	104	31.1	31.1	31.1
	26-35 yrs	174	52.1	52.1	83.2
	36-45 yrs	38	11.4	11.4	94.6
	46-55 yrs	14	4.2	4.2	98.8
	56-65 yrs	4	1.2	1.2	100.0
	Total	334	100.0	100.0
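The descriptive statistics for age can be recovered directly from this frequency table. The sketch below assumes that the five age bands are coded 1 to 5 in the order listed, which is consistent with the reported minimum of 1 and maximum of 5.

```python
import math
from statistics import mean, median, mode, stdev, variance

# Assumed coding: 1 = 18-25, 2 = 26-35, 3 = 36-45, 4 = 46-55, 5 = 56-65 yrs.
age_counts = {1: 104, 2: 174, 3: 38, 4: 14, 5: 4}

# Expand the frequency table into one coded score per respondent.
ages = [code for code, count in age_counts.items() for _ in range(count)]

n = len(ages)
age_mean = mean(ages)
age_median = median(ages)
age_mode = mode(ages)
age_variance = variance(ages)        # sample variance (n - 1 denominator)
age_sd = stdev(ages)
age_se_mean = age_sd / math.sqrt(n)  # standard error of the mean

print(f"N = {n}, mean = {age_mean:.2f}, median = {age_median}, mode = {age_mode}")
print(f"variance = {age_variance:.3f}, SD = {age_sd:.3f}, SE of mean = {age_se_mean:.3f}")
```

The same approach applies to the other four variables, since each is a small set of integer codes with known frequencies.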

Gender of respondents
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Male	202	60.5	60.5	60.5
	Female	132	39.5	39.5	100.0
	Total	334	100.0	100.0

Education level of respondents
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	High	181	54.2	54.2	54.2
	Bachelors	134	40.1	40.1	94.3
	Masters	19	5.7	5.7	100.0
	Total	334	100.0	100.0

Tribal affiliation of respondents
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Luo	106	31.7	31.7	31.7
	Kikuyu	90	26.9	26.9	58.7
	Kalenjin	93	27.8	27.8	86.5
	Kisii	15	4.5	4.5	91.0
	Luyha	21	6.3	6.3	97.3
	Kamba	9	2.7	2.7	100.0
	Total	334	100.0	100.0

Inferential statistical test (Analysis of Variance)

In inferential statistical tests, the researcher attempts to reach conclusions that extend beyond the sample data. Thus, the main aim of these tests is to infer what the wider population of respondents thinks. In addition, we can use inferential statistical tests to evaluate whether observed differences are dependable or could have occurred by chance (Hayes, 2000).

Analysis of Variance (ANOVA) tells the researcher whether there is a statistically significant difference between the means of three or more sets of data (Harris, 1994). In addition, the researcher can explore the size of the difference that exists among the groups. This is the effect size, which we can determine through the partial eta-squared statistic (Keppel and Zedeck, 1989).

One-way ANOVA works like a t-test. However, one-way ANOVA is useful where the researcher has two or more groups and wants to compare their mean scores (Hair, Tatham, Anderson and Black, 1998). This test is useful when the researcher wants to determine the effect of only one independent variable on a dependent variable (Stevens, 1996). However, ANOVA cannot tell where the significant difference lies. Thus, the researcher can conduct post hoc comparisons to determine which groups differ significantly. The researcher tested whether tribal affiliation differed between the age groups of the subjects and established the following.

Test of Homogeneity of Variances
Tribal affiliation of respondents
Levene Statistic	df1	df2	Sig.
.776	4	329	.542

Levene’s test for homogeneity of variances shows whether the variance in scores is the same for each of the groups. The significance value (Sig.) in this case is .542, which is greater than .05. This implies that the researcher has not “violated the assumption of homogeneity of variance” (Pallant, 2005).

ANOVA
Tribal affiliation of respondents
	Sum of Squares	df	Mean Square	F	Sig.
Between Groups	2.338	4	.585	.350	.844
Within Groups	549.374	329	1.670
Total	551.713	333

From the ANOVA table, the researcher is interested in the significance value for Tribal affiliation of respondents. According to Pallant, “if the significance value is less than or equal to .05 (e.g. .03, .01, .001), then there is a significant difference somewhere among the mean scores on your dependent variable for the groups”. In this case, the significance value is .844. This indicates that there is no significant difference among the group means.
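The F ratio in the ANOVA table follows from the sums of squares and degrees of freedom. The short sketch below reproduces that arithmetic. The eta-squared value it prints is not part of the original output and is included only as an assumed, conventional measure of effect size for a one-way design.

```python
# Values taken from the one-way ANOVA table above.
ss_between, df_between = 2.338, 4
ss_within, df_within = 549.374, 329

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of between-groups to within-groups variability.
f_ratio = ms_between / ms_within

# Eta squared: share of total variability attributable to the grouping factor.
eta_squared = ss_between / (ss_between + ss_within)

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}, eta squared = {eta_squared:.4f}")
```

An F ratio below 1 means that the age groups differ from one another less than respondents differ within the same age group, which is consistent with the non-significant result.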

Tribal affiliation of respondents
Tukey HSD^a,b
Age of respondents	N	Subset for alpha = 0.05 (1)
46-55 yrs	14	2.14
56-65 yrs	4	2.25
26-35 yrs	174	2.32
18-25 yrs	104	2.35
36-45 yrs	38	2.55
Sig.		.921
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 13.770.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
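Footnote a can be verified from the group sizes in the table. A minimal sketch using the standard library follows.

```python
from statistics import harmonic_mean

# Group sizes for the five age bands, as listed in the Tukey HSD table.
group_sizes = {
    "18-25 yrs": 104,
    "26-35 yrs": 174,
    "36-45 yrs": 38,
    "46-55 yrs": 14,
    "56-65 yrs": 4,
}

# The harmonic mean is the number of groups divided by the sum of reciprocals.
harmonic_n = harmonic_mean(list(group_sizes.values()))
print(f"Harmonic mean sample size = {harmonic_n:.3f}")
```

The harmonic mean is pulled strongly towards the smallest groups (here 4 and 14 respondents), which is why it is far below the arithmetic mean group size of about 67.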

From the above data, we can present the result as follows. The researcher conducted a one-way between-groups analysis of variance to explore the effect of age on tribal affiliation. The researcher grouped the respondents according to their age (18 to 25, 26 to 35, 36 to 45, 46 to 55, and 56 to 65). There was no statistically significant difference among the age groups, F(4, 329) = .350, p = .844. This significance value is above the threshold of .05. The Tukey HSD output agrees: all five age groups fall into a single homogeneous subset, with a significance value of .921. A number of researchers concur that they should not discuss non-significant results further (Grimm and Yarnold, 1995). This is because such results do not indicate actual differences between groups.

Two-way ANOVA enables the researcher to test “the effect of two independent variables on one dependent variable” (Harris, 1994). From this analysis, we can see the interaction effect, i.e. whether the effect of one independent variable on the dependent variable depends on the level of the other independent variable. At the same time, we can also test the overall or main effect of every independent variable.

A two-way ANOVA of the effects of gender and tribe on the education level of the subjects revealed the following.

Levene’s Test of Equality of Error Variances^a
Dependent Variable: Education level of respondents
F	df1	df2	Sig.
2.752	11	322	.002
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept + GENDER + TRIBE + GENDER * TRIBE

Levene’s Test of Equality of Error Variances gives a significance value of .002. This is less than .05 and is therefore significant. A significant value suggests that the variance of the dependent variable is not equal across the groups. With this in mind, we consider the main effects and the interaction effect among subjects. The Tests of Between-Subjects Effects table provides several pieces of information, as follows.

The interaction effect shows whether the independent variables interact. For instance, we look at the effect of tribe on education levels, and whether it depends on whether the respondents are male or female.

Tests of Between-Subjects Effects
Dependent Variable: Education level of respondents
Source	Type III Sum of Squares	df	Mean Square	F	Sig.	Partial Eta Squared
Corrected Model	6.853^a	11	.623	1.751	.062	.056
Intercept	307.157	1	307.157	863.250	.000	.728
GENDER	.406	1	.406	1.141	.286	.004
TRIBE	5.517	5	1.103	3.101	.009	.046
GENDER * TRIBE	1.818	5	.364	1.022	.405	.016
Error	114.572	322	.356
Total	888.000	334
Corrected Total	121.425	333
a. R Squared = .056 (Adjusted R Squared = .024)

The result of interest in this table is the row marked GENDER * TRIBE. To determine whether the interaction is significant, we check the Sig. column for that row. Any value equal to or less than .05 enables the researcher to conclude that there is a significant interaction effect. In this case, the interaction effect is not significant, F(5, 322) = 1.022, p = .405. This value indicates that the effect of tribe on the level of education does not differ significantly between female and male respondents.
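Each F value in the Tests of Between-Subjects Effects table is the mean square for that effect divided by the error mean square, with the error degrees of freedom (322) as the denominator degrees of freedom. The sketch below reproduces the three F ratios from the tabulated sums of squares; small discrepancies in the third decimal place can arise because the table values are rounded.

```python
# Type III sums of squares and degrees of freedom from the table above.
ss_error, df_error = 114.572, 322
effects = {
    "GENDER": (0.406, 1),
    "TRIBE": (5.517, 5),
    "GENDER * TRIBE": (1.818, 5),
}

ms_error = ss_error / df_error

# F for each effect = (SS / df) / MS error.
f_ratios = {
    name: (ss / df) / ms_error
    for name, (ss, df) in effects.items()
}

for name, f_value in f_ratios.items():
    df_effect = effects[name][1]
    print(f"{name}: F({df_effect}, {df_error}) = {f_value:.3f}")
```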

Main effect

There is no significant interaction in the above case. Consequently, we can look at the main effects, i.e. the effect of “a single independent variable on other variables” (Pallant, 2005). To determine whether there is a main effect for each independent variable, we check the Sig. column for that variable. Any value that is less than or equal to .05 indicates a significant main effect for that variable. For GENDER, the significance value is .286. Thus, there is no main effect for gender. However, for TRIBE, the significance value is .009. This is less than .05. Thus, there is a significant main effect for tribe. Male and female respondents do not differ in their levels of education. However, there is a difference among tribes in terms of their levels of education.

Effect size

We can locate the effect size for TRIBE in the Partial Eta Squared column. This value is .046. By Cohen’s criterion, this effect size is small (Cohen, 1988). Thus, although the effect is statistically significant, the differences in mean education level, which we can see in the Descriptive Statistics table for gender across all tribes (see below), are too small to have much practical significance.
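Partial eta squared is the effect sum of squares divided by the sum of the effect and error sums of squares. The sketch below recomputes it from the Tests of Between-Subjects Effects table. The cut-offs in the helper function (.01 small, .06 medium, .14 large) are the commonly cited guidelines attributed to Cohen and are stated here as an assumption; the function name is hypothetical.

```python
# Type III sums of squares from the Tests of Between-Subjects Effects table.
ss_error = 114.572
ss_effect = {"GENDER": 0.406, "TRIBE": 5.517, "GENDER * TRIBE": 1.818}

def partial_eta_squared(ss, ss_err):
    """Share of (effect + error) variability attributable to the effect."""
    return ss / (ss + ss_err)

def cohen_label(value):
    """Assumed guideline cut-offs: .01 small, .06 medium, .14 large."""
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "below small"

partial = {name: partial_eta_squared(ss, ss_error) for name, ss in ss_effect.items()}

for name, value in partial.items():
    print(f"{name}: partial eta squared = {value:.3f} ({cohen_label(value)})")
```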

Post hoc tests

Post hoc tests help us determine where the differences exist among the various tribes. They systematically compare the means of all pairs of groups. We have a significant main effect for TRIBE. Thus, we can determine where the differences exist using multiple comparisons.

The multiple comparisons use Tukey HSD to show significant differences among the groups. In the Sig. column, we look for values that are less than .05. We can also identify where the differences exist from the asterisks in the Mean Difference column (see below). There are significant differences between Luo and Kikuyu (Sig. = .029) and between Kalenjin and Kikuyu (Sig. = .042).

Descriptive Statistics
Dependent Variable: Education level of respondents
Gender of respondents	Tribal affiliation of respondents	Mean	Std. Deviation	N
Male	Luo	1.58	.678	67
	Kikuyu	1.34	.479	64
	Kalenjin	1.59	.658	44
	Kisii	1.50	.707	10
	Luyha	1.45	.522	11
	Kamba	1.50	.548	6
	Total	1.50	.609	202
Female	Luo	1.62	.633	39
	Kikuyu	1.31	.471	26
	Kalenjin	1.59	.610	49
	Kisii	1.20	.447	5
	Luyha	1.60	.516	10
	Kamba	2.33	.577	3
	Total	1.55	.597	132
Total	Luo	1.59	.659	106
	Kikuyu	1.33	.474	90
	Kalenjin	1.59	.630	93
	Kisii	1.40	.632	15
	Luyha	1.52	.512	21
	Kamba	1.78	.667	9
	Total	1.51	.604	334

Multiple Comparisons
Education level of respondents Tukey HSD
(I) Tribal affiliation of respondents	(J) Tribal affiliation of respondents	Mean Difference (I-J)	Std. Error	Sig.	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound
Luo	Kikuyu	.26^*	.085	.029	.02	.51
	Kalenjin	.00	.085	1.000	-.24	.25
	Kisii	.19	.165	.846	-.28	.67
	Luyha	.07	.142	.996	-.34	.48
	Kamba	-.18	.207	.950	-.78	.41
Kikuyu	Luo	-.26^*	.085	.029	-.51	-.02
	Kalenjin	-.26^*	.088	.042	-.51	.00
	Kisii	-.07	.166	.999	-.54	.41
	Luyha	-.19	.145	.775	-.60	.22
	Kamba	-.44	.209	.274	-1.04	.15
Kalenjin	Luo	.00	.085	1.000	-.25	.24
	Kikuyu	.26^*	.088	.042	.01	.51
	Kisii	.19	.166	.858	-.28	.67
	Luyha	.07	.144	.997	-.35	.48
	Kamba	-.19	.208	.948	-.78	.41
Kisii	Luo	-.19	.165	.846	-.67	.28
	Kikuyu	.07	.166	.999	-.41	.54
	Kalenjin	-.19	.166	.858	-.67	.28
	Luyha	-.12	.202	.990	-.70	.45
	Kamba	-.38	.252	.663	-1.10	.34
Luyha	Luo	-.07	.142	.996	-.48	.34
	Kikuyu	.19	.145	.775	-.22	.60
	Kalenjin	-.07	.144	.997	-.48	.35
	Kisii	.12	.202	.990	-.45	.70
	Kamba	-.25	.238	.893	-.94	.43
Kamba	Luo	.18	.207	.950	-.41	.78
	Kikuyu	.44	.209	.274	-.15	1.04
	Kalenjin	.19	.208	.948	-.41	.78
	Kisii	.38	.252	.663	-.34	1.10
	Luyha	.25	.238	.893	-.43	.94
Based on observed means. The error term is Mean Square(Error) = .356.
*. The mean difference is significant at the .05 level.
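The Mean Difference and Std. Error columns of this table can be reproduced from the observed group means, the group sizes, and the error mean square. The sketch below assumes that the standard error of each pairwise difference is the square root of MS(Error) × (1/n_i + 1/n_j). The significance values and confidence intervals require the studentized range distribution and are not recomputed here.

```python
import math

mse = 0.356  # Mean Square(Error) from the table footnote

# Group sizes and observed means from the Descriptive Statistics table (Total rows).
group_n = {"Luo": 106, "Kikuyu": 90, "Kalenjin": 93,
           "Kisii": 15, "Luyha": 21, "Kamba": 9}
group_mean = {"Luo": 1.59, "Kikuyu": 1.33, "Kalenjin": 1.59,
              "Kisii": 1.40, "Luyha": 1.52, "Kamba": 1.78}

def pairwise(i, j):
    """Mean difference (I - J) and its standard error for two tribes."""
    difference = group_mean[i] - group_mean[j]
    std_error = math.sqrt(mse * (1 / group_n[i] + 1 / group_n[j]))
    return difference, std_error

for i, j in [("Luo", "Kikuyu"), ("Kalenjin", "Kikuyu"), ("Luo", "Kamba")]:
    difference, std_error = pairwise(i, j)
    print(f"{i} - {j}: difference = {difference:.2f}, SE = {std_error:.3f}")
```

The standard errors are much larger for comparisons involving the small groups (Kisii, Luyha, and Kamba), which helps to explain why a difference as large as that between Kamba and Kikuyu does not reach significance.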

The plot shows the impact of tribe on education across.

In conclusion, the two-way between-groups ANOVA explored the effect of tribe on education levels among the respondents. The researcher grouped the respondents according to their tribes (Luo, Kalenjin, Kikuyu, Luyha, Kamba, and Kisii). We found a significant main effect for tribe (Sig. = .009). However, Partial Eta Squared revealed that the effect size was small (.046). Post hoc comparisons using the Tukey HSD test showed that the Kikuyu group differed significantly from both the Luo and the Kalenjin groups. The remaining pairs of groups did not differ significantly.

References

Boyce, J. (2003). Market research in practice. Boston: McGraw-Hill.

Brown, J. D. (1997). Skewness and kurtosis. Shiken: JALT Testing & Evaluation, 1(1), 20-23.

Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cooper, D. R. and Schindler, P. S. (2003). Business research methods (8th ed.). Boston: McGraw-Hill.

DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, California: Sage.

Everitt, B. S. (1996). Making sense of statistics in psychology: A second level course. Oxford: Oxford University Press.

Gravetter, F. J. and Wallnau, B. (2000). Statistics for the behavioral sciences (5th ed.). Belmont, CA: Wadsworth.

Greene, J. and d’Oliveira, M. (1999). Learning to use statistical tests in psychology (2nd ed.). Buckingham: Open University Press.

Grimm, L. G. and Yarnold, P. R. (1995). Reading and understanding multivariate statistics. Washington, DC: American Psychological Association.

Hair, J. F., Tatham, R. L., Anderson, R. E. and Black, W. C. (1998). Multivariate data analysis (5th ed.). New York: Prentice Hall.

Harris, R. J. (1994). ANOVA: An analysis of variance primer. Itasca, Ill: Peacock.

Hayes, N. (2000). Doing psychological research: Gathering and analysing data. Buckingham: Open University Press.

Keppel, G. and Zedeck, S. (1989). Data analysis for research designs: Analysis of variance and multiple regression/correlation approaches. New York: Freeman.

Kline, P. (1986). A handbook of test construction. New York: Methuen.

Lord, F. M. and Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Nunnally, J. O. (1978). Psychometric theory. New York: McGraw-Hill.

Pallant, J. (2005). SPSS Survival Manual. Sydney: Ligare.

Smithson, M. (2000). Statistics with confidence. London: Sage.

Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Tabachnick, B. G. and Fidell, L. S. (2001). Using multivariate statistics (4th ed.). New York: HarperCollins.