Chi-Square Using SPSS Analysis

Introduction

This essay presents the solutions to an assignment on nonparametric tests and chi-square with SPSS. The assignment has seven tasks. The first task is to state the statistical assumptions that underlie a chi-square test. The second task is to select a dataset and choose the independent and dependent variables from it. The third task is to formulate the null and alternative hypotheses. The fourth task is to use SPSS to calculate a chi-square. The fifth task is to decide whether or not to reject the null hypothesis. The sixth task is to generate the SPSS syntax and output files. The final task is to report the results of the SPSS analysis in the correct APA format.

In statistical inference, nonparametric (or distribution-free) tests are statistical procedures for cases where parametric tests are not appropriate. Such cases arise when the data do not satisfy the assumptions underlying a standard parametric procedure. Nonparametric tests are well suited to nominal and ordinal data (Changing Minds, 2011, p. 2). An example of a nonparametric test is the chi-square test for independence.

Given two nominal variables, a chi-square test for independence shows whether there is evidence of a relationship between them; the starting assumption is that the two variables are independent of each other (Schloesser, 2000, p. 1). The test is built on the difference between the observed frequency and the expected frequency in each cell, which is called the residual. Observed frequencies are the actual frequencies that emerge from a survey, whereas expected frequencies are the frequencies that would be expected if the two variables were independent. An example of a research problem that can be addressed with a chi-square test for independence is whether there is a relationship between fish-eating and level of intelligence. For this example, one variable would be fish-eater (yes or no) and the other would be intelligence level (low, moderate or high).
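In a test for independence, the expected frequency of a cell is usually computed from the table margins as (row total × column total) / N, and the residual is the observed frequency minus the expected frequency. A minimal sketch in Python follows, using made-up counts for the fish-eating example; the numbers and the names `observed` and `expected_counts` are illustrative assumptions, not data from any survey.

```python
# Hypothetical counts (illustrative only): rows = fish-eater yes/no,
# columns = intelligence level low/moderate/high.
observed = [
    [10, 30, 20],  # eats fish
    [20, 15, 5],   # does not eat fish
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)

# Residual = observed - expected, cell by cell.
residuals = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for obs_row, exp_row, res_row in zip(observed, expected, residuals):
    print(obs_row, [round(e, 1) for e in exp_row], [round(r, 1) for r in res_row])
```

The residuals in each row and each column sum to zero, because the expected counts reproduce the margins of the observed table.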

Assumptions

A number of assumptions underlie a chi-square test for independence (StatTrek.com, 2011, p. 4). The first is that each observation in the data set is independent of the others. The second is that the sample is drawn at random from a fixed population or distribution. The third is that the sample size is sufficiently large for the test to be accurate. The final assumption is that the expected frequency in each cell is at least five.
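The expected-frequency assumption can be checked directly from the row and column totals of a crosstabulation. The sketch below is one way to do it in Python, using the totals reported in the crosstabulation in Appendix A; the variable names are illustrative, and SPSS performs the same check itself in the footnote to its chi-square tests table.

```python
# Margins taken from the crosstabulation in Appendix A.
row_totals = {"MALE": 1234, "FEMALE": 1458}
col_totals = [46, 30, 21, 19, 25, 38, 24, 36, 160, 153, 166, 1764, 210]
n = sum(col_totals)  # number of valid cases

# Expected count for each cell under independence.
expected = [r * c / n for r in row_totals.values() for c in col_totals]

# Cells that would violate the rule of thumb used in the SPSS footnote.
small_cells = [e for e in expected if e < 5]

print(f"cells with expected count < 5: {len(small_cells)} of {len(expected)}")
print(f"minimum expected count: {min(expected):.2f}")
```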

The dependent and independent variables

The data set for this exercise is taken from the GSS data disk that came with the course text Research Methods in the Social Sciences. It contains the responses of participants in a General Social Survey (GSS), which asked about race, income, and gender and included opinion-type questions. From this data set, two variables are selected, namely, sex and income. Sex is the dependent variable whereas income is the independent variable. Sex takes one of two values: 1 represents male and 2 represents female. Income takes one of thirteen values. Each value from 1 to 12 represents a different level of income; for example, 1 represents an income of less than $1000, 2 represents an income of $1000 to $2999, 3 represents an income of $3000 to $3999, and so on up to 12, which represents an income of $25000 or more. Value 13 indicates that the participant in the General Social Survey refused to disclose his or her income.

The null and alternative hypotheses

The goal of the chi-square test for independence in this exercise is to determine whether there is a relationship between gender (sex) and income. The null hypothesis, H0, is that there is no relationship between gender and income. The alternative hypothesis, H1, is that there is a relationship between gender and income. The alternative hypothesis is non-directional, and SPSS reports the corresponding significance value as 2-sided.

A type I error occurs when the null hypothesis is rejected although it is actually true, and a type II error occurs when the null hypothesis is not rejected although it is actually false. In hypothesis testing, the decision is made at a chosen significance level, which is the probability of a type I error that the researcher is willing to tolerate. There are two equivalent ways of reaching the decision. In the first, the test statistic computed from the sample is compared with a critical value: the null hypothesis is retained if the statistic falls inside the acceptance region and rejected otherwise. In the second, the p value is compared with the significance level, where the p value is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis were true. The null hypothesis is rejected when the p value is less than the significance level (Mason et al., 1999).
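The usual decision rule compares the p value with a chosen significance level: the null hypothesis is rejected when the p value is smaller. A small sketch of that rule in Python follows. The function name `decide` is hypothetical, and the default level of 0.05 is an assumption, since the essay does not state which level it uses.

```python
def decide(p_value, alpha=0.05):
    """Return the decision on the null hypothesis at significance level alpha."""
    if not 0.0 <= p_value <= 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    # Reject only when the p value falls strictly below the chosen level.
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.001))  # significance value reported in Appendix A
print(decide(0.20))   # a hypothetical larger p value
```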

Test and test results

To test the above hypotheses using a chi-square test for independence in SPSS, the syntax shown in Appendix A is run. The output, which is also shown in Appendix A, consists of three tables: the case processing summary, the sex*income crosstabulation, and the chi-square tests. The case processing summary table gives a quick summary of the cases used in the test. From it, we can see that the total number of observations is 2812, which is the sum of the valid observations (2692) and the missing values (120). The sex*income crosstabulation table gives the observed and expected frequencies for each cell. It also gives the residual for each cell, which is the difference between the observed frequency and the expected frequency. Lastly, it gives the totals of the observed and expected frequencies for each income level and for each sex. From the chi-square tests table, the Pearson chi-square value is 32.124, its degrees of freedom are 12, and the significance value is 0.001. Footnote a to that table shows that no cell has an expected count of less than 5, so the assumption about expected frequencies is met. Although there are other values in the table, only these are of interest to us since we are undertaking a chi-square test for independence.
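The Pearson chi-square value and its significance can be cross-checked by hand. The sketch below, in Python with only the standard library, recomputes them from the observed counts in Appendix A. The function names are illustrative, and the p value helper relies on a closed form that holds only for even degrees of freedom; it is a cross-check under that assumption, not a description of how SPSS computes the value.

```python
import math

# Observed counts from the crosstabulation in Appendix A.
male = [21, 14, 7, 9, 9, 12, 8, 12, 50, 64, 72, 865, 91]
female = [25, 16, 14, 10, 16, 26, 16, 24, 110, 89, 94, 899, 119]
table = [male, female]

def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for observed, c in zip(row, col_totals):
            expected = r * c / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

stat, df = pearson_chi_square(table)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The degrees of freedom follow from the table shape: (2 - 1) × (13 - 1) = 12.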

Conclusion

From the chi-square tests table, the significance value for the Pearson chi-square is 0.001. Since this value is very small (well below the conventional 0.05 level), there is very strong evidence that the two variables being tested are related. Therefore, the hypothesis that gender and income are unrelated is rejected, and the conclusion is that there is a relationship between gender and income. In APA format, the result is reported as χ²(12, N = 2692) = 32.12, p = .001.

References

  1. Changing Minds. (2011). Parametric vs. non-parametric tests. Web.
  2. Mason, R. D., Lind, D. A., & Marchal, W. G. (1999). Statistical techniques in business and economics (10th ed.). USA: Irwin/McGraw-Hill, p. 316.
  3. Schloesser, N. (2000). Chi-square test for independence. Web.
  4. StatTrek.com. (2011). AP statistics tutorial: Chi-square test for independence.

Appendix

Appendix A

Syntax

* Open the GSS data file.
GET
  FILE='C:\Users\user\Documents\gss04worth.sav'.

* Crosstabulate sex by income and request the chi-square test.
CROSSTABS
  /TABLES=SEX BY INCOME
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED RESID
  /COUNT ROUND CELL.

Output

Crosstabs

[DataSet1] C:\Users\user\Documents\gss04worth.sav

Case Processing Summary

                                       Valid            Missing          Total
                                       N     Percent    N     Percent    N     Percent
RESPONDENTS SEX * TOTAL FAMILY INCOME  2692  95.7%      120   4.3%       2812  100.0%

RESPONDENTS SEX * TOTAL FAMILY INCOME Crosstabulation

                     MALE                          FEMALE                        Total
TOTAL FAMILY INCOME  Count  Expected  Residual     Count  Expected  Residual     Count  Expected
LT $1000                21      21.1        .0        25      24.9        .1        46      46.0
$1000 TO 2999           14      13.8        .2        16      16.2       -.2        30      30.0
$3000 TO 3999            7       9.6      -2.6        14      11.4       2.6        21      21.0
$4000 TO 4999            9       8.7        .3        10      10.3       -.3        19      19.0
$5000 TO 5999            9      11.5      -2.5        16      13.5       2.5        25      25.0
$6000 TO 6999           12      17.4      -5.4        26      20.6       5.4        38      38.0
$7000 TO 7999            8      11.0      -3.0        16      13.0       3.0        24      24.0
$8000 TO 9999           12      16.5      -4.5        24      19.5       4.5        36      36.0
$10000 – 14999          50      73.3     -23.3       110      86.7      23.3       160     160.0
$15000 – 19999          64      70.1      -6.1        89      82.9       6.1       153     153.0
$20000 – 24999          72      76.1      -4.1        94      89.9       4.1       166     166.0
$25000 OR MORE         865     808.6      56.4       899     955.4     -56.4      1764    1764.0
REFUSED                 91      96.3      -5.3       119     113.7       5.3       210     210.0
Total                 1234    1234.0                1458    1458.0                2692    2692.0

Chi-Square Tests

                              Value      df  Asymp. Sig. (2-sided)
Pearson Chi-Square            32.124(a)  12  .001
Likelihood Ratio              32.803     12  .001
Linear-by-Linear Association  8.904       1  .003
N of Valid Cases              2692

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.71.