## Introduction

In statistical analysis, it is critical to properly formulate a research question and to determine the variables to be measured and the tests to be run. This paper offers a research question that could be answered using the “WK1.spss.Dataset.New.sav” dataset, and explains which test needs to be carried out to answer this question. The justification for selecting this test is also provided.

## Research Question and Variables for the Dataset

A possible research question related to the given dataset is as follows: Is there a significant difference in the average body mass index between people who have had a stroke and people who have not had a stroke?

The variables needed to answer this question are stroke (independent) and BMI (dependent). The stroke variable is categorical (Warner, 2013, p. 9); more precisely, it is nominal, because the operations < and > cannot be meaningfully applied to its values (Warner, 2013, p. 7). The BMI variable, by contrast, is quantitative; more specifically, it is treated here as interval rather than ratio, on the grounds that it is not meaningful to multiply or divide different body mass indexes (Warner, 2013, pp. 7, 9). Either level of measurement is adequate for a comparison of means.

These variables were chosen because they map directly onto the research question: determining whether BMI differs between people who have had a stroke and those who have not requires a grouping variable (stroke) and an outcome variable (BMI).

## The Test to Be Used

For this study, an independent samples t-test should be used (George & Mallery, 2016). Levene’s test for equality of variances will help determine whether the pooled-variance t-test (for equal variances) or the separate-variances t-test (for unequal variances) should be employed (Forthofer, Lee, & Hernandez, 2007). The test will compare the means of the dependent variable (BMI) across the two groups into which the sample is divided by the values of the independent variable (stroke).
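The quantities involved in this procedure can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the BMI values are invented and are not taken from the dataset, the function names `levene_w`, `pooled_t`, and `welch_t` are hypothetical, and Levene’s statistic is computed in its mean-centred form. Only the test statistics and degrees of freedom are computed; the p-values reported by a statistics package are not reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def levene_w(*groups):
    """Mean-centred Levene W statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Separate-variances (Welch) t statistic and approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical BMI values, invented for illustration only (not from the dataset).
bmi_stroke = [31.2, 28.4, 33.0, 29.5, 30.1]
bmi_no_stroke = [26.3, 24.8, 27.9, 25.5, 28.0]

print("Levene W:", levene_w(bmi_stroke, bmi_no_stroke))
print("Pooled t, df:", pooled_t(bmi_stroke, bmi_no_stroke))
print("Separate-variances t, df:", welch_t(bmi_stroke, bmi_no_stroke))
```

In practice, the significance of Levene’s statistic decides which of the two t results is read; when both groups have the same size, the two t statistics coincide and only the degrees of freedom differ.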

## Justification for Using the Test

The proposed research question requires comparing the mean body mass index (the dependent variable, BMI) of two groups of people: those who have had a stroke and those who have not (the independent variable, stroke). When the means of the same quantitative variable must be compared across two groups to determine whether the difference between them is statistically significant, a t-test should be used (Field, 2013).
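Stated formally, and as a sketch only, the comparison described above can be written as a pair of hypotheses and a test statistic. The subscripts 1 and 2 are notation introduced here for the stroke and no-stroke groups, and SE denotes the standard error of the difference between the two sample means.

```latex
H_0:\ \mu_1 = \mu_2
\qquad
H_1:\ \mu_1 \neq \mu_2
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}(\bar{x}_1 - \bar{x}_2)}
```

A large absolute value of t relative to its reference distribution counts as evidence against the null hypothesis of equal mean BMI.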

The two groups into which the sample is split by the stroke variable consist of different people; the division is exhaustive and mutually exclusive, so each person belongs to exactly one group. This makes an independent samples t-test appropriate, rather than a repeated measures t-test (Field, 2013).

A chi-square test should not be used for this research question, because the question does not call for determining whether observed frequencies differ from expected ones or whether two categorical variables are related (Warner, 2013); the dependent variable here, BMI, is quantitative.

## Conclusion

Therefore, to answer the research question “Is there a significant difference in the average body mass index between people who have had a stroke and people who have not had a stroke?”, the variables stroke (independent) and BMI (dependent) from the provided dataset should be used. An independent samples t-test should be run to compare the mean body mass index across the two groups.

## References

Field, A. (2013). *Discovering statistics using IBM SPSS statistics* (4th ed.). Thousand Oaks, CA: SAGE Publications.

Forthofer, R. N., Lee, E. S., & Hernandez, M. (2007). *Biostatistics: A guide to design, analysis, and discovery* (2nd ed.). Burlington, MA: Elsevier Academic Press.

George, D., & Mallery, P. (2016). *IBM SPSS Statistics 23 step by step: A simple guide and reference* (14th ed.). New York, NY: Routledge.

Warner, R. M. (2013). *Applied statistics: From bivariate through multivariate techniques* (2nd ed.). Thousand Oaks, CA: SAGE Publications.