Review Correlation and Regression in Education

Assignment

Exploratory Data Analysis of Chamorro-Premuzic.sav

Age. Descriptive statistics (Table 1) show that the age of respondents ranges from 0 to 43 years, with a mean of 19.60, a median of 19.00, a mode of 18, and a standard deviation of 3.448. The distribution of age is positively skewed (skewness = 2.350) and sharply peaked (kurtosis = 17.989).

Table 1.

Descriptive Statistics of Age
N	Valid	404
	Missing	26
Mean		19.60
Median		19.00
Mode		18
Std. Deviation		3.448
Variance		11.888
Skewness		2.350
Std. Error of Skewness		.121
Kurtosis		17.989
Std. Error of Kurtosis		.242
Range		43
Minimum		0
Maximum		43
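
One common way to judge whether the skewness and kurtosis in Table 1 are large is to divide each statistic by its standard error. The sketch below does this with the Table 1 values. The cutoff of 1.96 is an assumed rule of thumb, not part of the original analysis, and with 404 valid cases it will flag even modest departures from normality.

```python
# Values copied from Table 1 (SPSS descriptive statistics for Age).
skewness, se_skewness = 2.350, 0.121
kurtosis, se_kurtosis = 17.989, 0.242


def z_ratio(statistic, standard_error):
    """Return a statistic divided by its standard error."""
    return statistic / standard_error


z_skew = z_ratio(skewness, se_skewness)
z_kurt = z_ratio(kurtosis, se_kurtosis)

# Assumption: |z| > 1.96 is treated as a notable departure from normality.
CUTOFF = 1.96
departs_from_normal = abs(z_skew) > CUTOFF or abs(z_kurt) > CUTOFF

print(f"z (skewness) = {z_skew:.2f}")
print(f"z (kurtosis) = {z_kurt:.2f}")
print("Departs from normality:", departs_from_normal)
```

Both ratios are far above the assumed cutoff, which agrees with the description of age as strongly skewed and peaked.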

Gender. The frequency distribution of gender (Table 2) indicates that 424 of the 430 respondents reported their gender. Females comprised 72.4% (n = 307) and males 27.6% (n = 117) of the valid responses, which corresponds to 71.4% and 27.2% of the full sample, respectively. The data have six missing values, forming 1.4% of the full sample.

Table 2.

Frequency Distribution of Gender
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Female	307	71.4	72.4	72.4
	Male	117	27.2	27.6	100.0
	Total	424	98.6	100.0
Missing	System	6	1.4
Total		430	100.0
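
The difference between the Percent and Valid Percent columns in Table 2 comes down to the denominator. The short sketch below recomputes both columns from the frequencies in the table; the variable names are illustrative only.

```python
# Frequencies copied from Table 2.
counts = {"Female": 307, "Male": 117}
missing = 6

valid_n = sum(counts.values())   # respondents who reported gender
total_n = valid_n + missing      # full sample, including missing values

# Percent uses the full sample; Valid Percent uses valid responses only.
percent = {k: round(100 * v / total_n, 1) for k, v in counts.items()}
valid_percent = {k: round(100 * v / valid_n, 1) for k, v in counts.items()}
missing_percent = round(100 * missing / total_n, 1)

for group in counts:
    print(f"{group}: {percent[group]}% of all, {valid_percent[group]}% of valid")
print(f"Missing: {missing_percent}% of all")
```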

Students’ Personality Traits

Table 3 presents descriptive statistics of students’ personality traits, namely, neuroticism, extroversion, openness, agreeableness, and conscientiousness. Neuroticism has a mean of 23.63 (SD = 8.60), a median of 24.00, and a mode of 24. With minimum and maximum values of 0 and 44, respectively, the distribution has virtually no skewness (0.005) and a slightly negative kurtosis (-0.312). Extroversion has a mean of 30.10 (SD = 6.32), a median of 31.00, and a mode of 33. Extroversion scores range from 5 to 46 with a negative skewness (-0.459) and a positive kurtosis (0.671). Openness has a mean of 28.83 (SD = 6.22), a median of 29.00, and a mode of 29, with minimum and maximum values of 14 and 46, respectively. Its distribution has a positive skewness of 0.161 and a negative kurtosis of -0.462. Agreeableness has a mean of 46.52 (SD = 7.45), a median of 47.00, and a mode of 50, with scores ranging from 25 to 73. Its distribution has a negative skewness of -0.136 and a positive kurtosis of 0.204. Conscientiousness has a mean of 30.20 (SD = 6.75), a median of 30.00, and a mode of 34, with scores ranging from 7 to 58. Its distribution has a slightly positive skewness (0.029) and a positive kurtosis of 1.040.

Table 3.

Descriptive Statistics of Students’ Personality Traits
		Student Neuroticism	Student Extroversion	Student Openness	Student Agreeableness	Student Conscientiousness
N	Valid	420	418	418	413	416
N	Missing	10	12	12	17	14
Mean		23.6262	30.1029	28.8301	46.5157	30.1971
Median		24.0000	31.0000	29.0000	47.0000	30.0000
Mode		24.00	33.00	29.00	50.00	34.00
Std. Deviation		8.60431	6.31897	6.21882	7.45295	6.75489
Variance		74.034	39.929	38.674	55.546	45.629
Skewness		.005	-.459	.161	-.136	.029
Std. Error of Skewness		.119	.119	.119	.120	.120
Kurtosis		-.312	.671	-.462	.204	1.040
Std. Error of Kurtosis		.238	.238	.238	.240	.239
Range		44.00	41.00	32.00	48.00	51.00
Minimum		.00	5.00	14.00	25.00	7.00
Maximum		44.00	46.00	46.00	73.00	58.00

Expected Lecturers’ Personality Traits

Descriptive statistics (Table 4) show that students have different expectations regarding the personality traits of their lecturers. The trait of neuroticism ranges from -30 to 25 with a mean of -21.69 (SD = 9.14), a median of -24, and a mode of -30; its distribution has a positive skewness of 1.910 and a positive kurtosis of 4.948. The trait of extroversion ranges from -6 to 28 with a mean of 12.96 (SD = 6.94), a median of 13, and a mode of 10; its distribution has virtually no skewness (0.017) and a negative kurtosis of -0.239. Openness ranges from -15 to 30 with a mean of 8.77 (SD = 8.08), a median of 8, and a mode of 6, with a positive skewness of 0.161 and a negative kurtosis of -0.213. Agreeableness ranges from -21 to 29 with a mean of 8.88 (SD = 9.58), a negative skewness of -0.154, and a negative kurtosis of -0.467. Conscientiousness ranges from -8 to 30 with a mean of 16.29 (SD = 7.72), a negative skewness of -0.496, and a negative kurtosis of -0.115.

Table 4.

Descriptive Statistics of the Expected Lecturers’ Personality Traits
		Student Wants Neuroticism in Lecturers	Student Wants Extroversion in Lecturers	Student Wants Openness in Lecturers	Student Wants Agreeableness in Lecturers	Student Wants Conscientiousness in Lecturers
N	Valid	417	283	420	417	417
N	Missing	13	147	10	13	13
Mean		-21.6882	12.9576	8.7690	8.8825	16.2854
Median		-24.0000	13.0000	8.0000	9.0000	17.0000
Mode		-30.00	10.00	6.00	2.00	15.00
Std. Deviation		9.14408	6.94494	8.08466	9.57577	7.71908
Variance		83.614	48.232	65.362	91.695	59.584
Skewness		1.910	.017	.161	-.154	-.496
Std. Error of Skewness		.120	.145	.119	.120	.120
Kurtosis		4.948	-.239	-.213	-.467	-.115
Std. Error of Kurtosis		.238	.289	.238	.238	.238
Range		55.00	34.00	45.00	50.00	38.00
Minimum		-30.00	-6.00	-15.00	-21.00	-8.00
Maximum		25.00	28.00	30.00	29.00	30.00

Scatter Plots

Students’ Agreeableness/Lecturers’ Agreeableness

Figure 1. Scatterplot of Students’ Agreeableness and Expected Lecturers’ Agreeableness

Students’ Extroversion/Lecturers’ Extroversion

Figure 2. Scatterplot of Students’ Extroversion and Expected Lecturers’ Extroversion

Students’ Agreeableness/Lecturers’ Extroversion

Figure 3. Scatterplot of Students’ Agreeableness and Expected Lecturers’ Extroversion

Students’ Extroversion/Lecturers’ Agreeableness

Figure 4. Scatterplot of Students’ Extroversion and Expected Lecturers’ Agreeableness

The scatterplots show positive relationships of varying strength between students’ traits and the traits they expect in lecturers. In Figure 1, students’ agreeableness accounts for 2.7% (R² = 0.027) of the variation in the expected lecturers’ agreeableness, while in Figure 2, students’ extroversion explains 2.3% (R² = 0.023) of the variation in the expected lecturers’ extroversion. In Figure 3, students’ agreeableness has a negligible effect because it accounts for only 0.2% (R² = 0.002) of the variation in the expected lecturers’ extroversion. In Figure 4, students’ extroversion has no discernible effect because it explains 0% (R² = 0.000) of the variation in the expected lecturers’ agreeableness.

Correlation Analysis

Because the data contain missing values, the analysis handled them by pairwise deletion. George and Mallery (2016) explain that pairwise deletion is advantageous because it not only optimizes the use of the available data but also increases the power of the analysis. In the correlation analysis, the study employed a two-tailed test because the relationships between variables can be either negative or positive. Table 5 reveals that students’ extroversion has a statistically significant but weak positive relationship with the expected lecturers’ extroversion (r = 0.153, p = 0.010). However, students’ extroversion has no statistically significant relationship with students’ agreeableness (r = 0.080, p = 0.106) or with the expected lecturers’ agreeableness (r = 0.004, p = 0.932). Students’ agreeableness has no statistically significant relationship with the expected lecturers’ extroversion (r = 0.050, p = 0.412), but it has a statistically significant, weak positive relationship with the expected lecturers’ agreeableness (r = 0.164, p = 0.001). The expected lecturers’ agreeableness has a statistically significant, weak positive relationship with the expected lecturers’ extroversion (r = 0.118, p = 0.049).

Table 5.

Correlation Analysis
		Student Extroversion	Student Agreeableness	Student Wants Extroversion in Lecturers	Student Wants Agreeableness in Lecturers
Student Extroversion	Pearson Correlation	1	.080	.153	.004
	Sig. (2-tailed)		.106	.010	.932
	N	418	406	281	411
Student Agreeableness	Pearson Correlation	.080	1	.050	.164
	Sig. (2-tailed)	.106		.412	.001
	N	406	413	276	405
Student Wants Extroversion in Lecturers	Pearson Correlation	.153	.050	1	.118
	Sig. (2-tailed)	.010	.412		.049
	N	281	276	283	280
Student Wants Agreeableness in Lecturers	Pearson Correlation	.004	.164	.118	1
	Sig. (2-tailed)	.932	.001	.049
	N	411	405	280	417
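
The significance tests in Table 5 rest on converting each correlation into a t statistic with N - 2 degrees of freedom, where N is the pairwise sample size. The sketch below applies that conversion to three of the reported coefficients and also squares r to obtain the proportion of shared variance. It stops at the t statistic to stay within the Python standard library; the p-values themselves come from the SPSS output.

```python
import math

# (r, pairwise N) pairs copied from Table 5.
correlations = {
    "student extroversion vs. wanted extroversion": (0.153, 281),
    "student agreeableness vs. wanted agreeableness": (0.164, 405),
    "student extroversion vs. wanted agreeableness": (0.004, 411),
}


def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


results = {}
for label, (r, n) in correlations.items():
    results[label] = {"r_squared": r ** 2, "t": t_from_r(r, n), "df": n - 2}
    print(f"{label}: r^2 = {r ** 2:.3f}, t({n - 2}) = {t_from_r(r, n):.3f}")
```

For the first pair, the t statistic agrees with the t value reported for the slope in Table 8, as expected for a regression with a single predictor.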

Regression

The regression analysis predicting the expected lecturers’ extroversion from students’ extroversion meets the assumptions of linearity of relationship, lack of collinearity, and absence of significant outliers. The R-value equals the Pearson correlation in Table 5 and indicates a weak relationship between students’ extroversion and the expected lecturers’ extroversion (R = 0.153). The regression model shows that students’ extroversion explains 2.3% of the variation in the expected lecturers’ extroversion (R² = 0.023; Table 6). The regression model is statistically significant in predicting the influence of students’ extroversion on the expected lecturers’ extroversion, F(1, 279) = 6.687, p = 0.010 (Table 7). The coefficient shows that a one-unit increase in students’ extroversion is associated with an increase of 0.160 in the expected lecturers’ extroversion (Table 8), which is statistically significant at the alpha level of 0.05 (two-tailed test).

Table 6.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.153^a	.023	.020	6.82989
a. Predictors: (Constant), Student Extroversion

Table 7.

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	311.947	1	311.947	6.687	.010^b
	Residual	13014.630	279	46.647
	Total	13326.577	280
a. Dependent Variable: Student Wants Extroversion in Lecturers
b. Predictors: (Constant), Student Extroversion

Table 8.

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	95.0% Confidence Interval for B		Collinearity Statistics
Model		B	Std. Error	Beta	t	Sig.	Lower Bound	Upper Bound	Tolerance	VIF
1	(Constant)	8.220	1.866		4.405	.000	4.547	11.893
1	Student Extroversion	.160	.062	.153	2.586	.010	.038	.281	1.000	1.000
a. Dependent Variable: Student Wants Extroversion in Lecturers
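
The coefficients in Table 8 define a prediction equation of the form predicted score = constant + B × students’ extroversion. The sketch below writes that equation as a function and recovers the t statistic as B divided by its standard error. The input scores of 20, 30, and 40 are illustrative values chosen within the observed range of 5 to 46, not cases from the data set.

```python
# Unstandardized coefficients copied from Table 8.
INTERCEPT = 8.220   # (Constant)
SLOPE = 0.160       # B for Student Extroversion
SLOPE_SE = 0.062    # Std. Error of B


def predict_wanted_extroversion(student_extroversion):
    """Predicted 'wants extroversion in lecturers' score for a student."""
    return INTERCEPT + SLOPE * student_extroversion


# t is B divided by its standard error; the rounded inputs give about 2.58,
# while SPSS reports 2.586 from unrounded values.
t_slope = SLOPE / SLOPE_SE

for score in (20, 30, 40):   # illustrative inputs only
    print(score, round(predict_wanted_extroversion(score), 2))
print("t =", round(t_slope, 2))
```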

Multiple Regression

Multiple regression analysis of the effect of age, gender, and students’ extroversion on the expected lecturers’ extroversion met the assumptions of linearity of relationship, a continuous scale of the dependent variable, lack of autocorrelation, and absence of significant outliers. Table 9 shows that there is a weak relationship between the independent variables and the dependent variable (R = 0.168). Moreover, regression analysis indicates that age, gender, and students’ extroversion together account for 2.8% of the variation in the expected lecturers’ extroversion (R² = 0.028).

Table 9.

Model Summary
Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	.168^a	.028	.018	6.82934
a. Predictors: (Constant), Student Extroversion, Gender, Age

The regression model (Table 10) is statistically significant in predicting the effect of age, gender, and students’ extroversion on the expected lecturers’ extroversion, F(3, 276) = 2.673, p = 0.048.

Table 10.

ANOVA^a
Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	373.952	3	124.651	2.673	.048^b
	Residual	12872.615	276	46.640
	Total	13246.568	279
a. Dependent Variable: Student Wants Extroversion in Lecturers
b. Predictors: (Constant), Student Extroversion, Gender, Age
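
The model summary figures follow directly from the sums of squares and degrees of freedom in the ANOVA tables. As a check, the sketch below recomputes R², adjusted R², F, and the standard error of the estimate for the simple model (Table 7) and the multiple model (Table 10). The function name and dictionary keys are illustrative only.

```python
import math


def model_fit(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute model summary statistics from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    df_total = df_regression + df_residual            # equals n - 1
    adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual
    ms_residual = ss_residual / df_residual
    f_ratio = (ss_regression / df_regression) / ms_residual
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f": f_ratio,
        "std_error_estimate": math.sqrt(ms_residual),
    }


# Sums of squares and df copied from Table 7 and Table 10.
simple = model_fit(311.947, 13014.630, 1, 279)
multiple = model_fit(373.952, 12872.615, 3, 276)

for name, fit in (("Table 7", simple), ("Table 10", multiple)):
    print(name, {key: round(value, 3) for key, value in fit.items()})
```

Adding age and gender raises R² from 0.023 to 0.028 but lowers adjusted R² from 0.020 to 0.018, which suggests that the two extra predictors add little beyond students’ extroversion.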

Table 11 reveals that only students’ extroversion is a statistically significant predictor of the expected lecturers’ extroversion (B = 0.161, β = 0.155, p = 0.010). Age (B = 0.019, β = 0.010, p = 0.866) and gender (B = 1.036, β = 0.066, p = 0.265) are not statistically significant predictors based on an alpha level of 0.05.

Table 11.

Coefficients^a
Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	95.0% Confidence Interval for B		Collinearity Statistics
Model		B	Std. Error	Beta	t	Sig.	Lower Bound	Upper Bound	Tolerance	VIF
1	(Constant)	7.560	2.844		2.658	.008	1.961	13.159
	Age	.019	.109	.010	.169	.866	-.197	.234	.996	1.004
	Gender	1.036	.927	.066	1.118	.265	-.789	2.861	.997	1.003
	Student Extroversion	.161	.062	.155	2.607	.010	.039	.283	.998	1.002
a. Dependent Variable: Student Wants Extroversion in Lecturers
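
The collinearity columns in Table 11 are two views of the same quantity, since the variance inflation factor (VIF) is the reciprocal of tolerance. The sketch below reproduces the VIF column from the tolerance column. The threshold of 10 is a commonly quoted rule of thumb and is an assumption here, not a value taken from the original analysis.

```python
# Tolerance values copied from Table 11.
tolerance = {
    "Age": 0.996,
    "Gender": 0.997,
    "Student Extroversion": 0.998,
}

# VIF is defined as 1 / tolerance.
vif = {name: round(1 / value, 3) for name, value in tolerance.items()}

# Assumption: VIF above 10 is treated as a sign of problematic collinearity.
VIF_THRESHOLD = 10
collinearity_problem = any(value > VIF_THRESHOLD for value in vif.values())

print(vif)
print("Collinearity problem:", collinearity_problem)
```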

Pearson Correlation

An area of research interest comprises factors that influence economic growth and development in various countries. The inflation rate and the gross domestic product are two variables for which correlation analysis can determine the magnitude and direction of the relationship. The inflation rate exists on a ratio scale because it is measured as a percent change, and the gross domestic product is also on a ratio scale since its values are in dollars and have a true zero. The mock finding is that the inflation rate has a moderate negative relationship with the gross domestic product of a developing country (r = -0.40, p = 0.031). The coefficient of determination (r² = 0.16) indicates that inflation explains 16% of the variation in the gross domestic product. The economic resilience of a country is a possible third variable that could mediate the effect of inflation on the gross domestic product.
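
To make the computation behind such a finding concrete, the sketch below calculates Pearson’s r and r² from scratch. The inflation and growth figures are hypothetical values invented for illustration; they are not real data and will not reproduce the mock result of r = -0.40.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: annual inflation rate (%) and GDP growth (%).
inflation = [2.1, 3.4, 5.0, 6.2, 7.8, 9.1, 10.5, 12.0]
gdp_growth = [4.8, 5.1, 3.9, 4.2, 3.0, 3.6, 2.2, 2.9]

r = pearson_r(inflation, gdp_growth)
r_squared = r ** 2   # proportion of variance shared by the two variables

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```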

Spearman’s Correlation

Spearman’s correlation can assess the strength and direction of the relationship between the degree of experience and the proportion of sales that employees make per year. The degree of experience exists on an ordinal scale that ranks years of experience in five-year categories. In addition, the proportion of sales is on an ordinal scale of 10% increments. The mock finding of Spearman’s correlation shows that there is a positive relationship between the degree of experience and the proportion of sales among employees in an organization (ρ = 0.65, p = 0.029). The effect size indicates that the relationship is strong because the degree of experience explains 42.3% of the variation in the proportion of sales (ρ² = 0.423). However, age is a possible third variable that could confound the effect of the degree of experience on the proportion of sales.
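
Spearman’s ρ is Pearson’s r computed on ranks, with tied observations sharing the average of the ranks they occupy. The sketch below implements that definition. The experience categories and sales deciles are hypothetical values invented for illustration and will not reproduce the mock result of ρ = 0.65.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: experience category (1-5) and sales decile (1-10).
experience = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
sales = [2, 1, 3, 5, 4, 6, 5, 8, 7, 9]

rho = spearman_rho(experience, sales)
print(f"rho = {rho:.3f}, rho^2 = {rho ** 2:.3f}")
```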

Partial Correlation vs. Semi-Partial Correlation

Population, inflation, and gross domestic product are three variables that I can use in calculating partial correlation. The similarity between partial correlation and semi-partial correlation is that both control for the effect of one or more third variables. However, the difference is that partial correlation controls for the effect of a third variable on both correlating variables, while semi-partial correlation controls for the effect of a third variable on only one of the correlating variables (Darlington & Hayes, 2016). In this case, partial correlation entails controlling for the effect of population on the relationship between inflation and the gross domestic product. Comparatively, semi-partial correlation involves controlling for the effect of population on either inflation or the gross domestic product, but not both. In research, I would use partial correlation because it controls for the effects of population on both inflation and the gross domestic product.
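
The distinction can be seen in the standard first-order formulas, which differ only in the denominator. The sketch below computes both coefficients from three pairwise correlations. Only the inflation and GDP correlation of -0.40 echoes the mock finding in this paper; the two correlations with population are hypothetical values chosen for illustration.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from BOTH x and y."""
    numerator = r_xy - r_xz * r_yz
    return numerator / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of y with x after z is removed from x ONLY."""
    numerator = r_xy - r_xz * r_yz
    return numerator / math.sqrt(1 - r_xz ** 2)


# x = inflation, y = gross domestic product, z = population.
r_xy = -0.40   # mock inflation-GDP correlation used earlier in the paper
r_xz = 0.30    # hypothetical inflation-population correlation
r_yz = 0.50    # hypothetical GDP-population correlation

partial = partial_r(r_xy, r_xz, r_yz)
semipartial = semipartial_r(r_xy, r_xz, r_yz)

print(f"partial r = {partial:.3f}")
print(f"semi-partial r = {semipartial:.3f}")
```

Because the semi-partial denominator omits the term for y, its absolute value never exceeds that of the partial correlation computed from the same inputs.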

Simple Regression

The unemployment rate and the crime rate are two variables that I could use in a simple regression analysis. The unemployment rate is the proportion of people in the labor force who are unemployed, while the crime rate represents the number of reported crimes relative to the population. Both the unemployment rate and the crime rate exist on a ratio scale. The unemployment rate would be the predictor because it is expected to influence the crime rate, which is the outcome variable. In simple regression, R² would indicate the extent to which the unemployment rate explains the variation in the crime rate.
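
A simple regression of this kind reduces to estimating a slope and an intercept by ordinary least squares and then computing R² from the residuals. The sketch below shows those steps. The unemployment and crime figures are hypothetical values invented for illustration, not real data.

```python
def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_residual / ss_total


# Hypothetical data: unemployment rate (%) and crimes per 1,000 residents.
unemployment = [3.5, 4.2, 5.0, 6.1, 7.3, 8.0, 9.4]
crime = [21.0, 24.5, 26.0, 30.2, 33.1, 36.0, 41.5]

slope, intercept, r_squared = simple_regression(unemployment, crime)
print(f"crime = {intercept:.2f} + {slope:.2f} * unemployment")
print(f"R^2 = {r_squared:.3f}")
```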

Multiple Regression

Property price, the interest rate, distance from the city, and the number of rooms are four variables that I could use in performing multiple regression analysis. Property price represents the value of houses in dollars on a ratio scale. The interest rate is the proportion charged on loaned money, which is on a ratio scale. The distance from the city in kilometers also exists on a ratio scale. The number of rooms, which is on a ratio scale, shows the size of the property. The interest rate, distance from the city, and the number of rooms represent the predictor variables since they are independent variables that influence the property price, which is the outcome variable. The method that I would use in multiple regression is stepwise because it allows the selection of significant predictors in a model. R² would depict the collective effect of all predictor variables on the property price, whereas adjusted R² corrects that estimate for the number of predictors in the model.

Logistic Regression

The price of a product, the availability of a product, and the choice of customers are three variables that could be analyzed with logistic regression. The price of a product exists on a continuous scale of dollars, while the availability of a product and the choice of customers are on a categorical scale (a dichotomous scale of no or yes). The choice of customers is the outcome variable because it exhibits consumer behavior and is on a dichotomous scale. The price and the availability of products are predictor variables since they influence consumer preferences. I would use the Enter method because it includes all predictors in the regression equation. The regression output would generate Nagelkerke R Square, which indicates the degree to which the price and the availability of products influence consumer behavior. Moreover, the regression output would provide a coefficient for each predictor, indicating the magnitude and direction of influence, the odds ratio, and statistical significance.
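
The link between logistic regression coefficients, odds ratios, and predicted probabilities can be sketched in a few lines. All three coefficients below are hypothetical values chosen for illustration; they are not estimates from any data set.

```python
import math

# Hypothetical logistic regression coefficients (log-odds scale).
B_CONSTANT = -0.5
B_PRICE = -0.08       # change in log-odds per one-dollar price increase
B_AVAILABLE = 1.2     # change in log-odds when the product is available (1)


def odds_ratio(coefficient):
    """Odds ratio, reported by SPSS as Exp(B)."""
    return math.exp(coefficient)


def purchase_probability(price, available):
    """Predicted probability that a customer chooses the product."""
    logit = B_CONSTANT + B_PRICE * price + B_AVAILABLE * available
    return 1 / (1 + math.exp(-logit))


print("Odds ratio for price:", round(odds_ratio(B_PRICE), 3))
print("Odds ratio for availability:", round(odds_ratio(B_AVAILABLE), 3))
print("P(choice | price = 10, available):",
      round(purchase_probability(10, 1), 3))
```

An odds ratio below 1, as for price in this sketch, indicates that higher values of the predictor lower the odds of the outcome, while an odds ratio above 1 indicates the opposite.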

References

Darlington, R. B., & Hayes, A. F. (2016). Regression analysis and linear models: Concepts, applications, and implementation. New York, NY: The Guilford Press.

George, D., & Mallery, P. (2016). IBM SPSS statistics 23 step by step: A simple guide and reference (4th ed.). New York, NY: Routledge.