Relationship of Incentives and Customer Service Evaluations

The purpose of this study is to test the assumption that adding an incentive at the end of a transaction makes customers feel they were treated better than usual. If the assumption holds, customers who take a customer satisfaction survey may be convinced that the service was better than it actually was and give a higher score to the store they patronized.

The idea is that organizations can offer incentives to make customers feel better about the organization and thereby raise their customer service scores. As a result, organizations would be able to retain more customers and increase their overall profits while satisfying a greater number of patrons. This research also examines whether incentives lead more customers to take the survey, which would raise the feedback rate organizations receive. With more feedback, organizations will have a better idea of what their customers demand and what they need to do to please them. This chapter describes the analysis and results obtained with both the Chi-Squared Test of Independence and analysis of variance (ANOVA).

Results

Test of hypothesis 1

Hypothesis H01 states that the type of incentive offered for participating in the interactive voice response (IVR), computer-scripted and automated telephone survey makes no difference in shopper ratings of store cleanliness. Put another way, the attractiveness of the incentive does not unduly bias shoppers from giving their true opinion about how well the store comes up to ordinary standards for cleanliness.

In this and subsequent analyses, shoppers are classified into one of four independent groups according to the store where they had been randomly selected to participate in the survey. By incentive provided, the four groups were: no incentive for participation (control group), a candy bar for each survey participant (intervention 1), entry into a drawing for a $25 gift card (intervention 2), and entry into a drawing for a $250 gift card (intervention 3). Location and the associated intervention, therefore, comprise the independent variable in the study.

Table 1 (below) and Fig. 1 (overleaf) reveal that, as expected, there was virtually no change for the control group between baseline and test periods (84.4 versus 83). On the other hand, the proportion of shoppers satisfied with store cleanliness rose across the board wherever an incentive was implemented.

Table 1: Proportions Satisfied with Cleanliness At Baseline and Field Trial Periods.

Condition        Baseline (%)   Test (%)
No incentive     84.4           83
Candy bar        83.3           91
$25 gift card    82.7           89
$250 gift card   83.3           91
Figure 1: Change in Cleanliness Satisfaction, by Experimental Condition.

The researcher opted to run multiple chi square tests of independence against the no-incentive group rather than testing goodness-of-fit across all four shopper groups all at once because:

  • The multi-group analysis defaults to testing against “expected values” across the entire contingency table, as a result of which variances are inflated and “significant” differences almost inevitable.
  • The nature of the research design is really a field quasi-experiment where the independent variable of incentive to participate is manipulated at three levels and the no-incentive group is effectively the control group.

Tables 2 and 3 below reveal the breakdown of satisfaction with cleanliness according to test conditions at baseline and after the field experiment phase.

Table 2: Proportions Satisfied with Cleanliness at Baseline, by Incentive Condition.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Baseline period 82.7 Count 0 0 1 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% 100.0% .0% 25.0%
83.3 Count 0 1 0 1 2
Expected Count .5 .5 .5 .5 2.0
% within Type of incentive .0% 100.0% .0% 100.0% 50.0%
84.4 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

Table 3: Proportions Satisfied with Cleanliness During Test Period, by Incentive Condition.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Test period 83 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
89 Count 0 0 1 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% 100.0% .0% 25.0%
91 Count 0 1 0 1 2
Expected Count .5 .5 .5 .5 2.0
% within Type of incentive .0% 100.0% .0% 100.0% 50.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

Table 4: Chi Square result of Experimental Period Readings.

Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             8.000(a)   6    .238
Likelihood Ratio               8.318      6    .216
Linear-by-Linear Association   1.688      1    .194
N of Valid Cases               4
a. 12 cells (100.0%) have an expected count of less than 5. The minimum expected count is .25.

In turn, Table 4 above summarizes the results of the Chi-Square Test of Independence between baseline and intervention periods across all four conditions. The chi square value for the overall test of independence is 8. At six degrees of freedom (df), the associated significance value of 0.24 does not meet the α = 0.05 hurdle. We are unable to reject the null hypothesis H01. Despite the appearance of a positive effect on visual inspection, we therefore cannot conclude that providing an incentive shifts store cleanliness ratings upwards.
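The Pearson statistic reported in Table 4 can be re-derived by hand from the test-period crosstab in Table 3. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the closed-form tail probability is a shortcut that holds only when the degrees of freedom are even, as they are here.

```python
import math

# Observed counts transcribed from Table 3 (test period).
# Rows: the distinct satisfaction percentages (83, 89, 91).
# Columns: no incentive, candy bar, $25 gift card, $250 gift card.
observed = [
    [1, 0, 0, 0],  # 83
    [0, 0, 1, 0],  # 89
    [0, 1, 0, 1],  # 91
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


chi2, df = pearson_chi_square(observed)
p_value = chi2_sf_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
# prints: chi-square = 8.000, df = 6, p = 0.238
```

Every expected count in this table is .25 or .5, which is the condition flagged in footnote a of Table 4.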

Test of hypothesis 2

Hypothesis H02 states that the customer’s attitude toward the checkout process being efficient is independent of the incentive used. Table 5 summarizes the proportions relevant to this null hypothesis. Once again, we discern virtually no change in the control store and seemingly appreciable improvements in the proportions of shoppers who rate the stores positively when exposed to an incentive.

Table 5: Proportions Satisfied with Checkout Efficiency At Baseline and Field Trial Periods.

Condition        Baseline (%)   Test (%)
No incentive     85.8           85
Candy bar        84.0           95
$25 gift card    84.1           95
$250 gift card   83.3           94

Tables 6 and 7 (overleaf) present the breakdowns by stage, test condition and discrete proportions of shoppers who are satisfied with accuracy and speed at checkout.

Table 6: Proportions Satisfied with Checkout Process at Baseline.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Baseline period 83.3 Count 0 0 0 1 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% .0% 100.0% 25.0%
84 Count 0 1 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% 100.0% .0% .0% 25.0%
84.1 Count 0 0 1 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% 100.0% .0% 25.0%
85.8 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

Table 7: Proportions Satisfied with Checkout Efficiency During Experimental Period.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Test period 85 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
94 Count 0 0 0 1 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% .0% 100.0% 25.0%
95 Count 0 1 1 0 2
Expected Count .5 .5 .5 .5 2.0
% within Type of incentive .0% 100.0% 100.0% .0% 50.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

For the test of this hypothesis, the chi-square value once again stands at 8 (Table 8 overleaf). At 6 df, the associated statistical significance p = 0.24 once again fails to meet the p < 0.05 hurdle needed to have confidence that the “difference” represents meaningful rather than random variation. We cannot reject the null hypothesis that implementing an incentive makes no difference where better evaluations of checkout efficiency are concerned.

Table 8: Summary Statistics for Chi Square Test of Significance: Efficiency of Checkout Process at Experimental Period.

Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             8.000(a)   6    .238
Likelihood Ratio               8.318      6    .216
Linear-by-Linear Association   1.546      1    .214
N of Valid Cases               4
a. 12 cells (100.0%) have an expected count of less than 5. The minimum expected count is .25.

Test of hypothesis 3

Hypothesis H03 states that the customer’s attitude toward the staff being friendly and courteous is independent of the incentive used. Table 9 below summarizes average proportions satisfied with friendliness at both baseline and intervention stages. This overview reveals no change in the control store and such favorable response in the three test stores that the proportion of shoppers favorably impressed with store clerk friendliness rose very near to saturation point.

Table 9: Proportions Satisfied with Friendliness At Baseline and Field Trial Periods.

Condition        Baseline (%)   Test (%)
No incentive     82.5           83
Candy bar        82.7           98
$25 gift card    83.1           96
$250 gift card   82.9           97

Tables 10 and 11 (overleaf) present the details of satisfaction with personnel attitudes by incentive type and baseline/test stages.

Table 10: Proportions Satisfied with Friendliness at Baseline.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Baseline period 82.5 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
82.7 Count 0 1 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% 100.0% .0% .0% 25.0%
82.9 Count 0 0 0 1 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% .0% 100.0% 25.0%
83.1 Count 0 0 1 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% 100.0% .0% 25.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

Table 11: Proportions Satisfied with Friendliness, Experimental Period.

Crosstab
Type of incentive
No incentive Candy bar $25 gift card $250 gift card Total
Test period 83 Count 1 0 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive 100.0% .0% .0% .0% 25.0%
96 Count 0 0 1 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% 100.0% .0% 25.0%
97 Count 0 0 0 1 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% .0% .0% 100.0% 25.0%
98 Count 0 1 0 0 1
Expected Count .2 .2 .2 .2 1.0
% within Type of incentive .0% 100.0% .0% .0% 25.0%
Total Count 1 1 1 1 4
Expected Count 1.0 1.0 1.0 1.0 4.0
% within Type of incentive 100.0% 100.0% 100.0% 100.0% 100.0%

The third null hypothesis, involving perceptions of store clerk attitudes, is evaluated in the result of the chi square test in Table 12 overleaf. Here we see that the chi-square value of 12 yields a significance value of 0.21 at nine df. This suggests that even the apparently convincing surge in shopper appreciation for the friendliness aspect of store service could very well have occurred by chance in about a fifth of re-samplings, were this study to be undertaken again.

Since there are not enough field observations to warrant an assessment that the baseline and test period distributions are independent and can therefore be evaluated as representing meaningfully different shopper responses, we fail to reject the null hypothesis and once again conclude that the incentives do not demonstrably impact shopper satisfaction with store clerk attitudes.

Table 12: Summary Statistics for Chi-Square Test of Significance: Friendliness of Store Clerks, Experimental Period.

Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             12.000(a)   9    .213
Likelihood Ratio               11.090      9    .270
Linear-by-Linear Association   1.611       1    .204
N of Valid Cases               4
a. 16 cells (100.0%) have an expected count of less than 5. The minimum expected count is .25.
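The Pearson and likelihood-ratio values in Table 12 can be checked against the test-period crosstab in Table 11. The sketch below assumes nothing beyond the counts printed there. The helper `chi2_sf` is an illustrative name, not part of any library; it evaluates the chi-square tail probability for any positive integer df through the standard recurrence for the regularized upper incomplete gamma function.

```python
import math

# Observed counts transcribed from Table 11 (test period).
# Rows: satisfaction percentages 83, 96, 97, 98.
# Columns: no incentive, candy bar, $25 gift card, $250 gift card.
observed = [
    [1, 0, 0, 0],  # 83
    [0, 0, 1, 0],  # 96
    [0, 0, 0, 1],  # 97
    [0, 1, 0, 0],  # 98
]


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1), starting
    from Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for even df.
    """
    y = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        pearson += (obs - expected) ** 2 / expected
        if obs > 0:  # empty cells contribute nothing to the likelihood ratio
            likelihood_ratio += 2.0 * obs * math.log(obs / expected)

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(f"Pearson = {pearson:.3f}, df = {df}, p = {chi2_sf(pearson, df):.3f}")
print(f"Likelihood ratio = {likelihood_ratio:.3f}, p = {chi2_sf(likelihood_ratio, df):.3f}")
```

With a single observation per store, each cell count is 0 or 1, so the statistics depend on how many distinct percentage values occur rather than on how far apart those percentages are.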

ANOVA Analysis of Variance (Hypothesis 4)

Recall the null hypothesis that there is no difference in the mean daily calls of customers as a function of incentive used by the store. The analytical approach that applies, in this case, is the analysis of variance because there are a total of four shopper groups being compared.

There are at least two items of interest in the compilation of descriptive statistics in Table 13 overleaf. The first is that the mean proportion of shoppers who agreed to participate in the IVR-based customer satisfaction facility more than doubled, from 27.6 percent to 58.3 percent. While this is an eminently satisfying finding, the paucity of observations has resulted in a standard deviation of about 15.5 for the test period. Among other things, this results in a situation where the upper bound of the 95% confidence interval for the mean at baseline (35.8) overlaps with the lower bound for the “successful” experimental period (33.5).

Table 13: Descriptive Statistics for One-Way ANOVA.

Descriptives
Average calls
                                                      95% Confidence Interval for Mean
           N   Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound        Minimum   Maximum
Baseline   4   27.625   5.1416           2.5708       19.444        35.806             22.5      34.4
Test       4   58.250   15.5430          7.7715       33.518        82.982             36.0      72.0
Total      8   42.938   19.5661          6.9177       26.580        59.295             22.5      72.0
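The confidence bounds in Table 13 follow from the usual formula, mean ± t × (standard deviation / √N). As a hedged illustration, the sketch below recomputes them from the tabulated summary statistics. The critical value 3.182446 is the two-sided 95% point of Student's t with 3 degrees of freedom, taken from a standard t table rather than from the study, and the function name is illustrative.

```python
import math

# Summary statistics transcribed from Table 13: (N, mean, standard deviation).
groups = {
    "Baseline": (4, 27.625, 5.1416),
    "Test": (4, 58.250, 15.5430),
}

# Two-sided 95% critical value of Student's t with N - 1 = 3 degrees of
# freedom (standard table value; applies because both groups have N = 4).
T_CRIT_DF3 = 3.182446


def confidence_interval(n, mean, sd, t_crit):
    """Return (lower, upper) bounds of the confidence interval for a mean."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


intervals = {
    name: confidence_interval(n, mean, sd, T_CRIT_DF3)
    for name, (n, mean, sd) in groups.items()
}
for name, (lower, upper) in intervals.items():
    print(f"{name}: {lower:.3f} to {upper:.3f}")

# The intervals overlap when the baseline upper bound exceeds the test lower bound.
overlap = intervals["Baseline"][1] > intervals["Test"][0]
print("Intervals overlap:", overlap)
```

The overlap is narrow: the baseline upper bound sits a little over two points above the test-period lower bound.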

Levene’s test for homogeneity of variances (Table 14 below) essentially tests the null hypothesis H0: σ²(Baseline) = σ²(Experimental). Given that the significance statistic p = 0.16 exceeds the α = 0.05 benchmark, one fails to reject the null hypothesis. There is thus no evidence that the variances differ, and the assumption of homogeneity of variance can be treated as satisfied. This is a critical assumption for the ANOVA.

Table 14: Test of Homogeneity of Variances.

Average calls
Levene Statistic   df1   df2   Sig.
2.553              1     6     .161

The results of the One-Way ANOVA, the appropriate tool for a single independent variable with multiple groups, are depicted in Table 15 below. The differences between groups, across all incentive conditions as well as the control, are substantial enough to yield an F value of 14.0 (rounded off). Such a value is quite rare under the null hypothesis, arising by chance about once in a hundred re-runs of this field experiment (p = 0.01). Since this probability is well below the commonly accepted cut-off of α = 0.05, one concludes that the null hypothesis can be safely rejected. Incentives do help induce greater participation in the telephone survey.

Table 15: One-Way ANOVA Results.

Average calls
                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   1875.781         1    1875.781      13.997   .010
Within Groups    804.058          6    134.010
Total            2679.839         7
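The sums of squares and the F ratio in the ANOVA table can be rebuilt from the group sizes, means and standard deviations in Table 13, which makes the arithmetic behind the test explicit. This is a sketch under stated assumptions: it uses only the published summary statistics, so small rounding differences from the SPSS output are expected, and the helper `f_sf_1_and_6` is an illustrative function that works only for 1 and 6 degrees of freedom.

```python
import math

# Group summaries transcribed from Table 13: (N, mean, standard deviation).
groups = [(4, 27.625, 5.1416), (4, 58.250, 15.5430)]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(n * mean for n, mean, _ in groups) / n_total

# Between-groups: spread of group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
# Within-groups: pooled spread of observations around their own group mean.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_value = (ss_between / df_between) / (ss_within / df_within)


def f_sf_1_and_6(f):
    """Upper-tail probability of F(1, 6), via the two-sided t test with 6 df.

    With one numerator df, F equals t squared. The t distribution has a
    closed form for even df; the series below is specific to 6 df.
    """
    theta = math.atan(math.sqrt(f) / math.sqrt(6))
    c2 = math.cos(theta) ** 2
    central_area = math.sin(theta) * (1 + c2 / 2 + 3 * c2 ** 2 / 8)
    return 1 - central_area


p_value = f_sf_1_and_6(f_value)
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_value:.3f}, p = {p_value:.3f}")
```

The one between-groups degree of freedom shows that the comparison is between two groups of four observations each, the baseline and test periods.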

Discussion, Implications, Recommendations

Research Questions and Hypotheses

Recall that the research questions were, in the temporal order of desired shopper response:

  1. Will an extra incentive, over and above the existent $1,000 gift card drawing, provoke randomly selected shoppers to participate in the phone-in survey? Is there a difference in the mean daily calls in locations or during periods of no incentive versus when there is an incentive?
  2. What kind of extra incentive has a positive biasing effect on shopper satisfaction with critical shopper experience criteria?

Recall, in turn, that the null and alternative hypotheses were articulated as follows:

Series 1
  Null hypothesis: The customer’s attitude toward the store’s cleanliness is independent of the incentive used.
  Alternative hypothesis: The customer’s attitude toward the store’s cleanliness is influenced by the incentive used.
Series 2
  Null hypothesis: The customer’s perception of checkout process efficiency is independent of the incentive used.
  Alternative hypothesis: The customer’s perception of checkout process efficiency is influenced by the incentive used.
Series 3
  Null hypothesis: The customer’s perception of friendliness and courtesy is independent of the incentive used.
  Alternative hypothesis: The customer’s perception of friendliness and courtesy is influenced by the incentive used.
Series 4
  Null hypothesis: There is no difference in phone survey participation according to the incentive used.
  Alternative hypothesis: The type of incentive offered influences the propensity to participate in the phone survey.

Summary of the Results

Going by the comparison of baseline and experimental stages with four test conditions, none of the three incentives are any better than no incentive at all for stimulating:

  • A greater number of shoppers into agreeing that the respective branches meet their personal criteria for outlet cleanliness.
  • A larger number of shoppers assenting that the checkout process meets their standard for speed and efficiency. On further analysis, this may be a compound variable (two variables in one) since rapid and error-free checkout are conceptually different.
  • More shoppers into agreeing that the store staff is unfailingly friendly and courteous.

On the other hand, this field test showed that incentives are effective in raising participation rates in what is, after all, a voluntary call-in facility (no doubt conveniently located in the stores’ premises) for dialing and responding to a customer satisfaction survey service. To the extent that more survey respondents improve the heterogeneity of feedback and therefore the overall reliability of this customer satisfaction survey tool, a program of incentives is well worth continuing.

Theoretical Analysis and Summary

The base tool of chi square analysis being sensitive to degrees of freedom or number of observations available, the findings of this two-period test yielded no statistically significant differences in surveyed perceptions of the customer satisfaction parameters: store cleanliness, checkout efficiency and staff courtesy. This can be considered analogous to the case of Sherlock Holmes’ “dog that did not bark” because the absence of statistically significant differences itself has certain implications. On the surface, the results signify that the chain can implement the least-cost option because consumer response will be the same regardless.

The question then becomes, what behavior is really being encouraged? The experiment mixed an immediate token reward (the candy bar) and a sweepstake with an unknown but expectedly low chance of winning the gift cards. This is an indirect sales promotion, given the underlying intent of customer satisfaction programs for maximizing customer experience, thereby enhancing loyalty and switching those who alternate their shopping trips in CHAIN X with competing retail establishments.

This quest for competitive advantage requires answers to certain strategic issues, among which is: how does one consistently incentivize voluntary participation in a call-in program? At this level, however, the test was really about encouraging more participation and not about biasing shopper evaluations favorably. Happily, the results of the experiment point to incentives effectively inducing better response to an invitation to participate in the phone-in survey. This participant base is the headcount of shoppers:

  • Who already respond to the prospect of winning a thousand dollars, even though the odds of winning may be low; and/or,
  • Those who relish the chance to give feedback either because they are very pleased or very disappointed.

In the first case, we therefore see what economists like to call diminishing “marginal utility” for the three minor incentives tested by this month-long experiment. The three incentives have no impact, or at best a minor one, because shoppers already behave with the prospect of winning a thousand dollars in mind.

If that is so, then there is a distinct possibility that this channel for gathering customer satisfaction data is effectively a strictly voluntary, shopper-driven research method. Like comment cards, call-from-home/place-of-work campaigns, or Web site comment boxes, forums and chat/IM facilities, the operation of the low-cost, computer script-driven service subject of this experiment basically invites participation by the 20 percent or less of shoppers who are either very satisfied or extremely disappointed. Such methods risk being unrepresentative of shoppers in the middle who have more moderate opinions.

Limitations

The principal “limitation” of this field experiment design is that the prevailing incentive of a $1,000 drawing remained in force throughout the trial period. Shopper behavior was reinforced not just by the three levels of the incentive independent variable but by the continuing operation of a thousand-dollar drawing. Granted, it is open to debate whether a large incentive with low odds of winning is more compelling than a more modest incentive with provision for more winners.

The chain can determine for itself whether the three incentives tried add incremental participation value by re-expressing the data as a percentage of the shoppers in each store’s traffic who were invited, that is, whose attention was called to the marking on their receipts. As it is, the comparison with control-store headcounts creates the impression that participation improves but not the prospects of favorable feedback.

Conclusion

That the results are equivalent no matter the seeming attractiveness of the $25 and $250 cash incentives is more meaningful than at first meets the eye. First of all, a much higher incentive of a $1,000 drawing remained before, during and after the experimental period. The key finding of this experiment is therefore this: the chain can save money by implementing a drawing for the smaller amounts rather than the thousand-dollar prize since these at least enhance degree of participation.

It is comforting to know that incentives of differing (perceived) value do not necessarily bias shopper ratings. There is corporate illogic and myopia involved in wanting to test the effectiveness of incentives for raising customer satisfaction ratings. If the $250 drawing as incentive had provoked statistically significant improvements in ratings at that store, the result would still have raised the question of whether the branch was really meeting its key customer satisfaction performance indicators (KPIs). To be hard-nosed about it, the sole independent variable in a customer satisfaction campaign should be how well each store meets shopper criteria for acceptable cleanliness, efficiency at checkout and staff attitudes.

In evaluating these results, one should also not neglect the possibility that all four stores uniformly meet their customer satisfaction KPIs. Assuming that a customer satisfaction program had been in place for some time, one can reasonably expect that branch personnel have become adept at meeting minimum KPI levels at least. This would explain why stores obtain the same satisfaction ratings regardless of incentives.

Suggestions for Future Research

To be more confident about eliciting the same participation rate even at lower incentive levels, the chain should consider running a longer field trial where the $1,000 incentive is officially discontinued and where actual performance on the three key indicators is also logged on a continuing basis for analysis as the true independent variable. A longer sampling series bears the triple virtues of evening out seasonality, reflecting the variety of competitive store chain promotions, and increasing the “degrees of freedom” that the chi square statistic employs to evaluate the reliability of a data series.

In customer satisfaction research, perceptions logically count for more than store managers’ protestations that they manage key performance indicators according to system-wide standards. Nonetheless, a statistical series and analysis that includes KPIs would be more rigorous for having greater explanatory power and being more action-oriented than is the case with the limited data series at hand.

Two minor adjustments are also worth exploring. Total store traffic count and headcount notified of having been picked at random bear including in the evaluation database. This means that the surveyed customers can be reckoned in terms of participation rates or proportions. After all, staying with headcounts, as the customer satisfaction test program appears to do at present, relies too much on the assumption of equal store populations all the time.
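To make the point about headcounts versus rates concrete, the following is a minimal sketch of how participation could be expressed as a proportion of invited shoppers. The store names and all counts are hypothetical placeholders chosen for illustration, not figures from the study, and the function name is likewise an assumption.

```python
def participation_rate(completed_surveys, shoppers_invited):
    """Share of invited shoppers who completed the survey, as a percentage."""
    if shoppers_invited <= 0:
        raise ValueError("shoppers_invited must be positive")
    return 100.0 * completed_surveys / shoppers_invited


# Hypothetical figures for illustration only: (completed surveys, shoppers invited).
stores = {
    "Store A": (58, 400),
    "Store B": (58, 800),
}

rates = {
    name: participation_rate(completed, invited)
    for name, (completed, invited) in stores.items()
}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% of invited shoppers took the survey")
```

In this made-up case both stores log the same headcount of 58 callers, yet the participation rate in the first is twice that of the second, which is the distinction that raw headcounts hide.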