Item Construction
Item construction is a complex process that requires the consideration of multiple concerns. Nowadays, researchers can employ various tools that may assist in item construction, and one of them is Rauthmann’s (2011) taxonomy. Using Rosenberg’s self-esteem scale (Eklund, Bäckström, & Hansson, 2018), as well as the literature on the topic, the present paper will review the issues of item construction, discuss possible response scales, offer three sample items, and analyze them with the help of Rauthmann’s (2011) taxonomy. For this paper, self-esteem was chosen as the construct to be considered.
Major Issues in Item Construction
The most common issues related to item construction are validity and reliability, which need to be ensured. The former describes the ability of an item to measure the studied phenomenon, and the latter refers to the consistency of items that can be defined as their ability to produce consistent results (Kline, 2015; Rauthmann, 2011). Reliable items which were developed to assess the same phenomenon are supposed to offer the same results over time. According to Rauthmann (2011), reliability depends on the content and format of items, and the author proposes various approaches to achieving this outcome.
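One way to examine whether items "offer the same results over time" is a test-retest check. The sketch below is only an illustration: it assumes that the same respondents complete the scale twice and that consistency is summarized with a Pearson correlation between the two sets of total scores. The function name and the scores are hypothetical and are not taken from Kline (2015) or Rauthmann (2011).

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    # Square roots of the sums of squared deviations
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cross / (spread_first * spread_second)


# Hypothetical total scores of five respondents at two administrations
time_1 = [12, 15, 19, 9, 17]
time_2 = [13, 14, 18, 10, 17]
print(round(pearson_r(time_1, time_2), 2))
```

A coefficient close to 1 would suggest that respondents keep their relative positions across the two administrations, which is what the notion of reliability described above implies.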
Furthermore, items need to be unidimensional: that is, they must measure the construct that they are supposed to measure while not measuring other ones, even those that are related to the targeted phenomenon. Ziegler and Hagemann (2015) state that this goal may not be fully attainable, but researchers need to minimize the association of items with other constructs. The approaches that the authors suggest involve different types of analyses. Thus, the issues of validity and reliability need to be addressed, and the means of addressing them exist, but the process is admittedly challenging.
Types of Response Scales
The general types of response scales are unidimensional and multidimensional. A unidimensional scale measures only one dimension; simple examples are the scales used to measure weight or height. Multidimensional ones are more complex since they incorporate multiple dimensions. For example, they can assess multiple aspects or symptoms of a disorder or include various dimensions of one complex construct (Möller, 2014). To use a more specific example, Rosenberg’s self-esteem scale has been interpreted to include at least two dimensions (self-competence and self-liking), which could qualify it as a multidimensional scale (Eklund et al., 2018). However, according to Eklund et al. (2018), it can be argued that the two dimensions are just the components of self-esteem, and the scale was constructed to produce a single score for self-esteem. As a result, it can be considered unidimensional.
Regarding the more specific types of scales, at least three approaches to unidimensional scales can be named: Thurstone scales, Guttman scales, and Likert scales. Each of them offers a unique solution to the challenge of assigning scores to text statements. The first one involves developing items and presenting them to a number of people (judges) who are supposed to determine their favorability (using an 11-point scale); the judges’ responses are used to determine the scores of the items (Kline, 2015). The items that are agreed upon by judges are included in the final instrument and presented to a subject who is expected to agree or disagree with the chosen items. Thus, a final item can look like the one presented below (Table 1).
Table 1. A Thurstone Scale.
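The Thurstone procedure described above can be sketched in Python. The sketch rests on assumptions that are common for this method but are not stated in the paper: an item's scale value is taken to be the median of the judges' ratings, agreement among judges is gauged by the interquartile range, and a respondent's score is the mean scale value of the items they agree with. The cutoff `max_iqr`, the function names, and all judge ratings are invented for illustration.

```python
from statistics import median, quantiles


def thurstone_item_values(judge_ratings, max_iqr=2.0):
    """Keep items on which judges agree and return {item: scale value}."""
    retained = {}
    for item, ratings in judge_ratings.items():
        q1, _, q3 = quantiles(ratings, n=4)  # quartiles of the 11-point ratings
        if q3 - q1 <= max_iqr:  # small spread = judges agree (assumed cutoff)
            retained[item] = median(ratings)
    return retained


def thurstone_score(endorsed_items, item_values):
    """Mean scale value of the retained items a respondent agrees with."""
    values = [item_values[i] for i in endorsed_items if i in item_values]
    return sum(values) / len(values) if values else None


# Hypothetical favorability ratings (1-11) from seven judges
judge_ratings = {
    "I deserve to be treated well.": [6, 7, 7, 7, 7, 8, 8],
    "I deserve respect.": [8, 8, 9, 9, 9, 9, 10],
    "I deserve admiration.": [1, 2, 3, 5, 6, 9, 10],  # judges disagree
}

item_values = thurstone_item_values(judge_ratings)
respondent = ["I deserve to be treated well.", "I deserve respect."]
print(item_values)
print(thurstone_score(respondent, item_values))
```

In this invented example, the third statement is dropped because the judges' ratings are too widely spread, which mirrors the idea that only the items agreed upon by judges reach the final instrument.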
Guttman scales are based on the idea of ordering items, which are supposed to construct a continuous scale of the assessed dimension (Kline, 2015). A very short example is presented below. The items do not have to come to the respondent in this order, but this order would be used to assign scores to them.
- I deserve to be treated well.
- I deserve respect.
- I deserve admiration.
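The ordering logic of the three items above can be illustrated with a short Python sketch. It assumes that a respondent's score is the number of statements they endorse and that a response pattern fits the scale only if every endorsed statement is preceded solely by endorsed statements. The function names are hypothetical and serve illustration only.

```python
# Items in their assumed order, from easiest to hardest to endorse
ORDERED_ITEMS = [
    "I deserve to be treated well.",
    "I deserve respect.",
    "I deserve admiration.",
]


def guttman_score(endorsed):
    """Score = number of scale items the respondent endorses."""
    return sum(1 for item in ORDERED_ITEMS if item in endorsed)


def is_cumulative(endorsed):
    """True if the pattern is a run of agreements followed by disagreements."""
    pattern = [item in endorsed for item in ORDERED_ITEMS]
    # Sorting with reverse=True puts every True first; a pattern that
    # already has this shape contains no gaps.
    return pattern == sorted(pattern, reverse=True)


consistent = {"I deserve to be treated well.", "I deserve respect."}
inconsistent = {"I deserve admiration."}

print(guttman_score(consistent), is_cumulative(consistent))
print(guttman_score(inconsistent), is_cumulative(inconsistent))
```

A respondent who endorses only the last statement breaks the expected order, and such patterns indicate that the items may not form a single continuous scale.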
Finally, Likert scales consist of statements that subjects rate based on their agreement with said statements (Harpe, 2015). The scales can vary, but they generally include either five or seven points (Kline, 2015). However, Rosenberg’s self-esteem scale uses only four points because it does not employ the neutral option. Examples of the scales are presented below (Table 2 and Table 3).
Table 2. A 7-Point Likert Scale.
Table 3. A 5-Point Likert Scale.
Appropriate Response Scale for Self-Esteem
Multidimensional scales can produce a more nuanced assessment of a construct, but unidimensional scales are easier to apply and interpret. When one dimension of a construct is considered, unidimensional scales are a better choice (Möller, 2014), which is why they are selected for the present paper that targets self-esteem. Rosenberg’s self-esteem scale proves that a unidimensional approach to self-esteem is possible (Eklund et al., 2018), which supports this decision.
As for the specific type of scale, the present paper suggests using a Likert scale. As pointed out by Kline (2015), when compared to Thurstone and Guttman scales, Likert ones are more feasible. Thurstone scales require a large number of judges, and their sampling is also problematic. Guttman scales assume that participants agree with all the items that precede the one that they have chosen, and the development of these scales can result in items that are too widely spaced or not unidimensional. On the other hand, a Likert scale consists of scores that are linearly related to the measured dimension, which, according to Kline (2015), makes unwarrantable assumptions unlikely. Furthermore, Likert scales are well-established and frequently used in various contexts (Harpe, 2015; Kline, 2015). The fact that Rosenberg’s self-esteem scale uses a four-point Likert scale supports the feasibility of this decision (Eklund et al., 2018). Thus, the present paper will apply a Likert scale to the concept of self-esteem.
Sample Items
It is suggested to use a larger Likert scale (with seven points) because larger Likert scales improve the reliability of responses (Kline, 2015). Apart from that, Kline (2015) also recommends using graphic scales in which the numbers are set apart to make the use of the scale more convenient. The sample items for a self-esteem scale are presented below; they are based on the items from Rosenberg’s self-esteem scale (Eklund et al., 2018). The participants would be instructed to read the statements and rate their agreement with them in accordance with the presented scale from 1 (strongly disagree) to 7 (strongly agree). A higher score would mean greater self-esteem; item three would be reverse-scored (see Table 4).
Table 4. Sample Items for the Self-Esteem Scale to Be Developed.
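The scoring of the sample items can be sketched in Python as follows. The sketch assumes the 7-point scale described above with item three reverse-scored; summing the item scores into a total is an additional assumption, and the function names and responses are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 7   # 1 = strongly disagree, 7 = strongly agree
REVERSED_ITEMS = {3}          # item three is reverse-scored


def score_item(item_number, response):
    """Return the item score, flipping the response for reversed items."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response must be between {SCALE_MIN} and {SCALE_MAX}")
    if item_number in REVERSED_ITEMS:
        # On a 1-7 scale this maps 1 -> 7, 2 -> 6, ..., 7 -> 1
        return SCALE_MIN + SCALE_MAX - response
    return response


def total_score(responses):
    """Sum of item scores; a higher total means greater self-esteem."""
    return sum(score_item(number, value) for number, value in responses.items())


# Hypothetical answers of one respondent, keyed by item number
answers = {1: 6, 2: 7, 3: 2}
print(total_score(answers))
```

Reversing the third item before summing keeps the direction of the total consistent, so that a higher value always indicates greater self-esteem.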
Rauthmann’s Proposed Item Format Taxonomy Analysis
Rauthmann (2011) proposes an item format taxonomy that incorporates several key dimensions that are going to be applied to the presented sample items to determine their ability to reflect a person’s self-esteem. From the perspective of the general format, the sample items (see Table 4) are based on the valency approach: they focus on the respondents’ feelings, which are indicated directly within the items. In addition, the final item combines frequency and valency. From the perspective of the point of reference, all the items use the first person: they require the respondent to analyze their own feelings.
The items are not conditional (do not include any particular conditions to be considered). As for their construct indicator, it can be suggested that they are all attributable: worthiness and being worthy of respect are attributes, and the act of having good qualities can also be viewed as an attribute since it is a descriptor of one’s character. Thus, the three items invite the participants to assess their own traits, which, as demonstrated by Rosenberg’s self-esteem scale (Eklund et al., 2018), are indicative of a person’s self-esteem. Therefore, the proposed items can indeed assess self-esteem, and Rauthmann’s (2011) taxonomy can assist with the analysis of item validity.
Conclusion
Validity and reliability are the common concerns of item development, but there are means of addressing them. In this paper, Rauthmann’s (2011) taxonomy was used to this end. Apart from that, there are different types of scales that can be employed with items, including unidimensional and multidimensional ones, as well as more specific approaches. For the present paper, a unidimensional Likert scale was chosen to assess the concept of self-esteem due to its benefits and appropriateness for the task. As shown by Rauthmann’s (2011) taxonomy, the developed items should produce information about respondents’ feelings towards themselves, which are shown to be indicative of self-esteem by Rosenberg’s self-esteem scale.
References
Eklund, M., Bäckström, M., & Hansson, L. (2018). Psychometric evaluation of the Swedish version of Rosenberg’s self-esteem scale. Nordic Journal of Psychiatry, 72(5), 318-324.
Harpe, S. (2015). How to analyze Likert and other rating scale data. Currents in Pharmacy Teaching and Learning, 7(6), 836-850.
Kline, P. (2015). A handbook of test construction. New York, NY: Routledge.
Möller, H. (2014). Observer rating scales. In G. Alexopoulos, S. Kasper, H. Möller & C. Moreno (Eds.), Guide to assessment scales in major depressive disorder (pp. 7-22). Cham, Switzerland: Springer International Publishing.
Rauthmann, J. (2011). Not only item content but also item format is important: Taxonomizing item format approaches. Social Behavior and Personality: An International Journal, 39(1), 119-128.
Ziegler, M., & Hagemann, D. (2015). Testing the unidimensionality of items. European Journal of Psychological Assessment, 31(4), 231-237.