Introduction
Scientists use item response theory (IRT) in psychological assessment. Researchers define item response theory as a modern measurement model that evaluates the margin of error (or reliability) of test scores (Bond, 2007). Different types of item response models exist, and one of them is the Rasch model. When applied to dichotomous data (such as yes/no or right/wrong responses), the Rasch model shares the characteristics of IRT, and its mathematical assessment method resembles that of other item response models (Bond, 2007).
However, some experts consider the Rasch model different from other IRT models because of its unique properties. For example, Von Davier (2007) explains that the Rasch model differs from other IRT models because those models cannot separate the effects of person ability from test scores. This separation usually has significant implications for final test scores. Von Davier (2007) adds, "this uniqueness stems from the mathematical embodiment of invariant comparison" (p. 16).
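Invariant comparison can be illustrated numerically. Under the dichotomous Rasch model, the log-odds of success equals the person's ability minus the item's difficulty, so the difference between two persons' log-odds is the same on every item. The following is a minimal sketch assuming hypothetical abilities and difficulties in logits; the function names are illustrative only.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1) for a dichotomous Rasch item: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def person_comparison(theta_a: float, theta_b: float, item_difficulty: float) -> float:
    """Difference in log-odds of success between two persons on one item."""
    return (log_odds(rasch_probability(theta_a, item_difficulty))
            - log_odds(rasch_probability(theta_b, item_difficulty)))

# Two hypothetical persons (abilities 1.2 and -0.3) compared on items
# of very different difficulty.
for b in (-2.0, 0.0, 1.5, 3.0):
    print(f"item b={b:+.1f}: log-odds difference = {person_comparison(1.2, -0.3, b):.3f}")
```

Each line reports 1.500, the difference between the two abilities, whatever the item's difficulty: the comparison of the persons does not depend on which item is used.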
Broadly, the Rasch model seeks to determine three things: item bias, the definition of the measured trait, and the measurement of persons on that trait (Bond, 2007). One basic assumption of the model is that a more able person has a higher probability of succeeding on an item than a less able person. The model also assumes that a person has a better chance of success on an easy item than on a difficult one (Bond, 2007).
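Both assumptions follow from the form of the dichotomous Rasch model, in which the probability that a person of ability theta succeeds on an item of difficulty b is exp(theta - b) / (1 + exp(theta - b)), with both quantities in logits. The sketch below tabulates this probability for a few hypothetical persons and two hypothetical items; the values are illustrative, not drawn from any real test.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of success under the dichotomous Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# Hypothetical persons and items, in logits.
abilities = [-1.0, 0.0, 1.0]
difficulties = {"easy": -1.0, "hard": 1.0}

for theta in abilities:
    row = ", ".join(f"{name}: {rasch_probability(theta, b):.2f}"
                    for name, b in difficulties.items())
    print(f"ability {theta:+.1f} -> {row}")
```

The probability rises with ability for each item and is higher on the easy item for every person; when ability equals difficulty, the probability is exactly 0.50.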
Generally, scientists who undertake psychometric evaluations use the Rasch model to obtain accurate measurements. Researchers often use it to measure abilities, attitudes, and personality traits. In addition, educators have used the model to evaluate students' reading abilities and how extreme individual attitudes are. More recently, the healthcare sector has also applied the model in medical research.
Given the growing application of the Rasch model, this paper provides a critical analysis of the model to establish its weaknesses, threats, criticisms, and competencies. The analysis has three sections. The first section presents the most positive appraisals of the model, and the second presents the negative appraisals. Finally, the third section analyzes both the positive and negative appraisals.
Positive Appraisal
Proponents of the Rasch model argue that its properties provide a highly accurate criterion for evaluating a person's ability (or trait) (von Davier, 2007). They also say that it can determine a person's location and item bias because it provides a perfect "item fit." Item fit analysis not only shows that Rasch models can identify poorly functioning items, but also allows the investigation of test score bias. This is especially valuable when linguistic, cultural, and gender issues in the test can be isolated.
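Item fit is commonly summarized with two mean-square statistics computed from the residuals between observed responses and Rasch-model probabilities: outfit (the average squared standardized residual) and infit (squared residuals weighted by the model variance). Values near 1 indicate responses consistent with the model, while values well above 1 suggest a poorly functioning item. The sketch below assumes simulated data, treats the simulated abilities and item difficulty as already estimated, and uses a hypothetical helper name `item_fit`; it compares an item generated by the Rasch model with one answered at random.

```python
import math
import random

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, abilities, b):
    """Return (infit, outfit) mean-square statistics for one item.

    responses: 0/1 scores of each person on the item
    abilities: estimated person abilities (logits), same order
    b: estimated item difficulty (logits)
    """
    sq_resid = 0.0      # sum of squared raw residuals (x - P)^2
    variance = 0.0      # sum of model variances P(1 - P)
    sq_std_resid = 0.0  # sum of squared standardized residuals
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, b)
        w = p * (1.0 - p)
        sq_resid += (x - p) ** 2
        variance += w
        sq_std_resid += (x - p) ** 2 / w
    infit = sq_resid / variance
    outfit = sq_std_resid / len(responses)
    return infit, outfit

random.seed(42)
abilities = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Item A follows the Rasch model (difficulty 0); item B is answered
# correctly with probability 0.5 regardless of ability, so it should misfit.
item_a = [1 if random.random() < rasch_probability(t, 0.0) else 0 for t in abilities]
item_b = [1 if random.random() < 0.5 else 0 for _ in abilities]

fit_a = item_fit(item_a, abilities, 0.0)
fit_b = item_fit(item_b, abilities, 0.0)
print("item A infit/outfit:", [round(v, 2) for v in fit_a])
print("item B infit/outfit:", [round(v, 2) for v in fit_b])
```

Item A's statistics land close to 1, while item B's are clearly larger, flagging it as an item whose responses do not depend on the measured trait.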
Another positive attribute of the Rasch model is its ability to provide a sample-free test. Traditional "item difficulty" values depend on the nature and size of the sample, but the Rasch model can adjust the "item fit" to allow a sample-free test (Gyll, 2012). This attribute is especially crucial in some psychometric tests (such as the office specialist program) because such tests cannot guarantee representative samples.
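The contrast between sample-dependent and sample-free difficulty can be sketched with simulated data. In classical test theory, an item's difficulty is often reported as the proportion of the sample answering correctly, which shifts with the sample's ability. Under the Rasch model, for any person the odds of getting item i right and item j wrong, versus the reverse, equal exp(b_j - b_i), so person ability cancels. The sketch below assumes two hypothetical samples and two hypothetical items, and uses this simple pairwise estimator for brevity; it is not necessarily the calibration method of Gyll (2012) or of any Rasch software.

```python
import math
import random

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate(abilities, difficulties, rng):
    """Simulate 0/1 responses of each person to each item under the Rasch model."""
    return [[1 if rng.random() < rasch_probability(t, b) else 0 for b in difficulties]
            for t in abilities]

def proportion_correct(responses, item):
    """Classical item 'difficulty': the share of the sample answering correctly."""
    return sum(row[item] for row in responses) / len(responses)

def pairwise_difference(responses, i, j):
    """Estimate b_j - b_i from persons who answered exactly one of items i and j.

    Under the Rasch model, P(i right, j wrong) / P(i wrong, j right) = exp(b_j - b_i)
    for every person, so person ability drops out of the ratio.
    """
    n_i_only = sum(1 for row in responses if row[i] == 1 and row[j] == 0)
    n_j_only = sum(1 for row in responses if row[i] == 0 and row[j] == 1)
    return math.log(n_i_only / n_j_only)

rng = random.Random(7)
difficulties = [-0.5, 0.5]  # hypothetical true difficulties (logits); b1 - b0 = 1.0
low_group = simulate([rng.gauss(-1.0, 1.0) for _ in range(20000)], difficulties, rng)
high_group = simulate([rng.gauss(1.0, 1.0) for _ in range(20000)], difficulties, rng)

for name, group in (("low-ability", low_group), ("high-ability", high_group)):
    print(f"{name}: p-values = {proportion_correct(group, 0):.2f}, "
          f"{proportion_correct(group, 1):.2f}; "
          f"estimated b1 - b0 = {pairwise_difference(group, 0, 1):.2f}")
```

The classical proportions differ sharply between the two samples, while both pairwise estimates stay close to the true difference of 1.0 logit.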
The Rasch model can also provide precise item calibrations with a small sample of about 100 examinees, particularly for items positioned within a logit (or two) of the average ability level (Wright, 1977). Finally, Bond (2007) believes that a fundamental advantage of the Rasch model is its ability to create a single, unidimensional line on which different people can locate their abilities and weaknesses along the same continuum. Together, these attributes define the positive side of the Rasch model.
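How sample size and item location affect calibration precision can be sketched with the standard large-sample approximation: the standard error of an item difficulty is roughly one over the square root of the item's information, the sum of P(1 - P) across examinees. The sketch below assumes, for simplicity, 100 hypothetical examinees all at ability 0 and treats their abilities as known; a real calibration estimates persons and items jointly, so these numbers are illustrative rather than Wright's figures.

```python
import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_standard_error(abilities, b):
    """Approximate standard error (logits) of an item difficulty estimate.

    Uses the inverse square root of the item's Fisher information,
    sum of P(1 - P) over persons, treating abilities as known.
    """
    information = sum(rasch_probability(t, b) * (1.0 - rasch_probability(t, b))
                      for t in abilities)
    return 1.0 / math.sqrt(information)

# 100 hypothetical examinees, all at average ability 0 for simplicity.
sample = [0.0] * 100
for distance in (0.0, 1.0, 2.0, 3.0):
    print(f"item {distance:.0f} logit(s) from average ability: "
          f"SE = {item_standard_error(sample, distance):.2f}")
```

An item located at the sample's average ability gets a standard error of 0.20 logits, and the error grows as the item moves further from where the examinees are, which is why items within a logit or two of the average are calibrated most precisely.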
Negative Appraisal
Some critics fault the application of the Rasch model in educational research because it cannot account for guessing on multiple-choice items. When examinees make informed guesses, even those with very low ability have a non-zero chance of answering correctly, whereas the Rasch model fixes the lower asymptote of the success probability at zero; models with more parameters, such as the three-parameter model, relax this assumption (Gyll, 2012). Despite this weakness, it is crucial to point out that the zero lower asymptote is a deliberate part of the model's specification, as is uniform item discrimination.
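The difference between a zero lower asymptote and a guessing floor is easy to see by evaluating both curves at very low ability. The sketch below compares the Rasch probability with a three-parameter logistic (3PL) curve whose lower asymptote c represents guessing. It assumes a hypothetical four-option item with difficulty 0, discrimination 1, and c = 0.25, and omits the 1.7 scaling constant that some 3PL formulations include.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def rasch_probability(theta, b):
    """Rasch model: lower asymptote fixed at zero."""
    return logistic(theta - b)

def three_pl_probability(theta, a, b, c):
    """Three-parameter logistic model: c is the lower (guessing) asymptote."""
    return c + (1.0 - c) * logistic(a * (theta - b))

# A hypothetical four-option multiple-choice item with difficulty 0,
# where blind guessing would succeed about a quarter of the time (c = 0.25).
for theta in (-6.0, -3.0, 0.0, 3.0):
    print(f"theta={theta:+.0f}: Rasch={rasch_probability(theta, 0.0):.3f}, "
          f"3PL={three_pl_probability(theta, 1.0, 0.0, 0.25):.3f}")
```

As ability falls, the Rasch probability approaches zero, while the 3PL curve levels off just above 0.25, the chance of a lucky guess.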
Critics have also faulted the Rasch model for using raw test scores. Some contrast raw scores with standard scores, arguing that raw scores cannot be referenced to nationally representative samples (Gyll, 2012), and they regard the Rasch model as weak in this respect. Similarly, it is difficult to place test scores derived from the Rasch model on a readily understandable scale and to compare the performance of two groups of participants with different attributes (Bond, 2007).
Some commentators also criticize the Rasch model for assuming the same discrimination across all items. Researchers such as Bollinger and Hornke (1978) advance this criticism, arguing that the model is impractical to use when an item's discrimination is zero. In such a case, the Rasch model would treat a person's ability as equal to the item's difficulty (to fit the model's design). The critics regard this analysis as biased because answering such an item resembles tossing a biased coin to determine the outcome of a test.
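The role of discrimination can be seen in the two-parameter logistic function, which multiplies the difference between ability and difficulty by an item discrimination a. The Rasch model fixes a = 1 for every item; setting a = 0, the case the critics raise, makes the success probability identical for every person, so the item no longer distinguishes between abilities. The sketch below uses a hypothetical item difficulty of 0.5 and three hypothetical abilities.

```python
import math

def two_pl_probability(theta, a, b):
    """Logistic item response function with discrimination a and difficulty b.

    With a = 1 for every item this reduces to the Rasch model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

abilities = [-2.0, 0.0, 2.0]
for a in (1.0, 0.0):
    probs = [two_pl_probability(t, a, 0.5) for t in abilities]
    print(f"a={a}: " + ", ".join(f"{p:.2f}" for p in probs))
```

With a = 1 the probability increases with ability; with a = 0 every person has the same 0.50 chance, so the response carries no information about ability.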
Finally, critics fault the Rasch model as an impractical measurement model because it does not start from the actual data (Wright, 1977). Instead, the model tries to fit data to the model (a data-prescriptive approach), and since not all data fit, it is impractical to adopt the model in every situation. Wright (1977) explains that the Rasch model can only use actual data when samples are large, but researchers who have obtained accurate results with small samples (such as 100 people) have refuted this opinion.
Analysis
Most of the criticism leveled against the Rasch model centers on its mathematical foundation and its provision of a general solution for measurement. However, these criticisms have weaknesses of their own. For example, many critics say that the model uses raw scores for measurement, but they fail to recognize that, under the model, the raw score is sufficient for measuring a person's ability: given the item difficulties, it carries all the information the responses provide about that ability. Similarly, critics who question the mathematical foundation of the model should note that the Rasch model is comparable to the Pythagorean theorem because it adopts a similar arithmetic derivation. Indeed, the Rasch model derives linear measures from ordered qualities, much as the Pythagorean theorem does (Gyll, 2012, p. 3). In light of this, it is difficult to disprove the mathematical basis of the Rasch model.
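Sufficiency of the raw score can be demonstrated directly. When item difficulties are known, the maximum-likelihood ability estimate under the Rasch model solves a single equation: the sum of the success probabilities must equal the raw score. Two persons with the same raw score therefore receive the same estimate, whichever items they answered correctly. The sketch below assumes five hypothetical calibrated items and uses Newton-Raphson iteration; the function name is illustrative only.

```python
import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ability_from_raw_score(raw_score, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability estimate (logits) for a raw score.

    Solves sum_i P_i(theta) = raw_score by Newton-Raphson. Only the raw score
    and the item difficulties enter, not the pattern of right and wrong answers.
    Zero and perfect scores have no finite estimate.
    """
    n = len(difficulties)
    if not 0 < raw_score < n:
        raise ValueError("raw score must lie strictly between 0 and the number of items")
    theta = math.log(raw_score / (n - raw_score))  # starting value
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        expected = sum(probs)                            # expected raw score
        information = sum(p * (1.0 - p) for p in probs)  # test information
        step = (raw_score - expected) / information
        theta += step
        if abs(step) < tol:
            break
    return theta

items = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical calibrated difficulties
pattern_a = [1, 1, 1, 0, 0]          # answers the three easiest items correctly
pattern_b = [0, 0, 1, 1, 1]          # answers the three hardest items correctly
for pattern in (pattern_a, pattern_b):
    print(pattern, "->", round(ability_from_raw_score(sum(pattern), items), 3))
```

Both response patterns yield the same ability estimate, because the estimate depends on the responses only through the raw score of 3.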
Critics who say that the Rasch model provides an inaccurate and impractical solution to psychometric measurement also fail to recognize that the model produces results similar to those of more complex IRT models. In fact, it may also produce results similar to those of simpler models, such as classical test theory (Gyll, 2012, p. 3). In addition, the Rasch model's failure to account for guessing may be an advantage for certain tests (such as office specialist exams) that do not involve guessing; in such circumstances, too many variations would complicate the documentation of possible responses. Lastly, Licarce (1996) counters the criticism that Rasch models impose equal discrimination across items by demonstrating that unequal discrimination produces item malfunction. The Rasch model, therefore, prevents such malfunctions by requiring equal discrimination across items (Gyll, 2012).
Conclusion
Weighing the pros and cons of the Rasch model shows that the advantages of adopting it in test development outweigh its criticisms. Its advantages include support for both traditional and modern approaches to testing reliability and item difficulty. Above all, its main advantage is its ability to provide object-free calibration of instruments, which allows results to be generalized to a larger population beyond the calibration sample (Gyll, 2012).
Overall, most of the criticism leveled against the Rasch model contains significant faults and misconceptions. Like other scientific models, the Rasch model is not perfect; however, for the sake of future exam integrity and security, it is vital to appreciate its ability to equate items onto a common scale and to replace poorly functioning scales. These capabilities underlie the applicability of the Rasch model.
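Equating onto a common scale can be sketched with common-item linking. Because a Rasch scale is fixed only up to its origin, difficulties from two separate calibrations that share anchor items can be aligned by a single shift: the mean difference between the anchor items' two sets of estimates. The sketch below assumes hypothetical item IDs and difficulties and uses this simple mean-shift method; operational equating would also check anchor items for drift before using them.

```python
from statistics import mean

def linking_constant(form_x, form_y):
    """Shift that places form Y difficulties on form X's scale.

    Both arguments map item IDs to Rasch difficulty estimates from separate
    calibrations; only items present on both forms (the anchors) are used.
    """
    common = form_x.keys() & form_y.keys()
    return mean(form_x[item] - form_y[item] for item in common)

def equate(form_y, shift):
    """Express every form Y item difficulty on the common (form X) scale."""
    return {item: b + shift for item, b in form_y.items()}

# Hypothetical calibrations; items "A1" to "A3" appear on both forms.
form_x = {"A1": -1.0, "A2": 0.2, "A3": 1.1, "X1": 0.4}
form_y = {"A1": -1.5, "A2": -0.3, "A3": 0.6, "Y1": 0.9, "Y2": -0.2}

shift = linking_constant(form_x, form_y)
on_common_scale = equate(form_y, shift)
print(round(shift, 2))  # 0.5
print({item: round(b, 2) for item, b in on_common_scale.items()})
```

Once form Y is shifted by half a logit, its new items (Y1, Y2) can be compared directly with form X's items, and a poorly functioning item on either form can be replaced by another calibrated item on the same scale.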
References
Bollinger, G., & Hornke, L. F. (1978). The relationship between item discrimination and Rasch scalability. Archiv für Psychologie, 130, 89-96.
Bond, T. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences. London: Routledge.
Gyll, S. (2012). Advantages of the Rasch Measurement Model In Test Development and Analysis. Web.
Licarce, J. M. (1996). The Rasch model cannot be “disproved”! Rasch Measurement Transactions, 10(3), 512-514.
Von Davier, M. (2007). Multivariate and Mixture Distribution Rasch Models: Extensions and Applications. New York: Springer.
Wright, B. (1977). Misunderstanding the Rasch Model. Journal of Educational Measurement, 14(3), 219-225.