A Computer Program for Classical Item Analysis

Seock-Ho Kim
The University of Georgia

June 18, 1999

CIA files are now available from: https://shkim.myweb.uga.edu
Send all correspondence to Seock-Ho Kim, The University of Georgia,
325 Aderhold Hall, Athens, GA 30602-7143 (shkim@uga.edu).

Abstract

This paper describes a computer program for classical item analysis of tests that consist of multiple-choice or true-false items. In addition to item statistics for each item response, the program provides summary statistics of the total score, coefficient alpha, and test scoring results. A cross-classification of quintile group by item response can optionally be obtained.

Introduction

Item analysis is the process of evaluating the effectiveness of the items in a test by examining the examinees' responses to each item. Depending on the specific purpose of testing, the effectiveness of items can be demonstrated in many different ways. In the context of classical test theory (e.g., Gulliksen, 1950/1987), item analysis is generally performed for tests with multiple-choice items. In this case the item analysis procedure provides such useful information as the difficulty of each item, the discriminating power of the item, and other properties of the choices or distractors (Henrysson, 1971).
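
To make the first of these statistics concrete, the classical difficulty of a dichotomously scored item is simply the proportion of examinees who answer the item correctly. The following minimal Python sketch illustrates the computation; the ten responses are hypothetical, and the sketch is an illustration rather than an excerpt of the program (which is written in FORTRAN).

    # Classical item difficulty: the proportion of examinees who answer
    # the item correctly.  The ten 0/1 scores below are hypothetical.
    responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    p = sum(responses) / len(responses)
    print(f"item difficulty p = {p:.2f}")  # 0.70; a larger p means an easier item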

Over the past several decades the fields of educational measurement and psychometrics have witnessed the continual development, refinement, and application of various models for item responses, as well as transitions in test theory and test construction practice. Consequently, a number of different item analysis techniques have been suggested. For example, item analysis can be as visually engaging as in Wainer (1989), as mathematically and statistically complex as in Thissen, Steinberg, and Fitzpatrick (1989), and as multiculturally challenging as in Holland and Wainer (1993). In addition, item analysis in a computerized adaptive testing situation seems to require extremely detailed articulation (see Wainer et al., 1990).

Although current computer technology and software development have made all of the above-mentioned item analysis techniques quite feasible on personal computers, the item analysis and reliability estimation methods presented in elementary educational and psychological measurement texts are still largely based on crude approximation methods developed long ago, when even hand-held calculators were unavailable. In particular, the indices of item discriminating power presented in most introductory texts are based on partial information obtained from a subset of the examinees (e.g., Kelley, 1939). This practice, which rests solely on computational simplicity, has little standing in the current information age.
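
For readers who have not met these older indices, the following Python sketch shows the kind of upper-lower group index alluded to above: examinees are ranked by total score, the top and bottom 27% are retained (the fraction suggested by Kelley, 1939), and the index D is the difference between the two groups in the proportion answering the item correctly. The function and its inputs are illustrative, not part of the CIA program.

    # Upper-lower group discrimination index D (cf. Kelley, 1939).
    # item_scores: 0/1 scores on one item; total_scores: total test scores.
    def upper_lower_d(item_scores, total_scores, fraction=0.27):
        n = len(total_scores)
        k = max(1, round(fraction * n))                    # group size
        order = sorted(range(n), key=lambda i: total_scores[i])
        lower, upper = order[:k], order[-k:]               # bottom and top 27%
        p_upper = sum(item_scores[i] for i in upper) / k
        p_lower = sum(item_scores[i] for i in lower) / k
        return p_upper - p_lower   # D lies in [-1, 1]; larger is better

Note that D discards the middle 46% of the examinees entirely, which is precisely the loss of information criticized above.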

The computer program described in this paper is designed to provide students in measurement courses and measurement practitioners with a sound and user-friendly means of analyzing tests with multiple-choice or true-false items. It should be noted, however, that the program has not been developed as an alternative to the highly efficient and versatile computer programs available commercially and used in operational testing facilities (e.g., Coffman, 1971; Cohen, 1989). Item analysis is, moreover, merely one part of the whole test development process; emphasis should be placed on scholarship, ingenuity, and painstaking effort on the part of item writers rather than on the mechanical use of item analysis (Davis, 1951).

Item and Test Analyses

The computer program provides classical item statistics, including item difficulty and item discrimination in the form of item-total score correlations. Both point biserial and biserial correlations are available as item discrimination indices. For each choice or alternative, the proportion of examinees who select the choice and the point biserial and biserial correlations are obtained. Blank and other such responses are categorized as omitted, unreached, or invalid, and the same statistics are obtained for each of these categories.
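
As a point of reference, the following Python sketch computes the two discrimination indices for a single item using the standard classical test theory formulas; it is an illustration under those formulas, not an excerpt of the FORTRAN source, and it assumes the item was answered correctly by some, but not all, of the examinees.

    from math import sqrt, exp, pi
    from statistics import mean, pstdev, NormalDist

    def point_biserial(item, total):
        """Point biserial correlation of a 0/1 item with the total score."""
        p = mean(item)                                  # proportion correct
        m_p = mean([t for t, u in zip(total, item) if u == 1])
        m_x, s_x = mean(total), pstdev(total)
        return (m_p - m_x) / s_x * sqrt(p / (1.0 - p))

    def biserial(item, total):
        """Biserial correlation, assuming a normal latent trait behind the item."""
        p = mean(item)
        z = NormalDist().inv_cdf(p)                     # split point on N(0, 1)
        y = exp(-z * z / 2.0) / sqrt(2.0 * pi)          # normal ordinate at the split
        return point_biserial(item, total) * sqrt(p * (1.0 - p)) / y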

The computer program also provides summary statistics of the total score: the number of examinees, the number of items, and the mean, variance, standard deviation, minimum, and maximum of the total score. For reliability estimation, coefficient alpha (i.e., Kuder-Richardson formula 20 for dichotomous items) and the standard error of measurement are obtained. In addition, the mean item difficulty and mean item discrimination indices are obtained. As test scoring results, the test score, the number of omitted items, the number of unreached items, and the number of invalid responses are available for each examinee.
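
The reliability computations can be sketched as follows in Python; x is a hypothetical examinee-by-item matrix of 0/1 scores, and the functions illustrate the KR-20 form of coefficient alpha and the usual standard error of measurement rather than the program's own code.

    from statistics import mean, pstdev

    def kr20(x):
        """Coefficient alpha (KR-20) for a 0/1 examinee-by-item matrix."""
        k = len(x[0])                            # number of items
        totals = [sum(row) for row in x]         # total score per examinee
        var_total = pstdev(totals) ** 2
        pq = 0.0
        for j in range(k):
            p = mean(row[j] for row in x)        # difficulty of item j
            pq += p * (1.0 - p)
        return k / (k - 1) * (1.0 - pq / var_total)

    def sem(x):
        """Standard error of measurement: s_x * sqrt(1 - alpha)."""
        totals = [sum(row) for row in x]
        return pstdev(totals) * (1.0 - kr20(x)) ** 0.5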

Two options are available that may increase the utility of the program. First, it is possible to obtain non-spurious item-total score correlations: the point biserial and biserial correlations can be calculated using the item-excluded total score, so that an item's own score does not inflate its correlation with the total. Second, a cross-classification of quintile group by item response can be obtained for each item. Quintiles are formed based on the total score, dividing the examinees into five equal groups. The range of the total score for each quintile group and the average score of the examinees who select each item response can also be obtained.
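
Both options are simple to state in code. The Python sketch below shows the item-excluded (corrected) correlation and one way of assigning quintile groups; as before, the function names and data layout are illustrative assumptions, not the program's internals.

    from statistics import mean, pstdev

    def corrected_item_total(item, totals):
        """Point biserial of a 0/1 item with the item-excluded total score."""
        rest = [t - u for t, u in zip(totals, item)]   # remove the item's own score
        m_i, m_r = mean(item), mean(rest)
        cov = mean((u - m_i) * (r - m_r) for u, r in zip(item, rest))
        return cov / (pstdev(item) * pstdev(rest))     # Pearson r = point biserial here

    def quintile_labels(totals):
        """Assign each examinee to a quintile group, 1 (lowest) to 5 (highest)."""
        order = sorted(range(len(totals)), key=lambda i: totals[i])
        labels = [0] * len(totals)
        for rank, i in enumerate(order):
            labels[i] = rank * 5 // len(totals) + 1    # ties split by sort order
        return labels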

Program Capabilities

The program is written in FORTRAN and runs on IBM-PC or compatible computers under DOS or the DOS mode of Windows. It has a maximum capacity of 200 dichotomous items. No item may have more than nine possible choices, but successive items need not have the same number of choices. The maximum number of examinees is 10,000. A subset of the items can be analyzed by means of a FORTRAN input format specification.
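
For example, a format specification of the following kind (the column layout here is hypothetical) would skip an identification field in columns 1-10 and read forty single-character item responses:

    (10X, 40A1)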

The program runs interactively, with the user supplying the necessary specifications at a series of prompts. The responses can also be written in a batch file and supplied to the program using the redirection feature of DOS.
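
For instance, if the answers to the prompts are stored one per line in a text file (the file names below are hypothetical), the program can be run without user interaction:

    C:\> CIA < SPECS.TXT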

Availability

The manual in ASCII and in LaTeX, the FORTRAN source code, the object code, the executable program, and the example data files are available at no charge. The CIA-related files can be downloaded from https://shkim.myweb.uga.edu. Questions and comments can be sent to Seock-Ho Kim, Department of Educational Psychology and Instructional Technology, The University of Georgia, 325 Aderhold Hall, Athens, GA 30602-7143, U.S.A. The manual and the program files can also be obtained through the Internet by sending a request email to shkim@uga.edu.

References

Coffman, W. E. (1971). The achievement tests. In W. H. Angoff (Ed.), The College Board Admissions Testing Program: A technical report on research and development activities relating to the Scholastic Aptitude Test and Achievement Tests (pp. 49-77). New York: College Entrance Examination Board.

Cohen, A. S. (1989). Catalog of services. Madison: University of Wisconsin, Testing and Evaluation Services.

Davis, F. B. (1951). Item selection techniques. In E. F. Lindquist (Ed.), Educational measurement (pp. 266-328). Washington, DC: American Council on Education.

Gulliksen, H. (1987). Theory of mental tests. Hillsdale, NJ: Erlbaum. (Reprinted from Theory of mental tests, by H. Gulliksen, 1950, New York: Wiley)

Henrysson, S. (1971). Gathering, analyzing, and using data on test items. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 130-159). Washington, DC: American Council on Education.

Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.

Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30, 17-24.

Thissen, D., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple-choice models: The distractors are also part of the item. Journal of Educational Measurement, 26, 161-176.

Wainer, H. (1989). The future of item analysis. Journal of Educational Measurement, 26, 191-208.

Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., Mislevy, R. J., Steinberg, L., & Thissen, D. (1990). Computerized adaptive testing: A primer. Hillsdale, NJ: Erlbaum.
