Testing is an interesting process. One has to make sure that all the proverbial ducks are in a row. There needs to be variability and reliability, and this is particularly true for test scores. Things like test length and item difficulty have an effect as well. Understanding all of this will invariably shape the way one designs a test. Most importantly, one needs to understand these individual elements to produce a better test.
Variability and reliability are important aspects of education in many ways, e.g., research. They are particularly important for test scoring. What are variability and reliability? Variability refers to how much something is likely to change, while reliability refers to repeatability. Group variability depends on how spread out the data are. Kubiszyn and Borich (2016) discuss this concept, for “group variability affects the size of the reliability coefficient; higher coefficients result from heterogeneous groups than from homogenous groups” (p. 310). Reliability depends on constancy. Good score reliability means stable scores over repeated administrations, as long as the measured trait has not changed (Kubiszyn & Borich, 2016, p. 312). One needs to make sure the test one uses is secure.
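To see how group variability can change a reliability coefficient, here is a minimal sketch that treats the test-retest correlation as the reliability estimate. The scores are hypothetical numbers made up for illustration, and `pearson_r` is a helper written for this example, not something taken from Kubiszyn and Borich (2016).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same ten students on two administrations.
first = [40, 48, 55, 61, 68, 74, 79, 85, 91, 97]
second = [44, 45, 58, 57, 71, 70, 83, 82, 94, 93]

# Heterogeneous group: all ten students, scores spread from 40 to 97.
r_heterogeneous = pearson_r(first, second)

# Homogeneous group: only the five highest scorers, a much narrower range.
r_homogeneous = pearson_r(first[5:], second[5:])

print(round(r_heterogeneous, 2), round(r_homogeneous, 2))
```

With these made-up numbers the coefficient is roughly 0.98 for the full group and roughly 0.91 for the top five alone, even though the retest differences are about the same size in both groups.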
Scoring is not only affected by group variability and reliability; intriguingly, test length and item difficulty are elements that will also affect scoring. Kubiszyn and Borich (2016) explain that “test length affects score reliability; as test length increases, the test’s score reliability tends to go up” (p. 313). This is because the more items there are on a test, the less any single item or lucky guess sways the total, so the scores become more consistent. This could mean higher or lower scores. Also, “when more items are added to a test, the test is better able to sample the student’s knowledge of the attribute being measured” (Kubiszyn & Borich, 2016, p. 311). More tends to be better; however, that doesn’t mean one should just throw in meaningless questions. Otherwise, it would undermine the scores’ validity and reliability. Item difficulty is another contributing factor. “As tests become very easy or hard, distributions are homogeneous, significant shifting of ranks and lowering of the correlation coefficient will occur”; when tests are made too difficult, guessing is encouraged, which again lowers the reliability (Kubiszyn & Borich, 2016, p. 312).
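One standard way to put a number on the test-length effect is the Spearman-Brown prophecy formula. Bringing it in here is an assumption on my part, since the passages quoted above do not name it, and the starting reliability of 0.60 is a hypothetical value. The formula also assumes the added items are comparable in quality to the existing ones, which is why padding a test with meaningless questions would not deliver the predicted gain.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length.

    reliability   -- current reliability coefficient (between 0 and 1)
    length_factor -- new length divided by old length (2 means doubled)
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


current = 0.60  # hypothetical reliability of the original test

doubled = spearman_brown(current, 2)    # twice as many comparable items
halved = spearman_brown(current, 0.5)   # half as many items

print(round(doubled, 2), round(halved, 2))
```

Doubling the hypothetical test raises the predicted reliability from 0.60 to 0.75, while cutting it in half lowers the prediction to about 0.43.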
Kubiszyn and Borich (2016) relay several principles for interpreting reliability coefficients:
a) group variability affects test score reliability. As group variability increases, score reliability goes up; b) scoring reliability limits test score reliability. As scoring reliability goes down, so does the test’s reliability; c) test length affects test score reliability. As test length increases, the test’s score reliability tends to go up; d) item difficulty affects test score reliability. As items become very easy or very hard, the test’s score reliability goes down. (pp. 312-313).
These principles will help test producers mitigate the concerns of scoring variability. One should be able to make the necessary changes that will help improve testing. Nonetheless, no test will be perfect. This is something that everyone must accept. There is no perfection, only striving for perfection.
Kubiszyn, T., & Borich, G. D. (2016). Educational testing & measurement: Classroom application and practice (11th ed.). Hoboken, NJ: Wiley.
“A test’s reliability and validity will vary depending on the purpose of the test and the population with which it is used” (Kubiszyn & Borich, 2016, p. 315). A quality assessment not only takes time and effort but also planning. Group variability, scoring reliability, test length, and item difficulty all affect assessment reliability. Creating assessments that match instructional objectives is essential. That’s the goal, isn’t it? Creating assessments that reflect what has been taught.
Each semester or year we will have a new class of students who will not always be alike, on the same level, or in a class of the same size, and this directly influences the validity coefficient. Similarly, as the number of test items increases, so does the variability in scores. It’s like comparing a 10-question test with a 100-question test. This group variability is important to keep in mind, as all of these factors influence one another.
Reliability means consistently getting the same results after repeatedly administering the same assessment. This is important since these results can assist teachers in the placement of students. A new student from another country will most likely not score the same as a native of this country. That doesn’t mean the test is bad or unreliable; instead, it reflects how score reliability depends on the population being tested. A few methods for estimating reliability are test-retest, alternate forms, and internal consistency.
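As a rough sketch of the internal consistency method, the Kuder-Richardson formula 20 (KR-20) estimates reliability from a single administration of right/wrong items. Choosing KR-20 is my assumption, since I only named internal consistency in general, and the response data below are made up for illustration.

```python
def kr20(responses):
    """KR-20 estimate for items scored 1 (correct) or 0 (incorrect).

    responses -- one list per student, one 0/1 entry per item
    """
    num_students = len(responses)
    num_items = len(responses[0])

    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(num_items):
        p = sum(row[j] for row in responses) / num_students
        sum_pq += p * (1 - p)

    # Variance of total scores (population form; some texts divide by n - 1).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / num_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / num_students

    return (num_items / (num_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical results: five students, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

estimate = kr20(responses)
print(round(estimate, 2))
```

For this small made-up class the estimate works out to 0.80. The function would divide by zero if every student earned the same total, so it only makes sense when scores actually vary.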
How long should a test be? Creating clear, organized, and neatly arranged tests is critical. Grouping like items together (T/F or multiple choice), positioning graphics near the questions they belong to, avoiding overcrowding, and arranging items from easy to hard is a good way to start creating a well-constructed assessment. As the length increases, so does the reliability. Missing one item on a 10-question quiz versus a 50-question quiz makes a big difference. The aim is to create assessments with enough length, neither too much nor too little, and to ensure the material was understood and tested accordingly.
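The difference one missed item makes can be shown with simple arithmetic. This is only a quick sketch; `percent_score` is a helper name I made up for the example.

```python
def percent_score(num_items, num_missed):
    """Percentage score for a test where every item carries equal weight."""
    return 100 * (num_items - num_missed) / num_items


short_quiz = percent_score(10, 1)  # one miss on a 10-question quiz
long_quiz = percent_score(50, 1)   # one miss on a 50-question quiz

print(short_quiz, long_quiz)  # 90.0 98.0
```

A single slip costs ten percentage points on the short quiz but only two on the longer one, which is part of why longer tests give steadier scores.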
Have you taken a test that you felt great or not so great about? Yes, I think we have all experienced both. Test planning and construction take time and effort. As long as we are matching our instructional objectives to the test, we should have no problem. The test is something we will get feedback from, and we can use the scores to revise and refine it for the next time. Item difficulty is found by dividing the number of students who answered an item correctly by the total number of students. We want items to be difficult, but not to the extent that everyone gets them wrong.
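The item difficulty calculation described above can be sketched in a few lines. The class size and counts are hypothetical, and `item_difficulty` is just a name chosen for this example.

```python
def item_difficulty(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    return num_correct / num_students


# Hypothetical class of 24 students.
moderate_item = item_difficulty(18, 24)   # 18 of 24 answered correctly
impossible_item = item_difficulty(0, 24)  # nobody answered correctly

print(moderate_item, impossible_item)  # 0.75 0.0
```

Note that a higher value means an easier item. An item at 0.0, which everyone missed, tells us nothing about differences among students, and the same is true of an item at 1.0 that everyone got right.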
Overall, there are several variables that go into developing and planning assessments. It is critical to be aware of these variables to create a well-thought-out, planned, and structured assessment. Over time I will refine areas so that my students are excited, motivated, and excelling. “The body is a unit, though it is made up of many parts; and though all its parts are many, they form one body” (1 Corinthians 12:12, New International Version).
Kubiszyn, T., & Borich, G. D. (2016). Educational testing & measurement: Classroom application and practice (11th ed.). Hoboken, NJ: Wiley.