In your Psychological Testing and Assessment text, you read about three sources of error variance that occur in testing and assessment: test construction, test administration, and test scoring and interpretation. Other sources of error variance may also be at work. You were also introduced to reliability coefficients, which provide information about these sources of error variance on a test (see Table 5-4).
The following reliability coefficients were obtained from studies of a new test, THING, purporting to measure a new construct (that is, Something). Alternate forms of the test were also developed and examined in subsequent studies published in peer-reviewed journals. The alternate forms were titled THING 1 and THING 2. (Remember to refer back to your Psychological Testing and Assessment text for information about using and interpreting a coefficient of reliability.)
· Internal consistency reliability coefficient = .92
· Alternate forms reliability coefficient = .82
· Test-retest reliability coefficient = .50
In your post:
· Describe what these scores mean.
· Interpret each result individually in terms of the information it provides about sources of error variance.
· Synthesize all of these interpretations into a final evaluation about this test’s utility or usefulness.
· Explain whether these data are acceptable.
· Explain under what conditions they may not be acceptable and under what conditions, if any, they may be appropriate.
| Type of Reliability | Purpose | Typical Uses | Number of Testing Sessions | Sources of Error Variance | Statistical Procedures |
|---|---|---|---|---|---|
| Test-retest | To evaluate the stability of a measure | When assessing the stability of various personality traits | 2 | Administration | Pearson r or Spearman rho |
| Alternate-forms | To evaluate the relationship between different forms of a measure | When there is a need for different forms of a test (e.g., makeup tests) | 1 or 2 | Test construction or administration | Pearson r or Spearman rho |
| Internal consistency | To evaluate the extent to which items on a scale relate to one another | When evaluating the homogeneity of a measure (i.e., that all items tap a single construct) | 1 | Test construction | Pearson r between equivalent test halves with Spearman-Brown correction, Kuder-Richardson for dichotomous items, coefficient alpha for multipoint items, or APD |
| Inter-scorer | To evaluate the level of agreement between raters on a measure | Interviews or coding of behavior; used when researchers need to show that there is consensus in the way different raters view a particular behavior pattern (and hence no observer bias) | | | |
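As a concrete illustration of two of the statistical procedures named in the table, here is a minimal Python sketch, using entirely hypothetical item scores, of a split-half Pearson r corrected with the Spearman-Brown formula and of coefficient (Cronbach's) alpha. The data and variable names are invented for the example; they are not from the THING studies.

```python
# Hypothetical data: two internal-consistency procedures from Table 5-4,
# split-half r with Spearman-Brown correction, and coefficient alpha.
import statistics

# Invented item scores: 6 examinees x 4 multipoint items.
scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
]

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Split-half: odd-numbered items vs. even-numbered items, then the
# Spearman-Brown correction r_sb = 2r / (1 + r) to estimate the
# reliability of the full-length test.
half1 = [row[0] + row[2] for row in scores]  # items 1 and 3
half2 = [row[1] + row[3] for row in scores]  # items 2 and 4
r_half = pearson_r(half1, half2)
r_sb = 2 * r_half / (1 + r_half)

# Coefficient alpha: (k / (k - 1)) * (1 - sum of item variances / total variance).
k = len(scores[0])
item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
totals = [sum(row) for row in scores]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / statistics.pvariance(totals))

print(f"split-half r = {r_half:.2f}, Spearman-Brown = {r_sb:.2f}, alpha = {alpha:.2f}")
```

Note that with a positive half-test correlation, the Spearman-Brown estimate is always higher than the uncorrected split-half r, since a longer test (all else equal) is more reliable.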