
Cole et al.

Center for Epidemiologic Studies-Depression Scale (CES-D)

Name/ Reference

Cole SR, Kawachi I, Maller SJ, Berkman LF. Test of item-response bias in the CES-D scale: experience from the New Haven EPESE Study. Journal of Clinical Epidemiology. (2000) 53: 285–289.

Source contact info

Stephen R. Cole. Tel.: (617) 278-0872; fax: (617) 731-3843.

E-mail address: scole@rics.bwh.harvard.edu

Availability (private or public)

public

Conceptual framework

Authors stated that the effect of age, gender and race on the measurement properties of the CES-D remains uncertain. They present results of item-level analyses of invariance to the exogenous variables age, gender, and race on all CES-D items among subjects from the New Haven component of the Established Populations for Epidemiologic Studies of the Elderly (EPESE).

Purpose of measure & application (clinical, research, survey, screening)

quantifies depressive symptoms

 

Sample characteristics

The analyses included 2340 of the 2812 subjects in the New Haven EPESE study: those who responded to all 20 CES-D items and for whom information was available on all exogenous variables. The New Haven EPESE is one of four NIA-funded studies that, in 1982, randomly sampled community-dwelling men and women 65 years of age or older to identify predictors of morbidity, mortality, disability, and placement in long-term care facilities.

The 2340 subjects were 58% female, 20% black, and 32% educated at or beyond 12th grade. The age distribution was 36% <70 years, 28% 70–74 years, 17% 75–79 years, 11% 80–84 years, and 8% >84 years.

The 2340 subjects were similar to the 472 removed from analysis with respect to gender, but were more likely to be black (20% vs. 13%, P<0.01) and were less likely to be older (P for trend <0.01).

Recruitment methods

The 2340 subjects analyzed were drawn from the New Haven EPESE study (N=2812), one of four NIA-funded studies that randomly sampled community-dwelling men and women 65 years of age or older in 1982.

Data collection method

Not provided

Response rate

2340 of the 2812 subjects (83%) had no missing data and were included.

Format & design (readability, # of items, time to complete, response categories)

The CES-D was scored as a summed scale, with each of the 20 items scored from 0 (rarely/none of the time) to 3 (most/all of the time) points; the theoretical range is 0–60. Items 4, 8, 12, and 16 were reverse-scored. Because the CES-D summary score was skewed, it was log-transformed.
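The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the offset added before taking the logarithm is an assumption, since the summary does not say how scores of zero were handled.

```python
import math

# 1-based numbers of the reverse-scored items, as listed in the summary.
REVERSED_ITEMS = (4, 8, 12, 16)


def score_cesd(responses):
    """Return the summed CES-D score (0-60) for 20 responses coded 0-3."""
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item_number}: response must be 0, 1, 2 or 3")
        # Reverse-scored items map 0<->3 and 1<->2.
        total += (3 - value) if item_number in REVERSED_ITEMS else value
    return total


def log_score(total, offset=1):
    """Log-transform a summed score (offset is an assumption to allow 0)."""
    return math.log(total + offset)
```

Only complete response vectors are accepted, which mirrors the complete-case approach reported for this study.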

Type of measurement (nominal, ordinal, interval, ratio)

ordinal

Scoring (range, direction, rules, missing data)

See above. No treatment for missing data (only respondents with no missing data in the 20 CES-D items and in all exogenous variables were used in the analyses).

Availability of translations & source

Not provided

Psychometric Properties:

Scale construction

Not provided

Basic summary statistics

The average CES-D scale score was 8 (interquartile range: 2, 12); the average log-transformed CES-D scale score was 1.7 (interquartile range: 1.1, 2.6).

Variability

All item responses were skewed towards the Rarely or Never category.

Test-retest reliability

Not provided

Interrater reliability

Not provided

Internal consistency

The internal consistency reliability estimate for the 20-item CES-D in this sample, as measured by Cronbach’s alpha, was 0.86.

The reduced 17-item CES-D retained a high internal consistency reliability of 0.85.
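For readers unfamiliar with the statistic, Cronbach's alpha can be computed from item-level data as sketched below. The function name and the toy data in the usage note are illustrative only and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(n_items)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

As a sanity check, items that are perfectly correlated, such as `[[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]`, give an alpha of 1 up to floating-point rounding.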

Content validity

Not provided

Construct validity

Not provided

Concurrent validity

Not provided

Predictive validity

Not provided

Sensitivity to change

Not provided

Differential Item Functioning (DIF):

Variable studied (e.g., groups)

Exogenous variables were age (collected in discrete categories and analyzed as a binary variable: less than 75 vs. 75 or older), gender, and race (collected and analyzed as black or white).

Sample size

N=2340

DIF method used (e.g., MH, IRT, Logistic regression, MIMIC, other factor analysis)

A proportional odds regression model, an extension of the Mantel-Haenszel method, was used for the polytomous (ordinal) items. The Bonferroni correction was used; the adjusted p-value threshold was <0.0008.
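As orientation, a generic proportional odds model for DIF screening can be written as below. The notation is an assumption introduced here and is not taken from the paper: Y_j is the response to item j (scored 0 to 3), S is the conditioning CES-D score, and G is the binary exogenous variable.

```latex
\operatorname{logit} P(Y_j \ge k \mid S, G)
  = \alpha_{jk} + \beta_{1j} S + \beta_{2j} G + \beta_{3j} (S \times G),
  \qquad k = 1, 2, 3
```

In this form, exp(β_2j) is the proportional odds ratio comparing groups matched on S (uniform DIF), and a non-zero interaction coefficient β_3j indicates that the group difference varies with the level of S (non-uniform DIF).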

 

Test of model assumptions

They tested the proportional odds assumption by examining whether the p-value for the proportional odds score test was <0.10. Spearman rank correlations between each item and the exogenous variable, conditioning on CES-D score, were also examined (see pg 286).

The authors note that the items on the CES-D are presumed to tap a single construct, namely depression. It is common practice to report the overall CES-D scale score, which implicitly accepts a unidimensional scale. They cite confirmatory factor analysis results from Hertzog et al. (see Key references below) that support the use of a total CES-D score.

“However, four subscales (depressed mood, low positive mood, somatic complaints, and interpersonal problems) have been reproduced on various samples to various degrees.”

Purification

The authors note that simulations have found a purified scale to be necessary for accurate identification of DIF, but they question whether it is needed in observed data. Nevertheless, they examined the results using a purified 17-item scale and noted that the item difficulty parameters were not altered.

Evidence of uniform DIF

According to the authors, 17 of the 20 CES-D items were found to be relatively free of item bias by age group, gender, and racial group. Three occurrences of practically meaningful item-level bias (proportional odds ratio >2.0 and Spearman rank correlation >0.10) were observed.

“The proportional odds of blacks responding higher on the item ‘people are unfriendly’ were 2.29 times (95% confidence interval: 1.74, 3.02) that of whites matched on overall depressive symptoms. The mean score for the ‘people are unfriendly’ item, adjusted for overall depressive symptoms, was 0.37 (standard error 0.02) for blacks and 0.19 (standard error 0.01) for whites. The proportional odds of blacks responding higher to the item ‘people dislike me’ were 2.96 times (95% confidence interval: 2.15, 4.07) that of whites matched on overall depressive symptoms. Neither of these differences in item difficulty varied by level of depressive symptoms (P for interaction=0.53 for both tests). The mean score for the ‘people dislike me’ item, adjusted for overall depressive symptoms, was 0.25 (standard error 0.02) for blacks and 0.11 (standard error 0.01) for whites. As these two items combine to comprise the interpersonal problems factor of the CES-D, this item-level bias in favor of blacks matched on depressive symptoms reporting more interpersonal problems carries over as a positive factor level bias, whereby the proportional odds of blacks responding higher on the interpersonal problems subscale were 2.72 times (95% confidence interval: 2.11, 3.51) that of whites matched on overall depressive symptoms. Finally, the proportional odds of women responding higher on the item ‘crying spells’ were 2.14 times (95% confidence interval: 1.60, 2.82) that of men matched on overall depressive symptoms. The mean score for the ‘crying spells’ item, adjusted for overall depressive symptoms, was 0.23 (standard error 0.01) for women and 0.15 (standard error 0.02) for men. This difference in item difficulty by gender did not appear to vary by level of depressive symptoms (P for interaction=0.86). There was no evidence of any item bias by age group in this sample of elders.” (see pg 287)

“Three of 20 CES-D items were found to function differently among subgroups of gender and race. The two items that comprise the interpersonal problems factor of the CES-D were each biased towards higher endorsement among blacks, after matching on overall depressive symptom score. These item-level biases carried over as a factor-level bias, as these two items combine to comprise the interpersonal problems factor of the CES-D. No evidence of item-level bias by age was observed in this sample of elders.” (see pg 288)

Evidence of non-uniform DIF

Not discussed explicitly, although it was tested through an interaction term between CES-D score and each studied exogenous variable.

Magnitude of DIF

An effect size cut-point (proportional odds ratio >2.0 or <0.5) was used to define a meaningful level of item bias. Because 60 comparisons were made, the conservative Bonferroni correction gives an adjusted p-value threshold of <0.0008. All three of the biased items had accompanying p-values reported as <0.000.
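The arithmetic behind these cut-offs can be sketched as follows. Two things here are assumptions rather than details from the paper: the nominal alpha of 0.05 (chosen because 0.05/60 rounds to the reported 0.0008), and the way the effect-size and significance criteria are combined. The function names are hypothetical.

```python
def bonferroni_threshold(alpha=0.05, n_comparisons=60):
    """Per-comparison significance threshold under Bonferroni correction."""
    return alpha / n_comparisons


def is_meaningful_dif(odds_ratio, p_value, threshold):
    """Flag an item when the effect is large AND statistically significant."""
    # Effect-size rule from the summary: odds ratio above 2.0 or below 0.5.
    large_effect = odds_ratio > 2.0 or odds_ratio < 0.5
    return large_effect and p_value < threshold


threshold = bonferroni_threshold()  # 0.05 / 60, about 0.00083
```

For example, an odds ratio of 2.29 with a hypothetical p-value of 0.0001 would be flagged, whereas an odds ratio of 1.5 with the same p-value would not.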

Impact of DIF

“The 17-item version correlated 0.99 with the full 20-item version. Taking the standard cutpoint of >16 points on the full 20-item scale as the threshold for diagnosis of clinical depression, the sensitivity and specificity of the reduced 17-item scale varied with the choice of cut-point.”
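The comparison described in this quotation can be illustrated with a short sketch that treats the full-scale classification as the reference and evaluates a candidate cut-point on the reduced scale. The function name, the default cut-points, and the example scores in the usage note are hypothetical; the strict "greater than" comparison follows the wording quoted above and should be changed if the intended rule is "16 or more".

```python
def sensitivity_specificity(full_scores, reduced_scores, full_cut=16, reduced_cut=16):
    """Sensitivity and specificity of the reduced scale vs. the full scale."""
    tp = fn = tn = fp = 0
    for full, reduced in zip(full_scores, reduced_scores):
        reference_positive = full > full_cut      # classification on the full scale
        test_positive = reduced > reduced_cut     # classification on the reduced scale
        if reference_positive and test_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one negative")
    return tp / (tp + fn), tn / (tn + fp)
```

With made-up scores `full = [5, 17, 20, 10, 30, 16]` and `reduced = [4, 15, 18, 9, 27, 14]`, keeping the reduced cut-point at 16 misses one reference-positive case, while lowering it to 14 recovers it, which illustrates why the reported values depend on the cut-point chosen.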

Limitations according to authors:

1. The authors acknowledge that the lack of evidence of item bias by age might be due to the restricted age range in the sample.

2. Authors suggest that the two interpersonal problem items might contribute to the association between “perception of racial prejudice” and depression.

Strengths by authors:

The use of a proportional odds regression model allowed for examination of DIF in item difficulties and discrimination parameters, and to test for differential factor functions by examining the relationship of the factor score with each studied (exogenous) variable after conditioning on the CES-D score.

Additional strengths and limitations according to expert review:

This analysis was carefully executed using a model that can detect both uniform and non-uniform DIF. The authors considered the model assumptions carefully. The sample size was adequate, and the effects of purification were examined. Potential weaknesses include (a) no formal tests of the assumption of unidimensionality, (b) no discussion of types of DIF, and (c) the use of an observed rather than latent conditioning variable.

Key references:

1. Radloff LS. The CES-D Scale: a self-report depression scale for research in the general population. Appl Psychol Meas 1977;1:385–401.

2. Liang J, Tran TV, Krause N, Markides KS. Generational differences in the structure of the CES-D scale in Mexican Americans. J Gerontol 1989;44(3):S110–20.

3. Hertzog C, Van Alstine J, Usala PD, Hultsch DF, et al. Measurement properties of the Center for Epidemiological Studies Depression Scale (CES-D) in older populations. Psychol Assess 1990;2(1):64–72.

4. Gatz M, Hurwicz M-L. Are old people more depressed? Cross-sectional data on Center for Epidemiological Studies Depression Scale factors. Psychol Aging 1990;5(2):284–90.

5. Callahan CM, Wolinsky FD. The effect of gender and race on the measurement properties of the CES-D in older adults. Med Care 1994;32:341–56.

6. Stommel M, Given BA, Given CW, Kalaian HA, Schulz R, McCorkle R. Gender bias in the measurement properties of the Center for Epidemiologic Studies Depression Scale (CES-D). Psychiatry Res 1993;49:239–50.
