Azocar et al.
Beck Depression Inventory (BDI)
Name/ Reference
Azocar F, Areán P, Miranda J, Muñoz RF. Differential Item Functioning in a Spanish Translation of the Beck Depression Inventory. Journal of Clinical Psychology. 2001; 57(3):355–365.
Source contact info
Francisca Azocar, Ph.D., United Behavioral Health, 425 Market St., 27th Floor, San Francisco, CA 94105–2426
Availability (private or public)
Public
At the time of the study, no official Spanish translation was available.
Conceptual framework
“To examine whether a Spanish-translated version of the BDI is an equivalent measure of depression in Spanish- and English-speaking medical patients by comparing item functioning in Spanish-speaking Latinos and English-speaking U.S. nationals.”
Purpose of measure & application (clinical, research, survey, screening)
The BDI is a self-report measure for assessing severity of depression that is commonly used in primary-care settings.
Sample characteristics
Subjects (N=292) who completed the BDI as part of the screening for the Depression Prevention Research Project (DPRP) were included in this study; 55 were literate only in Spanish and 237 were English-speaking. The Latinos sampled were born in Nicaragua (28%), Mexico (23%), and El Salvador (15%); the remainder were born in the United States or in other Latin American countries. Ages ranged from 18 to 69 years, and about 70% earned less than $10,000 a year. The Spanish- and English-speaking groups did not differ significantly in demographic characteristics. Approximately 19% met criteria for lifetime major depression and 15% met criteria for current major depression.
Recruitment methods
“Study subjects were originally recruited for participation in the Depression Prevention Research Project (DPRP). Participants were medical patients seen at the outpatient general medical clinics at both the University of California, San Francisco and at San Francisco General Hospital. Exclusion criteria were (1) current mental health treatment, (2) illiteracy, (3) terminal illness, and (4) psychosis.” (pg 358)
Data collection method
The screening and demographics interview occurred at the first visit to the medical clinic. "On the second visit, interviewers administered the Diagnostic Interview Survey and the Beck Depression Inventory. At the third visit, interviewers administered a series of paper-and-pencil questionnaires including a biculturality scale developed by the research team which assessed language, cultural activities, and contact." (pg 358)
Response rate
Not provided
Format & design (readability, # of items, time to complete, response categories)
See below
Type of measurement (nominal, ordinal, interval, ratio)
See below
Scoring (range, direction, rules, missing data)
The BDI scale consists of 21 self-report items, each with three symptom-choices reflecting a respondent’s experience over the course of a week. Scores above 16 suggest moderate depression, scores above 24 suggest severe depression.
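The scoring rule can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure: it assumes each item is scored 0 to 3 (the usual BDI convention, which this summary does not state explicitly) and applies the two cutoffs given above. The function names and the label for scores at or below 16 are hypothetical.

```python
N_ITEMS = 21          # number of BDI items, as stated above
MAX_ITEM_SCORE = 3    # assumption: each item is scored 0-3


def bdi_total(responses):
    """Return the total score for one respondent's 21 item responses."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if r not in range(MAX_ITEM_SCORE + 1):
            raise ValueError(f"item score out of range: {r!r}")
    return sum(responses)


def bdi_band(total):
    """Map a total score to the cutoffs quoted in this summary."""
    if total > 24:
        return "severe"
    if total > 16:
        return "moderate"
    return "below moderate cutoff"  # hypothetical label for scores of 16 or less


# Illustrative respondent who scores 1 on every item.
example_total = bdi_total([1] * N_ITEMS)
example_band = bdi_band(example_total)
```

Missing-data handling is not described in the summary, so the sketch simply rejects incomplete response sets.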
Availability of translations & source
No official Spanish translation of the BDI was available at the time of the study. Since then, Beck (1993) has released a Spanish-translated version of the BDI. The authors state that Beck’s version is similar to the translation used in the study. However, they note that two of the items for which item bias was identified differ semantically between the two translations.
“All measures used in the DPRP study (including the BDI) were translated by the DPRP research team of mental health professionals and students using Brislin’s (1976) method (see ref #7 below). The instruments were first translated into Spanish, and then independently translated back into English by a group blind to the original English version. The original and the back-translated versions were then compared and any differences were discussed and reconciled.” (pp 358-359)
Psychometric Properties:
Scale construction
Not provided
Basic summary statistics
The mean (SD) BDI-scale score was 14.87 (10.13) for the English-speaking group and 14.60 (11.10) for the Spanish-speaking group.
Variability
Not provided
Test-retest reliability
Not provided
Interrater reliability
Not provided
Internal consistency
The alpha coefficient was .89 in one study (see ref #3 below)
In this sample, two factors were extracted from the BDI items: an affective and a somatic factor. “Using a principal-components analysis with varimax rotation, the two factors were replicated in the two language subsamples. The coefficient alpha for the entire scale was .97, with the affective and somatic subscales showing coefficients of .97 and .93, respectively.” (pg 360)
Content validity
According to the authors, no reports on the cultural validity of Spanish-translated versions of the BDI for the general U.S. population have been published.
Construct validity
Not provided
Concurrent validity
Not provided
Predictive validity
Not provided
Sensitivity to change
Not provided
Differential Item Functioning (DIF):
Variable studied (e.g., groups)
Language is the studied variable. The reference group is the English-speaking sample and the focal group is the Spanish-speaking sample.
Sample size
N=292; 55 Spanish-speaking; 237 English-speaking.
DIF method used (e.g., MH, IRT, Logistic regression, MIMIC, other factor analysis)
Mantel-Haenszel for Ordered Response Categories (Mantel, 1963; ref # 9) was used together with an extended standardization procedure using the z statistic (Dorans and Schmitt, 1991; ref # 8). The Bonferroni correction was used to control for multiple comparisons.
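As a brief illustration of the Bonferroni correction, the sketch below divides the overall significance level by the number of tests. The family size of 21 (one test per BDI item) and the overall level of .05 are assumptions; the summary does not state the values the authors used. Function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumption: one DIF test per BDI item (21 tests), overall alpha of .05.
per_item_alpha = bonferroni_threshold(0.05, 21)
```

Comparing each raw p-value with `per_item_alpha` is equivalent to comparing the adjusted p-values with the overall level.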
The expected item-level score at each stratum is computed from response frequencies as a weighted average. These scores are then compared between the English-speaking and Spanish-speaking groups to produce a z score, which indicates the magnitude of the difference between groups in their response to the item. The results are interpreted by the size and direction of the z score. A significantly large, positive z score indicates that the focal group is more likely to endorse a higher category on the item. A negative z score indicates that the focal group is less likely to endorse a higher category on the item. (pg 360)
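The logic of the standardization approach can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' exact computation: strata are formed by matching on total score, the stratum differences are weighted by the focal group's share of respondents, and the standard error uses a plain independent-samples approximation within each stratum. The function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_z(strata):
    """Standardized mean difference (focal minus reference) and its z score.

    strata: list of (focal_scores, reference_scores) pairs, one pair per
    matching stratum, where each element is a list of item scores.
    """
    # Sample variances need at least two respondents per group per stratum.
    usable = [(f, r) for f, r in strata if len(f) >= 2 and len(r) >= 2]
    if not usable:
        raise ValueError("no stratum has two or more respondents in both groups")

    n_focal = sum(len(f) for f, _ in usable)
    smd = 0.0
    var_smd = 0.0
    for f, r in usable:
        w = len(f) / n_focal                      # focal-group weight
        smd += w * (mean(f) - mean(r))            # weighted mean difference
        # Assumed variance: independent-samples approximation per stratum.
        var_smd += w ** 2 * (variance(f) / len(f) + variance(r) / len(r))

    if var_smd == 0:
        raise ValueError("item scores do not vary within strata")
    return smd, smd / sqrt(var_smd)


# Toy data (illustrative only): the focal group scores one point higher
# than the reference group in both strata.
toy_smd, toy_z = standardized_z([
    ([1, 2, 2], [0, 1, 1]),
    ([2, 3, 3], [1, 2, 2]),
])
```

With this sign convention a positive value means the focal group tends to choose higher categories than matched members of the reference group, in line with the interpretation given above.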
Test of model assumptions
Using a principal-components analysis with varimax rotation, the two factors extracted (affective and somatic) were replicated in the two language subsamples.
There was no significant difference in the mean BDI-scale score between the English-speaking group, 14.87 (SD=10.13), and the Spanish-speaking group, 14.60 (SD=11.10).
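The size of this difference can be checked from the reported summary statistics alone. The sketch below uses Welch's unequal-variance t statistic, which is an assumption; the summary does not say which test the authors applied. The function name is hypothetical.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summary stats."""
    a = sd1 ** 2 / n1   # squared standard error of the first mean
    b = sd2 ** 2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Reported values: English-speaking (n=237) vs. Spanish-speaking (n=55).
t_stat, df = welch_t(14.87, 10.13, 237, 14.60, 11.10, 55)
```

With these inputs the statistic is roughly 0.17 on about 76 degrees of freedom, far below the conventional critical value of about 2, which is consistent with the reported absence of a significant difference.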
Purification
Not performed.
Evidence of uniform DIF
Four BDI items showed DIF for the Spanish-speaking sample. Three were more likely to be endorsed by Spanish speakers regardless of their level of depression: “I feel like I am being punished” (z=2.86, p< .003), “I feel like crying” (z=2.07, p< .05), and “I believe I look ugly” (z=2.16, p< .01). Conversely, Spanish speakers were less likely to endorse “I can’t do any work at all” (z=-2.34, p< .01), regardless of their level of depression.
Evidence of non-uniform DIF
Not estimated with this method
Magnitude of DIF
Not discussed
Impact of DIF
“If a Latino patient is not depressed, yet endorses Items 6 (punished), 10 (crying), and 14 (ugly), the BDI score could be as much as nine points higher than a non depressed Anglo patient. Biased items artificially increase or, as with Item 15 (can't work), decrease the total score of the scale. Thus, Latino samples with depression levels equal to English-speaking samples would have mean scores up to six points greater”. (pg 363)
Limitations according to the authors:
1. “The inclusion of a Latino bilingual group would have allowed the comparison of language differences within one ethnic group.” (pg 363)
2. “Bilingual fluency was not assessed in this study; rather patients chose which language they preferred to be interviewed in, based on their comfort level with either language”. (pg 363)
3. “Although biculturality was measured, its effect could not be examined given that the sample that returned for the third interview was too small (n=45)”. (pg 363)
4. “The acculturation measure was not administered to the White English-speaking sample, making it impossible to examine the effect of acculturation on DIF.” (pg 363)
5. “Future studies examining the DIF between English- and Spanish-speaking samples should include both a Latino English speaking sample and monolingual Spanish- and English-speaking samples. This would allow an examination of the differential effects of language and ethnicity.” (pg 363)
Limitations based on the expert review:
1. The sample size of the focal group (Spanish speaking) was small (n=55).
2. Assumptions of the model were not explicitly discussed. While non-parametric models have few assumptions, the lack of unidimensionality suggested by the factor analyses may be a problem because lack of unidimensionality can result in inaccurate DIF detection.
3. Purification was not performed.
4. Only uniform DIF was tested.
5. The magnitude of DIF was not discussed.
Strengths according to the authors:
1. “The study highlights the issue of cultural sensitivity and appropriateness of the scale when translations are performed.” (pg 363)
2. “The article emphasizes the need for caution in the translation of items reflecting a depressed mood in particular, in order to avoid introducing bias due to semantic differences”. (pg 363)
Strengths based on the expert review:
1. The article identifies two potentially semantically different items in two independent translations of the BDI. A specific Spanish-translated version could make a difference when item bias is examined in the context of language use.
2. The article provides a good contextual discussion, anchored in the Latino cultural norms, of putative causes of DIF at the item level.
3. Impact of DIF was evaluated.
Key references:
1. Beck, A.T., Ward, C.H., Mendelsohn, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561–571.
2. Beck, A.T. (1993). Beck Depression Inventory: Spanish translation. San Antonio, TX: The Psychological Corporation, Harcourt Brace.
3. Beck, A.T., Steer, R.A., & Garbin, M.C. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77–100.
4. Bernal, G., Bonilla, J., & Santiago, J. (1995). Confiabilidad interna y validez de construcción lógica de dos instrumentos para medir sintomatología psicológica en una muestra clínica: El inventario de depresión Beck y la Lista de Cotejo de Síntomas—36. Revista Latinoamericana de Psicología, 27, 207–29.
5. Gatewood-Colwell, G., Kaczmarek, M., & Ames, M.H. (1989). Reliability and validity of the Beck Depression Inventory for a white and Mexican-American gerontic population. Psychological Reports, 65, 1163–1166.
6. Torres-Castillo, M., Hernández-Malpica, E., & Ortega-Soto, H.A. (1991). Validez y reproducibilidad del Inventario para Depresión de Beck en un hospital de cardiología. Salud Mental, 14, 1–6.
7. Brislin, R.W. (1976). Translations: Applications in research. New York: Wiley/Halstead.
8. Dorans, N.J., & Schmitt, M.P. (1991). Constructed response and differential item functioning: A pragmatic approach. Princeton: ETS.
9. Mantel, N. (1963). Chi-square tests with one degree of freedom: Extensions of the Mantel Haenszel procedure. Journal of the American Statistical Association, 58, 690–700.
10. Muñoz, R.F., & Ying, Y. (1993). The prevention of depression: Research and practice. Baltimore, MD: Johns Hopkins University Press.