DELV™ Psychometric Properties

The Diagnostic Evaluation of Language Variation–Norm Referenced edition (DELV–Norm Referenced) has undergone various analyses to establish its reliability and validity as a diagnostic test for identifying speech and language disorders in children aged 4 years 0 months through 9 years 11 months.

The information below is a condensed version of the complete reliability and validity evidence presented in the DELV examiner's manual. For more information, please contact support@ventrislearing.com.

Evidence of Reliability

Reliability refers to the accuracy, consistency, and stability of test scores. The DELV–Norm Referenced edition demonstrates reliability through several measures:

Test-Retest Stability: Stability was assessed with Pearson's product-moment correlation coefficient in a sample of 239 children aged 4 years 0 months through 9 years 11 months (mean age: 6 years 11 months), who were retested after an interval of 6 to 28 days (mean: 18 days). Average test-retest coefficients were calculated using Fisher's z transformation, and the coefficients were corrected for the variability of the standardization sample. Corrected coefficients range from .71 (adequate) to .96 (excellent), with most in the good range (.80s).
Internal Consistency: This indicates the extent to which items within a domain measure a single construct.
- Coefficient Alpha: For the standardization sample, average coefficient alphas across age groups for the domains ranged from .75 to .93, suggesting adequate-to-excellent reliability. Reliability coefficients for the Composite Standard Score ranged from .81 to .92 and were generally higher, as the composite summarizes a broader range of abilities. Reliability coefficients for clinical groups were similar to or higher than those for the standardization sample.
- Split-Half Method: For the standardization sample, average reliability coefficients for the domains of the DELV–Norm Referenced edition range from adequate (.78) to excellent (.92): excellent for Phonology (.92), good for Syntax (.83) and Pragmatics (.82), and adequate for Semantics (.78). The reliability coefficient for the Composite Standard Score is in the good-to-excellent range (.80s and .90s) at all ages. Reliability coefficients for clinical groups were similar to or higher than those for the standardization sample.
- Interscorer Reliability: Interscorer reliability was evaluated for the subdomains that are not scored objectively. Interscorer agreement ranges from 92% to 100%, and average agreement for all ages combined ranges from 93% to 100%. These results indicate a high degree of consistency in how different scorers scored the children's responses, demonstrating that all subdomains can be scored reliably using the existing scoring rules.
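The two test-retest computations described above (averaging correlations with Fisher's z and correcting for sample variability) can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the procedure documented in the examiner's manual. Averaging maps each r to z = atanh(r), averages the z values, and back-transforms with tanh; weighting each z by n - 3 is optional and an assumption. The variability correction uses Thorndike's Case 2 formula for range restriction, which is one common choice that the manual may or may not use. The function names and all example values are hypothetical.

```python
import math


def mean_correlation_fisher(rs, ns=None):
    """Average correlations through Fisher's z transformation.

    Each r is mapped to z = atanh(r), the z values are averaged, and the mean
    is mapped back with tanh. Weighting each z by n - 3 (the inverse of its
    sampling variance) is optional and an assumption here.
    """
    zs = [math.atanh(r) for r in rs]
    weights = [1.0] * len(rs) if ns is None else [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)


def correct_for_variability(r, sd_sample, sd_reference):
    """Thorndike Case 2 correction for range restriction (assumed formula).

    Estimates the correlation expected in a group whose SD is sd_reference
    (e.g., the standardization sample) from r observed in a sample whose SD
    is sd_sample. When the two SDs are equal, r is returned unchanged.
    """
    k = sd_reference / sd_sample
    return r * k / math.sqrt(1 - r**2 + (r * k) ** 2)


# Hypothetical values for illustration only; these are not DELV data.
age_group_rs = [0.78, 0.84, 0.81]
age_group_ns = [80, 79, 80]
print(round(mean_correlation_fisher(age_group_rs, age_group_ns), 3))
print(round(correct_for_variability(0.75, sd_sample=12.0, sd_reference=15.0), 3))
```

Averaging in z space rather than averaging the raw coefficients avoids the downward bias that comes from the skewed sampling distribution of r, which is why the Fisher-averaged value of .80 and .90 lands slightly above their arithmetic mean.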
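The internal-consistency and interscorer statistics listed above follow standard textbook formulas, sketched here in Python with hypothetical function names and toy data. Assumptions: coefficient alpha is computed from item variances and total-score variance; the split-half estimate uses an odd-even item split stepped up with the Spearman-Brown formula (the manual's actual split may differ); and interscorer reliability is shown as simple percent agreement between two scorers.

```python
import math
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores has one row per child, one column per item."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # transpose rows into item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split (an assumption): correlate half-test totals, then step up."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))


def percent_agreement(scorer_a, scorer_b):
    """Percentage of responses that two scorers scored identically."""
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100.0 * matches / len(scorer_a)


# Toy data: 5 hypothetical children x 4 dichotomous items (1 = correct).
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))
print(round(split_half_reliability(scores), 2))
print(percent_agreement([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```

The Spearman-Brown step is needed because each half contains only half the items, so the raw correlation between halves underestimates the reliability of the full-length domain.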

Evidence of Validity

Validity refers to the degree to which a test measures what it intends to measure. The DELV–Norm Referenced edition presents various lines of validity evidence:

Test Content: The test content was developed by experts to be culturally fair for all English-speaking children, including African American English (AAE) speakers, focusing on universal aspects of language rather than dialect-specific features. The four domains (Syntax, Pragmatics, Semantics, Phonology) measure well-documented language constructs.
Response Processes: Item development and refinement involved pilot and tryout studies examining response frequencies and children’s explanations to ensure items elicited expected language processes.
Internal Structure: Correlation analyses show moderate positive intercorrelations (.52–.55) among the three language domains (Syntax, Pragmatics, Semantics), as expected for domains that all measure language skills. The Phonology domain correlates less strongly with the other domains (.22–.24), which is also expected, because Phonology measures speech sound production rather than language skills.
Relationships With Other Variables (Criterion-Related):
- DELV–Screening Test: Moderate correlations between scores on the language domains of the DELV–Norm Referenced edition and the Diagnostic Risk Status section of the DELV–Screening Test support the DELV–Norm Referenced edition as a valid measure of language.
- CELF–4: Comparisons with selected CELF–4 subtests showed low-to-moderate correlations (e.g., DELV Composite Standard Score correlations of .39 to .55 with CELF–4 subtests), a pattern consistent with two tests that are built on different theoretical models but measure the same construct.
- PLS–4 Articulation Screener: A high correlation (.87) was observed between the DELV Phonology domain and the PLS–4 Articulation Screener, while correlations with DELV language domains were lower (e.g., Syntax .26, Pragmatics .24, Semantics .28). This supports the Phonology domain’s focus on speech sound production.
- Special Group Studies: Studies comparing language-disordered and articulation-disordered groups to matched typically developing samples showed significantly lower scores for the clinical groups, with large effect sizes.
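The internal-structure and special-group evidence above rests on two familiar statistics: Pearson correlations between domain scores, and a standardized mean difference between clinical and typically developing groups. The Python sketch below assumes Cohen's d with a pooled standard deviation as the effect-size index; the manual may use a different formula, and the function names and example scores are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(domains):
    """Pairwise correlations; each domain maps to scores for the same children, same order."""
    return {(a, b): pearson_r(domains[a], domains[b]) for a, b in combinations(domains, 2)}


def cohens_d(clinical, typical):
    """Standardized mean difference (typical minus clinical) using a pooled SD."""
    n1, n2 = len(clinical), len(typical)
    pooled_var = ((n1 - 1) * stdev(clinical) ** 2 + (n2 - 1) * stdev(typical) ** 2) / (n1 + n2 - 2)
    return (mean(typical) - mean(clinical)) / math.sqrt(pooled_var)


# Hypothetical scores for illustration only; these are not DELV data.
domains = {
    "Syntax": [8, 10, 12, 9, 11],
    "Pragmatics": [9, 10, 11, 8, 12],
    "Semantics": [7, 11, 12, 9, 10],
}
for pair, r in intercorrelations(domains).items():
    print(pair, round(r, 2))
print(round(cohens_d(clinical=[6, 7, 8, 5, 7], typical=[10, 11, 9, 12, 10]), 2))
```

By Cohen's conventions, a d of about 0.8 or more is usually read as a large effect, which is the sense in which clinical-group differences are described as large.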

Evidence of Diagnostic Accuracy

Because no appropriate standardized test was available against which to measure the diagnostic accuracy of the DELV–Norm Referenced edition (DELV–NR), Pearson, Jackson, and Wu (2014) used a procedure to standardize concurrent language samples as the reference and applied a well-established epidemiological method to resolve discrepancies. Their study showed that DELV–NR clinical status assignment had average to excellent diagnostic accuracy using a cutoff of 1.5 standard deviations below the mean, whereas diagnoses made for the same children with prior tests were close to random.
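Diagnostic accuracy of a cutoff is commonly summarized by sensitivity, specificity, and likelihood ratios computed against a reference standard. The Python sketch below flags scores at or below 1.5 SD under the mean as positive and compares those flags with reference labels (in the study above, the reference came from standardized language samples). The standard-score metric (mean 100, SD 15), the at-or-below rule, the use of likelihood ratios, and all names and data are assumptions for illustration; the sketch does not reproduce Pearson, Jackson, and Wu's analysis.

```python
import math


def flag_scores(scores, norm_mean=100.0, norm_sd=15.0, cutoff_sd=1.5):
    """Flag scores at or below norm_mean - cutoff_sd * norm_sd as positive (assumed rule)."""
    threshold = norm_mean - cutoff_sd * norm_sd
    return [s <= threshold for s in scores]


def diagnostic_accuracy(test_positive, reference_positive):
    """Compare test flags with reference-standard labels for the same children."""
    pairs = list(zip(test_positive, reference_positive))
    tp = sum(t and r for t, r in pairs)
    fp = sum(t and not r for t, r in pairs)
    fn = sum(r and not t for t, r in pairs)
    tn = sum(not t and not r for t, r in pairs)
    sensitivity = tp / (tp + fn)  # share of reference-positive children the test flags
    specificity = tn / (tn + fp)  # share of reference-negative children the test clears
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return {"sensitivity": sensitivity, "specificity": specificity,
            "LR+": lr_pos, "LR-": lr_neg}


# Hypothetical standard scores and reference labels; these are not study data.
scores = [70, 75, 80, 95, 105, 72]
reference = [True, True, False, False, False, False]
print(diagnostic_accuracy(flag_scores(scores), reference))
```

In this toy example the child scoring 72 is flagged without a reference diagnosis, which lowers specificity; the likelihood ratios express how much a positive or negative result shifts the odds that a child has a disorder.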

[Figure: results from Pearson, Jackson, and Wu (2014)]

Source:
Pearson, B. Z., Jackson, J. E., & Wu, H. (2014). Seeking a valid gold standard for an innovative, dialect-neutral language test. Journal of Speech, Language, and Hearing Research, 57(2), 495–508.

In conclusion, the DELV–Norm Referenced edition demonstrates adequate to excellent reliability across various measures and provides multiple lines of evidence supporting its validity for assessing language and phonology in children within the specified age range, while also being designed to account for language variation.