The evaluation of bulbar redness grading scales

Schulze, Marc-Matthias

Files

Schulze_Marc-Matthias.pdf (5.03 MB)

Date

2010-01-12T20:54:42Z

Authors

Schulze, Marc-Matthias

Publisher

University of Waterloo

Abstract

The use of grading scales is common in clinical practice and research settings. Several grading scales are available to the practitioner; however, despite their frequent use, they are poorly understood and may be criticised on several grounds, such as the variability of the assessments or the inequality of scale steps within or between scales. Hence, the global aim of this thesis was to study the McMonnies/Chapman-Davies (MC-D), Institute for Eye Research (IER), Efron, and validated bulbar redness (VBR) grading scales in order to (1) gain a better understanding of the scales and (2) attempt a cross-calibration of the scales. After verifying the accuracy and precision of the objective and subjective techniques to be used (chapter 3), a series of experiments was conducted.

The specific aims of this thesis were as follows:
• Chapter 4: To use physical attributes of redness to determine the accuracy of the four bulbar redness grading scales.
• Chapter 5: To use psychophysical scaling to estimate the perceived redness of the four bulbar redness grading scales.
• Chapter 6: To investigate the effect of using reference anchors when scaling the grading scale images, and to convert grades between scales.
• Chapter 7: To grade bulbar redness using cross-calibrated versions of the MC-D, IER, Efron, and VBR grading scales.

Methods:
• Chapter 4: Two image processing metrics, fractal dimension (D) and % pixel coverage (% PC), as well as photometric chromaticity (u’), were selected as physical measures to describe and compare redness in the four bulbar redness grading scales. Pearson correlation coefficients were calculated between each set of image metrics and the reference image grades to determine the accuracy of the scales.
• Chapter 5: Ten naïve observers were asked to arrange printed copies of modified versions of the reference images (showing vascular detail only) across a distance of 1.5 m, with only the start and end points marked, as 0 and 100, respectively (non-anchored scaling). After completion of scaling, the position of each image was hypothesised to reflect its perceived bulbar redness. The averaged perceived redness (across observers) for each image was compared to the physical attributes of redness as determined in chapter 4.
• Chapter 6: The experimental setup from chapter 5 was modified by providing the reference images of the VBR scale as additional, unlabelled anchors for psychophysical scaling (anchored scaling). Averaged perceived redness from anchored scaling was compared to non-anchored scaling, and perceived redness from anchored scaling was used to cross-calibrate grades between scales.
• Chapter 7: The modified reference images of each grading scale were positioned within the 0 to 100 range according to their averaged perceived redness from anchored scaling, one scale at a time. The same 10 observers who had participated in the scaling experiments were asked to represent the perceived bulbar redness of 16 sample images by placing them, one at a time, relative to the reference images of each scale. Perceived redness was taken as the measured position of the placed image from 0 and was averaged across observers.

Results:
• Chapter 4: Correlations were high between reference image grades and all sets of objective metrics (all Pearson’s r ≥ 0.88, p ≤ 0.05); each physical attribute pointed to a different scale as being most accurate. Independent of the physical attribute used, there were wide discrepancies between scale grades, with sometimes little overlap of equivalent levels when comparing the scales.
• Chapter 5: The perceived redness of the reference images within each scale was ordered as expected, but not all consecutive within-scale levels were rated as having different redness. Perceived redness of the reference images varied between scales, with different ranges of severity being covered by the images. The perceived redness was strongly associated with the physical attributes of the reference images.
• Chapter 6: There were between-scale differences in the range of perceived redness and in the perceived redness of individual reference levels. Anchored scaling resulted in an apparent shift to lower perceived redness for all but one reference image compared to non-anchored scaling, with the rank order of the 20 images remaining fairly constant across both procedures (Spearman’s ρ = 0.99).
• Chapter 7: Overall, perceived redness depended on the sample image and the reference scale used (RM ANOVA; p = 0.0008); 6 of the 16 images had a perceived redness that was significantly different between at least two of the scales. Between-scale concordance correlation coefficients (CCC) ranged from 0.93 (IER vs. Efron) to 0.98 (VBR vs. Efron). Between-scale coefficients of repeatability (COR) ranged from 5 units (IER vs. VBR) to 8 units (IER vs. Efron) for the 0 to 100 range.

Conclusions:
• Chapter 4: Despite the generally strong linear associations between the physical characteristics of reference images in each scale, the scales themselves are not inherently accurate and are too different to allow for cross-calibration based on physical redness attributes.
• Chapter 5: Subjective estimates of redness are based on a combination of chromaticity and vessel-based components. Psychophysical scaling of perceived redness lends itself to being used to cross-calibrate the four clinical scales.
• Chapter 6: The re-scaling of the reference images with anchored scaling suggests that redness was assessed based on within-scale characteristics and not using absolute redness scores, a mechanism that may be referred to as clinical scale constancy. The perceived redness data allow practitioners to modify the grades of the scale they commonly use so that comparisons of grading estimates between calibrated scales may be made.
• Chapter 7: The use of the newly calibrated reference grades showed close agreement between grading estimates of all scales. The between-scale variability was similar to the variability typically observed when a single scale is repeatedly used. Perceived redness appears to be dependent upon the dynamic range of the reference images of the scale.

In conclusion, this research showed that there are physical and perceptual differences between the reference images of all scales. A cross-calibration of the scales based on the perceived redness of the reference images provides practitioners with an opportunity to compare grades across scales, which is of particular value in research settings or if the same patient is seen by multiple practitioners who are familiar with using different scales.
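
For readers unfamiliar with the between-scale agreement statistics reported for chapter 7, the following is a minimal Python sketch of how a concordance correlation coefficient (CCC) and a coefficient of repeatability (COR) might be computed for two sets of averaged perceived redness values on the 0 to 100 range. It is not the thesis's analysis code. The function names and the sample values are hypothetical, the CCC follows Lin's standard definition, and the COR is assumed to be 1.96 times the standard deviation of the paired differences, which is one common convention; the abstract does not state which definition was used.

```python
import numpy as np


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for paired scores."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # Population (ddof=0) variances and covariance, as in Lin's definition.
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # Penalises both scatter and any systematic offset between the two sets.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def coefficient_of_repeatability(x, y):
    """1.96 x sample SD of the paired differences (assumed definition)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return 1.96 * diff.std(ddof=1)


if __name__ == "__main__":
    # Made-up perceived redness values (0 to 100) for the same sample images
    # graded against two different scales; purely illustrative.
    scale_a = [10.0, 25.0, 40.0, 55.0, 70.0, 85.0]
    scale_b = [12.0, 22.0, 43.0, 53.0, 74.0, 83.0]
    print("CCC:", round(concordance_cc(scale_a, scale_b), 3))
    print("COR:", round(coefficient_of_repeatability(scale_a, scale_b), 1))
```

A CCC close to 1 indicates that the two sets of grades agree in both ordering and absolute value, while the COR expresses the expected spread of between-scale differences in the same units as the 0 to 100 range.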