Abstract:
Introduction:
Skin cancer is the most common human malignancy, and Australasia has the highest incidence and mortality of skin cancer in the world. Primary prevention, early diagnosis and treatment are key to reducing mortality and morbidity from skin cancer. Primary care providers, dermatology specialists and access to healthcare are crucial to early diagnosis and to optimising skin cancer treatment. Artificial intelligence (AI) is a branch of computer science that involves creating programs that aim to simulate human cognition and processes to analyse data. Many studies of AI-assisted diagnosis show promising results, with sensitivity and specificity on par with dermatologists. However, real-world clinical validation is still lacking. Our study aims to evaluate the reliability of an AI-augmented triaging system in a real-life primary care setting.
Methods:
This is a single-centre, double-blinded observational study with a predetermined study design and prospective recruitment of participants. We performed the study in the Waikato region, New Zealand, with 20 recruited primary care practices. The practices recruited patients who attended with lesions suspected to be skin cancer. They photographed the lesions with the GP camera and referred them via the teledermatology pathway to Waikato Hospital for specialist assessment. A second set of photographs was taken at the same time with our DermLite study camera and subsequently processed by the AI algorithm. The diagnoses from the teledermatologists (TD) were compared with those of the AI algorithm to assess “reliability”. “Reliability” is arbitrarily defined as a specificity of >95% for benign lesions and a sensitivity of >98% for malignant lesions, with the consensus diagnosis of two TD as the reference standard. We also compared image quality between the GP camera and the DermLite study camera.
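The reliability criterion above can be expressed as a simple check on a 2×2 table of AI results against the reference standard. The following is a minimal sketch only: the function names and the example counts are hypothetical and are not taken from the study.

```python
# Minimal sketch of the "reliability" criterion described in the Methods.
# Function names and the example counts below are illustrative assumptions.

def sensitivity(tp: int, fn: int) -> float:
    """Proportion of reference-standard malignant lesions flagged as malignant."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of reference-standard benign lesions classified as benign."""
    return tn / (tn + fp)


def meets_reliability(tp: int, fn: int, tn: int, fp: int,
                      min_sens: float = 0.98, min_spec: float = 0.95) -> bool:
    """True only if both thresholds (>98% sensitivity, >95% specificity) are exceeded."""
    return sensitivity(tp, fn) > min_sens and specificity(tn, fp) > min_spec


# Hypothetical counts: sensitivity 99/100, specificity 96/100 or 90/100.
print(meets_reliability(tp=99, fn=1, tn=96, fp=4))
print(meets_reliability(tp=99, fn=1, tn=90, fp=10))
```

Both thresholds must be exceeded at the same time, so a high sensitivity cannot compensate for a specificity below 95%.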
Results and findings:
The study population included 334 patients, and a total of 304 images of skin lesions were included in our primary outcome assessment. We recruited 193 (57.8%) women and 315 (94.3%) European/Pākehā participants. Ages ranged from 18 to 97 years, with a mean of 64 years. We stratified the lesions by management into benign, uncertain and malignant. One hundred and thirteen lesions were deemed uncertain and were therefore excluded from the sensitivity/specificity analysis. Of the remaining 188 lesions, with the TD consensus diagnosis as the reference standard, the AI algorithm had a sensitivity of 99.04% (95% CI, 94.76% to 99.98%) and a specificity of 85.71% (95% CI, 76.38% to 92.39%). This specificity does not meet our predetermined arbitrary criterion of “reliability”.
We had a sample of 132 lesions with which to compare the AI algorithm against histological diagnoses. The sensitivity was 100% (95% CI, 96.27% to 100%) and the specificity was 80% (95% CI, 63.06% to 91.56%). With our available data, we also compared the TD diagnoses of 101 lesions with the histology results; the sensitivity was 98.78% (95% CI, 93.39% to 99.97%) and the specificity was 84.21% (95% CI, 60.42% to 96.62%).
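For readers who wish to check intervals of this kind, the sketch below computes an exact (Clopper-Pearson) binomial confidence interval using only the Python standard library. It rests on two assumptions that are not stated in the abstract. First, the interval method is assumed to be Clopper-Pearson. Second, the counts are inferred from the reported percentages and totals (for example, 103/104 and 72/84 for the 188 lesions, 97/97 and 28/35 for the 132 lesions). If both assumptions hold, the output should fall close to the reported intervals.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a proportion x/n."""

    def solve(k: int, target: float) -> float:
        # binom_cdf(k, n, p) decreases as p increases, so bisection applies.
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x) = alpha/2; upper bound solves P(X <= x) = alpha/2.
    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)
    upper = 1.0 if x == n else solve(x, alpha / 2)
    return lower, upper


# Counts inferred from the reported percentages (an assumption, see above).
inferred = {
    "AI vs TD, sensitivity": (103, 104),
    "AI vs TD, specificity": (72, 84),
    "AI vs histology, sensitivity": (97, 97),
    "AI vs histology, specificity": (28, 35),
}
for label, (x, n) in inferred.items():
    lower, upper = clopper_pearson(x, n)
    print(f"{label}: {100 * x / n:.2f}% (95% CI, {100 * lower:.2f}% to {100 * upper:.2f}%)")
```

The exact method is conservative for small samples, which is one reason the specificity intervals, based on far fewer benign lesions, are much wider than the sensitivity intervals.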
We compared the image quality of the DermLite study camera photographs with that of the GP photographs. Quality was rated excellent for 92.7% vs 79.9%, adequate for 1.6% vs 13.3%, and poor for 5.7% vs 6.8% of photographs, respectively. We concluded that the DermLite study camera produced better quality photographs.
Conclusion:
New Zealand has a high incidence of skin cancer, with significant health and economic impact on our communities. There are inequities in health outcomes in our population and a shortage of general practitioners and dermatology specialists in New Zealand. Under-served populations experience worse clinical outcomes, reduced quality of life and increased healthcare costs. The AI algorithm did not meet our arbitrary criterion of reliability; however, it demonstrated very promising results as a triage tool used in conjunction with TD to augment healthcare and improve access to dermatology. Further real-life studies on a larger scale are needed to assess its use in primary care.