Subject to change.
Alex graduated from Stanford University in 2024 with a B.S. in Computational Biology and an M.S. in Computer Science (AI track). He is now a Research Assistant at the Trauma Imaging Research and Innovation Center (TIRIC) at Brigham and Women’s Hospital, where he applies his data analytics skills to help create automated multimodal solutions for violence prediction, injury detection, geriatric frailty, and falls. Alex is currently applying to medical school and hopes to matriculate in 2025.
Introduction and Background: Timely and accurate pathology diagnoses are crucial for medical management decisions and clinical trial eligibility. Academic institutions and large healthcare enterprises offer expert consultations nationwide for the most complex cases requiring urgent pathology input. At Stanford Pathology alone, the volume of consult cases has grown by over 10% in the past two years and now exceeds 14,000 cases annually. These cases represent 25% of all surgical pathology cases, and further growth is anticipated as the baby boomer population ages. Despite this growth, the intake process for consult cases, known as accessioning, remains labor-intensive: staff must manually enter numerous patient information fields. This time-consuming process is vulnerable to fluctuations in staff availability, which lead to substantial delays. Accessioning bottlenecks account for nearly 50% of the total turnaround time per consult, delaying critical pathology results and undermining care for the sickest patients. In this study, we propose an AI-based solution for automating pathology accessioning. We present a novel end-to-end deep learning strategy for efficiently and accurately extracting patient information from document images, as well as an infrastructure strategy for clinical integration.
Model Design: Understanding document images is challenging because it requires both text recognition and holistic document comprehension. Traditional visual document extraction pipelines rely on off-the-shelf optical character recognition (OCR) to read the text and then focus on interpreting the OCR output. While promising in certain clinical applications, these OCR-based methods are computationally expensive and lack flexibility across languages and document types. Pathology accessioning presents additional challenges: consult cases vary in structure, length, and writing quality, and many documents are handwritten or use non-text elements such as checkboxes. This highlights the need for a more generalizable approach. We developed a transformer-based vision-language model (VLM) that processes document images end-to-end, eliminating the need for an intermediate text representation such as OCR output. Our model uses a vision encoder and a text decoder to extract patient information directly from the image. This approach significantly improves generalizability, which is essential for handling the diverse and complex nature of consult cases. By combining pretraining and fine-tuning strategies, ensemble methods, and data augmentation techniques, our model achieves 88-95% average normalized Levenshtein similarity (ANLS) in extracting patient information fields, with inference times of just 1-2 seconds. This performance significantly surpasses OCR-based approaches, with a 13% improvement in accuracy and an 8-10x reduction in runtime. Additionally, we implemented a text localization strategy that integrates spatial information from OCR with model outputs to approximate bounding boxes for the extracted text, facilitating manual review. Furthermore, we developed a quality-checking model that cross-references extracted patient information across document pages to automatically detect missing or incorrect data, supporting the accurate reporting of confidential medical information.
Clinical Integration: For clinical integration, we developed a human-in-the-loop pipeline to enforce zero tolerance for errors in patient data. Documents are scanned into a PHI-safe cloud-based bucket, which automatically triggers model inference to extract patient information and run data quality checks. This initial analysis happens locally and asynchronously without the need for human input. Accessioners then review cases in a custom user interface, where they verify and, if necessary, correct the information extracted by the model. The verified patient data is then sent to Epic Beaker to create requisition orders. Our cloud-based serverless infrastructure allocates compute resources on demand, saving costs and ensuring scalability. Thus, our human-in-the-loop pipeline maintains data accuracy and patient privacy while dramatically accelerating the accessioning workflow.
Concluding Remarks: Our system accurately and efficiently extracts patient information from pathology documents in 1-2 seconds, compared to the several minutes typically spent per accessioning case, significantly reducing bottlenecks and treatment delays. This text extraction pipeline can be readily scaled to other medical document processing tasks, including the formatting and drafting of pathology reports and automated metadata extraction for efficient management of whole slide images (WSI). Our solution substantially improves our ability to deliver timely and accurate pathology results.
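For readers unfamiliar with the ANLS metric cited under Model Design, the following is a minimal Python sketch of how such a score can be computed. The abstract does not state how ANLS was calculated in this study, so the sketch assumes the convention common in document question-answering benchmarks: a per-field similarity of 1 minus the normalized edit distance, set to 0 when the normalized distance reaches a threshold of 0.5. The case-insensitive comparison, the function names, and the sample values are all illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def nls(pred: str, truth: str, tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity for one extracted field.

    Assumption: strings are stripped and lowercased before comparison,
    and tau = 0.5 is the cutoff above which a prediction scores 0.
    """
    p, t = pred.strip().lower(), truth.strip().lower()
    if not p and not t:
        return 1.0  # both empty: treated as a perfect match
    nl = levenshtein(p, t) / max(len(p), len(t))
    return 1.0 - nl if nl < tau else 0.0


def anls(preds: list[str], truths: list[str], tau: float = 0.5) -> float:
    """Average of the per-field similarities over all fields."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(nls(p, t, tau) for p, t in zip(preds, truths)) / len(preds)


if __name__ == "__main__":
    # Hypothetical extracted values versus reference values.
    predicted = ["Jane Doe", "Jane Do", "xyz"]
    reference = ["Jane Doe", "Jane Doe", "Jane Doe"]
    print(anls(predicted, reference))
```

Under this convention, a near miss such as a single dropped character still earns partial credit, while an unrelated string scores zero, which is why ANLS is often preferred over exact-match accuracy for free-text fields.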
Learning Objective