
Computer vision digitization of smartphone images of anesthesia paper health records from low-middle income countries

Abstract

Background

In low-middle income countries, healthcare providers primarily use paper health records for capturing data. Paper health records are used predominantly because of the prohibitive cost of acquiring and maintaining automated data capture devices and electronic medical records. Data recorded on paper health records is not easily accessible to healthcare providers in a digital format. The lack of real-time, accessible digital data limits the ability of healthcare providers, researchers, and quality improvement champions to leverage data to improve patient outcomes. In this project, we demonstrate the novel use of computer vision software to digitize handwritten intraoperative data elements from smartphone photographs of paper anesthesia charts from the University Teaching Hospital of Kigali. We specifically report our approach to digitizing checkbox data, symbol-denoted systolic and diastolic blood pressure, and physiological data.

Methods

We implemented approaches for removing perspective distortions from smartphone photographs, removing shadows, and improving image readability through morphological operations. YOLOv8 models were used to deconstruct the anesthesia paper chart into specific data sections. Handwritten blood pressure symbols and physiological data were identified, and values were assigned using deep neural networks. Our work builds on previous research by improving its methods, updating the deep learning models to newer architectures, and consolidating them into a single piece of software.

Results

The model for extracting the sections of the anesthesia paper chart achieved an average box precision of 0.99, an average box recall of 0.99, and an mAP0.5-95 of 0.97. Our software digitizes checkbox data with greater than 99% accuracy and digitizes blood pressure data with a mean average error of 1.0 and 1.36 mmHg for systolic and diastolic blood pressure respectively. Overall accuracy for physiological data which includes oxygen saturation, inspired oxygen concentration and end tidal carbon dioxide concentration was 85.2%.

Conclusions

We demonstrate that, under normal photography conditions, we can digitize checkbox, blood pressure, and physiological data to within human accuracy when the handwriting is legible. Our contributions improve access to digital data for healthcare practitioners in low-middle income countries.


Background

Globally, approximately 313 million surgical cases are performed annually. 6% of these surgeries are performed in low-middle income countries (LMICs), where a third of the global population currently resides. Surgical mortality rates are twice as high in LMICs compared to high-income countries, despite patients being younger, having a lower risk profile, and undergoing less invasive surgery [1]. A significant majority of these deaths are preventable with surveillance of high-risk patients and early evidence-based interventions [1, 2].

Surveillance and improvement in surgical and anesthesia care is dependent on having access to continuous, reproducible, and real-time data. However, in LMICs the primary method of data capture for anesthesia and surgery is within paper health records. These records are characterized by having multiple data elements, including medication administration, physiological parameters, and procedure-specific elements, recorded manually by the provider at a regular frequency (e.g., every 5 min). The data density of the anesthesia paper health records, defined as the data generated per unit of time, is amongst the highest for any healthcare setting [3].

The most efficient method to record high-volume anesthesia data is with automatic data capture monitors and electronic medical record systems (EMRs). Unfortunately, due to their cost and complexity, electronic records remain an unlikely solution in LMICs for the foreseeable future [4]. This creates major gaps in digital data access for anesthesia providers in LMICs, and their ability to utilize data to rapidly anticipate and intervene to reduce anesthesia and surgical complications and mortality.

In this paper we describe our methodology to further improve the accuracy of the digitization of anesthesia paper health records from the University Teaching Hospital of Kigali (CHUK) in real time using computer vision. Our work builds from our previous digitizing efforts and further consolidates the process using a single software program. Our overarching goal for this project is to provide rapidly accessible, digital data to anesthesia healthcare providers in LMICs, which can facilitate evidence-based actionable interventions to reduce morbidity and mortality.

The remainder of this paper begins with an introduction to the paper anesthesia record from CHUK, leading into a discussion on our methodology for correcting common distortions in smartphone images of the paper anesthesia record, followed by our methods for extracting the blood pressure, physiological, and checkbox data elements. Finally, we assess the improvements in our methods from previous research in the results section, and discuss the impact, challenges, and future directions of our results and work.

The intraoperative anesthesia paper health record

We utilized 500 smartphone photographs of paper anesthesia records collected from 2019 to 2023. The photographs of the anesthesia paper records varied greatly in quality, with some being clear, well lit, and legible, whereas others were blurry, poorly lit, and illegible. The anesthesia record has seven distinct sections: handwritten medications (Fig. 1, Section A), inhaled volatile anesthetics (Fig. 1, Section B), intravenous fluids (Fig. 1, Section C), blood and blood product transfused (Fig. 1, Section D), blood pressure and heart rate (Fig. 1, Section E), physiological data elements (Fig. 1, Section F), and checkboxes for marking key procedural events (Fig. 1, Section G).

Fig. 1 An example of an intraoperative paper anesthesia record from the University Teaching Hospital in Kigali, Rwanda

Intravenous medications

Multiple intravenous medications are administered over the course of surgery, with both the dose and timing of administration recorded in the anesthesia paper health record. Commonly administered medications include drugs required for induction of anesthesia, prevention of infection (e.g., antibiotics), induction or reversal of muscle paralysis, and maintenance of blood pressure and heart rate stability. The medications are written in the temporal order in which they are administered.

Inhaled volatile medications

The inhaled volatile anesthetic medications are halogenated hydrocarbon gases that are administered to maintain general anesthesia. To document the type of volatile inhaled anesthetic administered, the anesthesia paper health record has three checkboxes: two are for the most commonly used inhaled anesthetics, isoflurane and halothane, and the third is a fill-in box if another gas such as sevoflurane or desflurane is used. The dose of the volatile inhaled anesthetic medication is recorded as a percentage value.

Intravenous fluids

Intravenous fluids are administered during anesthesia to maintain fluid homeostasis and hemodynamic stability. The type of intravenous fluid, along with the incremental and total volumes given during anesthesia, is recorded as free text.

Blood and blood product transfused

Blood and component blood products are administered when significant bleeding and hemorrhagic complications occur. The Blood and Blood Product Transfused section is a free text section where providers list both the specific blood component product (e.g., packed red blood cells or fresh frozen plasma) and volume administered.

Blood pressure and heart rate

The blood pressure and heart rate section utilizes handwritten arrows and dots to encode blood pressure in millimeters of mercury (mmHg) and heart rate in beats per minute (bpm). The x-axis on the grid indicates five minute epochs, during which a provider takes a systolic blood pressure (downward arrow), diastolic blood pressure (upward arrow), and heart rate measurement (dot). The y-axis encodes both bpm and mmHg in increments of 10.

Physiological indicators

The physiological indicators section uses handwritten digits to encode different types of physiological information including oxygen saturation, inspired oxygen concentration, exhaled carbon dioxide, mechanical ventilator data, body temperature, amount of urine produced, and blood loss encountered. The x-axis on the grid represents five minute epochs.

Checkboxes

The checkboxes section uses handwritten check marks to indicate boolean values associated with a patient’s position on the operating table, intubation status, type of monitoring devices and details, and safety best-practices utilized during the surgery.

Related work

In 2015, Ohuabunwa et al. [5] detailed the need for electronic medical record systems in LMICs. According to their analysis, the rise of “communicable diseases necessitates adequate record keeping for effective follow-up”, and for retrospective research. Among the difficulties with implementing EMRs in LMICs are unfamiliarity with these systems and the prohibitive cost of implementation and maintenance. The authors assert that even hybrid paper-electronic systems, where an image of the health record is scanned into a database and certain data elements are manually entered into an EMR, can be very costly and require significant human and monetary resources. We postulate that a system requiring only a smartphone image of an anesthesia paper record would impose minimal burden on the existing clinical workflow and require very little capital to adopt in comparison to EMR systems.

In 2020, Rho et al. described using computer vision software to automatically digitize portions of an anesthesia paper record from CHUK using smartphone images [6]. Their work utilized a wooden box within which the anesthesia paper record would be inserted and on top of which a smartphone could be placed to attain an image that was standardized for lighting and position. They digitized the checkboxes section with 82.2% accuracy, blood pressure data with an average mean squared error of 21.44 between the systolic and diastolic symbols, and classified handwritten images of medication text with an accuracy of 90.1%. It is unclear how comparable this metric is to future work, since the algorithm used was trained to reject “unreadable” samples, and did so on approximately 15% of the test set.

Subsequently, Adorno et al. developed an improved approach for blood pressure symbol detection utilizing U-Nets [7]. By generating a segmentation mask of the blood pressure symbols, using image morphology to separate the detections, and computing the centroid of each pixel cluster, Adorno was able to improve the object detection precision to 99.7% and recall to 98.2%. The mean average error of the association between U-Net detections and the ground truth blood pressure values was approximately 4 mmHg. Our approaches build on this conceptual basis of using deep learning to identify handwritten symbols in conjunction with a post-processing algorithm to associate values with detections. We implement two of the suggestions in the future work section of Adorno’s paper, namely to incorporate image tiling, and to improve the post-processing algorithms.

For checkbox detection, Murphy et al. utilized a deep neural network approach. They used a template matching algorithm called ORB and a convolutional neural network (CNN) to locate and classify the checkboxes rather than the proportion of pixel intensity method initially used by Rho et al. [8]. Their new algorithm was capable of locating checkboxes with an accuracy of 99.8% and classifying them as checked or unchecked with an accuracy of 96.7%. In subsequent development, we simplified this process by using the YOLOv8 single shot detector to combine the detection and classification steps.

Finally, Annapareddy et al. investigated the use of the YOLOv5 single shot detector to extract and classify handwritten intravenous medications and digitize the physiological indicators section [9]. Due to the large number of classes in the medication and physiological indicator sections, they found that models attempting both detection and classification were generally unable to do either well, for lack of sufficient data in each class. However, models trained on a single class performed much better in detection, but could not classify.

Methods

The extraction of data from an anesthesia paper chart begins with optimizing the lighting of the smartphone photographs, removing shadows, and using object detection to find document landmarks for use in removing perspective distortion. Then, each section of the chart is identified by a YOLOv8 model and cropped out of the chart. YOLOv8 models trained to detect the handwritten blood pressure symbols, numbers, and checkboxes used in anesthesia paper charts produce lists of bounding boxes, which a combination of convolutional neural networks, traditional computer vision, machine learning, and rule-based algorithms then uses to impute meaningful values and detect errors.

Image optimization techniques

To maximize the accuracy of digitization, the input images need to be optimized as follows: (1) shadows removed, (2) pixel intensities standardized and normalized, (3) perspective distortions such as rotation, shear, and scaling corrected, and (4) the general location of document landmarks fixed. We accomplish this by first removing shadows using image morphology techniques, then normalizing and standardizing the pixel values of the images, and finally correcting perspective distortions and approximately fixing the location of document landmarks using a homography transformation.

Shadow removal

Smartphone photographs of the anesthesia paper chart often suffer from sudden changes in pixel intensity caused by shadows cast onto the page, which break up the lighting. Sudden changes in pixel values can cause difficulty for deep learning models, which learn representations of objects as functions of weighted sums of pixels. Therefore, both normalization and shadow removal are necessary to optimize our inputs and maximize detection accuracy. One algorithm for accomplishing this is outlined by Dan Mašek in a Stack Overflow post from 2017 (Algorithm 1) [10].

Algorithm 1 Basic Shadow Removal

The exact values for the median blur and dilation operations are subject to the image's size and degree of shadow and can be tuned to the dataset. This algorithm only operates on grayscale images, but since no information in the anesthesia paper charts is encoded with color, we converted our charts to grayscale. We did not use any metrics to assess shadow removal, but a visual inspection of the output shows that the resulting images no longer suffer from a lighting gradient (Fig. 2).

Fig. 2 Example of an anesthesia paper chart before and after the removal of shadows and normalization. The dilated, blurred image is subtracted pixel-wise from the original image to produce the final result
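For illustration, a minimal sketch of this shadow-removal step in Python with OpenCV follows. The kernel and blur sizes are assumptions that must be tuned to the image resolution and degree of shadow, and the file name is a placeholder rather than a path from our released code [15].

import cv2
import numpy as np

def remove_shadows(gray: np.ndarray) -> np.ndarray:
    # Dilation suppresses thin pen strokes, leaving mostly paper background.
    background = cv2.dilate(gray, np.ones((7, 7), np.uint8))
    # A large median blur smooths the background into a lighting estimate.
    background = cv2.medianBlur(background, 21)
    # Subtract the lighting estimate from the original image and invert.
    diff = 255 - cv2.absdiff(gray, background)
    # Stretch the result to use the full 0-255 intensity range.
    return cv2.normalize(diff, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)

chart = cv2.imread("chart.jpg", cv2.IMREAD_GRAYSCALE)
cleaned = remove_shadows(chart)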

The planar homography

The planar homography is defined as the most general linear mapping of all the points contained within one quadrilateral to the points of another quadrilateral (Fig. 3). A planar homography was used to correct perspective distortions within the smartphone image.

Fig. 3 An illustration of a homography performing a general linear mapping of the points of one quadrilateral to another. Images suffering from perspective distortions can have much of their error corrected by finding four anchor points on the image, and using them as the four points on a quadrilateral to map to a perfect, scanned sheet

Translation, rotation, scaling, affine, and shear transformations are all subsets of the homography, and the homography in turn can be decomposed into these transformations. Here, as in many other computer vision applications, the homography is used to correct linear distortions in the image caused by an off-angle camera perspective (Fig. 4).

Fig. 4 An illustration of perspective-based distortion due to an off-angle camera. Even the most vigilant camera operators will have some degree of perspective distortion [11]

In order to compute a useful homography for document correction, four document landmarks need to be identified on a target anesthesia paper chart image. The same four landmark locations are then identified on a scanned, perfectly aligned control anesthesia paper chart image. We trained a YOLOv8 model to detect the document landmarks “Total”, “Time”, “Procedure Details”, and “Patient Position”, which fall in the four corners of the anesthesia paper chart described in Fig. 1. We then used the OpenCV python package to compute the homography between the two sheets and warp the target image accordingly (Fig. 5). The benefit of this method is that the homography computation is robust to failure due to YOLOv8's high accuracy, even under sub-optimal conditions. In cases where the planar homography failed to correct the distortion, clear errors were found on the anesthesia paper chart, including: (1) landmarks obscured by writing, (2) landmarks covered by other pieces of paper, and (3) landmarks not included in the smartphone image at all. At first glance, this deep object detection approach seems excessive, as there are a number of traditional computer vision methods for automatic feature matching between two images, such as ORB and SIFT. However, the variance in lighting and blurriness in our dataset posed challenges for these non-deep algorithms, which often failed silently, mistaking one landmark for another and warping images such that they were unidentifiable.

Fig. 5 An illustration of correction using a homography on an image of the anesthesia paper chart. Perspective-based distortions are corrected
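A minimal sketch of the homography computation with OpenCV is shown below. The landmark coordinates here are hypothetical stand-ins for the box centers returned by the YOLOv8 landmark detector, and the output size is assumed to match the scanned control sheet.

import cv2
import numpy as np

# Hypothetical pixel locations of the four landmarks ("Total", "Time",
# "Procedure Details", "Patient Position") found on the target photograph.
target_pts = np.array([[118, 131], [2355, 96], [2389, 3204], [87, 3226]],
                      dtype=np.float32)
# Locations of the same landmarks on the scanned, aligned control sheet.
control_pts = np.array([[100, 120], [2380, 120], [2380, 3220], [100, 3220]],
                       dtype=np.float32)

homography, _ = cv2.findHomography(target_pts, control_pts)
photo = cv2.imread("chart_photo.jpg")
# Warp the photograph so its landmarks line up with the control sheet.
corrected = cv2.warpPerspective(photo, homography, (2480, 3300))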

Section extraction

There are seven sections which encode different pieces of intraoperative information on the anesthesia paper chart (Fig. 1). Due to nonlinear distortions in the image, the homography is not a perfect pixel-to-pixel matching from the target image to the scanned control image. Therefore, an alternative method of identifying the precise location of the sections is required. We accomplished this by training a model to place a bounding box around each section. Because the homography already normalizes the locations of the sections to within a few dozen pixels, one of the smallest YOLOv8 architectures, YOLOv8s, was sufficient to extract the different sections.
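As a sketch, the trained section model can crop each section with the ultralytics package as follows; the weights file name and class labels are placeholders, not necessarily the names used in our released code [15].

import cv2
from ultralytics import YOLO

model = YOLO("sections.pt")  # placeholder name for our trained YOLOv8s weights
chart = cv2.imread("corrected_chart.jpg")
result = model(chart)[0]

sections = {}
for box in result.boxes:
    x1, y1, x2, y2 = (int(v) for v in box.xyxy[0].tolist())
    name = result.names[int(box.cls)]  # e.g., "blood_pressure", "checkboxes"
    sections[name] = chart[y1:y2, x1:x2]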

Image tiling for small object detection

The anesthesia paper chart is characterized by handwritten symbols (e.g., medication, numerical, and blood pressure symbols) that are small and often tightly packed together (Fig. 1). Single shot detectors like YOLO struggle to separate and identify these handwritten symbols due to their use of a grid which assigns responsibility for the center of a single object to a single cell. One solution is to increase the image size; however, since YOLO pads all images to squares and the number of pixels in a square image grows quadratically with its side length, training memory usage and detection time increase quadratically as well. To overcome this problem, we used an approach called image tiling, where we divided the image into smaller pieces called tiles and trained on the tiles rather than the entire image. This increased the size of these small objects relative to the frame, yielding much better object detections.

There are, however, several challenges associated with image tiling. First, objects larger than the tiles cannot fit into a single tile and will be missed by the model. All the handwritten symbols in our dataset were small and uniform in size, allowing us to use image tiling without the risk of losing any detections. Second, detecting on every sub-image increases detection time. While this may be an issue in real-time detection, the difference is only several hundred milliseconds, which does not affect our use case. Third, the number of unique images and total objects in a single training batch will be smaller, causing the model's weights to receive noisy updates and requiring longer training. We addressed this by using the memory savings from tiling to double the training batch size from 16 to 32. In addition, due to the very large number of empty tiles, we randomly added only a small proportion of them to the training dataset, which further increased the object-to-tile ratio. Finally, objects which lie on the border of two tiles will not be detected, since they do not reside fully in either image. Our solution is not to divide the image into a strict grid, but instead to treat the tiling process as a sliding window which moves by one half of its width or height every step. With this approach, if an object is on the edge of one sub-image, it will be directly in the center of the next one (Fig. 6). This solution introduces its own challenge, since nearly every detection will be double counted when the detections are reassembled. We therefore compute the intersection-over-union of every bounding box with every other bounding box at detection time, group together boxes whose intersection-over-union is greater than a given threshold, and combine them into one detection. Since the objects we are detecting should be well separated and never overlap, this removes the doubled detections.

Fig. 6 An example of our implementation of image tiling. By using a sliding window rather than a grid, the edge of one image is the center of the next one [12]
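The sliding-window tiling and duplicate-merging logic can be sketched as follows. The tile sizes, the intersection-over-union threshold, and the box-averaging strategy are assumptions for illustration, to be tuned to the dataset.

import numpy as np

def tile_image(image: np.ndarray, tile_w: int, tile_h: int):
    """Yield (offset, tile) pairs from a window that slides by half its
    size, so an object on the edge of one tile is central in a neighbor."""
    h, w = image.shape[:2]
    for y in range(0, max(h - tile_h, 0) + 1, max(tile_h // 2, 1)):
        for x in range(0, max(w - tile_w, 0) + 1, max(tile_w // 2, 1)):
            yield (x, y), image[y:y + tile_h, x:x + tile_w]

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def merge_duplicates(boxes, threshold=0.5):
    """Group overlapping detections from adjacent tiles into one box."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) > threshold:
                # Average the two boxes rather than keeping both.
                merged[i] = tuple((p + q) / 2 for p, q in zip(box, kept))
                break
        else:
            merged.append(tuple(box))
    return merged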

Blood pressure symbol detection and interpretation

The blood pressure section encodes blood pressure values using arrows, and heart rate using dots or lines. Each vertical line on the grid indicates a five minute epoch of time during which a provider records a blood pressure and heart rate reading (Fig. 1). The y-axis encodes the value of blood pressure in mmHg, and each horizontal line denotes a multiple of ten (Fig. 1).

Symbol detection

Systolic blood pressure values are encoded by a downward arrow, and diastolic blood pressure values by an upward arrow. Because the downward and upward arrows are identical when reflected over the x-axis, we were able to collapse the two classes into one. We then trained a YOLOv8 model on the single “arrow” class; during detection, we run the model on the image and on an upside-down version of it to obtain systolic and diastolic detections, respectively. Finally, the diastolic detections' y-values are subtracted from the image's height to correct for the flip.
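A sketch of this flip-and-detect trick is shown below, assuming a single-class arrow model saved under a placeholder file name.

import cv2
from ultralytics import YOLO

model = YOLO("arrows.pt")  # placeholder single-class "arrow" weights
section = cv2.imread("bp_section.jpg")
height = section.shape[0]

# Detecting on the original image yields the systolic (downward) arrows.
systolic = model(section)[0].boxes.xyxy.tolist()

# Detecting on a vertically flipped copy yields the diastolic arrows.
diastolic = []
for x1, y1, x2, y2 in model(cv2.flip(section, 0))[0].boxes.xyxy.tolist():
    # A flipped y-coordinate v maps back to height - v, swapping y1 and y2.
    diastolic.append([x1, height - y2, x2, height - y1])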

Thereafter two key pieces of information are required from each of the bounding boxes: (1) its value in millimeters of mercury (mmHg), and (2) its timestamp in minutes.

Inferring mmHg values from blood pressure symbol detections

The value of blood pressure encoded by an arrow corresponds to the y-pixel of the arrow's tip. By associating a blood pressure value with each y-pixel in the blood pressure section, we can obtain a value for each blood pressure bounding box. We trained a YOLOv8 model to identify the 200 and 30 legend markers; once located, these two markers let us interpolate the value of blood pressure for each y-pixel between the 200 and 30 bounding boxes (Fig. 7).

Fig. 7 By dividing the space between the 30 and 200 bounding boxes equally, we can find the blood pressure values of each y-pixel. We ran the algorithm on this image, and set all the y-pixels that were multiples of 10 to red. We can see the efficacy of the algorithm visually as the detections cover the lines on the image almost perfectly
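The interpolation itself reduces to a linear map between the two legend detections; a minimal sketch, where y200 and y30 stand for the y-pixel centers of the detected 200 and 30 markers:

def mmhg_at(y: float, y200: float, y30: float) -> float:
    """Linearly interpolate the blood pressure value at y-pixel y, given
    the y-pixels of the 200 and 30 legend markers (y200 < y30 in image
    coordinates, since the 200 marker sits higher on the page)."""
    mmhg_per_pixel = (200 - 30) / (y30 - y200)
    return 200 - (y - y200) * mmhg_per_pixel

# Example: a point halfway between the markers reads 115 mmHg.
print(round(mmhg_at(550.0, 100.0, 1000.0), 1))  # 115.0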

Assigning timestamps to blood pressure symbol detections

To impute timestamps, we wrote an algorithm that applies timestamps based on the relative x distances between the systolic and diastolic detections (algorithm 2).

Algorithm 2 Imputing a Time Stamp to a Blood Pressure Bounding Box

Missing detections are a common problem when applying timestamps. Our algorithm deals with this in two ways. The while loop checks whether two boxes are within 1% of the image's width of one another, ensuring they are not too far apart to plausibly match before pairing them. If a box has no partner within the 1% range, the algorithm considers it unmatched. Another problem occurs when there are no detections for a five minute epoch. This is solved by sampling the distance between true matches in the dataset. We found that 100% of the matches were within 0.016 times the image's width of the next matching pair. So, adding a small margin for error, if a match is more than 0.018 times the image's width from the next pair, a time gap of 10 min is applied instead of the typical 5.
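A simplified sketch of Algorithm 2 under these thresholds follows. The inputs are the sorted x-centers of the systolic and diastolic detections; the pairing and gap tolerances are the fractions of image width given above, and the function names are ours for illustration.

def assign_timestamps(sys_xs, dia_xs, width, pair_tol=0.01, gap_tol=0.018):
    """Pair sorted systolic/diastolic x-centers and impute timestamps.

    Boxes within pair_tol * width of one another form one reading; a
    horizontal jump beyond gap_tol * width from the previous reading
    implies a skipped five minute epoch (a 10 min step).
    """
    readings, i, j = [], 0, 0
    while i < len(sys_xs) or j < len(dia_xs):
        s = sys_xs[i] if i < len(sys_xs) else None
        d = dia_xs[j] if j < len(dia_xs) else None
        if s is not None and d is not None and abs(s - d) <= pair_tol * width:
            readings.append(((s + d) / 2, s, d))
            i, j = i + 1, j + 1
        elif d is None or (s is not None and s < d):
            readings.append((s, s, None))  # systolic with no diastolic match
            i += 1
        else:
            readings.append((d, None, d))  # diastolic with no systolic match
            j += 1

    timestamped, minutes = [], 0
    for k, (x, s, d) in enumerate(readings):
        if k > 0:
            minutes += 10 if x - readings[k - 1][0] > gap_tol * width else 5
        timestamped.append((minutes, s, d))
    return timestamped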

Blood pressure model training and error testing

A YOLOv8l model, the second largest YOLOv8 architecture, was trained to detect downward arrows for 150 epochs with a batch size of 32 images. The images used to train this model were tiled images of the blood pressure section where only the systolic arrows were annotated on unflipped images, and only the diastolic arrows were annotated on flipped images.

There are two ways that error will be assessed for the blood pressure section: detection error and inference error. Detection error will be computed using the normal object detection model metrics of accuracy, recall, precision, and F1. Inference error is the error between the value in millimeters of mercury the program assigned to a blood pressure detection on the whole image of the blood pressure section, and the ground truth value that was manually annotated. Blood pressure detections made by the program were hand matched with ground truth values during assessment in order to avoid the case where the correct blood pressure value was assigned to a different timestamp. The error metric we used for this was mean average error. The 30 chart images used for testing included 1040 systolic and diastolic marks (this number varies from the object detection testing set due to image tiling duplicating detections). The ability of the program to match blood pressure detections to a particular time stamp was not assessed.

Physiological indicators

The physiological indicators section is the most challenging section to digitize. Handwritten digits are written on the line that corresponds to the physiological data they encode, but are free to vary along the time axis rather than being discretely boxed in or listed in fixed increments. In addition, the individual digits which appear in the physiological indicators section must be concatenated into strings of digits to form the number the provider intended to write. Our approach to digitizing this section is described below.

Handwritten number detection

Our approach for the detection of numbers is a two-step process: (1) a YOLOv8 model trained on a single “digit” class which locates and bounds handwritten numbers, and (2) a RegNetY_1.6gf CNN that classifies those digits. There are two advantages to this method over using a single YOLOv8 model for both detection and classification. First, the distribution of digits in our training dataset was not uniform. For example, there are over one-thousand examples of the number ’9’ on the training charts, but only approximately 160 examples of the number ’5’ due to the typical range of oxygen saturation being between 90 and 99. This leads to the number 5 having much poorer box recall in a model that does both classification and localization. Visually, handwritten numbers are very similar to one another, so by collapsing each digit into a single “digit” class, the model can learn information about how to localize handwritten digits for numbers which are underrepresented by using numbers which are overrepresented. Second, there is an added advantage of training the classification CNN separately since the dataset can be augmented with images of digits not found on the anesthesia paper charts. We used the MNIST dataset to expand and augment our training dataset, providing sufficient examples from each class to attain a high accuracy [13].
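A sketch of this two-step pipeline follows, assuming the ultralytics and torchvision packages; the weights file names are placeholders, and the 224-pixel input size is an assumption for illustration.

import torch
from torchvision import transforms
from torchvision.models import regnet_y_1_6gf
from ultralytics import YOLO

detector = YOLO("digits.pt")  # placeholder single-class "digit" weights
classifier = regnet_y_1_6gf(num_classes=10)
classifier.load_state_dict(torch.load("regnety_digits.pt"))  # placeholder
classifier.eval()

prep = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Grayscale(num_output_channels=3),  # RegNetY expects 3 channels
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def read_digits(section):
    """Detect digit boxes with YOLOv8, then classify each crop with the CNN."""
    digits = []
    for x1, y1, x2, y2 in detector(section)[0].boxes.xyxy.tolist():
        crop = section[int(y1):int(y2), int(x1):int(x2)]
        with torch.no_grad():
            logits = classifier(prep(crop).unsqueeze(0))
        digits.append(((x1 + x2) / 2, (y1 + y2) / 2, int(logits.argmax())))
    return digits  # (x-center, y-center, digit) triples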

Matching each box to the corresponding row

Prior to clustering the digit bounding boxes together by proximity (Fig. 9), we had to find which row each box belongs to. For any given patient, between 0 and 7 rows were filled out, depending on the type of surgery and the ventilation parameter data recorded by the anesthesia provider. For the special cases where 0 or 1 rows were filled out, there were either no detected digits or the standard deviation of the y-centers of the detected digits was only a few pixels. For the case where there was more than one row, we used KMeans clustering on the y-centers of the digit bounding boxes with \(k \in [2, 3, 4, 5, 6, 7]\) and determined the number of rows by choosing the value of k which maximized the silhouette score, a metric which measures how well a particular clustering fits the data. To determine which row a cluster encodes, we examined the y-centroids of clusters from 30 sheets, and found that the distribution of y-centroids for a particular row never overlapped with any other row. This meant that there were distinct ranges of y-pixels corresponding to each row, allowing us to determine which row a cluster encodes by finding which range contained the y-centroid of the cluster (Fig. 8).

Fig. 8 Clustered detections in the physiological indicator section using the KMeans clustering algorithm, and selecting K based on the maximum silhouette score
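The KMeans-silhouette selection can be sketched with scikit-learn as follows; the y-center values in the usage example are hypothetical.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(y_centers, k_values):
    """Fit KMeans for each candidate k and keep the clustering with the
    highest silhouette score."""
    y = np.asarray(y_centers, dtype=float).reshape(-1, 1)
    best = (None, -1.0, None)  # (k, score, labels)
    for k in k_values:
        if not 2 <= k < len(y):  # silhouette needs 2 <= k <= n - 1
            continue
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(y)
        score = silhouette_score(y, labels)
        if score > best[1]:
            best = (k, score, labels)
    return best

k, score, row_labels = best_kmeans(y_centers=[55, 58, 120, 118, 61],
                                   k_values=range(2, 8))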

Clustering single digit detections into multi-digit detections

Once each row had been assigned an ordered list of boxes, we clustered those boxes into observations that encode a single value (Fig. 9). This was done with the same KMeans-silhouette method used to find which row each digit bounding box corresponds to. In order to narrow down the search for the correct value of k, we used the plausible range of values for each row. For example, the first row encodes oxygen saturation, which realistically falls within the range \(\text {SpO}_2 \in [75, 100]\). If we let n be the number of digit bounding boxes, the minimum number of clusters would be realized if the patient had a \(100\%\) oxygen saturation for the entire surgery, leading to \(k = \lfloor n/3\rfloor\). In contrast, the maximum number would be realized if the patient never had a \(100\%\) oxygen saturation, leading to \(k = \lceil n/2\rceil\). Allowing for a margin of error of \(10\%\) on either side due to missed or erroneous detections, we fit a KMeans clustering model for each \(k \in [\lfloor n/3\rfloor - \lceil 0.1*n \rceil , \lceil n/2\rceil + \lceil 0.1*n\rceil ]\), and selected the value of k which maximized the silhouette score. For the other physiological parameter rows, we reassessed the plausible number of digits for that specific variable and obtained a new range of k values. The clusters created by the optimal KMeans model are then considered to be digits which semantically combine to form one value.

Fig. 9 Boxes from the SpO\(_2\) section clustered into observations using KMeans. A plausible range of values for k is determined by dividing the number of boxes by the highest and lowest plausible number of digits found in a cluster (3 and 2 for the SpO\(_2\) section, respectively). From this range, the k which maximizes the silhouette score is chosen
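The candidate range of k for a row can be derived from the plausible digit counts, as a sketch under the assumptions above; these candidates then feed the same silhouette-based selection shown earlier.

import math

def candidate_ks(n_boxes, min_digits, max_digits, margin=0.1):
    """Candidate cluster counts for n_boxes digit boxes when each value
    contains between min_digits and max_digits digits (2 and 3 for SpO2)."""
    pad = math.ceil(margin * n_boxes)  # slack for missed/extra detections
    lowest = math.floor(n_boxes / max_digits) - pad   # all values long
    highest = math.ceil(n_boxes / min_digits) + pad   # all values short
    return range(max(lowest, 1), highest + 1)

# With 12 detected digit boxes in the SpO2 row, search k in [2, 8].
print(list(candidate_ks(12, min_digits=2, max_digits=3)))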

The only row which does not conform to this paradigm is the tidal volume row. In this row, an “X” separates the tidal volume in milliliters from the respiratory rate in breaths per minute. To detect semantic groupings of digits, we used the fact that tidal volume is nearly always three digits and respiratory rate is nearly always two digits, with an “X” mark in the center, and shaped our search accordingly. A small CNN, trained as a one-vs-rest model to detect the “X” mark, was then used to separate the tidal volume from the respiratory rate.

Assigning a value to each multi-digit detection cluster

We trained a RegNetY CNN model to classify images of handwritten numbers by combining the MNIST dataset with the digits from the charts we labeled. Initially, the program runs the model on each digit in a cluster and concatenates the results to form a single value. However, due to the poor quality of handwriting, our test set classification accuracy was approximately 90%, rather than the 99% or greater that is achievable with most modern CNNs on the MNIST dataset.

One way to minimize this error is to check whether the assigned value is biologically plausible. The program first checks whether the concatenated characters of a section fall in a plausible range for each row. For example, if SpO\(_2 \not \in [75\%, 100\%]\), the program marks the observation as implausible. In addition, if the absolute difference between a value and the values immediately before or after it is larger than a one-sided tolerance interval constructed from the differences we observed in the dataset, the program also marks it as implausible. For example, if an observation for SpO\(_2\) is truly 99, but the model mistakes it for 79, and the observations just before and after it are 98 and 100 respectively, the observation is marked as implausible since SpO\(_2\) is very unlikely to fall and recover that rapidly. If an observation is marked as implausible, the program imputes a value by fitting a linear regression line to the previous two and next two plausible values, and predicts the current value by rounding the output of the regression model at the unknown point.
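A sketch of the range check and regression imputation follows; the SpO2 bounds are the ones given above, and the tolerance-interval difference check is omitted for brevity.

import numpy as np

def impute_implausible(values, lo=75, hi=100):
    """Replace out-of-range readings by a linear fit through the nearest
    plausible neighbors (up to two on each side), rounded to an integer."""
    values = list(values)
    plausible = lambda v: v is not None and lo <= v <= hi
    for i, v in enumerate(values):
        if not plausible(v):
            neighbors = [j for j in range(max(0, i - 2), min(len(values), i + 3))
                         if j != i and plausible(values[j])]
            if len(neighbors) >= 2:
                slope, intercept = np.polyfit(neighbors,
                                              [values[j] for j in neighbors], 1)
                values[i] = int(round(slope * i + intercept))
    return values

print(impute_implausible([98, 99, 42, 100, 99]))  # [98, 99, 99, 100, 99]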

Physiological indicator model and error testing

A YOLOv8l model was trained to detect one class, handwritten digits, for 150 epochs with a batch size of 32.

A RegNetY_1.6gf model was trained on a mixture of observations cropped from the charts and the MNIST dataset. The model was validated and tested on observations only from the charts. The training set contained 88571 observations, while the validation and testing sets had 7143 observations each. The model was trained for 25 epochs, and images were augmented using Torchvision's autoaugment transformation under the 'imagenet' autoaugment policy.

Error for object detection will be assessed with accuracy, precision, recall, and F1. Error for classifying numbers will be reported using only accuracy. The error for inferring a value from the classified object detections will be assessed using mean average error on each of the 5 physiological indicators on all 30 test charts. Using the output of the program and the ground truth dataset, we will compute the mean average error by index value of the lists. For example, let the program output be (99, 98, 97) and the ground truth from the chart image be (98, 99, 100). Then the matched values are ((99, 98), (98, 99), (97, 100)), and the error would be computed as \((\Vert 99-98\Vert + \Vert 98-99\Vert + \Vert 97-100\Vert )/3\). If the ground truth and predictions vary in length, the longer of the two lists will be truncated to the length of the shorter.

Checkboxes

The checkbox section is a two class object detection and classification problem. Imputing a value can be made difficult if there are missing or erroneous detections.

Checkbox detection and classification

We labeled each checkbox from all the anesthesia paper charts in the dataset as checked or unchecked, and then trained a YOLOv8 model to detect and classify each checkbox in the image. Approximately one out of every twenty checkboxes that were intended to be checked did not actually contain a marking inside them. Instead, the marking would be placed on the text next to the box, slightly above the box, or adjacent to the box in some other location. We decided a priori to label these as checked because it was the intention of the provider to indicate the box as checked, and so that the model would begin to look to areas adjacent to the box for checks as well.

Assigning meaning to checkboxes

The checkboxes are arranged in columns (Fig. 1), so the algorithm for determining which bounding box corresponds to which checkbox starts by sorting the bounding boxes by x-center, then groups them using the columns that appear on the page, and sorts each group by y-center. For example, the left-most boxes “Eye Protection”, “Warming”, “TED Stockings”, and “Safety Checklist” on the anesthesia paper chart are all in the “Patient Safety” column, and have approximately the same x-center. The algorithm sorts all checkbox bounding boxes by x-center, selects the first four, then sorts them by y-value. Assuming there are no missing or erroneous boxes, these first four bounding boxes should match the “Patient Safety” checkboxes they encode.
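A sketch of this sort-and-group assignment follows; the function name and the tuple layout of the detections are ours for illustration, and it assumes exactly one detection per checkbox on the sheet.

def assign_checkboxes(detections, columns):
    """Match checkbox detections to their labels by column, then row.

    detections: list of (x_center, y_center, checked) tuples.
    columns: ordered left-to-right list of label lists, e.g. the
        "Patient Safety" column is ["Eye Protection", "Warming",
        "TED Stockings", "Safety Checklist"].
    """
    ordered = sorted(detections, key=lambda d: d[0])  # sort by x-center
    assigned, start = {}, 0
    for labels in columns:
        group = ordered[start:start + len(labels)]
        group.sort(key=lambda d: d[1])                # then by y-center
        for label, (_, _, checked) in zip(labels, group):
            assigned[label] = checked
        start += len(labels)
    return assigned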

Checkbox model training and error testing

A YOLOv8l model was trained to detect and classify checkboxes for 150 epochs using a batch size of 32. Error will be reported by overall accuracy, precision, recall, and F1 score. Sheets where the number of detections does not match the number of checkboxes will be removed from the error calculation, and the number of sheets where this occurred will be reported.

In addition to detection and classification, the program’s ability to correctly infer which checked/unchecked bounding box detection associates with which checkbox will be assessed. This error will be quantified with accuracy, precision, recall, and F1.

Results and discussion

Our testing results were based on a 30 chart holdout set. We report accuracy on this set rather than on the testing sets used during YOLO training because image tiling duplicates many of the labels, which would yield an accuracy that does not reflect performance on whole sections of the chart. While not reported, in all cases the test and validation sets had nearly identical metrics, suggesting the models were generalizing.

Section extraction

On the 30 test charts, the model for extracting the sections of the anesthesia paper chart achieved an average box precision of 0.99, an average box recall of 0.99, and an mAP0.5-95 of 0.97. Because the handwritten symbols lie on the interior of the sections rather than at the edges, this small error is, for our purposes, equivalent to a perfect model, since it never cut off the important data elements in the sections.

Blood pressure

Detection errors were computed using the full test set of 30 images, which in total had 1040 systolic and diastolic marks. Inference errors were computed using the first 5 images, which in total had 141 systolic and diastolic markers. This set is smaller because the systolic and diastolic markers were manually matched with their ground truth counterparts due to 8 erroneous extra markers and 2 missed markers.

Detection error

Table 1 demonstrates that our new method has a slightly lower accuracy rate. However, it is important to note that the previous method was tested on scanned, synthetic anesthesia paper chart images, whereas the new method was tested on smartphone images of anesthesia paper charts from real cases.

Table 1 Blood pressure YOLOv8 dataset

Inference error

The mean average error for inferring a mmHg measurement from a blood pressure detection was only approximately 1.25 mmHg, and did not vary greatly (Table 2). While not listed, the mean squared error also remains small, suggesting the error we observe did not come from a few very incorrect observations. Rather, it came from most observations being some small distance away from the true value.

Table 2 Physiological indicator YOLOv8 dataset

The MAE for imputing a value in mmHg to a blood pressure detection is much lower than that of previous methods. The MAE of the new method is within the variance that human beings assign to the handwritten symbols and is clinically insignificant.

Physiological indicators

Detection error

By passing the output bounding boxes of the single class YOLOv8 model to the classification CNN, we can get an end-to-end detection error for single characters. The overall accuracy was 85.2%, but this metric varied greatly between digits, primarily due to lower representation of certain digits in the training dataset and to handwritten digits that resemble one another (e.g., 7, 2, and 9).

Inference error

Obtaining an error for the imputed value of the physiological indicators is challenging. Approximately one out of every six characters that should be detected was not (false negative), and one out of every twenty proposed boxes was not actually a character, but was instead a percentage sign or other nondigit pen marking (false positive). In addition, there were relatively few examples of FiO\(_2\) (inspired oxygen concentration) and EtCO\(_2\) (end tidal carbon dioxide) in the test set, making their error highly dependent on the quality of the small number of sheets which did record them.

Therefore, we assessed error only on observations in which at least one character was detected, and decided a priori to exclude those which were completely undetected. In addition, we left in any erroneous boxes that were clustered together with an observation.

We identified that handwriting quality had a very large positive effect on inference accuracy, so to determine a best-case error we created five synthetic sheets, filled them with an average of 35 plausible datapoints per sheet, and took images of them with smartphones in lighting similar to the real dataset. Table 3 contains the average and squared error for each section on the real and synthetic anesthesia paper chart test sheets. The inference error on the synthetic sheets was near zero and much more consistent than on the real anesthesia paper charts, where the error was comparatively higher and more variable. When a smartphone application is developed for use by physicians, we believe that handwriting will improve to meet that of the synthetic sheets due to the Hawthorne effect [14].

Table 3 Physiological indicator RegNetY dataset

Checkboxes

1117 checkboxes from 29 of the 30 test set images were used for assessing error. One test set image was excluded because it was too blurry to annotate manually. The accuracy metrics in Table 4 demonstrate improvement in all measures compared to previous approaches.

Table 4 Checkbox YOLOv8 dataset

Detection error

Some checkboxes had markings which were not strictly inside the checkbox they were intended to mark, but were still labeled as checked in the training dataset since the intention of the provider was to check them. Because of this, the model learned to look in the space immediately around the checkbox for markings, and was able to correctly classify some checkboxes that did not have markings inside them (Tables 5, 6, 7 and 8).

Table 5 Blood pressure detection accuracy
Table 6 Blood pressure inference error
Table 7 Real test sheet versus synthetic test sheet error rates
Table 8 Checkbox detection accuracy

Inference error

To increase the accuracy of the data being extracted from the sheets, our implementation of the checkbox detection algorithm was written to throw an error if it did not detect exactly the expected number of checkboxes on the sheet. Our program did so on 4 of the 29 sheets in the test dataset (13.8%). Among the remaining 25 sheets, the program inferred the exact box that was being checked almost perfectly. The conditional error metrics are reported in Table 9.

Table 9 Checkbox inference accuracy

Impact of image preprocessing

To assess the impact of both homography and deshadowing, errors were recomputed without them. We found the homography to raise accuracy across all metrics, while deshadowing had no effect on accuracy (Tables 10, 11, 12, 13, 14, 15).

Table 10 Blood pressure detection accuracy
Table 11 Blood pressure inference error
Table 12 Physiological indicator detection accuracy
Table 13 Physiological indicator inference error
Table 14 Checkbox detection ablation metrics
Table 15 Checkbox inference ablation metrics

Blood pressure

Physiological indicators

The effect of preprocessing on the physiological indicator section was unclear. Removing deshadowing raised the number of correctly detected digits by 3%, and removing both the homography correction and deshadowing had varying effects on the inference of values for the detections (Tables 12, 13).

Checkboxes

The checkboxes showed very little performance loss when the deshadowing component was removed, but showed a small, notable drop in metrics when the homography correction was removed (Tables 14, 15). Removing the homography also caused one additional sheet from the test dataset to lack the correct number of detections for imputing meaning to the checkbox detections.

Conclusion

In this manuscript we discussed the integration of previous research into one piece of software and the improvement of algorithms for extracting handwritten data from smartphone photographs of anesthesia paper health records. While electronic medical records are not a feasible solution for LMICs in the near future, we have demonstrated that it is possible to extract high-quality data elements from anesthesia paper charts utilizing locally available, low-cost resources such as a smartphone. Through the use of deep neural networks, and the careful filtering and correction of their output by classical machine learning models and algorithms, we were able to improve the digitization of blood pressure and checkboxes to near perfect accuracy under realistic photography and lighting conditions. In addition, we demonstrated that, with careful and legible handwriting, physiological data can likewise be digitized with high accuracy. Our work is an important step in improving access to data for health care providers in LMICs, and a major advance in providing access to data for real-time, point-of-care clinical decision support.

Challenges and limitations

Image and chart quality

We have demonstrated the ability of the program to digitize multiple components of the anesthesia paper chart with high accuracy. However, as demonstrated with the digitization of the physiological indicators, poor or illegible handwriting and poor image quality make extraction difficult and are responsible for the majority of errors in the system. It is important to note that model development was done on previously archived anesthesia paper charts. We believe that in the future there will likely be a Hawthorne effect, with improved handwriting quality once health care providers are aware that paper health records will be digitized [14]. This will improve the accuracy of the physiological data.

Single site usage

Anesthesia paper health charts are not standardized, with different hospitals having their own unique chart. This means that our current software will only work on a single version of the chart at a single hospital.

Future work

Improvement of error detection and inference algorithms

For our initial implementation of the system, we either (1) kept the algorithms for imputing values to erroneous detections simple, using only linear models and filtering algorithms, or (2) left them out entirely, as in the case of the checkboxes. The software we developed can now be used to test and compare local or nonlinear regression algorithms for imputing values, and new filtering methods for detecting erroneous values.

Digitization of remaining chart elements

There are several reasons why the remaining anesthesia paper chart elements remain undigitized. In our current dataset, Inhaled Volatile Medications (Fig. 1, Section B), Intravenous Fluids (Fig. 1, Section C), and Blood and Blood Product Transfused (Fig. 1, Section D) were infrequently recorded. In addition, the transfusions and intravenous fluids sections are completely free text; the heart rate encoding is not consistent, with some anesthesia paper records using a dot whereas others use a straight line; and the intravenous drugs section is particularly hard to read, even for human clinicians. The inhaled anesthetics, however, could be digitized, since they are simple checkboxes and digits, both of which are currently readable. Other techniques for digitizing the data could also become available in the future, especially with a potentially larger training dataset. If a smartphone app implemented our code as a full system, providers could list the drugs they used, eliminating the most difficult section while imposing only a minor amount of extra work for anesthesia providers.

Prospective creation of a new intraoperative sheet

Anesthesia paper health charts are not standardized, with different hospitals having their own unique chart. Immense time and effort are required to digitize one unique anesthesia paper health chart. To ensure future success for this project, our next goal is to design a standardized, machine-readable anesthesia paper chart through a collaborative effort between anesthesia providers from LMICs and computer vision engineers using a Delphi approach. By creating a chart prospectively, chart sections that are currently outside our ability to digitize accurately, such as the intravenous fluids, transfusions, and intravenous drugs, could be redesigned with machine readability in mind. For example, the intravenous drugs could have a three digit alphanumeric code written alongside the name of the medication, allowing the machine to accurately read drugs and circumventing the need to read handwritten words altogether. A smartphone app that sends images of charts to a server for processing could also store a medication-to-code dictionary so providers can easily look up the code of a medication. Findings and knowledge gained from this work will guide future efforts to digitize paper charts from nonsurgical locations such as the emergency room, obstetrical delivery areas, and critical care units.

Availability of data and materials

The data for this paper are not available due to the protected health information contained in the images.

Code availability

The code for this project can be found in reference [15].

Abbreviations

BPM:

Beats per minute

CHUK:

University Teaching Hospital of Kigali

CNN:

Convolutional neural network

EMRs:

Electronic medical records

EtCO\(_2\) :

End tidal carbon dioxide

FiO\(_2\) :

Fraction of inspired oxygen

LMICs:

Low-middle income countries

MAE:

Mean average error

mAP:

Mean average precision

mmHg:

Millimeters of mercury

MSE:

Mean squared error

ORB:

Oriented FAST and rotated BRIEF

SpO\(_2\) :

Oxygen saturation

RegNetY 1.6gf:

Regulated residual network Y, 1.6 gigaflops

SIFT:

Scale-invariant feature transform

YOLOv5:

You only look once version 5

YOLOv8:

You only look once version 8

YOLOv8s:

You only look once version 8 small architecture

References

  1. Biccard BM, Madiba TE, Kluyts H-L, Munlemvo DM, Madzimbamuto FD, Basenero A, Gordon CS, Youssouf C, Rakotoarison SR, Gobin V, Samateh AL, Sani CM, Omigbodun AO, Amanor-Boadu SD, Tumukunde JT, Esterhuizen TM, Manach YL, Forget P, Elkhogia AM, Mehyaoui RM, Zoumeno E, Ndayisaba G, Ndasi H, Ndonga AKN, Ngumi ZWW, Patel UP, Ashebir DZ, Antwi-Kusi AAK, Mbwele B, Sama HD, Elfiky M, Fawzy MA, Pearse RM. African Surgical Outcomes Study (ASOS) investigators: perioperative patient outcomes in the African surgical outcomes study: a 7-day prospective observational cohort study. Lancet. 2018;391(10130):1589–98.


  2. ASOS-2 Investigators. Enhanced postoperative surveillance versus standard of care to reduce mortality among adult surgical patients in Africa (ASOS-2): a cluster-randomised controlled trial. Lancet Glob Health. 2021;9(10):1391–401.

  3. Durieux ME, Naik BI. Scientia potentia est: striving for data equity in clinical medicine for low- and middle-income countries. Anesth Analg. 2022;135(1):209–12.


  4. Akanbi MO, Ocheke AN, Agaba PA, Daniyam CA, Agaba EI, Okeke EN, Ukoli CO. Use of electronic health records in sub-Saharan Africa: progress and challenges. J Med Trop. 2012;14(1):1–6.


  5. Ohuabunwa EC, Sun J, Jean Jubanyik K, Wallis LA. Electronic medical records in low to middle income countries: the case of Khayelitsha hospital, South Africa. Afr J Emerg Med. 2016;6(1):38–43. https://doi.org/10.1016/j.afjem.2015.06.003.


  6. Rho V, Yi A, Channavajjala B, McPhillips L, Nathan SW, Focht R, Ohene N, Adorno W, Durieux M, Brown D. Digitization of perioperative surgical flowsheets. In: 2020 systems and information engineering design symposium (SIEDS), pp. 1–6 (2020). https://doi.org/10.1109/SIEDS49339.2020.9106679

  7. Adorno W, Yi A, Durieux M, Brown D. Hand-drawn symbol recognition of surgical flowsheet graphs with deep image segmentation. In: 2020 IEEE 20th international conference on bioinformatics and bioengineering (BIBE), pp. 295–302 (2020). https://doi.org/10.1109/BIBE50027.2020.00055

  8. Murphy E, Samuel S, Cho J, Adorno W, Durieux M, Brown D, Ndaribitse C. Checkbox detection on Rwandan perioperative flowsheets using convolutional neural network. In: 2021 systems and information engineering design symposium (SIEDS), pp. 1–6 (2021). https://doi.org/10.1109/SIEDS52267.2021.9483723

  9. Annapareddy N, Fallin K, Folks R, Jarrard W, Durieux M, Moradinasab N, Naik B, Sengupta S, Ndaribitse C, Brown D. Handwritten text and digit classification on Rwandan perioperative flowsheets via YOLOv5. In: 2022 systems and information engineering design symposium (SIEDS), pp. 270–275 (2022). IEEE

  10. Mašek D. Increase image brightness without overflow (2017). https://stackoverflow.com/a/44054699/16292661

  11. Rosengren P. Appoose: Homography-transl-bold.svg. https://commons.wikimedia.org/wiki/File:Homography-transl-bold.svg

  12. Flesier M. A Domestic Cat in Zeytinburnu. https://commons.wikimedia.org/wiki/File:A_domestic_cat_in_Zeytinburnu.jpg

  13. Deng L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process Mag. 2012;29(6):141–2.


  14. Edwards K-E, Hagen SM, Hannam J, Kruger C, Yu R, Merry AF. A randomized comparison between records made with an anesthesia information management system and by hand, and evaluation of the Hawthorne effect. Can J Anaesth. 2013;60(10):990–7.


  15. Folks RD. Rwandan-Flowsheet-Digitizer. https://github.com/RyanDoesMath/Rwandan-Flowsheet-Digitizer


Acknowledgements

Our team would like to acknowledge Michael G. Rich, Jose P. Trejo, and Faiz M. Plastikwala for their work in labeling much of the training data for the project. We would also like to thank Dr Christian Ndaribitse for providing the data for this project.

Funding

The Lacuna Fund funded the collection of anesthesia paper chart images for the creation of a dataset.

Author information

Authors and Affiliations

Authors

Contributions

RDF wrote the code for the project, trained the deep learning models, evaluated their accuracy, labeled datasets, and helped write and draft the manuscript. BIN organized data collection and helped write and draft the manuscript. MED helped write and draft the manuscript. DEB helped write and draft the manuscript.

Corresponding author

Correspondence to Ryan D. Folks.

Ethics declarations

Ethics approval and consent to participate

The data utilized for this project was obtained from CHUK after receiving Institutional Review Board (IRB) approval from the University of Rwanda (no. 029/College of Medicine and Health Sciences IRB/2020), the University Teaching Hospital of Kigali (EC/Centre Hospitalier Universitaire De Kigali/049/2020, July 13, 2020), and the University of Virginia (Health Sciences Research no. 22259).

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Folks, R.D., Naik, B.I., Brown, D.E. et al. Computer vision digitization of smartphone images of anesthesia paper health records from low-middle income countries. BMC Bioinformatics 25, 178 (2024). https://doi.org/10.1186/s12859-024-05785-8
