Evaluating ChatGPT Generated RVU Assignment in Hand Trauma Operations
Ermina Lee, BS; Tiffany Shi, PhD; Victoria Lee, BS; Kelly Spiller, MD; Ryan Gobble, MD; Ann R. Schwentker, MD
University of Cincinnati
2025-01-10
Presenter: Ermina Lee
Affidavit:
Ermina, Tiffany, Victoria, and Dr. Schwentker contributed to data acquisition. All co-authors contributed to project design and approved of the abstract.
Director Name: Ann R. Schwentker
Author Category: Medical Student
Presentation Category: Clinical
Abstract Category: Hand
Background:
Hospital systems utilize Relative Value Units (RVUs) to assess productivity and allocate resources. Publicly available open-source artificial intelligence (OSAI) products, such as Chat Generative Pre-Trained Transformer (ChatGPT), are marketed to streamline medical billing. This study evaluates ChatGPT RVU assignment in hand trauma operations.
Methods:
A retrospective review identified 20 hand trauma operations from 2018 to 2023 with associated Current Procedural Terminology (CPT) codes. Three independent users queried ChatGPT-4.0 with: "Using the [year] Centers for Medicare & Medicaid Services (CMS) RVUs, calculate the appropriate RVUs for the following CPT codes," with prompted regional and facility adjustments. ChatGPT-generated RVUs were compared to each other and to the CMS Physician Fee Schedule (CMS-PFS) using R software (v.4.2.0).
Results:
A total of 51 CPT codes were queried. Total RVUs were overestimated by 39.6% when the mean of three ChatGPT trials (432.5 RVUs) was compared with the CMS-PFS (309.8 RVUs). The mean difference between CMS-PFS and ChatGPT RVU assignments for each operation's CPT codes was 7.3 RVUs (range 0.0 to 25.1). The mean absolute error between CMS-PFS and ChatGPT was 8.54 RVUs. No user could replicate RVUs exactly despite identical CPT input and query. Bland-Altman analyses showed no systematic bias between users (mean difference = 0) but a high standard deviation of differences (6.19) and wide limits of agreement (-12.13 to 12.13), indicating high variability and low precision between users.
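As an arithmetic check on the figures reported here, the short Python sketch below recomputes the percent overestimation and the Bland-Altman limits of agreement from the summary values alone. It is not the authors' R analysis. The function names are hypothetical, and the 1.96 multiplier assumes the conventional 95% limits of agreement, which the abstract does not state explicitly.

```python
# Summary values taken directly from the Results section.
cms_total = 309.8       # total RVUs per the CMS Physician Fee Schedule
chatgpt_total = 432.5   # mean total RVUs across the three ChatGPT trials
mean_diff = 0.0         # Bland-Altman mean difference between users
sd_diff = 6.19          # standard deviation of the between-user differences


def percent_overestimation(reference, estimate):
    """Relative excess of the estimate over the reference, in percent."""
    return (estimate - reference) / reference * 100


def limits_of_agreement(mean_difference, sd, z=1.96):
    """Bland-Altman limits: mean difference +/- z * SD of the differences.

    z = 1.96 is an assumption (conventional 95% limits).
    """
    half_width = z * sd
    return mean_difference - half_width, mean_difference + half_width


overestimate = percent_overestimation(cms_total, chatgpt_total)
lower, upper = limits_of_agreement(mean_diff, sd_diff)

print(f"Overestimation: {overestimate:.1f}%")              # 39.6%
print(f"Limits of agreement: {lower:.2f} to {upper:.2f}")  # -12.13 to 12.13
```

Both results match the reported values, which suggests the limits of agreement were computed as the mean difference plus or minus 1.96 standard deviations.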
Conclusions:
The discrepancy in RVU calculations between CMS-PFS and ChatGPT, along with variability among users, warrants caution when integrating OSAI into healthcare systems. Continued human oversight is essential for refinement.