Evaluating ChatGPT Generated RVU Assignment in Hand Trauma Operations

Ermina Lee, BS; Tiffany Shi, PhD; Victoria Lee, BS; Kelly Spiller, MD; Ryan Gobble, MD; Ann R. Schwentker, MD
University of Cincinnati
2025-01-10

Presenter: Ermina Lee

Affidavit:
Ermina, Tiffany, Victoria, and Dr. Schwentker contributed to data acquisition. All co-authors contributed to project design and approved the abstract.

Director Name: Ann R. Schwentker

Author Category: Medical Student
Presentation Category: Clinical
Abstract Category: Hand

Background:
Hospital systems utilize Relative Value Units (RVUs) to assess productivity and allocate resources. Publicly available open-source artificial intelligence (OSAI) products, such as Chat Generative Pre-Trained Transformer (ChatGPT), are marketed to streamline medical billing. This study evaluates ChatGPT RVU assignment in hand trauma operations.
Methods:
A retrospective review identified 20 hand trauma operations from 2018 to 2023 with associated Current Procedural Terminology (CPT) codes. Three independent users queried ChatGPT-4.0 with: "Using the [year] Centers for Medicare & Medicaid Services (CMS) RVUs, calculate the appropriate RVUs for the following CPT codes," with prompted regional and facility adjustments. ChatGPT-generated RVUs were compared to each other and to the CMS Physician Fee Schedule (CMS-PFS) using R software (v.4.2.0).
Results:
A total of 51 CPT codes were queried. Total RVUs from the mean of three ChatGPT trials (432.5 RVUs) overestimated the CMS-PFS total (309.8 RVUs) by 39.6%. The mean difference between CMS-PFS and ChatGPT RVU assignments for each operation's CPT codes was 7.3 RVUs (range 0.0 to 25.1). The mean absolute error between CMS-PFS and ChatGPT was 8.54 RVUs. No user could replicate RVUs exactly despite identical CPT input and query. Bland-Altman analyses showed no systematic bias between users (mean difference = 0) but a high standard deviation of differences (6.19) and wide limits of agreement (-12.13 to 12.13), indicating high variability and low precision between users.
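As a rough check on the arithmetic behind two of these figures, the following minimal Python sketch recomputes the percent overestimation and the Bland-Altman limits of agreement from the summary values reported above. It is not the authors' analysis, which was run in R. The function names are hypothetical, and the sketch assumes the conventional limits of agreement of mean difference ± 1.96 × standard deviation of differences, which the abstract does not state explicitly.

```python
def percent_overestimation(reference: float, estimate: float) -> float:
    """Relative excess of the estimate over the reference, in percent."""
    return (estimate - reference) / reference * 100.0


def limits_of_agreement(mean_diff: float, sd_diff: float, z: float = 1.96):
    """Bland-Altman limits, assuming the conventional mean +/- 1.96 * SD."""
    half_width = z * sd_diff
    return mean_diff - half_width, mean_diff + half_width


# Summary values taken from the Results paragraph.
cms_total = 309.8           # total RVUs per CMS-PFS
chatgpt_mean_total = 432.5  # mean total RVUs across three ChatGPT trials

over = percent_overestimation(cms_total, chatgpt_mean_total)
lower, upper = limits_of_agreement(mean_diff=0.0, sd_diff=6.19)

print(f"Overestimation: {over:.1f}%")
print(f"Limits of agreement: {lower:.2f} to {upper:.2f}")
```

Under that 1.96 assumption, the recomputed values agree with the reported 39.6% and the limits of -12.13 to 12.13.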
Conclusions:
The discrepancy in RVU calculations between CMS-PFS and ChatGPT, along with variability among users, highlights the need for caution when integrating OSAI into healthcare systems. Continued human oversight is essential for refinement.

Ohio, Pennsylvania, West Virginia, Indiana, Kentucky American Society of Plastic Surgeons

OVSPS Conference