Assessing the Utility of ChatGPT in Hand Trauma Surgical Coding

Tiffany Shi, PhD; Ermina Lee, BS; Victoria Lee, BS; Kelly Spiller, MD; Ryan Gobble, MD; Ann R. Schwentker, MD
University of Cincinnati
2025-01-10

Presenter: Tiffany Shi, PhD

Affidavit:
Tiffany Shi, Ermina Lee, Victoria Lee, and Dr. Schwentker contributed to data acquisition. All co-authors contributed to project design and approved the abstract.

Director Name: Ann R. Schwentker

Author Category: Medical Student
Presentation Category: Clinical
Abstract Category: Hand

Background:
Hand trauma presents unique challenges in medical billing because each repair requires its own unbundled Current Procedural Terminology (CPT) code. Here, we assessed the utility of publicly available artificial intelligence (AI), specifically Chat Generative Pre-Trained Transformer (ChatGPT), for enhancing coding precision in hand trauma.

Methods:
Twenty hand trauma operations performed between 2018 and 2023 were identified by retrospective review. De-identified operative reports were entered into ChatGPT-4.0 with the prompt: "Provide all related [year] CPT codes that could be assigned to this operation. List modifier codes associated with each CPT. Assume applicable bundling is applied." ChatGPT then assessed the billed codes and made a final selection based on which codes were supported by the operative report. An expert hand surgeon assessed report quality and whether the coding was supported by documentation.
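The initial query described above can be sketched as a small template function. This is a minimal illustration only: the names `PROMPT_TEMPLATE` and `build_query` are hypothetical, the abstract does not state how the report text was combined with the prompt, and filling `[year]` with the year of the operation is an assumption.

```python
# Prompt wording copied from the Methods; "{year}" stands in for "[year]".
PROMPT_TEMPLATE = (
    "Provide all related {year} CPT codes that could be assigned to this operation. "
    "List modifier codes associated with each CPT. "
    "Assume applicable bundling is applied."
)


def build_query(operative_report: str, year: int) -> str:
    """Combine the prompt and a de-identified operative report into one query.

    Assumption: the report is appended after the prompt, separated by a blank
    line. The abstract does not specify the actual layout.
    """
    # The retrospective review covered 2018-2023, so reject other years.
    if not 2018 <= year <= 2023:
        raise ValueError(f"year {year} is outside the 2018-2023 review window")
    prompt = PROMPT_TEMPLATE.format(year=year)
    return f"{prompt}\n\n{operative_report.strip()}"


if __name__ == "__main__":
    # Placeholder text, not a real operative report.
    print(build_query("  [de-identified operative report text]  ", 2021))
```

Keeping the prompt in a single template makes it easier to hold the wording constant across reports, which matters given that performance was found to depend on user input.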

Results:
The 20 identified operations included 22 repairs and 49 procedures. Documentation supported 36/51 billed CPT codes (70.6%). Initially, only 22/68 ChatGPT assignments (32.4%) were supported by documentation; this increased to 29/63 (46.0%) at the final query. Anatomical inconsistencies accounted for unsupported codes in 2/51 billed CPT codes (3.9%) and 10/68 ChatGPT-assigned codes (14.7%). Overcoding by ChatGPT was observed in 8/20 operative notes (40%), while undercoding by billers occurred in 5/20 notes (25%), primarily related to wound preparation and debridement.
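The documentation-support rates above can be recomputed from the reported counts. The sketch below uses only those counts; the dictionary keys and the `percent` helper are illustrative names, not part of the study.

```python
# Counts copied from the Results: (codes supported by documentation, total codes).
support_counts = {
    "billed": (36, 51),
    "chatgpt_initial": (22, 68),
    "chatgpt_final": (29, 63),
}


def percent(supported: int, total: int) -> float:
    """Return supported/total as a percentage rounded to one decimal place."""
    return round(100 * supported / total, 1)


rates = {label: percent(*counts) for label, counts in support_counts.items()}

for label, rate in rates.items():
    print(f"{label}: {rate}%")

# Gap between billers and the final ChatGPT query, in percentage points.
gap_points = round(rates["billed"] - rates["chatgpt_final"], 1)
print(f"billed minus chatgpt_final: {gap_points} points")
```

Note that the denominator for ChatGPT changes between the initial and final query (68 versus 63 codes), so the improvement reflects both more supported codes and fewer assigned codes.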

Conclusions:
Despite improved CPT assignment with an iterative query, ChatGPT's performance falls short of that of human billers. Performance depended on user input and exhibited leading bias. Since AI is already marketed for medical billing and used by insurers to deny claims, model refinement for unbundled procedures is crucial.

Ohio, Pennsylvania, West Virginia, Indiana, Kentucky; American Society of Plastic Surgeons

OVSPS Conference