Evaluating GPT-4's Accuracy in CPT Coding Across Various Plastic Surgery Sub-Specialties
Hilary Y Liu, BS; José Antonio Arellano, MD; Mia Carrarini, BS; Mare G Kaulakis, BS; Christopher J Fedor, BS; Rebecca Hohsfield, BS; Joseph E Losee, MD; Vu T Nguyen, MD; Francesco M Egro, MD, MSc, MRCS
University of Pittsburgh Medical Center
2025-01-09
Presenter: Hilary Y Liu, BS
Affidavit:
I agree
Director Name: Vu T Nguyen
Author Category: Medical Student
Presentation Category: Clinical
Abstract Category: Breast (Aesthetic and Recon.)
Background: Current Procedural Terminology (CPT) coding is a significant administrative burden in healthcare delivery. The large language model GPT-4 has shown promise in medical education and patient counseling, but its potential to assist with procedural coding has not been explored. This study evaluates GPT-4's accuracy in generating CPT codes across a range of plastic surgery procedures.
Methods: GPT-4's coding accuracy was assessed using 27 de-identified operative notes spanning six plastic surgery subspecialties: aesthetic, breast reconstruction, general reconstruction, hand surgery, craniofacial trauma, and burn procedures. Operative notes were entered into GPT-4 with a standardized prompt requesting appropriate CPT codes. Responses were compared against surgeon-verified codes for accuracy.
Results: GPT-4's accuracy varied across subspecialties, with an overall accuracy of 77.8% (21/27 operative notes coded correctly). Performance was strongest in aesthetic surgery (6/6; 100%), breast reconstruction (4/4; 100%), and burn procedures (3/3; 100%). Accuracy was lower in general reconstruction (5/7; 71.4%), hand surgery (2/4; 50%), and craniofacial trauma (1/3; 33.3%). GPT-4 assigned incorrect CPT codes for forearm amputation, nerve repair, nasal bone fracture, mandibular fracture, head and neck pedicled flap, and head and neck free flap. Common errors included misidentifying the level of amputation, confusing vessel graft and nerve graft codes, misclassifying internal versus external fixation in facial procedures, failing to distinguish muscle flaps from skin flaps in complex reconstruction, and using outdated codes.
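As an arithmetic check on the reported rates, the short Python sketch below recomputes each accuracy as the number of correctly coded notes divided by the number of notes evaluated. The abstract gives only the correct counts and percentages, so the per-subspecialty denominators used here are an assumption, inferred from those percentages and the 27-note total; the names `results` and `accuracy` are illustrative and not part of the study.

```python
# (notes coded correctly, notes evaluated) per subspecialty.
# ASSUMPTION: denominators are inferred from the reported percentages
# and the 27-note total; the abstract does not state them directly.
results = {
    "aesthetic": (6, 6),
    "breast reconstruction": (4, 4),
    "burn": (3, 3),
    "general reconstruction": (5, 7),
    "hand": (2, 4),
    "craniofacial trauma": (1, 3),
}


def accuracy(correct: int, total: int) -> float:
    """Return the percentage of notes whose GPT-4 codes matched the verified codes."""
    return 100.0 * correct / total


# Per-subspecialty accuracy, rounded to one decimal place as in the abstract.
for specialty, (correct, total) in results.items():
    print(f"{specialty}: {correct}/{total} = {accuracy(correct, total):.1f}%")

# Overall accuracy pools all notes rather than averaging the subspecialty rates.
overall_correct = sum(c for c, _ in results.values())
overall_total = sum(t for _, t in results.values())
print(f"overall: {overall_correct}/{overall_total} = "
      f"{accuracy(overall_correct, overall_total):.1f}%")
```

Running it reproduces the reported figures, for example 5/7 = 71.4% for general reconstruction and 21/27 = 77.8% overall. The overall figure is pooled over notes, which is why it is not the simple mean of the six subspecialty rates.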
Conclusions: GPT-4 shows promise as a CPT coding assistant, particularly in aesthetic, breast, and burn procedures. However, its variable performance across subspecialties suggests that current implementations require human oversight, especially in reconstructive, hand, and craniofacial cases.