NLP Performance in Clinical Notes: Addressing Data Limitations and System Overfitting

Several limitations should be noted. First, there were too few instances of the emotional support subcategories in the notes to evaluate the NLP systems on them. Emotional support (and lack thereof) is an important and distinct fine-grained category that would ideally be identified in the notes. Second, because the RBS was designed with specific lexicons derived from manual review at MSHS and WCM, it may have overfit to those sites, which would inflate its F-score. It would be beneficial to validate these NLP systems on clinical notes from different EHR systems. Other healthcare systems that implement a lexicon-based rules approach will need to perform site-specific template removal to avoid false positives. With fine-tuning, the LLM approach might have been able to interpret the templates correctly; however, because the templates were removed from the notes before the annotation process, this was not assessed.
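To make the template-removal point concrete, here is a minimal sketch of why a lexicon-based rule fires falsely on templated text and how stripping the template first avoids it. This is not the authors' RBS: the template pattern, the lexicon terms, and the function names (`remove_templates`, `lexicon_matches`) are hypothetical illustrations, and a real site would need to curate its own template patterns from its own EHR.

```python
import re

# HYPOTHETICAL site-specific template: checkbox boilerplate that contains a
# lexicon term ("lives alone") whether or not it applies to the patient.
SITE_TEMPLATES = [
    r"Social support:\s*\[\s*\]\s*lives alone\s*\[\s*\]\s*lives with family",
]

# HYPOTHETICAL lexicon for one fine-grained category (illustration only).
LIVES_ALONE_LEXICON = ["lives alone", "living alone"]


def remove_templates(note: str, templates: list[str]) -> str:
    """Replace every occurrence of each template pattern with a space."""
    for pattern in templates:
        note = re.sub(pattern, " ", note, flags=re.IGNORECASE)
    return note


def lexicon_matches(note: str, lexicon: list[str]) -> list[str]:
    """Return the lexicon terms found in the note as whole words, case-insensitively."""
    found = []
    for term in lexicon:
        if re.search(r"\b" + re.escape(term) + r"\b", note, flags=re.IGNORECASE):
            found.append(term)
    return found


if __name__ == "__main__":
    note = (
        "Social support: [ ] lives alone [ ] lives with family\n"
        "Patient reports close contact with her daughter."
    )
    # Without template removal, the unchecked box triggers a false positive.
    print(lexicon_matches(note, LIVES_ALONE_LEXICON))  # ['lives alone']
    # After template removal, only the free-text narrative is matched.
    cleaned = remove_templates(note, SITE_TEMPLATES)
    print(lexicon_matches(cleaned, LIVES_ALONE_LEXICON))  # []
```

Matching only after templates are stripped means any remaining hit comes from the clinician's free text. The cost, as the paragraph notes, is that the template patterns are tied to one site's EHR and would have to be rebuilt for each new system.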

:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA (co-first author);

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA (co-first author);

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.

:::

This content originally appeared on HackerNoon and was authored by Natural Language Processing

APA

Natural Language Processing | Sciencx (2025-04-01T22:48:55+00:00) NLP Performance in Clinical Notes: Addressing Data Limitations and System Overfitting. Retrieved from https://www.scien.cx/2025/04/01/nlp-performance-in-clinical-notes-addressing-data-limitations-and-system-overfitting/

