NLP Performance in Clinical Notes: Addressing Data Limitations and System Overfitting

There were insufficient instances in the notes of the emotional support subcategories to evaluate the NLP systems.


This content originally appeared on HackerNoon and was authored by Natural Language Processing

Abstract and 1. Introduction

2 Data

2.1 Data Sources

2.2 SS and SI Categories

3 Methods

3.1 Lexicon Creation and Expansion

3.2 Annotations

3.3 System Description

4 Results

4.1 Demographics and 4.2 System Performance

5 Discussion

5.1 Limitations

6 Conclusion, Reproducibility, Funding, Acknowledgments, Author Contributions, and References

SUPPLEMENTARY

Guidelines for Annotating Social Support and Social Isolation in Clinical Notes

Other Supervised Models

5.1 Limitations

Several limitations should be noted. First, the notes contained too few instances of the emotional support subcategories to evaluate the NLP systems on them. Emotional support (and lack thereof) is an important and distinct fine-grained category that would ideally be identified in the notes. Second, because the RBS was designed with specific lexicons derived from manual review at MSHS and WCM, it may have overfit to these sites, which could have inflated its F-score. It would be beneficial to validate these NLP systems on clinical notes from different EHR systems. Other healthcare systems that implement a lexicon-based rules approach will need to perform site-specific template removal to avoid false positives. With fine-tuning, the LLM approach may have been able to correctly interpret the templates; however, because the templates were removed from the notes before the annotation process, this was not assessed.
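To illustrate why template removal matters for a lexicon-based rules approach, the following is a minimal sketch and not the authors' implementation. The template pattern, the lexicon terms, and the function names `strip_templates` and `lexicon_hits` are all hypothetical, chosen only to show how an unfilled checklist line can trigger a false positive if it is not removed before matching.

```python
import re
from typing import List

# Hypothetical site-specific template line and lexicon; neither is taken from the paper.
TEMPLATE_PATTERNS = [
    re.compile(r"^Lives alone:\s*\[\s*\]\s*Yes\s*\[\s*\]\s*No\s*$", re.IGNORECASE),
]
ISOLATION_LEXICON = ["lives alone", "no family nearby"]


def strip_templates(note: str) -> str:
    """Drop every line that fully matches a known template pattern."""
    kept = [
        line
        for line in note.splitlines()
        if not any(p.match(line.strip()) for p in TEMPLATE_PATTERNS)
    ]
    return "\n".join(kept)


def lexicon_hits(note: str) -> List[str]:
    """Return lexicon terms found in the note (case-insensitive substring match)."""
    lowered = note.lower()
    return [term for term in ISOLATION_LEXICON if term in lowered]


# An unfilled checklist line followed by free text that does not indicate isolation.
note = "Lives alone: [ ] Yes [ ] No\nPatient reports good support from daughter."

raw_hits = lexicon_hits(note)                     # matches the template text
clean_hits = lexicon_hits(strip_templates(note))  # template line removed first

# A genuine free-text mention is not affected by template removal.
genuine = "Patient lives alone since spouse died."
genuine_hits = lexicon_hits(strip_templates(genuine))
```

In this sketch the template patterns are anchored to whole lines, so a free-text mention of the same phrase is still matched. Real templates differ between sites, which is why the removal step would need to be rebuilt for each healthcare system.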


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::

:::info Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA (co-first author);

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA (co-first author);

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.

:::
