Assessing ML classification algorithms and NLP techniques for depression detection: An experimental case study

Lorenzoni, Giuliano; Tavares, Cristina; Nascimento, Nathalia; Alencar, Paulo; Cowan, Donald

dc.contributor.author	Lorenzoni, Giuliano
dc.contributor.author	Tavares, Cristina
dc.contributor.author	Nascimento, Nathalia
dc.contributor.author	Alencar, Paulo
dc.contributor.author	Cowan, Donald
dc.date.accessioned	2025-07-03T19:54:02Z
dc.date.available	2025-07-03T19:54:02Z
dc.date.issued	2025
dc.description	© 2025 Lorenzoni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.description.abstract	Context and background. Depression has affected millions of people worldwide and has become one of the most common mental disorders. Early detection of mental disorders can reduce costs for public health agencies and prevent other major comorbidities. The shortage of specialized personnel is also concerning, since depression diagnosis depends heavily on expert professionals and is time-consuming. Research problems. Recent research has shown that machine learning (ML) and natural language processing (NLP) tools and techniques have significantly benefited the diagnosis of depression. However, several challenges remain in assessing depression detection approaches when other conditions, such as post-traumatic stress disorder (PTSD), are present. These challenges include assessing alternatives in terms of data cleaning and pre-processing techniques, feature selection, and appropriate ML classification algorithms. Purpose of the study. This paper tackles such an assessment through a case study that compares different ML classifiers, specifically in terms of data cleaning and pre-processing, feature selection, parameter setting, and model choices. Methodology. The experimental case study is based on the Distress Analysis Interview Corpus - Wizard-of-Oz (DAIC-WOZ) dataset, which is designed to support the diagnosis of mental disorders such as depression, anxiety, and PTSD. Major findings. Besides assessing alternative techniques, we built Random Forest and XGBoost models with accuracy of around 84%, significantly higher than the 72% accuracy reported for an SVM model in the comparable literature. Conclusions. More comprehensive assessments of ML classification algorithms and NLP techniques for depression detection can advance the state of the art in terms of improved experimental settings and performance.
dc.description.sponsorship	Natural Sciences and Engineering Research Council of Canada (NSERC) || Centre for Community Mapping (COMAP).
dc.identifier.uri	https://doi.org/10.1371/journal.pone.0322299
dc.identifier.uri	https://hdl.handle.net/10012/21971
dc.language.iso	en
dc.publisher	Public Library of Science (PLOS)
dc.relation.ispartofseries	PLOS One; 20(5); e0322299
dc.rights	Attribution 4.0 International	en
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	depression
dc.subject	machine learning
dc.subject	natural language processing
dc.subject	machine learning algorithms
dc.subject	semantics
dc.subject	support vector machines
dc.subject	mental health and psychiatry
dc.subject	preprocessing
dc.title	Assessing ML classification algorithms and NLP techniques for depression detection: An experimental case study
dc.type	Article
dcterms.bibliographicCitation	Lorenzoni, G., Tavares, C., Nascimento, N., Alencar, P., & Cowan, D. (2025). Assessing ML classification algorithms and NLP techniques for depression detection: An experimental case study. PLOS One, 20(5), e0322299. https://doi.org/10.1371/journal.pone.0322299
uws.contributor.affiliation1	Faculty of Mathematics
uws.contributor.affiliation2	David R. Cheriton School of Computer Science
uws.peerReviewStatus	Reviewed
uws.scholarLevel	Faculty
uws.typeOfResource	Text	en

Files

Original bundle

Name: journal.pone.0322299.pdf
Size: 302.5 KB
Format: Adobe Portable Document Format

Collections

Waterloo Research