Time | November 20, 2024, 12:00 pm to 01:00 pm (America/New_York) |
---|---|
Title | Using Binary Predictors for Text Classification with Discriminative Models |
Organizer | Population Research Institute |
Location | HHD101 |
Presenter(s) | Our speaker for this week is Priyanka Paul, a doctoral student in Human Development and Family Studies (HDFS) |
Description |
Naturalistic observation methods have enabled researchers to analyze text, audio, and video data, offering valuable insights into human behavior and development. While advanced statistical techniques have long been used to explore the complex processes and patterns of human development, machine learning algorithms offer new opportunities to analyze these distinct forms of data quantitatively and uncover meaningful insights. The present study uses a toy dataset from Kaggle to test discriminative machine learning models that predict suicidal ideation from text data. It compares three models (Logistic Regression, Random Forest, and Support Vector Machine) on predicting suicidal ideation from user comments on the Reddit platform. Testing on the Kaggle dataset makes it possible to address challenges such as overfitting and to fine-tune the text preprocessing before the model is applied to a real-world dataset (the FamBest Study). Although the specific research questions may vary, the core task of classifying text to predict a binary outcome will remain the same for the real-world dataset, so this groundwork is intended to ensure the model performs well and is ready for testing on the FamBest dataset. |
Contact Person | Hyungeun Oh |
Contact Email | hxo5077@psu.edu |
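
For readers unfamiliar with the kind of comparison the description mentions, the following is a minimal sketch, not the presenter's actual pipeline. It assumes scikit-learn, TF-IDF features, and 3-fold cross-validation, none of which are stated in the announcement. The texts and labels are neutral, made-up placeholders and do not come from the Kaggle or FamBest data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical placeholder comments with binary labels (1 = class A, 0 = class B).
# These are invented for illustration only.
texts = [
    "I really enjoyed the concert last night",
    "The hike was wonderful and the weather was great",
    "What a fantastic meal we had with friends",
    "I love how peaceful the park is in the morning",
    "The new album is amazing and uplifting",
    "Great game today, the team played wonderfully",
    "The train was late again and I missed my meeting",
    "This software keeps crashing and it is frustrating",
    "The service at the restaurant was terrible",
    "I am annoyed that the package arrived broken",
    "The traffic this morning was awful",
    "My laptop battery died and I lost my work",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The three discriminative models named in the description.
# LinearSVC stands in for "Support Vector Machine"; the kernel choice is an assumption.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "linear_svm": LinearSVC(),
}

# Stratified folds keep both classes present in every training split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)


def compare_models(texts, labels):
    """Return the mean cross-validated accuracy for each model."""
    scores = {}
    for name, clf in models.items():
        # The vectorizer sits inside the pipeline so it is refit on each
        # training fold, which keeps held-out text from leaking into the features.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        fold_scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
        scores[name] = float(fold_scores.mean())
    return scores


results = compare_models(texts, labels)
for name, acc in results.items():
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```

Scoring on held-out folds rather than on the training data is one common way to notice overfitting: a model that only memorizes the training comments tends to score noticeably lower on the folds it has not seen.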