Reddit Forum Classifier
Natural Language Processing
Description:
  • Used the scikit-learn and matplotlib Python libraries to build an NLP model that quickly gauges the mood of an online forum by classifying the sentiment of each comment as happy, sad, or neutral.
  • Used NumPy to convert string inputs into vectors that were fed into scikit-learn models. Explored SVMs with linear and quadratic kernels, L1 and L2 penalties, and hinge and squared-hinge losses, tuned with grid search and random search.
  • Selected hyperparameters using 5-fold cross-validation on the training data, achieving 95%+ accuracy.
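
The search described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy comments, the C values, and the step names `vec` and `svm` are hypothetical, and scikit-learn's `CountVectorizer` stands in for the NumPy-based vectorization. The grid is split into three parts because `LinearSVC` only accepts the L1 penalty together with the squared-hinge loss and `dual=False`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# Hypothetical toy data; the real project used 6,000+ Reddit comments.
happy = ["love this thread", "what a great day", "so happy right now",
         "this is awesome news", "best community ever", "really wonderful post"]
sad = ["this is so depressing", "what a terrible day", "feeling awful right now",
       "this is heartbreaking news", "worst week ever", "really miserable post"]
neutral = ["the meeting is at noon", "see the link in the sidebar",
           "posted the schedule today", "this thread is about parking",
           "the update ships next week", "comment moved to another thread"]
texts = happy + sad + neutral
labels = ["happy"] * 6 + ["sad"] * 6 + ["neutral"] * 6

# Vectorize text, then classify; the "svm" step is swapped out by the grid.
pipeline = Pipeline([("vec", CountVectorizer()), ("svm", LinearSVC())])

c_values = [0.1, 1, 10]  # assumed search range
param_grid = [
    # Linear kernel, L2 penalty, hinge or squared-hinge loss.
    {"svm": [LinearSVC(dual=True, max_iter=10000)],
     "svm__penalty": ["l2"],
     "svm__loss": ["hinge", "squared_hinge"],
     "svm__C": c_values},
    # Linear kernel, L1 penalty (requires squared hinge and dual=False).
    {"svm": [LinearSVC(dual=False, max_iter=10000)],
     "svm__penalty": ["l1"],
     "svm__loss": ["squared_hinge"],
     "svm__C": c_values},
    # Quadratic kernel: polynomial kernel of degree 2.
    {"svm": [SVC(kernel="poly", degree=2)],
     "svm__C": c_values},
]

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Replacing `GridSearchCV` with `RandomizedSearchCV` would give the random-search variant over the same parameter space.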
Skills:
Scikit-Learn, Matplotlib, Pandas, NumPy, Support Vector Machines, Cross-Validation
Statistics:
  • 6,000+ Reddit Comments
  • 95%+ Accuracy Score
  • 97%+ AUROC Score
  • 99%+ Sensitivity Score
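
For reference, metrics of this kind can be computed with scikit-learn as sketched below. The labels and probabilities are made-up toy values, not the project's results, and the averaging choices are assumptions: the card does not say how AUROC and sensitivity were aggregated across the three classes, so this sketch uses one-vs-rest AUROC and macro-averaged recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

# Hypothetical toy labels: 0 = happy, 1 = sad, 2 = neutral.
y_true = np.array([0, 0, 1, 1, 2, 2])

# Hypothetical predicted class probabilities; each row sums to 1.
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.2, 0.3],  # a neutral comment misclassified as happy
])
y_pred = y_prob.argmax(axis=1)

# Fraction of comments labeled correctly.
accuracy = accuracy_score(y_true, y_pred)

# Sensitivity is recall; here averaged equally over the three classes.
sensitivity = recall_score(y_true, y_pred, average="macro")

# Multiclass AUROC, one-vs-rest, computed from the probabilities.
auroc = roc_auc_score(y_true, y_prob, multi_class="ovr")

print(accuracy, sensitivity, auroc)
```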