Metrics and scoring in scikit-learn

In this article, we explore how to build and use a custom scoring object in scikit-learn. I came across this interesting topic while working on an integrated project for my Practicum Data Science bootcamp.

Evaluating the quality of a model's predictions is essential for model selection. scikit-learn offers three ways to do this: 1) an estimator's own score method, 2) the scoring parameter accepted by tools such as cross_val_score and GridSearchCV, and 3) the metric functions in sklearn.metrics. It is also worth remembering that dummy models serve as helpful baselines for model comparison.
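For example, here is a minimal sketch of the three routes plus a dummy baseline. The synthetic data and the linear regression model are only illustrative choices, not part of my actual project:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# 1) the estimator's own score method (R^2 for regressors)
print(model.score(X_test, y_test))

# 2) the scoring parameter of cross-validation tools
print(cross_val_score(model, X_train, y_train, scoring="neg_mean_absolute_error", cv=5))

# 3) a metric function from sklearn.metrics
print(mean_absolute_error(y_test, model.predict(X_test)))

# a dummy model that always predicts the mean, as a baseline for comparison
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
print(baseline.score(X_test, y_test))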

Here we look at how to implement a scoring object from scratch. The scenario: I want to use my self-defined score function in cross-validation. My score function measures the symmetric mean absolute error, so it is a loss function: the smaller the error, the better the model.

First, I define my scoring (loss) function, my_score_func. To make it work properly in cross-validation, we should be aware that, for a loss function, the output of my_score_func is negated by the scorer object created with scoring = make_scorer(my_score_func, greater_is_better=False). This conforms to the cross-validation convention that scorers return higher values for better models. In other words, my model's true error is the negation of the reported score, i.e. -score.
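As a sketch, here is one way such a function could look. The exact formula below, a "symmetric" variant of mean absolute error that scales each absolute error by the average magnitude of the true and predicted values (often written as sMAPE), is my assumption for illustration; your own definition may differ:

import numpy as np

def my_score_func(y_true, y_pred):
    # symmetric mean absolute (percentage) error:
    # |y_true - y_pred| scaled by the mean magnitude of y_true and y_pred
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2
    # guard against division by zero when both values are exactly 0
    safe_denominator = np.where(denominator == 0, 1.0, denominator)
    ratio = np.where(denominator == 0, 0.0, np.abs(y_true - y_pred) / safe_denominator)
    return np.mean(ratio) * 100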

Of course, remember to import make_scorer:

from sklearn.metrics import make_scorer
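
Putting the pieces together, a minimal sketch of the cross-validation call might look like the following. It reuses my_score_func from above; the synthetic data and the linear regression model are again just placeholders:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

# greater_is_better=False makes the scorer negate my_score_func's output
scoring = make_scorer(my_score_func, greater_is_better=False)

scores = cross_val_score(LinearRegression(), X, y, scoring=scoring, cv=5)
print(scores)   # negated values: higher (closer to zero) means a better model
print(-scores)  # the actual error values on each fold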

Understanding this mechanism gives you a great deal of flexibility in model evaluation.
I hope this small article provides a useful takeaway for readers.

Thank you for taking the time to read it. 😊

Jinyu Du
