Evaluation and optimization of a machine learning algorithm
The hypothesis may be overfitting the training set (which usually takes about 70% of the data). We need a cross-validation set and a test set to evaluate and test the current learning algorithm. The status of the algorithm can be diagnosed with learning curves, plotting the cost on the cross-validation and training sets.
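For illustration, assuming a NumPy feature matrix `X` and label vector `y`, a random split into training, cross-validation, and test sets might look like the sketch below (the 70/15/15 fractions are one common convention, not a rule):

```python
import numpy as np

def split_data(X, y, train_frac=0.7, cv_frac=0.15, seed=0):
    """Randomly split the data into training, cross-validation, and test sets.
    The default 70/15/15 fractions are illustrative; exact ratios vary."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    m_train = int(train_frac * len(idx))
    m_cv = int(cv_frac * len(idx))
    train = idx[:m_train]
    cv = idx[m_train:m_train + m_cv]
    test = idx[m_train + m_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```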
Evaluation of test error:
Linear regression: $$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)})-y_{test}^{(i)}\right)^2$$
Classification: $$J_{error}(\theta) = \frac{1}{m_{test}}\sum_{i=1}^{m_{test}}err(h_\theta(x_{test}^{(i)}),y_{test}^{(i)})$$
where $err(h_\theta(x),y) = 1$ if $h_{\theta}(x) \geq 0.5$ and $y = 0$, or $h_{\theta}(x) < 0.5$ and $y = 1$; otherwise $err(h_\theta(x),y) = 0$.
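As a hedged sketch of both test-error formulas (assuming a prediction function `h` and NumPy arrays `X_test`, `y_test` that are not defined in these notes):

```python
import numpy as np

def j_test_linear(h, X_test, y_test):
    """Squared-error test cost: (1 / 2m_test) * sum((h(x) - y)^2)."""
    m_test = X_test.shape[0]
    return np.sum((h(X_test) - y_test) ** 2) / (2 * m_test)

def j_test_classification(h, X_test, y_test, threshold=0.5):
    """Misclassification error: fraction of test examples where the
    thresholded hypothesis disagrees with the 0/1 label."""
    pred = (h(X_test) >= threshold).astype(int)
    return np.mean(pred != y_test)
```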
Diagnosing bias or variance
High Bias | High Variance |
---|---|
Under-fitting | Overfitting |
$J_{train}(\theta)$ is high; $J_{cv}(\theta)\approx J_{train}(\theta)$ | $J_{train}(\theta)$ is low; $J_{cv}(\theta) \gg J_{train}(\theta)$ |
Decrease $\lambda$; add more features; add polynomial features | Increase $\lambda$; use fewer features; increase training set size |
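A minimal learning-curve sketch for this diagnosis (the `fit` and `cost` callables are placeholders for whatever training and cost routines are being evaluated): curves that converge at a high cost suggest high bias, while a persistent gap between $J_{train}$ and $J_{cv}$ suggests high variance.

```python
import numpy as np
import matplotlib.pyplot as plt

def learning_curve(fit, cost, X_train, y_train, X_cv, y_cv):
    """Plot training and cross-validation cost as the training set grows.
    `fit(X, y)` and `cost(model, X, y)` are assumed user-supplied helpers."""
    sizes = np.linspace(10, X_train.shape[0], 20, dtype=int)
    j_train, j_cv = [], []
    for m in sizes:
        model = fit(X_train[:m], y_train[:m])
        j_train.append(cost(model, X_train[:m], y_train[:m]))
        j_cv.append(cost(model, X_cv, y_cv))
    plt.plot(sizes, j_train, label="J_train")
    plt.plot(sizes, j_cv, label="J_cv")
    plt.xlabel("training set size m")
    plt.ylabel("cost J")
    plt.legend()
    plt.show()
```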
Designing ML systems: an example
Spam email classifier
- Define features: choose a spam word list (deal, buy, discount, …), which can be sorted alphabetically
- Vectorize the email content into a list of words and check whether the spam words appear in it.
- The input of a training example is a vector $X=[0,1,0,1…1,0]$ indicating whether each "spam word" appears (see the sketch after this list).
- Optimization methods:
- More data
- Features based on header
- Spell checking
- Error analysis
- Start with a simple, quick-and-dirty algorithm
- Plot learning curves
- Error analysis:
- Manual examination: categorize the misclassified examples and use a single numerical evaluation metric
- Skewed classes: when the ratio of positive to negative examples is extreme, accuracy is misleading; use precision/recall
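A minimal sketch of the feature-vector step from the list above (the word list and the `email_to_features` helper are illustrative assumptions, not from the original notes):

```python
import re

# Hypothetical, alphabetically sorted spam word list (illustrative only).
SPAM_WORDS = sorted(["buy", "deal", "discount", "free", "offer"])

def email_to_features(email_text):
    """Map an email body to a binary vector x, where x[j] = 1 if the
    j-th spam word appears in the email and 0 otherwise."""
    words = set(re.findall(r"[a-z]+", email_text.lower()))
    return [1 if w in words else 0 for w in SPAM_WORDS]

# Example: "Great deal, buy now!" -> [1, 1, 0, 0, 0]
print(email_to_features("Great deal, buy now!"))
```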
F score
We need to choose a proper threshold value as the cutoff for our hypothesis (for instance, the usual choice of 0.5 in a logistic regression model); examples with $h_\theta(x)$ above the threshold are predicted positive. If the threshold is high ($>0.5$), we only predict positive when very confident, so precision is high but we risk missing many actual positives, i.e. low recall. Conversely, if the threshold is low, recall is high but precision drops.
Precision (P) and recall (R): precision is the number of true positives over the number of predicted positives; recall is the number of true positives over the number of actual positives. To evaluate the current threshold, we use the F score:
$$F_1=\frac{2PR}{(P+R)}$$
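As a minimal sketch (not from the original notes), P, R, and $F_1$ can be computed from 0/1 predictions and labels as follows:

```python
import numpy as np

def precision_recall_f1(pred, y):
    """Compute precision, recall, and F1 from 0/1 prediction and label arrays."""
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1
```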
Always use the cross-validation set to test whether the chosen threshold gives a good F score: try a range of thresholds and pick the one that maximizes $F_1$ on the cross-validation set.
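For instance, a small sketch of that threshold sweep on the cross-validation set, reusing the `precision_recall_f1` helper above (the hypothesis `h` and the candidate thresholds are illustrative assumptions):

```python
import numpy as np

def best_threshold(h, X_cv, y_cv, thresholds=np.linspace(0.1, 0.9, 17)):
    """Return the threshold whose predictions maximize F1 on the
    cross-validation set; `h` returns probabilities in [0, 1]."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        pred = (h(X_cv) >= t).astype(int)
        _, _, f1 = precision_recall_f1(pred, y_cv)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```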