Contents
  1. Design ML systems, example
  2. F score

The hypothesis may overfit the training set (which usually takes about 70% of the data), so we need a cross-validation set and a test set to evaluate and test the current learning algorithm. The status of the algorithm can be diagnosed with learning curves, which plot the cost on the training and cross-validation sets against the training set size.
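A minimal sketch of this setup, assuming a hypothetical 70/15/15 split and user-supplied `fit` and `cost` callables (both are placeholders, not part of the notes above):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    # Hypothetical 70/15/15 split into training, cross-validation and test sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(0.70 * len(y))
    n_cv = int(0.15 * len(y))
    tr, cv, te = np.split(idx, [n_train, n_train + n_cv])
    return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])

def learning_curve(fit, cost, X_tr, y_tr, X_cv, y_cv):
    # For growing training subsets, record the training cost and the CV cost
    # of the model fitted on that subset (the two learning-curve lines).
    sizes, j_train, j_cv = [], [], []
    for m in range(1, len(y_tr) + 1):
        theta = fit(X_tr[:m], y_tr[:m])
        sizes.append(m)
        j_train.append(cost(theta, X_tr[:m], y_tr[:m]))
        j_cv.append(cost(theta, X_cv, y_cv))
    return sizes, j_train, j_cv
```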

Evaluation of test error:

  • Linear regression: $$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)})-y_{test}^{(i)}\right)^2$$

  • Classification: $$J_{error}(\theta) = \frac{1}{m_{test}}\sum_{i=1}^{m_{test}}err(h_\theta(x_{test}^{(i)}),y_{test}^{(i)})$$

    where $$err(h_\theta(x),y) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5 \text{ and } y=0,\ \text{or } h_\theta(x) < 0.5 \text{ and } y=1 \\ 0 & \text{otherwise} \end{cases}$$ A sketch computing both test errors appears after the table below.

  • Diagnosing bias or variance

|  | High bias | High variance |
| --- | --- | --- |
| Symptom | Under-fitting | Overfitting |
| Costs | $J_{train}(\theta)$ is high; $J_{cv}(\theta)\approx J_{train}(\theta)$ | $J_{train}(\theta)$ is low; $J_{cv}(\theta) \gg J_{train}(\theta)$ |
| Remedies | Decrease $\lambda$; add more features; add polynomial features | Increase $\lambda$; use fewer features; increase training set size |
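As a minimal sketch of the two test-error formulas above, assuming a linear hypothesis $h_\theta(x)=\theta^Tx$ for regression and a hypothesis `h` that returns probabilities for classification (both assumptions, not fixed by the notes):

```python
import numpy as np

def j_test_linear(theta, X_test, y_test):
    # Squared-error test cost: (1 / (2 * m_test)) * sum_i (h_theta(x_i) - y_i)^2
    m = len(y_test)
    residuals = X_test @ theta - y_test
    return (residuals @ residuals) / (2 * m)

def j_error_classification(h, X_test, y_test, threshold=0.5):
    # Misclassification rate: mean of err(h(x), y), where err is 1 when the
    # thresholded prediction disagrees with the label and 0 otherwise.
    predictions = (h(X_test) >= threshold).astype(int)
    return np.mean(predictions != y_test)
```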

Design ML systems, example

Spam email classifier

  • Define the features: choose a spam word list (deal, buy, discount, …), which can be sorted alphabetically
  • Vectorize the email content into a list of tokens and check whether the spam words appear in it
  • The input of a training example is a vector $X=[0,1,0,1…1,0]$ indicating whether each “spam word” appears (see the sketch after this list)
  • Options for improving the classifier:
    • More data
    • Features based on header
    • Spell checking
  • Error analysis
    • Start with a simple, quick-and-dirty algorithm
    • Plot learning curves
    • Error analysis:
      • Manual examination: categorize the misclassified examples and rely on a single numerical evaluation metric
      • Skewed classes: the ratio of positive to negative examples is extreme; use precision/recall instead of plain accuracy
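A minimal sketch of the feature-vector step above, using a toy three-word spam list (a real list would be much longer):

```python
import re

SPAM_WORDS = sorted(["buy", "deal", "discount"])  # toy spam word list, sorted alphabetically

def email_to_feature_vector(email_text):
    # Tokenize the email and mark each spam word as present (1) or absent (0).
    tokens = set(re.findall(r"[a-z]+", email_text.lower()))
    return [1 if word in tokens else 0 for word in SPAM_WORDS]

# Example: X = [1, 0, 1] because "buy" and "discount" appear but "deal" does not.
X = email_to_feature_vector("Buy now and get a 50% discount!")
```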

F score

We need to define a proper threshold value as the cutoff for our hypothesis (for instance, the usual choice of 0.5 in a logistic regression model); predictions with $g(z)$ above the threshold are considered positive. If the threshold is high ($>0.5$), we risk classifying too many actual positives as negative, which gives high precision but low recall. Conversely, if the threshold is small, we predict positive more often, which gives high recall but low precision.

Precision (P) and recall (R): precision is the number of true positives over the number of predicted positives, and recall is the number of true positives over the number of actual positives. To evaluate the current threshold, we use the F score.

$$F_1=\frac{2PR}{(P+R)}$$

Always use the cross-validation set to choose the threshold that gives the best F score, as sketched below.
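A minimal sketch, assuming `scores_cv` are the hypothesis outputs $g(z)$ on the cross-validation set and `y_cv` the labels (the candidate thresholds are illustrative choices, not prescribed by the notes):

```python
import numpy as np

def precision_recall_f1(scores, y, threshold):
    # Treat scores >= threshold as positive predictions.
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def best_threshold(scores_cv, y_cv, candidates=np.linspace(0.1, 0.9, 17)):
    # Pick the threshold with the highest F1 score on the cross-validation set.
    return max(candidates, key=lambda t: precision_recall_f1(scores_cv, y_cv, t)[2])
```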