Coursera Machine Learning: Logistic Regression, 0x02

Source: Internet | Editor: 程序博客网 | Date: 2024/06/10 03:00

Cost Function

Training set:
$\{(x^{(1)}, y^{(1)}),\ (x^{(2)}, y^{(2)}),\ \ldots,\ (x^{(m)}, y^{(m)})\}$

$m$ training examples

$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad x_0 = 1, \quad y \in \{0, 1\}$

$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
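To make the hypothesis concrete, here is a minimal Python sketch (illustrative only; the course itself uses Octave). The feature vector is assumed to already include the bias term $x_0 = 1$:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # Hypothesis h_theta(x) = 1 / (1 + e^(-theta^T x));
    # x is assumed to include the bias term x0 = 1
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)
```

For example, `sigmoid(0)` is exactly 0.5, so with all-zero parameters the hypothesis outputs 0.5 for every input.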

How do we choose the parameters $\theta$ to fit the training set?

Cost function

Linear regression:
$J(\theta) = \dfrac{1}{m}\sum_{i=1}^{m} \dfrac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

$\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \dfrac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Logistic regression:

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

Note: y=0 or 1 always

This is easiest to understand by looking at the function plots: when $y = 1$ the cost blows up as $h_\theta(x) \to 0$, and when $y = 0$ it blows up as $h_\theta(x) \to 1$, so confident wrong predictions are penalized heavily.
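The two branches of the piecewise cost can be sketched directly in Python (illustrative, not from the course materials):

```python
import math

def cost(h_x, y):
    # Per-example logistic cost:
    #   -log(h)      when y = 1 (grows without bound as h -> 0)
    #   -log(1 - h)  when y = 0 (grows without bound as h -> 1)
    if y == 1:
        return -math.log(h_x)
    else:
        return -math.log(1.0 - h_x)
```

Note that `cost(0.99, 1)` is close to zero while `cost(0.01, 1)` is large, which is exactly the penalty behavior the plots show.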

Simplified cost function and gradient descent

$\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$

$J(\theta) = \dfrac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\dfrac{1}{m}\left[\sum_{i=1}^{m} y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right]$
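The averaged cost $J(\theta)$ can be written as a short Python sketch (illustrative; the sigmoid hypothesis is assumed, and `X` is a list of feature vectors each starting with the bias term 1):

```python
import math

def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1 - y)*log(1 - h) )
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, xi))
        h = 1.0 / (1.0 + math.exp(-z))  # sigmoid hypothesis
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m
```

With all-zero parameters every example has $h_\theta(x) = 0.5$, so the cost is $\log 2 \approx 0.693$ regardless of the labels, a handy sanity check.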

To fit the parameters $\theta$:

$\min_\theta J(\theta)$

To make a prediction on a new input $x$:

Output $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$

We want $\min_\theta J(\theta)$:

Repeat {
$\quad \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
}

$\dfrac{\partial}{\partial \theta_j} J(\theta) = \dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
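Putting the update rule and the gradient together, here is a minimal batch gradient-descent sketch in Python (illustrative; the learning rate and iteration count are arbitrary choices, not values from the course):

```python
import math

def gradient_descent(X, y, alpha=0.1, iters=1000):
    # Simultaneously update every theta_j:
    #   theta_j := theta_j - alpha * (1/m) * sum((h(x_i) - y_i) * x_i_j)
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            z = sum(t * xj for t, xj in zip(theta, xi))
            h = 1.0 / (1.0 + math.exp(-z))  # sigmoid hypothesis
            for j in range(n):
                grad[j] += (h - yi) * xi[j]
        # Update all components at once from the same gradient
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta
```

On a trivially separable one-feature dataset (negative examples at $x_1 < 0$, positive at $x_1 > 0$), the learned weight on $x_1$ comes out positive, as expected.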

Advanced Optimization

Optimization algorithm

Gradient descent
Conjugate gradient
BFGS (a variable metric / quasi-Newton method)
L-BFGS (limited-memory BFGS)

Advantages of the last three algorithms:
No need to manually pick the learning rate $\alpha$
They usually converge faster than gradient descent
Disadvantage: they are more complex

Example:
$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$

$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$

$\dfrac{\partial}{\partial \theta_1} J(\theta) = 2(\theta_1 - 5)$

$\dfrac{\partial}{\partial \theta_2} J(\theta) = 2(\theta_2 - 5)$

function [jVal, gradient] = costFunction(theta)
    jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
    gradient = zeros(2, 1);
    gradient(1) = 2*(theta(1) - 5);
    gradient(2) = 2*(theta(2) - 5);
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
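For comparison, the same toy problem can be minimized in Python. A plain fixed-step gradient-descent loop stands in here for `fminunc` (in practice one would reach for a library optimizer; this is just a self-contained sketch):

```python
def cost_function(theta):
    # Toy objective J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2,
    # returned together with its analytic gradient,
    # mirroring the Octave costFunction above
    j_val = (theta[0] - 5) ** 2 + (theta[1] - 5) ** 2
    gradient = [2 * (theta[0] - 5), 2 * (theta[1] - 5)]
    return j_val, gradient

def minimize(theta, alpha=0.1, iters=100):
    # Fixed-step gradient descent standing in for fminunc
    for _ in range(iters):
        _, grad = cost_function(theta)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```

Starting from `[0.0, 0.0]`, each step shrinks the distance to the optimum by a constant factor, so the result converges to approximately `[5.0, 5.0]`, the minimizer found by `fminunc` in the Octave version.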

Multiclass Classification: One-vs-all

One-vs-all(one-vs-rest)
Train one classifier per class: $h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$ for $i = 1, 2, 3$

Given a new input $x$, pick the class $i$ that maximizes:

$\max_i h_\theta^{(i)}(x)$
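The prediction step above can be sketched as follows (Python, illustrative; one already-trained parameter vector per class is assumed):

```python
import math

def predict_one_vs_all(thetas, x):
    # thetas: list of per-class parameter vectors, one classifier per class.
    # Run every classifier on x and return the index of the class
    # whose classifier reports the highest probability.
    def h(theta):
        z = sum(t * xj for t, xj in zip(theta, x))
        return 1.0 / (1.0 + math.exp(-z))
    probs = [h(theta) for theta in thetas]
    return max(range(len(probs)), key=lambda i: probs[i])
```

For instance, with three hypothetical classifiers whose decision boundaries favor small, middling, and large $x_1$ respectively, a large input lands in the last class and a small one in the first.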
