Deming Regression

2018. 4. 27. 11:48


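Deming regression is also known as total regression.

Deming regression minimizes the error in both the y values and the x values.

To implement Deming regression, we have to modify the loss (cost) function, because the cost function of ordinary linear regression minimizes only the vertical distance.

Using the slope and y-intercept of the line, we compute the perpendicular distance from the line to each point and have TensorFlow minimize that value.

Figure: minimizing the vertical distance to the line (left) versus minimizing the total distance to the line (right).

Source: https://github.com/nfmcclure/tensorflow_cookbook/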
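To make the difference between the two distances concrete, here is a minimal sketch in plain Python. It is not part of the original recipe: the helper names `vertical_residual` and `perpendicular_offset` and the sample line and point are assumptions chosen for illustration. The sketch shows that the vertical residual measures error in y only, while the perpendicular offset has both an x and a y component.

```python
import math


def vertical_residual(x0, y0, m, b):
    """Signed vertical gap between the point and the line y = m*x + b."""
    return y0 - (m * x0 + b)


def perpendicular_offset(x0, y0, m, b):
    """Return (dx, dy) from the nearest point on the line to (x0, y0)."""
    # Foot of the perpendicular dropped from (x0, y0) onto y = m*x + b.
    x_foot = (x0 + m * (y0 - b)) / (m * m + 1.0)
    y_foot = m * x_foot + b
    return x0 - x_foot, y0 - y_foot


# Illustrative values (not from the iris data): the line y = x and the point (0, 2).
m, b = 1.0, 0.0
x0, y0 = 0.0, 2.0

dx, dy = perpendicular_offset(x0, y0, m, b)
print(vertical_residual(x0, y0, m, b))  # error measured in y only
print(dx, dy)                           # error split across x and y
print(math.hypot(dx, dy))               # length of the perpendicular offset
```

The perpendicular offset is never longer than the absolute vertical residual, and the two coincide only for a horizontal line (slope 0).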

Let's load the data and create the placeholders as follows. The code uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).

import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np
from sklearn.datasets import load_iris

ops.reset_default_graph()

iris = load_iris()
print(iris.keys())
# dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
print(iris.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

# load the data
x_val = iris.data[:, 3]  # petal width
y_val = iris.data[:, 0]  # sepal length

# initialize placeholders
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

# create variables for linear regression
A = tf.Variable(tf.random_normal(shape=[1, 1]))
b = tf.Variable(tf.random_uniform(shape=[1, 1]))

Given a line $y = mx + b$ and a point $(x_{0}, y_{0})$, the perpendicular distance between them can be written as follows.

$$d=\frac{\left |y_{0}-(mx_{0}+b) \right|}{\sqrt{m^{2}+1}}$$
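As a quick sanity check of the formula, the sketch below evaluates it in plain Python and averages it over a few points, which is how the distance is used as a loss. The function names `point_line_distance` and `mean_perpendicular_distance` and the sample points are assumptions made for this illustration; they do not come from the original recipe.

```python
import math


def point_line_distance(x0, y0, m, b):
    """Perpendicular distance from (x0, y0) to the line y = m*x + b."""
    return abs(y0 - (m * x0 + b)) / math.sqrt(m * m + 1.0)


def mean_perpendicular_distance(xs, ys, m, b):
    """Average of the point-to-line distances over a dataset."""
    distances = [point_line_distance(x, y, m, b) for x, y in zip(xs, ys)]
    return sum(distances) / len(distances)


# Illustrative points, not the iris measurements.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]  # these lie exactly on y = 2x + 1

print(mean_perpendicular_distance(xs, ys, m=2.0, b=1.0))  # points on the line
print(mean_perpendicular_distance(xs, ys, m=2.0, b=0.0))  # line shifted down by 1
```

When the points lie on the line the loss is zero. Shifting the line down by 1 gives every point a vertical residual of 1, and the formula scales that by $1/\sqrt{m^{2}+1}$.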

We will therefore use this formula to rebuild the loss function and implement the linear regression.

with tf.Session() as sess:
    # model output: y = A*x + b
    formula = tf.add(tf.matmul(x_data, A), b)
    demm_numer = tf.abs(tf.subtract(formula, y_target))  # numerator
    demm_denom = tf.sqrt(tf.add(tf.square(A), 1))  # denominator
    loss = tf.reduce_mean(tf.truediv(demm_numer, demm_denom))  # mean distance between the points and the line

    opt = tf.train.GradientDescentOptimizer(learning_rate=0.15)
    train_step = opt.minimize(loss)

    init = tf.global_variables_initializer()
    init.run()

    loss_vec = []
    batch_size = 125
    for i in range(1000):
        # sample a random mini-batch (with replacement)
        rand_idx = np.random.choice(len(x_val), size=batch_size)
        rand_x = x_val[rand_idx].reshape(-1, 1)
        rand_y = y_val[rand_idx].reshape(-1, 1)
        my_dict = {x_data: rand_x, y_target: rand_y}
        sess.run(train_step, feed_dict=my_dict)
        temp_loss = sess.run(loss, feed_dict=my_dict)
        loss_vec.append(temp_loss)
        if (i + 1) % 100 == 0:
            print('step {}: A={}, b={}, Loss={}'.format(i + 1, A.eval(), b.eval(), temp_loss))
    # step 100: A=[[2.8481812]], b=[[2.1150784]], Loss=0.39886653423309326
    # step 200: A=[[2.4716957]], b=[[2.581221]], Loss=0.4149680733680725
    # step 300: A=[[2.0858126]], b=[[3.1767926]], Loss=0.37009572982788086
    # step 400: A=[[1.5102198]], b=[[3.989578]], Loss=0.30516621470451355
    # step 500: A=[[1.0213077]], b=[[4.55735]], Loss=0.25061553716659546
    # step 600: A=[[1.0353084]], b=[[4.609328]], Loss=0.2725234925746918
    # step 700: A=[[1.0107175]], b=[[4.6160936]], Loss=0.3082656264305115
    # step 800: A=[[1.0400845]], b=[[4.612001]], Loss=0.27881959080696106
    # step 900: A=[[1.0318567]], b=[[4.6159105]], Loss=0.27347463369369507
    # step 1000: A=[[0.9662517]], b=[[4.5973287]], Loss=0.2258552461862564

    # read the fitted slope and intercept while the session is still open
    [slope] = A.eval()
    [cept] = b.eval()
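One way to cross-check a gradient-descent fit is the closed-form solution for orthogonal regression, that is, Deming regression under the assumption of equal error variance in x and y. The sketch below is not part of the original recipe: the function name `orthogonal_regression` and the sample points are made up for illustration. Note that the closed form minimizes the sum of squared perpendicular distances, whereas the loss above averages the unsquared distances over random mini-batches, so the two fits need not agree exactly.

```python
import math


def orthogonal_regression(xs, ys):
    """Closed-form slope and intercept minimizing squared perpendicular distances.

    Assumes equal error variance in x and y and a non-zero covariance.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs) / n
    s_yy = sum((y - y_mean) ** 2 for y in ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    if s_xy == 0:
        raise ValueError("covariance is zero; the slope is not uniquely defined")
    slope = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Illustrative noisy points scattered around y = 2x + 1 (made-up values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
slope_cf, intercept_cf = orthogonal_regression(xs, ys)
print(slope_cf, intercept_cf)
```

To compare against the TensorFlow fit, the iris columns could be passed in as `orthogonal_regression(x_val.tolist(), y_val.tolist())`.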

The code to visualize the results above is as follows.

import matplotlib.pyplot as plt

# evaluate the fitted line at every x value
best_fit = []
for i in x_val.ravel():
    poly = i * slope[0] + cept[0]
    best_fit.append(poly)

_, axes = plt.subplots(1, 2)

# left panel: data points and fitted line
axes[0].scatter(x_val, y_val, edgecolors='k', label='Data Points')
axes[0].plot(x_val, best_fit, c='red', label='Best fit line')
axes[0].set_title('Petal Width vs Sepal Length', size=12)
axes[0].set_xlabel('Petal Width')
axes[0].set_ylabel('Sepal Length')
axes[0].legend(loc=2)

# right panel: loss recorded at each training step
axes[1].plot(loss_vec, c='k')
axes[1].set_title('Deming Loss per Generation', size=12)
axes[1].set_xlabel('Iteration')
axes[1].set_ylabel('Deming Loss')

plt.show()

Figure: Deming regression fit and loss function

References:

[1] TensorFlow Machine Learning Cookbook, Nick McClure

[2] https://github.com/nfmcclure/tensorflow_cookbook


게으른 우루루
