[CV] Introduction to AI (4) Linear Regression & Polynomial Regression

인공지능 요모조모


dvl.hyeon_ 2025. 2. 27. 17:49

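Linear Regression

▪️Models a linear relationship between an independent variable and a dependent variable

▪️That is, the dependent variable changes linearly as the independent variable changes

▪️The model is a linear (first-degree) function, i.e., a straight line

$$y=ax+b$$

✅x is the input variable and can take any value, but the slope a and the intercept b are unknown

✅Once the slope and intercept are known (estimated from data), plugging in any x gives the predicted y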
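As a minimal sketch of how a and b can be estimated from data, here is the ordinary least-squares closed form for a single input (the MSE-based gradient descent used later in this post approaches the same solution iteratively). The sample data is hypothetical, generated exactly from y = 2x + 1, and `fit_line` is an illustrative helper, not a library function.

```python
import numpy as np

def fit_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Hypothetical helper: ordinary least squares for y = a*x + b (closed form)."""
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x)
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line passes through the point (mean of x, mean of y)
    b = y_mean - a * x_mean
    return float(a), float(b)

# Hypothetical sample data generated exactly from y = 2x + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 2 * x + 1

a, b = fit_line(x, y)
print(a, b)          # → 2.0 1.0
print(a * 3.0 + b)   # prediction for x = 3 → 7.0
```

With a and b in hand, prediction is exactly the "plug in x" step described above.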

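Polynomial Regression

▪️Similar to linear regression, but models the relationship between the independent and dependent variables as a polynomial

▪️Key property: it can capture some nonlinear relationships without using a nonlinear model, because the model is still linear in its coefficients

▪️That is, the dependent variable can change nonlinearly as the independent variable changes

$$y=a_nx^n+\cdots+a_2x^2+a_1x+b$$

✅Useful when the data follows a nonlinear pattern, i.e., one that is not a straight line

✅Higher degrees make overfitting more likely, so choosing an appropriate polynomial degree is important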
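To see why the degree has to be chosen carefully, here is a minimal sketch on hypothetical data (a noisy quadratic; the data, the degree range, and the every-third-point validation split are all illustrative assumptions). Each higher-degree polynomial contains the lower-degree ones, so the training error can only stay the same or go down as the degree grows. Training error alone therefore cannot pick the degree; held-out (validation) error is the usual guide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic trend plus Gaussian noise
x = np.linspace(-3, 3, 30)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=1.0, size=x.shape)

# Simple split: every third point is held out for validation
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

train_mse, val_mse = [], []
for degree in range(1, 9):
    coefs = np.polyfit(x_tr, y_tr, degree)  # least-squares fit of the given degree
    train_mse.append(np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2))
    val_mse.append(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
    print(f"degree {degree}: train MSE {train_mse[-1]:.3f}, val MSE {val_mse[-1]:.3f}")
```

The train column never increases; the degree to keep is the one where the validation column is lowest.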

1️⃣Single-feature polynomial regression

▪️When there is one feature

2️⃣Multiple (multi-feature) polynomial regression

▪️When there are multiple features
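With several features, the polynomial terms also include products of features (interaction terms). A minimal sketch, assuming degree 2 and two features: `poly_features` is a hypothetical helper written for illustration (scikit-learn's `PolynomialFeatures` implements the same idea as a library class), and the target is noiseless made-up data. After the expansion, fitting is ordinary linear least squares, which is why polynomial regression is still linear in its coefficients. The single-feature case is the same code with d = 1.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X: np.ndarray, degree: int) -> np.ndarray:
    """Hypothetical helper: expand (N, d) features into all monomials up to `degree`."""
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # constant (bias) column
    for deg in range(1, degree + 1):
        # for d=2, deg=2: (0,0) -> x1^2, (0,1) -> x1*x2, (1,1) -> x2^2
        for idx in combinations_with_replacement(range(n_features), deg):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(50, 2))
# Hypothetical target: y = 1 + 2*x1 - x2 + 0.5*x1*x2 + x2^2 (no noise)
y = 1 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + X[:, 1] ** 2

Phi = poly_features(X, degree=2)  # columns: 1, x1, x2, x1^2, x1*x2, x2^2
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(coef, 6))  # approximately [1, 2, -1, 0, 0.5, 1]
```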

Exercise 1. Simple Linear Regression

▪️A linear function with 1 input & 1 output

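import numpy as np
import torch
import torch.nn as nn

# Fix the random seed
torch.manual_seed(123)

# Define a linear function with 1 input and 1 output
l1 = nn.Linear(1, 1)

# Set initial values: weight = 2.0, bias = 1.0 (so l1(x) = 2x + 1)
nn.init.constant_(l1.weight, 2.0)
nn.init.constant_(l1.bias, 1.0)

''' Create test data '''
# Define x_np as a NumPy array: [-2., -1., 0., 1., 2.]
x_np = np.arange(-2, 2.1, 1)  # float64

# Convert to a tensor; torch.tensor keeps float64, so cast to float32
# to match the dtype of nn.Linear's parameters
x = torch.tensor(x_np).float()  # float32

# Reshape to (N, 1)
x = x.view(-1, 1)

# Check the result
print(x)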
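The block above prepares x but never feeds it to l1. Since the weight was fixed to 2.0 and the bias to 1.0, the layer should compute y = 2x + 1 for every row. A small self-contained check, re-creating the same layer:

```python
import numpy as np
import torch
import torch.nn as nn

l1 = nn.Linear(1, 1)
nn.init.constant_(l1.weight, 2.0)
nn.init.constant_(l1.bias, 1.0)

x = torch.tensor(np.arange(-2, 2.1, 1)).float().view(-1, 1)  # shape (5, 1)

# Evaluation only, so gradient tracking is turned off
with torch.no_grad():
    y = l1(x)

print(y.view(-1))  # values -3, -1, 1, 3, 5 (i.e., 2x + 1)
```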

▪️Inspecting the linear function

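# (1) Print the layer: Linear(in_features=1, out_features=1, bias=True)
print(l1)

# (2) List all parameters (weight and bias)
print(list(l1.parameters()))

# (3) Name, value, and shape of the first parameter (the weight)
name, tensor = list(l1.named_parameters())[0]
print(name, tensor[0], tensor[0].shape)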

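▪️numpy.arange(start, stop, step)

Returns an array of evenly spaced values, step apart, over the half-open interval [start, stop). Because stop itself is excluded, the code above passes stop=2.1 so that 2 is included.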
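A quick comparison of the two stop values, plus np.linspace, which includes its endpoint by default and is often the safer choice when the step is not an integer:

```python
import numpy as np

a = np.arange(-2, 2, 1)     # stop=2 is excluded
b = np.arange(-2, 2.1, 1)   # stop=2.1 is excluded, so 2.0 is the last value
c = np.linspace(-2, 2, 5)   # 5 evenly spaced values, endpoint included

print(a)  # [-2 -1  0  1]
print(b)  # [-2. -1.  0.  1.  2.]
print(c)  # [-2. -1.  0.  1.  2.]
```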

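▪️A linear function with 2 inputs & 1 output

# Define a linear function with 2 inputs and 1 output
l2 = nn.Linear(2, 1)

# Set initial values: both weights = 1.0, bias = 2.0
nn.init.constant_(l2.weight, 1.0)
nn.init.constant_(l2.bias, 2.0)

# 2D NumPy array: 4 samples with 2 features each
x2_np = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Convert to a float32 tensor (the int64 array would not match nn.Linear's dtype)
x2 = torch.tensor(x2_np).float()

# Compute the function values: each row (a, b) maps to a + b + 2 -> [[2.], [3.], [3.], [4.]]
y2 = l2(x2)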
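Internally, nn.Linear(in_features, out_features) stores a weight of shape (out_features, in_features) and computes y = xWᵀ + b. A self-contained check that reproduces the 2-input, 1-output layer above by hand; the same reasoning applies to the 2-input, 3-output layer, whose weight has shape (3, 2):

```python
import numpy as np
import torch
import torch.nn as nn

l2 = nn.Linear(2, 1)
nn.init.constant_(l2.weight, 1.0)
nn.init.constant_(l2.bias, 2.0)

x2 = torch.tensor(np.array([[0, 0], [0, 1], [1, 0], [1, 1]])).float()

with torch.no_grad():
    y_layer = l2(x2)                       # what the layer computes
    y_manual = x2 @ l2.weight.T + l2.bias  # y = x W^T + b, written out

print(l2.weight.shape)            # torch.Size([1, 2])
print(y_layer.view(-1).tolist())  # [2.0, 3.0, 3.0, 4.0]
```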

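▪️A linear function with 2 inputs & 3 outputs

# Define a linear function with 2 inputs and 3 outputs
l3 = nn.Linear(2, 3)

# Set initial values: row k of the (3, 2) weight matrix is filled with k + 1
nn.init.constant_(l3.weight[0, :], 1.0)
nn.init.constant_(l3.weight[1, :], 2.0)
nn.init.constant_(l3.weight[2, :], 3.0)
nn.init.constant_(l3.bias, 2.0)

# Compute the function values (reusing x2 from above):
# -> [[2., 2., 2.], [3., 4., 5.], [3., 4., 5.], [4., 6., 8.]]
y3 = l3(x2)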

▪️Defining the model as a class

class Net(nn.Module):
    def __init__(self, n_input, n_output):
        super().__init__()
        self.l1 = nn.Linear(n_input, n_output)

    def forward(self, x):
        x1 = self.l1(x)  # linear regression
        return x1


n_input, n_output = 1, 1
net = Net(n_input, n_output)

# Random dummy inputs and labels, 100 samples each
inputs = torch.rand(100, 1)
labels = torch.rand(100, 1)

# Predictions
outputs = net(inputs)
print("outputs = \n", outputs)

''' Loss function using the MSELoss class '''
criterion = nn.MSELoss()

loss = criterion(outputs, labels)
loss.backward()  # computes gradients of the loss w.r.t. the parameters
print(net.l1)
# Gradients filled in by backward()
print(net.l1.weight.grad, net.l1.bias.grad)
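nn.MSELoss() with its default reduction='mean' averages the squared differences between outputs and labels, and loss.backward() stores the gradient of that loss in each parameter's .grad. A self-contained sketch that recomputes both by hand for a 1-input, 1-output layer on random data. For pred = w·x + b, the gradients are 2·mean((pred − y)·x) for w and 2·mean(pred − y) for b.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(1, 1)
inputs = torch.rand(100, 1)
labels = torch.rand(100, 1)

outputs = layer(inputs)
loss = nn.MSELoss()(outputs, labels)
loss.backward()  # fills layer.weight.grad and layer.bias.grad

# The same quantities computed by hand
with torch.no_grad():
    err = outputs - labels
    manual_loss = (err ** 2).mean()
    grad_w = 2 * (err * inputs).mean()
    grad_b = 2 * err.mean()

print(loss.item(), manual_loss.item())
print(layer.weight.grad.item(), grad_w.item())
print(layer.bias.grad.item(), grad_b.item())
```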

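Exercise 2. Regression Analysis with the Boston Dataset

📌Boston Dataset

: The dataset used to be loadable from the scikit-learn library (load_boston), but it was deprecated and then removed in scikit-learn 1.2, so we download it directly from the original web URL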

Variables, in order:

▪️CRIM: per-capita crime rate by town

▪️ZN: proportion of residential land zoned for lots over 25,000 sq. ft.

▪️INDUS: proportion of non-retail business acres per town

▪️CHAS: Charles River dummy variable (= 1 if the tract bounds the river; 0 otherwise)

▪️NOX: nitric oxides concentration (parts per 10 million)

▪️RM: average number of rooms per dwelling

▪️AGE: proportion of owner-occupied units built prior to 1940

▪️DIS: weighted distances to five Boston employment centres

▪️RAD: index of accessibility to radial highways

▪️TAX: full-value property-tax rate per $10,000

▪️PTRATIO: pupil-teacher ratio by town

▪️B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town

▪️LSTAT: % lower status of the population

▪️MEDV: median value of owner-occupied homes in $1000s (the regression target)

1️⃣Data preparation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+",
                     skiprows=22, header=None)
print(raw_df.head(10))

# Each record spans two lines in the file:
#   even rows (0, 2, ...): all 11 columns -> CRIM ... PTRATIO
#   odd rows  (1, 3, ...): first 2 columns -> B, LSTAT
x_org = np.hstack([raw_df.values[::2, :],
                   raw_df.values[1::2, :2]])

yt = raw_df.values[1::2, 2]  # target: MEDV (3rd column of the odd rows)
feature_names = np.array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX',
                          'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT'])
print(x_org[:5])
print(feature_names == 'RM')  # boolean mask selecting the RM column

# Extract data (RM column only)
x = x_org[:, feature_names == 'RM']
print('After extraction', x.shape)
print(x[:5, :])

# Show the target data
print('Target data')
print(yt[:5])

# Scatter plot
plt.scatter(x, yt, s=10, c='b')
plt.xlabel('Room counts')
plt.ylabel('Price')
plt.title('Scatter plot: Room counts vs. Price')
plt.show()
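In the raw file each house occupies two lines: the first holds 11 values (CRIM through PTRATIO) and the second holds 3 (B, LSTAT, MEDV), so pandas pads the second line with NaN. A toy sketch of the same slicing, using a small made-up array (not real Boston rows) in place of raw_df.values:

```python
import numpy as np

nan = np.nan
# Toy stand-in for raw_df.values: 2 records x 2 lines each, 11 columns.
# Line 1 of a record: 11 features; line 2: B, LSTAT, MEDV + NaN padding.
raw = np.array([
    [float(i) for i in range(1, 12)],
    [12.0, 13.0, 100.0] + [nan] * 8,
    [float(i) for i in range(21, 32)],
    [32.0, 33.0, 200.0] + [nan] * 8,
])

x_org = np.hstack([raw[::2, :],      # even rows: the first 11 features
                   raw[1::2, :2]])   # odd rows: B and LSTAT
yt = raw[1::2, 2]                    # odd rows, 3rd column: MEDV (target)

print(x_org.shape)  # (2, 13)
print(yt)           # [100. 200.]
```

The NaN padding never reaches x_org or yt, because only the first two (or the third) columns of the odd rows are read.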

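2️⃣Defining the regression model

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
# from torchinfo import summary  # optional: only needed for the summary() call below

n_input = x.shape[1]  # number of input dimensions (1: RM only)
n_output = 1          # number of output dimensions


# Class definition for the prediction model
class Net(nn.Module):
    def __init__(self, n_input, n_output):
        super().__init__()
        self.l1 = nn.Linear(n_input, n_output)

    def forward(self, x):
        x1 = self.l1(x)  # linear regression
        return x1


# Create an instance
net = Net(n_input, n_output)

# Inspect the model parameters
for parameter in net.named_parameters():
    print(f'Variable name: {parameter[0]}')
    print(f'Variable value: {parameter[1].data}')

# summary(net, (1,), device='cpu')


''' Hyperparameter settings '''
criterion = nn.MSELoss()  # loss function: mean squared error
lr = 0.01                 # learning rate
# optimizer: SGD; since all samples are passed at once, each step is full-batch gradient descent
optimizer = optim.SGD(net.parameters(), lr=lr)
num_epochs = 50000        # number of epochs (assumed value; not specified in the original)

inputs = torch.tensor(x, dtype=torch.float32)

labels = torch.tensor(yt, dtype=torch.float32)
_labels = labels.view((-1, 1))  # reshape to (N, 1) to match the outputs

# Records [epoch, loss] rows; starts empty
history = np.zeros((0, 2))

for epoch in range(num_epochs):
    optimizer.zero_grad()   # reset gradients
    outputs = net(inputs)   # compute predictions

    loss = criterion(outputs, _labels)  # compute loss

    loss.backward()         # compute gradients
    optimizer.step()        # update parameters

    # Record progress every 100 epochs
    if epoch % 100 == 0:
        history = np.vstack((history, np.array([epoch, loss.item()])))
        print(f'Epoch {epoch} loss: {loss.item():.5f}')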
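Each pass of the loop above is one step of gradient descent on the MSE. As a minimal NumPy sketch of what loss.backward() and optimizer.step() compute for y = ax + b, here is the same update written out by hand. The synthetic data, learning rate, and step count are assumptions chosen for illustration, not the Boston settings. The result is compared against the closed-form least-squares line from np.polyfit, which returns [slope, intercept].

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data scattered around y = 3x - 1
x = rng.uniform(0, 2, size=200)
y = 3 * x - 1 + rng.normal(scale=0.1, size=x.shape)

a, b = 0.0, 0.0  # initial parameters
lr = 0.1         # learning rate
for _ in range(5000):
    err = (a * x + b) - y
    # gradients of MSE = mean(err^2) with respect to a and b
    grad_a = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    # plain gradient-descent update (what optimizer.step() does for SGD without momentum)
    a -= lr * grad_a
    b -= lr * grad_b

a_ls, b_ls = np.polyfit(x, y, 1)  # closed-form least squares
print(f"gradient descent: a={a:.4f}, b={b:.4f}")
print(f"least squares:    a={a_ls:.4f}, b={b_ls:.4f}")
```

Both lines agree once the loop has converged. This is also why a longer run or a larger (still stable) learning rate matters when the loss curve has not flattened out.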

3️⃣Plotting the learning curve (loss)

# Skip the first record (epoch 0)
plt.plot(history[1:, 0], history[1:, 1], 'b')
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.title('Learning curve (loss)')
plt.show()

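4️⃣Computing the regression line

# Min and max of x: two points are enough to draw a straight line
xse = np.array((x.min(), x.max())).reshape(-1, 1)
Xse = torch.tensor(xse).float()

# Prediction only, so gradient tracking is disabled
with torch.no_grad():
    Yse = net(Xse)

print(Yse.numpy())

# Scatter plot with the regression line
plt.scatter(x, yt, s=10, c='b')
plt.xlabel('Room counts')
plt.ylabel('Price')
plt.plot(Xse.numpy(), Yse.numpy(), c='k')
plt.title('Scatter plot and regression line')
plt.show()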

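▪️Visualizing the computation graph of the loss

# Requires the torchviz package and the Graphviz system package
from torchviz import make_dot
from IPython.display import display  # display() is available by default inside Jupyter

# Backward graph from the loss to the named model parameters
g = make_dot(loss, params=dict(net.named_parameters()))
display(g)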
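make_dot draws the backward graph that autograd records from the loss back to the parameters. Without torchviz or Graphviz, the same graph can be inspected through a tensor's grad_fn and each node's next_functions. A minimal sketch: `collect_nodes` is a hypothetical helper, and exact node names such as AddmmBackward0 may differ across PyTorch versions.

```python
import torch
import torch.nn as nn

def collect_nodes(fn, depth=0, seen=None):
    """Hypothetical helper: depth-first walk over the autograd graph, printing node names."""
    if seen is None:
        seen = []
    if fn is None:  # inputs that need no gradient (e.g. the labels) appear as None
        return seen
    name = type(fn).__name__
    seen.append(name)
    print("  " * depth + name)
    for next_fn, _ in fn.next_functions:
        collect_nodes(next_fn, depth + 1, seen)
    return seen

net = nn.Linear(1, 1)
loss = nn.MSELoss()(net(torch.rand(10, 1)), torch.rand(10, 1))
names = collect_nodes(loss.grad_fn)
```

The root is the MSE loss node, and each trainable parameter (weight and bias) ends in an AccumulateGrad leaf. These are the boxes make_dot labels with the parameter names.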
