Fundamentals and Implementation of Machine Learning


Introduction

Machine learning is a technology that automatically learns patterns from data to perform predictions and classifications. This article introduces fundamental machine learning algorithms and provides implementation examples.

Linear Regression

Linear regression is a foundational supervised learning method used for predicting continuous values.

Mathematical Foundation

A linear regression model is expressed by the following equation:

y^=w0+w1x1+w2x2++wnxn\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n

Or, in vector form:

y^=wTx\hat{y} = \mathbf{w}^T \mathbf{x}

The loss function based on the ordinary least squares method is:

J(w)=12mi=1m(y^(i)y(i))2J(\mathbf{w}) = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)})^2

where mm is the number of samples.

Implementation in Python

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# データの生成
np.random.seed(42)
X = np.random.randn(100, 3)
true_weights = np.array([2.0, -3.0, 1.5])
y = X @ true_weights + np.random.randn(100) * 0.1

# モデルの構築と訓練
model = LinearRegression()
model.fit(X, y)

# 予測
y_pred = model.predict(X)

# 評価
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)

print(f"MSE: {mse:.4f}")
print(f"R² Score: {r2:.4f}")
print(f"推定されたウェイト: {model.coef_}")

Classification with Logistic Regression

Logistic regression is used for binary classification problems.

Sigmoid Function

Logistic regression applies the sigmoid function to a linear combination to obtain a probability:

P(y=1x)=11+ewTxP(y=1|\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T\mathbf{x}}}

The cross-entropy loss is:

L(w)=1mi=1m[y(i)log(p^(i))+(1y(i))log(1p^(i))]L(\mathbf{w}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(\hat{p}^{(i)}) + (1-y^{(i)})\log(1-\hat{p}^{(i)})\right]

Neural Network Implementation in JavaScript

class NeuralNetwork {
    constructor(inputSize, hiddenSize, outputSize) {
        this.inputSize = inputSize;
        this.hiddenSize = hiddenSize;
        this.outputSize = outputSize;
        
        // ウェイトの初期化
        this.W1 = this.randomMatrix(inputSize, hiddenSize);
        this.b1 = new Array(hiddenSize).fill(0);
        this.W2 = this.randomMatrix(hiddenSize, outputSize);
        this.b2 = new Array(outputSize).fill(0);
    }
    
    randomMatrix(rows, cols) {
        return Array(rows).fill(0).map(() => 
            Array(cols).fill(0).map(() => Math.random() - 0.5)
        );
    }
    
    sigmoid(x) {
        return 1 / (1 + Math.exp(-x));
    }
    
    relu(x) {
        return Math.max(0, x);
    }
    
    forward(input) {
        // 隠れ層
        this.z1 = this.matmul(input, this.W1);
        this.a1 = this.z1.map(x => x + this.b1[0]).map(x => this.relu(x));
        
        // 出力層
        this.z2 = this.matmul(this.a1, this.W2);
        this.a2 = this.z2.map((x, i) => this.sigmoid(x + this.b2[i]));
        
        return this.a2;
    }
    
    matmul(a, b) {
        // 簡略化された行列積(実装)
        return Array(b[0].length).fill(0).map((_, j) => 
            a.reduce((sum, _, i) => sum + a[i] * b[i][j], 0)
        );
    }
}

// 使用例
const nn = new NeuralNetwork(2, 4, 1);
const input = [1.0, 0.5];
const output = nn.forward(input);
console.log("予測出力:", output);

Gradient Descent

The fundamental algorithm for updating parameters is Gradient Descent:

w:=wαJ(w)\mathbf{w} := \mathbf{w} - \alpha \nabla J(\mathbf{w})

where α\alpha is the learning rate and J\nabla J is the gradient.

Introducing Momentum

To accelerate the convergence of gradient descent, we can use momentum:

v:=βvαJ(w)\mathbf{v} := \beta \mathbf{v} - \alpha \nabla J(\mathbf{w}) w:=w+v\mathbf{w} := \mathbf{w} + \mathbf{v}

Typically, a value of β=0.9\beta = 0.9 is used.

Regularization

To prevent overfitting, we add a regularization term:

J(w)=1mi=1mL(y(i),y^(i))+λw2J(\mathbf{w}) = \frac{1}{m}\sum_{i=1}^{m}L(y^{(i)}, \hat{y}^{(i)}) + \lambda \|\mathbf{w}\|^2

  • L2 Regularization (Ridge): λw22\lambda \|\mathbf{w}\|_2^2
  • L1 Regularization (Lasso): λw1\lambda \|\mathbf{w}\|_1

Implementation Best Practices

Important considerations when implementing machine learning models:

  1. Feature Scaling: Standardize or normalize input values.
  2. Class Balance: Implement strategies for imbalanced datasets.
  3. Hyperparameter Tuning: Utilize grid search or Bayesian optimization.
  4. Cross-Validation: Improve the accuracy of model evaluation.
  5. Detecting Overfitting: Monitor the gap between training error and test error.

Conclusion

Machine learning is a fusion of statistics and optimization. Understanding basic algorithms allows you to advance to training more complex models. When implementing, it is always crucial to prioritize data preprocessing and robust model evaluation.