
Implementing Logistic Regression in Python
Logistic regression is one of the most widely used classification algorithms in machine learning. Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an observation belongs to a particular category. It’s especially useful for binary classification tasks, such as predicting whether an email is spam or whether a customer will make a purchase.
Today, we’ll build a logistic regression model from scratch using Python and NumPy. By the end of this article, you'll have a clear understanding of how logistic regression works under the hood and how to implement it yourself.
The Math Behind Logistic Regression
At the heart of logistic regression is the logistic function, also known as the sigmoid function. This S-shaped curve maps any real-valued number into a value between 0 and 1. The formula for the sigmoid function is:
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
Here, \( z \) is the linear combination of input features and weights, i.e., \( z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n \).
The goal of logistic regression is to find the best parameters (weights) that minimize the cost function. For logistic regression, we use the log loss (or binary cross-entropy) cost function:
$$ J(w) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})] $$
Where \( m \) is the number of training examples, \( y^{(i)} \) is the actual label, and \( \hat{y}^{(i)} \) is the predicted probability.
To minimize this cost function, we use gradient descent. The gradients for the weights are computed as:
$$ \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) x_j^{(i)} $$
This is similar to linear regression, but note that \( \hat{y}^{(i)} \) is the output of the sigmoid function here.
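It also helps to note the vectorized form of these gradients (treating the intercept \( w_0 \) as a separate bias term \( b \), as our code will do), since this is exactly what the NumPy implementation computes:

$$ \nabla_w J = \frac{1}{m} X^\top (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) $$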
Let’s see how these formulas translate into code.
Implementing the Sigmoid Function
First, we'll implement the sigmoid function. This will be used to convert our linear predictions into probabilities.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
Test it with a few values to make sure it works:
```python
print(sigmoid(0))    # Output: 0.5
print(sigmoid(10))   # Output: ~0.99995
print(sigmoid(-10))  # Output: ~0.00005
```
As expected, an input of 0 gives exactly 0.5, large positive inputs approach 1, and large negative inputs approach 0.
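One practical caveat: for very large negative inputs, `np.exp(-z)` overflows and NumPy emits a RuntimeWarning (the result is still correct, since \( 1 / (1 + \infty) \) evaluates to 0). If you see such warnings later when training on unscaled data, a clipped variant like the sketch below avoids them; the name `sigmoid_stable` is just illustrative, not part of the rest of the code.

```python
def sigmoid_stable(z):
    # Clip z so np.exp never overflows; exp(500) is still representable in float64
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))
```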
Building the Logistic Regression Model
Now, let's define our logistic regression class. We'll include methods for initializing parameters, fitting the model, making predictions, and computing accuracy.
```python
class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iters=1000):
        self.lr = learning_rate
        self.n_iters = n_iters
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            # Forward pass: linear combination followed by the sigmoid
            linear_model = np.dot(X, self.weights) + self.bias
            y_pred = sigmoid(linear_model)

            # Gradients of the log loss with respect to weights and bias
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)

            # Gradient descent update
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        linear_model = np.dot(X, self.weights) + self.bias
        y_pred = sigmoid(linear_model)
        # Threshold the probabilities at 0.5 to get class labels
        class_pred = [1 if i > 0.5 else 0 for i in y_pred]
        return class_pred

    def accuracy(self, y_true, y_pred):
        return np.sum(y_true == y_pred) / len(y_true)
```
In the `fit` method, we initialize the weights and bias to zero. Then, for each iteration, we compute the linear model, apply the sigmoid function to get probabilities, and update the weights and bias using gradient descent.
The `predict` method returns the class labels (0 or 1) based on a threshold of 0.5.
Testing Our Model
Let’s test our implementation on a simple dataset. We'll use the breast cancer dataset from scikit-learn, which is a classic binary classification dataset.
```python
from sklearn.model_selection import train_test_split
from sklearn import datasets

bc = datasets.load_breast_cancer()
X, y = bc.data, bc.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(learning_rate=0.0001, n_iters=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("Accuracy:", model.accuracy(y_test, predictions))
```
You should see an accuracy around 90-95%, which is pretty good for a simple model!
Note that we used a small learning rate (0.0001) because the features in this dataset have different scales. In practice, it's a good idea to standardize your features before training.
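To illustrate that point, here is a quick sketch (not part of the original walkthrough) that standardizes the features first and retrains the same from-scratch model with a larger learning rate; the variable names and the rate of 0.1 are illustrative choices, not the only reasonable ones.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_std = scaler.transform(X_test)        # apply the same scaling to the test set

std_model = LogisticRegression(learning_rate=0.1, n_iters=1000)
std_model.fit(X_train_std, y_train)
std_predictions = std_model.predict(X_test_std)
print("Accuracy (standardized features):", std_model.accuracy(y_test, std_predictions))
```

With standardized features, gradient descent typically converges faster, and the accuracy should be comparable to or better than the unscaled run.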
Comparison with Scikit-Learn
Let's compare our implementation with scikit-learn's logistic regression to see how we did.
```python
from sklearn.linear_model import LogisticRegression as SKLogisticRegression
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

sk_model = SKLogisticRegression()
sk_model.fit(X_train_scaled, y_train)
sk_predictions = sk_model.predict(X_test_scaled)

sk_accuracy = np.sum(y_test == sk_predictions) / len(y_test)
print("Scikit-learn Accuracy:", sk_accuracy)
```
Scikit-learn's implementation might perform slightly better because it includes regularization by default and uses more advanced optimization algorithms. But our from-scratch version holds up quite well!
Key Hyperparameters
When training logistic regression, there are a few hyperparameters you should be aware of:
- Learning rate: Controls the step size during gradient descent. Too high, and you might overshoot the minimum; too low, and training will be slow.
- Number of iterations: How many times we update the weights. Too few, and the model might not converge; too many, and we waste computation.
- Regularization: Helps prevent overfitting. We didn't implement it here, but scikit-learn includes L1 and L2 regularization.
| Hyperparameter | Typical Values | Effect |
|---|---|---|
| Learning rate | 0.001 to 0.1 | Controls convergence speed and stability |
| Iterations | 1000 to 10000 | Determines how long training runs |
| Regularization | L1 or L2 | Reduces overfitting |
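If you want to see the learning rate's effect directly, a small sweep like the sketch below (reusing the breast cancer split from earlier) works; the specific values are just examples, and larger rates on the unscaled features may trigger overflow warnings in the sigmoid.

```python
# Illustrative sweep over a few learning rates on the unscaled breast cancer split
for lr in (0.00001, 0.0001, 0.001):
    m = LogisticRegression(learning_rate=lr, n_iters=1000)
    m.fit(X_train, y_train)
    acc = m.accuracy(y_test, m.predict(X_test))
    print(f"learning_rate={lr}: accuracy={acc:.3f}")
```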
When to Use Logistic Regression
Logistic regression is a great choice when:
- You need a simple, interpretable model.
- Your problem is binary classification.
- You want to understand the impact of each feature (thanks to the coefficients).
However, it might not perform well if:
- The relationship between features and target is highly non-linear.
- There are complex interactions between features.
In those cases, you might want to try more advanced algorithms like decision trees or neural networks.
Improving Our Implementation
Our basic implementation works, but there's room for improvement. Here are a few ideas:
- Add regularization: This helps prevent overfitting. For L2 regularization, the cost function becomes:

  $$ J(w) = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})] + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 $$

  And the gradient for the weights becomes:

  $$ \frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} w_j $$

- Add support for multiclass classification: Use strategies like one-vs-rest (OvR) or softmax regression.
- Add early stopping: Stop training if the cost doesn't improve for a certain number of iterations (a minimal sketch appears after the regularized model below).
- Add momentum to gradient descent: This can speed up convergence.
Let's implement L2 regularization in our model:
```python
class LogisticRegressionWithRegularization:
    def __init__(self, learning_rate=0.01, n_iters=1000, lambda_param=0.01):
        self.lr = learning_rate
        self.n_iters = n_iters
        self.lambda_param = lambda_param
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.n_iters):
            linear_model = np.dot(X, self.weights) + self.bias
            y_pred = sigmoid(linear_model)

            # Weight gradient now includes the L2 penalty term (lambda / m) * w
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y)) + (self.lambda_param / n_samples) * self.weights
            db = (1 / n_samples) * np.sum(y_pred - y)

            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    # predict and accuracy methods remain the same as in LogisticRegression
```
Notice that we added a `lambda_param` parameter and modified the weight gradient to include the regularization term.
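To give one of the other ideas some shape, here is a minimal sketch of early stopping. It assumes we track the training log loss each iteration and stop once it hasn't improved for a while; the function name, `patience`, and `tol` are illustrative and not part of the classes above. The model object is used only as a container for the hyperparameters and the learned parameters.

```python
def fit_with_early_stopping(model, X, y, patience=20, tol=1e-6):
    """Gradient descent loop that stops when the training log loss stops improving."""
    n_samples, n_features = X.shape
    model.weights = np.zeros(n_features)
    model.bias = 0
    best_cost, wait = np.inf, 0

    for _ in range(model.n_iters):
        y_pred = sigmoid(np.dot(X, model.weights) + model.bias)

        # Log loss; a tiny epsilon keeps log() away from zero
        eps = 1e-15
        cost = -np.mean(y * np.log(y_pred + eps) + (1 - y) * np.log(1 - y_pred + eps))
        if cost < best_cost - tol:
            best_cost, wait = cost, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` iterations

        dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
        db = (1 / n_samples) * np.sum(y_pred - y)
        model.weights -= model.lr * dw
        model.bias -= model.lr * db
```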
Common Pitfalls and How to Avoid Them
When working with logistic regression, watch out for these common issues:
- Overfitting: If your model performs well on training data but poorly on test data, you might be overfitting. Try using regularization or getting more data.
- Underfitting: If performance is poor on both training and test data, your model might be too simple. Try adding more features or decreasing regularization.
- Feature scaling: Logistic regression benefits from scaled features. Always standardize or normalize your data for better performance.
- Multicollinearity: Highly correlated features can make the model unstable (a quick check is sketched below). Consider removing correlated features or using regularization.
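As a quick illustration of the last point, you can inspect pairwise feature correlations on the training data before fitting; the 0.95 threshold below is just an example.

```python
# Pairwise feature correlations on the training data (features are columns)
corr = np.corrcoef(X_train, rowvar=False)

# Feature pairs with |correlation| above 0.95; the upper triangle (k=1) reports
# each pair once and excludes the diagonal
high_corr_pairs = np.argwhere(np.triu(np.abs(corr) > 0.95, k=1))
print("Highly correlated feature pairs:", high_corr_pairs)
```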
Visualizing Decision Boundaries
For 2D datasets, it's helpful to visualize the decision boundary. Let's create a simple synthetic dataset and plot the boundary.
```python
import matplotlib.pyplot as plt

# Generate synthetic 2D data whose classes are separated by the line x1 + x2 = 0
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train the model
model = LogisticRegression(learning_rate=0.1, n_iters=1000)
model.fit(X, y)

# Create a mesh grid covering the feature space
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# Predict the class for every point on the grid
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = np.array(Z).reshape(xx.shape)

# Plot the decision regions and the data points
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.title("Logistic Regression Decision Boundary")
plt.show()
```
This will show you how the model separates the two classes with a linear decision boundary.
Evaluating Model Performance
Accuracy alone doesn't always tell the whole story. For classification problems, it's important to look at other metrics:
- Precision: Of all positive predictions, how many were correct?
- Recall: Of all actual positives, how many did we correctly predict?
- F1-score: Harmonic mean of precision and recall.
- Confusion matrix: Shows true positives, false positives, true negatives, and false negatives.
Here's how to compute these metrics:
```python
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(y_test, predictions)
print("Confusion Matrix:")
print(cm)

print("\nClassification Report:")
print(classification_report(y_test, predictions))
```
This gives you a much better understanding of your model's performance than accuracy alone.
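For intuition, here is how precision, recall, and F1 for the positive class could be computed by hand from our predictions; this is a sketch equivalent to part of the report above, with illustrative variable names.

```python
# Manual precision/recall/F1 for the positive class (label 1)
y_pred_arr = np.array(predictions)
tp = np.sum((y_pred_arr == 1) & (y_test == 1))  # true positives
fp = np.sum((y_pred_arr == 1) & (y_test == 0))  # false positives
fn = np.sum((y_pred_arr == 0) & (y_test == 1))  # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")
```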
Handling Imbalanced Datasets
If your classes are imbalanced (e.g., 90% of examples are class 0, 10% are class 1), accuracy can be misleading. A model that always predicts the majority class would have high accuracy but be useless.
In such cases, you can:
- Use different evaluation metrics like precision, recall, or F1-score.
- Resample the data (oversample the minority class or undersample the majority class).
- Use class weights to make the model pay more attention to the minority class.
Scikit-learn's logistic regression has a `class_weight` parameter that can help with this.
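As a quick sketch of that option, `class_weight='balanced'` reweights classes inversely to their frequency; the snippet below reuses the scaled split from the comparison section.

```python
# Scikit-learn logistic regression with balanced class weights
weighted_model = SKLogisticRegression(class_weight='balanced', max_iter=1000)
weighted_model.fit(X_train_scaled, y_train)
print("Balanced-weight accuracy:", weighted_model.score(X_test_scaled, y_test))
```

On a roughly balanced dataset like breast cancer this changes little, but on heavily imbalanced data it can noticeably improve recall on the minority class.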
Conclusion
You've now implemented logistic regression from scratch in Python! We covered the key concepts:
- The sigmoid function and how it maps values to probabilities.
- The log loss cost function and its gradient.
- How to train the model using gradient descent.
- Ways to improve the model with regularization.
- How to evaluate and interpret the results.
Logistic regression is a powerful yet simple algorithm that forms the foundation for many more complex models. Understanding how it works internally will make you a better data scientist or machine learning engineer.
Keep practicing by trying different datasets and experimenting with the hyperparameters. Happy coding!