Post

AutoML with Python: Automate Your Machine Learning Pipeline

Learn how to use AutoML in Python with Auto-sklearn, TPOT, and H2O AutoML. This guide covers automated model selection, hyperparameter tuning, and pipeline optimization with practical examples.

AutoML with Python: Automate Your Machine Learning Pipeline

What Is AutoML?

AutoML (Automated Machine Learning) automates the process of selecting models, engineering features, and tuning hyperparameters. Instead of manually trying dozens of algorithms and parameter combinations, you let a framework search for the best pipeline.

1
2
3
4
5
6
7
8
9
10
11
12
13
# Manual ML workflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

model = RandomForestClassifier()
param_grid = {"n_estimators": [100, 200, 500], "max_depth": [5, 10, 20]}
grid = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# AutoML workflow -- does the above across many algorithms automatically
import autosklearn.classification
automl = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=300)
automl.fit(X_train, y_train)

AutoML handles algorithm selection, hyperparameter tuning, feature preprocessing, and ensemble construction. You provide the data and a time budget; it returns the best model it can find.

When to Use AutoML

AutoML is useful when:

  • You need a strong baseline quickly.
  • You are exploring which algorithm family works best for your data.
  • You have limited ML expertise but need a production-quality model.
  • You want to validate that your hand-tuned model is not significantly worse than what automated search finds.

AutoML is not a replacement for understanding your data. You still need to clean your data, handle missing values thoughtfully, and validate that the model makes sense for your problem domain.

Setting Up a Dataset

We will use the same dataset across all three frameworks for fair comparison.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the adult income dataset
data = fetch_openml("adult", version=2, as_frame=True)
X = data.data
y = (data.target == ">50K").astype(int)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"Class distribution: {y_train.value_counts().to_dict()}")

Auto-sklearn

Auto-sklearn is built on top of scikit-learn. It uses Bayesian optimization to search over scikit-learn’s classifiers and preprocessing methods.

Installation

1
pip install auto-sklearn

Note: Auto-sklearn runs on Linux only. On macOS or Windows, use WSL or Docker.

Basic Usage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
import autosklearn.classification
from sklearn.metrics import accuracy_score

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,   # Total time in seconds
    per_run_time_limit=60,         # Max time per model
    n_jobs=-1,                     # Use all CPU cores
    memory_limit=4096,             # Memory limit in MB
    seed=42,
)

automl.fit(X_train, y_train)

# View the models it tried
print(automl.leaderboard())

# Get predictions
y_pred = automl.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred))

Inspecting the Results

1
2
3
4
5
6
7
8
9
# Show the final ensemble
print(automl.show_models())

# Get statistics about the search
print(f"Number of models evaluated: {len(automl.cv_results_['mean_test_score'])}")
print(f"Best single model score: {max(automl.cv_results_['mean_test_score']):.4f}")

# Sprint statistics
print(automl.sprint_statistics())

Advanced Configuration

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
automl_advanced = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=1200,
    per_run_time_limit=120,
    ensemble_size=10,
    initial_configurations_via_metalearning=25,
    include={
        "classifier": ["random_forest", "gradient_boosting", "extra_trees", "mlp"],
    },
    exclude={
        "feature_preprocessor": ["kernel_pca"],
    },
    metric=autosklearn.metrics.balanced_accuracy,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)

automl_advanced.fit(X_train, y_train)

You can restrict the search space to specific classifiers or exclude preprocessing steps you know are irrelevant.

TPOT

TPOT (Tree-based Pipeline Optimization Tool) uses genetic programming to evolve scikit-learn pipelines. It breeds and mutates pipelines across generations to find the best one.

Installation

1
pip install tpot

Basic Usage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from tpot import TPOTClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

tpot = TPOTClassifier(
    generations=10,
    population_size=50,
    cv=5,
    random_state=42,
    verbosity=2,
    n_jobs=-1,
    max_time_mins=10,
    scoring="accuracy",
)

# TPOT needs numeric data -- encode categoricals first
X_train_encoded = X_train.copy()
X_test_encoded = X_test.copy()

label_encoders = {}
for col in X_train.select_dtypes(include=["category", "object"]).columns:
    le = LabelEncoder()
    X_train_encoded[col] = le.fit_transform(X_train[col].astype(str))
    X_test_encoded[col] = le.transform(X_test[col].astype(str))
    label_encoders[col] = le

tpot.fit(X_train_encoded, y_train)

y_pred = tpot.predict(X_test_encoded)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")

Exporting the Pipeline

One of TPOT’s best features is that it exports the winning pipeline as a Python script:

1
tpot.export("best_pipeline.py")

This generates a standalone scikit-learn script you can use directly in production without TPOT as a dependency:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# Example exported pipeline (contents of best_pipeline.py)
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

exported_pipeline = make_pipeline(
    StandardScaler(),
    GradientBoostingClassifier(
        learning_rate=0.1,
        max_depth=7,
        n_estimators=200,
        subsample=0.85
    )
)

exported_pipeline.fit(X_train, y_train)

Configuration Options

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Use a lighter configuration for faster searches
tpot_light = TPOTClassifier(
    config_dict="TPOT light",
    generations=5,
    population_size=20,
    verbosity=2,
)

# Use only specific operators
tpot_custom = TPOTClassifier(
    config_dict={
        "sklearn.ensemble.RandomForestClassifier": {
            "n_estimators": [100, 200, 500],
            "max_depth": [5, 10, 20, None],
            "min_samples_split": [2, 5, 10],
        },
        "sklearn.ensemble.GradientBoostingClassifier": {
            "n_estimators": [100, 200, 500],
            "learning_rate": [0.01, 0.1, 0.2],
            "max_depth": [3, 5, 7],
        },
        "sklearn.preprocessing.StandardScaler": {},
    },
    generations=10,
    population_size=30,
)

H2O AutoML

H2O AutoML is a scalable, enterprise-grade AutoML platform. It trains and cross-validates a wide range of models including stacked ensembles, gradient boosting machines, deep learning, and GLMs.

Installation

1
pip install h2o

Basic Usage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import h2o
from h2o.automl import H2OAutoML

h2o.init(max_mem_size="4G")

# Convert pandas DataFrames to H2O frames
train = h2o.H2OFrame(pd.concat([X_train, y_train.rename("target")], axis=1))
test = h2o.H2OFrame(pd.concat([X_test, y_test.rename("target")], axis=1))

target = "target"
features = [col for col in train.columns if col != target]

train[target] = train[target].asfactor()
test[target] = test[target].asfactor()

aml = H2OAutoML(
    max_runtime_secs=600,
    max_models=20,
    seed=42,
    sort_metric="AUC",
    balance_classes=True,
)

aml.train(x=features, y=target, training_frame=train)

Viewing the Leaderboard

1
2
3
4
5
6
7
8
9
10
lb = aml.leaderboard
print(lb.head(rows=10))

best_model = aml.leader
print(f"Best model: {best_model.model_id}")

performance = best_model.model_performance(test)
print(f"AUC: {performance.auc():.4f}")
print(f"Accuracy: {performance.accuracy()[0][1]:.4f}")
print(performance.confusion_matrix())

Model Explainability

1
2
3
4
5
6
7
8
9
importance = best_model.varimp(use_pandas=True)
print(importance.head(10))

# SHAP values (if supported by model type)
contributions = best_model.predict_contributions(test)
print(contributions.head())

# Generate a full explanation report
h2o.explain(best_model, test)

Saving and Loading Models

1
2
3
4
5
6
7
8
model_path = h2o.save_model(model=best_model, path="./h2o_models", force=True)
print(f"Model saved to: {model_path}")

loaded_model = h2o.load_model(model_path)
predictions = loaded_model.predict(test)
print(predictions.head())

h2o.shutdown(prompt=False)

Comparing Results

Here is how to run all three frameworks and compare them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import time

results = {}

# Auto-sklearn
start = time.time()
ask = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=600)
ask.fit(X_train, y_train)
y_pred_ask = ask.predict(X_test)
results["Auto-sklearn"] = {
    "accuracy": accuracy_score(y_test, y_pred_ask),
    "time": time.time() - start
}

# TPOT
start = time.time()
tpot = TPOTClassifier(max_time_mins=10, verbosity=0, random_state=42)
tpot.fit(X_train_encoded, y_train)
y_pred_tpot = tpot.predict(X_test_encoded)
results["TPOT"] = {
    "accuracy": accuracy_score(y_test, y_pred_tpot),
    "time": time.time() - start
}

# Print comparison
print(f"{'Framework':<15} {'Accuracy':<12} {'Time (s)':<10}")
print("-" * 37)
for name, r in results.items():
    print(f"{name:<15} {r['accuracy']:<12.4f} {r['time']:<10.1f}")

AutoML for Regression

The same frameworks handle regression tasks:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
import autosklearn.regression
from tpot import TPOTRegressor

# Auto-sklearn regression
automl_reg = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=300,
    per_run_time_limit=60,
)
automl_reg.fit(X_train_reg, y_train_reg)

# TPOT regression
tpot_reg = TPOTRegressor(
    generations=5,
    population_size=30,
    scoring="neg_mean_squared_error",
    max_time_mins=5,
)
tpot_reg.fit(X_train_reg, y_train_reg)

# H2O regression (don't call asfactor on the target)
aml_reg = H2OAutoML(max_runtime_secs=300, sort_metric="RMSE")
aml_reg.train(x=features, y=target, training_frame=train_reg)

When to Use AutoML vs. Manual Tuning

ScenarioRecommendation
Quick baseline for a new projectAutoML
Exploring which algorithm family works bestAutoML
Production model with strict latency requirementsManual tuning (you control model complexity)
Domain-specific feature engineering neededManual + AutoML for model selection
Very large dataset (millions of rows)H2O AutoML (scales well)
Need an interpretable modelManual (choose the model type deliberately)
Kaggle competitionAutoML for baseline, then manual refinement

A practical approach: run AutoML first to establish a baseline score. If the score is good enough, deploy the AutoML model. If not, use the AutoML results to understand which algorithm families work best, then manually tune within that family.

Tips for Getting the Best Results

Give it enough time. AutoML frameworks need time to explore. A 5-minute budget will find something reasonable. A 1-hour budget will often find something substantially better.

Clean your data first. AutoML handles missing values and encoding, but it cannot fix data quality issues. Remove duplicates, handle obvious errors, and ensure your target variable is correct.

Use cross-validation. All three frameworks support cross-validation. Never evaluate AutoML on the training set – the reported score should always come from held-out data.

Watch for overfitting. AutoML optimizes for the validation metric. If your validation set is too small or not representative, the selected model may overfit. Always evaluate on a truly held-out test set.

Check the selected features. AutoML may include features that are data leaks. Always review feature importance and remove leaky features.

1
2
3
4
5
6
7
8
9
10
import matplotlib.pyplot as plt

importance = best_model.varimp(use_pandas=True)
plt.figure(figsize=(10, 6))
plt.barh(importance["variable"][:15], importance["relative_importance"][:15])
plt.xlabel("Relative Importance")
plt.title("Top 15 Features")
plt.tight_layout()
plt.savefig("feature_importance.png")
plt.show()

Summary

AutoML automates model selection, hyperparameter tuning, and pipeline construction. Auto-sklearn is best for scikit-learn users who want a drop-in replacement for manual tuning. TPOT is unique in its genetic programming approach and its ability to export standalone pipelines. H2O AutoML scales to large datasets and provides built-in explainability tools. Start with AutoML for baselines, then refine manually when you need more control over the model architecture or deployment constraints.

This post is licensed under CC BY 4.0 by the author.