AutoML Workbench (Streamlit)

A practical no-code/low-code ML app for:

Uploading tabular data
Exploring data quality and leakage risks
Configuring preprocessing
Training and comparing models
Exporting predictions and model artifacts

Quick Start

git clone https://github.com/bewaffnete/Streamlit-ML-Workbench.git
cd Streamlit-ML-Workbench
pip install -r requirements.txt
streamlit run app.py

Open the URL shown by Streamlit (usually http://localhost:8501).

What This App Does

The UI follows a guided flow:

Data Upload: CSV, Excel, Parquet + validation and dataset summary
Target & EDA: choose target/features and inspect distributions/correlations
Warnings: automatic alerts (leakage, imbalance, missingness, high-cardinality)
Preprocessing: imputation, encoding, scaling, outlier handling, optional polynomial features
Model Config: task type + model family + CV/split + tuning controls
Train & Evaluate: launch background training jobs, compare results, inspect metrics
Predict & Export: download predictions, metadata, and trusted model artifacts
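
To make the Warnings step more concrete, here is a minimal sketch of threshold-based checks for missingness, high cardinality, and class imbalance. It works on plain Python lists so it runs without extra packages. The names WarningThresholds and generate_warnings, and all default values, are assumptions made for illustration; the app's actual logic lives in automl_gui/warnings_utils.py and may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WarningThresholds:
    # Hypothetical defaults; the app's real thresholds may differ.
    max_missing_ratio: float = 0.3
    max_unique_ratio: float = 0.5
    min_minority_ratio: float = 0.1


def generate_warnings(columns, target, thresholds=None):
    """Return warning strings for a dict of column name -> list of values."""
    t = thresholds or WarningThresholds()
    warnings = []
    for name, values in columns.items():
        if not values:
            continue
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if missing_ratio > t.max_missing_ratio:
            warnings.append(f"{name}: {missing_ratio:.0%} missing values")
        # Cardinality is only checked for string (categorical) feature columns.
        is_categorical = bool(present) and all(isinstance(v, str) for v in present)
        if name != target and is_categorical:
            if len(set(present)) / len(present) > t.max_unique_ratio:
                warnings.append(f"{name}: high-cardinality categorical")
    # Imbalance: share of the rarest class among the non-missing target labels.
    labels = [v for v in columns.get(target, []) if v is not None]
    if labels:
        counts = Counter(labels)
        minority_ratio = min(counts.values()) / len(labels)
        if len(counts) > 1 and minority_ratio < t.min_minority_ratio:
            warnings.append(f"{target}: class imbalance (minority {minority_ratio:.0%})")
    return warnings


data = {
    "city": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"],
    "age": [30, None, None, None, None, 41, 25, 33, 52, 47, 38, 29],
    "label": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
}
for message in generate_warnings(data, target="label"):
    print(message)
```

Keeping the thresholds in one small dataclass is one way to make them configurable from the UI without touching the check logic.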

Key Features

Clean separation: UI layer, service/orchestration layer, training/evaluation utilities
Background training via ProcessPoolExecutor (non-blocking Streamlit flow)
Fingerprint-based caching for heavy dataset summaries
Smart data warnings with configurable thresholds
Optional hyperparameter tuning (RandomizedSearchCV)
Safe project state import/export (JSON/YAML) with strict schema checks
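
The fingerprint-based caching idea can be sketched as follows: hash the uploaded bytes and use the hash as the cache key, so identical content is summarized only once. This is an illustrative sketch, not the code in automl_gui/data_utils.py. SummaryCache, fingerprint, and summarize are hypothetical names, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a stable content hash to use as a cache key."""
    return hashlib.sha256(data).hexdigest()


class SummaryCache:
    """Cache expensive summaries by content fingerprint (hypothetical helper)."""

    def __init__(self, compute):
        self._compute = compute
        self._store = {}
        self.misses = 0  # number of times the summary had to be computed

    def get(self, data: bytes):
        key = fingerprint(data)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(data)
        return self._store[key]


def summarize(data: bytes) -> dict:
    # Stand-in for a heavy dataset summary: count data rows and bytes.
    lines = data.decode("utf-8").splitlines()
    return {"rows": max(len(lines) - 1, 0), "bytes": len(data)}


cache = SummaryCache(summarize)
csv_bytes = b"a,b\n1,2\n3,4\n"
first = cache.get(csv_bytes)
second = cache.get(csv_bytes)  # same content, served from the cache
print(first, cache.misses)
```

Keying on content rather than filename means a re-uploaded file with the same bytes reuses the cached summary, while an edited file with the same name does not.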

Project Structure

app.py: app entry point and composition root
automl_gui/ui/: Streamlit rendering modules (upload, EDA, training, export, sidebar)
automl_gui/services.py: business orchestration (DataService, WarningService, TrainingService)
automl_gui/core/jobs.py: background job manager
automl_gui/data_utils.py: file loading, validation, dataset fingerprinting, cached summaries
automl_gui/preprocessing.py: preprocessing config + factory
automl_gui/training.py: model trainer + optimization hooks
automl_gui/evaluation.py: metrics and evaluation helpers
automl_gui/models.py: extensible model registry
automl_gui/state.py: session facade + strict config schema validation
automl_gui/warnings_utils.py: warning generation logic
automl_gui/visualization.py: plotting helpers
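
As an illustration of what an extensible model registry can look like, the sketch below registers model factories under a task and a name with a decorator. It is not the contents of automl_gui/models.py. register_model, create_model, available_models, and the toy MajorityClassifier are all hypothetical names used only for this example.

```python
from typing import Callable, Dict, Tuple

# Maps (task, name) to a zero-argument factory that builds a fresh model.
_REGISTRY: Dict[Tuple[str, str], Callable[[], object]] = {}


def register_model(task: str, name: str):
    """Decorator that registers a model factory under a task and name."""
    def decorator(factory):
        key = (task, name)
        if key in _REGISTRY:
            raise ValueError(f"model already registered: {key}")
        _REGISTRY[key] = factory
        return factory
    return decorator


def available_models(task: str):
    """List registered model names for a task, sorted alphabetically."""
    return sorted(name for (t, name) in _REGISTRY if t == task)


def create_model(task: str, name: str):
    """Build a new model instance, or raise KeyError if it is not registered."""
    key = (task, name)
    if key not in _REGISTRY:
        raise KeyError(f"unknown model {name!r} for task {task!r}")
    return _REGISTRY[key]()


class MajorityClassifier:
    """Toy stand-in for a real estimator: always predicts the most common label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


@register_model("classification", "majority")
def _make_majority():
    return MajorityClassifier()


model = create_model("classification", "majority").fit([[0], [1], [2]], [1, 1, 0])
print(available_models("classification"), model.predict([[5], [6]]))
```

With this pattern, adding a model family is a matter of defining one more decorated factory; the UI can populate its model picker from available_models.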

Configuration

Environment variables:

AUTOML_MAX_UPLOAD_MB: max upload size in MB (default 200)
AUTOML_MAX_N_JOBS: max CPU jobs for supported models/tuning (default 2)
AUTOML_BG_WORKERS: background worker processes (default 1)
AUTOML_LOG_LEVEL: logger level (INFO, DEBUG, ...)
AUTOML_LOG_TO_FILE: set to 1 to enable rotating file logs
AUTOML_LOG_FILE: log filename (default automl_gui.log)
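
The following sketch shows one way these variables could be read with their documented defaults. It is an assumption about how such loading might look, not the app's actual code. Settings, load_settings, and _int_env are hypothetical names. The INFO fallback for AUTOML_LOG_LEVEL and the fallback to the default on invalid integers are also assumptions, since the source does not state them.

```python
import os
from dataclasses import dataclass


def _int_env(env, name, default, minimum=1):
    """Read an integer setting; fall back to the default on missing or bad input."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # assumed behavior: ignore invalid values
    return max(value, minimum)


@dataclass(frozen=True)
class Settings:
    max_upload_mb: int
    max_n_jobs: int
    bg_workers: int
    log_level: str
    log_to_file: bool
    log_file: str


def load_settings(env=None):
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        max_upload_mb=_int_env(env, "AUTOML_MAX_UPLOAD_MB", 200),
        max_n_jobs=_int_env(env, "AUTOML_MAX_N_JOBS", 2),
        bg_workers=_int_env(env, "AUTOML_BG_WORKERS", 1),
        log_level=env.get("AUTOML_LOG_LEVEL", "INFO").upper(),  # assumed default
        log_to_file=env.get("AUTOML_LOG_TO_FILE", "0") == "1",
        log_file=env.get("AUTOML_LOG_FILE", "automl_gui.log"),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_settings({"AUTOML_MAX_UPLOAD_MB": "50", "AUTOML_LOG_TO_FILE": "1"})
print(settings)
```

In practice the variables would be set in the shell or in the .env file before running streamlit run app.py.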

Security Notes

Do not load .joblib/.pkl files from untrusted sources.
Model loading in the UI requires explicit trust confirmation.
Project settings import uses strict schema validation and safe parsing (yaml.safe_load).
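
To illustrate what strict schema validation on import can mean, the sketch below parses a JSON settings document and rejects unknown keys, missing keys, wrong types, and out-of-range values. It covers only the JSON path so that it stays standard-library only; for YAML the same checks would run on the result of yaml.safe_load. The schema fields (task, target, test_size, models) and the function name load_project_settings are assumptions for illustration, not the app's real schema in automl_gui/state.py.

```python
import json

# Hypothetical schema: every allowed key mapped to its required type.
SCHEMA = {"task": str, "target": str, "test_size": float, "models": list}
ALLOWED_TASKS = {"classification", "regression"}


def load_project_settings(text: str) -> dict:
    """Parse JSON settings and reject anything outside the schema."""
    data = json.loads(text)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("settings must be a JSON object")
    unknown = set(data) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    missing = set(SCHEMA) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key, expected in SCHEMA.items():
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} must be of type {expected.__name__}")
    if data["task"] not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {sorted(ALLOWED_TASKS)}")
    if not 0.0 < data["test_size"] < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    if not all(isinstance(m, str) for m in data["models"]):
        raise ValueError("models must be a list of strings")
    return data


good = '{"task": "classification", "target": "label", "test_size": 0.2, "models": ["rf"]}'
print(load_project_settings(good))
```

Rejecting unknown keys, rather than silently ignoring them, is what makes the check strict: a settings file cannot smuggle in fields the app does not expect.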

Troubleshooting

ModuleNotFoundError: verify that your virtual environment is activated and that pip install -r requirements.txt completed without errors.
App feels slow on very large data: enable sampling in the Upload tab and reduce the number of selected features and models.
Training takes too long: disable tuning, lower the number of tuning iterations, and train fewer models.
