A practical no-code/low-code ML app for:
- Uploading tabular data
- Exploring data quality and leakage risks
- Configuring preprocessing
- Training and comparing models
- Exporting predictions and model artifacts
git clone https://github.com/bewaffnete/Streamlit-ML-Workbench.git
cd Streamlit-ML-Workbench
pip install -r requirements.txt
streamlit run app.py

Open the URL shown by Streamlit (usually http://localhost:8501).
The UI follows a guided flow:
- Data Upload: CSV, Excel, Parquet + validation and dataset summary
- Target & EDA: choose target/features and inspect distributions/correlations
- Warnings: automatic alerts (leakage, imbalance, missingness, high-cardinality)
- Preprocessing: imputation, encoding, scaling, outlier handling, optional polynomial features
- Model Config: task type + model family + CV/split + tuning controls
- Train & Evaluate: launch background training jobs, compare results, inspect metrics
- Predict & Export: download predictions, metadata, and trusted model artifacts
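
The Warnings step can be pictured as a set of threshold checks run per column. The following is a minimal sketch, not the app's actual logic: the function name `column_warnings` and all three threshold values are assumptions made for illustration, and leakage detection is left out because the source does not describe how it is done.

```python
from collections import Counter

# Hypothetical thresholds; the app's real defaults are not documented here.
MISSING_RATIO = 0.3       # warn when more than 30% of values are missing
IMBALANCE_RATIO = 0.1     # warn when the rarest class is under 10% of rows
CARDINALITY_RATIO = 0.5   # warn when over half of the values are distinct


def column_warnings(name, values, is_target=False):
    """Return a list of warning strings for one column of raw values."""
    warnings = []
    n = len(values)
    if n == 0:
        return warnings

    # Missingness: share of None entries in the column.
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / n
    if missing_share > MISSING_RATIO:
        warnings.append(f"{name}: {missing_share:.0%} missing")

    counts = Counter(present)
    if is_target and counts:
        # Imbalance: share of the least frequent class in the target.
        rarest = min(counts.values()) / len(present)
        if rarest < IMBALANCE_RATIO:
            warnings.append(f"{name}: rarest class is {rarest:.0%} of rows")
    elif present and isinstance(present[0], str):
        # High cardinality: many distinct categories relative to row count.
        if len(counts) / len(present) > CARDINALITY_RATIO:
            warnings.append(f"{name}: {len(counts)} distinct categories")
    return warnings
```

Keeping the thresholds as module-level constants is one simple way to make them configurable, in the spirit of the "configurable thresholds" feature.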
- Clean separation: UI layer, service/orchestration layer, training/evaluation utilities
- Background training via `ProcessPoolExecutor` (non-blocking Streamlit flow)
- Fingerprint-based caching for heavy dataset summaries
- Smart data warnings with configurable thresholds
- Optional hyperparameter tuning (`RandomizedSearchCV`)
- Safe project state import/export (JSON/YAML) with strict schema checks
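
To illustrate the background-training idea, here is a rough sketch of a job tracker built on `concurrent.futures`. The class `BackgroundJobs` and its method names are hypothetical and are not taken from `automl_gui/core/jobs.py`. The point is only that the UI submits work, stores a job id, and polls on each rerun instead of blocking.

```python
import uuid
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Optional


class BackgroundJobs:
    """Tracks futures by job id so a rerun-driven UI can poll instead of block."""

    def __init__(self, executor: Optional[Executor] = None):
        # One worker mirrors the documented AUTOML_BG_WORKERS default of 1.
        self._executor = executor or ProcessPoolExecutor(max_workers=1)
        self._futures = {}

    def submit(self, fn, *args, **kwargs) -> str:
        """Schedule fn and return an id the UI can keep in session state."""
        job_id = uuid.uuid4().hex
        self._futures[job_id] = self._executor.submit(fn, *args, **kwargs)
        return job_id

    def status(self, job_id: str) -> str:
        """Non-blocking check: 'running', 'done', or 'failed'."""
        future = self._futures[job_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id: str):
        """Block until the job finishes, then return or re-raise."""
        return self._futures[job_id].result()
```

With a process pool, the submitted function and its arguments must be picklable, so training entry points are typically plain module-level functions rather than lambdas or closures.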
- `app.py`: app entry point and composition root
- `automl_gui/ui/`: Streamlit rendering modules (upload, EDA, training, export, sidebar)
- `automl_gui/services.py`: business orchestration (`DataService`, `WarningService`, `TrainingService`)
- `automl_gui/core/jobs.py`: background job manager
- `automl_gui/data_utils.py`: file loading, validation, dataset fingerprinting, cached summaries
- `automl_gui/preprocessing.py`: preprocessing config + factory
- `automl_gui/training.py`: model trainer + optimization hooks
- `automl_gui/evaluation.py`: metrics and evaluation helpers
- `automl_gui/models.py`: extensible model registry
- `automl_gui/state.py`: session facade + strict config schema validation
- `automl_gui/warnings_utils.py`: warning generation logic
- `automl_gui/visualization.py`: plotting helpers
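
Dataset fingerprinting with cached summaries, as listed for `automl_gui/data_utils.py`, could look roughly like the sketch below. The helper names `fingerprint` and `cached_summary`, the SHA-256 hash, and the in-memory dict cache are all assumptions for illustration. The real module may hash different inputs or rely on Streamlit's caching.

```python
import hashlib

# Hypothetical in-process cache: fingerprint -> computed summary.
_SUMMARY_CACHE = {}


def fingerprint(data: bytes) -> str:
    """Content hash of the uploaded file; identical bytes give identical keys."""
    return hashlib.sha256(data).hexdigest()


def cached_summary(data: bytes, compute):
    """Run the expensive compute(data) only once per distinct file content."""
    key = fingerprint(data)
    if key not in _SUMMARY_CACHE:
        _SUMMARY_CACHE[key] = compute(data)
    return _SUMMARY_CACHE[key]
```

Keying on content rather than filename means re-uploading the same file under a different name still hits the cache, while any change to the bytes forces a fresh summary.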
Environment variables:
- `AUTOML_MAX_UPLOAD_MB`: max upload size in MB (default `200`)
- `AUTOML_MAX_N_JOBS`: max CPU jobs for supported models/tuning (default `2`)
- `AUTOML_BG_WORKERS`: background worker processes (default `1`)
- `AUTOML_LOG_LEVEL`: logger level (`INFO`, `DEBUG`, ...)
- `AUTOML_LOG_TO_FILE`: set `1` to enable rotating file logs
- `AUTOML_LOG_FILE`: log filename (default `automl_gui.log`)
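
As a sketch of how these variables might be read, the snippet below collects them into one settings object. The variable names and the numeric and filename defaults come from the list above. The `Settings` class, the `load_settings` function, the `INFO` fallback for the log level, and the choice to fall back to the default on invalid or non-positive numbers are assumptions.

```python
import os
from dataclasses import dataclass


def _int_env(env, name, default):
    """Parse a positive integer, falling back to the default (assumed policy)."""
    raw = env.get(name, "").strip()
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


@dataclass(frozen=True)
class Settings:
    max_upload_mb: int
    max_n_jobs: int
    bg_workers: int
    log_level: str
    log_to_file: bool
    log_file: str


def load_settings(env=None) -> Settings:
    """Build settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        max_upload_mb=_int_env(env, "AUTOML_MAX_UPLOAD_MB", 200),
        max_n_jobs=_int_env(env, "AUTOML_MAX_N_JOBS", 2),
        bg_workers=_int_env(env, "AUTOML_BG_WORKERS", 1),
        # The default level is not documented; INFO is an assumption.
        log_level=env.get("AUTOML_LOG_LEVEL", "INFO").upper(),
        log_to_file=env.get("AUTOML_LOG_TO_FILE", "") == "1",
        log_file=env.get("AUTOML_LOG_FILE", "automl_gui.log"),
    )
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.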
- Do not load `.joblib`/`.pkl` files from untrusted sources.
- Model loading in the UI requires explicit trust confirmation.
- Project settings import uses strict schema validation and safe parsing (`yaml.safe_load`).
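
Strict schema validation on import can be sketched as follows. JSON from the standard library is used here so the example runs without extra dependencies. For YAML input, `yaml.safe_load` would take the place of `json.loads`. The schema keys (`task_type`, `cv_folds`, `tuning_enabled`) and the function name are hypothetical and do not reflect the real schema in `automl_gui/state.py`.

```python
import json

# Hypothetical schema: every key is required and maps to one exact type.
SCHEMA = {"task_type": str, "cv_folds": int, "tuning_enabled": bool}


def parse_project_settings(text: str) -> dict:
    """Parse settings and reject anything that deviates from SCHEMA."""
    data = json.loads(text)  # malformed input raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("settings must be a JSON object")

    # Strict mode: no extra keys and no missing keys.
    unknown = set(data) - set(SCHEMA)
    missing = set(SCHEMA) - set(data)
    if unknown or missing:
        raise ValueError(
            f"unknown keys {sorted(unknown)}, missing keys {sorted(missing)}"
        )

    for key, expected in SCHEMA.items():
        # Exact type match; this also stops True/False passing as integers.
        if type(data[key]) is not expected:
            raise ValueError(f"{key} must be of type {expected.__name__}")
    return data
```

Rejecting unknown keys outright, rather than ignoring them, is what makes the check strict: a settings file cannot carry unexpected fields into the session state.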
- `ModuleNotFoundError`: verify that the virtual environment is activated and that `pip install -r requirements.txt` completed.
- App feels slow on very large data: enable sampling in the Upload tab and reduce the selected features/models.
- Training takes too long: disable tuning, lower the number of tuning iterations, and reduce the model count.