# Configuration Reference

AIFT uses a layered configuration system. Settings are resolved in this order (highest priority first):

1. **Environment variables**: API keys from shell environment
2. **`config/config.yaml`**: User overrides under the project root
3. **Hardcoded defaults**: Built-in values in `app/utils/config.py`

If `config/config.yaml` does not exist on startup, AIFT creates one from the defaults automatically.

Headless automation entry points (`aift_cli.py --config`, REST `/api/automation/run` `config_path`, and MCP `config_path`) can use any existing YAML config file path, including custom filenames such as `config/acme-analysis-settings.yml`. When omitted, they use `config/config.yaml`.

---

## `ai` Section

Controls which AI provider is active and per-provider connection settings.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.provider` | string | `"claude"` | Active AI provider. One of: `claude`, `openai`, `kimi`, `local` |

### `ai.claude`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.claude.api_key` | string | `""` | Anthropic API key. Can also be set via `ANTHROPIC_API_KEY` env var |
| `ai.claude.model` | string | `"claude-opus-4-8"` | Claude model identifier |
| `ai.claude.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.claude.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.openai`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.openai.api_key` | string | `""` | OpenAI API key. Can also be set via `OPENAI_API_KEY` env var |
| `ai.openai.model` | string | `"gpt-5.5"` | OpenAI model identifier |
| `ai.openai.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.openai.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.kimi`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.kimi.api_key` | string | `""` | Moonshot/Kimi API key. Can also be set via `MOONSHOT_API_KEY` or `KIMI_API_KEY` env var |
| `ai.kimi.model` | string | `"kimi-k2.6"` | Kimi model identifier |
| `ai.kimi.base_url` | string | `"https://api.moonshot.ai/v1"` | Kimi API base URL |
| `ai.kimi.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.kimi.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.local`

For OpenAI-compatible local endpoints (Ollama, LM Studio, vLLM, etc.).

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.local.base_url` | string | `"http://localhost:11434/v1"` | Local model API endpoint |
| `ai.local.model` | string | `"llama3.1:70b"` | Model name as recognized by the local server |
| `ai.local.api_key` | string | `"not-needed"` | API key (most local servers ignore this) |
| `ai.local.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.local.request_timeout_seconds` | int | `3600` | HTTP request timeout in seconds (higher default for local models) |

---

## `server` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `server.port` | int | `5000` | TCP port the Flask server listens on (1–65535) |
| `server.host` | string | `"127.0.0.1"` | Bind address. Use `127.0.0.1` for localhost only |

---

## `evidence` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `evidence.large_file_threshold_mb` | int | `0` | Maximum evidence upload size in MB. Set to `0` for unlimited (recommended for large forensic images). Files exceeding this limit are rejected with a suggestion to use path mode |
| `evidence.intake_timeout_seconds` | int | `7200` | Frontend timeout for evidence intake requests (hashing + validation). Large images (100 GB+) may need 30+ minutes. Default is 2 hours |
| `evidence.compute_hashes` | bool | `true` | Compute SHA-256 and MD5 hashes during evidence intake. Disable to speed up intake for large images; the report will show SKIPPED instead of PASS/FAIL for hash verification. Headless runs (CLI, REST automation, MCP) also use this setting as the hashing default when the caller does not pass an explicit `skip_hashing` choice |
| `evidence.archive_max_members` | int | `10000000` | Maximum regular files AIFT will validate or extract from an archive before rejecting it |
| `evidence.archive_max_total_bytes` | int | `536870912000` | Maximum extracted bytes per archive. The byte budget applies to each extracted archive individually (including nested archives), not as an aggregate across a run. Default is 500 GiB |
| `evidence.archive_max_member_bytes` | int | `214748364800` | Maximum size of one extracted archive member. Default is 200 GiB |

The `archive_max_*` limits apply to every archive extraction entry point: GUI evidence intake, the GUI Scan Directory helper, headless automation discovery, and the MCP `aift_discover_evidence` tool, including nested archives found inside extracted archives.

---

## `automation` Section

Controls headless automation run retention.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `automation.run_retention_seconds` | int | `86400` | How long completed and failed runs, plus cancelled runs after worker finalization, remain available in memory for status checks and eligible report path retrieval. Default is 24 hours. On-disk cases and reports are preserved |

Values below 60 seconds are rejected.

---

## `analysis` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `analysis.ai_max_tokens` | int | `128000` | Model context window. AIFT reserves response and safety tokens before calculating the analysis input budget |
| `analysis.ai_response_max_tokens` | int | `4096` | Maximum response tokens reserved from the context window for analysis calls |
| `analysis.ai_input_safety_margin_tokens` | int | `1024` | Additional safety reserve subtracted before prompt budget checks |
| `analysis.shortened_prompt_cutoff_tokens` | int | `64000` | When the resolved analysis input budget is below this value, a compact prompt template (without statistics) is used to save tokens |
| `analysis.connection_test_max_tokens` | int | `256` | Max tokens for the "Test Connection" health check call |
| `analysis.citation_spot_check_limit` | int | `20` | Configured citation validation cap. AIFT applies an internal minimum of 500 unique timestamps, row references, and column names per artifact |
| `analysis.artifact_csv_row_limit` | int | `0` | Maximum rows written per artifact CSV before analysis. `0` means unlimited; positive values intentionally cap parsed CSV output |
| `analysis.artifact_deduplication_enabled` | bool | `true` | Remove duplicate rows from artifact CSVs before sending to the AI |
| `analysis.artifact_ai_columns_config_path` | string | `"config/artifact_ai_columns.yaml"` | Path to the artifact column filtering config file |

> **Note:** `max_merge_rounds` is not in the hardcoded defaults but can be set in `config/config.yaml` or via the UI advanced settings. It controls the maximum number of hierarchical merge iterations when chunked analysis produces multiple findings. The UI defaults to `5`.

---

## Environment Variable Mapping

Environment variables override `config/config.yaml` values. Only API keys are mapped:

| Environment Variable | Config Key | Notes |
|---------------------|------------|-------|
| `ANTHROPIC_API_KEY` | `ai.claude.api_key` | Anthropic/Claude API key |
| `OPENAI_API_KEY` | `ai.openai.api_key` | OpenAI API key |
| `MOONSHOT_API_KEY` | `ai.kimi.api_key` | Moonshot/Kimi API key (checked first) |
| `KIMI_API_KEY` | `ai.kimi.api_key` | Alias for `MOONSHOT_API_KEY` (checked second) |

If an environment variable is set and non-empty, it always wins over the `config/config.yaml` value.

---

## UI Settings Modal vs Config-Only

The settings gear icon in the web UI exposes a **Basic** tab and an **Advanced** tab.

### Basic Tab (UI)

- AI provider selection (`ai.provider`)
- Per-provider fields: API key, model, base URL (where applicable)
- Server port (`server.port`)

### Advanced Tab (UI)

- `analysis.ai_max_tokens`
- `analysis.ai_response_max_tokens`
- `analysis.ai_input_safety_margin_tokens`
- `analysis.shortened_prompt_cutoff_tokens`
- `analysis.connection_test_max_tokens`
- `analysis.citation_spot_check_limit`
- `analysis.artifact_csv_row_limit`
- `analysis.artifact_deduplication_enabled`
- `analysis.max_merge_rounds`
- `automation.run_retention_seconds`
- `evidence.large_file_threshold_mb` (displayed in GB as "Evidence Size Threshold"; fractional GB values are accepted and stored in MB)
- `evidence.intake_timeout_seconds`
- `ai.local.request_timeout_seconds`
- `ai.<provider>.attach_csv_as_file` (per-provider toggles for Claude, OpenAI, Kimi, Local)

### Config-Only (not in UI)

- `server.host`: must be edited in `config/config.yaml`
- `analysis.artifact_ai_columns_config_path`: must be edited in `config/config.yaml`

---

## Artifact AI Columns Configuration

**File:** `config/artifact_ai_columns.yaml`

This file controls which CSV columns are sent to the AI for each artifact type. By default, all columns from a parsed artifact CSV are included. When an artifact has an entry in this file, only the listed columns are sent, reducing token usage and focusing the AI on forensically relevant data.

### Structure

```yaml
artifact_ai_columns:
  <artifact_key>:
    - column_name_1
    - column_name_2
    - ...
```

### Example

```yaml
artifact_ai_columns:
  runkeys:
    - ts
    - name
    - command
    - username
  shimcache:
    - last_modified
    - name
    - path
```

### Currently Configured Artifacts

The shipped `config/artifact_ai_columns.yaml` contains projections for the current Windows and Linux registry keys. Use that file as the source of truth for the exact artifact keys and column names. OS-suffixed keys such as `services_linux` apply only when that OS matches. Some projection entries target combined artifact keys, including `thumbcache`, `ual`, `certlog`, and `dpapi.keyprovider`.

### How to Edit

- To include all columns for an artifact: remove its entry from the file.
- To restrict columns: list only the column names you want the AI to see.
- Column names must match the CSV headers exactly (as produced by the Dissect parser).
- Changes take effect on the next analysis run; no restart is required.
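
The filtering rule described in this section can be illustrated with a minimal Python sketch. It is not AIFT's actual implementation: the function name `project_columns`, the in-memory `ARTIFACT_AI_COLUMNS` mapping (standing in for the parsed YAML file), and the sample row are all hypothetical. Silently skipping configured columns that are missing from the CSV is also an assumption made for the sketch.

```python
import csv
import io

# Hypothetical stand-in for the parsed contents of config/artifact_ai_columns.yaml.
ARTIFACT_AI_COLUMNS = {
    "runkeys": ["ts", "name", "command", "username"],
    "shimcache": ["last_modified", "name", "path"],
}


def project_columns(artifact_key, csv_text, projections=ARTIFACT_AI_COLUMNS):
    """Return CSV text restricted to the columns configured for artifact_key.

    Artifacts without an entry keep every column. Column names are compared
    exactly, so they must match the CSV headers character for character.
    """
    wanted = projections.get(artifact_key)
    if wanted is None:
        # No entry for this artifact: pass the CSV through unchanged.
        return csv_text

    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    # Assumption: configured columns absent from the CSV are skipped.
    kept = [name for name in wanted if name in headers]

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not in `kept` are dropped
    return out.getvalue()


# Made-up sample data: one run key row with an extra `hive` column.
sample = (
    "ts,name,command,username,hive\n"
    "2024-01-01T00:00:00Z,Updater,updater.exe,alice,NTUSER.DAT\n"
)
print(project_columns("runkeys", sample))
```

With the sample above, the `hive` column is dropped for `runkeys` because it is not listed, while an artifact key with no entry in the mapping would be returned with all five columns.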