# Configuration Reference

AIFT uses a layered configuration system. Settings are resolved in this order (highest priority first):

1. **Environment variables** — API keys from shell environment
2. **`config/config.yaml`** — User overrides under the project root
3. **Hardcoded defaults** — Built-in values in `app/utils/config.py`

If `config/config.yaml` does not exist on startup, AIFT creates one from the defaults automatically.

Headless automation entry points (`aift_cli.py --config`, REST `/api/automation/run` `config_path`, and MCP `config_path`) can use any existing YAML config file path, including custom filenames such as `config/acme-analysis-settings.yml`. When omitted, they use `config/config.yaml`.
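
As an illustration, a custom headless config might override only a handful of settings. The fragment below is a sketch: it assumes that keys omitted from a custom file fall back to the built-in defaults, as they do for `config/config.yaml`, and the chosen values are examples rather than recommendations.

```yaml
# config/acme-analysis-settings.yml (the example filename used above)
# Assumption: any key not listed here falls back to the built-in default.
ai:
  provider: openai
  openai:
    model: "gpt-5.5"
evidence:
  compute_hashes: false   # hash verification is then reported as SKIPPED
```

A file like this would be passed with `aift_cli.py --config config/acme-analysis-settings.yml`, or as `config_path` for the REST and MCP entry points.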

---

## `ai` Section

Controls which AI provider is active and per-provider connection settings.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.provider` | string | `"claude"` | Active AI provider. One of: `claude`, `openai`, `kimi`, `local` |

### `ai.claude`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.claude.api_key` | string | `""` | Anthropic API key. Can also be set via `ANTHROPIC_API_KEY` env var |
| `ai.claude.model` | string | `"claude-opus-4-8"` | Claude model identifier |
| `ai.claude.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.claude.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.openai`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.openai.api_key` | string | `""` | OpenAI API key. Can also be set via `OPENAI_API_KEY` env var |
| `ai.openai.model` | string | `"gpt-5.5"` | OpenAI model identifier |
| `ai.openai.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.openai.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.kimi`

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.kimi.api_key` | string | `""` | Moonshot/Kimi API key. Can also be set via `MOONSHOT_API_KEY` or `KIMI_API_KEY` env var |
| `ai.kimi.model` | string | `"kimi-k2.6"` | Kimi model identifier |
| `ai.kimi.base_url` | string | `"https://api.moonshot.ai/v1"` | Kimi API base URL |
| `ai.kimi.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.kimi.request_timeout_seconds` | int | `600` | HTTP request timeout in seconds |

### `ai.local`

For OpenAI-compatible local endpoints (Ollama, LM Studio, vLLM, etc.).

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `ai.local.base_url` | string | `"http://localhost:11434/v1"` | Local model API endpoint |
| `ai.local.model` | string | `"llama3.1:70b"` | Model name as recognized by the local server |
| `ai.local.api_key` | string | `"not-needed"` | API key (most local servers ignore this) |
| `ai.local.attach_csv_as_file` | bool | `true` | Send CSV data as a file attachment rather than inline text |
| `ai.local.request_timeout_seconds` | int | `3600` | HTTP request timeout in seconds (higher default for local models) |
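
A minimal sketch of switching to a local provider, using only the keys from the tables above. The values shown are the documented defaults except for `attach_csv_as_file`, which is turned off here purely as an example; whether your local server handles file attachments is something to verify yourself.

```yaml
ai:
  provider: local
  local:
    base_url: "http://localhost:11434/v1"   # documented default endpoint
    model: "llama3.1:70b"                   # must match a model name known to the local server
    api_key: "not-needed"                   # most local servers ignore this
    attach_csv_as_file: false               # example choice: send CSV as inline text
    request_timeout_seconds: 3600           # 1 hour, the default for local models
```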

---

## `server` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `server.port` | int | `5000` | TCP port the Flask server listens on (1–65535) |
| `server.host` | string | `"127.0.0.1"` | Bind address. The default `127.0.0.1` restricts access to the local machine; `0.0.0.0` binds to all network interfaces |

---

## `evidence` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `evidence.large_file_threshold_mb` | int | `0` | Maximum evidence upload size in MB. Set to `0` for unlimited (recommended for large forensic images). Files exceeding this limit are rejected with a suggestion to use path mode |
| `evidence.intake_timeout_seconds` | int | `7200` | Frontend timeout for evidence intake requests (hashing + validation). Large images (100 GB+) may need 30+ minutes. Default is 2 hours |
| `evidence.compute_hashes` | bool | `true` | Compute SHA-256 and MD5 hashes during evidence intake. Disable to speed up intake for large images; the report will show SKIPPED instead of PASS/FAIL for hash verification. Headless runs (CLI, REST automation, MCP) also use this setting as the hashing default when the caller does not pass an explicit `skip_hashing` choice |
| `evidence.archive_max_members` | int | `10000000` | Maximum regular files AIFT will validate or extract from an archive before rejecting it |
| `evidence.archive_max_total_bytes` | int | `536870912000` | Maximum extracted bytes per archive. The byte budget applies to each extracted archive individually (including nested archives), not as an aggregate across a run. Default is 500 GiB |
| `evidence.archive_max_member_bytes` | int | `214748364800` | Maximum size of one extracted archive member. Default is 200 GiB |

The `archive_max_*` limits apply to every archive extraction entry point: GUI evidence intake, the GUI Scan Directory helper, headless automation discovery, and the MCP `aift_discover_evidence` tool — including nested archives found inside extracted archives.
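
The byte limits are plain integers, so a tighter budget has to be converted from GiB by hand (1 GiB = 1024^3 = 1073741824 bytes). The fragment below is an illustrative sketch with lower archive limits than the defaults; the specific sizes are examples only.

```yaml
evidence:
  large_file_threshold_mb: 0               # 0 = unlimited upload size
  intake_timeout_seconds: 7200             # 2 hours = 2 * 60 * 60
  compute_hashes: true
  archive_max_members: 10000000
  archive_max_total_bytes: 53687091200     # 50 GiB = 50 * 1073741824
  archive_max_member_bytes: 10737418240    # 10 GiB = 10 * 1073741824
```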

---

## `automation` Section

Controls headless automation run retention.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `automation.run_retention_seconds` | int | `86400` | How long completed and failed runs, plus cancelled runs after worker finalization, remain available in memory for status checks and eligible report path retrieval. Default is 24 hours. On-disk cases and reports are preserved |

Values below 60 seconds are rejected.
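
For example, a deployment that only needs run status for a short time after completion might shorten the retention window. This is a sketch; the one-hour value is illustrative.

```yaml
automation:
  run_retention_seconds: 3600   # 1 hour; values below 60 are rejected
```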

---

## `analysis` Section

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `analysis.ai_max_tokens` | int | `128000` | Model context window. AIFT reserves response and safety tokens before calculating the analysis input budget |
| `analysis.ai_response_max_tokens` | int | `4096` | Maximum response tokens reserved from the context window for analysis calls |
| `analysis.ai_input_safety_margin_tokens` | int | `1024` | Additional safety reserve subtracted before prompt budget checks |
| `analysis.shortened_prompt_cutoff_tokens` | int | `64000` | When the resolved analysis input budget is below this value, a compact prompt template (without statistics) is used to save tokens |
| `analysis.connection_test_max_tokens` | int | `256` | Max tokens for the "Test Connection" health check call |
| `analysis.citation_spot_check_limit` | int | `20` | Configured citation validation cap. AIFT applies an internal minimum of 500 unique timestamps, row references, and column names per artifact |
| `analysis.artifact_csv_row_limit` | int | `0` | Maximum rows written per artifact CSV before analysis. `0` means unlimited; positive values intentionally cap parsed CSV output |
| `analysis.artifact_deduplication_enabled` | bool | `true` | Remove duplicate rows from artifact CSVs before sending to the AI |
| `analysis.artifact_ai_columns_config_path` | string | `"config/artifact_ai_columns.yaml"` | Path to the artifact column filtering config file |

> **Note:** `analysis.max_merge_rounds` has no hardcoded default, but it can be set in `config/config.yaml` or through the UI advanced settings. It caps the number of hierarchical merge iterations performed when chunked analysis produces multiple findings. The UI defaults to `5`.
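
The token settings interact, so a worked example may help. The sketch below assumes the analysis input budget is `ai_max_tokens` minus `ai_response_max_tokens` minus `ai_input_safety_margin_tokens`; that formula is inferred from the descriptions above, and the exact calculation in AIFT may differ. The 32768-token context window is an illustrative value for a smaller model.

```yaml
analysis:
  ai_max_tokens: 32768                    # illustrative context window for a smaller model
  ai_response_max_tokens: 4096
  ai_input_safety_margin_tokens: 1024
  # Assumed input budget: 32768 - 4096 - 1024 = 27648 tokens
  shortened_prompt_cutoff_tokens: 64000   # 27648 < 64000, so the compact prompt would apply
  max_merge_rounds: 5                     # no hardcoded default; 5 mirrors the UI default
```

Under the same assumption, the default values give 128000 - 4096 - 1024 = 122880 tokens, which is above the 64000 cutoff, so the full prompt template would be used.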

---

## Environment Variable Mapping

Environment variables override `config/config.yaml` values. Only API keys are mapped:

| Environment Variable | Config Key | Notes |
|---------------------|------------|-------|
| `ANTHROPIC_API_KEY` | `ai.claude.api_key` | Anthropic/Claude API key |
| `OPENAI_API_KEY` | `ai.openai.api_key` | OpenAI API key |
| `MOONSHOT_API_KEY` | `ai.kimi.api_key` | Moonshot/Kimi API key (checked first) |
| `KIMI_API_KEY` | `ai.kimi.api_key` | Alias for `MOONSHOT_API_KEY` (checked second) |

If an environment variable is set and non-empty, it always wins over the `config/config.yaml` value.
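
One way to use this precedence is to keep secrets out of the YAML file entirely. The fragment below is a sketch that leaves the key empty in `config/config.yaml` and relies on the environment to supply it.

```yaml
ai:
  provider: kimi
  kimi:
    api_key: ""          # intentionally empty; supplied by MOONSHOT_API_KEY or KIMI_API_KEY
    model: "kimi-k2.6"
    base_url: "https://api.moonshot.ai/v1"
```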

---

## UI Settings Modal vs Config-Only

The settings gear icon in the web UI exposes a **Basic** tab and an **Advanced** tab.

### Basic Tab (UI)

- AI provider selection (`ai.provider`)
- Per-provider fields: API key, model, base URL (where applicable)
- Server port (`server.port`)

### Advanced Tab (UI)

- `analysis.ai_max_tokens`
- `analysis.ai_response_max_tokens`
- `analysis.ai_input_safety_margin_tokens`
- `analysis.shortened_prompt_cutoff_tokens`
- `analysis.connection_test_max_tokens`
- `analysis.citation_spot_check_limit`
- `analysis.artifact_csv_row_limit`
- `analysis.artifact_deduplication_enabled`
- `analysis.max_merge_rounds`
- `automation.run_retention_seconds`
- `evidence.large_file_threshold_mb` (displayed in GB as "Evidence Size Threshold"; fractional GB values are accepted and stored in MB)
- `evidence.intake_timeout_seconds`
- `ai.local.request_timeout_seconds`
- `ai.<provider>.attach_csv_as_file` (per-provider toggles for Claude, OpenAI, Kimi, Local)

### Config-Only (not in UI)

- `server.host` — must be edited in `config/config.yaml`
- `analysis.artifact_ai_columns_config_path` — must be edited in `config/config.yaml`

---

## Artifact AI Columns Configuration

**File:** `config/artifact_ai_columns.yaml`

This file controls which CSV columns are sent to the AI for each artifact type. By default, all columns from a parsed artifact CSV are included. When an artifact has an entry in this file, only the listed columns are sent — reducing token usage and focusing the AI on forensically relevant data.

### Structure

```yaml
artifact_ai_columns:
  <artifact_key>:
    - column_name_1
    - column_name_2
    - ...
```

### Example

```yaml
artifact_ai_columns:
  runkeys:
    - ts
    - name
    - command
    - username
  shimcache:
    - last_modified
    - name
    - path
```

### Currently Configured Artifacts

The shipped `config/artifact_ai_columns.yaml` contains projections for the artifact keys currently registered for Windows and Linux. Use that file as the source of truth for the exact artifact keys and column names. OS-suffixed keys such as `services_linux` apply only when that OS matches.

Some projection entries target combined artifact keys, including `thumbcache`, `ual`, `certlog`, and `dpapi.keyprovider`.

### How to Edit

- To include all columns for an artifact: remove its entry from the file.
- To restrict columns: list only the column names you want the AI to see.
- Column names must match the CSV headers exactly (as produced by the Dissect parser).
- Changes take effect on the next analysis run — no restart required.