# Local AI Assistant (Go) - Implementation Spec (Config-Driven, Single LLM Client)

## Goal

Build a **minimal, self-hosted AI assistant** that:

- Uses **config.yaml** as the single source of truth
- Talks to an **Ollama-compatible endpoint** (local or remote)
- Uses **native tool-calling** (`/api/chat` with `tools` / `tool_calls`) to perform tasks
- Runs as a **single Go binary** (~600–900 LOC *initial target for Phases 1–2*; Phase 3 will add code)

Scope is intentionally focused:

✅ Agent  
✅ Tools  
✅ Config-driven LLM  
✅ HTTP API  

❌ No voice  
⚠️ Basic embedded web UI exists; no advanced dashboard/features yet  
❌ No multi-provider abstraction (for now)

---

## Current Project Status (Reviewed Mar 2026)

The codebase has moved beyond the original Phase 1-3 draft. This section reflects what is already implemented:

- Config loading + validation via `config/config.go` with fail-fast checks for required fields and enabled tool prerequisites.
- Single LLM client in `llm/llm.go` using Ollama-compatible `POST /api/chat` with optional Bearer auth.
- Agent tool loop implemented in `agent/agent.go` with bounded tool rounds, telemetry timing, and history retention/compaction.
- Tool registry + tool schemas implemented in `tools/registry.go`.
- Implemented tools: weather, news (headlines + article fetch), tasks, memory, and code read/write tools (workspace guardrails + size limits).
- SQLite ownership implemented in `memory/sqlite.go` and `memory/memory.go` for both tasks and long-term memories.
- HTTP API implemented with:
  - `POST /ask`
  - `POST /ask/stream` (SSE events)
  - `GET /api/status` (telemetry + memory/context metrics)
  - Embedded static UI serving from `server/static/*`
- Session and memory control commands implemented in `server/api.go`:
  - `/undo`, `/regenerate`, `/clear-session`, `/compact-session`
  - dangerous memory operations with confirmation flow (`/clear-longterm`, `/compact-longterm`, `/confirm`, `/cancel`)
- Automation scheduler implemented in `automation/scheduler.go` and `automation/morning.go`:
  - daily morning briefing
  - optional timezone support
  - telemetry for next/last run

Items from the old plan that are now outdated:

- “No dashboard” is no longer accurate (embedded web UI exists).
- “Not in scope for MVP: streaming” is no longer accurate (`/ask/stream` is implemented).
- “Tasks disabled until Phase 2 code exists” is obsolete (tasks + memory stack are implemented).
- Original phase table no longer matches the current maturity.

---

## Constraints

- Everything configurable via YAML; no hard-coded provider switching beyond `base_url` + `api_key`
- One LLM client file: [`llm/llm.go`](llm/llm.go)
- One process, one binary
- Tools = small, isolated units behind a registry
- Minimal abstractions; readable in one sitting
- No provider abstraction layer (yet)
- **No second agent** for automation: scheduled jobs reuse the same LLM client and tool registry as `POST /ask`

---

## High-Level Architecture

```text
client (curl / script)
│
▼
assistant (Go binary)
│
├ agent
│ ├ LLM (config-driven, /api/chat)
│ └ tools (registry)
├ tools (implementations)
├ memory (SQLite - single owner of DB)
└ automation (scheduler → same agent path as API)
```

### `/ask` flow (reference)

```mermaid
sequenceDiagram
participant Client
participant API
participant Agent
participant LLM
participant Tools
Client->>API: POST /ask
API->>Agent: Run(prompt)
Agent->>LLM: chat messages + tool defs
LLM-->>Agent: tool_calls or final text
loop While tool_calls
Agent->>Tools: Run(name, args)
Tools-->>Agent: tool result
Agent->>LLM: chat + tool results
LLM-->>Agent: next message or final
end
Agent-->>API: reply
API-->>Client: JSON response
```

---

## Project Structure

```text
assistant/
│
├ go.mod
├ main.go
│
├ config/
│ ├ config.go
│ └ config.yaml
│
├ server/
│ ├ http.go      # listen, mux, lifecycle (HTTP server)
│ └ api.go       # route handlers
│
├ agent/
│ ├ agent.go
│ ├ prompt.go
│ └ toolcall.go
│
├ llm/
│ └ llm.go       # sole HTTP client to Ollama-compatible API
│
├ tools/
│ ├ registry.go
│ ├ weather.go
│ ├ news.go
│ ├ tasks.go    # calls memory package only (no direct DB open)
│ └ code.go
│
├ memory/
│ ├ sqlite.go   # open DB, migrations/schema (tasks + memories)
│ └ memory.go   # task CRUD + future memory APIs
│
├ automation/
│ ├ scheduler.go
│ └ morning.go  # weather + news + tasks → LLM summary (same client/registry as /ask)
│
└ util/
    ├ http.go   # shared helpers (e.g. JSON decode, header copy), not the listening server
    └ log.go    # small logging helpers (stderr prefix, optional level); stdlib-only is fine
```

[`server/http.go`](server/http.go) owns binding `host:port` and routing; [`util/http.go`](util/http.go) holds small shared HTTP utilities used by LLM client and handlers; [`util/log.go`](util/log.go) centralizes log formatting so packages do not each invent their own prefix style.

---

## Config System

### `config/config.yaml`

```yaml
llm:
  base_url: http://localhost:11434
  model: qwen2.5:7b
  api_key: ""   # optional; Bearer when non-empty (local Ollama usually blank)

agent:
  system_prompt: |
    You are a personal assistant.
    You have access to tools.
    Use tools when helpful.

server:
  host: 0.0.0.0
  port: 8080

automation:
  morning_briefing_hour: 7
  # Optional later: timezone: America/Los_Angeles  (IANA); if omitted, use host local TZ

tools:
  weather:
    enabled: true
    lat: 39.16
    lon: -120.14

  news:
    enabled: true
    feeds:
      - https://news.ycombinator.com/rss

  # Phase 1: keep false until memory/tasks exist; enabling early should fail startup (see config.go)
  tasks:
    enabled: false

  code:
    enabled: false
    workspace: ./workspace
```

**Config vs phases:** A Phase 1 build ships only weather + news. If `tools.tasks.enabled` or `tools.code.enabled` is `true` while that code path is absent, **exit at startup** with an explicit error; never register silent no-ops.

### `config/config.go`

- Load YAML from a known path (CLI flag or cwd)
- Validate required fields; **fail fast** on bad config
- Gate tools: if YAML enables a tool the binary does not implement yet, **fail fast** with a clear message (or split sample configs per phase)
- Expose typed structs consumed by `main`, `llm`, `server`, `tools`, `automation`
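A minimal sketch of the fail-fast gating, assuming illustrative struct and field names (the real loader would first populate `Config` via `yaml.Unmarshal`, e.g. from `gopkg.in/yaml.v3`; the `implemented` table is a stand-in for whatever per-phase build knowledge you keep):

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors config.yaml; only the fields needed here are shown.
type Config struct {
	LLM struct {
		BaseURL, Model, APIKey string
	}
	Tools struct {
		Tasks struct{ Enabled bool }
		Code  struct{ Enabled bool }
	}
}

// implemented lists tool code paths compiled into this binary for the
// current phase; gating against it keeps "enabled but absent" from
// silently no-opping.
var implemented = map[string]bool{"weather": true, "news": true}

// Validate fails fast on missing required fields or premature tool enables.
func (c *Config) Validate() error {
	if c.LLM.BaseURL == "" || c.LLM.Model == "" {
		return errors.New("config: llm.base_url and llm.model are required")
	}
	if c.Tools.Tasks.Enabled && !implemented["tasks"] {
		return fmt.Errorf("config: tools.tasks enabled but not built in this phase")
	}
	if c.Tools.Code.Enabled && !implemented["code"] {
		return fmt.Errorf("config: tools.code enabled but not built in this phase")
	}
	return nil
}

func main() {
	var c Config
	c.LLM.BaseURL = "http://localhost:11434"
	c.LLM.Model = "qwen2.5:7b"
	if err := c.Validate(); err != nil {
		panic(err)
	}
	fmt.Println("config ok")
}
```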

---

## LLM layer (single file): `llm/llm.go`

This is the **only** code that calls the Ollama-compatible HTTP API.

**Assumption:** Ollama-compatible **`POST {base_url}/api/chat`** with **`stream: false`** for Phase 1–2. Prefer `/api/chat` for multi-turn tool loops; `/api/generate` is not the primary path for this design.

**Config:** `base_url`, `model`, `api_key` (optional).

**Auth:** If `api_key` is non-empty, set `Authorization: Bearer <api_key>`; otherwise omit.

### Minimal shapes (illustrative)

**Request (non-streaming chat + tools):**

```json
{
  "model": "qwen2.5:7b",
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "tools": [ { "type": "function", "function": { "name": "...", "description": "...", "parameters": { ... } } } ],
  "stream": false
}
```

**Response:** Parse `message.content` and/or `message.tool_calls` (names + arguments) from the Ollama-compatible JSON. For each tool result, append a message that matches **your server’s documented tool-result shape** (often `role: "tool"` plus fields that tie back to the assistant’s `tool_calls`; the exact name/id field names vary by Ollama version). Repeat until the model returns final text without tool calls.

**Not in scope for MVP:** streaming (`stream: true`), speculative multi-provider clients.

**Interop:** `ChatRequest` / `ChatResponse` structs may need small tweaks for the exact JSON your Ollama version returns; validate against a live `curl` capture once, then freeze the structs.

### Suggested API surface

```go
type Client struct {
    BaseURL string
    Model   string
    APIKey  string
}

func NewClient(cfg Config) *Client

// Chat sends one non-streaming /api/chat request; caller builds messages + optional tools.
func (c *Client) Chat(ctx context.Context, body ChatRequest) (*ChatResponse, error)
```

Typed `ChatRequest` / `ChatResponse` structs mirror the minimal JSON fields you need; avoid hand-waving `map[string]any` in the hot path if you can keep the structs small.

---

## Agent: `agent/agent.go`

**Core loop:**

```text
input
 ↓
LLM (chat + tool definitions)
 ↓
if tool_calls → execute tools → append results → LLM again
 ↓
final natural language reply
```

**Responsibilities:**

- Inject system prompt and user message
- Register only **enabled** tools from config in the chat `tools` array
- Run tool handlers; stringify or JSON-encode results; attach tool results using the **same message shape** your `/api/chat` server expects (see LLM layer note on tool replies)
- Return final assistant string to the HTTP layer

### Prompt system: `agent/prompt.go`

Assemble chat messages:

- System: from `config.agent.system_prompt` plus a concise list of **enabled** tool names/descriptions (no dead tools in the prompt)
- User: request body

### Tool dispatch: `agent/toolcall.go`

- Map `tool_calls[i].function.name` (OpenAI-style; adjust field paths if your Ollama build differs) → registry entry
- Parse arguments (JSON) into what each tool expects
- Errors from tools → surface as tool-result text, or log them and return a user-safe message (pick one pattern and stay consistent)

---

## Tool system

### `tools/registry.go`

```go
type Tool struct {
    Name        string
    Description string
    Parameters  json.RawMessage // JSON Schema object for /api/chat
    Run         func(ctx context.Context, args json.RawMessage) (string, error)
}
```

Registry: `map[string]Tool` built at startup from config (`enabled` flags).
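Building that map from config can stay trivial; `BuildRegistry` is a hypothetical name, and the point is that disabled tools never reach the model's `tools` array:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

type Tool struct {
	Name        string
	Description string
	Parameters  json.RawMessage
	Run         func(ctx context.Context, args json.RawMessage) (string, error)
}

// BuildRegistry registers only tools the config enables.
func BuildRegistry(enabled map[string]bool, all []Tool) map[string]Tool {
	reg := make(map[string]Tool)
	for _, t := range all {
		if enabled[t.Name] {
			reg[t.Name] = t
		}
	}
	return reg
}

func main() {
	all := []Tool{{Name: "weather"}, {Name: "news"}, {Name: "tasks"}}
	reg := BuildRegistry(map[string]bool{"weather": true, "news": true}, all)
	fmt.Println(len(reg)) // tasks stays out
}
```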

### `tools/weather.go`

- Open-Meteo (or equivalent); `lat` / `lon` from config

### `tools/news.go`

- Fetch RSS from `tools.news.feeds`

### `tools/tasks.go`

- **Does not open SQLite.** Calls [`memory`](memory/) package: add / list / complete tasks.

### `tools/code.go`

Filesystem helper for a configured workspace (e.g. Cursor workflows).

**Guardrails (in scope; not a full sandbox):**

- Resolve all paths under `tools.code.workspace` (config); reject `..` segments and paths that escape the workspace after `filepath.Clean` + root prefix check
- Optional max read/write size (e.g. reject files above N MB) to avoid accidental huge reads
- **Non-goal:** OS-level sandboxing, arbitrary command execution, or network from this tool

**Surface:** e.g. `ReadFile(relPath)`, `WriteFile(relPath, content)` as thin wrappers over `os` within the workspace root only (no SQLite; not the `memory` package).

---

## SQLite ownership: `memory/`

- [`memory/sqlite.go`](memory/sqlite.go): single `sql.DB`, schema/migrations, tables **`tasks`** and **`memories`** (memories may stay unused until you need them)
- [`memory/memory.go`](memory/memory.go): exported functions for task CRUD (and future “memories” APIs)

[`tools/tasks.go`](tools/tasks.go) depends on `memory` only; a single owner for the DB avoids duplicate connections and drift.
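The exported API surface might look like the sketch below. The map-backed store is only a stand-in so the shape is concrete; the real `memory` package backs the same methods with one `*sql.DB` (driver choice, e.g. a pure-Go SQLite driver, is up to you), and all method names here are assumptions:

```go
package main

import (
	"fmt"
	"sync"
)

// Task mirrors a row of the tasks table.
type Task struct {
	ID    int64
	Title string
	Done  bool
}

// Store is the single owner of task state; tools call these methods
// instead of opening the database themselves.
type Store struct {
	mu    sync.Mutex
	next  int64
	tasks map[int64]Task
}

func NewStore() *Store { return &Store{next: 1, tasks: map[int64]Task{}} }

func (s *Store) AddTask(title string) (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := s.next
	s.next++
	s.tasks[id] = Task{ID: id, Title: title}
	return id, nil
}

func (s *Store) CompleteTask(id int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tasks[id]
	if !ok {
		return fmt.Errorf("task %d not found", id)
	}
	t.Done = true
	s.tasks[id] = t
	return nil
}

func (s *Store) ListOpen() []Task {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []Task
	for _, t := range s.tasks {
		if !t.Done {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	s := NewStore()
	id, _ := s.AddTask("buy milk")
	_ = s.CompleteTask(id)
	fmt.Println(len(s.ListOpen()))
}
```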

---

## Server

### `server/http.go`

- Listen on `config.server.host:port`
- Attach routes from `api.go`

### `server/api.go`

- **`POST /ask`** - body: `{ "prompt": "..." }`
- Flow: handler → agent → JSON response with assistant text (and optional debug fields only if you want them behind a flag later)

---

## Automation

### `automation/scheduler.go`

- `time.Ticker` or schedule: run jobs at configured times
- **`morning_briefing_hour`:** interpret in the **host’s local timezone** until an optional `automation.timezone` (IANA) exists in config

### `automation/morning.go`

**Pipeline:** gather weather + news + task summary inputs → **same `llm.Client` and tool registry** as the HTTP path → single LLM summary (no duplicate agent implementation).

**Phase 3:** When `base_url` points off-machine, **document and manually test** HTTPS + Bearer against that host (configuration already supports it; verification is the deliverable).

---

## Main (`main.go`)

```text
load config
init LLM client
init memory (SQLite) when Phase 2+ tasks enabled
init tools registry (enabled tools only)
init agent
start HTTP server
start scheduler (Phase 3)
block forever (or graceful shutdown later)
```

---

## Build & run

```bash
go mod init assistant
go mod tidy
go run .

go build -o assistant .
./assistant
```

---

## Strong Next Steps (Updated Roadmap)

| Priority | Focus | Why this matters | Done when |
| -------- | ----- | ---------------- | --------- |
| **P1** | Add automated tests for core flows (`agent` loop, tool dispatch, memory store, slash commands) | Current behavior is feature-rich and needs regression protection | `go test ./...` covers happy-path + error-path for core components |
| **P1** | Add integration smoke tests for HTTP endpoints (`/ask`, `/ask/stream`, `/api/status`) | Ensures API contracts stay stable while internals evolve | CI/local script validates endpoint responses and SSE framing |
| **P1** | Harden long-term memory retrieval strategy (ranking, truncation, prompt budget) | Prevents noisy memory context and improves answer quality | Memory context injection is deterministic, bounded, and tested |
| **P2** | Improve news pipeline quality (dedupe/rank, stricter preference filtering, timeout/fallback handling) | Reduces irrelevant headlines and brittle external feed behavior | News responses remain useful with partial feed failures |
| **P2** | Add graceful shutdown + lifecycle management | Cleaner exits for DB, scheduler, and server in production-like runs | SIGINT/SIGTERM stops server, scheduler, and DB cleanly |
| **P3** | Validate remote Ollama deployment profile (HTTPS, auth, latency budgets, retry policy) | De-risks moving from local to remote inference | Documented runbook + tested config profile for remote endpoint |

Recommended immediate sequence:

1. Build a small test harness around `agent.Run` + fake tools/LLM.
2. Add endpoint-level integration tests (including SSE).
3. Tune memory-context budget rules after baseline tests are in place.

---

## End goal

A **config-driven** agent that can **swap endpoints** by editing YAML (`base_url`, `api_key`, `model`), run **real tools** with **native chat tool calls**, and stay **small enough to read and change** without a framework.