# Local AI Assistant (Go) - Implementation Spec (Config-Driven, Single LLM Client)
## Goal
Build a **minimal, self-hosted AI assistant** that:
- Uses **config.yaml** as the single source of truth
- Talks to an **Ollama-compatible endpoint** (local or remote)
- Uses **native tool-calling** (`/api/chat` with `tools` / `tool_calls`) to perform tasks
- Runs as a **single Go binary** (~600–900 LOC *initial target for Phases 1–2*; Phase 3 will add code)
Scope is intentionally focused:
✅ Agent
✅ Tools
✅ Config-driven LLM
✅ HTTP API
❌ No voice
⚠️ Basic embedded web UI exists; no advanced dashboard/features yet
❌ No multi-provider abstraction (for now)
---
## Current Project Status (Reviewed Mar 2026)
The codebase has moved beyond the original Phase 1–3 draft. This section reflects what is already implemented:
- Config loading + validation via `config/config.go` with fail-fast checks for required fields and enabled tool prerequisites.
- Single LLM client in `llm/llm.go` using Ollama-compatible `POST /api/chat` with optional Bearer auth.
- Agent tool loop implemented in `agent/agent.go` with bounded tool rounds, telemetry timing, and history retention/compaction.
- Tool registry + tool schemas implemented in `tools/registry.go`.
- Implemented tools: weather, news (headlines + article fetch), tasks, memory, and code read/write tools (workspace guardrails + size limits).
- SQLite ownership implemented in `memory/sqlite.go` and `memory/memory.go` for both tasks and long-term memories.
- HTTP API implemented with:
  - `POST /ask`
  - `POST /ask/stream` (SSE events)
  - `GET /api/status` (telemetry + memory/context metrics)
  - Embedded static UI serving from `server/static/*`
- Session and memory control commands implemented in `server/api.go`:
  - `/undo`, `/regenerate`, `/clear-session`, `/compact-session`
  - dangerous memory operations with a confirmation flow (`/clear-longterm`, `/compact-longterm`, `/confirm`, `/cancel`)
- Automation scheduler implemented in `automation/scheduler.go` and `automation/morning.go`:
  - daily morning briefing
  - optional timezone support
  - telemetry for next/last run
Items from the old plan that are now outdated:
- “No dashboard” is no longer accurate (embedded web UI exists).
- “Not in scope for MVP: streaming” is no longer accurate (`/ask/stream` is implemented).
- “Tasks disabled until Phase 2 code exists” is obsolete (tasks + memory stack are implemented).
- Original phase table no longer matches the current maturity.
---
## Constraints
- Everything configurable via YAML; no hard-coded provider switching beyond `base_url` + `api_key`
- One LLM client file: [`llm/llm.go`](llm/llm.go)
- One process, one binary
- Tools = small, isolated units behind a registry
- Minimal abstractions; readable in one sitting
- No provider abstraction layer (yet)
- **No second agent** for automation: scheduled jobs reuse the same LLM client and tool registry as `POST /ask`
---
## High-Level Architecture
```text
client (curl / script)
   │
   ▼
assistant (Go binary)
   │
   ├ agent
   │   ├ LLM (config-driven, /api/chat)
   │   └ tools (registry)
   ├ tools (implementations)
   ├ memory (SQLite - single owner of DB)
   └ automation (scheduler → same agent path as API)
```
### `/ask` flow (reference)
```mermaid
sequenceDiagram
    participant Client
    participant API
    participant Agent
    participant LLM
    participant Tools
    Client->>API: POST /ask
    API->>Agent: Run(prompt)
    Agent->>LLM: chat messages + tool defs
    LLM-->>Agent: tool_calls or final text
    loop While tool_calls
        Agent->>Tools: Run(name, args)
        Tools-->>Agent: tool result
        Agent->>LLM: chat + tool results
        LLM-->>Agent: next message or final
    end
    Agent-->>API: reply
    API-->>Client: JSON response
```
---
## Project Structure
```text
assistant/
│
├ go.mod
├ main.go
│
├ config/
│   ├ config.go
│   └ config.yaml
│
├ server/
│   ├ http.go      # listen, mux, lifecycle (HTTP server)
│   └ api.go       # route handlers
│
├ agent/
│   ├ agent.go
│   ├ prompt.go
│   └ toolcall.go
│
├ llm/
│   └ llm.go       # sole HTTP client to Ollama-compatible API
│
├ tools/
│   ├ registry.go
│   ├ weather.go
│   ├ news.go
│   ├ tasks.go     # calls memory package only (no direct DB open)
│   └ code.go
│
├ memory/
│   ├ sqlite.go    # open DB, migrations/schema (tasks + memories)
│   └ memory.go    # task CRUD + future memory APIs
│
├ automation/
│   ├ scheduler.go
│   └ morning.go   # weather + news + tasks → LLM summary (same client/registry as /ask)
│
└ util/
    ├ http.go      # shared helpers (e.g. JSON decode, header copy), not the listening server
    └ log.go       # small logging helpers (stderr prefix, optional level); stdlib-only is fine
```
[`server/http.go`](server/http.go) owns binding `host:port` and routing; [`util/http.go`](util/http.go) holds small shared HTTP utilities used by LLM client and handlers; [`util/log.go`](util/log.go) centralizes log formatting so packages do not each invent their own prefix style.
---
## Config System
### `config/config.yaml`
```yaml
llm:
  base_url: http://localhost:11434
  model: qwen2.5:7b
  api_key: ""   # optional; Bearer when non-empty (local Ollama usually blank)

agent:
  system_prompt: |
    You are a personal assistant.
    You have access to tools.
    Use tools when helpful.

server:
  host: 0.0.0.0
  port: 8080

automation:
  morning_briefing_hour: 7
  # Optional later: timezone: America/Los_Angeles (IANA); if omitted, use host local TZ

tools:
  weather:
    enabled: true
    lat: 39.16
    lon: -120.14
  news:
    enabled: true
    feeds:
      - https://news.ycombinator.com/rss
  # Phase 1: keep false until memory/tasks exist; enabling early should fail startup (see config.go)
  tasks:
    enabled: false
  code:
    enabled: false
    workspace: ./workspace
```
**Config vs phases:** A Phase 1 build only ships weather + news. If `tools.tasks.enabled` or `tools.code.enabled` is `true` while that code path is absent, **exit at startup** with an explicit error; never register silent no-ops.
### `config/config.go`
- Load YAML from a known path (CLI flag or cwd)
- Validate required fields; **fail fast** on bad config
- Gate tools: if YAML enables a tool the binary does not implement yet, **fail fast** with a clear message (or split sample configs per phase)
- Expose typed structs consumed by `main`, `llm`, `server`, `tools`, `automation`
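The fail-fast gating above can be sketched as plain struct validation (YAML decoding itself would come from a library such as `gopkg.in/yaml.v3`, omitted here; struct and field names are illustrative, not the project's actual `config.go`):

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors the YAML layout; only the fields needed for the
// sketch are shown.
type Config struct {
	LLM struct {
		BaseURL string
		Model   string
	}
	Tools struct {
		Tasks struct{ Enabled bool }
		Code  struct {
			Enabled   bool
			Workspace string
		}
	}
}

// Validate fails fast on missing required fields and on tools that are
// enabled without their prerequisites (the "no silent no-ops" rule).
func (c *Config) Validate() error {
	if c.LLM.BaseURL == "" {
		return errors.New("config: llm.base_url is required")
	}
	if c.LLM.Model == "" {
		return errors.New("config: llm.model is required")
	}
	if c.Tools.Code.Enabled && c.Tools.Code.Workspace == "" {
		return errors.New("config: tools.code.enabled requires tools.code.workspace")
	}
	return nil
}

func main() {
	var cfg Config
	cfg.LLM.BaseURL = "http://localhost:11434"
	fmt.Println(cfg.Validate()) // model still missing → error
}
```

`main` would call `Validate` immediately after decoding and exit non-zero on error, so a bad `config.yaml` never produces a half-working process.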
---
## LLM layer (single file): `llm/llm.go`
This is the **only** code that calls the Ollama-compatible HTTP API.
**Assumption:** Ollama-compatible **`POST {base_url}/api/chat`** with **`stream: false`** for Phase 1–2. Prefer `/api/chat` for multi-turn tool loops; `/api/generate` is not the primary path for this design.
**Config:** `base_url`, `model`, `api_key` (optional).
**Auth:** If `api_key` is non-empty, set `Authorization: Bearer <api_key>`; otherwise omit.
### Minimal shapes (illustrative)
**Request (non-streaming chat + tools):**
```json
{
  "model": "qwen2.5:7b",
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ],
  "tools": [
    { "type": "function", "function": { "name": "...", "description": "...", "parameters": { ... } } }
  ],
  "stream": false
}
```
**Response:** Parse `message.content` and/or `message.tool_calls` (names + arguments) from the Ollama-compatible JSON. For each tool result, append a message that matches **your server’s documented tool-result shape** (often `role: "tool"` plus fields that tie back to the assistant’s `tool_calls`; the exact name/id field names vary by Ollama version). Repeat until the model returns final text without tool calls.
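One plausible tool-reply message, for illustration only (recent Ollama builds accept `role: "tool"` with the result in `content`; the field tying the reply back to the call varies by version and is shown here as a hypothetical `tool_name`, so confirm against your server before freezing structs):

```json
{
  "role": "tool",
  "content": "{\"temp_c\": 4.2, \"condition\": \"snow\"}",
  "tool_name": "weather"
}
```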
**Not in scope for the original MVP:** streaming (`stream: true`) and speculative multi-provider clients. (Streaming has since landed as `POST /ask/stream`; see Current Project Status above.)
**Interop:** `ChatRequest` / `ChatResponse` structs may need small tweaks for the exact JSON your Ollama version returns; validate against a live `curl` capture once, then freeze the structs.
### Suggested API surface
```go
type Client struct {
    BaseURL string
    Model   string
    APIKey  string
}

func NewClient(cfg Config) *Client

// Chat sends one non-streaming /api/chat request; caller builds messages + optional tools.
func (c *Client) Chat(ctx context.Context, body ChatRequest) (*ChatResponse, error)
```
Typed `ChatRequest` / `ChatResponse` structs mirror the minimal JSON fields you need; avoid hand-waving `map[string]any` in the hot path if you can keep the structs small.
---
## Agent: `agent/agent.go`
**Core loop:**
```text
input
↓
LLM (chat + tool definitions)
↓
if tool_calls → execute tools → append results → LLM again
↓
final natural language reply
```
**Responsibilities:**
- Inject system prompt and user message
- Register only **enabled** tools from config in the chat `tools` array
- Run tool handlers; stringify or JSON-encode results; attach tool results using the **same message shape** your `/api/chat` server expects (see LLM layer note on tool replies)
- Return final assistant string to the HTTP layer
### Prompt system: `agent/prompt.go`
Assemble chat messages:
- System: from `config.agent.system_prompt` plus a concise list of **enabled** tool names/descriptions (no dead tools in the prompt)
- User: request body
### Tool dispatch: `agent/toolcall.go`
- Map `tool_calls[i].function.name` (OpenAI-style; adjust field paths if your Ollama build differs) → registry entry
- Parse arguments (JSON) into what each tool expects
- Errors from tools → surface as tool result text or logged + user-safe message (pick one pattern and stay consistent)
---
## Tool system
### `tools/registry.go`
```go
type Tool struct {
    Name        string
    Description string
    Parameters  json.RawMessage // JSON Schema object for /api/chat
    Run         func(ctx context.Context, args json.RawMessage) (string, error)
}
```
Registry: `map[string]Tool` built at startup from config (`enabled` flags).
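Startup registration from the `enabled` flags can be sketched like this, with a cut-down `ToolsConfig` standing in for the real config structs and stub `Run` functions in place of the real implementations:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

type Tool struct {
	Name string
	Run  func(ctx context.Context, args json.RawMessage) (string, error)
}

// ToolsConfig is a stand-in for the real config structs.
type ToolsConfig struct {
	WeatherEnabled bool
	NewsEnabled    bool
}

// buildRegistry registers only tools the config enables; disabled tools
// never appear, so the model can never call them.
func buildRegistry(cfg ToolsConfig) map[string]Tool {
	reg := map[string]Tool{}
	if cfg.WeatherEnabled {
		reg["weather"] = Tool{Name: "weather", Run: func(context.Context, json.RawMessage) (string, error) {
			return "stub forecast", nil // real version calls Open-Meteo
		}}
	}
	if cfg.NewsEnabled {
		reg["news"] = Tool{Name: "news", Run: func(context.Context, json.RawMessage) (string, error) {
			return "stub headlines", nil // real version fetches RSS
		}}
	}
	return reg
}

func main() {
	reg := buildRegistry(ToolsConfig{WeatherEnabled: true})
	_, hasNews := reg["news"]
	fmt.Println(len(reg), hasNews) // 1 false
}
```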
### `tools/weather.go`
- Open-Meteo (or equivalent); `lat` / `lon` from config
### `tools/news.go`
- Fetch RSS from `tools.news.feeds`
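Headline extraction can be sketched with stdlib `encoding/xml`; the `rssFeed` struct covers only the fields a headlines tool needs (real feeds vary, and `encoding/xml` simply ignores unknown elements):

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// rssFeed maps the minimal RSS 2.0 structure: channel → items → title/link.
type rssFeed struct {
	Channel struct {
		Items []struct {
			Title string `xml:"title"`
			Link  string `xml:"link"`
		} `xml:"item"`
	} `xml:"channel"`
}

// parseHeadlines extracts item titles from raw RSS bytes.
func parseHeadlines(data []byte) ([]string, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	titles := make([]string, 0, len(feed.Channel.Items))
	for _, it := range feed.Channel.Items {
		titles = append(titles, it.Title)
	}
	return titles, nil
}

func main() {
	sample := []byte(`<rss><channel><item><title>Hello</title><link>https://example.com</link></item></channel></rss>`)
	titles, err := parseHeadlines(sample)
	fmt.Println(titles, err) // [Hello] <nil>
}
```

The real tool would fetch each URL from `tools.news.feeds` with a timeout before parsing; that network step is omitted here.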
### `tools/tasks.go`
- **Does not open SQLite.** Calls [`memory`](memory/) package: add / list / complete tasks.
### `tools/code.go`
Filesystem helper for a configured workspace (e.g. Cursor workflows).
**Guardrails (in scope; not a full sandbox):**
- Resolve all paths under `tools.code.workspace` (config); reject `..` segments and paths that escape the workspace after `filepath.Clean` + root prefix check
- Optional max read/write size (e.g. reject files above N MB) to avoid accidental huge reads
- **Non-goal:** OS-level sandboxing, arbitrary command execution, or network from this tool
**Surface:** e.g. `ReadFile(relPath)`, `WriteFile(relPath, content)` as thin wrappers over `os` within the workspace root only (no SQLite; not the `memory` package).
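The path guardrail can be sketched as below (note one design choice made explicit: absolute inputs are re-rooted under the workspace rather than rejected, while `..` escapes fail):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInWorkspace joins rel onto the workspace root and rejects any
// path that escapes the root after cleaning (covers ".." segments).
func resolveInWorkspace(root, rel string) (string, error) {
	abs, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	full := filepath.Clean(filepath.Join(abs, rel))
	if full != abs && !strings.HasPrefix(full, abs+string(filepath.Separator)) {
		return "", fmt.Errorf("code tool: path %q escapes workspace", rel)
	}
	return full, nil
}

func main() {
	ok, _ := resolveInWorkspace("./workspace", "notes/todo.md")
	_, err := resolveInWorkspace("./workspace", "../etc/passwd")
	fmt.Println(ok != "", err != nil) // true true
}
```

`ReadFile` / `WriteFile` would call this first, then apply the size limit before touching `os`.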
---
## SQLite ownership: `memory/`
- [`memory/sqlite.go`](memory/sqlite.go): single `sql.DB`, schema/migrations, tables **`tasks`** and **`memories`** (memories may stay unused until you need them)
- [`memory/memory.go`](memory/memory.go): exported functions for task CRUD (and future “memories” APIs)
[`tools/tasks.go`](tools/tasks.go) depends on `memory` only; a single owner for the DB avoids duplicate connections and schema drift.
---
## Server
### `server/http.go`
- Listen on `config.server.host:port`
- Attach routes from `api.go`
### `server/api.go`
- **`POST /ask`** - body: `{ "prompt": "..." }`
- Flow: handler → agent → JSON response with assistant text (and optional debug fields only if you want them behind a flag later)
---
## Automation
### `automation/scheduler.go`
- `time.Ticker` or schedule: run jobs at configured times
- **`morning_briefing_hour`:** interpret in the **host’s local timezone** until an optional `automation.timezone` (IANA) exists in config
### `automation/morning.go`
**Pipeline:** gather weather + news + task summary inputs → **same `llm.Client` and tool registry** as the HTTP path → single LLM summary (no duplicate agent implementation).
**Phase 3:** When `base_url` points off-machine, **document and manually test** HTTPS + Bearer against that host (configuration already supports it; verification is the deliverable).
---
## Main (`main.go`)
```text
load config
init LLM client
init memory (SQLite) when Phase 2+ tasks enabled
init tools registry (enabled tools only)
init agent
start HTTP server
start scheduler (Phase 3)
block forever (or graceful shutdown later)
```
---
## Build & run
```bash
go mod init assistant
go mod tidy
go run .
go build -o assistant .
./assistant
```
---
## Strong Next Steps (Updated Roadmap)
| Priority | Focus | Why this matters | Done when |
| -------- | ----- | ---------------- | --------- |
| **P1** | Add automated tests for core flows (`agent` loop, tool dispatch, memory store, slash commands) | Current behavior is feature-rich and needs regression protection | `go test ./...` covers happy-path + error-path for core components |
| **P1** | Add integration smoke tests for HTTP endpoints (`/ask`, `/ask/stream`, `/api/status`) | Ensures API contracts stay stable while internals evolve | CI/local script validates endpoint responses and SSE framing |
| **P1** | Harden long-term memory retrieval strategy (ranking, truncation, prompt budget) | Prevents noisy memory context and improves answer quality | Memory context injection is deterministic, bounded, and tested |
| **P2** | Improve news pipeline quality (dedupe/rank, stricter preference filtering, timeout/fallback handling) | Reduces irrelevant headlines and brittle external feed behavior | News responses remain useful with partial feed failures |
| **P2** | Add graceful shutdown + lifecycle management | Cleaner exits for DB, scheduler, and server in production-like runs | SIGINT/SIGTERM stops server, scheduler, and DB cleanly |
| **P3** | Validate remote Ollama deployment profile (HTTPS, auth, latency budgets, retry policy) | De-risks moving from local to remote inference | Documented runbook + tested config profile for remote endpoint |
Recommended immediate sequence:
1. Build a small test harness around `agent.Run` + fake tools/LLM.
2. Add endpoint-level integration tests (including SSE).
3. Tune memory-context budget rules after baseline tests are in place.
---
## End goal
A **config-driven** agent that can **swap endpoints** by editing YAML (`base_url`, `api_key`, `model`), run **real tools** with **native chat tool calls**, and stay **small enough to read and change** without a framework.