
How this AI works

Full transparency on the tech, data flow, and what gets stored.

Model

The chat is powered by a local LLM running in LM Studio: qwen/qwen3-4b-2507. No data is sent to OpenAI or other cloud providers. Inference runs on your machine (or a server you control).
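A minimal sketch of what a call to the model could look like. LM Studio exposes an OpenAI-compatible API; the endpoint here assumes LM Studio's default port (1234), and the helper names are illustrative, not the app's actual code.

```typescript
// LM Studio's default local endpoint (assumption; adjust if yours differs).
const LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Builds a request body in the OpenAI chat-completions shape LM Studio accepts.
function buildChatRequest(messages: ChatMessage[], stream = true) {
  return { model: "qwen/qwen3-4b-2507", messages, stream };
}

// Sends the chat request to the local server; nothing leaves your machine.
async function chat(messages: ChatMessage[]): Promise<Response> {
  return fetch(LMSTUDIO_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(messages)),
  });
}
```

With `stream: true`, tokens can be relayed to the browser as they are generated rather than waiting for the full reply.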

Memory layers

The AI uses an OpenClaw-inspired memory system:

  • Short-term: The current conversation (in the model context window).
  • Medium-term: Summaries of past conversations, stored per session.
  • Long-term: Facts you explicitly ask to remember (e.g. "remember that I like Python").

Memory is keyed by an anonymous session cookie (no login). Each visitor gets their own memory for the duration of their session.
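The layers above might be combined into the prompt roughly like this. The function and field names are illustrative assumptions, not the app's real code; short-term memory is omitted because it is just the message history already in the context window.

```typescript
// Illustrative shapes for the medium- and long-term layers.
type Memory = {
  longTermFacts: string[];    // facts the visitor explicitly asked to remember
  sessionSummaries: string[]; // medium-term: summaries of past conversations
};

// Merge the base system prompt with retrieved memory before each request,
// skipping any layer that is empty.
function buildSystemPrompt(base: string, memory: Memory): string {
  const parts = [base];
  if (memory.longTermFacts.length > 0) {
    parts.push("Known facts about the visitor:\n- " + memory.longTermFacts.join("\n- "));
  }
  if (memory.sessionSummaries.length > 0) {
    parts.push("Earlier conversation summaries:\n" + memory.sessionSummaries.join("\n"));
  }
  return parts.join("\n\n");
}
```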

Telemetry

Optional: chat requests can be logged to Langfuse for observability (traces, latency, token counts). Set the LANGFUSE_* environment variables to enable it; without them, no telemetry is sent anywhere.
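The opt-in check can be as simple as a guard on the environment. The specific variable names below are assumptions standing in for the LANGFUSE_* set; consult the Langfuse docs for the exact ones.

```typescript
// Telemetry stays off unless both keys are present in the environment.
type Env = Record<string, string | undefined>;

function telemetryEnabled(env: Env): boolean {
  // Hypothetical variable names; the page only guarantees the LANGFUSE_* prefix.
  return Boolean(env.LANGFUSE_PUBLIC_KEY && env.LANGFUSE_SECRET_KEY);
}
```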

Data storage

Chat messages and memory summaries are stored locally in SQLite (file: .data/portfolio.db). Everything stays on your machine. Data is scoped by session ID.
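A hypothetical shape for that SQLite file. The table and column names are assumptions for illustration; the only detail taken from the page is that every row is scoped by the anonymous session ID.

```typescript
// Illustrative schema for .data/portfolio.db (names are assumptions).
const SCHEMA = `
CREATE TABLE IF NOT EXISTS messages (
  id         INTEGER PRIMARY KEY,
  session_id TEXT NOT NULL,                 -- anonymous cookie value
  role       TEXT NOT NULL,                 -- 'user' or 'assistant'
  content    TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS memory (
  session_id TEXT NOT NULL,                 -- scopes memory per visitor
  kind       TEXT NOT NULL,                 -- 'summary' (medium) or 'fact' (long)
  content    TEXT NOT NULL
);
`;
```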

AI improvement

The AI improves across chats via memory retrieval (RAG-style): stored summaries are injected into the system prompt so the model keeps context between sessions. A separate fine-tuning pipeline (LoRA on curated chats) can train improved adapters; it is documented as a capability but not automated in this app.

Architecture

Browser (cookie) → Next.js /api/chat
  → Session (SQLite)
  → Memory fetch (SQLite)
  → LM Studio (local LLM)
  → Stream response
  → Persist messages (SQLite)
  → Summarize / extract "remember" (LM Studio)
  → Langfuse trace (optional)
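The flow above can be sketched as one handler. This is purely illustrative: the in-memory map stands in for SQLite, the stub stands in for the LM Studio call, and the summarize/trace steps are omitted.

```typescript
// In-memory stand-in for the SQLite message store, keyed by session ID.
const store = new Map<string, string[]>();

// Stub for the LM Studio call; the real app streams a model response here.
async function askModel(history: string[], message: string): Promise<string> {
  return `echo: ${message}`;
}

// One request through the pipeline: memory fetch -> model -> persist.
async function handleChat(sessionId: string, userMessage: string): Promise<string> {
  const history = store.get(sessionId) ?? [];             // memory fetch (SQLite)
  const reply = await askModel(history, userMessage);     // LM Studio (local LLM)
  store.set(sessionId, [...history, userMessage, reply]); // persist messages
  return reply;
}
```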

Questions? Connect on LinkedIn