How this AI works
Full transparency on the tech, data flow, and what gets stored.
Model
The chat is powered by qwen/qwen3-4b-2507, a local LLM served by LM Studio. No data is sent to OpenAI or any other cloud provider; inference runs entirely on your machine (or a server you control).
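As a concrete illustration, here is roughly what one request to the local model looks like. LM Studio exposes an OpenAI-compatible HTTP API; the URL below assumes LM Studio's default port (1234), which may differ in your setup.

```ts
// Minimal sketch of a single chat request to the local LM Studio server.
// The endpoint URL assumes LM Studio's default port; adjust as needed.
async function askLocalModel(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen/qwen3-4b-2507",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```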
Memory layers
The AI uses an OpenClaw-inspired memory system (sketched in code after the list):
- Short-term: The current conversation (held in the model's context window).
- Medium-term: Summaries of past conversations, stored per session.
- Long-term: Facts you explicitly ask to remember (e.g. "remember that I like Python").
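A rough sketch of these layers as data (the type names are illustrative assumptions, not the app's actual API):

```ts
// Illustrative types for the three memory layers (names are assumptions).
type ChatMessage = { role: "user" | "assistant"; content: string };

type MemoryLayers = {
  shortTerm: ChatMessage[]; // the current conversation, in the context window
  mediumTerm: string[];     // per-session summaries of past conversations
  longTerm: string[];       // facts the visitor explicitly asked to remember
};
```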
Memory is keyed by an anonymous session cookie (no login). Each visitor gets their own memory for the duration of their session.
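A minimal sketch of that keying (assuming Next.js 15's async cookies() API; the cookie name and helper are hypothetical, not the app's real code):

```ts
// Hypothetical helper: read the anonymous session cookie, or mint one.
// All memory reads and writes are then scoped to this ID.
import { cookies } from "next/headers";
import { randomUUID } from "node:crypto";

async function getSessionId(): Promise<string> {
  const store = await cookies();
  const existing = store.get("session_id")?.value;
  if (existing) return existing;
  const id = randomUUID();
  store.set("session_id", id, { httpOnly: true, sameSite: "lax" });
  return id;
}
```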
Telemetry
Telemetry is optional: chat requests can be logged to Langfuse for observability (traces, latency, token counts). Set the LANGFUSE_* environment variables to enable it; without them, no telemetry is sent anywhere.
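A sketch of how that opt-in could look with the Langfuse JS SDK (a sketch under stated assumptions, not the app's exact wiring):

```ts
// Telemetry is opt-in: only create a Langfuse client when the env vars
// are set; otherwise trace calls are simply skipped.
import { Langfuse } from "langfuse";

const langfuse =
  process.env.LANGFUSE_SECRET_KEY && process.env.LANGFUSE_PUBLIC_KEY
    ? new Langfuse({
        secretKey: process.env.LANGFUSE_SECRET_KEY,
        publicKey: process.env.LANGFUSE_PUBLIC_KEY,
        baseUrl: process.env.LANGFUSE_BASEURL, // optional self-hosted URL
      })
    : null;

// Later, in the chat handler (sessionId as in the Memory section):
// const trace = langfuse?.trace({ name: "chat", sessionId });
```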
Data storage
Chat messages and memory summaries are stored locally in SQLite (file: .data/portfolio.db). Everything stays on your machine. Data is scoped by session ID.
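The app's actual driver and schema may differ; as an illustration with better-sqlite3, the storage could be shaped like this (table and column names are assumptions):

```ts
// Hypothetical schema sketch: one local SQLite file, everything keyed
// by session_id so each visitor's data stays separate.
import Database from "better-sqlite3";

const db = new Database(".data/portfolio.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role       TEXT NOT NULL,  -- 'user' | 'assistant'
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
  );
  CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    kind       TEXT NOT NULL,  -- 'summary' | 'fact'
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
  );
`);
```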
AI improvement
The AI improves from past chats via memory retrieval (RAG-style): summaries are injected into the system prompt so the model retains context across conversations. A separate fine-tuning pipeline (LoRA on curated chats) can be used to train improved adapters; this is documented as a capability, not automated in this app.
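A sketch of the retrieval step (the function and prompt wording are illustrative): retrieved summaries and facts are simply prepended to the system prompt before each model call.

```ts
// Illustrative: turn retrieved memory into a system prompt. No weights
// change; the model "remembers" because the context is re-injected.
function buildSystemPrompt(summaries: string[], facts: string[]): string {
  const parts = ["You are the assistant on this portfolio site."];
  if (facts.length > 0) {
    parts.push("Remembered facts:\n" + facts.map((f) => `- ${f}`).join("\n"));
  }
  if (summaries.length > 0) {
    parts.push("Summaries of earlier conversations:\n" + summaries.join("\n\n"));
  }
  return parts.join("\n\n");
}
```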
Architecture
Browser (cookie)
→ Next.js /api/chat
→ Session lookup (SQLite)
→ Memory fetch (SQLite)
→ LM Studio (local LLM)
→ Stream response to the browser
→ Persist messages (SQLite)
→ Summarize / extract "remember" facts (LM Studio)
→ Langfuse trace (optional)
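Put together, the /api/chat route handler roughly follows this shape (a simplified sketch, not the app's real code; memory fetch, persistence, summarization, and tracing are elided to comments):

```ts
// Simplified skeleton of app/api/chat/route.ts; see the sections above
// for what each step does.
export async function POST(req: Request): Promise<Response> {
  const { message } = await req.json();
  // Session lookup: resolve the anonymous cookie to a session ID.
  // Memory fetch (SQLite): load summaries + facts, build the system prompt.
  const systemPrompt = "You are the assistant on this portfolio site."; // + injected memory

  // LM Studio (local LLM): forward a streaming completion request.
  const upstream = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen/qwen3-4b-2507",
      stream: true,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: message },
      ],
    }),
  });

  // Stream tokens back to the browser; persist messages, summarize, and
  // (optionally) record the Langfuse trace after the stream completes.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```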
Questions? Connect on LinkedIn