Under the hood
A deep dive into the graph database, AI pipelines, three-tier storage model, and the design decisions that make Noemata work.
Stack
Data model
Content flows through three tiers based on size and retrieval needs. Short content lives inline on graph nodes. Longer content is stored as blobs. Large content is chunked for semantic retrieval.
Node metadata, relationships, vector embeddings, and short content under 100 words. Every node carries its type, timestamps, and properties. Relationships are first-class citizens with 16 semantic types.
Full note bodies, PDFs, images, and attachments for content over 100 words. Stored as markdown files keyed by node type and ID. Referenced via content_uri on the graph node.
Content over 300 words is split into semantic chunks with overlap. Each chunk gets an embedding vector via text-embedding-3-small. Chunks link back to parents via CHUNK_OF and NEXT_CHUNK relationships.
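The tier routing described above can be sketched as a small function (the names `tiersFor` and `Tier` are illustrative, not from the codebase):

```typescript
type Tier = "inline" | "blob" | "chunks"

// Decide which storage tiers apply, based on word count alone.
function tiersFor(wordCount: number): Tier[] {
  const tiers: Tier[] = []
  // Short content lives inline on the graph node
  if (wordCount <= 100) tiers.push("inline")
  // Over 100 words: body goes to blob storage, node keeps a content_uri
  if (wordCount > 100) tiers.push("blob")
  // Over 300 words: additionally chunked for semantic retrieval
  if (wordCount > 300) tiers.push("chunks")
  return tiers
}
```

Note the tiers are cumulative past 300 words: a long note is both a blob and a set of chunks.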
POST /api/ingest → word count check
> 100 words → upload to R2, store content_uri on node
> 300 words → chunk into ~200-token segments with overlap
→ create Chunk nodes with CHUNK_OF relationships
→ generate embeddings (queued)
Graph schema

Every piece of knowledge is a typed node. Every connection is a semantic relationship. The graph is the source of truth.
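The node and relationship shapes implied above might look like this in TypeScript. This is an illustrative sketch: the page names only three of the 16 relationship types, so the union is truncated, and the field names are assumptions.

```typescript
// Only the relationship types named on this page; the real schema has 16.
type RelType = "CHUNK_OF" | "NEXT_CHUNK" | "PULSE_FOR"

interface GraphNode {
  id: string
  type: string          // e.g. "Note", "Project", "Document", "Chunk"
  userId: string        // every node is scoped to its owner
  createdAt: string
  updatedAt: string
  content?: string      // inline content, under 100 words
  content_uri?: string  // blob reference for longer content
  embedding?: number[]  // vector embedding for semantic search
}

// Relationships are first-class: typed edges between two node ids.
interface Relationship {
  type: RelType
  from: string
  to: string
}
```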
Intelligence layer
Claude powers both the ingestion pipeline (auto-linking on save) and the retrieval pipeline (GraphRAG for questions).
Runs on every note save
POST /api/ai/pipeline { nodeId, content }
→ extractEntities(content) // Claude: people, projects, concepts, tags
→ searchNodesFallback(term, user) // Neo4j: fuzzy match against graph
→ suggestRelationships(entities) // Claude: confidence-scored link proposals
→ confidence ≥ 0.85 ? auto-commit : surface for review
Runs on every question in the chat sidebar
POST /api/ai/ask { question, pinnedNodeIds, contextNodeIds, history }
→ load pinned + context nodes (highest priority)
→ extractEntities(question) // what is the user asking about?
→ searchNodesFallback() × 5 terms // find matching nodes
→ getNeighbors(nodeId, depth=2) // traverse the graph
→ collect Chunk children for matched nodes
→ prioritize: direct matches → neighbors → chunks (8K token budget)
→ askWithContext(question, contextPackage) // Claude synthesizes answer
→ extract [citations] from response // link back to source nodes
Awareness engine
AI-synthesized snapshots of project state and overall focus. Pulse queries the graph for recent activity, task distributions, and upcoming deadlines — then asks Claude to distill it into actionable awareness.
Generated per-project by querying related nodes from the last 7 days, task status distributions, and recent activity. Stored as a Pulse node linked via PULSE_FOR.
MATCH (p:Project)-[r]-(n)
WHERE n.updatedAt >= $sevenDaysAgo
RETURN n, type(r), labels(n)[0]
ORDER BY n.updatedAt DESC LIMIT 20
Synthesizes all project pulses, recent cross-project activity, and upcoming deadlines into a holistic view. Returns top-of-mind items, priorities, and open threads.
Output shape:
{
top_of_mind: string[] // 2-4 items
priorities: string[] // 3-5 actionable items
open_threads: string[] // 0-5 unresolved items
}
API surface
Document pipeline
Documents flow through the same ingestion pipeline as notes. Upload a PDF or DOCX, or import from Google Drive — AI extracts entities and connects them to your graph.
POST /api/documents → create Document node (status: pending)
→ upload original to R2 (documents/{nodeId}.{ext})
→ fire-and-forget: /api/documents/:id/process
→ parse → summarize → chunk → Stage 1 pipeline → embeddings
→ update status: ready
Patterns
Every Cypher query includes WHERE n.userId = $userId. Every API route calls requireUserId() before touching the graph. No data crosses user boundaries.
The Neo4j driver is a module-scoped singleton — survives across warm Lambda invocations, reconnects on cold starts. Max pool size of 5, with 10s connection acquisition timeout.
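The pattern can be sketched generically. The `Driver` interface below stands in for the real neo4j-driver instance; the factory-injection shape is for illustration only.

```typescript
interface Driver { close(): Promise<void> }

// Module scope survives warm Lambda invocations; a cold start
// re-runs the module and re-creates the driver.
let driver: Driver | null = null

function getDriver(create: () => Driver): Driver {
  if (!driver) driver = create()  // lazy init on first use
  return driver
}
```

The payoff is that repeated requests on a warm instance reuse one connection pool instead of opening a new one per invocation.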
All queries use parameterized Cypher via a helper module. No string interpolation. No template literals in queries. Every value goes through $params.
Content is split on paragraph boundaries, then sentence boundaries for oversized paragraphs. Short segments merge up to ~200 tokens. 20-token overlap between consecutive chunks for context continuity.
GraphRAG caps context at 8,000 tokens. Direct node matches get priority, then 1-hop neighbors, then chunks. Budget is tracked and stops adding context when exhausted.
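Greedy packing under a fixed token budget might look like this (illustrative names, not the actual implementation):

```typescript
interface ContextItem { text: string; tokens: number }

// Pack context in priority order: direct matches, then 1-hop
// neighbors, then chunks — stopping once the budget is exhausted.
function packContext(
  matches: ContextItem[],
  neighbors: ContextItem[],
  chunks: ContextItem[],
  budget = 8000,
): ContextItem[] {
  const packed: ContextItem[] = []
  let used = 0
  for (const item of [...matches, ...neighbors, ...chunks]) {
    if (used + item.tokens > budget) break  // budget exhausted
    packed.push(item)
    used += item.tokens
  }
  return packed
}
```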
AI relationship suggestions above 85% confidence are auto-committed. Below that threshold, they surface in the ReviewModal for human approval. All auto-actions are revertible.
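The gate reduces to a simple triage — a sketch in which the `Suggestion` shape is assumed:

```typescript
interface Suggestion { from: string; to: string; type: string; confidence: number }

// Suggestions at or above the threshold auto-commit;
// the rest queue for human approval in the ReviewModal.
function triage(suggestions: Suggestion[], threshold = 0.85) {
  const autoCommit: Suggestion[] = []
  const review: Suggestion[] = []
  for (const s of suggestions) {
    (s.confidence >= threshold ? autoCommit : review).push(s)
  }
  return { autoCommit, review }
}
```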
Security
Your knowledge graph contains your most personal thoughts. Every layer of the stack is designed to keep them safe.
Both Anthropic's and OpenAI's API terms state that API inputs are not used to train their models.
Upstash Redis-backed sliding window rate limits — 20 req/min on AI routes, 100 req/min on CRUD.
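A single-process sketch of the sliding-window check — the real limiter keeps the window in Upstash Redis so it holds across serverless instances:

```typescript
// Timestamps of recent requests, keyed per user/route.
const hits = new Map<string, number[]>()

function allow(key: string, limit: number, windowMs: number, now = Date.now()): boolean {
  // Keep only timestamps still inside the sliding window
  const recent = (hits.get(key) ?? []).filter(t => now - t < windowMs)
  if (recent.length >= limit) {
    hits.set(key, recent)
    return false  // over the limit for this window
  }
  recent.push(now)
  hits.set(key, recent)
  return true
}
```

With `limit = 20` and `windowMs = 60_000` this mirrors the 20 req/min cap on AI routes.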
Every POST/PUT endpoint validates request bodies with Zod schemas. Malformed payloads are rejected before touching the database.
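As an illustration of the reject-before-database rule, here is a hand-rolled stand-in for one such check (the real code uses Zod schemas; the `{ nodeId, content }` body shape is taken from the pipeline route above):

```typescript
interface PipelineBody { nodeId: string; content: string }

// Throw before any query runs if the payload is malformed.
function parsePipelineBody(body: unknown): PipelineBody {
  if (
    typeof body !== "object" || body === null ||
    typeof (body as Record<string, unknown>).nodeId !== "string" ||
    typeof (body as Record<string, unknown>).content !== "string"
  ) {
    throw new Error("400: malformed payload")
  }
  const { nodeId, content } = body as PipelineBody
  return { nodeId, content }
}
```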
CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, and Permissions-Policy on every response.
SOC 2 Type II certified auth provider. Middleware enforces auth at the edge before routes execute.
All data encrypted at rest (Neo4j AuraDB, Cloudflare R2) and in transit (TLS 1.3).
Algorithm spotlight
The chunker balances semantic coherence with retrieval granularity. Paragraphs are the primary boundary, with sentence-level splitting as fallback for oversized blocks.
// Rough heuristic: ~4 characters per token, so 20 tokens ≈ 80 characters
const estimateTokens = (text: string) => Math.ceil(text.length / 4)

// Sentences split on [.!?] followed by whitespace
const splitSentences = (para: string) => para.split(/(?<=[.!?])\s+/).filter(Boolean)

function chunkText(text: string, maxTokens = 200, overlapChars = 80): string[] {
  // Step 1: Split on paragraph boundaries (double newlines)
  const paragraphs = text.split(/\n\s*\n/).filter(p => p.trim())
  // Step 2: Break oversized paragraphs into sentences
  const segments: string[] = []
  for (const para of paragraphs) {
    if (estimateTokens(para) > maxTokens) segments.push(...splitSentences(para))
    else segments.push(para)
  }
  // Step 3: Merge small segments until ~200 tokens — keeps related content together
  const chunks: string[] = []
  let current = ''
  for (const seg of segments) {
    if (current && estimateTokens(current + ' ' + seg) > maxTokens) {
      chunks.push(current)
      // Step 4: Prepend the previous chunk's ~20-token tail to the next
      // chunk's head — maintains context continuity for retrieval
      current = current.slice(-overlapChars) + ' ' + seg
    } else {
      current = current ? current + ' ' + seg : seg
    }
  }
  if (current) chunks.push(current)
  return chunks
}