How It Works - Architecture Overview

At the core of DevLingo is an intelligent “tiered lookup” system: starting with fast local lookups and escalating to AI reasoning as needed. This way, most lookups complete in milliseconds, while even complex ones take no more than 2 seconds.

User selects text in any app
[Trigger] Press ⌘⇧D hotkey
[Extract] TextExtractor captures:
• The selected text itself
• 50 characters of surrounding context (so the AI can disambiguate the meaning)
• Current app's bundleIdentifier (Xcode / Slack / GitHub, etc.)
[Classify] InputTypeDetector determines input type:
• Chinese → Express mode
• Single word → Word mode
• 2-4 words with no sentence structure → Phrase mode
• Complete sentence ≤20 words → Sentence mode
• Multiple sentences or >20 words → Paragraph mode
[Lookup] LookupCoordinator performs tiered lookup (details below)
[Render] FloatingPanelController:
• Pops up an NSPanel floating window below the cursor
• Displays a skeleton screen (loading animation)
• Selects the appropriate View subclass based on mode (WordView / PhraseView / SentenceView, etc.)
• When data arrives, smoothly fades in the results
[Interact] User can:
• Click the play button to hear pronunciation
• Click a word for a recursive lookup
• Save to "Word Book" (SwiftData)
• Close the floating window and continue working

The tiered lookup is the key to DevLingo’s performance. Tiers are tried in priority order, and the first one that produces a result returns immediately:

Tier 1: Local Technical Vocabulary (<50ms)

For English input, the local SQLite database is queried first. It contains 85+ pre-loaded common development terms:

idempotent, deployment, microservice, containerization,
latency, throughput, cache invalidation, API gateway,
circuit breaker, distributed tracing, ...

This tier hits roughly 30% of developer vocabulary queries and responds in under 50ms.
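A minimal sketch of what a Tier 1 query could look like, written in Python for brevity. The table name `terms`, its columns, and the case-insensitive matching are assumptions for illustration, and the definitions are placeholders; the source does not describe the real schema.

```python
import sqlite3


def open_term_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the bundled term database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per term, compared case-insensitively.
    conn.execute(
        "CREATE TABLE terms ("
        "term TEXT PRIMARY KEY COLLATE NOCASE, "
        "definition TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO terms (term, definition) VALUES (?, ?)",
        [
            ("idempotent", "placeholder definition"),
            ("latency", "placeholder definition"),
        ],
    )
    return conn


def lookup_term(conn: sqlite3.Connection, text: str):
    """Return the stored definition, or None so the caller can try the next tier."""
    row = conn.execute(
        "SELECT definition FROM terms WHERE term = ?", (text.strip(),)
    ).fetchone()
    return row[0] if row else None


conn = open_term_db()
hit = lookup_term(conn, " Idempotent ")   # found despite case and whitespace
miss = lookup_term(conn, "yak shaving")   # None: falls through to the next tier
```

Returning `None` on a miss, rather than raising, keeps the fall-through to the next tier cheap.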

Tier 2: Local Development Phrase Database (<100ms)

For 2-4 word phrases, the local phrase database is queried (50+ pre-loaded phrases):

yak shaving, bikeshedding, rubber ducking, code smell,
low-hanging fruit, technical debt, nerd sniping, ...

Hits roughly 20% of phrase queries.

Tier 3 is the user's own lookup history: words, phrases, and sentences the user has previously looked up are cached locally in SwiftData. On a hit, the cached result is returned immediately.

User looked up "idempotent" → next lookup <10ms

Hit rate is roughly 50% (depending on usage history).

Tier 4 is the fallback: if the first three tiers all miss, the Claude API is called for a structured response.

Request: {
  "text": "gracefully degrade",
  "mode": "phrase",
  "context": "when upstream dependencies are unavailable",
  "sourceApp": "Xcode",
  "userLanguage": "zh-CN"
}
Response: {
  "type": "phrase", // compound verb
  "definition_en": "...",
  "definition_zh": "...",
  "examples": [...],
  "pronunciation": {...},
  "register": "technical",
  "l1_tips": "..."
}
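The four tiers can be sketched as an ordered chain that stops at the first hit. This is an illustrative Python sketch, not LookupCoordinator itself: `tiered_lookup`, `call_remote`, and the dictionary stand-ins are hypothetical names, and writing the remote answer back into the user cache is an assumption based on the Tier 3 description.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[dict]]


def tiered_lookup(text: str, tiers: list[tuple[str, Lookup]]) -> Optional[dict]:
    """Try each tier in order and return the first hit, tagged with its tier name."""
    for name, lookup in tiers:
        result = lookup(text)
        if result is not None:
            return {"tier": name, **result}
    return None


# Stand-ins for the local databases and the user cache (contents are placeholders).
term_db = {"idempotent": {"definition_en": "..."}}
phrase_db = {"yak shaving": {"definition_en": "..."}}
user_cache: dict[str, dict] = {}


def call_remote(text: str) -> dict:
    """Placeholder for the remote AI request; stores its answer in the user cache."""
    result = {"definition_en": "..."}
    user_cache[text] = result
    return result


tiers = [
    ("terms", term_db.get),
    ("phrases", phrase_db.get),
    ("cache", user_cache.get),
    ("api", call_remote),
]

first = tiered_lookup("gracefully degrade", tiers)   # misses locally, goes remote
second = tiered_lookup("gracefully degrade", tiers)  # now served from the cache
```

Because each tier is a plain function, one can be added or reordered without touching the loop.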

:::note Why the Tiered Design

  • 95% of common vocabulary hits the first three tiers, returning in <100ms
  • The remaining 5% (rare or new terms) go to the Claude API, but still take only 1-2 seconds
  • Without a network, the local database and cache are still available
  • Saves user quota: Claude API pricing is usage-based, so the tiered system reduces API calls by 95%

:::
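As a rough worked example, the figures in the note give an upper bound on average lookup time. It assumes the stated numbers are worst cases: 95% of lookups at 100ms and 5% at 2 seconds. The function name is hypothetical.

```python
def mean_latency_bound_ms(local_pct: int, local_ms: int, api_ms: int) -> float:
    """Weighted average of the two latency ceilings, with shares given in percent."""
    api_pct = 100 - local_pct
    return (local_pct * local_ms + api_pct * api_ms) / 100


# 95% of lookups at <=100ms, 5% at <=2000ms
bound = mean_latency_bound_ms(95, 100, 2000)  # (9500 + 10000) / 100 = 195.0
```

Under these assumptions the average stays below 200ms, even though one lookup in twenty waits on the network.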

The DevLingo backend is deployed on Cloudflare Workers edge nodes to handle API requests.

Mac App
↓ HTTPS (Bearer token)
Cloudflare Workers (Edge)
├─ API Gateway (Hono router)
├─ Auth Middleware (JWT verification)
├─ Lookup Endpoint (/api/lookup)
│ └─ Claude API Client (proxies AI requests)
├─ TTS Endpoint (/api/tts)
│ └─ Google Cloud TTS (proxies pronunciation synthesis)
└─ Data Sync Endpoints
  └─ Cloudflare D1 (user database)

  • Hono Framework: Lightweight HTTP framework optimized for Cloudflare Workers
  • D1 Database: SQLite on Cloudflare, stores user Word Book and sync data
  • KV Store: Session tokens, rate limiting cache, API response cache
  • Claude API Proxy: The Mac app sends requests to Claude through Workers, avoiding local API key exposure
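The diagram mentions a Bearer token and JWT verification but not the signing algorithm. The sketch below shows what such a check involves, assuming HS256 with a shared secret. That assumption is not confirmed by the source, and the code is Python rather than the Worker's own runtime. `sign` and `verify_bearer` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign(claims: dict, secret: bytes) -> str:
    """Build an HS256 token (used here only to produce test input)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"


def verify_bearer(authorization: str, secret: bytes, now: float):
    """Return the claims for a valid, unexpired token, otherwise None."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer":
        return None
    try:
        header, body, sig = token.split(".")
        expected = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not hmac.compare_digest(expected, b64url_decode(sig)):
            return None
        if json.loads(b64url_decode(header)).get("alg") != "HS256":
            return None
        claims = json.loads(b64url_decode(body))
    except (ValueError, AttributeError):
        return None
    if not isinstance(claims, dict) or claims.get("exp", 0) <= now:
        return None
    return claims


token = sign({"sub": "user-1", "exp": 2000}, b"secret")
claims = verify_bearer("Bearer " + token, b"secret", now=1000)
```

Rejected requests all return `None`, so middleware built on this can answer with a single 401 path.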

TextExtractor uses AXUIElement (the macOS Accessibility API) to capture the selection together with its surrounding context:

User selects "idempotent" in Slack
Extracted result:
{
  "selectedText": "idempotent",
  "beforeContext": "We need to make sure this endpoint is ",
  "afterContext": " for retry requests.",
  "sourceApp": "com.tinyspeck.slackmacgap", // Slack
  "fullSentence": "We need to make sure this endpoint is idempotent for retry requests."
}

This context is sent to Claude so the AI can tell that “idempotent” here refers to an API design concept rather than, for example, its mathematical sense.
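Once the text and the selection range are known, the context window reduces to a pair of bounded slices. The Python sketch below illustrates only that step; reading the text through AXUIElement is not shown, and `extract_context` is a hypothetical helper.

```python
def extract_context(full_text: str, start: int, end: int, window: int = 50) -> dict:
    """Slice the selection plus up to `window` characters on each side."""
    if not 0 <= start <= end <= len(full_text):
        raise ValueError("selection range is outside the text")
    return {
        "selectedText": full_text[start:end],
        # max() keeps the slice from wrapping around when the selection is near the start.
        "beforeContext": full_text[max(0, start - window):start],
        "afterContext": full_text[end:end + window],
    }


sentence = "We need to make sure this endpoint is idempotent for retry requests."
start = sentence.index("idempotent")
ctx = extract_context(sentence, start, start + len("idempotent"))
```

For the Slack example, both sides are shorter than 50 characters, so the whole sentence is captured.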

if text contains Chinese:
    → Express mode
elif word count == 1:
    → Word mode
elif word count in [2, 3, 4]:
    if contains full sentence structure:
        → Sentence mode
    else:
        → Phrase mode
elif sentence count > 1 or word count > 20:
    → Paragraph mode
else:
    → Sentence mode

:::tip Smart Detection

Input detection happens locally and instantly, with no API call. This ensures the user never notices any delay from the “determine input type” step.

:::
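The decision rules above translate directly into a small local function. The Python sketch below follows the same branch order. The CJK range check and the "capital letter plus terminal punctuation" test for sentence structure are assumed heuristics, since the source does not say how InputTypeDetector makes those two checks.

```python
import re

CJK = re.compile(r"[\u4e00-\u9fff]")             # common Chinese characters
SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")    # naive sentence boundary


def looks_like_sentence(text: str) -> bool:
    """Assumed heuristic: starts with a capital and ends with . ! or ?"""
    return text[0].isupper() and text[-1] in ".!?"


def detect_mode(text: str) -> str:
    text = text.strip()
    if not text:
        raise ValueError("empty selection")
    if CJK.search(text):
        return "express"
    words = text.split()
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    if len(words) == 1:
        return "word"
    if 2 <= len(words) <= 4:
        return "sentence" if looks_like_sentence(text) else "phrase"
    if len(sentences) > 1 or len(words) > 20:
        return "paragraph"
    return "sentence"


modes = [detect_mode(t) for t in ("idempotent", "yak shaving", "It works now.")]
```

Every branch is a string operation on the selection, so classification needs no network access.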

FloatingPanelController creates an NSPanel (floating window):

Properties:
• Appears above all apps (Level: screenSaver)
• Position: directly below the cursor (y offset +20px to avoid obstruction)
• Initial state: skeleton screen (loading skeleton)
• Data arrival: fade in, destroy skeleton
• Interaction: does not take focus from the app underneath, so background apps are not blocked; the panel's own controls (play button, word links) stay clickable
• Close methods: click outside, press ESC, auto-close after 5 minutes of inactivity

Performance:
  • Local database lookup: <50ms (99th percentile)
  • Cache hit: <10ms
  • Full end-to-end (including API): <2s (99th percentile)
  • Floating window appearance: <300ms (perceived as fast by users)
  • TTS audio generation: first request <2s, subsequent cached playback <100ms

:::note Caching Strategy

TTS audio is also cached locally and in KV; pronunciation for the same word is only synthesized once.

:::
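The "synthesize once" behaviour can be sketched as a two-level cache: check the local copy, then the shared store, and synthesize only when both miss. In the Python sketch below both levels live in one process for simplicity, whereas in DevLingo the local cache is on the Mac and KV is on the edge. The class name and the lowercase key normalization are assumptions.

```python
class TtsCache:
    """Two-level cache: local first, then a shared store, then synthesis."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self.local: dict[str, bytes] = {}
        self.shared: dict[str, bytes] = {}  # stand-in for the KV store

    def audio_for(self, word: str) -> bytes:
        key = word.strip().lower()  # assumed normalization
        if key in self.local:
            return self.local[key]
        audio = self.shared.get(key)
        if audio is None:
            # Both caches missed: synthesize once and publish to the shared store.
            audio = self._synthesize(key)
            self.shared[key] = audio
        self.local[key] = audio
        return audio


calls = []

def fake_synthesize(word: str) -> bytes:
    """Placeholder for the real TTS request; records each call."""
    calls.append(word)
    return b"audio:" + word.encode()


cache = TtsCache(fake_synthesize)
cache.audio_for("idempotent")
cache.audio_for("Idempotent")  # served locally, no second synthesis
```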

User text
Mac App (HTTPS encrypted)
Cloudflare Edge (instant processing)
├─ Claude API request (ephemeral, not used for training)
└─ Result cached in KV (auto-expires)
Result returned to Mac
Local SwiftData storage (on-device, optional cloud sync to D1)

Apart from the local lookup cache and the auto-expiring KV cache, text content is not stored unless the user explicitly saves it to the Word Book. See Data Privacy & Security for details.

DevLingo’s architecture philosophy: “Speed first, AI as a complement, privacy above all.”