AI Prompt Engineer
San Francisco, CA (On-Site M-F)
Our client is an early-stage, AI-native technology company building AI-powered call center and scheduling agents.
About the Role
As an AI Prompt Engineer, you will own critical behavioral slices of production voice agents, shaping both shared and customer-specific behaviors across thousands of calls. You will design prompts, sub-agent architectures, and evaluation harnesses to iteratively improve automation, booking, and resolution rates using real call data.
Responsibilities
- Write, maintain, and version-control production prompts for intent classification, information extraction, scheduling and availability negotiation, verification flows, objection handling, and edge-case recovery across all customer deployments.
- Review failed or low-performing calls daily, identify root causes, and ship targeted prompt or configuration updates multiple times per week to measurably improve automation and booking metrics.
- Design and manage sub-agent architectures (e.g., routing, specialist agents, fallback handlers) that support complex multi-turn healthcare workflows while maintaining latency and reliability requirements.
- Build and maintain offline evaluation harnesses, including curated eval sets, automated prompt optimization workflows (e.g., GEPA-style approaches), and regression test suites that let changes ship safely.
- Collaborate on human-in-the-loop onboarding flows that translate practice-specific intake forms, scheduling rules, and quirks into robust agent configurations, and define customer-specific evaluation metrics.
- Simulate real-world caller scenarios, monitor live production performance dashboards, detect drift or degradation early, and coordinate fixes with engineering and operations.
- Partner closely with software engineers to integrate prompts and agents into the broader AI stack, ensuring clean interfaces, observability, and reliable deployments in a high-volume environment.
Qualifications
- 2+ years of experience with AI/ML, NLP, or prompt engineering in production, including hands-on work shipping prompts or agents that real users relied on.
- Demonstrated experience writing, testing, and iterating prompts for tasks such as classification, information extraction, scheduling, or conversational flows in high-stakes or operational contexts.
- Strong analytical, data-driven mindset, with comfort designing experiments, reading dashboards, and justifying changes with metrics (e.g., conversion or booking rate improvements).
- Excellent writing skills, including sensitivity to tone, register, and phrasing in spoken or TTS-delivered interactions.
- Comfort reading Python and working familiarity with TypeScript, with the ability to collaborate effectively using modern AI coding tools.
- On-site availability five days per week in the San Francisco Bay Area.
Preferred Skills
- Prior experience with voice AI, TTS, ASR, or telephony platforms and real-time conversational systems.
- Automated prompt optimization experience using frameworks or approaches such as DSPy, GEPA, or similar techniques.
- Experience building and maintaining evaluation suites, test harnesses, or CI pipelines for LLM-based agents.
- Academic or practical training in linguistics, philosophy, cognitive science, or related fields that inform language and conversation design.