Job Title: Prompt Engineer – Guardrails & Evals Specialist
Location: Hyderabad, India (onsite, 5 days a week)
Role Overview:
- We are looking for a Prompt Engineer – Guardrails & Evals Specialist to build and operate safe, reliable, and compliant LLM-powered systems. The role focuses on LLM guardrails, evaluation frameworks, and prompt robustness for enterprise-grade AI applications.
- You will collaborate closely with AI engineers, platform teams, product owners, and compliance stakeholders to ensure AI systems behave predictably and safely, and stay aligned with business and regulatory requirements.
Work Experience:
- 3–8+ years (AI/ML, Prompt Engineering, or Applied LLM Systems)
Key Responsibilities:
- Prompt Engineering & Control
- Design system, developer, and user prompts with strict instruction hierarchy
- Create deterministic, grounded, and policy-aligned prompts
- Protect against prompt injection, jailbreaks, and hallucinations
- Enforce structured outputs using JSON schema, regex, and validators
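To give a flavor of the structured-output work above, here is a minimal, illustrative sketch using only the Python standard library; the schema, field names, and ticket-ID format are hypothetical, and production systems would typically layer JSON Schema or Pydantic on top of checks like these:

```python
import json
import re

# Hypothetical schema for a model response: required keys and simple type checks.
SCHEMA = {
    "answer": str,
    "confidence": float,
    "ticket_id": str,  # must match e.g. "TCK-12345" (illustrative format)
}
TICKET_RE = re.compile(r"^TCK-\d{5}$")

def validate_output(raw: str) -> dict:
    """Parse an LLM response and enforce the expected structure, raising on violations."""
    data = json.loads(raw)  # rejects non-JSON output outright
    for key, expected_type in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing required field: {key}")
        if not isinstance(data[key], expected_type):
            raise TypeError(f"field {key!r} must be {expected_type.__name__}")
    if not TICKET_RE.match(data["ticket_id"]):
        raise ValueError("ticket_id fails format check")
    return data

# A well-formed response passes; anything else is rejected before reaching downstream systems.
ok = validate_output('{"answer": "Refund approved.", "confidence": 0.92, "ticket_id": "TCK-00417"}')
```

The point of validators like this is to fail closed: malformed model output never reaches downstream systems.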
- LLM Guardrails & Governance
- Define and implement AI guardrails for safety and compliance
- Handle content moderation (PII, PHI, toxicity, bias, restricted content)
- Design refusal, fallback, and escalation mechanisms
- Align AI behavior with enterprise governance and compliance standards
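One flavor of the moderation and refusal work above, sketched with standard-library regexes; the PII patterns and blocklist here are deliberately simplistic placeholders, not a production policy:

```python
import re

# Illustrative PII patterns; real moderation would use vetted classifiers/libraries.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
BLOCKLIST = ("build a weapon",)  # placeholder restricted-content check

def moderate(text: str) -> tuple[str, str]:
    """Return (action, text): refuse restricted content, redact PII, else pass through."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "refuse", "I can't help with that request."
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return "allow", text

action, cleaned = moderate("Contact me at jane.doe@example.com about the claim.")
```

In practice the "refuse" branch would route to the escalation and fallback mechanisms described above rather than returning a canned string.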
- LLM Evaluations (Evals)
- Build automated and human-in-the-loop eval pipelines
- Measure accuracy, relevance, faithfulness, instruction adherence, and safety
- Create golden datasets and benchmark suites
- Perform regression testing for prompts and model updates
- Conduct A/B testing for prompt optimization
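The eval-pipeline responsibilities above can be sketched in a few lines; the golden dataset, keyword-based scorer, and pass-rate threshold are all illustrative assumptions (real suites use richer graders), but the shape — golden set in, pass rate out, gated in CI — is the pattern:

```python
# Hypothetical golden dataset: question plus keywords a faithful answer must contain.
GOLDEN = [
    {"question": "What is the capital of France?", "must_contain": ["paris"]},
    {"question": "Who wrote Hamlet?", "must_contain": ["shakespeare"]},
]

def keyword_score(answer: str, must_contain: list[str]) -> bool:
    """Crude faithfulness check: every expected keyword appears in the answer."""
    lowered = answer.lower()
    return all(kw in lowered for kw in must_contain)

def run_eval(model_fn) -> float:
    """Score the model over the golden set; a CI gate would assert the returned pass rate."""
    passed = sum(keyword_score(model_fn(case["question"]), case["must_contain"])
                 for case in GOLDEN)
    return passed / len(GOLDEN)

# Stub standing in for a real LLM call, so the pipeline is runnable as-is.
def stub_model(question: str) -> str:
    return {"What is the capital of France?": "The capital is Paris.",
            "Who wrote Hamlet?": "Hamlet was written by Shakespeare."}[question]

pass_rate = run_eval(stub_model)
```

Regression testing then amounts to re-running `run_eval` against each prompt or model change and failing the build when the pass rate drops below a threshold.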
- RAG Safety & Quality
- Evaluate retrieval quality and context relevance
- Enforce grounding and citation standards
- Prevent cross-session data leakage
- Optimize chunking, ranking, and context window strategies
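As an illustration of the grounding and citation enforcement above, a small checker; the `[chunk:ID]` citation convention is a hypothetical one chosen for the sketch, not a standard:

```python
import re

# Hypothetical convention: the model cites retrieved chunks inline as [chunk:ID].
CITATION_RE = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")

def check_grounding(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return a list of grounding violations; an empty list means the answer passes."""
    cited = CITATION_RE.findall(answer)
    problems = []
    if not cited:
        problems.append("answer cites no retrieved context")
    for cid in cited:
        if cid not in retrieved_ids:
            problems.append(f"citation {cid} not in retrieved set (possible hallucination)")
    return problems

issues = check_grounding(
    "Premiums rose 4% in 2023 [chunk:doc7]. See also [chunk:doc99].",
    retrieved_ids={"doc7", "doc12"},
)
```

Checks like this catch answers that cite context the retriever never returned — a common symptom of hallucinated grounding.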
- Monitoring & Observability
- Define AI quality and safety metrics
- Analyze failures and model drift
- Implement continuous improvement loops
- Integrate evals into CI/CD pipelines
Required Skills & Qualifications:
Core Skills:
- Strong hands-on experience in prompt engineering for LLMs
- Deep expertise in LLM guardrails, safety, and governance
- Practical experience building LLM evaluation frameworks
- Proven ability to detect and mitigate hallucinations
- Strong analytical and debugging skills
Technical Skills:
- Python (primary language for evals and tooling)
- JSON Schema, Pydantic, structured output validation
- Experience with OpenAI, Anthropic, Gemini, or similar LLM platforms
- Knowledge of RAG pipelines and vector databases
- Git-based version control and experimentation workflows
Nice to Have:
- Experience in regulated domains (Healthcare, Insurance, Finance)
- Familiarity with AI risk and model governance frameworks
- Exposure to multi-agent systems or tool-calling architectures