Chaseit builds AI voice agents for debt collection, handling 25,000+ calls daily across 6+ countries and languages. You'll investigate agent performance, figure out why conversations fail, and fix what's broken.
What you'll do
Your main work is writing prompts and building evaluators. You'll refine how agents respond based on real call outcomes and client feedback, then build LLM-based systems to test whether those changes actually work at scale.
Specifically:
- Write and test prompts across languages and conversation scenarios
- Build AI evaluators that assess conversation quality automatically
- Create evaluation datasets and run systematic experiments
- Analyze failure patterns in real conversations and implement fixes
- Design quality checks that catch problems before they reach production calls
What you need
- 2-5 years working with LLMs (prompting, testing, building evals)
- Python or JavaScript skills for running tests and analyzing results
- Ability to work through messy problems systematically
- Clear written English for documenting what you find
If you've worked with conversational AI, voice systems, or multi-language models, even better.
Details
- Salary: €2,500-4,000/month (depending on experience)
- Equity: Early team members get stock options
- Locations: Vilnius (on-site), London (hybrid), or EEA remote for strong candidates
We're VC-backed and work with lenders across EMEA and the US. Small team, direct access to founders, real impact on product direction.