As a Prompt Engineer focused on Data Science and Quality Analysis, you'll design, test, and evaluate prompts for AI systems that interact with real-world restaurant data. You'll work cross-functionally to develop AI solutions that drive operational efficiency, improve data interpretation, and support smarter decision-making for restaurant operators.
Your work will directly shape how AI models perform in high-stakes, dynamic environments like order processing, reporting, support automation, and performance analysis.
Essential Job Functions:
- Prompt Design & Evaluation: Develop, test, and refine prompts for tasks such as text generation, question answering, data classification, and structured data extraction to optimize Voice AI performance.
- Data-Driven Analysis & Quality Measurement: Design evaluation frameworks and analyze prompt outputs using quantitative metrics, human-in-the-loop evaluation, and user feedback to identify improvement opportunities.
- Experimentation & Iteration: Conduct experiments to test prompt variations, measure their business and operational impact, and iterate to enhance accuracy, consistency, and safety.
- Regression Testing & Compliance: Build principled regression test suites using tools like LangFuse and Galileo to ensure prompts remain compliant and high-performing as models and use cases evolve.
- Collaboration Across Teams: Work closely with data science, product, legal, engineering, and operations teams to align prompt designs with business goals, operational workflows, and compliance requirements.
- Model Adaptation & Strategy: Develop prompts across multiple LLMs (GPT, LLaMA, Gemini, and Checkmate's fine-tuned models), understanding model differences to optimize outputs effectively.
- Team Leadership & Mentorship: Lead a team of analysts focused on prompt evaluation and data quality analysis, guiding prioritization, experimentation, and reporting. Collaborate with ops teams for seamless deployment and feedback loops.
- Research & Continuous Learning: Stay up to date on emerging prompting techniques, LLM behaviors, evaluation frameworks, and AI safety practices to keep Checkmate's AI solutions best-in-class.
Requirements:
- Strong analytical and data science skills, with hands-on experience in Python (pandas, NumPy, scikit-learn)
- Experience designing and conducting experiments and evaluations in applied AI or NLP contexts
- Proficiency in SQL and working with relational databases (e.g. MySQL, PostgreSQL, Oracle, MS SQL)
- Good understanding of data processing, quality measurement, and testing fundamentals
- Experience leading analyst or operations teams, with strong prioritization, mentorship, and collaboration skills
- Strong problem-solving mindset with a drive to explore, optimize, and automate workflows
- Excellent communication skills for presenting insights to technical and non-technical stakeholders
- Bachelor's degree in Data Science, Computer Science, Statistics, Engineering, or a related field
- Flexibility to work US hours until at least 6 pm ET, with a reliable remote work setup
Preferred Qualifications:
- Experience with LLM evaluation and prompt engineering workflows
- Familiarity with tools like LangFuse and Galileo for prompt evaluation and analysis
- Knowledge of cloud platforms (AWS, GCP, Azure) and data pipeline tools
- Familiarity with machine learning concepts and NLP workflows
- Master's or PhD in Data Science, Statistics, Computer Science, or a related field
Benefits:
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k)
- Life Insurance (Basic, Voluntary & AD&D)
- Flexible Paid Time Off
- Family Leave (Maternity, Paternity)
- Short Term & Long Term Disability
- Training & Development
- Work From Home
- Stock Option Plan