Role: AI Content & Prompt Evaluator
Remote (Italian/German/French)
Job Summary
We are seeking culturally and linguistically skilled professionals to evaluate and improve multilingual datasets for large language models (LLMs). The role includes reviewing translations, creating prompts, rating model responses, and identifying cultural nuances or biases.
Responsibilities
- Review and refine prompts and translations for accuracy and naturalness.
- Design and update rubrics with cultural/linguistic examples.
- Evaluate model responses and document issues such as cultural bias or insensitive outputs.
- Create prompts to test cultural awareness in LLMs.
- Provide clear justifications for evaluations.
Qualifications
- Native proficiency in the target language (Italian, German, or French), with strong cultural knowledge.
- Experience in LLM evaluation, content moderation, or linguistic QA preferred.
- Detail-oriented with strong language and cultural analysis skills.
- Comfortable using spreadsheets and evaluation templates.
- Bachelor’s degree in Humanities or a related field preferred.
Preferred
- Experience in prompt engineering or LLM testing.
- Familiarity with LLM tools (e.g., Gemini, ChatGPT).
- Ability to explain reasoning behind ratings or edits.
Job Type: Part-time
Pay: $17.00 - $19.00 per hour
Expected hours: 20 – 25 per week
Experience:
- AI: 1 year (Preferred)
- LLM: 1 year (Preferred)
Language:
- Italian (Preferred)
- French (Preferred)
- German (Preferred)
Work Location: Remote