Freelance AI Agent Evaluation Engineer
Mindrift · Jordan
Job description
About the role
Mindrift is looking for a freelance engineer to help build a dataset that evaluates AI coding agents on realistic developer tasks. You will design challenging tasks, write robust evaluation criteria, and work closely with AI models to ensure the tests accurately reflect real‑world development scenarios.
Key responsibilities
- Create virtual company environments with codebases, infrastructure, documentation, and development history.
- Design and calibrate tasks that start from intermediate project states, crafting prompts and fair evaluation metrics.
- Develop isolated workstation simulations (Linux, CLI tools, repositories, task trackers, messengers, and a web application codebase).
- Write comprehensive tests that accept correct solutions and reject incorrect ones without being overly strict or lenient.
- Iterate with AI agents on test quality by reviewing code, analyzing failures, and designing edge‑case scenarios.
- Incorporate feedback from expert QA reviewers to improve task quality.
Required profile
- Degree in Computer Science, Software Engineering, or a related field.
- 5+ years of software development experience, primarily with Python.
- Comfortable reading and reasoning about full‑stack code (frontend and backend).
- Strong English proficiency (minimum B2).
Required skills
- Python (FastAPI, pytest, async/await, subprocess, file operations)
- React, JavaScript, TypeScript
- Docker containers
- PostgreSQL, Kafka, Redis
- CI/CD pipelines, GitHub Actions
What we offer
- Project‑based freelance contract with flexible hours.
- Opportunity to work on cutting‑edge AI evaluation challenges.
- Collaboration with a team of AI and software experts.