Freelance AI Agent Evaluation Engineer
Mindrift · Jordan
Job description
About the role
Mindrift is looking for a freelance engineer to help build a dataset that evaluates AI coding agents on realistic developer tasks. You will design challenging tasks, write robust evaluation criteria, and work closely with AI models to ensure the tests accurately reflect real‑world development scenarios.
Key responsibilities
- Create virtual company environments with codebases, infrastructure, documentation, and development history.
- Design and calibrate tasks that start from intermediate states of a project, writing clear prompts and fair evaluation metrics for each.
- Develop isolated workstation simulations (Linux, CLI tools, repositories, task trackers, messengers, and a web application codebase).
- Write comprehensive tests that accept correct solutions and reject incorrect ones without being overly strict or lenient.
- Iterate with AI agents on test quality, reviewing code, analyzing failures, and designing edge‑case scenarios.
- Incorporate feedback from expert QA reviewers to improve task quality.
Required profile
- Degree in Computer Science, Software Engineering, or a related field.
- 5+ years of software development experience, primarily with Python.
- Comfortable reading and reasoning about full‑stack code (frontend and backend).
- Strong English proficiency (minimum B2).
Required skills
- Python (FastAPI, pytest, async/await, subprocess, file operations)
- React, JavaScript, TypeScript
- Docker containers
- PostgreSQL, Kafka, Redis
- CI/CD pipelines, GitHub Actions
What we offer
- Project‑based freelance contract with flexible hours.
- Opportunity to work on cutting‑edge AI evaluation challenges.
- Collaboration with a team of AI and software experts.
Published 9 hours ago
Expires 1 month from now