Senior AI Systems Engineer
The Role
We are looking for a Senior AI Systems Engineer to lead the architecture, deployment, evaluation, and governance of enterprise-grade autonomous AI agents for our clients.
This is not an exploratory Data Science or standard "prompt engineering" role. We need a hardened software engineer who operates at the intersection of Backend Engineering, Cloud Infrastructure, LLMOps, and AI Security. You will be responsible for taking experimental agentic workflows and turning them into highly available, compliant, and scalable production systems on Amazon Bedrock.
If you have the "production scars" from debugging non-deterministic AI workflows, writing defensive AWS Lambda functions for tool execution, and locking down AI architectures against enterprise security threats, we want you on our team.
What You Will Do
Agent Architecture & Software Engineering
- Design, build, and orchestrate complex autonomous AI agents using Amazon Bedrock AgentCore, Agents for Amazon Bedrock, Knowledge Bases, and Action Groups.
- Write robust, scalable backend services (AWS Lambda, API Gateway) in Python, TypeScript, or Go that connect foundation models to enterprise APIs, databases, and third-party systems.
- Enforce strict OpenAPI schemas and build deterministic system wrappers around non-deterministic foundation models.
Production Deployment & LLMOps
- Lead the transition of experimental AI prototypes into highly available, scalable production environments.
- Manage complex agent state, optimize for token latency and compute cost, and implement retry logic (e.g., exponential backoff) for API rate limits.
- Implement comprehensive observability, tracing, and logging for multi-step Chain-of-Thought (CoT) workloads using Amazon CloudWatch and specialized LLM telemetry tools.
Automated Evaluations & Telemetry
- Architect automated, continuous evaluation pipelines.
- Design deterministic and non-deterministic tests (e.g., LLM-as-a-judge) to quantitatively measure agent reasoning trajectories, tool-selection accuracy, context precision, and hallucination rates over time.
- Create "golden datasets" and robust regression testing frameworks to ensure model upgrades do not break existing agent capabilities.
AI Governance, Security & Compliance
- Architect secure, "defense-in-depth" boundaries for agent actions to meet strict enterprise compliance standards (SOC 2, HIPAA, etc.).
- Implement Amazon Bedrock Guardrails to filter malicious inputs, mask PII, and prevent prompt injection attacks.
- Decouple LLM reasoning from critical system authorization by enforcing strict Role-Based Access Control (RBAC) and the Principle of Least Privilege at the AWS IAM and API execution layers.
What You Bring (Required Qualifications)
- Engineering Experience: 5+ years of core backend software engineering or cloud architecture experience, with a proven track record of building and deploying LLM/GenAI applications in production.
- AWS Experience: Deep, hands-on operational experience with the AWS ecosystem, specifically Amazon Bedrock, Amazon Bedrock AgentCore, AWS Lambda, IAM, API Gateway, and serverless architectures.
- Production Deployments: You have moved LLM applications out of Jupyter notebooks/local environments and into production CI/CD pipelines. You know how to handle failure loops and LLM orchestration challenges.
- Evaluation Frameworks: Practical experience building custom evaluation pipelines or using existing frameworks (e.g., Ragas, TruLens, DeepEval, LangSmith) to measure and judge agent performance.
- Security Mindset: Strong understanding of AI TRiSM (Trust, Risk, and Security Management). You know that security cannot be handled by system prompts alone and must be enforced at the infrastructure level.
Bonus Points
- Active AWS Certifications (e.g., AWS Certified Machine Learning – Specialty, AWS Certified Solutions Architect – Professional, or AWS Certified AI Practitioner).
- Experience working in a consulting, professional services, or client-facing technical role.
- Familiarity with Infrastructure as Code (Terraform, AWS CDK) and vector database provisioning (Amazon OpenSearch Serverless, Pinecone, pgvector).
Why Join nClouds?
- Cutting-Edge Tech: Work on the bleeding edge of Generative AI with direct support, training, and partnership from AWS.
- Impact: Build mission-critical systems that solve real business problems for a diverse portfolio of exciting clients.
- Culture: Join a highly collaborative, remote-first team of top-tier cloud engineers who value continuous learning and innovation.
- Benefits: Competitive salary, comprehensive health/dental/vision benefits, 401(k) matching, flexible PTO, and paid certifications.