Stargo

Industry Research

AI Agent Data Pipeline Intern


job-boards.greenhouse.io · Staff · May 18, 2026 · 2 min read

XPeng Motors is hiring an intern to build LLM-assisted pipelines that turn unstructured experiment data into structured records an AI agent can search, trace, and analyze.

Executive Summary

Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results. Use LLM-based methods to clean noisy unstructured data, extract experiment-relevant information, and convert fragmented discussions into structured records. Design data schemas, metadata, and quality checks that make experiment context easier to search, trace, and use in downstream agent workflows. Support retrieval and indexing workflows, including semantic search or RAG-style pipelines, so the agent can access relevant experiment context.

Prepare curated datasets for agent evaluation and, where applicable, LLM fine-tuning or instruction-tuning. Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis. Evaluate whether the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights. Contribute to internal tools, dashboards, or reports that help teams monitor experiment status, outcomes, and trends.

Qualifications: strong skills in Python, SQL, and data processing; experience working with structured and unstructured data, including text-heavy sources such as documents, notes, messages, or logs; and familiarity with data pipelines, ETL workflows, or large-scale data processing. The posting also asks for interest in LLM development, LLM evaluation, agentic AI systems, RAG pipelines, semantic retrieval, prompt engineering, or LLM-assisted data processing, along with familiarity with machine learning workflows, model training, evaluation metrics, or MLOps concepts. Candidates should bring strong analytical thinking, close attention to data quality, consistency, and reliability, and comfort working with ambiguous data sources while collaborating with ML and platform engineers to clarify requirements. Previous experience building internal tools, automation scripts, or data quality checks is also listed.

The role comes with a fun, supportive and engaging environment…

Source: job-boards.greenhouse.io

Original Article: http://job-boards.greenhouse.io/xpengmotors/jobs/8548990002

