Senior Data Scientist, Copilot AI
Location: Mountain View
Posted on: June 23, 2025
Job Description:
As Microsoft continues to push the boundaries of AI, we are on
the lookout for passionate individuals to work with us on the most
interesting and challenging AI questions of our time. Our vision is
bold and broad: to build systems that have true artificial
intelligence across agents, applications, services, and
infrastructure. It’s also inclusive: we aim to make AI accessible
to all (consumers, businesses, developers) so that everyone can
realize its benefits.

Microsoft AI (MS AI) is seeking an experienced Data Scientist to
help build the next wave of capabilities for our personal AI,
Copilot. We’re looking for someone who thinks deeply about
measurement and human-AI interactions; in this role you will help
us describe and measure how people use Microsoft Copilot. We seek a
versatile data scientist who can architect solutions that stand the
test of time and who will bring an abundance of positive energy,
empathy, and kindness to the team every day, in addition to being
highly effective.

Microsoft’s mission is to empower every person and every
organization on the planet to achieve more. As employees we come
together with a growth mindset, innovate to empower others, and
collaborate to realize our shared goals. Each day we build on our
values of respect, integrity, and accountability to create a
culture of inclusion where everyone can thrive at work and beyond.

By applying to this U.S. Mountain View, CA or Redmond, WA position,
you are required to be local to the San Francisco area or Seattle
area and in office 3
days a week.

Responsibilities:
- Develop and improve evaluation methodologies for assessing model
  output quality, including the metrics and coverage of both machine
  and human evaluations.
- Design and implement scalable data pipelines to extract,
  transform, and structure product logs for evaluation use cases.
- Synthesize datasets for human or machine evaluation.
- Analyze and interpret results from A/B tests, offline benchmarks,
  and live experiments to drive actionable recommendations.
- Train ML classifiers to analyze and label user logs (e.g.,
  classify intent, detect quality issues) for evaluation.
- Draw insights from eval results, form recommendations, and drive
  eval experiments to find the best solutions.
- Work closely with product managers, engineers, and researchers to
  define evaluation criteria aligned with product goals and user
  value.
- Create and maintain dashboards and reporting tools to monitor eval
  performance and trends.
- Contribute to the development of custom metrics that go beyond
  standard benchmarks to capture product-specific nuances.
- Stay current on the latest LLM research on evaluation and
  prompting.
- Embody our Culture and Values.

Required Qualifications:
- Doctorate in Data Science, Mathematics,
  Statistics, Econometrics, Economics, Operations Research, Computer
  Science, or related field AND 1 year of data-science experience
  (e.g., managing structured and unstructured data, applying
  statistical techniques, and reporting results)
  OR Master's Degree in Data Science, Mathematics, Statistics,
  Econometrics, Economics, Operations Research, Computer Science, or
  related field AND 3 years of data-science experience (e.g.,
  managing structured and unstructured data, applying statistical
  techniques, and reporting results)
  OR Bachelor's Degree in Data Science, Mathematics, Statistics,
  Econometrics, Economics, Operations Research, Computer Science, or
  related field AND 5 years of data-science experience (e.g.,
  managing structured and unstructured data, applying statistical
  techniques, and reporting results)
  OR equivalent experience.
- 5 years of experience in data science, ML evaluation, or applied
  research.
- Working knowledge of LLM evaluation methods, including experience
  conducting both human evaluations and LLM-as-a-judge assessments.
- Experience using Python, SQL, and common data analysis libraries
  for data processing and analysis.
- Ability to analyze complex problems, communicate findings clearly,
  and translate insights into actionable steps.

Preferred Qualifications:
- Experience building or
  evaluating LLM applications in production.
- Product-driven thinking.
- Ability to work in a fast-paced environment, manage multiple
  priorities, and adapt to changing requirements and deadlines.

Data Science IC4: The typical base pay range for this role across
the U.S. is USD $119,800 - $234,700 per year. A different range
applies to specific work locations within the San Francisco Bay
area and New York City metropolitan area; the base pay range for
this role in those locations is USD $158,400 - $258,000 per year.