Senior Associate/Assistant Vice President, AI Data Engineer

Location: SG, 238891

Group: Corporate Group

Department: Technology

Section: Applications, Data & Digital

Job Type: Permanent

Req ID: 12086

Temasek is a global investment company headquartered in Singapore, with a net portfolio value of S$518 billion (US$401b, €350b, £304b, RMB2.77t) as at 31 March 2026. Our Purpose “So Every Generation Prospers” guides us to make a difference for today’s and future generations. We seek to build a resilient and forward-looking portfolio that will deliver good sustainable returns over the long term.

We have 13 offices in 9 countries around the world: Beijing, Hanoi, Mumbai, Shanghai, Shenzhen, and Singapore in Asia; and Brussels, London, Mexico City, New York, Paris, San Francisco, and Washington, DC outside Asia.

For more information on Temasek, please visit www.temasek.com.sg
For Temasek Review 2026, please visit www.temasekreview.com.sg
For Sustainability Report 2026, please visit www.temasek.com.sg/SR2026

Introduction

AI agents are only as good as the data they can reason over. Poorly structured, stale, or inconsistently governed data is the most common reason enterprise AI products fail to deliver value: the limiting factor is data readiness, not model capability. The AI Data Engineer at Temasek is responsible for building the data foundations that make Temasek's agentic AI systems trustworthy, accurate, and capable of reasoning over the complex, heterogeneous data environment of a global investment institution.

This role sits at the intersection of data engineering and AI systems engineering. You will design and build the data architectures, pipelines, and quality frameworks that allow AI agents to retrieve, reason over, and act on Temasek's investment data. You will work across structured investment data (portfolio positions, financial statements, market data), unstructured data (research reports, company filings, meeting notes, news), and real-time data streams, making all of it accessible, reliable, and AI-readable.

Responsibilities

Agent-ready data architecture

Design and build data architectures specifically optimised for AI agent consumption: structured data stores accessible via tool-calling APIs, vector knowledge bases for semantic retrieval, graph databases for relationship-based reasoning, and hybrid retrieval systems that combine keyword, semantic, and structured query approaches.
Build and maintain the data layer for Temasek's RAG (Retrieval-Augmented Generation) pipelines: document ingestion workflows, chunking strategy design, embedding generation and refresh pipelines, metadata tagging for filtered retrieval, and vector index management across multiple knowledge domains (company research, market intelligence, portfolio data, regulatory filings).
Design the ontology and schema standards for AI-accessible data assets — ensuring that data structures are consistent, well-documented, and interpretable by AI agents without requiring ad-hoc parsing or custom logic in every product that consumes them.
Architect real-time and near-real-time data feeds for AI agents that require current information: market data integration, news and research feed ingestion, portfolio event streaming, and alert-triggering data pipelines — with latency and freshness SLAs defined and enforced.
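For candidates unfamiliar with the term, the hybrid retrieval idea mentioned above can be illustrated with a minimal sketch. It assumes reciprocal rank fusion as the merging technique, which is one common option and is not stated in this posting as Temasek's approach; the function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword search and a vector (semantic) search.
keyword_hits = ["filing_2023", "research_note_7", "news_41"]
semantic_hits = ["research_note_7", "meeting_notes_3", "filing_2023"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents returned by both retrievers ("research_note_7" and "filing_2023" here) outrank documents returned by only one, which is the behaviour a hybrid system is usually after.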

Enterprise data quality and governance

Define and implement data quality standards for AI-consumed data assets: completeness checks, consistency validation, freshness monitoring, and anomaly detection — with automated quality gates that prevent degraded data from entering AI production systems.
Build and maintain comprehensive data lineage tracking across all AI data pipelines — enabling full traceability from raw source to AI agent consumption, supporting both operational debugging and regulatory audit requirements.
Partner with the AI Security & Governance Lead and the enterprise data governance function to ensure AI data assets comply with Temasek's data classification standards, access control requirements, and cross-border data handling rules — particularly for data used in China-facing AI workflows.
Design and operate data observability tooling: pipeline health monitoring, data drift detection, schema change alerting, and SLA dashboards that give AI product teams visibility into the data their systems depend on.
Conduct regular data quality reviews with AI product teams — identifying where data gaps or quality issues are limiting AI product performance, and prioritising data engineering investment to address the highest-impact gaps.
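As a rough illustration of the automated quality gates described above, the following is a minimal sketch that assumes a batch of records held as plain dictionaries with an "as_of" timestamp. The function name, field names, thresholds, and sample data are hypothetical; a production gate would more likely be built on a framework such as those named under Requirements.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def quality_gate(records, required_fields, max_age, now, min_completeness=0.95):
    """Return a GateResult; the batch passes only if every check passes."""
    if not records:
        return GateResult(False, ["batch is empty"])

    reasons = []

    # Completeness: share of records with every required field populated.
    complete = sum(
        all(r.get(f) is not None for f in required_fields) for r in records
    )
    completeness = complete / len(records)
    if completeness < min_completeness:
        reasons.append(
            f"completeness {completeness:.2f} below {min_completeness:.2f}"
        )

    # Freshness: the newest record must fall within the allowed age.
    timestamps = [r["as_of"] for r in records if r.get("as_of") is not None]
    if not timestamps or now - max(timestamps) > max_age:
        reasons.append("data is stale")

    return GateResult(not reasons, reasons)


# Hypothetical sample batches with a fixed reference time.
now = datetime(2026, 3, 31, 12, 0, tzinfo=timezone.utc)
fresh_batch = [
    {"ticker": "AAA", "price": 10.5, "as_of": now - timedelta(minutes=5)},
    {"ticker": "BBB", "price": 20.0, "as_of": now - timedelta(minutes=7)},
]
stale_batch = [
    {"ticker": "AAA", "price": None, "as_of": now - timedelta(days=2)},
]

fresh_result = quality_gate(fresh_batch, ["ticker", "price"], timedelta(hours=1), now)
stale_result = quality_gate(stale_batch, ["ticker", "price"], timedelta(hours=1), now)
print(fresh_result.passed, stale_result.passed, stale_result.reasons)
```

Returning the reasons alongside the pass/fail flag lets a pipeline both block the load and tell the owning team why it was blocked.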

Shared, reusable data platform for AI

Build reusable data assets and services that multiple AI products can depend on — including a shared investment knowledge graph, a company and market data API layer, a document intelligence pipeline for research and filing ingestion, and a portfolio analytics data service.
Maintain and evolve a data catalogue for AI-accessible data assets: documenting sources, schemas, freshness, quality metrics, access procedures, and known limitations — enabling AI product managers and engineers to make informed decisions about which data to use and trust.
Contribute to Temasek's enterprise data platform strategy from an AI-first perspective — advocating for data architecture decisions that serve AI consumption patterns, not just conventional BI and reporting use cases.
Engage with external data vendors and market data providers to evaluate, onboard, and maintain data sources that improve the quality and coverage of AI agent knowledge bases — conducting ongoing vendor data quality assessments and managing data licensing agreements in coordination with procurement.

Requirements

Experience and background

4–8 years of data engineering experience, with at least 2 years specifically focused on building data infrastructure for AI/ML or LLM-powered systems in production.
Demonstrated experience at a data-intensive organisation with complex, heterogeneous data environments — financial data, enterprise data platforms, or equivalent — ideally with exposure to investment data domains (company financials, market data, portfolio systems).
Hands-on experience building RAG pipelines or AI knowledge bases in production, including vector store management, embedding pipeline design, and chunking strategy optimisation.
Strong data engineering fundamentals: pipeline design and orchestration, schema design, data quality frameworks, and lineage tracking — with the rigour expected of a Palantir-calibre data engineering background.

Technical capabilities

Data pipeline and orchestration: Python (pandas, Polars, SQLAlchemy), dbt, Apache Airflow or Prefect, and Spark for large-scale processing; experience with both batch and streaming pipeline architectures (Kafka, Kinesis, or equivalent).
AI data stack: vector databases (Pinecone, Weaviate, pgvector, Chroma), embedding models and management, LlamaIndex or LangChain data connectors, document parsing and OCR tooling, and chunking strategy design for different document types.
Structured data and analytics: SQL proficiency across multiple dialects, experience with enterprise data warehouse platforms (Snowflake, BigQuery, Redshift, or Databricks), and familiarity with graph database concepts (Neo4j or equivalent).
Data quality and observability: experience with data quality frameworks (Great Expectations, Soda, or equivalent), data lineage tools (OpenLineage, DataHub, Marquez), and data observability platforms (Monte Carlo, Acceldata, or equivalent).