Senior Associate/Assistant Vice President, AI Data Engineer

Location: SG, 238891

Group: Corporate Group
Department: Technology
Section: Applications, Data & Digital
Job Type: Permanent
Req ID: 12086

Temasek is a global investment company headquartered in Singapore, with a net portfolio value of S$434 billion (US$324 billion, €299 billion, £250 billion, and RMB2.35 trillion) as at 31 March 2025. Marking our unlisted assets to market would provide S$35 billion of value uplift and bring our mark-to-market net portfolio value to S$469 billion.

Our Purpose “So Every Generation Prospers” guides us to make a difference for today’s and future generations. 

Operating on commercial principles, we seek to deliver sustainable returns over the long term. 

We have 13 offices in 9 countries around the world: Beijing, Hanoi, Mumbai, Shanghai, Shenzhen, and Singapore in Asia; and Brussels, London, Mexico City, New York, Paris, San Francisco, and Washington, DC outside Asia.  

For more information on Temasek, please visit www.temasek.com.sg.
For Temasek Review 2025, please visit www.temasekreview.com.sg.
For Sustainability Report 2025, please visit https://www.temasek.com.sg/content/dam/temasek-corporate/sustainability/2025/Temasek-Sustainability-Report-2025.pdf.

Introduction

AI agents are only as good as the data they can reason over. The most common reason enterprise AI products fail to deliver value is not model capability but data readiness: poorly structured, stale, or inconsistently governed data. The AI Data Engineer at Temasek is responsible for building the data foundations that make Temasek's agentic AI systems trustworthy, accurate, and capable of reasoning over the complex, heterogeneous data environment of a global investment institution.


This role sits at the intersection of data engineering and AI systems engineering. You will design and build the data architectures, pipelines, and quality frameworks that allow AI agents to retrieve, reason over, and act on Temasek's investment data. The work spans structured investment data (portfolio positions, financial statements, market data), unstructured data (research reports, company filings, meeting notes, news), and real-time data streams, with the aim of making all of it accessible, reliable, and AI-readable.

Responsibilities

Agent-ready data architecture

  • Design and build data architectures optimised for AI agent consumption, including structured stores exposed via APIs, vector databases for semantic retrieval, graph databases for relationship reasoning, and hybrid retrieval systems combining keyword, semantic, and structured queries.
  • Own the data layer for RAG pipelines: document ingestion workflows, chunking strategies, embedding generation and refresh, metadata tagging, and vector index management across domains (e.g., company research, market intelligence, portfolio data, regulatory filings).
  • Establish ontology and schema standards to ensure AI-accessible data is consistent, well-documented, and interpretable without custom parsing logic.
  • Architect real-time and near-real-time data feeds (e.g., market data, news, portfolio events), defining and enforcing latency and freshness SLAs.

Enterprise data quality and governance

  • Define and implement data quality standards (completeness, consistency, freshness, anomaly detection) with automated quality gates that prevent degraded data from entering AI systems.
  • Build end-to-end data lineage tracking across AI pipelines, enabling traceability from source to AI consumption for debugging and audit requirements.
  • Partner with AI Security & Governance and enterprise data teams to ensure compliance with data classification, access control, and cross-border handling requirements (including China-related workflows).
  • Design and operate data observability tooling covering pipeline health, data drift, schema changes, and SLA monitoring, giving product teams visibility into data reliability.
  • Run regular data quality reviews with AI product teams to identify gaps impacting performance and prioritise data engineering investments.

Shared, reusable data platform for AI

  • Develop reusable data assets and services supporting multiple AI products, including a shared investment knowledge graph, company/market data APIs, document intelligence pipelines, and portfolio analytics services.
  • Maintain a data catalogue documenting sources, schemas, freshness, quality metrics, access protocols, and limitations to enable informed data usage.
  • Contribute to enterprise data platform strategy with an AI-first perspective, ensuring architectures support AI consumption patterns beyond traditional BI/reporting needs.
  • Engage external data vendors to evaluate, onboard, and maintain high-quality data sources, including ongoing quality assessment and licensing management with procurement.

Requirements

Experience and background

  • 4–8 years of data engineering experience, with at least 2 years focused specifically on building data infrastructure for AI/ML or LLM-powered systems in production.
  • Demonstrated experience at a data-intensive organisation with complex, heterogeneous data environments — financial data, enterprise data platforms, or equivalent — ideally with exposure to investment data domains (company financials, market data, portfolio systems).
  • Hands-on experience building RAG pipelines or AI knowledge bases in production, including vector store management, embedding pipeline design, and chunking strategy optimisation.
  • Strong data engineering fundamentals: pipeline design and orchestration, schema design, data quality frameworks, and lineage tracking — with the rigour expected of a Palantir-calibre data engineering background.

Technical capabilities

  • Data pipeline and orchestration: Python (pandas, Polars, SQLAlchemy), dbt, Apache Airflow or Prefect, Spark for large-scale processing; experience with both batch and streaming pipeline architectures (Kafka, Kinesis, or equivalent).
  • AI data stack: vector databases (Pinecone, Weaviate, pgvector, Chroma), embedding models and management, LlamaIndex or LangChain data connectors, document parsing and OCR tooling, and chunking strategy design for different document types.
  • Structured data and analytics: SQL proficiency across multiple dialects, experience with enterprise data warehouse platforms (Snowflake, BigQuery, Redshift, or Databricks), and familiarity with graph database concepts (Neo4j or equivalent).
  • Data quality and observability: experience with data quality frameworks (Great Expectations, Soda, or equivalent), data lineage tools (OpenLineage, DataHub, Marquez), and data observability platforms (Monte Carlo, Acceldata, or equivalent).
