Technical personal site / selected systems portfolio

Akshay Sardana

I build production AI and data systems that make complex business workflows more reliable.

Principal Engineer working across conversational BI, data platforms, anomaly analysis, commerce data quality, and LLM workflow reliability. This site collects notes on systems work, engineering patterns, and representative projects with confidential details intentionally omitted.

Data scale: 10TB+ daily platform workloads
Platform reach: shared standards across multiple business units
Delivery: 100+ pipelines and data models in 6 months
Analytics lift: 60% reduction in analytics backlog

Technical Focus

Production AI and data systems

Recurring problem spaces from applied engineering work: governed analytics, durable data platforms, reliable LLM workflows, self-service reporting, and commerce data systems.

01

AI Analytics and Conversational BI

Natural-language analytics systems that answer business questions against governed warehouse data without bypassing metric definitions or SQL safety.

  • Question-to-SQL workflows with dry runs, retries, and answer verification (sketched after this list)
  • Schema retrieval grounded in metadata, business terms, and examples
  • Conversation memory and escalation paths for ambiguous questions
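
A minimal sketch of that dry-run loop, assuming the google-cloud-bigquery client; regenerate_sql is a hypothetical stand-in for the LLM repair step, and the scan budget is illustrative.

```python
from google.api_core.exceptions import BadRequest
from google.cloud import bigquery

client = bigquery.Client()

def validate_sql(sql: str, max_attempts: int = 3) -> str:
    """Dry-run candidate SQL, feeding validation errors back into regeneration."""
    for _ in range(max_attempts):
        try:
            job = client.query(
                sql,
                job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False),
            )
            # Dry runs return immediately; bytes scanned doubles as a cost gate.
            if job.total_bytes_processed > 100 * 1024**3:  # assumed 100 GB budget
                raise ValueError("query exceeds scan budget")
            return sql
        except (BadRequest, ValueError) as err:
            sql = regenerate_sql(sql, error=str(err))  # hypothetical LLM repair call
    raise RuntimeError("no valid SQL within the retry budget")
```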

02

Data Platform Strategy

Practical architecture and sequencing for fragmented pipelines, warehouse models, and ownership boundaries.

  • Architecture and sequencing across BigQuery, dbt, Airflow, streaming, and PySpark
  • Data quality, lineage, and model contracts for shared datasets
  • Operating model for roadmap, support, and stakeholder intake

03

LLM Workflow Reliability

Auditable AI workflows with deterministic checks, structured outputs, tool use, human review, and production observability.

  • Guardrails for confidence, safety, retries, and failure investigation (sketched after this list)
  • Evaluation harnesses that match the business workflow
  • Review gates for multimodal, classification, and agentic systems
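
A schematic version of that confidence-and-retry guardrail, deliberately vendor-neutral: call_model and queue_for_review are hypothetical hooks, and the threshold is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str
    confidence: float
    evidence: str

def classify_with_guardrails(item: dict, threshold: float = 0.85,
                             max_attempts: int = 2) -> Verdict:
    """Structured output -> confidence gate -> retry -> human escalation."""
    verdict = None
    for _ in range(max_attempts):
        verdict = call_model(item)       # hypothetical: returns a Verdict
        if verdict.confidence >= threshold:
            return verdict               # confident enough to auto-accept
    queue_for_review(item, verdict)      # hypothetical human-review queue
    return verdict
```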

04

Analytics Engineering and Self-Service

Semantic models, dashboard contracts, and enablement practices that let analysts and business teams answer repeat questions with less ad hoc support.

  • Reusable dbt models and metrics with tested definitions
  • Self-service paths that keep sensitive logic governed
  • Training and documentation for engineers, analysts, and operators

05

Commerce and Operational Data Systems

Commerce and operational reporting systems where inconsistent source signals, delayed updates, and reconciliation gaps make the data hard to trust.

  • Reconciliation tests between source systems and reporting contracts (sketched after this list)
  • Data models that balance explainability, operational ownership, and reporting trust
  • Workflow checks that make source-system drift visible before it compounds
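
A minimal sketch of that reconciliation test, assuming both systems expose daily totals; the names and tolerance are illustrative, not a production contract.

```python
TOLERANCE = 0.005  # assumed: 0.5% relative drift allowed before alerting

def reconcile(source_totals: dict[str, float],
              reporting_totals: dict[str, float]) -> list[str]:
    """Return the days where reporting drifts past tolerance from source."""
    breaches = []
    for day, src in source_totals.items():
        rep = reporting_totals.get(day, 0.0)
        drift = abs(src - rep) / src if src else abs(rep)
        if drift > TOLERANCE:
            breaches.append(f"{day}: source={src} reporting={rep} drift={drift:.2%}")
    return breaches
```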

Selected Work

Production systems across AI, data, and ML infrastructure

Selected examples are based on public career history and sanitized project descriptions. Company names provide employment context only; confidential implementation details are omitted.

AI Analytics

Conversational Analytics Agent

A governed natural-language analytics agent that turns business questions into validated BigQuery analysis across Ads, Editorial, Commerce, and operations workflows.

Context
At Hearst Magazines, business teams needed faster analytical answers from shared warehouse data without bypassing metric ownership, access expectations, or reviewable SQL.
Problem
Naive question-to-SQL was not enough: schema names were ambiguous, metric definitions lived across multiple layers, and users expected follow-up questions, not one-shot query generation.
Constraints
The workflow had to stay governed: no unsafe SQL execution, no unverified answers, and graceful handling of zero-result or ambiguous questions.
Architecture
Built a LangGraph workflow on Vertex AI/Gemini and BigQuery with schema metadata retrieval using embeddings, keyword search, business-term matching, and fuzzy search. Added dry-run validation, SQL safety checks, retries, error-driven tool use, zero-result investigation, and AI judge verification.
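
A compressed sketch of that hybrid retrieval scoring: one score per table from embedding similarity, business-term overlap, and fuzzy name matching. The weights and metadata fields are illustrative.

```python
from difflib import SequenceMatcher

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def score_table(question: str, q_vec: list[float], table: dict) -> float:
    """table = {"name": ..., "terms": [...], "vec": [...]} drawn from dbt metadata."""
    words = set(question.lower().split())
    sem = cosine(q_vec, table["vec"])                                    # embedding channel
    kw = len(words & set(table["terms"])) / max(len(table["terms"]), 1)  # business terms
    fuzzy = SequenceMatcher(None, question.lower(), table["name"]).ratio()
    return 0.6 * sem + 0.25 * kw + 0.15 * fuzzy
```
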
Role
Led the architecture and implementation path from prototype behavior to a governed internal workflow, including retrieval design, agent state, validation loops, and single-turn and multi-turn interaction patterns.
Outcome
Conceived and built the initial workflow, then matured it into a governed analytics agent used across Ads, Editorial, and Commerce while preserving source-of-truth data boundaries and answer traceability.
Demonstrates
Production LLM engineering, metadata-grounded retrieval, SQL safety, evaluation discipline, and the ability to make AI useful inside real analytics governance.

Focus

  • AI analytics
  • conversational BI
  • SQL safety
  • semantic retrieval
  • evals

Stack

  • LangGraph
  • Vertex AI/Gemini
  • BigQuery
  • embeddings
  • dbt metadata

Data Platform

Shared Data Platform and Self-Service Analytics

A shared analytics foundation for high-volume digital media and commerce data, built to reduce repeated requests and increase safe self-service.

Context
At Hearst Magazines, a large analytics environment processed 10TB+ of daily workloads, including 5TB+ of clickstream data, while many teams depended on repeated custom SQL and a small group of specialists.
Problem
Analytics demand was growing faster than the platform operating model. Definitions drifted, pipeline ownership was fragmented, and business users needed governed self-service instead of ad hoc ticket queues.
Constraints
The work had to improve reliability without stopping delivery: existing reporting could not break, teams had different skill levels, and source systems spanned batch, streaming, warehouse, and transformation layers.
Architecture
Led platform architecture across BigQuery, Airflow, dbt, Kinesis, and PySpark. Established modeling standards, semantic-layer patterns, data quality checks, ownership practices, and reusable datasets for common analytical paths.
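
A skeletal Airflow DAG showing the ordering convention behind those standards: transform, then test, then publish shared datasets. The DAG id, selectors, and publish step are placeholders (Airflow 2.4+ schedule syntax).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="shared_datasets_daily",  # placeholder name
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    run_models = BashOperator(task_id="dbt_run", bash_command="dbt run --select shared")
    test_models = BashOperator(task_id="dbt_test", bash_command="dbt test --select shared")
    publish = BashOperator(task_id="publish_views", bash_command="python publish_views.py")

    # Quality checks gate publication: downstream consumers never see untested models.
    run_models >> test_models >> publish
```
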
Role
Set roadmap and standards while remaining hands-on in implementation, stakeholder intake, model design, pipeline delivery, training, and migration planning.
Outcome
Reduced analytics backlog by 60%, delivered 100+ pipelines and data models in six months, and helped engineers and analysts adopt safer self-service practices.
Demonstrates
Data platform leadership at scale: technical architecture, operating model, education, and delivery discipline moving together.

Focus

  • data platform
  • semantic layer
  • self-service analytics
  • data quality
  • enablement

Stack

  • BigQuery
  • Airflow
  • dbt
  • Kinesis
  • PySpark

AI Workflow Automation

AI Commerce Defect Detection

A reviewable AI workflow that detects commerce catalog and retailer-page defects before operational issues compound.

Context
At Hearst Magazines, commerce operations depended on product availability, retailer content, and catalog state staying aligned across systems that changed outside direct control.
Problem
Manual review did not scale, rules alone missed visual and contextual failures, and operators needed actionable signals rather than noisy alerts.
Constraints
The system had to tolerate unstable web pages, partial extraction, retailer variation, unavailable products, visual ambiguity, and the need for human-reviewable evidence.
Architecture
Combined async web extraction, deterministic rules validation, structured outputs, and gated multimodal review. Gemini screenshot verification produced existence, availability, and confidence signals rather than opaque pass/fail labels.
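
A sketch of the structured verdict and gating described above: the vision stage emits existence, availability, and confidence fields, and a router sends ambiguous results to operators. Field names and thresholds are assumptions.

```python
from pydantic import BaseModel

class PageVerdict(BaseModel):
    product_exists: bool
    available: bool
    confidence: float  # 0.0-1.0 from the vision model
    evidence: str      # short rationale tied to the screenshot

def route(verdict: PageVerdict) -> str:
    """Gate results instead of treating the model as pass/fail."""
    if verdict.confidence < 0.7:  # assumed review threshold
        return "human_review"     # ambiguous: send evidence to an operator
    if not verdict.product_exists or not verdict.available:
        return "defect_queue"     # confident defect: raise an actionable signal
    return "pass"
```
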
Role
Designed the workflow boundaries, validation stages, confidence schema, and review path so AI would be used where visual reasoning added value and deterministic checks would handle known cases.
Outcome
Created a defect-detection loop that lets operators prioritize likely catalog and retailer-page issues instead of manually inspecting every product page.
Demonstrates
Practical multimodal AI system design: use rules where possible, use LLM vision where useful, and expose confidence and evidence for operational decisions.

Focus

  • multimodal review
  • catalog validation
  • operator workflow
  • confidence scoring

Stack

  • Gemini
  • async extraction
  • rules validation
  • structured outputs

Privacy Data Infrastructure

Consumer-Scale Event and PII Data Systems

High-scale event processing and privacy-oriented data systems supporting safe analytics over large consumer-product datasets.

Context
At Meta, event and product datasets supported analytics, product decisions, and privacy-sensitive workflows across rapidly changing consumer systems.
Problem
Teams needed safer analytics over high-volume data while reducing storage cost, detecting sensitive information earlier, and preserving continuity during an organizational pivot.
Constraints
The systems had to handle 5B+ daily events, support analytics across 60+ NoSQL collections, maintain backward compatibility, and avoid exposing sensitive personal information through analytical workflows.
Architecture
Worked on real-time NLP-based PII detection, safe analytics patterns over NoSQL-derived datasets, cumulative table design, and a backward-compatible data model that could support changing product and organizational requirements.
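
A condensed sketch of the cumulative table pattern credited with the storage reduction below: fold each day's events into a single running snapshot instead of retaining every daily partition. Table and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

cumulative = spark.table("analytics.user_activity_cumulative")  # yesterday's snapshot
today = (
    spark.table("raw.events_daily")
    .groupBy("user_id")
    .agg(F.max("event_ts").alias("last_seen"), F.count("*").alias("daily_events"))
)

merged = (
    cumulative.alias("c")
    .join(today.alias("t"), "user_id", "full_outer")
    .select(
        "user_id",
        F.coalesce(F.col("t.last_seen"), F.col("c.last_seen")).alias("last_seen"),
        (F.coalesce(F.col("c.total_events"), F.lit(0))
         + F.coalesce(F.col("t.daily_events"), F.lit(0))).alias("total_events"),
    )
)
# Overwrite a staging table, then swap it in so readers never see a partial snapshot.
merged.write.mode("overwrite").saveAsTable("analytics.user_activity_cumulative_next")
```
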
Role
Contributed to data modeling, pipeline design, privacy-aware analytics infrastructure, and migration support inside large-scale product data environments.
Outcome
Supported privacy-aware analytics at consumer scale, enabled safer access patterns across broad NoSQL-derived data, and reduced storage by 65% through cumulative table design.
Demonstrates
Experience with high-scale event systems, privacy-sensitive data engineering, storage-efficient modeling, and migration work where compatibility matters.

Focus

  • event data
  • PII detection
  • privacy-safe analytics
  • NoSQL analytics
  • storage optimization

Stack

  • real-time pipelines
  • NLP classification
  • NoSQL collections
  • cumulative tables

ML Infrastructure

Fintech ML Data Infrastructure

Machine-learning data infrastructure for fraud and risk workflows, built to shorten model iteration cycles and improve production responsiveness.

Context
Point Predictive operated in a fintech ML environment where model performance, data availability, and response time directly affected fraud and risk decision workflows.
Problem
Model lifecycle steps were too slow, derived datasets were not yet centralized, and scaling constraints limited how quickly the team could improve and serve analytical signals.
Constraints
The platform needed reliable orchestration, warehouse-backed derived data, streaming ingestion, batch processing, and model-supporting datasets without disrupting active business workflows.
Architecture
Built infrastructure across AWS Step Functions, Redshift, PySpark on EMR, and Kinesis Firehose. Helped establish the first derived-data warehouse and production paths for model lifecycle data.
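
A minimal boto3 sketch of triggering one orchestration run; the state machine ARN and payload shape are placeholders rather than the actual pipeline.

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

def start_lifecycle_run(model_name: str, training_date: str) -> str:
    """Kick off one Step Functions execution for a model refresh."""
    resp = sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:model-lifecycle",  # placeholder
        input=json.dumps({"model": model_name, "training_date": training_date}),
    )
    return resp["executionArn"]
```
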
Role
Worked across data engineering and ML infrastructure, connecting ingestion, transformation, warehouse modeling, and model-supporting datasets into a more durable platform.
Outcome
Reduced model lifecycle time from weeks to hours, improved model performance by 20%, delivered 5x faster response times, and increased scalability 10x.
Demonstrates
Ability to build ML data infrastructure where orchestration, derived data, model iteration, and service responsiveness are all part of the same system.

Focus

  • ML data infrastructure
  • fraud analytics
  • model lifecycle
  • derived warehouse

Stack

  • AWS Step Functions
  • Redshift
  • PySpark
  • EMR
  • Kinesis Firehose

Financial Data Systems

Post-Trade and Financial Data Systems

Financial data and post-trade systems spanning reference data, derivatives clearing migration, reporting infrastructure, cloud migration, and developer tooling.

Context
Earlier financial-technology work at Barclays spanned post-trade systems, reference data, derivatives workflows, client reporting, and platform modernization.
Problem
Financial data systems required accuracy, auditability, migration discipline, and user-facing reporting tools while supporting complex securities and derivatives workflows.
Constraints
Work had to fit regulated environments, legacy integration points, data quality expectations, operational reporting needs, and production change-management practices.
Architecture
Built ETL pipelines for Enterprise Security Master data, supported derivatives clearing migration and post-trade technology, contributed to cloud migration and DevOps work, and built self-service tooling including a SQL generator, visualization platform, and client reporting infrastructure.
Role
Contributed as an engineer across delivery, migration, automation, reporting, and tooling efforts, with earlier algo-trading internship work providing exposure to market-facing systems.
Outcome
Delivered financial-data pipelines and workflow tools that improved reporting access, migration readiness, and operational support across post-trade and reference-data domains.
Demonstrates
Foundation in disciplined financial data engineering: ETL reliability, regulated workflows, reporting infrastructure, and practical tools for technical and business users.

Focus

  • financial data
  • post-trade systems
  • ETL
  • reporting infrastructure
  • migration

Stack

  • ETL pipelines
  • SQL tooling
  • visualization
  • cloud migration
  • DevOps

Approach / Engineering Principles

A practical path from ambiguous mandate to durable system

The pattern is direct: understand the workflow, ground it in governed data, design for production behavior, and leave teams with maintainable operating practices.

Clarify

Start with the business workflow

Define the decision, user path, source systems, owners, failure modes, and economic constraints before choosing the AI or data architecture.

Ground

Connect AI to governed data

Use source-of-truth datasets, metric contracts, retrieval metadata, and access boundaries so answers can be traced and corrected.

Operate

Design for evals and production behavior

Build dry runs, tests, reconciliation checks, confidence gates, observability, and human review into the workflow instead of adding them after launch.

Transfer

Leave durable systems and team capability

Document decisions, coach internal owners, and leave an operating model that can keep improving after the initial build.

Experience

Senior engineering judgment for ambiguous systems work

The operating mode is hands-on and leadership-facing: clarify the problem, design the system, and leave teams with durable foundations.

Principal Engineer, AI/ML/Data - Hearst Magazines

Principal Engineer since 2023, promoted from Senior Data Engineer, working inside a large digital media and e-commerce publisher with 140M+ monthly consumers.

  • Lead a cross-functional data and AI team of five across roadmap, architecture, stakeholder alignment, and delivery
  • Platform impact across multiple engineering, analytics, and business teams
  • Build AI analytics, anomaly analysis, commerce validation, article classification, and shared data-platform systems

Senior Data Engineering Roles - Meta, Point Predictive, 1stDibs, Barclays

Prior work across consumer-scale event pipelines, fintech model infrastructure, e-commerce ML/data systems, and financial technology.

  • Worked on data systems processing 5B+ events daily and real-time PII detection at Meta
  • Improved fintech ML data infrastructure with stronger model performance, faster responses, and higher scale
  • Built across dbt, Airflow, PySpark, Kafka/Kinesis, BigQuery, AWS, GCP, Azure, LangGraph, OpenAI, Anthropic, Bedrock, and Vertex AI

Systems leadership pattern

Hands-on principal engineering work where strategy, architecture, implementation, and operating ownership need to converge.

  • Translate ambiguous mandates into scoped delivery plans and measurable system behavior
  • Build enough of the critical path to prove the architecture under real constraints
  • Create contracts, documentation, and team practices that survive beyond the initial launch

Contact

Schedule time

Use Calendly to find a time that works without the back-and-forth. For technical notes, project context, or background verification, email still works.