Technical personal site / selected systems portfolio
Akshay Sardana
I build production AI and data systems that make complex business workflows more reliable.
Principal Engineer working across conversational BI, data platforms, anomaly analysis, commerce data quality, and LLM workflow reliability. This site collects notes on systems work, engineering patterns, and representative projects, with confidential details intentionally omitted.
Technical Focus
Production AI and data systems
Recurring problem spaces from applied engineering work: governed analytics, durable data platforms, reliable LLM workflows, self-service reporting, and commerce data systems.
AI Analytics and Conversational BI
Natural-language analytics systems that answer business questions against governed warehouse data without bypassing metric definitions or SQL safety.
- Question-to-SQL workflows with dry runs, retries, and answer verification (a dry-run sketch follows this list)
- Schema retrieval grounded in metadata, business terms, and examples
- Conversation memory and escalation paths for ambiguous questions
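Illustrative only: a minimal sketch of the dry-run pattern above, assuming the google-cloud-bigquery client. The SELECT-only rule and byte budget are example guardrails, not the production checks.

```python
# Validate model-generated SQL with a BigQuery dry run before executing it.
from google.cloud import bigquery

client = bigquery.Client()
MAX_BYTES = 50 * 1024**3  # hypothetical per-query scan budget

def validate_sql(sql: str) -> str:
    # Cheap deterministic guard: only read-only statements pass.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # Dry run: BigQuery parses and plans the query without running it.
    job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
    if job.total_bytes_processed > MAX_BYTES:
        raise ValueError(f"query would scan {job.total_bytes_processed} bytes")
    return sql  # safe to execute for real, or retry generation on failure
```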
Data Platform Strategy
Practical architecture and sequencing for fragmented pipelines, warehouse models, and ownership boundaries.
- Architecture and sequencing across BigQuery, dbt, Airflow, streaming, and PySpark
- Data quality, lineage, and model contracts for shared datasets
- Operating model for roadmap, support, and stakeholder intake
LLM Workflow Reliability
Auditable AI workflows with deterministic checks, structured outputs, tool use, human review, and production observability.
- Guardrails for confidence, safety, retries, and failure investigation (a confidence-gate sketch follows this list)
- Evaluation harnesses that match the business workflow
- Review gates for multimodal, classification, and agentic systems
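A minimal sketch of the confidence-gate pattern named above: parse the model's structured output into a typed schema, gate on a threshold, retry on malformed output, and escalate otherwise. The schema, the 0.8 threshold, and the `call_llm` stand-in are illustrative assumptions.

```python
# Gate structured LLM output on parse success and self-reported confidence.
from pydantic import BaseModel, ValidationError

class Judgment(BaseModel):
    label: str
    confidence: float  # 0.0-1.0, reported by the model in its JSON output

def classify(prompt: str, call_llm, max_retries: int = 2) -> Judgment | None:
    for _ in range(max_retries + 1):
        try:
            result = Judgment.model_validate_json(call_llm(prompt))
        except ValidationError:
            continue  # malformed output: retry instead of guessing
        if result.confidence >= 0.8:  # illustrative gate
            return result
    return None  # below the gate or never parsed: route to human review
```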
Analytics Engineering and Self-Service
Semantic models, dashboard contracts, and enablement practices that let analysts and business teams answer recurring questions with less ad hoc support.
- Reusable dbt models and metrics with tested definitions
- Self-service paths that keep sensitive logic governed
- Training and documentation for engineers, analysts, and operators
Commerce and Operational Data Systems
Commerce and operational reporting systems where inconsistent source signals, delayed updates, and reconciliation gaps make the data hard to trust.
- Reconciliation tests between source systems and reporting contracts (a reconciliation sketch follows this list)
- Data models that balance explainability, operational ownership, and reporting trust
- Workflow checks that make source-system drift visible before it compounds
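A minimal sketch of a reconciliation check like the ones above, assuming pandas frames for the source extract and the reporting layer; the key, measure, and tolerance are placeholders.

```python
# Compare a source-system extract against the reporting layer on shared
# keys and totals; an empty result means the contract holds.
import pandas as pd

def reconcile(source: pd.DataFrame, reporting: pd.DataFrame,
              key: str, measure: str, tolerance: float = 0.001) -> list[str]:
    issues = []
    missing = set(source[key]) - set(reporting[key])
    if missing:
        issues.append(f"{len(missing)} {key} values missing from reporting")
    src_total, rpt_total = source[measure].sum(), reporting[measure].sum()
    if abs(src_total - rpt_total) > tolerance * abs(src_total):
        issues.append(f"{measure} totals drifted: {src_total} vs {rpt_total}")
    return issues
```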
Selected Work
Production systems across AI, data, and ML infrastructure
Selected examples are based on public career history and sanitized project descriptions. Company names provide employment context only; confidential implementation details are omitted.
Conversational Analytics Agent
A governed natural-language analytics agent that turns business questions into validated BigQuery analysis across Ads, Editorial, Commerce, and operations workflows.
- Context
- At Hearst Magazines, business teams needed faster analytical answers from shared warehouse data without bypassing metric ownership, access expectations, or reviewable SQL.
- Problem
- Naive question-to-SQL was not enough: schema names were ambiguous, metric definitions lived across multiple layers, and users expected follow-up questions, not one-shot query generation.
- Constraints
- The workflow had to stay governed: no unsafe SQL execution, no unverified answers, and graceful handling of zero-result or ambiguous questions.
- Architecture
- Built a LangGraph workflow on Vertex AI/Gemini and BigQuery with schema metadata retrieval using embeddings, keyword search, business-term matching, and fuzzy search. Added dry-run validation, SQL safety checks, retries, error-driven tool use, zero-result investigation, and AI judge verification. A simplified sketch of the graph follows this project summary.
- Role
- Led the architecture and implementation path from prototype behavior to a governed internal workflow, including retrieval design, agent state, validation loops, and single-turn and multi-turn interaction patterns.
- Outcome
- Conceived and built the initial workflow, then hardened it into a governed analytics agent used across Ads, Editorial, and Commerce workflows while preserving source-of-truth data boundaries and answer traceability.
- Demonstrates
- Production LLM engineering, metadata-grounded retrieval, SQL safety, evaluation discipline, and the ability to make AI useful inside real analytics governance.
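For illustration, a simplified LangGraph shape of the question-to-SQL loop described above. Node names, state fields, and routing are assumptions for the sketch, not the production graph; node bodies are stubbed.

```python
# Question-to-SQL graph with a dry-run validation loop (stubbed nodes).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    sql: str
    error: str | None
    answer: str

def retrieve_schema(state: AgentState) -> dict:
    return {}  # embeddings + keyword + business-term lookup (stub)

def generate_sql(state: AgentState) -> dict:
    return {"sql": "SELECT 1", "error": None}  # model call (stub)

def validate(state: AgentState) -> dict:
    return {"error": None}  # dry run + safety checks; set error on failure (stub)

def execute_and_verify(state: AgentState) -> dict:
    return {"answer": "..."}  # run query, then judge-verify the answer (stub)

graph = StateGraph(AgentState)
graph.add_node("retrieve_schema", retrieve_schema)
graph.add_node("generate_sql", generate_sql)
graph.add_node("validate", validate)
graph.add_node("execute", execute_and_verify)
graph.set_entry_point("retrieve_schema")
graph.add_edge("retrieve_schema", "generate_sql")
graph.add_edge("generate_sql", "validate")
# Retry generation when the dry run fails; execute when it passes.
graph.add_conditional_edges(
    "validate", lambda s: "generate_sql" if s.get("error") else "execute"
)
graph.add_edge("execute", END)
app = graph.compile()
```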
Shared Data Platform and Self-Service Analytics
A shared analytics foundation for high-volume digital media and commerce data, built to reduce repeated requests and increase safe self-service.
- Context
- At Hearst Magazines, a large analytics environment processed 10TB+ of data daily, including 5TB+ of clickstream data, while many teams depended on repeated custom SQL and a small group of specialists.
- Problem
- Analytics demand was growing faster than the platform operating model. Definitions drifted, pipeline ownership was fragmented, and business users needed governed self-service instead of ad hoc ticket queues.
- Constraints
- The work had to improve reliability without stopping delivery: existing reporting could not break, teams had different skill levels, and source systems spanned batch, streaming, warehouse, and transformation layers.
- Architecture
- Led platform architecture across BigQuery, Airflow, dbt, Kinesis, and PySpark. Established modeling standards, semantic-layer patterns, data quality checks, ownership practices, and reusable datasets for common analytical paths. An orchestration sketch follows this project summary.
- Role
- Set roadmap and standards while remaining hands-on in implementation, stakeholder intake, model design, pipeline delivery, training, and migration planning.
- Outcome
- Reduced analytics backlog by 60%, delivered 100+ pipelines and data models in six months, and helped engineers and analysts adopt safer self-service practices.
- Demonstrates
- Data platform leadership at scale: technical architecture, operating model, education, and delivery discipline moving together.
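For illustration, a minimal Airflow DAG (2.x style) of the build-then-test gating used on shared datasets; the project path and schedule are placeholders.

```python
# Daily build: run dbt models, then hold downstream consumers behind tests.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_warehouse_build",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    dbt_run = BashOperator(task_id="dbt_run",
                           bash_command="dbt run --project-dir /opt/dbt")
    dbt_test = BashOperator(task_id="dbt_test",
                            bash_command="dbt test --project-dir /opt/dbt")
    dbt_run >> dbt_test  # quality gate before models count as published
```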
AI Commerce Defect Detection
A reviewable AI workflow that detects commerce catalog and retailer-page defects before operational issues compound.
- Context
- At Hearst Magazines, commerce operations depended on product availability, retailer content, and catalog state staying aligned across systems that changed outside direct control.
- Problem
- Manual review did not scale, rules alone missed visual and contextual failures, and operators needed actionable signals rather than noisy alerts.
- Constraints
- The system had to tolerate unstable web pages, partial extraction, retailer variation, unavailable products, visual ambiguity, and the need for human-reviewable evidence.
- Architecture
- Combined async web extraction, deterministic rules validation, structured outputs, and gated multimodal review. Gemini screenshot verification produced existence, availability, and confidence signals rather than opaque pass/fail labels. A gating sketch follows this project summary.
- Role
- Designed the workflow boundaries, validation stages, confidence schema, and review path so AI would be used where visual reasoning added value and deterministic checks would handle known cases.
- Outcome
- Created a defect-detection loop that lets operators prioritize likely catalog and retailer-page issues instead of manually inspecting every product page.
- Demonstrates
- Practical multimodal AI system design: use rules where possible, use LLM vision where useful, and expose confidence and evidence for operational decisions.
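A minimal sketch of the rules-first gating described above. `check_rules` and `verify_screenshot` are hypothetical stand-ins for the deterministic validation and the Gemini vision call; the schema mirrors the existence, availability, and confidence signals.

```python
# Deterministic rules first; multimodal review only where vision adds value.
from dataclasses import dataclass

@dataclass
class Verdict:
    exists: bool
    available: bool
    confidence: float  # a signal for prioritization, not an opaque pass/fail
    evidence: str      # e.g. rule name or screenshot path for human review

def assess_product(page: dict, check_rules, verify_screenshot) -> Verdict:
    verdict = check_rules(page)          # cheap, deterministic known cases
    if verdict is not None:
        return verdict                   # no LLM call needed
    return verify_screenshot(page["screenshot"])  # ambiguous: visual check
```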
Consumer-Scale Event and PII Data Systems
High-scale event processing and privacy-oriented data systems supporting safe analytics over large consumer-product datasets.
- Context
- At Meta, event and product datasets supported analytics, product decisions, and privacy-sensitive workflows across rapidly changing consumer systems.
- Problem
- Teams needed safer analytics over high-volume data while reducing storage cost, detecting sensitive information earlier, and preserving continuity during an organizational pivot.
- Constraints
- The systems had to handle 5B+ daily events, support analytics across 60+ NoSQL collections, maintain backward compatibility, and avoid exposing sensitive personal information through analytical workflows.
- Architecture
- Worked on real-time NLP-based PII detection, safe analytics patterns over NoSQL-derived datasets, cumulative table design, and a backward-compatible data model that could support changing product and organizational requirements. The cumulative-table pattern is sketched after this project summary.
- Role
- Contributed to data modeling, pipeline design, privacy-aware analytics infrastructure, and migration support inside large-scale product data environments.
- Outcome
- Supported privacy-aware analytics at consumer scale, enabled safer access patterns across broad NoSQL-derived data, and reduced storage by 65% through cumulative table design.
- Demonstrates
- Experience with high-scale event systems, privacy-sensitive data engineering, storage-efficient modeling, and migration work where compatibility matters.
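A minimal PySpark sketch of the cumulative-table pattern behind the storage reduction: each day's partition is folded into one running table instead of retaining every daily snapshot. Table and column names are illustrative.

```python
# Fold today's daily data into yesterday's cumulative state.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
cumulative = spark.table("analytics.users_cumulative")  # state through yesterday
today = spark.table("analytics.users_daily")            # today's events only

updated = (
    cumulative.alias("c")
    .join(today.alias("t"), F.col("c.user_id") == F.col("t.user_id"), "full_outer")
    .select(
        F.coalesce("t.user_id", "c.user_id").alias("user_id"),
        # carry prior state forward; overwrite only what changed today
        F.coalesce("t.event_date", "c.last_seen").alias("last_seen"),
        F.coalesce("t.country", "c.country").alias("country"),
    )
)
# Write to a staging table, then swap, so the live table is never half-built.
updated.write.mode("overwrite").saveAsTable("analytics.users_cumulative_stage")
```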
Fintech ML Data Infrastructure
Machine-learning data infrastructure for fraud and risk workflows, built to shorten model iteration cycles and improve production responsiveness.
- Context
- At Point Predictive, a fintech ML environment, model performance, data availability, and response time directly affected fraud and risk decision workflows.
- Problem
- Model lifecycle steps were too slow, derived datasets were not yet centralized, and scaling constraints limited how quickly the team could improve and serve analytical signals.
- Constraints
- The platform needed reliable orchestration, warehouse-backed derived data, streaming ingestion, batch processing, and model-supporting datasets without disrupting active business workflows.
- Architecture
- Built infrastructure across AWS Step Functions, Redshift, PySpark on EMR, and Kinesis Firehose. Helped establish the first derived-data warehouse and production paths for model lifecycle data. An execution-kickoff sketch follows this project summary.
- Role
- Worked across data engineering and ML infrastructure, connecting ingestion, transformation, warehouse modeling, and model-supporting datasets into a more durable platform.
- Outcome
- Reduced model lifecycle time from weeks to hours, improved model performance by 20%, delivered 5x faster response times, and increased scalability by 10x.
- Demonstrates
- Ability to build ML data infrastructure where orchestration, derived data, model iteration, and service responsiveness are all part of the same system.
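A minimal sketch of kicking off one lifecycle run on AWS Step Functions with boto3; the state machine ARN and payload are placeholders for the real ingestion, transform, and refresh chain.

```python
# Start one model-data refresh execution; the name makes the run idempotent.
import json
import boto3

sfn = boto3.client("stepfunctions")

def start_model_refresh(model_id: str, as_of_date: str) -> str:
    response = sfn.start_execution(
        stateMachineArn=("arn:aws:states:us-east-1:123456789012:"
                         "stateMachine:model-refresh"),  # placeholder ARN
        name=f"{model_id}-{as_of_date}",  # one run per model per date
        input=json.dumps({"model_id": model_id, "as_of_date": as_of_date}),
    )
    return response["executionArn"]
```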
Post-Trade and Financial Data Systems
Financial data and post-trade systems spanning reference data, derivatives clearing migration, reporting infrastructure, cloud migration, and developer tooling.
- Context
- Earlier financial-technology work at Barclays spanned post-trade systems, reference data, derivatives workflows, client reporting, and platform modernization.
- Problem
- Financial data systems required accuracy, auditability, migration discipline, and user-facing reporting tools while supporting complex securities and derivatives workflows.
- Constraints
- Work had to fit regulated environments, legacy integration points, data quality expectations, operational reporting needs, and production change-management practices.
- Architecture
- Built ETL pipelines for Enterprise Security Master data, supported derivatives clearing migration and post-trade technology, contributed to cloud migration and DevOps work, and built self-service tooling including a SQL generator, a visualization platform, and client reporting infrastructure. The SQL generator's shape is sketched after this project summary.
- Role
- Contributed as an engineer across delivery, migration, automation, reporting, and tooling efforts, with earlier algo-trading internship work providing exposure to market-facing systems.
- Outcome
- Delivered financial-data pipelines and workflow tools that improved reporting access, migration readiness, and operational support across post-trade and reference-data domains.
- Demonstrates
- Foundation in disciplined financial data engineering: ETL reliability, regulated workflows, reporting infrastructure, and practical tools for technical and business users.
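For illustration, the general shape of a guarded self-service SQL generator: validated identifiers plus a parameterized template, so users compose queries without writing raw SQL. The table registry is hypothetical.

```python
# Build read-only SQL from user selections without string-injection risk.
import re

ALLOWED_TABLES = {"security_master", "trade_events"}  # hypothetical registry
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_query(table: str, columns: list[str], limit: int = 100) -> str:
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    if not columns or not all(IDENT.match(c) for c in columns):
        raise ValueError("invalid column name")
    return f"SELECT {', '.join(columns)} FROM {table} LIMIT {int(limit)}"
```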
Approach / Engineering Principles
A practical path from ambiguous mandate to durable system
The pattern is direct: understand the workflow, ground it in governed data, design for production behavior, and leave teams with maintainable operating practices.
Start with the business workflow
Define the decision, user path, source systems, owners, failure modes, and economic constraints before choosing the AI or data architecture.
Connect AI to governed data
Use source-of-truth datasets, metric contracts, retrieval metadata, and access boundaries so answers can be traced and corrected.
Design for evals and production behavior
Build dry runs, tests, reconciliation checks, confidence gates, observability, and human review into the workflow instead of adding them after launch.
Leave durable systems and team capability
Document decisions, coach internal owners, and leave an operating model that can keep improving after the initial build.
Experience
Senior engineering judgment for ambiguous systems work
The operating mode is hands-on and leadership-facing: clarify the problem, design the system, and leave teams with durable foundations.
Principal Engineer, AI/ML/Data - Hearst Magazines
Principal Engineer since 2023, promoted from Senior Data Engineer, working inside a large digital media and e-commerce publisher with 140M+ monthly consumers.
- Lead a cross-functional data and AI team of five across roadmap, architecture, stakeholder alignment, and delivery
- Platform impact across multiple engineering, analytics, and business teams
- Build AI analytics, anomaly analysis, commerce validation, article classification, and shared data-platform systems
Senior Data Engineer - Meta, Point Predictive, 1stDibs, Barclays
Prior work across consumer-scale event pipelines, fintech model infrastructure, e-commerce ML/data systems, and financial technology.
- Worked on data systems processing 5B+ events daily and real-time PII detection at Meta
- Improved fintech ML data infrastructure with stronger model performance, faster responses, and higher scale
- Built across dbt, Airflow, PySpark, Kafka/Kinesis, BigQuery, AWS, GCP, Azure, LangGraph, OpenAI, Anthropic, Bedrock, and Vertex AI
Systems leadership pattern
Hands-on principal engineering work where strategy, architecture, implementation, and operating ownership need to converge.
- Translate ambiguous mandates into scoped delivery plans and measurable system behavior
- Build enough of the critical path to prove the architecture under real constraints
- Create contracts, documentation, and team practices that survive beyond the initial launch
Contact
Schedule time
Use Calendly to find a time that works without the back-and-forth. For technical notes, project context, or background verification, email still works.