Deep dives into the technical challenges, team dynamics, and outcomes behind the most meaningful work I've done.

Google Cloud 2022 – Present

Workforce Identity at Cloud Scale

Building SSO federation and authentication infrastructure for enterprise GCP customers, connecting major identity providers to Google Cloud with millisecond-level latency.

Identity & Auth · Security · 15+ Engineers

Context

Enterprise customers adopting GCP needed a way to connect their existing identity providers — Azure Entra ID, Okta, Ping, and others — to Google Cloud without managing separate credentials. The workforce identity platform is the backbone of how large organizations manage access to GCP at scale.

The team owned the full auth stack: SSO federation, OAuth infrastructure, reauthentication flows, session lifecycle, and abuse detection — all running globally with strict latency and availability requirements.

Technical Challenge

Federation at this scale involves deep protocol work — SAML, OIDC, and OAuth across dozens of enterprise IdP configurations, each with its own quirks. Getting SSO to work reliably across Azure Entra ID, Okta, and Ping meant handling edge cases in token exchange that most implementations never encounter.
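To make the "quirks" concrete, here is a hedged sketch of the kind of claim normalization federation code ends up doing. This is purely illustrative — not Google's implementation, and the claim names and IdP behaviors shown are simplified assumptions:

```python
# Illustrative only: different IdPs emit the "same" information under
# different claim names and shapes, so federation code has to normalize
# claims before mapping them to internal principals.

def normalize_groups(idp: str, claims: dict) -> list[str]:
    """Return a flat list of group names regardless of IdP quirks (sketch)."""
    if idp == "entra":
        # Entra ID typically emits group object IDs under "groups"; very
        # large memberships may be elided and need a directory lookup.
        return list(claims.get("groups", []))
    if idp == "okta":
        # Okta commonly emits group names, and only when the authorization
        # server is configured to include them in the token.
        return list(claims.get("groups", []))
    if idp == "ping":
        # Some configurations emit a single space-delimited string.
        raw = claims.get("group_membership", "")
        return raw.split() if isinstance(raw, str) else list(raw)
    raise ValueError(f"unknown IdP: {idp}")
```

Multiply this pattern across every claim, signing algorithm, and clock-skew tolerance, and the long tail of per-IdP edge cases becomes clear.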

The hardest part was the compliance layer: achieving FedRAMP High, IL4, and IL5 certifications while maintaining the same codebase and SLO targets. Security requirements for regulated markets changed the architecture significantly — data residency, key management, audit logging, and access controls all had to be redesigned or hardened.

We maintained 99.99% availability on global infrastructure. A/B testing, canary deployments, and CUJ (Critical User Journey) monitoring were core to how we shipped changes safely at this scale.
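As a sketch of the canary gating idea (an assumption for illustration, not Google's actual release tooling — the function, thresholds, and sample floor are all invented), a promotion gate can compare canary and baseline error rates on critical user journeys:

```python
# Hypothetical canary gate: promote a release only if the canary's error
# rate on critical user journeys stays within a tolerance of the baseline.

def canary_passes(baseline_errors: int, baseline_total: int,
                  canary_errors: int, canary_total: int,
                  max_relative_regression: float = 0.1,
                  min_samples: int = 1000) -> bool:
    if canary_total < min_samples:
        return False  # not enough canary traffic to judge safely
    base_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    # Small absolute floor so near-zero baselines don't auto-fail the canary.
    allowed = base_rate * (1 + max_relative_regression) + 1e-4
    return canary_rate <= allowed
```

The real systems are far richer (multiple CUJs, latency percentiles, automatic rollback), but the shape — statistical comparison before promotion — is the same.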

Java · C++ · Go · Spanner DB · Protobufs · Terraform · OpenAPI

Team & Org

I lead 15+ engineers organized across authentication infrastructure, session management, and security/compliance workstreams. The team spans both feature development and production reliability — everyone owns their services end-to-end.

A consistent focus has been building a culture of high ownership and directness. Manager ratings have stayed at 4.5+ across delivery, people growth, and community — which matters to me more than any individual metric because it reflects how the team feels about the work environment.

Outcome

GCP now supports SSO federation with the major enterprise identity providers, enabling large organizations to manage GCP access through their existing identity infrastructure. The platform achieved regulated market certifications (FedRAMP High, IL4, IL5), opening GCP to government and defense customers who require these standards.

Meta 2022

Monetizing Short-Form Video Without Hurting the Experience

Leading ads monetization for Facebook video — balancing revenue growth with user experience in one of the most competitive formats in social media.

Ads & Monetization · Full-Stack · 10 Engineers

Context

Facebook was scaling its short-form video product aggressively — competing directly with TikTok and Instagram Reels. My team owned ads monetization for video on the Facebook app. The fundamental tension: ads are how Facebook makes money, but aggressive ad insertion kills engagement and retention. Getting this balance right is both a technical and product problem.

Technical Challenge

The core challenge was building non-interruptive ad formats — ads that could appear alongside video content without breaking the viewing experience or dropping engagement metrics. This required deep work on ad ranking, timing, and format design in collaboration with data science and ML teams.

Moving into Reels monetization in H2 2022 meant starting from scratch on ad efficiency — Reels has different user behavior patterns than traditional video, so insertion logic, bidding, and format constraints all needed to be rethought.

Team & Org

I managed a full-stack team of 10 engineers — 4 backend, 4 mobile, plus data engineers. The team was small but high-leverage given the revenue impact of the surface we owned. I worked closely with PM, PMM, and data science on strategy and roadmap, and ran the H2 2022 planning process with cross-functional leads.

A focus for me was career development — I created individual growth plans for each engineer and built promotion cases for roughly half the team in the first cycle.

Outcome

Launched non-interruptive ad formats on Facebook video in H1 2022, growing revenue while improving ad efficiency. Shifted focus to Facebook Reels monetization in H2, with a target to significantly improve ad efficiency on the new format. The work contributed to Facebook's broader video monetization strategy during a critical period of format transition.

Sumo Logic 2020 – 2021

Building an Observability Platform from Zero

Taking Sumo Logic from no alerting product to a market-competitive observability platform — built, launched, and scaled in under a year.

Distributed Systems · Observability · 9 Engineers

Context

Sumo Logic had strong capabilities in log management and search but lacked a competitive alerting product. Enterprise customers were using Datadog, PagerDuty, and others for alerting while using Sumo for logs — a fragmented workflow that was a clear competitive disadvantage and a customer retention risk.

The ask was to build and ship a new alerting platform that would make Sumo a credible end-to-end observability solution. The timeline was aggressive and the technical bar was high.

Technical Challenge

The platform needed to process hundreds of thousands of monitors per minute — alerts triggered from application logs, infrastructure metrics, and custom queries running continuously against Sumo's data plane. Building this as a reliable distributed system on AWS meant careful design of the scheduling, evaluation, and notification layers.
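The three layers named above — scheduling, evaluation, notification — can be sketched as a single-process toy. This is an assumption-laden illustration, not Sumo Logic's architecture (the real system shards monitors across workers and persists schedules); all names here are invented:

```python
# Toy sketch of a monitor scheduler: a priority queue ordered by next run
# time, with pluggable evaluation (the query) and notification (the alert).
import heapq
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class ScheduledMonitor:
    next_run: float
    monitor_id: str = field(compare=False)
    interval_s: float = field(compare=False)
    evaluate: Callable[[], bool] = field(compare=False)   # runs the query
    notify: Callable[[str], None] = field(compare=False)  # fires the alert

def run_once(queue: list[ScheduledMonitor], now: float) -> None:
    """Pop every monitor that is due, evaluate it, and reschedule it."""
    while queue and queue[0].next_run <= now:
        m = heapq.heappop(queue)
        if m.evaluate():               # e.g. "error count > threshold"
            m.notify(m.monitor_id)
        m.next_run = now + m.interval_s
        heapq.heappush(queue, m)
```

At hundreds of thousands of monitors per minute, the interesting problems are exactly what this toy hides: sharding the queue, isolating slow queries, and making notification delivery idempotent.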

We also had a delivery problem to solve: the team had been shipping monthly, which was too slow to iterate and compete. Moving to weekly releases required changes to the testing strategy, deployment pipeline, and incident response process — not just the code.

Log search availability was a persistent reliability issue we inherited. Improving it from 75% to 98% required root cause analysis across the data path and targeted infrastructure changes.

AWS EC2 · DynamoDB · Redis · RDS · Kafka · Kubernetes · Zookeeper · Terraform · React · Angular

Team & Org

I led 9 engineers — 6 backend, 3 frontend. When I joined, the team had some trust and morale challenges from prior delivery struggles. The biggest early investment was in process clarity and execution rhythm: clear sprint goals, better incident ownership, and a weekly demo culture that made progress visible.

The team was recognized as the best execution team at the Q1 2021 town hall — which mattered because it came from peers and leadership across the company, not just our own reporting chain. Four engineers were promoted during my tenure, which reflected the investment in growth plans and the opportunities we created by shipping at higher velocity.

Outcome

The alerting platform launched and was rapidly adopted across the customer base, creating tens of thousands of monitors within months. It became a new revenue line for the company within a quarter of launch. Delivery cadence improved from monthly to weekly. Log search availability went from 75% to 98%. The team went from struggling to being a model for execution across the org.

AppDynamics (Cisco) 2016 – 2020

Pioneering Mobile & IoT Performance Monitoring

From founding engineer to EM — building a new product category, the team, and the market from scratch.

Mobile SDKs · IoT · APM · 8 Engineers

Context

AppDynamics was the leading APM platform for server-side applications but had no mobile or IoT story. As enterprises moved to mobile-first strategies and IoT deployments proliferated, the gap was becoming a competitive liability. I joined as the founding engineer to build this from zero.

The challenge wasn't just technical — it was proving that a new product category was worth building, validating it with early customers, and then scaling both the technology and the team to bring it to market.

Technical Challenge

Mobile APM required lightweight SDKs that could run on iOS, Android, Xamarin, and React Native without impacting app performance — the SDK itself couldn't be the thing slowing down the app. This meant careful work on instrumentation overhead, batching, and adaptive sampling.
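A minimal sketch of the adaptive-sampling idea, assuming a per-flush-interval event budget — this is not AppDynamics' actual SDK logic, and the class, budget, and decay policy are all illustrative:

```python
# Illustrative adaptive sampler: bound instrumentation overhead by dropping
# events when the app generates them faster than a per-interval budget.
import random

class AdaptiveSampler:
    def __init__(self, budget_per_interval: int):
        self.budget = budget_per_interval
        self.seen = 0   # events observed this interval
        self.kept = 0   # events actually recorded

    def should_record(self) -> bool:
        self.seen += 1
        if self.kept >= self.budget:
            return False                      # hard cap on overhead
        # Probability decays as we approach the budget, so kept events are
        # spread across the interval instead of front-loaded.
        p = max(0.0, 1.0 - self.kept / self.budget)
        if random.random() < p:
            self.kept += 1
            return True
        return False

    def flush(self) -> None:
        self.seen = self.kept = 0             # batch sent; reset counters
```

The design point: the cap is enforced inside the SDK, so a pathological app state can never turn the monitoring agent itself into the performance problem.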

IoT was a different problem: C++ SDKs for constrained devices, a cloud backend processing millions of events per minute from hundreds of millions of devices, and geolocation microservices to map where those devices were in the world.

The product innovation I'm most proud of was end-user journey mapping — the first feature in the industry that could reconstruct a user's complete journey across mobile and web touchpoints within an application. It required correlating events across SDK agents, backend traces, and session data in real time. This shipped as an industry first and meaningfully improved customer acquisition.

Two patents were granted for techniques that make device agents intelligent about capturing performance data and deriving actionable insights.

iOS SDK · Android SDK · React Native · IoT C++ SDK · AWS Kinesis · DynamoDB · Elasticsearch · ELB

Team & Org

I started as the only engineer on this product. Over four years I built and grew a team of 8 engineers — mobile specialists, backend engineers, and later an infrastructure team. Building a team from scratch means you're also defining the culture, the on-call practices, the code review bar, and the career ladder — all simultaneously with shipping product.

Customer support was a significant org challenge. The SLA was 3 weeks — unacceptably slow for a developer tool. I drove this to 3 days through a combination of product quality investments, better documentation, and a structured on-call rotation. This wasn't just an ops improvement; it changed how the team thought about quality and accountability.

Outcome

The platform became a significant revenue line for AppDynamics, generating over $50M in annual revenue at the time of my departure. Customer attach rate increased meaningfully at the launch of journey mapping. Support SLA improved from 3 weeks to 3 days. The team grew from 1 to 8 engineers with multiple promoted leaders. Two patents were granted for device intelligence techniques, and I published thought-leadership articles referenced by SD Times.

Apple 2012 – 2016

Making iPhones Last Longer on Cellular

Context-aware cellular intelligence to improve battery life on iPhone, iPad, and Apple Watch — deep changes across the cellular stack, iOS kernel, and CFNetwork.

Cellular · iOS Kernel · Embedded · 3 Teams · 8 Engineers

Context

Battery life was consistently one of the top customer complaints for Apple devices, especially in areas with weak or variable cellular coverage. In poor signal conditions, a device's cellular modem aggressively retries connections, driving up power consumption dramatically. The opportunity was to make the cellular stack smarter — not just more efficient, but context-aware.

Technical Challenge

This was deep systems work across multiple layers: the cellular embedded stack, the data bus between the modem and application processor, the iOS kernel, and the CFNetwork framework. Changes at any one layer could introduce regressions in others — the cross-layer integration testing alone was a significant engineering challenge.

The two key innovations were:

Dynamic connection throttling using reinforcement learning over observed network performance. The system learned how a device's cellular connection typically behaved over time and in specific locations, then used that knowledge to decide when to retry a connection versus wait.

Context-aware cellular networking using motion sensors and screen activity. If a device is stationary and the screen is off, it doesn't need to aggressively maintain a cellular connection. Correlating cellular behavior with device context — motion, screen state, app activity — let us dramatically reduce unnecessary connection attempts.
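The context-gating half of this can be sketched as a single decision function. The real logic lives across the modem firmware and the iOS kernel and is far more nuanced; the names, inputs, and the 0.8 threshold below are all assumptions for illustration:

```python
# Hypothetical context-aware connection gate: combine device context with a
# learned per-location failure rate to decide whether a retry is worth it.
from dataclasses import dataclass

@dataclass
class DeviceContext:
    is_stationary: bool            # from motion sensors
    screen_on: bool                # display state
    foreground_network_use: bool   # an app actively needs the network
    recent_failure_rate: float     # learned failure rate here, 0.0 - 1.0

def should_attempt_connection(ctx: DeviceContext) -> bool:
    if ctx.foreground_network_use:
        return True   # user-visible work always gets a connection attempt
    if ctx.is_stationary and not ctx.screen_on:
        # Idle with the screen off: defer background retries entirely.
        return False
    # Otherwise back off in proportion to how often attempts fail in this
    # spot — retry eagerly on good networks, rarely on bad ones.
    return ctx.recent_failure_rate < 0.8
```

The key property is that the gate is asymmetric: user-visible traffic is never throttled, while speculative background retries — the dominant power cost in weak coverage — absorb all of the savings.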

The result was a 25% reduction in outgoing connection attempts and measurable battery-life improvements in weak cellular conditions. Eight patents were granted for these techniques.

Team & Org

I led 3 cross-functional teams: cellular (modem and stack), networking (iOS networking framework), and application (app-layer behavior). 8 engineers total, with expertise spanning embedded firmware, OS internals, and application networking. Coordinating across these teams required a clear shared model of the problem — we were all touching different parts of the same system, and changes needed to be tested end-to-end across all three layers.

Outcome

Measurable improvement in battery life for iPhone, iPad, and Apple Watch during poor cellular coverage — the most power-intensive scenario. The techniques were novel enough to result in 8 granted US patents, many of which remain in use in Apple devices today.

Personal Projects

Active

Taskday

A focused daily task manager built around the idea that you should only commit to what you can actually finish today. Details coming soon.

Active

memlib

Personal knowledge library — a tool for capturing and connecting ideas across reading, notes, and conversations. Details coming soon.