Deep dives into the technical challenges, team dynamics, and outcomes behind the most meaningful work I've done.
Enterprise customers adopting GCP needed a way to connect their existing identity providers — Azure Entra ID, Okta, Ping, and others — to Google Cloud without managing separate credentials. The workforce identity platform is the backbone of how large organizations manage access to GCP at scale.
The team owned the full auth stack: SSO federation, OAuth infrastructure, reauthentication flows, session lifecycle, and abuse detection — all running globally with strict latency and availability requirements.
Federation at this scale involves deep protocol work — SAML, OIDC, and OAuth across dozens of enterprise IdP configurations, each with quirks. Getting SSO to work reliably across Azure Entra ID, Okta, and Ping meant handling edge cases in token exchange that most implementations never encounter.
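One such quirk, sketched as a hedged illustration (not the actual GCP implementation): different IdPs surface the same identity information under different OIDC claim names, so a federation layer typically normalizes claims before use. The claim mappings below are illustrative examples, not an exhaustive or authoritative map.

```python
# Hypothetical per-IdP claim mappings: each provider exposes the stable
# user identifier and email under different claim names. (Azure Entra ID,
# for example, carries an object ID in "oid"; the other entries here are
# illustrative assumptions.)
CLAIM_MAP = {
    "azure-entra": {"subject": "oid", "email": "preferred_username"},
    "okta":        {"subject": "sub", "email": "email"},
    "ping":        {"subject": "sub", "email": "mail"},
}

def normalize_claims(idp: str, claims: dict) -> dict:
    """Map IdP-specific OIDC claims onto one internal shape."""
    mapping = CLAIM_MAP.get(idp)
    if mapping is None:
        raise ValueError(f"unknown IdP: {idp}")
    missing = [src for src in mapping.values() if src not in claims]
    if missing:
        raise ValueError(f"{idp} token missing claims: {missing}")
    return {dst: claims[src] for dst, src in mapping.items()}
```

Centralizing the mapping in one table keeps per-IdP quirks out of the rest of the auth path, which is one common way to contain this class of edge case.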
The hardest part was the compliance layer: achieving FedRAMP High, IL4, and IL5 certifications while maintaining the same codebase and SLO targets. Security requirements for regulated markets changed the architecture significantly — data residency, key management, audit logging, and access controls all had to be redesigned or hardened.
We maintained 99.99% availability on global infrastructure. A/B testing, canary deployments, and CUJ (Critical User Journey) monitoring were core to how we shipped changes safely at this scale.
I lead 15+ engineers organized across authentication infrastructure, session management, and security/compliance workstreams. The team spans both feature development and production reliability — everyone owns their services end-to-end.
A consistent focus has been building a culture of high ownership and directness. Manager ratings have stayed at 4.5+ across delivery, people growth, and community — which matters to me more than any individual metric because it reflects how the team feels about the work environment.
GCP now supports SSO federation with the major enterprise identity providers, enabling large organizations to manage GCP access through their existing identity infrastructure. The platform achieved regulated market certifications (FedRAMP High, IL4, IL5), opening GCP to government and defense customers who require these standards.
Facebook was scaling its short-form video product aggressively — competing directly with TikTok and Instagram Reels. My team owned ads monetization for video on the Facebook app. The fundamental tension: ads are how Facebook makes money, but aggressive ad insertion kills engagement and retention. Getting this balance right is both a technical and product problem.
The core challenge was building non-interruptive ad formats — ads that could appear alongside video content without breaking the viewing experience or dropping engagement metrics. This required deep work on ad ranking, timing, and format design in collaboration with data science and ML teams.
Moving into Reels monetization in H2 2022 meant starting from scratch on ad efficiency — Reels has different user behavior patterns than traditional video, so insertion logic, bidding, and format constraints all needed to be rethought.
I managed a full-stack team of 10 engineers — 4 backend, 4 mobile, plus data engineers. The team was small but high-leverage given the revenue impact of the surface we owned. I worked closely with PM, PMM, and data science on strategy and roadmap, and ran the H2 2022 planning process with cross-functional leads.
A focus for me was career development — I created individual growth plans for each engineer and built promotion cases for roughly half the team in the first cycle.
Launched non-interruptive ad formats on Facebook video in H1 2022, growing revenue while improving ad efficiency. Shifted focus to Facebook Reels monetization in H2, with a target to significantly improve ad efficiency on the new format. The work contributed to Facebook's broader video monetization strategy during a critical period of format transition.
Sumo Logic had strong capabilities in log management and search but lacked a competitive alerting product. Enterprise customers were using Datadog, PagerDuty, and others for alerting while using Sumo for logs — a fragmented workflow that was a clear competitive disadvantage and a customer retention risk.
The ask was to build and ship a new alerting platform that would make Sumo a credible end-to-end observability solution. The timeline was aggressive and the technical bar was high.
The platform needed to process hundreds of thousands of monitors per minute — alerts triggered from application logs, infrastructure metrics, and custom queries running continuously against Sumo's data plane. Building this as a reliable distributed system on AWS meant careful design of the scheduling, evaluation, and notification layers.
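The scheduling and evaluation split described above can be sketched as a min-heap of monitors keyed by next-due time, so each tick touches only the monitors that are actually due rather than scanning all of them. This is an illustrative sketch under that assumption, not Sumo Logic's implementation; all names are hypothetical.

```python
import heapq

class Monitor:
    def __init__(self, monitor_id, interval_s, evaluate, notify):
        self.monitor_id = monitor_id
        self.interval_s = interval_s  # how often the query re-runs
        self.evaluate = evaluate      # runs the query; returns True if triggered
        self.notify = notify          # sends the alert notification

def run_due_monitors(heap, now):
    """Pop and evaluate every monitor whose due time has passed.

    Heap entries are (due_time, monitor_id, monitor); the id breaks ties
    so monitors never need to be comparable to each other.
    """
    fired = []
    while heap and heap[0][0] <= now:
        due, mid, monitor = heapq.heappop(heap)
        if monitor.evaluate():
            monitor.notify(mid)
            fired.append(mid)
        # Reschedule relative to the original due time to avoid drift.
        heapq.heappush(heap, (due + monitor.interval_s, mid, monitor))
    return fired
```

At hundreds of thousands of monitors per minute, the real system would shard this across workers and decouple evaluation from notification delivery; the heap only illustrates the scheduling layer.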
We also had a delivery problem to solve: the team had been shipping monthly, which was too slow to iterate and compete. Moving to weekly releases required changes to the testing strategy, deployment pipeline, and incident response process — not just the code.
Log search availability was a persistent reliability issue we inherited. Improving it from 75% to 98% required root cause analysis across the data path and targeted infrastructure changes.
I led 9 engineers — 6 backend, 3 frontend. When I joined, the team had some trust and morale challenges from prior delivery struggles. The biggest early investment was in process clarity and execution rhythm: clear sprint goals, better incident ownership, and a weekly demo culture that made progress visible.
The team was recognized as the best execution team at the Q1 2021 town hall — which mattered because it came from peers and leadership across the company, not just our own reporting chain. Four engineers were promoted during my tenure, which reflected the investment in growth plans and the opportunities we created by shipping at higher velocity.
The alerting platform launched and was rapidly adopted across the customer base, creating tens of thousands of monitors within months. It became a new revenue line for the company within a quarter of launch. Delivery cadence improved from monthly to weekly. Log search availability went from 75% to 98%. The team went from struggling to being a model for execution across the org.
AppDynamics was the leading APM platform for server-side applications but had no mobile or IoT story. As enterprises moved to mobile-first strategies and IoT deployments proliferated, the gap was becoming a competitive liability. I joined as the founding engineer to build this from zero.
The challenge wasn't just technical — it was proving that a new product category was worth building, validating it with early customers, and then scaling both the technology and the team to bring it to market.
Mobile APM required lightweight SDKs that could run on iOS, Android, Xamarin, and React Native without impacting app performance — the SDK itself couldn't be the thing slowing down the app. This meant careful work on instrumentation overhead, batching, and adaptive sampling.
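The batching and adaptive-sampling idea can be sketched as follows. The real SDKs were native (Swift, Kotlin, C#, JavaScript); this Python sketch, with hypothetical names, shows one way to keep instrumentation overhead bounded: back off the sampling rate as the event buffer fills, and flush in batches instead of making one network call per event.

```python
import random

class EventBuffer:
    """Illustrative SDK-side event buffer, not AppDynamics' implementation."""

    def __init__(self, capacity=100, flush_size=20, send=print):
        self.capacity = capacity      # hard cap on buffered events
        self.flush_size = flush_size  # batch size for one network call
        self.events = []
        self.send = send              # transport callback (one batch per call)

    def sample_rate(self):
        """Back off as the buffer fills: 1.0 when empty, approaching 0 when full."""
        return max(0.0, 1.0 - len(self.events) / self.capacity)

    def record(self, event, rng=random.random):
        if rng() < self.sample_rate():
            self.events.append(event)
        if len(self.events) >= self.flush_size:
            self.flush()

    def flush(self):
        if self.events:
            self.send(self.events)    # one batched call, not N calls
            self.events = []
```

Dropping events under pressure trades completeness for a fixed overhead ceiling, which is the right trade for an agent that must never be the thing slowing down the host app.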
IoT was a different problem: C++ SDKs for constrained devices, a cloud backend processing millions of events per minute from hundreds of millions of devices, and geolocation microservices to map where those devices were in the world.
The product innovation I'm most proud of was end-user journey mapping — the first feature in the industry that could reconstruct a user's complete journey across mobile and web touchpoints within an application. It required correlating events across SDK agents, backend traces, and session data in real time, and its launch meaningfully improved customer acquisition.
Two patents were granted for techniques that make device agents intelligent about capturing performance data and deriving actionable insights.

I started as the only engineer on this product. Over four years I built and grew a team of 8 engineers — mobile specialists, backend engineers, and later an infrastructure team. Building a team from scratch means you're also defining the culture, the on-call practices, the code review bar, and the career ladder — all simultaneously with shipping product.
Customer support was a significant org challenge. The SLA was 3 weeks — unacceptably slow for a developer tool. I drove this to 3 days through a combination of product quality investments, better documentation, and a structured on-call rotation. This wasn't just an ops improvement; it changed how the team thought about quality and accountability.
The platform became a significant revenue line for AppDynamics, generating over $50M in annual revenue at the time of my departure. Customer attach rate increased meaningfully at the launch of journey mapping. Support SLA improved from 3 weeks to 3 days. The team grew from 1 to 8 engineers with multiple promoted leaders. Two patents were granted for device intelligence techniques. I also published thought-leadership articles that were referenced by SD Times.
Battery life was consistently one of the top customer complaints for Apple devices, especially in areas with weak or variable cellular coverage. In poor signal conditions, a device's cellular modem aggressively retries connections, driving up power consumption dramatically. The opportunity was to make the cellular stack smarter — not just more efficient, but context-aware.
This was deep systems work across multiple layers: the cellular embedded stack, the data bus between the modem and application processor, the iOS kernel, and the CFNetwork framework. Changes at any one layer could introduce regressions in others — the cross-layer integration testing alone was a significant engineering challenge.
The two key innovations were:
Dynamic connection throttling, driven by reinforcement learning over observed network performance. The system learned how a device's cellular connection typically behaved over time and in specific locations, then used that to make smarter decisions about when to retry connections vs. wait.
Context-aware cellular networking using motion sensors and screen activity. If a device is stationary and the screen is off, it doesn't need to aggressively maintain a cellular connection. Correlating cellular behavior with device context — motion, screen state, app activity — let us dramatically reduce unnecessary connection attempts.
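As a rough illustration of the context-aware decision, one can model it as choosing a retry backoff from device context. The real logic lived in the cellular stack and iOS kernel, not in Python, and the thresholds and function below are hypothetical; the point is only the shape of the policy: retry eagerly only when a user is plausibly waiting on the network.

```python
def retry_backoff_s(screen_on: bool, in_motion: bool, signal_ok: bool) -> float:
    """Seconds to wait before the next cellular connection attempt.

    Illustrative policy: the more likely a user is actively waiting and the
    better the odds of success, the sooner we retry.
    """
    if screen_on and signal_ok:
        return 1.0      # user active, decent signal: retry quickly
    if screen_on:
        return 5.0      # user active but weak signal: retry, but gently
    if in_motion:
        return 30.0     # screen off, device moving: conditions may improve
    return 300.0        # stationary, screen off: little reason to retry
```

A stationary device with its screen off backs off by orders of magnitude versus an active one, which is where the bulk of the wasted retry attempts came from in weak-coverage scenarios.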
The result was a 25% reduction in outgoing connections and measurable battery life improvements in weak cellular conditions. Eight patents were granted for these techniques.
I led 3 cross-functional teams: cellular (modem and stack), networking (iOS networking framework), and application (app-layer behavior). 8 engineers total, with expertise spanning embedded firmware, OS internals, and application networking. Coordinating across these teams required a clear shared model of the problem — we were all touching different parts of the same system, and changes needed to be tested end-to-end across all three layers.
Measurable improvement in battery life for iPhone, iPad, and Apple Watch during poor cellular coverage — the most power-intensive scenario. The techniques were novel enough to result in 8 granted US patents, many of which remain in use in Apple devices today.
A focused daily task manager built around the idea that you should only commit to what you can actually finish today. Details coming soon.
Personal knowledge library — a tool for capturing and connecting ideas across reading, notes, and conversations. Details coming soon.