SRS CRM: what building a real SaaS taught me about scaling

SRS CRM started as a simple lead management system and grew into a multi-channel outreach automation platform with AI scoring. Here is what I learned about scaling infrastructure along the way.

typescript, crm, infrastructure

SRS CRM started as a lead management system. Simple stuff: contacts, pipelines, notes. Then came the asks. Email sequences. LinkedIn outreach. Lead scoring. AI-generated follow-ups. A mobile app. Bulk import. Analytics dashboards. Each feature individually is reasonable. Together they turn a simple CRUD app into a distributed system with some interesting problems.

I learned more about scaling infrastructure from this project than from anything else I have built.

the early architecture

The first version was exactly what you expect: Express API, PostgreSQL, React frontend. Simple MVC with no real layering. It worked fine at small scale. The problems started when we added background jobs for email sequences and the "simple" codebase started having race conditions everywhere.

Background jobs in a process that also handles web requests are a bad idea. The job queue needs to survive deploys. The job results need to be queryable. The job failures need alerts. None of this is hard, but it is enough complexity that keeping it in the same process is a mistake.

We split the two early on: API servers handle requests, a worker pool handles jobs, and both talk to the same PostgreSQL database. That boundary made everything cleaner. When the job queue backs up, you scale the workers, not the API servers.
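A minimal sketch of what the worker side of such a split could look like, assuming the queue itself lives in a PostgreSQL table. The `jobs` table, its columns, and the `Db` interface are hypothetical; the post does not say which queue implementation SRS CRM uses. The one PostgreSQL feature the sketch relies on is `FOR UPDATE SKIP LOCKED`, which lets several workers poll the same table without claiming the same row.

```typescript
// Hypothetical sketch: the jobs table, its columns, and the Db interface
// are assumptions for illustration, not the actual SRS CRM schema.

// Minimal query interface so the sketch does not depend on a specific driver.
interface Db {
  query(text: string, values?: unknown[]): Promise<{ rows: any[] }>;
}

interface Job {
  id: number;
  kind: string;
  payload: unknown;
}

type Handler = (payload: unknown) => Promise<void>;

// Claim the oldest pending job. SKIP LOCKED makes concurrent workers skip
// rows another worker has already locked instead of waiting on them.
const CLAIM_JOB_SQL = `
  UPDATE jobs
     SET status = 'running', started_at = now()
   WHERE id = (
           SELECT id FROM jobs
            WHERE status = 'pending'
            ORDER BY created_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED
         )
  RETURNING id, kind, payload`;

// Process at most one job. Returns false when there was nothing to claim.
async function runOnce(db: Db, handlers: Record<string, Handler>): Promise<boolean> {
  const { rows } = await db.query(CLAIM_JOB_SQL);
  if (rows.length === 0) return false;

  const job = rows[0] as Job;
  try {
    const handler = handlers[job.kind];
    if (!handler) throw new Error(`no handler for job kind "${job.kind}"`);
    await handler(job.payload);
    await db.query(
      "UPDATE jobs SET status = 'done', finished_at = now() WHERE id = $1",
      [job.id],
    );
  } catch (err) {
    // Failed jobs stay in the table so they can be queried and alerted on.
    await db.query(
      "UPDATE jobs SET status = 'failed', error = $2, finished_at = now() WHERE id = $1",
      [job.id, String(err)],
    );
  }
  return true;
}
```

Because job state lives in the database rather than in process memory, a deploy that restarts the workers does not lose the queue, and scaling out means starting more processes that call `runOnce` in a loop.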

lead scoring with ML

The AI lead scoring was the feature I was most skeptical about and ended up finding most useful. The intuition is simple: given all the signals you have about a lead (title, company size, engagement with emails, time since last contact, deal stage), can you predict which leads are most likely to convert?

The model is a gradient boosted tree trained on historical deal outcomes. Nothing exotic. The features matter more than the algorithm here. Engagement signals (email opens, link clicks, reply rate) are the strongest predictors. Company size and title are decent. Time-based features (days since first contact, days in current stage) are surprisingly useful for flagging deals that have gone stale.
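To make the feature list concrete, here is a sketch of how the engagement and time-based signals could be turned into numeric model inputs. The `LeadActivity` shape and every field name are assumptions for illustration; the actual feature pipeline is not described in this post. Title is left out because it would need a categorical encoding of its own.

```typescript
// Hypothetical input shape; field names are illustrative assumptions.
interface LeadActivity {
  emailsSent: number;
  emailsOpened: number;
  linkClicks: number;
  replies: number;
  firstContactAt: Date;
  stageEnteredAt: Date;
  companySize: number;
}

interface LeadFeatures {
  openRate: number;
  clickRate: number;
  replyRate: number;
  daysSinceFirstContact: number;
  daysInCurrentStage: number;
  companySize: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Rates fall back to 0 when nothing was sent, to avoid dividing by zero.
function rate(count: number, total: number): number {
  return total > 0 ? count / total : 0;
}

// Whole days elapsed between two timestamps, never negative.
function daysBetween(from: Date, to: Date): number {
  return Math.max(0, Math.floor((to.getTime() - from.getTime()) / MS_PER_DAY));
}

function extractFeatures(lead: LeadActivity, now: Date): LeadFeatures {
  return {
    openRate: rate(lead.emailsOpened, lead.emailsSent),
    clickRate: rate(lead.linkClicks, lead.emailsSent),
    replyRate: rate(lead.replies, lead.emailsSent),
    daysSinceFirstContact: daysBetween(lead.firstContactAt, now),
    daysInCurrentStage: daysBetween(lead.stageEnteredAt, now),
    // Tree models split on thresholds, so the raw value needs no rescaling.
    companySize: lead.companySize,
  };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the features reproducible when the same lead is scored for training and later for inference.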

Getting the training data was the hard part. Closed-lost deals are underrepresented because salespeople sometimes just stop updating CRM records instead of marking deals lost. We had to clean the data aggressively before training was useful.
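The post does not spell out the cleaning rules, so the following is only one plausible approach, not the one actually used: treat open deals with no activity for a long time as implicitly lost, and drop open deals that are still fresh because their outcome is unknown. The `DealRecord` shape and the 90-day default are assumptions.

```typescript
// Hypothetical sketch of one possible label-cleaning rule. The record shape
// and the staleness threshold are assumptions, not the real pipeline.
type DealStatus = "won" | "lost" | "open";
type Label = "converted" | "not_converted";

interface DealRecord {
  id: string;
  status: DealStatus;
  lastActivityAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a training label, or null when the outcome is still unknown.
function labelDeal(deal: DealRecord, now: Date, staleAfterDays: number): Label | null {
  if (deal.status === "won") return "converted";
  if (deal.status === "lost") return "not_converted";
  // Open deal: if nobody has touched it for long enough, assume it was
  // abandoned without ever being marked lost.
  const idleDays = (now.getTime() - deal.lastActivityAt.getTime()) / DAY_MS;
  return idleDays >= staleAfterDays ? "not_converted" : null;
}

function buildTrainingSet(
  deals: DealRecord[],
  now: Date,
  staleAfterDays: number = 90,
): Array<{ id: string; label: Label }> {
  const labeled: Array<{ id: string; label: Label }> = [];
  for (const deal of deals) {
    const label = labelDeal(deal, now, staleAfterDays);
    if (label !== null) labeled.push({ id: deal.id, label });
  }
  return labeled;
}
```

A rule like this trades one bias for another: it recovers some of the missing losses, but any threshold will mislabel slow deals that eventually close, so the cutoff is worth validating against deals whose outcome is known.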

multi-channel outreach is operationally painful

Sending emails is easy. Sending emails at scale with good deliverability is a job in itself. We use SES with multiple sending domains, bounce and complaint handling, suppression lists, and careful rate limiting per domain per hour. Miss any of these and your deliverability tanks.
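As an illustration of the gatekeeping that has to sit in front of every send, here is a small in-memory sketch that combines a suppression list with a fixed-window, per-domain, per-hour limit. The class name and API are hypothetical, and the actual delivery call (SES in this case) is deliberately left out. A production version would keep this state in shared storage so that all workers see the same counters.

```typescript
// Hypothetical sketch: an in-memory gate combining a suppression list with
// a fixed-window hourly limit per sending domain. Names are assumptions.
const HOUR_MS = 60 * 60 * 1000;

interface SendWindow {
  windowStart: number; // start of the current hour, in epoch milliseconds
  sent: number;        // sends recorded in that hour
}

class DomainSendLimiter {
  private readonly maxPerHour: number;
  private readonly windows = new Map<string, SendWindow>();
  private readonly suppressed = new Set<string>();

  constructor(maxPerHour: number) {
    this.maxPerHour = maxPerHour;
  }

  // Call this from bounce and complaint handling.
  suppress(email: string): void {
    this.suppressed.add(email.toLowerCase());
  }

  // Returns true and records the send if it is allowed, false otherwise.
  trySend(sendingDomain: string, recipient: string, nowMs: number): boolean {
    if (this.suppressed.has(recipient.toLowerCase())) return false;

    const hourStart = Math.floor(nowMs / HOUR_MS) * HOUR_MS;
    let win = this.windows.get(sendingDomain);
    if (!win || win.windowStart !== hourStart) {
      // First send of a new hour for this domain: start a fresh counter.
      win = { windowStart: hourStart, sent: 0 };
      this.windows.set(sendingDomain, win);
    }
    if (win.sent >= this.maxPerHour) return false;
    win.sent += 1;
    return true;
  }
}
```

A fixed window is the simplest option and allows short bursts at the top of each hour; a sliding window or token bucket would smooth those out at the cost of a little more state.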

LinkedIn outreach adds a different kind of complexity. The LinkedIn API has strict rate limits and the OAuth flow requires the user to stay connected. We had to build a connection health monitoring system that detects when a user account gets rate limited or disconnected and pauses their sequences automatically.
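The core of such a monitor can be quite small. The sketch below classifies the outcome of each API call and tracks which users should have their sequences paused. Mapping HTTP 429 to "rate limited" and 401 to "disconnected" follows general HTTP semantics and is an assumption here, not something taken from LinkedIn's documentation or from the SRS CRM code. All names are hypothetical.

```typescript
// Hypothetical sketch of connection health tracking. The status code
// mapping and all names are assumptions for illustration.
type ConnectionState = "healthy" | "rate_limited" | "disconnected";

function classifyResponse(httpStatus: number): ConnectionState {
  if (httpStatus === 429) return "rate_limited"; // Too Many Requests
  if (httpStatus === 401) return "disconnected"; // token expired or revoked
  return "healthy";
}

class ConnectionMonitor {
  private readonly states = new Map<string, ConnectionState>();

  // Record the result of an API call made on behalf of a user.
  // `changed` tells the caller when to pause or resume that user's sequences.
  record(userId: string, httpStatus: number): { state: ConnectionState; changed: boolean } {
    const next = classifyResponse(httpStatus);
    const prev = this.states.get(userId) ?? "healthy";
    this.states.set(userId, next);
    return { state: next, changed: prev !== next };
  }

  // Users whose sequences should currently be paused.
  pausedUsers(): string[] {
    const paused: string[] = [];
    this.states.forEach((state, userId) => {
      if (state !== "healthy") paused.push(userId);
    });
    return paused;
  }
}
```

Since a paused user generates no outreach traffic, recovery has to be detected some other way, for example a periodic lightweight probe call for rate-limited accounts and a re-authentication prompt for disconnected ones.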

The lesson: every channel has its own operational quirks. Budget engineering time per channel, not just for the initial integration.

the database evolution

We stayed on PostgreSQL the whole time, but the schema evolved significantly. The initial schema had a contacts table with individual columns for every field. As the product grew and customers wanted custom fields, we moved to a hybrid: a contacts table with a JSONB column for custom fields, plus indexed stored generated columns for the commonly filtered ones.

This is a pattern I would use from the start next time. Core fields that you always query get real columns. Everything else goes in JSONB. Partial expression indexes on JSONB keys handle the common custom field queries. You lose some type safety but gain the flexibility that real product requirements demand.
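A sketch of what that hybrid could look like, with the DDL held in a string and a small query builder that filters on both kinds of field. Table, column, and index names are hypothetical, as is the `industry` custom field. The JSONB side uses the containment operator `@>`, which a GIN index on the column can serve, and values are always passed as parameters rather than interpolated.

```typescript
// Hypothetical schema: real columns for core fields, JSONB for the rest.
// All table, column, and index names are illustrative assumptions.
const CONTACTS_DDL = `
  CREATE TABLE contacts (
    id            BIGSERIAL PRIMARY KEY,
    owner_id      BIGINT NOT NULL,
    email         TEXT   NOT NULL,
    custom_fields JSONB  NOT NULL DEFAULT '{}'::jsonb
  );

  -- Serves containment queries (custom_fields @> '{"key": "value"}').
  CREATE INDEX contacts_custom_fields_gin
      ON contacts USING GIN (custom_fields);

  -- Expression index for equality filters on one frequently used key,
  -- e.g. WHERE custom_fields ->> 'industry' = 'fintech'.
  CREATE INDEX contacts_industry_idx
      ON contacts ((custom_fields ->> 'industry'));
`;

interface ContactFilter {
  ownerId?: number;
  custom?: Record<string, string | number | boolean>;
}

// Build a parameterized query; no user input is spliced into the SQL text.
function buildContactQuery(filter: ContactFilter): { text: string; values: unknown[] } {
  const where: string[] = [];
  const values: unknown[] = [];

  if (filter.ownerId !== undefined) {
    values.push(filter.ownerId);
    where.push(`owner_id = $${values.length}`);
  }
  if (filter.custom && Object.keys(filter.custom).length > 0) {
    // The whole custom filter becomes one JSONB containment check.
    values.push(JSON.stringify(filter.custom));
    where.push(`custom_fields @> $${values.length}::jsonb`);
  }

  const clause = where.length > 0 ? ` WHERE ${where.join(" AND ")}` : "";
  return { text: `SELECT id, email, custom_fields FROM contacts${clause}`, values };
}
```

For example, `buildContactQuery({ ownerId: 7, custom: { industry: "fintech" } })` produces the text `SELECT id, email, custom_fields FROM contacts WHERE owner_id = $1 AND custom_fields @> $2::jsonb` with the values `[7, '{"industry":"fintech"}']`.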

what I would change

Start with the job queue separation. We added it reactively instead of proactively, which meant a painful migration. Build the analytics data model separately from the operational model from day one: operational data and reporting data have different access patterns and trying to serve both from the same tables causes problems at any real scale. And instrument everything earlier. The observability we have now we should have had from launch.

The product works. Real users, real data, real revenue. But the engineering decisions I would make on day one today are pretty different from the ones I actually made.

AKhil Raghav — iOS/OSX Swift Developer