
Hiring the wrong product development partner doesn't just cost the engagement fee.

Decades of Standish Group CHAOS data show roughly two-thirds of software projects miss budget, schedule, or scope targets, and partner selection sits upstream of most of those failures. It costs the rebuild fee a year later, the time it takes to find a replacement partner, and the investor confidence damage when the launch timeline slips by a quarter.
The diligence to avoid that outcome is unglamorous. It's a 30-minute checklist run before the contract is signed, focused on capabilities you can verify, red flags you can identify, and questions you can ask that filter out vendors who would never deliver. This article is that checklist.
If you're a founder picking your first product development partner or a CTO evaluating a replacement for one that didn't work out, the framework below gives you concrete criteria. The companies that get this right ship better products. The ones that don't pay for the same software three times.
The phrase "product development services" gets thrown around loosely. Some vendors mean staff augmentation under another name. Others mean fixed-scope freelance work. A real product development partner delivers four specific things end-to-end.
A discovery output. Before engineering starts, a real partner runs a discovery phase that produces a written artifact: scoped feature list, technical assumptions, integration map, risk register, and a realistic timeline with milestones. PMI's Pulse of the Profession is consistent on this point: organizations that invest in formal discovery and value-delivery practices waste roughly 28x less on poor performance. If a vendor wants to skip discovery, they're not a product development partner. They're a body shop. For startup-specific framing, see our concierge MVP guide.
A scoped roadmap. The roadmap maps features to sprints with capacity assumptions. It tells you what ships at each milestone and what doesn't. It changes as you learn things, but it's a real document, not a vague Gantt chart.
Working software shipped to a real environment. This sounds obvious. It isn't. Partners that bill T&M without scope discipline can run for months without ever shipping a production deploy. A real partner has a release cadence and respects it.
Governance artifacts. Sprint reports, burndown, retrospective notes, change request log, post-launch support plan. McKinsey's operations research consistently finds that disciplined governance is the single largest predictor of agile-delivery outcomes. These are the artifacts that make the engagement audit-able, both to you and to investors during due diligence.
If a vendor doesn't deliver all four, they're a vendor. They might still be useful, but they're not a partner.
Use this list during evaluation calls. Score each capability 0–2: 0 if it's missing, 1 if it's partial, 2 if it's strong. A partner that scores below 12 isn't ready for your project.
1. Discovery rigor. Have they run discovery on at least three projects in the last two years? Ask for the artifacts they produced (redacted is fine). If they can't show one, they don't do discovery.
2. Senior engineering oversight. Is a senior engineer (7+ years, real production experience) on every project from day one? Or do they staff junior teams and hope the buddy system covers? Ask who the tech lead would be on your project and what their portfolio looks like.
3. Design and engineering integration. Are designers and engineers in the same sprints, or does design hand off PSDs to engineering after week six? The first model works. The second produces apps that look great in mockups and break in production.
4. QA discipline. Do they have dedicated QA on every project? Not "engineers test their own code." Real QA, written test plans, regression coverage. This is the most-skipped capability and the most expensive to fix retroactively.
5. Compliance experience. Have they delivered in regulated industries that match yours (HIPAA, PCI DSS, GDPR, SOC 2)? Ask for examples. Compliance can't be picked up mid-project at acceptable cost.
6. Governance and reporting cadence. Weekly delivery report, monthly retrospective, quarterly business review. If they default to "we'll send updates as needed," they don't have governance.
7. Post-launch support. What happens after the product ships? Bug fix SLA, performance monitoring, OS-update tracking. Many vendors treat launch as the end. Real partners treat it as the start of phase two.
8. References at your scale. Three references from projects within 50% of your size and complexity. Smaller doesn't translate. A vendor that ships $10K marketing sites might not ship a $300K marketplace.
| Capability | What to verify | Score: weak (0) | Score: partial (1) | Score: strong (2) |
|---|---|---|---|---|
| Discovery rigor | Written discovery artifacts on 3+ recent projects | No artifact, sales deck only | One redacted artifact, no risk register | Full discovery output with risk register and milestone plan |
| Senior engineering oversight | 7+ year tech lead on every project from day one | Junior team, no named lead | Senior named on contract but absent from sprints | Senior lead on kickoff plus first three sprint reviews |
| Design and engineering integration | Designers and engineers in the same sprints | Hand-off model, PSDs to engineering | Mixed: some projects integrated, others not | Integrated from sprint one with shared backlog |
| QA discipline | Dedicated QA engineer with written test plans | No QA, engineers test their own code | QA shared part-time with no test plan | Named QA, regression coverage, written plans |
| Compliance experience | Delivered in HIPAA, PCI DSS, GDPR, or SOC 2 | No regulated work shown | One past project in one regime | Multiple projects across multiple regimes |
| Governance and reporting | Weekly delivery, monthly retro, quarterly business review | Ad-hoc updates only | Weekly status, no retros | Full cadence with written artifacts |
| Post-launch support | Bug SLA, monitoring, OS-update tracking | Launch is end of engagement | Reactive bug fixes only | SLA-backed support, monitoring, roadmap |
| References at your scale | Three on-budget references at ±50% of your size | No references, NDA excuses | One small or out-of-scope reference | Three on-budget references within 50% of your scope |

Some patterns predict bad outcomes regardless of how strong the pitch sounds.
Promise-heavy pitches with no discovery offer. A vendor that gives you a fixed price and timeline before running discovery is either guessing or padding. Either way the cost lands on you when scope diverges.
No senior on calls. If you only meet the senior engineer at the pitch but the project will be run by juniors, the pitch was the bait. Insist that the tech lead be on the kickoff call and the first three sprint reviews. Gartner's IT services research consistently lists "bait-and-switch staffing" among the top three failure modes in vendor selection.
No QA in scope. "QA is built into the developer process" is a euphemism for "we don't have QA." Insist on a named QA engineer on the team or in a shared pool.
No escalation path. If you ask "who do I call if a sprint slips" and the answer is "your project manager," they don't have one. A real partner has a delivery director above the PM.
No on-budget references. If every reference's budget grew "because scope evolved," the vendor doesn't scope well. References that came in on budget are the ones worth talking to.
The proposal is generic. If the proposal could be sent to any client in your industry without changing more than the name, they didn't do discovery. They did a template.
Six questions that filter out vendors who would otherwise pass the pitch.
Who owns the code on day one? Work-for-hire clause, every contributor named, no vendor retention. If they hesitate, walk.
What happens to my code if you go out of business? Source code escrow, repository access continuity, knowledge transfer plan. If they don't have one, you're betting your product on their solvency. BCG's digital technology and data work emphasizes continuity planning as a baseline expectation for any serious software vendor relationship.
How do you handle scope changes? Change request log, signed amendments, transparent re-estimation. Not "we'll figure it out."
What's your sprint cadence and how do you handle slips? Weekly or bi-weekly. When a sprint slips, what's the recovery protocol? If the answer is "we'll work weekends," they don't have a process.
Can I see three references at my scale, in my industry, that came in on budget? This is the highest-information question in the entire evaluation. If they can produce three, the rest of the diligence is verification. If they can't, the rest of the diligence is rationalization.
What's your team's attrition rate? Industry average is 15–20% annually. Above 25% means you'll lose key team members mid-project. Below 10% with data is a strong signal.

Case studies on a vendor's website are marketing. References are diligence. The difference is in the questions you ask.
When you call a reference, don't ask "were they good." Ask:
"What surprised you about the engagement?" Surprises are where the real story lives. Good vendors surprise references with overdelivery; bad ones surprise with hidden costs.
"What's the worst thing they did?" Every engagement has friction. A reference that can't name anything is either an unreliable narrator or a friend of the vendor. A reference that names something concrete and explains how the vendor handled it is the gold standard.
"Would you hire them again, knowing what you know now?" Yes/no question. Listen to how long the pause is before the answer.
"What did the engagement cost vs the original estimate?" Numbers matter. A reference that came in within 10% of estimate is meaningfully different from one that came in 50% over.
"How long did onboarding take?" Tells you whether the vendor has process or theatre.
For case studies on the vendor's website, look at metrics, not adjectives. "Improved performance" is meaningless. "10M downloads, 4.8 App Store rating, 12-month build" is meaningful. If the case studies don't have numbers, the vendor either doesn't measure or doesn't want you measuring.

At Empat, we've delivered 300+ projects across 23 markets for clients including 6 Y Combinator alumni, Fortune 500 companies, and 19 Forbes 30 Under 30 founders. Engagements start with discovery (2–6 weeks, from $10,000), move through MVP or full-scale build, and continue with post-launch support.
We score well on the checklist above because we built the company around it: senior engineering oversight from day one, dedicated QA on every project, design and engineering integrated in the same sprints, sprint reports as standard, and compliance experience across HIPAA, PCI DSS, GDPR, and SOC2.
Our reference list is concrete: Obimy (10M+ downloads, #1 US App Store), Monitree (100K+ NHS users), VitalAI (production-grade AI healthcare monitoring), Aurora (3.9× conversion lift). Average engagement is on-budget by design, not by luck.
If you're scoping a product development partnership, see our product development services, browse case studies, or book a free 30-minute consultation. For startup-specific outsourcing context, our outsourcing guide for startups covers the broader engagement-model decision. If you need a fractional senior to lead the engagement on your side, CTO as a service and fractional CTO cover that model. For SaaS-specific scope, see SaaS product development.
A bad partnership shows up in two places: the budget runs over, or the product runs broken. By the time either happens, the diligence window has closed.
Run the checklist before you sign. Score the eight capabilities honestly. Call the references and ask the hard questions. The 30 minutes spent on diligence is the cheapest project management on the engagement, and the only kind that happens before the bill arrives.
A freelancer delivers tasks. A staff augmentation provider delivers engineers. A product development partner delivers a working product end-to-end: discovery, scoped roadmap, build, QA, launch, post-launch support, and governance artifacts. If a vendor calls themselves a partner but only delivers engineers, they're a staff augmentation provider with marketing.
A written scoped feature list, technical assumptions and stack decisions, integration map, risk register, milestone timeline with sprint-level capacity assumptions, and a realistic budget with confidence ranges per milestone. If discovery produces less than this, the build phase will be expensive.
Three signals: repeated missed sprint deliverables without a recovery plan, scope-change conversations replacing product-build conversations, or the senior engineer named on the contract no longer attending sprint reviews. If two of three show up, start the replacement conversation. The cost of switching at month three is lower than the cost of finishing with the wrong partner at month nine.
Fixed-price works when scope is genuinely fixed and discovery has produced a confident estimate. T&M works for evolving products where requirements will shift. Most product development engagements should be hybrid: fixed-price discovery and milestone 1 (to bound risk), then T&M or monthly retainer for everything after (to handle scope evolution honestly).


