Quill's Thoughts

Inside an FMCG reward campaign hold queue: what false positives looked like on day two

Inside an FMCG reward campaign hold queue on day two: how EVE surfaced likely false positives, supported threshold changes, and helped protect genuine sign-ups without blunt rejects.

EVE Playbooks · Published 2 Apr 2026 · 8 min read




The short answer: EVE is most useful when a sign-up journey should not be forced into pass or fail. In this FMCG reward campaign, day two exposed the awkward bit. The controls were catching risky patterns, but they were also holding too many people who looked genuine. The job was not to declare the model right or wrong. It was to work out which signals had tightened too far, who could change them, and how to bring the queue back under control without loosening every deliverability control in sight.

Day one made the setup look disciplined. Day two showed the cost of being too pleased with that result.

That is the contradiction worth paying attention to. The same checks meant to protect the campaign from low quality or risky entries had started delaying legitimate sign-ups. So the real decision was never strict versus lenient. It was whether the evidence justified a threshold change, who owned that call, and whether the queue could be brought back to green without inviting rubbish data into the welcome flow.

This delivery assurance note sets out what was being decided, what EVE showed in the hold queue, and why a graded review path was more useful than a blunt reject rule. The point is less the friction on day two and more what changed because the reasoning was visible.

What is being decided

By mid-morning on day two, the hold queue had stopped looking incidental and started looking operational. It had moved from a few dozen entries to more than 800, and it was still growing by roughly 150 entries an hour.

The campaign needed an answer by midday. The CRM lead needed to know whether to keep trusting automation or redirect people into review. Threshold review sat with delivery, with a decision point set for 1 pm. If your plan has no named owners and dates, it is not a plan.

The acceptance criteria were clear enough:

  • bring held entries back towards a manageable level on the same day;
  • avoid a blanket loosening of deliverability controls;
  • clear the queue without silently dropping genuine entrants;
  • keep the rationale visible in the audit trail.

The risk was straightforward. Leave legitimate entrants in limbo for too long and complaints become more likely, trust in follow-up messages weakens, and pressure lands on sender reputation. Overcorrect, and poor-quality sign-ups move straight into the welcome sequence. That is the trade-off map. Choose your failure mode, then own it.

Comparative view: hard reject or hold for review

Older campaign setups often rely on binary checks: valid or invalid, pass or fail. Cheap to run, easy to explain, and often too crude once volume arrives. A typo, an unfamiliar domain, and a genuinely malicious pattern can all end up in the same bucket.

EVE's email judgement engine offers a third route: pass, hold, or stop. That matters because a hold queue turns a dead end into a controlled delay. For a consumer reward campaign, that usually means the difference between an ops problem you can recover from and a customer experience issue that only shows up once the complaints start.
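
To make the shape of that concrete, here is a minimal sketch of a three-outcome judgement. The threshold values, field names, and function are illustrative assumptions, not EVE's actual interface; the point is the hold band sitting between the two cut-offs:

from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"  # release into the welcome sequence
    HOLD = "hold"  # park in the review queue with reason codes attached
    STOP = "stop"  # reject outright


@dataclass
class Judgement:
    outcome: Outcome
    reason_codes: list[str]  # which rules fired, for the audit trail


# Hypothetical thresholds: in practice these are owned and tuned by delivery.
HOLD_THRESHOLD = 0.40
STOP_THRESHOLD = 0.85


def judge(risk_score: float, reason_codes: list[str]) -> Judgement:
    """Map a composite risk score onto pass/hold/stop.

    The hold band is the point of the design: scores between the two
    thresholds become a controlled delay rather than a dead end.
    """
    if risk_score >= STOP_THRESHOLD:
        return Judgement(Outcome.STOP, reason_codes)
    if risk_score >= HOLD_THRESHOLD:
        return Judgement(Outcome.HOLD, reason_codes)
    return Judgement(Outcome.PASS, reason_codes)

The design choice worth noticing is that the hold outcome carries its reason codes with it, which is what makes the review work in the next section possible at all.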

The useful comparison here is not new versus old tooling for its own sake. It is governed validation with an override path versus silent rejects and mailbox-quality drift.

Comparison of validation approaches under campaign pressure:

  • Customer experience. Hard reject: fast, but unforgiving; genuine users can be blocked with little context. Graded judgement with hold queue: slower at the edges, but recoverable; genuine users can still be approved after review.
  • Operational load. Hard reject: low day to day. Graded judgement with hold queue: higher during spikes, especially if thresholds are too tight.
  • Learning value. Hard reject: weak; a fail tells you little beyond the fail. Graded judgement with hold queue: stronger; reason codes show which rules are doing the damage.
  • Control quality. Hard reject: simple to administer, harder to tune safely. Graded judgement with hold queue: more precise, provided owners review outcomes and adjust thresholds deliberately.

For this campaign, that review route was the better call. It cost team time, certainly. It also avoided the worse outcome, which was blocking large numbers of genuine entrants because the model had become too twitchy under volume.

What risk or deliverability issue needs controlling

The queue itself was not the only problem. The longer entries sat there, the less useful the sign-up journey became, and the harder it was to separate genuine caution from unnecessary drag. This is where static regex checks or allow-lists tend to run out of road. They can reject or pass, but they do not help much when the real issue is threshold behaviour under pressure.

EVE keeps the reasoning visible to the team. That meant held entries could be reviewed by reason code rather than treated as one mass of vague risk. The main pressure points in this queue were tied to domain age and sign-up velocity from shared network sources.
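
Reviewing by reason code is, mechanically, a grouping exercise. A rough sketch of the triage step, assuming a queue export where each held entry carries its reason codes (the field names and codes here are invented for illustration):

from collections import Counter

# Held entries as exported from the queue; field names are illustrative.
held_entries = [
    {"email": "a@example.com", "reason_codes": ["domain_age", "velocity_shared_net"]},
    {"email": "b@example.com", "reason_codes": ["domain_age"]},
    {"email": "c@example.com", "reason_codes": ["velocity_shared_net"]},
]

# Count how often each rule contributed to a hold. Entries held by
# several rules count once per rule, which is what you want when
# deciding which threshold to tune first.
rule_pressure = Counter(
    code for entry in held_entries for code in entry["reason_codes"]
)

for code, count in rule_pressure.most_common():
    print(f"{code}: holding {count} entries")

In this campaign, that kind of view is what pointed at domain age and shared-network velocity rather than at the queue as one undifferentiated mass.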

That distinction matters. A broad fraud event calls for containment. A calibration problem calls for restraint, evidence, and someone willing to change the rule rather than admire it.

Between 11 am and midday, a sample from the main clusters was reviewed against those reason codes. The sample indicated that most of the held entries in those clusters looked genuine rather than malicious. That is not proof that every held record was safe, and it does not settle the whole queue. It does tell you the review path was doing useful work and that at least some of the model behaviour needed tuning rather than applause.

I was wrong about the effort at first. One of the data feeds behind a risk signal was harder to work through than expected, and the threshold was catching more legitimate behaviour than the rule design suggested on paper. Better to say that plainly. Updated plan: keep the review queue live, adjust only the over-sensitive rules, and keep a buffer for a second pass if volume did not fall after deployment.

The practical lesson is not that false positives arrive looking obviously broken. Usually they do not. In this queue, the suspect entries often looked plausible but clustered in ways that defensive rules dislike: similar timing, shared networks, newer domains, enough concentration to trip a velocity rule. Without explainable validation decisions, the queue would have looked riskier than the evidence suggested.

Where EVE fits best

EVE fits best where teams need sign-up risk scoring that can protect deliverability without pretending every edge case deserves a hard stop. That is particularly true in reward campaigns, competitions, lead capture, and other consumer journeys where the cost of blocking the wrong person shows up quickly.

This is also the side of the comparison that matters more: deliverability protection versus blunt fraud blocking. If the only mechanism available is reject, teams often hide operational uncertainty inside a rule and call it control. A hold state is less tidy, but more honest. It gives operations somewhere to work, and it gives the audit trail a reason rather than a shrug.

That is where EVE's explainable validation decisions and adjustable thresholds are useful. If a sign-up should pass, it passes. If it should stop, it stops. If it needs context, it holds. Holograph only enters the picture here because implementation ownership mattered once the threshold change was approved.

Operational impacts and the path to green

The immediate hit was capacity. Time that should have gone into campaign optimisation shifted into queue triage. The queue was also becoming a timing problem. Leave it too long, and the welcome journey starts losing value.

So the response needed owners, dates, and a checkpoint that could be tested. The plan looked like this:

  • Owner: delivery lead. Decision time: recommendation by 1 pm on day two.
  • Owner: engineering implementation. Action: adjust the over-sensitive domain-age and velocity thresholds without weakening the broader control set.
  • Owner: CRM operations. Action: work the backlog once revised rules were live, releasing valid entries into the welcome sequence.
  • Checkpoint: queue growth to stop on the same day and held entries to return towards a more manageable share of total sign-ups.

The mitigation was targeted. Weight came off the problem signals rather than every gate in the system. In practice, that meant adding context around clustered traffic and stepping back from the lazy assumption that new or concentrated always means bad. A Holograph engineer deployed the change by 2 pm.
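
Targeted means touching only the over-sensitive rules and leaving the rest of the control set at its launch settings. A sketch in that spirit, with rule names and values invented for illustration rather than taken from EVE's configuration:

# Launch configuration. Rule names and values are illustrative only;
# the real change went through EVE's threshold controls with an audit entry.
thresholds = {
    "domain_age_days_min": 30,        # hold domains younger than this
    "velocity_per_network_hour": 5,   # hold above this sign-up rate per shared network
    "mx_check": "strict",             # untouched: still enforced
    "syntax_check": "strict",         # untouched: still enforced
}

# Weight comes off the two problem signals only. Everything else
# stays exactly as it was at launch.
thresholds["domain_age_days_min"] = 14
thresholds["velocity_per_network_hour"] = 12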

The result was solid, not magical. The backlog of more than 800 held entries was cleared in under 90 minutes, and by the end of day two the hold queue had fallen back to around 4% of total sign-ups. That is a useful operational improvement. It does not mean the model was finished. It means the immediate risk was contained and the controls were back inside tolerance.

One edge remained unresolved. A newer free-mail provider still showed mixed signals, with some patterns worth questioning and some clearly genuine users. We did not force a neat answer where there was none. A small number of entries stayed in review while more evidence built. Under time pressure, that is often the right decision.

Recommendation and next step

The recommendation from this queue review is not to abandon caution. It is to stop pretending binary validation is enough for campaign operations where speed, trust, and deliverability all matter at once.

If you want a setup that holds up under pressure, define the rules before launch: who owns threshold changes, what acceptance criteria trigger a change, how long entries can sit in hold, and which measures decide whether things are improving or drifting. At minimum, track hold-rate percentage, queue growth by hour, and the false-positive rate from reviewed samples. If those numbers have no owner and no review date, you are guessing.
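
All three measures are cheap to compute once you have a queue export and a reviewed sample. A sketch with illustrative inputs shaped like this campaign's day two, not the actual campaign data:

def hold_rate(held: int, total_signups: int) -> float:
    """Held entries as a share of total sign-ups."""
    return held / total_signups


def queue_growth_per_hour(counts_by_hour: list[int]) -> list[int]:
    """Net change in queue size, hour over hour."""
    return [b - a for a, b in zip(counts_by_hour, counts_by_hour[1:])]


def false_positive_rate(reviewed: int, found_genuine: int) -> float:
    """Share of a reviewed sample that turned out to be genuine.

    This estimates the sample, not the whole queue: widen the sample
    before treating it as evidence for a threshold change.
    """
    return found_genuine / reviewed


# Illustrative numbers only.
print(f"hold rate: {hold_rate(800, 11000):.1%}")
print(f"hourly growth: {queue_growth_per_hour([250, 400, 550, 700, 820])}")
print(f"sample FP rate: {false_positive_rate(120, 96):.0%}")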

EVE is built for that more measured route: explainable decisions, adjustable thresholds, and an audit trail teams can use when the queue starts misbehaving. The EVE product overview and the wider Holograph solutions page set out where that fits. If your sign-up journey is stuck between letting too much through and blocking the wrong people, contact EVE and we can look at the rules, the owners, and the dates needed to get it sorted.

If this is on your roadmap, EVE can help you run a controlled pilot, measure the outcome, and scale only when the evidence is clear.
