Card-testing fraud does not look like it did in 2022. The old playbook — buy a list of stolen PANs, run $1 micro-transactions from one IP until something authorizes — gets caught by any decent velocity rule. The operators running these attacks know that. What they deploy today looks nothing like that, and that's exactly the problem for fraud teams still leaning on velocity thresholds.
The 2025 attack anatomy is distributed, paced, and behaviorally calibrated to look like real traffic. Understanding how it works is the first step to building detection that actually holds up.
What Card Testing Actually Is — and Why Attackers Still Bother
Card testing is the process of verifying whether a stolen card number is still active before monetizing it. The value is in the verification: a confirmed-live card is worth dramatically more on secondary markets than an unvalidated number. A number that sells for $5 unvalidated can fetch $40–60 on carding forums once confirmed live, because the buyer knows it works.
The typical pipeline is: acquire a dump (often millions of numbers from a data breach), test at scale against low-friction merchants, extract the live ones, sell or use them. The economics have always favored scale. What changed is how attackers achieve that scale without triggering detection.
In 2024, Stripe reported that card testing represented 27% of all dispute-generating fraud on its network. The numbers have not gone down in 2025. The attacks have just gotten quieter.
The Old Model: Velocity Clusters
Pre-2022 card testing was unsophisticated. A single attacker IP would submit 50–500 transactions in a short window, all with small amounts, all targeting the same merchant. Velocity rules caught this trivially: if IP X submits more than 10 transactions in 5 minutes, block it.
Rules got refined as attackers rotated IPs. Then rules got layered: if any combination of (IP, device, BIN prefix) shows velocity above threshold, flag. This worked for a few years. The fraud rings adapted.
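The layered rule described above can be sketched in a few lines. The window and per-key thresholds here are illustrative assumptions, not recommended values:

```python
from collections import defaultdict, deque

# Layered velocity rule: flag when any of (ip, device, bin_prefix) exceeds
# its per-key count inside a fixed window. Thresholds are illustrative.
WINDOW_SECONDS = 300
THRESHOLDS = {"ip": 10, "device": 10, "bin": 25}

class VelocityRule:
    def __init__(self):
        # (kind, key) -> deque of transaction timestamps inside the window
        self.events = defaultdict(deque)

    def check(self, ts, ip, device, bin_prefix):
        flagged = False
        for kind, key in (("ip", ip), ("device", device), ("bin", bin_prefix)):
            q = self.events[(kind, key)]
            q.append(ts)
            while q and ts - q[0] > WINDOW_SECONDS:
                q.popleft()
            if len(q) > THRESHOLDS[kind]:
                flagged = True
        return flagged

rule = VelocityRule()
# Old-style burst: one IP, 50 tests in under two minutes -> caught quickly
hits = [rule.check(ts=i * 2, ip="203.0.113.7", device=f"d{i}", bin_prefix="411111")
        for i in range(50)]
print(sum(hits))  # everything past the 10th transaction is flagged
```

Against the pre-2022 attack shape this works fine, which is exactly why it stopped being enough.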
The shift happened when attackers started treating the detection rule itself as an input to their operation. They don't just want to evade the rule — they want to know exactly where the threshold is and operate just below it. That requires feedback loops, automation, and something closer to A/B testing their own attack patterns against different merchants.
The 2025 Attack: Distributed Slow-Burn Probing
Current card-testing infrastructure has three defining characteristics that make velocity rules nearly useless against it.
Distributed execution across residential IPs. Modern carding operations rent residential proxy networks — not datacenter IPs, not VPNs, but legitimate residential ISP addresses tied to real devices. A batch of 10,000 card tests might come from 8,000 different IP addresses spread across 40 US states. No single IP submits more than 2–3 transactions per hour. Velocity rules keyed to IP see nothing unusual.
Temporal spacing calibrated to your thresholds. Attackers probe the same merchant over days, not minutes. They'll submit 20 transactions on Monday, 18 on Tuesday, and 22 on Wednesday. The daily volume looks like low-tier legitimate traffic. The pattern only becomes visible when you look across a sliding 7-day window and correlate against the BIN prefix distribution — which most rule engines don't do.
Transaction amounts chosen to mimic real traffic. $1 micro-charges are dead. Current card testing uses amounts that mirror the target merchant's typical ticket size. A testing run against a SaaS subscription merchant will use $9.99 and $19.99. Against a gas station network, $45–85. The amount selection often correlates with the merchant category, which means amount-based anomaly detection fails unless it's benchmarked per-MCC.
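A quick simulation shows why per-IP velocity sees nothing in this traffic. The pool size and daily volumes follow the figures above; the IP labels are synthetic:

```python
import random
from collections import Counter

random.seed(7)

# Simulate a paced, distributed test batch: roughly 20 probes per day over
# three days, each from a different residential IP drawn from a large pool.
ip_pool = [f"ip-{n}" for n in range(8000)]
transactions = []
for day, n_tests in enumerate([20, 18, 22]):
    for _ in range(n_tests):
        transactions.append({"day": day, "ip": random.choice(ip_pool)})

per_ip = Counter(t["ip"] for t in transactions)
per_day = Counter(t["day"] for t in transactions)

# A per-IP velocity rule (e.g. >10 tx per IP per day) sees nothing:
print(max(per_ip.values()))   # far below any velocity threshold
# The multi-day aggregate holds 60 probes that no single-key rule ties together:
print(sum(per_day.values()))
```

The attack only becomes visible in the aggregate, which is the motivation for the window and distribution signals discussed below.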
Why BIN-Level Correlation Is the Critical Signal
When you pull back from transaction-level analysis and look at the BIN prefix distribution across a merchant's traffic over 7 days, card testing has a distinct fingerprint. A legitimate merchant will see BIN diversity that roughly tracks issuer market share in their geography: a large US issuer's Visa BINs show up proportionally to that issuer's card-in-wallet share among US consumers.
Card-testing batches come from dumps. Dumps are not random. They're typically from a single breach at a specific issuer, a specific retailer, or a specific geography. A single dump from a Southeast Asian processor breach will contain BINs heavily concentrated around 3–4 issuing banks. When those BINs suddenly appear at elevated rates across a merchant — even with low per-card velocity — the BIN distribution signal screams card testing.
InferX processes BIN-level distribution shifts as a feature in the fraud scoring model. A BIN cluster that represents 0.1% of normal traffic jumping to 8% of a 24-hour window is a strong signal regardless of per-IP or per-device velocity. This is one of the signals that velocity-only rule engines miss structurally.
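A minimal version of that distribution-shift feature can be sketched as follows. The prefix values, the ratio threshold, and the minimum-share floor are all hypothetical:

```python
from collections import Counter

# Feature sketch: share of traffic per BIN prefix in a recent window versus
# a trailing baseline. A cluster jumping from a fraction of a percent to
# several percent is the signal described above. Thresholds are illustrative.
def bin_share_shift(baseline_bins, recent_bins, min_ratio=10.0, min_share=0.02):
    base = Counter(baseline_bins)
    recent = Counter(recent_bins)
    base_total = sum(base.values())
    recent_total = sum(recent.values())
    flagged = {}
    for prefix, count in recent.items():
        share = count / recent_total
        # floor unseen prefixes at one baseline observation to avoid div-by-zero
        base_share = base.get(prefix, 0) / base_total or 1 / base_total
        if share >= min_share and share / base_share >= min_ratio:
            flagged[prefix] = (base_share, share)
    return flagged

# Baseline: prefix "457173" is 0.1% of traffic. Recent window: it jumps to 8%.
baseline = ["4147xx"] * 500 + ["5424xx"] * 499 + ["457173"] * 1
recent = ["4147xx"] * 500 + ["5424xx"] * 420 + ["457173"] * 80
print(bin_share_shift(baseline, recent))
```

Note that the check is entirely independent of per-IP or per-device counts, which is the point.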
Device Signal Spoofing: What's Been Bypassed
Browser-based device fingerprinting was a meaningful defensive layer in 2020–2021. Canvas fingerprinting, WebGL fingerprinting, font enumeration — these signals were hard to spoof at scale. That's no longer true.
Anti-fingerprint browsers (Multilogin, Kameleo, GoLogin) are cheap, automated, and specifically engineered to produce plausible, unique device fingerprints on each session. A fraud ring running 10,000 card tests will use an automated script that spins up a unique Multilogin browser profile for each transaction. Each profile produces a unique canvas hash, a unique font set, a unique WebGL renderer. The fingerprints look like real devices.
What still holds up is behavioral signal within the session. Legitimate users exhibit typing velocity, mouse movement patterns, form fill timing, and interaction sequences that emerge from human cognition and muscle memory. Automated testing scripts either produce too-perfect behavior (fixed-interval keystrokes, pixel-precise mouse movement) or too-random behavior (timing variance that looks statistically artificial). Behavioral biometrics applied to session data remains one of the harder signals to reliably spoof at scale.
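One simple instance of the too-perfect/too-random check is the coefficient of variation on inter-keystroke intervals. The CV band below is an illustrative assumption; production behavioral models use far richer features:

```python
import statistics

# Human inter-keystroke timing shows moderate, bursty variation; scripts tend
# toward near-zero variance (fixed intervals) or flat artificial jitter.
def keystroke_suspicion(intervals_ms, low_cv=0.15, high_cv=1.2):
    mean = statistics.mean(intervals_ms)
    cv = statistics.stdev(intervals_ms) / mean  # coefficient of variation
    if cv < low_cv:
        return "too_uniform"    # fixed-interval automation
    if cv > high_cv:
        return "too_random"     # statistically artificial jitter
    return "plausible"

human = [112, 180, 95, 240, 130, 310, 88, 150]       # bursty, pause-heavy
bot_fixed = [100, 100, 101, 100, 99, 100, 100, 100]  # scripted key injection
print(keystroke_suspicion(human), keystroke_suspicion(bot_fixed))
```

The hard part in practice is not this arithmetic but collecting the session telemetry reliably and cheaply enough to run it on every checkout.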
The Authorization Feedback Loop Problem
Card testing is only partially about finding live cards. It's also about learning which merchants have weak fraud controls. Attackers run small-scale tests across hundreds of merchant categories. The ones with high authorization rates despite obviously anomalous patterns get marked as soft targets. The attack volume then concentrates on those merchants.
This creates a selection pressure problem: merchants with weak fraud controls attract disproportionate card-testing volume, which inflates their chargeback ratios, which raises their processing costs or gets them placed into Visa's VAMP or Mastercard's ECP monitoring programs. The fraud problem is self-amplifying for the victims.
The implication for processors is that fraud detection has to be good enough to make the testing operation unprofitable, not just to block individual attacks. If an attacker can probe 10,000 cards at a merchant and successfully test 8,000 before the detection kicks in, that's still a profitable run. The detection threshold that matters is the one that makes the economics of the operation break down — typically below 5% successful tests per batch before the merchant looks "hard."
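The break-even arithmetic is worth making explicit. Every price in this sketch is a hypothetical assumption (raw number cost, per-test infrastructure cost, live-card resale value); where break-even actually lands depends entirely on those inputs:

```python
# Back-of-envelope attack economics. All prices are assumed, not sourced:
# card_cost = price per raw stolen number, test_cost = proxy/infra cost per
# attempt, live_resale = value of a confirmed-live card.
def batch_profit(n_cards, success_rate, card_cost=0.10,
                 test_cost=0.02, live_resale=40.0):
    revenue = n_cards * success_rate * live_resale
    cost = n_cards * (card_cost + test_cost)
    return revenue - cost

# At an 80% successful-test rate a 10,000-card run is wildly profitable;
# under these assumed prices, break-even is at 0.12 / 40 = 0.3% success.
print(batch_profit(10_000, 0.80))
print(batch_profit(10_000, 0.003))
```

The defender's target is not zero fraud; it is pushing the success rate under that break-even line so the merchant stops looking "soft".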
What Detection Infrastructure Needs to Do Differently
The detection architecture that actually catches 2025 card testing requires four things that rules engines don't provide.
First: cross-transaction correlation on a sliding window basis with configurable lookback periods. The attack spans days. You need to score each transaction against the aggregate pattern of the preceding 7 days, not just the last hour.
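A bare-bones version of that configurable-lookback window looks like this. The aggregate here (window count per BIN prefix) is a placeholder; a real model would emit many such aggregates as features:

```python
from collections import deque

# Sliding window with configurable lookback: each new transaction is scored
# against aggregates of the preceding N days, not just the last hour.
DAY = 86_400

class SlidingWindow:
    def __init__(self, lookback_days=7):
        self.lookback = lookback_days * DAY
        self.events = deque()  # (timestamp, bin_prefix)

    def observe(self, ts, bin_prefix):
        self.events.append((ts, bin_prefix))
        # evict everything older than the lookback horizon
        while self.events and ts - self.events[0][0] > self.lookback:
            self.events.popleft()
        # aggregate feature: window transactions sharing this prefix
        return sum(1 for _, p in self.events if p == bin_prefix)

w = SlidingWindow(lookback_days=7)
# 20 paced probes per day from one BIN cluster accumulate across days
counts = [w.observe(ts=day * DAY + i * 3600, bin_prefix="457173")
          for day in range(3) for i in range(20)]
print(counts[-1])  # the 7-day aggregate keeps growing: 60
```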
Second: BIN-level distribution monitoring. If BINs concentrated in a specific issuing region or issuer type appear at elevated rates relative to baseline, it's a signal independent of per-IP velocity.
Third: network graph relationships between cards, devices, and merchants. A device that appears across 50 different card numbers — even with plausible fingerprints — is structurally anomalous. Graph analysis surfaces this; per-transaction scoring misses it.
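The degree check at the heart of that graph signal is simple even if production graph analysis is not. Device and card labels here are synthetic, and the degree threshold is illustrative:

```python
from collections import defaultdict

# Minimal device-card bipartite graph: a device linked to many distinct card
# numbers is structurally anomalous regardless of how plausible each
# individual fingerprint looks.
def high_degree_devices(edges, max_cards_per_device=5):
    cards_by_device = defaultdict(set)
    for device, card in edges:
        cards_by_device[device].add(card)
    return {d for d, cards in cards_by_device.items()
            if len(cards) > max_cards_per_device}

edges = [("dev-a", f"card-{i}") for i in range(50)]       # one device, 50 cards
edges += [("dev-b", "card-100"), ("dev-b", "card-101")]   # normal shared usage
print(high_degree_devices(edges))  # {'dev-a'}
```

Richer versions walk multi-hop paths (card to device to other cards to other merchants), but even one-hop degree catches a lot that per-transaction scoring cannot.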
Fourth: authorization feedback incorporated into risk scores within minutes. When a card-test batch starts showing a pattern of declines at other merchants on the network, those declines should immediately elevate the risk score for the same cards at any merchant. This requires a shared signal layer across the processing network, not siloed per-merchant scoring.
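That shared signal layer can be sketched as a decline-feedback cache keyed by card token. The token format, TTL, and per-decline weight are all assumptions for illustration:

```python
import time
from collections import defaultdict

# Network-wide decline feedback: declines observed at any merchant raise the
# risk score for the same card token everywhere, with a TTL so stale
# signals decay.
class DeclineFeedback:
    def __init__(self, ttl_seconds=3600, weight=0.15):
        self.declines = defaultdict(list)  # card_token -> decline timestamps
        self.ttl = ttl_seconds
        self.weight = weight

    def record_decline(self, card_token, ts=None):
        self.declines[card_token].append(ts if ts is not None else time.time())

    def risk_boost(self, card_token, now=None):
        now = now if now is not None else time.time()
        recent = [t for t in self.declines[card_token] if now - t <= self.ttl]
        self.declines[card_token] = recent  # prune expired entries
        return min(1.0, self.weight * len(recent))

fb = DeclineFeedback()
for merchant_ts in (0, 60, 120, 300):   # declines at four other merchants
    fb.record_decline("tok-457173-9921", ts=merchant_ts)
print(fb.risk_boost("tok-457173-9921", now=600))  # ~0.6 boost from 4 declines
```

In a real deployment this sits in a low-latency shared store rather than process memory, but the scoring logic is the same shape.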
Practical Thresholds That Actually Work
Without publishing the exact numbers InferX uses internally (which would just give attackers calibration data), some practical observations from production deployments:
Velocity thresholds keyed to individual IPs are nearly useless for detecting distributed card testing. They're still worth keeping as a first-pass filter for unsophisticated attacks, but they should not be treated as a primary defense against experienced fraud operations. Threshold tuning time is better spent on BIN-cluster anomaly detection and graph relationship signals.
Amount-distribution anomaly per MCC is underused. Most merchants have a fairly predictable distribution of transaction amounts over a 30-day window. Card testing that mimics the merchant's typical ticket size still shows up as an anomalous spike in the lower percentiles of that distribution. A payment processor seeing 3x the normal volume in the $8–12 range at a merchant that typically does $40–200 tickets should flag that for review even if per-IP velocity looks clean.
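The per-MCC amount-band check from this paragraph reduces to comparing recent in-band volume against a 30-day baseline. The 3x multiplier and the $8–12 band follow the example above; both would be tuned per merchant category:

```python
# Flag when a low-dollar amount band spikes relative to the merchant's own
# 30-day baseline, even when per-IP velocity looks clean.
def low_band_spike(baseline_amounts, recent_amounts,
                   band=(8.0, 12.0), multiplier=3.0, baseline_days=30):
    in_band = lambda xs: sum(1 for a in xs if band[0] <= a <= band[1])
    baseline_daily = in_band(baseline_amounts) / baseline_days
    # floor the baseline at one tx/day so sparse merchants don't false-alarm
    return in_band(recent_amounts) > multiplier * max(baseline_daily, 1.0)

# Merchant normally does $40-200 tickets with a trickle of small charges
baseline = [75.0] * 900 + [10.0] * 30   # 30 days: about 1 in-band tx per day
recent = [75.0] * 30 + [9.99] * 12      # one day: 12 in-band txs
print(low_band_spike(baseline, recent))  # True
```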
Authorization rate by BIN prefix over a rolling 24-hour window is a simple signal that catches a lot. If a specific BIN prefix has a 90%+ authorization rate across the network yesterday but drops to 10% today at a specific merchant, that asymmetry is worth investigating. The BINs that card testers target are often ones where the issuer has recently tightened fraud controls — meaning the authorization rate drop is a real signal, not random noise.
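The asymmetry check itself is a one-liner once the rolling outcomes are collected. The rates and minimum gap below mirror the example in this paragraph and are illustrative:

```python
# Rolling auth-rate asymmetry: a BIN prefix that authorizes well network-wide
# but collapses at one merchant in the last 24h is worth investigating.
def auth_rate(outcomes):
    return sum(outcomes) / len(outcomes) if outcomes else None

def bin_auth_asymmetry(network_outcomes, merchant_outcomes, min_gap=0.5):
    network = auth_rate(network_outcomes)
    merchant = auth_rate(merchant_outcomes)
    if network is None or merchant is None:
        return False
    return network - merchant >= min_gap

# Prefix authorizes ~92% across the network, ~10% at one merchant today
network = [1] * 92 + [0] * 8     # 1 = authorized, 0 = declined
merchant = [1] * 2 + [0] * 18
print(bin_auth_asymmetry(network, merchant))  # True
```

The engineering cost is in maintaining the rolling per-prefix outcome counts across the network, not in the comparison.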
The Organizational Problem: Why Detection Often Lags
Most payment processors know their rule-based systems are insufficient for modern card testing. The bottleneck isn't knowledge — it's organizational. Fraud operations teams are resource-constrained. Updating rules requires review cycles. Adding a new ML feature requires data engineering time. The attack operations evolve faster than the internal process allows the defenses to adapt.
The fraud teams that are keeping pace are the ones that have reduced the cycle time between "we see a new attack pattern" and "we have a detection update deployed." That requires infrastructure where threshold changes deploy in minutes, not weeks, and where new features can be tested against historical data before pushing to production. The models matter less than the ability to iterate on them faster than the attackers iterate on their operations.
What to Check in Your Own Fraud Stack
If you're assessing your current card-testing detection, the questions worth asking: Does your scoring system look at cross-transaction patterns on a sliding window beyond 60 minutes? Do you have BIN-level distribution monitoring, or are you only tracking per-card or per-IP velocity? Do you have graph-based detection that correlates devices across multiple cards? How long does it take to deploy a threshold change — hours or weeks? Can you tell, right now, whether a card-testing campaign is running against any merchant on your network?
If the answer to any of those is "no" or "weeks," there's a gap worth closing. The attacks that are costing processors the most in 2025 are the ones that exploit exactly those gaps — not through technical sophistication alone, but through patience and calibration against detection limits that don't move fast enough to catch them.