The Real Cost of Over-Retention: How Bad Data Hygiene Is Killing Your Legal Hold Strategy

Insight

June 29, 2026

Keeping everything just in case feels responsible. In legal circles, it has been elevated almost to
doctrine – a reflexive instinct that hoarding data is the safe choice, that more preserved is more
protected, and that deletion is a risk no cautious organization should take.

But that instinct could be wrong. And it is costing organizations far more than they realize.
Here is the question most legal teams have not seriously asked: what if the biggest threat to
your next litigation is not what you deleted – but everything you kept?

The Assumption That Is Costing You Millions

Research from ISACA puts a striking number on what organizations are paying to hold onto
data they could legally delete: as much as $34 million in excess retention costs per
organization. That figure spans infrastructure – hardware, software, and cloud storage – but it
only begins to capture the real exposure. Layered on top are the collection, processing, and
review costs that multiply every time legal teams must work through years of irrelevant,
duplicated, and stale data to find what actually matters in a matter.

The deeper problem is not the financial line item. It is that over-retained data is not neutral. It is
an active liability. Every redundant email thread, every auto-generated Copilot summary, every
undisposed Slack conversation sitting in storage past its useful life is a document that opposing
counsel can request, that regulators can audit, and that your legal team must account for when
a hold is triggered. Volume without governance is not protection. It is exposure dressed up as
caution.

Where the "Keep Everything" Instinct Came From

It is worth being fair to how this mindset developed, because it was not irrational. Early
eDiscovery case law created real and immediate consequences for spoliation. Organizations
that deleted data too aggressively faced sanctions, adverse inference instructions, and in some
cases, catastrophic litigation outcomes. When the legal risk of deletion was concrete and the
cost of storage was falling, the calculus pointed clearly toward keeping everything.

That has fundamentally changed. A single online meeting now generates a recording, a
transcript, an AI-produced summary, and a set of action items – each stored across different
systems with different retention defaults. That is before accounting for the Slack threads
debating the agenda beforehand, the Teams channel where follow-ups were posted, the Copilot
outputs summarizing the discussion, and the email chain distributing the notes. Every meeting.
Every day. Across thousands of employees.

At this scale, broad preservation is no longer a strategy. It is an abdication of governance. When
everything is preserved, nothing is truly managed – and the legal hold that was supposed to
protect the organization becomes impossible to define, apply, or defend with any precision.

What Over-Retention Is Actually Costing You

The costs of over-retention land in four places, and most organizations are only tracking one of them. The financial costs are the most visible. Cloud storage and infrastructure expenses
compound as data volumes grow without corresponding deletion. Collection and processing
fees in litigation escalate when legal teams must comb through bloated repositories to identify
what is actually responsive. Review costs multiply when AI-powered tools are fed low-quality, redundant data sets that degrade their performance and require more human intervention to compensate.

The litigation exposure is less obvious but more dangerous. When data is not mapped or
managed, preservation becomes difficult to defend. Opposing counsel gains a broader target –
more data means more threads to pull, more inconsistencies to surface, more discovery
disputes to initiate. And over-broad preservation sends a signal to courts and regulators that the organization does not have a disciplined governance program – which is the opposite of the
good faith posture most legal teams are trying to project.

The regulatory and privacy risk is increasingly unavoidable. As of April 2026, 20 US states
have comprehensive consumer privacy laws in effect. Nearly all of them require documented
retention periods and enforceable deletion schedules. Under GDPR, retaining personal data
beyond the purpose for which it was collected is not just risky – it is a violation. Organizations that cannot demonstrate they are deleting data in accordance with their stated policies are not
just exposed in litigation. They are non-compliant by design.

The operational costs are quieter but persistent. Legal teams spend time and attention parsing
stale data instead of building case strategy. Excess data is difficult to mine for insights. And the AI-powered tools that are supposed to accelerate review and early case assessment perform
measurably worse when trained on or applied to bloated, low-signal data sets. Over-retention
does not just increase costs. It degrades capability.

The Shift: From Preserve Everything to Preserve Precisely

The answer to over-retention is not reckless deletion. It is disciplined, documented, policy-driven disposal – what the industry now calls defensible deletion. The distinction matters enormously
and it is worth being precise about it.

Defensible deletion is not the absence of preservation. It is the systematic application of
documented retention schedules, with documented sign-off, producing an auditable record that
the organization knew what it had, applied its policies consistently, and disposed of data in
accordance with legal and regulatory requirements. Done correctly, it is a stronger position than
indiscriminate retention – because it demonstrates governance rather than exposing its
absence.

Three principles anchor this shift in practice.

The first is knowing what you have.

Data mapping and classification are prerequisites, not optional enhancements. You cannot
define a hold, apply a retention schedule, or make a defensible deletion decision about data you have not located and characterized. The inventory comes first.

The second is defining what you need.

Retention schedules should be built around actual legal, regulatory, and operational
requirements – not around fear, organizational inertia, or the path of least resistance. That
means specifying retention periods by data category, aligning them to applicable obligations,
and reviewing them on a cadence that keeps pace with changing regulations and business
practices.

The third is deleting what you do not need.

Configuring tiered retention aligned to legal holds, expiring data that has exceeded its retention period, and documenting every disposal action with cross-functional sign-off is not a risk to be managed. It is a technical control that modern compliance programs depend on. The documentation of what was deleted, when, under what policy, and with whose authorization is what turns deletion from a liability into a defense.
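
The three principles above can be sketched in a few lines of Python. This is a minimal illustration, not a production disposal system: the category names, the retention periods, the `Record` and `DisposalLogEntry` types, and the rule that at least two functions must sign off are all assumptions made for the example. Real retention periods have to come from counsel and the applicable regulations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods by data category, in days.
# Placeholder values only; actual periods depend on legal and regulatory advice.
RETENTION_DAYS = {
    "chat_message": 365,
    "meeting_transcript": 730,
    "contract": 3650,
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    created: date
    custodian: str


@dataclass(frozen=True)
class DisposalLogEntry:
    record_id: str
    disposed_on: date
    policy: str          # which schedule entry authorized the disposal
    approved_by: tuple   # who signed off


def is_expired(record, today):
    """True once the record has outlived its category's retention period."""
    period = timedelta(days=RETENTION_DAYS[record.category])
    return today > record.created + period


def dispose_eligible(records, held_custodians, approvers, today):
    """Return (kept, log): the records that stay, and one audit entry per disposal.

    A record is disposed of only if it is expired, its custodian is not under
    a legal hold, and at least two sign-offs are present (an assumed rule).
    """
    kept, log = [], []
    for r in records:
        on_hold = r.custodian in held_custodians
        if is_expired(r, today) and not on_hold and len(approvers) >= 2:
            policy = f"{r.category}:{RETENTION_DAYS[r.category]}d"
            log.append(DisposalLogEntry(r.record_id, today, policy, tuple(approvers)))
        else:
            kept.append(r)
    return kept, log
```

Calling `dispose_eligible(records, {"bob"}, ["legal", "it"], date.today())` would keep everything belonging to the held custodian `bob`, regardless of age, and return a log entry for each expired record it removes. The log, not the deletion itself, is what makes the action defensible.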

What Good Looks Like

A data hygiene program that is genuinely protecting a legal hold strategy has a few identifiable characteristics – and none of them involve simply writing a policy and filing it away.

The retention schedule is documented, enforced, and auditable. Not aspirational. Policies that
exist only on paper provide no defense and create a credibility problem when an organization
claims it follows them.

Legal hold scope can be defined and custodians identified in hours, not days. This is only
possible when data is mapped and classified in advance of a matter. Organizations that start the mapping exercise when the hold is triggered are already behind.

Data locations are fully documented across on-premises storage, cloud platforms, collaboration
tools, and backup environments. The organization knows where its data lives before anyone
asks.

Holds are applied automatically and consistently, not dependent on individual employee action. Preservation is triggered by the system, not by hoping the right people received and understood the right email.

Legal, IT, and compliance review hold status on a regular cadence – not only when a matter
surfaces. Governance is ongoing, not reactive.

And deletion is defensible: logged, policy-aligned, and signed off across functions. The
organization can demonstrate not just that it deletes data, but that it deletes it correctly.
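
To make the point about scoping a hold in hours rather than days concrete, here is a rough sketch of what a pre-built data map allows. The `DataSource` type, its fields, and the `scope_hold` function are hypothetical names invented for this illustration, and a real data map would carry far more detail. The idea is simply that when sources are already inventoried by custodian and data category, hold scope becomes a lookup instead of a discovery project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    system: str            # e.g. "email" or "chat"; labels are illustrative
    location: str          # "on_prem", "cloud", "collaboration", or "backup"
    custodians: frozenset  # people whose data lives in this source
    categories: frozenset  # data categories stored in this source


def scope_hold(data_map, custodians, categories):
    """Return the sources holding in-scope categories for in-scope custodians."""
    custodians, categories = set(custodians), set(categories)
    return [
        src for src in data_map
        if src.custodians & custodians and src.categories & categories
    ]
```

Given a populated `data_map`, a call such as `scope_hold(data_map, ["alice"], ["contract"])` returns only the systems where that custodian's contract data actually lives, including backups if they were mapped. The sources it returns are the ones a system-driven hold would then be applied to.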

The Real Risk Is What You Kept

The fear that drove the keep-everything instinct was real. The landscape that made it
reasonable has changed completely. Data volumes have scaled beyond what broad
preservation can manage. Regulatory requirements have multiplied. AI tools have raised the
stakes on data quality. And the litigation exposure of over-retention has grown to match – and in
many cases exceed – the exposure of thoughtful, documented deletion.

The organizations that are best positioned for their next litigation are not the ones with the most
data. They are the ones that know exactly what they have, why they have it, and what they
deleted – and can prove all three.

Keeping everything is not a safety net. For most organizations in 2026, it is the risk.

What is over-retention and why is it a legal risk?

Over-retention is the practice of keeping data beyond its legal, regulatory, or operational need –
typically driven by a “keep everything just in case” instinct. It is a legal risk because retained
data is discoverable data. The more data an organization holds beyond its useful life, the
broader the target it presents to opposing counsel, regulators, and auditors. Over-retention also
signals poor governance, which can undermine an organization’s credibility when defending its
preservation practices in litigation.

How much does over-retention actually cost organizations?

Research from ISACA estimates that organizations can spend as much as $34 million
managing data they could legally delete. That figure covers infrastructure, cloud storage, and
software costs, but does not fully account for the downstream costs in litigation: inflated
collection, processing, and review fees when legal teams must work through years of irrelevant
data. The total cost – financial and legal – is almost always significantly higher than organizations
estimate.

Isn't deleting data just as risky as keeping it?

Not when deletion is defensible. Defensible deletion means disposing of data pursuant to a
documented, consistently applied retention policy – with an audit trail that records what was
deleted, when, under what authority, and with whose sign-off. Courts and regulators distinguish
between policy-driven deletion and reckless destruction. A well-documented deletion program is
a stronger position than indiscriminate retention, because it demonstrates that the organization
governs its data rather than simply accumulating it.

What is a retention schedule and how should one be built?

A retention schedule is a documented policy that defines how long specific categories of data
should be kept before being deleted or archived. It should be built around actual legal and
regulatory obligations – not organizational habit or fear. Each data category should have a
defined retention period tied to a specific requirement: a statutory limitation period, a regulatory
mandate, or a documented operational need. Schedules should be reviewed regularly and
updated when regulations change or new data sources are introduced.

How does over-retention affect legal holds specifically?

Over-retention expands the scope of every legal hold. When an organization has not applied
disciplined retention schedules, a hold triggered today may need to cover years of data across
dozens of platforms – much of it redundant, stale, or entirely irrelevant to the matter. This makes
preservation harder to define, slower to apply, more expensive to review, and more difficult to
defend. Organizations with strong data hygiene programs can scope, apply, and document
holds far more precisely and quickly.

What privacy regulations require data deletion?

As of April 2026, 20 US states have comprehensive consumer privacy laws in effect, most of
which require documented retention periods and enforceable deletion schedules for personal
data. GDPR requires that personal data be retained no longer than necessary for the purpose
for which it was collected – retaining it beyond that purpose is a violation, not just a risk.
Organizations that cannot demonstrate compliance with their stated retention policies face
regulatory exposure independent of any litigation.

Why does data quality matter for AI-powered review tools?

AI-powered review and early case assessment tools perform significantly better on clean,
well-governed data sets than on bloated repositories containing years of duplicated, irrelevant,
or low-signal content. Over-retained data degrades AI performance, increases the volume that
requires human review to compensate, and drives up costs. Organizations that invest in data
hygiene are not just reducing legal exposure – they are improving the quality and efficiency of
every AI-assisted workflow that depends on that data.

Where does Gemean fit into retention and data hygiene programs?

Gemean works with organizations to assess current data governance practices, identify
over-retention risks, and build retention and deletion frameworks that are defensible under
adversarial scrutiny. We approach retention not as a compliance checkbox but as a component
of a broader legal hold and eDiscovery strategy – one that affects how quickly and precisely an
organization can respond when a matter arises. Our forensic background means we understand
both the legal requirements and the technical architecture that makes defensible deletion possible.