# Validating Conjunction Screening Against Real CDM Data

We ran our conjunction screening engine against 100 real Conjunction Data Messages from the 18th Space Defense Squadron. Our median miss distance error was 1.24 km, and only 8.1% of our miss distances agreed with the CDM's value to within 50%.
Those numbers might look bad. They're not — they're physics. This post explains what we did, what we found, and why honest validation matters more than impressive-sounding benchmarks.
TL;DR: TLE-based screening finds the right events at roughly the right time, but miss distances diverge by kilometers. This is a fundamental limitation of public orbital data — not a software bug. We show exactly where the errors come from and what this means for different use cases.
## What Are CDMs, and Why Do They Matter?
A Conjunction Data Message (CDM) is the official warning issued when two objects in orbit are predicted to pass dangerously close. The 18th Space Defense Squadron (18 SDS), operating out of Vandenberg Space Force Base, generates these using the most accurate tracking data available to the U.S. military — radar observations, precision ephemerides, and sophisticated orbit determination pipelines that cost billions of dollars to build and operate.
CDMs follow a CCSDS standard format and contain the critical information operators need:
- Time of Closest Approach (TCA) — when the objects will be nearest
- Miss distance — how close they'll pass (in meters)
- Covariance matrices — the uncertainty envelope around each object's position
- Probability of collision (Pc) — the statistical likelihood of impact
- State vectors — precise position and velocity at TCA for both objects
CDMs are the gold standard for conjunction assessment. When a Starlink operator decides to maneuver a satellite, when the ISS crew prepares a debris avoidance maneuver — it's a CDM that triggers the decision. If you want to validate a conjunction screening system, CDMs are the ground truth you compare against.
The catch: CDMs use precision ephemerides — orbital data far more accurate than anything publicly available. We use Two-Line Element sets (TLEs), the public catalog data distributed by Space-Track. The question isn't whether our results will match exactly. The question is: how close can we get, and is it close enough to be useful?
## Our Validation Methodology
The approach is straightforward: fetch real CDMs, independently screen the same object pairs using our engine, and compare results. Here's how we did it.
### Step 1: Fetch Recent CDMs
We pulled the 100 most recent CDMs from Space-Track's `/cdm_public` endpoint, covering events from the previous 48 hours:

```python
import requests
from datetime import datetime, timedelta

session = requests.Session()
# ... authenticate with Space-Track ...

cutoff = datetime.utcnow() - timedelta(hours=48)
url = (
    "https://www.space-track.org/basicspacedata/query"
    "/class/cdm_public"
    f"/CREATION_DATE/>{cutoff:%Y-%m-%d}"
    "/orderby/CREATION_DATE desc"
    "/limit/100"
    "/format/json"
)
cdms = session.get(url).json()
print(f"Fetched {len(cdms)} CDMs")
```
Each CDM contains NORAD catalog IDs for both objects, the predicted TCA, and the miss distance. These become our reference values.
### Step 2: Fetch TLEs for Each Object
For every unique satellite in our CDM set, we fetch the most recent TLE from Space-Track. This is the same data our screening engine uses in production:
```python
def fetch_tle(norad_id, session):
    """Fetch the latest TLE for a NORAD catalog ID."""
    url = (
        "https://www.space-track.org/basicspacedata/query"
        "/class/gp"
        f"/NORAD_CAT_ID/{norad_id}"
        "/orderby/EPOCH desc"
        "/limit/1"
        "/format/tle"
    )
    resp = session.get(url)
    lines = resp.text.strip().splitlines()
    if len(lines) >= 2:
        return lines[0], lines[1]
    return None, None
```
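Each TLE pair then becomes a propagatable satellite object via SGP4's `Satrec.twoline2rv` constructor. A minimal sketch (25544, the ISS, is used purely as an example ID; the epoch attributes assume a recent `sgp4` release):

```python
from sgp4.api import Satrec

line1, line2 = fetch_tle(25544, session)  # 25544 = ISS, example only
if line1 is None:
    raise RuntimeError("no public TLE (e.g., a classified object)")

# This is how the sat1/sat2 objects used in Step 3 are constructed.
sat = Satrec.twoline2rv(line1, line2)
print(f"TLE epoch (Julian date): {sat.jdsatepoch + sat.jdsatepochF:.5f}")
```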
Of 100 CDMs involving ~180 unique objects, we successfully fetched TLEs for all but one. That missing object was a classified payload — it has a NORAD ID in the CDM system but no public TLE. This is expected; roughly 1-2% of tracked objects are classified.
### Step 3: Independent Screening
For each CDM, we propagate both objects' TLEs to the CDM's time window using SGP4, then find the closest approach:
```python
from datetime import timedelta

import numpy as np
from scipy.optimize import minimize_scalar
from sgp4.api import jday

def find_closest_approach(sat1, sat2, tca_estimate, window_min=30):
    """
    Find TCA and miss distance between two SGP4 satellites.
    Search ±window_min around the CDM's predicted TCA.
    """
    t0 = tca_estimate - timedelta(minutes=window_min)
    t1 = tca_estimate + timedelta(minutes=window_min)

    def distance_at_time(minutes_offset):
        t = t0 + timedelta(minutes=minutes_offset)
        jd, fr = jday(t.year, t.month, t.day,
                      t.hour, t.minute, t.second + t.microsecond / 1e6)
        e1, r1, _ = sat1.sgp4(jd, fr)
        e2, r2, _ = sat2.sgp4(jd, fr)
        if e1 != 0 or e2 != 0:
            return 1e12  # propagation error
        return np.linalg.norm(np.array(r1) - np.array(r2))  # km, TEME frame

    # Bounded scalar minimization. This assumes a single closest approach
    # inside the window, which holds when it is centered on the CDM's TCA.
    total_minutes = (t1 - t0).total_seconds() / 60
    result = minimize_scalar(
        distance_at_time,
        bounds=(0, total_minutes),
        method='bounded',
        options={'xatol': 1e-6},
    )
    best_time = t0 + timedelta(minutes=result.x)
    best_dist = result.fun  # km
    return best_time, best_dist
```
This gives us two numbers to compare against each CDM: our predicted TCA and our predicted miss distance.
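Putting Steps 2 and 3 together for one event looks roughly like this. This is a sketch, assuming `cdm_public` records expose `SAT_1_ID`, `SAT_2_ID`, and an ISO-formatted `TCA`; adjust the field names if the schema differs:

```python
from datetime import datetime

from sgp4.api import Satrec

results = []
for cdm in cdms:
    l1a, l2a = fetch_tle(cdm["SAT_1_ID"], session)
    l1b, l2b = fetch_tle(cdm["SAT_2_ID"], session)
    if l1a is None or l1b is None:
        continue  # e.g., a classified object with no public TLE
    sat1 = Satrec.twoline2rv(l1a, l2a)
    sat2 = Satrec.twoline2rv(l1b, l2b)
    cdm_tca = datetime.fromisoformat(cdm["TCA"])
    our_tca, our_dist_km = find_closest_approach(sat1, sat2, cdm_tca)
    results.append((cdm, our_tca, our_dist_km))
```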
### Step 4: Compare
For each of the 99 successfully screened events, we compute three error metrics (a short computation sketch follows the list):
- Miss distance error: |our_distance − cdm_distance| in km
- TCA error: |our_tca − cdm_tca| in seconds
- Relative error: miss distance error as a percentage of the CDM value
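A sketch of that per-event arithmetic, continuing from the `results` list built above (`MIN_RNG` is assumed to be the CDM's miss distance field, in kilometers):

```python
from datetime import datetime

import numpy as np

dist_err, tca_err, rel_err = [], [], []
for cdm, our_tca, our_dist_km in results:
    cdm_dist_km = float(cdm["MIN_RNG"])  # assumed field name and unit
    cdm_tca = datetime.fromisoformat(cdm["TCA"])
    dist_err.append(abs(our_dist_km - cdm_dist_km))
    tca_err.append(abs((our_tca - cdm_tca).total_seconds()))
    rel_err.append(100 * dist_err[-1] / cdm_dist_km)

print(f"Median miss distance error: {np.median(dist_err):.2f} km")
print(f"Median TCA error: {np.median(tca_err):.1f} s")
print(f"Agreement within 50%: {100 * np.mean(np.array(rel_err) <= 50):.1f}%")
```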
## Results
Here's what we found across 99 successfully screened conjunction events:
### Miss Distance Errors
| Metric | Value | Interpretation |
|---|---|---|
| Median absolute error | 1.24 km | Typical event is off by ~1.2 km |
| Mean absolute error | 4.27 km | Outliers pull the mean up significantly |
| Median relative error | 514% | The typical error is ~5× the CDM's miss distance |
| Mean relative error | 7,032% | Heavy-tailed; a few events are wildly off |
| Agreement (within 50%) | 8.1% | Only 8 of 99 events matched closely |
### TCA Errors
| Metric | Value | Interpretation |
|---|---|---|
| Median TCA error | 0.0 seconds | For most events, we nail the timing |
| Mean TCA error | 158.2 seconds | ~2.6 minutes; a few outliers skew this |
Key insight: We find the right events at the right time. Our TCA predictions are excellent — median error of zero seconds. But the miss distances diverge by kilometers. This is the signature of TLE-level orbital data: good enough to identify when objects converge, not precise enough to say exactly how close they pass.
### Error Distribution

The miss distance errors are heavily right-skewed. Most events cluster around 1–2 km of error, but a long tail extends to 10+ km for objects with stale or poorly fitted TLEs.
## Why 1.24 km Error Is Physics, Not a Bug
If you're building a conjunction screening tool and your miss distances are off by kilometers, your first instinct is to look for bugs. We did that. Extensively. The errors aren't bugs — they're the fundamental accuracy ceiling of TLE data. Here's why.
### What TLEs Actually Are
A Two-Line Element set is a mean orbital element set fitted to the SGP4/SDP4 analytical propagation model. It's not a precise snapshot of where a satellite is — it's a set of parameters that, when used with the SGP4 algorithm specifically, approximate the satellite's trajectory over a limited time window.
The 18th SDS generates TLEs by fitting observations to the SGP4 model using a least-squares process. This fitting absorbs some perturbations (J2 oblateness, atmospheric drag) but deliberately smooths out others. The result is a "mean" orbit that averages out short-period oscillations.
### Known TLE Error Sources
Peer-reviewed literature consistently reports these TLE accuracy characteristics:
| Error Source | Magnitude | Impact |
|---|---|---|
| Along-track (timing) error | 1–5 km at epoch | Object arrives early or late at conjunction point |
| Cross-track error | 0.5–2 km at epoch | Object's orbital plane slightly wrong |
| Radial error | 0.1–0.5 km at epoch | Object's altitude slightly wrong |
| Propagation degradation | ~1 km/day for LEO | Errors compound as TLE ages |
| Atmospheric drag uncertainty | Variable, can be large | Solar activity makes drag unpredictable |
The critical number is along-track error. LEO objects move at ~7.5 km/s. A timing error of just 0.2 seconds in predicting when an object reaches a point in its orbit translates to 1.5 km of position error. At conjunction, two objects may be approaching each other at 10–15 km/s relative velocity. The along-track errors of both objects combine, and the resulting miss distance estimate can easily be off by kilometers.
The math: With two objects each having ~2 km along-track error, the combined positional uncertainty at conjunction is on the order of √(2² + 2²) ≈ 2.8 km — before accounting for TLE age, drag uncertainty, or cross-track errors. Our observed median of 1.24 km is consistent with, and arguably better than, the theoretical expectation.
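The same back-of-the-envelope numbers, checked in two lines of code:

```python
import math

print(7.5 * 0.2)             # 0.2 s timing error at 7.5 km/s -> 1.5 km
print(math.hypot(2.0, 2.0))  # two ~2 km along-track errors -> ~2.83 km combined
```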
### TLE Age Matters — A Lot

Our dataset had a mean TLE age of 4.0 days. That means that, on average, the TLE we used to predict an object's position was fitted to observations from four days earlier. In that time:
- A LEO satellite completes roughly 60 orbits
- Along-track error grows by approximately 4 km (at ~1 km/day)
- Atmospheric drag fluctuations accumulate unpredictably
- Space weather (solar flux, geomagnetic storms) perturbs the orbit in ways SGP4 can't model
The 18th SDS, by contrast, uses special perturbations (SP) ephemerides — numerical integration of the full equations of motion with high-fidelity force models, freshly updated with recent radar and optical observations. Their position knowledge is typically 50–200 meters for tracked objects, compared to our 1–5 km with TLEs.
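Checking staleness is cheap, because every TLE carries its own epoch. A minimal sketch (the epoch attributes assume a recent `sgp4` release; the 3-day threshold matches the flagging rule on our roadmap below):

```python
from datetime import datetime, timezone

def tle_age_days(sat):
    """Days elapsed between a TLE's epoch and now."""
    epoch_jd = sat.jdsatepoch + sat.jdsatepochF  # epoch as a split Julian date
    # Julian date of the Unix epoch (1970-01-01T00:00:00Z) is 2440587.5.
    now_jd = 2440587.5 + datetime.now(timezone.utc).timestamp() / 86400
    return now_jd - epoch_jd

if tle_age_days(sat) > 3:
    print("Stale TLE: expect several km of along-track error")
```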
## Precision Ephemerides vs. TLEs: Two Different Worlds
Understanding the gap between our results and CDM-level accuracy requires understanding the two completely different orbit determination pipelines involved.
### What the Military Uses (SP Ephemerides)
- Observation sources: Global network of radars (AN/FPS-85, Space Fence), optical telescopes, and partner data from allies
- Force model: 70×70 gravitational harmonics, high-fidelity atmospheric drag with real-time solar flux data, lunar/solar third-body effects, solar radiation pressure, Earth tides, relativistic corrections
- Propagation: Numerical integration (Runge-Kutta or similar) at small time steps
- Update frequency: Multiple observations per day for objects of interest
- Accuracy: 50–200 meters for well-tracked objects
### What We Use (TLEs + SGP4)
- Observation sources: Same radar network, but data is heavily compressed into 2 lines of text
- Force model: J2–J4 zonal harmonics, simplified drag model (B* parameter), deep-space perturbations for HEO/GEO only
- Propagation: Analytical (closed-form equations), fast but approximate
- Update frequency: Every 1–5 days for most objects
- Accuracy: 1–5 km for LEO, worse for HEO/GEO
The military's pipeline costs billions and requires a global sensor network. TLEs are free, public, and fit in a text file. They solve different problems at different scales.
## Why Not Just Use Better Data?
Fair question. There are three tiers of orbital data accuracy:
- TLEs (public, free): 1–5 km accuracy. Available for ~48,000 tracked objects. Anyone can access them via Space-Track with a free account.
- Owner/operator ephemerides: 10–100 meter accuracy. Available only from the satellite operator (SpaceX, ESA, etc.). Not publicly shared for most objects.
- SP ephemerides (military): 50–200 meter accuracy. Available only to the 18th SDS and authorized partners. Classified for many objects.
For a public, open-source screening tool, TLEs are the only game in town for the full catalog. We can't access precision ephemerides for all 48,000 tracked objects — no one outside the military can. This is exactly why we validate honestly: so users understand what they're getting.
## What This Means for Different Use Cases
Km-level miss distance accuracy doesn't mean TLE-based screening is useless. It means you need to understand what it can and can't do.
### ✅ What TLE-Based Screening Is Good For
- Event detection: Finding that two objects will approach each other near a particular time. We got 99/100 events with a median TCA error of 0.0 seconds.
- Catalog-wide surveys: Screening all 48,000 objects against each other to identify close approach candidates. The "interesting" pairs bubble up even with km-level uncertainty.
- Trend analysis: Tracking how conjunction rates evolve over time, identifying orbital regimes with increasing congestion.
- Education and research: Understanding orbital mechanics, building SSA tools, training the next generation of astrodynamicists.
- Early warning / triage: Identifying events that warrant further analysis with precision data. Think of it as a "first filter."
- Debris field monitoring: After a breakup event, quickly assessing which operational satellites are at risk from the new debris cloud.
### ❌ What TLE-Based Screening Cannot Do
- Maneuver decisions: You should never decide to fire thrusters based solely on TLE-derived conjunction data. The 1–5 km uncertainty is larger than most avoidance maneuver thresholds.
- Collision probability: Computing meaningful Pc values requires covariance data that TLEs don't provide. Any Pc computed from TLE-only data is essentially meaningless.
- Precise miss distance prediction: As our validation shows, the miss distance will typically be off by a kilometer or more.
### Our Verdict
TLE-based screening is a detection and triage tool, not a decision-making tool. It tells you "these two objects are going to be in the same neighborhood at roughly this time" — not "they will pass within exactly 247 meters." For operators who need the latter, CDMs from the 18th SDS remain indispensable.
## What We're Doing About It
Accepting the physics doesn't mean we stop trying to improve. Here's what's on our roadmap:
- TLE age weighting: Down-rank or flag events where one or both TLEs are stale (>3 days old), since error grows with age.
- Multi-TLE propagation: Use the last 3–5 TLEs for an object to estimate along-track drift rate and correct for it.
- Operator ephemeris integration: For objects whose operators publish precision ephemerides (increasingly common with SpaceX, ESA), use those instead of TLEs.
- Covariance estimation: Derive realistic uncertainty bounds from TLE fit residuals and propagation time, giving users a confidence interval rather than a point estimate.
- Continuous validation: Run this CDM comparison nightly and publish a rolling accuracy dashboard. Transparency as a feature, not a one-off blog post.
## Reproducing This Analysis
Everything in this post is reproducible. You need:
- A Space-Track account (free)
- Python 3.9+ with `sgp4`, `numpy`, `scipy`, and `requests`
- ~30 minutes of compute time for 100 events
The full validation script is available in our GitHub repository under `scripts/validate_against_cdm.py`. We encourage others to run it, break it, and tell us what we got wrong. That's the point of open source.
```bash
# Quick start
git clone https://github.com/ncdrone/orbitguard.git
cd orbitguard
pip install -r requirements.txt

# Set your Space-Track credentials
export SPACETRACK_USER="[email protected]"
export SPACETRACK_PASS="your_password"

# Run validation
python scripts/validate_against_cdm.py --count 100
```
## Comparison With Other Public Tools
We're not the only TLE-based screening tool. How do others handle this accuracy gap?
| Tool | Data Source | Public Validation? | Approach |
|---|---|---|---|
| OrbVeil | TLEs (public) | Yes (this post) | Honest about limitations; detection + triage |
| LeoLabs | Proprietary radar | Partial | Commercial; own sensor network; sub-km accuracy |
| CARA (NASA) | SP ephemerides (via 18 SDS) | Internal | Precision screening for NASA assets |
| ESA CREAM | SP + operator ephemerides | Internal | Precision screening for ESA assets |
| Various academic tools | TLEs | Rarely | Research focus; validation against CDMs uncommon |
Most TLE-based tools quietly avoid validating against CDMs. We think that's a mistake. Users deserve to know the accuracy of the data they're relying on — especially when the topic is collision risk in space.
## Frequently Asked Questions
### Why is the median relative error 514% but you say the results are acceptable?
Because the CDM miss distances are often very small (hundreds of meters), so even a 1 km absolute error produces a large relative error. If the CDM says 200 meters and we say 1.2 km, that's a 500% relative error — but only 1 km absolute. In absolute terms, 1 km is consistent with known TLE accuracy. The relative error metric is misleading for small denominators.
### Can I use OrbVeil to protect my satellite from collisions?
OrbVeil can alert you to potential conjunction events and help you triage which ones deserve attention. But for maneuver decisions, you should always rely on CDMs from the 18th SDS (available through Space-Track) and/or a commercial SSA provider like LeoLabs or ExoAnalytic. Our tool is a first filter, not the final word.
### Why was 1 CDM out of 100 not screened?
One of the objects in that CDM was a classified payload — it has a NORAD catalog ID in the military's internal system, but no public TLE is published on Space-Track. This is normal. The 18th SDS tracks objects that aren't in the public catalog, and CDMs can reference them. Without a TLE, we can't propagate the orbit, so we skip it.
### Why is the median TCA error 0.0 seconds?
Two reasons. First, TCA is dominated by orbital period, which TLEs capture well — the orbital frequency is accurately encoded in the mean motion element. Second, our search window is centered on the CDM's TCA, giving our optimizer a head start. The along-track error affects where in the orbit the object is at TCA, which shifts the miss distance, but the time of geometric closest approach often remains stable because both objects' timing errors partially cancel.
### What would improve accuracy the most?
Fresher TLEs. Our mean TLE age was 4 days. If we could consistently get TLEs less than 24 hours old, we'd expect roughly a 4× reduction in along-track error. After that, the biggest gain would come from replacing TLEs entirely with operator-provided ephemerides for active satellites.
### Do these errors mean collisions are being missed?

Not exactly. TLE-based screening tends to overestimate miss distances (our median signed error is positive: we typically predict objects farther apart than the CDM says). This means we're more likely to miss a close approach than to falsely report one. For a triage tool that sits in front of the authoritative CDM pipeline, that's a tolerable failure mode: some events slip through our first filter, but the CDM system still catches them. The opposite bias would bury real events under false alarms and erode trust in the alerts that matter.
### How does this compare to the Iridium-Cosmos collision?

The 2009 Iridium 33 / Cosmos 2251 collision occurred at a miss distance of effectively 0 meters. The pre-event screening using TLE-class data showed them passing within several kilometers — which was not unusual enough to trigger an alert. This illustrates exactly why TLE-based screening alone is insufficient for collision avoidance: a "several km" pass is within the noise floor. Precision ephemerides narrow that uncertainty enough to make risk-based decisions.
### Will you run this validation continuously?
Yes. We're building an automated nightly pipeline that fetches CDMs, runs our screening, and publishes rolling accuracy metrics. When it's live, we'll link it here. Transparency is a feature, not a one-time PR exercise.
## Conclusion
We validated our conjunction screening engine against 100 real CDMs. We found what the physics predicts: excellent event detection (99/100), excellent timing (median TCA error: 0.0s), and kilometer-level miss distance uncertainty (median: 1.24 km). These results are consistent with published TLE accuracy literature and the fundamental limitations of analytical propagation with mean orbital elements.
We publish these results because we believe the space situational awareness community needs more honesty about what free, public data can and can't do. Too many tools present TLE-derived conjunction predictions with false precision, implying sub-kilometer accuracy that the underlying data simply cannot support.
OrbVeil's value isn't in replacing the 18th SDS or commercial SSA providers. It's in making catalog-wide screening accessible, transparent, and honest. Know the limits of your data. Validate against ground truth. Show your work.
That's what credibility looks like.