Margin of Error in PSG Respiratory and EEG Data
Polysomnography (PSG) has no single defined "margin of error" in the traditional sense. Reliability is instead expressed as interscorer agreement, which varies substantially with the parameter being scored and the scoring method used.
Respiratory Data Reliability
Interscorer Reliability for Respiratory Events
Interscorer agreement for respiratory events differs markedly by event type 1:
- Apnea scoring achieves excellent interscorer agreement (0.91 using nasal pressure sensors) 1
- Hypopnea scoring shows only moderate agreement (0.69), with significantly more variability than apnea detection 1
- Flow limitation events demonstrate the poorest reliability (0.64), with one scorer potentially identifying 35% more events than another 1
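To make the 35% figure concrete, the sketch below converts two scorers' event counts into hourly indices. The counts and the 8-hour sleep time are hypothetical, chosen only so that one scorer marks 35% more events than the other. `events_per_hour` is an illustrative helper, not part of any scoring software.

```python
def events_per_hour(event_count, total_sleep_minutes):
    """Convert an event count into an hourly index."""
    if total_sleep_minutes <= 0:
        raise ValueError("total sleep time must be positive")
    return event_count * 60.0 / total_sleep_minutes


# Hypothetical values for illustration only (not taken from the cited study).
total_sleep_minutes = 480
scorer_a_events = 40
scorer_b_events = 54  # 35% more events than scorer A

index_a = events_per_hour(scorer_a_events, total_sleep_minutes)
index_b = events_per_hour(scorer_b_events, total_sleep_minutes)
relative_excess = (scorer_b_events - scorer_a_events) / scorer_a_events

print(f"Scorer A: {index_a:.2f} events/hour")
print(f"Scorer B: {index_b:.2f} events/hour")
print(f"Scorer B marked {relative_excess:.0%} more events than scorer A")
```

In this hypothetical case the same recording yields 5.00 versus 6.75 events/hour, depending only on who scored it.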
Factors Affecting Respiratory Scoring Accuracy
The degree of scoring variability is influenced by 1:
- Duration of the respiratory event
- Degree of amplitude reduction in measured signals
- Level of oxyhemoglobin desaturation associated with events
- Presence and duration of accompanying arousals
Hypopneas associated with 2-5% desaturations can achieve interscorer reliability of 0.90, but this drops substantially when smaller desaturations or more subtle airflow reductions are involved 1.
Impact of Scoring Methodology
The choice of sensors and scoring approach significantly affects measurement accuracy 1, 2:
- Manual versus automated scoring: Manual scoring agrees better with PSG (kappa = 0.54) than automated scoring does (kappa = 0.10) 1
- Mean differences: Manual scoring shows mean differences of 3.5 ± 5.3 events/hour, versus 10.7 ± 8.5 events/hour with automated scoring 1
- One study reported that automated scoring differed by an average of 9 events per hour, compared with 2 events per hour for manual scoring 1
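A "mean difference ± spread" figure of this kind can be computed from paired indices as sketched below. This assumes the ± values are standard deviations of the paired differences, which the text does not state. The 1.96 × SD limits of agreement are the usual Bland-Altman convention, added here for illustration. All index values are hypothetical.

```python
from statistics import mean, stdev


def agreement_summary(reference, test):
    """Return (bias, sd, limits) for paired measurements.

    bias   : mean of (test - reference)
    sd     : sample standard deviation of the differences
    limits : (bias - 1.96 * sd, bias + 1.96 * sd)
    """
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical events/hour for five recordings (illustration only).
psg_index = [5.0, 12.0, 20.0, 33.0, 41.0]
manual_index = [7.0, 15.0, 22.0, 38.0, 43.0]

bias, sd, limits = agreement_summary(psg_index, manual_index)
print(f"Mean difference: {bias:.1f} ± {sd:.1f} events/hour")
print(f"Limits of agreement: {limits[0]:.1f} to {limits[1]:.1f} events/hour")
```

The bias shows whether a method systematically over- or under-counts, while the standard deviation shows how far an individual study may deviate from the reference.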
Sensor Selection Impact
The type of airflow sensor used affects detection sensitivity 2:
- Combined nasal pressure + thermal sensor (NP+Th) detects the most respiratory events overall 2
- For mild-moderate OSA (AHI <50): NP+Th detects significantly more events than either sensor alone, with NP alone detecting 54% and thermal sensor alone detecting only 42% of matched events (P<0.005) 2
- For severe OSA (AHI >50): All three methods (NP+Th, NP alone, Th alone) detect approximately 90% of events with similar reliability 2
EEG Data Reliability
Sleep Stage Scoring
Sleep staging shows excellent overall reliability, although agreement varies by stage 3:
- Overall interscorer and intrascorer reliability: Kappa statistics >0.80 (excellent) 3
- Stage 3/4 (deep sleep): Most reliably discriminated 3
- Stage 1 sleep: Shows the greatest scoring discrepancies 3
- Arousal index: Moderately reliable with intraclass correlation (ICC) of 0.54 3
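The kappa statistics above correct raw agreement for the agreement expected by chance. As a minimal sketch, assuming Cohen's kappa for two scorers (the text does not say which kappa variant was used), epoch-by-epoch agreement could be computed as follows. The stage labels and epoch sequences are hypothetical.

```python
from collections import Counter


def cohens_kappa(scorer_a, scorer_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(scorer_a)
    # Observed proportion of epochs on which the scorers agree.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Agreement expected by chance from each scorer's label frequencies.
    count_a = Counter(scorer_a)
    count_b = Counter(scorer_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical epoch labels from two scorers (illustration only).
scorer_a = ["W", "S1", "S2", "S2", "S3/4", "S3/4", "REM", "REM", "S2", "W"]
scorer_b = ["W", "S2", "S2", "S2", "S3/4", "S3/4", "REM", "REM", "S1", "W"]

kappa = cohens_kappa(scorer_a, scorer_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy sequence both disagreements involve Stage 1, which mirrors the pattern described above: raw agreement is 80%, but kappa is lower because some of that agreement would occur by chance.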
Respiratory Disturbance Indices
The reliability of calculated indices depends on the definition used 3:
- RDI with 2-5% desaturation criteria: Highly reliable (ICC >0.90) 3
- RDI without desaturation or arousal criteria: Moderately reliable (ICC = 0.74) 3
- RDI with arousal criteria added: Minimal improvement in reliability (ICC = 0.77) 3
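The ICC values above express how much of the total variance in an index reflects true differences between studies rather than disagreement between scorers. The sketch below assumes a one-way random-effects ICC for single ratings. The text does not specify which ICC form was used, so this is one plausible choice rather than the cited study's method. The RDI values are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    ratings: list of rows, one row per study, each row holding the
    index assigned by each of k scorers.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 studies, each scored by the same k >= 2 scorers")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-study and within-study (between-scorer) sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical RDI values: four studies, each scored by two scorers.
rdi_by_scorer = [[10.0, 12.0], [20.0, 22.0], [30.0, 28.0], [40.0, 42.0]]

icc = icc_oneway(rdi_by_scorer)
print(f"ICC: {icc:.3f}")
```

When scorers differ by only a couple of events per hour across studies whose indices span a wide range, as in this toy data, the ICC approaches 1. The same absolute disagreement across a narrower range of indices would give a lower ICC.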
Clinical Implications and Quality Control
Recommended Standards
To minimize variability, the American Academy of Sleep Medicine (AASM) recommends 1:
- Use of AASM-endorsed sensors (oronasal thermal sensor, nasal pressure transducer, respiratory inductance plethysmography) 1
- Manual scoring or manual editing of automated scoring by skilled personnel 1
- Review of raw data by board-certified sleep specialists 1
- Consistent scoring criteria following published AASM standards 1
Common Pitfalls
Automated scoring software changes frequently between versions, so published accuracy data may be obsolete for current commercial releases 1. In addition, in patients with marked respiratory muscle weakness, true obstructive events may be incorrectly scored as "central" on external sensors; accurate classification then requires esophageal pressure monitoring 1, 4.
False Negative Rates
False negative rates for portable monitoring may reach 17% in unattended studies of patients with high pretest probability. In-laboratory PSG is therefore needed when portable monitoring is technically inadequate, or when it is negative despite high clinical suspicion 1.
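For reference, a false negative rate is the share of patients who truly have the condition but test negative. The short sketch below shows the arithmetic with hypothetical counts chosen to reproduce a 17% rate; they are not data from the cited source.

```python
def false_negative_rate(true_positives, false_negatives):
    """Fraction of truly affected patients whose test result was negative."""
    affected = true_positives + false_negatives
    if affected <= 0:
        raise ValueError("need at least one truly affected patient")
    return false_negatives / affected


# Hypothetical counts: 100 patients with confirmed disease (illustration only).
detected = 83
missed = 17

fnr = false_negative_rate(detected, missed)
print(f"False negative rate: {fnr:.0%}")
print(f"Sensitivity: {1 - fnr:.0%}")
```

At a rate like this, roughly one in six affected patients would be missed by a single negative portable study.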