What is the margin of error for respiratory and electroencephalogram data on a polysomnography?


Last updated: February 12, 2026

Margin of Error in PSG Respiratory and EEG Data

Polysomnography (PSG) has no single defined "margin of error." Reliability is instead expressed as interscorer agreement, which varies substantially with the parameter being scored and the scoring method used.

Respiratory Data Reliability

Interscorer Reliability for Respiratory Events

Interscorer agreement for respiratory events varies markedly with the type of event being scored 1:

  • Apnea scoring achieves excellent interscorer agreement (0.91 using nasal pressure sensors) 1
  • Hypopnea scoring shows only moderate agreement (0.69), with significantly more variability than apnea detection 1
  • Flow limitation events demonstrate the poorest reliability (0.64), with one scorer potentially identifying 35% more events than another 1
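
To make the 35% figure concrete, the sketch below shows how a difference in event counts between two scorers carries through to an events-per-hour index. The counts, the 8-hour sleep time, and the function names are hypothetical values chosen for illustration, not data from the cited study.

```python
def relative_excess(count_high: int, count_low: int) -> float:
    """Fraction by which one scorer's event count exceeds the other's."""
    if count_low <= 0:
        raise ValueError("count_low must be positive")
    return (count_high - count_low) / count_low


def events_per_hour(event_count: int, sleep_hours: float) -> float:
    """Convert a raw event count into an hourly index."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return event_count / sleep_hours


# Hypothetical counts for one recording scored by two people.
scorer_a_events = 54
scorer_b_events = 40
sleep_hours = 8.0  # assumed total sleep time

excess = relative_excess(scorer_a_events, scorer_b_events)
index_a = events_per_hour(scorer_a_events, sleep_hours)
index_b = events_per_hour(scorer_b_events, sleep_hours)

print(f"Scorer A marked {excess:.0%} more events than scorer B")
print(f"Hourly index: {index_a:.2f} vs {index_b:.2f} events/hour")
```

With these made-up numbers, the same recording yields 6.75 events/hour from one scorer and 5.00 events/hour from the other, purely because of scoring differences.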

Factors Affecting Respiratory Scoring Accuracy

The degree of scoring variability is influenced by 1:

  • Duration of the respiratory event
  • Degree of amplitude reduction in measured signals
  • Level of oxyhemoglobin desaturation associated with events
  • Presence and duration of accompanying arousals

Hypopneas associated with 2-5% desaturations can achieve interscorer reliability of 0.90, but this drops substantially when smaller desaturations or more subtle airflow reductions are involved 1.

Impact of Scoring Methodology

The choice of sensors and scoring approach significantly affects measurement accuracy 1, 2:

  • Manual scoring versus automated scoring: Manual scoring agrees better with PSG (kappa = 0.54) than automated scoring does (kappa = 0.10) 1
  • Mean differences: Manual scoring shows mean differences of 3.5 ± 5.3 events/hour versus 10.7 ± 8.5 events/hour with automated scoring 1
  • One study reported that automated scoring differed by an average of 9 events per hour, compared with 2 events per hour for manual scoring 1
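
Mean-difference figures like those above come from comparing each method's events/hour against a reference value, recording by recording. The following is a minimal sketch, assuming the "±" value is a standard deviation of the paired differences (the source does not say). The function name, the sample values, and the 1.96 SD limits of agreement (a Bland-Altman style summary) are illustrative additions, not something the cited studies are stated to report.

```python
from statistics import mean, stdev


def paired_difference_summary(test_values, reference_values):
    """Return (bias, sd, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (test - reference); the limits are bias +/- 1.96 * sd.
    """
    if len(test_values) != len(reference_values):
        raise ValueError("inputs must have the same length")
    if len(test_values) < 2:
        raise ValueError("need at least two pairs to estimate a spread")
    differences = [t - r for t, r in zip(test_values, reference_values)]
    bias = mean(differences)
    sd = stdev(differences)  # sample standard deviation (n - 1 denominator)
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical events/hour from a test scoring method and a reference.
test_index = [10.0, 20.0, 30.0]
reference_index = [8.0, 21.0, 25.0]

bias, sd, lower, upper = paired_difference_summary(test_index, reference_index)
print(f"Mean difference: {bias:.1f} +/- {sd:.1f} events/hour")
print(f"95% limits of agreement: {lower:.2f} to {upper:.2f} events/hour")
```

A small bias with a wide spread can still mean large errors for an individual patient, which is why both numbers are usually reported together.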

Sensor Selection Impact

The type of airflow sensor used affects detection sensitivity 2:

  • Combined nasal pressure + thermal sensor (NP+Th) detects the most respiratory events overall 2
  • For mild-moderate OSA (AHI <50): NP+Th detects significantly more events than either sensor alone, with NP alone detecting 54% and thermal sensor alone detecting only 42% of matched events (P<0.005) 2
  • For severe OSA (AHI >50): All three methods (NP+Th, NP alone, Th alone) detect approximately 90% of events with similar reliability 2

EEG Data Reliability

Sleep Stage Scoring

Sleep staging shows excellent overall reliability, but not every stage is scored equally well 3:

  • Overall interscorer and intrascorer reliability: Kappa statistics >0.80 (excellent) 3
  • Stage 3/4 (deep sleep): Most reliably discriminated 3
  • Stage 1 sleep: Shows the greatest scoring discrepancies 3
  • Arousal index: Moderately reliable with intraclass correlation (ICC) of 0.54 3
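
The kappa statistic quoted above corrects raw percent agreement for the agreement two scorers would reach by chance. As a minimal sketch, Cohen's kappa for two scorers labeling the same epochs can be computed as follows. The stage labels and the eight-epoch example are hypothetical, and the cited studies may have used a different kappa variant.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical epoch-by-epoch stages from two scorers (one disagreement).
scorer_a = ["W", "S1", "S2", "S2", "S3/4", "REM", "S2", "S1"]
scorer_b = ["W", "S2", "S2", "S2", "S3/4", "REM", "S2", "S1"]

print(f"kappa = {cohens_kappa(scorer_a, scorer_b):.3f}")
```

In this made-up example the scorers agree on 7 of 8 epochs (87.5%), yet kappa is lower than that because some of the agreement is expected by chance.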

Respiratory Disturbance Indices

The reliability of calculated indices depends on the definition used 3:

  • RDI with 2-5% desaturation criteria: Highly reliable (ICC >0.90) 3
  • RDI without desaturation or arousal criteria: Moderately reliable (ICC = 0.74) 3
  • RDI with arousal criteria added: Minimal improvement in reliability (ICC = 0.77) 3
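
For continuous indices such as the RDI, agreement is summarized with an intraclass correlation coefficient rather than kappa. The source does not state which form of ICC was used, so the sketch below assumes the simplest one, a one-way random-effects ICC for single measurements. The function name and the RDI values are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single measurements.

    ratings: list of rows, one per recording, each holding the values
    assigned by k scorers (every row must have the same k >= 2).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two recordings")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each recording needs the same number (>= 2) of scores")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-recording and within-recording sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical RDI values (events/hour): three recordings, two scorers each.
rdi_scores = [[10.0, 12.0], [20.0, 18.0], [30.0, 33.0]]
print(f"ICC = {icc_oneway(rdi_scores):.3f}")
```

The ICC approaches 1 when differences between scorers are small relative to differences between recordings, so the same absolute scorer disagreement produces a lower ICC in a sample where patients have similar RDI values.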

Clinical Implications and Quality Control

Recommended Standards

To minimize variability, the American Academy of Sleep Medicine recommends 1:

  • Use of AASM-endorsed sensors (oronasal thermal sensor, nasal pressure transducer, respiratory inductance plethysmography) 1
  • Manual scoring or manual editing of automated scoring by skilled personnel 1
  • Review of raw data by board-certified sleep specialists 1
  • Consistent scoring criteria following published AASM standards 1

Common Pitfalls

Be aware that automated scoring software is frequently modified with version updates, making published accuracy data potentially obsolete for current commercial versions 1. Additionally, in patients with marked respiratory muscle weakness, true obstructive events may be incorrectly scored as "central" on external sensors, requiring esophageal pressure monitoring for accurate classification 1, 4.

False Negative Rates

False negative rates for portable monitoring may reach 17% in unattended studies of patients with a high pretest probability. In-laboratory PSG is therefore needed when portable monitoring is technically inadequate, or negative despite high clinical suspicion 1.
