Monday, March 19, 2012

Average, Median, 95% numbers: A guide to those who don't remember their introductory stats

Managing OSA with a full data CPAP/APAP machine involves a tremendous number of numbers. The indices (AHI, AI, CAI, OAI, HI, RERA I, etc.) are best understood as the typical number of events of the given type that are expected to occur in one hour of sleep. The computation of these numbers on a PSG is discussed in "Understanding the data in your sleep test." The way our machines compute our overnight AHI (and other indices) is remarkably simple:
  • AHI = (number of apneas and hypopneas detected) / (total time the machine was run)
  • AI = (number of apneas detected) / (total time the machine was run)
  • OAI = (number of obstructive apneas detected) / (total time the machine was run)
  • CAI = (number of clear airway apneas detected) / (total time the machine was run)
  • HI = (number of hypopneas detected) / (total time the machine was run)
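Since each index is just a count divided by run time, the arithmetic is easy to sketch. The Python below is a minimal illustration with made-up event counts and a 7.5 hour run time; the function name `event_index` is hypothetical. It assumes every apnea is scored as either obstructive or clear airway, so AI = OAI + CAI and AHI = AI + HI (a machine that also reports apneas it could not classify would count those toward AI and AHI only).

```python
def event_index(event_count, run_time_hours):
    """Events per hour: number of events divided by machine run time in hours."""
    if run_time_hours <= 0:
        raise ValueError("run time must be positive")
    return event_count / run_time_hours


# Hypothetical night: made-up counts, 7.5 hours of machine run time.
run_time_hours = 7.5
obstructive_apneas = 6
clear_airway_apneas = 3
hypopneas = 12

oai = event_index(obstructive_apneas, run_time_hours)
cai = event_index(clear_airway_apneas, run_time_hours)
ai = event_index(obstructive_apneas + clear_airway_apneas, run_time_hours)
hi = event_index(hypopneas, run_time_hours)
ahi = event_index(obstructive_apneas + clear_airway_apneas + hypopneas, run_time_hours)

print(f"AHI={ahi:.1f} AI={ai:.1f} OAI={oai:.1f} CAI={cai:.1f} HI={hi:.1f}")
# AHI=2.8 AI=1.2 OAI=0.8 CAI=0.4 HI=1.6
```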
But the way leak data and pressure data are presented in ResScan, Encore Viewer or Pro, and SleepyHead involves a bit of elementary statistics. Resmed users see median and 95th percentile numbers for both the leak data and the pressure data in ResScan, and 95% numbers on the S9's LCD. PR users see average leak rates and 90th percentile pressure levels in Encore. Users of SleepyHead can choose between median and weighted average for "middle" computations.

And just what the heck do these numbers actually mean?

What they mean, of course, depends on how they are computed. This blog entry focuses on how these various numbers are computed for a given set of data. The differences between median and (weighted) average are best understood by looking at what are called discrete sets of data. A discrete set of data is nothing more than a list of numbers. Once the data becomes continuous, the ideas are the same, but the computations become more complicated. So to make this easier to understand, I'll stick with examples that use discrete data. Let's look at a specific example. Suppose our data set looks like this:
 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 8.4, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 9.5, 10, 10, 10, 10, 10, 10.1, 10.1, 10.1, 10.1, 10.1, 10.1, 10.1, 10.1, 15.3, 15.3
There are 100 numbers on this list. The 50th and 51st numbers on the list are both 9s, the 90th number is a 10, and the 95th number is a 10.1.

The median for this set of numbers is 9 because half of the data is AT or BELOW this number and half of the data is AT or ABOVE this number. (Technically speaking, when a list has an even number of entries, the median is the average of the two middle numbers; here those are the 50th and 51st numbers, which are both 9.)

The 90th percentile (90%) for the data is 10 because at least 90% of the data is AT or BELOW 10, since the 90th number on our list of 100 numbers is a 10. And note that at least 10% of the data is AT or ABOVE 10 because the last ten numbers on the list are all at least as big as that 10.

The 95th percentile (95%) for the data is 10.1 because at least 95% of the data is AT or BELOW 10.1, since the 95th number on our list of 100 numbers is a 10.1. And note that at least 5% of the data is AT or ABOVE 10.1 because the last five numbers on the list are all at least as big as 10.1.
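The position counting above is essentially the nearest-rank definition of a percentile: sort the data, then take the value at position ceil(p × n / 100). The Python sketch below rebuilds the 100-number list from the counts in the example and reproduces the median, 90% and 95% values. The helper name `nearest_rank_percentile` is made up for this sketch, and ResScan, Encore and SleepyHead may well use other percentile definitions (some interpolate between neighboring values), so treat this as an illustration of the idea rather than what any particular program does.

```python
import math
import statistics


def nearest_rank_percentile(data, p):
    """Value at position ceil(p * n / 100) in the sorted data (1-based positions)."""
    ordered = sorted(data)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


# The 100-number example: 10 x 8.4, 45 x 9, 30 x 9.5, 5 x 10, 8 x 10.1, 2 x 15.3
data = [8.4] * 10 + [9] * 45 + [9.5] * 30 + [10] * 5 + [10.1] * 8 + [15.3] * 2

print(len(data))                          # 100
print(statistics.median(data))            # 9.0, the average of the 50th and 51st values
print(nearest_rank_percentile(data, 90))  # 10, the 90th value
print(nearest_rank_percentile(data, 95))  # 10.1, the 95th value
```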

The weighted average of the numbers is found by multiplying each distinct number by the fraction of data points that have that value and then adding up the results. Since we've got 100 data points in our example, the fraction is just the number of times a given number appears in our list divided by 100. (Weighting each value by how often it occurs gives exactly the same result as the ordinary average: add up all 100 numbers and divide by 100.) If you count carefully, you find that the weighted average of this set of data is 9.354, which is found by this calculation:


(0.1 * 8.4) + (0.45 * 9) + (0.3 * 9.5) + (0.05 * 10) + (0.08 * 10.1) + (0.02 * 15.3) = 9.354
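The same calculation is easy to check in Python. This sketch (the function names are made up for illustration) stores the example as value/count pairs, computes the weighted average by weighting each value by its share of the 100 data points, and compares it with the ordinary mean of all 100 numbers listed one by one. The two agree because multiplying each value by its count and adding is the same as adding every individual data point.

```python
# (value, how many times it appears) for the 100-number example
counts = {8.4: 10, 9: 45, 9.5: 30, 10: 5, 10.1: 8, 15.3: 2}


def weighted_average(value_counts):
    """Multiply each value by its share of the data points and add the results."""
    total = sum(value_counts.values())
    return sum(value * (count / total) for value, count in value_counts.items())


def plain_mean(value_counts):
    """Add up every data point individually and divide by how many there are."""
    points = [value for value, count in value_counts.items() for _ in range(count)]
    return sum(points) / len(points)


print(round(weighted_average(counts), 3))  # 9.354
print(round(plain_mean(counts), 3))        # 9.354
```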

So for this set of data we get the following statistical numbers:
  • Median = 9
  • 90% = 10
  • 95% = 10.1
  • average = 9.354
A lot of people will assume that the average and the median are in the same neighborhood.  But that really depends on the spread of the data.  In general, the average can be less than the median OR it can equal the median OR it can be greater than the median.  Indeed, in data sets with a very limited number of distinct numbers where one low number predominates, the average can even be greater than the 90% or even the 95% numbers.  For example, consider this set of data with 100 points:
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,  10, 10, 10

The 50th and 51st data points are both 0s, so the median is 0.

The 90th data point is a 0, so the 90% number is also 0.
The 95th data point is a 1, so the 95% number is 1.


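But the average is 0.35. The 0s contribute nothing, the five 1s make up 5% of the data, and the three 10s make up 3% of the data, so the average is (0.05*1) + (0.03*10) = 0.35. That is larger than both the median and the 90% number. If the three 10s were 100s instead, the average would be (0.05*1) + (0.03*100) = 3.05, which is larger than even the 95% number.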
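A short Python sketch can check the second example and a variant with bigger outliers. The zero count is an assumption: the text says the list has 100 points and shows five 1s and three 10s, so the sketch fills the rest with 92 zeros. As before, `nearest_rank_percentile` and `summarize` are hypothetical helper names, and nearest-rank is only one of several percentile definitions.

```python
import math
import statistics


def nearest_rank_percentile(data, p):
    """Value at position ceil(p * n / 100) in the sorted data (1-based positions)."""
    ordered = sorted(data)
    rank = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[rank - 1]


def summarize(data):
    """Median, 90%, 95% and average for a list of numbers."""
    return {
        "median": statistics.median(data),
        "90%": nearest_rank_percentile(data, 90),
        "95%": nearest_rank_percentile(data, 95),
        "average": statistics.mean(data),
    }


# Assumption: 100 points made of five 1s, three 10s, and 0s for the rest.
skewed = [0] * 92 + [1] * 5 + [10] * 3
print(summarize(skewed))       # average 0.35 beats the median and 90% (both 0)

# Same shape, but the three 10s replaced by 100s.
more_skewed = [0] * 92 + [1] * 5 + [100] * 3
print(summarize(more_skewed))  # average 3.05 beats even the 95% number (1)
```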

Now to tie this directly back to CPAP numbers:


  • The median leak rate is 25 L/min: This means that for 50% of the night, your leak rate was AT or BELOW 25 L/min. And for 50% of the night, your leak rate was AT or ABOVE 25 L/min. (Whether a leak rate of 25 L/min is good or not depends on whether your machine is reporting total leaks or unintentional leaks. The PR System One machines report total leaks, and 25 L/min is going to be very, very good for most masks at pressures of up to about 8 or 9 cm H2O. The Resmed machines report unintentional leaks, and 25 L/min is above the Red Line of 24 L/min, which means the leak rate is bad.)
  • The 90% leak rate is 35 L/min.  This means that for 90% of the night, your leak rate was AT or BELOW 35 L/min and for 10% of the night your leak rate was AT or ABOVE 35 L/min.
  • The 95% leak rate is 40 L/min. This means that for 95% of the night, your leak rate was AT or BELOW 40 L/min and for 5% of the night your leak rate was AT or ABOVE 40 L/min.
  • The average leak rate is 28.89 L/min. Remember that this tells you nothing about how much time you spent with a leak rate that was AT or BELOW 28.89 L/min, because there is no concrete relationship between the average of a set of numbers and percentiles for the data.
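To see why the average says nothing definite about time spent at or below it, here is a small Python sketch with two made-up nights of leak data. It assumes the leak rate is recorded at equally spaced times, so the share of readings at or below a value equals the share of the night at or below that value; the function name is hypothetical and this is not how any particular viewer computes its statistics. Both nights average 30 L/min, yet one spends 90% of the night at or below that average and the other only 50%.

```python
def fraction_of_night_at_or_below(samples, threshold):
    """Share of equally spaced readings that are at or below threshold.

    With one reading per equal slice of time, this is also the share of the
    night spent at or below the threshold.
    """
    if not samples:
        raise ValueError("need at least one reading")
    return sum(1 for s in samples if s <= threshold) / len(samples)


# Two hypothetical nights of leak readings (L/min), one reading per equal time slice.
night_a = [20] * 9 + [120]      # mostly low, with one large leak
night_b = [25] * 5 + [35] * 5   # evenly split around 30

for name, night in (("night_a", night_a), ("night_b", night_b)):
    average = sum(night) / len(night)
    share = fraction_of_night_at_or_below(night, average)
    print(f"{name}: average {average:.1f} L/min, at or below it {share:.0%} of the night")
```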
Pressure numbers are similar for APAPs in Auto mode:
  • The median pressure is 9.5 cm H2O: This means that for 50% of the night, your pressure was AT or BELOW 9.5 cm H2O. And for 50% of the night, your pressure was AT or ABOVE 9.5 cm H2O.
  • The 90% pressure is 11 cm H2O.  This means that for 90% of the night, your pressure was AT or BELOW 11 cm H2O and for 10% of the night your pressure was AT or ABOVE 11 cm H2O.
  • The 95% pressure is 11.5 cm H2O. This means that for 95% of the night, your pressure was AT or BELOW 11.5 cm H2O and for 5% of the night your pressure was AT or ABOVE 11.5 cm H2O.
  • The average pressure is 9.65 cm H2O.  Remember this tells you nothing about how much time you spent with a pressure that was AT or BELOW 9.65 cm H2O because there is no concrete relationship between the average of a set of numbers and percentiles for the data.
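Pressure on an APAP is naturally summarized as time spent at each pressure. Assuming that kind of summary, the "AT or BELOW for p% of the night" reading can be computed by sorting the pressures and adding up minutes until the running total reaches p% of the night. The Python below is a sketch under that assumption, not a claim about how ResScan, Encore or SleepyHead compute their numbers; the function name is hypothetical, and the minutes are made up so that the median, 90% and 95% results match the example above.

```python
def time_weighted_percentile(time_at_pressure, p):
    """Lowest pressure at which the cumulative time reaches at least p% of the night.

    time_at_pressure: list of (pressure in cm H2O, minutes at that pressure).
    """
    ordered = sorted(time_at_pressure)
    total = sum(minutes for _, minutes in ordered)
    cumulative = 0
    for pressure, minutes in ordered:
        cumulative += minutes
        # Same test as cumulative / total >= p / 100, without dividing.
        if cumulative * 100 >= p * total:
            return pressure
    return ordered[-1][0]


# Hypothetical APAP night: 480 minutes spread over these pressures.
night = [(7.0, 60), (8.5, 36), (9.0, 84), (9.5, 60),
         (10.0, 96), (11.0, 96), (11.5, 24), (12.0, 24)]

for p in (50, 90, 95):
    print(f"{p}%: {time_weighted_percentile(night, p)} cm H2O")
```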

5 comments:

  1. What's the meaning of Average Pressure in SleepyHead Statistics?

    See it in my post here:

    http://www.cpaptalk.com/viewtopic.php?f=1&t=80013&p=728174#p728174

    avi123

    1. Avi,

      According to JediMark, the "averages" in SH are weighted averages. Loosely speaking, to get the "Average pressure" you multiply each pressure setting by the total time at that pressure. You add all these numbers together and divide by the total time for the night. To put it in more mathematical language: You divide the area trapped under the Pressure curve (and above the horizontal axis) by the length of the night. In other words, it's an old formula you may remember from your college calculus class:

      Average Pressure = (Integral from a to b of P(t))/(b - a)

      where:

      P(t) = Pressure at time t
      a = Start time for the night
      b = End time for the night
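      When the pressure curve only changes in steps, the integral in this formula is just a sum of rectangles (pressure times how long it was held). The Python below is a minimal sketch under that assumption; it is not SleepyHead's actual code, and the function name and the (time, pressure) input format are made up for illustration.

```python
def time_weighted_average(changes, end_time):
    """Area under a step-shaped P(t), divided by (b - a).

    changes: (time, pressure) pairs sorted by time; each pressure holds until
    the next change. The first time is a (start); end_time is b (end).
    """
    if not changes or end_time <= changes[0][0]:
        raise ValueError("need at least one change before end_time")
    area = 0.0
    next_times = [t for t, _ in changes[1:]] + [end_time]
    for (t0, pressure), t1 in zip(changes, next_times):
        area += pressure * (t1 - t0)  # rectangle: height * width
    return area / (end_time - changes[0][0])


# Hypothetical night, times in hours after the machine was started.
changes = [(0.0, 8.0), (1.0, 9.0), (3.0, 10.0), (5.0, 9.0)]
print(time_weighted_average(changes, end_time=8.0))  # 9.125
```

      Here the area is 8×1 + 9×2 + 10×2 + 9×3 = 73 cm H2O·hours, and 73 / 8 hours = 9.125 cm H2O.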

    2. And I should add:

      Geometrically speaking, the Average Pressure is the height of a rectangle that has the same length as the Pressure Graph and the same area as the area trapped under the Pressure curve.

  2. Your description of "weighted average" does not match what I learned in stats. Instead, what you are describing is the mean or regular average. Mean or average = (day 1 + day 2 + day 3 + ... + day n) / n for discrete data and the integral you describe for continuous data. For discrete data, this is mathematically identical to multiplying each value by the % of time that it occurs; this is the distributive law of arithmetic. A weighted average is used when some days are more important than others, for example: (1/28)*day 1 + (2/28)*day 2 + (3/28)*day 3 + (4/28)*day 4 + (5/28)*day 5 + (6/28)*day 6 + (7/28)*day 7, where the weights add up to 1.

    Daniel

  3. I think you wrote 95% Leak Rate where you meant to say 95% Pressure, not leaks. Doug

    Here: Pressure numbers are similar for APAPs in Auto mode:
    The 95% leak rate is 11.5 cm H2O. This means that for 95% of the night, your pressure was AT or BELOW 11.5 cm H2O and for 5% of the night your pressure was AT or ABOVE 11.5 cm H2O..
