[gradsusr] Standard deviation in GrADS

Charles Seman Charles.Seman at noaa.gov
Fri Dec 17 16:50:56 EST 2010


Sudev,

Don't know if you've received a reply yet?

Following, for example "Statistical Concepts and Methods" by 
Bhattacharyya and Johnson (1977), the sample variance s**2 for a set of 
"N" observations of a variable "xi" (x1,...,xN) is defined as:

    s**2 = [(x1-xbar)**2 + (x2-xbar)**2 + (x3-xbar)**2 +....
          + (xN-xbar)**2 ]/(N-1) ,  (from page 34)

  where the sample mean = xbar = (x1+x2+x3+...+xN)/N , (from page 27)

  Note, the sample variance sums over "N" observations, and divides the 
sum by "N-1"...

  The sample standard deviation is defined as the square root of the 
sample variance (from page 35):

    s = sqrt(s**2)

Using the GrADS "ave()" function, ave(pow(sst-tave,2),t=1,t=12) would 
not divide by "N-1", where "N" is the number of points going into the 
time average.  A possible workaround is to do something like (see also 
Example 3 at: http://grads.iges.org/grads/gadoc/gradfuncconst.html)

set t 1 12
define ones = const(sst,1) ;* this defines a field of "ones" for all 
non-missing data only, for all time levels
set t 1
define tave = ave(sst,t=1,t=12) ;* this defines a field of the 
time-averaged sst
define nptsm1 = sum(ones,t=1,t=12) - 1 ;* this defines a field of the 
sum over time of the "ones" field and subtracts "1"
define sdsst = sqrt(sum(pow(sst-tave,2),t=1,t=12)/nptsm1)

Note, please use the above with caution!  I think that if "nptsm1" = 0 
at a point, GrADS makes the result of dividing by zero undefined (ie 
missing).  One way you could test it is by doing

set t 1 12
define ones = const(sst,1)
set t 1
define tave = ave(sst,t=1,t=12) ;* this defines a field of the 
time-averaged sst
define npts = sum(ones,t=1,t=12)
define sdsst = sqrt(sum(pow(sst-tave,2),t=1,t=12)/npts)

and compare with your original method to check for consistency with the 
"ave()" function...

I did some test calculations in GrADS on a local dataset which contains 
missing data, and found no difference between using the "sum()/npts" and 
"ave()" methods but some differences using the "sum()/nptsm1" method 
(screen output shown below, and note, "..." indicates output not 
included here (either command errors or command results not relevant for 
illustration)):

------------------------------------------------------------------
...
Grid Analysis and Display System (GrADS) Version 1.9b4
Copyright (c) 1988-2005 by Brian Doty and IGES
Center for Ocean-Land-Atmosphere Studies (COLA)
Institute for Global Environment and Society (IGES)
GrADS comes with ABSOLUTELY NO WARRANTY
See file COPYRIGHT for more information

Config: v1.9b4 32-bit little-endian readline sdf/xdf netcdf lats athena 
printim

Issue 'q config' command for more information.

Landscape mode? (no for portrait): 
GX Package Initialization: Size = 11 8.5
ga-> open CER_SRBAVG1_Terra-FM1-Edition2D.GEO-Rev1.ctl
Scanning description file:  CER_SRBAVG1_Terra-FM1-Edition2D.GEO-Rev1.ctl
...
ga-> set gxout shaded
ga-> d csw
Contouring: 20 to 180 interval 20
ga-> c
ga-> set t 1 60
Time values set: 2000:3:1:0 2005:2:1:0
...
ga-> define ones = const(csw,1)
Define memory allocation size = 15638400 bytes
ga-> d ones
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Constant field.  Value = 1
Press enter to continue:::::
ga-> set t 1
Time values set: 2000:3:1:0 2000:3:1:0
ga-> define nptsm1 = sum(ones,t=1,t=12) - 1
SUMing.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
ga-> define nptsm1 = sum(ones,t=1,t=60) - 1
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
Name already DEFINEd:  nptsm1.   Will be deleted and replaced.
ga-> d nptsm1
Contouring: 10 to 55 interval 5
ga-> define npts = sum(ones,t=1,t=60)   
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
ga-> d npts-nptsm1
Constant field.  Value = 1
ga-> d npts
Contouring: 10 to 60 interval 5
ga-> define tave = ave(csw,t=1,t=60)
Averaging.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
ga-> define nptsm1 = sum(ones,t=1,t=60) - 1
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
Name already DEFINEd:  nptsm1.   Will be deleted and replaced.
ga-> define npts = sum(ones,t=1,t=60)
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
Name already DEFINEd:  npts.   Will be deleted and replaced.
ga-> d npts-nptsm1
Constant field.  Value = 1
...
ga-> define sdnm1csw = sqrt(sum(pow(csw-tave,2),t=1,t=60)/nptsm1)
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
ga-> d sdnm1csw
Contouring: 20 to 140 interval 20
ga-> run cbar
ga-> d tave
Contouring: 40 to 180 interval 20
...
ga-> define sdncsw = sqrt(sum(pow(csw-tave,2),t=1,t=60)/npts)
SUMing.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
ga-> c
ga-> d sdncsw
Contouring: 10 to 150 interval 10
ga-> d sdncsw-sdnm1csw
Contouring: -2.7 to -0.3 interval 0.3
ga-> run cbar
ga-> c
ga-> define sd2ncsw = sqrt(ave(pow(csw-tave,2),t=1,t=60))
Averaging.  dim = 3, start = 1, end = 60
Define memory allocation size = 260640 bytes
ga-> d sd2ncsw
Contouring: 10 to 150 interval 10
ga-> d sd2ncsw-sdncsw
Constant field.  Value = 0
ga-> d npts
Contouring: 10 to 60 interval 5
ga-> d nptsm1
Contouring: 10 to 55 interval 5
ga-> set gxout stat
ga-> d npts
...
Undef count = 0  Valid count = 65160
Min, Max = 10 60
...
ga-> d nptsm1
...
Undef count = 0  Valid count = 65160
Min, Max = 9 59
...
ga-> set gxout shaded
ga-> set t 1 12
Time values set: 2000:3:1:0 2001:2:1:0
ga-> define ones = const(csw,1)
Define memory allocation size = 3127680 bytes
Name already DEFINEd:  ones.   Will be deleted and replaced.
ga-> set t 1
Time values set: 2000:3:1:0 2000:3:1:0
ga-> define nptsm1 = sum(ones,t=1,t=12) - 1
SUMing.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  nptsm1.   Will be deleted and replaced.
ga-> define npts = sum(ones,t=1,t=12)   
SUMing.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  npts.   Will be deleted and replaced.
ga-> define tave = ave(csw,t=1,t=12)
Averaging.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  tave.   Will be deleted and replaced.
ga-> d npts-nptsm1
Constant field.  Value = 1
ga-> define sdnm1csw = sqrt(sum(pow(csw-tave,2),t=1,t=12)/nptsm1)
SUMing.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  sdnm1csw.   Will be deleted and replaced.
ga-> define sdncsw = sqrt(sum(pow(csw-tave,2),t=1,t=12)/npts)
SUMing.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  sdncsw.   Will be deleted and replaced.
...
ga-> define sd2ncsw = sqrt(ave(pow(csw-tave,2),t=1,t=12))
Averaging.  dim = 3, start = 1, end = 12
Define memory allocation size = 260640 bytes
Name already DEFINEd:  sd2ncsw.   Will be deleted and replaced.
ga-> d sd2ncsw-sdncsw
Constant field.  Value = 0
ga-> d sdnm1csw-sdncsw
Contouring: 5 to 35 interval 5
ga-> d (sdncsw-sdnm1csw)/sdnm1csw
Contouring: -0.27 to -0.06 interval 0.03
ga-> set gxout stat
ga-> d tave
...
Undef count = 6  Valid count = 65154
Min, Max = 9.96517 207.509
...
ga-> d nptsm1    
...
Undef count = 6  Valid count = 65154
Min, Max = 0 11
...
ga-> d sd2ncsw      
...
Undef count = 6  Valid count = 65154
Min, Max = 0 155.331
...
ga-> d sdnm1csw      
...
Undef count = 46  Valid count = 65114
Min, Max = 0.0914958 164.753
...
------------------------------------------------------------------

the three different standard deviation formulas for 60 time levels:

ga-> define sdnm1csw = sqrt(sum(pow(csw-tave,2),t=1,t=60)/nptsm1) <- the 
"sum()/nptsm1" method where if all points are non-missing the divisor 
would here be "n=59"
...
ga-> define sdncsw = sqrt(sum(pow(csw-tave,2),t=1,t=60)/npts) <- the 
"sum()/npts" method where if all points are non-missing the divisor 
would here be "n=60"
...
ga-> define sd2ncsw = sqrt(ave(pow(csw-tave,2),t=1,t=60)) <- the "ave()" 
method where if all points are non-missing the divisor would also here 
be "n=60" (similar to what you sent below)

and for 12 time levels:

ga-> define sdnm1csw = sqrt(sum(pow(csw-tave,2),t=1,t=12)/nptsm1)
...
ga-> define sdncsw = sqrt(sum(pow(csw-tave,2),t=1,t=12)/npts)
...
ga-> define sd2ncsw = sqrt(ave(pow(csw-tave,2),t=1,t=12))

I found "sd2ncsw-sdncsw = 0" as shown in above screen output, and the 
differences between the "sum()/nptsm1" and "sum()/npts" methods became 
larger as the number of time levels went from 60 to 12... Note, the Min 
= 0 for "nptsm1" for the 12 time-level case (see above), with an 
increase in the number of undefined points (see sd2ncsw, sdnm1csw output 
above).  You could try similar calculations on your local dataset for 
comparison...

Note, the "sum()/nptsm1" method would be the method to use for the 
sample standard deviation according to "Statistical Concepts and 
Methods" by Bhattacharyya and Johnson (1977).

I hope this helps.  Please let me know if you find errors and/or have 
questions...

Thanks,
Chuck

sudev das M P wrote:
> Hai all,
>
> I am calculating standard deviation in grads using the following formula.
>
> I  request you please tell me whether I am calculating sd in the right 
> way.
>
> Is there any other way to calculate sd in GrADS.
> =============================================
> define tave = ave(sst,t=1,t=12)
> define sdsst = sqrt(ave(pow(sst-tave,2),t=1,t=12))
> d sdsst
> =============================================
>
> Thanks,
> sudev
> ------------------------------------------------------------------------
>
> _______________________________________________
> gradsusr mailing list
> gradsusr at gradsusr.org
> http://gradsusr.org/mailman/listinfo/gradsusr
>   

-- 

Please note that Charles.Seman at noaa.gov should be considered my NOAA
email address, not cjs at gfdl.noaa.gov.

********************************************************************
 Charles Seman                                Charles.Seman at noaa.gov
 U.S. Department of Commerce / NOAA / OAR
 Geophysical Fluid Dynamics Laboratory         voice: (609) 452-6547
 201 Forrestal Road                              fax: (609) 987-5063
 Princeton, NJ  08540-6649            http://www.gfdl.noaa.gov/~cjs/
********************************************************************

"The contents of this message are mine personally and do not reflect any
official or unofficial position of the United States Federal Government,
the United States Department of Commerce, or NOAA."




More information about the gradsusr mailing list