[gradsusr] Performance hit with data descriptor files?

Jennifer Adams jma at cola.iges.org
Wed Jul 14 10:59:33 EDT 2010


On Jul 14, 2010, at 10:21 AM, Colarco, Peter R. (GSFC-6133) wrote:

> Hi Jennifer-
>
> I updated to opengrads 2.0.a8.oga.1.  The problem I reported seems  
> to have been solved here.
Great! I suspected that would be the case...

>  I don't understand why: whether I sdfopen a file directly or  
> xdfopen a templated DDF, the same data file must be opened and  
> uncompressed.  Presumably xdfopen-ing the DDF is doing something a  
> little more than just that.
The timing of when a data file is opened and closed for data I/O and  
metadata I/O is different for sdfopen, xdfopen, and open (with dtype  
netcdf). For non-templated data sets (sdfopen), the file is opened once  
for metadata gathering and remains open for further I/O requests. Not  
so for data sets templated with xdfopen: a data file is opened,  
metadata is gathered, and the file is closed; then, when you make an  
I/O request, the appropriate file is opened again and I/O begins. With  
open (dtype netcdf), the required metadata is provided in the  
descriptor, so no data file needs to be opened for metadata gathering;  
files are only opened to fulfill I/O requests.
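
As a rough illustration of the three sequences described above, here is a toy Python model that only records simulated open and close events. It is a sketch, not GrADS source code: the FileLog class, the function names, and the placeholder file name are all hypothetical, and it says nothing about how long each open actually takes.

```python
# Toy model of the three open paths described above.
# NOT GrADS source code: every name here is hypothetical.

class FileLog:
    """Records simulated open/close events so the sequences can be compared."""

    def __init__(self):
        self.events = []
        self.currently_open = set()

    def open(self, name, why):
        # Opening a file that is already open is a no-op in this model.
        if name not in self.currently_open:
            self.currently_open.add(name)
            self.events.append(f"open {name} ({why})")

    def close(self, name):
        self.currently_open.discard(name)
        self.events.append(f"close {name}")


def sdfopen_then_read(log, name):
    # Non-templated: opened once for metadata, stays open for data I/O.
    log.open(name, "metadata")
    log.open(name, "data")  # no new open: the file is still open
    return log.events


def xdfopen_template_then_read(log, first_file, wanted_file):
    # Templated: open for metadata, close, then reopen the file that
    # holds the requested time when the I/O request arrives.
    log.open(first_file, "metadata")
    log.close(first_file)
    log.open(wanted_file, "data")
    return log.events


def open_dtype_netcdf_then_read(log, wanted_file):
    # Metadata comes from the descriptor, so the only open is for data.
    log.open(wanted_file, "data")
    return log.events


def count_opens(events):
    return sum(1 for event in events if event.startswith("open"))


f = "first_file.nc4"  # placeholder name
for label, events in [
    ("sdfopen", sdfopen_then_read(FileLog(), f)),
    ("xdfopen + template", xdfopen_template_then_read(FileLog(), f, f)),
    ("open (dtype netcdf)", open_dtype_netcdf_then_read(FileLog(), f)),
]:
    print(f"{label}: {count_opens(events)} open(s): {events}")
```

In this model the templated xdfopen path is the only one that opens the same file twice before any data is returned, so any fixed cost per open would be paid twice there.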

I never worked with version 4.0 of the netcdf library, so I can't say  
whether it is to blame for the delay in doing all that file I/O.  I do  
know that version 4.1 of the library had a lot of changes to improve  
performance for large compressed files that were essentially  
unreadable under the older caching paradigms.
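
To give a sense of scale for the caching question, the short Python sketch below works out the uncompressed size of one chunk of aodtau from the _ChunkSizes and float type shown in the ncdump -hs output quoted further down. The cache sizes it loops over are illustrative assumptions only, not the defaults of any particular netCDF or GrADS build.

```python
# Chunk shape and element size come from the quoted ncdump -hs output:
#   float aodtau(time, levels, latitude, longitude)
#   aodtau:_ChunkSizes = 1, 1, 721, 1152
CHUNK_SHAPE = (1, 1, 721, 1152)
BYTES_PER_FLOAT = 4  # netCDF "float" is 32-bit

MIB = 2 ** 20


def chunk_bytes(shape, itemsize):
    """Uncompressed size of one chunk in bytes."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


def chunks_that_fit(cache_bytes, shape, itemsize):
    """How many whole uncompressed chunks a cache of this size can hold."""
    return cache_bytes // chunk_bytes(shape, itemsize)


size = chunk_bytes(CHUNK_SHAPE, BYTES_PER_FLOAT)
print(f"one chunk = {size} bytes = {size / MIB:.2f} MiB")

# Hypothetical cache sizes, chosen only to show the effect.
for cache_mib in (1, 4, 32):
    n = chunks_that_fit(cache_mib * MIB, CHUNK_SHAPE, BYTES_PER_FLOAT)
    print(f"{cache_mib:>2} MiB cache holds {n} whole chunk(s)")
```

Each chunk here is one full horizontal grid, a little over 3 MiB once uncompressed, so a cache smaller than that cannot hold even a single chunk.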

Jennifer

>
> Thank you,
> Pete
>
>
>
>
> On Jul 13, 2010, at 8:11 AM, Jennifer Adams wrote:
>
>> Pete,
>> The short answer is that you should upgrade to 2.0.a8. A whole lot  
>> of critical changes were implemented in the newer versions (of  
>> GrADS and the netCDF library) to improve performance when handling  
>> compressed netcdf files. Please read the doc page on compression  
>> (http://iges.org/grads/gadoc/compression.html) for a full  
>> explanation of the factors at play. When the cache  
>> settings are appropriate (which they are almost certainly not in  
>> your build), you will find that the I/O for compressed netcdf can  
>> be a whole lot faster than classic netcdf and flat binary data sets.
>> --Jennifer
>>
>>
>> On Jul 12, 2010, at 9:26 PM, Colarco, Peter R. (GSFC-6133) wrote:
>>
>>> Hi-
>>>
>>> Apologies, here is the requested information.
>>>
>>> Thanks,
>>> Pete
>>>
>>>
>>> OS:
>>> % uname -a
>>> Linux redacted.gsfc.nasa.gov 2.6.32-gentoo-r7 #2 SMP Thu Jul 8 11:39:03 EDT 2010 x86_64 Intel(R) Xeon(R) CPU X5460 @ 3.16GHz GenuineIntel GNU/Linux
>>>
>>> File Information:
>>> % ncdump -hs /path/to/file/Y2003/M01/MYD04_L2_ocn.aero_tc8_051.qawt.20030101.nc4
>>> netcdf MYD04_L2_ocn.aero_tc8_051.qawt.20030101 {
>>> dimensions:
>>>         time = UNLIMITED ; // (8 currently)
>>>         levels = 7 ;
>>>         longitude = 1152 ;
>>>         latitude = 721 ;
>>> variables:
>>>         double time(time) ;
>>>                 time:units = "hours since 2003-1-1 1" ;
>>>                 time:_Storage = "chunked" ;
>>>                 time:_ChunkSizes = 1 ;
>>>         double levels(levels) ;
>>>                 levels:units = "hPa" ;
>>>                 levels:description = "Pressure level" ;
>>>                 levels:type = "plev" ;
>>>                 levels:long_name = "Level" ;
>>>                 levels:positive = "down" ;
>>>                 levels:_Storage = "contiguous" ;
>>>         double longitude(longitude) ;
>>>                 longitude:units = "degrees_east" ;
>>>                 longitude:long_name = "Longitude" ;
>>>                 longitude:_Storage = "contiguous" ;
>>>         double latitude(latitude) ;
>>>                 latitude:units = "degrees_north" ;
>>>                 latitude:long_name = "Latitude" ;
>>>                 latitude:_Storage = "contiguous" ;
>>>         float aodtau(time, levels, latitude, longitude) ;
>>>                 aodtau:comments = "Unknown1 variable comment" ;
>>>                 aodtau:long_name = "aodtau" ;
>>>                 aodtau:units = "" ;
>>>                 aodtau:grid_name = "grid01" ;
>>>                 aodtau:grid_type = "linear" ;
>>>                 aodtau:time_statistic = "instantaneous" ;
>>>                 aodtau:missing_value = 1.e+15f ;
>>>                 aodtau:_Storage = "chunked" ;
>>>                 aodtau:_ChunkSizes = 1, 1, 721, 1152 ;
>>>                 aodtau:_DeflateLevel = 2 ;
>>>
>>> // global attributes:
>>>                 :Conventions = "COARDS" ;
>>>                 :calendar = "standard" ;
>>>                 :comments = "File" ;
>>>                 :model = "geos/das" ;
>>>                 :center = "gsfc" ;
>>>                 :_Format = "netCDF-4" ;
>>> }
>>>
>>>
>>>
>>> ga-> q config
>>> Config: v2.0.a7.oga.3 little-endian readline printim grib2 netcdf hdf4-sds hdf5 opendap-grids,stn athena geotiff
>>> Grid Analysis and Display System (GrADS) Version 2.0.a7.oga.3
>>> Copyright (c) 1988-2009 by Brian Doty and the
>>> Institute for Global Environment and Society (IGES)
>>> This program is distributed WITHOUT ANY WARRANTY
>>> See file COPYRIGHT for more information.
>>>
>>> Built Thu Oct 29 17:57:08 EDT 2009 for x86_64-unknown-linux-gnu
>>>
>>> This version of GrADS has been configured with the following options:
>>>   o Built on a LITTLE ENDIAN machine
>>>   o Command line editing ENABLED
>>>       http://tiswww.case.edu/php/chet/readline/rltop.html
>>>   o printim command for image output ENABLED
>>>       http://www.zlib.net
>>>       http://www.libpng.org/pub/png/libpng.html
>>>       http://www.libgd.org/Main_Page
>>>   o GRIB2 interface ENABLED
>>>       http://www.ijg.org
>>>       http://www.ece.uvic.ca/~mdadams/jasper
>>>       http://www.nco.ncep.noaa.gov/pmb/codes/GRIB2
>>>       g2clib-1.0.5
>>>   o NetCDF interface ENABLED
>>>       http://www.opendap.org
>>>       libnc-dap 4.0.1-beta3-snapshot2009021712 of Mar  3 2009 14:13:33 $
>>>   o HDF interface ENABLED
>>>       http://hdfgroup.org
>>>       HDF 4.2r3
>>>       HDF5 1.8.2
>>>   o Athena Widget GUI ENABLED
>>>   o OPeNDAP gridded data interface ENABLED
>>>       http://www.opendap.org
>>>       libdap 3.7.10
>>>   o OPeNDAP station data interface ENABLED
>>>       http://iges.org/grads/gadoc/supplibs.html
>>>       libgadap 2.0.oga.1
>>>   o GeoTIFF and KML output ENABLED
>>>       http://www.libtiff.org
>>>       http://geotiff.osgeo.org
>>>
>>> For additional information please consult http://iges.org/grads
>>>
>>> On Jul 12, 2010, at 4:51 PM, Jennifer Adams wrote:
>>>
>>>> Could you please send the output from ncdump -hs (the s is for  
>>>> compression info, since you have netcdf-4 files). Also send the  
>>>> output from 'q config' and the specs of the OS you are running  
>>>> on. These things are important for solving every problem.
>>>>
>>>> --Jennifer
>>>>
>>>>
>>>> On Jul 12, 2010, at 4:42 PM, Colarco, Peter R. (GSFC-6133) wrote:
>>>>
>>>>> Hi-
>>>>>
>>>>> I am running opengrads 2.0.a7.oga.3/x86_64.
>>>>>
>>>>> I have a very simple data descriptor file (below) to template a  
>>>>> set of large-ish files (gridded 1152 x 721 x 7 points).  When I  
>>>>> simply "sdfopen" the first file and plot the first variable,  
>>>>> first time, and first level, I get my result essentially  
>>>>> immediately.  When I instead "xdfopen" the data descriptor file  
>>>>> and try to plot the same field, it takes some time (~10 sec) to  
>>>>> plot.  When I repeat the same sort of thing but operate on  
>>>>> several times, the difference in performance between the two  
>>>>> methods is of course very noticeable.
>>>>>
>>>>> The issue seems to be in the "options template" line below;  
>>>>> if I take that out and explicitly put in the first file, I can  
>>>>> "xdfopen" the data descriptor file and plot the first variable  
>>>>> essentially instantly.  Additional metadata in the data  
>>>>> descriptor file (e.g., from make_ctl.sh) does not help or matter  
>>>>> as far as I can tell.
>>>>>
>>>>> Can anyone suggest a solution to this problem?  Am I doing  
>>>>> something weird or wrong here?
>>>>>
>>>>> Data Descriptor File:
>>>>>
>>>>> dset /path/to/my/data/Y2003/M01/MYD04_L2_ocn.aero_tc8_051.qawt.%y4%m2%d2.nc4
>>>>> options template
>>>>> tdef time 248 linear 0z01jan2003 3hr
>>>>>
>>>>> Thank you,
>>>>> Pete Colarco
>>>>>
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------
>>>>> Peter Colarco
>>>>> NASA GSFC
>>>>> Code 613.3
>>>>> NASA Goddard Space Flight Center
>>>>> Greenbelt, MD 20771
>>>>> 301.614.6382 (ph)
>>>>> 301.614.5903 (fax)
>>>>>
>>>>> peter.r.colarco at nasa.gov
>>>>> http://hyperion.gsfc.nasa.gov/People/Colarco
>>>>> --------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> gradsusr mailing list
>>>>> gradsusr at gradsusr.org
>>>>> http://gradsusr.org/mailman/listinfo/gradsusr
>>>>
>>>> --
>>>> Jennifer M. Adams
>>>> IGES/COLA
>>>> 4041 Powder Mill Road, Suite 302
>>>> Calverton, MD 20705
>>>> jma at cola.iges.org

--
Jennifer M. Adams
IGES/COLA
4041 Powder Mill Road, Suite 302
Calverton, MD 20705
jma at cola.iges.org


