[gradsusr] Performance Tips

Jennifer M Adams jadams21 at gmu.edu
Mon Feb 15 14:53:21 EST 2016


On Feb 15, 2016, at 2:13 PM, Christopher Gilroy <chris.gilroy at gmail.com> wrote:

Jennifer,

I'm trying to speed up processing (each hour is a separate script call, using args) and I'm wondering if there's anything out of the ordinary with the way we're doing our 10:1 snowfall:

define snowtot = const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u)

d snowtot

You could do this all at once, like this:

'set t 2 'tlast
'define snowtot = tloop(const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u))'
t = 2
while (t <= tlast)
  'set t 't
  'd snowtot'
  t = t + 1
endwhile
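One note: the snippet assumes the script variable tlast is already set. A common way to pick it up from the file metadata (a sketch, assuming file 1 is open and the usual 'q file' output layout in GrADS 2.x, where line 5 holds the grid sizes):

'q file'
* line 5 looks like: Xsize = ... Ysize = ... Zsize = ... Tsize = ... Esize = ...
sizes = sublin(result, 5)
tlast = subwrd(sizes, 12)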

Without using tloop(), you can still take advantage of the caching. You set t fixed, then when you define an expression that contains more than one grid, e.g. weasdsfc-weasdsfc(t-1), both grids at t=2 and t=1 get cached. On your next time step, you set t=3 and display your expression again, but the grid for t=2 is already cached, so GrADS only needs to read the one grid for t=3.
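In script form, that incremental approach could look like this (a sketch; it reuses your snowtot expression and assumes tlast is set as shown above):

t = 2
while (t <= tlast)
  'set t 't
* the sum re-runs each step, but all grids except the newest come from the cache
  'define snowtot = const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u)'
  'd snowtot'
  t = t + 1
endwhile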

Either way, you only end up reading each grid once.

However, if you quit GrADS in between each time step, you don’t take advantage of the caching, and the whole operation takes longer because at each time step you are re-reading every grid. If you are doing this calculation while the files are downloading, and you don’t have the data for t=3 at the time you are drawing the grid for t=2, then what you have is probably the best you can do.

Using fwrite to write out grids at each time step and re-reading them in again adds overhead and complexity, and if it doesn’t improve your overall performance, it’s probably not worth the bother. Try to take advantage of caching as much as you can.
—Jennifer



The main question is this: I assume it should take longer the further out the data goes, so is there any way to reduce that time? The actual calculation was presumably what made the processing time grow longer and longer, since it does a maskout, sum, and const: less data at the beginning, more and more data to calculate the further out it goes, which makes sense. I thought I could run an fwrite script that exports the "pre-calculated" variable snowtot, and then when plotting the already-calculated snowtot I could essentially just do something like:

d snowtot(t=2)+snowtot(t=3)+snowtot(t=4)

etc., but the processing time turned out to be exactly the same, instead of the simple addition being extremely quick as I expected?


Basically, our approach looks like the following, and I'd have no problem changing it to something like define snowtot2, define snowtot3, etc., but:

* download t=2 data
set t 2
define snowtot = const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u)
d snowtot
* draws time step 2 snowfall
quit

* download t=3 data
set t 3
define snowtot = const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u)
d snowtot
* draws time steps 2 and 3 snowtot
quit

* download t=4 data
set t 4
define snowtot = const((sum(maskout(weasdsfc.1-weasdsfc.1(t-1),weasdsfc.1-weasdsfc.1(t-1)),t=2,t+0)*0.0393701)*10, 0, -u)
d snowtot
* draws time steps 2, 3 and 4 snowtot
quit


So the thought process was: since we already calculated the t=2 snowtot, we could use that previously calculated t=2 result in the subsequent run. Then on t=4 we've already calculated t=2 and t=3, so we could just use those. I thought fwrite'ing each hour's "already calculated" data would have drastically sped that up for each future hour, but it seemed to take the same amount of time as doing it this way?

Sorry if that makes no sense (I'll try to explain better if so) or seems way overboard, but fundamentally we process the data in real time as the files are released from the models.

-Chris



On Mon, Feb 15, 2016 at 9:13 AM, Jennifer M Adams <jadams21 at gmu.edu> wrote:

On Feb 14, 2016, at 12:34 AM, Christopher Gilroy <chris.gilroy at gmail.com> wrote:

I figured this would be the perfect post to ask these questions in:

1.) "Defining" a variable costs nothing in terms of performance, correct? It only actually gets calculated (the cost of processing) when you display it, right?
The define command copies the result of your expression into memory, so that the I/O and any calculations are done only that one time (when you invoke ‘define’). In a sense, all of the ‘cost’ is paid with define, and subsequent displays get the grid from memory, so they are effectively ‘free’. You can draw your defined grid twice (say, once for shaded contours and again for a line-contour highlight) without using any extra processing time. If you do not define your expression, but just ‘display’ it, then after the drawing is finished, the grid is released. If you ‘display’ the expression again, the I/O and calculations are repeated.

Many use ‘define’ and then ‘display’ as a matter of habit, even if the grid is only drawn once and it would be just as fast to use ‘display’ once. This practice does no harm, but it can lead to an unnecessarily bloated memory footprint for your GrADS session.
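A minimal illustration of the difference (the expression here is just an example):

'set t 2'
* I/O and arithmetic happen once, at the define:
'define snowin = weasdsfc.1*0.0393701'
* both displays below pull the grid from memory, so they are effectively free:
'set gxout shaded'
'd snowin'
'set gxout contour'
'd snowin'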


2.) Is there any way to process the display once, but use its output multiple times without incurring additional processing time? A simple example would be: display t step 2, but also fwrite the processed data to a file, so when you process t step 3 you can just open the already-processed data and simply add t step 3's data to it. Think of doing a total accumulation plot: the further out it gets, the longer it takes to process, (presumably) because it's calculating all previous hours' data (which technically has already been processed) before it even calculates the current hour's.
This sounds like a job for tloop().


3.) Regarding all of the pdef talk in this thread: without defining a smaller pdef, is the entire file actually read as soon as you open the corresponding ctl file? From the way the discussion has gone, it seems so, but I figured I'd ask.
The I/O for pdef’d data is done one grid box at a time, for each point in the destination grid (which is defined by your xdef and ydef entries). Assuming you are doing bilinear interpolation, for each grid box in the destination grid, four data points are read from the native grid, then the weighted average of those four points goes into one element of the array containing the interpolated grid. If the native grid is cached, then this 4:1 I/O factor is not so noticeable. GrADS does internal caching for grib, grib2, and netcdf4 data types. The operating system does some caching for binary data. The one that is really slow is netcdf3; it is very slow to read a high-res netcdf3 file that needs PDEF. It is on my list to add internal caching in the GrADS I/O layer for *all* data types, but I haven’t gotten that code written yet.
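For reference, the standard bilinear weighting for one destination point works out like this, where (i,j) is the native grid point at the lower-left corner of the enclosing grid box and dx, dy are the fractional offsets within that box:

  value = (1-dx)*(1-dy)*g(i,j) + dx*(1-dy)*g(i+1,j)
        + (1-dx)*dy*g(i,j+1) + dx*dy*g(i+1,j+1)

That is four native-grid reads per destination point, hence the 4:1 I/O factor.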

Perhaps I'm confused about what setting an actual lat/lon 'does'. I assume it calculates the display output based on those dimensions,
and from the way I'm reading this, it seems like the lat/lon simply "controls" GrADS in terms of the actual bounding dimensions for displaying the data?
That’s right.

For example, we do this:

'set lat 10 68'
'set lon -138 -55'

'define radar1kagl = refd1000m.1'
'define hgt850tmp = tmpprs.1(lev=850)-273.15'

At this point your variable hgt850tmp is copied in memory, with the dimension limits you specified.

'set lat 20 58'
'set lon -128 -65'

'd hgt850tmp'
Now you have changed the dimension limits and displayed the defined variable; no new I/O is performed from the original data file. If your new limits are outside the domain where the variable is defined, you will get missing data. In this case, your display will be complete because your new dimensions are within the domain of the defined variable.
—Jennifer


As you guys are probably aware, if we were to make the lat/lon set before the define half the size, the second lat/lon would still be the bounding box, but the data would only be calculated over the first set, which is also why I'm asking #3.



On Fri, Feb 12, 2016 at 1:38 PM, Wesley Ebisuzaki - NOAA Federal <wesley.ebisuzaki at noaa.gov> wrote:
Travis,

    I think this is how Jennifer wants it to work.

You have a 1 km grid and a ctl for the 1 km grid.  This works great for my town
but is super slow for the CONUS, where your screen can't resolve 1 km.

You make a ctl that uses the 1km file but uses xdef/ydef for a 10 km grid.
This will require a pdef line.  This control file will be good for state maps.

You can also make a ctl that uses the 1km file but uses xdef/ydef for a 100 km grid.
This will also require a pdef line.  This ctl file will be good for CONUS maps.
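As a sketch, such a coarsened descriptor might look something like this (every size, spacing, and file name here is an invented placeholder; making the pdef file itself is addressed next):

dset ^radar_1km.bin
undef -9.99e8
* native grid is 3000x2000 at 1 km; destination grid is ~10 km (0.1 deg)
pdef 3000 2000 bilin stream binary ^1km_to_10km.pdef
xdef 550 linear -125.0 0.1
ydef 250 linear 25.0 0.1
zdef 1 linear 1 1
tdef 1 linear 01jan2016 1hr
vars 1
refl 0 99 radar reflectivity
endvars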

How you make the pdef line/file is the subject of another email.

Wesley




On Fri, Feb 12, 2016 at 12:55 PM, Travis Wilson - NOAA Federal <travis.wilson at noaa.gov> wrote:
Hi All,

These are great tips.  Changing xdef and ydef is a great option, but if I understand correctly, it won’t work with Jennifer’s new code, since the pdefwrite output would have to be redone.  Making a dummy grid on the fly and using lterp looks to be the next best option for us.

The most surprising thing I found from my python test is that plotting performance doesn’t really degrade as you view a larger area of a high resolution grib file.

HRRR example (attached in original email)
GrADS  = 0.86 s (California view) --> 6.36 s (CONUS view)
Python = 4.7 s (California view) --> 6.8 s (CONUS view)

It may be beneficial if GrADS had a grid-to-xwindow ratio and/or a grid-to-image ratio setting, to acknowledge the fact that we don’t want to keep writing over the same pixel for high resolution grids (basically regrid, or start skipping grid points in the plot/xwindow, when things become redundant). GrADS could possibly allow users to turn this option on/off and set their desired ratio. This would make GrADS very snappy with possibly little to no image/xwindow quality loss.

A good example is someone doing an analysis with an HRRR grib file: GrADS performance would change very little whether they are looking at the entire CONUS or just a small region. Right now, GrADS performance changes by a factor of 7 in the examples I sent in the PDF. Python’s performance changes by only a factor of 1.5, so I suspect it is doing some regridding or selective grid-point plotting on the fly to keep things speedy. Anyways, it is just a thought and may be beneficial as we head towards higher resolution datasets. Thank you all for your help, I really appreciate it.

Travis

On Thu, Feb 11, 2016 at 11:12 AM, Jennifer M Adams <jadams21 at gmu.edu> wrote:
Dear Travis, Wesley, et al.,

I have done some testing with the high-res fnexrad 1 km radar data, comparing the use of ‘pdef lccr’ (where the interpolation weights are calculated internally) and ‘pdef bilin’ (where the interpolation weights are provided by the user in an external file). Reading the weights from a file was significantly faster, something like 30x faster!

The tricky part of taking advantage of this performance gain is creating the pdef file itself, which depends on you being able to calculate non-integer i,j values in the native grid that correspond to each grid point in the destination grid, which is defined by what you put in your XDEF and YDEF statements. This is not necessarily simple.

The good news is that GrADS does this calculation for you every time you open a descriptor with a pdef statement that doesn’t point to an external file — lcc, lccr, nps, sps, etc. I am going to implement a command ‘pdefwrite’ that will write out the interpolation weights calculated internally for these types of PDEF entries so that the file can be used with ‘pdef bilin’ instead. The protocol will be something like this:
1. Create a descriptor that has a pdef statement like this:
pdef 4736 3000 lccr 23.0 -120 1 1 40.0 40.0 -100 1016.2360 1016.150
2. Open it with grads
3. Invoke pdefwrite with a file name as an argument
4. Rewrite your descriptor to use this pdef statement instead:
pdef 4736 3000 bilin stream binary your-filename-here
5. Don’t change the XDEF and YDEF statements — those match the pdef file you created in step 3.
6. Open the new descriptor with GrADS and start working right away.
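In a session, the protocol might look like this (a sketch; pdefwrite is the proposed command name, not yet released, and the file names are placeholders):

* descriptor with the internally calculated pdef (lccr)
'open radar_lccr.ctl'
* write the internally calculated interpolation weights to a file
'pdefwrite radar.pdef'
'close 1'
* this descriptor has: pdef 4736 3000 bilin stream binary radar.pdef
'open radar_bilin.ctl'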

Additional comments on Travis’s email:

Shade1 may be faster than shade2 in some cases, but it won’t look right with transparent colors because the polygons in the old algorithm overlap. By the way, in the newer versions of GrADS, ‘gxout shaded’ is an alias for shade2, so if you want to use shade1 you have to say so explicitly.

For regridding, the new code in lterp() does just about everything re() does, only faster and more accurate. It is true that the destination grid definition requires an open file, but I use something like this all the time:
dset ^foo.bin
options template
undef -9.99e8
xdef 90 linear 2 4
ydef 45 linear -88 4
tdef 1 linear 01Jan0001 1dy
zdef 1 linear 1 1
vars 1
foo 0 99 foo
endvars

You can even create that dummy descriptor on the fly, depending on what destination grid you need at the time. Also, if you are using pdef, it is a waste of resources to use lterp(), just put your desired destination grid in the XDEF and YDEF statements.
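For example, a script could write that dummy descriptor on the fly and then regrid with lterp() (a sketch; the variable and file names are invented, it assumes the hi-res data is open as file 1, and the dummy binary exists only so the descriptor has something to point at):

* create a dummy binary big enough for the 90x45 grid (4050 4-byte floats)
'!dd if=/dev/zero of=foo.bin bs=4 count=4050'
* write the dummy descriptor; the first write creates the file,
* subsequent writes append sequentially
rc = write('foo.ctl', 'dset ^foo.bin')
rc = write('foo.ctl', 'undef -9.99e8')
rc = write('foo.ctl', 'xdef 90 linear 2 4')
rc = write('foo.ctl', 'ydef 45 linear -88 4')
rc = write('foo.ctl', 'tdef 1 linear 01Jan0001 1dy')
rc = write('foo.ctl', 'zdef 1 linear 1 1')
rc = write('foo.ctl', 'vars 1')
rc = write('foo.ctl', 'foo 0 99 foo')
rc = write('foo.ctl', 'endvars')
rc = close('foo.ctl')
'open foo.ctl'
* interpolate the hi-res field (file 1) to the grid of the dummy (file 2)
'd lterp(tmpprs.1(lev=850), foo.2)'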

High res data sets take longer to render because they have more data to grind through to calculate where to draw the contours. But if your data is high res, don’t you want to see that reflected in your plot?

I like ‘gxout grfill’ to really see the finer details in the data. Contours over highly variable data (e.g. temperature in the Rocky Mountains) can look really noisy but grfill lets you see that variability without all the annoying squiggly contour lines.

Regarding the resolution of the image output: there is no point in writing really high res data to a small image file; you just end up drawing over the same pixel multiple times. If image file dimensions are your limiting factor, then it might make sense to downgrade the resolution of your grid. I don’t think the optimal ratio between grid size and image size is 1:1, however. There’s probably a sweet spot somewhere where you can still see all the details in your data and the image size is lean enough. I think 800x600 is pretty small, and it is also not quite the same aspect ratio as 11x8.5, so your image will be a bit distorted from what you see in the display window.

Don’t forget about the utility ‘pngquant’ for making the image output files (from v2.1+) less bulky so you can store more of them and they will load faster in a browser.
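For example, from inside a script (a sketch; the image size and quality range are arbitrary, and pngquant's flags may vary by version):

'printim plot.png x1100 y850'
* shell out to pngquant and overwrite the file with the palette-quantized version
'!pngquant --force --output plot.png --quality=65-80 plot.png'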

—Jennifer


On Feb 11, 2016, at 10:40 AM, Wesley Ebisuzaki - NOAA Federal <wesley.ebisuzaki at noaa.gov> wrote:

Travis,

   I haven't tried this but it may work.

   Instead of regridding your hi-res lat-lon data, make a new control file
which has a PDEF .. BILIN.  This PDEF would map low_res(i,j) -> hi_res(n*i, n*j)

     low_res() : the low-res x-y grid which is defined in the low-res ctl file.
     hi_res(): the hi-res grib file grid

I don't remember if grids start at grid(0,0) or grid(1,1).  If grids start at (1,1) then
the above formula would have to be changed.
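(For what it's worth, GrADS grid coordinates are 1-based, so the mapping would presumably need to be low_res(i,j) -> hi_res(n*(i-1)+1, n*(j-1)+1).)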

Wesley



On Tue, Feb 9, 2016 at 5:37 PM, Travis Wilson - NOAA Federal <travis.wilson at noaa.gov> wrote:
Hi All,

Attached is a very short ppt on GrADS performance vs. Python using grib files.  In most cases, GrADS blows Python away.  Times are relative to our machine and cover everything from starting GrADS and opening the file to closing the file.

- In particular, we have found that shade1 is much faster, up to 40% faster on our machines.
- Wesley Ebisuzaki recommended converting the grib files to a lat/lon grid to eliminate the PDEF entry to significantly speed up the opening time of high resolution grib files. http://gradsusr.org/pipermail/gradsusr/2016-January/039339.html
- Again noted by Wesley, grib packing can have an impact on performance http://gradsusr.org/pipermail/gradsusr/2010-May/027683.html

One thing we show in the ppt is that the wider the view gets (i.e. the more points that are plotted), the slower GrADS is relative to Python.  At some point, Python will become faster.  Anyways, to battle this, regridding the data within GrADS (using the re() function) significantly speeds up the plotting time when you have a lot of points (see last slide).  As far as I know, you can’t use re() in GrADS 2.1a3.  You do have lterp(), but a grid is needed.  Is there anything that will allow me to lterp to my image dimensions?  Say my image dimensions are x800 y600; then lterp would interpolate my high resolution grib file to x800 y600 (or some multiple thereof) when a view exceeds 800 points across.  This would significantly speed up the plotting time when viewing a wide view of a high resolution grib file while not degrading the image quality by much (again, see last slide).

Also, if anyone has other performance tips on plotting high resolution grib files we would love to hear them.

Thanks,
Travis


--
Jennifer Miletta Adams
Center for Ocean-Land-Atmosphere Studies (COLA)
George Mason University








--
-Chris A. Gilroy







