loop over x,y,z in user defined function (opengrads)

Brian Doty doty at COLA.IGES.ORG
Fri Jan 22 14:46:00 EST 2010


Ming, this question gets into the philosophy of how one writes code  
that is stable long term, code that is easy to change, how to avoid  
bloatware, etc.  So it depends on what the goals are for the code.  If  
it is for personal use, then one may not care much.

By providing defined call-back entry points, and by isolating data  
structures, it is possible to have a base code that can be changed  
easily, and also allow external functions that don't have to change  
when the base code changes.  Even with this sort of interface, there
is nothing to stop an external function from ignoring it and calling
back into grads anywhere, or using any internal data structure.  In
that case, a responsible code writer would be sure to flag this
behavior very prominently, since such code would take more effort to
maintain over time.  One can foresee that if a lot of external
functions like this end up circulating through the community, then
when certain changes are made to the grads base code there will be
many headaches (and complaints).  Not really a good place to be.

So do we try to provide an interface that isolates dimension-looping  
external functions, such that they can be protected from changes to  
the base code?  This is a tough question.  It is a harder thing to  
do.  I would personally hesitate to commit to such a path without  
really thinking it through.... Brian
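
One possible shape for such an interface, sketched in Python under
loud assumptions: the host owns the loop over the extra dimension
and hands the external function one 2D grid per step, so no grid of
more than 2D ever exists in memory.  The names loop_over_dimension,
RunningMean and fake_level are hypothetical and do not correspond to
anything in the grads source; fake_level merely stands in for
evaluating an expression at one level.

```python
from typing import Callable, List, Optional

Grid = List[List[float]]  # one 2D slice, stored as rows of values


def loop_over_dimension(
    evaluate: Callable[[int], Grid],
    nsteps: int,
    on_slice: Callable[[int, Grid], None],
) -> None:
    """Host-side loop: evaluate one 2D slice per step and pass it on."""
    for k in range(nsteps):
        on_slice(k, evaluate(k))


class RunningMean:
    """External call-back that accumulates a mean over the looped dimension."""

    def __init__(self) -> None:
        self.total: Optional[Grid] = None
        self.count = 0

    def __call__(self, k: int, grid: Grid) -> None:
        if self.total is None:
            # Allocate the accumulator once, matching the first slice.
            self.total = [[0.0] * len(row) for row in grid]
        for j, row in enumerate(grid):
            for i, value in enumerate(row):
                self.total[j][i] += value
        self.count += 1

    def result(self) -> Grid:
        if self.total is None:
            raise ValueError("no slices were accumulated")
        return [[v / self.count for v in row] for row in self.total]


def fake_level(k: int) -> Grid:
    # Stand-in for expression evaluation: a 2 x 3 grid filled with k.
    return [[float(k)] * 3 for _ in range(2)]


mean = RunningMean()
loop_over_dimension(fake_level, 4, mean)
level_mean = mean.result()
```

The external function never sees the loop machinery, only the slices,
which is what would let the base code change underneath it.  Whether
this can be done cleanly inside the recursive expression evaluator is
the open question.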


On Jan 22, 2010, at 1:59 PM, Ming Pan wrote:

> Hi Brian,
>
> I really appreciate your explanations on the design of grads and its
> user defined function interface.
>
> While I feel it is absolutely necessary to limit memory consumption, I
> wish the new interface would consider allowing the user program to pull
> more data (e.g. loop over an additional dimension) in some way. For
> example, we could allow passing only one grid at a time, but also allow
> re-using that memory block to pass more grids in a loop. It's indeed a
> trade-off between how much you allow the user program to do and the
> risk of breaking grads itself. (Redesigning the software in C++ might
> solve some fundamental API issues?) Arlindo's UDF design gives me
> really good flexibility and speed, though I have to read a lot of the
> grads source and go through many crash/debug cycles with gdb. I don't
> really mind that, since it's ultimately the user's responsibility not
> to blow up the memory or cause a segmentation fault. Using the less
> efficient external files for data transfer would completely isolate
> grads, so that it never crashes through the user's fault, and users
> would not have to learn too much about how grads works internally. On
> the other hand, it doesn't really make a difference to me whether it
> crashes inside grads or my functions -- both are my own fault.
>
> I completely agree that the grid returned from a user function should
> not have fundamentally different dimensions. At the same time, I wish
> the new interface would allow returning more than one output. I once
> needed this desperately and ended up writing a user defined command
> (Arlindo's interface) to replace the default "define" command so that
> I could receive/define multiple grids from my customized function in
> one shot.
>
> ming
>
> On Tue, Jan 19, 2010 at 12:15 PM, Brian Doty <doty at cola.iges.org> wrote:
>> The expression handling part of grads is fundamentally limited to grids
>> with a maximum of 2D varying dimensions.  This goes back to the early
>> 90s when I designed and implemented the language known as the "grads
>> expression".  This design has stood the test of time -- if anything,
>> experience since then has shown this to be extremely important, as it
>> limits the amount of memory that grads consumes.  You can, of course,
>> override this at any time by staging data in memory via the "define"
>> feature.  Define can use great amounts of memory very rapidly, but it
>> is under your control.
>>
>> There are functions, such as ave, that loop over some dimension and
>> evaluate a target expression that results in a grid of 2D or less, and
>> then do some operation on those grids (such as summing them and
>> dividing by n).  Note that at no time is there a grid present that is
>> more than 2D.  The looping results in potentially a lot of I/O.  If
>> the data is going to be looped through more than once (or sliced in
>> some non-efficient way), then putting the data into memory in advance,
>> via define, can yield large performance improvements.  (It is
>> important to understand the way grads evaluates expressions and what
>> implications this has for I/O if you are working with a lot of data --
>> the way you do certain things can have huge performance differences.)
>>
>> The functions that loop through a dimension (eg, ave) operate
>> recursively.  They are thus not suitable as UDF functions, since I
>> don't see how to easily provide a standardized interface without very
>> high long term maintenance costs.  Thus I would personally recommend
>> that you implement such a function as a source code mod, this being
>> much safer in the long term, as it will make clear to future users the
>> nature of the function and the level of burden for maintaining it.
>>
>> As a side note, it is also important that no function outputs a grid
>> that has fundamentally different dimensions than the input dimension
>> environment.  This is a necessary restriction to preserve the internal
>> consistency of the expression handling.  What I mean by this can be
>> shown by an example:  if the input dimension environment is X and Y
>> varying, it would be invalid to output a grid with some dimension
>> other than X and Y varying (eg, T).  It is ok to output a grid that is
>> a subset, ie, with just X varying.
>>
>> Another approach is to do what I do in this situation:  I write a
>> script and a C program, and my script outputs the needed data into a
>> file, and the C program does the calculation, and the script reads in
>> the data output by the C program.  Somewhat kludgy, but not all
>> functions are worth the time and effort to implement within grads and
>> maintain over the longer term, IMO.
>>
>> I am working through the design of some sort of interface to allow
>> tasks to be done directly on "defined" objects.  This is an outgrowth
>> of our recent implementation of netCDF file output.  Jennifer and I
>> had extensive design discussions on this feature, and it became clear
>> that we needed to collect large amounts of data in memory to optimize
>> the I/O, and define was the obvious way to do this.  Now I am working
>> on a way to do a "sort" operation on a "defined" data object, where
>> the sort would operate through one of the dimensions.  I am also
>> thinking that this type of thing could be generalized and a "UDF"
>> style capability could be provided to perform some arbitrary task on a
>> defined object.  This would not be part of the expression capability,
>> but would be for one-shot types of things.  I've been trying to figure
>> out how to provide a sort feature and this is the only way I can see
>> to do it cleanly... Brian
>>
>>
>> On Jan 13, 2010, at 3:37 PM, Ming Pan wrote:
>>
>>> Dear Users/Developers,
>>>
>>> I'm using grads-2.0.a7.oga.3 and trying to write a customized
>>> function to compute low-level cape. In this case, I need to loop
>>> over 3 dimensions, x, y, and z, but it seems the current udf
>>> interface only allows passing 2-D grids (isiz, jsiz). Has anyone run
>>> into a similar situation and come up with a solution/work-around?
>>>
>>> Thanks!
>>>
>>> ming
>>


