loop over x,y,z in user defined function (opengrads)

Fri Jan 22 13:59:23 EST 2010

Hi Brian,

I really appreciate your explanations on the design of grads and its
user defined function interface.

While I agree it is absolutely necessary to limit memory consumption, I
hope the new interface will allow a user program to pull more data in
some way (e.g. by looping over an additional dimension). For example,
it could pass only one grid at a time but allow that memory block to
be re-used for passing further grids in a loop. It is indeed a
trade-off between how much you allow a user program to do and the risk
of breaking grads itself. (Might redesigning the software in C++ solve
some of the fundamental API issues?) Arlindo's UDF design gives me
great flexibility and speed, though I had to read a lot of the grads
source and go through many crash/debug cycles with gdb. I don't really
mind that, since it is ultimately the user's responsibility not to
blow up the memory or cause a segmentation fault. Using the less
efficient external files for data transfer would completely isolate
grads, so that it never crashes through a user's fault and users do
not have to learn much about how grads works internally. On the other
hand, it doesn't really make a difference to me whether the crash is
inside grads or in my functions -- both are my own fault.
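
To illustrate the kind of buffer re-use I mean, here is a minimal C
sketch. It assumes the host hands the user function a callback that
fills one 2-D buffer per level; fetch_level_fn, column_sum and
demo_fetch are hypothetical names for illustration only and are not
part of the grads or OpenGrADS interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback (NOT a grads API): fills buf, which holds
   isiz*jsiz values, with the 2-D grid at level index k. */
typedef void (*fetch_level_fn)(int k, int isiz, int jsiz,
                               double *buf, void *ctx);

/* Sum over ksiz levels at every (x, y) point. Only one 2-D buffer is
   ever held; it is overwritten on each pass through the loop. */
void column_sum(fetch_level_fn fetch, void *ctx,
                int isiz, int jsiz, int ksiz,
                double *buf, double *out)
{
    size_t n = (size_t)isiz * (size_t)jsiz;
    assert(fetch != NULL && buf != NULL && out != NULL);

    for (size_t p = 0; p < n; p++)
        out[p] = 0.0;

    for (int k = 0; k < ksiz; k++) {
        fetch(k, isiz, jsiz, buf, ctx);   /* buf now holds level k */
        for (size_t p = 0; p < n; p++)
            out[p] += buf[p];
    }
}

/* Stand-in data source for testing: every point at level k is k + 1. */
void demo_fetch(int k, int isiz, int jsiz, double *buf, void *ctx)
{
    (void)ctx;
    for (int p = 0; p < isiz * jsiz; p++)
        buf[p] = (double)(k + 1);
}
```

Peak memory stays at two 2-D grids (buf and out) no matter how many
levels are visited, which is the property I would like to keep.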

I completely agree that the grid returned from a user function should
not have fundamentally different dimensions. At the same time, I wish
the new interface would allow returning more than one output. I once
needed this badly and ended up writing a user defined command
(using Arlindo's interface) to replace the default "define" command so
that I could receive/define multiple grids from my customized function
in one shot.
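
As a rough sketch of what a multi-output call could look like (again
with made-up names: named_grid and sum_and_diff are hypothetical and
not part of any existing grads interface), the caller could pass an
array of pre-allocated output grids that all share the input shape:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for one named result grid. The data block is
   allocated by the caller and has the same isiz*jsiz shape as the
   inputs, so no output changes the dimension environment. */
typedef struct {
    const char *name;
    double *data;
} named_grid;

/* Fills two outputs in one call: outs[0] = a + b, outs[1] = a - b.
   Returns the number of grids written, or -1 if the caller did not
   supply two usable output slots. */
int sum_and_diff(const double *a, const double *b, int isiz, int jsiz,
                 named_grid *outs, size_t nouts)
{
    size_t n = (size_t)isiz * (size_t)jsiz;
    assert(a != NULL && b != NULL && outs != NULL);

    if (nouts < 2 || outs[0].data == NULL || outs[1].data == NULL)
        return -1;

    outs[0].name = "sum";
    outs[1].name = "diff";
    for (size_t p = 0; p < n; p++) {
        outs[0].data[p] = a[p] + b[p];
        outs[1].data[p] = a[p] - b[p];
    }
    return 2;
}
```

The host could then define one variable per returned name, which is
roughly what my replacement "define" command does today.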

ming

On Tue, Jan 19, 2010 at 12:15 PM, Brian Doty <doty at cola.iges.org> wrote:
> The expression handling part of grads is fundamentally limited to grids with
> a maximum of 2D varying dimensions.  This goes back to the early 90s when I
> designed and implemented the language known as the "grads expression".  This
> design has stood the test of time -- if anything, experience since then has
> shown this to be extremely important, as it limits the amount of memory that
> grads consumes.  You can, of course, override this at any time by staging
> data in memory via the "define" feature.  Define can use great amounts of
> memory very rapidly, but it is under your control.
>
> There are functions, such as ave, that loop over some dimension, evaluate
> a target expression that results in a grid of 2D or less, and then do some
> operation on those grids (such as sum them and divide by n).  Note that at
> no time is there a grid present that is more than 2D.  The looping results
> in potentially a lot of I/O.  If the data is going to be looped through more
> than once (or sliced in some non-efficient way), then putting the data into
> memory in advance, via define, can yield large performance improvements.
>  (It is important to understand the way grads evaluates expressions and what
> implications this has for I/O if you are working with a lot of data -- the
> way you do certain things can have huge performance differences.)
>
> The functions that loop through a dimension (eg, ave) operate recursively.
>  They are thus not suitable as UDF functions, since I don't see how to
> easily provide a standardized interface without very high long term
> maintenance costs.  Thus I would personally recommend that you implement
> such a function as a source code mod, this being much safer in the long
> term, as it will make clear to future users the nature of the function and
> the level of burden for maintaining it.
>
> As a side note, it is also important that no function outputs a grid that
> has fundamentally different dimensions than the input dimension environment.
>  This is a necessary restriction to preserve the internal consistency of the
> expression handling.   What I mean by this can be shown by an example:  if
> the input dimension environment is X and Y varying, it would be invalid to
> output a grid with some dimension other than X and Y varying (eg, T).   It
> is ok to output a grid that is a subset, ie, with just X varying.
>
> Another approach is to do what I do in this situation:  I write a script and
> a C program, and my script outputs the needed data into a file, and the C
> program does the calculation, and the script reads in the data output by the
> C program.  Somewhat kludgy, but not all functions are worth the time and
> effort to implement within grads and maintain over the longer term,  IMO.
>
> I am working through the design of some sort of interface to allow tasks to
> be done directly on "defined" objects.  This is an outgrowth of our recent
> implementation of netCDF file output.  Jennifer and I had extensive design
> discussions on this feature, and it became clear that we needed to collect
> large amounts of data in memory to optimize the I/O, and define was the
> obvious way to do this.  Now I am working on a way to do a "sort" operation
> on a "defined" data object, where the sort would operate through one of the
> dimensions.  I am also thinking that this type of thing could be generalized
> and a "UDF" style capability could be provided to perform some arbitrary
> task on a defined object.  This would not be part of the expression
> capability, but would be for one-shot types of things.  I've been trying to
> figure out how to provide a sort feature and this is the only way I can see
> to do it cleanly... Brian
>
>
> On Jan 13, 2010, at 3:37 PM, Ming Pan wrote:
>
>> Dear Users/Developers,
>>
>> I'm using grads-2.0.a7.oga.3 and trying to write a customized function
>> to compute low-level cape. In this case, I need to loop over 3
>> dimensions (x, y, and z), but it seems the current udf interface only
>> allows passing 2-D grids (isiz, jsiz). Has anyone run into a similar
>> situation and come up with a solution/work-around?
>>
>> Thanks!
>>
>> ming
>