loop over x,y,z in user defined function (opengrads)

Thu Feb 4 17:02:29 EST 2010

Hi Brian,

Thanks again for your kind response. My last email was intended to say
what kind of features of GrADS I'd hope for, from a user's point of
view, and I know essentially nothing about the coding philosophy of
GrADS.

Still, I hope the GrADS development team can consider good features
more favorably than coding style. If there is a conflict between the
features and issues like long-term maintainability, that suggests we
should probably improve our software engineering and migrate to a more
flexible/maintainable design. For example, C++ would let you
safely/conveniently reload functions and prevent users from accessing
what they shouldn't touch. It's worth breaking some
compatibility (some is already broken in version 2) if we can make
GrADS grow more and more powerful. Otherwise we just let GrADS be
limited by some old designs.

This is by no means to say that the old design is bad -- just that we
should let things grow.

ming

On Fri, Jan 22, 2010 at 2:46 PM, Brian Doty <doty at cola.iges.org> wrote:
> Ming, this question gets into the philosophy of how one writes code that is
> stable long term, code that is easy to change, how to avoid bloatware, etc.
>  So it depends on what the goals are for the code.  If it is for personal
> use, then one may not care much.
>
> By providing defined call-back entry points, and by isolating data
> structures, it is possible to have a base code that can be changed easily,
> and also allow external functions that don't have to change when the base
> code changes.  Even with this sort of interface, there is nothing to stop an
> external function from ignoring this interface and calling back into grads
> anywhere, or using any internal data structure.  In that case, a responsible
> code writer would be sure to flag this behavior very prominently, since such
> code would take more effort to maintain over time.  One can foresee that if
> there ends up being a lot of external functions like this, and they
> circulate through the community, then when certain changes are made to the
> grads base code there will be many headaches (and complaints).  Not really a
> good place to be.
>
> So do we try to provide an interface that isolates dimension-looping
> external functions, such that they can be protected from changes to the base
> code?  This is a tough question.  It is a harder thing to do.  I would
> personally hesitate to commit to such a path without really thinking it
> through.... Brian
>
>
> On Jan 22, 2010, at 1:59 PM, Ming Pan wrote:
>
>> Hi Brian,
>>
>> I really appreciate your explanations on the design of grads and its
>> user defined function interface.
>>
>> While I feel it is absolutely necessary to limit memory consumption, I
>> wish the new interface would consider letting the user program pull
>> more data (e.g. loop over an additional dimension) in some way. For
>> example, we could allow passing only one grid at a time, but also allow
>> re-using that memory block to pass more grids in a loop. It's indeed a
>> trade-off between how much you allow the user program to do and the risk
>> of breaking grads itself. (Might re-designing the software in C++ solve
>> some fundamental API issues?) Arlindo's UDF design gives me really
>> good flexibility and speed, though I have to read a lot of the
>> grads source and go through many crash/debug cycles with gdb. I don't
>> really mind that, since it's ultimately the user's responsibility
>> not to blow up the memory or cause a segmentation fault. Using the less
>> efficient external files for data transfer would completely isolate
>> grads so that it never crashes through the user's fault, and users would
>> not have to learn too much about how grads works internally. On the other
>> hand, it doesn't really make a difference to me whether it crashes inside
>> grads or my functions -- both are my own fault.
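
The "one grid at a time, same memory block" idea above can be sketched
as follows. This is only an illustration under stated assumptions, not
the GrADS or OpenGrADS UDF interface: stream_levels, read_level,
consume and column_max are hypothetical names, and Python stands in for
the C a real extension would use.

```python
import array

def stream_levels(read_level, nlev, npts, consume):
    """Host side (hypothetical): fill ONE reusable buffer per level."""
    buf = array.array("f", [0.0] * npts)   # allocated once, reused for every level
    for k in range(nlev):
        read_level(k, buf)                 # host overwrites the same block in place
        consume(k, buf)                    # user must copy anything it wants to keep

def column_max(nlev, npts, read_level):
    """User side (hypothetical): reduce over z, seeing one level at a time."""
    result = [float("-inf")] * npts

    def consume(k, buf):
        # Update the running per-point maximum from the current level.
        for i, v in enumerate(buf):
            if v > result[i]:
                result[i] = v

    stream_levels(read_level, nlev, npts, consume)
    return result
```

Memory use stays at one grid plus whatever the user function keeps, at
the cost of the user having to copy anything it needs beyond the
current call.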
>>
>> I completely agree that the grid returned from a user function should
>> not have fundamentally different dimensions. At the same time, I wish
>> the new interface would allow returning more than one output. I once
>> needed this desperately and ended up writing a user defined command
>> (Arlindo's interface) to replace the default "define" command so that
>> I could receive/define multiple grids from my customized function in one
>> shot.
>>
>> ming
>>
>> On Tue, Jan 19, 2010 at 12:15 PM, Brian Doty <doty at cola.iges.org> wrote:
>>>
>>> The expression handling part of grads is fundamentally limited to grids
>>> with a maximum of 2D varying dimensions.  This goes back to the early 90s
>>> when I designed and implemented the language known as the "grads
>>> expression".  This design has stood the test of time -- if anything,
>>> experience since then has shown this to be extremely important, as it
>>> limits the amount of memory that grads consumes.  You can, of course,
>>> override this at any time by staging data in memory via the "define"
>>> feature.  Define can use great amounts of memory very rapidly, but it is
>>> under your control.
>>>
>>> There are functions, such as ave, that loop over some dimension,
>>> evaluate a target expression that results in a grid of 2D or less, and
>>> then do some operation on those grids (such as sum them and divide by
>>> n).  Note that at no time is there a grid present that is more than 2D.
>>> The looping results in potentially a lot of I/O.  If the data is going to
>>> be looped through more than once (or sliced in some inefficient way),
>>> then putting the data into memory in advance, via define, can yield large
>>> performance improvements.  (It is important to understand the way grads
>>> evaluates expressions and what implications this has for I/O if you are
>>> working with a lot of data -- the way you do certain things can have huge
>>> performance differences.)
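
The looping pattern described above (evaluate one 2D grid per step,
sum, divide by n) might look roughly like this. It is a sketch of the
general technique only, not the actual ave source; evaluate_slice is a
hypothetical callback that returns one 2D grid as a list of rows.

```python
def mean_over_dimension(evaluate_slice, n):
    """Average n 2-D grids, holding only one input grid plus a running sum."""
    if n <= 0:
        raise ValueError("n must be positive")
    total = None
    for t in range(n):
        grid = evaluate_slice(t)           # hypothetical: may trigger I/O on every call
        if total is None:
            # Allocate the running sum once, shaped like the first grid.
            total = [[0.0] * len(row) for row in grid]
        for j, row in enumerate(grid):
            for i, v in enumerate(row):
                total[j][i] += v
    return [[s / n for s in row] for row in total]
```

Each call to evaluate_slice stands for one evaluation of the target
expression, so running such a loop twice repeats all of that work
unless the data was staged in memory beforehand.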
>>>
>>> The functions that loop through a dimension (eg, ave) operate
>>> recursively.  They are thus not suitable as UDF functions, since I don't
>>> see how to easily provide a standardized interface without very high long
>>> term maintenance costs.  Thus I would personally recommend that you
>>> implement such a function as a source code mod, this being much safer in
>>> the long term, as it will make clear to future users the nature of the
>>> function and the level of burden for maintaining it.
>>>
>>> As a side note, it is also important that no function outputs a grid that
>>> has fundamentally different dimensions than the input dimension
>>> environment.  This is a necessary restriction to preserve the internal
>>> consistency of the expression handling.  What I mean by this can be shown
>>> by an example:  if the input dimension environment is X and Y varying, it
>>> would be invalid to output a grid with some dimension other than X and Y
>>> varying (eg, T).  It is ok to output a grid that is a subset, ie, with
>>> just X varying.
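
Read literally, the rule above is a subset test on the varying
dimensions. A minimal sketch, assuming dimensions are represented as
sets of names; output_dims_allowed is a hypothetical helper, not a
grads function.

```python
def output_dims_allowed(input_varying, output_varying):
    """True if every varying output dimension also varies in the input environment."""
    return set(output_varying) <= set(input_varying)
```

With an X/Y varying environment, an output varying in X only passes,
while an output varying in T does not.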
>>>
>>> Another approach is to do what I do in this situation:  I write a script
>>> and a C program, and my script outputs the needed data into a file, and
>>> the C program does the calculation, and the script reads in the data
>>> output by the C program.  Somewhat kludgy, but not all functions are worth
>>> the time and effort to implement within grads and maintain over the longer
>>> term, IMO.
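
The external-program half of that file round trip could be sketched as
below. Python stands in for the C program here, and the file layout (a
headerless stream of 4-byte floats in native byte order) is an
assumption for illustration, as are the name process_file and its
parameters.

```python
import array

def process_file(in_path, out_path, func):
    """Read raw float32 values, apply func to each, write raw float32 back."""
    values = array.array("f")
    with open(in_path, "rb") as f:
        values.frombytes(f.read())         # assumes a headerless float32 stream
    result = array.array("f", (func(v) for v in values))
    with open(out_path, "wb") as f:
        result.tofile(f)
    return len(result)                     # number of values processed
```

The script side would write the input file before this runs and read
the output file afterwards; the calculation itself never touches grads
internals.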
>>>
>>> I am working through the design of some sort of interface to allow tasks
>>> to be done directly on "defined" objects.  This is an outgrowth of our
>>> recent implementation of netCDF file output.  Jennifer and I had extensive
>>> design discussions on this feature, and it became clear that we needed to
>>> collect large amounts of data in memory to optimize the I/O, and define
>>> was the obvious way to do this.  Now I am working on a way to do a "sort"
>>> operation on a "defined" data object, where the sort would operate through
>>> one of the dimensions.  I am also thinking that this type of thing could
>>> be generalized and a "UDF" style capability could be provided to perform
>>> some arbitrary task on a defined object.  This would not be part of the
>>> expression capability, but would be for one-shot types of things.  I've
>>> been trying to figure out how to provide a sort feature and this is the
>>> only way I can see to do it cleanly... Brian
>>>
>>>
>>> On Jan 13, 2010, at 3:37 PM, Ming Pan wrote:
>>>
>>>> Dear Users/Developers,
>>>>
>>>> I'm using grads-2.0.a7.oga.3 and trying to write a customized function
>>>> to compute low-level CAPE. In this case, I need to loop over three
>>>> dimensions (x, y, and z), but it seems the current UDF interface only
>>>> allows passing 2-D grids (isiz, jsiz). Has anyone run into a similar
>>>> situation and come up with a solution/work-around?
>>>>
>>>> Thanks!
>>>>
>>>> ming
>>>
>