RSL 2.0 Shading Guidelines


September, 2009

Introduction

Since laying the foundation of RSL 2.0 in PRMan 13.5, subsequent releases have continued to introduce new shading language features that extend the capabilities of RSL 2.0 and improve shading performance and efficiency. This application note delves into those new features and aims to provide some guidelines, rather than a set of hard and fast rules, for the optimal use of RSL 2.0 in your shading pipeline.

The ultimate goal when designing shaders - other than the obvious one of making pretty pictures - is to write a maintainable set of shaders that runs efficiently and has manageable memory requirements. Loss of efficiency will result if computations are repeated, leading to a desire to store the result of calculations so they don't have to be recomputed. But storing data will use memory - there is a classic space-time tradeoff.

Recall that the shading pipeline looks like this:

  • (construct() - runs only once)
  • begin()
  • displacement()
  • opacity()
  • prelighting()
  • lighting()
  • postlighting()

PRMan will call any begin() method lazily, but before any other method calls are performed. The begin() method is called only once per grid.

Note that construct() runs once, whereas for a given shader object the rest of the pipeline may run multiple times as the object is reused in subsequent shading computations. Please see the Shader Objects application note for further details.

Memory Considerations over the Shading Pipeline

Member variables provide a convenient way to avoid recomputing data. Results can be shared between the pipeline stages, at the cost of some memory being dedicated to preserving the computed result.

It is important to realize that there may be a significant delay between running each of the pipeline stages mentioned above. Not only that, but given the multithreaded nature of PRMan, there may be many grids "paused" at a given pipeline stage. This implies some memory must be dedicated to preserving the member variables for each shader.

In particular, PRMan may choose to run the displacement method of your shader (or the displacement shader itself, if it is separate) much earlier than it runs the surface() methods. It does this to determine visibility so that shading only occurs where and when it is really needed.

When PRMan runs your displacement early it must preserve any state generated in that phase until the surface methods run. This means that the member variables of all shaders and co-shaders that have run, and their AOVs, must be saved. This of course implies some memory will be dedicated to maintaining this state from displacement through surface shading. With larger displacement bounds and numerous member variables or AOVs, many grids may be in flight at once, so it is worth examining some techniques for reducing the memory associated with such storage.

Member Variables

Recall that if the same shader is used for displacement and surface methods, all its members must be preserved. Often, a surface shader will naturally want to group shading state relating to BRDF together in a struct. It is also possible that if your shader supports layering it may already have an array of such BRDFs. By making the state that is relevant to surface shading a struct and placing it inside a variable length array, you may control the lifetime of the member variables that only pertain to surface shading.

The following idiom shows how you can avoid PRMan spending memory budget preserving member variables that are not set up during displacement:

struct stateNeededOnlyDuringSurface {
   //...
   public void accumulate(...) {
      //....
   }
}

class myshader(...) {

    private stateNeededOnlyDuringSurface m_surfaceState[] = {};

    public void displacement(output point P; output normal N) {
        // ...

        // the m_surfaceState member has zero size here and doesn't
        // need to be preserved
    }

    // potentially long delay between displacement and prelighting

    public void prelighting(output color Ci,Oi) {
        resize(m_surfaceState,1);

        //... fill it in
        m_surfaceState[0] = stateNeededOnlyDuringSurface( .... );
    }

    public void lighting(output color Ci, Oi) {

        // The surface state is preserved over subsequent pipeline methods

        //...
        for ( ... lights ... ) {
            m_surfaceState[0]->accumulate(....);
        }
    }
}

This technique will ensure that minimal state is saved during displacement through to the surface method. Often, when implementing layers, it will be natural to establish the layering using a resizable array, so simply ensuring a relevant location for that resize will help your shaders be memory efficient.

Member Variables and AOVs

Arbitrary Output Variables (AOVs), much like member variables, must be preserved from displacement through to surface shading. Often AOVs are only written to during surface shading and all the values PRMan dutifully preserves represent wasted space. Unfortunately, unlike member variables, there is no simple idiom that allows you to control the lifetime of such variables. They exist from the very first run of the shader until it is done shading. This represents a problem particularly when the same surface shader is used for both displacement and surface shading. This limitation will be addressed in a future release of PRMan.

The two options described below will allow you to control the amount of memory associated with AOVs when displacement is in play.

The first option is to simply use a separate displacement shader that does not bind to the AOVs in its parameter list. Obviously there are disadvantages to this methodology. At first it may appear that the benefit of sharing computation between the displacement shader and the surface is lost. While it is possible to use the old message passing shadeops to communicate such information between the displacement and the surface shader, it may be more expressive to define methods on the displacement shader to extract such information (perhaps embodied in a struct). The surface can then call this when it is first invoked. Note that it would be unwise to try to store such information in the surface from the displacement() method, as that would involve running surface shader methods during displacement, at which point PRMan would correctly assume that the surface shader contains useful state that must be preserved.

struct displaceResults {
    //...
};

class displacementShader() {

    displaceResults m_displacements;

    public void displacement (output point P; output normal N) {
        //....

        m_displacements = displaceResults(...P.... N.... );
    }

    public void getDisplacementResults(output displaceResults res) {
        res = m_displacements;
    }
};

///...

class surfaceShader() {

    displaceResults m_displacements;

    public void prelighting(output color Ci, Oi) {
        // do not attempt to communicate the result of displacement
        // during the surface shader begin(), use prelighting()

        displacement->getDisplacementResults(m_displacements);

        //...
    }
}

Another option is to structure your combined shader as a minimal system that invokes co-shaders associated with each pipeline stage, with each of those co-shaders being obtained (via getshader()) at the appropriate stages in the pipeline. If, at the end of the displacement phase of the pipeline, the only co-shaders that have been obtained are those pertaining to displacement, those co-shaders will be the only ones whose state requires preserving.

class baseSurface(string surfaceHandle = ""; string displacementHandle = "") {
    shader m_displacementModule = null;
    shader m_surfaceModule = null;

    public void displacement(output varying point P; output varying normal N) {
        if (displacementHandle != "") {
            m_displacementModule = getshader(displacementHandle);
            m_displacementModule->displacement(P,N);
        }
    }

    public void prelighting(output color Ci, Oi) {
        if (surfaceHandle != "") {
            m_surfaceModule = getshader(surfaceHandle);
            m_surfaceModule->prelighting(m_displacementModule,Ci,Oi);
        }
    }

    //...
}

Note that the displacement module can be passed to the surface module if the surface requires results from the displacement module.
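
On the receiving side, the surface module's prelighting() might accept the displacement module and query it. The following sketch assumes the displaceResults struct and getDisplacementResults() method from the earlier example:

class surfaceModule() {

    displaceResults m_displacements;

    public void prelighting(shader displacementModule;
                            output color Ci, Oi) {
        // pull the stored results out of the displacement module,
        // if one actually ran
        if (displacementModule != null) {
            displacementModule->getDisplacementResults(m_displacements);
        }

        //...
    }
}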

Co-shaders and Co-shader Member Variables

The shading pipeline supports co-shaders as a method of dynamically deferring part of the shading computation to another block of code. In order to minimize memory overhead associated with displacement it would be prudent to acquire surface co-shaders as late as possible, preferably during the surface shader's prelighting(), rather than in begin() or displacement(). Co-shaders that have not been obtained or run do not require saving. At one end of the spectrum, simply avoiding getting co-shaders early may be sufficient:

class myShaderWithDisplacement() {

    shader displacementCoshader;
    shader surfaceCoshader1,surfaceCoshader2...

    public void begin() {

    }

    public void displacement(output point P; output normal N) {
        // the displacement generating coshader must
        // be available here, but others do not have to be
        displacementCoshader = getshader("myDisplacementCoshader");
    }

    // potentially long delay between displacement and prelighting

    public void prelighting (output color Ci, Oi) {
        // delay getting these coshaders until prelighting so that
        // their state doesn't need to be preserved
        surfaceCoshader1 = ....
        surfaceCoshader2 = ....
    }
}

Taking this a step further, there may be some significant time between running displacement() and surface() or prelighting()/lighting()/postlighting(). PRMan will save any co-shaders that have been begun until the surface shading is done or until the grid is determined to be invisible.

It is generally advisable to delay getting shaders until they are absolutely required, provided that it doesn't cause you to have to make multiple calls to getshader().

  • Note

    Existential conundrums aside, when a shader is returned via getshaders() or getlights() the begin() method is called (if it hasn't already been). Because begin() is user code and can store state into the co-shader, PRMan will save such state at various stages in the shading pipeline, even if you never use it.

Attribute Access in construct()

RSL 2.0 introduced the ability to write user code which runs during construct. There are some things to be aware of when writing a construct() method. Firstly, varying state is unavailable during construct(). The construct() method runs before the shader is associated with a grid, and runs only once, whereas begin() runs per-grid and can access varying data.

It is important to be aware that when attributes are obtained in construct() the shader object can only be reused in a limited set of circumstances. PRMan instantiates a shader object to perform shading. Although it is desirable to minimize the code that runs per-grid (recall that construct() runs only once for each shader object), it may actually be less efficient to put your attribute accesses in the construct() call. Also be aware that getshader(), getlight(), getshaders() and getlights() are all effectively attribute accesses.

Imagine the following code:

class mysurface() {

    //...
    public void construct() {
        // both of these are attribute accesses
        attribute("sides",m_sides);
        m_othershader = getshader("othershader");
    }
}

Because the shader object embodies state that includes the attribute values, PRMan cannot validly reuse the shader object for another primitive unless that primitive has the same sidedness and the same "othershader" shader attached. In fact, PRMan does not currently perform analysis that complex, so performing such attribute accesses may force a new shader object to be created for each primitive with this shader attached.

This may be problematic. If there are many small geometric primitives (gprims), each with this shader attached, the shader will need to have a new shader object created for each of those primitives. This may mean that construct runs many times and, in fact, the additional setup and memory overhead may drastically outweigh any benefit gained. Shader objects that are created because of attribute access in construct() cannot be multi-grid combined or shaded with other shader objects (even those of the same shader, but attached to other gprims). While a given gprim's grids will be candidates for grid combining and multi-grid shading, the reduction in the number of combinable grids may represent a significant performance penalty depending on the type of geometry, the number of grids, and how large the geometry is. Additionally, if a light has attribute access in construct(), it may not function properly during re-rendering. So, in general, it is probably better to avoid attribute access in construct() altogether.
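
A sketch of the safer arrangement simply moves the accesses into begin(), which runs per-grid but leaves the shader object reusable across primitives:

class mysurface() {

    uniform float m_sides = 2;
    shader m_othershader = null;

    public void begin() {
        // attribute access here does not tie the shader object
        // to one primitive's attribute state
        attribute("sides", m_sides);
        m_othershader = getshader("othershader");
    }
}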


Guidelines for RSL 2.0 Language Features

We now examine some of the tools you have at your disposal for writing efficient, flexible, and easily maintained shaders, along with some practical applications for those tools.

Structs and Struct Member Functions

Struct methods allow a shader writer to group code together with the data (in a struct) that the code operates on. This helps keep code maintainable and promotes good code factoring. The object-oriented nature of struct member functions also isolates the data from the interface via which it is accessed.

Struct member functions are useful for a number of purposes, but perhaps the most typical uses are caching attributes and options (e.g. ray depth and sidedness) and accumulating BRDFs.

While structs themselves do not help manage memory budget for a shader, nor do they avoid recomputing data, they do provide tools that make these goals easier to achieve with clean, manageable code.

Case in point: you may find it helpful to group together all renderer-state information in a struct that you set up in begin(). A shader will often check how many sides the current object has (i.e. its RiSides), it may need to know the current ray depth, and it may look up user options and attributes. Doing so only once makes your shader much more efficient. Without structs, this would become unwieldy:

struct ShadingContext {
  uniform float m_sides = 1;
  uniform float m_raydepth = 0;
  string m_passName = "";
  float m_frameNumber = 0;
  //...
  public void InitializeOptions() {
    option("user:passName", m_passName);
    option("Frame", m_frameNumber);
  }
  public void InitializeAttributes() {
    attribute("sides", m_sides);
    rayinfo("depth", m_raydepth);
  }
}

This state would usually be best stored in a member variable of the shader:

class someShader () {
  //...

  ShadingContext m_shadingCtx;

  public void construct() {
      m_shadingCtx->InitializeOptions();
      //...
  }

  public void begin() {
      m_shadingCtx->InitializeAttributes();
  }

  //...
}

While it is perfectly possible to grab attributes in construct(), it may not always be a good idea to do so. For more about getting attribute values during construct(), see the Attribute Access in construct() section, above.

Having set up the ShadingContext struct, it can be referenced by other methods and even other co-shaders, if you pass it via the parameter list. This will avoid the need to repeatedly inquire about attribute states, ray depths, and so on. Although this is possible without structs, it rapidly becomes unwieldy to pass multiple separate values around.

Struct member functions are as efficient as traditional functions and have very little or no overhead compared with writing code out longhand. Method calls, on the other hand, offer dynamism, but at an increased performance cost compared to struct member functions.

Versioning for structs

Structs may be employed to communicate between shaders, or may be used simply to factor code within a given shader. It is important to note that PRMan does not permit two shaders to be loaded with different definitions for a given struct. This means that the naming of your structs is important; it may be worth employing a prefix to avoid name collisions with libraries provided by third parties.

It also means that even if a struct is used only within a shader (and never used to communicate through a method call), it must still match the definition used in other shaders. This may present an issue if the struct is modified and you do not wish to publish a full new set of shaders.

One way to deal with this is to version your structs:

#define VERSION_STRUCT(name,ver) name##_##ver
#define myStruct VERSION_STRUCT(myStruct,1)

struct myStruct {
    //...
};

This means that the version is baked into the name and avoids the struct colliding with other shaders that just use the struct (perhaps a BRDF) internally.

Remember, if a struct is used to communicate between shaders, the definitions must match.

Method Calls

Method calls allow for modularity and dynamism in your shader design, which may allow you to write a more flexible shader set that is easier to maintain than might be possible without the dynamism afforded by method calls. Such dynamism comes at a very small performance cost. We outline here some guidelines for ensuring that the dynamism afforded by method calls and co-shaders comes at a modest cost, outweighed by the benefits of such an approach.

Calls to methods on a shader imply a small performance cost, which is not present for traditional function calls. These costs will be most significant when ray tracing, or when the grid size is very small. By following a few simple guidelines, however, you can keep the cost to a minimum, while still enjoying the flexibility and dynamism afforded by method calls.

  • When calling a method on the same shader, don't call via this->, if possible. If your method call has a fixed set of arguments, call it without method call syntax, e.g.:

    someothermethod(...)
    

    rather than:

    this->someothermethod(...)
    

    However if your method has optional arguments:

    public void foo(float a, float b, float c = 0) {
        //...
    }
    

    Then you must use the method call syntax:

    this->foo(1,2,"c",2);
    
  • In general, you should try to do as much work in each method call as possible. For example, loop inside your method call, rather than outside:

    vector Ls[]; color Cls[];
    Ls = ...
    Cls = ...
    othershader->accumulate(Ls,Cls);        // accumulate takes resizable arrays
    

    not:

    vector Ls[]; color Cls[];
    uniform float i;
    Ls = ...
    Cls = ...
    for(i=0;i<arraylength(Ls);i+=1) {
        othershader->accumulate(Ls[i],Cls[i]);
    }
    
  • More often than not, it is preferable to use method calls that operate on all samples rather than single samples at a time. However, this may be expensive when multiple lights and multiple BRDFs come into play.

  • When ray tracing, method calls cost proportionally more than in REYES rendering scenarios. That being said, the dynamism afforded by method calls and the improved factoring may result in overall better render times than by avoiding method calls all together. As noted above, minimizing the number of method calls when designing your shaders is worthwhile.

  • It is also worthwhile ensuring that the amount of work done in a method call is reasonably significant. Deciding what is sufficient work is best done with some experimentation; however, it is probably safe to say that the following example is not doing enough work in the method call:

    public float verylittlework(varying float a, varying float b) {
        return a+b;
    }
    

    And that texture lookups and shadows are probably sufficiently heavyweight:

    public float sufficientlyheavy(varying point shadowLoc) {
        return shadow(m_shadowMap,shadowLoc);
    }
    
  • As with any shadow or texture lookup, or any other shadeop relying on derivatives, calling methods that themselves rely on derivatives (by calling, for example, shadow()) inside a varying conditional may not produce correct results.

Structs for Protocols

Structs, struct member functions, and inheritance provide a way to express lightweight protocols. By factoring your code such that there exists a common base class, structs with similar purposes can share code. By overriding the functions, or causing all derived structs to implement a selection of functions, it becomes trivial to swap out one implementation (say, of a BRDF) for another. In the example below we define a sample generator that creates a number of direction vectors.

First, we define an interface:

struct HemisphereSampler
{
    private varying vector samples[] = {};
    private uniform float curSample = 0;
    private uniform float nSamples = 0;

    public void generateSamples(vector N; uniform float nsamples) {
    }
    public vector getSample() {
        if (curSample < nSamples) {
            vector s = samples[curSample];
            curSample += 1;
            return s;
        }
        return vector(0);
    }
}

Next, we inherit from the interface, to implement a uniform sampler:

struct UniformHemisphereSampler : HemisphereSampler {
    public void generateSamples(vector N; uniform float nsamples) {
        uniform float i;
        for(i=0; i<nsamples; i+=1) {
            vector v = vector random();
            while (length(v) > 1 || N.v < 0 || length(v) < 1e-6) {
                v = vector random();
            }
            push(samples,v);
        }
        nSamples = nsamples;    // record the count for getSample()
    }
}

This implementation is somewhat simplistic, but is sufficient to demonstrate protocols using a base class. Now, imagine that this sampler does not always produce the desired distribution. There are cases where we might like to bias the samples towards the normal. The following sampler struct does just that.

struct WeightedHemisphereSampler : HemisphereSampler {
    public void generateSamples(vector N; uniform float nsamples) {
        uniform float i;
        for(i=0; i<nsamples; i+=1) {
            vector v = vector random();
            while (length(v) > 1 || N.v < 0 || length(v) < 1e-6) {
                v = vector random();
            }

            v -= N*(v.N);               // project to disc
            float r2 = v.v;             // square radius
            v *= r2;
            float vl = length(v);
            v += N*sqrt(1-vl*vl);       // project back

            push(samples,v);
        }
        nSamples = nsamples;    // record the count for getSample()
    }
};

The major advantage of this style of coding is hopefully obvious when we try to swap the new sampler in place of the old one. Imagine the code looked like this:

UniformHemisphereSampler sampler;

//..

sampler->generateSamples(N,256);

Now we can swap in the new sampler by changing only one line:

WeightedHemisphereSampler sampler;

//..

sampler->generateSamples(N,256);

This is especially helpful when the struct implements a number of methods: simply changing the struct type allows you to switch one implementation for another. Furthermore, such code, while syntactically similar to a method call, does not have any of the overhead associated with a method call. Such struct member function calls are the same cost as traditional functions, which are effectively the same cost as writing the code out in place at each call site. Note, however, that method calls offer dynamism that is not currently available via struct member functions - the full type of the struct must be known, whereas a shader on which a method call is made can effectively be opaque to the call site.

Resizable Arrays

Resizable arrays permit storage of data where the number of items is not known at compile time. This offers flexibility in shader design and avoids the error-prone and rather ugly hacks that result from trying to use sufficiently large arrays that are only partially filled. Additionally, allocating only the required space is likely to be considerably more memory-efficient.

However, resizable arrays present their own set of concerns in the new shading paradigm. For starters, resizing a resizable array may imply a memory allocation to cope with the new size, so it is advantageous to perform resizes as infrequently as possible. That said, one resize is preferable to pushing many elements onto the array. For example, this:

vector results[] = {};
uniform float i;
for(i = 0; i < 128; i+=1) {
    push(results,generatevector());
}

is less efficient than:

vector results[];
resize(results,128);
uniform float i;
for(i = 0; i < 128; i+=1) {
    results[i] = generatevector();
}

As mentioned previously, the fact that resizable arrays have minimal storage requirements until resized to hold items is also a benefit. It can help control the lifetime of data and provides one technique for managing the memory associated with member variables and displacement.

Finally, note that:

reserve(array,0);

deallocates any memory associated with the array, whereas:

resize(array,0);

simply sets its size to zero in the hope that future resizes will be able to reuse the previously allocated memory.

Memory allocated with resizable arrays is automatically freed when shading is complete, so there is normally no need to free memory using this method, unless you specifically want to discard results that are no longer needed and might otherwise represent wasted storage space in between the pipeline methods.
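
For example, a member array of intermediate per-light results might be released once postlighting() has consumed it (a sketch; the m_lightCache member is hypothetical):

class cachingShader() {

    varying color m_lightCache[];

    public void lighting(output color Ci, Oi) {
        resize(m_lightCache, 32);
        //... fill in and use the cache
    }

    public void postlighting(output color Ci, Oi) {
        //... last use of m_lightCache

        // the cache is no longer needed; release its memory rather
        // than having PRMan preserve it between pipeline stages
        reserve(m_lightCache, 0);
    }
}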


Ray Tracing

When your shaders fire rays you must be aware that different rays might hit the same surface, or that other rays might hit your surface and invoke shading upon it. Often it is desirable to fire more than one ray to anti-alias reflections and so on. However, the simplistic approach of firing n rays every time the shader is invoked will become costly very quickly if any of those rays invoke additional shading that fires yet another set of rays. If at each ray depth we fire n rays, the number of rays we must shade rapidly balloons. Instead it may be desirable to limit the number of rays fired based on some criteria.

The simplest of these is probably to trim the ray budget for ray-traced hits, that is, ray hits at depth > 0. This can be achieved like so:

uniform float samples = nSamples;
uniform float raydepth = 0;
rayinfo("depth",raydepth);
if (raydepth > 0)
    samples = 1;

This pattern is likely to repeat so often that it may be beneficial to include it in a struct of attributes that are obtained during begin().
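
For instance, the trimming logic might live alongside the cached ray depth in such a struct (a sketch; the names are hypothetical):

struct RayContext {
    uniform float m_rayDepth = 0;

    public void initialize() {
        rayinfo("depth", m_rayDepth);
    }

    // trim a sample budget down to a single sample at ray hits
    public uniform float trimSamples(uniform float nSamples) {
        if (m_rayDepth > 0)
            return 1;
        return nSamples;
    }
}

A shader would then call m_rayCtx->initialize() in begin() and m_rayCtx->trimSamples(nSamples) wherever a ray budget is needed.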

Avoid Computing AOVs During Ray Tracing

Surface shaders often have chunks of code devoted to accumulating and outputting AOVs. Unless you anticipate these AOVs will be fetched using message passing, time spent initializing them and outputting them is likely to be wasted.

Provided the code changes are not too invasive, it may be worth avoiding outputting AOVs at ray hits.

class myshader(
        output varying color CoutDiff;    // avoid initializing by default
        output varying color CoutSpec;
        output varying color CoutAmbi;
        ) {

    public void lighting(output color Ci, Oi) {

        if (m_shadeContext->m_rayDepth > 0) {
            // at a ray hit, cheaply zero the AOVs instead of
            // computing them in full
            CoutDiff = CoutSpec = CoutAmbi = 0;
        } else {
            // full AOV computation for camera-visible shading
            // ...
        }
    }
}

Dealing with Importance Down the Ray Tree

There are situations where it may be possible to fire many fewer rays when it is known, a priori, that the result does not represent a large portion of the surface's color. For example, if the reflective coefficient is very small, trimming the number of rays might result in similar picture quality but at a small percentage of the cost. In other circumstances, a given ray may not contribute a significant amount to the end result, so it is not worth firing many additional rays when shading it.

Your shader might be able to usefully alter its ray budget based on such importance. This can be done using a shader parameter to which importance is sent:

class mysurf(varying float importance = 1;) {

    public void surface(output color Ci, Oi) {

        // use importance to control the ray budget
    }

}

This can then be sent to the surface using the send: mechanism of gather():

//...

gather(.... , "send:surface:importance",importancebudget) {

}

Re-rendering Tips

In the same way that displacement and the surface methods may be executed with substantial delay in between, re-rendering introduces a lengthy (cross-render) delay in between the prelighting() and lighting() methods of the pipeline. Many of the tips given for managing the memory footprint associated with displacement also apply to re-rendering: they can help ensure that the bake database stays small and that the re-rendering process is as efficient as possible.

Here are some additional important points to bear in mind when writing shaders that are intended to be compatible with re-rendering, as they can affect both the interactivity of re-rendering and the correctness of the result.

  • The call to getlights() for lights should go in the lighting() method. Re-rendering alters the number of lights according to which lights are being actively edited. If the getlights() is not performed in the lighting() method it will not reflect the lights being currently edited, and not only will re-rendering be slow, its results will likely be wrong.
  • Avoid attribute access in construct() for lights and any co-shaders referred to by lights. Also avoid lights that bind to primitive variables. Currently, such constructs interfere with re-rendering and may cause incorrect results. If a light requires access to attributes, perform such access in begin() or when required.
  • Ensure that you use a linear lighting model, i.e. ensure lights independently interact with the BRDF (and don't influence one another), and that the resultant color Ci for the surface is a linear weighted sum of the BRDFs.
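
A minimal lighting() method following the first and third points might look like the following sketch (the diffuse() method on the lights is hypothetical):

class rerenderFriendlySurface() {

    public void lighting(output color Ci, Oi) {
        // get the light list here, not in begin(), so that
        // re-rendering sees the currently edited set of lights
        shader lights[] = getlights();

        uniform float i, nLights = arraylength(lights);
        color sum = 0;
        for (i = 0; i < nLights; i += 1) {
            // each light contributes independently, so Ci remains
            // a linear weighted sum of the per-light results
            sum += lights[i]->diffuse(P, N);
        }
        Ci = sum;
        Oi = 1;
    }
}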

Using the Apodaca Device

The Apodaca Device is a safe way to avoid doing unnecessary work, whereas traditional conditionals may not be a safe way to avoid performing an expensive chunk of code. A number of shadeops, such as texture(), shadow(), Du() and others, rely on or compute derivatives of quantities on the surface. Such derivatives are not always well defined if the call is made within a varying conditional:

if (/* some varying condition */) {
    texture(...) // bad idea
}

The problem is that the values from which derivatives are computed might not have been properly computed at all points.

The Apodaca Device is an idiom whose purpose is to check if a given condition holds true for any of the points being shaded. A normal if statement with a varying condition, like so:

if (xcomp(P) > 0) {
    //...
    // varying conditional
}

executes the body only for points at which the condition holds true. Executing certain calls within the body (those which require derivatives, such as texture()) may not provide correct results.

The Apodaca Device sets a uniform variable to true if any point passes the condition. A second conditional then uses the uniform value. The body of the second conditional will either execute for all points being shaded or for none:

uniform float anyTrue = 0;
if (/*someVaryingCondition*/) {
    anyTrue = 1;
}
if (anyTrue > 0) {
    // safely execute for all or for none
    // texture() etc is ok here
}

Because the second condition executes everywhere, it is now perfectly okay to call shadeops that compute or rely on derivatives. It should be noted that it is still important that the input variables to such shadeops are not computed in a varying conditional, because if they were, they may not be computed everywhere and the original issue would once again apply.
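
Putting both points together: compute the shadeop's inputs unconditionally, then gate the expensive call behind the uniform flag (a sketch; the texture name is hypothetical):

// computed everywhere, so derivatives of s are well defined
float s = mod(xcomp(P), 1);

uniform float anyTrue = 0;
if (s > 0.5) {
    anyTrue = 1;
}

if (anyTrue > 0) {
    // executes for all points or none; texture() is safe here
    color c = texture("dirtmap.tex", s, t);
    //...
}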