OBJ File Format Performance

This page contains some notes I sent to a developer about creating fast OBJ files. It is only of interest if you are working on an export script that writes the OBJ file itself: it covers how best to organize an OBJ file for maximum speed.

= Part 1: Ordering =

Let's start with the 'cost' of everything. Here is the rough list, most expensive changes first (these are the things you do NOT want to interleave):

- Changing primitive type (tris vs. lines vs. lights vs. smoke; always put lights last!).
- Changing to and from the cockpit texture (try to avoid doing this more than once!).
- Changing "shader" (this includes two sided, poly_os, blend mode, ATTR_shiny_rat, and ATTR_light_level).
- Material changes other than shiny_rat.
- Animation.
- "Cheap" attributes (ATTR_hard/ATTR_hard_deck, ATTR_draw_disable, ATTR_solid_wall). These control which 'physical' mesh a tri is part of; they are the cheapest attributes and the lowest priority to optimize.

In other words, never make your OBJ bigger just to optimize out ATTR_hard: the sim is good at ATTR_hard. It is very slow at changing to the cockpit texture, though, so avoid doing that more than once. Within each bucket there really isn't much difference; poly_os and blend mode both change shaders, so they have nearly the same cost.
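To make the ordering concrete, here is a minimal Python sketch, assuming an exporter that models each run of geometry as a hypothetical `Batch` record. The weights in `BUCKET_WEIGHTS` are made up: they only encode the bucket ranking above and are not measured X-Plane costs. `transition_cost` adds a bucket's weight every time that state changes between consecutive batches, and `sort_key` sorts lexicographically so that the most expensive states change least often.

```python
from dataclasses import dataclass

# Primitive order taken from the draw-order list on this page.
PRIMITIVE_RANK = {"tris": 0, "lines": 1, "lights": 2, "smoke": 3}

# Hypothetical weights, one per bucket, most expensive first. Only the
# ranking matters; these are NOT measured X-Plane costs.
BUCKET_WEIGHTS = {
    "primitive": 100_000,
    "cockpit": 10_000,
    "shader": 1_000,
    "material": 100,
    "animation": 10,
    "cheap": 1,
}


@dataclass(frozen=True)
class Batch:
    """One run of geometry drawn with a single state (hypothetical model)."""
    primitive: str         # "tris", "lines", "lights" or "smoke"
    cockpit: bool = False  # drawn with the cockpit texture?
    shader: tuple = ()     # e.g. ("poly_os",) or ("two_sided", "blend")
    material: tuple = ()   # material settings other than shiny_rat
    animation: int = 0     # ID of the animation transform state, 0 = none
    cheap: tuple = ()      # e.g. ("hard",) or ("draw_disable",)


def transition_cost(batches):
    """Add a bucket's weight every time that state changes between two batches."""
    total = 0
    for prev, cur in zip(batches, batches[1:]):
        for field, weight in BUCKET_WEIGHTS.items():
            if getattr(prev, field) != getattr(cur, field):
                total += weight
    return total


def sort_key(batch):
    """Lexicographic key: the most expensive states are compared first, so
    batches that share them end up next to each other."""
    return (PRIMITIVE_RANK[batch.primitive], batch.cockpit, batch.shader,
            batch.material, batch.animation, batch.cheap)


normal = Batch("tris")
cockpit = Batch("tris", cockpit=True)
interleaved = [normal, cockpit, normal, cockpit]
print(transition_cost(interleaved))                        # 30000
print(transition_cost(sorted(interleaved, key=sort_key)))  # 10000
```

Within a bucket the sketch treats every change as equal, which matches the point that poly_os and blend mode cost about the same. Note that `sort_key` only groups identical states; it does not encode aesthetic requirements such as drawing poly_os geometry first.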

Two other notes:

1. You can avoid 'small' batches of two-sided geometry by duplicating the geometry with reversed winding and flipped normals. If an entire OBJ is two-sided, ATTR_two_sided _might_ be best, but if only a small part of a model is two-sided, duplicate the mesh; it can be worth creating 100+ extra triangles to avoid a shader-change attribute.

Similarly, you can create 'flat shading' by creating a smooth shaded triangle with the vertex normals all set to the triangle normal, again avoiding the attribute. You virtually never want to use ATTR_flat - just output "flat" normal data.

2. The cost of anim transforms is partly in the animation itself, but partly in the _stoppage_ of drawing. So given a HUGE pile of animation commands, it's good to have them all in one group. That's why Jonathan's exporter performs well even though it duplicates animations (to avoid attribute state changes): the result is typically a set of grouped animations, and animations are cheaper in groups.
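The two mesh tricks from note 1 can be sketched in Python as follows, assuming triangles are stored as lists of three `(position, normal)` tuples (a hypothetical representation; a real exporter also carries texture coordinates). `back_face` reverses the winding order and negates the normals so the copy faces the other way; `flat_triangle` gives all three vertices the face normal so ordinary smooth shading looks flat. The face normal here follows the right-hand rule for the vertex order; check that against the winding convention your exporter already uses.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    # Assumes a non-degenerate triangle (non-zero length).
    length = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / length, v[1] / length, v[2] / length)


def flat_triangle(p0, p1, p2):
    """Three (position, normal) vertices that all carry the face normal, so a
    smooth-shaded triangle renders flat without ATTR_flat."""
    n = normalize(cross(sub(p1, p0), sub(p2, p0)))  # right-hand rule
    return [(p0, n), (p1, n), (p2, n)]


def back_face(tri):
    """Copy of a triangle facing the other way: reversed winding, negated normals."""
    return [(p, (-n[0], -n[1], -n[2])) for p, n in reversed(tri)]


def make_two_sided(triangles):
    """Emit each triangle followed by its back face instead of a two-sided attribute."""
    out = []
    for tri in triangles:
        out.extend([tri, back_face(tri)])
    return out


tri = flat_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(tri[0][1])                         # (0.0, 0.0, 1.0)
print([p for p, _ in back_face(tri)])    # [(0, 1, 0), (1, 0, 0), (0, 0, 0)]
print(len(make_two_sided([tri])))        # 2
```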

One other note: if you put the word DEBUG on its own line at the end of your OBJ, X-Plane will output its internal command structure, which is what X-Plane will actually run. This will differ from what you wrote: some commands are optimized out, and some OBJ commands become multiple X-Plane commands. Generally, shorter output in that log is better, and fewer shader changes matter most.
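An exporter could offer this as a debug option. A minimal Python sketch; `append_debug_marker` is a hypothetical helper name, and it only makes sure DEBUG is the last line of the file without adding it twice:

```python
from pathlib import Path


def append_debug_marker(obj_path):
    """Make DEBUG the last line of the OBJ file, without adding it twice."""
    path = Path(obj_path)
    text = path.read_text()
    lines = text.splitlines()
    if lines and lines[-1].strip() == "DEBUG":
        return  # already present
    if text and not text.endswith("\n"):
        text += "\n"  # DEBUG must be on its own line
    path.write_text(text + "DEBUG\n")
```

For example, `append_debug_marker("my_object.obj")` (a hypothetical file name) run after export.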

Note that Jonathan's list contains aesthetic ideas too - e.g. poly_os is before non-poly_os because otherwise the poly_os geometry will look bad! A general draw order for the entire OBJ for aesthetics and speed on the "outermost" level might be:

- poly_os tris
- normal tris
- cockpit tris
- lines
- lights
- smoke
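One way to apply this outer order in an exporter is a stable sort on a group rank, so items keep their existing relative order inside each group. A Python sketch, assuming each item is a hypothetical dict with a `kind` key and optional `poly_os` and `cockpit` flags:

```python
# Outer groups in draw order, taken from the list above.
OUTER_ORDER = ["poly_os tris", "normal tris", "cockpit tris", "lines", "lights", "smoke"]
OUTER_RANK = {name: rank for rank, name in enumerate(OUTER_ORDER)}


def outer_group(item):
    """Map a hypothetical item dict to one of the outer groups."""
    kind = item["kind"]  # "tris", "lines", "lights" or "smoke"
    if kind != "tris":
        return kind
    if item.get("poly_os"):
        return "poly_os tris"  # assumption: poly_os wins over cockpit
    if item.get("cockpit"):
        return "cockpit tris"
    return "normal tris"


def order_outer(items):
    """Python's sort is stable, so items keep their order within each group."""
    return sorted(items, key=lambda item: OUTER_RANK[outer_group(item)])


items = [
    {"name": "smoke puff", "kind": "smoke"},
    {"name": "hull", "kind": "tris"},
    {"name": "beacon", "kind": "lights"},
    {"name": "panel", "kind": "tris", "cockpit": True},
    {"name": "decal", "kind": "tris", "poly_os": True},
    {"name": "antenna wire", "kind": "lines"},
]
print([item["name"] for item in order_outer(items)])
# ['decal', 'hull', 'panel', 'antenna wire', 'beacon', 'smoke puff']
```

Placing geometry that is both poly_os and cockpit into the poly_os group is a choice made for this sketch, not something the list above specifies.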

Actually, I must correct myself: in X-Plane 10, non-custom lights (named and param lights) belong with the "cheap" attributes, so re-animating for them isn't necessary.

In fact, even in v9, re-animating for them might not produce the best results. My suggestion would be:

- If a light is animated, perhaps include it WITH the animation.
- Include all non-animated lights at the end in one big block.
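A sketch of that suggestion in Python, assuming the exporter already holds one command list per animation (keyed by a hypothetical animation ID) and that each light is tagged with the ID of its animation, or `None` if it is static. The light strings in the example are placeholders, not real LIGHT_NAMED arguments:

```python
def place_lights(anim_groups, lights):
    """Put each animated light inside its own animation group and collect all
    static lights into one block to be written at the very end.

    anim_groups: dict mapping animation ID -> list of OBJ command strings
    lights: list of (animation ID or None, light command string)
    """
    groups = {anim_id: list(cmds) for anim_id, cmds in anim_groups.items()}
    static_block = []
    for anim_id, light in lights:
        if anim_id is None:
            static_block.append(light)
        else:
            groups.setdefault(anim_id, []).append(light)
    return groups, static_block


groups, static_block = place_lights(
    {7: ["TRIS 0 300"]},
    [(7, "LIGHT_NAMED <animated light>"), (None, "LIGHT_NAMED <static light>")],
)
print(groups)        # {7: ['TRIS 0 300', 'LIGHT_NAMED <animated light>']}
print(static_block)  # ['LIGHT_NAMED <static light>']
```

The input dict is copied rather than modified, so the caller's animation groups are left untouched.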

As for alpha blending: ideally your users won't turn blending off and on very often; each toggle is a blend-mode (shader) change, and that is expensive.

Jonathan's code handles split-up animations by effectively assigning an "ID" to each unique, fully resolved animation transform state, plus a sort order that guarantees parents always come before children. Thus every batch he tracks knows its parent transform, and he can output the transform multiple times if the batches get split up.
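The idea can be sketched in Python as follows. This is not Jonathan's code; it assumes a hypothetical `transforms` table mapping each animation node to its parent and its `ANIM_*` lines (shown as placeholders). `chain` returns the path from the root down to a node; used as a sort key it puts parents before children, because a list sorts before any longer list it is a prefix of. `emit` walks the batches in whatever order they arrive, closes transforms that are no longer needed, and re-opens the parent chain whenever a batch has been split away from its siblings.

```python
def chain(node, transforms):
    """Path of animation nodes from the root down to `node` ([] if unanimated)."""
    path = []
    while node is not None:
        path.append(node)
        node = transforms[node][0]
    return path[::-1]


def emit(batches, transforms):
    """Write batches in the given order, nesting ANIM_begin/ANIM_end blocks.

    batches: list of (animation node or None, draw command string)
    transforms: dict node -> (parent node or None, list of ANIM_* lines)
    """
    out, open_chain = [], []
    for node, draw in batches:
        target = chain(node, transforms)
        # How many of the currently open transforms can be kept?
        keep = 0
        while (keep < min(len(open_chain), len(target))
               and open_chain[keep] == target[keep]):
            keep += 1
        # Close the transforms that are no longer needed, innermost first.
        out.extend(["ANIM_end"] * (len(open_chain) - keep))
        # Open (or re-open) the missing ones, outermost first.
        for n in target[keep:]:
            out.append("ANIM_begin")
            out.extend(transforms[n][1])
        open_chain = target
        out.append(draw)
    out.extend(["ANIM_end"] * len(open_chain))
    return out


# Placeholder transform lines; node 2 is a child of node 1.
transforms = {1: (None, ["<rotate for node 1>"]), 2: (1, ["<translate for node 2>"])}
batches = [(2, "TRIS 0 30"), (None, "TRIS 30 30"), (1, "TRIS 60 30")]

# The split batches force node 1's transform to be written twice.
print("\n".join(emit(batches, transforms)))
# Sorting by chain puts parents first, so each transform opens only once.
print("\n".join(emit(sorted(batches, key=lambda b: chain(b[0], transforms)), transforms)))
```

In a full exporter the chain would be only one part of the sort key, ranked below the more expensive buckets, so some re-emission of transforms is expected.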

Ultimately, you may have to ask a few of your most advanced users to try their 'huge' objects and compare prioritizations. There will always be exceptions, too: if an animation uses a plugin dataref and the plugin is slow, in theory that could be a worse penalty than a shader change.

In the end, the fast path is an OBJ with _no_ internal shader changes. This case is fast regardless of sort order (since there are no shader changes to sort), but if a user includes shader changes, there is going to be a speed hit.
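As a rough exporter-side check, one can count the expensive attributes in the output; zero means the fast path. A Python sketch: the attribute names come from this page plus ATTR_blend, ATTR_no_blend and ATTR_cockpit, and the set is an assumed, incomplete list to verify against the OBJ8 specification. Because the sim optimizes some commands out, the DEBUG output described earlier remains the authoritative view.

```python
from collections import Counter

# Names taken from this page, plus the blend and cockpit attributes; an
# assumed, incomplete list to check against the OBJ8 specification.
EXPENSIVE_ATTRS = {
    "ATTR_poly_os", "ATTR_two_sided", "ATTR_no_cull", "ATTR_flat",
    "ATTR_shiny_rat", "ATTR_light_level", "ATTR_blend", "ATTR_no_blend",
    "ATTR_cockpit",
}


def count_expensive_attrs(obj_text):
    """Count how often each expensive attribute appears as a command."""
    counts = Counter()
    for line in obj_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in EXPENSIVE_ATTRS:
            counts[tokens[0]] += 1
    return dict(counts)


sample = "ATTR_poly_os 2\nTRIS 0 30\nATTR_poly_os 0\nTRIS 30 30\nATTR_hard\nTRIS 60 30\n"
print(count_expensive_attrs(sample))  # {'ATTR_poly_os': 2}
```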

Also you may want to read this:

http://wiki.x-plane.com/Optimizing_Object_Performance

What is 'most important' can be hugely affected by the authoring scenario.

= Part 2: LOD =

First and most important: when an OBJ is farther away than its FARTHEST LOD, there is a BIG performance win. (Remember that an OBJ with no ATTR_LOD gets one LOD assigned by the sim, so including a single ATTR_LOD really just means taking control of this far LOD.) When the OBJ is "truly gone", the sim does not have to set up the shader at all, nor does it have to ensure the texture and mesh are in VRAM, so both the VRAM working set and bus bandwidth usage go down.

Therefore the BIGGEST win any author can make in ANY scenario is to make sure that the FARTHEST LOD is not too huge. This is a way to get a LOT of detail into airplanes without always paying for it, or to get lots of small objects into scenery.
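An exporter can help authors here by reporting the farthest LOD it is about to write. A minimal Python sketch that reads the far distance from each `ATTR_LOD near far` line (distances in meters); the helper names and the author-chosen `limit_m` threshold are hypothetical:

```python
def farthest_lod(obj_text):
    """Largest far distance over all ATTR_LOD lines, or None if there are none
    (in that case the sim assigns the single LOD itself)."""
    farthest = None
    for line in obj_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "ATTR_LOD":
            far = float(tokens[2])
            farthest = far if farthest is None else max(farthest, far)
    return farthest


def far_lod_warning(obj_text, limit_m):
    """Return a warning string if the farthest LOD exceeds limit_m, else None."""
    farthest = farthest_lod(obj_text)
    if farthest is None:
        return "no ATTR_LOD: the sim assigns the far LOD"
    if farthest > limit_m:
        return f"farthest LOD {farthest:g} m exceeds {limit_m:g} m"
    return None


obj = "ATTR_LOD 0 1000\nTRIS 0 3000\nATTR_LOD 1000 10000\nTRIS 0 3000\n"
print(farthest_lod(obj))          # 10000.0
print(far_lod_warning(obj, 500))  # farthest LOD 10000 m exceeds 500 m
```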

To give you an idea: the runway light objects are 3-D, have a lot of tris, and there may be 8000+ of them at a single airport. But with a maximum LOD distance of only 500 m, most of them are beyond their LOD and _gone_, and thus have...no cost. :-)

Now if you are going to have more than one LOD, there is a trade-off: you burn more VRAM to hold TWO versions of the mesh, but the far LOD has much less to draw. For this reason, I recommend at least two LODs for very complex, expensive objects (e.g. a 500,000-triangle fuselage should have a second LOD). This takes the 'sting' out of the first LOD when a lot of other scenery is visible.

But I do not recommend a 3rd, 4th, or 5th LOD. At that point we start to see diminishing returns: not a huge decrease in triangle count, but a lot more COPIES of the object.

One trick you can apply here: if the user is clever and builds the OBJ with some geometry visible in ALL LODs and some only in the near LOD, you can do this:

    ATTR_LOD 0 1000
    TRIS 0 3000
    TRIS 3000 9000
    ATTR_LOD 1000 10000
    TRIS 0 3000

Here TRIS 0 3000 is the far view and TRIS 3000 9000 is the details.

In this case, the far-view mesh (indices + vertices) is REFERENCED by both LODs rather than stored twice. If you can manage this, it's a nice win: we just saved duplicating the far view.
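An exporter can produce this layout by writing the shared (far-view) indices first and the detail indices right after them. A Python sketch with hypothetical function and parameter names; `TRIS offset count` ranges are measured in indices, as in the example above:

```python
def build_shared_lod(shared, detail, near_lod_far, far_lod_far):
    """Return (index_table, commands) with the far-view mesh stored once.

    shared: triangle indices visible in both LODs (the far view)
    detail: triangle indices visible only in the near LOD
    """
    index_table = list(shared) + list(detail)
    n_shared, n_detail = len(shared), len(detail)
    commands = [
        f"ATTR_LOD 0 {near_lod_far}",
        f"TRIS 0 {n_shared}",                    # far view
        f"TRIS {n_shared} {n_detail}",           # details
        f"ATTR_LOD {near_lod_far} {far_lod_far}",
        f"TRIS 0 {n_shared}",                    # far view again: same range, no copy
    ]
    return index_table, commands


table, commands = build_shared_lod(range(3000), range(9000), 1000, 10000)
print(len(table))  # 12000
print("\n".join(commands))
```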

If the attribute list is expensive for an OBJ, LOD is a win. Per the previous document, attributes are most expensive when an OBJ is used a lot. So, for example, if you (for some reason) had an animation in a building that is used a lot in a city, it would be a huge win for the far LOD _not_ to have the animation; the far LOD would then be on the "fast" path.

X-Plane 10 will have additional tricks to optimize using LOD, so it would also be worth it to have LOD support be 'ready'.

Like attributes, a lot of this comes down to author judgment; you can't stop authors from making stupid models. The most important thing for an exporter is: if an author TRIES to make their model fast, does the exporter write out the right output?

See also here:

http://www.x-plane.com/blog/2011/04/what-is-the-cost-of-1-million-vertices/

Normally we think of a high vertex count in an OBJ as "for free" because the raw throughput of the engine is very good. But authors are making some ridiculously complex objects. If an author puts a million vertices on an airplane, LOD is warranted.

And of course there is a final puzzle, because there are two sides to this: X-Plane is constantly changing in an attempt to be faster. Generally speaking, good optimization doesn't hurt when the sim gets faster; the only cost is that some previously expensive operations might become less expensive, so the effort spent avoiding them matters less. (For example, animated named lights don't cause a shader change in X-Plane 10, but they do in X-Plane 9.)

The universal rule remains: for fast fps, use as few attributes as possible. There are three cases where you can use mesh tricks to avoid the attribute:

1. ATTR_no_cull (output the tri twice).
2. ATTR_flat (output flat normals per vertex).
3. ATTR_shiny_rat more than once (the author can use a normal map).