Friday, March 11, 2011

OpenGL Vertex Buffer Objects

It took me some time, but I've finally got them working the way I want, and now you can too. But first, an example...

Now you can see why that broken foot silhouette was so exciting for me. Now on to making it happen for you.

VBOs in a nutshell

Traditionally, vertex buffers were a place you could upload carefully formatted geometry to the video card, and that's about it. Then came the ARB_matrix_palette extension, and things got a bit exciting. Nowadays we have GLSL, and can put whatever we feel like in them. Let's take a look at how that's done...
// VBO init code
GLuint vbo = 0; // GLuint is unsigned; 0 is never a valid buffer name
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, num_vert * sizeof(vertex), vert_ptr, GL_STATIC_DRAW);

After that, you upload the vertex indices for your polygons...
// Element Array Init Code
GLuint ebo = 0;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo); // pass the buffer name, not its address
// 4 indices per quad * 2 bytes per GL_UNSIGNED_SHORT = 8 bytes per quad
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, quad_ptr, GL_STATIC_DRAW);

And to draw you call...
// Draw Code
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexPointer(3, GL_FLOAT, 0, 0);
// count is the number of indices (4 per quad), not the size in bytes
glDrawElements(GL_QUADS, num_quad * 4, GL_UNSIGNED_SHORT, 0);

At least, that's what you would expect from reading the spec. And indeed this code may just render your pile of flat untextured polygons. On some cards.

This is where it gets fun. Buckle up.

Thar be dragons ahead

Let's start with the VBO init code. Actually that's almost perfect; however, a lone vertex position doesn't cut it. A float is 4 bytes, so a 3-float position is 12 bytes. We also want normals (another 12 bytes) and a texture coordinate (8 more bytes), bringing us to 32 bytes exactly, which is really convenient, as we will find out. But first, allow me to segue into skeletal animation techniques for a moment.

Skeletal Animation

So you have 2 bones connected by a joint; let's imagine a knee. Now when you bend your knee, say 90°, everything from a bit below the knee moves in a straightforward 100% of 90° motion. It's the points along your knee that are tricky: handle them naively and you get nasty artifacts like this...

So we need a method of partially rotating the points along your knee, depending on how close they are to the joint, or whatever; we don't care, that's the modeller's problem. What we need to care about is how it's done: with weights. Each bone has a list of the points it affects, and to what extent it affects each of them.

In the bad old scary days when this was all done in software, people used quaternions. Quaternions can be used much like matrices, in that they can store a rotation, and you can bash some vertices against a quaternion and it will spit out rotated vertices. Quaternions have the very cool property that you can interpolate between 2 of them and, depending on the method you use, the result will correctly follow the plane you want it to rotate in. Quaternions also use fewer operations than matrices do.

Anyhow, these days we have hardware that does matrix * vertex multiplication really fast, and the modern-day trick is to use these weights to do a weighted average of the points that are spit out by the separate bone matrices.
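To make the weighted averaging concrete, here is a minimal CPU-side sketch in C. The `vec3` and `mat4` types and the function names are made up for illustration (they are not from my engine), and the matrices are assumed to be column-major like OpenGL's. Each influencing bone transforms the original point, and the results are blended by weight.

```c
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { float x, y, z; } vec3;
typedef struct { float m[16]; } mat4;   /* column-major, as OpenGL expects */

/* Transform a point by a column-major 4x4 matrix (w is assumed to be 1). */
static vec3 mat4_transform_point(const mat4 *m, vec3 p)
{
    vec3 r;
    r.x = m->m[0] * p.x + m->m[4] * p.y + m->m[8]  * p.z + m->m[12];
    r.y = m->m[1] * p.x + m->m[5] * p.y + m->m[9]  * p.z + m->m[13];
    r.z = m->m[2] * p.x + m->m[6] * p.y + m->m[10] * p.z + m->m[14];
    return r;
}

/* Blend the positions produced by each influencing bone.
 * The weights are assumed to sum to 1 for a given vertex. */
static vec3 skin_vertex(vec3 p, const mat4 *bones,
                        const unsigned char *bone_idx,
                        const float *weight, size_t nbone)
{
    vec3 out = { 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < nbone; i++) {
        vec3 t = mat4_transform_point(&bones[bone_idx[i]], p);
        out.x += weight[i] * t.x;
        out.y += weight[i] * t.y;
        out.z += weight[i] * t.z;
    }
    return out;
}
```

With one bone left at identity and another translated 2 units along x, a point weighted 50/50 between them ends up moved 1 unit along x, which is exactly the "partial" motion we want around a joint.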

Back to our Vertex Buffer Object

So our nice friendly lovable artists have given us a model with a bunch of bones, each of which references a bunch of vertices and weights. We need to find out how many bones reference each vertex, and specifically the largest number of bones referencing any single vertex. For the particular build of MakeHuman I'm using models from, this is 6.
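Counting the influences is a simple pass over the bones. Here is a rough sketch, assuming a hypothetical `bone_t` that just holds a list of vertex indices (your exporter's format will differ):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bone record: the vertices this bone influences. */
typedef struct {
    size_t nvert;                /* how many vertices the bone affects */
    const unsigned short *vert;  /* indices of those vertices */
} bone_t;

/* Returns the largest number of bones referencing any single vertex,
 * or -1 if the temporary counter array could not be allocated. */
static int max_bones_per_vertex(const bone_t *bones, size_t nbone,
                                size_t num_vert)
{
    unsigned int *count;
    unsigned int max = 0;

    if (num_vert == 0)
        return 0;
    count = calloc(num_vert, sizeof *count);
    if (!count)
        return -1;

    for (size_t b = 0; b < nbone; b++) {
        for (size_t i = 0; i < bones[b].nvert; i++) {
            unsigned short v = bones[b].vert[i];
            if (v >= num_vert)
                continue;        /* ignore out-of-range references */
            if (++count[v] > max)
                max = count[v];
        }
    }
    free(count);
    return (int)max;
}
```

The result tells you how many weight and index slots each vertex entry needs.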

ATI recommends that your vertex entries are multiples of 32 bytes, to speed up fetch operations. This is sound advice no matter what hardware you're using. So the vertex entry I'm uploading looks like this...
12 bytes of Vertex (3 floats)
12 bytes of Normal (3 floats)
8 bytes of TexCoord (2 floats)
24 bytes of Bone Weights (6 floats)
6 bytes of Bone Indices (1 byte each)
1 byte of number of bones actually referencing this vert
1 byte of zero padding
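That list adds up to 64 bytes. Written as a C struct it might look like the sketch below. The struct and field names are mine, not from any header, and it assumes a 4-byte float with no compiler-inserted padding (true on the usual desktop compilers, but the static asserts will tell you if not).

```c
#include <stddef.h>   /* offsetof */
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical name; mirrors the 64-byte layout listed above. */
typedef struct {
    float         pos[3];          /* bytes  0..11 */
    float         normal[3];       /* bytes 12..23 */
    float         texcoord[2];     /* bytes 24..31 */
    float         bone_weight[6];  /* bytes 32..55 */
    unsigned char bone_index[6];   /* bytes 56..61 */
    unsigned char num_bones;       /* byte  62     */
    unsigned char pad;             /* byte  63     */
} skinned_vertex;

/* Fail the build if the compiler lays the struct out differently. */
static_assert(sizeof(skinned_vertex) == 64, "vertex entry must be 64 bytes");
static_assert(offsetof(skinned_vertex, normal) == 12, "normal offset");
static_assert(offsetof(skinned_vertex, texcoord) == 24, "texcoord offset");
static_assert(offsetof(skinned_vertex, bone_weight) == 32, "weight offset");
static_assert(offsetof(skinned_vertex, bone_index) == 56, "index offset");
```

The offsets checked here are the same numbers you hand to the gl*Pointer() calls as the start of each field within a 64-byte stride.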

Now that we have that settled, let's look at the final version of the VBO init code...
// VBO init code - Final
GLuint vbo = 0; // GLuint is unsigned; 0 is never a valid buffer name
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, num_vert * 64, vert_ptr, GL_STATIC_DRAW);

It's nice when things work out like that. Onto the Element Buffer. This is where things get a bit annoying. Here it is again to refresh your memory.
// Element Array Init Code
GLuint ebo = 0;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo); // pass the buffer name, not its address
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, quad_ptr, GL_STATIC_DRAW);

On my ATI 4870x2, that call to glBufferData(), while it does allocate the appropriate memory on the video card, doesn't actually copy any data. No problem, a quick read of the spec says we can do this...
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, num_quad * 8, quad_ptr);

No you can't. That call actually crashes the video driver on my card. Yeah not happy at all. All is not lost... here is the Element Buffer init code that actually works...
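// Element Array Init Code
GLuint ebo;
void *tmp;
glGenBuffers(1, &ebo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
// Allocate the storage without asking the driver to copy anything
glBufferData(GL_ELEMENT_ARRAY_BUFFER, num_quad * 8, NULL, GL_STATIC_DRAW);
// Map the buffer and copy the indices in by hand
tmp = glMapBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_WRITE_ONLY);
memcpy(tmp, quad_ptr, num_quad * 8);
glUnmapBuffer(GL_ELEMENT_ARRAY_BUFFER);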

Last Wave of Dragons, I Promise

Here's the code I'm using to draw my mesh...
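glBindBuffer(GL_ARRAY_BUFFER, vbo);
// With a VBO bound, the last argument is a byte offset into the buffer,
// passed as a pointer. The stride is the 64-byte vertex entry.
glVertexPointer(3, GL_FLOAT, 64, (const GLvoid *)0);
glNormalPointer(GL_FLOAT, 64, (const GLvoid *)12);
glTexCoordPointer(2, GL_FLOAT, 64, (const GLvoid *)24);
glVertexAttribPointer(bw1, 4, GL_FLOAT, GL_FALSE, 64, (const GLvoid *)32);
glVertexAttribPointer(bw2, 2, GL_FLOAT, GL_FALSE, 64, (const GLvoid *)48);
glVertexAttribPointer(bi1, 4, GL_BYTE, GL_FALSE, 64, (const GLvoid *)56);
glVertexAttribPointer(bi2, 4, GL_BYTE, GL_FALSE, 64, (const GLvoid *)60);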


glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, obj->ebo);

// count is the number of indices (4 per quad), not the size in bytes
glDrawElements(GL_QUADS, obj->nquad*4, GL_UNSIGNED_SHORT, 0);


glBindBuffer(GL_ARRAY_BUFFER, 0);

And here is the vertex shader I used...
uniform mat4 SkinMat[80];

attribute vec4 weight1;
attribute vec2 weight2;
attribute vec4 boneid1;
attribute vec4 boneid2;

varying vec4 pos;
varying vec3 normal;

void main()
{
 normal = gl_NormalMatrix * gl_Normal;

 pos = vec4(0,0,0,0);

 vec4 weight = weight1;
 vec2 w2 = weight2;
 vec4 bone = boneid1;
 vec4 b2 = boneid2;
 int nbone = int(boneid2.w);

 pos = gl_Vertex;

 for(int i=0; i<nbone; i++)
 {
   pos += weight.x * (SkinMat[int(bone.x)] * gl_Vertex);
   // Rotate variables
   weight = vec4(weight.yzw, w2.x);
   w2.x = w2.y;
   bone = vec4(bone.yzw, b2.x);
   b2.x = b2.y;
 }

 gl_FrontColor = weight1;
 gl_Position = gl_ModelViewProjectionMatrix * pos;
 pos = gl_ModelViewMatrix * gl_Vertex;
}

So it's important to mention that all of those glVertexAttribPointer() calls were preceded with...
bw1 = glGetAttribLocation(prog, "weight1");
bw2 = glGetAttribLocation(prog, "weight2");
bi1 = glGetAttribLocation(prog, "boneid1");
bi2 = glGetAttribLocation(prog, "boneid2");
SkinMatUnif = glGetUniformLocation(prog, "SkinMat");

Lastly I should mention this... looking at the spec may make you think that you can just throw whatever you want at the shader, and it's all good. Not so! While I'm sure one day there may be a device that supports every conceivable combination out there, something like...
glVertexAttribPointer(foo, 3, GL_BYTE, GL_FALSE, 64, 60);

Is going to end in tears. The short version is this... only expect it to work with the types that can be sent individually old-school style... as documented here. Want to send 3 GL_BYTEs to the video card? Is there a glVertexAttrib3b() function? No. Hope that helps.

Monday, March 7, 2011

Most Underwhelming Screenshot Ever

See that silhouette of a horribly broken foot? That's not just any broken foot, that's a fully hardware accelerated foot :-)


Yes I get what you're thinking... who cares? Well this particular foot is attached to a model exported from MakeHuman into an OpenGL Vertex Buffer Object, with the full armature and all skinning calculated on the GPU. I've spent the past few days banging my head against the ATI drivers wondering what crack they were smoking, and now I finally understand it all after writing a VBO-free version (that still uses the skinning vertex shader).


This 13k poly model (all quads), with most of the bone structure implemented (I've got 70 bones, MakeHuman exports about 120 last I checked), currently uses about a single percentage point of my CPU (I have a 4870x2, and as such, don't care how much GPU it's using), and the video card doesn't need to spin up its fans to deal with this. The model data totals about 1 MB of video memory. I still have to write some code to "unzip" the texture seams (VBOs can't have multiple texture coordinates per vertex). So, lots of work left, but all of the technical challenges have been conquered. Huzzah!
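For the curious, "unzipping" a seam just means duplicating any vertex that is used with more than one texture coordinate. I haven't written the real thing yet, so treat this as a rough sketch of the idea: `corner_t` and `unzip_seams()` are hypothetical names, and the linear search is the simplest thing that works rather than the fastest.

```c
#include <stddef.h>

/* Hypothetical input: one entry per face corner, pairing a position
 * index with a texture coordinate index, as many exporters store it. */
typedef struct { unsigned short pos, uv; } corner_t;

/* For every face corner, find or create a unique (pos, uv) pair and
 * write its index to index_out. `unique` and `index_out` must each have
 * room for ncorner entries. Returns the number of unique pairs, which is
 * the number of vertices the VBO will need. */
static size_t unzip_seams(const corner_t *corner, size_t ncorner,
                          corner_t *unique, unsigned short *index_out)
{
    size_t nunique = 0;
    for (size_t i = 0; i < ncorner; i++) {
        size_t j;
        /* Linear search: simple, and fine for a sketch. */
        for (j = 0; j < nunique; j++) {
            if (unique[j].pos == corner[i].pos &&
                unique[j].uv == corner[i].uv)
                break;
        }
        if (j == nunique)
            unique[nunique++] = corner[i];   /* first time we see this pair */
        index_out[i] = (unsigned short)j;
    }
    return nunique;
}
```

Each entry of `unique` then becomes one 64-byte vertex in the VBO, and `index_out` goes into the element buffer.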