
Inverse Compositional Lucas & Kanade


This is an assignment from CMU 16720-A. In this assignment, we implement an object tracker using the Inverse Compositional Lucas & Kanade algorithm. Trackers of this kind suffer from a problem called template drift, so I will implement a template-correction algorithm to fix it. Then, still based on the Lucas & Kanade algorithm but with an affine motion model, I will implement a moving-object detector.

Inverse Compositional Lucas & Kanade

The full derivation of the Lucas & Kanade algorithm is involved, so I will not reproduce it here. Essentially, we are given a video and an initial template image, and we want to keep tracking the object (the template) in every frame, updating the template as we go. The math behind the algorithm is that we look for warp parameters p that minimize the sum of squared differences between the warped image and the template:

    min_p Σ_x [ I(W(x; p)) − T(x) ]²

Here I is the image, and W(x; p) warps the pixel at position x (2D) by the parameters p (also 2D for a pure translation). So our goal is to find parameters that warp a patch of the image so as to minimize the pixel difference between the warped patch and the template. This objective is non-linear in p, because I is a non-parametric function of pixel coordinates, so it is hard to optimize directly. Instead, we assume a reasonably good initial guess of p and use a first-order Taylor approximation to make it linear; the resulting least-squares system can be solved in closed form (e.g. via SVD). But since each solve only gives an increment Δp, we keep updating p until convergence, much like gradient descent. The inverse compositional trick is to linearize around the template, so the gradient and Hessian are precomputed once instead of per iteration. Let's see a demo.
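As a sketch of how one iteration loop can look, here is a pure-NumPy inverse compositional update for the simplest case of a pure-translation warp. The function names and the bilinear sampler are my own, not from the assignment starter code:

```python
import numpy as np

def bilinear_shift(img, dy, dx):
    """Sample img at (row + dy, col + dx) with bilinear interpolation, clamping at edges."""
    H, W = img.shape
    r = np.clip(np.arange(H)[:, None] + dy, 0, H - 1)   # (H, 1) source rows
    c = np.clip(np.arange(W)[None, :] + dx, 0, W - 1)   # (1, W) source cols
    r0 = np.floor(r).astype(int); r1 = np.clip(r0 + 1, 0, H - 1)
    c0 = np.floor(c).astype(int); c1 = np.clip(c0 + 1, 0, W - 1)
    fr, fc = r - r0, c - c0
    top = img[r0, c0] * (1 - fc) + img[r0, c1] * fc
    bot = img[r1, c0] * (1 - fc) + img[r1, c1] * fc
    return top * (1 - fr) + bot * fr

def ic_lk_translation(template, image, iters=100, tol=1e-4):
    """Inverse compositional LK for the translation warp W(x; p) = x + p, p = (dy, dx)."""
    # Precompute template gradients and the Hessian ONCE -- the IC speed-up.
    gy, gx = np.gradient(template)
    sd = np.stack([gy.ravel(), gx.ravel()], axis=1)     # steepest-descent images
    Hinv = np.linalg.inv(sd.T @ sd)                     # fixed 2x2 Hessian inverse
    p = np.zeros(2)
    for _ in range(iters):
        warped = bilinear_shift(image, p[0], p[1])      # I(W(x; p))
        dp = Hinv @ (sd.T @ (warped - template).ravel())
        p -= dp                                         # compose with inverse increment
        if np.hypot(*dp) < tol:
            break
    return p

# Demo: recover a known (dy, dx) = (2.0, -1.5) shift on a smooth synthetic image.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
img = np.sin(xx / 7.0) + np.cos(yy / 5.0) + 0.3 * np.sin((xx + yy) / 9.0)
template = bilinear_shift(img, 2.0, -1.5)               # T(x) = I(x + p_true)
p = ic_lk_translation(template, img)
```

For translations, composing the current warp with the inverse of the increment warp reduces to `p -= dp`; for richer warps the composition is done on the warp matrices instead.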

Template correction

As you can see in the demo above, the tracking is not accurate. This is because we update the template image after processing each frame: whenever a frame introduces a small error, the template absorbs it, so the errors accumulate and we eventually lose the track. Here, we implement a template-correction algorithm based on https://www.ri.cmu.edu/publications/the-template-update-problem/. Basically, every time we process a frame and get a bounding box for the target, we re-align against the original first-frame template as a check. If the two bounding boxes are close, there was not much error in the tracking this frame, so we accept the update to the template image. Otherwise, we keep the previous template for this frame. Below is another demo after implementing this algorithm.
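A minimal sketch of the accept/reject rule is below. The `align` argument stands in for any LK-style alignment routine, and `eps` is a hypothetical drift threshold I chose for illustration, not a value from the paper:

```python
import numpy as np

def extract_patch(frame, rect, p):
    """Cut the template patch out of frame at rect = (y0, x0, h, w), offset by p."""
    y0, x0, h, w = rect
    y, x = int(round(y0 + p[0])), int(round(x0 + p[1]))
    return frame[y:y + h, x:x + w]

def update_template(align, first_template, cur_template, frame, p_prev, rect, eps=0.5):
    """Drift-corrected template update, in the spirit of 'The Template Update Problem'.

    align(template, frame, p_init) -> p is any LK-style solver (e.g. IC-LK above).
    """
    p_n = align(cur_template, frame, p_prev)        # track with the current template
    p_star = align(first_template, frame, p_n)      # re-check against frame-1 template
    if np.linalg.norm(p_star - p_n) <= eps:
        # The two solutions agree: little drift, safe to refresh the template.
        return p_star, extract_patch(frame, rect, p_star)
    # They disagree: drift suspected, keep the old template this frame.
    return p_n, cur_template

# Demo with stub aligners standing in for a real LK solver.
frame = np.arange(100.0).reshape(10, 10)
rect = (2, 2, 4, 4)
tmpl0 = extract_patch(frame, rect, np.zeros(2))

agree = lambda t, f, p0: np.array([1.0, 0.0])       # both templates find the same p
p1, t1 = update_template(agree, tmpl0, tmpl0, frame, np.zeros(2), rect)

answers = iter([np.array([1.0, 0.0]), np.array([3.0, 3.0])])
disagree = lambda t, f, p0: next(answers)           # first-frame template disagrees
p2, t2 = update_template(disagree, tmpl0, tmpl0, frame, np.zeros(2), rect)
```

In the first case the template is refreshed at the corrected position; in the second, the disagreement exceeds `eps`, so the old template is kept and no drift is absorbed.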

Moving object detector

We still use the Lucas & Kanade algorithm to implement this, but with two differences: instead of using only a patch of an image, we use the whole image frame, and instead of estimating a two-dimensional p, we estimate a six-parameter p representing an affine warp. Then, for each frame, we compute the affine matrix first and use it to warp the frame back onto the previous one. Next, we take the difference between the warped frame and the original frame and threshold it to get a mask. Pixels where the difference is large are not explained by the dominant (camera) motion, so the mask tells us where the moving objects are.
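Here is a sketch of the detection step, assuming the 2x3 affine matrix M has already been estimated by the affine LK solver; the threshold value is an arbitrary illustration, not a tuned parameter from the assignment:

```python
import numpy as np

def affine_warp(img, M):
    """Warp img by 2x3 affine M with bilinear sampling: out[y, x] = img[M @ (x, y, 1)]."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    sx = M[0, 0] * xs + M[0, 1] * ys + M[0, 2]      # source x-coordinates
    sy = M[1, 0] * xs + M[1, 1] * ys + M[1, 2]      # source y-coordinates
    valid = (sx >= 0) & (sx <= W - 1) & (sy >= 0) & (sy <= H - 1)
    sx = np.clip(sx, 0, W - 1); sy = np.clip(sy, 0, H - 1)
    x0 = np.floor(sx).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    fx, fy = sx - x0, sy - y0
    out = (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x1] * fx * (1 - fy)
         + img[y1, x0] * (1 - fx) * fy + img[y1, x1] * fx * fy)
    return out, valid

def motion_mask(prev, cur, M, thresh=0.2):
    """Flag pixels whose change is NOT explained by the dominant affine motion."""
    warped, valid = affine_warp(prev, M)
    return (np.abs(cur - warped) > thresh) & valid

# Demo: with identity motion, only the newly appeared square is flagged.
prev = np.zeros((10, 10))
cur = prev.copy(); cur[3:6, 3:6] = 1.0
M_identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mask = motion_mask(prev, cur, M_identity)
```

The `valid` mask discards pixels that warp from outside the frame, which would otherwise show up as spurious differences along the borders.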
