Video Capture Across Devices and Platforms

Working with AR technology starts rather mundanely with getting some sort of video from a video source, usually a USB webcam or the built-in camera of a phone. One would believe that is the easiest part. Trust me, it is not. There are more APIs and approaches out there than you care to learn about, everybody trying to do things differently or more easily. And for what it's worth, the underlying device drivers do the "right thing" sometimes, but not always. You have to deal with threads spawned by the underlying APIs and with whatever format the frames happen to arrive in. Almost always there are not enough controls for the camera parameters, and so on ...
To make matters worse for real-time applications like AR, some APIs block, some buffer like crazy, and the frame rate can almost never be set reliably, leaving only the option of measuring it post-hoc.
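Measuring it post-hoc at least is easy enough. Here is a minimal sketch in plain C++: timestamp every delivered frame and average over a sliding window. The grab_frame() below is just a stand-in for whatever capture API you happen to be fighting; here it simulates a roughly 30 fps source so the example is self-contained.

```cpp
#include <chrono>
#include <cstdio>
#include <deque>
#include <thread>

// Stand-in for whichever capture API delivers the next frame; here it just
// simulates a ~30 fps source so the example compiles and runs on its own.
static bool grab_frame() {
    std::this_thread::sleep_for(std::chrono::milliseconds(33));
    return true;
}

int main() {
    using clock = std::chrono::steady_clock;
    std::deque<clock::time_point> stamps;   // timestamps of the most recent frames
    const std::size_t window = 30;          // sliding window size

    for (int i = 0; i < 120 && grab_frame(); ++i) {
        stamps.push_back(clock::now());
        if (stamps.size() > window) stamps.pop_front();
        if (stamps.size() >= 2) {
            double seconds =
                std::chrono::duration<double>(stamps.back() - stamps.front()).count();
            std::printf("measured frame rate: %.1f fps\n",
                        (stamps.size() - 1) / seconds);
        }
    }
    return 0;
}
```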

The state of affairs can be briefly described as follows. On Windows, DirectShow was broken and still is, on multiple levels. The headers are broken even now if you try to use certain specific things. Sometimes drivers honor the formats set on the pins, sometimes not. For some time there has been WMF, which supposedly is the successor. But if you look at it, it's just DirectShow + DRM, with documentation telling us poor developers to stay clear of it if we want to capture video from a device.
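Just to give a flavour of what the supposedly simple job of listing capture devices looks like on that stack, here is a minimal DirectShow enumeration sketch: plain COM and C++, error handling kept to a bare minimum, and it stops at printing friendly names rather than touching any pins or formats.

```cpp
#include <windows.h>
#include <dshow.h>
#include <iostream>

#pragma comment(lib, "strmiids")
#pragma comment(lib, "ole32")
#pragma comment(lib, "oleaut32")

int main() {
    CoInitialize(nullptr);

    // Ask the system device enumerator for all video input devices.
    ICreateDevEnum* devEnum = nullptr;
    HRESULT hr = CoCreateInstance(CLSID_SystemDeviceEnum, nullptr, CLSCTX_INPROC_SERVER,
                                  IID_ICreateDevEnum, reinterpret_cast<void**>(&devEnum));
    if (SUCCEEDED(hr)) {
        IEnumMoniker* monikers = nullptr;
        hr = devEnum->CreateClassEnumerator(CLSID_VideoInputDeviceCategory, &monikers, 0);
        if (hr == S_OK) {  // S_FALSE means the category is empty
            IMoniker* moniker = nullptr;
            while (monikers->Next(1, &moniker, nullptr) == S_OK) {
                IPropertyBag* props = nullptr;
                if (SUCCEEDED(moniker->BindToStorage(nullptr, nullptr, IID_IPropertyBag,
                                                     reinterpret_cast<void**>(&props)))) {
                    VARIANT name;
                    VariantInit(&name);
                    if (SUCCEEDED(props->Read(L"FriendlyName", &name, nullptr)))
                        std::wcout << name.bstrVal << std::endl;
                    VariantClear(&name);
                    props->Release();
                }
                moniker->Release();
            }
            monikers->Release();
        }
        devEnum->Release();
    }
    CoUninitialize();
    return 0;
}
```

And that is before you even start negotiating formats on the output pin or building the actual capture graph.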
Let's move on to Mac OS X. Long ago there was the mess that is QuickTime capture, which is long gone. For a while we were supposed to use CoreVideo and QTKit, which actually worked reasonably well, until it too was marked deprecated. Even there, you had basically no control over setting a reasonably sized format or over the capturing thread. Now, as of Lion, we are supposed to use AVFoundation, and hell, is that thing broken again. It turns out that all the good web cameras - basically everything except the built-in iSights - have problems. Case in point: a Logitech camera squeezes the original image into whatever format you request from it. A Microsoft camera is basically unusable, as none of the automatic adjustments or previous settings are honored, leaving the video oversaturated. And did I mention there are no controls or API calls that allow an adjustment? And yes, the iSight or FaceTime camera is getting worse with every iteration. Nothing left of the marketing "magic".
Let's move over to Linux, and maybe I shouldn't get started on V4L2. But here we are. Depending on the distribution, the dollar-to-euro exchange rate, and last week's coin toss, it comes with the user-space conversion libraries or not, and is able to load firmware files or not. And if you try to avoid it, let's have a look at GStreamer, where yet another way of getting buffer probes has been introduced. Depending on the device you are using, the v4l2src element needs this or that property set to actually get the pipeline running. And don't get me started on the 1.0 incarnation. I haven't played with it, but I guess nothing has been done about any of this - though most likely the API has changed yet again.
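For reference, going at the V4L2 ioctls directly looks roughly like the sketch below. It opens /dev/video0 (adjust to taste), asks for 640x480 YUYV, and - importantly - reads back whatever the driver actually decided to hand out, because the request is only ever a suggestion.

```cpp
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>
#include <cstdio>

int main() {
    // Device node is an example; it may just as well be /dev/video1 on your box.
    int fd = open("/dev/video0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    v4l2_capability cap{};
    if (ioctl(fd, VIDIOC_QUERYCAP, &cap) < 0) { perror("VIDIOC_QUERYCAP"); close(fd); return 1; }
    printf("driver: %s, card: %s\n", cap.driver, cap.card);

    // Request a format; the driver is free to adjust width, height and pixel format.
    v4l2_format fmt{};
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    fmt.fmt.pix.field = V4L2_FIELD_ANY;
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0) { perror("VIDIOC_S_FMT"); close(fd); return 1; }

    // Always check what you actually got back, not what you asked for.
    printf("got %ux%u, fourcc %.4s\n", fmt.fmt.pix.width, fmt.fmt.pix.height,
           (char*)&fmt.fmt.pix.pixelformat);

    close(fd);
    return 0;
}
```

Whether the frames then arrive in a format your application can use without a user-space conversion library is, as said above, a matter of distribution and luck.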

If we look at the relevant mobile OSes, there are the iOS-specific quirks of AVFoundation and the crutches needed on Android. Even now, on Android, there is the trade-off of the preview-based capture interface, where you get a buffer some random time after it has appeared on the screen, or you have to pull off some magic to get native access to the relevant interfaces.

And all of this was for devices only. The whole exercise repeats itself for video files.

In light of all this, I have decided to cushion myself from the pain and revamp an old project of mine, into which I will put all the code from the capturing parts of SSTT.

Video Extractor