<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>More Than Technical</title>
	
	<link>http://www.morethantechnical.com</link>
	<description>On software, code, the internet and more.</description>
	<lastBuildDate>Mon, 23 Aug 2010 10:51:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MoreThanTechnical" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="morethantechnical" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://superfeedr.com/hubbub" /><item>
		<title>Combining OpenCV 2.x and Microsoft WPF [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/07/21/combining-opencv-2-x-and-microsoft-wpf-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/07/21/combining-opencv-2-x-and-microsoft-wpf-w-code/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 21:30:59 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[wpf]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=687</guid>
		<description><![CDATA[Hi, Just a quicky about OpenCV and Windows Presentation Framework interoperability. It&#8217;s really easy with a simple Managed C++ wrapper. What I&#8217;ll show is how to transfer an OpenCV cv::Mat into WPF&#8217;s BitmapSource. Let&#8217;s see how it&#8217;s done. First some framework. I&#8217;ll define a managed C++ (ref) class that holds everything WPF needs to make [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/07/opencv_mswpf.png" rel="lightbox[687]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/07/opencv_mswpf-300x162.png" alt="" title="opencv_mswpf" width="300" height="162" class="alignleft size-medium wp-image-692" /></a>Hi,<br />
Just a quicky about OpenCV and Windows Presentation Framework interoperability. It&#8217;s really easy with a simple Managed C++ wrapper. What I&#8217;ll show is how to transfer an OpenCV cv::Mat into WPF&#8217;s BitmapSource. Let&#8217;s see how it&#8217;s done.<br />
<span id="more-687"></span><br />
First some framework. I&#8217;ll define a managed C++ (ref) class that holds everything WPF needs to make a BitmapSource:</p>
<pre class="brush: plain;">
public ref class OpenCVImageWrapper {
	public:
		int width;
		int height;
		cli::array&lt;byte&gt;^ buf;
		int buf_size;
		int row_stride;
		int channels;
	};
</pre>
<p>That&#8217;s it, now let&#8217;s build a BitmapSource out of an instance of this class in C#:</p>
<pre class="brush: plain;">
OpenCVImageWrapper im = getFromOpenCV();
BitmapSource bs = BitmapSource.Create(
                im.width, im.height, 96, 96,
                System.Windows.Media.PixelFormats.Bgr24, //need to look out here: this is for BGR 24-bits images, you might need Gray8, Gray32Float or one of others...
                null, im.buf, im.row_stride);
imageOnScreen.Source = bs;
</pre>
<p>(imageOnScreen is an Image object I dragged and dropped into a window in the visual GUI builder)</p>
<p>Now we need a bit more managed C++ wizardry to fill up the wrapper, here it is:</p>
<pre class="brush: plain;">
OpenCVImageWrapper^ getFromOpenCV() {
Mat im = imread(&quot;someimage.jpg&quot;);

OpenCVImageWrapper^ i = gcnew OpenCVImageWrapper();

i-&gt;buf = gcnew cli::array&lt;byte&gt;(im.rows * im.step);
System::Runtime::InteropServices::Marshal::Copy(System::IntPtr::IntPtr(im.data),i-&gt;buf,0,im.rows * im.step);

i-&gt;width = im.cols;
i-&gt;height = im.rows;
i-&gt;row_stride = im.step;
i-&gt;channels = im.channels();
i-&gt;buf_size = im.rows * im.step;

return i;
}
</pre>
<p>We&#8217;re pretty much done. Basic stuff!</p>
<p>It&#8217;s not memory optimized as I&#8217;m copying the buffer, but it gets the job done.</p>
<p>Thanks<br />
Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F07%2F21%2Fcombining-opencv-2-x-and-microsoft-wpf-w-code%2F&amp;linkname=Combining%20OpenCV%202.x%20and%20Microsoft%20WPF%20%5Bw%2F%20code%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/07/21/combining-opencv-2-x-and-microsoft-wpf-w-code/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Image Recoloring using Gaussian Mixture Model and Expectation Maximization [OpenCV, w/Code]</title>
		<link>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/</link>
		<comments>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 15:34:59 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[expectation maximization]]></category>
		<category><![CDATA[gaussian]]></category>
		<category><![CDATA[mixture model]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=673</guid>
		<description><![CDATA[Hi, I&#8217;ll present a quick and simple implementation of image recoloring, in fact more like color transfer between images, using OpenCV in C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/eggplant_orange.png" alt="" title="eggplant_orange" width="654" height="187" class="alignleft size-full wp-image-684" /></a>Hi,<br />
I&#8217;ll present a quick and simple implementation of image recoloring, in fact more like color transfer between images, using OpenCV in C++ environment. The basis of the algorithm is learning the source color distribution with a GMM using EM, and then applying changes to the target color distribution. It&#8217;s fairly easy to implement with OpenCV, as all the &#8220;tools&#8221; are built in.</p>
<p>I was inspired by <a href="http://www.cs.tau.ac.il/~liors/research/papers/image_appearance_exploration.pdf">Lior Shapira&#8217;s work</a> that was presented in Eurographics 09 about image appearance manipulation, and a work  about recoloring for the colorblind by <a href="http://www.sciweavers.org/files/docs/2358/icassp_cvd_poster_pdf_4a383d1fb0.pdf">Huang et al</a> presented at ICASSP 09. Both works deal with color manipulation using Gaussian Mixture Models.</p>
<p>Let&#8217;s see how it&#8217;s done!<br />
<span id="more-673"></span></p>
<h2>A little theory</h2>
<p>I won&#8217;t bore you with the math, but just to get a hang of the idea, what we would like to do is learn how the colors in the source and target images are distributed. Naturally we can assume, like many other things in nature, the colors in a picture have a <a href="http://en.wikipedia.org/wiki/Normal_distribution">normal (Gaussian) distribution</a>, but we can go further by saying the distribution might have <strong>a few Gaussians</strong> describing it. This is called a <a href="http://en.wikipedia.org/wiki/Mixture_distribution">mixture distribution</a>, and it&#8217;s a very handy statistical tool. Mixtures can be estimated using a powerful and ubiquitous tool called <a href="http://en.wikipedia.org/wiki/Expectation_maximization">Expectation Maximization</a>, which I have previously covered. EM essentially tries to recover the mean (mu) and variance (sigma) of the Gaussians in the mixture, by iteratively checking the current hypothesis against the data until finally converging at an extremum.</p>
<h2>Learning the color model</h2>
<p>For the learning process we must set up the sample data. So we create a binary mask saying where in the image the model can find the colors to learn.<br />
&#8211;images&#8211;<br />
Then we scan the mask and for each foreground pixel we add it&#8217;s value (here, RGB, but basically can be anything) to the sample data. Finally we train the CvEM object, which contains the GMM parameters.</p>
<pre class="brush: plain;">
void TrainGMM(CvEM&amp; source_model, Mat&amp; source, Mat&amp; source_mask) {
		int src_samples_size = countNonZero(source_mask);
		Mat source_samples(src_samples_size,3,CV_32FC1);

		Mat source_32f;
		source_32f = source;

		int sample_count = 0;
		for(int y=0;y&lt;source.rows;y++) {
			Vec3f* row = source_32f.ptr&lt;Vec3f&gt;(y);  //pointer to pixel data in the row
			uchar* mask_row = source_mask.ptr&lt;uchar&gt;(y); //pointer to binary mask
			for(int x=0;x&lt;source.cols;x++) {
				if(mask_row[x] &gt; 0) {
					source_samples.at&lt;Vec3f&gt;(sample_count++,0) = row[x];
				}
			}
		}

		source_model.clear();
		CvEMParams ps(3/* = number of gaussians*/);
		source_model.train(source_samples,Mat(),ps,NULL);
}
</pre>
<p>What we have are three 3-dimensional (R,G,B) Gaussians, describing the colors in the selected area.</p>
<h2>Matching Gaussians</h2>
<p>Now we have a couple of GMMs &#8211; one for the target and one for the source. The idea is to take a pixel in the target, see how the 3 target Gaussians describe it, and shift it to use the 3 source Gaussians. This will, hopefully, cause its color to change from target to source. But we need to know which Gaussian in the target corresponds to which Gaussian in the source. I made a quick selection algorithm that greedily chooses a Gaussian for each Gaussian. I permutate the order of selection for the greedy algorithm, and pick the best permutation to get closer to the optimal selection.</p>
<pre class="brush: plain;">
vector&lt;int&gt; Recoloring::MatchGaussians(CvEM&amp; source_model, CvEM&amp; target_model) {
		int num_g = source_model.get_nclusters();
		Mat sMu(source_model.get_means());
		Mat tMu(target_model.get_means());
		const CvMat** target_covs = target_model.get_covs();
		const CvMat** source_covs = source_model.get_covs();

		double best_dist = std::numeric_limits&lt;double&gt;::max();
		vector&lt;int&gt; best_res(num_g);
		vector&lt;int&gt; prmt(num_g); 

		for(int itr = 0; itr &lt; 10; itr++) {
			for(int i=0;i&lt;num_g;i++) prmt[i] = i;	//make a permutation
			randShuffle(Mat(prmt));

			//Greedy selection
			vector&lt;int&gt; res(num_g);
			vector&lt;bool&gt; taken(num_g);
			for(int sg = 0; sg &lt; num_g; sg++) {
				double min_dist = std::numeric_limits&lt;double&gt;::max();
				int minv = -1;
				for(int tg = 0; tg &lt; num_g; tg++) {
					if(taken[tg]) continue;

					//TODO: can save on re-calculation of pairs - calculate affinity matrix ahead
					//double d = norm(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)),	tMu(Range(tg,tg+1),Range(0,3)));

					//symmetric kullback-leibler
					Mat diff = Mat(sMu(Range(prmt[sg],prmt[sg]+1),Range(0,3)) - tMu(Range(tg,tg+1),Range(0,3)));
					Mat d = diff * Mat(Mat(source_covs[prmt[sg]]).inv() + Mat(target_covs[tg]).inv()) * diff.t();
					Scalar tr = trace(Mat(
						Mat(Mat(source_covs[prmt[sg]])*Mat(target_covs[tg])) +
						Mat(Mat(target_covs[tg])*Mat(source_covs[prmt[sg]]).inv()) +
						Mat(Mat::eye(3,3,CV_64FC1)*2)
						));
					double kl_dist = ((double*)d.data)[0] + tr[0];
					if(kl_dist&lt;min_dist) {
						min_dist = kl_dist;
						minv = tg;
					}
				}
				res[prmt[sg]] = minv;
				taken[minv] = true;
			}

                       //total distance for the permutation
			double dist = 0;
			for(int i=0;i&lt;num_g;i++) {
				dist += norm(sMu(Range(prmt[i],prmt[i]+1),Range(0,3)),
							tMu(Range(res[prmt[i]],res[prmt[i]]+1),Range(0,3)));
			}
			if(dist &lt; best_dist) {
				best_dist = dist;
				best_res = res;
			}
		}

		return best_res;
	}
</pre>
<p>I used Symmetric Kullback-Leibler for the distance between Gaussians, as suggested by Huang et al.</p>
<h2>Applying the color</h2>
<p>Now all we have to do is use the method Shapira suggested in his work to transform a pixel&#8217;s color, from the Gaussians describing it.<br />
I&#8217;m only putting the essence of the code, the rest is in the file.</p>
<pre class="brush: plain;">
		Mat pr; Mat samp(1,3,CV_32FC1);
		for(int y=0;y&lt;target.rows;y++) {
			Vec3f* row = target_32f.ptr&lt;Vec3f&gt;(y);
			uchar* mask_row = target_mask.ptr&lt;uchar&gt;(y);
			for(int x=0;x&lt;target.cols;x++) {
				if(mask_row[x] &gt; 0) {
                                        //take pixel data
					memcpy(samp.data,&amp;(row[x][0]),3*sizeof(float)); 

                                        //Use the GMM to predict how close this pixel is to each gaussian
					float res = target_model.predict(samp,&amp;pr);

					Mat samp_64f; samp.convertTo(samp_64f,CV_64F);

                                        //Move the pixel to the new Gaussians
					//From Shapira09: Xnew = Sum_i { pr(i) * Sigma_source_i * (Sigma_target_i)^-1 * (x - mu_target) + mu_source }
					Mat Xnew(1,3,CV_64FC1,Scalar(0));
					for(int i=0;i&lt;num_g;i++) {
						if(((float*)pr.data)[i] &lt;= 0) continue;

                                               //For each Gaussian, subtract the original mean and add the target mean,
                                               //use probabilities to get a weighted average.
						Xnew += Mat((
							//Mat(source_covs[match[i]]) *
							//Mat(target_covs[i]).inv() *
							Mat(samp_64f - tMu_64f(Range(i,i+1),Range(0,3))).t() +
							sMu_64f(Range(match[i],match[i]+1),Range(0,3)).t()
							) * (double)(((float*)pr.data)[i])).t();
					}

                                        //Put pixel back into place
					Mat _tmp; Xnew.convertTo(_tmp,CV_32F);
					memcpy(&amp;(row[x][0]),_tmp.data,sizeof(float)*3);
				}
			}
		}
</pre>
<p>You might notice I skip the part of multiplying by the covariances matrices, as Shapira did. I found it produces better results, but it&#8217;s probably caused by a bug.</p>
<h2>Results</h2>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result1-300x83.png" alt="" title="recoloring_result1" width="300" height="83" class="aligncenter size-medium wp-image-675" /></a><br />
<a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result.png" rel="lightbox[673]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/recoloring_result-300x79.png" alt="" title="recoloring_result" width="300" height="79" class="aligncenter size-medium wp-image-674" /></a></p>
<h2>Code and stuff</h2>
<p>Source code is available in SVN repo:</p>
<pre class="brush: plain;">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMM_Recoloring recoloring
</pre>
<p>Images from Flickr (Creative Commons):</p>
<ul>
<li>http://www.flickr.com/photos/wwworks/2956622857/sizes/s/</li>
<li>http://www.flickr.com/photos/violentz/3199292482/sizes/s/</li>
<li>http://www.flickr.com/photos/davidw/164670455/sizes/m/</li>
<li>http://www.flickr.com/photos/djania/252225693/sizes/m/</li>
</ul>
<p>Now go recolor the world!</p>
<p>Thanks for tuning in..<br />
Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F06%2F24%2Fimage-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode%2F&amp;linkname=Image%20Recoloring%20using%20Gaussian%20Mixture%20Model%20and%20Expectation%20Maximization%20%5BOpenCV%2C%20w%2FCode%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/06/24/image-recoloring-using-gaussian-mixture-model-and-expectation-maximization-opencv-wcode/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Iterative Closest Point (ICP) for 2D curves with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/06/06/iterative-closest-point-icp-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/06/06/iterative-closest-point-icp-with-opencv-w-code/#comments</comments>
		<pubDate>Sun, 06 Jun 2010 07:59:37 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Solutions]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[curve fitting]]></category>
		<category><![CDATA[icp]]></category>
		<category><![CDATA[knn]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=666</guid>
		<description><![CDATA[ICP &#8211; Iterative closest point, is a very trivial algorithm for matching object templates to noisy data. It&#8217;s also super easy to program, so it&#8217;s good material for a tutorial. The goal is to take a known set of points (usually defining a curve or object exterior) and register it, as good as possible, to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/06/icp.png" rel="lightbox[666]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/06/icp.png" alt="" title="ICP" width="283" height="223" class="alignleft size-full wp-image-669" /></a>ICP &#8211; Iterative closest point, is a very trivial algorithm for matching object templates to noisy data. It&#8217;s also super easy to program, so it&#8217;s good material for a tutorial. The goal is to take a known set of points (usually defining a curve or object exterior) and register it, as good as possible, to a set of other points, usually a larger and noisy set in which we would like to find the object. The basic algorithm is described very briefly in <a href="http://en.wikipedia.org/wiki/Iterative_Closest_Point">wikipedia</a>, but there are a ton of papers on the subject.</p>
<p>I&#8217;ll take you through the steps of programming it with OpenCV.<br />
<span id="more-666"></span><br />
So the algorithm is basically</p>
<ol>
<li>Find the points which are closest to our curve</li>
<li>Compute the best rotation and translation between our points and the closest points</li>
<li>Move our points according to the found transformation, and check the error</li>
<li>Run another iteration until convergence.</li>
</ol>
<p>How easy is that?? OpenCV gives us the tools to do everything in there!</p>
<h2>Finding closest points</h2>
<p>In v2.x of OpCV they introduced the FLANN framework of range search methods. It&#8217;s fairly easy to use, and produces good results, although I did run into some problems with it before. Let&#8217;s see how it&#8217;s used:</p>
<pre class="brush: plain;">
float flann_knn(Mat&amp; m_destinations, Mat&amp; m_object, vector&lt;int&gt;&amp; ptpairs, vector&lt;float&gt;&amp; dists = vector&lt;float&gt;()) {
	// find nearest neighbors using FLANN
	cv::Mat m_indices(m_object.rows, 1, CV_32S);
	cv::Mat m_dists(m_object.rows, 1, CV_32F);

	Mat dest_32f; m_destinations.convertTo(dest_32f,CV_32FC2);
	Mat obj_32f; m_object.convertTo(obj_32f,CV_32FC2);

	assert(dest_32f.type() == CV_32F);

	cv::flann::Index flann_index(dest_32f, cv::flann::KDTreeIndexParams(2));  // using 2 randomized kdtrees
    flann_index.knnSearch(obj_32f, m_indices, m_dists, 1, cv::flann::SearchParams(64) ); 

    int* indices_ptr = m_indices.ptr&lt;int&gt;(0);
    //float* dists_ptr = m_dists.ptr&lt;float&gt;(0);
    for (int i=0;i&lt;m_indices.rows;++i) {
   		ptpairs.push_back(indices_ptr[i]);
    }

	dists.resize(m_dists.rows);
	m_dists.copyTo(Mat(dists));

	return cv::sum(m_dists)[0];
}
</pre>
<p>This code was practically ripped off OpenCV&#8217;s sample code, and worked straight up.. You can see I have 2 input matrices that define the &#8220;destination&#8221; points &#8211; these are the unknown, the &#8220;object&#8221; points &#8211; these are our points, and an output vector to mark the correspondence between the two sets. I also have a vector for distances between points of each pair.<br />
I define a FLANN index object using KD-tree and perform a KNN (K nearest neighbors) search on it, for all the object points. After invoking the search function, it&#8217;s all garnish.</p>
<h2>Compute transform</h2>
<p>With the point pairs given, computing the transform between the two sets should be easy. And it is. I started by computing it using least-squares to find the parameters of the transformation: rotation angle, translation in X and Y directions. But my solution was not &#8220;resistant&#8221; to scale, so I looked up another easy solution to the problem, and surely enough a simple solution was found in this <a href="http://www.cs.duke.edu/researchers/artificial_intelligence/temp/eggert_rigid_body_transformations.pdf">paper</a>. </p>
<pre class="brush: plain;">
void findBestReansformSVD(Mat&amp; _m, Mat&amp; _d) {
	Mat m; _m.convertTo(m,CV_32F);
	Mat d; _d.convertTo(d,CV_32F);

	Scalar d_bar = mean(d);
	Scalar m_bar = mean(m);
	Mat mc = m - m_bar;
	Mat dc = d - d_bar;

	mc = mc.reshape(1); dc = dc.reshape(1);

	Mat H(2,2,CV_32FC1);
	for(int i=0;i&lt;mc.rows;i++) {
		Mat mci = mc(Range(i,i+1),Range(0,2));
		Mat dci = dc(Range(i,i+1),Range(0,2));
		H = H + mci.t() * dci;
	}

	cv::SVD svd(H);

	Mat R = svd.vt.t() * svd.u.t();
	double det_R = cv::determinant(R);
	if(abs(det_R + 1.0) &lt; 0.0001) {
		float _tmp[4] = {1,0,0,cv::determinant(svd.vt*svd.u)};
		R = svd.u * Mat(2,2,CV_32FC1,_tmp) * svd.vt;
	}
#ifdef BTM_DEBUG
	//for some strange reason the debug version of OpenCV is flipping the matrix
	R = -R;
#endif
	float* _R = R.ptr&lt;float&gt;(0);
	Scalar T(d_bar[0] - (m_bar[0]*_R[0] + m_bar[1]*_R[1]),d_bar[1] - (m_bar[0]*_R[2] + m_bar[1]*_R[3]));

	m = m.reshape(1);
	m = m * R;
	m = m.reshape(2);
	m = m + T;// + m_bar;
	m.convertTo(_m,CV_32S);
}
</pre>
<p>I really just followed what they said in the paper: take the mean point off the two sets, build a correlation matrix from the distances, do <a href="http://en.wikipedia.org/wiki/Singular_value_decomposition">SVD</a> and use U and Vt for computing the rotation and translation. It actually works!</p>
<h2>Putting it together</h2>
<p>This is the easy part, just using these two functions and small while loop:</p>
<pre class="brush: plain;">
	while(true) {
		pair.clear(); dists.clear();
		double dist = flann_knn(destination, X, pair, dists);

		if(lastDist &lt;= dist) {
			X = lastGood;
			break;	//converged?
		}
		lastDist = dist;
		X.copyTo(lastGood);

		cout &lt;&lt; &quot;distance: &quot; &lt;&lt; dist &lt;&lt; endl;

		Mat X_bar(X.size(),X.type());
		for(int i=0;i&lt;X.rows;i++) {
			Point p = destination.at&lt;Point&gt;(pair[i],0);
			X_bar.at&lt;Point&gt;(i,0) = p;
		}

		ShowQuery(destination,X,X_bar);

		X = X.reshape(2);
		X_bar = X_bar.reshape(2);
		findBestReansformSVD(X,X_bar);
		X = X.reshape(1); // back to 1-channel
	}
</pre>
<p>This will iteratively search for points and move the curve until the curve doesn&#8217;t move anymore.</p>
<h2>Some proof</h2>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/0eBpBxCaYpE&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/0eBpBxCaYpE&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<h2>Code</h2>
<p>Available in the SVN repo:</p>
<pre class="brush: plain;">
svn checkout https://morethantechnical.googlecode.com/svn/trunk/ICP ICP
</pre>
<p>Thanks for joining!<br />
Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F06%2F06%2Fiterative-closest-point-icp-with-opencv-w-code%2F&amp;linkname=Iterative%20Closest%20Point%20%28ICP%29%20for%202D%20curves%20with%20OpenCV%20%5Bw%2F%20code%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/06/06/iterative-closest-point-icp-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Bust out your own graphcut based image segmentation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/#comments</comments>
		<pubDate>Wed, 05 May 2010 11:27:29 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[gmm]]></category>
		<category><![CDATA[graphcut]]></category>
		<category><![CDATA[segmentation]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=634</guid>
		<description><![CDATA[This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in C++ environment. Been wokring on my masters thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the &#8220;hottest fish in the fish-market&#8221; right now [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/GMM-GC-segmentation-300x125.png" alt="" title="GMM-GC-segmentation" width="300" height="125" class="alignleft size-medium wp-image-659" /></a><em>This is a tutorial on using Graph-Cuts and Gaussian-Mixture-Models for image segmentation with OpenCV in C++ environment.</em></p>
<p>Been wokring on my masters thesis for a while now, and the path of my work came across image segmentation. Naturally I became interested in Max-Flow Graph Cuts algorithms, being the &#8220;hottest fish in the fish-market&#8221; right now if the fish market was the image segmentation scene.</p>
<p>So I went looking for a CPP implementation of graphcut, only to find out that OpenCV already implemented it in v2.0 as part of their GrabCut impl. But I wanted to explore a bit, so I found this <a href="http://www.csd.uwo.ca/~olga/code.html">implementation by Olga Vexler</a>, which is build upon Kolmogorov&#8217;s framework for max-flow algorithms. I was also inspired by <a href="http://www.wisdom.weizmann.ac.il/~bagon/matlab.html">Shai Bagon&#8217;s usage example</a> of this implementation for Matlab.</p>
<p>Let&#8217;s jump in&#8230;<br />
<span id="more-634"></span></p>
<h2>Bit of Theory</h2>
<p>Before we move on, let&#8217;s dig in <strong>a little</strong> in the theory. We look at the picture as a set of nodes, where each pixel is node and is connected to its neighbors by edges and has a label &#8211; this can be called a <a href="http://en.wikipedia.org/wiki/Markov_random_field">Markov Random Field</a>. MRFs can be solved, i.e. give an optimal labeling for each node and thus an optimal labeling, in a number of ways, one of which being graph cuts based on <a href="http://en.wikipedia.org/wiki/Max-flow_min-cut_theorem">maximal flow</a>. After we label the graph, we expect to get a meaningful segmentation of the image. <a href="http://www.cs.cornell.edu/~rdz/Papers/SZSVKATR-ECCV06.pdf">This paper</a>, by some of the big names in the field (Vexler, Kolmogorov, Agarwala), explains it pretty throughly. There a number of well known segmentation methods that use graph cuts, such as: Lazy Snapping [04], GrabCut [04] and more.</p>
<p>The math in the articles is, as usual, pretty horrific. I like to keep things simple, so I&#8217;ll try to explain the method of GC-segmentation in a simple way. We all remember min cut &#8211; max flow algorithms from 2nd year CS, right? well segmentation using GC is not very different. The magic happens when we weight the nodes and edges in a meaningful way, thus creating meaningful cuts. The weights are usually spit to two terms: Data term (or cost) and Smoothness term. The data term says in simple words: &#8220;How happy this pixels is with that label&#8221;, and the smoothness term pretty much says &#8220;How easy can a label expand from this pixel to that neighbor&#8221;. So when you think about it, the easiest thing would be to put as the data term the likelyhood of a pixel to belong to some label, and for the smoothness term &#8211; just use the edges in the picture!</p>
<p>So anyway, back to the code, only thing left is to create a graph, give weights, and max the flow. Here&#8217;s a bit of code:</p>
<pre class="brush: plain;">
Mat im_small = imread(&quot;a_pic.jpg&quot;);
int num_lables = 3;

// create a grid type graph
GCoptimizationGridGraph gc(im_small.cols,im_small.rows,num_lables);
</pre>
<p>This piece of code created a directed grid graph where every pixel will be a vertex, and each pixel can have one of 3 lables (3 parts in the image to segment).</p>
<h2>GMM to the rescue!</h2>
<p>Now for the weighting. One very &#8220;standard&#8221; way to give a data term to the pixels is by using <a href="http://en.wikipedia.org/wiki/Mixture_model">Gaussian-Mixture-Models</a> (GMM): A method that fits a few gaussian distributions over an unknown probability function to estimate how it looks. In the spirit of keeping things simple, I won&#8217;t go into details. I&#8217;ll just say that it&#8217;s a tool to get the probablility of a pixel to belong to a cluster of other pixels, and it has built-in implementation in OpenCV, which is reason enough for me to use it. In OpenCV GMM models are named EM, which is kind of erroneous, since EM (<a href="http://en.wikipedia.org/wiki/Mixture_model#Expectation_maximization_.28EM.29">Expectation-Maximization</a>) is one of the best methods to estimate a GMM and not a GMM itself.</p>
<p>Using EM in OpenCV is really very easy:</p>
<pre class="brush: plain;">
CvEM model;
CvEMParams ps(3);

Mat lables;
Mat samples;
im.reshape(1,im.rows*im.cols).convertTo(samples,CV_32FC1,1.0/255.0);

model.train(samples,Mat(),ps,&amp;lables);

Mat probs = model.get_probs();
</pre>
<p>Here&#8217;s how it looks when training over the whole image as input data (you can see original image, labeling, minus log probability):</p>
<div id="attachment_639" class="wp-caption alignnone" style="width: 619px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" rel="lightbox[634]"><img class="size-full wp-image-639" title="em-clustering" src="http://www.morethantechnical.com/wp-content/uploads/2010/05/em-clustering.png" alt="" width="609" /></a><p class="wp-caption-text">Clustering using EM in OpenCV</p></div>
<p>But, this is not exactly what we wanted&#8230; Since we are dealing with segmentation here, we would like to segment certain area. The purpose of the GMM is to learn how that area looks, based on a small set of samples, and then predict the label for all the pixels in the image.</p>
<p>In GrabCut they pretty much create a GMM for every logical &#8220;cluster&#8221; they want to segment: Positively Background, Probably Background, Probably Foreground and Positively Foreground. This is a good idea and I will follow it, but again, I&#8217;m aiming not for Object Extraction rather for k-way segmentation. In other words I&#8217;m looking for a way to divide the image to a few areas that are significantly similar, and also not similar to the other areas.</p>
<p>To do that I will create a K-gaussians GMM for N areas (see the code, it&#8217;s a long one). I tried 2 versions, where I create a 1-gaussian GMM for each channel (RGB, etc.) of each area it&#8217;s called doEM1D(), and another one with K-gaussian and N-clusters GMM for each area. The results have varied:<br />
<div id="attachment_646" class="wp-caption alignnone" style="width: 612px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-gmm.png" alt="" title="Three 1-ch-1-gs GMMs" width="602" class="size-full wp-image-646" /></a><p class="wp-caption-text">Three 1 channel 1 Gaussian GMMs</p></div><br />
<div id="attachment_647" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-gmm.jpg" alt="" title="Three 3-ch-3-gs GMMs" width="600" class="size-full wp-image-647" /></a><p class="wp-caption-text">Three 3-channels 3-Gaussians per channel GMMs</p></div></p>
<p>This will provide us the data-term for our segmentation &#8211; each pixel can now say how comfortable it is with the label it got (we simply use the probability from the GMM).</p>
<h2>Play it smooth</h2>
<p>Right, moving on to the smoothness term. I mentioned before it would be easiest to just use the edges in the image. I use the Sobel filter, which gives a nice strong edge. Again we must look at each pixel&#8217;s value as the likelyhood to have an edge in it, so we should use -log to get it in nice big integers where the probability drops.</p>
<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/kid-edges.png" alt="" title="Edges in image" width="533" height="204" class="alignnone size-full wp-image-650" /></a></p>
<pre class="brush: plain;">
GaussianBlur(gray32f,gray32f,Size(11,11),0.75);

Sobel(gray32f,_tmp,-1,1,0,3);	//sobel for dx
_tmp1 = abs(_tmp);  //use abs value to get also the opposite direction edges
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));  //threshold the small edges...

double maxVal,minVal;
minMaxLoc(_tmp,&amp;minVal,&amp;maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt1,CV_32SC1);

Sobel(gray32f,_tmp,-1,0,1,3);	//sobel for dy
_tmp1 = abs(_tmp);
res.setTo(Scalar(0));
_tmp1.copyTo(res,(_tmp1 &gt; 0.2));

imshow(&quot;tmp1&quot;,res); waitKey();

minMaxLoc(_tmp,&amp;minVal,&amp;maxVal);
cv::log(res,_tmp);
_tmp = -_tmp * 0.17;
_tmp.convertTo(grayInt,CV_32SC1);
</pre>
<h2>Now put everything into a bowl and mix!</h2>
<p>And the last part of the process will be to put the labels probabilities per pixel and edges into the grid graph we created earlier:</p>
<pre class="brush: plain;">
GCoptimizationGridGraph gc(im.cols,im.rows,num_lables);

//Set the pixel-label probability
int N = im.cols*im.rows;
double log_1_3 = log(1.3);
for(int i=0;i&lt;N;i++) {
   double* ppt = probs.ptr&lt;double&gt;(i);
   for(int l=0;l&lt;num_lables;l++) {
      int icost = MAX(0,(int)floor(-log(ppt[l])/log2));
      gc.setDataCost(i,l,icost);
   }
}

//Set the smoothness cost
Mat Sc = 5 * (Mat::ones(num_lables,num_lables,CV_32SC1) - Mat::eye(num_lables,num_lables,CV_32SC1));
gc.setSmoothCostVH((int*)(Sc.data),(int*)dx.data,(int*)dy.data);

lables.create(N,1,CV_8UC1);

printf(&quot;\nBefore optimization energy is %d\n&quot;,gc.compute_energy());
gc.expansion(1);
printf(&quot;\nAfter optimization energy is %d\n&quot;,gc.compute_energy());

//Get the labeling back from the graph
for ( int  i = 0; i &lt; N; i++ )
   ((uchar*)(lables.data + lables.step * i))[0] = gc.whatLabel(i);
</pre>
<p>Easy. Now the labeling should give us a nice segmentation:<br />
<div id="attachment_651" class="wp-caption alignnone" style="width: 279px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3class-graphcut.png" alt="" title="3class-graphcut" width="269" height="205" class="size-full wp-image-651" /></a><p class="wp-caption-text">3 x 3-ch-3-gs GMMs labeling</p></div><br />
<div id="attachment_652" class="wp-caption alignnone" style="width: 277px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/1class-graphcut.png" alt="" title="1class-graphcut" width="267" height="203" class="size-full wp-image-652" /></a><p class="wp-caption-text">3 x 1-ch-1-gs GMMs labeling</p></div></p>
<p>But, there&#8217;s a lot of noise in the labeling&#8230; A good heuristic to apply will be to take only the largest connected-component of each label, and also try to the the component that is closest to the original marking.<br />
<div id="attachment_654" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label.png" alt="" title="Lables extraction " width="600" class="size-full wp-image-654" /></a><p class="wp-caption-text">Lables extraction without larget component keeping</p></div><br />
<div id="attachment_655" class="wp-caption alignnone" style="width: 610px"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" rel="lightbox[634]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/05/3-way-label-final.png" alt="" title="Labels extraction" width="600" class="size-full wp-image-655" /></a><p class="wp-caption-text">Lables extraction with largest component keeping</p></div></p>
<pre class="brush: plain;">
vector&lt;vector&lt;Point&gt;&gt; contours;
for(int itr=0;itr&lt;2;itr++) {

Mat mask = (_tmpLabels == itr); //Get a mask of this label

contours.clear();
//find the contours in that mask
cv::findContours(mask,contours,CV_RETR_EXTERNAL,CV_CHAIN_APPROX_NONE);

//compute areas
vector&lt;double&gt; areas(contours.size());
for(unsigned int ai=0;ai&lt;contours.size();ai++) {
	Mat _pts(contours[ai]);
	Scalar mp = mean(_pts);

	areas[ai] = contourArea(Mat(contours[ai]))/* add some bias here to get components that are closer to initial marking*/;
}

//find largest connected component
double max; Point maxLoc;
minMaxLoc(Mat(areas),0,&amp;max,0,&amp;maxLoc);

//draw back on mask
_tmpLabels.setTo(Scalar(3),mask);	//all unassigned pixels will have value of 3, later we'll turn them to &quot;background&quot; pixels

mask.setTo(Scalar(0)); //clear...
drawContours(mask,contours,maxLoc.y,Scalar(255),CV_FILLED);

//now that the mask has only the wanted component...
_tmpLabels.setTo(Scalar(itr),mask);

}
</pre>
<h2>Code and salutations</h2>
<p>As usual the code is available from the blog&#8217;s SVN:</p>
<pre class="brush: plain;">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/GMMGraphCutSegmentation GMMGraphCutSegmentation
</pre>
<p>Hey! We&#8217;re pretty much done! Glad you (and I) made it to the end, it wasn&#8217;t easy after all&#8230; I hope you learned something about GMMs and Graph-Cuts in OpenCV and in general. </p>
<p>BTW: The pictures are from Flickr, under creative commons license.<br />
<a href="http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg" rel="lightbox[634]">http://farm1.static.flickr.com/33/40406598_fd4e74d51c_d.jpg</a><br />
<a href="http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/">http://www.flickr.com/photos/willemvelthoven/56589010/sizes/m/in/pool-99557785@N00/</a></p>
<p>See ya!<br />
Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F05%2F05%2Fbust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code%2F&amp;linkname=Bust%20out%20your%20own%20graphcut%20based%20image%20segmentation%20with%20OpenCV%20%5Bw%2F%20code%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/05/05/bust-out-your-own-graphcut-based-image-segmentation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Congratulations! Roy is going to MIT</title>
		<link>http://www.morethantechnical.com/2010/03/30/congratulations-roy/</link>
		<comments>http://www.morethantechnical.com/2010/03/30/congratulations-roy/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 10:10:55 +0000</pubDate>
		<dc:creator>Arnon</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[media arts and science]]></category>
		<category><![CDATA[mit]]></category>
		<category><![CDATA[roy]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/2010/03/30/congratulations-roy/</guid>
		<description><![CDATA[I would like to congratulate my friend Roy, who got accepted to M.I.T in the Program in Media Arts and Sciences. Starting this September, Roy will be spending the next two years in Boston. I wish him all the best and luck. I&#8217;m sure this degree will provide some interesting posts to this blog]]></description>
			<content:encoded><![CDATA[<p>I would like to congratulate my friend Roy, who got accepted to M.I.T in the Program in Media Arts and Sciences.<br />
Starting this September, Roy will be spending the next two years in Boston.</p>
<p>I wish him all the best and luck.</p>
<p>I&#8217;m sure this degree will provide some interesting posts to this blog</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F03%2F30%2Fcongratulations-roy%2F&amp;linkname=Congratulations%21%20Roy%20is%20going%20to%20MIT"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/30/congratulations-roy/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Quick and Easy Head Pose Estimation with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 16:38:49 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[3d]]></category>
		<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[computer vision]]></category>
		<category><![CDATA[head pose]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=623</guid>
		<description><![CDATA[Hi Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods. I implemented a very quick &#38; dirty solution based [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" rel="lightbox[623]"><img class="alignleft size-full wp-image-624" style="border: 1px solid black; margin-right: 5px;" title="j5" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/j5.png" alt="" width="350" height="175" /></a>Hi</p>
<p>Just wanted to share a small thing I did with OpenCV &#8211; Head Pose Estimation (sometimes known as Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent <a href="http://people.ict.usc.edu/~gratch/CSCI534/Head%20Pose%20estimation.pdf" target="_blank">overview of almost all known methods</a>.</p>
<p>I implemented a very quick &amp; dirty solution based on OpenCV&#8217;s internal methods that produced surprising results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondence and then fitting of the points to the 3D model. OpenCV provides a magical method &#8211; solvePnP &#8211; that does this, given some calibration parameters that I completely disregarded.</p>
<p>Here&#8217;s how it&#8217;s done</p>
<p><span id="more-623"></span></p>
<h2>Intro</h2>
<p>I wanted to use solvePnP, since I saw how easy it was to use it when I was implementing the PTAM. It&#8217;s supposed to recover the 3D location and orientation of an object, given a 3D-2D feature correspondence and an initial guess. In fact the initial guess is not required, but the results when not using the guess are dreadful.</p>
<p>So I needed to get some 3D points on a human head. I downloaded a free model of a human head from the net, and used <a href="http://meshlab.sourceforge.net/" target="_blank">MeshLab </a>to mark some points on the model:</p>
<ol>
<li>Left ear</li>
<li>Right ear</li>
<li>Left eye</li>
<li>Right eye</li>
<li>Nose tip</li>
<li>Left mouth corner</li>
<li>Right mouth corner</li>
</ol>
<p>Then I headed to <a href="http://vis-www.cs.umass.edu/lfw/" target="_blank">LFW database</a> to get some pictures of celebrity heads. By mere accident I stumbled upon Angelina Jolie. The next step was to mark some points on Angelina&#8217;s pictures, according to the selected features. In places where the head hides an ear, I put a point in the estimated location of the ear.</p>
<h2>Time to Code</h2>
<p>First I initialize the 3D points vector, and a dummy camera matrix:</p>
<pre class="brush: plain;">
vector&lt;Point3f &gt; modelPoints;
modelPoints.push_back(Point3f(-36.9522f,39.3518f,47.1217f));    //l eye
modelPoints.push_back(Point3f(35.446f,38.4345f,47.6468f));              //r eye
modelPoints.push_back(Point3f(-0.0697709f,18.6015f,87.9695f)); //nose
modelPoints.push_back(Point3f(-27.6439f,-29.6388f,73.8551f));   //l mouth
modelPoints.push_back(Point3f(28.7793f,-29.2935f,72.7329f));    //r mouth
modelPoints.push_back(Point3f(-87.2155f,15.5829f,-45.1352f));   //l ear
modelPoints.push_back(Point3f(85.8383f,14.9023f,-46.3169f));    //r ear

op = Mat(modelPoints);
op = op / 35; //just a little normalization...
rvec = Mat(rv);
double _d[9] = {1,0,0,
          0,-1,0,
         0,0,-1}; //rotation: looking at -z axis
Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
tv[0]=0;tv[1]=0;tv[2]=1;
tvec = Mat(tv);
double _cm[9] = { 20, 0, 160,
           0, 20, 120,
             0,  0,   1 };  //&quot;calibration matrix&quot;: center point at center of picture with 20 focal length.
camMatrix = Mat(3,3,CV_64FC1,_cm);
</pre>
<p>Even though the &#8220;calibration&#8221; parameters are totally bogus they work pretty good.</p>
<p>Now, we&#8217;re all ready to start estimating some poses. So let&#8217;s use solvePnP:</p>
<pre class="brush: plain;">
vector&lt;Point2f &gt; imagePoints;

//read 2D points from file...
FILE* f;
fopen_s(&amp;f,&quot;points.txt&quot;,&quot;r&quot;);
for(int i=0;i&lt;7;i++) {
     int x,y;
     fscanf_s(f,&quot;%d&quot;,&amp;x); fscanf_s(f,&quot;%d&quot;,&amp;y);
     imagePoints.push_back(Point2f((float)x,(float)y));
}
fclose(f);&lt;/td&gt;

//make a Mat of the vector&lt;&gt;
Mat ip(imagePoints);

//display points on image
Mat img = imread(&quot;image.png&quot;);
for(unsigned int i=0;i&lt;imagePoints.size();i++) circle(img,imagePoints[i],2,Scalar(255,0,255),CV_FILLED);

//&quot;distortion coefficients&quot;... hah!
double _dc[] = {0,0,0,0};

//here's where the magic happens
solvePnP(op,ip,camMatrix,Mat(1,4,CV_64FC1,_dc),rvec,tvec,true);

//decompose the response to something OpenGL would understand.
//translation vector is irrelevant, only rotation vector is important
Mat rotM(3,3,CV_64FC1,rot);
Rodrigues(rvec,rotM);
double* _r = rotM.ptr&lt;double&gt;();
printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
          _r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
</pre>
<p>Alright, all done on the vision side, so I draw some 3D. As usual, I use a very simple GLUT program to display 3D in a hurry. Initialization is nothing special, so just one thing I think is special is using glutSoldCylinder and glutSolidTetrahedron to draw the axes:</p>
<pre class="brush: plain;">
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0,0,0,0,0,1,0,1,0); //cam looking at +z axis
glPushMatrix();
glTranslated(0,0,5); //go a bit back to where I want to draw the axes
glPushMatrix();

//this is the rotation matrix I got from solvePnP, so I will rotate accordingly to align with the face
double _d[16] = {       rot[0],rot[1],rot[2],0,
                rot[3],rot[4],rot[5],0,
                rot[6],rot[7],rot[8],0,
                0,         0,     0             ,1};
glMultMatrixd(_d);
glRotated(180,1,0,0); //rotate around to face the camera

//----------- Draw Axes --------------
//Z = red
glPushMatrix();
glRotated(180,0,1,0);
glColor3d(1,0,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//Y = green
glPushMatrix();
glRotated(-90,1,0,0);
glColor3d(0,1,0);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

//X = blue
glPushMatrix();
glRotated(-90,0,1,0);
glColor3d(0,0,1);
glutSolidCylinder(0.05,1,15,20);
glTranslated(0,0,1);
glScaled(.1,.1,.1);
glutSolidTetrahedron();
glPopMatrix();

glPopMatrix();
glPopMatrix();
//----------End axes --------------
</pre>
<p>That wasn&#8217;t too hard, huh? Awesome.</p>
<h2>So&#8230;. Results</h2>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ZDNH4BT5Do4&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<h2>Code</h2>
<p>You can grab the code from the SVN repo:</p>
<pre class="brush: plain;">

svn checkout http://morethantechnical.googlecode.com/svn/trunk/HeadPose
</pre>
<p>Enjoy!</p>
<p>Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F03%2F19%2Fquick-and-easy-head-pose-estimation-with-opencv-w-code%2F&amp;linkname=Quick%20and%20Easy%20Head%20Pose%20Estimation%20with%20OpenCV%20%5Bw%2F%20code%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/19/quick-and-easy-head-pose-estimation-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Implementing PTAM: stereo, tracking and pose estimation for AR with OpenCV [w/ code]</title>
		<link>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/</link>
		<comments>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/#comments</comments>
		<pubDate>Sat, 06 Mar 2010 16:53:11 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Recommended]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[opencv]]></category>
		<category><![CDATA[opengl]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[3d]]></category>
		<category><![CDATA[augmented reality]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=606</guid>
		<description><![CDATA[Hi Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [PDF]. This is a work by George Klein [homepage] and David Murray from Oxford university, presented in ISMAR 2007. When I first saw it on [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" rel="lightbox[606]"><img class="alignleft size-full wp-image-617" title="ptam" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/ptam.png" alt="" width="350" height="286" /></a>Hi</p>
<p>Been working hard at a project for school the past month, implementing one of the more interesting works I&#8217;ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [<a href="http://www.robots.ox.ac.uk/~gk/publications/KleinMurray2007ISMAR.pdf" target="_blank">PDF</a>]. This is a work by George Klein [<a href="http://www.robots.ox.ac.uk/~gk/" target="_blank">homepage</a>] and David Murray from Oxford university, presented in ISMAR 2007.</p>
<p>When I first saw it on youtube [<a href="http://www.youtube.com/watch?v=pBI5HwitBX4" target="_blank">link</a>] I immediately saw the immense potential &#8211; mobile markerless augmented reality. I thought I should get to know this work a bit more closely, so I chose to implement it as a part of advanced computer vision course, given by Dr. Lior Wolf [<a href="http://www.cs.tau.ac.il/~wolf/" target="_blank">link</a>] at TAU.</p>
<p>The work is very extensive, and clearly is a result of deep research in the field, so I set to achieve a few selected features: Stereo initialization, Tracking, and small map upkeeping. I chose not to implement relocalization and full map handling.</p>
<p>This post is kind of a tutorial for 3D reconstruction with OpenCV 2.0. I will show practical use of the functions in cvtriangulation.cpp, which are not documented and in fact incomplete. Furthermore I&#8217;ll show how to easily combine OpenCV and OpenGL for 3D augmentations, a thing which is only briefly described in the docs or online.</p>
<p>Here are the step I took and things I learned in the process of implementing the work.</p>
<p>Update: A nice patch by yazor fixes the video mismatching &#8211; thanks! and also a nice application by Zentium called &#8220;iKat&#8221; is doing some kick-ass <a href="http://gizmodo.com/5489946/ikat-augmented-reality-app-works-without-real+world-prompt">mobile markerless augmented reality</a>.<br />
<span id="more-606"></span></p>
<h2>Preparations&#8230;</h2>
<p>Before going straight to coding, I had to prepare a few things.</p>
<ul>
<li>A working compilation of OpenCV &#8211; not trivial with the new version 2.0.</li>
<li>A calibrated camera.</li>
<li>Test data</li>
</ul>
<p>Compiling OpenCV 2.0 proved to be a bit tricky. Even though the sourceforge project offers binary release for Win32, I compiled the whole thing from source. It turned out the binary release doesn&#8217;t contain .lib files, and anyway has compatibility issues between MS VS 2005 and 2008 &#8211; something about the embedded manifest [<a href="http://www.google.com/search?q=opencv+2.0+VS+2008+manifest+erro" target="_blank">google</a>]. I downloaded the freshest source from SVN, and compiled it, but it didn&#8217;t solve the debug-release problem, so I was left with using the release dlls even for debug evironment.</p>
<p>Initially I thought I&#8217;ll try an uncalibrated camera approach, but soon abandoned it. I had to calibrate my cameras, which I did  very easily using OpenCV&#8217;s &#8220;calibration.cpp&#8221;, which strangely is <strong>not built</strong> when building all examples &#8211; it has to be built manually. But everything went smoothly, and I soon got a calibration matrix (focal length, center of projection) and radial distortion coefficients.</p>
<h3>Getting Test Data</h3>
<p><object style="width: 480px; height: 295px;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="295" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" /><param name="align" value="left" /><embed style="width: 480px; height: 295px;" type="application/x-shockwave-flash" width="480" height="295" src="http://www.youtube.com/v/WXsufPbEUmM&amp;hl=en_US&amp;fs=1&amp;" align="left"></embed></object>For the test data I wanted to get a few views of a planar scene, where the first two views are separated only by a translation of ~5cm, as K&amp;M do in the PTAM article. This known translation is helpful when trying to triangulate the initial features in the scene. When you have prior knowledge of where the cameras are, you can simply intersect the epipolar lines between the two views and recover the 3D position of the points &#8211; up to a scale. Keep in mind you must also have feature correspondence: a point on image A must be correlated to a point in image B.</p>
<p>To achieve this I set up a small program that uses Optical Flow to track some 2D features in the scene, and grab a few screens + feature vectors. See &#8216;capture_data.cpp&#8217;.</p>
<h2>Stereo Initialization</h2>
<p>Now that I have 2 views with feature correspodence:</p>
<p><a rel="lightbox" href="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png"><img class="alignnone size-full wp-image-607" title="frames_correl" src="http://www.morethantechnical.com/wp-content/uploads/2010/03/frames_correl.png" alt="" width="634" height="259" /></a></p>
<p>I would like to triangulate the features. This is possible, as I discussed earlier, since I know the rotation (none), translation (5cm on -x axis) and camera calibration parameters (focal length, center of projection).</p>
<h3>Triangulation</h3>
<p>For triangulation, OpenCV has only recently added a couple of functions that implement triangulation [<a href="http://n2.nabble.com/An-implementation-of-the-Optimal-Triangulation-Method-td2295331.html" target="_blank">link</a>] as shown by Hartly &amp; Zisserman [<a href="http://users.cecs.anu.edu.au/~hartley/Papers/CVPR99-tutorial/tut_4up.pdf" target="_blank">PDF</a>, page 12]. However, these functions are not formally documented, and in fact they are missing some important parts. This is how I used cvTriangulation(), which is the key function:</p>
<pre class="brush: plain;">
//this function will initialize the 3D features from two views
void stereoInit() {

//first load camera intrinsic parameters
FileStorage fs(&quot;cam_work.out&quot;,CV_STORAGE_READ);
FileNode fn = fs[&quot;camera_matrix&quot;];
camera_matrix = Mat((CvMat*)fn.readObj(),true);

fn = fs[&quot;distortion_coefficients&quot;];
distortion_coefficients = Mat((CvMat*)fn.readObj(),true);

//vector&lt;Point2d&gt; points[2]; //these Point2d vectors hold the 2D features, double precision, from the 2 views

//get copy of points
_points[0] = points[0];
_points[1] = points[1];
Mat pts1M(_points[0]), pts2M(_points[1]); //very easy in OpenCV 2.0 to convert vector&lt;&gt; to Mat.

//Undistort points
Mat tmp,tmpOut;
pts1M.convertTo(tmp,CV_32FC2);  //undistort takes only floats not doubles, so convert to Point2f
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts1M,CV_64FC2);  //go back to double precision

pts2M.convertTo(tmp,CV_32FC2);
undistortPoints(tmp,tmpOut,camera_matrix,distortion_coefficients);
tmpOut.convertTo(pts2M,CV_64FC2);

vector&lt;uchar&gt; tri_status; //this will hold the status for each point, a good point will have 1, bad - 0

//now triangulate
triangulate(_points[0],_points[1],tri_status);

}

void triangulate(vector&lt;Point2d&gt;&amp; points1, vector&lt;Point2d&gt;&amp; points2, vector&lt;uchar&gt;&amp; status) {

	//Convert points to 1-channel, 2-rows, double precision - This is important - see the code
...

	Mat ___tmp(2,pts1Mt.cols,CV_64FC1,__d);
...
	Mat ___tmp1(2,pts2Mt.cols,CV_64FC1,__d1);
...

	CvMat __points1 = ___tmp, __points2 = ___tmp1;

	//projection matrices
	double P1d[12] = {	-1,0,0,0,
						0,1,0,0,
						0,0,1,0 };	//Identity, but looking into -z axis
	Mat P1m(3,4,CV_64FC1,P1d);
	CvMat* P1 = &amp;(CvMat)P1m;
	double P2d[12] = {	-1,0,0,-5,
						0,1,0,0,
						0,0,1,0 };  //Identity rotation, 5cm -x translation, looking into -z axis
	Mat P2m(3,4,CV_64FC1,P2d);
	CvMat* P2 = &amp;(CvMat)P2m;

	float _d[1000] = {0.0f};
	Mat outTM(4,points1.size(),CV_32FC1,_d);
	CvMat* out = &amp;(CvMat)outTM;

//using cvTriangulate with the created structures
	cvTriangulatePoints(P1,P2,&amp;__points1,&amp;__points2,out);

//we should check the triangulation result by reprojecting 3D-&gt;2D and checking distance
	vector&lt;Point2d&gt; projPoints[2] = {points1,points2};

	double point2D_dat[3] = {0};
	double point3D_dat[4] = {0};
	Mat twoD(3,1,CV_64FC1,point2D_dat);
	Mat threeD(4,1,CV_64FC1,point3D_dat);

	Mat P[2] = {Mat(P1),Mat(P2)};

	int oc = out-&gt;cols, oc2 = out-&gt;cols*2, oc3 = out-&gt;cols*3;

	status = vector&lt;uchar&gt;(oc);

	//scan all points, reproject 3D-&gt;2D, and keep only good ones
	for(int i=0;i&lt;oc;i++) {
		double W = out-&gt;data.fl[i+oc3];
        point3D_dat[0] = out-&gt;data.fl[i] / W;
        point3D_dat[1] = out-&gt;data.fl[i+oc] / W;
        point3D_dat[2] = out-&gt;data.fl[i+oc2] / W;
        point3D_dat[3] = 1;

        bool push = true;
        /* !!! Project this point for each camera */
        for( int currCamera = 0; currCamera &lt; 2; currCamera++ )
        {
            //reproject! using the P matrix of the current camera
			twoD = P[currCamera] * threeD;

            float x,y;
            float xr,yr,wr;
 	x = (float)projPoints[currCamera][i].x;
	y = (float)projPoints[currCamera][i].y;

            wr = (float)point2D_dat[2];
            xr = (float)(point2D_dat[0]/wr);
            yr = (float)(point2D_dat[1]/wr);

            float deltaX,deltaY;
            deltaX = (float)fabs(x-xr);
            deltaY = (float)fabs(y-yr);

			//printf(&quot;error from cam %d (%.2f,%.2f): %.6f %.6f\n&quot;,currCamera,x,y,deltaX,deltaY);

			if(deltaX &gt; 0.01 || deltaY &gt; 0.01) {
				push = false;
			}
        }
		if(push) {
			// A good 3D reconstructed point, add to known world points

			double s = 7;
			Point3d p3d(point3D_dat[0]/s,point3D_dat[1]/s,point3D_dat[2]/s);
			//printf(&quot;%.3f %.3f %.3f\n&quot;,p3d.x,p3d.y,p3d.z);
			points1Proj.push_back(p3d);
			status[i] = 1;
		} else {
			status[i] = 0;
		}

	}
}
</pre>
<p>OK, now that I have (hopefully) triangulated 3D features from the initial state: 2 views of a planar scene with 5cm translation on the X axis &#8211; I can move on the pose estimation.</p>
<h2>Pose Estimation</h2>
<p>Theoretically, if I know the 3D position of features in the world and their respective 2D position in the image, it should be easy to recover the position of the camera, because there are a rotation matrix and translation vector that define this transformation. Practically in OpenCV, finding the position of an object using 3D-2D correlation is done by using the solvePnP() [<a href="http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html#solvepnp" target="_blank">link</a>] function.</p>
<p>Since I have an initial guess of the rotation and translation &#8211; from the first 2 frames &#8211; I can &#8220;help&#8221; the function estimate the new ones.</p>
<pre class="brush: plain;">
void findExtrinsics(vector&lt;Point2d&gt;&amp; points, vector&lt;double&gt;&amp; rv, vector&lt;double&gt;&amp; tv) {
	//estimate extrinsics for these points

	Mat rvec(rv),tvec(tv);

//initial &quot;guess&quot;, in case it wasn't already supplied
	if(rv.size()!=3) {
		rv = vector&lt;double&gt;(3);
		rvec = Mat(rv);
		double _d[9] = {1,0,0,
						0,-1,0,
						0,0,-1};
		Rodrigues(Mat(3,3,CV_64FC1,_d),rvec);
	}
	if(tv.size()!=3) {
		tv = vector&lt;double&gt;(3);
		tv[0]=0;tv[1]=0;tv[2]=0;
		tvec = Mat(tv);
	}

	//create a float rep  of points
	vector&lt;Point2f&gt; v2(points.size());
	Mat tmpOut(v2);
	Mat _tmpOut(points);
	_tmpOut.convertTo(tmpOut,CV_32FC2);

	solvePnP(points1projMF,tmpOut,camera_matrix,distortion_coefficients,rvec,tvec,true);

	printf(&quot;frame extrinsic:\nrvec: %.3f %.3f %.3f\ntvec: %.3f %.3f %.3f\n&quot;,rv[0],rv[1],rv[2],tv[0],tv[1],tv[2]);

//the output of the function is a Rodrigues form of rotation, so convert to regular rot-matrix
	Mat rotM(3,3,CV_64FC1); ///,_r);
	Rodrigues(rvec,rotM);
	double* _r = rotM.ptr&lt;double&gt;();
	printf(&quot;rotation mat: \n %.3f %.3f %.3f\n%.3f %.3f %.3f\n%.3f %.3f %.3f\n&quot;,
		_r[0],_r[1],_r[2],_r[3],_r[4],_r[5],_r[6],_r[7],_r[8]);
}
</pre>
<p>After getting the extrinsic parameters of the camera, the next step is plugging in the visualization!</p>
<h2>Integrating OpenGL</h2>
<p>Generally, it should be possible to create a 3D scene that matches exactly the true world scene, where the triangulated features appear in the scene aligned exactly with the world. I was not able to achieve that, but I got pretty close:<br />
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/Q1HVjAWls_E&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>It&#8217;s basically what you do in augmented reality, you align the virtual camera&#8217;s position and rotation with the results you get from the vision part of the system. In the pose estimation we ended with a 3D rotation vector (Rodrigues form) and 3D translation vector which is used as-is, so only the rotation vector should be converted to 3&#215;3 matrix using the Rodrigues() function.</p>
<p>This is the OpenGL glut display() function that draws the scene:</p>
<pre class="brush: plain;">
void display(void)
{
	glClearColor(1.0f, 1.0f, 1.0f, 0.5f);
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);	// Clear Screen And Depth Buffer

	//draw the background - the frame from the camers
	glMatrixMode(GL_PROJECTION);
	glPushMatrix();
	gluOrtho2D(0.0,352.0,288.0,0.0);
	glMatrixMode(GL_MODELVIEW);
	glPushMatrix();
	glDisable(GL_DEPTH_TEST);
	glDrawPixels(352,288,GL_RGB,GL_UNSIGNED_BYTE,backPxls.data);
	glEnable(GL_DEPTH_TEST);
	glPopMatrix();
	glMatrixMode(GL_PROJECTION);
	glPopMatrix();

    const double t = glutGet(GLUT_ELAPSED_TIME) / 1000.0;
	a = t*20.0;

	glMatrixMode(GL_MODELVIEW);
	glLoadIdentity();

//use the camera position 3D vector
	curCam[0] = cam[0]; curCam[1] = cam[1]; curCam[2] = cam[2];
//there seems to be some kind of offset...
	glTranslated(-curCam[0]+0.5,-curCam[1]+0.7,-curCam[2]);

//and the 3x3 rotation matrix
	double _d[16] = {	rot[0],rot[1],rot[2],0,
						rot[3],rot[4],rot[5],0,
						rot[6],rot[7],rot[8],0,
						0,	   0,	  0		,1};
	glMultMatrixd(_d);

//flip the rotation on the x-axis
	glRotated(180,1,0,0);

	//draw the 3D feature points
	glPushMatrix();
	glColor4d(1.0,0.0,0.0,1.0);
	for(unsigned int i=0;i&lt;points1Proj.size();i++) {
		glPushMatrix();
glTranslated(points1Proj[i].x,points1Proj[i].y,points1Proj[i].z);
		glutSolidSphere(0.03,15,15);
		glPopMatrix();
	}
	glPopMatrix();

	glutSwapBuffers();

	if(!running) {
		glutLeaveMainLoop();
	}

	Sleep(25);
}
</pre>
<p>This pretty much coveres my work, in a very concise way. The complete source code will reveal all I have done, and will provide a better copy-and-paste ground for your own projects.</p>
<h2>Things not covered in this work</h2>
<p>Initially I tried to implement a very crucial part of the PTAM work &#8211; pairing the 3D map with 2D features in the image. This allows them to re-align the map in every frame (when the tracking is bad) so the pose estimation does not &#8220;loose grip&#8221;. In essence, they keep a visual identity for each map feature, very similar to a descriptor like SURF or SIFT, so at any point they can find where in the new image are the features and recover the camera pose from the 2D-3D correspondence. I ran into a problem utilizing OpenCV&#8217;s SURF functionality, it seems to have a bug when trying to compute the descriptor for user-given feature points.</p>
<p>Another thing I chose not to implement is creating a full map of the surroundings. I wanted to achieve a simple working solution for a small map (essentially a single frame), and see how it works. In the original work by K&amp;M they constantly add more and more features to the map untill it has covered the whole surrounding room.</p>
<h2>Code and Working the Program</h2>
<p>As usual my code is available for checkout from the blog&#8217;s SNV repo:</p>
<pre class="brush: plain;">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/ptam ptam
</pre>
<p>To get the stereo initialization you must press [spacebar] twice: Once when the camera has stabilized and the features are stable, and another time when the camera has translated and again stabilized.<br />
This marks the 2 keyframes that will be used for stereo init and triangulation.<br />
From that point on, the 3D scene will start and the track-and-estimate stage begins. Try not to move the camera violently as the optical flow may suffer.</p>
<p>Thanks Lior for your help getting the hang of these subjects, and the opportunity to meddle with a subject I long gone wanted to explore.</p>
<p>I hope everyone will enjoy and learn from my enjoyment and learning.</p>
<p>Bye!</p>
<p>Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F03%2F06%2Fimplementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code%2F&amp;linkname=Implementing%20PTAM%3A%20stereo%2C%20tracking%20and%20pose%20estimation%20for%20AR%20with%20OpenCV%20%5Bw%2F%20code%5D"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/03/06/implementing-ptam-stereo-tracking-and-pose-estimation-for-ar-with-opencv-w-code/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>iPhone OS 3.x Raw data of camera frames</title>
		<link>http://www.morethantechnical.com/2010/02/27/iphone-os-3-x-raw-data-of-camera-frames/</link>
		<comments>http://www.morethantechnical.com/2010/02/27/iphone-os-3-x-raw-data-of-camera-frames/#comments</comments>
		<pubDate>Sat, 27 Feb 2010 20:44:32 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Mobile phones]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[vision]]></category>
		<category><![CDATA[frame grabbing]]></category>
		<category><![CDATA[iphone]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=599</guid>
		<description><![CDATA[Hi All It looks like it&#8217;s finally here &#8211; a way to grab the raw data of the camera frames on the iPhone OS 3.x. Update: Apple officially supports this in iOS 4.x using AVFoundation, here&#8217;s sample code from Apple developer. A gifted hacker named John DeWeese was nice enough to comment on a post [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.morethantechnical.com/wp-content/uploads/2010/02/iphone-os-3-STEWIE.png" rel="lightbox[599]"><img src="http://www.morethantechnical.com/wp-content/uploads/2010/02/iphone-os-3-STEWIE-300x212.png" alt="" title="&quot;where is my data?!&quot;" width="300" height="212" class="alignleft size-medium wp-image-601" /></a>Hi All</p>
<p>It looks like it&#8217;s finally here &#8211; a way to grab the raw data of the camera frames on the iPhone OS 3.x. </p>
<p><strong>Update</strong>: Apple officially supports this in iOS 4.x using AVFoundation, <a href="http://developer.apple.com/iphone/library/qa/qa2010/qa1702.html#TNTAG1">here&#8217;s</a> sample code from Apple developer.</p>
<p>A gifted hacker named <a href="http://deweeeese.blogspot.com/">John DeWeese</a> was nice enough to comment on <a href="http://www.morethantechnical.com/2009/05/06/iphone-camera-frame-grabbing-and-a-real-time-meanshift-tracker/">a post from May 09&#8242;</a> with <a href="http://deweeeese.blogspot.com/2010/02/processing-iphone-camera-video-on.html">his method of hacking the APIs to get the frames</a>. Though cumbersome, it looks like it should work, but I haven&#8217;t tried it yet. I promise to try it soon and share my results.</p>
<p>Way to go John!<br />
Some code would be awesome&#8230;</p>
<p>Roy.</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F02%2F27%2Fiphone-os-3-x-raw-data-of-camera-frames%2F&amp;linkname=iPhone%20OS%203.x%20Raw%20data%20of%20camera%20frames"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/02/27/iphone-os-3-x-raw-data-of-camera-frames/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SmartHome – Embedded computing course project</title>
		<link>http://www.morethantechnical.com/2010/02/21/smarthome-embedded-computing-course-project/</link>
		<comments>http://www.morethantechnical.com/2010/02/21/smarthome-embedded-computing-course-project/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 16:42:09 +0000</pubDate>
		<dc:creator>Roy</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Website]]></category>
		<category><![CDATA[code]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[arm]]></category>
		<category><![CDATA[embedded]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[swt]]></category>
		<category><![CDATA[zigbee]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=580</guid>
		<description><![CDATA[Hi In the past few weeks I have been working hard at a few projects for end-of-term at Uni. One of the projects is what I called &#8220;SmartHome&#8221;, for Embedded computing [link] course, is a home monitoring [link] application. In the course the students were given an LPC2148 arm7-MCU (NXP) based education board, implemented by [...]]]></description>
			<content:encoded><![CDATA[<p>Hi<br />
In the past few weeks I have been working hard at a few projects for end-of-term at Uni. One of the projects is what I called &#8220;SmartHome&#8221;, for Embedded computing [<a href="http://www.tau.ac.il/~stoledo/Courses/Embedded/announcement.html" target="_blank">link</a>] course, is a home monitoring [<a href="http://en.wikipedia.org/wiki/Home_automation" target="_blank">link</a>] application. In the course the students were given an LPC2148 arm7-MCU (NXP) based education board, implemented by Embedded Artists [<a href="http://www.embeddedartists.com/products/education/edu_2148.php" target="_blank">link</a>]. My partner Gil and I decided to work with ZigBee extension modules [<a href="http://en.wikipedia.org/wiki/ZigBee" target="_blank">link</a>] to enable remote communication.</p>
<p>Here are the steps we took to bring this project to life.<br />
<span id="more-580"></span></p>
<h2>The Idea</h2>
<p>Our vision is to create a home monitoring and controlling system, that will enable tracking different sensors around the house and also control switches.The system should be centralized by a master controller, and also wireless so it will not need cumbersome wiring throughout the house. We would also like the system to be easily controlled by a PC, so visual information could be displayed to the user, as well as allow manual control of the electronic switches.</p>
<p>Such a system will be able to automatically:</p>
<ul>
<li>Turn off the garden lighting when the light outside is bright enough,</li>
<li>Control the lawn watering system,</li>
<li>Control air conditioning in the house according to the temperature,</li>
<li>etc.</li>
</ul>
<h2>The Hardware</h2>
<p style="text-align: left;">As I mentioned, we were given an LPC2148 education board [<a href="http://www.embeddedartists.com/products/education/edu_2148.php" target="_blank">link</a>], implemented by Embedded Artists, that boasts a 12Mhz ARM CPU, and many peripheral subsystems. Among the systems are: LCD screen, numerous LEDs, a LED matrix, a fan, analog dials, USB and UART I/O and more. The boards also has a port for connecting a ZigBee module to enable RF communication, but it doesn&#8217;t contain the actual module. Since we needed remote communication between our stations, we bought 3 XBee modules from Maxstream [<a href="http://www.digi.com/products/wireless/point-multipoint/xbee-series1-module.jsp#overview" target="_blank">link</a>].</p>
<p style="text-align: center;"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image096.jpg" rel="lightbox[580]"><img class="size-medium wp-image-586 alignnone" title="XBee module by MaxStream" src="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image096-300x225.jpg" alt="" width="210" height="158" /></a><a href="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image097.jpg" rel="lightbox[580]"> <img class="size-medium wp-image-588 alignnone" title="LPC2148 board by Embedded Artists" src="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image097-300x225.jpg" alt="" width="210" height="158" /></a><a href="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image098.jpg" rel="lightbox[580]"> <img class="alignnone size-medium wp-image-589" title="XBee module on LPC2148 board" src="http://www.morethantechnical.com/wp-content/uploads/2010/02/Image098-300x225.jpg" alt="" width="210" height="158" /></a></p>
<h2>The Software</h2>
<p>The LPC boards can be programmed very easily using tools provided by Embedded Artists, such as GCC with Newlib for compiling (both Win and Linux), and <a href="http://www.flashmagictool.com/" target="_blank">Flashmagic</a> (for windows) or lpc21isp for loading the compiled program. We used these tools for programing the &#8220;embedded&#8221; part of the application, for the PC client we used Java with <a href="http://www.eclipse.org/swt/" target="_blank">SWT</a> and serial port connectivity (<a href="http://rxtx.qbang.org/wiki/index.php/Download" target="_blank">RxTx</a> for Windows and Linux).</p>
<h3>Embedded program</h3>
<p>The embedded program on the LPC board has two parts: A master and A client. The system has only one master, and up to 3 clients. The master gathers information from the clients, and controls their behavior according to the user requests. For communication between the master and clients we created a communication protocol that has only a few simple messages:</p>
<ul>
<li>INIT &#8211; The master sends this request to initialize the connection between itself and a client.</li>
<li>ACK &#8211; This is the response for every request.</li>
<li>POLL &#8211; The clients answers this request with the data from all it&#8217;s sensors.</li>
<li>OPERATE &#8211; The master commands the client to switch something on or off.</li>
<li>ERROR &#8211; A general error response.</li>
</ul>
<p>All commands / responses have a unified structure described here:</p>
<pre class="brush: plain;">
General struct:
 ___________________________________________ _____
|    OP   | To | From | &lt;----- DATA ------&gt; | EOM
|_________|____|______|_____________________|_____

Data:

POLL (response)
_ ____________ ______ ______ ________
 | Temperature| ADC0 | ADC1 | Button |
_|____________|______|______|________|

OPERATE
_ _________
 | BITMASK |
_|_________|
</pre>
<p>This way we had a very easy implementation of the communication module, since we always had to look for only 14 bytes on the wire.</p>
<h3>Communication with ZigBee module</h3>
<p>To jumpstart our implementation we used the examples provided by Embedded Artists for operating the ZigBee module [<a href="http://www.embeddedartists.com/support/LPC2148_EDU/XBee_example.zip" target="_blank">link</a>]. The communication with the ZigBee module is on the UART1 port of the MCU. The example code takes care of opening the correct GPIO pins, setting the IRQ masks to enable interrupts and provides a very simple API for transmitting and receiving characters with the XBee over UART  (described in uart.h and uart.c files).</p>
<p>We took the XBee example code and expanded it to be able to recieve and send data between two stations. That means setting up each station&#8217;s module with it&#8217;s ID, address, channel and target address by AT commands [<a href="http://ftp1.digi.com/support/documentation/90000982_B.pdf" target="_blank">spec</a>, see page. 28]: ATID, ATCH, ATMY and ATDL. Two other key features are putting the module into command mode (rather than transmit mode) to set the mentioned parameters, this is done by &#8216;+++&#8217; to enter command mode and ATCN to exit. Once we had a decent framework to communicate between the stations, we started to build the logic.</p>
<p>The master station logic is like so:</p>
<ol>
<li>Initialize XBee module.</li>
<li>INIT all client stations, and see which station answers &#8211; these will be our &#8220;up&#8221; stations.</li>
<li>Loop:
<ol>
<li>POLL each &#8220;up&#8221; station in a loop.</li>
<li>Take care of any OPERATE requests from the PC client.</li>
</ol>
</li>
</ol>
<p>This way, the client&#8217;s logic boils down to just looping, testing for any request and taking care of it. Both client and master share most of the code, so only the main process code is essentially different. The example code uses a framework called &#8220;Preemtive OS&#8221; to allow multitasking / processes (code was bundled in the examples).</p>
<h3>Communication with PC</h3>
<p>Communication between the master and PC client also required some lightweight &#8220;protocol&#8221;. The communication is again over UART (0 this time, 1 is used by XBee), only now the PC is doing a UART-over-USB with the board. The protocol we ended up with supports these features:</p>
<ul>
<li>Commands the PC wants the master to perform look like &#8220;m=&lt;command&gt;=&lt;parameters&gt;&#8221;, such commands can be:
<ul>
<li>&#8220;poll=&lt;i&gt;&#8221;, send a POLL to the station indexed i</li>
<li>&#8220;test=&lt;i&gt;&#8221;, send an INIT to the station indexed i</li>
<li>&#8220;toggle=&lt;i&gt;_&lt;sw&gt;_&lt;onoff&gt;&#8221;, set the switch <em>sw</em> on the station indexed <em>i</em> to <em>onoff</em> status</li>
</ul>
</li>
<li>Notifications the master would like to share with the user (PC) look like &#8220;pc=&lt;notification&gt;&#8221;, such notifications can be:
<ul>
<li> &#8220;up=&lt;i&gt;&#8221;, the station indexed <em>i</em> is up</li>
<li>&#8220;master_online&#8221;,</li>
<li>&#8220;temp=&lt;i&gt; &lt;temp&gt;&#8221;, the station indexed <em>i</em> has a temperature reading of <em>temp</em></li>
<li>&#8220;e=&lt;error message&gt;&#8221;, error message from the master</li>
</ul>
</li>
</ul>
<p>This is an incomplete set of features, but it&#8217;s representative of the idea we want to present.</p>
<h3>PC client program</h3>
<p>The PC client we built using Java, so it would (and should) be portable between OSs. The GUI was built with the SWT, using the eclipse Visual Editor plug-in which simplified the process. Communication with the LPC board is done over the serial port of the board, as I mentioned the PC creates a virtual COM port using a USB-to-UART driver (FTDI, bundled with windows) [<a href="http://en.wikipedia.org/wiki/FTDI" target="_blank">link</a>].</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/1ZC2SuYb7P0&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/1ZC2SuYb7P0&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>To test the GUI we created a &#8220;simulator&#8221;, that mimics the operation of a master station, that is shown in the video above.</p>
<p>The high-level design of the program is described in this inheritance UML diagram:</p>
<p style="text-align: center;"><a href="http://www.morethantechnical.com/wp-content/uploads/2010/02/smart_home_uml.png" rel="lightbox[580]"><img class="size-medium wp-image-593 aligncenter" title="smart_home_uml" src="http://www.morethantechnical.com/wp-content/uploads/2010/02/smart_home_uml-251x300.png" alt="" width="251" height="300" /></a></p>
<p>Our PC client uses serial-port connectivity based on the old javax.serial APIs. Though these APIs have been abandoned by Sun, and there is no official Win32 implementation bundled with the JDK (only for *NIXs/Solaris), a project named RxTx is upkeeping an implementation of this API for windows [<a href="http://rxtx.qbang.org/wiki/index.php/Download" target="_blank">link</a>].</p>
<h2>The Code</h2>
<p>We are releasing the code under the BSD license, for everyone to use, enjoy, learn and expand.</p>
<p>It is available via the blog&#8217;s SVN repo:</p>
<p>PC Client (requires RxTx serial port impl and SWT)</p>
<pre class="brush: plain;">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/SmartHomePCClient
</pre>
<p>Embedded program (includes all dependencies)</p>
<pre class="brush: plain;">
svn checkout http://morethantechnical.googlecode.com/svn/trunk/smarthome_embedded/final project
</pre>
<p>To compile the master program &#8220;make&#8221; in the master directory, and you&#8217;ll get an &#8220;xbee_master.hex&#8221; file, same goes for the client in the client directory (these directories don&#8217;t contain code, only a makefile). Then you have to upload the hex into the board, this is done either by &#8220;make deploy&#8221; (if you have lpc21isp), or FlashMagic on Win.</p>
<p>Java is compiled as usual&#8230; just remember the dependencies.</p>
<p>We would like to thank Sivan Toledo for the guidance, loaned hardware and inspiration.</p>
<p>And thanks all for listening!<br />
Roy S. &amp; Gil Ramon</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F02%2F21%2Fsmarthome-embedded-computing-course-project%2F&amp;linkname=SmartHome%20%26%238211%3B%20Embedded%20computing%20course%20project"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/02/21/smarthome-embedded-computing-course-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stream your favorite radio station to your workplace</title>
		<link>http://www.morethantechnical.com/2010/01/29/stream-your-favorite-radio-station-to-your-workplace/</link>
		<comments>http://www.morethantechnical.com/2010/01/29/stream-your-favorite-radio-station-to-your-workplace/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 14:53:00 +0000</pubDate>
		<dc:creator>Arnon</dc:creator>
				<category><![CDATA[Solutions]]></category>
		<category><![CDATA[Stream]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[block]]></category>
		<category><![CDATA[bypass]]></category>
		<category><![CDATA[corporate]]></category>
		<category><![CDATA[re-stream]]></category>
		<category><![CDATA[vlc]]></category>
		<category><![CDATA[worksplace]]></category>

		<guid isPermaLink="false">http://www.morethantechnical.com/?p=567</guid>
		<description><![CDATA[I want to suggest a trick that worked for me. My work place blocks most of the popular radio stations stream sites in my country. I can understand why they&#8217;re doing that, but hey – if you want to save bandwidth I suggest you block YouTube (not that I complain…) Well, I thought of a [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">I want to suggest a trick that worked for me. My work place blocks most of the popular radio stations stream sites in my country.<br />
I can understand why they&#8217;re doing that, but hey – if you want to save bandwidth I suggest you block YouTube (not that I complain…)<br />
Well, I thought of a way to listen to my favorite radio station from work, by re-streaming it from my home. And it worked!</p>
<p style="text-align: left;">It can also work for you, in case your IT does not block by protocol, only by address.</p>
<p style="text-align: left;">So here&#8217;s how to do it:</p>
<p style="text-align: left;"><span id="more-567"></span></p>
<h2 style="text-align: left;">STEP 1 – Retrieve the actual link for your radio station</h2>
<p style="text-align: left;">Go to the station&#8217;s website and try to get a link that you can play in Windows Media Center.<br />
Some stations will let you find it, others won&#8217;t.<br />
If this is the case, here&#8217;s what you need to do (At home!)</p>
<ol style="text-align: left;">
<li>
<div style="text-align: justify;">Download <a href="http://www.wireshark.org/">wireshark</a> and install it</div>
</li>
<li>
<div style="text-align: justify;">Start capturing using this filter: <strong>rtsp.request</strong></div>
</li>
<li>
<div style="text-align: justify;">Try to listen to the station</div>
</li>
<li>
<div style="text-align: justify;">You should see such an output in wireshark:</div>
<p style="text-align: right;"><img class="alignleft" src="http://www.morethantechnical.com/wp-content/uploads/2010/01/012910_1452_Streamyourf1.png" alt="" /></p>
<p style="text-align: left;">As you can see, there is a url here that looks like this:<br />
<strong>rtsp://someurl/somefile RTSP/1.0</strong></p>
</li>
<li>Take this URL, and replace <strong>RTSP</strong> with <strong>MMS</strong> and omit the <strong>RTSP/1.0</strong> ending.<br />
Your URL should look like this: <a href="mms://someurl.somefile">mms://someurl.somefile</a></li>
<li>
<div style="text-align: justify;">Try to play this URK in windows media player. If it works – you&#8217;re half way through</div>
</li>
</ol>
<p style="text-align: left;">If you can get the link in any other way, it&#8217;s fine… This is only a suggestion</p>
<p style="text-align: left;"> </p>
<h2 style="text-align: left;">STEP 2 – Set up VLC as a stream server at your home</h2>
<p style="text-align: left;">I will start with an apology. I didn&#8217;t find a proper way to do it with the GUI of version 1.0+.<br />
On the past versions it was relatively clear, so that&#8217;s where I managed to find the command line you need to run in order for it to work<br />
So, at this point, you need to run <a href="http://www.videolan.org/">VLC</a> on your PC with these parameters:</p>
<p style="text-align: left;"><span style="font-family: Courier New;">vlc mms://yourserver/yourlink :sout=#transcode{acodec=mp3,ab=64,channels=2}:duplicate{dst=std{access=mmsh,mux=asfh,dst=:8080}}<br />
</span></p>
<p style="text-align: left;">This will stream the web-radio station to port 8080.<br />
You can test this using Windows Media Player, just open it up, and open the url <a href="mms://localhost:8080">mms://localhost:8080</a></p>
<p style="text-align: left;">Another thing you need to keep in mind, is that you should establish port forwarding if you are behind NAT.</p>
<p style="text-align: left;"> That&#8217;s the whole deal… Of course you can improve it by creating a script that will run it, or add the http interface of VLC to remote control it on another port, but I don&#8217;t want to overload this guide…</p>
<p style="text-align: left;">keep it simple!</p>
<p><a class="a2a_dd addtoany_share_save" href="http://www.addtoany.com/share_save?linkurl=http%3A%2F%2Fwww.morethantechnical.com%2F2010%2F01%2F29%2Fstream-your-favorite-radio-station-to-your-workplace%2F&amp;linkname=Stream%20your%20favorite%20radio%20station%20to%20your%20workplace"><img src="http://www.morethantechnical.com/wp-content/plugins/add-to-any/share_save_171_16.png" width="171" height="16" alt="Share/Bookmark"/></a> </p>]]></content:encoded>
			<wfw:commentRss>http://www.morethantechnical.com/2010/01/29/stream-your-favorite-radio-station-to-your-workplace/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
