<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:blogger="http://schemas.google.com/blogger/2008" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0"><id>tag:blogger.com,1999:blog-3946011063058389308</id><updated>2020-09-11T12:40:39.762-07:00</updated><category term="~Mikhail Khludnev" /><category term="Solr" /><category term="lucene" /><category term="Java" /><category term="~Alexey Ragozin" /><category term="grid computing" /><category term="gigaspaces" /><category term="search" /><category term="cloud" /><category term="coherence" /><category term="data grid" /><category term="convergence" /><category term="block-join" /><category term="open source" /><category term="~Eugene Steinberg" /><category term="Hadoop" /><category term="deployment" /><category term="scalability" /><category term=".NET" /><category term="data synapse" /><category term="grid consulting" /><category term="indexing" /><category term="~Dmitry Korotkov" /><category term="~Victoria Livschitz" /><category term="Microsoft HPC" /><category term="Spring" /><category term="distributed cache" /><category term="faceted navigation" /><category term="join" /><category term="memory" /><category term="openstack" /><category term="testing" /><category term="~Oleg Malakhov" /><category term="EC2" /><category term="GridGain" /><category term="POF" /><category term="Pig" /><category term="SIMD" /><category term="Sun Grid Engine" /><category term="capacity" /><category term="cloud computing" /><category term="compression" /><category term="gigapult" /><category term="gogrid" /><category term="grid dynamics" 
/><category term="hpc" /><category term="maven 2" /><category term="openspaces.org" /><category term="semantic web" /><category term="~Alexander Kusnetsov" /><category term="~Ivan Mamontov" /><category term="~Kirill Ishanov" /><category term="~Max Martynov" /><category term="~Oleg Savrasov" /><category term="~Olga Kudryavtseva" /><category term="~Stan Klimoff" /><category term=".Net 4.0" /><category term="API" /><category term="AVX" /><category term="Amazon" /><category term="Big Data" /><category term="C#" /><category term="CI" /><category term="Codecs" /><category term="Conference" /><category term="DIH" /><category term="ETL" /><category term="HTTP" /><category term="Hessian" /><category term="Hyper-V" /><category term="JNI" /><category term="Kettle" /><category term="LDAP" /><category term="Mockito" /><category term="Native" /><category term="Networks" /><category term="Open Source Grid and Cluster Conference" /><category term="PackRat" /><category term="Powermock" /><category term="QA" /><category term="RIA" /><category term="RabbitMQ" /><category term="Remoting" /><category term="SSE" /><category term="Velocity" /><category term="Waters" /><category term="amazon ec2" /><category term="binary calculator" /><category term="data aware routing" /><category term="enterprise applications" /><category term="filesystems" /><category term="filters" /><category term="flex" /><category term="gemfire" /><category term="graph" /><category term="jclouds" /><category term="management" /><category term="nrt" /><category term="numeric range queries" /><category term="python" /><category term="rackspace" /><category term="range query" /><category term="scoring" /><category term="speech recognition" /><category term="spell correction" /><category term="trie fields" /><category term="visualization" /><category term="voice search" /><category term="~Alexander Tivelkov" /><category term="~Alexey Bokov" /><category term="~Alexey Kharlamov" /><category term="~Andrey Brindeyev" 
/><category term="~Andrey Klochkov" /><category term="~Andrey Kudryavtsev" /><category term="~Arseny Kaplun" /><category term="~Dmitri Babaev" /><category term="~Dmitry Sotnyk" /><category term="~Eugene Kirpichev" /><category term="~Ivan Bulanov" /><category term="~Kirill Shileev" /><category term="~Kirill Uvaev" /><category term="~Max Gorbunov" /><category term="~Max Morozov" /><category term="~Pavel Vasilyev" /><category term="~Roman Bogorodskiy" /><category term="~Shravan Kumar" /><category term="~Sylvia Kainz" /><category term="~Vadim Kirilchuk" /><category term="~Victor Samoylov" /><title type="text">Grid Designer's Blog</title><subtitle type="html" /><link rel="alternate" type="text/html" href="http://blog-archive.griddynamics.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default?start-index=26&amp;max-results=25&amp;redirect=false" /><author><name>Alexey Mikheev</name><uri>http://www.blogger.com/profile/02658753206865864556</uri><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>84</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/griddynamics" /><feedburner:info uri="griddynamics" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-3358529641860450113</id><published>2016-03-01T09:41:00.000-08:00</published><updated>2016-12-16T13:11:47.238-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category 
scheme="http://www.blogger.com/atom/ns#" term="faceted navigation" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><category scheme="http://www.blogger.com/atom/ns#" term="~Oleg Savrasov" /><title type="text">Block Join Faceting: Implementation </title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;h2 id="docs-internal-guid-915b00db-330a-39fc-6b69-499952944450" style="line-height: 1.656; margin-bottom: 4pt; margin-top: 16pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;To view an updated version of this post click &lt;a href="http://blog.griddynamics.com/how-to-implement-block-join-faceting-in-solr/lucene"&gt;he&lt;span id="goog_1551481894"&gt;&lt;/span&gt;&lt;span id="goog_1551481895"&gt;&lt;/span&gt;re&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;/h2&gt;&lt;h2 id="docs-internal-guid-915b00db-330a-39fc-6b69-499952944450" style="line-height: 1.656; margin-bottom: 4pt; margin-top: 16pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: #434343; font-family: &amp;quot;arial&amp;quot;; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: 
normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;In the &lt;/span&gt;&lt;a href="http://blog-archive.griddynamics.com/2016/02/block-join-faceting-task-definition.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;previous post&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, we discussed the business motivation for supporting structured documents in a Solr/Lucene index and the unique requirements that this approach to data modeling places on the faceting engine. We introduced &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5743" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5743&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, and now it is time to take a deep dive into the implementation details. 
&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;First, let us recap how Solr deals with structured data. Solr supports searching hierarchical documents using &lt;/span&gt;&lt;a href="http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Block Join Query&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; (BJQ). Using this query requires a special way of indexing documents. BJQ relies on the relative positioning of documents in the index: all documents belonging to the same hierarchy have to be indexed together as a block, with the child documents first, followed by their parent document. BJQ works as a bridge between levels of the document hierarchy, i.e. it transforms matches on child documents into matches on parent documents. 
When we search using BJQ, we provide a &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;child query&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and a &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;parent filter &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;as parameters. The child query represents what we are looking for among child documents, and the parent filter tells BJQ how to distinguish parent documents from child documents in the index. For each matched child document, BJQ scans ahead in the index until it finds the nearest parent document, which is sent into the collector chain instead of the child document. 
This trick of relying on relative document positioning in the index, or “index-time join”, is the secret behind the high performance of BJQ.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Now we are ready to look into faceting of BJQ results. As a modeling example, we will use an eCommerce catalog with a Product-SKU parent-child relationship.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The main idea is quite straightforward. We consider each hierarchy of matched documents separately. As we are using BJQ, each hierarchy is represented in the index as a document block, or DocSet slice, as we call it. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;First, we calculate facets based on the matched SKUs from our block. Then, we aggregate the obtained SKU counts into Product-level facet counts, increasing the Product-level facet count by only 1 for every matched block, irrespective of the number of matched SKUs within the block. 
For example, if we are searching by COLOR:Blue and two Blue SKUs are found within a block, the aggregated Product-level count for Blue is still increased by only 1. &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center; text-indent: 13.609296482412061pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2016-01-31 at 17.24.53.png" height="267" src="https://lh5.googleusercontent.com/El_Asnnq8x_xfUkZE_ODxkZmtkV7RDt0VKmTFogyLm_p6dG_ymIfMK5QmCxC_6JSyG4-5unJt5RIh4lLlJRSQ9epK7Son_fFHXWRVmeBFPFkydJpmI7AbE0olYNGvspk1BDBA3wh" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;This solution is implemented inside &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;which extends the standard Solr &lt;/span&gt;&lt;span style="background-color: 
transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;SearchComponent&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; validates the query and injects a special &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetCollector&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; into the Solr post-filter collector chain. 
When &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetCollector&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; is invoked, it receives a parent document, since &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;ensures that only &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;ToParentBlockJoinQuery&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; is allowed as the top-level query. 
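&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;As a rough illustration of the index-time join recapped earlier, here is a minimal sketch of how a parent can be located from a matched child when children are indexed before their parent. It is an illustration only, not Lucene or Solr code: the class BlockParentLookupSketch, the method parentOf and the document ids are hypothetical, and java.util.BitSet stands in for the parent filter.&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;
```java
import java.util.BitSet;

/** Hypothetical sketch of the block layout; this is not Lucene or Solr code. */
final class BlockParentLookupSketch {

    /**
     * Returns the doc id of the nearest parent at or after childDoc,
     * or -1 if no parent follows. parentFilter has a bit set for every parent doc id.
     */
    static int parentOf(BitSet parentFilter, int childDoc) {
        return parentFilter.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        // Assumed layout: docs 0..2 are SKUs of the product at doc 3,
        // docs 4..5 are SKUs of the product at doc 6.
        BitSet parentFilter = new BitSet();
        parentFilter.set(3);
        parentFilter.set(6);

        System.out.println(parentOf(parentFilter, 1)); // 3
        System.out.println(parentOf(parentFilter, 4)); // 6
    }
}
```
&lt;/pre&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Because the lookup only moves forward from the matched child, no separate join structure has to be consulted at query time.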
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Now, to adjust facet counts based on this parent, we need to identify the child documents that matched within the current block. To do that, we ask &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinScorer&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; to &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;trackPendingChildHits()&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and to &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;swapChildDocs()&lt;/span&gt;&lt;span style="background-color: 
transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. Our next step is to calculate facets on the obtained doc slice. Initially, we just invoked a standard faceting routine, e.g. &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;DocValuesFacet.getCounts()&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, passing it a &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;DocSet &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;representing the matched child documents from the current index block. 
Finally, the facet counts calculated on the slice were aggregated into the global ones, counting multiple hits on child documents as one hit on the parent document.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;We developed a randomized test case, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetRandomTest&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, to verify the functional correctness of this implementation. 
This test checks facet calculation with different combinations of Products, SKUs and random filters.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Also, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; was extended to work with Solr Cloud and supplied with its own test, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetDistribTest&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. 
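&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The counting rule that such tests have to verify can be illustrated with a small sketch. It assumes that facet terms are already mapped to integer ordinals and that each block is given as an array of the ordinals found on its matched children. The class and method names are hypothetical, and this is not the actual code of the component.&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;
```java
import java.util.BitSet;

/** Hypothetical sketch of block-level facet counting; not the Solr implementation. */
final class PerBlockFacetCountSketch {

    /**
     * blocks[b] lists the facet term ordinals found on the matched children of block b.
     * Each term is counted at most once per block, however many children carry it.
     */
    static int[] parentLevelCounts(int[][] blocks, int numTerms) {
        int[] counts = new int[numTerms];
        BitSet seenInBlock = new BitSet(numTerms);
        for (int[] matchedChildTerms : blocks) {
            seenInBlock.clear(); // start a new block
            for (int term : matchedChildTerms) {
                if (!seenInBlock.get(term)) {
                    seenInBlock.set(term);
                    counts[term]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        final int blue = 0;
        final int red = 1;
        int[][] blocks = {
            {blue, blue, red}, // product with two Blue SKUs and one Red SKU
            {blue},            // product with one Blue SKU
        };
        int[] counts = parentLevelCounts(blocks, 2);
        System.out.println("Blue=" + counts[blue] + " Red=" + counts[red]); // Blue=2 Red=1
    }
}
```
&lt;/pre&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Here the first block contains two Blue matches, yet Blue is counted only once for it.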
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Everything looked good except for one issue: faceting was relatively slow.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Profiling showed that the utility method &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;DocValuesFacet.getCounts()&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; was developed with the assumption that it would be called only once for each field. 
However, with &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;we invoked this method for each matched parent document. As a result, a lot of work was unnecessarily repeated many times, such as determining the field type, allocating in-memory objects, determining the top-level and segment &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;SortedSetDocValues&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, and migrating from segment facet counts to global ones. 
Thus, the first step of performance optimization was the introduction of &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFieldFacetAccumulator&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, which is responsible for accumulating facet counts for a particular field. It performs all of the tasks mentioned above only during initialization or when switching to the next index reader. These improvements showed a 25x performance gain in local benchmarks.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The next improvement was inspired by the &lt;/span&gt;&lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Reverse Nested Aggregation solution in Elasticsearch&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; 
text-decoration: none; vertical-align: baseline;"&gt;. Suppose we need to calculate facets for some DocValues field. For each segment, we know the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;SortedSetDocValues&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; of this field, which contains the unique values of this field. For each of those values, we need to track a facet count, so each term has a corresponding long value in a parallel array of facet counters. Here we use another trick to save memory and ensure CPU cache locality: each long value in our facet counter array combines the aggregated (parent-level) facet count for a particular term (highest 4 bytes) and the position of the most recently matched parent document whose block has a hit for this term (lowest 4 bytes). 
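&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The packed counter layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of BlockJoinFieldFacetAccumulator: the class PackedFacetCounterSketch and its methods are hypothetical names, and the sketch assumes that a parent with matched children never has document id 0 (its children are indexed before it), so a zero-initialized slot is never mistaken for a repeated hit.&lt;/span&gt;&lt;/div&gt;&lt;pre&gt;
```java
/** Hypothetical sketch of a packed facet counter; not the Solr implementation. */
final class PackedFacetCounterSketch {

    // Adding this constant (2^32) increments the count stored in the upper 4 bytes.
    private static final long ONE_COUNT = 0x1_0000_0000L;

    // One slot per term ordinal: [count : upper 4 bytes][last parent doc : lower 4 bytes].
    final long[] counters;

    PackedFacetCounterSketch(int numTerms) {
        // Zero-initialized: count 0, last parent 0.
        // Assumption: a parent with matched children never has doc id 0.
        counters = new long[numTerms];
    }

    static int count(long packed) {
        return (int) (packed >>> 32); // upper 4 bytes
    }

    static int lastParent(long packed) {
        return (int) packed; // the narrowing cast keeps the lower 4 bytes
    }

    /** Counts the term once per parent; repeated hits within the same block are ignored. */
    void recordHit(int termOrd, int parentDoc) {
        long packed = counters[termOrd];
        if (lastParent(packed) == parentDoc) {
            return; // this block has already been counted for this term
        }
        counters[termOrd] = (count(packed) + 1L) * ONE_COUNT + parentDoc;
    }

    public static void main(String[] args) {
        PackedFacetCounterSketch blue = new PackedFacetCounterSketch(1);
        blue.recordHit(0, 3); // first Blue SKU of the product at doc 3
        blue.recordHit(0, 3); // second Blue SKU of the same product: ignored
        blue.recordHit(0, 6); // Blue SKU of the product at doc 6
        System.out.println(count(blue.counters[0]) + " " + lastParent(blue.counters[0])); // 2 6
    }
}
```
&lt;/pre&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Keeping the count and the last parent in the same 8-byte slot means that a single array read answers both questions for a term.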
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;When &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFieldFacetAccumulator &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;is requested to update facet counts for a particular parent document and an array of matched child documents, we:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;iterate over the matched child documents;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: 
normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;for each child document, identify the terms contained in the document;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;for each term, find the corresponding long entry in the array and check whether its encoded parent position matches the provided parent document. If it matches, this means that we have a duplicate hit within the same block, so we just ignore it and proceed with the iteration over terms. 
If the positions do not match, we have the first hit in a new block, and the encoded long value is updated to increment the facet counter and to record the new last-seen parent position.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Thus, when all matched parent documents are processed, the higher halves of the long values in the accumulator array contain the required facet counts. This optimization delivered a 2x performance boost. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The third optimization was suggested by Mikhail Khludnev. He proposed calculating the doc slice as an intersection of all matched child documents with all children of the current parent document. This approach is implemented as a separate &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinDocSetFacetComponent&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. 
On a local benchmark it is faster than &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;by 40%, but we believe that the exact circumstances in which one component is better than the other require further analysis. We recommend experimenting with macro benchmarks using both implementations.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;The components described in this blog post were committed to the Solr codebase with &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5743" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5743&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; in December 2015 and have been available since Solr 5.5. 
The corresponding documentation is available on the &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Solr Wiki&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. To use one of the BJQ faceting components, you need to configure it in solrconfig.xml and introduce an appropriate search handler, for example:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;searchComponent name="blockJoinFacet"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;class="org.apache.solr.search.join.BlockJoinFacetComponent"/&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: 
black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;requestHandler name="/blockJoinFacetRH" class="org.apache.solr.handler.component.SearchHandler"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;lst name="defaults"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;str name="shards.qt"&amp;gt;blockJoinFacetRH&amp;lt;/str&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/lst&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; 
font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;arr name="last-components"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;str&amp;gt;blockJoinFacet&amp;lt;/str&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/arr&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;/requestHandler&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;searchComponent 
name="blockJoinDocSetFacet"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;class="org.apache.solr.search.join.BlockJoinDocSetFacetComponent"/&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;requestHandler name="/blockJoinDocSetFacetRH"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;class="org.apache.solr.handler.component.SearchHandler"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;lst 
name="defaults"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;str name="shards.qt"&amp;gt;blockJoinDocSetFacetRH&amp;lt;/str&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/lst&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;arr name="last-components"&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;str&amp;gt;blockJoinDocSetFacet&amp;lt;/str&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div 
dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/arr&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;/requestHandler&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Since only docValues fields can be used for BJQ faceting, you need to update the corresponding field configuration in schema.xml, for example:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;field name="COLOR_s" type="string" indexed="true"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: 
transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stored="true" docValues="true"/&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;field name="SIZE_s" type="string" indexed="true"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;stored="true" docValues="true"/&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Then, after indexing some set of hierarchical documents like&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;field name="id"&amp;gt;10&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;field name="type_s"&amp;gt;parent&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;field name="BRAND_s"&amp;gt;Nike&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 
1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="id"&amp;gt;11&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="type_s"&amp;gt;child&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="COLOR_s"&amp;gt;Red&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="SIZE_s"&amp;gt;XL&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div 
dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="id"&amp;gt;12&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="type_s"&amp;gt;child&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; 
font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="COLOR_s"&amp;gt;Blue&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;field name="SIZE_s"&amp;gt;XL&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; 
vertical-align: baseline;"&gt;you need to pass the required &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;ToParentBlockJoinQuery&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; to the configured request handler and request facet calculation using the&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; child.facet.field&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; parameter, for example:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;a href="http://localhost:8983/solr/collection1/blockJoinFacetRH?q=%7B!parent+which%3D%22type_s%3Aparent%22%7Dtype_s%3Achild&amp;amp;wt=json&amp;amp;indent=true&amp;amp;facet=true&amp;amp;child.facet.field=COLOR_s&amp;amp;child.facet.field=SIZE_s" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: 
baseline;"&gt;http://localhost:8983/solr/collection1/blockJoinFacetRH?q={!parent+which%3D%22type_s%3Aparent%22}type_s%3Achild&amp;amp;wt=json&amp;amp;indent=true&amp;amp;facet=true&amp;amp;child.facet.field=COLOR_s&amp;amp;child.facet.field=SIZE_s&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;If everything is configured correctly, then Solr response should look like&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;"responseHeader":{&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br 
class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"status":0,&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"QTime":82},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;"response":{"numFound":1,"start":0,"docs":[&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 
10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"id":"10",&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"type_s":["parent"],&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; 
font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"BRAND_s":["Nike"],&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"_version_":1526159505694392320}]&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;"facet_counts":{&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_queries":{},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_fields":{&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"COLOR_s":[&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier 
new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Blue",1,&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"Red",1],&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"SIZE_s":[&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"XL",1]},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_dates":{},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_ranges":{},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br 
class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_intervals":{},&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"facet_heatmaps":{}}}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;You may also consider an alternative way for calculating facets on hierarchical product structures. 
&lt;/span&gt;&lt;a href="http://yonik.com/json-facet-api/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Solr JSON Facet API&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, introduced by &lt;/span&gt;&lt;a href="https://twitter.com/lucene_solr" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Yonik Seeley&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, solves the same task in a slightly different way. This is a new and really powerful feature that appeared in Solr 5.4. It provides a very flexible way to control which facets are calculated. 
In particular, it’s possible to calculate facets on child documents, aggregated by the unique &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;_root_&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; field. For example, for the request:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;curl http://localhost:8983/solr/collection1/query -d 'q=(type_s:child)&amp;amp;rows=0&amp;amp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; json.facet={&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;colors:{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" 
style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;type : terms,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;field : COLOR_s,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;facet: {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;productsCount: "unique(_root_)"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; 
font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;},&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;sizes:{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;type : terms,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;field : SIZE_s,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; 
color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;facet: {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;productsCount: "unique(_root_)"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; }&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 
1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;'&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Solr returns a response that contains the expected counts (check the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;productsCount&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; field):&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;"facets":{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: 
normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"count":2,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;"colors":{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"buckets":[{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"val":"Blue",&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"count":1,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"productsCount":1},&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"val":"Red",&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"count":1,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"productsCount":1}]},&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"sizes":{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"buckets":[{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"val":"XL",&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"count":2,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 10.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"productsCount":1}]}}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;This way has obvious advantages: it’s easier to programmatically control which facets are to be 
calculated and how they should be presented in the response, and to include statistics, nested facet commands, etc. The drawback is that the search query in the example above, i.e. &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;q=(type_s:child)&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, yields child documents, while facets are calculated on parent ones. In other words, to present parent documents both in search results and in calculated facets, we need to execute two separate requests. Another reason to choose one of the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponents&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;is their much better performance. 
On my local test data, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinDocSetFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;is about twice as fast as the JSON Facet API.&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;courier new&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;BlockJoinFacetComponent &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;can be further improved by replacing iteration over matched child documents with bitwise operations over whole DocSets. The main idea is to obtain, for each term, a DocSet of the matched parents whose child documents contain that term. 
In this case, facet count of the term should correspond to the size of this DocSet.&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Happy (block) faceting!&lt;/span&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/3358529641860450113/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=3358529641860450113" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3358529641860450113" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3358529641860450113" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/zv15cWDHysc/block-join-faceting-implementation.html" title="Block Join Faceting: Implementation " /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" 
/></author><thr:total>2</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2016/03/block-join-faceting-implementation.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-7558865540867855171</id><published>2016-02-27T17:01:00.000-08:00</published><updated>2016-12-16T13:16:07.993-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="faceted navigation" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Eugene Steinberg" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><category scheme="http://www.blogger.com/atom/ns#" term="~Oleg Savrasov" /><title type="text">Block Join Faceting: Introduction</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div style="text-align: left;"&gt;&lt;div style="text-align: justify; text-indent: 48px;"&gt;&lt;span style="font-family: arial;"&gt;&lt;span style="font-size: 14.6667px;"&gt;To see an updated version of this post click &lt;a href="http://blog.griddynamics.com/introduction-to-block-join-faceting-in-solr"&gt;here&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; font-weight: 400; line-height: 1.38; text-align: justify; text-indent: 36pt;"&gt;Every software application is created to bring business value. Typically, software development process starts from understanding business requirements and creating a domain model. 
Such a model is very helpful in communicating with business stakeholders and makes it possible to clearly understand their needs and constraints. Additionally, a simple and flexible domain model is a strong basis for creating an effective and extensible software architecture that meets the customer’s requirements.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Normally, business modeling starts with identifying entities and the relationships between them. Relationships can be associations or compositions and can have different cardinalities, e.g. one-to-one, one-to-many and many-to-many. Relationships are so important that they are first-class citizens in relational databases and in the majority of data-related specifications and frameworks, such as JPA or Hibernate.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;However, when we deal with search engines like Solr, we see that the domain models readily supported by the framework are quite simple. Each entity is represented as a document with some set of fields. That's it. 
It looks like Solr takes only basic steps toward supporting the whole variety of possible relationships between indexed documents, leaving the rest to the application developer.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;At the same time, for some business areas, support for relationships is very important. In particular, such relationships introduce new challenges to the problem of facet calculation. As an example, let’s consider e-commerce platforms where each Product in the catalog has several so-called Stock Keeping Units (SKUs). Each SKU defines a different flavor of the same item. Even though customers purchase SKUs, i.e. 
a concrete flavor of the product, a typical e-commerce business merchandizes in terms of products.&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&lt;/span&gt; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;a href="https://lh6.googleusercontent.com/Uf2VOChiYcSznNqioNpJfUFwJPiUuThF5szyy6MzUaip_ZuRyU0kZvUCXLVshzT_tbkf-RapreO9kNyZJc7sfLPgM3N0CtkpO4C_s0gm6_-YNuqXmcGlDeGgHcZ8UxGA6R6Md8Xp" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img alt="Screen Shot 2014-10-25 at 11.14.30 .png" border="0" height="222" src="https://lh6.googleusercontent.com/Uf2VOChiYcSznNqioNpJfUFwJPiUuThF5szyy6MzUaip_ZuRyU0kZvUCXLVshzT_tbkf-RapreO9kNyZJc7sfLPgM3N0CtkpO4C_s0gm6_-YNuqXmcGlDeGgHcZ8UxGA6R6Md8Xp" style="border: medium none; transform: rotate(0rad);" width="277" /&gt;&lt;/a&gt;&lt;a href="https://lh6.googleusercontent.com/XaDVJQa3atJpRrxfLvgKzcz8a4nmwbBCEIV34C4fIeOw2_yOhkunDCStxKoyZk0MhzPIHdkiuwerJHxjXAsfDuQJcRvQLv2kbbSuskjjo5pJAdQrZzDFgH8XRfGxwRzLSLFPVhf5" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img alt="Product_SKU_class.png" border="0" height="226" 
src="https://lh6.googleusercontent.com/XaDVJQa3atJpRrxfLvgKzcz8a4nmwbBCEIV34C4fIeOw2_yOhkunDCStxKoyZk0MhzPIHdkiuwerJHxjXAsfDuQJcRvQLv2kbbSuskjjo5pJAdQrZzDFgH8XRfGxwRzLSLFPVhf5" style="border: medium none; transform: rotate(0rad);" width="93" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center; text-indent: 0.16327180140038564pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div 
dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 0.16327180140038564pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Screenshot above is taken from one of the online retailers. As we can see, a dress could be in blue, pink or red colors, and for a blue color dress, only sizes XS and S are available. However, for &amp;nbsp;merchandizer and &amp;nbsp;for a customers it’s just a single product. So, when customer navigates the site, &amp;nbsp;she should see all SKUs belonging to the same product as a single product. 
This means that for facet calculation, our facet counts should represent products, not SKUs. Thus, we need to find some approach to aggregate SKU-level facets into product ones. &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: center; text-indent: 0.16327180140038564pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Product_Upc_object.png" height="178" src="https://lh4.googleusercontent.com/nrswo31CZM3Nr4T-kaurJV8JTFHPrPzAzQcwMPHlqtWAIMy0Jy3Wma4H-rndni7ml0j9ELL9VMCxxVILi-kL2Kug6fSYz7P_l25tnSnNviOFQxdpXaLZkou6s-XgWwHq3_dkisMz" style="border: medium none; transform: rotate(0rad);" width="532" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Pretty common solution here 
is to propagate properties from the SKU level to the product level and produce a single product document with multivalued fields aggregated from its SKUs. With this approach, our aggregated product will look as follows:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="MultivaluedProduct.png" height="92" src="https://lh6.googleusercontent.com/JKFuKZ0K2i2OkmEvnLQfUmRyeLxcgnq_YmbOTcofl39IUxpeU-JF1b49WZ9SE66wIfIwl8sV_fa-f7TBlpilaVnBoNAEN6XQY01AaKXVv-ClDXy7KrUMOl2zmcYAeznwNrrPWYMP" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="147" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: justify; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;However, this approach creates the possibility of false positive matches with regard to combinations of SKU-level fields. For example, if a customer filters by color ‘Blue’ and size ‘M’, Product_1 will be considered a valid match, even though there is no SKU in the original catalog that is both 'Blue' and 'M'. This happens because when we aggregate values from the SKU level, we lose the information about which value comes from which SKU. Even though this situation looks like an edge case, in a real-life application it can cause a really bad customer experience. 
Imagine a situation where a customer searches for a particular item, filtering by color and size, only to discover on the checkout page that no such item is available in the catalog. This really frustrates customers and negatively impacts customer loyalty. Not good for the business.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-indent: 36pt; vertical-align: baseline;"&gt;Getting back to technology, this means that we should carefully preserve the catalog structure when searching and faceting products. The problem of searching structured data is already addressed in Solr with a powerful, high-performance and robust solution:&amp;nbsp;&lt;/span&gt;&lt;a href="http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html" style="line-height: 1.656; text-decoration: none; text-indent: 36pt;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Block Join Query&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-indent: 36pt; vertical-align: baseline; white-space: pre-wrap;"&gt;.&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; line-height: 1.656; text-indent: 36pt;"&gt;&amp;nbsp;We have &lt;a href="http://blog.griddynamics.com/search/label/block-join"&gt;written&lt;/a&gt;&amp;nbsp;about this approach extensively in this blog.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-indent: 36pt; vertical-align: baseline;"&gt;However, the problem of faceting structured data required further work. 
So, we created&amp;nbsp;&lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5743" style="line-height: 1.656; text-decoration: none; text-indent: 36pt;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5743&lt;/span&gt;&lt;/a&gt;&lt;span style="color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-indent: 36pt; vertical-align: baseline;"&gt;&amp;nbsp;in February 2014 and have worked on it ever since. Now, we are happy to report that the first robust, high-performance implementation has been committed to trunk.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.6667px; text-indent: 36pt; vertical-align: baseline;"&gt;We will describe the new BJQ faceting component and the related algorithms in our next blog post. Stay tuned and happy faceting!&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/7558865540867855171/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7558865540867855171" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7558865540867855171" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7558865540867855171" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/lhrtHekK-sI/block-join-faceting-task-definition.html" title="Block Join Faceting: Introduction" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" 
/></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2016/02/block-join-faceting-task-definition.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-8717178606483505684</id><published>2016-01-20T11:21:00.000-08:00</published><updated>2016-12-16T14:41:26.555-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="faceted navigation" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Block Join Faceting in Solr and other..</title><content type="html">&lt;div dir="ltr" id="docs-internal-guid-8e27c0a8-6078-70da-9e52-10d464710bfb" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;To see an updated version of this post click &lt;a href="http://blog.griddynamics.com/a-frustrating-personal-experience-with-unfaceted-search"&gt;here&lt;/a&gt;&amp;nbsp;&lt;img height="468" src="https://lh4.googleusercontent.com/99Z12iYJIV94B3aQ41J0i-xyUcRrwRolmlfGhT5PDKQsnXa-seQ1jlXSGcA6Zd1ZNNbUxOxlMXC-7snzRc4cvzjsxsXjtmIt1DL9ucKdZtqN5DGEj--e4cpUajYQGxNOJxOmuzht" style="border: none; transform: rotate(0rad);" width="624" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;I recently had a nice flight with onboard entertainment. 
It was quite good, but look at the film menu navigation! It clearly lacks the ability to nest different languages under a movie item and to facet on the movie languages. As a result, it wastes screen space and forces the user to scroll through titles again and again unnecessarily. &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Faceted navigation on nested items&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; is one of our favorite topics when it comes to search experience. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;It reminds me that the block join facet functionality mentioned in an &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2013/09/solr-block-join-support.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;earlier post&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; was recently committed under &amp;nbsp;&lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5743" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5743&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and will be available starting with Solr 5.5. 
So, it can be found in snapshots of &lt;/span&gt;&lt;a href="https://builds.apache.org/job/Lucene-Artifacts-5.x/lastSuccessfulBuild/artifact/lucene/dist/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;5.5&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a href="https://builds.apache.org/job/Lucene-Artifacts-trunk/lastSuccessfulBuild/artifact/lucene/dist/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;6.0&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. 
Please check the &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/BlockJoin+Faceting" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Solr Reference Guide&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; for a detailed description and a sample request. &lt;/span&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;It’s worth mentioning that this functionality is known as &lt;/span&gt;&lt;a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;reverse nested aggregation&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; in Elasticsearch (thanks to &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/search/label/~Andrey%20Kudryavtsev" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: 
#1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Andrey Kudryavtsev&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; for the finding), and it can also be achieved with a &lt;/span&gt;&lt;a href="http://lucene.472066.n3.nabble.com/Parent-Child-Nested-Document-Faceting-tp4211632p4239852.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;unique(ID) aggregation&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; in &lt;/span&gt;&lt;a href="http://yonik.com/solr-nested-objects/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;JSON Facets&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; (this solution was suggested by &lt;/span&gt;&lt;a href="https://plus.google.com/106247029640344817659" 
style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Alessandro Benedetti&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;). Nevertheless, I suppose dedicated components might be more convenient for users, although in the future I would prefer to merge them into the JSON Facets framework. &lt;/span&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/8717178606483505684/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8717178606483505684" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8717178606483505684" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8717178606483505684" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/E3sGjLR85rc/block-join-faceting-in-solr-and-other.html" title="Block Join Faceting in Solr and other.." 
/><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2016/01/block-join-faceting-in-solr-and-other.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-7261881540034640587</id><published>2016-01-11T01:41:00.000-08:00</published><updated>2016-01-11T02:26:44.801-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="API" /><category scheme="http://www.blogger.com/atom/ns#" term="speech recognition" /><category scheme="http://www.blogger.com/atom/ns#" term="voice search" /><category scheme="http://www.blogger.com/atom/ns#" term="~Andrey Kudryavtsev" /><title type="text">Automatic Speech Recognition Services Comparison</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 6pt; margin-top: 18pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 21.333333333333332px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Automatic Speech Recognition Services Comparison&lt;/span&gt;&lt;/h2&gt;&lt;h1 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Introduction&lt;/span&gt;&lt;/h1&gt;&lt;div&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: 
baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;“Ok Google, find me a red dress.” Your long-time customer has just been invited to an important party this evening and wants to make a good impression. She’s on her way to your store right now and can’t spend any time typing in searches while she drives. Instead of saying, “Ok, Google…” wouldn’t you rather she said, “Ok, MyFavoriteStore name?”&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Both Apple and Google have done a good job educating users on the value and ease of voice-controlled features. So how mature is commercial speech recognition today? As Grid Dynamics has extensive experience in eCommerce and search solutions, we decided to take a look at the current speech recognition technologies available for voice search implementation. In this article we will share the results from our experiment - comparing the quality of different speech recognition providers. 
&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Services&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Before starting the experiment, our team reviewed multiple providers of automatic speech recognition. 
We used the following criteria to select the services to evaluate:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Unified, cross-platform interface, meaning the service is available via an HTTP REST interface&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Speech recognition quality “out of the box” without any tuning for a particular customer&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" 
style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Free (or low-priced) for initial testing of the service&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Speech recognition provided as a &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Software_as_a_service" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;SaaS&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-family: Arial; font-size: 14.6667px; line-height: 1.38; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 14.6667px; line-height: 1.38; white-space: pre-wrap;"&gt;We compared the following services.&lt;/span&gt;&lt;br /&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; 
list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Google&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span id="docs-internal-guid-38bfce8a-3030-e08b-4d27-d76a22a0c5d8"&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://www.nuance.com/index.htm" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Nuance&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="https://developer.att.com/" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;AT&amp;amp;T&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span id="docs-internal-guid-38bfce8a-3038-5279-36a4-5f525c704425"&gt;&lt;a href="https://wit.ai/" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline; 
white-space: pre-wrap;"&gt;WIT&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 14.6667px; list-style-type: disc; vertical-align: baseline;"&gt;&lt;a href="http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/" style="font-size: 14.6667px; line-height: 1.38; text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-size: 14.6667px; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;IBM Watson&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;/span&gt;&lt;/ul&gt;&lt;span id="docs-internal-guid-38bfce8a-3030-e08b-4d27-d76a22a0c5d8"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;span id="docs-internal-guid-38bfce8a-3030-e08b-4d27-d76a22a0c5d8"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;span id="docs-internal-guid-38bfce8a-3030-e08b-4d27-d76a22a0c5d8"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: Arial; font-size: 18.6667px; line-height: 1.38; white-space: pre-wrap;"&gt;Google&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: Arial; font-size: 18.6667px; line-height: 1.38; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Google Speech API is not “production” ready.&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Experimental status: the API can change at any time&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;No official API documentation or usage capabilities &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Limit of approximately 500 requests per day, per account&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; 
margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;You need to join the &lt;/span&gt;&lt;a href="https://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Chromium-dev mail group&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; and generate an appropriate key in the &lt;/span&gt;&lt;a href="https://console.cloud.google.com/home/dashboard" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Google developer console&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 
Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Example of API usage:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="background-color: white; font-family: Arial; font-size: 14.6667px; font-style: italic; line-height: 1.38; white-space: pre-wrap;"&gt;curl -X POST \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header 'Content-Type: audio/x-flac; rate=44100;' \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--data-binary @red_dress.flac \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;'https://www.google.com/speech-api/v2/recognize?lang=en-us&amp;amp;key=&amp;lt;KEY&amp;gt;'&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br 
/&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Nuance&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Nuance speech recognition REST API features:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="https://developer.nuance.com/public/index.php?task=register" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Registration&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; is required&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Account upgrade from Silver to Gold level is offered for free&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Usage limit of 5,000 requests per day&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;span style="font-family: Arial;"&gt;&lt;span style="font-size: 14.6667px; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: 
baseline; white-space: pre-wrap;"&gt;Example of API usage:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;curl -X POST \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Content-Type: audio/x-wav;codec=pcm;bit=16;rate=16000" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Accept: application/xml" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Accept-Topic: Dictation" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; 
margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--data-binary @red_dress.wav \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;"https://dictation.nuancemobility.net:443/NMDPAsrCmdServlet/dictation?appId=&amp;lt;APP_ID&amp;gt;&amp;amp;appKey=&amp;lt;APP_KEY&amp;gt;"&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;AT&amp;amp;T&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;AT&amp;amp;T speech recognition REST API features:&lt;/span&gt;&lt;/div&gt;&lt;ul 
style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="https://developer.att.com/developer/flow/apiPlaygroundFlow.do?execution=e7s1" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Registration&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; is required &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Required “Premium Access” payment is &lt;/span&gt;&lt;a href="http://developer.att.com/pricing/speech-pricing-details" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;$99/year + Usage fees &lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;to access automatic speech recognition &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://developer.att.com/apis/speech/docs" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;AT&amp;amp;T REST API&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; uses OAuth 2.0 for authorization &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;According to the documentation, the usage limit is &lt;/span&gt;&lt;a href="https://developer.att.com/support/faqs/att-developer-program-and-api-platform-faqs#what-are-maximum-transaction-rates-for-apis" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;1 request per second&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Example of API usage:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="background-color: white; font-family: Arial; font-size: 14.6667px; font-style: italic; line-height: 1.38; white-space: pre-wrap;"&gt;curl -X POST \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: 
none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Authorization: Bearer &amp;lt;TOKEN&amp;gt;" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Content-Type: audio/x-wav" &amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--data-binary "@red_dress.wav" &amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;"https://api.att.com/speech/v3/speechToText"&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;WIT&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; 
vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;WIT is more about &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Natural_language_processing" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;NLP&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; (Natural Language Processing) than about plain-speech recognition.&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Main focus, besides speech recognition, is to parse out spoken phrases and extract valuable information (e.g., some voice command). 
The goal is to have the system “understand” voice. For example, play “Jingle Bells” when the user says, “Hi, robot! Please play me Christmas songs.”&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;A GitHub account is all that is needed to access the &lt;/span&gt;&lt;a href="https://wit.ai/docs/http/20141022" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;WIT REST API&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;No account usage limits&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" 
style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;div style="line-height: 1.38;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Example of API usage:&lt;/span&gt;&lt;/div&gt;&lt;div style="line-height: 1.38;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;span style="font-family: Arial; font-size: 14.6667px; font-style: italic; line-height: 1.38; white-space: pre-wrap;"&gt;curl -X POST \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Authorization: Bearer &amp;lt;TOKEN&amp;gt;" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Content-Type: audio/wav" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; 
white-space: pre-wrap;"&gt;--data-binary "@&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;red_dress.wav&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;" &amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;"https://api.wit.ai/speech?v=20141022"&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;IBM Watson &lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; 
font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;IBM Speech recognition REST API features:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The API was released to the public in early 2015&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Registration in &lt;/span&gt;&lt;a href="https://console.ng.bluemix.net/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Bluemix&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; 
color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; is required&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Usage limitations of 150,000 requests per month&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Example of API usage:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 14.6667px; font-style: italic; line-height: 1.38; white-space: pre-wrap;"&gt;curl -X POST \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; 
font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--header "Content-Type: audio/flac" \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--user &amp;lt;USERNAME&amp;gt;:&amp;lt;PASSWORD&amp;gt; \&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;--data-binary "@red_dress.flac" &amp;nbsp;&amp;nbsp;\&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: italic; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.6667px; font-style: normal; font-variant: 
normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;b&gt;Experiment&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.6667px; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;To compare the quality of speech recognition, you first need recorded voice samples. As we worked on voice search features for eCommerce, we recorded eCommerce-like search phrases. We used short phrases covering brand names, colors, sizes, etc. Here’s a sample of the phrases used: “red dress,” “Calvin Klein jeans” and “xl coat.” We used over 3,000 different phrases for this experiment and compared different conditions such as gender, age and background noise (i.e., with or without noise), as well as other criteria.&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;We used the following sequence for the experiment.&lt;/span&gt;&lt;/div&gt;&lt;ul 
style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Delivered an audio file with recorded search phrases to external services&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Received recognized text from automatic speech recognition service&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; 
vertical-align: baseline; white-space: pre-wrap;"&gt;Evaluated quality metrics of recognized text vs. actual search phrase &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;We used multiple quality metrics, such as: &lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Volume of exact recognized phrases&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: circle; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: 
pre-wrap;"&gt;A simple but paramount quality metric&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: circle; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The larger the number of exactly recognized phrases, the better the quality of the speech recognition results &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="https://en.wikipedia.org/wiki/Word_error_rate" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Word Error Rate&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; (WER)&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 
14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: circle; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Minimum number of word edits (i.e., insertions, deletions or substitutions) required to change one phrase into the other&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: circle; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Normalized by phrase length (basically the &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Levenshtein_distance" style="text-decoration: none;"&gt;&lt;span style="background-color: white; color: #0b0080; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Levenshtein distance&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; between two phrases, computed at the word level instead 
of the phoneme level)&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: circle; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The fewer edits required, the more alike the two phrases are and the better the quality of speech recognition&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Comparison Results&lt;/span&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The quality champion is Google. 
We didn’t reproduce the &lt;/span&gt;&lt;a href="http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;8% WER declared by Google&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; with our Grid Dynamics data, but the results are still impressive. Google achieved 73.3% &lt;/span&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;of exactly recognized phrases with a 15.8% WER.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: #252525; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Nuance came in second place by a large margin. 
For Nuance, &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;44.1% of the phrases were recognized perfectly and the WER was 39.7%. IBM (46.9.3% and 42.3% WER) came in third place. AT&amp;amp;T and WIT had exactly the same WER, 63.3%, with a small advantage in exact recognition for AT&amp;amp;T (32.8% vs. 29.5% for WIT).&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Word Error Rate (less is better):&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;img alt="wer.png" height="371px;" src="https://lh6.googleusercontent.com/jgBBj7kvbvOSsoBfoUgWSib0nqC3vLnC_kTqCujQmihKFoyNbw-MeEOL-bvWOu96O6lK8l3KrTlo6Gflf9bagv-t_RWcZMoUSsAM8Jglnh-75ht1YNvT9Ji3K4C-l2nba9qwEa-S" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="600px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;b 
style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Percentage of Exact Recognized Phrases (more is better):&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;img alt="exact_phrases.png" height="371px;" src="https://lh6.googleusercontent.com/Csz5DMlb4eXFGuojL8tpkqIfdcf5my3p7T8wMEzS1VlL2gTxAbtBc1brNowZ_i_iVvf7AhbZtn5yOFCEV4vkTlKqxyO6ZSxwAMbHbDmon6D8A2hdu3RNP-gcvhSQcWg3CK8R9vWD" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="600px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 18.666666666666664px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; 
font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Based on our test criteria of exactly recognized phrases and word error rate, Google is by far the best solution out of the box. This is not surprising given their history of developing and proving voice search, but unfortunately, for now, it is not commercially available. Google’s quality, however, could be used as a benchmark for the commercially available products, as many of them have tools and features for customizing the search experience.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Exact phrase match and word error rate are only two of the issues in providing the world-class voice search that your customers will soon expect. Additional challenges are speech recognition performance and recognizing eCommerce-specific terms. 
For instance, consider searches that involve brands, sizes, and materials and, of course, long or complex phrases (e.g., “OK, MyFavoriteRetailer, find me a Ralph Lauren or Ann Taylor red cocktail dress, knee length and open back, in a size 9 that isn’t dry-clean only”).&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;We will discuss those challenges and our solutions in future articles.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/7261881540034640587/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7261881540034640587" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7261881540034640587" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7261881540034640587" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/8vwtTudnNrU/automatic-speech-recognition-services.html" title="Automatic Speech Recognition Services Comparison" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" 
/></author><thr:total>9</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2016/01/automatic-speech-recognition-services.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-1577921516399409075</id><published>2015-08-23T14:21:00.000-07:00</published><updated>2015-09-01T06:00:14.529-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="join" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="scoring" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Scoring Join Party in Solr 5.3</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-0676e79e-5c66-3558-8e4f-69a006a7ce40" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;New Solr release 5.3 brings two long awaited improvements in joins: &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-6234" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-6234&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;a 
href="https://issues.apache.org/jira/browse/SOLR-5882" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5882&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. Joins in Solr are as much in demand as they are challenging. The well-known approaches are the query-time &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;{!join}&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; and the index-time block join &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;{!parent}&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. 
&lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=z1RqZsjhIMM&amp;amp;t=641" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Here&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; I briefly describe the difference between the two joins. Before Solr 5.3, both query parsers yielded scoreless queries. Solr 5.3 introduces a local parameter, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;{!&amp;lt;join|parent&amp;gt; … score=&amp;lt;avg|max|min|none|total&amp;gt;}..&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, in both query parsers. 
The value of the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;score&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; parameter defines how the scores of the subordinate query (the “from”-side query for a join, the “children” query for a block join) are aggregated into the join query score.&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="background-color: transparent; color: #666666; font-family: 'Trebuchet MS'; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: line-through; vertical-align: baseline;"&gt;A Few&lt;/span&gt;&lt;span style="background-color: transparent; color: #666666; font-family: 'Trebuchet MS'; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline;"&gt; Notes about Query-Time Join&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Enabling score, even by specifying &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;{!join … score=none}..&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 
14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; picks a different algorithm, Lucene’s &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/5_2_1/join/org/apache/lucene/search/join/JoinUtil.html#createJoinQuery%28java.lang.String,%20boolean,%20java.lang.String,%20org.apache.lucene.search.Query,%20org.apache.lucene.search.IndexSearcher,%20org.apache.lucene.search.join.ScoreMode%29" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;JoinUtil&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, instead of Solr’s original one, so performance might differ. 
It also lets you use a join query as a &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLUpdateCommands" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Delete Query&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, which previously led to a ClassCastException; see &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-6357" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-6357&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;. There is a more subtle difference: multiple values are supported only for the “from” side (subordinate query), but not for the “to” side (result document). 
Thanks to Ryan Josal, who &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-6234?focusedCommentId=14586444&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14586444" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;brought this up&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Also note that Lucene's join algorithm expects the "from" field to be string docValues. If docValues is not enabled for the “from” field, uninverting it will take some heap. 
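&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;As a minimal sketch of that requirement, the “from” field used in the lab below (prod_id_s) could be declared in the schema with docValues enabled. The explicit field declaration and the “string” type name are assumptions made for illustration, so adapt them to your own schema.&lt;/span&gt;&lt;/div&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&amp;lt;!-- hypothetical schema entry: a string "from" field with docValues, so the scoring join does not have to uninvert it on the heap --&amp;gt;&lt;br /&gt;&amp;lt;field name="prod_id_s" type="string" indexed="true" stored="true" docValues="true"/&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;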
It is enough for the "to" field to be just indexed, as with the regular Solr join.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Query-time join with scoring still supports cross-core join via the “fromIndex” parameter. However, it works for Solr cores only; support for SolrCloud collections is arriving in Solr 5.4, see the follow-up &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-7775" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-7775&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="background-color: transparent; color: #666666; font-family: 'Trebuchet MS'; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline;"&gt;The Lab Time&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Now let's index a few docs and experiment with join scoring. 
Note that, for the sake of simplicity, this data works for both types of joins; I’ve never seen such a blend in real life.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;update&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;delete&amp;gt;&amp;lt;query&amp;gt;&lt;/span&gt;*:*&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/query&amp;gt;&amp;lt;/delete&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;add&amp;gt;&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;product&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;expensive blue&lt;span style="color: #204a87; 
font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku1 of 1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"color_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;Red&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"price_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;10&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"prod_id_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: 
#204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 2 of 1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"color_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;Blue&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"price_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;200&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;               &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"prod_id_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: 
#c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;product&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;expensive red&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 1 of 2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"color_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;Red&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: 
bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"price_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;300&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"prod_id_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 2 of 2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"color_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;Blue&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"price_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: 
bold;"&gt;&amp;gt;&lt;/span&gt;40&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;            &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"prod_id_s"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/add&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;commit/&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/update&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Here, both products have red SKUs, so let's sort products &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: line-through; vertical-align: baseline;"&gt;by color of SKU&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; by the price of the SKU that matches the Red 
filter. To do that, we need to use the price_i field as a function query in the subordinate query. &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Here we use the &lt;a href="https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser"&gt;Constant Score Query Syntax ^=&lt;/a&gt; to ignore the score from the filtering (Red) clause.&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.6666666666667px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;&lt;a href="http://localhost:8983/solr/techproducts/select?q=%7B!join%20from=prod_id_s%20to=id%20score=max%7D%2Bcolor_s:Red%5E=0%20%7B!func%7Dprice_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]" style="text-decoration: none;"&gt;http://localhost:8983/solr/techproducts/select?q={!join from=prod_id_s to=id score=max}+color_s:Red^=0 {!func}price_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;blockquote class="tr_bq"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;!-- HTML generated using hilite.me --&gt;&lt;br /&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; 
font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"response"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;"numFound"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;2&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;"start"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;0&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;"maxScore"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;300&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;"docs"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: black; font-weight: bold;"&gt;[&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"2"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; 
font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive red"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;300&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: black; font-weight: bold;"&gt;},&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"1"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive blue"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;10&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: 
bold;"&gt;]&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Filtering Blue SKUs flips the products ordering:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.6666666666667px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;&lt;a href="http://localhost:8983/solr/techproducts/select?q=%7B!join%20from=prod_id_s%20to=id%20score=max%7D%2Bcolor_s:Blue%5E=0%20%7B!func%7Dprice_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]" style="text-decoration: none;"&gt;http://localhost:8983/solr/techproducts/select?q={!join from=prod_id_s to=id score=max}+color_s:Blue^=0 {!func}price_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;!-- HTML generated using hilite.me --&gt;&lt;br /&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"numFound"&lt;/span&gt;&lt;span style="color: black; font-weight: 
bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;2&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"start"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;0&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"maxScore"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;200&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"docs"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: black; font-weight: bold;"&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"1"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive blue"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: 
bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;200&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;},&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"2"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive red"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;40&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: black; font-weight: bold;"&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; 
font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;As I said, the data also works with index-time join, so the following query yields the same result: &lt;/span&gt;&lt;a href="http://localhost:8983/solr/techproducts/select?q=%7B!parent%20which=type_s:product%20score=max%7D%2Bcolor_s:Red%5E=0%20%7B!func%7Dprice_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:8983/solr/techproducts/select?q={!parent which=type_s:product score=max}+color_s:Red^=0 {!func}price_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="background-color: transparent; color: #666666; font-family: 'Trebuchet MS'; font-size: 16px; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline;"&gt;Near Real-Time boosting&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Another challenge is being able to update document order in real time, e.g. to sort by stock units or to update price, rating, 
etc. The straightforward approach is to use &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Atomic Updates&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;; however, internally it loads all stored fields of the updated document, fully reindexes it, then commits and flushes all caches. Sounds more like “Far Real-Time”, doesn’t it? My irony doesn’t mean I know an ideal solution, as this problem is quite complex per se. &lt;/span&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Here is &lt;strike&gt;a better&lt;/strike&gt; another idea: if a document has many fields and is thus expensive to update, let’s put all frequently updated fields into a separate core! 
Let’s create a separate core for small stock-unit documents linked to the SKU documents in the main core:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;update&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;delete&amp;gt;&amp;lt;query&amp;gt;&lt;/span&gt;*:*&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/query&amp;gt;&amp;lt;/delete&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;add&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku1 of 1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"units_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;777&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: 
#204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 2 of 1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"units_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;88&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 1 of 2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"units_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;99&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; 
font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku 2 of 2&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"units_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;66&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;   &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/add&amp;gt;&lt;/span&gt;&lt;br /&gt;   &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;commit/&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/update&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Now we can run a function query on this core to obtain the number of units available, and then pass it into the main core with a query-time cross-core join. 
Then these scores are passed on again when joining SKUs to products.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/techproducts/select?q=%7B!parent%20which=type_s:product%20score=max%7D%2Bcolor_s:Red%5E=0%20%7B!join%20from=id%20to=id%20fromIndex=stocks%20score=max%7D%7B!func%7Dunits_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:8983/solr/techproducts/select?q={!parent which=type_s:product score=max}+color_s:Red^=0 {!join from=id to=id fromIndex=stocks score=max}{!func}units_i&amp;amp;wt=json&amp;amp;indent=true&amp;amp;fl=score,*,[docid]&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;We see how it bumps up the product whose SKU passes the filter and has a large amount available.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;!-- HTML generated using hilite.me --&gt;&lt;br /&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"numFound"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span 
style="color: #0000cf; font-weight: bold;"&gt;2&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"start"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;0&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"maxScore"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;777&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"docs"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: black; font-weight: bold;"&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"1"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive blue"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: 
black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;777&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;},&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"2"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive red"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;99&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: black; font-weight: bold;"&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 
14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Now, let’s drop the stock amount for sku1 of 1:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;!-- HTML generated using hilite.me --&gt;&lt;br /&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;update&amp;gt;&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;add&amp;gt;&lt;/span&gt;&lt;br /&gt;        &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;doc&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"id"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;sku1 of 1&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;           &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;field&lt;/span&gt; &lt;span style="color: #c4a000;"&gt;name=&lt;/span&gt;&lt;span style="color: #4e9a06;"&gt;"units_i"&lt;/span&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;gt;&lt;/span&gt;7&lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/field&amp;gt;&lt;/span&gt;&lt;br /&gt;       &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/doc&amp;gt;&lt;/span&gt;&lt;br /&gt;   &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;/add&amp;gt;&lt;/span&gt;&lt;br /&gt;   &lt;span style="color: #204a87; font-weight: bold;"&gt;&amp;lt;commit/&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #204a87; font-weight: 
bold;"&gt;&amp;lt;/update&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: #204a87; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Note that we don’t use Atomic Updates here, so the fields don’t need to be stored. After the update, the product that is almost sold out drops down:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;!-- HTML generated using hilite.me --&gt;&lt;br /&gt;&lt;div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;"&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"numFound"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;2&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"start"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;0&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"maxScore"&lt;/span&gt;&lt;span style="color: black; 
font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;99&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: #204a87; font-weight: bold;"&gt;"docs"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: black; font-weight: bold;"&gt;[&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"2"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive red"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;99&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;},&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;{&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"id"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"1"&lt;/span&gt;&lt;span style="color: black; font-weight: 
bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"type_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"product"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"name_s"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #4e9a06;"&gt;"expensive blue"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;,&lt;/span&gt;&lt;br /&gt;      &lt;span style="color: #204a87; font-weight: bold;"&gt;"score"&lt;/span&gt;&lt;span style="color: black; font-weight: bold;"&gt;:&lt;/span&gt; &lt;span style="color: #0000cf; font-weight: bold;"&gt;7&lt;/span&gt;&lt;br /&gt;    &lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;  &lt;span style="color: black; font-weight: bold;"&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: black; font-weight: bold;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;pre style="line-height: 125%; margin: 0;"&gt;&lt;span style="color: black; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;Note, if you would like to approach query-time cross-join for real-time boosting, make sure you check performance on early stages of evaluation. Query-time join might be not fast enough for large indices. Also, index-time join can’t help here, because it’s strictly intra-core. 
One promising challenger is &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/LUCENE-6352" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: underline; vertical-align: baseline;"&gt;Global Ordinals Join&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;, even though its ordinals map doesn’t span cores, only the segments of a single index. The question is: is it really a blocker?&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline;"&gt;That’s it. I hope it gives you a few ideas about how join scoring might be used in your application! 
Let us know how it worked out.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/1577921516399409075/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1577921516399409075" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1577921516399409075" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1577921516399409075" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/NIk-641ZUU4/scoring-join-party-in-solr-53.html" title="Scoring Join Party in Solr 5.3" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>3</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-5671443141227268067</id><published>2015-07-06T15:16:00.000-07:00</published><updated>2017-07-20T16:17:01.097-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="DIH" /><category scheme="http://www.blogger.com/atom/ns#" term="ETL" /><category scheme="http://www.blogger.com/atom/ns#" term="indexing" /><category scheme="http://www.blogger.com/atom/ns#" term="join" /><category scheme="http://www.blogger.com/atom/ns#" term="Kettle" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><title type="text">How to import structured data into Solr</title><content type="html">&lt;div dir="ltr" style="text-align: left;" 
trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-e6f0c162-655a-0e7b-7db0-52d66b84d3bf" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;This post summarizes our experience with data ingestion in search. Almost any search project begins with feeding the search engine with existing data. Here we mostly focus on good old relational databases as the data source. I even hesitate over what to call them: SQL databases, or not-NoSQL DBs? Needless to say, most of these considerations apply to other data sources as well, such as files, web services, NoSQL DBs and distributed file systems. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Solr Data Import Handler - DIH&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Let me come out first: I’m a big fan of &lt;/span&gt;&lt;a href="https://wiki.apache.org/solr/DataImportHandler" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; 
vertical-align: baseline;"&gt;Data Import Handler&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. It’s as handy as any other &lt;/span&gt;&lt;a href="https://en.wikipedia.org/wiki/Extract,_transform,_load" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;ETL&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; tool - you don’t need &lt;/span&gt;&lt;a href="https://lucidworks.com/blog/indexing-with-solrj/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;to write Java code calling SolrJ&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; and debug SQL query results in an IDE. 
With DIH you juggle configs, copy-paste queries, and &lt;/span&gt;&lt;a href="https://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;play with the queries and data&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; right in the &lt;/span&gt;&lt;a href="https://cwiki.apache.org/confluence/display/solr/Dataimport+Screen" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SolrAdmin&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. So, DIH is perfect for fast prototyping, but what about running it in production? 
There are a couple of issues here (add yours):&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;lack of concurrency - single-threaded processing keeps hardware idle and takes a lot of time to complete;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;lack of a performant join (see below).&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We will examine these points below; however, there is evidence that DIH is used in really huge deployments. 
So, despite these limitations, many people run DIH in production. To solve the concurrency problem, we can logically shard data using specially crafted queries and launch per-shard imports in parallel. The join performance problem is addressed &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-2382" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;by persistent caches&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, which presumably make the join operation faster, yet I am skeptical about this approach. &amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Now let’s look at some practical questions.&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Indexing Blocks by DIH &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; 
font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We are known as &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2013/09/solr-block-join-support.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Block Join proponents&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, and as such, want to &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5147" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;index blocks in DIH&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. 
That became possible in 5.1 and is enabled by specifying &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5147?focusedCommentId=14263373&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14263373" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;child="true"&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; in the child (2nd level and deeper) entity. &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Nested Entities (data join) in DIH&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Now, let’s come back to joining entities in DIH. As SOLR-2382, referenced above, puts it perfectly: “Using SqlEntityProcessor with Child Entities can cause an ‘n+1 select’ problem”. 
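&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;To make the pattern concrete, here is a minimal sketch of a nested-entity config that would behave this way. The table and column names (parent, child, parent_id, sku) are hypothetical and not taken from SOLR-2382, and the dataSource definition is omitted. The point is only that the inner query refers to ${parent.id}, so it is executed once for every row returned by the outer query:&lt;/span&gt;&lt;/div&gt;
&lt;pre style="line-height: 125%; margin: 0;"&gt;&amp;lt;document&amp;gt;&lt;br /&gt;  &amp;lt;!-- hypothetical tables; the outer query runs once (the "1") --&amp;gt;&lt;br /&gt;  &amp;lt;entity name="parent" query="SELECT id, name FROM parent"&amp;gt;&lt;br /&gt;    &amp;lt;!-- the inner query runs once per parent row (the "n") --&amp;gt;&lt;br /&gt;    &amp;lt;entity name="child"&lt;br /&gt;            query="SELECT sku FROM child WHERE parent_id = '${parent.id}'"/&amp;gt;&lt;br /&gt;  &amp;lt;/entity&amp;gt;&lt;br /&gt;&amp;lt;/document&amp;gt;&lt;br /&gt;&lt;/pre&gt;
&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;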
The term “n+1 select” reminds me of &lt;/span&gt;&lt;a href="https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-fetching-custom" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;my earlier challenges&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; in IT. With one extra query per parent row, such an import can’t process any considerable amount of data in a reasonable time. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;You may ask: why not just have the RDBMS join the entities and process the joined result set in DIH? 
Well, it’s possible if both tables reside in the same DB; however, keep in mind the &lt;/span&gt;&lt;a href="https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/queryhql.html#queryhql-joins" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;cartesian product problem&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, which can happen if you join two or more child entities. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;The recommended approach in this case is to cache one side of the relation on the heap in a hash map, and when the data doesn’t fit in the heap (the only case I care about), it’s suggested to &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-2943" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;put data off-heap in BDB files&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Merge Join in DIH&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;It turns out the &lt;/span&gt;&lt;a href="https://ru.wikipedia.org/wiki/ETL" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;ETL&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; community is aware of this problem and has had a recipe for it for ages - &lt;/span&gt;&lt;a href="http://www.bimonkey.com/2010/10/the-merge-join-transformation/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;external merge join&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; 
font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. For example, switching from naive N+1 subqueries to merge join for hundreds of millions of records yields a speedup from several hours to several minutes. It’s worth mentioning that we meet the same algorithm - merging sorted sequences - over and over again in search engine implementations. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Here is the good news: starting from 5.0, merge join is available for any &lt;/span&gt;&lt;a href="https://wiki.apache.org/solr/DataImportHandler#EntityProcessor" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;EntityProcessor&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; in DIH by specifying &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-4799" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: 
baseline;"&gt;join="zipper"&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; in the child entity. Of course, you need to sort both inputs; thankfully, RDBMS indices do that quite well. It’s interesting to see how Kettle ETL reminds you about it every time the merge join configuration is amended. DIH doesn’t bother you with such a pop-up, but it throws an exception if the inbound streams are not ordered. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-06-29 at 3.12.12 PM.png" height="405px;" src="https://lh6.googleusercontent.com/toNVxYMB838OVlrSZcnAkdhYeas0DCFzFA1VafKDv_2SBaEfGdNzTBkakzGcGDeVBtFbMsoNTxDdLRvvHyXbRYVN-y8EcpfOQbbd7XxsJ1tKUr3e0GKQvwHXCRetyFLz98sKB1A" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="604px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;You can also process many-to-many relations, but it 
requires a join and a sort in the RDBMS, which is usually fine. &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;It’s time to talk about threads and concurrent processing. Before that, let’s note that the merge join algorithm is not easily parallelizable; thus, “hash join” (lookup in cached data) is more suitable for multithreaded processing. &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Multithreading &lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;There are no threads in DIH. Sic. 
We have the usual producer-consumer pitfall of sequential processing - each side waits for its counterpart:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none; width: 624px;"&gt;&lt;colgroup&gt;&lt;col width="*"&gt;&lt;/col&gt;&lt;col width="*"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border-bottom: solid #ffffff 0px; border-left: solid #ffffff 0px; border-right: solid #ffffff 0px; border-top: solid #ffffff 0px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img height="380px;" src="https://lh5.googleusercontent.com/ViFFhSUZnGB8OI8du0mBffMr6FQ2_i4s1jmysdm0UkLD7R5iadU835F8HLrOqtZ7gPE5ePj4oGlF3fJf16ZwPMaRojjXc_iXCD0C7kgvtU17MYCOym-E4vEglrqY4r9iUPxWEBo" style="border: none; transform: rotate(0rad);" width="280px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What we have in DIH&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-bottom: solid #ffffff 0px; border-left: solid #ffffff 0px; border-right: solid #ffffff 0px; border-top: solid #ffffff 0px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img height="259px;" src="https://lh6.googleusercontent.com/_FdtBX6acr3FFR_jeDfusgdpVHuWJ-pY9h-FY2glhmZQ6upN4gOCPd-b7yLMOVv2pVRu4-1oyvTpMHNc1wE9JtS-07aauzvl057GSxfIladHWhQfuC472SWCgEr-G3N5VGzQBOE" style="border: none; transform: rotate(0rad);" width="212px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What we want to have&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Just a note: we have the same problem if we run DIH with SolrCloud. In this case DIH feeds Solr one document at a time, synchronously, and blocks until each document is sent to the shard leader by &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/UpdateRequestProcessor#Distributed_Updates" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;DistributedUpdateProcessor&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: 
transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. &amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Okay, enough about problems, let’s talk about opportunities. We can parallelize outbound flow (consumer):&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;if DIH sent updates via &lt;/span&gt;&lt;a href="https://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;ConcurrentUpdateSolrClient&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; 
font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; or &lt;/span&gt;&lt;a href="http://lucene.apache.org/solr/5_2_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;CloudSolrClient&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, it would decouple the producer from the consumer, giving us the opportunity to fully utilize the Solr machines for indexing. This is not possible with the current DIH design, but there is a promising attempt at a breakthrough - &amp;nbsp;&lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-7188" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-7188&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. Heads up! 
It would be a great win, letting us run DIH as a real ETL tool.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;there is another patch that adds threads at the &lt;/span&gt;&lt;a href="https://wiki.apache.org/solr/UpdateRequestProcessor" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;UpdateRequestProcessors&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; layer - &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-3585" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-3585&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 
normal; text-decoration: none; vertical-align: baseline;"&gt;; you can think of it as a server-side &lt;/span&gt;&lt;a href="https://lucene.apache.org/solr/5_0_0/solr-solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;ConcurrentUpdateSolrClient&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. Although we have &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-3585?focusedCommentId=14027908&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14027908" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;positive feedback&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; from production usage, I’ve changed my mind since contributing it and no longer consider it an architecturally wise approach. I suppose it’s the client’s duty (i.e. the ETL tool’s) to provide the proper level of concurrent load and throttling. 
Nevertheless, you can use it if you are in trouble, for example if you have a legacy script posting files to the Solr HTTP endpoint.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We can also think about how to prefetch data in JdbcDataSource in a background thread, so that the producer is not blocked. I must have such a patch somewhere; let me know if you need it.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Another ETL tool: Kettle &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; 
font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;I also played with a couple of open source ETL tools; I chose Kettle as an example. It has many useful facilities built in, and you can definitely use it as a toolbox for data ingestion. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;However, we are interested in one particular problem - building Solr XML. The difficulty I faced is the limitation of flat relational tuples (call them rows or records). What we need is at least three levels of nesting &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/6406734#file-t-shirts-xml" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;like here&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; - parent-child-attributes. One possible workaround is to use an XML DOM as the data structure; however, it cannot be passed between transformation steps as-is, and has to be converted to a string, concatenated and parsed again and again, something like what we have here. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-06-30 at 3.32.44 PM.png" height="233px;" src="https://lh5.googleusercontent.com/mZnRGdL08PmEOh0cx-9Vy3ofzWr7xgh8nr90XnGwdlYn7rwEYjRzgDD-U35A5TFazw_QwnyajMYnOIL5_cshA_rXQChlOPrm_LljCgd7vKOq-Lsf0RKdppKsJp2uq493OFTDCUw" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;I've found that XML Join does not scale well; it works rather like an in-memory XPath database. It's great, but not what I need. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;One possible solution is to introduce XML DOM as a first-class datatype in Kettle, and let some steps process it as-is.&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;b&gt;UPD&lt;/b&gt;: here is a prototype &lt;a href="https://github.com/griddynamics/xml-dom-kettle-etl-plugin"&gt;https://github.com/griddynamics/xml-dom-kettle-etl-plugin&lt;/a&gt; &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: &amp;quot;arial&amp;quot;; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Stay tuned, we will present such a proof-of-concept soon. 
Don't hesitate to share your vision, experience and findings.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/5671443141227268067/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5671443141227268067" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5671443141227268067" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5671443141227268067" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/S7bZ42wVY7g/how-to-import-structured-data-into-solr.html" title="How to import structured data into Solr" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2015/07/how-to-import-structured-data-into-solr.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-62720189439435182</id><published>2015-06-16T18:51:00.000-07:00</published><updated>2015-06-16T18:51:42.891-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="AVX" /><category scheme="http://www.blogger.com/atom/ns#" term="Codecs" /><category scheme="http://www.blogger.com/atom/ns#" term="compression" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="JNI" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="Native" /><category scheme="http://www.blogger.com/atom/ns#" term="SIMD" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="SSE" /><category scheme="http://www.blogger.com/atom/ns#" term="~Ivan Mamontov" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Lucene SIMD Codec benchmark and future steps</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-29076394-fbf2-0e87-6c97-8f127943dec7" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We are happy to share results of our &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2015/02/proposing-simd-codec-for-lucene.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Lucene SIMD&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; research announced earlier. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Ivan integrated &lt;/span&gt;&lt;a href="https://github.com/lemire/simdcomp" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/lemire/simdcomp&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; as &lt;/span&gt;&lt;a href="https://www.elastic.co/blog/what-is-an-apache-lucene-codec" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Lucene Codec&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; and we could observe &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;18%&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; 
font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; gain on standard &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/5_0_0/benchmark/org/apache/lucene/benchmark/byTask/package-summary.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Lucene benchmark&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. Here are &lt;/span&gt;&lt;a href="http://git.io/vkY1o" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;the fork&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="http://git.io/vkSsB" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;deck&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a 
href="https://www.youtube.com/watch?v=2HQdbpgHfnQ" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;recording&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; from &lt;a href="http://berlinbuzzwords.de/session/fast-decompression-lucene-codec"&gt;BerlinBuzzwords&lt;/a&gt;. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Tech notes&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;The prototype is &lt;/span&gt;&lt;a href="https://github.com/griddynamics/solr-fork/commit/fddc1ff29e9ed2f0c7bba63981479c903a4e0dd3" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;limited to postings&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; (IndexOptions.&lt;a href="http://lucene.apache.org/core/5_0_0/core/org/apache/lucene/index/IndexOptions.html#DOCS"&gt;DOCS&lt;/a&gt;); so far it doesn’t support freqs, positions or payloads. Thus, full tf-idf scoring is not possible yet. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;The heap problem&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Currently, the bottleneck of search performance is &lt;/span&gt;&lt;a href="http://nlp.stanford.edu/IR-book/html/htmledition/efficient-scoring-and-ranking-1.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;the scoring heap&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. A heap is hard to vectorize, and is costly to maintain even with regular instructions. 
Thus, the benchmark retrieves only the top 10 docs to limit the effort spent on managing the heap.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="lucene.png" height="141px;" src="https://lh4.googleusercontent.com/cT4eKFXoim4ZaRPYNVOV-Fxvtg-4AyQe_eK5m8mRfE2m8j8SKWlOri18IL-_co8WuzvqH_IsJZBHwpmQgC6ZpKpFxTTEcuIdm-KzNda2MFN67OtodX-nMJNcZXcm95RUaLxYa90" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Here is a profiler snapshot for the default Lucene code: decoding takes more time than collecting. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="simd.png" height="140px;" src="https://lh4.googleusercontent.com/GFDwkqjhPXZbJIalOjmS8-PLSNZtZCgvOHvWC5k_p0xqoWQA5r-lv4jCzOC0hGtxMVzxHz652BTSggjnRSJpyZ-lFxpgOl7ofP_ylmnQzI_9w0j5cUhjunndbTIJPzf8nu60de8" style="-webkit-transform: rotate(0.00rad); border: none; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;These are the hotspots with the SIMD codec; note that collecting now dominates and ForUtil takes relatively less time for decoding. 
&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Edge cases&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;There are a few special code paths that bypass generic &lt;/span&gt;&lt;a href="https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;FOR&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; decoding, which makes it harder to observe the vectorization gain. Very dense stopword postings are encoded as a sequence of increasing numbers by just specifying the length of the sequence (see ForUtil.ALL_VALUES_EQUAL). Thus, we excluded stopwords from the benchmark to better observe the gain in FOR decoding. 
&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Another edge case is postings getting shorter under high segmentation. FOR compression is applied to blocks, and the remaining tail is encoded with &lt;/span&gt;&lt;a href="http://nlp.stanford.edu/IR-book/html/htmledition/variable-byte-codes-1.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;vInt&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. Thus, to observe the gain in FOR decoding, we merge the segments into a single one.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;For the same reason, rare terms with short postings lists are not a good use case to show a gain. 
&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17.333333333333332px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;Further Plans&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Here are some directions we are considering:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;provide the codec and the benchmark as separate modules;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: 
normal; text-decoration: none; vertical-align: baseline;"&gt;apply SIMD codec for DocValues and Norms - it should improve generic sorting, scoring and faceting. Because ordinals in DocValues are not increasing like postings, &lt;/span&gt;&lt;a href="https://github.com/lemire/FastPFor" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/lemire/FastPFor&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; should be incorporated;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;complete codec for supporting frequencies, offsets and positions to make it fully functional;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: 
transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;presumably, a SIMD facet component might gain from vectorization; however, decoding ordinals might not be the biggest problem in faceting, as described &lt;/span&gt;&lt;a href="https://sbdevel.wordpress.com/2015/03/13/n-plane-packed-counters-for-faceting/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;execute binary operations like intersections on compressed data with SIMD instructions &lt;/span&gt;&lt;a href="https://github.com/lemire/SIMDCompressionAndIntersection" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;https://github.com/lemire/SIMDCompressionAndIntersection&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;native code might access &lt;a href="http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html"&gt;mmapped index files&lt;/a&gt; without boundary checks or copying to heap arrays;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;implementing &lt;a href="https://github.com/lemire/RoaringBitmap"&gt;roaring bitmaps&lt;/a&gt; might help with dense postings;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" 
style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Which of those directions are relevant to your challenges? Leave a comment below!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;There are still questions to clarify:&lt;/span&gt;&lt;/div&gt;&lt;ol style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;will critical natives work on Java 9 and later?&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt;could the JIT’s vectorization heuristics make an explicit SIMD codec redundant? &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 14.666666666666666px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We’d like to thank everyone who contributed their research and made it possible for us to conduct ours. &lt;/span&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/62720189439435182/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=62720189439435182" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/62720189439435182" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/62720189439435182" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/pCO5eRx4mP0/lucene-simd-codec-benchmark-and-future.html" title="Lucene SIMD Codec benchmark and future steps" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>0</thr:total><georss:featurename>Petrogradsky District, Saint Petersburg, Russia</georss:featurename><georss:point>59.963803662130246 30.321364402770996</georss:point><georss:box>59.961816662130246 30.316321902770998 59.965790662130246 
30.326406902770994</georss:box><feedburner:origLink>http://blog-archive.griddynamics.com/2015/06/lucene-simd-codec-benchmark-and-future.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-1907785464318003708</id><published>2015-06-01T14:05:00.000-07:00</published><updated>2015-06-01T14:05:11.131-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Big Data" /><category scheme="http://www.blogger.com/atom/ns#" term="Hadoop" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="Pig" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Dmitry Sotnyk" /><title type="text">Who is who in Big Data</title><content type="html">Well, this post is not about people or companies but a brief overview of what Big Data is and of the Big Data technology stack, so don't be confused by the title.&lt;br /&gt;&lt;br /&gt;It's a good starting point for exploring Big Data and understanding what is what, the pros and cons of every technology, and how they can be combined.&lt;br /&gt;&lt;br /&gt;Enjoy!&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sotnikdv.github.io/bigdata/2015/05/31/who-is-who-in-bigdata.html"&gt;http://sotnikdv.github.io/bigdata/2015/05/31/who-is-who-in-bigdata.html&lt;/a&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/1907785464318003708/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1907785464318003708" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1907785464318003708" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1907785464318003708" /><link 
rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/51EyzyyUv04/who-is-who-in-big-data.html" title="Who is who in Big Data" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2015/06/who-is-who-in-big-data.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-5374733123139552962</id><published>2015-03-18T11:11:00.001-07:00</published><updated>2015-03-18T11:11:09.903-07:00</updated><title type="text">Spark and ZooKeeper: fault-tolerant job manager out of the box</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-3a80e16a-2e0e-c55d-c733-b31794812d23" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Problem definition&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Imagine that you’ve been using some RDBMS for the last ten years, and there are millions of records inside; moreover, new information keeps flowing in every minute. And one day you need to organize a complex information retrieval process on this data with a full-fledged enterprise search engine like Apache Solr. 
In order to do this you need to develop an ETL process that will be able to convert your data to internal Solr documents. You obviously want to distribute it and make it fault-tolerant (you don’t want to lose even a single row of information). And the last requirement, which makes everything more interesting: you want zero support burden. Everything should come out of the box! Too difficult? Absolutely not! Let’s see how to nail it with Solr, Spark and Zookeeper bound together.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Why Spark? It is distributed, it doesn’t force you to use the MapReduce programming model (like Hadoop does), and finally, it has a failover mechanism backed by Zookeeper.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;So, we’ve introduced our toolbox; now it’s time to make it work. Let’s start with Solr and Zookeeper. Historically, Solr was just a universal search engine built on top of Apache Lucene, but starting from version 4.x it comes with distributed search support, aka SolrCloud, which provides additional failure resilience by delegating cluster management to Zookeeper. 
Spark can also use Zookeeper for failure recovery in cluster mode.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Preparation&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;First, let’s prepare a simple application that will emulate our long running data import task. We’ll submit this application to Spark and try to simulate node/process failure, while it sends simple documents to Solr. Our goal is to ensure that there’s no document loss due to any failures.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Spark applications are just Java archives containing all project dependencies. 
The documentation recommends using either&lt;/span&gt;&lt;a href="http://www.scala-sbt.org/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;sbt&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; or&lt;/span&gt;&lt;a href="http://maven.apache.org/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Maven&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; project management tools with appropriate plugins:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt; text-align: left;"&gt;&lt;ul&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div 
style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="https://github.com/sbt/sbt-assembly" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;sbt-assembly&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://maven.apache.org/plugins/maven-shade-plugin/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;Apache Maven Shade Plugin&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; We’ll make our application with Maven. 
To do this, we need to add a dependency on SolrJ (&lt;/span&gt;&lt;a href="http://mvnrepository.com/artifact/org.apache.solr/solr-solrj" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://mvnrepository.com/artifact/org.apache.solr/solr-solrj&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;) and add a dummy class with simple code like this:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;package &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;com.example;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;import &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;org.apache.solr.client.solrj.SolrServer;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;import &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;org.apache.solr.client.solrj.impl.HttpSolrServer;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;import &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;org.apache.solr.common.SolrInputDocument;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;import &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;java.util.ArrayList;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;import &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;java.util.List;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;public class &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Executor {&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;public static void &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline;"&gt;main(String[] args) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;String solrServerURL = &lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"http://localhost:8983/solr"&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;int &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;offset = &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;0&lt;/span&gt;&lt;span 
style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;int &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;delay = &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;for &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(String arg : args) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;if &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(arg.startsWith(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"offset="&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;)) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// This offset is needed to avoid sending several documents with the&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// same id. In spite of the fact that Solr can handle such documents,&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// we don't want to have any intersections. 
If we run 4 tasks like this&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// one, then we need to check that all 4000 documents are indexed.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;offset = Integer.&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;parseInt&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(arg.split(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"="&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; 
font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;)[&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;]);&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;else if &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(arg.startsWith(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"delay"&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;)) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; 
margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;delay = Integer.&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;parseInt&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(arg.split(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"="&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;)[&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;]);&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;else if &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(arg.startsWith(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"solr_server"&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;)) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;solrServerURL = arg.split(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"="&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt;)[&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;];&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SolrServer solrServer = &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;new &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;HttpSolrServer(solrServerURL);&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;List&amp;lt;SolrInputDocument&amp;gt; docs = &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;new &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;ArrayList&amp;lt;SolrInputDocument&amp;gt;();&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;for &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;int &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;i = offset + &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;; i &amp;lt;= offset + &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1000&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;; i++) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SolrInputDocument document = &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;new &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: 
normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SolrInputDocument();&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;document.addField(&lt;/span&gt;&lt;span style="background-color: white; color: green; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;"id"&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, String.&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;valueOf&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(i));&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;docs.add(document);&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; 
margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// Send documents to Solr in batches of 100&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;try &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 
15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;if &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(i % &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;100 &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;== &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;0&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;solrServer.add(docs);&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;docs = &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;new &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;ArrayList&amp;lt;SolrInputDocument&amp;gt;();&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;catch &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;(Exception e) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;TODO: Proper exception handling&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; 
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// Artificial delay to prolong the total run time&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;try &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Thread.&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;sleep&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; 
font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(delay);&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;catch &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(InterruptedException e) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// Nothing to do&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;// Finally commit documents and free resources&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;try &lt;/span&gt;&lt;span style="background-color: white; color: 
black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;solrServer.commit();&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;catch &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;(Exception e) {&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: grey; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;// &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;TODO: Proper exception handling&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;} &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;finally &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;{&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;solrServer.shutdown();&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;}&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;}&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;This code generates Solr input documents with a single field, “id”, and sends them to the server using SolrJ, the Java client for Solr. The “id” field must be unique within the Solr index. Solr does not treat two documents with the same id as an error, but we need to count every document sent to the server to ensure that none is lost. 
Now let’s build it to get our “uber” JAR with all dependencies.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SolrCloud deployment&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Next step: download and unpack the SolrCloud distribution. 
Downloads are available at&lt;/span&gt;&lt;a href="http://lucene.apache.org/solr/downloads.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://lucene.apache.org/solr/downloads.html&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. I used version 4.10.2. We’ll be using the embedded example in this how-to, so let’s clone its folder to make four independent workspaces:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline;"&gt; cd ~/solr-4.10.2/&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:solr-4.10.2 Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cp -r example example2&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:solr-4.10.2 Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cp -r example exampleB&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;my_device:solr-4.10.2 Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cp -r example example2B&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;OK, now deploy the first instance:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:example Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; export SOLR_INSTANCE=~/solr-4.10.2/example&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:example Root$ &lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;java -Djetty.port=8983 -Djetty.home=$SOLR_INSTANCE -Dsolr.solr.home=$SOLR_INSTANCE/solr -Dbootstrap_confdir=$SOLR_INSTANCE/solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DzkHost=localhost:9983,localhost:8574,localhost:9900,localhost:9500 -DnumShards=2 -jar $SOLR_INSTANCE/start.jar&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Continue deploying the remaining instances, changing the jetty.port (8983, 7574, 8900, 8500) and SOLR_INSTANCE (example, example2, exampleB, example2B) parameters accordingly. You can send these commands to the background by appending “&amp;amp;” or run them separately in four consoles. 
Here is what the parameters mean:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt; text-align: left;"&gt;&lt;ul&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 2.38464; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;jetty.port, jetty.home - the Solr example runs on an embedded Jetty server, so we just need to specify the port to bind to and the working folder&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 2.38464; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;solr.solr.home - the home directory for all Solr files (properties, index files and so on)&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt;bootstrap_confdir - the directory with the initial configuration that will be uploaded to ZooKeeper&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;collection.configName - sets the name under which that configuration is stored in ZooKeeper&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;zkRun - tells Solr to run an embedded ZooKeeper&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;zkHost - specifies all nodes eligible to host 
ZooKeeper&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;When everything is done, we’ll have a SolrCloud with two shards (numShards=2) and two replicas of each. We also told Solr to run an embedded ZooKeeper and upload its configuration there. You can check the Solr state, cloud details and ZooKeeper data on the Solr admin page at &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:port/solr&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; (pic. 
1 and 2).&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img height="488px;" src="https://lh6.googleusercontent.com/Eov5Wf5Nk9D8xGYiFpdMJwkupkJUEVN2PUp6WbLIWHBWyPChR2i_5DHdejkYdp9yb2-cExKz1IPEWmsTwne3jVbJV0EvVALHmfraLkgnsXty66dJzTnfLP0SyyAlOggSlI-jldA" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 
1 – SolrCloud graph&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img height="435px;" src="https://lh3.googleusercontent.com/lbGxMW-QHZ77dTmEV2jWhG3vfwiAQcPCiNYFBZFpLrb4uh4s1irnLeg6rsz9b2tOV7AyxeioHhw2dTySHDdPgRUBKDzOpGQFa4leo26FBiMz8SH8hj9e7V7K2UrxgZz3FktF6Ig" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 
2 – ZooKeeper tree&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Spark deployment&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Now it’s Spark’s turn. 
Spark can be downloaded from&lt;/span&gt;&lt;a href="https://spark.apache.org/downloads.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;https://spark.apache.org/downloads.html&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. Several pre-built versions are available for your convenience. Once the download is done, unpack the archive and clone it: one instance will become the leader (master) node, and the second one will become the spare (standby) node. 
Let’s name these folders accordingly:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; tar -xzvf spark-1.1.1-bin-hadoop2.4.tgz&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; 
color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; mv spark-1.1.1-bin-hadoop2.4 spark-leader&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cp -r spark-leader spark-standby&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Alright, now we need to configure our Spark nodes. 
To do this, copy or rename the existing template:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cd spark-leader/conf&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;br class="kix-line-break" /&gt;&lt;/span&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:conf Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cp spark-env.sh.template spark-env.sh&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; 
margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Then reduce its content to:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_IP="localhost"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_PORT=1101&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:9983,localhost:8574,localhost:9900,localhost:9500"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_PID_DIR="/tmp/spark-leader"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_WEBUI_PORT=6661&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_WORKER_PORT=1102&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_WORKER_MEMORY=512m&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt;export SPARK_LOCAL_DIRS="./data"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_WORKER_DIR="./work"&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What is this all about? 
Every option is described in the original template; we’ll only pay attention to the following variables:&lt;/span&gt;&lt;/div&gt;&lt;ol style="margin-bottom: 0pt; margin-top: 0pt; text-align: left;"&gt;&lt;ol&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SPARK_DAEMON_JAVA_OPTS - tells Spark to use ZooKeeper for recovery and points it to the ZooKeeper cluster&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SPARK_PID_DIR - the directory where Spark keeps process ids; we will need it to simulate a node failure&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Do the same for the spare node:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_IP="localhost"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_PORT=2101&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=localhost:9983,localhost:8574,localhost:9900,localhost:9500"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_PID_DIR="/tmp/spark-standby"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_MASTER_WEBUI_PORT=7661&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_WORKER_PORT=2102&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_WORKER_MEMORY=512m&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export SPARK_LOCAL_DIRS="./data"&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;export 
SPARK_WORKER_DIR="./work"&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;That’s all, now it’s time to deploy Spark cluster. I wrote a simple bash script to achieve this goal:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;#!/bin/bash&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;rm -rf /tmp/spark-leader&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;rm -rf 
/tmp/spark-standby&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;bash spark-leader/sbin/start-master.sh&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;bash spark-standby/sbin/start-master.sh&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SPARK_WORKER_INSTANCES=&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;4&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;SPARK_WORKER_WEBUI_PORT=&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;8661&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 
0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;for ((&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;i=&lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;0&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;; i&amp;lt;$SPARK_WORKER_INSTANCES; i++&lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;))&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;; &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;do&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: 
Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;bash spark-leader/sbin/start-slave.sh &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;$(( &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;$i + &lt;/span&gt;&lt;span style="background-color: white; color: blue; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;1 &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;)) &amp;nbsp;&lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;spark://localhost:1101 --webui-port &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;$(( &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;$SPARK_WORKER_WEBUI_PORT + $i &lt;/span&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; 
vertical-align: baseline;"&gt;))&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: white; color: navy; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline;"&gt;done&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;This script first cleans the tmp directories, then starts two master nodes, and finally starts four worker nodes. What are these master and worker nodes? Worker nodes are executors: they only run the tasks assigned to them. Master nodes are Spark coordinators: they orchestrate the Spark cluster and assign tasks to available workers. 
If everything is fine, within a minute of running the script we’ll be able to see the following page at&lt;/span&gt;&lt;a href="http://localhost:6661/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:6661&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-02-10 at 13.35.16.png" height="455px;" src="https://lh5.googleusercontent.com/zsjpIrmlUpEUxTdw2ZcNKTrqvtD7JiBSyMKKW4H9UbOXhpi0Ss80ktpYrTGorqDW0YDwORR5XTqsnv2KIXdmwc61Qg4jVn6bYkmS0LLgqFJO9bZOl9DcfONBl5SfC6PdfkY-AXs" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 
3 – Spark master web UI&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;In picture 3 we can see the Spark master with four connected worker nodes. If we go to the other master (&lt;/span&gt;&lt;a href="http://localhost:7661/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:7661&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;), we’ll see a similar screen, but without any worker nodes connected (that's because we pointed them at &lt;/span&gt;&lt;span style="background-color: white; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;spark://localhost:1101).&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 28px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Testing&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;So, we have deployed the Solr, Zookeeper and Spark clusters, and we have an assembled application. Let’s submit it and experiment with failover!&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; spark-leader/bin/spark-submit --class com.example.Executor --master spark://localhost:1101 --deploy-mode cluster --supervise --executor-memory 512m --total-executor-cores 4 path-to-application.jar offset=0 solr_server=http://localhost:8983/solr &amp;amp;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What does this command mean? 
First of all, we use the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;spark-submit&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; script to submit our tasks. Second, we point to the executable class within our jar. Then we tell Spark how to submit the tasks (or &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;drivers &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;in Spark terminology):&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt; text-align: left;"&gt;&lt;ul&gt;&lt;li style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;--deploy-mode cluster – from the documentation: “Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)”&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li style="background-color: transparent; color: 
black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;--supervise – specify this flag to make sure that the driver is automatically restarted if it fails with a non-zero exit code. This is exactly what we’re looking for: if a Spark worker node is terminated, the Spark cluster will simply restart the driver on any free worker node.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What about the rest of the arguments? Well, one of them is obviously just the path to our jar. The other two are arguments passed to our executable class. The Solr server URL is the endpoint to which Solr documents are sent. 
The last one, offset, is needed, as mentioned earlier, to avoid sending several documents with the same id.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Run this command four times, increasing the offset by a thousand each time. If everything is fine, we’ll get four completed tasks and four thousand documents in Solr (see pictures 4 and 5).&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-02-10 at 15.31.11.png" height="172px;" src="https://lh3.googleusercontent.com/AO0NHhBEPSpUWnUNjI8oAMjwXb4InfRz5gIV0EMCJPOqb6Z_QPUYN3Pth7XW-s2lnEUD8Foe9o8Ilesakq9rZYMiLrckko0lS-WIL_jRJa7OH-19j2JtwCSNSJcixJYZ7pMdiE0" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 
4 – finished drivers in Spark master web UI&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-02-10 at 15.31.39.png" height="249px;" src="https://lh3.googleusercontent.com/KxqIF_K0Nxdi_zuxSPOhYNdlU5XES7AbUXQcszoZrcpZOhXlGlERVmoc-jxThXLpc_RhsU_Ta52QxdIftyN-kill1_yjHqOWwQ0YUy8enS5mrq8v1jAhI8_Zk1WggWxzFV234xM" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 5 – Match-all query in Solr web UI&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;What’s next? Remember the “delay” command-line argument we introduced in our code? We need it now to control the total execution time by adding an artificial processing delay. During the first run we kept it at the default of 1 ms, because we needed that run only to collect our control data. During the second run we’ll raise this value to 180 ms (the total execution time will then be at least 3 minutes), which will give us enough time to simulate node failure. In the end we’ll compare the control data with the test data. 
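&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;The three-minute estimate is plain arithmetic. Here is a minimal sketch, assuming that each run processes 1000 documents (the offset step used above) and that the delay is applied once per document; the variable names are illustrative only and are not part of the application:&lt;/span&gt;&lt;/div&gt;
&lt;pre&gt;
```shell
#!/bin/bash
# Assumed figures: 1000 documents per run (the offset step used above)
# and a delay that is applied once per document.
docs_per_run=1000
delay_ms=180

# Lower bound on the run time: number of documents times the per-document delay.
total_seconds=$(( docs_per_run * delay_ms / 1000 ))
total_minutes=$(( total_seconds / 60 ))

echo "minimum run time: ${total_seconds} s (${total_minutes} min)"
```
&lt;/pre&gt;
&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;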
But first we need to find out the leader node’s PID:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cd /tmp/spark-leader&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; cat spark-Root-org.apache.spark.deploy.master.Master-1.pid&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline;"&gt;38073&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Ok, this number – 38073 – is our leader node PID. 
Now let’s find out the process ids of our worker nodes:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; jps&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38145 Master&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38073 Master&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;38016 start.jar&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38014 start.jar&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38015 start.jar&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38865 Jps&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38013 start.jar&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38385 Worker&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: 
normal; text-decoration: none; vertical-align: baseline;"&gt;38325 Worker&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38265 Worker&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;38205 Worker&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: 
baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;We’ll pick one of the workers and the leader node and terminate them, but first we need to purge Solr and run our modified submit command (again, four times):&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; curl http://localhost:8983/solr/update --data 
'&amp;lt;delete&amp;gt;&amp;lt;query&amp;gt;*:*&amp;lt;/query&amp;gt;&amp;lt;/delete&amp;gt;' -H 'Content-type:text/xml; charset=utf-8'&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; curl http://localhost:8983/solr/update --data '&amp;lt;commit/&amp;gt;' -H 'Content-type:text/xml; charset=utf-8'&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; spark-leader/bin/spark-submit --class com.example.Executor --master spark://localhost:1101 --deploy-mode cluster --supervise --executor-memory 512m --total-executor-cores 4 path-to-application.jar offset=0 delay=180 solr_server=http://localhost:8983/solr &amp;amp;&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; 
text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Ready? Let’s simulate node failures:&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="623"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="background-color: #666666; border-bottom: solid #000000 1px; border-left: solid #000000 1px; border-right: solid #000000 1px; border-top: solid #000000 1px; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; kill -9 38385&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.44; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: #666666; color: #00c800; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;my_device:~ Root$&lt;/span&gt;&lt;span style="background-color: #666666; color: white; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline;"&gt; kill -9 38073&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;If we’re fast enough, we can watch the Spark failover mechanism at work – all we need to do is track changes in the web UI of our standby node (&lt;/span&gt;&lt;a href="http://localhost:7661/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;http://localhost:7661&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;). It usually takes about half a minute to see any updates. The expected behaviour is that no submitted task is lost when a worker goes down (pic. 6), the spare node changes its status from STANDBY to ALIVE (it becomes the leader node) when the current leader node is terminated, and no worker is left orphaned – they all reconnect to the new leader (pic. 
7):&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-02-10 at 18.49.47.png" height="479px;" src="https://lh3.googleusercontent.com/-G0SL2vrPo-MR13vYBJJNjWQlHy3mtitxZp4Lq0pOQMXxipzRADvXdkBlO1Om_axtwFimr44-WZcxTiZVXdvQ7tvUSMG3H1XJ6Eh2nEpu0CGG9u7kPrQGInlLhMW2th7hP64c-0" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 
6 – Driver relaunching&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;img alt="Screen Shot 2015-02-10 at 18.50.02.png" height="415px;" src="https://lh5.googleusercontent.com/ikJJ-6S3JC962kURmXVkWyUCvJ-CAtHkV9h8iAv-Axk09CmitSfSjUk_GUPwh79CAxwny4IpDMaVzJ9NpMG6tjXn7BW_72D5aGRYBi3KP942YJTsFCPyLIEwS6gH83lCIp9DS60" style="-webkit-transform: rotate(0.00rad); border: 1px solid #000000; transform: rotate(0.00rad);" width="624px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Pic. 7 – Spark recovery process on the spare master node&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Finally, when all drivers have completed, we need to go back to Solr and check that the match-all query returns the same result as we saw before (pic. 
5) – 4000 documents.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.656; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/span&gt;Okay, it seems our problem is solved: we simulated a long-running distributed task, we randomly terminated executing applications, and we saw the entire recovery process, which ensured that we didn’t lose any data. We didn’t have to write any special code to achieve this level of resiliency and job recovery. So, Spark can be used as a resilient job manager for long-running data import tasks.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/5374733123139552962/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5374733123139552962" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5374733123139552962" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5374733123139552962" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Kw1UonOdEaE/spark-and-zookeeper-fault-tolerant-job.html" title="Spark and ZooKeeper: fault-tolerant job manager out of the box" 
/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2015/03/spark-and-zookeeper-fault-tolerant-job.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-3070020014866645812</id><published>2015-02-01T12:45:00.006-08:00</published><updated>2015-02-01T12:45:59.253-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="compression" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="SIMD" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Ivan Mamontov" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Proposing SIMD codec for Lucene</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-b9e5d42a-466b-a11d-2bdb-e503344c95ff" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;This post is inspired by a&lt;/span&gt;&lt;a href="http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;span 
style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;whitepaper&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. So if you want to save time, you can just read it instead of this post. Still, let me try to give some context. &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;First, let’s recall the basics. Search engines are based on the &lt;/span&gt;&lt;a href="http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;inverted index&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; (or &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Inverted_index" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;wiki&lt;/span&gt;&lt;/a&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;) data structure, which is essentially a mapping from a term to its posting list: a sorted sequence of the documents in which that term occurs. &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Executing a search request involves several stages. Usually, the three most time-consuming are:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;term dictionary lookup, which resolves the posting list positions for the given query terms;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: 
normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;reading the posting list from storage, whether it is disk or memory;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;applying query logic and ranking results.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;This post targets the second item, reading the posting list, which is in fact decompression. You may wonder why we should spend effort on decompression, given that storage is so cheap nowadays. The answer is that the system bus becomes a bottleneck when huge amounts of index data are moved to the CPU for processing. 
Hence, compression lets us trade CPU cycles for bus throughput.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;Given the significant academic attention to this problem (e.g., see &lt;/span&gt;&lt;a href="http://ecir2014.org/awards/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;the best paper at ECIR’14&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;), we tried to measure what share of the search time is spent on decompression. We slightly modified the Lucene wiki search benchmark. It searches for the 10K most frequent terms in a 2.9M-document index. 
We compared the current Lucene codec in &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/index/FieldInfo.IndexOptions.html#DOCS_ONLY" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;DOCS_ONLY&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; mode, which indexes only document numbers, against postings loaded into the heap via &lt;/span&gt;&lt;a href="http://lucene.apache.org/core//4_5_0/join/org/apache/lucene/search/join/FixedBitSetCachingWrapperFilter.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;FixedBitSetCachingWrapperFilter&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;. 
The &lt;/span&gt;&lt;a href="https://github.com/m-khl/lucene-solr/blob/codec-benchmark/lucene/benchmark/conf/searchOnlyWiki.alg" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;benchmark&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; shows a nearly &lt;/span&gt;&lt;a href="https://github.com/m-khl/lucene-solr/blob/codec-benchmark/lucene/benchmark/decoding%20comparison%20report.txt" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;20% performance gain&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; with heap bitsets. 
That is hardly the vast majority of the search time, which has two possible explanations: either the benchmark is incorrect, or Lucene has already made good progress toward fast decompression.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;The latter seems to have happened in 4.x, where a &lt;/span&gt;&lt;a href="http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;bit variable codec&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, which can store just a few bits per document, replaced the older &lt;/span&gt;&lt;a href="http://nlp.stanford.edu/IR-book/html/htmledition/variable-byte-codes-1.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;byte variable ones&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, which store at least a byte per document. 
Another benchmark gives us evidence of a twofold performance gain for the former. Nevertheless, we consider 20% to be the maximum gain achievable by optimizing the decompression algorithm.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;The idea of &lt;/span&gt;&lt;a href="http://arxiv.org/abs/1209.2137" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;the paper&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; referred to at the beginning is to apply &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/SIMD" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SIMD&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; operations to decompression. Let me skip the explanation of it. 
We can just say that you may know it as &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/MMX_(instruction_set)" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;MMX&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/SSE2" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SSE2&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/SSE3" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;SSE3&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, and it’s definitely nearly as cool as &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/CUDA" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;CUDA&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;, you know. We are going to adapt &lt;/span&gt;&lt;a href="https://github.com/lemire/simdcomp" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;the SIMDComp library&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; as a Lucene codec and report a simple benchmark. The most difficult part is making a fast call from Java to native SIMD code. We have an idea of how to do that; it was found by a hacker from &lt;/span&gt;&lt;a href="http://ok.ru/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;ok.ru&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; who used to be a principal HotSpot developer. 
Here is the &lt;/span&gt;&lt;a href="http://habrahabr.ru/post/222997/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;description of the finding&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt; in Russian and a &lt;/span&gt;&lt;a href="http://stackoverflow.com/questions/24746776/what-does-a-jvm-have-to-do-when-calling-a-native-method/24747484#24747484" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline;"&gt;translation&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline;"&gt;.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.5; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial, Helvetica, sans-serif; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; line-height: 1.5; text-decoration: none; vertical-align: baseline;"&gt;From the codec benchmarks we’ve done so far, we expect a twofold gain from adapting the existing SIMD decoder. Even so, we expect only a 10% overall improvement on the Lucene wiki benchmark. 
It doesn’t sound like a true performance breakthrough, but we consider it a first step in a promising field, because SIMD could further be applied to other CPU-consuming operations like faceting, aggregations, intersections,&amp;nbsp;etc. So, let us know if you know a VC who is hungry for something like this, or if you are aware of pitfalls that we are going to hit.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/3070020014866645812/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=3070020014866645812" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3070020014866645812" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/3070020014866645812" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/a5V1sGjDUIk/proposing-simd-codec-for-lucene.html" title="Proposing SIMD codec for Lucene" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>1</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2015/02/proposing-simd-codec-for-lucene.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-1167114725763003994</id><published>2014-10-15T17:33:00.001-07:00</published><updated>2014-10-15T23:14:34.591-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="numeric range queries" /><category scheme="http://www.blogger.com/atom/ns#" term="range query" /><category 
scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="trie fields" /><category scheme="http://www.blogger.com/atom/ns#" term="~Vadim Kirilchuk" /><title type="text">Numeric Range Queries in Lucene/Solr</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;I bet that almost every e-commerce site has price-range filters which help users filter search results by price. Of course, ranges can be used not only as filters but also as part of a query. Most of us are familiar with &lt;a href="http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Range%20Searches"&gt;range queries&lt;/a&gt; in Lucene/Solr, but only a few people know about the internal optimizations for &lt;b&gt;numeric ranges&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;In this post we will unveil the main algorithm for numeric range queries and also take a brief look at some tricks used in Lucene/Solr to make numeric ranges fast and efficient. Historically, the idea of the algorithm came from the&amp;nbsp;&lt;a href="http://www.panfmp.org/"&gt;PANGAEA® Framework for Metadata Portals&lt;/a&gt;&amp;nbsp;and the original publications by Uwe Schindler.&lt;br /&gt;&lt;h3 style="text-align: left;"&gt;Algorithm&lt;/h3&gt;&lt;div&gt;In Solr, the following algorithm is used for trie-based fields, such as &lt;i&gt;&lt;a href="http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/schema/TrieIntField.html"&gt;solr.TrieIntField&lt;/a&gt;&lt;/i&gt;, &lt;i&gt;&lt;a href="http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/schema/TrieFloatField.html"&gt;solr.TrieFloatField&lt;/a&gt;&lt;/i&gt;, etc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Let's start with an explanation of the main idea of NumericRangeQuery, which is based on a &lt;b&gt;trie representation of numerics&lt;/b&gt;. 
For simplicity, let's stick with the decimal number system for the first example.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-L1gvE6AfmvU/U5KGRXP9oaI/AAAAAAAAAPM/CnJvQocgP58/s1600/trie.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-L1gvE6AfmvU/U5KGRXP9oaI/AAAAAAAAAPM/CnJvQocgP58/s1600/trie.png" height="288" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;h4 style="text-align: left;"&gt;Index time&lt;/h4&gt;In the diagram, the lowermost row holds the term values themselves, and the upper rows hold derived values: each upper level is the integer quotient of the level below divided by ten. Let's call them &lt;b&gt;lower precision values&lt;/b&gt;. Let's assume that we index all numbers from the lowermost row (the original term values) together with all their quotients.&lt;br /&gt;&lt;br /&gt;To make this clearer, let's look at the posting lists:&lt;br /&gt;value -&amp;gt; document ids&lt;br /&gt;421 -&amp;gt; [1]&lt;br /&gt;423 -&amp;gt; [2]&lt;br /&gt;445 -&amp;gt; [3]&lt;br /&gt;446 -&amp;gt; [3]&lt;br /&gt;448 -&amp;gt; [4]&lt;br /&gt;521 -&amp;gt; [5]&lt;br /&gt;522 -&amp;gt; [7]&lt;br /&gt;632 -&amp;gt; [5]&lt;br /&gt;633 -&amp;gt; [6]&lt;br /&gt;634 -&amp;gt; [7]&lt;br /&gt;641 -&amp;gt; [5]&lt;br /&gt;642 -&amp;gt; [6]&lt;br /&gt;644 -&amp;gt; [7]&lt;br /&gt;&lt;br /&gt;and for quotients&lt;br /&gt;42 -&amp;gt; [1, 2]&lt;br /&gt;44 -&amp;gt; [3, 4]&lt;br /&gt;52 -&amp;gt; [5, 7]&lt;br /&gt;63 -&amp;gt; [5, 6, 7]&lt;br /&gt;64 -&amp;gt; [5, 6, 7]&lt;br /&gt;4 -&amp;gt; [1, 2, 3, 4]&lt;br /&gt;5 -&amp;gt; [5, 7]&lt;br /&gt;6 -&amp;gt; [5, 6, 7]&lt;br /&gt;&lt;br /&gt;In other words, we group (aggregate) posting lists by lower precision values at index time.&lt;br /&gt;&lt;h4 style="text-align: left;"&gt;Search time&lt;/h4&gt;Let's assume we want to find all 
records with term values between &lt;i&gt;“423”&lt;/i&gt; and &lt;i&gt;“642”&lt;/i&gt;. A naive algorithm would expand the range into separate values: &lt;i&gt;423 OR 445 OR ... 641 OR 642&lt;/i&gt; (note: I omitted values which were not indexed to simplify the description). But since we use a special type of field, instead of selecting all terms in the lowermost row, the query is optimized to match only the labelled term values (elements with gray fill in the diagram) with lower precision, where applicable. It is enough to select &lt;i&gt;“5”&lt;/i&gt; to match all records starting with &lt;i&gt;“5”&lt;/i&gt; (&lt;i&gt;“521”, “522”&lt;/i&gt;) or &lt;i&gt;“44”&lt;/i&gt; for &lt;i&gt;“445”, “446”, “448”&lt;/i&gt;. The query is therefore &lt;i&gt;simplified&lt;/i&gt; to match all records containing any of the following terms: &lt;i&gt;“423”, “44”, “5”, “63”, “641”, or “642”&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;So, instead of searching for every value in the requested range, the algorithm uses grouped values wherever possible.&lt;br /&gt;&lt;br /&gt;&lt;h3 style="text-align: left;"&gt;Under the hood&lt;/h3&gt;&lt;div&gt;Now that we have the main idea of the algorithm, we can proceed to Solr&amp;nbsp;internals and switch to binary. Warning! This part can be tricky if you've never spent several days debugging Solr and Lucene guts.&lt;br /&gt;&lt;h4 style="text-align: left;"&gt;Index time&lt;/h4&gt;At index time, terms for the value and all its quotients need to be produced. The Solr indexing flow is as simple as:&amp;nbsp;&lt;span style="font-family: Arial; font-size: 15px; white-space: pre-wrap;"&gt;DirectUpdateHandler-&amp;gt;IndexWriter#updateDocuments-&amp;gt;DocumentWriter-&amp;gt;DocumentConsumer#processDocument(fieldInfos)-&amp;gt;DocFieldProcessor-&amp;gt;DocFieldProcessorPerField-&amp;gt;DocInverterPerField&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, for an incoming document we get all its fields and process the values of each field with DocInverterPerField. 
Basically, it gets a token stream (&lt;a href="http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html"&gt;Lucene's TokenStreams are explained by Mike McCandless&lt;/a&gt;) which&amp;nbsp;produces the sequence of tokens to be indexed for a document's fields.&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;Here is a sequence diagram showing that the &lt;a href="https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/NumericTokenStream.html"&gt;&lt;i&gt;NumericTokenStream&lt;/i&gt;&lt;/a&gt; is created for numeric field types:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Qj02dXzupMI/VDqQlw7z_BI/AAAAAAAAAag/tEppkRxSd64/s1600/NumericTokenStream.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-Qj02dXzupMI/VDqQlw7z_BI/AAAAAAAAAag/tEppkRxSd64/s1600/NumericTokenStream.png" height="315" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &lt;a href="https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/NumericTokenStream.html"&gt;&lt;i&gt;NumericTokenStream&lt;/i&gt;&lt;/a&gt; is where half of the magic happens. The values it produces depend on the actual field type (TrieIntField, TrieLongField, etc.) and the &lt;b&gt;precision step&lt;/b&gt; configured for the field. The &lt;b&gt;precision step &lt;/b&gt;parameter will be explained later. 
In the upcoming example we will stick with TrieIntField and precision step equal to 1.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Now, let's look at an example which shows us which tokens are produced by NumericTokenStream for some value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In our case the value would be 11, here is its binary representation:&lt;/div&gt;&lt;div&gt;&lt;span id="docs-internal-guid-455b11b7-04f9-9187-6fac-209f8294623a"&gt;&amp;nbsp;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 93px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; vertical-align: baseline; white-space: pre-wrap;"&gt;00000000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; vertical-align: baseline; white-space: pre-wrap;"&gt;00000000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; 
vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; vertical-align: baseline; white-space: pre-wrap;"&gt;00000000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; vertical-align: baseline; white-space: pre-wrap;"&gt;0000&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;1011&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And this is how it is indexed (see the rawValue row):&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span id="docs-internal-guid-6607065e-04fa-4a7d-9504-822971da1563"&gt;&lt;img height="571px;" src="https://lh3.googleusercontent.com/XoJ7n09aAofgtuImUssWFznXt4S-PmHf2BABtGetIj-seOg_vxKlBrJ_j96ebds_tCVdjPVnD6l7OwSsHyf2zms57wUVBZ9PBVCGcJAJv80a7NJrqmRWEQY-qrL3j47rG3nI" width="960px;" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Another thing you may have noticed is the "shift", which increases by the precision step value with each successive token. So, the values are 11, 10, 8, 8, 0, 0 and a scrollbar.&amp;nbsp;What are the binary representations of these values? 
Here they are:&lt;/div&gt;&lt;div&gt;&lt;span id="docs-internal-guid-6607067e-04fe-40f5-993f-c6ad2e6d4774"&gt;&amp;nbsp;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 62px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;00001011&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 64px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;0000101&lt;/span&gt;&lt;span style="color: red; font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;0&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;div&gt;&lt;span id="docs-internal-guid-6607067e-04ff-552b-772f-46c3d6061ec2"&gt;&amp;nbsp;&lt;table style="border-collapse: collapse; border: 
none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 61px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;000010&lt;/span&gt;&lt;span style="color: red; font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;00&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;span id="docs-internal-guid-6607067e-04ff-9ebf-b719-512d16701a74"&gt;&amp;nbsp;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 61px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;00001&lt;/span&gt;&lt;span style="color: red; font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;span id="docs-internal-guid-6607067e-04ff-9ebf-b719-512d16701a74"&gt;&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;span 
id="docs-internal-guid-6607067e-0500-2926-c67d-b83dabf1748e"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 61px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;0000&lt;/span&gt;&lt;span style="color: red; font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;0000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;span id="docs-internal-guid-6607067e-0500-53bf-dc52-7c644f1b5244"&gt;&amp;nbsp;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="184px"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 61px;"&gt;&lt;td style="border-bottom: solid #000000 4px; border-left: solid #000000 4px; border-right: solid #000000 4px; border-top: solid #000000 4px; padding-bottom: 10px; padding-left: 10px; padding-right: 10px; padding-top: 10px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;000&lt;/span&gt;&lt;span style="color: red; font-family: Arial; font-size: 32px; font-weight: bold; vertical-align: baseline; white-space: pre-wrap;"&gt;00000&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Et cetera.. 
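&lt;br /&gt;&lt;br /&gt;The same sequence can be reproduced with a few lines of Python. This is only a rough sketch under simplifying assumptions, not what NumericTokenStream actually does: the helper name lower_precision_values is hypothetical, it handles only non-negative values, and it ignores the way Lucene encodes the shift and the value into term bytes. It just clears the low-order bits at each trie level, like the red bits in the tables above:&lt;br /&gt;
&lt;pre&gt;
```python
def lower_precision_values(value, precision_step=1, bits=32):
    """Return (shift, truncated value) pairs, one pair per trie level.

    Hypothetical helper for illustration only: it assumes a non-negative
    value and does not reproduce Lucene's real term encoding.
    """
    tokens = []
    for shift in range(0, bits, precision_step):
        # Drop the `shift` lowest bits, then scale back so that the
        # remaining bits keep their original positions.
        truncated = (value >> shift) * (2 ** shift)
        tokens.append((shift, truncated))
    return tokens


tokens = lower_precision_values(11, precision_step=1)
print([v for _, v in tokens[:6]])  # [11, 10, 8, 8, 0, 0]
```
&lt;/pre&gt;
With a precision step of 1 there is one entry per shift; a larger step simply produces fewer, coarser levels.&lt;br /&gt;&lt;br /&gt;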
Note that the tables above show only the first byte; in general, an integer has 4 bytes. 11 is quite a small number; with a bigger number we would end up with 32 different tokens being produced.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All these tokens are indexed, including the value itself. Now, to get the second half of the magic and put the pieces together, we need to look at search time.&lt;/div&gt;&lt;div&gt;&lt;h4 style="text-align: left;"&gt;Search time&lt;/h4&gt;&lt;div&gt;The search starts with an HTTP request with some parameters. For example, with the default configuration we have something like:&amp;nbsp;&lt;a href="http://localhost:8983/solr/collection1/query?q=price:[3%20TO%2012]"&gt;http://localhost:8983/solr/collection1/query?q=&lt;b&gt;price:[3 TO 12]&lt;/b&gt;&lt;/a&gt;. QueryParser parses the query to produce a Query object. Here is a sequence diagram:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-ZM8lMwt0fCg/VDqRD-faDYI/AAAAAAAAAao/PSqB3sJP6Cs/s1600/getRangedQuery.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-ZM8lMwt0fCg/VDqRD-faDYI/AAAAAAAAAao/PSqB3sJP6Cs/s1600/getRangedQuery.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Again, it is the &lt;i&gt;FieldType&lt;/i&gt; that creates the query. &lt;i&gt;TrieIntField&lt;/i&gt; creates a&amp;nbsp;&lt;a href="https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/NumericRangeQuery.html"&gt;&lt;i&gt;NumericRangeQuery&lt;/i&gt;&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;And here comes the other half of the magic.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;NumericRangeQuery&lt;/i&gt; extends &lt;i&gt;MultiTermQuery&lt;/i&gt;. 
The latter is an abstract &lt;i&gt;Query&lt;/i&gt; that matches documents containing a subset of terms provided by a &lt;i&gt;FilteredTermsEnum&lt;/i&gt; enumeration. This query cannot be used directly (it is abstract); you must subclass it and define &lt;i&gt;getTermsEnum(Terms, AttributeSource)&lt;/i&gt; to provide a &lt;i&gt;FilteredTermsEnum&lt;/i&gt; that iterates through the terms to be matched.&amp;nbsp;&lt;i&gt;NumericRangeQuery &lt;/i&gt;returns an instance of &lt;i&gt;NumericRangeTermsEnum&lt;/i&gt;&amp;nbsp;for enumerating all terms that match the sub-ranges for trie range queries.&lt;br /&gt;&lt;br /&gt;Underneath, it uses the utility method &lt;i&gt;&lt;a href="https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/NumericUtils.html"&gt;NumericUtils&lt;/a&gt;.splitXXXRange() &lt;/i&gt;to calculate the values to be matched. We won't explain the algorithm here because it deserves a separate blog post (it is briefly covered in the first useful link at the end of the post), but we will look at our last example one more time to find the labelled term values for it:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-QliXyRef8Ro/VDqtLUTIq9I/AAAAAAAAAbA/nS84KAd5AbY/s1600/3To12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-QliXyRef8Ro/VDqtLUTIq9I/AAAAAAAAAbA/nS84KAd5AbY/s1600/3To12.png" height="320" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;We had value=11 and precisionStep=1, and as you may remember, we got &lt;b&gt;11&lt;/b&gt;, &lt;b&gt;10&lt;/b&gt;, 8, 8, 0, 0, 0... as lower precision values for it. If you look at the decimal picture, you will notice that each upper level is the lower level divided by 10, right? 
So, here it is the same: if you divide &lt;b&gt;11&lt;/b&gt; by 2 you get 5 as the integer quotient, and if you multiply 5 back by 2 you get &lt;b&gt;10&lt;/b&gt;. I know it sounds tricky, but it's really just the same as in the decimal example. Oh, the binary system is so uncomfortable...&lt;br /&gt;&lt;br /&gt;There is one reasonable question to ask here. How do we distinguish a &lt;b&gt;1&lt;/b&gt; from the lowermost level from a &lt;b&gt;1&lt;/b&gt; from an upper level? Here is the answer: during indexing the value is encoded, and the first byte of the encoded value is the shift, which is unique to each trie level. So, in the index it could look something like this (just a speculation):&lt;br /&gt;shift value&lt;br /&gt;0 3&lt;br /&gt;0 12&lt;br /&gt;2 1&lt;br /&gt;2 2&lt;br /&gt;&lt;br /&gt;As a bonus of this encoding, the values become ordered by trie level: 3 and 12 go first, then 1 and 2. And that's what&amp;nbsp;&lt;i&gt;NumericRangeTermsEnum &lt;/i&gt;does: it enumerates terms/values in this order.&lt;br /&gt;&lt;br /&gt;Actually, that's not the end of the story: after you get a &lt;i&gt;FilteredTermsEnum&lt;/i&gt; for the &lt;i&gt;MultiTermQuery&lt;/i&gt;, the query needs to be somehow converted to a real Query, because&amp;nbsp;&lt;i&gt;MultiTermQuery &lt;/i&gt;just throws UnsupportedOperationException from its createWeight(..) method, which means that &lt;i&gt;MultiTermQuery&lt;/i&gt; shouldn't be used as a real query. And here comes another story, about the different rewrite methods. If you want, you can dig into the javadoc &lt;i&gt;for&amp;nbsp;org.apache.lucene.search.MultiTermQuery.RewriteMethod. 
&lt;/i&gt;I simply want to mention the two main rewrite methods and their general idea.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;BOOLEAN_QUERY_REWRITE&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol style="text-align: left;"&gt;&lt;li&gt;Collect terms (TermCollector) by using #getTermsEnum(...)&amp;nbsp;&lt;/li&gt;&lt;li&gt;For each term, create a TermQuery&amp;nbsp;&lt;/li&gt;&lt;li&gt;Return a BooleanQuery with all the term queries as leaves&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;FILTER_REWRITE&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;/div&gt;&lt;ol style="text-align: left;"&gt;&lt;li&gt;Get a termsEnum by using #getTermsEnum(...)&amp;nbsp;&lt;/li&gt;&lt;li&gt;Create a FixedBitSet&amp;nbsp;&lt;/li&gt;&lt;li&gt;Get a DocsEnum for each term&amp;nbsp;&lt;/li&gt;&lt;li&gt;Iterate over the docs and call bitSet.set(docid);&amp;nbsp;&lt;/li&gt;&lt;li&gt;Return a ConstantScoreQuery over the filter (bitSet)&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;That's all I wanted to tell you here.&lt;/div&gt;&lt;h4 style="text-align: left;"&gt;Precision step&lt;/h4&gt;&lt;/div&gt;&lt;div&gt;I promised to tell you more about the precision step. The precision step simply determines the size of a shift. Consequently:&lt;/div&gt;&lt;div&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;It defines how many values are indexed for each original value&amp;nbsp;&lt;/li&gt;&lt;li&gt;A smaller precision step means fewer terms to match, which optimizes query speed, but more terms to seek in the index (because of the additional terms)&lt;/li&gt;&lt;li&gt;A higher precision step means more terms to match, reducing the effect of the optimization. If the precision step is equal to the type size, nothing is optimized and no additional terms are present in the index.&lt;/li&gt;&lt;li&gt;You can index with a lower precision step value and test search speed using a multiple of the original step value. 
Ideal step can be found only by testing&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;h4&gt;Performance&lt;/h4&gt;&lt;h4&gt;&lt;div style="font-weight: normal;"&gt;According to NumericRangeQuery javadoc:&amp;nbsp;&lt;/div&gt;&lt;div style="font-weight: normal;"&gt;&lt;ul&gt;&lt;li&gt;Opteron64 machine, Java 1.5, 8 bit precision step&amp;nbsp;&lt;/li&gt;&lt;li&gt;500k docs index&amp;nbsp;&lt;/li&gt;&lt;li&gt;TermRangeQuery in BooleanRewriteMode took about 30-40 seconds&amp;nbsp;&lt;/li&gt;&lt;li&gt;TermRangeQuery in FilterRewriteMode took about 5 seconds&amp;nbsp;&lt;/li&gt;&lt;li&gt;NumericRangeQuery took &amp;lt; 100ms&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/h4&gt;&lt;h3 style="text-align: left;"&gt;Useful links&lt;/h3&gt;Here are some useful links for curious readers:&lt;br /&gt;&lt;a href="http://www.slideshare.net/VadimKirilchuk/numeric-rangequeries"&gt;http://www.slideshare.net/VadimKirilchuk/numeric-rangequeries&lt;/a&gt;&lt;br /&gt;&lt;a href="http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/NumericRangeQuery.html"&gt;http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/NumericRangeQuery.html&lt;/a&gt;&lt;br /&gt;&lt;a href="http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html"&gt;http://invertedindex.blogspot.com/2009/11/numeric-range-queries-comparison.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Thanks&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/1167114725763003994/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1167114725763003994" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1167114725763003994" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1167114725763003994" 
/><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/KRxctfG0SRc/numeric-range-queries-in-lucenesolr.html" title="Numeric Range Queries in Lucene/Solr" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-L1gvE6AfmvU/U5KGRXP9oaI/AAAAAAAAAPM/CnJvQocgP58/s72-c/trie.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-5881932733915469988</id><published>2014-04-02T17:23:00.000-07:00</published><updated>2014-04-02T17:25:07.057-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="spell correction" /><category scheme="http://www.blogger.com/atom/ns#" term="~Pavel Vasilyev" /><title type="text">Alternative approach to spell correction: character histograms</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-left: 18pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;b id="docs-internal-guid-5d54b9cd-24db-b938-e709-474e9fe06b9e" style="font-weight: normal;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h1 dir="ltr" style="line-height: 1.15; 
margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Intro&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Spell correction (a.k.a. “fuzzy matching”) is a “must-have” feature for any modern search engine. In Solr/Lucene, spell correction is implemented by the &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/4_7_0/suggest/org/apache/lucene/search/spell/SpellChecker.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;SpellChecker&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; class in the &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/4_7_0/suggest/org/apache/lucene/search/spell/package-summary.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;org.apache.lucene.search.spell.*&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: 
transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; package. The implementation details will lead you to an &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/N-gram" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;N-gram&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; based matching model, which is a pretty fast technique for most situations. Lucene 4.0 introduced a &lt;/span&gt;&lt;a href="http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;fast FuzzyQuery&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;, which is built on top of an &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Finite-state_machine" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: 
pre-wrap;"&gt;FSA&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; and offers better performance, but supports only a limited accuracy range. Here we discuss yet another approach to spell correction, which may be useful in some cases, and present our benchmarks.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Problem statement&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Assuming that we have some corpus of documents indexed, the spell correction problem can be viewed as the task of finding a number of “similar” phrases in the index for a given input phrase:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;String[] suggestSimilar(String word, int numSug, float accuracy)&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Where:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;word&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; - a 
given input phrase;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;numSug&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; - the number of suggestions to produce;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;accuracy&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; - a metric describing how “far” our suggestions can be from the original input phrase. 
Most commonly, the distance between phrases is measured with the &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Levenshtein_distance" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Levenshtein distance&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;, and our accuracy is tied to it by the following equation: &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;maxEditDist = (1-accuracy)*word.length()&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Approach&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; 
vertical-align: baseline; white-space: pre-wrap;"&gt;A brute-force solution to our problem would be to iterate through all phrases in the index, calculating the Levenshtein distance to the input phrase and collecting the numSug phrases with the smallest distance. Unfortunately, calculating Levenshtein distance is quite an expensive operation, and it is impractical to invoke it for every phrase in the dictionary. So, we need to be cleverer than that. One optimization is to build statistics for our phrases at index time which allow us to quickly select only those phrases which can potentially have a sufficiently small Levenshtein distance to the input phrase. This is exactly what happens in the N-gram model, and the alternative algorithm we discuss here follows the same idea.&lt;/span&gt;&lt;/div&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Character Histograms and Spell Correction&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;This curious approach was inspired by Leo Polovets in this &lt;/span&gt;&lt;a href="http://qr.ae/nTTi7" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Quora conversation&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; 
font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;. Our goal, as stated above, is to filter out those phrases in our dictionary which are guaranteed not to pass the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;accuracy&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; restriction. We should find a filter which leaves only ~1% of the phrases in the dictionary as candidates for which we actually have to calculate the &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Levenshtein_distance" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Levenshtein distance&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;. 
Let’s buckle up and dig into the details.&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;What are character histograms&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;A character histogram is a simple way of summarizing the character distribution in a given word or phrase.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;For instance, the character histogram for “burberry watches” can be viewed as:&lt;/span&gt;&lt;img alt="chart_2.png" src="https://lh5.googleusercontent.com/lPl_UdzqMWF3rRxwgZHmPo6kRYw9guy1sEVgzUFba9tg53KuHb4qONwFpy8rwVpUB_S3lYXX1Wj7nXqH-EvzIORJcyPWc29WMVpIvLqnGTpb5lQj6u7exOgw0utGtA" style="-webkit-transform: rotate(0rad); border: none;" /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 
'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;How can we use histograms to compare two phrases&lt;/span&gt;&lt;/h2&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;It is easy to see that there is no direct connection between Levenshtein distance and character histograms. For example, the two phrases “step” and “pets” have the same character histogram, because histograms don’t account for character ordering. 
However, we can still get important information about the Levenshtein distance between two phrases from their character distributions.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;img alt="chart_2-3.png" src="https://lh3.googleusercontent.com/73YXtg6n-Xe1dbHCp3iJIckE96kkc986O1RM6dyp5Uqo6TYTjsupMYPqU6splT8txb_SLdcS9pdAFO1g8gjapSc0i5ZK74xuzJWTxiQSPRTDDOv60vCvGs6BmLPWyQ" style="-webkit-transform: rotate(0rad); border: none;" /&gt;&lt;img alt="chart_2-2.png" src="https://lh6.googleusercontent.com/kN1IB-Vk9LSgzlTTsqYxi7ikKJs6123xuJtSnrZNww-DsR3pMt1hmYhg6LBX9PltcnVhPxbNv01fCoKaz4lO0Cu3ZkXEMr-UDQOEtacoStUlUtwZQGsG5LQMe6sDIQ" style="-webkit-transform: rotate(0rad); border: none;" /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: 
normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The distance between our histograms can be calculated as the &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Taxicab_geometry" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Manhattan distance&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;. The key insight of the algorithm is that a&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; lower bound&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; on the Levenshtein distance can be derived from the Manhattan distance between our histograms: &lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; 
font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;when a character is inserted or deleted, the histogram distance between the old and the new phrase is exactly 1&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;when a character is substituted, the histogram distance is either 0 (if the new character falls in the same bucket) or 2 (if it falls in a different bucket). &lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;To account for this uncertainty, we can safely say that, for Levenshtein distance LD and Manhattan distance MD, &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;LD &amp;gt;= MD/2&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Well, a lower bound on the Levenshtein 
distance equal to half of the Manhattan distance is good, but we should be able to push it up a little bit! Indeed, it is obvious that the Levenshtein distance is at least the difference between the phrase lengths (string distance, SD). We can try to combine both lower bounds into a single lower bound. We only have to remember that insertions and deletions are counted by both metrics, so we have to divide by two to get a valid lower bound:&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;LD &amp;gt;= (MD+SD)/2&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Let’s consider our example: the Manhattan distance between our histograms is 3. To transform “baboon” into “balloon”, we need to delete one “b” and add two “l” in our histograms. 
It may look like the Levenshtein distance is greater than or equal to 3, and since histograms ignore character order, the real distance could be even greater than 3. However, in our example there is a possible 2-edit (not 3-edit) sequence: “baboon” (substitute: “b” -&amp;gt; “l”) -&amp;gt; “baloon” (insert: “l”) -&amp;gt; “balloon”. Thus, a “delete” + “insert” pair of operations can be replaced by a single “substitute” operation. Taking into account the string length difference SD=|6-7|=1, we get the true lower bound (3+1)/2=2.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;So, if our spell correction accuracy requires a Levenshtein distance of at most 1, we shouldn’t even bother to calculate the Levenshtein distance between “baboon” and “balloon”: it is guaranteed to be at least 2.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;So the main conclusions here are:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: 
normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;we can obtain a lower bound for the Levenshtein distance between the input phrase and a suggestion candidate from the dictionary using character histograms;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;we can filter out a suggestion candidate if its lower bound for LD doesn’t satisfy the &lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;accuracy&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; restriction.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; 
font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;All we need now is a ridiculously fast way to calculate and compare character histograms.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 17px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;How can we encode histograms&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;For encoding, let’s assume that each character is already represented as an ASCII code.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;We want to map a character histogram into 64 bits (using Java notation) for obvious performance reasons:&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;long encode(String word)&lt;/span&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Internals:&lt;/span&gt;&lt;/div&gt;&lt;ol style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;64 bits are divided into 16 buckets;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; 
font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Each bucket is a number from 0 to 15 (4 bits);&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;For each character from the histogram:&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;ol style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;take its ASCII code modulo 16 - this will be the bucket number&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; 
font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;add the character frequency to that bucket;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Conditions:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;It can happen that one bucket holds frequencies for more than one character, i.e. 
bucket collisions;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;It can happen that a character frequency exceeds 15. In this case, since a bucket stores a number modulo 16, we lose information.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;For the sake of performance, we accept those risks (in production, ensure that your phrases are short enough). 
This situation is similar to that of a &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Bloom_filter" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;Bloom filter&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;, where we accept the possibility of false positive matches.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Example: “banana”&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="margin-left: 0pt;"&gt;&lt;table style="border-collapse: collapse; border: none;"&gt;&lt;colgroup&gt;&lt;col width="72"&gt;&lt;/col&gt;&lt;col width="74"&gt;&lt;/col&gt;&lt;col 
width="115"&gt;&lt;/col&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;char&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;ASCII&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;bucket number = ASCII % 16&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;b&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; 
padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;98&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;2&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;a&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;97&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;1&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;n&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;110&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;14&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; 
font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;a&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;97&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;1&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;n&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;110&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td 
style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;14&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 0px;"&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;a&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;97&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;td style="border: 1px solid #000000; padding: 7px 7px 7px 7px; vertical-align: top;"&gt;&lt;div dir="ltr" style="line-height: 1; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;1&lt;/span&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; 
font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Thus, “banana” will be encoded into the following 64-bit sequence:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;0 &amp;nbsp;&amp;nbsp;1 &amp;nbsp;&amp;nbsp;2 &amp;nbsp;&amp;nbsp;3 &amp;nbsp;&amp;nbsp;4 &amp;nbsp;&amp;nbsp;5 &amp;nbsp;&amp;nbsp;6 &amp;nbsp;&amp;nbsp;7 &amp;nbsp;&amp;nbsp;8 &amp;nbsp;&amp;nbsp;9 &amp;nbsp;&amp;nbsp;10 &amp;nbsp;11 &amp;nbsp;12 &amp;nbsp;13 &amp;nbsp;14 &amp;nbsp;15&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;0000&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;00110001&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: 
normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;00000000000000000000000000000000000000000000&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;0010&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;0000&lt;/span&gt;&lt;/div&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Implementation Details&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;At index time we can pre-calculate all histograms for the whole dictionary. 
Basically, for the list of dictionary phrases we construct a list of histograms (&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;long[]&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; in Java notation) of the same size.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;At query time we calculate the histogram of the input phrase and iterate through this array, calculating the histogram difference and obtaining a lower bound for the Levenshtein distance. We collect only those phrases that have an acceptable lower bound for the LD. 
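&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The encoding and the query-time filter described above can be sketched roughly as follows. This is only an illustration and not the code from the repository: the names HistogramSketch, encode, unpack, lowerBound and candidates are hypothetical, bucket 0 is assumed to occupy the most significant 4 bits as in the “banana” example, and buckets are unpacked one by one for readability rather than speed. The bound can be trusted only while no bucket count wraps past 15.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;
```java
import java.util.stream.IntStream;

// Illustrative sketch only; all names here are hypothetical.
final class HistogramSketch {

    // Packs a 16-bucket character histogram into one long.
    // Assumption: bucket 0 occupies the most significant 4 bits.
    static long encode(String word) {
        int[] buckets = new int[16];
        for (char c : word.toCharArray()) {
            int bucket = c % 16;
            // Each bucket keeps its count modulo 16, so long inputs can wrap.
            buckets[bucket] = (buckets[bucket] + 1) % 16;
        }
        long packed = 0L;
        for (int count : buckets) {
            packed = packed * 16 + count;
        }
        return packed;
    }

    // Splits a packed histogram back into its 16 bucket counts.
    static int[] unpack(long packed) {
        int[] buckets = new int[16];
        for (int i = 15; i >= 0; i--) {
            buckets[i] = (int) Long.remainderUnsigned(packed, 16);
            packed = Long.divideUnsigned(packed, 16);
        }
        return buckets;
    }

    // Lower bound for the Levenshtein distance: a single edit raises at most
    // one bucket by 1 and lowers at most one bucket by 1.
    static int lowerBound(long a, long b) {
        int[] x = unpack(a);
        int[] y = unpack(b);
        int surplus = 0; // counts present in a but missing from b
        int deficit = 0; // counts present in b but missing from a
        for (int i = 15; i >= 0; i--) {
            int diff = x[i] - y[i];
            surplus += Math.max(diff, 0);
            deficit += Math.max(-diff, 0);
        }
        return Math.max(surplus, deficit);
    }

    // Returns indexes of dictionary histograms whose lower bound is within maxEdits.
    static int[] candidates(long query, long[] histograms, int maxEdits) {
        return IntStream.range(0, histograms.length)
                .filter(i -> maxEdits >= lowerBound(query, histograms[i]))
                .toArray();
    }

    public static void main(String[] args) {
        String[] dictionary = {"banana", "bandana", "kiwi"};
        // Index time: one histogram per dictionary phrase.
        long[] histograms = new long[dictionary.length];
        for (int i = dictionary.length - 1; i >= 0; i--) {
            histograms[i] = encode(dictionary[i]);
        }
        // Query time: print the phrases that survive the histogram filter.
        for (int i : candidates(encode("bananna"), histograms, 1)) {
            System.out.println(dictionary[i]);
        }
    }
}
```
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Note that bucket collisions can only shrink the computed difference, so the filter may let through phrases that the exact distance check later rejects.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;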
After that, we calculate the actual Levenshtein distance only for these candidates.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;We have a (naive and unoptimized) implementation using Java and Lucene 4.5.0. All sources can be found on my GitHub page: &lt;/span&gt;&lt;a href="https://github.com/pvasilyev/pet-project-spellcheck" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;https://github.com/pvasilyev/pet-project-spellcheck&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Benchmark Results&lt;/span&gt;&lt;/h1&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; 
margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Benchmark results were obtained using the &lt;/span&gt;&lt;a href="http://openjdk.java.net/projects/code-tools/jmh/" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;http://openjdk.java.net/projects/code-tools/jmh/&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; micro-benchmark harness.&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;The benchmarks used:&lt;/span&gt;&lt;/div&gt;&lt;ul style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: 
baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;MacBook Air, Mac OS 10.9;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;8 GB, 1600 MHz, DDR-3 RAM;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;2 GHz Intel Core i7, Ivy-Bridge;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: disc; text-decoration: none; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;JMH tests run on 1 thread for both the Lucene spell check engine (&lt;/span&gt;&lt;a href="http://lucene.apache.org/core/4_7_0/suggest/org/apache/lucene/search/spell/SpellChecker.html" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;SpellChecker&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;) and the new spell check engine (current approach).&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Input dataset&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;: a dictionary with some e-commerce data 
(brand names, categories, etc.) ranging from 1K up to 200K entries.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Input queries&lt;/span&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;: a pre-generated list of queries with various typos (from 1 to 6, depending on the query phrase length)&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Results are shown in the graph:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;img alt="chart_1.png" 
src="https://lh3.googleusercontent.com/LJS87L0L_DPolXxT0g_l8H185c6q_n-e72tVuKOq6h5OF6aQvo0mLmgtdoiyLQGOmXSQVGfdC04Q9glXxRj_gCMImu60W0-nf81Niy5T4mDJ96jDveVpkVC9ZJVWMg" style="-webkit-transform: rotate(0rad); border: none;" /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h1 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: 'Trebuchet MS'; font-size: 21px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Conclusions and further directions&lt;/span&gt;&lt;/h1&gt;&lt;b style="font-weight: normal;"&gt;&lt;br /&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;As you can see, even a naive implementation of this approach provides a significant performance boost over Lucene’s for small dictionaries. 
However, this implementation scales poorly and starts to lose to Lucene at a dictionary size of about 25K entries. The current implementation can be improved both by pure code optimization and by a better histogram filtering algorithm. The latter can be achieved by building additional data structures at index time to speed up the search for “neighbour” histograms. Advanced techniques such as &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/BK-tree" style="text-decoration: none;"&gt;&lt;span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;"&gt;BK-trees&lt;/span&gt;&lt;/a&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt; may also prove handy. 
&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"&gt;Stay tuned!&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/5881932733915469988/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=5881932733915469988" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5881932733915469988" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/5881932733915469988" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/UrzW4H7pp94/alternative-approach-to-spell.html" title="Alternative approach to spell correction: character histograms" /><author><name>Anonymous</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/blank.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2014/04/alternative-approach-to-spell.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-8059596362849395628</id><published>2014-01-03T10:32:00.000-08:00</published><updated>2016-12-16T14:27:05.126-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category 
scheme="http://www.blogger.com/atom/ns#" term="filters" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="nrt" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Segmented Filter Cache in Solr</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;h2 dir="ltr" id="docs-internal-guid-12724b00-5745-bc9f-bcca-5ebb848afd7d" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: trebuchet ms;"&gt;&lt;span style="font-size: 17px;"&gt;To view an updated version of this post click &lt;a href="http://blog.griddynamics.com/segmented-filter-cache-and-block-join-query-parser-in-solr"&gt;here&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;&lt;h2 dir="ltr" id="docs-internal-guid-12724b00-5745-bc9f-bcca-5ebb848afd7d" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Intro&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s continue exploring the block-join query parser in Solr. Today we’ll look at something of an accidental finding. 
Which query do you think this parser yields if you omit the query string, e.g. &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;q={!parent which='type_s:parent'}&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;? It might not seem obvious, but it yields the same parent filter (&lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;type_s:parent&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;) from the &lt;/span&gt;&lt;a href="https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/solrconfig.xml#L533" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;perSegFilter&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; cache. The initial intention for this code branch was to expose the parents bitset to users who want to reuse it as a Solr filter query. It turns out that it can also solve a filter cache regeneration issue and make Solr more NRT-friendly (Near Real Time).&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s start with the &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/SolrCaching#filterCache" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;caching basics&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
If you specify &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq=SIZE:XL&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; in the &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/CommonQueryParameters#fq" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;request params&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, Solr creates an on-heap bitset spanning all segments and uses it as a very efficient filter. However, when you perform a commit (whether hard or soft), this bitset is discarded and you face a slowdown, either at commit time, when filter bitsets are regenerated, or at query time, when unlucky ‘cold’ requests have to regenerate those bitsets. Such pauses make Solr not really NRT-friendly. 
If you are dealing with such commit pauses and/or have to commit frequently, read on; otherwise, you can treat this as a rather atypical Solr use case.&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;NRT-filters&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;To get rid of these pauses, try rewriting &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq=SIZE:XL&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; to &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq={!parent which='SIZE:XL'}.&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; Also, make sure that the perSegFilter cache is properly sized and has NoOpRegenerator specified. Now filters should slow down neither searches after a commit nor the commits themselves. 
If you would like further explanation, please comment below and I’ll elaborate.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;To check that it works as expected, look at the cache entries by enabling &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/SolrCaching#showItems" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;cache introspection&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. This is what you should see in the perSegFilter dropdown in SolrAdmin:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;item_SIZE:XL: FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:XL))&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Make sure that there are no hits in &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/SolrCaching#filterCache" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;filterCache&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; while you experiment with those filters.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 
15px; vertical-align: baseline;"&gt;It’s worth mentioning that intersecting such filters (when you specify several fqs) is not as efficient as intersecting plain Solr fq-s, which use bitwise AND on eight-byte words. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;One more drawback of this hack is that it uses memory-wasteful &lt;/span&gt;&lt;a href="https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/FixedBitSetCachingWrapperFilter.java?source=c" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;plain bitsets&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; (like Solr’s fq-s), rather than &lt;/span&gt;&lt;a href="https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/util/WAH8DocIdSet.java?source=c" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;a more compact one&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;OR Filters&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;One of the questions that regularly comes up on the mailing list is about the &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-1223" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;disjunction of cached filters&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, i.e. if &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq=SIZE:L&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; and &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq=SIZE:M&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; are cached as two separate cache entries, can’t we reuse these bitsets in the disjunction filter &lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq=SIZE:L OR SIZE:M &lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;and avoid caching it separately? Yes, we can! 
&lt;/span&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;fq={!cache=false}{!parent which='SIZE:L'} OR {!parent which='SIZE:M'}.&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;In addition to the cache introspection mentioned in the previous section, you can check that you are doing it right by putting this string into the q= param and requesting debugQuery=true; in this case you should see something like this:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;str name="parsedquery"&gt;{!cache=false}{!cache=false}ConstantScore(FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:L))) {!cache=false}ConstantScore(FixedBitSetCachingWrapperFilter(QueryWrapperFilter(SIZE:M)))&lt;/str&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;courier new&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Here you can see a non-cached disjunction of two filters cached in perSegFilter. The last two notes from the previous section (about inefficient combining and storing) apply here as well. 
&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Filters 2.0&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Note that all this dance around filters is about using the heap to cache postings lists. Given that a postings list file is usually mmapped, following &lt;/span&gt;&lt;a href="http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;the great advice&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, how much sense does that make? The reason for caching is the on-disk postings format, which is CPU-intensive to decode on read. This format also stores data needed only for scoring, such as &lt;/span&gt;&lt;a href="http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html#formula_tf" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;tf&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, which is not needed for filtering; in addition, Solr’s filters use bitwise operations for intersection, which usually brings some gain. Thus, we can think of a specialized bitset codec as the future of filters. 
There is &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/LUCENE-5052" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;a modest prototype&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; of this approach.&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;NRT-Facets&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;What else still makes Solr not NRT-friendly? Right, &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/SolrCaching#fieldValueCache" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;UnInvertedFields&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;! What can you do about them? If you count facets on single-valued fields, you can use Lucene’s FieldCache via &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/SimpleFacetParameters#facet.method" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;facet.method=fcs&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
If you deal with multivalued fields, you can specify &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/DocValues#Solr.27s_DocValues_types" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;docValues&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; for them, which triggers an alternative faceting engine. DocValues facets use an on-heap data structure (OrdinalMap), which leads to pauses similar to those caused by UnInvertedField; however, they should be much shorter. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;The most NRT-friendly faceting, with a sidecar taxonomy index, is implemented in the &lt;/span&gt;&lt;a href="http://www.lucenerevolution.org/2013/Faceted-Search-with-Lucene" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;Lucene Facets Framework&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, but it’s still far from Solr. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;A final note: NRT doesn’t mean better throughput in general; it just aims for more predictable latency (which doesn’t mean lower average latency).&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;That’s it for today, folks.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Trust no one, watch the profiler! 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/8059596362849395628/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8059596362849395628" title="13 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8059596362849395628" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8059596362849395628" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/KkVO-7ZbFCQ/segmented-filter-cache-in-solr.html" title="Segmented Filter Cache in Solr" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>13</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-4339709255876520708</id><published>2013-12-14T21:26:00.000-08:00</published><updated>2016-12-16T14:25:12.438-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category 
scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Grandchildren and Siblings with Block Join</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div dir="ltr" id="docs-internal-guid-2ac59767-f1e3-9dde-0496-e299b07a52e2" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;To view an updated version of this post click &lt;a href="http://blog.griddynamics.com/searching-grandchildren-and-siblings-with-solr-block-join"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;&lt;a href="http://blog.griddynamics.com/2013/09/solr-block-join-support.html" style="text-decoration: none;"&gt;The last post&lt;/a&gt;&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; about the block-join query parser in Solr got many comments and questions, so I decided to consider a few more complicated cases. &lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s take Solr 4.5 or above and index the &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/7711492#file-t-shirts-xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;data&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
Now the data structure is a bit more complex, so it is worth a diagram:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;img height="306px;" src="https://lh6.googleusercontent.com/t8ah9JgqHF1NS6HahvSmpHwjKtuJ3PqUBYk-E6ql4qaEJ15kRjGhmFdux7Sp5ZUcMFqk2Wb45Ty8idDhUiL_sa1DclFzFb0LkjpoDK6r_VZJ9v-crU4JsXEI4w" width="430px;" /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;As you can see, this hierarchical structure is similar to entity-relationship models from the RDBMS world. We call those nested entities “scopes”.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Once the data is indexed, use &lt;/span&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=*%3A*&amp;amp;wt=csv&amp;amp;rows=100" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q=*:*&amp;amp;wt=csv&amp;amp;rows=100&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; to see how documents are aligned in &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/7711492#file-q-3a-wt-csv-rows-100-txt" style="text-decoration: none;"&gt;&lt;span style="color: 
#1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;blocks&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Grandchildren search&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s consider a search for a t-shirt product (parent) that has a particular SKU (child) with sufficient inventory available in a particular storage. We model a storage as a child of the SKU scope and a grandchild of the Product scope.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Here is a search for t-shirts that have sufficient inventory (at least 10) of Blue XL SKUs in CA: &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!parent%20which=type_s:product%7D%2BCOLOR_s:Blue+%2BSIZE_s:XL+%2B%7B!parent%20which=type_s:sku%20v=%27%2BQTY_i:[10%20TO%20*]+%2BSTATE_s:CA%27%7D&amp;amp;rows=100&amp;amp;wt=xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q={!parent which=type_s:product}+COLOR_s:Blue +SIZE_s:XL +{!parent 
which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Here comes &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2013/09/solr-block-join-support.html?showComment=1385058045178" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;the trick mentioned&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; by &lt;/span&gt;&lt;a href="http://google.com/+DavidSmiley" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;David Smiley&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;: when the child query contains a space, you need to wrap it in a {!... v=’..’} &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/LocalParams#Parameter_value" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;local parameter&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, or extract it into a separate request parameter and refer to it via {!... 
v=$ref}...&amp;amp;ref=...&amp;amp;.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;You can see that cross-matches are excluded: this query returns products 20 and 30. If you remove either the QTY_i filter or the COLOR_s filter, product 10 appears in the results.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Needless to say, the possible nesting depth is unlimited. One more interesting observation about block join is that it provides a blazingly fast transitive closure over the parent-child relationship: you can search for grandchildren and deeper descendants directly, omitting queries for intermediate scopes. 
&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Sibling Scopes&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;The Vendor and SKU scopes share the same parent, Product, and are not nested in each other.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s search for t-shirts which are made by Vendor Bob and cost between $20 and $25. Here a local parameter reference is necessary:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%2B%7B!parent+which%3Dtype_s%3Aproduct%20v=$skuq%7D+%2B%7B!parent+which%3Dtype_s%3Aproduct%20v=$vendorq%7D&amp;amp;skuq=%2BCOLOR_s:Blue+%2BSIZE_s:XL+%2B%7B!parent+which%3Dtype_s%3Asku%20v=%27%2BQTY_i%3A[10+TO+*]+%2BSTATE_s%3ACA%27%7D&amp;amp;vendorq=%2BNAME_s:Bob+%2BPRICE_i:[20%20TO%2025]&amp;amp;rows=100&amp;amp;wt=xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q=+{!parent which=type_s:product v=$skuq} +{!parent which=type_s:product v=$vendorq}&amp;amp;skuq=+COLOR_s:Blue +SIZE_s:XL +{!parent which=type_s:sku v='+QTY_i:[10 TO *] +STATE_s:CA'}&amp;amp;vendorq=+NAME_s:Bob +PRICE_i:[20 TO 25]&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; 
font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;As you can see, it returns only product 20, and if you relax query eg. choose Alice or accept more expensive t-shirts, product 30 appears. Works like relational calculus!&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Minor usability issue&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;It was raised in the mailing list &lt;/span&gt;&lt;a href="http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201311.mbox/%3CF415CE3A-EBE5-4D15-ADF1-C5EAD32A1EB2@sheffield.ac.uk%3E" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;thread&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. It seems like to-child block join queries are not user friendly enough, especially for those queries which violate orthogonality between parent filter and child query. We follow up &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5553" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;SOLR-5553&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;P.S.&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Stay tuned! I’ll post about one &lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: line-through; vertical-align: baseline;"&gt;easter egg&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; unexpected breakthrough in Solr soon.&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/4339709255876520708/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=4339709255876520708" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4339709255876520708" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/4339709255876520708" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/6-oPfo2Jg20/grandchildren-and-siblings-with-block.html" title="Grandchildren and Siblings with Block Join" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" 
/></author><thr:total>4</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-6172546087770333683</id><published>2013-09-05T22:23:00.000-07:00</published><updated>2016-12-16T14:30:18.735-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="join" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Solr block-join support</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;h2 dir="ltr" id="docs-internal-guid-7bfec1c8-efeb-7fd8-56ff-aa54470d1311" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: trebuchet ms;"&gt;&lt;span style="font-size: 17px;"&gt;To view an updated version of this post click &lt;a href="http://blog.griddynamics.com/how-to-use-block-join-to-improve-search-efficiency-with-nested-documents-in-solr"&gt;here&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/h2&gt;&lt;h2 dir="ltr" id="docs-internal-guid-7bfec1c8-efeb-7fd8-56ff-aa54470d1311" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Introduction&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;As you may already know &lt;/span&gt;&lt;a href="http://issues.apache.org/jira/browse/SOLR-3076" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; 
font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;block join support&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; has been committed into Solr and will be available starting from 4.5. Here Solr catches up with &lt;/span&gt;&lt;a href="http://www.elasticsearch.org/guide/reference/mapping/nested-type/" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;ElasticSearch&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. Let me omit explanation and refer to these two &lt;/span&gt;&lt;a href="http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;great&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;a href="http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;articles&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, and also mention my modest &lt;/span&gt;&lt;a href="http://blog.griddynamics.com/2012/08/block-join-query-performs.html" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: 
baseline;"&gt;benchmark&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;I performed some experiments with 4.5-SNAPSHOT obtained from &lt;/span&gt;&lt;a href="http://wiki.apache.org/solr/NightlyBuilds" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;NightlyBuilds&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; and want to share my experience.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;One preparation step is necessary (however it might be already there): make sure you have &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/6406734#file-schema-xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;_root_ field&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; in schema.xml and &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/6406734#file-solrconfig-xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;perSegFilter&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; in 
solrconfig.xml.&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Indexing&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;SolrInputDocument has new methods getChildDocuments()/addChildDocument() for nesting child documents into a parent document. The XML and Javabin formats can now transfer them. &lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5183" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;JSON support&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; is in progress. 
&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Start by indexing a &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/6406734#file-t-shirts-xml" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;few t-shirts&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, a sample product-SKU hierarchy, using &lt;/span&gt;&lt;a href="http://lucene.apache.org/solr/4_4_0/tutorial.html" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;post.jar&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;script src="https://gist.github.com/mkhludnev/6406734.js?file=t-shirts.xml"&gt;&lt;/script&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;To check how blocks are laid out, run a &lt;/span&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=*%3A*&amp;amp;wt=csv" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;match-all query&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; with CSV output. 
You can see that the parent document is placed &lt;/span&gt;&lt;a href="https://gist.github.com/mkhludnev/6406734#file-q-3a-wt-csv" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;right after its children&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Be aware of the implicit _root_ field, which works as a block identifier: all child documents get their _root_ value from the parent’s uniqueKey field. It is used for overwriting the whole block on update. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Searching&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Let’s assume we have a query matching our Red-XL child documents (SKUs aka UPCs):&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL&amp;amp;wt=xml&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q=+COLOR_s:Red +SIZE_s:XL&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;. 
It returns children with ids 11 and 31. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Now let’s join from children to parent by calling the special “parent” query parser:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%0A&amp;amp;wt=json&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;, which returns parents 10 and 30, as expected. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;The local parameter ‘which’ provides a filter that distinguishes parent documents from child documents. 
Keep in mind two important things about it:&lt;/span&gt;&lt;/div&gt;&lt;ol style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 15px; list-style-type: decimal; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="vertical-align: baseline;"&gt;it should not match any child documents;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li dir="ltr" style="font-family: Arial; font-size: 15px; list-style-type: decimal; vertical-align: baseline;"&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="vertical-align: baseline;"&gt;it should always match all parent documents.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Note that block join avoids the cross-match problem: it doesn’t capture parent 20, which is a candidate for a false positive match, as it has Red SKUs and XL SKUs but no SKU which is _both_ Red and XL.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;This {!parent} query can be combined with any other query and filter. 
For example, we can constrain the results by brand:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%2BBRAND_s%3ANike+%2B_query_%3A%22%7B!parent+which%3Dtype_s%3Aparent%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%22&amp;amp;wt=json&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q=+BRAND_s:Nike +_query_:"{!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL"&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;A similar constraint can be expressed as a filter query:&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3Dtype_s%3Aparent%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL&amp;amp;fq=BRAND_s%3APuma&amp;amp;wt=xml&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q={!parent which=type_s:parent}+COLOR_s:Red +SIZE_s:XL&amp;amp;fq=BRAND_s:Puma&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Don’t try to constrain children with filter queries; it doesn’t work, because filter queries 
explicitly constrain the result of the {!parent} query, that is, parent documents. &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;There is also a “reverse” query parser for searching child documents by a parent filter.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!child+of%3Dtype_s%3Aparent%7DBRAND_s%3APuma&amp;amp;wt=xml&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;{!child of=type_s:parent}BRAND_s:Puma&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; returns the SKUs that belong to the single Puma product.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Note that even though the local parameter name has changed, it has the same meaning: it supplies a parent filter.&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;If you are not familiar with nested queries and local parameters, check &lt;/span&gt;&lt;a href="http://searchhub.org/2009/03/31/nested-queries-in-solr/" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;the short intro&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: 
baseline;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;The last but not least - it works for distributed search too!&lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Caveat&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;You always need to be quite accurate with updating blocks. They always need to be updated as whole. Let me show you one unlucky example. Let’s remove parent and left children in the index. &lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;a href="about:blank" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;&amp;lt;update&amp;gt;&amp;lt;delete&amp;gt;&amp;lt;query&amp;gt;id:10&amp;lt;/query&amp;gt;&amp;lt;/delete&amp;gt;&amp;lt;commit/&amp;gt;&amp;lt;/update&amp;gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;At first, It seems like everything still works. 
Children 11 and 12 are left in the index, but ToParentBlockJoinQuery somehow detects it and &lt;/span&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%0A&amp;amp;wt=json&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; &amp;nbsp;correctly returns parent 30. However, after &lt;/span&gt;&lt;a href="http://localhost:8983/solr/update?stream.body=%3Coptimize/%3E" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;&amp;lt;optimize/&amp;gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; is executed, the deleted parent document is purged from the index and all of a sudden children 11 and 12 are treated as if they belonged to parent 20! The same query &lt;/span&gt;&lt;a href="http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3D%27type_s%3Aparent%27%7D%2BCOLOR_s%3ARed+%2BSIZE_s%3AXL%0A&amp;amp;wt=json&amp;amp;indent=true" style="text-decoration: none;"&gt;&lt;span style="color: #1155cc; font-family: &amp;quot;arial&amp;quot;; font-size: 15px; text-decoration: underline; vertical-align: baseline;"&gt;q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt; now returns 20 and 30, which is wrong! I’m afraid there are a few other similar cases of wrong behavior. 
As a reliable workaround, I suggest sending explicit delete-by-query requests on the implicit _root_ field. I hope this caveat will be fixed in the future. &lt;/span&gt;&lt;/div&gt;&lt;h2 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 10pt;"&gt;&lt;span style="font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 17px; vertical-align: baseline;"&gt;Further Directions&lt;/span&gt;&lt;/h2&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Here are a few further desirable features, in no particular order.&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="color: #666666; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 16px; vertical-align: baseline;"&gt;Faceting&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;div style="text-align: left;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 1.15; vertical-align: baseline;"&gt;A facet component for block indexes is quite useful in eCommerce. The trickiest thing is to count SKU field values and aggregate them into product counts, as described in the earlier &lt;a href="http://blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html"&gt;posts&lt;/a&gt;. 
&lt;b&gt;Upd.&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot; , &amp;quot;helvetica&amp;quot; , sans-serif;"&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5743"&gt;https://issues.apache.org/jira/browse/SOLR-5743&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;;"&gt;&lt;span style="font-size: 15px; line-height: 17.25px;"&gt;has the patch.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="color: #666666; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 16px; vertical-align: baseline;"&gt;FilterCache&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Currently there is no way to use filter queries (fq=) or a combination of them in a child query, so it needs to read the index every time, instead of intersecting bitsets from the heap. &lt;b&gt;Upd&lt;/b&gt;. However, you can use the suggestion from the &lt;a href="http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html"&gt;next post&lt;/a&gt;.&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="color: #666666; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 16px; vertical-align: baseline;"&gt;Schema&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Currently an application has to be aware of the relations between documents while it indexes and searches. However, it might be much more convenient if the search engine provided a “flat” navigation model to a front-end. 
The front-end would just refine search results by color, and the search engine would figure out by itself which documents to filter and which ones to join.&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="color: #666666; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 16px; vertical-align: baseline;"&gt;Scoring Mode&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;ToParentBlockJoinQuery supports several score modes, but the {!parent} parser has the None mode hardcoded. Contributions are welcome! &lt;b&gt;Upd:&lt;/b&gt; patch &lt;a href="https://issues.apache.org/jira/browse/SOLR-5882"&gt;SOLR-5882&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;h3 dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 8pt;"&gt;&lt;span style="color: #666666; font-family: &amp;quot;trebuchet ms&amp;quot;; font-size: 16px; vertical-align: baseline;"&gt;Group Collecting&lt;/span&gt;&lt;/h3&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;There is nothing similar to collecting groups in FieldCollapsing. &lt;b&gt;Upd&lt;/b&gt;. &lt;a href="https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents"&gt;[child] doctransformer&lt;/a&gt; is delivered! 
Here is the r&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;elated&amp;nbsp;&lt;/span&gt;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5285" style="font-family: Arial; font-size: 15px; line-height: 17.25px;"&gt;ticket&lt;/a&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;&lt;b&gt;UPD: &lt;/b&gt;DataImportHandler will be able to index child docs in 5.x; see&amp;nbsp;&lt;a href="https://issues.apache.org/jira/browse/SOLR-5147"&gt;SOLR-5147&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;b style="font-family: Arial; font-size: 15px; line-height: 17.25px;"&gt;UPD: &lt;/b&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;SolrJ support for nesting child documents is provided via&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; line-height: 17.25px;"&gt;&lt;a href="http://lucene.apache.org/solr/4_9_1/solr-solrj/org/apache/solr/common/SolrInputDocument.html#getChildDocuments%28%29"&gt;SolrInputDocument.getChildDocuments()&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;&lt;/span&gt; &lt;br /&gt;&lt;div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;"&gt;&lt;span style="font-family: &amp;quot;arial&amp;quot;; font-size: 15px; vertical-align: baseline;"&gt;Have a good join!&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/6172546087770333683/comments/default" title="Post Comments" /><link 
rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6172546087770333683" title="49 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6172546087770333683" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6172546087770333683" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Wh0QuwNa1gk/solr-block-join-support.html" title="Solr block-join support" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>49</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2013/09/solr-block-join-support.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-981907660332172374</id><published>2012-08-23T10:10:00.003-07:00</published><updated>2016-02-27T17:04:58.313-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="block-join" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Block Join Query Performs</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div id="contents" style="margin: 6px;"&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;As join support is highly demanded feature, especially in eCommerce, I repeated&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://searchhub.org/dev/2012/06/20/solr-and-joins/" 
style="text-decoration: inherit;"&gt;Erick’s benchmark&lt;/a&gt;&lt;/span&gt;&amp;nbsp;with proposed&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="https://issues.apache.org/jira/browse/SOLR-3076" style="text-decoration: inherit;"&gt;block join support&lt;/a&gt;&lt;/span&gt;&amp;nbsp;for Solr, and want to share my observations.&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.ww57scppxyki"&gt;&lt;/a&gt;Definitions&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;For this post and in future let’s distinguish&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://wiki.apache.org/solr/Join" style="text-decoration: inherit;"&gt;Join&lt;/a&gt;&lt;/span&gt;&amp;nbsp;and&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://www.google.com/url?q=http%3A%2F%2Fblog.mikemccandless.com%2F2012%2F01%2Fsearching-relational-content-with.html&amp;amp;sa=D&amp;amp;sntz=1&amp;amp;usg=AFQjCNEkgMbbodHWD1eHKVuagoaSwSHvBg" style="text-decoration: inherit;"&gt;Block Join&lt;/a&gt;&lt;/span&gt;. Most technical details are covered in&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a href="http://www.lucenerevolution.org/2012/sessions-day-1#Martijn-van-Groningen"&gt;Martijn talk&lt;/a&gt;&lt;/span&gt;. 
&amp;nbsp;&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.youtube.com/watch?v=-OiIlIijWH0" name="h.nw5z2rjcb7gc"&gt;&lt;/a&gt;Data&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I have a single-segment 55 GB index with 27 M docs - about a million parent documents, each having five children. I ran the worst case from Erick’s benchmark, where the join field has many unique values. It’s a worst case for Join, but not for BlockJoin.&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.7fg8uld72t81"&gt;&lt;/a&gt;Tools and Procedure&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I used &lt;a href="http://code.google.com/p/solrmeter/"&gt;Solr Meter&lt;/a&gt; with a slightly modified RandomExecutor, which tries to keep a specified rate of queries per time period. I prefer this constant-throughput model to the virtual-user model. Solr Meter lets you gently ramp up the load and empirically find the saturation point. 
It provides several useful statistics and charts.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I also attached&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man8/iostat.8.html" style="text-decoration: inherit;"&gt;iostat&lt;/a&gt;&lt;/span&gt;&amp;nbsp;traces to show the system load during the tests.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I have a 2.4 GHz Core i5 laptop with 8G RAM and a good old 5400 rpm HDD onboard.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;The Query Result Cache and Filter Cache were disabled; the Document Cache was enabled and shows a hit ratio of about 0.5.&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://wiki.apache.org/solr/SolrCaching#Types_of_Caches_and_Example_Configuration" style="text-decoration: inherit;"&gt;See more&lt;/a&gt;&lt;/span&gt;&amp;nbsp;about these Solr nuts and bolts.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;My goal is to find the maximum throughput that doesn’t impact search latency.&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.t3zif78bpvyf"&gt;&lt;/a&gt;Join&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;The queries look like this:&lt;/div&gt;&lt;div class="c1" style="direction: ltr;"&gt;&lt;div style="font-family: Arial; font-size: 11pt;"&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;q=text_all:(patient OR autumn OR helen)&amp;amp;fl=id,score&amp;amp;sort=score 
desc&amp;amp;&lt;/span&gt;&lt;span class="c2 c4" style="font-family: &amp;quot;courier new&amp;quot;; font-weight: bold;"&gt;fq={!join from=join_id to=id}acl:[1303 TO 1309]&lt;/span&gt;&lt;br /&gt;&lt;span class="c2 c4" style="font-family: &amp;quot;courier new&amp;quot;; font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-hSSYPdioxwc/UDYDsa2KOAI/AAAAAAAAAGM/1IGtfz3NCks/s1600/join-hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img alt="" border="0" height="337" src="https://4.bp.blogspot.com/-hSSYPdioxwc/UDYDsa2KOAI/AAAAAAAAAGM/1IGtfz3NCks/s640/join-hist.png" title="Join - 100 req/min" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;span id="goog_1770551582"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I took several measurements, but decided to post this particular histogram (caveat: it’s not a timeline). You can see that join almost never runs for less than a second, and the CPU saturates at 100 requests per minute. Adding more queries harms latency.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;From the iostat trace you can see that there is no I/O activity: the index is cached in RAM via memory-mapped file magic. 
I’ll come back to this later.&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.nesvpnbnwlj9"&gt;&lt;/a&gt;BlockJoin&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;I sent the same queries with block join.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;q=text_all:(patient OR autumn OR helen)&amp;amp;fl=id,score&amp;amp;sort=score desc&amp;amp;&lt;/span&gt;&lt;span class="c4 c2" style="font-family: &amp;quot;courier new&amp;quot;; font-weight: bold;"&gt;fq={!parent which=kind:body}acl:[1303 TO 1309]&lt;/span&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;Here are the latency timeline and statistics.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-DketE14P-Lo/UDYGM1fl6tI/AAAAAAAAAGY/VYz4bs2rk6g/s1600/bjq-fq-6k-timeline.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="330" src="https://3.bp.blogspot.com/-DketE14P-Lo/UDYGM1fl6tI/AAAAAAAAAGY/VYz4bs2rk6g/s640/bjq-fq-6k-timeline.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-iEW-tsmNgTk/UDYGMAXxxMI/AAAAAAAAAGU/vWnRxzm0gOU/s1600/bjq-fq-6k-stat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="334" src="https://1.bp.blogspot.com/-iEW-tsmNgTk/UDYGMAXxxMI/AAAAAAAAAGU/vWnRxzm0gOU/s640/bjq-fq-6k-stat.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div 
class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="c5 c1" style="direction: ltr; font-family: Arial; font-size: 11pt; height: 11pt;"&gt;&lt;span style="font-size: 11pt;"&gt;You see it! Search now takes a few tens of milliseconds and sustains 6K requests per minute (100 qps). And you can notice plenty of free CPU!&lt;/span&gt;&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.wfza61kikwl5"&gt;&lt;/a&gt;Culprit&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;We can check where Join spends so much CPU with a simple&amp;nbsp;jstack:&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;&amp;nbsp; &amp;nbsp;java.lang.Thread.State: RUNNABLE&lt;/span&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;&amp;nbsp; &amp;nbsp; at o.a.l.codecs.BlockTreeTermsReader$FieldReader$&lt;/span&gt;&lt;span class="c4 c2" style="font-family: &amp;quot;courier new&amp;quot;; font-weight: bold;"&gt;SegmentTermsEnum&lt;/span&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;.docFreq(BlockTreeTermsReader.java:2098)&lt;/span&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at o.a.s.search.JoinQuery$&lt;/span&gt;&lt;span class="c4 c2" style="font-family: &amp;quot;courier new&amp;quot;; font-weight: 
bold;"&gt;JoinQueryWeight.getDocSet&lt;/span&gt;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;(JoinQParserPlugin.java:338)&lt;/span&gt;&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.rbfdv6npde5a"&gt;&lt;/a&gt;I/O exercises&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;You may notice that the last screenshot was taken at a zero I/O rate. How can that be? I ran two tests to understand how caching index files impacts performance. You can consider them a lab exercise for the great lecture&amp;nbsp;&lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html" style="text-decoration: inherit;"&gt;Use Lucene’s MMapDirectory on 64bit platforms, please!&lt;/a&gt;&lt;/span&gt;&amp;nbsp;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;First of all, let’s explain how a 55G index can ever be cached in just 8G of RAM. You should know that not all files in your index are equally valuable (in other words - tune your schema wisely). In my index the&amp;nbsp;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;frq&lt;/span&gt;&amp;nbsp;file is 7.7G and the&amp;nbsp;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;tim&lt;/span&gt;&amp;nbsp;file is only 427M, and that’s almost all that’s needed for these queries. 
Of course, a file which stores primary key values is also read, but it doesn’t seem significant.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-tUpf27R_kgE/UDYIAwIe5YI/AAAAAAAAAGk/ddSeb_oZff4/s1600/bjq6kwarmup.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="338" src="https://1.bp.blogspot.com/-tUpf27R_kgE/UDYIAwIe5YI/AAAAAAAAAGk/ddSeb_oZff4/s640/bjq6kwarmup.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="c5 c1" style="direction: ltr; font-family: Arial; font-size: 11pt; height: 11pt;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;Here is the search latency timeline taken after flushing the filesystem cache, with 50 threads configured in the servlet container. Right after the flush, a search takes more than 7 seconds.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-qky8JZdZMh4/UDYIuibcNoI/AAAAAAAAAGs/9J47wP1AIM8/s1600/4thrd-warmup-hist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="366" src="https://1.bp.blogspot.com/-qky8JZdZMh4/UDYIuibcNoI/AAAAAAAAAGs/9J47wP1AIM8/s640/4thrd-warmup-hist.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;This timeline also shows how search time decreases as the cache warms up, but it was taken with a 4-thread limit in the servlet container. All searches are sub-second. Although the four-thread server can’t reach 6K requests per minute because of the “idle” limit, it speeds up much faster than the 50-thread server with its I/O bottleneck.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;The I/O numbers say that we hit the HDD limit. 
My “lab machine” usually shows 100-200 tps (I/O transactions per second), though I once saw as many as 300. The first and third columns, KB/t (kilobytes per transaction) and MB/s (I/O throughput), show how efficiently it reads. To get peak numbers, run&amp;nbsp;&lt;span class="c2" style="font-family: &amp;quot;courier new&amp;quot;;"&gt;cat * &amp;gt;/dev/null&lt;/span&gt;&amp;nbsp;in the folder with your index files, and check iostat while it reads sequentially.&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;One more interesting observation relates to KB/t. My first tests showed really slow search and low I/O utilization, about 4 KB/t. I was really upset until I realized that on my OS, which is not Linux, &lt;span class="c3" style="color: #1155cc; text-decoration: underline;"&gt;&lt;a class="c0" href="http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/store/FSDirectory.html" style="text-decoration: inherit;"&gt;FSDirectory&lt;/a&gt;&lt;/span&gt;&amp;nbsp;chooses NIOFSDirectory. After I explicitly specified MMapDirectory, following Uwe’s advice, the cache magic started working for me, and I got the great result above.&lt;/div&gt;&lt;h3 class="c1" style="color: #666666; direction: ltr; font-family: Arial; font-size: 12pt; line-height: 1.15; padding-bottom: 4pt; padding-top: 14pt;"&gt;&lt;a href="http://www.blogger.com/blogger.g?blogID=3946011063058389308" name="h.jt3z3g6jpa2z"&gt;&lt;/a&gt;To block or not to block (join)?&lt;/h3&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;From my point of view, BlockJoin is the most efficient way to do the join operation, but that doesn’t mean you need to get rid of your solution based on the other, slow Join. 
The room for Join is frequent children updates, and small indexes, of course.&lt;/div&gt;&lt;div class="c5 c1" style="direction: ltr; font-family: Arial; font-size: 11pt; height: 11pt;"&gt;&lt;/div&gt;&lt;div class="c1" style="direction: ltr; font-family: Arial; font-size: 11pt;"&gt;Happy (block) joining!&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/981907660332172374/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=981907660332172374" title="6 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/981907660332172374" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/981907660332172374" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/n2xeWkP85NE/block-join-query-performs.html" title="Block Join Query Performs" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://4.bp.blogspot.com/-hSSYPdioxwc/UDYDsa2KOAI/AAAAAAAAAGM/1IGtfz3NCks/s72-c/join-hist.png" height="72" width="72" /><thr:total>6</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2012/08/block-join-query-performs.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-7832207577379613268</id><published>2012-05-23T13:30:00.000-07:00</published><updated>2012-06-08T01:34:15.493-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="CI" /><category scheme="http://www.blogger.com/atom/ns#" term="QA" /><category 
scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Ignoring test failures at CI</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;     &lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;Problem&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;Today, I want to talk about one obvious issue which occurs when your code and tests are written by different people and you have a properly established &lt;a href="http://en.wikipedia.org/wiki/Continuous_integration"&gt;CI&lt;/a&gt; process. In this scenario, tests appear in VCS first, because without a test the code &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Continuous_integration#Make_the_build_self-testing"&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;can’t be committed&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;, but this test breaks the build. Obviously, the test author skips it or disables it in the CI test suite until the functionality is actually implemented. I believe this is not the best practice, because it leads to a lot of wasted effort from both the dev and QA teams. 
Let’s look at what’s going on in typical project team:&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;    &lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;Commit&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;Developer checks out the disabled test for his new feature from VCS, runs it, and checks that the code is ready. Now he’s able to commit the code. He does it, but unfortunately makes two usual mistakes: he performs one-line-improvement right before commit without full test run (of course, what can possibly go wrong?); and he forgets to commit one property file. Doesn’t it happen to you? Really? &lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;   &lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt; &lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;Test&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;Then he pings the test’s author, - Hey! I’ve done it. Check your test pls. Tester needs to run a private build to verify that all code changes have been committed, and then enables the test for CI. If tester skips the private build phase, which can include pulling snapshot or build and creating deployment, you’ll have CI failure without genuine code issue. 
&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;  &lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;  &lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;What it should be &lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;Just free the tester from the unnecessary private build and testing cycle. Let CI work for you. Don’t exclude the test from the CI suite; let it run and fail without impacting the build status. The tester sets the expectation for the functionality and lets developers reach it, with no need to verify uncommitted code. &lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;  &lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;  &lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;Tooling&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444" style="text-align: -webkit-auto;"&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;JUnit&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; has only one workaround - &lt;/span&gt;&lt;a href="http://junit.sourceforge.net/javadoc/org/junit/Assume.html#assumeTrue%28boolean%29"&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;assumeTrue&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;, which lets you mark a failed test as skipped. 
For a fancier approach you need to &lt;/span&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;a href="http://stackoverflow.com/questions/4055022/mark-unit-test-as-an-expected-failure-in-junit"&gt;write your own Runner&lt;/a&gt;.&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;b id="internal-source-marker_0.6568898633122444"&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;UPD: &lt;/span&gt;&lt;/b&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;there&lt;/span&gt;&lt;b id="internal-source-marker_0.6568898633122444"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; is the &lt;a href="http://blog.schauderhaft.de/2009/10/04/junit-rules/"&gt;@Rule feature&lt;/a&gt;, which gives &lt;a href="https://gist.github.com/a37e897c2622a325ee7a"&gt;a simple solution&lt;/a&gt;. 
Thanks to &lt;a href="http://stackoverflow.com/questions/10804039/how-to-create-own-annotation-for-junit-that-will-skip-test-if-concrete-exception"&gt;the thread&lt;/a&gt;!&lt;/span&gt;&lt;/b&gt;&lt;b id="internal-source-marker_0.6568898633122444"&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;TeamCity&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; has nice looking &lt;b id="internal-source-marker_0.6568898633122444" style="font-family: 'Times New Roman'; font-size: medium; white-space: normal;"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; &lt;/span&gt;&lt;a href="http://confluence.jetbrains.net/display/TCD65/Muting+Test+Failures"&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;“mute” feature&lt;/span&gt;&lt;/a&gt;&lt;/b&gt;. 
Nice work, JetBrains!&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;TestNG&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; has &lt;b id="internal-source-marker_0.6568898633122444" style="font-family: 'Times New Roman'; font-size: medium; white-space: normal;"&gt;&lt;a href="http://testng.org/javadoc/org/testng/annotations/Test.html#successPercentage%28%29"&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;something&lt;/span&gt;&lt;/a&gt;&lt;/b&gt; along these lines. I&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;t works fine, but surefire &lt;/span&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;a href="http://jira.codehaus.org/browse/SUREFIRE-654"&gt;doesn’t support it&lt;/a&gt;.&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; vertical-align: baseline; white-space: pre-wrap;"&gt;JBehave&lt;/span&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt; has &lt;a 
href="http://jbehave.org/reference/stable/running-stories.html"&gt;ignoreFailureInView&lt;/a&gt; property, but it applies to the whole suite, not to a particular test. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;h2 dir="ltr"&gt;  &lt;b id="internal-source-marker_0.6568898633122444"&gt;  &lt;span style="font-family: Arial; font-size: 19px; vertical-align: baseline; white-space: pre-wrap;"&gt;Credits&lt;/span&gt;&lt;/b&gt;&lt;/h2&gt;&lt;b id="internal-source-marker_0.6568898633122444"&gt;&lt;span style="font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;I realized this some time ago, but decided to post after finding the same concern in the &lt;/span&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;a href="http://mail-archives.apache.org/mod_mbox/lucene-dev/201205.mbox/%3Calpine.DEB.2.00.1205041516210.3541@bester%3E"&gt;mail thread&lt;/a&gt;.&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;b style="text-align: -webkit-auto;"&gt;&lt;span style="color: #1155cc; font-family: Arial; font-size: 15px; font-weight: normal; vertical-align: baseline; white-space: pre-wrap;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/7832207577379613268/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7832207577379613268" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7832207577379613268" /><link rel="self" type="application/atom+xml" 
href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7832207577379613268" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Am9jNGEr1IY/ignoring-test-failures-at-ci.html" title="Ignoring test failures at CI" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2012/05/ignoring-test-failures-at-ci.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-8194295765385379678</id><published>2011-10-25T10:28:00.000-07:00</published><updated>2011-10-25T10:40:58.984-07:00</updated><title type="text">Highlights from our OpenStack-specific blog</title><content type="html">&lt;p&gt;A few highlights from our &lt;a href="http://openstackgd.wordpress.com/"&gt;OpenStack blog&lt;/a&gt;:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://openstackgd.wordpress.com/2011/10/06/how-we-build-packages-in-grid-dynamics-using-gear-mock/"&gt;How our RHEL port is built and tested, using Gear + Mock&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://openstackgd.wordpress.com/2011/10/03/cloudpipe-setting-up-vpn-for-projects/"&gt;Setting up VPN for projects with CloudPipe&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://openstackgd.wordpress.com/2011/10/06/using-nova-instead-of-eucatools-while-working-with-keypairs/"&gt;Using SSH keys with novaclient&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://openstackgd.wordpress.com/2011/10/12/improving-novaclient-cli-flexibility-with-ssh-keys-while-booting-server/"&gt;Improving novaclient CLI: boot a server with specific SSH key&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Also, for those who do not follow our OpenStack blog closely, OpenStack Diablo 
release is out for both &lt;a href="http://openstackgd.wordpress.com/2011/10/03/openstack-2011-3-release/"&gt;RHEL&lt;/a&gt; and &lt;a href="http://openstackgd.wordpress.com/2011/10/04/diablo-centos-build/"&gt;CentOS&lt;/a&gt; (version 6, as usual). &lt;a href="http://openstackgd.wordpress.com/2011/10/05/source-code-for-diablo-packages/"&gt;Sources&lt;/a&gt; are available as well.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/8194295765385379678/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=8194295765385379678" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8194295765385379678" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/8194295765385379678" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/Z5cpMJkfpaQ/highlights-from-our-openstack-specific.html" title="Highlights from our OpenStack-specific blog" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/10/highlights-from-our-openstack-specific.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-573639735676487931</id><published>2011-10-03T01:16:00.001-07:00</published><updated>2011-10-03T16:28:50.080-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" 
term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><category scheme="http://www.blogger.com/atom/ns#" term="~Oleg Malakhov" /><title type="text">Solr Experience: search parent-child relations. Part III</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div&gt;In the previous posts (&lt;a href="http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html"&gt;part I&lt;/a&gt; and &lt;a href="http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html"&gt;part II&lt;/a&gt;) I presented the challenges of searching parent-child relations with Solr, a couple of workarounds, and a neat final solution: SpanQueries.&lt;br /&gt;&lt;br /&gt;In this post I will describe the infrastructure for SpanQueries we’ve built inside Solr. It includes indexing, query parsing (creating SpanQueries), caching filter queries, faceting and sorting.&lt;br /&gt;&lt;h2&gt;Indexing&lt;/h2&gt;First of all, I should say that we use SolrJ with an embedded Solr server, so this section covers adding documents to Solr through its Java API only.&lt;br /&gt;&lt;br /&gt;If you put a collection of values into &lt;a href="http://lucene.apache.org/solr/api/org/apache/solr/common/SolrInputDocument.html#addField%28java.lang.String,%20java.lang.Object%29"&gt;SolrInputDocument.addField("field", values)&lt;/a&gt; with &lt;i&gt;termPositions="true"&lt;/i&gt; in schema.xml, Solr will interpret them as several different terms for one field and set the position of each value to 0.&lt;br /&gt;&lt;br /&gt;The straightforward approach is to add all values as one text string (e.g. “red blue green”) and use an analyzer (&lt;a href="http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/analysis/standard/StandardAnalyzer.html"&gt;StandardAnalyzer&lt;/a&gt; works fine), which adds proper term positions to the index. 
In this case, if we want to index several numbers, we have to add them as one string, e.g. “123.45 42.15 98.27”. That is inconvenient, so we implemented field types for String, Integer, Double and Float. They create a Field instance with a special &lt;a href="http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/analysis/TokenStream.html"&gt;TokenStream&lt;/a&gt;, which uses a List as its source and preserves positions (indexes in the List). &lt;br /&gt;&lt;h2&gt;Parsing SpanQueries&lt;/h2&gt;By default Solr uses the &lt;a href="http://lucene.apache.org/solr/api/org/apache/solr/search/LuceneQParserPlugin.html"&gt;LuceneQParser&lt;/a&gt; plugin to parse user queries and create Lucene Queries. We created a wrapper for it - SpanQParserPlugin - which transforms regular BooleanQueries and TermQueries into SpanQueries. Then we registered it in solrconfig.xml:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;queryParser name="span" class="com.griddynamics.solr.spans.SpanQParserPlugin"/&amp;gt;&lt;br /&gt;&lt;br /&gt;This plugin transforms a BooleanQuery with an AND clause into SpanNearQuery(...) and a BooleanQuery with an OR clause into SpanOrQuery.&lt;br /&gt;As a result, the query “&lt;i&gt;fq={!span tag=spanFQ}+color:Red +size:XL&lt;/i&gt;” is parsed into:&lt;br /&gt;SpanNearQuery(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SpanTermQuery("color", "Red"),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FieldMaskingSpanQuery(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; SpanTermQuery("size", "XL"),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; "color"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ), -1, false&lt;br /&gt;)&lt;br /&gt;&lt;br /&gt;Here &lt;i&gt;{!span}&lt;/i&gt; is a &lt;a href="http://wiki.apache.org/solr/SolrPlugins#QParserPlugin"&gt;defType local param&lt;/a&gt;. 
It tells Solr to use SpanQParserPlugin, and spanFQ is a &lt;a href="http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters"&gt;tag&lt;/a&gt; we will use for faceting later.&lt;br /&gt;Note that there are a couple of better ways to extend query parsing. &lt;br /&gt;&lt;h2&gt;Filter queries caching&lt;/h2&gt;Solr caches docSets (materialized FilterQuery results, which are intersected on searching) and docLists (ranked query results that store document lists). But they are not enough for us, because for faceting we need the spans that passed the filter.&lt;br /&gt;&lt;br /&gt;We’ve created SpanQueryFilterCache for this - it’s similar to &lt;a href="http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/search/SpanFilterResult.html"&gt;SpanFilterResult&lt;/a&gt; but uses primitives and arrays instead of wrappers and collections. It stores the spans found by a SpanQuery in the user cache provided by Solr. To get spans we call SpanQueryFilterCache.getSpans(spanQuery, solrIndexSearcher), which initializes the cache lazily and gets spans by the spanQuery key. &lt;br /&gt;&lt;h2&gt;Faceting&lt;/h2&gt;I suppose it’s the most interesting part of this article. &lt;br /&gt;The standard field faceting mechanism counts the documents that contain a field value. 
It doesn’t suit us, because we need to count facets on the matched spans only (parts of a document).&lt;br /&gt;&lt;br /&gt;To count span facets we made a special component, SpanFacetComponent, which is activated with the parameter &lt;i&gt;span.facet.field&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;The query for searching Red sweaters and counting facets for the size and color fields looks like this:&lt;br /&gt;&lt;div dir="ltr" style="margin-bottom: 0pt; margin-top: 0pt; text-align: center;"&gt;q={!span tag=spanFQ}+color:Red&amp;amp;facet=true&amp;amp;&lt;i&gt;span.facet.field={!by=spanFQ aggOp=unique}color&amp;amp;span.facet.field={!by=spanFQ aggOp=unique}size&lt;/i&gt;&lt;/div&gt;And we have a Product, which passes the filter:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-fbxHxhYoqV4/TolxlJPWjHI/AAAAAAAAABE/bmmx9tXYQBg/s1600/picture_p3_01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-fbxHxhYoqV4/TolxlJPWjHI/AAAAAAAAABE/bmmx9tXYQBg/s1600/picture_p3_01.png" /&gt;&lt;/a&gt;&lt;/div&gt;Only two of its Items pass the filter and must be counted for faceting. The tag “spanFQ” (see above) is used to obtain the SpanQuery instance, and SpanQueryFilterCache is used to enumerate the matched spans. It works for both the &lt;i&gt;color&lt;/i&gt; and &lt;i&gt;size&lt;/i&gt; fields, because a span doesn’t have a field, so we can count the &lt;i&gt;size&lt;/i&gt; facet by the spans matched by the &lt;i&gt;color&lt;/i&gt; filter. For counting facets we need the field values on the matched spans. We get them from a &lt;i&gt;forward view of the data&lt;/i&gt; - UnInvertedSpans (an allusion to &lt;a href="https://issues.apache.org/jira/browse/SOLR-475"&gt;UnInvertedField&lt;/a&gt;; it stores the mapping {(docId, spanNum) -&amp;gt; {termNum}}).&lt;br /&gt;&lt;br /&gt;We have two Items that passed the filter, but we have a constraint that a single document increases the facet count for a value only once, i.e. a 
document gives a &lt;i&gt;set&lt;/i&gt; of values (not a bag) for facet counts. The match above gives {Red,XL,XXL}. To determine which span should be counted we use the &lt;i&gt;aggOp&lt;/i&gt; (aggregation operation) parameter, which can take one of three values:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;unique - each unique value increases the facet count;&lt;/li&gt;&lt;li&gt;first - only the first span value increases the facet count;&lt;/li&gt;&lt;li&gt;last - only the last span value increases the facet count.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The last two operations are used for calculating min/max price (here we rely on the &lt;i&gt;natural ordering&lt;/i&gt; enforced during indexing, instead of a true query-time min/max). It would be worth implementing min/max operations as well. &lt;br /&gt;&lt;h2&gt;Sorting&lt;/h2&gt;Our Product can have several prices. To sort by price we need to use different values depending on the sorting direction: the min price for ascending, the max price for descending.&lt;br /&gt;The solution is to store prices aligned by position and use SpanQueries to get the price values for sorting. During indexing we put prices into the Product document in ascending order. At query time we use SpanQuery and UnInvertedSpans to get the spans with prices, and then our MultiValueFieldComparatorSource uses the value from the first (on asc) or last (on desc) span for sorting. 
For example, our query with an additional price filter and sorting by the price field will be:&lt;br /&gt;&lt;div dir="ltr" style="margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt;"&gt;&lt;i&gt;q={!span tag=spanFQ}+color:Red&lt;/i&gt;&lt;/div&gt;&lt;div dir="ltr" style="margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;i&gt;&amp;amp;fq={!span tag=priceFQ}+price:[5 TO 15]&lt;/i&gt;&lt;/div&gt;&lt;div dir="ltr" style="margin-bottom: 0pt; margin-left: 36pt; margin-top: 0pt; text-indent: 36pt;"&gt;&lt;i&gt;&amp;amp;sort.field={!bySpanFQ=priceFQ}price ASC&lt;/i&gt;&lt;/div&gt;&lt;h2&gt;Conclusion&lt;/h2&gt;That’s all. We hope our experience and ideas will help someone resolve their issues with parent-child relations. In the future we want to try storing two or more terms for one field in one position; this way we hope to handle the same issues with multi-valued fields.&lt;br /&gt;We are also working on a deeper understanding of the problem and on building a powerful model of nested documents on inverted lists. One of our latest findings is that ‘&lt;i&gt;false positive matching on multi-value fields&lt;/i&gt;’ is the well-known &lt;a href="http://db.grussell.org/section005.html"&gt;Fan Trap&lt;/a&gt; in ER modelling.&lt;br /&gt;&lt;br /&gt;There is a notable alternative approach: &lt;a href="https://issues.apache.org/jira/browse/LUCENE-3171"&gt;LUCENE-3171&lt;/a&gt;. 
It’s a little bit raw at the moment, but the idea is promising.&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/573639735676487931/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=573639735676487931" title="10 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/573639735676487931" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/573639735676487931" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/7wfOXeD8RA4/solr-experience-search-parent-child.html" title="Solr Experience: search parent-child relations. Part III" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/-fbxHxhYoqV4/TolxlJPWjHI/AAAAAAAAABE/bmmx9tXYQBg/s72-c/picture_p3_01.png" height="72" width="72" /><thr:total>10</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/10/solr-experience-search-parent-child.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-6343873243968971608</id><published>2011-07-27T23:32:00.001-07:00</published><updated>2011-07-28T14:45:20.770-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="indexing" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" /><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category 
scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><category scheme="http://www.blogger.com/atom/ns#" term="~Oleg Malakhov" /><title type="text">Solr Experience: search parent-child relations. Part II</title><content type="html">In the &lt;a href="http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html"&gt;previous post&lt;/a&gt; I presented an e-Commerce site that sells sweaters. I showed how we tried to implement Faceted Navigation with Solr and ran into a problem with multi-valued fields when searching parent-child relations. I also proposed a workaround that solves the problem but is far from perfect.&lt;br /&gt;&lt;br /&gt;In this post I’ll describe the ultimate solution - SpanQueries.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;SpanQuery&lt;/h2&gt;SpanQueries are intended for &lt;a href="http://en.wikipedia.org/wiki/Proximity_search_%28text%29"&gt;proximity search&lt;/a&gt;, which considers the positions of terms inside the text. Lucene can index term positions in addition to the terms themselves. For example, “crazy cat and crazy dog” gives us the following inverted index:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-5f-fIwGigRw/TjED9RCVeaI/AAAAAAAAAA4/8YQVxeKI3Ig/s1600/picture_p2_01.png" /&gt;&lt;/div&gt;The word “and” is excluded as a stop word.&lt;br /&gt;&lt;br /&gt;A SpanQuery takes these term positions into account during searching. A &lt;a href="http://lucene.apache.org/java/3_0_3/api/all/org/apache/lucene/search/spans/package-summary.html"&gt;Span&lt;/a&gt; is a term position range: a tuple that contains a document ID and start and end positions. A term can have several spans per document. There is a base method, SpanQuery.getSpans(...), which returns an iterator over the matched documents and spans.&lt;br /&gt;&lt;br /&gt;SpanQuery is an abstract class with several descendants. 
You can find their descriptions in &lt;a href="http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/"&gt;the post in the Lucid Imagination blog&lt;/a&gt; and in the &lt;a href="http://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp/1933988177"&gt;"Lucene In Action" book&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;We are interested in three of them:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/spans/SpanTermQuery.html"&gt;SpanTermQuery&lt;/a&gt; - finds all spans for a term;&lt;/li&gt;&lt;li&gt;&lt;a href="http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/spans/SpanNearQuery.html"&gt;SpanNearQuery&lt;/a&gt; - combines several SpanQueries, considering the distance between matched spans;&lt;/li&gt;&lt;li&gt;&lt;a href="http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html"&gt;FieldMaskingSpanQuery&lt;/a&gt; - masks a SpanQuery on one field as a query on another field.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Now let’s take a look at how we use these SpanQueries to search parent-child relations.&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Parent-child relations with SpanQueries&lt;/h2&gt;We index Products in such a way that the attributes of each Item are located at the same position:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-wgc-KOeKx9M/TjED_TTYOJI/AAAAAAAAAA8/Q_EyzHd8i5o/s1600/picture_p2_02.png" /&gt;&lt;/div&gt;&lt;br /&gt;Combining the SpanTermQueries with FieldMaskingSpanQuery and putting them inside a SpanNearQuery allows searching for intersections of term positions, and properly finds the Product that contains the specified Item (color:Red and size:XL):&lt;br /&gt;SpanNearQuery(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SpanTermQuery("color", "Red"),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FieldMaskingSpanQuery(&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; SpanTermQuery("size", "XL"),&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; "color"&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ), -1, false&lt;br /&gt;)&lt;br /&gt;&lt;br /&gt;This query returns only the documents that have color:Red and size:XL at the same position, i.e. belonging to the same Item:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-kujNds_ptH4/TjED_6oTuBI/AAAAAAAAABA/6hY-95AAWEs/s1600/picture_p2_03.png" /&gt;&lt;/div&gt;The third product (Style&amp;amp;co.) is not returned, because its terms "Red" and "XL" are located at different positions.&lt;br /&gt;&lt;br /&gt;We've got what we want out of the box. But there is another problem: Solr doesn't have any infrastructure for supporting SpanQueries. In the next post I will describe how we bring SpanQueries into Solr, including the interesting issues of caching, faceting and sorting.&lt;br /&gt;&lt;br /&gt;P.S.: we recently found that a &lt;a href="http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene"&gt;similar approach had been proposed&lt;/a&gt;, and work is ongoing in Lucene core (see &lt;a href="https://issues.apache.org/jira/browse/LUCENE-2454"&gt;LUCENE-2454&lt;/a&gt;, &lt;a href="https://issues.apache.org/jira/browse/LUCENE-3133"&gt;LUCENE-3133&lt;/a&gt;, &lt;a href="https://issues.apache.org/jira/browse/LUCENE-3171"&gt;LUCENE-3171&lt;/a&gt;). 
Looking forward for incorporating this feature in Solr.</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/6343873243968971608/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6343873243968971608" title="4 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6343873243968971608" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6343873243968971608" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/-ZtzAEapOKQ/solr-experience-search-parent-child.html" title="Solr Experience: search parent-child relations. Part II" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-5f-fIwGigRw/TjED9RCVeaI/AAAAAAAAAA4/8YQVxeKI3Ig/s72-c/picture_p2_01.png" height="72" width="72" /><thr:total>4</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/07/solr-experience-search-parent-child.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-1306114694043780707</id><published>2011-07-02T11:26:00.000-07:00</published><updated>2013-09-05T13:57:51.416-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="Spring" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><title type="text">Spring Nested - Part III</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;div&gt;﻿In the previous posts &lt;a 
href="http://blog.griddynamics.com/2011/04/spring-nested-part-i-why.html"&gt;I&lt;/a&gt; and &lt;a href="http://blog.griddynamics.com/2011/04/spring-nested-part-ii-what.html"&gt;II&lt;/a&gt;, I told you how we came up with our own Service Locator framework based on Spring. In short, a root application context creates child application contexts and keeps a service registry, where the child contexts advertise and look up service beans by name. Today I will tell you about two vital features.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Handling Web Requests&lt;/b&gt;&lt;/div&gt;&lt;div&gt;We use &lt;a href="http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html"&gt;Spring MVC&lt;/a&gt; to expose platform services via the Web. To handle Web requests, a WAR should have &lt;a href="http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html#mvc-handlermapping"&gt;handlers&lt;/a&gt; or &lt;a href="http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html#mvc-controller"&gt;controllers&lt;/a&gt; declared in a Spring application context. In our platform all business logic resides in component contexts, aka child contexts. Therefore, a component developer declares request handlers and controllers in the component context, but Spring MVC’s &lt;a href="http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html#mvc-servlet"&gt;DispatcherServlet&lt;/a&gt; looks for request handlers in the root context only. So, we need to pass the request to the controller/handler beans declared in the target child context. We went through the three approaches below, one by one:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;the first was really explicit: child contexts publish their Web request handlers to the service registry. 
The root context contains special request handlers, each of which delegates a request to a handler obtained from the service registry by the specified name. So, we had to amend the root config XML whenever a new web handler appeared in any component;&lt;/li&gt;&lt;li&gt;then, we introduced the handler mapping registry in the root context. Child contexts have factory beans that register the child request handlers in that registry. Thus, the registry can handle a request by looking up a child handler by URL and delegating the request to it. That’s better than the first approach because we don’t need to update the root config XML when introducing a new request handler. But it was still far from a non-invasive approach, i.e. we had to remember these declarations when adding a new handler;&lt;/li&gt;&lt;li&gt;the final approach is completely implicit: the root context has the same child handler mapping registry (url -&amp;gt; handler), but it’s populated automatically by scanning the child contexts for handler/controller beans. Then, this mapping registry handles requests by delegating them to the particular child handlers found by URL. As a result, a component developer doesn’t need to care about the framework when delivering a new Web request handler.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;In contrast to the Spring DM way, we have a single WAR with a single &lt;a href="http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html#mvc-servlet"&gt;DispatcherServlet&lt;/a&gt; that contains several Web components. &lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Component Initialization and Dependencies&lt;/b&gt;&lt;/div&gt;&lt;div&gt;It’s the feature I am most proud of. To instantiate all child contexts we need to know the correct initialization sequence - which child context goes after which. Initially we relied on the list of config XMLs explicitly provided in the root context. 
Then, we found problems with this straightforward approach, e.g.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;the absence or failure of one export breaks the related lookup;&lt;/li&gt;&lt;li&gt;a looked-up bean (which is actually a proxy) can be accessed before its context is initialized.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;We decided to implement a Dependency Analyzer that verifies dependencies across child contexts before any child context is initialized. &lt;/div&gt;&lt;div&gt;How have we done it? We rejected the idea of parsing Spring context XMLs to analyze dependencies, just because we had two XML formats already. Spring has quite an extensible and elegant design: &lt;a href="http://static.springsource.org/spring/docs/3.0.x/api/org/springframework/beans/factory/xml/XmlBeanDefinitionReader.html"&gt;BeanDefinitionReader&lt;/a&gt; parses XML and passes &lt;a href="http://static.springsource.org/spring/docs/3.0.x/api/org/springframework/beans/factory/config/BeanDefinition.html"&gt;BeanDefinitions&lt;/a&gt; to &lt;a href="http://static.springsource.org/spring/docs/3.0.x/api/org/springframework/beans/factory/support/BeanDefinitionRegistry.html#registerBeanDefinition(java.lang.String, org.springframework.beans.factory.config.BeanDefinition)"&gt;BeanDefinitionRegistry&lt;/a&gt;. In the default flow the BeanDefinitionRegistry is an ApplicationContext, which creates singleton bean instances etc. But we can run this routine against a fake one: &lt;a href="http://static.springsource.org/spring/docs/3.0.x/api/org/springframework/beans/factory/support/SimpleBeanDefinitionRegistry.html"&gt;SimpleBeanDefinitionRegistry&lt;/a&gt;, which just keeps bean definitions in a map. Then the analyzer can iterate through the bean definitions, recognize exports and lookups, verify them, raise concerns and/or arrange the initialization order. 
Here is the list of rules which the analyzer enforces now:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;verifies that every bean class exists on the classpath;&lt;/li&gt;&lt;li&gt;warns about unnecessary exports (without any lookup);&lt;/li&gt;&lt;li&gt;prohibits imports without a matching export;&lt;/li&gt;&lt;li&gt;prohibits dependency cycles between contexts (but allows resolving them manually in “god mode”);&lt;/li&gt;&lt;li&gt;orders child context initialization so that exports come before imports;&lt;/li&gt;&lt;li&gt;finds a minimal list of the components which are required for running the given sub-set of services (transitively finds the dependent contexts);&lt;/li&gt;&lt;li&gt;skips the dependent contexts after a context initialization failure;&lt;/li&gt;&lt;li&gt;outputs the derived component dependency graph into the debug log in &lt;a href="http://www.graphviz.org/"&gt;Graphviz&lt;/a&gt; format.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;Some of these features just spot trivial errors and reduce the number of log lines to look through. The remaining features provide a flexible and robust container for our platform.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Finally, I’d like to point out a significant difference between Spring DM and Spring Nested. Spring DM has an &lt;a href="http://static.springsource.org/osgi/docs/2.0.0.M1/reference/html/bnd-app-ctx.html"&gt;asynchronous ad-hoc initialization&lt;/a&gt;, which is sometimes too complex and inconvenient. Spring Nested proposes a single-threaded deterministic initialization. It’s rigid enough - it prohibits cycles - but flexible: you code against services and don’t care about the component initialization sequence. And it’s developer friendly: you don’t have to change your Spring habits much; troubleshooting is as easy as possible thanks to the reduced logs and the preliminary checks. 
&lt;/div&gt;&lt;div&gt;If you accept the simplifications that were made in the Spring Nested design, this framework can help you with your large Spring application.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;The Real Life Application&lt;/b&gt;&lt;/div&gt;&lt;div&gt;Here is the module dependency graph of our application, which is built with Spring Nested. The graph is derived from the service references by the dependency analyzer and logged in Graphviz text format on startup.&lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/-b-uP89ncdQ4/Tg9yd4GGS2I/AAAAAAAAADM/sVGT8sY-mMQ/s1600/universe_grey.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5624840317208775522" src="http://3.bp.blogspot.com/-b-uP89ncdQ4/Tg9yd4GGS2I/AAAAAAAAADM/sVGT8sY-mMQ/s400/universe_grey.png" style="cursor: hand; cursor: pointer; display: block; height: 78px; margin: 0px auto 10px; text-align: center; width: 400px;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;I smudged the component names in this picture to avoid NDA issues. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Further Plans&lt;/b&gt;&lt;/div&gt;&lt;div&gt;We want to open-source it. 
Please let us know if you are interested in it or leave a valuable feedback and ideas.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;This framework is available on&amp;nbsp;&lt;/b&gt;&lt;a href="https://github.com/griddynamics/banshun"&gt;https://github.com/griddynamics/banshun&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/1306114694043780707/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=1306114694043780707" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1306114694043780707" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/1306114694043780707" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/PNMfVYMJSC0/spring-nested-part-iii.html" title="Spring Nested - Part III" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-b-uP89ncdQ4/Tg9yd4GGS2I/AAAAAAAAADM/sVGT8sY-mMQ/s72-c/universe_grey.png" height="72" width="72" /><thr:total>0</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/07/spring-nested-part-iii.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-2688718753579819198</id><published>2011-06-06T07:15:00.000-07:00</published><updated>2011-06-20T11:16:19.078-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="indexing" /><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="lucene" 
/><category scheme="http://www.blogger.com/atom/ns#" term="search" /><category scheme="http://www.blogger.com/atom/ns#" term="Solr" /><category scheme="http://www.blogger.com/atom/ns#" term="~Mikhail Khludnev" /><category scheme="http://www.blogger.com/atom/ns#" term="~Oleg Malakhov" /><title type="text">Solr Experience: search parent-child relations. Part I</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Typical e-Commerce sites provide a catalog of products with filtering (or search by criteria). We implement such a catalog with the&amp;nbsp;&lt;a href="http://lucene.apache.org/solr/"&gt;Solr&lt;/a&gt; search server, which is based on the &lt;a href="http://lucene.apache.org/"&gt;Apache Lucene&lt;/a&gt; library.&lt;br /&gt;&lt;br /&gt;Catalog entries usually have parent-child relations, and search must take these relations into account. Unfortunately, Solr documents are flat, and some straightforward approaches (e.g. several indexes, denormalization) were rejected because of their efficiency and implementation effort.&lt;br /&gt;&lt;br /&gt;In this post I describe the problem and some possible approaches.&lt;br /&gt;&lt;h2&gt;E-Commerce site&lt;/h2&gt;A &lt;b&gt;Product&lt;/b&gt;&amp;nbsp;is the model of any goods that are sold on the site. A Product has attributes like Brand, Occasion etc. You can’t buy a Product; you can buy an &lt;b&gt;Item&lt;/b&gt;&amp;nbsp;- a concrete instance of the Product, which has attributes like Color, Size, Length, etc. Every Item also shares its Product’s attributes.&lt;br /&gt;&lt;br /&gt;For example, if you &lt;a href="http://www.shopadidas.com/search/index.jsp?kwCatId=&amp;amp;kw=nizza%20hi&amp;amp;origkw=Nizza%20Hi&amp;amp;sr=1"&gt;search for "Nizza Hi" on the www.shopadidas.com site&lt;/a&gt;, you see different sneakers of a single model ("Men's Originals Nizza Hi Shoes"). They have different colors, therefore they are Items. 
Then&amp;nbsp;&lt;a href="http://www1.macys.com/search/index.ognc?SearchTarget=*&amp;amp;Keyword=Nizza+Hi&amp;amp;KEYWORD_GO_BUTTON.x=0&amp;amp;KEYWORD_GO_BUTTON.y=0&amp;amp;KEYWORD_GO_BUTTON=KEYWORD_GO_BUTTON"&gt;search for the same on Macys.com&lt;/a&gt; - you see only one pair of sneakers. If you look at its page, you see that several colors are available. The Macys.com search result contains the Product, not Items.&lt;br /&gt;&lt;br /&gt;Let's take a site where we sell sweaters in different colors and sizes (Items). There are three brands and each brand is presented as a separate Product. The list of all available Items:&lt;span class="BRAND"&gt;&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;i&gt;&lt;span class="BRAND"&gt;Alfani&lt;/span&gt;&lt;/i&gt;:&lt;ul type="disc"&gt;&lt;li&gt;Red - XL&lt;/li&gt;&lt;li&gt;Red - XXL&lt;/li&gt;&lt;li&gt;Blue - XL&lt;/li&gt;&lt;li&gt;Green - L&lt;/li&gt;&lt;li&gt;Green - XXL&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;span class="BRAND"&gt;Calvin Klein&lt;/span&gt;&lt;/i&gt;:&lt;ul type="disc"&gt;&lt;li&gt;Red - XL&lt;/li&gt;&lt;li&gt;Blue - M&lt;/li&gt;&lt;li&gt;Blue - L&lt;/li&gt;&lt;li&gt;Green - M&lt;/li&gt;&lt;li&gt;Green - XL&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;span class="BRAND"&gt;Style&amp;amp;co.&lt;/span&gt;&lt;/i&gt;:&lt;ul type="disc"&gt;&lt;li&gt;Red - M&lt;/li&gt;&lt;li&gt;Blue - XXL&lt;/li&gt;&lt;li&gt;Blue - XL&lt;/li&gt;&lt;li&gt;Green - S&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;To provide better usability we need to show three Products instead of fourteen Items, and allow&amp;nbsp;the user to filter them by color and size, i.e. to provide &lt;a href="http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr"&gt;Faceted Navigation&lt;/a&gt;.&lt;br /&gt;&lt;ol&gt;&lt;/ol&gt;For example, a client needs a Red sweater in size XL. 
Therefore, the site must show only the Products that contain an Item with &lt;i&gt;color:Red and size:XL&lt;/i&gt;:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;&lt;span class="BRAND"&gt;Alfani&lt;/span&gt;&lt;/b&gt;:&lt;ul type="disc"&gt;&lt;li&gt;&lt;b&gt;Red - XL&lt;/b&gt;&lt;/li&gt;&lt;li&gt;Red - XXL&lt;/li&gt;&lt;li&gt;Blue - XL&lt;/li&gt;&lt;li&gt;Green - L&lt;/li&gt;&lt;li&gt;Green - XXL&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;span class="BRAND"&gt;Calvin Klein&lt;/span&gt;&lt;/b&gt;:&lt;ul type="disc"&gt;&lt;li&gt;&lt;b&gt;Red - XL&lt;/b&gt;&lt;/li&gt;&lt;li&gt;Blue - M&lt;/li&gt;&lt;li&gt;Blue - L&lt;/li&gt;&lt;li&gt;Green - M&lt;/li&gt;&lt;li&gt;Green - XL&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="BRAND"&gt;Style&amp;amp;co.&lt;/span&gt;:&lt;ul type="disc"&gt;&lt;li&gt;Red - M&lt;/li&gt;&lt;li&gt;Blue - XXL&lt;/li&gt;&lt;li&gt;Blue - XL&lt;/li&gt;&lt;li&gt;Green - S&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;I’ve highlighted in bold the Items and Products that pass the filter. The third Product (&lt;span class="BRAND"&gt;Style&amp;amp;co.&lt;/span&gt;) doesn’t match the filter. Although it has an Item with Red color and Items with XL size, it doesn’t contain any Item with both Red color and XL size, which is what we are looking for.&lt;br /&gt;&lt;h2&gt;Attempt #1: Two indexes&lt;/h2&gt;The first idea is to use Solr as a database: create separate indexes for Items and Products. Then we can get the result with two Solr queries:&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;find all Items that pass the filter;&lt;/li&gt;&lt;li&gt;find all Products that contain the Items we got in the previous step.&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;/ol&gt;The logic seems to work, but further investigation revealed two problems:&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;Pagination - usually we want to show not the whole result, but only 10, 30, 50, or 100 Products. 
But it's very expensive to query all Items (there can be a huge number of them) and then pass them into the second query to get the first N Products;&lt;/li&gt;&lt;li&gt;Faceting -&amp;nbsp; we have to write a lot of custom code around Solr to calculate all facet counts from the result of the first query. Also, the first query result must contain the product ID, which is expensive.&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;/ol&gt;Thus this approach was rejected.&lt;br /&gt;&lt;h2&gt;Attempt #2: Denormalization&lt;/h2&gt;Denormalization means creating a single index, for Items only. It's similar to the previous approach, but there are two differences:&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;the productID field is not a stored field, because we need it only for querying;&lt;/li&gt;&lt;li&gt;Faceting requires custom code too, but it can be placed inside Solr in our own Components. They use internal Solr/Lucene data structures and are much faster than post-processing the result.&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;/ol&gt;There is another problem - result collapsing. We need Products as the final result, not Items. Solr’s FieldCollapsing can be used for this, but it has a slightly different purpose. We ended up with our own collapser.&lt;br /&gt;&lt;br /&gt;This approach is not flexible: small changes in requirements impact the codebase a lot.&lt;br /&gt;We learned that the document granularity should match the search result.&lt;br /&gt;&lt;h2&gt;Attempt #3: Item Attributes' Promotion&lt;/h2&gt;Then we came back to the Product index. The question was how to index Items inside Product documents while keeping their identity for filtering. Here we come to the attribute promotion approach: the attributes of the items (sub-documents) are promoted into multi-value fields of the Product document.&lt;br /&gt;But doing it in a straightforward way, we ran into the &lt;i&gt;false&lt;/i&gt; &lt;i&gt;positive multi-value field match problem&lt;/i&gt;. 
Let's look at the Lucene documents:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-mvLF7SQv2WY/TeeF21MKPGI/AAAAAAAAAAU/vVNIM7z4yc0/s1600/picture_01_v2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-mvLF7SQv2WY/TeeF21MKPGI/AAAAAAAAAAU/vVNIM7z4yc0/s1600/picture_01_v2.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&amp;nbsp; &lt;/div&gt;&lt;b&gt;&lt;/b&gt;&lt;br /&gt;I highlighted in bold the result for the query &lt;i&gt;color:Red and size:XL&lt;/i&gt;. The third Product “&lt;span class="BRAND"&gt;Style&amp;amp;co.&lt;/span&gt;” is present in the result table, but it must be filtered out (see our example above), because it doesn’t contain any Item with both Red color and XL size, which is what we are looking for.&lt;br /&gt;&lt;br /&gt;After we realized this problem, we moved to...&lt;br /&gt;&lt;h2&gt;Attempt #4: Term Encoding&lt;/h2&gt;The solution was quite brave - we solved it with careful term encoding. We introduced composite terms to keep the Item identity. For example:&lt;br /&gt;&lt;br /&gt;&lt;span class="BRAND"&gt;Alfani&lt;/span&gt;:&lt;br /&gt;&lt;ul type="disc"&gt;&lt;li&gt;Red - XL : &lt;i&gt;RedXL&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Red - XXL : &lt;i&gt;RedXXL&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Blue - XL : &lt;i&gt;BlueXL&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Green - L : &lt;i&gt;GreenL&lt;/i&gt;&lt;/li&gt;&lt;li&gt;Green - XXL : &lt;i&gt;GreenXXL&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;As a result, you can find a document by a combination of item attributes and avoid a false match for a Product that has no Item with that combination, e.g. RedXL. We also had to generate all subsets of attribute values. 
Thus the full list of generated terms for our example looks like:&lt;br /&gt;&lt;div style="text-align: center;"&gt;Red, XL, RedXL, XXL, RedXXL, Blue, BlueXL, Green, L, GreenL, GreenXXL.&lt;/div&gt;The findability problem is solved (because &lt;span class="BRAND"&gt;Style&amp;amp;co.&lt;/span&gt; doesn't have the term RedXL, which we are looking for), but we still needed proper faceting. Honestly speaking, we faceted such composite fields with a Tokenizer and a HashMap (a kind of brute force).&lt;br /&gt;&lt;br /&gt;We knew that this approach was far from perfect due to performance and scalability issues and the complex processing at query time. That's why we kept looking for a more robust approach.&lt;br /&gt;&lt;h2&gt;Part I: Conclusion&lt;/h2&gt;Keep in mind that the described issue arises in almost any case where your search result contains documents with any form of hierarchy. We actually met several instances of that:&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;a product document has item sub-documents (which are not present as separate documents themselves);&lt;/li&gt;&lt;li&gt;product documents are grouped by other documents that reside in the index;&lt;/li&gt;&lt;li&gt;a document has associations with another entity encoded by a term (it looks like a relation sub-document).&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;/ol&gt;In the next part I will introduce the silver bullet - SpanQueries. 
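&lt;br /&gt;&lt;br /&gt;To make Attempts #3 and #4 more concrete, here is a minimal Java sketch that contrasts naive attribute promotion with composite terms. It is an illustration under simplifying assumptions, not our production code: the class &lt;i&gt;TermEncoding&lt;/i&gt; and its methods are hypothetical, plain arrays stand in for Lucene documents, and an Item is just a color/size pair.&lt;pre&gt;
```java
public class TermEncoding {

    // Composite terms for one Item: each attribute value alone plus the
    // color+size pair, e.g. Red, XL, RedXL (plain concatenation, as in the post).
    static String[] itemTerms(String color, String size) {
        return new String[] { color, size, color + size };
    }

    // Attempt #4: the Product matches only if one of its Items yields the
    // composite term we are looking for.
    static boolean matches(String[][] items, String color, String size) {
        String wanted = color + size;
        for (String[] item : items) {
            for (String term : itemTerms(item[0], item[1])) {
                if (term.equals(wanted)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Attempt #3: color and size live in independent multi-value fields, so
    // values that belong to different Items can combine into a false positive.
    static boolean naiveMatches(String[][] items, String color, String size) {
        boolean hasColor = false;
        boolean hasSize = false;
        for (String[] item : items) {
            if (item[0].equals(color)) {
                hasColor = true;
            }
            if (item[1].equals(size)) {
                hasSize = true;
            }
        }
        if (!hasColor) {
            return false;
        }
        return hasSize;
    }

    public static void main(String[] args) {
        // Items of the third Product from the example above.
        String[][] styleCo = { {"Red", "M"}, {"Blue", "XXL"}, {"Blue", "XL"}, {"Green", "S"} };
        System.out.println("naive match: " + naiveMatches(styleCo, "Red", "XL"));
        System.out.println("composite match: " + matches(styleCo, "Red", "XL"));
    }
}
```
&lt;/pre&gt;For the third Product the naive check reports a match (Red comes from one Item, XL from another), while the composite-term check does not, because no Item produces the term RedXL.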
&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/2688718753579819198/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=2688718753579819198" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2688718753579819198" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/2688718753579819198" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/imlDP7Fxwgo/solr-experience-search-parent-child.html" title="Solr Experience: search parent-child relations. Part I" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-mvLF7SQv2WY/TeeF21MKPGI/AAAAAAAAAAU/vVNIM7z4yc0/s72-c/picture_01_v2.png" height="72" width="72" /><thr:total>5</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/06/solr-experience-search-parent-child.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-6808317930895353577</id><published>2011-06-02T19:09:00.000-07:00</published><updated>2011-06-02T19:10:58.601-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="~Alexey Ragozin" /><title type="text">Understanding GC pauses in JVM, HotSpot's CMS collector.</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;(previous 
article "&lt;a href="http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html"&gt;Understanding GC pauses in JVM, HotSpot's minor GC&lt;/a&gt;")&lt;/span&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Concurrent Mark Sweep (CMS) is one of the low-pause garbage collectors of the HotSpot JVM. CMS can do most of its memory-reclaiming work concurrently with the application (without stopping it). But it still requires a few stop-the-world pauses to do its work. This article explains the nature of these pauses and how to minimize them.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;Basics of concurrent mark sweep&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;HotSpot’s CMS is a generational collector, which means that the heap is separated into young and old (tenured) spaces and these spaces are collected independently. For young space collection, the usual HotSpot copy collector is used (see the &lt;a href="http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html"&gt;previous article&lt;/a&gt; about HotSpot’s young space collector). 
Concurrent Mark Sweep is used only to collect old space.&amp;nbsp;&lt;span style="font-size: 11pt; line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;To enable the CMS collector you have to specify&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New'; font-size: 15px; line-height: 17px;"&gt;-&lt;/span&gt;&lt;span style="font-family: 'Courier New'; font-size: 11pt; line-height: 115%;"&gt;XX:&lt;/span&gt;&lt;span style="font-family: 'Courier New'; font-size: 11pt; line-height: 115%;"&gt;+UseConcMarkSweepGC&lt;/span&gt;&lt;span style="font-size: 11pt; line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt; on the JVM’s command line.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The CMS collection cycle has the following phases:&lt;/div&gt;&lt;ul style="margin-top: 0in;" type="disc"&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;Initial mark – this is a stop-the-world phase in which CMS collects root references.&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;&amp;nbsp;Concurrent mark – this phase runs concurrently with the application; the garbage collector traverses the object graph in old space, marking live objects.&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;Concurrent pre clean – this is another concurrent phase; basically it is another mark phase, which tries to account for references changed during the previous mark phase. 
The main reason for this phase is to reduce the duration of the stop-the-world remark phase.&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;Remark – once concurrent mark is finished, the garbage collector needs one more stop-the-world pause to account for references that were changed during the concurrent mark phase.&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;Concurrent sweep – the garbage collector scans through the whole old space and reclaims the space occupied by unreachable objects.&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;Concurrent reset – after the CMS cycle is finished, some structures have to be reset before the next cycle can start.&lt;/li&gt;&lt;/ul&gt;&lt;div class="MsoNormal"&gt;Unlike most other garbage collectors, CMS does not compact the heap. Instead of moving objects to make the unoccupied space contiguous, CMS keeps lists of all fragments of free memory. This way CMS avoids the cost of relocating live objects (relocating objects is an expensive operation that requires a stop-the-world pause), but the downside is that the heap space is prone to fragmentation. To minimize the risk of fragmentation, CMS performs statistical analysis of object sizes and keeps separate free lists for objects of different sizes.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;Length of CMS pauses&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;CMS itself has only two pauses, but your application will also experience the pauses of the young space collector that works in conjunction with CMS. 
See the previous article about the pauses of the young space collector.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #666666;"&gt;Initial mark&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;During initial mark CMS has to collect all root references to start marking old space. This includes:&lt;/div&gt;&lt;ul style="margin-top: 0in;" type="disc"&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;References from thread stacks,&lt;/li&gt;&lt;li class="MsoNormal" style="mso-list: l0 level1 lfo1;"&gt;References from young space.&lt;/li&gt;&lt;/ul&gt;&lt;div class="MsoNormal" style="text-align: justify;"&gt;References from stacks are usually collected very quickly (in less than 1 ms), but the time to collect references from young space depends on the amount of objects in young space. Normally initial mark starts right after a young space collection, so Eden is empty and the only live objects are in one of the survivor spaces. A survivor space is usually small, and initial mark after a young space collection often takes less than a millisecond. But if initial mark is started when Eden is full, it may take quite long (usually longer than the young space collection itself).&lt;/div&gt;&lt;div class="MsoNormal"&gt;Once a CMS collection is triggered, the JVM may wait some time for a young collection to happen before it starts initial marking. The JVM configuration option &lt;span style="font-family: 'Courier New';"&gt;-XX:CMSWaitDuration=&amp;lt;t&amp;gt;&lt;/span&gt; can be used to set how long CMS will wait for a young space collection before starting initial marking. 
If you want to avoid long initial marking pauses, you should configure this time to be longer than the typical period between young collections in your application.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #666666;"&gt;Remark&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Most of the marking is done concurrently with the application, but it may not be accurate because the application may modify the object graph during marking. When concurrent marking is finished, the garbage collector has to stop the application and repeat marking to be sure that all reachable objects are marked as alive. But the collector doesn’t have to traverse the whole object graph; it has to traverse only the references modified since the start of marking (actually since the start of the pre clean phase). The card table (see card marking write barrier) is used to identify modified portions of memory in old space, but thread stacks and young space have to be scanned once again.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&amp;nbsp;Usually most of the remark phase is spent scanning young space. This time will be much shorter if we collect garbage in young space before starting remark. We can instruct the JVM to always force a young space collection before CMS remark. 
Use the JVM parameter &lt;span style="font-family: 'Courier New';"&gt;-XX:+CMSScavengeBeforeRemark&lt;/span&gt; to enable this option.&lt;/div&gt;&lt;div class="MsoNormal"&gt;Even if young space is empty, the remark phase still has to scan through modified references in old space. This usually takes about as long as a normal young collection pause (because the scanning of old space done during a young collection is similar to the scanning required for remark).&amp;nbsp;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;When does CMS collection start?&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Unlike stop-the-world old space collectors, a CMS collection cycle has to start before old space becomes full. A CMS collection is triggered when the amount of free memory in old space falls below a certain threshold (this threshold can be chosen by the JVM based on runtime statistics or set via parameters), and the actual start of the CMS collection cycle may be delayed until the next young collection.&lt;/div&gt;&lt;div class="MsoNormal"&gt;Normally objects are allocated in old space only during a young space collection (which may promote some objects to old space). So a CMS cycle usually starts right after a young space collection, which is good because the initial mark pause will be very short.&lt;/div&gt;&lt;span style="font-size: 11pt; line-height: 115%;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;But in certain cases an object may be allocated directly in old space, and a CMS cycle could start while Eden holds lots of objects. In this case initial mark can be 10-100 times slower. Usually this happens due to the allocation of very large objects (arrays of a few megabytes). 
&amp;nbsp;To avoid these long pauses you should configure a reasonable &lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: 'Courier New'; font-size: 11pt; line-height: 115%;"&gt;-XX:CMSWaitDuration&lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="color: #666666; font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b&gt;&lt;i&gt;Configuring fixed threshold for CMS start&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="line-height: 17px;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;You can set a fixed old space occupancy threshold for triggering the CMS cycle by using the JVM options&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt; &lt;span style="font-family: 'Courier New';"&gt;-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70&lt;/span&gt; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;(this will force the CMS cycle to start when more than 70% of old space is used).&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="color: #666666; font-size: 15px;"&gt;&lt;i&gt;&lt;b&gt;Explicitly invoking CMS cycle&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div 
class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;You can also configure the JVM to start a CMS cycle when System.gc() is invoked, by using the &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:+ExplicitGCInvokesConcurrent&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;command line option.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;b&gt;Full GC with CMS&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="font-size: 15px;"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;If CMS cannot free enough memory in old space, the JVM may fall back to the compacting collector. The compacting collector forces a stop-the-world pause, so it can be considered an emergency case. Normally you would like to avoid full GC and the long stop-the-world pause associated with it. Full GC may happen either if CMS is not fast enough at dealing with garbage (or the collection cycle was started too late) or due to fragmentation of old space (there is no contiguous free space large enough for the object being allocated). 
It is also possible that you just didn’t give the JVM enough memory, and after full GC it will throw &lt;span style="font-family: 'Courier New';"&gt;OutOfMemoryError&lt;/span&gt; anyway.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #666666;"&gt;Permanent generation collection&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;One of the reasons why CMS may end up in full GC is garbage in permanent space. By default CMS does not reclaim unused space in permanent space. If your application uses multiple class loaders and/or reflection, you may need to enable garbage collection in permanent space. The JVM option &lt;span style="font-family: 'Courier New';"&gt;-XX:+CMSClassUnloadingEnabled&lt;/span&gt; allows the CMS collector to clean permanent space. Remember that objects in permanent space may hold references into normal old space; thus, even if permanent space itself is not full, references from perm to old space may prevent CMS from reclaiming some dead objects if class unloading is not enabled.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span class="Apple-style-span" style="color: #666666;"&gt;&lt;i&gt;&lt;b&gt;Utilizing multiple cores&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;CMS has multiple phases. 
Some of them are concurrent; others are stop-the-world pauses, but their work may be executed in parallel to shorten the application freeze time.&lt;/span&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;  &lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:+CMSConcurrentMTEnabled&lt;/span&gt; – allows CMS to use multiple cores for the concurrent phases.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:ConcGCThreads=&amp;lt;n&amp;gt;&lt;/span&gt; – specifies the number of threads for the concurrent phases.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:ParallelGCThreads=&amp;lt;n&amp;gt;&lt;/span&gt; – specifies the number of threads for parallel work during stop-the-world pauses (by default it is equal to the number of physical cores).&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:+UseParNewGC&lt;/span&gt; – instructs the JVM to use the parallel collector for young space collections in conjunction with CMS.&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #0b5394;"&gt;&lt;b&gt;Coming soon:&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;i&gt;&lt;span 
class="Apple-style-span" style="color: #0b5394;"&gt;&lt;b&gt;Tuning CMS collector for large IMDG storage nodes&lt;/b&gt;&lt;/span&gt;&lt;/i&gt;&lt;/span&gt;&lt;/li&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt; &lt;/span&gt;&lt;/ul&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/6808317930895353577/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6808317930895353577" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6808317930895353577" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6808317930895353577" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/hvUGysYmAFM/understanding-gc-pauses-in-jvm-hotspots_02.html" title="Understanding GC pauses in JVM, HotSpot's CMS collector." 
/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><thr:total>2</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-7584924187610960896</id><published>2011-06-01T03:04:00.000-07:00</published><updated>2011-06-06T04:31:36.823-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="Java" /><category scheme="http://www.blogger.com/atom/ns#" term="~Alexey Ragozin" /><title type="text">Understanding GC pauses in JVM, HotSpot's minor GC.</title><content type="html">&lt;div dir="ltr" style="text-align: left;" trbidi="on"&gt;Stop-the-world pauses of the JVM caused by the garbage collector are known foes of Java-based applications. HotSpot JVM has a set of very advanced and tunable garbage collectors, but to find an optimal configuration it is very important to understand the exact mechanics of the garbage collection algorithms. This article is the first in a series explaining how exactly GC spends our precious CPU cycles during stop-the-world pauses. The algorithm for young space garbage collection in HotSpot is explained in this post.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Structure of heap&lt;/b&gt;&lt;br /&gt;Most modern GCs are generational. That means that the Java heap is separated into a few "spaces". Spaces are usually distinguished by the “age” of their resident objects. Objects are allocated in young space, then eventually copied to old (or tenured) space if they survive long enough. This principle is based on the hypothesis that most objects “die young”, i.e. the majority of objects become garbage shortly after being allocated. 
All HotSpot garbage collectors separate memory into 5 spaces (though for the G1 collector, spaces may not be contiguous).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-NwrMpaA2Wf8/TeYGgmdzLXI/AAAAAAAAI0c/GeBQSgDhXQE/s1600/blog-6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-NwrMpaA2Wf8/TeYGgmdzLXI/AAAAAAAAI0c/GeBQSgDhXQE/s1600/blog-6.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Eden is the space where objects are allocated,&lt;br /&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Survivor spaces are used to receive objects during young (or minor) GC,&lt;br /&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Tenured space is for long-lived objects,&lt;br /&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Permanent space is for the JVM's own objects (like classes and JITed code); it behaves just like tenured space, so we will ignore it for the rest of the article.&lt;br /&gt;Eden and the 2 survivor spaces together are called young space.&lt;br /&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;HotSpot GC algorithms overview&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;HotSpot JVM implements a few GC algorithms, which are combined into a few possible GC profiles.&amp;nbsp;&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Serial generational collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseSerialGC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Parallel for young space, serial for old space generational collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', 
Courier, monospace;"&gt;-XX:+UseParallelGC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Parallel young and old space generational collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseParallelOldGC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Concurrent mark sweep with serial young space collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseConcMarkSweepGC&lt;/span&gt;&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:-UseParNewGC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Concurrent mark sweep with parallel young space collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseConcMarkSweepGC&lt;/span&gt; &lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseParNewGC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;G1 garbage collector (&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+UseG1GC&lt;/span&gt;).&lt;/div&gt;&lt;div&gt;All profiles except G1 use almost the same young space collection algorithm (with serial vs. parallel variations).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Write barrier&lt;/b&gt;&lt;/div&gt;&lt;div&gt;The key point of generational GC is that it does not need to collect the entire heap each time, but just a portion of it (e.g. young space). To achieve this, however, the JVM has to implement special machinery called a “write barrier”. 
There are 2 types of write barriers implemented in HotSpot: dirty cards and snapshot-at-the-beginning (SATB). The SATB write barrier is used by the G1 algorithm (which is not covered in this article). All other algorithms use dirty cards.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="color: #444444;"&gt;Dirty cards write barrier&lt;/span&gt;&lt;/b&gt;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;The principle of the dirty card write barrier is very simple. Each time the program modifies a reference in memory, it should mark the modified memory page as dirty. There is a special card table in the JVM, and each 512-byte page of memory has an associated byte in the card table.&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-4BSXuryn7Ss/TeYHvOqwltI/AAAAAAAAI0g/kfH9vN0KgrI/s1600/blog-7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-4BSXuryn7Ss/TeYHvOqwltI/AAAAAAAAI0g/kfH9vN0KgrI/s1600/blog-7.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;b&gt;Young space collection algorithm&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;Almost all new objects (there are a few exceptions where a new object can be allocated directly in old space) are allocated in Eden space. To be more efficient, HotSpot uses thread-local allocation buffers (TLABs) for allocation of new objects, but TLABs themselves are allocated in Eden. Once Eden becomes full, a minor GC is triggered. The goal of a minor GC is to clear fresh garbage in Eden space. A copy-collection algorithm is used (live objects are copied to another space, and then the whole space is marked as free memory). But before it starts collecting live objects, the JVM must find all root references. 
Root references for a minor GC are references from stacks and all references from old space into young space.&lt;/div&gt;&lt;div&gt;Normally, collecting all references from old space would require scanning through all objects in old space. That is why we need the write barrier. All objects in young space have been created (or relocated) since the last reset of the write barrier, so non-dirty pages cannot hold references into young space. This means we only need to scan objects in dirty pages.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-JeoPvpns3II/TeYILJO5mnI/AAAAAAAAI0k/rIuloiOSZ-I/s1600/blog-8.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-JeoPvpns3II/TeYILJO5mnI/AAAAAAAAI0k/rIuloiOSZ-I/s1600/blog-8.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once the initial reference set is collected, dirty cards are reset and the JVM starts copying live objects from Eden and one of the survivor spaces into the other survivor space. The JVM only needs to spend time on live objects. 
Relocating an object also requires updating the references pointing to it.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-ooT_iuZ694I/TeYIcAQrbgI/AAAAAAAAI0o/x4dy6cb2dyA/s1600/blog-9.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/-ooT_iuZ694I/TeYIcAQrbgI/AAAAAAAAI0o/x4dy6cb2dyA/s1600/blog-9.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;Finally, we have Eden and one survivor space clean (and ready for allocation) and one survivor space filled with objects.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #444444;"&gt;Object promotion&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;If an object is not cleared during young GC, it will eventually be copied (promoted) to old space. Promotion occurs in the following situations:&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:+AlwaysTenure&lt;/span&gt; &amp;nbsp;makes the JVM promote objects directly to old space instead of survivor &amp;nbsp;space (survivor spaces are not used in this case).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Once survivor space is full, all remaining live objects are relocated directly to old space.&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;If an object has survived a certain number of young space collections, it will be promoted to old space (the required number of collections can be adjusted using the &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:MaxTenuringThreshold&lt;/span&gt; and &lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, 
monospace;"&gt;-XX:TargetSurvivorRatio&lt;/span&gt; JVM options).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: #444444;"&gt;&lt;i&gt;&lt;b&gt;Allocation of new objects in old space&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;It would be beneficial if we could allocate long-lived objects directly in old space. Unfortunately, there is no way to instruct the JVM to do this for a particular object. But there are a few cases when an object can be allocated directly in old space.&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;Option &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:PretenureSizeThreshold=&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&amp;nbsp;instructs the JVM that all objects larger &amp;nbsp;than &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&amp;nbsp;bytes should be allocated directly in old space (though if the object fits in a TLAB, the JVM will allocate it in the TLAB and thus in young space, so you should also limit TLAB size).&lt;/div&gt;&lt;div&gt;•&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;If an object is larger than Eden space, it will also be allocated in old space.&lt;/div&gt;&lt;div&gt;Unlike application objects, system objects are always allocated by the JVM directly in permanent space.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="color: #444444;"&gt;Parallel execution&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;Most tasks during young space collection can be done in parallel. If several CPUs are available, the JVM can use them to shorten the stop-the-world pause during collection. 
The number of threads can be configured in HotSpot JVM with the &lt;span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"&gt;-XX:ParallelGCThreads=&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&amp;nbsp;parameter. By default, the JVM chooses the number of threads based on the number of available CPUs. As expected, the serial version of the collector ignores this parameter because it can use only one CPU. Using parallel collection can reduce the stop-the-world pause by a factor approaching the number of physical cores.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Measuring stop-the-world pause for young collection&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Young space collection happens during a stop-the-world pause (all non-GC-related threads in the JVM are suspended). The wall clock time of a stop-the-world pause is a very important factor for applications (especially applications requiring fast response times). Parallel execution affects the wall clock time of the pause, but not the amount of work to be done.&amp;nbsp;&lt;/div&gt;&lt;div&gt;Let’s summarize the components of a young GC pause. 
Total pause time can be written as:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div class="MsoNormal"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;young&lt;/sub&gt; = T&lt;sub&gt;stack_scan&lt;/sub&gt; +&amp;nbsp;&lt;/span&gt;&lt;/i&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 21px;"&gt;T&lt;sub&gt;card_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;&amp;nbsp;+ T&lt;sub&gt;old_scan&lt;/sub&gt; + T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;; where &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;young&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;is the total time of the young GC pause, &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;stack_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;is the time to scan roots in stacks, &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;card_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;is the time to scan the card table, &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;old_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt;&amp;nbsp; &lt;/span&gt;is the time to scan roots in old space and &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; is the time to copy live objects 
&lt;span style="font-size: 14pt; line-height: 115%;"&gt;(1)&lt;/span&gt;.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Thread stacks are usually very small, so the major factors affecting the time of young GC are &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;old_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;and &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;.&lt;/div&gt;&lt;div class="MsoNormal"&gt;Another important parameter is the frequency of young GC. The period between young collections is mainly determined by the application's allocation rate (bytes per second) and the size of Eden space.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;P&lt;sub&gt;young&lt;/sub&gt; = S&lt;sub&gt;eden&lt;/sub&gt; / R&lt;sub&gt;alloc&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;sub&gt;&lt;span style="font-size: 14pt; line-height: 115%;"&gt; &lt;/span&gt;&lt;/sub&gt;; where &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;P&lt;sub&gt;young&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&amp;nbsp; is the period between young GCs, &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;S&lt;sub&gt;eden&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; is the size of Eden and &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 
115%;"&gt;R&lt;sub&gt;alloc&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; is rate of memory allocations (bytes per second) &lt;span style="font-size: 14pt; line-height: 115%;"&gt;(2)&lt;/span&gt;.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;stack_scan &lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;– can be considered application specific constant.&lt;br /&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-ansi-language: EN-US; mso-bidi-font-size: 11.0pt; mso-bidi-language: EN-US; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-fareast-language: EN-US;"&gt;T&lt;sub&gt;card_scan &lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: &amp;quot;Calibri&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 11.0pt; line-height: 115%; mso-ansi-language: EN-US; mso-bidi-font-family: &amp;quot;Times New Roman&amp;quot;; mso-bidi-language: EN-US; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-fareast-language: EN-US;"&gt;– is proportional to size of old space. Literally, JVM have to check single byte of table for each 512 bytes of heap (e.g. 
8 GB of heap -&amp;gt; 16 MB of card table to scan).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;old_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 14pt; line-height: 115%;"&gt;&amp;nbsp; &lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;– is proportional to the number of dirty cards in old space at the moment of young GC. If we assume that references to young space are distributed randomly in old space, then we can provide the following formula for the time of old space scanning.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;a href="http://1.bp.blogspot.com/-D9UacWPh_x4/TeYKoHMc-EI/AAAAAAAAI0s/3YWMMlhNOjA/s1600/blog-10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/-D9UacWPh_x4/TeYKoHMc-EI/AAAAAAAAI0s/3YWMMlhNOjA/s1600/blog-10.png" /&gt;&lt;/a&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;; where &lt;/span&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;S&lt;sub&gt;old&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; is the size of old space and &lt;/span&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;D&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;, &lt;/span&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 
14pt; line-height: 115%;"&gt;k&lt;sub&gt;card&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; and &lt;/span&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;n&lt;sub&gt;card&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; are coefficients specific for application &lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 14pt; line-height: 115%;"&gt;(3)&lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;copy &lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;– is proportional to number of live objects in heap. 
We can approximate it by formula:&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-3YjJgH1l3Gc/TeYK9r5iNPI/AAAAAAAAI0w/nYXtPAliYys/s1600/blog-11.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-3YjJgH1l3Gc/TeYK9r5iNPI/AAAAAAAAI0w/nYXtPAliYys/s1600/blog-11.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;There &lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;sub&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; &lt;/span&gt;&lt;/sub&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;is effort to copy object, &lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;R&lt;sub&gt;long_live&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; is rate of allocation of long lived objects,&lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif; font-size: 15px; line-height: 17px;"&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 21px;"&gt;k&lt;sub&gt;survive&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;/span&gt;&lt;span class="Apple-style-span" 
style="font-family: Calibri, sans-serif; font-size: 15px; line-height: 17px;"&gt;&amp;nbsp;=&amp;nbsp;&lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 21px;"&gt;R&lt;sub&gt;long_live&amp;nbsp;&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif; font-size: 15px; line-height: 17px;"&gt;/&amp;nbsp;&lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 21px;"&gt;R&lt;sub&gt;alloc&amp;nbsp;&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif; font-size: 15px; line-height: 17px;"&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;(&lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;survive&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; usually very small), and &lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;tenure&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; is a coefficient to approximate aging of object in young space before tenuring (&lt;/span&gt;&lt;i&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;tenure&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt; ≥ 1) &lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 14pt; line-height: 115%;"&gt;(4)&lt;/span&gt;&lt;span style="font-family: Calibri, sans-serif; font-size: 11pt; line-height: 115%;"&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" 
style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;Now we can analyze how various JVM options may affect the time and frequency of young GC.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;b style="font-weight: bold;"&gt;Size of old space.&lt;/b&gt; The size of old space affects the &lt;i&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"&gt;T&lt;sub&gt;card_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; and &lt;i&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"&gt;T&lt;sub&gt;old_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; parts of young GC pause time, according to the formulas above. So as we increase the size of old space (read: total heap size), the time of young GC pauses will grow, and it cannot be helped. 
After a certain heap size (usually 4-8 GB), the time of young collection is dominated by &lt;i&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"&gt;T&lt;sub&gt;card_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; (technically &lt;i&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"&gt;T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; can be even greater than &lt;i&gt;&lt;span style="font-family: &amp;quot;Times New Roman&amp;quot;,&amp;quot;serif&amp;quot;; font-size: 14.0pt; line-height: 115%; mso-bidi-font-size: 11.0pt;"&gt;T&lt;sub&gt;card_scan&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;, but it can usually be controlled by tuning GC options).&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;HotSpot JVM options: &lt;/u&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;-Xmx&lt;i&gt;n&lt;/i&gt;, -Xms&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div 
class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Size of Eden space.&lt;/b&gt; The period between young GCs is proportional to the size of Eden. &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; is also proportional to the size of Eden, but in practice &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;survive&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; can be so small that for some applications we can forget about &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;copy&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;. Unfortunately, the time between young GCs also affects the coefficient &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;D&lt;/span&gt;&lt;/i&gt; in equation (3). 
Though the dependency between &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;D&lt;/span&gt;&lt;/i&gt; and &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;P&lt;sub&gt;young&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; is very application-specific, increasing &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;P&lt;sub&gt;young&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; will increase &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;D&lt;/span&gt;&lt;/i&gt; and, as a consequence, &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;T&lt;sub&gt;scan_old&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;HotSpot JVM options: &lt;/u&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:NewSize=&lt;i&gt;n&lt;/i&gt;, -XX:MaxNewSize=&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;br 
/&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Size of survivor space.&lt;/b&gt; The size of survivor space puts a hard limit on how many objects can stay in young space between collections. Changing the size of survivor space may affect &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;tenure &lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;(or may not, e.g. if &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;tenure &lt;/sub&gt;&lt;/span&gt;&lt;/i&gt;is already 1).&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;HotSpot JVM options: &lt;/u&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:SurvivorRatio=&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span 
class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Max tenuring threshold &lt;/b&gt;and&lt;b style="mso-bidi-font-weight: normal;"&gt; target survivor ratio.&lt;/b&gt; These two JVM options also make it possible to adjust &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;tenure&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; artificially.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;HotSpot JVM options: &lt;/u&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:TargetSurvivorRatio=&lt;i&gt;n&lt;/i&gt;, -XX:MaxTenuringThreshold=&lt;i&gt;n&lt;/i&gt; , -XX:+AlwaysTenure, -XX:+NeverTenure&lt;/span&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;b style="mso-bidi-font-weight: normal;"&gt;Pretenuring threshold.&lt;/b&gt; For some applications, using a pretenuring threshold could reduce &lt;i style="mso-bidi-font-style: normal;"&gt;&lt;span style="font-family: 'Times New Roman', 
serif; font-size: 14pt; line-height: 115%;"&gt;k&lt;sub&gt;survive&lt;/sub&gt;&lt;/span&gt;&lt;/i&gt; by allocating long-lived objects directly in old space.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;HotSpot JVM options: &lt;/u&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;-XX:PretenureSizeThreshold=&lt;i&gt;n&lt;/i&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;u&gt;&lt;span style="font-family: 'Courier New';"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;span style="font-family: 'Courier New';"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;span style="font-family: inherit;"&gt;&lt;i&gt;(Next article "&lt;a href="http://blog.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots_02.html"&gt;Understanding GC pauses in JVM, HotSpot's CMS collector.&lt;/a&gt;")&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Calibri, sans-serif;"&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;span 
style="font-family: 'Courier New';"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;span style="color: #0b5394; font-family: inherit;"&gt;&lt;i&gt;&lt;b&gt;Coming soon:&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul style="text-align: left;"&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="font-size: 15px; line-height: 17px;"&gt;&lt;i&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="color: #0b5394; font-family: inherit;"&gt;Tuning CMS collector for large IMDG storage nodes&lt;/span&gt;&lt;/b&gt;&lt;/i&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/7584924187610960896/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=7584924187610960896" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7584924187610960896" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/7584924187610960896" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/KXTdeFZXyGc/understanding-gc-pauses-in-jvm-hotspots.html" title="Understanding GC pauses in JVM, HotSpot's minor GC." 
/><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/-NwrMpaA2Wf8/TeYGgmdzLXI/AAAAAAAAI0c/GeBQSgDhXQE/s72-c/blog-6.png" height="72" width="72" /><thr:total>7</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/06/understanding-gc-pauses-in-jvm-hotspots.html</feedburner:origLink></entry><entry><id>tag:blogger.com,1999:blog-3946011063058389308.post-6101106840079275370</id><published>2011-05-31T02:18:00.000-07:00</published><updated>2011-05-31T09:06:02.188-07:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="cloud computing" /><category scheme="http://www.blogger.com/atom/ns#" term="openstack" /><category scheme="http://www.blogger.com/atom/ns#" term="scalability" /><category scheme="http://www.blogger.com/atom/ns#" term="~Andrey Brindeyev" /><title type="text">Running 200 VM instances on OpenStack Compute</title><content type="html">&lt;h2&gt;Introduction&lt;/h2&gt;&lt;p&gt;&lt;a href="http://openstack.org/projects/compute/"&gt;OpenStack Compute&lt;/a&gt; is an exciting piece of software. It is open, it is evolving rapidly and the community is terrific. However, as with any software that is new to the market, there’s little understanding of its performance and little hard data on it. Its scalability, for instance, is something that is only known through anecdotes around the watercooler. This post tells a story of a “baby cloud” running in one of our labs.&lt;/p&gt;&lt;p&gt;Our goal is to see how much we can stretch the software until it rips. In this installment, we’ll start with a very basic test: launching 200 virtual machines in parallel.&lt;/p&gt;&lt;h2&gt;Setup and first steps&lt;/h2&gt;&lt;p&gt;The test stand consists of three blades (two X5570 CPUs and 48 GB of RAM each). 
All blades are diskless and SAN-backed. The exact hardware configuration is not important, since the performance of individual VMs is not relevant for this kind of test.&lt;/p&gt;&lt;p&gt;The test image is a stripped-down installation of RHEL 5.6, packed as QCOW2 (153 MB).&lt;/p&gt;&lt;p&gt;All the tests were performed using the KVM-based RHEL 6.0 &lt;a href="http://yum.griddynamics.net/"&gt;version of OpenStack Compute&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Here is our /etc/nova/nova.conf for the cloud controller node: &lt;script src="https://gist.github.com/996182.js?file=nova.conf.sh"&gt;&lt;/script&gt;&lt;/p&gt;&lt;p&gt;We used the following script to run the VMs: &lt;script src="https://gist.github.com/996182.js?file=run200.sh"&gt;&lt;/script&gt;&lt;/p&gt;&lt;p&gt;We started with trunk build 1058 of OpenStack Compute.&lt;/p&gt;&lt;h2&gt;First try&lt;/h2&gt;&lt;p&gt;After firing up the script for the first time, we noticed that 3-5% of all machines were not coming up, that is, they were showing as down when listing instances. This was not completely unexpected, yet still discouraging. Having ten machines out of two hundred failing on the spot is not a good thing.&lt;/p&gt;&lt;p&gt;The investigation showed that connections to Rabbit were frequently timing out. It turned out that we had stumbled upon a known bug in the eventlet library: &lt;a href="https://bitbucket.org/which_linden/eventlet/issue/87/socket-connects-are-incorrectly-reported"&gt;socket connects are incorrectly reported as timed out&lt;/a&gt;. A patch to eventlet and &lt;a href="https://code.launchpad.net/~cbehrens/nova/rpc-improvements"&gt;improvements in RPC code&lt;/a&gt; helped us eliminate the Rabbit timeout issue and bring the error rate down significantly. Kudos to Chris Behrens and Soren Hansen for the solution.&lt;/p&gt;&lt;h2&gt;Second try&lt;/h2&gt;&lt;p&gt;The error rate went down, but it was still above zero. It looked like we had stumbled on a race condition. 
Logs showed traces of exceptions about releasing locks. It turned out that the problem was in synchronization while creating VLANs. Simply adding a lock on the VLAN creation (&lt;a href="http://bazaar.launchpad.net/~hudson-openstack/nova/trunk/revision/1097"&gt;bzr1097&lt;/a&gt;) made the problem go away. Kudos to Vish Ishaya.&lt;/p&gt;&lt;h2&gt;Third try&lt;/h2&gt;&lt;p&gt;Now all instances show up as running, but are they actually usable?&lt;/p&gt;&lt;p&gt;“Usable” may mean many things, depending on how thorough you are. We have some pretty sophisticated test suites that can verify if a cloud instance is healthy for most practical purposes. But in this case, simply pinging the VMs from outside to see if they responded was sufficiently informative.&lt;/p&gt;&lt;p&gt;The result was somewhat unexpected and bizarre: we could ping only 150 instances on the network while the API indicated that all 200 were running. Run 150 instances… all 150 are responding. 180… 150 are responding. 151… again only 150 are responding. It looked like we had stumbled on some sort of limit.&lt;/p&gt;&lt;p&gt;Investigation showed that the problem was in dnsmasq, which has a default DHCP lease cap of 150. It took some time to figure this one out, but it was very straightforward to fix: &lt;a href="http://bazaar.launchpad.net/~hudson-openstack/nova/trunk/revision/1100"&gt;bzr1100&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Finally, the two hundred nodes were able to run without a hitch. Here’s one for the team!&lt;/p&gt;&lt;h2&gt;Some random data&lt;/h2&gt;&lt;p&gt;For those who want more data, here are a couple of pretty graphs.&lt;/p&gt;&lt;p&gt;We ran two different configurations: one with two compute nodes, another with three (the third one being co-located with the cloud controller).&lt;/p&gt;&lt;p&gt;&lt;em&gt;Fig 1. 
Two compute nodes, dedicated cloud controller.&lt;/em&gt;&lt;a href="http://1.bp.blogspot.com/-TLeBzRVFlik/TeAVcH_WAmI/AAAAAAAABOc/8BgAwp8w2KY/s1600/two_nodes.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="display: block;cursor:pointer; cursor:hand;width: 320px; height: 178px;" src="http://1.bp.blogspot.com/-TLeBzRVFlik/TeAVcH_WAmI/AAAAAAAABOc/8BgAwp8w2KY/s320/two_nodes.png" border="0" id="BLOGGER_PHOTO_ID_5611508708628890210" /&gt;&lt;/a&gt;Total startup time: 12 minutes 6 seconds.&lt;/p&gt;&lt;p&gt;&lt;em&gt;Fig 2. Three compute nodes, co-located cloud controller.&lt;/em&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/-dT7SNGLxnVE/TeAVvJcQcZI/AAAAAAAABOk/QTLJNs4zpbY/s1600/three_nodes.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="display: block;cursor:pointer; cursor:hand;width: 320px; height: 178px;" src="http://3.bp.blogspot.com/-dT7SNGLxnVE/TeAVvJcQcZI/AAAAAAAABOk/QTLJNs4zpbY/s320/three_nodes.png" border="0" id="BLOGGER_PHOTO_ID_5611509035436110226" /&gt;&lt;/a&gt;&lt;br /&gt;Total startup time: 8 minutes 20 seconds.&lt;/p&gt;&lt;p&gt;Though this is hardly conclusive, co-locating the cloud controller with one of the compute nodes is not necessarily a bad idea, at least from a performance standpoint.&lt;/p&gt;&lt;h2&gt;Some random observations&lt;/h2&gt;&lt;ol&gt;&lt;li&gt;Final success rate is 100%, reproducible;&lt;/li&gt;&lt;li&gt;Most of the startup time is spent injecting SSH keys through libguestfs;&lt;/li&gt;&lt;li&gt;To run one hundred VMs per host from the same image, only 24 GB of RAM and 3.4 GB in /var/lib/nova are needed. Thanks to &lt;a href="http://www.linux-kvm.org/page/KSM"&gt;KSM&lt;/a&gt; and QCOW2!&lt;/li&gt;&lt;/ol&gt;&lt;h2&gt;Next steps&lt;/h2&gt;&lt;p&gt;Obviously, we are not settling for two hundred VMs running in parallel. This is still very much a work-in-progress and we’re raising the bar higher every time. 
Stay tuned for the next report from the trenches.&lt;/p&gt;&lt;h2&gt;P.S.&lt;/h2&gt;&lt;p&gt;By the time this post was written, we were able to go past 400 VMs. The team is working around another race condition as we speak.&lt;/p&gt;</content><link rel="replies" type="application/atom+xml" href="http://blog-archive.griddynamics.com/feeds/6101106840079275370/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://www.blogger.com/comment.g?blogID=3946011063058389308&amp;postID=6101106840079275370" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6101106840079275370" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/3946011063058389308/posts/default/6101106840079275370" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/griddynamics/~3/M-Ni1goYrr0/running-200-vm-instances-on-openstack.html" title="Running 200 VM instances on OpenStack Compute" /><author><name>Unknown</name><email>noreply@blogger.com</email><gd:image rel="http://schemas.google.com/g/2005#thumbnail" width="16" height="16" src="https://img1.blogblog.com/img/b16-rounded.gif" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/-TLeBzRVFlik/TeAVcH_WAmI/AAAAAAAABOc/8BgAwp8w2KY/s72-c/two_nodes.png" height="72" width="72" /><thr:total>7</thr:total><feedburner:origLink>http://blog-archive.griddynamics.com/2011/05/running-200-vm-instances-on-openstack.html</feedburner:origLink></entry></feed>
