<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>I'd rather play golf</title>
	
	<link>http://iablog.sybase.com/paulley</link>
	<description>Thoughts on data management, autonomic computing, and self-managing database systems.</description>
	<lastBuildDate>Mon, 02 Nov 2009 15:57:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/IdRatherPlayGolf" type="application/rss+xml" /><feedburner:emailServiceId>IdRatherPlayGolf</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
		<title>Holistic approaches to query performance analysis</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/76xMOrRCoz0/</link>
		<comments>http://iablog.sybase.com/paulley/2009/11/holistic-approaches-to-query-performance-analysis/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 12:24:11 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[Cost models]]></category>
		<category><![CDATA[DBA Administration Tools]]></category>
		<category><![CDATA[Database Administration]]></category>
		<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[Self-managing database systems]]></category>
		<category><![CDATA[adaptive query execution]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[disk performance]]></category>
		<category><![CDATA[disk subsystems]]></category>
		<category><![CDATA[Goetz Graefe]]></category>
		<category><![CDATA[graphical plans]]></category>
		<category><![CDATA[performance analysis]]></category>
		<category><![CDATA[Query optimization]]></category>
		<category><![CDATA[Shivnath Babu]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1417</guid>
		<description><![CDATA[Despite our efforts at making relational database systems such as SQL Anywhere self-managing, self-tuning, and self-healing, there remains the need to be able to diagnose and repair performance problems. In part, this requirement is due to the overall complexity of the optimization task. Query optimization is &#8211; still &#8211; an NP-hard problem and the input [...]]]></description>
			<content:encoded><![CDATA[<p>Despite our efforts at making relational database systems such as <a alt="SQL Anywhere embeddable self-managing DBMS" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a> self-managing, self-tuning, and self-healing, there remains the need to be able to diagnose and repair performance problems. In part, this requirement is due to the overall complexity of the optimization task. Query optimization is &#8211; still &#8211; an NP-hard problem and the input to the optimization process includes various heuristics, particularly predicate selectivity estimation. However, performance problems are also due to the increasingly complex hardware environment that characterize computing today. In this article, I want to highlight two recent papers that attempt to diagnose problems of this kind. </p>
<p><span id="more-1417"></span>The first two articles [1,2] are work jointly authored by researchers at Duke University and IBM Almaden. The papers describe <strong>DIADS</strong>, a prototype performance diagnostic tool designed to assist in the diagnosis of performance problems when a database server &#8211; the authors use Postgres as a test system &#8211; utilizes a SAN (Storage Area Network) for its disk resources. The problem with a SAN is that it is a complex, independent system consisting of logical units of disk storage ( <em>pools</em> or <em>volumes</em> ) which typically services multiple application/database servers simultaneously. All too often SAN administration is done independently, forcing DBAs to treat a SAN as a &#8220;black box&#8221;.</p>
<p>Borisov et al. developed a prototype diagnostic system called DIADS which uses <em>annotated plan graphs</em> to illustrate the use of SAN resources with specific query plan operators. Moreover, DIADS uses configuration dependency analysis and <em>symptom signatures</em> to drill down in the detail of specific annotated plans. DIADS has the ability to compare access plans for the same query, and computes metrics based on the deviation from the mean times for each operator. SAN monitoring data, including physical and logical configuration details, component connectivity, configuration changes over time, and DBA-defined events are included in annotated query plans, enabling DBAs to analyze the actual performance characteristics of specific plan operators together with detailed SAN statistics in a single display. DIADS contains a knowledge base as well, enabling expert-system analysis of plan operator performance degradation through the tracking of correlated and dependent plan operators, correlated operator cardinalities, and the inclusion of a <em>symptoms database</em> to help in the analysis of <em>cause</em> versus <em>effect</em>.</p>
<p>The second work is by Goetz Graefe, Harumi Kuno, and Janet Wiener of HP Labs. In their work, the authors study the problem of determining the level of robustness in a query execution engine &#8211; that is, the ability of a server to deliver consistent performance across a variety of unexpected run-time conditions: for example, errors in cardinality estimation or resource contention. The authors argue that query execution robustness is as important as the underlying fundamentals of the query operators themselves &#8211; and I couldn&#8217;t agree more. </p>
<p>The authors&#8217; approach is to describe the robustness of a plan operator visually using <em>plan robustness maps</em>. These two- or three-dimensional graphs can then be used to reason about how particular execution strategies degrade as the amount of work increases or as system resources become constrained:</p>
<blockquote><p>
Reflecting on the visualization techniques employed here, these diagrams enable rapid verification of expected performance, testing of hypotheses, and insight into the absolute and relative performance of alternative query execution plans. Moreover, even for this very simple query, there is a plethora of query execution plans. Investigating many plans over a parameter space with multiple dimensions is possible only with efficient visualizations.
</p></blockquote>
<p>This work provides an interesting perspective of query execution performance that compliments other work that addresses optimization quality, or dynamic re-optimization of SQL requests on-the-fly: that is, the ability of the system&#8217;s optimizer to find the optimal plan for a specific set of system parameters.</p>
<p>[1] Nedyalko Borisov, Shivnath Babu, Sandeep Uttamchandani, Ramani Routray, and Aameek Singh (January 2009). <a href="http://arxiv.org/PS_cache/arxiv/pdf/0907/0907.3183v1.pdf">Why did my query slow down?</a> In Proceedings, 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California.</p>
<p>[2] Nedyalko Borisov, Shivnath Babu, Sandeep Uttamchandani, Ramani Routray, and Aameek Singh (February 2009). <a href="http://www.usenix.org/events/fast09/tech/full_papers/babu/babu.pdf">DIADS: Addressing the “My-Problem-or-Yours” Syndrome with Integrated SAN and Database Diagnosis</a>. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST&#8217;09), San Francisco, California.</p>
<p>[3] Goetz Graefe, Harumi Kuno, and Janet L. Wiener (January 2009). <a href="http://www-db.cs.wisc.edu/cidr/cidr2009/Paper_82.pdf">Visualizing the robustness of query execution</a>. In Proceedings, 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/76xMOrRCoz0" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/11/holistic-approaches-to-query-performance-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/11/holistic-approaches-to-query-performance-analysis/</feedburner:origLink></item>
		<item>
		<title>Disk failures in the real world</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/dfqtavshc-c/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/disk-failures-in-the-real-world/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 01:47:09 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[Database Administration]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Operating systems]]></category>
		<category><![CDATA[Andrea Arpaci-Dusseau]]></category>
		<category><![CDATA[data corruption]]></category>
		<category><![CDATA[data reliability]]></category>
		<category><![CDATA[disk errors]]></category>
		<category><![CDATA[disk performance]]></category>
		<category><![CDATA[disk subsystems]]></category>
		<category><![CDATA[file system corruption]]></category>
		<category><![CDATA[file systems]]></category>
		<category><![CDATA[Jeffrey F. Naughton]]></category>
		<category><![CDATA[MTBF]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Remzi Arpaci-Dusseau]]></category>
		<category><![CDATA[transient disk errors]]></category>
		<category><![CDATA[unrecoverable disk errors]]></category>
		<category><![CDATA[UREs]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1371</guid>
		<description><![CDATA[One thing that the sheer scale of the computing landscape has contributed to the field of Computer Science is the opportunity to study these systems statistically &#8211; and in particular to prove or disprove various aspects of hardware and software reliability. 
With respect to disk drives, several large studies of disk drive reliability [2,3,4,7] have [...]]]></description>
			<content:encoded><![CDATA[<p>One thing that the sheer scale of the computing landscape has contributed to the field of Computer Science is the opportunity to study these systems statistically &#8211; and in particular to prove or disprove various aspects of hardware and software reliability. </p>
<p>With respect to disk drives, several large studies of disk drive reliability [2,3,4,7] have been published in the last few years. In particular, the study done at Google [4] showed a steep increase in failure rates &#8211; to between 6 and 10 percent &#8211; once a drive passed three years of usage, an interesting point since many disk drive manufacturers offer three-year warranties. Their study also showed a lower correlation between heat and drive failure in later-model drives, something that James Hamilton has written <a href="http://perspectives.mvdirona.com/2009/06/15/PUEAndTotalPowerUsageEfficiencyTPUE.aspx">about</a> recently in the push towards using less air conditioning within data centers. Recently at FAST 2009, Alyssa Henry of Amazon [6] spoke in her conference keynote that, at Amazon, the Amazon Simple Storage (EC3) data service sees a hard disk failure rate of 3-5 percent per year across the board, though I am sure, given Google&#8217;s survey results, that Amazon&#8217;s failure experience is not uniformly distributed across all disk drive manufacturers. Iliadis and Hu [3] believe that the trend towards lower-cost magnetic media results in higher failure rates, a <a href="http://iablog.sybase.com/paulley/2008/05/its-11-oclock-do-you-know-where-your-data-is/">conclusion also reached [7] by Remzi Arpaci-Dusseau</a> and his team at the University of Wisconsin in Madison. To some extent, at least, you do get what you pay for.</p>
<p><span id="more-1371"></span>The actual failure rates reported in these studies is vastly different from the reliability metrics offered by disk drive manufacturers. Moreover, disk hardware failure is only part of the story. <a href="http://iablog.sybase.com/paulley/2008/05/its-11-oclock-do-you-know-where-your-data-is/">Previous work</a> by Remzi Arpaci-Dusseau and his research team at Wisconsin found that <strong>transient</strong> errors with magnetic disk media were commonplace. Here is a quote from the <a href="http://www.usenix.org/events/lsf08/tech/lsf08sums.pdf">summary</a> of the Linux Storage &#038; Filesystem Workshop, held in San Jose in February 2008:</p>
<blockquote><p>
Ric Wheeler (<em>aside: now with RedHat</em>) introduced the perennial error-handling topic with the comment that bad sector handling had markedly improved over the “total disaster” it was in 2007. He moved on to silent data corruption and noted that the situation here was improving with data checksumming now being built into filesystems (most notably BTRFS and XFS) and emerging support for T10 DIF. The “forced unmount” topic provoked a lengthy discussion, with James Bottomley claiming that, at least from a block point of view, everything should just work (surprise ejection of USB storage was cited as the example). Ric countered that NFS still doesn’t work and others pointed out that even if block I/O works, the filesystem might still not release the inodes. Ted Ts’o closed the debate by drawing attention to a yet to be presented paper at FAST ’08 showing over <strong>1,300 cases where errors were dropped or lost in the block and filesystem layers</strong>. (<em>emphasis added</em>)
</p></blockquote>
<p>Reference [5] below studies the lack or mis-reporting of both transient and &#8220;hard&#8221; filesystem errors across several filesystems. Here is the first paragraph of the paper&#8217;s abstract:</p>
<blockquote><p>
The reliability of file systems depends in part on how well they propagate errors. We develop a static analysis technique, EDP, that analyzes how file systems and storage device drivers propagate error codes. Running our EDP analysis on all file systems and 3 major storage device drivers in Linux 2.6, we find that errors are often incorrectly propagated; 1153 calls (13%) drop an error code without handling it.
</p></blockquote>
<p>Write caching or out-of-order writes can cause additional problems. The use of EXT3 on Linux systems, in particular, can result in a corrupt filesystem upon a catastrophic hardware failure due to EXT3&#8217;s <a href="http://en.wikipedia.org/wiki/Ext3">lack of support for checksumming when writing to the journal</a> &#8211; which is supported in EXT4.</p>
<p>Arpaci-Dusseau and his research team at Wisconsin have just recently taken this error analysis to the next level [1]. They purposefully and systematically introduced errors into a MySQL database to determine the server&#8217;s ability to recover from the sorts of hard and transient failures known to occur on the filesystems studied previously. Their results, coupled with the sweeping disk failure studies mentioned above, should give all DBAs reason to worry. I would encourage DBAs to review the papers below.</p>
<p>And keep those backups handy.</p>
<p>[1] Sriram Subramanian, Yupu Zhang, Rajiv Vaidyanathan, Haryadi S. Gunawi, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Jeffrey F. Naughton (April 2010). <a href="http://www.cs.wisc.edu/adsl/Publications/corrupt-mysql-icde10.pdf">Impact of Disk Corruption on Open-Source DBMS</a>. In Proceedings, 2010 IEEE International Conference on Data Engineering, Long Beach, California. To appear.</p>
<p>[2] Bianca Schroeder and Garth A. Gibson (February 2007). <a href="http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html">Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?</a> In Proceedings, 5th USENIX Conference on File and Storage Technologies, San Jose, California, pp. 1-16.</p>
<p>[3] Ilias Iliadis and Xiao-Yu Hu (June 2008). <a href="http://www.computer.org/portal/web/csdl/doi/10.1109/NAS.2008.20">Reliability Assurance of RAID Storage Systems for a Wide Range of Latent Sector Errors</a>. Proceedings of the International Conference on Networking, Architecture, and Storage, Chongqing, China. IEEE Computer Society, ISBN 978-0-7695-3187-8.</p>
<p>[4] Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso (February 2007). <a href="http://www.usenix.org/events/fast07/tech/pinheiro.html">Failure Trends in a Large Disk Drive Population</a>. In Proceedings, 5th USENIX Conference on File and Storage Technologies, San Jose, California, pp. 1-16.</p>
<p>[5] Haryadi S. Gunawi, Cindy Rubio-González, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Ben Liblit (February 2008). <a href="http://www.usenix.org/events/fast08/tech/gunawi.html">EIO: Error Handling is Occasionally Correct.</a> In Proceedings, 6th USENIX Conference on File and Storage Technologies, San Jose, California, pp. 207-222.</p>
<p>[6] Alyssa Henry (February 2009). <a href="http://www.usenix.org/media/events/fast09/tech/videos/henry.mov">Cloud Storage FUD (Failure, Uncertainty, and Durability)</a>. Keynote address, 7th USENIX Conference on File and Storage Technologies, San Francisco, California.</p>
<p>[7] Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau (February 2008). <a href="http://www.usenix.org/events/fast08/tech/bairavasundaram.html">An Analysis of Data Corruption in the Storage Stack</a>. In Proceedings of the 6th USENIX Symposium on File and Storage Technologies (FAST ’08), San Jose, California, pp. 223–238. </p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/dfqtavshc-c" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/disk-failures-in-the-real-world/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/disk-failures-in-the-real-world/</feedburner:origLink></item>
		<item>
		<title>SMDB 2010 paper submission deadline extension</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/Ucc3AD9EZxI/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/smdb-2010-paper-submission-deadline-extension/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 17:47:14 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[Computer Science education]]></category>
		<category><![CDATA[Self-managing database systems]]></category>
		<category><![CDATA[IEEE Data Engineering]]></category>
		<category><![CDATA[Shivnath Babu]]></category>
		<category><![CDATA[SMDB 2010]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1399</guid>
		<description><![CDATA[The forthcoming 2010 Workshop on Self-managing Database Systems will be held on March 1, 2010, collocated with the 2010 IEEE Data Engineering conference in Long Beach, California. 
In early October, Shivnath Babu, PC chair for SMDB 2010, along with the Executive Committee of the IEEE Workgroup on Self-managing Database Systems announced an extension to the [...]]]></description>
			<content:encoded><![CDATA[<p>The forthcoming <a href="http://www.cs.duke.edu/smdb10/">2010 Workshop on Self-managing Database Systems</a> will be held on March 1, 2010, collocated with the <a href="http://www.icde2010.org/">2010 IEEE Data Engineering conference</a> in Long Beach, California. </p>
<p>In early October, <a href="http://www.cs.duke.edu/~shivnath/">Shivnath Babu</a>, PC chair for SMDB 2010, along with the Executive Committee of the <a href="http://db.uwaterloo.ca/tcde-smdb/">IEEE Workgroup on Self-managing Database Systems</a> announced an extension to the paper submission deadline from today, October 22, to <strong>November 23, 2009</strong> to accommodate conflicting deadlines with the Eurosys and ACM SIGMOD conferences.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/Ucc3AD9EZxI" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/smdb-2010-paper-submission-deadline-extension/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/smdb-2010-paper-submission-deadline-extension/</feedburner:origLink></item>
		<item>
		<title>Using RowGenerator – part deux</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/QpwYgIp-cAo/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/using-rowgenerator-part-deux/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 22:25:44 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[SQL Standard]]></category>
		<category><![CDATA[ANSI]]></category>
		<category><![CDATA[bill-of-materials]]></category>
		<category><![CDATA[common table expression]]></category>
		<category><![CDATA[hierarchical query]]></category>
		<category><![CDATA[Jan-Eike Michels]]></category>
		<category><![CDATA[recursive query]]></category>
		<category><![CDATA[recursive UNION]]></category>
		<category><![CDATA[RowGenerator]]></category>
		<category><![CDATA[sa_rowgenerator]]></category>
		<category><![CDATA[sa_rowgenerator system procedure]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1387</guid>
		<description><![CDATA[After posting my recent article on the use of the RowGenerator system table, I received a welcome email from Jan-Eike Michels of IBM who, like me, sits on the DM32.2 committee for INCITS as the IBM representative for the SQL Standard:

Hi Glenn,
Just stumbled across your blog about the RowGenerator (http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/) . I don&#8217;t know whether [...]]]></description>
			<content:encoded><![CDATA[<p>After <a href="http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/">posting</a> my recent article on the use of the RowGenerator system table, I received a welcome email from Jan-Eike Michels of IBM who, like me, sits on the DM32.2 committee for <a href="http://www.incits.org/">INCITS</a> as the IBM representative for the SQL Standard:</p>
<blockquote><p>
Hi Glenn,</p>
<p>Just stumbled across your blog about the RowGenerator (http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/) . I don&#8217;t know whether iAnywhere supports the WITH clause but (since the standard does) you could use that one as well (similar to your sa_rowgenerator procedure):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">WITH</span> dummy <span style="color: #66cc66;">&#40;</span>counter<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> 
    <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> counter <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #993333; font-weight: bold;">TABLE</span><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">VALUES</span> <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> x<span style="color: #66cc66;">&#40;</span>counter<span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">UNION</span> <span style="color: #993333; font-weight: bold;">ALL</span>
     <span style="color: #993333; font-weight: bold;">SELECT</span> counter <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">FROM</span> dummy <span style="color: #993333; font-weight: bold;">WHERE</span> counter <span style="color: #66cc66;">&lt;</span> <span style="color: #cc66cc;">1000</span> <span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">SELECT</span> counter <span style="color: #993333; font-weight: bold;">FROM</span> dummy</pre></td></tr></table></div>

<p>would return 1000 rows.
</p></blockquote>
<p>I welcomed Jan-Eike&#8217;s contribution because, as he quite rightly points out, it is straightforward to generate a set of identifiers recursively using the SQL standard&#8217;s common table expression syntax, in this case using the recursive <code>UNION</code> construction.</p>
<p>One can use Jan-Eike&#8217;s example almost verbatim in <a alt="SQL Anywhere self-managing and embeddable DBMS" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a>. The issues with Jan-Eike&#8217;s SQL query are:</p>
<ul>
<li>In SQL Anywhere, one must include the <code>RECURSIVE</code> keyword when specifying a recursive query;
<li>SQL Anywhere servers do not recognize the <code>TABLE</code> keyword; and
<li>SQL Anywhere already contains a (real) table, DUMMY, that generates a single-row, single-column result set.
</ul>
<p>So here is a version of Jan-Eike&#8217;s example that generates the values between 1 and 10 in SQL Anywhere:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>6
7
8
9
10
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">WITH</span> RECURSIVE foo<span style="color: #66cc66;">&#40;</span>counter<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span>
  <span style="color: #66cc66;">&#40;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">FROM</span> DUMMY
     <span style="color: #993333; font-weight: bold;">UNION</span> <span style="color: #993333; font-weight: bold;">ALL</span>
   <span style="color: #993333; font-weight: bold;">SELECT</span> counter <span style="color: #66cc66;">+</span> <span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">FROM</span> foo <span style="color: #993333; font-weight: bold;">WHERE</span> counter <span style="color: #66cc66;">&lt;</span> <span style="color: #cc66cc;">10</span> <span style="color: #66cc66;">&#41;</span>
<span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> foo</pre></td></tr></table></div>

<p>that defines the common table expression &#8220;foo&#8221; (instead of &#8220;dummy&#8221;) and generates the specified values. The graphical plan for this query is as follows:</p>
<p><img src="http://iablog.sybase.com/paulley/wp-content/uploads/2009/10/recursive_row_generator.png" alt="recursive_row_generator" title="recursive_row_generator" width="497" height="719" class="aligncenter size-full wp-image-1391" /></p>
<p>Some points to mention:</p>
<ul>
<li>Specifying a larger number of values &#8211; and hence a deeper level of recursion &#8211; may require setting the <code>MAX_RECURSIVE_ITERATIONS</code> connection option to a higher value.
<li>Jan-Eike&#8217;s example generates  a sequential set of values, equivalent to what the RowGenerator system table or the sa_rowgenerator() system procedure generates. However, one could modify this query to generate a non-contiguous sequence of any values desired, simply by rewriting the <code>SELECT</code> list expressions in the common table expression.
<li>Finally, while this recursive version does have utility, the RowGenerator system table may still be a better approach. The advantage of RowGenerator is that it is a (static) base table; hence the query optimizer is much better able to estimate the cardinality of intermediate results when RowGenerator is used within a complex query than when a common table expression is used.
</ul>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/QpwYgIp-cAo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/using-rowgenerator-part-deux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/using-rowgenerator-part-deux/</feedburner:origLink></item>
		<item>
		<title>Differences between jConnect and the iAnywhere JDBC driver – part trois</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/lNthSxuW7vU/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-trois-2/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 21:03:30 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[iAnywhere JDBC driver]]></category>
		<category><![CDATA[jConnect]]></category>
		<category><![CDATA[JDBC]]></category>
		<category><![CDATA[ODBC]]></category>
		<category><![CDATA[ODBC driver]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1382</guid>
		<description><![CDATA[Today my colleague Jason Hinsperger posted a summary of the JDBC URL&#8217;s, DLLs, and classnames (along with code samples) for connecting to a SQL Anywhere database via the iAnywhere JDBC driver.
In his post, Jason mentions the availability of a new Type 2 JDBC driver in the SQL Anywhere 11.0.1 release which does not require ODBC [...]]]></description>
			<content:encoded><![CDATA[<p>Today my colleague <a href="http://iablog.sybase.com/hinsperg">Jason Hinsperger</a> posted a <a href="http://iablog.sybase.com/hinsperg/2009/10/connecting-to-sql-anywhere-using-jdbc/">summary</a> of the JDBC URL&#8217;s, DLLs, and classnames (along with code samples) for connecting to a <a alt="SQL Anywhere self-managing database system" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a> database via the iAnywhere JDBC driver.</p>
<p>In his post, Jason mentions the availability of a <strong>new</strong> Type 2 JDBC driver in the SQL Anywhere 11.0.1 release which does <strong>not</strong> require ODBC to be installed on the machine, significantly simplifying the deployment of JDBC applications on non-Windows platforms with SQL Anywhere and offering a slight performance boost to boot. The option of using this new Type 2 driver (sajdbc.jar) is not reflected in the product documentation, as this driver was made available beginning with Version 11.0.1 builds after 2250.</p>
<p>Jason also mentions that our forthcoming Innsbruck release will offer JDBC 4.0 support, which will eliminate the need to register the driver from within the application.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/lNthSxuW7vU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-trois-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-trois-2/</feedburner:origLink></item>
		<item>
		<title>IMS is still the king</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/gdV57gEFtNM/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/ims-is-still-the-king/#comments</comments>
		<pubDate>Fri, 09 Oct 2009 20:53:29 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[IMS]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1367</guid>
		<description><![CDATA[Eighteen months ago I wrote how IBM&#8217;s IMS was, after 44 years in production, still in use across much of the corporate business world. This week offered some additional proof of the continued reliance by big business on IBM mainframe systems by the anti-trust probe launched by the US Department of Justice on behalf of [...]]]></description>
			<content:encoded><![CDATA[<p>Eighteen months ago I <a href="http://iablog.sybase.com/paulley/2008/04/has-much-changed-in-thirty-years/">wrote</a> how IBM&#8217;s IMS was, after 44 years in production, still in use across much of the corporate business world. This week offered some additional proof of the continued reliance by big business on IBM mainframe systems by the <a href="http://www.theglobeandmail.com/news/technology/us-probes-ibm-for-antitrust-violations/article1316411/">anti-trust probe</a> launched by the US Department of Justice on behalf of the <a href="http://www.ccianet.org/">Computer and Communications Industry Association</a> (CCIA), who submitted a complaint. The CCIA spells out the reasons for this complaint <a href="http://www.ccianet.org/index.asp?bid=41">here</a>.</p>
<p>Unsurprisingly news coverage of this probe has been widespread (see <a href="http://www.computerworld.com/s/article/9139137/Update_IBM_faces_DOJ_antitrust_probe_on_mainframes?taxonomyId=154&#038;pageNumber=1">here</a> and <a href="http://community.zdnet.co.uk/blog/0,1000000567,10014128o-114626b,00.htm">here</a>). The current press coverage notwithstanding, I would argue that we are a long, long way from anything of consequence from this &#8220;probe&#8221;. The United States government attempted to restrict IBM&#8217;s business practices during the late 1970&#8217;s and early 1980&#8217;s with a long, drawn-out anti-trust court case that, eventually, was dropped by the Reagan administration. Withdrawing from the case subsequently prompted the lead government counsel to write a <a href="http://www.amazon.com/Big-Blue-IBMs-Abuse-Power/dp/0396085156/ref=sr_1_10?ie=UTF8&#038;s=books&#038;qid=1255121171&#038;sr=8-10">book</a> [1] about the wealth of evidence in the case &#8211; at least, in his opinion. I read Delamarter&#8217;s book for the first time in early 1987 and even today it&#8217;s still a great read.</p>
<p>The players in this case are different, except for the two that matter the most. But I would not count on a different outcome.</p>
<p>[1] Richard T. Delamarter (October 1986). Big Blue: IBM&#8217;s Use and Abuse of Power. Dodd Mead; 1st edition. ISBN 978-0396085157.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/gdV57gEFtNM" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/ims-is-still-the-king/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/ims-is-still-the-king/</feedburner:origLink></item>
		<item>
		<title>Differences between jConnect and the iAnywhere JDBC driver – part deux</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/JNGpSXdf-Jg/</link>
		<comments>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-deux/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 18:59:42 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[iAnywhere JDBC driver]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[jConnect]]></category>
		<category><![CDATA[JDBC]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1344</guid>
		<description><![CDATA[In a previous post I briefly described some of the differences between the jConnect JDBC driver and the iAnywhere JDBC driver when used with SQL Anywhere. A whitepaper on sybase.com summarizes the architectural differences between the two drivers.
Both the jConnect and iAnywhere drivers support JDBC 3.0. jConnect is a &#8220;pure Java&#8221; solution (termed a Type [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous <a href="http://iablog.sybase.com/paulley/2009/03/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-un/">post</a> I briefly described some of the differences between the jConnect JDBC driver and the iAnywhere JDBC driver when used with <a alt="SQL Anywhere embeddable self-managing database system" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a>. A <a href="http://www.sybase.com/detail?id=1037304">whitepaper</a> on sybase.com summarizes the architectural differences between the two drivers.</p>
<p>Both the jConnect and iAnywhere drivers support JDBC 3.0. jConnect is a &#8220;pure Java&#8221; solution (termed a Type 4 JDBC driver), while the iAnywhere driver is a Type 1 driver because of its reliance on the SQL Anywhere ODBC driver which must be properly installed.  It is sometimes argued that a &#8220;pure Java&#8221; solution is better/faster/more robust; hence, on paper, jConnect should be &#8220;better&#8221; than the iAnywhere Type 1 driver. However, if one looks more closely, the significant differences between the two solutions are (1) memory management, (2) the use of the TDS wire protocol, and (3) differences in semantics. We look at each of these in turn.</p>
<h3>Memory management</h3>
<p>With a pure Java solution:</p>
<ul>
<li>All objects are managed by the Java virtual machine.
<li>Java garbage collection cleans things up automatically. The application programmer does not have to worry about objects sticking around indefinitely, memory leaks or objects disappearing while still in use.
</ul>
<p>Unfortunately, the weakness of pure Java solutions is the same: <em>memory management</em>. The application programmer has little or no real control over the lifespan of an object. Moreover, the programmer has no effective control over garbage collection; garbage collection can kick in at critical times, resulting in random and unreproducible performance problems. </p>
<p><span id="more-1344"></span>With a hybrid solution such as the iAnywhere JDBC driver, the most important advantage is <em>memory management</em>:</p>
<ul>
<li>The programmer retains full control over non-Java objects.
<li>Garbage collection can be prevented or postponed by non-Java references to Java objects.
</ul>
<p>However, as with pure Java, the greatest <em>disadvantage</em> of a hybrid solution is &#8211; you guessed it &#8211; <strong>also</strong> <em>memory management</em>. In the hybrid case, non-Java objects need to be managed explicitly; program errors lead to memory leaks at best, and memory corruption or GPFs at worst. Moreover, if Java object references are held too long, Java garbage collection won’t kick in.</p>
<h3>CMDSEQ versus TDS</h3>
<p>jConnect uses Sybase ASE&#8217;s native wire protocol, the Tabular Data Stream (TDS) protocol, whereas the iAnywhere JDBC driver uses SQL Anywhere&#8217;s native wire protocol which is called Command Sequence (CMDSEQ). There are both semantic and performance differences between the use of the two protocols; each has advantages and disadvantages.</p>
<p>An advantage of TDS is that it supports &#8220;fire hose&#8221; cursors. That is, with a single TDS language command token one can instruct the server to execute a set of statements, describe all the result sets, and return all the results in one go to the client. In situations where the application desires all of the rows of a result set(s), a fire hose cursor does offer a performance advantage by reducing the amount of round-trip traffic over the wire. However, this comes at a cost: it is the client that is responsible for caching the result set, and the client that must implement cursor scrolling. The TDS client supports a  &#8220;window&#8221; of rows in the result set that the Java application can scroll through &#8211; both forwards and backwards. However, should scrolling occur to a range of rows outside this window the entire request is <strong>re-issued</strong> to the server &#8211; necessary since prior rows outside the &#8220;window&#8221; have been lost. Hence, in this model with scrollable cursors, cursor sensitivity semantics are impossible to guarantee. Moreover, with very large result sets the communication stream can become blocked if the client cannot process the returned rows quickly enough, which can, in turn, block the server.</p>
<p>While fire-hose cursors give an advantage to jConnect connections under the right circumstances, recently-added support for adaptive prefetching in CMDSEQ (see below) mitigates this advantage. Moreover, there are several additional features supported by the iAnywhere JDBC driver that provide advantages over jConnect. These include:</p>
<ul>
<li>TDS is limited to TCP/IP, even for local connections, while the iAnywhere JDBC driver can use either TCP/IP or shared memory. This means that when using jConnect applications cannot automatically start and stop local database servers, since this is supported only with shared memory connections.
<li>Strong encryption support, including RSA, RSA-FIPS, and ECC encryption technologies.
<li>Complete server-side cursor support. jConnect does not support server-side cursors; it implements a cursor on the client-side by retrieving the entire result set across the network, even if the client will only use a small number of rows from that result set. When using jConnect, application programmers must be careful to write their SQL queries to return the smallest result set necessary, rather than rely on FETCHing only the first few rows, since the entire result set is sent to the client with each SQL request.
<li>Complete AppInfo support. jConnect truncates AppInfo details.
<li>Integrated logins on Windows platforms.
<li>Richer batch SQL statement support &#8211; for example, wide (batch) inserts and wide fetches. With SQL Anywhere, jConnect only fully supports wide fetches. jConnect does support wide inserts from the application, which reduces the amount of network traffic required, but on the server TDS wide inserts are <em>simulated</em>, with each row initiating a separate <code>INSERT</code> statement. In contrast, the iAnywhere JDBC driver efficiently supports both wide inserts and wide fetches.
</ul>
<h3>Adaptive prefetching with CMDSEQ</h3>
<p>SQL Anywhere version 11 introduced <em>adaptive prefetch</em> as a variant of prefetch behaviour with CMDSEQ connections. Prefetch is designed to reduce communication in a client-server environment by transferring sets of rows to the client in advance of a <code>FETCH</code> request, and is enabled by default. Prefetching can be disabled outright by specifying the DisableMultiRowFetch connection parameter, or by setting the Prefetch connection option to OFF. Prefetch is turned off for cursors declared with sensitive value semantics.</p>
<p>With <em>adaptive prefetching</em>, a SQL Anywhere CMDSEQ client will automatically adjust the number of rows that are prefetched &#8211; increasing or decreasing &#8211; depending on application behaviour. A hard limit on the maximum number of rows that will be prefetched is 1000. Adaptive prefetching is also controlled by number of rows the application can <code>FETCH</code> in one elapsed second. Adaptive prefetching is enabled for cursors for which all of the following are true:</p>
<ul>
<li>ODBC and OLE DB: <code>FORWARD ONLY</code>, <code>READ ONLY</code> (default) cursor types; ESQL: <code>DYNAMIC SCROLL</code> (default), <code>NO SCROLL</code> and <code>INSENSITIVE</code> cursor types; all ADO.Net cursors
<li>only <code>FETCH NEXT</code> operations are done (no absolute, relative or backwards fetching)
<li>the application does not change the host variable type between fetches and does not use <code>GET DATA</code> to get column data in chunks (but using <strong>one</strong> <code>GET DATA</code> to retrieve the value is fine).
</ul>
<h3>jConnect semantics</h3>
<p>In addition to the automatic setting of connection options to ASE-equivalent settings upon connecting with jConnect &#8211; described in my <a href="http://iablog.sybase.com/paulley/2009/03/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-un/">previous post</a> &#8211; there are other semantic differences with jConnect. They include:</p>
<ul>
<li>The TDS protocol does not support dates or timestamps prior to January 1, 1753.
<li>Fixed-length <code>CHAR</code> and <code>BINARY</code> values are automatically padded upon retrieval from blank-padded databases.
<li>With older versions of jConnect, empty string values &#8211; strings of length zero &#8211; are returned to the application as a string with a single blank in it. This is because earlier versions of TDS did not distinguish between an empty string and the NULL value.
</li>
</ul>
<p>If a JDBC application wanted to use jConnect but not want Sybase ASE-like behaviour, then the application would have to:</p>
<ul>
<li>Revert the connection option settings issued by the sp_tsql_environment() system procedure by setting these options temporarily immediately after connecting.
<li>Set the connection option <code>RETURN_DATE_TIME_AS_STRING</code> to ON in order to get SQL Anywhere to always return <code>DATE</code>/<code>TIME</code>/<code>TIMESTAMP</code> values as strings. This is to overcome the inability of TDS to handle dates prior to January 1, 1753.
<li>Set the jConnect option &#8220;dynamic prepare&#8221; to TRUE to make sure prepared statements are not re-PREPAREd every time they are used.
<li>Set a cursor name for each statement in order to force jConnect to use TDS cursors instead of fire-hose cursors. Note that with SQL Anywhere, jConnect will still cache result sets on the client regardless of which cursor type is used.
<li>Set the fetch size explicitly on every statement in order to get jConnect to mimic CMDSEQ prefetch behaviour.
<li>For older versions of jConnect:
<ul>
<li>Handle &#8217;single-blank strings&#8217; as empty strings.
<li>Refrain from using unsigned data types, since unsigned values are not supported with older jConnect releases.
</ul>
</ul>
<p>In a subsequent post I&#8217;ll outline performance differences between the jConnect and iAnywhere drivers. In our experience with customer applications, most applications benefit from a significant performance boost by switching to the iAnywhere JDBC driver, occasionally up to a factor of two, depending on the nature of the application and the precise sequence of JDBC API calls issued by the application.</p>
<p>My thanks to colleague Karim Khamis for providing me with the background for this article. </p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/JNGpSXdf-Jg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-deux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/10/differences-between-jconnect-and-the-ianywhere-jdbc-driver-part-deux/</feedburner:origLink></item>
		<item>
		<title>Sybase iAnywhere sponsors Business Intelligence research</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/1FQE943H1-8/</link>
		<comments>http://iablog.sybase.com/paulley/2009/09/sybase-ianywhere-sponsors-business-intelligence-research/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 05:55:42 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[Computer Science education]]></category>
		<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[BIN project]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[NSERC]]></category>
		<category><![CDATA[University of Waterloo]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=148</guid>
		<description><![CDATA[As a result of our close ties with the Advanced Database Research Group at the David R. Cheriton School of Computer Science at the University of Waterloo, Sybase iAnywhere is participating in, and providing funding to, an NSERC-sponsored research project, the Business Intelligence Network, or BIN. The awarding of the research grant was announced by [...]]]></description>
			<content:encoded><![CDATA[<p>As a result of our close ties with the <a href="http://db.uwaterloo.ca/">Advanced Database Research Group</a> at the <a href="http://cs.uwaterloo.ca">David R. Cheriton School of Computer Science</a> at the <a href="http://www.uwaterloo.ca">University of Waterloo</a>, Sybase iAnywhere is participating in, and providing funding to, an <a href="http://www.artsci.utoronto.ca/main/release_funding-for-bin">NSERC-sponsored research project</a>, the <em>Business Intelligence Network</em>, or BIN. The awarding of the research grant was <a href="http://www.nserc-crsng.gc.ca/Media-Media/NewsRelease-CommuniqueDePresse_eng.asp?ID=160">announced</a> by NSERC on September 24.</p>
<blockquote><p>The NSERC Business Intelligence Network will create an innovation platform to enhance the collaboration of the top Canadian knowledge and information management researchers and the top Canadian companies in business intelligence, an area that is central to NSERC&#8217;s strategic Advanced Communication and Management of Information target area. The NSERC BIN network will raise the visibility and impact of the Canadian information management research community to that of a world leader in business intelligence research. The network&#8217;s goal is to provide new solutions for businesses and government organizations to enable them to solve modern business problems and make decisions using integrated, trustworthy, and up-to-date data.
</p></blockquote>
<p>What exactly is Business Intelligence? Here&#8217;s the definition of the term from the BIN research proposal:</p>
<blockquote><p>
Business intelligence (BI) is the commercial term for using information within organizations to make informed decisions, and to run operations effectively based on available data. As a research field, it encompasses data and knowledge management, management of digital media, modeling of processes and policies, data quality, data privacy and security, data integration, data exchange, data cleaning, inconsistency management, information retrieval, data mining, analytics, and decision support.
</p></blockquote>
<p>Specific goals of the BIN network are:</p>
<ul>
<li>To develop novel technologies for facilitating business decision-making;
<li>To develop novel technologies for creating and managing integrated, trustworthy<br />
and up-to date data;</p>
<li>To strengthen the innovation pipeline of our industry partners;
<li>To raise the visibility and impact of the Canadian information management<br />
research community on the practice of business intelligence.
</ul>
<p>BIN is a strategic NSERC project, which will link fifteen researchers from seven Canadian academic institutions: Dalhousie, Toronto, Waterloo, Ottawa, Carleton, Calgary, and UBC. In addition, BIN has formal industry partners including IBM, Bell Canada, Ingres, Business Objects, and Sybase iAnywhere. At present there are four defined research areas (themes) that are under the BIN umbrella:</p>
<ul>
<li>Strategy and Policy Management;
<li>Capitalizing on Document Assets, which includes (i) feature extraction, (ii) ontology generation and maintenance, (iii) data exploration and summarization, and (iv) access and privacy control;
<li>Adaptive Data Cleaning; and
<li>Support for Top-Down (Business-driven) Data Integration.
</ul>
<p>My team is looking forward to working with some of the BIN researchers to study these aspects of Business Intelligence and, hopefully, incorporate some of these ideas into iAnywhere products, including <a alt="SQL Anywhere self-managing embeddable database management system" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a>.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/1FQE943H1-8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/09/sybase-ianywhere-sponsors-business-intelligence-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/09/sybase-ianywhere-sponsors-business-intelligence-research/</feedburner:origLink></item>
		<item>
		<title>Using RowGenerator</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/0kL7ONBh0lQ/</link>
		<comments>http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/#comments</comments>
		<pubDate>Fri, 25 Sep 2009 15:11:14 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[non-equijoin predicate]]></category>
		<category><![CDATA[OLAP]]></category>
		<category><![CDATA[RowGenerator]]></category>
		<category><![CDATA[sa_rowgenerator]]></category>
		<category><![CDATA[sa_rowgenerator system procedure]]></category>
		<category><![CDATA[table function]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1334</guid>
		<description><![CDATA[Join conditions that involve only inequality conditions are rare, primarily because most joins are between tables related through referential integrity constraints. In doing some analysis this week, however, I came up with an example that illustrates a case where joins over inequalities are useful.
My example involved doing some analysis over project tasks that had &#8220;creation&#8217; [...]]]></description>
			<content:encoded><![CDATA[<p>Join conditions that involve only inequality conditions are rare, primarily because most joins are between tables related through referential integrity constraints. In doing some analysis this week, however, I came up with an example that illustrates a case where joins over inequalities are useful.</p>
<p>My example involved doing some analysis over project tasks that had &#8220;creation&#8217; and &#8220;completion&#8217; timestamps, akin to</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> projects <span style="color: #66cc66;">&#40;</span>
  project_id INTEGER <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
  short_desc VARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">255</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> 
  long_desc LONG VARCHAR<span style="color: #66cc66;">,</span>
  project_status VARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">20</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>
  creation_ts TIMESTAMP <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>
  completion_ts TIMESTAMP <span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>The actual schema I was querying is much more complex than this, but this simple example serves to illustrate the basic idea. What I wanted was to create a result set that, for every week, contained a count of the number of projects that were in-progress, and the number of projects that were completed in that week. Once the data is factored out week-by-week, then I could perform historical analysis on that intermediate result using some of the builtin <a href="http://iablog.sybase.com/paulley/2009/03/window-aggregate-functions/">OLAP functionality</a> in <a alt="SQL Anywhere self-managing database system" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a>.</p>
<p><span id="more-1334"></span>The function <code>DATEDIFF( WEEK, completion_ts, creation_ts )</code> gives the difference in weeks between the two timestamps, so that part is straightforward but for those projects that span a calendar year. Notwithstanding that complication, the more significant problem is that I wanted to generate a row for every week the project was unfinished. I needed to join the projects table with <em>something</em> to generate the additional rows, but what?</p>
<p>SQL Anywhere databases contain a table named RowGenerator precisely for this purpose; it&#8217;s a single-column table (row_num) that contains 255 rows with values starting from one. To generate the result set I needed, here&#8217;s the query:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> p<span style="color: #66cc66;">.</span>project_id<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>short_desc<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>completion_ts<span style="color: #66cc66;">,</span>
       <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">IF</span> p<span style="color: #66cc66;">.</span>completion_ts <span style="color: #993333; font-weight: bold;">IS</span> <span style="color: #993333; font-weight: bold;">NULL</span> THEN
          ABS<span style="color: #66cc66;">&#40;</span>DATEDIFF<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> NOW<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
       ELSE
          ABS<span style="color: #66cc66;">&#40;</span>DATEDIFF<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>completion_ts<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
       ENDIF <span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> weeks_outstanding<span style="color: #66cc66;">,</span>
       <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">IF</span> p<span style="color: #66cc66;">.</span>project_status !<span style="color: #66cc66;">=</span> <span style="color: #ff0000;">'Complete'</span> <span style="color: #993333; font-weight: bold;">OR</span> weeks_outstanding <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">OR</span> weeks_outstanding <span style="color: #66cc66;">&gt;</span> week_number THEN <span style="color: #cc66cc;">1</span> ELSE <span style="color: #cc66cc;">0</span> ENDIF<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> incomplete_projects<span style="color: #66cc66;">,</span>
       <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">IF</span> p<span style="color: #66cc66;">.</span>completion_ts <span style="color: #993333; font-weight: bold;">IS</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span> <span style="color: #993333; font-weight: bold;">AND</span> <span style="color: #66cc66;">&#40;</span>weeks_outstanding <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">OR</span> weeks_outstanding <span style="color: #66cc66;">=</span> week_number<span style="color: #66cc66;">&#41;</span> THEN <span style="color: #cc66cc;">1</span> ELSE <span style="color: #cc66cc;">0</span> ENDIF<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> completed_projects<span style="color: #66cc66;">,</span>
       <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">IF</span> weeks_outstanding <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span> THEN
           DATEPART<span style="color: #66cc66;">&#40;</span> YEAR<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts <span style="color: #66cc66;">&#41;</span>
       ELSE
           DATEPART<span style="color: #66cc66;">&#40;</span> YEAR<span style="color: #66cc66;">,</span> DATEADD<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> RG<span style="color: #66cc66;">.</span>week_number<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>
       ENDIF<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> calendar_year<span style="color: #66cc66;">,</span> 
       <span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">IF</span> weeks_outstanding <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">0</span> THEN
           DATEPART<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts <span style="color: #66cc66;">&#41;</span>
       ELSE
           DATEPART<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> DATEADD<span style="color: #66cc66;">&#40;</span> WEEK<span style="color: #66cc66;">,</span> RG<span style="color: #66cc66;">.</span>week_number<span style="color: #66cc66;">,</span> p<span style="color: #66cc66;">.</span>creation_ts<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>
       ENDIF<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> calendar_week
<span style="color: #993333; font-weight: bold;">FROM</span> 
    <span style="color: #66cc66;">&#40;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">&#40;</span>row_num <span style="color: #66cc66;">-</span> <span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> week_number <span style="color: #993333; font-weight: bold;">FROM</span> RowGenerator<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> RG<span style="color: #66cc66;">,</span>
     projects p
<span style="color: #993333; font-weight: bold;">WHERE</span> 
    weeks_outstanding <span style="color: #66cc66;">&gt;=</span> RG<span style="color: #66cc66;">.</span>week_number</pre></td></tr></table></div>

<p>The query joins the builtin RowGenerator table to the projects table based on the weeks_outstanding value. Hence, for each week a project is incomplete, a row will be generated in the output, including for those projects that are created and completed in the same week (where weeks_outstanding would be zero). Using the <code>DATEPART</code> function with <code>WEEK</code> means that up to 54 weeks in a year are possible, because <code>DATEPART</code> defines a week to begin on a Sunday. </p>
<p>Once I have this result set, I can then embed it in a derived table and, for example, sum the number of open and completed projects by calendar week in a straightforward way. </p>
<p>The correctness of the solution depends on one factor: that there be no projects that take more than 255 weeks to complete, because otherwise there are insufficient rows in the RowGenerator table to generate the required number of rows. Should that be a problem, SQL Anywhere provides another row generator mechanism: the sa_rowgenerator() system procedure. The sa_rowgenerator() procedure takes three parameters: the starting value, the end value, and the step increment (default is 1). Joining sa_rowgenerator() to the projects table is identical to using the RowGenerator base table, since SQL Anywhere supports table functions (procedures in the <code>FROM</code> clause).</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/0kL7ONBh0lQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/09/using-rowgenerator/</feedburner:origLink></item>
		<item>
		<title>Factors to consider for utilizing materialized views</title>
		<link>http://feedproxy.google.com/~r/IdRatherPlayGolf/~3/5_mSEYxg5GU/</link>
		<comments>http://iablog.sybase.com/paulley/2009/09/factors-to-consider-for-utilizing-materialized-views/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 06:45:55 +0000</pubDate>
		<dc:creator>Glenn Paulley</dc:creator>
				<category><![CDATA[Database Administration]]></category>
		<category><![CDATA[Query optimization]]></category>
		<category><![CDATA[SQL Anywhere]]></category>
		<category><![CDATA[Application Profiling]]></category>
		<category><![CDATA[deferred-maintenance materialized view]]></category>
		<category><![CDATA[immediately-maintained materialized view]]></category>
		<category><![CDATA[materialized view selection problem]]></category>
		<category><![CDATA[materialized views]]></category>
		<category><![CDATA[view selection problem]]></category>

		<guid isPermaLink="false">http://iablog.sybase.com/paulley/?p=1322</guid>
		<description><![CDATA[Starting with Version 10, SQL Anywhere supports deferred-maintenance materialized views; version 11 introduced support for immediately-maintained materialized views. The major differences between the two are:

With deferred-maintenance materialized views, the query optimizer may answer queries utilizing one or more materialized views that contain stale data. The &#8220;staleness&#8221; of any view, and whether or not the view [...]]]></description>
			<content:encoded><![CDATA[<p>Starting with Version 10, <a alt="SQL Anywhere self-managing database system" href="http://www.sybase.com/products/databasemanagement/sqlanywhere">SQL Anywhere</a> supports deferred-maintenance materialized views; version 11 introduced support for immediately-maintained materialized views. The major differences between the two are:</p>
<ul>
<li>With deferred-maintenance materialized views, the query optimizer may answer queries utilizing one or more materialized views that contain <em>stale data</em>. The &#8220;staleness&#8221; of any view, and whether or not the view can be used in query answering, is entirely under DBA control. However, deferred-maintenance views permit one to tradeoff data accuracy with the performance gains offered by the materialized view, and the update maintenance cost for that view.
<li>Conversely, immediately-maintained materialized view are updated within the same transaction as the base-table updates upon which the materialized view is defined. Immediately-maintained views offer a derived, up-to-the-minute copy of the view&#8217;s underlying base tables, at the expense of requiring view maintenance with each update operation.
</li>
</ul>
<p>In summary, deferred-maintenace views permit the amortization of maintenance costs for the materialized view. In contrast, immediately-maintained materialized views require each update transaction to incur the overhead of view maintenance, which may <a href="http://iablog.sybase.com/paulley/2008/04/schema-design-tools-are-a-tough-challenge/">result in contention between concurrently-executing transactions</a>. </p>
<p><span id="more-1322"></span>The deferred-or-immediate materialized view decision is but one of the factors to consider by a database administrator when deciding whether or not to use a materialized view &#8211; in the literature this question is known as the <em>view selection problem</em>. However, there are several considerations other than the deferred-or-immediate maintenance tradeoff. Here is a checklist of questions to consider when deciding upon the utility of a materialized view: </p>
<ul>
<li>What is the set of queries that can benefit from creation of materialized views?
<p><br/>Answering this question involves an analysis of the query workload for the system, including detailed consideration of both the definition and frequency of individual queries. A good starting point is to begin with frequently executed and expensive queries, particularly those expensive queries with critical response time requirements. SQL Anywhere&#8217;s Application Profiling capability, contained in the Sybase Central SQL Anywhere plug-in, is an excellent way to capture your application&#8217;s workload and determine the &#8220;heavy hitters&#8221; contained within it.</p>
<p>Materialized views that can benefit multiple queries in common represent the most significant gains, because the storage and maintenance costs for the view are constant but the benefits of the materialized view increase with usage. Also, remember that a single query can make use of multiple materialized views. Splitting a complex materialized view into multiple views may permit the optimizer to utilize the materialized view to assist a larger set of queries. If considering a materialized view involving aggregation (<code>GROUP BY</code>), it is often better to materialize basic functions that will permit a wider applicability to multiple queries; for example, <code>AVG()</code> can be obtained from a combination of <code>SUM()</code> and <code>COUNT(*)</code>. SQL Anywhere&#8217;s query optimizer is intelligent enough to utilize <code>SUM()</code> and <code>COUNT()</code> from a materialized view when the original query contains <code>AVG()</code>.</p>
<li>Does potential improvement in query performance outweigh the storage and maintenance costs of materialized views?
<p><br/>One must tradeoff the potential improvements in query performance with the space requirements for materialized views &#8211; and their indexes &#8211; and the maintenance costs for the view. Here, one must be aware of the update patterns from application requests; materialized views on heavily-updated base tables may have unacceptable maintenance costs, for two reasons: the cost of the updates to the materialized views themselves, and the increase in lock contention amongst update transactions from concurrent updates to the table (or index) containing the materialized view. This latter problem is difficult to assess without proper capacity planning.</p>
<p>DBAs often fail to realize that materialized views can be indexed, just like any other base table. Indexes are particularly useful when the application query contains additional joins to tables that are not included in the view; if indexes exist, the optimizer has more physical operator choices &#8211; particularly indexed nested-loop join &#8211; that can result in significant speed improvements.</p>
<li>Can the same query be allowed to return different results, if the optimizer chooses to utilize stale data from a materialized view in one case, and chooses to process the underlying (and up-to-date) base tables in another?
<li>Can stored data for materialized views be allowed to become stale?
<li>How stale can the data become before it is unacceptable?
<p><br/>These latter questions pertain to the tradeoffs of immediate versus deferred-maintenance materialized views. As described above, deferred-maintenance permits one to amortize view maintenance across multiple update transactions, at the expense of data staleness. Whether or not your application can benefit from deferred-maintenance views is primarily a business question, not a systems one.
</ul>
<p>In a later post, I&#8217;ve present some examples of using materialized views. My thanks to colleague Anil Goel for providing much of the detail in this article.</p>
<img src="http://feeds.feedburner.com/~r/IdRatherPlayGolf/~4/5_mSEYxg5GU" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://iablog.sybase.com/paulley/2009/09/factors-to-consider-for-utilizing-materialized-views/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://iablog.sybase.com/paulley/2009/09/factors-to-consider-for-utilizing-materialized-views/</feedburner:origLink></item>
	</channel>
</rss><!-- Dynamic page generated in 1.368 seconds. --><!-- Cached page generated by WP-Super-Cache on 2009-11-10 21:22:37 -->
