<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0"><id>http://douglas.mayle.org/</id><title>douglas.mayle.org</title><updated>2009-03-06T01:30:53Z</updated><link href="http://douglas.mayle.org/" rel="alternate" /><author><name>Douglas</name></author><generator version="r33" uri="http://code.google.com/p/django-atompub/">django-atompub</generator><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/mayle" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="mayle" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry><id>http://douglas.mayle.org/2009/03/05/syncing-safari-downloads-intro-screen-scraping/</id><title type="html">Syncing Safari Downloads - an intro to screen scraping</title><updated>2009-03-06T01:30:53Z</updated><published>2009-03-05T22:05:39Z</published><category term="lxml" /><category term="planetdev" /><category term="python" /><category term="safari" /><link href="http://douglas.mayle.org/2009/03/05/syncing-safari-downloads-intro-screen-scraping/" rel="self" /><link href="http://douglas.mayle.org/2009/03/05/syncing-safari-downloads-intro-screen-scraping/" rel="alternate" /><content type="html">&lt;p&gt;In honor of the upcoming &lt;a title="PyCon: Connecting The Python Community" href="http://us.pycon.org"&gt;PyCon&lt;/a&gt; (which I&amp;#8217;ll be attending on behalf of &lt;a title="The Open Planning Project" href="http://theopenplanningproject.org/"&gt;The Open Planning Project&lt;/a&gt;) I decided to write about Python&amp;nbsp;today.&lt;/p&gt;&lt;p&gt;Some time back I wrote myself a simple utility for synchronizing Safari downloads (the book 
service, not the web browser), and I decided to polish it up, release it, and write about the process.&amp;#160; This is the first of two parts where I will talk about my first time handling the publication of an open source python package from start to finish.&amp;nbsp; The next part will be a tutorial on how to screen scrape the web, from inspecting the &lt;span class="caps"&gt;HTTP&lt;/span&gt; headers to using &lt;span class="caps"&gt;CSS&lt;/span&gt; selectors with lxml to parse out the interesting&amp;nbsp;data.&lt;/p&gt;&lt;p&gt;Anyway, back to the topic at hand.&amp;nbsp; If you&amp;#8217;ve never heard of &lt;a title="Safari Books Online" href="http://www.safaribooksonline.com/"&gt;Safari&lt;/a&gt;, and you&amp;#8217;re a tech professional, then I hope it&amp;#8217;s because you have personal access to the &lt;a title="Fog Creek's Library" href="http://www.joelonsoftware.com/items/2006/08/21.html"&gt;Library of Alexandria&lt;/a&gt;.&amp;nbsp; If not, then let me be your personal cluestick.&amp;nbsp; For about the price of five tech books (per year), you can maintain an online bookshelf that gives you access to up to 120 books in that year.&amp;nbsp; In practice, I think I average about 30, but this also gives you the ability to search through their entire library to find the answers you need.&amp;nbsp; When you find a book, you can add it to your bookshelf with two clicks (&lt;a title="World's Dumbest Patent" href="http://en.wikipedia.org/wiki/1-Click"&gt;Thanks Amazon!&lt;/a&gt;) and then start reading.&amp;nbsp; What&amp;#8217;s more, the service includes 5 downloads per month (usually one chapter or section of a book),&amp;nbsp; each of which gives you a personalized &lt;span class="caps"&gt;PDF&lt;/span&gt; for offline&amp;nbsp;reading.&lt;/p&gt;&lt;p&gt;My only problem with the service is managing the downloads.&amp;nbsp; Once you&amp;#8217;ve downloaded a chapter, it will always be available to you (at least as long as you have an account), but the PDFs are 
auto-generated on demand, and when you save them, you end up with files named something like &lt;a title="A subliminal message is a signal or message embedded in another medium" href="http://en.wikipedia.org/wiki/Subliminal_message"&gt;0EITGkillY6ALIkill3kHfWkillC4RwjkillwKb69kill736MGkillY4UuykillEJTsC.pdf&lt;/a&gt;.&amp;nbsp; I tried to give them sensible names, and organize them, but it was always a pain, and I always had the weirdest urges just afterward.&amp;nbsp; To top it all off, the last time I changed computers, I decided not to copy the files (knowing that I could re-download them), so I was left with a lot of manual work to&amp;nbsp;do.&lt;/p&gt;&lt;p&gt;Well, I&amp;#8217;ve been telling myself for some time that I wanted to play with &lt;a title="lxml is the most feature-rich and easy-to-use library for working with XML and HTML in the Python language" href="http://codespeak.net/lxml/"&gt;lxml&lt;/a&gt; (it&amp;#8217;s the fast python library for working with &lt;span class="caps"&gt;XML&lt;/span&gt; and &lt;span class="caps"&gt;HTML&lt;/span&gt;).&amp;nbsp; Also, I&amp;#8217;ve been working entirely in javascript lately, so I felt that it was time to stretch some mental muscles and get something done in python.&amp;nbsp; For the impatient, you can get a copy of the script by typing the following at a&amp;nbsp;terminal: &lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;export STATIC_DEPS=true # Only necessary on a Mac&lt;br /&gt;easy_install&amp;nbsp;safarisync&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;If the output you get looks something like this:&lt;img alt="This is Easy Install on Windows" title="This is Easy Install on Windows" src="../../../../../files/blog/syncing-safari-downloads-intro-screen-scraping/easy_install_windows.png" /&gt; &lt;/p&gt;&lt;p&gt;Fear not, poor windows user, I intend to release a simple executable to coincide with the second part of this article.&amp;nbsp; If you don&amp;#8217;t feel like waiting, you can download and install &lt;a 
title="Python is a dynamic object-oriented programming language that can be used for many kinds of software development." href="http://python.org/"&gt;python&lt;/a&gt;, then download and install &lt;a title="Download, build, install, upgrade, and uninstall Python packages -- easily!" href="http://pypi.python.org/pypi/setuptools"&gt;setuptools&lt;/a&gt;, and finally fix up your &lt;span class="caps"&gt;PATH&lt;/span&gt; environment&amp;nbsp;variable.&lt;/p&gt;&lt;p&gt;For everyone else, you can start playing along.&amp;nbsp; Just type &lt;code&gt;safarisync&lt;/code&gt; to start the process, or &lt;code&gt;safarisync --help&lt;/code&gt; to get a list of&amp;nbsp;options.&lt;/p&gt;&lt;p&gt;Since I&amp;#8217;ve only worked with lxml peripherally before (as it was embedded into other projects I was working on), I ended up writing three completely different versions.&amp;nbsp; The first version was fully functional, using the cookie handling that I learned from this &lt;a title="cookielib and ClientCookie &amp;mdash; Handling Cookies in Python" href="http://www.voidspace.org.uk/python/articles/cookielib.shtml"&gt;well-written tutorial&lt;/a&gt;.&amp;nbsp; It also iterated through all of the elements in the tree to find the ones we were interested in.&amp;nbsp; Just after finishing it up, I stumbled across this &lt;a title="lxml: an underappreciated web scraping library" href="http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/"&gt;quick intro to lxml&lt;/a&gt;, written by a colleague of mine (&lt;a title="Ian Bicking: a blog" href="http://blog.ianbicking.org/"&gt;Ian Bicking&lt;/a&gt;).&amp;nbsp; If you haven&amp;#8217;t heard him speak somewhere already, then chances are high that either you&amp;#8217;ve used something he&amp;#8217;s written, or used something based on something he&amp;#8217;s&amp;nbsp;written.&lt;/p&gt;&lt;p&gt;His article introduced me to the &lt;span class="caps"&gt;CSS&lt;/span&gt; selector engine and form handling 
now built into lxml.&amp;nbsp; Thus was born the second version of safarisync.&amp;nbsp; The only problem was that it usually didn&amp;#8217;t work.&amp;nbsp; In the debug shell, I could usually get the code to run, after some tinkering, but never&amp;nbsp;standalone.&lt;/p&gt;&lt;p&gt;The first problem I always had was unnecessarily hard to diagnose.&amp;nbsp; I was consistently receiving a UnicodeDecodeError from lxml.&amp;nbsp; I was confused by this because the string I passed in had the proper encoding specified&amp;nbsp;within: &lt;/p&gt;&lt;blockquote&gt;&lt;pre id="line1"&gt;&lt;span class="pi"&gt;&amp;lt;?xml version="1.0" encoding="utf-8"?&amp;gt;&lt;/span&gt;&lt;span class="doctype"&gt;&lt;/span&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;I received the help I needed from my colleague &lt;a title="Luke Tucker" href="http://www.openplans.org/people/ltucker/profile"&gt;Luke Tucker&lt;/a&gt; (of &lt;a title="Melkjug tunes your RSS feeds to find the news you care about." href="http://melkjug.org/"&gt;Melkjug&lt;/a&gt; fame, which by the way, you should check out, they just released a new version).&amp;nbsp; As it turns out, there was a problem in the error handling of lxml such that if you had a bug &lt;span class="caps"&gt;AND&lt;/span&gt; you had unicode data, instead of getting the correct bug reported, you got a UnicodeDecodeError.&amp;nbsp; He suggested I strip any unicode data and try the same operations to get to the real error.&amp;nbsp; Thankfully, I&amp;#8217;ve been told that this is fixed in the latest&amp;nbsp;version.&lt;/p&gt;&lt;p&gt;Solving the last problem took me outside of the debugging shell, and into the bowels of lxml.&amp;nbsp; It&amp;#8217;s partially written in &lt;a title="Cython is a language that makes writing C extensions for the Python language as easy as Python itself" href="http://www.cython.org/"&gt;Cython&lt;/a&gt;, which is a python-like language that compiles down to C.&amp;nbsp; This means (in theory) that you get the 
speed of C with the beauty of Python.&amp;nbsp; In practice, this is only half true.&amp;nbsp; You get the speed of C.&amp;nbsp; Beauty, however, is in the &lt;a title="As beautiful as a chicken with lips" href="https://codespeak.net/viewvc/lxml/tag/lxml-2.1.1/src/lxml/lxml.etree.pyx?revision=57850&amp;amp;view=markup"&gt;eye of the beholder&lt;/a&gt;.&amp;nbsp; In any case, peering through the code showed me that while the new form handling code uses python for network access, the rest of lxml uses the built-in downloading facilities of libxml, the C library it wraps.&amp;nbsp; This means that you have to avoid lxml&amp;#8217;s network helpers almost entirely if you need to handle&amp;nbsp;cookies.&lt;/p&gt;&lt;p&gt;The third version of the code can be found at my public &lt;a title="safarisync source repository" href="http://projects.mayle.org/hg/safarisync"&gt;source repository&lt;/a&gt;.&amp;nbsp; The interesting code is found in &lt;a title="safarisync source file" href="http://projects.mayle.org/hg/safarisync/file/tip/safarisync/safarisync.py"&gt;safarisync.py&lt;/a&gt;.&amp;nbsp; I&amp;#8217;ve tried to comment it well enough that you can follow through, even without my help.&amp;nbsp; I&amp;#8217;ve had it reviewed by Ian and &lt;a title="Reflections on software" href="http://robmarianski.com/"&gt;Robert Marianski&lt;/a&gt;, another colleague of mine and talented python programmer.&amp;nbsp; He helped me with the details necessary to publish the package on &lt;a title="The Python Package Index is a repository of software for the Python programming language." href="http://pypi.python.org/pypi"&gt;PyPI&lt;/a&gt;. 
(For example, if you want your package to have an executable shortcut, you need to create a specially named entry point in&amp;nbsp;setup.py).&lt;/p&gt;&lt;p&gt;Well, thanks for tuning in.&amp;nbsp; Come back next week for a detailed tutorial teaching you how to write your own screen scraping&amp;nbsp;tools.&lt;/p&gt;</content></entry><entry><id>http://douglas.mayle.org/2009/01/27/xinha4wp-wordpress-power-xinha/</id><title type="html">Xinha4WP - Wordpress with the power of Xinha</title><updated>2009-01-27T05:36:36Z</updated><published>2009-01-27T00:49:58Z</published><category term="planetdev" /><category term="wordpress" /><category term="xinha" /><link href="http://douglas.mayle.org/2009/01/27/xinha4wp-wordpress-power-xinha/" rel="self" /><link href="http://douglas.mayle.org/2009/01/27/xinha4wp-wordpress-power-xinha/" rel="alternate" /><content type="html">&lt;p&gt;This won&amp;#8217;t be one of my usual blog posts, because I just want to announce the &lt;a title="Xinha4WP 0.96 Beta Release" href="http://oss.openplans.org/xinhatools/raw-attachment/wiki/Downloads/xinha4wp_0.96beta.tbz2"&gt;0.96 beta&lt;/a&gt; release of &lt;a title="Xinha For Wordpress" href="http://oss.openplans.org/xinhatools"&gt;Xinha4WP&lt;/a&gt;.&amp;#160; It&amp;#8217;s a &lt;a href="http://wordpress.org/"&gt;wordpress&lt;/a&gt; plugin that installs &lt;a title="Xinha, the community driven open source web WYSIWYG editor" href="http://www.xinha.org"&gt;Xinha&lt;/a&gt; as a drop-in replacement for &lt;a title="Javascript WYSIWYG Editor" href="http://tinymce.moxiecode.com/"&gt;TinyMCE&lt;/a&gt;.&amp;nbsp; (For those not in the know, Xinha is a community-driven open source web &lt;span class="caps"&gt;WYSIWYG&lt;/span&gt;&amp;nbsp;editor.)&lt;/p&gt;&lt;p&gt;At my employer (&lt;a title="Using technology to do good, better" href="http://topp.openplans.org"&gt;The Open Planning Project&lt;/a&gt;), we switched to wordpress from &lt;a href="http://www.blogger.com/"&gt;Blogger&lt;/a&gt; back in 2006 
for &lt;a href="http://www.streetsblog.org/"&gt;Streetsblog&lt;/a&gt;.&amp;nbsp; After a short trial, our writers and editors became increasingly frustrated with the state of &lt;span class="caps"&gt;WYSIWYG&lt;/span&gt; editing (powered by TinyMCE).&amp;nbsp; At the time, TinyMCE was almost unusable (at least as it was embedded into wordpress).&amp;nbsp; Our writers and editors were about to give up and switch back when we came across &lt;a href="http://baptiste.us/"&gt;Mike Baptiste&lt;/a&gt;&amp;#8217;s wonderful&amp;nbsp;plugin.&lt;/p&gt;&lt;p&gt;Here at &lt;span class="caps"&gt;TOPP&lt;/span&gt;, we believe in open source software not only for idealistic reasons but also for pragmatic ones.&amp;nbsp; Being in control of our stack gives us much more flexibility in terms of site design and functionality.&amp;nbsp; By using Mike&amp;#8217;s plugin, we were able to keep our writers happy while still maintaining control of our entire&amp;nbsp;platform.&lt;/p&gt;&lt;p&gt;Well, fast forward to the beginning of 2009 and the crew over at Wordpress and Moxiecode have put an amazing amount of work and polish into TinyMCE. 
It is now the first class option it should be and no longer behind other platforms.&amp;nbsp; At the same time, Mike no longer has time to update the Xinha4WP plugin, and Xinha&amp;#8217;s last stable release was over 8 months&amp;nbsp;ago.&lt;/p&gt;&lt;p&gt;That&amp;#8217;s where I come in.&amp;nbsp; I&amp;#8217;m now one of the core Xinha developers and we&amp;#8217;ve just published a new beta release (&lt;a title="Xinha 0.96 Phoenix Beta" href="http://xinha.webfactional.com/wiki/PhoenixRelease"&gt;0.96 Phoenix beta&lt;/a&gt;).&amp;nbsp; In addition, we&amp;#8217;ve received Mike&amp;#8217;s blessing to take over the Xinha4WP plugin and bring it up to&amp;nbsp;date.&lt;/p&gt;&lt;p&gt;Because of the time lapse, we&amp;#8217;re now playing catch up to TinyMCE in terms of integration into wordpress, but we&amp;#8217;ve added some long needed features for our first new&amp;nbsp;release.&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Autosave was added to wordpress 2.2, two years and five versions ago.&amp;nbsp; Now we sync textareas to allow this feature to&amp;nbsp;work.&lt;/li&gt;&lt;li&gt;Up until recently, Xinha4WP always enabled Xinha, and required the user
to disable TinyMCE.&amp;nbsp; Now we auto-disable TinyMCE and respect the user&amp;#8217;s
visual editing&amp;nbsp;preference.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;There are some outstanding issues in this release as well.&amp;nbsp; While Xinha normally supports autoresizing, our embedded version doesn&amp;#8217;t correctly resize with the page (requiring a page refresh).&amp;nbsp; What&amp;#8217;s worse, since TinyMCE supports user-draggable resizing, the default size of the visual editor is a bit cramped for normal use.&amp;nbsp; Together, these two issues are a bit of a pain point for writers.&amp;nbsp; While we&amp;#8217;re working on the fix, you can use Xinha&amp;#8217;s full screen mode to provide a comfortable editing space for your blog&amp;nbsp;post.&lt;/p&gt;&lt;p&gt;In brief, it&amp;#8217;s been a while, and the people over at wordpress have put a lot of effort into TinyMCE integration, which has become a viable option.&amp;nbsp; If, however, you crave more from your users, take a look at Xinha.&amp;nbsp; We&amp;#8217;ve got some catching up to do, but we&amp;#8217;ve got a great alternative and we can only go up from&amp;nbsp;here.&lt;/p&gt;&lt;p&gt;If you&amp;#8217;re interested in finding out more, &lt;a title="Xinha4WP 0.96 Beta" href="http://oss.openplans.org/xinhatools/raw-attachment/wiki/Downloads/xinha4wp_0.96beta.tbz2"&gt;download the release&lt;/a&gt; and come join our &lt;a title="Xinha4WP mailing list" href="http://www.openplans.org/projects/xinha4wp/lists/xinha4wp-discussion"&gt;mailing list&lt;/a&gt;!&lt;/p&gt;</content></entry><entry><id>http://douglas.mayle.org/2009/01/22/n-one-webs-tough-problems/</id><title type="html">"\n" - One of the Web's Tough Problems</title><updated>2009-01-23T02:33:06Z</updated><published>2009-01-22T22:03:48Z</published><category term="firefox" /><category term="javascript" /><category term="mozilla" /><category term="planetdev" /><category term="xinha" /><link href="http://douglas.mayle.org/2009/01/22/n-one-webs-tough-problems/" rel="self" /><link 
href="http://douglas.mayle.org/2009/01/22/n-one-webs-tough-problems/" rel="alternate" /><content type="html">&lt;p&gt;So I&amp;#8217;ve got you.&amp;#160; This doesn&amp;#8217;t make much sense, does it?&amp;nbsp; All except the &lt;a title="Dinosaurs" href="http://upload.wikimedia.org/wikipedia/commons/c/ce/Styracosaurus_Baltow_20051003_1315.jpg"&gt;oldest&lt;/a&gt; among you used a newline in your &lt;a title="Hello world!" href="javascript:alert('Hello world!\n');"&gt;first program&lt;/a&gt;.&amp;nbsp; The tough problem is not re-displaying the page as it changes; that&amp;#8217;s easy.&amp;nbsp; Okay, maybe not easy, but at least it&amp;#8217;s already been&amp;nbsp;solved.&lt;/p&gt;&lt;p&gt;Imagine that you&amp;#8217;re typing an email to a lovely lady you want to move here from St. Petersburg.&amp;nbsp; You&amp;#8217;ve finished a paragraph about naked scuba diving, you&amp;#8217;ve told her about your pet rock collection and now it&amp;#8217;s time to add your closing line.&amp;nbsp; &amp;#8220;Sincerely yours&amp;#8221; is a bit too formal, and &amp;#8220;With all my love&amp;#8221; might scare her off.&amp;nbsp; In any case, if you can&amp;#8217;t hit enter to type that line, then she&amp;#8217;s never going to move here and marry&amp;nbsp;you.&lt;/p&gt;&lt;p&gt;Well, it can&amp;#8217;t be too tough a problem, can it?&amp;nbsp; Olav Kjær wrote a &lt;a title="Rich HTML editing in the browser: part 1" href="http://dev.opera.com/articles/view/rich-html-editing-in-the-browser-part-1/"&gt;great article&lt;/a&gt; about the problems and inconsistencies involved.&amp;nbsp; &lt;span class="caps"&gt;HTML&lt;/span&gt; is a great language for documents (I write software for the web, they make me say that), but its rules for containing text are pretty lax.&amp;nbsp; And when you give a user a mouse and allow him to just click anywhere on the page?&amp;nbsp; That&amp;#8217;s just crazy&amp;nbsp;talk!&lt;/p&gt;&lt;p&gt;Why do I care?&amp;nbsp; Well, I was 
editing a *cough* wiki page on &lt;a title="OpenPlans is a platform for social activism." href="http://www.openplans.org/"&gt;OpenPlans.org&lt;/a&gt; using the Firefox 3 Beta (took me awhile to finish off this post, eh?).&amp;nbsp; When I clicked in the middle of the page and hit enter to start typing a new paragraph, half of my page disappeared.&amp;nbsp; &lt;a title="Hitting enter in certain documents causes the rest of the text to disappear in Firefox 3" href="http://trac.xinha.org/ticket/1226"&gt;Expected results&lt;/a&gt;?&amp;nbsp; Uh, a new paragraph?&amp;nbsp; We use Xinha, the open source &lt;span class="caps"&gt;WYSIWYG&lt;/span&gt; editor, and a pretty old version at that, but there was no problem in any other browser, or in previous versions of&amp;nbsp;Firefox.&lt;/p&gt;&lt;p&gt;So I did what any &lt;a title="A Web Developer's Responsibility" href="http://ejohn.org/blog/a-web-developers-responsibility/"&gt;self-respecting software engineer&lt;/a&gt; does when the problem&amp;#8217;s not in his code, and he can&amp;#8217;t understand it. 
I &lt;a title=" iframe.contentDocument.getSelection() reports selection when nothing is selected if BR element is on the line" href="https://bugzilla.mozilla.org/show_bug.cgi?id=437672"&gt;blamed someone else&lt;/a&gt;.&amp;nbsp; I was so worried about supporting the problem for the life of Firefox 3 that I even called &lt;a title="All Around Cool Guy" href="http://ejohn.org/"&gt;John Resig&lt;/a&gt;, Mozilla&amp;#8217;s Javascript Evangelist.&amp;nbsp; You&amp;#8217;ll notice that he&amp;#8217;s removed the phone number from his site (sorry&amp;nbsp;John).&lt;/p&gt;&lt;p&gt;After filing the bug, I started to search through snapshots of Minefield (the testing and development version of Firefox), and was able to narrow it down to one commit.&amp;nbsp; Looking at the source (in nsRange.cpp), it turns out that the change was a bugfix that caused Firefox to correctly implement the &lt;span class="caps"&gt;W3C&lt;/span&gt; Range standard.&amp;nbsp; Before the fix, trying to create a backwards selection raised an exception, and after the fix it returned an empty selection, as it was supposed to.&amp;nbsp; That meant the Xinha code depended on broken behavior, and that I had work to&amp;nbsp;do.&lt;/p&gt;&lt;p&gt;My first job was to find out what was wrong.&amp;nbsp; That&amp;#8217;s easy: I have access to the code, so I just have to find out where things went wrong.&amp;nbsp; Eeek.&amp;nbsp; &amp;#8220;processRng&amp;#8221; and &amp;#8220;processSide&amp;#8221;.&amp;nbsp; Well, it&amp;#8217;s pretty obvious what those two functions do.&amp;nbsp; The first processes a range, and the second processes a side.&amp;nbsp; Thanks to some helpful comments, I know that it returns a neighbor node, an insertion type, and a &amp;#8220;roam&amp;#8221;. 
What that means?&amp;nbsp; No idea.&amp;nbsp; My favorite&amp;nbsp;comment?&lt;/p&gt;&lt;blockquote&gt;&amp;#8220;I do not profess to understand any of this, simply applying a patch that others say is good &amp;mdash; &lt;a title="Paragraph tags acting weird in Firefox 1.0.6" href="http://trac.xinha.org/ticket/446"&gt;ticket:446&lt;/a&gt;&amp;#8221;&lt;/blockquote&gt;&lt;p&gt;After stepping through the code I was finally able to figure out what it was trying to do.&amp;nbsp; It divides the document into two pieces.&amp;nbsp; It cuts out everything from the current cursor to the end of the document, inserts a break, and then pastes it back in again.&amp;nbsp; It sounds like a simple enough idea, but I couldn&amp;#8217;t for the life of me figure out what was going wrong.&amp;nbsp; So again I did what any self-respecting software engineer would do.&amp;nbsp; I decided to &lt;a title="Things You Should Never Do, Part I" href="http://www.joelonsoftware.com/articles/fog0000000069.html"&gt;rewrite the algorithm from scratch&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;It&amp;#8217;s now six months later, and I&amp;#8217;ve finally nailed this bug.&amp;nbsp; Of course, other things happened in between, but that&amp;#8217;s always the case.&amp;nbsp; Let&amp;#8217;s take a look at what&amp;#8217;s so tough about&amp;nbsp;newlines.&lt;/p&gt;&lt;ol&gt;&lt;li&gt;The first difficult problem is determining user intent.&amp;nbsp; If the user finishes typing a heading and then hits enter, they probably want to start typing text in a new paragraph.&amp;nbsp; If, however, the cursor is between two sentences in that heading, they probably want to split it into two headings.&amp;nbsp; In a table cell, they probably just want a line break.&amp;nbsp; If they&amp;#8217;re editing a definition list, they might want to insert a new definition, a new term, or even split two sentences into two separate definitions or&amp;nbsp;terms.&lt;/li&gt;&lt;li&gt;The second problem is cursor 
position.&amp;nbsp; Since a cursor is defined as a pointer to a node and an offset, in the following &lt;span class="caps"&gt;HTML&lt;/span&gt; snippet, the position just before the letter &amp;#8216;T&amp;#8217; can be targeted with two different cursors.&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;&amp;lt;p&amp;gt;&amp;lt;em&amp;gt;&lt;span style="text-decoration: underline;"&gt;T&lt;/span&gt;ext&amp;lt;/em&amp;gt;&amp;lt;/p&amp;gt;&lt;/code&gt;&lt;/blockquote&gt;The first would be a pointer to the &amp;#8220;&amp;lt;em&amp;gt;&amp;#8221; element with an offset of zero.&amp;nbsp; This would mean we were pointed at the text node.&amp;nbsp; The second would be a pointer to the text node with a zero offset.&amp;nbsp; In this case, we are pointing at the characters of text, and not at a&amp;nbsp;node.&lt;/li&gt;&lt;li&gt;Third is what it means to break a line.&amp;nbsp; In a list, breaking a line means creating a new list item.&amp;nbsp; In a preformatted block, it means a newline character.&amp;nbsp; In a table cell, you want a &amp;#8220;&amp;lt;br&amp;gt;&amp;#8221; element, and in a paragraph you want a new paragraph.&amp;nbsp; I won&amp;#8217;t even get into how this changes for&amp;nbsp;shift-enter.&lt;/li&gt;&lt;li&gt;The final tough problem is inline elements.&amp;nbsp; The formatting of the text at a given cursor position is the result of a tree of inline elements that heads up towards the containing block.&amp;nbsp; When splitting that block, you have to create a duplicate of this tree with all of the same elements, and you have to split each inline element into the parts that come before the cursor, and the parts that come&amp;nbsp;after.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;After having finished the majority of this back when I found the bug, I shelved the code and moved on to other things.&amp;nbsp; With the help of my colleague, &lt;a title="hacker, activist, and life-long unschooler" href="http://www.nicholasbs.net/"&gt;Nicholas Bergson-Shilcock&lt;/a&gt;, 
I&amp;#8217;ve picked this up again and finished it off.&amp;nbsp; This means that the new &lt;a title="Xinha 0.96 Phoenix Beta" href="http://trac.xinha.org/wiki/PhoenixRelease"&gt;Phoenix Release&lt;/a&gt; (0.96) of Xinha will get a bugfix that makes Firefox 3 usable&amp;nbsp;again.&lt;/p&gt;&lt;p&gt;All of the code for this fix is pluggable, and should be usable by anyone needing to break lines in &lt;span class="caps"&gt;HTML&lt;/span&gt;.&amp;nbsp; The only dependency is on &lt;span class="caps"&gt;W3C&lt;/span&gt; Ranges and &lt;span class="caps"&gt;DOM&lt;/span&gt; Selections.&amp;nbsp; Luckily, there&amp;#8217;s been talk of a cross-platform &lt;span class="caps"&gt;W3C&lt;/span&gt; Range and &lt;span class="caps"&gt;DOM&lt;/span&gt; Selection&amp;nbsp;library.&lt;/p&gt;&lt;p&gt;When the guys over at 37signals released their own super-light-weight &lt;span class="caps"&gt;WYSIWYG&lt;/span&gt; editor WysiHat, they talked about wanting to help with the problem.&amp;nbsp; &lt;a title="Mozilla Inline Editor" href="http://mozile.mozdev.org/"&gt;Mozile&lt;/a&gt;, the Mozilla Inline Editor, actually has one, but it&amp;#8217;s too tied to the editor to be useful elsewhere.&amp;nbsp; TinyMCE goes the other way and has an &lt;span class="caps"&gt;IE&lt;/span&gt; TextRange implementation for Firefox, and I&amp;#8217;ve recently been told that &lt;a title="the text editor for internet" href="http://www.fckeditor.net/"&gt;FCKEditor&lt;/a&gt; has the beginnings of a usable library.&amp;nbsp; I&amp;#8217;ve implemented the tough parts twice now (finding the &lt;span class="caps"&gt;DOM&lt;/span&gt; node and offset of the range&amp;#8217;s start and end points) and learned the best way to do it.&amp;nbsp; For the next release of Xinha (0.97) I hope to bring my work together with the work of all other interested parties and release it as a library.&amp;nbsp; When we do that, users will finally be able to go back and forth between browsers and not have to fight to edit 
a&amp;nbsp;document.&lt;/p&gt;&lt;p&gt;Until&amp;nbsp;then&amp;#8230;&lt;/p&gt;</content></entry><entry><id>http://douglas.mayle.org/2009/01/08/blah-blah-blog-post-getting-it-out/</id><title type="html">Blah-blah Blog Post - Getting It Out</title><updated>2009-01-09T01:16:34Z</updated><published>2009-01-08T19:01:04Z</published><category term="planetdev" /><link href="http://douglas.mayle.org/2009/01/08/blah-blah-blog-post-getting-it-out/" rel="self" /><link href="http://douglas.mayle.org/2009/01/08/blah-blah-blog-post-getting-it-out/" rel="alternate" /><content type="html">&lt;p&gt;No more excuses, I&amp;#8217;m publishing this.&amp;#160; This is a post about getting it done; something to help me write.&amp;nbsp; Something to help get over writer&amp;#8217;s&amp;nbsp;block.&lt;/p&gt;&lt;p&gt;I enjoy writing, and I think it helps me to be better organized.&amp;nbsp; When I started working for &lt;a title="The Open Planning Project" href="http://topp.openplans.org/"&gt;&lt;span class="caps"&gt;TOPP&lt;/span&gt;&lt;/a&gt;, we were encouraged to blog, which is one of the things I like about working here.&amp;nbsp; (The wild, swinging-from-the-rafters parties aren&amp;#8217;t so bad either.)&amp;nbsp; Great policy, but it only helped me to keep up blogging for about five minutes (10 if you count a drunken blog post about &lt;a title="Wine and Coke" href="http://en.wikipedia.org/wiki/Calimocho"&gt;Calimocho&lt;/a&gt;).&amp;nbsp; The funny thing is that I&amp;#8217;ve always wanted to write more, and I&amp;#8217;ve often wanted to come back and just do&amp;nbsp;it.&lt;/p&gt;&lt;p&gt;Well, I finally found my voice this summer, and started posting about more technical issues.&amp;nbsp; (Not that I&amp;#8217;ve been prolific.&amp;nbsp; Unless you count my drafts folder.)&amp;nbsp; I&amp;#8217;ve done some blogging on this site as well as a couple of the work blogs, and I&amp;#8217;m really looking forward to a guest post for my favorite political blog, &lt;a 
title="Ideas so open they’ll poke your eyes out." href="http://digifesto.wordpress.com/"&gt;Digifesto&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;However, like anyone trying to start a new habit, especially one in an area of non-expertise, I found excuses not to write, or I&amp;#8217;d start and never finish.&amp;nbsp; I knew this was going to happen.&amp;nbsp; It&amp;#8217;s pretty well known that writing is not easy, especially keeping it up regularly.&amp;nbsp; I&amp;#8217;d thought I&amp;#8217;d be clever by getting started on a couple of downtime posts.&amp;nbsp; That way, when I hit a slowdown, I could just pick one and finish it.&amp;nbsp; It turns out for me, however, having a bunch of unfinished posts wasn&amp;#8217;t helping.&amp;nbsp; When I got to a tough point in a post, I&amp;#8217;d turn away, or start a new draft for later.&amp;nbsp; All of that &amp;#8220;unfinished&amp;#8221; work started to drag on me, and 50 pounds of blog posts really make your muscles&amp;nbsp;sore.&lt;/p&gt;&lt;p&gt;Well, this is a kick in my pants.&amp;nbsp; Each line you read is one giant boot to my tookus.&amp;nbsp; (You&amp;#8217;re still reading?&amp;nbsp; Kind of cruel, don&amp;#8217;t you think?&amp;nbsp; What does that say about&amp;nbsp;you?)&lt;/p&gt;&lt;p&gt;Well, I recently finished reading &lt;a href="http://www.pragprog.com/titles/ahptl/pragmatic-thinking-and-learning"&gt;Pragmatic Thinking and Learning&lt;/a&gt;, a book about personal productivity (Thanks, &lt;a title="Congratulations on the new kid!" 
href="http://twitter.com/whitmo"&gt;Whit&lt;/a&gt;!).&amp;nbsp; (It&amp;#8217;s by the same authors as &lt;a href="http://www.pragprog.com/titles/tpp/the-pragmatic-programmer"&gt;The Pragmatic Programmer&lt;/a&gt;, which you might be familiar with.)&amp;nbsp; Something they spend a considerable amount of ink on (and carriage returns) is how to focus, and how to transition mentally from the part of your mind that sticks up roadblocks to the part that really flows and has the great&amp;nbsp;ideas.&lt;/p&gt;&lt;p&gt;Inside is the story of a client they were trying to get started with &lt;a title="Get out of bed you bum!" href="http://freelanceswitch.com/productivity/create-a-morning-writing-ritual/"&gt;morning writing&lt;/a&gt;.&amp;nbsp; (It&amp;#8217;s a tool to harness some of the great ideas that you have and forget about, or just plain ignore.) He thought the exercise was a bit ridiculous, and so couldn&amp;#8217;t get anything written down.&amp;nbsp; They told him to just fill a couple of pages with nonsense sentences (Blah blah blah, I&amp;#8217;m writing a sentence) to get over the mental block.&amp;nbsp; Well, it took him a couple of weeks, but he started having some great ideas and got to actually&amp;nbsp;writing.&lt;/p&gt;&lt;p&gt;Luckily for me, I&amp;#8217;m not fighting this quite so overtly, but I still seem to get in my own way.&amp;nbsp; That&amp;#8217;s why I&amp;#8217;ve decided to take the same advice every time I want to blog.&amp;nbsp; This post started with a couple of notes (since I already had the topic) but each time I got blocked, I just wrote down a couple of paragraphs of nonsense (Blah blah, software bugs are good, people like using Windows, etc.).&amp;nbsp; It didn&amp;#8217;t matter if I just didn&amp;#8217;t have any ideas, or something interrupted my train of thought. I kept belting it out, and managed to make it all the way&amp;nbsp;through.&lt;/p&gt;&lt;p&gt;So, here it is. 
I hope you find the idea useful as&amp;nbsp;well.&lt;/p&gt;</content></entry><entry><id>http://douglas.mayle.org/2008/10/03/the-wild-west-of-javascript/</id><title type="html">The wild west of javascript.</title><updated>2008-11-14T22:06:41Z</updated><published>2008-10-03T21:35:18Z</published><category term="ie" /><category term="javascript" /><category term="planetdev" /><category term="xinha" /><link href="http://douglas.mayle.org/2008/10/03/the-wild-west-of-javascript/" rel="self" /><link href="http://douglas.mayle.org/2008/10/03/the-wild-west-of-javascript/" rel="alternate" /><content type="html">&lt;p&gt;Just last week, I was working on the new version of &lt;a title="Xinha WYSIWYG editor" href="http://www.xinha.org"&gt;Xinha&lt;/a&gt;.&amp;#160; If you don't know, Xinha's a web-based document editor.&amp;nbsp; Embed it in your blog, your web software, so that you and your users can create web documents. Xinha is &lt;a title="What You See Is What You Get" href="http://en.wikipedia.org/wiki/Wysiwyg"&gt;WYSIWYG&lt;/a&gt;, so there's no need to know HTML.&amp;nbsp; &lt;a href="http://topp.openplans.org/"&gt;The Open Planning Project&lt;/a&gt;, my employer, uses Xinha to power &lt;a title="OpenPlans, the platform for social activism." 
href="http://www.openplans.org"&gt;OpenPlans&lt;/a&gt;, which is why I get to work on it.&amp;nbsp; Xinha is &lt;a href="http://en.wikipedia.org/wiki/Open_source_software"&gt;Open Source Software&lt;/a&gt;, so we use it, and contribute fixes and enhancements back to the original project.&lt;/p&gt;&lt;p&gt;I was working with &lt;a href="http://www.nicholasbs.com/"&gt;Nicholas Bergson-Shilcock&lt;/a&gt;, my colleague, on his new plugin for Xinha.&amp;nbsp; With this plugin, you can finally make great footnotes in your documents.&amp;nbsp; We were testing his code on &lt;a href="http://en.wikipedia.org/wiki/Internet_Explorer"&gt;Internet Explorer&lt;/a&gt;, and we noticed IE acting strange.&amp;nbsp; Now I don't mean normal IE strange, IE is the bane of all web developers, so I'm used to strange.&amp;nbsp; (If you use IE, then please don't.&amp;nbsp; I don't care whether you use &lt;a href="http://www.mozilla.com/firefox/"&gt;Mozilla Firefox&lt;/a&gt;, &lt;a href="http://www.google.com/chrome"&gt;Google Chrome&lt;/a&gt;, &lt;a href="http://www.opera.com/"&gt;Opera&lt;/a&gt;, &lt;a href="http://www.apple.com/safari/"&gt;Apple Safari&lt;/a&gt;, or if you connect to web servers directly with telnet.&amp;nbsp; Just do all web developers a favor and stop using IE.)&lt;/p&gt;&lt;p&gt;When I say strange, I mean screwy.&amp;nbsp; Certain places in the document just didn't seem to exist.&amp;nbsp; His code used Xinha in different ways than the rest of the plugins, so we were expecting edge cases.&amp;nbsp; But black holes?&amp;nbsp; Nobody expects black holes! 
&lt;/p&gt;&lt;p&gt;Editable documents are still the wild west of web development, and so I shouldn't be surprised.&amp;nbsp; Javascript and the DOM have their &lt;a title="The Write Less, Do More, JavaScript Library" href="http://jquery.com/"&gt;Wyatt Earp&lt;/a&gt; and &lt;a title="Easy Ajax and DOM manipulation for dynamic web applications" href="http://www.prototypejs.org/"&gt;Doc Holliday&lt;/a&gt;, but document editing is too new to have seen the same kind of law enforcement.&amp;nbsp; When it comes to selection, manipulation, and document processing, the browser differences aren't well defined, and there are no libraries to abstract the problems.&amp;nbsp; Even Peter-Paul Koch (of &lt;a href="http://www.quirksmode.org/"&gt;QuirksMode&lt;/a&gt;) told me that "IE's TextRange is a disaster" when I asked for help.&lt;/p&gt;&lt;p&gt;After a bit of exploring the problem we figured out exactly what happens.&amp;nbsp; In Internet Explorer, you can't select the end of a text node (in javascript) if it's followed by a block node.&amp;nbsp; That means that for the valid HTML snippet:&lt;/p&gt;&lt;pre name="code" class="xml"&gt;&amp;lt;div&amp;gt;
  This is my first line
  &amp;lt;p&amp;gt;This is my second line&amp;lt;/p&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/pre&gt;&lt;p&gt;You can't touch the end of the first line.&amp;nbsp; Let me say that again, &lt;em&gt;you can't touch the end of the first line&lt;/em&gt;. What does that mean?&amp;nbsp; All of you DOM jockeys know how to get a reference to the node, and could manipulate the elements, but that's no help for the user.&lt;/p&gt;&lt;p&gt;Your user pushes that cursor beyond the event horizon.&amp;nbsp; They click on your footnote button to bring up a dialog.&amp;nbsp; You insert the text they type, and BAM!&amp;nbsp; The cursor's not where the user left it; you've just crapped markup at some other place in the document.&amp;nbsp; When you do things like that, users start to fear pressing buttons, and we can't have that.&lt;/p&gt;&lt;p&gt;Why haven't we seen it before?&amp;nbsp; Xinha was using pop-ups for dialogs, and they don't change the original selection.&amp;nbsp; Now that we've moved to a lightbox-style dialog system, we're moving the cursor about on the page, and we don't have a way to move it back.&lt;/p&gt;&lt;p&gt;How do we fix it?&amp;nbsp; Our first step was to test in IE8 beta to see if it was fixed.&amp;nbsp; No such luck; sometimes I wonder why I'm an optimist. 
;-)&amp;nbsp; My next step was to try out &lt;a href="http://stackoverflow.com/"&gt;StackOverflow&lt;/a&gt;, the new Jeff Atwood / Joel Spolsky software development community.&amp;nbsp; It's pretty hot right now, so I thought it would be a good place to get help, but again, &lt;a title="IE TextRange select method not working properly" href="http://stackoverflow.com/questions/130186/ie-textrange-select-method-not-working-properly"&gt;no go&lt;/a&gt;.&amp;nbsp; The only answer I got was from someone who &lt;a href="http://stackoverflow.com/questions/130186/ie-textrange-select-method-not-working-properly#149310"&gt;seemed to remember some comments related to this bug in Javascript&lt;/a&gt;.&amp;nbsp; I tried to find the software he was referring to, but no bugfix there.&amp;nbsp; &lt;a title="The text editor for Internet" href="http://www.fckeditor.net/"&gt;FCKeditor&lt;/a&gt; doesn't have a fix.&amp;nbsp; Neither does &lt;a title="Javascript WYSIWYG Editor" href="http://tinymce.moxiecode.com/"&gt;TinyMCE&lt;/a&gt;. Wikipedia offered up &lt;a href="http://geniisoft.com/showcase.nsf/WebEditors"&gt;this link&lt;/a&gt; to a list of 5000 web-based editors.&amp;nbsp; I tried them all, and all of the software not using pop-ups had the exact same bug.&lt;/p&gt;&lt;p&gt;So, what can we do?&amp;nbsp; I tried to see if there was a way to trick IE into moving the selection to where we want.&amp;nbsp; I tried moving the selection left, or right, and then back again.&amp;nbsp; I tried inserting content, then deleting it.&amp;nbsp; Unfortunately, there was no direct way to solve the problem.&amp;nbsp; We ended up with three different workarounds, all of which have drawbacks, but are better than no solution at all:&lt;/p&gt;&lt;dl&gt;&lt;dt&gt;Change the justification&lt;/dt&gt;&lt;dd&gt;If you change the justification on the current selection, IE modifies the document so that the selection continues to work.&amp;nbsp; Set it to no justification, and you even get valid HTML! 
Unfortunately, it re-parents the following element, moving it one node closer to the root of the document.&lt;/dd&gt;&lt;dt&gt;Insert an empty span&lt;/dt&gt;&lt;dd&gt;This works by making sure that you are attempting to select the span element, rather than a text node, and element selection actually works in IE.&amp;nbsp; It craps spans all over the document, though, and even though we try to clean these up, you never know.&lt;/dd&gt;&lt;dt&gt;Insert a visual cue&lt;/dt&gt;&lt;dd&gt;The final method works by inserting a visual cue for the user in the form of a little block (□), then selecting it.&amp;nbsp; If we're about to modify the document, or the user begins to type, the block will be removed automatically.&amp;nbsp; In any other case, the user will see the block and naturally want to delete it from the text.&lt;/dd&gt;&lt;/dl&gt;&lt;p&gt;All three are written into the &lt;a href="http://trac.xinha.org/browser/trunk/modules/InternetExplorer/InternetExplorer.js#L451"&gt;code&lt;/a&gt;, but we decided to default to the visual cue, because it's the safest in terms of damaging the markup.&amp;nbsp; Otherwise, we've done everything we could to avoid triggering the error, so we hope it won't affect too many users; it's always a trade-off.&lt;/p&gt;&lt;p&gt;I wrote this to get some visibility for this problem.&amp;nbsp; This is probably just some sort of off-by-one error, and IE8 is still in beta, so maybe it can still get fixed.&amp;nbsp; If not, at least you'll have a way to work around the problem when you run into it.&lt;/p&gt;</content></entry></feed>
