<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:admin="http://webns.net/mvcb/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
<channel>
    <title>The Text Frontier</title>
    <link>http://blogs.sas.com/text-mining/</link>
    <description>Text mining, voice mining and unstructured data analysis</description>
    <dc:language>en</dc:language>
    <generator>Serendipity 1.0.3 - http://www.s9y.org/</generator>
    <pubDate>Wed, 18 Nov 2009 18:58:41 GMT</pubDate>

    <image>
        <url>http://blogs.sas.com/text-mining/templates/wwwgeneric/img/rss_banner.png</url>
        <title>RSS: The Text Frontier - Text mining, voice mining and unstructured data analysis</title>
        <link>http://blogs.sas.com/text-mining/</link>
        <width>1</width>
        <height>1</height>
    </image>

<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/TheTextFrontier" type="application/rss+xml" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><item>
    <title>On Text Data Quality</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/W_OBShBB-m0/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/48-On-Text-Data-Quality.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=48</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=48</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    <br />
<a href="http://altaplana.com/grimesbio.html" title="Seth's Bio">Seth Grimes </a>posted a fantastic article on <a href="http://www.b-eye-network.com/view/12072"  title="B-eye Network article by Seth Grimes">Text Data Quality </a>yesterday.  A must read for anyone in this space.  The article points to some of the text quality issues I have mentioned in my <a href="http://blogs.sas.com/text-mining/index.php?/archives/46-Google,-Bing,-Twitter-and-Instant-Web-Search.html"  title="Manya's last blog entry">last two blogs</a>.  Text is in a league of its own when it comes to data quality.  And the more you have to work with social media generated data, the more you will run into non-standard text and the need for text cleansing.  I presented a workshop where I talked about "The Ten Transgressions of Text" at the <a href="http://www.textanalyticsnews.com/usa/"  title="Text Analytics Summit Web site">Text Analytics Summit </a>in June:<br />
<br />
1. UPPERCASE/lowercase<br />
2. Miss-spelings<br />
3. A.C.R.O.N.Y.M.S<br />
4. Shrt-hnd or clipped text (e.g. hmm tink nid >2 twitter acs; els msgs all jumbled up btwn personal &amp; thots! dilemma!)<br />
5.  Pr☺f@nity<br />
6. !!NOISY TEXT!!<br />
7. /*Punctuation*\<br />
8. ♪  Voice ♫<br />
9. Email / Attachments<br />
10. Poor grammar <br />
<br />
Customers ask me if we can automatically remove profanity from documents and, yes, WE CAN!<br />
<br />
I have a huge interest in the sorts of shortened/clipped texts that you get in text messages or via Twitter.  There is a lot that text analytics users and vendors can do to work with this data.  Terms like "cul8r" (see you later) or "LOL" (laughing out loud / lots of love) could be expanded into their intended forms, mapped to synonyms (we provide ontologies to handle this), or left as is.  When a shortened term can mean several different things depending on context, that's when linguistics can help.  I see a big need for including this new 'language' in standard language dictionaries.<br />
<br />
Adhering to the standard rules of grammar looks like a thing of the past.  As traditional print media loses favor, so will standard grammar in social media (blogs, micro-blogs such as Facebook, Twitter, Bebo, etc.). I'm excited to see how other natural language processing technologies will change to accommodate the new breed of user. 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/W_OBShBB-m0" height="1" width="1"/>]]></content:encoded>

    <pubDate>Wed, 18 Nov 2009 13:30:03 -0500</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/48-guid.html</guid>
    <category>data quality</category>
<category>social media</category>
<category>text mining</category>
<category>textspeak</category>
<category>twitter</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/48-On-Text-Data-Quality.html</feedburner:origLink></item>
<item>
    <title>Google, Bing, Twitter and Instant Web Search</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/HXQeiuzQ3ew/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/46-Google,-Bing,-Twitter-and-Instant-Web-Search.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=46</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=46</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    I read an interesting article this morning entitled <a href="http://www.usatoday.com/tech/products/2009-11-03-real-time-search_N.htm?csp=usat.me"  title="USA Today article">Companies race to offer instant Web search, including Twitter</a> by USA TODAY reporter <a href="http://www.usatoday.com/community/tags/reporter.aspx?id=321"  title="Jon Swartz articles">Jon Swartz </a>. The recent announcements that <a href="http://www.mercurynews.com/breaking-news/ci_13609992?nclick_check=1"  title="Google, Bing announcement">Google and Bing will include Twitter in search results </a>raise some intriguing questions – and challenges. Ever seen a Tweet like this? <br />
<br />
hmm tink nid >2 twitter acs; els msgs all jumbled up btwn personal &amp; thots! dilemma!<br />
<br />
What search methodology is going to pick that up? Why, why not and who cares? And what do you do with it anyway?<br />
<br />
Given Twitter information is publicly available, it makes sense that using natural language search is extremely valuable for finding information - whether you are monitoring brand information or looking for answers.  I bet that one large US airline would love to see the many, many YouTube hits about <a href="http://www.youtube.com/watch?v=5YGc4zOqozo"  title="United Breaks Guitars">breaking a musical instrument</a> go away!  <br />
<br />
The fact remains that you will get a lot of search results that you still have to deal with.  Say you work for Target, and you want to monitor the Twittersphere. Try searching Twitter for the word "<a href="http://twitter.com/#search?q=target"  title="Twitter search for 'target'">target</a>".  Results may include target run rates for the recent NZ cricket test match, the US retail store, target marketing references or target practice. A quick search for "target" on <a href="http://thesaurus.reference.com/browse/target"  title="thesaurus.com search results">thesaurus.com </a> shows target used as both a noun and a verb.  Target is also a proper noun.  A graphical link from thesaurus.com to Visual Thesaurus gives a hint of the many ways the word target can be used. <br />
<br />
Including text analytics capabilities means you can group similar documents together, thus creating categories around the document meaning.  This allows you to automatically weed out the information you don't need and get to the information you need.  Furthermore, the inclusion of text analytics in social media analysis means you can analyze comments, determine relevance, find promoters, detractors and much more!<br />
<br />
Stay tuned as we explore this topic more – it’s not going away!<br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/HXQeiuzQ3ew" height="1" width="1"/>]]></content:encoded>

    <pubDate>Wed, 04 Nov 2009 14:24:49 -0500</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/46-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/46-Google,-Bing,-Twitter-and-Instant-Web-Search.html</feedburner:origLink></item>
<item>
    <title>What's in a name?  YOUR BRAND!!!</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/j1APc2iP_G8/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/45-Whats-in-a-name-YOUR-BRAND!!!.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=45</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=45</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    With the explosion in social media sites turning us all into digital producers as opposed to digital consumers, businesses are left in the unenviable position of needing to rapidly adapt to the new generation of customers who take their opinions global, and in a heartbeat!  <br />
<br />
These new age digital producers are leaving their opinions on social media sites at an exploding rate.  It pays for companies to keep track of what their customers are saying and to take the appropriate action where needed.  For companies that are looking to analyze blogs, tweets, etc., having a brand/product name that is commonplace and has multiple parts of speech in a dictionary will make that analysis job much more difficult.  Consider the word Jaguar, for example. Is it a car? A plane? An animal? A US sports team? An international sports team? A BBC show?  A Twitter search on Jaguar may pull back information you aren't so interested in.  Conversely, having an unusual name (not ideal according to traditional marketing strategy) means it's easier to find information about you, your company, or your products.  The Google name/brand would be a great example.  Do a Google search on "Google" and all of the results (I did spend a little time wading through these) look to be about the Google brand!!! Food for thought as you consider product naming going forward.<br />
<br />
And, who knows, there's probably someone out there that has named their son or daughter "Google", but surely so few of them that they wouldn't impact any brand analysis.  
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/j1APc2iP_G8" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 30 Oct 2009 13:54:08 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/45-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/45-Whats-in-a-name-YOUR-BRAND!!!.html</feedburner:origLink></item>
<item>
    <title>SAS is hiring - Text Analytics is Growing!</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/tStX-4mLPkM/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/44-SAS-is-hiring-Text-Analytics-is-Growing!.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=44</wfw:comment>

    <slash:comments>2</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=44</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    Things have been mighty busy here at SAS lately.  Please accept our apology for the infrequent blog postings.  The good news is that there have been some really interesting customer engagements going on that required our full attention.    I just returned from working an exhibit with <a href="http://meetings.informs.org/sandiego09/plenaries.html"  title="INFORMS san diego">INFORMS in San Diego</a>, where we had so many people ask about text analytics we ran out of handouts.  It's the buzz on social media, and the idea of mining the words and text we use in those communications, that seems to be catching on - Big Time!  What was traditionally an optimization operations research event has expanded into all sorts of analytics, with NLP and Text Analytics being cited in a dozen different sessions. <br />
<br />
Now - for the subject line of this blog entry........ I am delighted to announce that we are expanding our Text Analytics team.  We are looking for someone to perform pragmatic product management duties along with our existing Text Miner R&D experts, the Teragram Employees, and our Text Analytic consulting and sales engineers.  Text Analytics has grown to encompass much more than the SAS Text Miner product, which was originally launched as an "add on" piece to our Enterprise Miner offering.  Today Text Analytics includes more than predictive modeling so we are opening up new positions for experts -- thus the open position of <strong>Product Manager for SAS Text Analytics.  </strong><br />
If you have a Bachelor's degree in computer science, applied mathematics, statistics, or a related quantitative discipline and 5 years of experience in product management, consulting, or a related function in the software industry, we welcome your application.   Expertise in the application of text analytics methodologies is required, so I encourage those of you reading this blog to help spread the word to those who run one or more of the text analytics software packages now on the market.   <br />
<br />
The instructions on how to apply are provided by our Human Relations department <a href="http://www.sas.com/jobs/USjobs/apply.html"  title="SAS job applications instructions">here </a> <br />
the job number for the <u><strong>TEXT ANALYTICS position is - 09001816.</strong></u><br />
This position is located at SAS Headquarters Cary, NC near the RDU airport. <br />
<br />
I invite you to browse all our jobs as posted on the main SAS web page, which you can find by selecting CAREERS on the top horizontal menu bar, and then clicking on <a href="http://www.sas.com/jobs/USjobs/index.html#"  title="SAS professional opportunities">professional opportunities</a>.   You will see that we are also seeking experienced consultants for our Advanced Analytics Lab.  So those of you familiar with our SAS software may wish to apply to those jobs also.  It's exciting that these technologies and the people who run them are now IN DEMAND <img src="http://blogs.sas.com/text-mining/templates/wwwgeneric/img/emoticons/wink.png" alt=";-)" style="display: inline; vertical-align: bottom;" class="emoticon" /><br />
<br />
Meanwhile we on the SAS team are having fun enhancing Text Miner to make the next release of version 4.2 available in early December 2009.  This will be the release with many Teragram capabilities woven into the Text Miner product.  More on that topic in a future blog.  <br />
<br />
Thanks for reading and<br />
...... Thanks for carrying the message out to your boss and your clients that <em><strong>Yes indeed </strong></em> , "Text Analytic Technologies are ready TODAY to be applied and make impacts in our world".   <br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/tStX-4mLPkM" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 16 Oct 2009 17:24:22 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/44-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/44-SAS-is-hiring-Text-Analytics-is-Growing!.html</feedburner:origLink></item>
<item>
    <title>Man versus Machine ---Logical operator "&amp;"  or "V"?</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/QAzcmxSV9xA/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/42-Man-versus-Machine-Logical-operator-or-V.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=42</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=42</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    Last month at the <a href="http://www.sigir2009.org/"  title="Association for Computing Machinery's Special Interest Group on Information Retrieval">SIGIR meeting in Boston </a>, one of the presentations given by <a href="http://twitter.com/guppywon/statuses/2779603408"  title="tweet on NY times use of Teragram">a Teragram customer attracted notice in a Twitter post. </a>   <br />
<br />
The NY Times automated the tagging of topics for their website by implementing software that automatically builds their indexes. However, as the tweet points out, the machine has NOT replaced Man, because the newspaper continues to rely on MANUAL entries by people who maintain and build the <a href="http://www.library.wwu.edu/ref/howtoguides/nyti.htm"  title="NY Times Index">New York Times Index</a>, a more traditional index. <br />
<br />
<img width='125' height='108' style="float: left; border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/text-mining/uploads/Man-VS-Machine.jpg" alt="" /><br />
<br />
<a href="http://arnoldit.com/wordpress/2009/07/31/new-york-times-two-indexing-methods/"  title="stephen arnold blog">Stephen Arnold wondered in his blog</a> why an organization might continue to require human labor on a task a machine can now perform. Could it be political resistance to change? Or perhaps the machine fails sometimes? Perhaps the employees without the skills to be reassigned are in fact next in line for a "pink slip" as budgets get cut?  <br />
<br />
Mr. Arnold's ideas are all valid possibilities and I've seen cases of each in my experiences transferring technology from the research lab into business production environments.  Those who put a stake in the ground and step forward to be the first to serve as role models for how text analytics can carry their business forward - ought to pause and consider their own culture. <br />
<br />
Since the original question was about a Teragram customer implementation, I asked  <a href="http://www.sas.com/events/dmconf/speaker.html"  title="M2009 speakers bio">Saratendu Sethi </a>, the director of Engineering at Teragram, to share what he's observed in his consulting engagements. Here is his response.<br />
<br />
First of all,  even if automatic categorization guarantees >99% accuracy, for a News company, it is absolutely critical to not portray any wrong information for even 1%. This can only be verified by having humans validate the categorization results. They are doing that on a subset of articles, e.g. front-page articles.   <br />
<br />
Secondly, new topics constantly emerge in the coverage of current events. Even the best text mining algorithms can’t achieve perfection in spotting emerging topics because these algorithms are usually based on processing<strong> past </strong>content.  Also, the <blockquote>definition of emerging topics is based on human perception </blockquote>which is affected by time, location and the type of entities involved in the event. Therefore, these topics have to be manually spotted and added to documents/taxonomy while they are emerging.<br />
<br />
Having said that, the following are <strong>four benefits</strong> that <a href="http://www.teragram.com/news/pr20070917tgsoftware.html"  title="Teragram mines news sites">Teragram categorization achieves for New York Times:</a><br />
(1)	If two people are asked to suggest categories on the same document on their own, they are always going to come up with different categories. Automatic categorization <em><u>enforces consistency and removes human subjectivity </u></em>by automatically suggesting categories to them.<br />
(2)	Automatic categorization <em><u>saves time </u></em>because it is easier to ask editors to select appropriate categories from an automatically generated list than to have them think up categories themselves. With automatic categorization, I can spend just a few seconds, but with manual categorization I have to spend a few minutes reading the content and deciding on the appropriate topics.<br />
(3)	<em><strong><u>Entity extraction </u></strong></em>(e.g. identifying people, locations, etc.), which doesn’t require much human input, is automated.<br />
(4)	<em><strong><u>Automatic categorization </u></strong></em>enables New York Times to process all their past archives. Currently New York Times re-processes all their past 25 years of content with updated taxonomies every few months. <br />
<br />
      4a.	The human editors are only reviewing articles for current day (~500-1000 articles/day) whereas the past archives might include 100K articles/year. <br />
<br />
      4b.	If “<a href="http://query.nytimes.com/search/sitesearch?query=swine+flu&date_select=full&srchst=cse"  title="swine flu news">swine flu</a>” was only identified as a News topic in 2009, then automatic categorization allows NYT to find out what other news appeared in the past.<br />
<br />
So what do you conclude from this post? How would YOU answer the question posed in the title of this entry?  Is it cost effective to apply MAN "and" Machine together -- or has the science progressed enough to replace MAN?  Is it time to choose and go with a Man "or" Machine approach when deciding about becoming more efficient?<br />
<br />
Saratendu answers with the "AND" operator -- and that's the answer I prefer too -- 'cause I'm not comfortable letting those sci fi robots and machines take over my world.<br />
  <br />
How about you?<br />
<br />
<br />
<br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/QAzcmxSV9xA" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 11 Sep 2009 00:00:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/42-guid.html</guid>
    <category>artificial intelligence</category>
<category>content categorization</category>
<category>extraction</category>
<category>information retrieval</category>
<category>teragram</category>
<category>twitter</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/42-Man-versus-Machine-Logical-operator-or-V.html</feedburner:origLink></item>
<item>
    <title>Text Analytics for Ye of Little Faith</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/BywVnf-zwrE/index.php</link>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/41-Text-Analytics-for-Ye-of-Little-Faith.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=41</wfw:comment>

    <slash:comments>4</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=41</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    While at a customer site last week, I presented our text analytics capabilities (text mining, content categorization, <a href="http://www.teragram.com/solutions/sentiment-analysis.html"  title="sentiment">sentiment</a> and <a href="http://www.teragram.com/solutions/webcrawler.html"  title="Teragram Crawler">crawling</a>). Before the meeting proper, one attendee admitted that he wasn’t a text analytics believer.  <br />
<img width='110' height='110' style="float: right; border: 0px; padding-left: 5px; padding-right: 10px;" src="http://blogs.sas.com/text-mining/uploads/graphics/Faith-Trust-Pixie-Dust_6E9B819C.jpg" alt="" /><br />
 I guess he was warning me - not to wave my hands vaguely referring to some "higher power" of fancy math and linguistics.  At 30k feet in the air (the return flight home), I realized my missed opportunity.   <br />
<br />
I wish I'd casually explained to him how the <a href="http://www.sas.com/success/honda.html"  title="SAS text miner keeps Honda drivers Safe">car he is driving is safer </a>today thanks to text analytics.   It means a lot to me that text analytics makes such a difference in people’s lives – even if they don’t realize it or "believe". <br />
<br />
Over time the dismissive "yeah right" doubters will see the obvious.   Artificial Intelligence and Text Analytics are due respect today as <a href="http://for-the-time-being.blogspot.com/2009/08/science-fiction-friday-series-re-boot.html"  title="sci fiction ">Science rather than Sci Fi entertainment</a>. <br />
<br />
 All that said, I expect that our Text Frontier followers are already believers, so my blog post might be in vain.    You may have felt the rush of adrenaline when discovering the treasure of a rare  "<strong><em>Ah-Ha moment</em>"</strong>  when insights are found buried in text.    Let's get creative and brainstorm how to "evangelize" others and bring them into the fold.<br />
<br />
Gradually people are taking notice -- our profession is progressing and earning trust.  Future posts here will highlight <a href="http://www.sas.com/success/indexByTechnology.html#1000.1008.0003"  title="SAS text analytic success stories">customer success stories SAS and Teragram </a>have documented.   Please drop a comment to our blog here and share any TM related jokes or favorite one liners that you use to turn heads to build curiosity over this field of analytics.  Better yet -- prove it with your own innovative implementation at your office.  Seeing is Believing! 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/BywVnf-zwrE" height="1" width="1"/>]]></content:encoded>

    <pubDate>Thu, 03 Sep 2009 00:00:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/41-guid.html</guid>
    <category>auto</category>
<category>customers</category>
<category>manufacturing</category>
<category>safety</category>
<category>sentiment</category>
<category>significance</category>
<category>success stories</category>
<category>webcrawler</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/41-Text-Analytics-for-Ye-of-Little-Faith.html</feedburner:origLink></item>
<item>
    <title>Keeping SAFE with Text Analytics</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/jJrCXMJIR1c/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/40-Keeping-SAFE-with-Text-Analytics.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=40</wfw:comment>

    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=40</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    Where has the time gone?  Here in Texas the hot sun is still roasting, while local retailers are promoting their back to school items on sale.  For several weeks now, I've had a blog idea brewing from a talk entitled "WHY COUNT CRIME WHEN YOU CAN PREVENT IT?" You'll see why it caught my interest by noting the image in the top left corner of this slide. <br />
<br />
<img width='434' height='322' style="float: left; border: 5px; padding-left: 5px; padding-right: 15px;" src="http://blogs.sas.com/text-mining/uploads/RegionCapture.jpg" alt="" /><br />
<br />
Dr Colleen McCue shows how handwritten police notes and data taken from phone calls can be analyzed to predict future locations and potential criminal events.   She'll be speaking live at <a href="http://www.sas.com/events/dmconf"  title="M2009 SAS data mining conference">M2009 </a>but for those of you who want to hear her sooner - you can view the <a href="http://www.bettermanagement.com/seminars/seminar.aspx?L=15097"  title="police notes crime data analysis">archived presentation </a>at your leisure.  Her engaging explanation illustrates how Analytics are helping Police departments do their job of keeping neighborhoods safer. <br />
<br />
According to <a href="http://www.mc2solutions.net/id1.html"  title="Dr Colleen McCue">Dr McCue</a>, "Automated text analytic software could be<strong> game changing </strong>in information intensive tasks (e.g., a major case will have thousands of tips – the DC Sniper case was compromised in some ways because people focused on the “white van” – the software won’t get tired, bring bias, or forget what it just read).  It also has tremendous potential in culling through a lot of interview data (e.g., the detainee data), particularly when you have disparate sources that are geographically diverse but likely connected (through common operational goals, training, etc.)."<br />
<br />
Three cheers for the FBI, local police - and your local government -- all holding future potential customer success stories for text analytics.  Meanwhile you don't want to miss the <a href="http://www.sas.com/reg/wp/corp/9961"  title="Text mining for SAFETY">recent white paper Text Mining for Safety </a>describing how the Oil and Gas industry sees Text Analytics as the answer to moving beyond simply tracking accidents (counting them) to REDUCING hazards on the job.  Text Analytics is keeping us safe on the job and at home.    <br />
<br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/jJrCXMJIR1c" height="1" width="1"/>]]></content:encoded>

    <pubDate>Tue, 04 Aug 2009 00:00:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/40-guid.html</guid>
    <category>Crime</category>
<category>criminal</category>
<category>Dr McCue</category>
<category>FBI</category>
<category>interview data</category>
<category>M2009</category>
<category>predict</category>
<category>safety</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/40-Keeping-SAFE-with-Text-Analytics.html</feedburner:origLink></item>
<item>
    <title>Calling all Lone Rangers</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/fLQK2D9bZN0/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/39-Calling-all-Lone-Rangers.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=39</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=39</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    With SAS analytics, many of you are breaking ground as you strive to deliver more value from textual data.  Data mining has matured into an accepted practice for Customer Relationship Management teams in Telco, Finance and Marketing. I'd go as far as to say that it's become essential to survival for most large companies across the globe.  Text Mining, as you readers are well aware, is not yet as popular, with many employers assigning just one or two of you responsibility for text mining.   <br />
<br />
It can be a burden to struggle alone in a silo without anyone to bounce ideas off or brainstorm with.  To make it easier for you to connect with peers who share a passion for these technologies, we set up a discussion forum on SAS Enterprise Miner and SAS Text Miner two months ago.  While SAS employees may participate in these discussions, this forum is not meant to replace the <a href="http://support.sas.com/techsup/contact/"  title="SAS tech Support">SAS Technical Support help center</a>. <img width='208' height='87' style="float: right; border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/text-mining/uploads/sgflogo_sub.gif" alt="" /><br />
  <br />
<br />
Another excellent way for you to get feedback on your work is to respond to the <a href="http://support.sas.com/events/sasglobalforum/2010/cfp.html"  title="SAS user group conference">SGF Call for presentations</a> 2010 (Seattle).    <br />
<br />
Honestly, <a href="http://www.youtube.com/watch?v=irhroQ14Ufo"  title="sound of music">one of my favorite things </a>about SAS is <br />
- <img src="http://blogs.sas.com/text-mining/templates/wwwgeneric/img/emoticons/smile.png" alt=":-)" style="display: inline; vertical-align: bottom;" class="emoticon" /> You - , <br />
our innovative customers, applying software to your real-world challenges. <br />
      <br />
Only a few text-related topics have surfaced in the discussion forum to date, so I’m writing this blog to encourage more of you to join in and set up your profile.   Please accept <a href="http://support.sas.com/forums/thread.jspa?threadID=5877&tstart=0"  title="SAS discussion forum">my invitation </a>to post questions, experiences, and thoughts on best practices. 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/fLQK2D9bZN0" height="1" width="1"/>]]></content:encoded>

    <pubDate>Tue, 28 Jul 2009 20:18:00 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/39-guid.html</guid>
    <category>alone</category>
<category>burden</category>
<category>CRM</category>
<category>Discussion Forum</category>
<category>feedback</category>
<category>global forum</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/39-Calling-all-Lone-Rangers.html</feedburner:origLink></item>
<item>
    <title>Why customer intelligence will fail without text mining (cont.)</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/033tzjUJ0EM/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/38-Why-customer-intelligence-will-fail-without-text-mining-cont..html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=38</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=38</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    My colleague Mark Chaves, product manager for <a href="http://www.sas.com/solutions/crm/index.html"  title="Sales Funnel">SAS Customer Intelligence</a> responded to my earlier post “Why customer intelligence will fail without text mining,” with some strong opinions of his own. And remember – he’s a marketing guy, too! Read on:<blockquote>I agree with Manya’s comments and wanted to add that advertising as a medium through which marketers communicate is evolving, not diminishing.  <br />
<br />
When I hear comments like “we don't need online advertising”… it makes me laugh. Professors like Eric Clemons have the luxury of living in an academic world and being provocative gives them a chance to get their names out there (as <a href="http://kdpaine.blogs.com/kdpaines_pr_m/"  title="KD Paine's PR Measurement Blog">KD Paine </a>will tell you, universities are also tracking “mentions” and “quotes” in order to get name recognition for their professors).<br />
<br />
Without having read Eric’s comments in full context, I would only add that evaluation of advertising or any marketing tactics has to be made in the context of a firm’s strategic goals and objectives.  This is typically represented by scorecards and “funnel” diagrams that describe the “path” that consumers take when making purchasing decisions.<br />
<img width='225' height='260' style="float: right; border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/text-mining/uploads/CIfunnel.PNG" alt="" /><br />
In the funnel to the right, we see that online advertising may have a positive effect on improving “Brand/Product Awareness” (top of the funnel) and then we may also notice that online peer recommendations may have a measurable effect on web traffic or store traffic, and ultimately sales.  Conversely, a peer may recommend a new product to another peer (thereby driving awareness) and a price promotion may actually compel the consumer to make the purchase.<br />
<br />
Two examples:<br />
1)      I see a banner advertisement for a new Weber grill from Lowe’s (increased brand awareness) but I may call up my friend Dan (a BBQ nut) or check BBQ blogs to see how it is rated.<br />
<br />
2)      A blog or a tweet or a friend directly introduces me to the Flip Camcorder (increased awareness, consideration) but I may be compelled to purchase said camera if a Best Buy banner ad tells me there is a sale on camcorders.</blockquote>Coming back to my comment about marketers beware, tying promotions to the right kind of consumer reviews could be extremely valuable.  Text mining can analyze consumer reviews to help identify the appropriate comments and segment(s) to go after. 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/033tzjUJ0EM" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 10 Jul 2009 15:10:34 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/38-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/38-Why-customer-intelligence-will-fail-without-text-mining-cont..html</feedburner:origLink></item>
<item>
    <title>Why customer intelligence will fail without text mining</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/DE537jBfNio/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/36-Why-customer-intelligence-will-fail-without-text-mining.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=36</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=36</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    I'm doing my weekly round-up of text mining/unstructured data/information management news.  Having lived on several continents, I like to make sure my information hunting is equally intercontinental.  Different cultures have different slants on topics.  <br />
<br />
This morning's search led me to an article posted by the New Zealand Herald entitled "<a href="http://www.nzherald.co.nz/technology/news/article.cfm?c_id=5&objectid=10583213"  title="New Zealand Herald: How Can YouTube Survive?">How Can YouTube Survive?</a>" A section of the article mentioned the technology insiders' blog TechCrunch and a guest post there entitled "Why Advertising Is Failing On The Internet", written by Eric Clemons, Professor of Operations and Information Management at the University of Pennsylvania. It is a very interesting read, and also a very provocative one (as shown by the <strong>many</strong> comments on the post).  <br />
<br />
According to this excerpt from the NZ Herald article, Clemons "argued that the way that we're using the Internet has shattered the whole concept of advertising. We need no encouragement to share our opinions online regarding products and services and offer them star ratings; as a result, we're much more likely to look for personal recommendations from other customers than wait for a gaudy advert to beckon us wildly in the direction of a company website or online store. He claims we don't trust online advertising, we don't need online advertising, but above all we don't want online advertising." <br />
<br />
Based on my personal Internet shopping habits, I agree! I'd much rather see personal testimony about a product in addition to (or instead of) marketing collateral. This personal testimony has become a new form of marketing. It would serve marketing professionals well to pay attention. <br />
<br />
Understanding individuals' commentaries about products helps marketers better understand consumer reaction to the four P's of the marketing mix: product, price, placement and promotion. <br />
<br />
This evolution of marketing influencers is exactly what makes text mining a pivotal technology for this generation. It provides the ability to gauge those huge volumes of Web-based consumer reactions in an automated, consistent manner. And then you can actually do something about it -- or with it! 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/DE537jBfNio" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 10 Jul 2009 09:54:02 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/36-guid.html</guid>
    <category>advertising</category>
<category>consumer opinion</category>
<category>YouTube</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/36-Why-customer-intelligence-will-fail-without-text-mining.html</feedburner:origLink></item>
<item>
    <title>Travels to Paris and Copenhagen this week!</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/FOJ1xZKKBdU/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/35-Travels-to-Paris-and-Copenhagen-this-week!.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=35</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=35</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    SAS is sending data &amp; text mining experts (including Teragram employees) over the ocean to Europe for two different events this week. <img width='400' height='45' style="float: right; border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/text-mining/uploads/kdd09.jpg" alt="" /><br />
We'll have a booth in the exhibit hall at   <a href="http://www.sigkdd.org/kdd2009/index.html"  title="KDD 2009 Paris">KDD09 </a>  Sunday through Wednesday. If you are one of the lucky ones attending KDD, mark your program to attend the panel discussion to listen to Dr Wayne Thompson from SAS talk about <strong>Emerging Trends in Open Standards and Cloud Computing for Data Mining</strong> . <br />
<br />
Even if you don't make it to the KDD conference to personally pick up the new book authored by the conference chair <a href="http://www.sas.com/events/dmconf/speaker.html#elder"  title="john elder datamining lab">John Elder </a>, you can experience our Software-on-Demand version of data mining  by buying his book, <a href="http://www.amazon.com/Handbook-Statistical-Analysis-Mining-Applications/dp/0123747651/ref=sr_1_1?ie=UTF8&s=books&qid=1246058602&sr=1-1"  title="elders book on amazon">"Handbook of Statistical Analysis and Data Mining Applications."</a> <br />
<br />
<img width='250' height='55' style="float: right; border: 0px; padding-left: 5px; padding-right: 5px;" src="http://blogs.sas.com/text-mining/uploads/a2009.jpg" alt="" /><br />
The second event where you can find us is the SAS conference devoted to ANALYTICS, called <a href="http://www.sas.com/events/aconf/ "  title="SAS analytics show">A2009</a>, in Denmark on July 1-2. The program is online. There you can read the abstract about <a href="http://www.sas.com/success/afa.html"  title="AFA text mining story">a Swedish insurance firm </a> that studied handwritten notes collected by police officers and security guards during 2004-2007. <br />
<br />
At both shows, you'll be able to see live demos of our software and pick up a hard copy of the most recent fact sheet, highlighting the enhancements that are now available with the TEXT MINER 4.1 version that was made available to customers 5 weeks ago. Those of you reading this blog that haven't yet seen it may want to read the <a href="http://www.sas.com/resources/factsheet/sas-text-miner--factsheet.pdf"  title="SAS TM4.1 fact sheet">fact sheet on our SAS 9.2 release of Text Miner</a> on the SAS website.    <br />
<br />
What does your summer hold for you?  Do you have travel plans to shows or conferences with text analytics tracks or sessions included?  Please add a comment to this blog and do share! <br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/FOJ1xZKKBdU" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 26 Jun 2009 18:48:15 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/35-guid.html</guid>
    <category>a2009</category>
<category>analytics</category>
<category>denmark</category>
<category>france</category>
<category>john elder</category>
<category>kdd</category>
<category>sas</category>
<category>teragram</category>

<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/35-Travels-to-Paris-and-Copenhagen-this-week!.html</feedburner:origLink></item>
<item>
    <title>IDG asks 131 executives about their IT spend priorities for 2009</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/omqVH-HXvHI/index.php</link>
            <category>Mary Grace Crissey</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/34-IDG-asks-131-executives-about-their-IT-spend-priorities-for-2009.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=34</wfw:comment>

    <slash:comments>1</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=34</wfw:commentRss>
    

    <author>nospam@example.com (Mary Grace Crissey)</author>
    <content:encoded><![CDATA[
    A recent survey by IDG Research Services highlights <strong>Business Process Automation </strong>as an IT priority.<br />
<br />
 Some of the findings include:<br />
•	More than 2/3 of respondents are automating most of their core business processes<br />
•	Another 21% are moving towards this goal<br />
•	87% consider BPA to be a critical or important IT priority <br />
•	87% see a connection between unified communications and process automation<br />
•	More than one third envision communication technology being incorporated into BPA in the future<br />
<br />
Even though I have not spoken with Joe Staples and Brad Herrington from "<a href="http://www.ucstrategies.com/unified-communications-expert-views/interactive-intelligence-discusses-communications-based-process-automation.aspx"  title="unified communication strategies">Interactive Intelligence</a>", I share their observation that many organizations in today’s economic environment are trying to streamline operations and do more with less.  As organizations seek ways to be more efficient, both in the front office and back office, we might position our technology as a tool for automating business processes, leading to improved business results.   Have any of you used this approach to motivate your IT department to spend $$ on Textual Analytic software, or to recruit support for your research program?<br />
<br />
It's rare for BPA companies to include automating the manual processes surrounding words or unstructured content via TEXT Technologies.  After I watch the <a href="https://www.techwebonlineevents.com/ars/eventregistration.do?mode=eventreg&F=1001639"  title="Info Week webcast on IDG survey">webcast on June 25 </a>and get the white paper, I'll let you know if any mention of Natural Language Processing or <a href="http://www.sas.com/resources/factsheet/factsheetContentCategorization.pdf"  title="SAS fact sheet Teragram">Content Categorization </a>or <a href="http://www.cmswire.com/cms/social-media/a-sentiment-analysis-manager-for-your-online-communities-004746.php "  title="CMSwire on Teragram">Sentiment Analysis</a> is made.  <br />
<br />
Meanwhile, it's up to all of us to continue to promote awareness and implement Text Analytics in real-world situations.  We aren't talking about a dream of some vague, futuristic possibility; the time is now to include text communication alongside the traditional data sources of computer processing applications.   <br />
<br />
When we combine text analytics with <strong>mathematical optimization </strong>and <strong>predictive analytics</strong>, we can go well beyond merely automating business processes by <em>improving and discovering </em>entirely new processes leading to a sustainable future.  Thanks for reading.  <br />
<br />
<br />
<br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/omqVH-HXvHI" height="1" width="1"/>]]></content:encoded>

    <pubDate>Fri, 19 Jun 2009 13:18:22 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/34-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/34-IDG-asks-131-executives-about-their-IT-spend-priorities-for-2009.html</feedburner:origLink></item>
<item>
    <title>Text Speak</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/7f_fOkZ4Tb0/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/33-Text-Speak.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=33</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=33</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    I just posted a tweet to my <a href="http://twitter.com/manyamayes"  title="Manya's tweets">@ManyaMayes </a>Twitter account.  In order to get my message across, in 140 characters or less, I had to shorten my text.  This is a very common practise for mobile phone users who send text messages that look a lot like a foreign language.  My Mum writes messages that are so clipped that I have trouble deciphering them!  As a BlackBerry user, I send email messages but I rarely send SMS messages.  I've spent many years making sure I write messages that are easy for audiences to understand.  It's going to take me a while to get used to writing clipped text (writing in <a href="http://en.wikipedia.org/wiki/Text_message#Text_speak"  title="Wikipedia defn of text speak">text speak</a>) as part of my job.  It goes against much of my professional training to write like this:  u no wot u no &amp; <a href="http://en.wikipedia.org/wiki/Known_unknown"  title="The known unknown defined">u don't no wot u don't</a><br />
<br />
How does text mining handle this?  One approach would be to <a href="http://jaredprins.squarespace.com/blog/2008/10/22/does-shorthand-and-symbols-cause-problems-with-text-mining.html#comments"  title="Jared Prins blog about emoticons">specify synonyms for these clipped terms</a>:<br />
<br />
u = you<br />
no = know<br />
wot = what<br />
<br />
But "no" and "know" are both valid dictionary entries, so this will immediately cause a follow-on problem, since surely not all occurrences of "no" should be replaced with "know".  Deciding which occurrences of "no" should be replaced with "know" is aided by using additional context from the document.   Boolean and linguistic rules can help with this. <br />
<br />
It can be difficult to solve data quality problems like this and typically solutions are specific to both the data and the application.  For example, the way you would replace R&R would depend on whether the data came from a forum for military personnel talking about upcoming "rest and relaxation" or whether it was a warranty report describing "repair and replace" for a defective part or other... 
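<br />
<br />
To make the "no" versus "know" problem concrete, here is a minimal Python sketch of synonym replacement with one context rule. The synonym table comes from the example above; the rule itself (treat "no" as "know" only when it follows a pronoun or "don't") and the names <code>SYNONYMS</code>, <code>KNOW_CONTEXT</code> and <code>expand_text_speak</code> are illustrative assumptions, not a description of how any SAS product handles this.<br />

```python
# Synonyms for clipped terms, taken from the example in the post.
SYNONYMS = {"u": "you", "wot": "what"}

# Illustrative context rule (an assumption): "no" is read as "know"
# only when the previous word is one of these.
KNOW_CONTEXT = {"u", "you", "i", "we", "they", "don't"}


def expand_text_speak(message):
    """Expand clipped terms, replacing "no" only where context suggests "know"."""
    tokens = message.lower().split()
    expanded = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1] if i > 0 else ""
        if token == "no" and previous in KNOW_CONTEXT:
            expanded.append("know")
        else:
            # Unknown words, including a plain "no", pass through unchanged.
            expanded.append(SYNONYMS.get(token, token))
    return " ".join(expanded)


print(expand_text_speak("u no wot u no"))
print(expand_text_speak("I said no thanks"))
```

A rule this simple will still misfire on real data, which is the point of the paragraph above: the right rules depend on both the data and the application.<br />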
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/7f_fOkZ4Tb0" height="1" width="1"/>]]></content:encoded>

    <pubDate>Wed, 17 Jun 2009 10:40:33 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/33-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/33-Text-Speak.html</feedburner:origLink></item>
<item>
    <title>Sentiment Analysis Overview</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/zCT7A5CbGqo/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/32-Sentiment-Analysis-Overview.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=32</wfw:comment>

    <slash:comments>3</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=32</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    I saw the following comment on Twitter yesterday about sentiment analysis limitations and decided it would make a good topic for a blog update:<br />
<br />
<a href="http://twitter.com/Concannon"  title="Lance Concannon's Twitter Posts">@concannon</a>:  <strong>Can anybody explain to me why automated sentiment analysis is anything more than flaky, snake-oil BS? The technology just isn't ready yet.</strong><br />
<br />
I’m going to make a bold statement here: automated sentiment analysis, using the right methodology, is actually superior to human sentiment analysis. Bear with me and read through.<br />
<br />
The available approaches to analyzing sentiment/satisfaction vary based on the data provided. I would categorize the approaches based on the availability of three types of data:<br />
1. Customer feedback (free-form text) with customer ranked satisfaction (discrete value), like Amazon product reviews.<br />
2. Customer feedback (free-form text) with manually ranked satisfaction (discrete value), where human readers subjectively score the content.<br />
3. Customer feedback only, no ranked satisfaction, as with blog posts and comments.<br />
<br />
For the first data type, machine learning algorithms do a good job of measuring overall sentiment (say, +ve/neutral/-ve). Examples of data suitable for this approach are: survey data and product review forums. The problem is that not a lot of text is gathered this way (with a purpose in mind). Even if it is, the machine learning algorithms struggle with distinguishing positive elements from negative. It's one thing to know if a customer is dissatisfied, it is another to know about what!<br />
<br />
For the second data type, where there is no customer-ranked satisfaction, it is possible to build a statistical model using a sample of manually ranked documents, then automatically score the remaining unranked documents. Not many companies are willing to do this. It also doesn't truly represent the customer’s opinion - just the reader’s interpretation of what the customer thinks.<br />
<br />
For the third option, customer opinion with no ranking, you can derive sentiment from the context of the text using natural language processing or NLP. This data is the most common, and hence so are the approaches to analyzing it. It’s not easy, but it’s the sweet spot for gaining value from the massive volumes of consumer generated text. <br />
<br />
One widely available, cheap technology assigns an overall positive or negative sentiment based on assigning positive or negative values to individual words then summing them to get an overall sentiment rating. This approach fails in situations like the following:<br />
"It's not bad" (two negatives that actually suggest a positive)<br />
"I'm not going to say this sucks" (sarcasm or humor)<br />
“The keyboard is impossibly small but the display is the best I’ve seen.” (combination)<br />
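<br />
The failure is easy to reproduce. The Python sketch below implements the word-summing approach just described and runs it on the three examples. The word scores in <code>LEXICON</code> are made up for illustration, and the function names are hypothetical; this is not any particular vendor's scoring scheme.<br />

```python
import re

# Hypothetical word scores, chosen only to illustrate the approach.
LEXICON = {"not": -1, "bad": -1, "sucks": -1, "impossibly": -1, "best": 1}


def word_sum_score(text):
    """Sum the scores of the individual words, ignoring word order and context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def word_sum_label(text):
    score = word_sum_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for example in [
    "It's not bad",
    "I'm not going to say this sucks",
    "The keyboard is impossibly small but the display is the best I've seen.",
]:
    print(word_sum_score(example), word_sum_label(example), "|", example)
```

With these scores, "It's not bad" comes out at -2 (negative) even though it is mildly positive, and the keyboard/display sentence nets out to 0 (neutral), hiding both the complaint and the praise.<br />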
<br />
The most recent advances in sentiment analysis technology use a combination of techniques: <br />
(1) statistics <br />
(2) rule-based definitions and <br />
(3) human intervention, e.g. a final review of the machine scoring. <br />
<br />
The results are less expensive than human-only sentiment analysis, and more consistent. Why? Because the automation adds consistency, while the human verifies the result. Put in the right workflow, it also increases scalability by a substantial factor. <br />
<br />
Teragram, a division of SAS, announced the <a href="http://www.teragram.com/solutions/sentiment-analysis.html" >Teragram Sentiment Analysis Manager</a> at the Text Analytics Summit in early June. More to come on that!<br />
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/zCT7A5CbGqo" height="1" width="1"/>]]></content:encoded>

    <pubDate>Thu, 11 Jun 2009 13:37:47 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/32-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/32-Sentiment-Analysis-Overview.html</feedburner:origLink></item>
<item>
    <title>The Phenomenon that is Twitter</title>
    <link>http://feedproxy.google.com/~r/TheTextFrontier/~3/9w51hK5K6ZI/index.php</link>
            <category>Manya Mayes</category>
    
    <comments>http://blogs.sas.com/text-mining/index.php?/archives/31-The-Phenomenon-that-is-Twitter.html#comments</comments>
    <wfw:comment>http://blogs.sas.com/text-mining/wfwcomment.php?cid=31</wfw:comment>

    <slash:comments>0</slash:comments>
    <wfw:commentRss>http://blogs.sas.com/text-mining/rss.php?version=2.0&amp;type=comments&amp;cid=31</wfw:commentRss>
    

    <author>nospam@example.com (Manya Mayes)</author>
    <content:encoded><![CDATA[
    I mentioned the buzz around Social Media Analysis (SMA) at the Text Analytics Summit.  If we took all the speakers' content and produced a tag cloud, Twitter would have the biggest 'floor space'.  I don't think there was a single presentation that did NOT mention Twitter.  <br />
<br />
While doing some background research for SMA, I ran across an article entitled <a href="http://blog.hubspot.com/blog/tabid/6307/bid/4829/Announcing-the-June-2009-State-of-the-Twittersphere-Report.aspx"  title="HubSpot Blog">State of the Twittersphere</a> that <a href="http://hubspot.com"  title="HubSpot Web Site">HubSpot</a> blogged about just this week (that's <a href="http://twitter.com/search/users?q=@hubspot&category=people&source=find_on_twitter"  title="HubSpot Twitter Accounts">@HubSpot </a>for the 55.5% of Twitter users that don't follow anyone).  There are a lot of really great Twitter usage statistics in this report.  It's amazing how many people sign up with Twitter but are very inactive (I have multiple Twitter accounts and one is definitely contributing to inactivity).  I'm more interested in those users that are very active.  It would be good to connect with other users who post materials similar to my own (like a document recommendation system) and Text Mining can definitely help with this.  I'd also like to see something like a “users who posted materials like this also connected with these users:” - like the recommendations you get from <a href="http://www.amazon.com/Competing-Analytics-New-Science-Winning/dp/1422103323/ref=sr_1_1?ie=UTF8&s=books&qid=1244733206&sr=8-1"  title="Amazon Recommendation Example">Amazon</a>.  Ranking the tweets of users you follow based on content would also be fabulous.  Some users post about both personal and business related materials.  I personally prefer not to read the personal posts (sorry y'all).  Having personal tweets, or topics less interesting to me, appear further down the list (if at all) would be another desirable feature...<br />
<br />
I have a bunch of other recommendations for Twitter product management - as do many other Twitter users.  How about using Text Analytics/Text Mining for managing product requirements...<br />
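<br />
As a rough illustration of the "connect me with users who post similar material" idea, here is a small Python sketch that ranks other users by how similar their posts are to mine. It uses plain word counts and cosine similarity, which is one common text mining technique and only an assumption about how such a feature might work; the function names and sample posts are hypothetical.<br />

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words representation: how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_users(my_posts, other_users):
    """Return the other users' names, most similar content first."""
    mine = term_counts(my_posts)
    scored = [
        (cosine_similarity(mine, term_counts(posts)), name)
        for name, posts in other_users.items()
    ]
    return [name for score, name in sorted(scored, reverse=True)]


others = {
    "alice": "text mining and sentiment in customer reviews",
    "bob": "my cat ate breakfast",
}
print(most_similar_users("text mining for customer reviews", others))
```

The same similarity score could also be used to push off-topic tweets further down a reading list, which is the ranking feature wished for above.<br />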
 
    <img src="http://feeds.feedburner.com/~r/TheTextFrontier/~4/9w51hK5K6ZI" height="1" width="1"/>]]></content:encoded>

    <pubDate>Thu, 11 Jun 2009 10:10:46 -0400</pubDate>
    <guid isPermaLink="false">http://blogs.sas.com/text-mining/index.php?/archives/31-guid.html</guid>
    
<feedburner:origLink>http://blogs.sas.com/text-mining/index.php?/archives/31-The-Phenomenon-that-is-Twitter.html</feedburner:origLink></item>

</channel>
</rss>
