I'm gathering data for a casual survey, and I'd like to share what I learn with you. I won't be releasing any names or organizations unless you send your email and your approval to me in the "Additional comments" field. I look forward to hearing from you!
Miles
mbk@ideaeng.com
Years ago, what we now call ‘enterprise search’ came with pretty basic functionality. Products arrived with the ability to create an index; ‘filters’ to enable the indexing to work properly on non-text content like WordPerfect and WordStar files; and an application that delivered a basic search experience, so inexperienced users could find the content they wanted. Rarely, advanced products included basic reporting, but everything was primarily focused on the indexing process, and very rarely on user query activity. Some companies – I’m thinking of organizations like Verity and Fulcrum Technologies (both former employers of mine) – actually included some limited reporting, with the focus primarily on user queries.
Nowadays, technology has expanded everywhere. And with the popularity and success of the leading names in ‘internet search’ – Google, Bing, Yahoo, and others – more and more users have come to expect similar results from their corporate search capabilities for both internal and public-facing content. The catch is that the intranet is not the internet, and search is frequently an orphan product managed by an IT team short on time and often with little experience with search. I’ve summarized this problem before by confessing that search is not “fire and forget” technology.
I’ve written about the challenges of enterprise search before. To summarize: enterprise queries generally have a ‘right answer’; document-level security applies to many – if not a majority of – documents; content comes in a variety of different file formats; and most importantly, someone’s job needs to be ensuring content and search quality.
But we’ve seen a major enhancement in enterprise search lately, one that promises quantum leaps in search quality, precision, and personalization: machine learning or ML. And just to set the record straight, ML is not the same as ‘artificial intelligence’; technically, ML is an implementation of AI.
ML does not free you from managing your search platform. The “L” in ML is about learning, but sadly, the “M” is not for ‘magic’. As with us humans, machine learning happens by repetition and by observing behavior over time. And sadly, few enterprises have enough content and query activity to be anywhere near as good as Google. Even though we are well into the 21st century, we don’t have the technology of the HAL 9000 computer – at least not yet. Come back next year!
What can you do to at least start seeing some benefits from the ML technology now integrated with commercial and open-source search platforms? Regardless of what advanced technologies you use, you still have to pay attention to the basics. I’ve written about nearly all of these in the past, but they all bear repeating. As one of my teachers in college used to say, "I ask you to put a hand over one of your ears, because what I’m about to say is too important to go in one ear and out the other". Good old Coach Gollnick.
First, make it a habit to watch your search platform. Most likely, your IT folks have tools that track the behavior of critical software – and yes, search is critical, even if you’re not using it for eCommerce. Talk to those folks and have them set up monitoring of search: if possible, have them track queries, presented results, viewed content, and most importantly, ‘no hits’. If a user performs a search, presumably they expect to see content relevant to the query; and if you return nothing, either your user is looking for content you don’t have, or your content is not tagged properly. When in doubt, ask your users to confirm what they were looking for.
What Metrics to Track
Most search technologies now offer at least some metrics; what you’d ideally have includes these basic statistics:
Top Queries: Simply put, the most common queries users submit. You want to understand queries across departments; for example, Marketing will have different interests and queries than Sales or IT staff. Understanding queries provides the metrics the search team needs to keep content relevant.
No Hits: When users perform a search – especially a relevant search term - and find nothing, they become frustrated. Tracking these queries provides critical information to the teams that create and manage content; and addressing what are often misspellings or incorrect terms leads to happier end users.
Rare terms frequently seen in queries: This is a very interesting metric, since it provides knowledge on content users are looking for, but for which there may not be much content.
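To make these metrics concrete, here is a minimal sketch of how you might compute top queries and no-hit queries from a search log. The log format, field names, and sample queries are my assumptions for illustration; real platforms log far richer data.

```python
from collections import Counter

# Hypothetical query log: (query, hit_count) pairs, roughly the shape
# a search platform's logs might take. The entries are invented examples.
query_log = [
    ("vacation policy", 12),
    ("sales report", 40),
    ("vacation policy", 9),
    ("wordperfect templates", 0),
    ("sales report", 38),
    ("benifits enrollment", 0),  # misspelling -> zero hits
]

def top_queries(log, n=5):
    """The most common queries across the log."""
    return Counter(q for q, _ in log).most_common(n)

def no_hit_queries(log):
    """Queries that returned zero results: prime candidates for synonyms,
    spelling correction, or brand-new content."""
    return sorted({q for q, hits in log if hits == 0})
```

Even a toy report like this surfaces the two misspelled or unanswerable queries above, which is exactly the feedback your content team needs.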
Additional Must-Do Tasks
What about content that is common throughout your indices, but where you can identify and promote the ‘right answer’? Sometimes, this means talking to folks in other departments who may have a better understanding of the query intent. One thing we’ve seen that has often proved worthwhile is to form an ‘Enterprise Search Team’ or a ‘Search Center of Excellence’. These groups, made up of users from across the organization and all levels of the company from individual contributors to senior management, meet periodically – quarterly seems to be the most common frequency – to discuss users’ feedback on the search platform.
In addition to the above reporting, there are a few more things to track. It’s a good idea to track queries by department: often we’ve seen instances where one or two departments utilize search thoroughly, while others use it rarely. Talk to folks in those departments to understand users’ intent; and if you are missing content, address the problem. Keep an eye out for rare but potentially critical queries; believe it or not, they can be important.
One extreme, but hopefully rare case: sensitive terms. A query for ‘sexual harassment policy’ is a symptom of potential legal vulnerability; and if that query occurs, it should escalate to HR. I’m certainly not an expert in the law, but I’ve been told that, in cases where companies should have known there were issues, there may be liability. Talk to HR (and perhaps Corporate Legal); investigate the issue, and address the problem. And, as a potential bonus, perhaps fend off lawsuits.
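A sensitive-term check can be as simple as matching query words against a watch list. This is a minimal sketch; the term list is illustrative only, and how you wire an alert into your query pipeline (and who gets notified) is entirely an assumption you'd settle with HR and Legal.

```python
# Illustrative watch list -- your HR and Legal teams would own the real one.
SENSITIVE_TERMS = {"harassment", "discrimination", "retaliation"}

def needs_hr_escalation(query: str) -> bool:
    """Return True when a query contains a term worth escalating to HR."""
    # Lowercase and split on whitespace; a real pipeline would also
    # normalize punctuation and handle multi-word phrases.
    return bool(set(query.lower().split()) & SENSITIVE_TERMS)
```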
Search is not ‘fire and forget’ technology, and it takes time and resources to maintain high-quality user satisfaction. The squeaky wheel may get the grease; but constant improvement will help address any dissatisfaction and that’s always a good deal.
A Final Word
A lack of feedback from users is not always a good thing: combined with unexpectedly low query activity, it may mean your users have given up on search. Communicate with your users, gather their input, and act on what you learn.
The magic in early instances of what we now call 'enterprise search' was being able to find content by typing in a few keywords. It wasn't as cool as the HAL 9000 computer featured in "2001: A Space Odyssey", but it was good enough to draw a large number of people - myself included - into the business.
Along the way, Google perfected a search platform based on the theory that, at scale, just about any query you could think of had already been used by thousands, if not millions, of other humans. All Google needed to do was keep track of which pages other humans viewed following a query and promote those pages to the top. Essentially, they created 'crowd-sourced search'.
The bad news for those of us who work on search designed for use within the enterprise is that there just isn't sufficient content - or query activity - to deliver results as accurate as those we experience on the public web. Consider: Google marketed the Google Search Appliance for the enterprise. It didn't deliver the kinds of results public-facing Google does, and Google pulled the product from the market. For great search, size matters.
Nonetheless, some of the companies that market enterprise search products are now adding elements of machine learning to their products; and while perhaps not as accurate as web-based Google, they do deliver results that start out pretty well and get better with age, as the platforms learn which documents humans view following queries.
And if you've not noticed, some leading vendors are now integrating - and encouraging - what is known as 'conversational search'. Think about it: when you need to find a document in your organization, you may ask a colleague. But you don't simply say "sales". Chances are you'll ask "where is the new sales report?"
It's encouraging to see an increasing number of vendors delivering these capabilities in their commercial products. The most recent to announce conversational search is Algolia, although I have to say I'm quite disappointed in the Wikipedia write-up on them. In my spare time, should I ever find any, I should go do some edits, but this 'spare time' thing is rare for me.
Nonetheless, I'm happy to see an increasing number of commercial search vendors beginning to integrate these advanced capabilities into their products. Search in the enterprise has challenges, but hang in there: it's getting better!
Note: How has your experience been with machine learning and AI integrated with your enterprise search? I'd love to hear your experiences - even if under NDA!
The new year is a time when most of us resolve to make changes in our personal lives: losing weight, exercising more, spending more time with a spouse and/or the kids. We start the year with great energy to meet our goals, but sadly many of us fall short through the year.
This often happens in the enterprise as well. Improving internal search is a common resolution at this time of year. For eCommerce sites, January generally means fewer site visitors once the holiday rush is done, so making changes won’t have a great impact on sales. For corporations, it’s a time of new budgets and great expectations: and more than a few of the clients we’ve worked with over the years tell me how poorly their internal search performs compared to public search sites like Google, Bing, and DuckDuckGo. Why do these search platforms work so well? And why can’t your site search match their success? It’s a numbers game. By definition, public search platforms index millions of sites; and many of these contain similar if not identical content. This makes it easy to find what you’re looking for, because thousands of sites have relevant results for just about any query you may try.
Intranet sites are different. Usually, there is only one page with the information you are looking for. But often, content authors, who have read about how to promote content on Google, will add keywords using Microsoft Word’s “Properties” field in an effort to promote their documents. This attempt to ‘game’ the internal search platform generally interferes with the platform’s relevance functions and results in poor result relevance. Even the Document Properties that Microsoft Word provides can interfere with search effectiveness.
Years ago, we were working with a client who was interested in knowing which employees were contributing to the intranet content. When the data was processed, it turned out that an Administrative Assistant in Marketing had authored more documents than anyone else in the corporation. After a quick review, we discovered why this one person was apparently more prolific than any other employee. That person had created all of the template forms used throughout the company, so the Word Document Properties listed that employee’s name as the author of virtually every standard template throughout the company.
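You can run this kind of author audit yourself without any special tooling: a .docx file is just a zip archive, and the "Author" property lives in docProps/core.xml as a dc:creator element. Here is a minimal sketch; the function names are my own, and a production audit would also handle legacy .doc files, PDFs, and missing metadata.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Fully-qualified tag for the Dublin Core 'creator' element used in
# Office Open XML core properties (the Word "Author" field).
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"

def author_from_core_xml(core_xml: str) -> str:
    """Pull the dc:creator (the 'Author' property) out of a core.xml payload."""
    creator = ET.fromstring(core_xml).find(DC_CREATOR)
    return creator.text if creator is not None and creator.text else "(unknown)"

def audit_authors(docx_paths):
    """Count apparent authors across a set of .docx files.

    A .docx file is a zip archive; its metadata lives in docProps/core.xml.
    An author who dominates this report may simply be the person who
    created the templates, as in the anecdote above.
    """
    counts = Counter()
    for path in docx_paths:
        with zipfile.ZipFile(path) as zf:
            core = zf.read("docProps/core.xml").decode("utf-8")
        counts[author_from_core_xml(core)] += 1
    return counts.most_common()
```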
So in the spirit of the new year, I’d suggest that you spend a day or two performing a data audit to discover where your content – or lack thereof – is negatively impacting your enterprise search results. And if you find any doozies, I’d love to hear about them!
The month of January is associated with the Roman god Janus who, with two heads, could look forward and back. That said, I find December a quiet time that provides the opportunity to review the current year and to plan the coming new year. As I tweeted yesterday at @miles_kehoe, this is the most stressful time of the year for most sites focused on eCommerce. Changes are generally 'off-limits' - even an hour offline can put a dent in sales.
But for those responsible for corporate internal and public-facing sites, this is the time to review content, identify potential changes, and even plan new content. And if planned well, the holidays are often a great time to update intranet sites: from late November through the new year, activity tends to slow for most corporate sites. Both IT and content staff should be using this quiet time to make changes, from updates to current content - the new vacation schedule is just one that comes to mind - to minor restructuring. (Note: while the holidays are a great time to roll out major changes, these should have been in planning months ago: it's a holiday, not a sabbatical!)
For the search team, this is time to review search activity: top queries, zero hits, misspellings, and synonyms come to mind as a minimum effort. It's also a good time to identify popular content, as well as content that was either never part of any search result or was included in result lists but never viewed.
So - December is nearly half over: take advantage of what is normally a quiet time for intranets and make that site better!
Happy Holidays!
VC firms seem attracted to the Enterprise Search space
Just today, it was announced that Canadian-based Coveo closed a ~$170M US round, following Lucidworks’ recent $100M US round. Earlier this year we saw Algolia come in with $110M of funding, and of course there was Elastic’s recent IPO – it sure looks like 2019 will have been a good year for the leading technologies.
Stay tuned as we learn more about the trend!
Enterprise search was once simple. It was often bad - but understanding the results was pretty easy. If the query term(s) were in the document, it was there in the results. Period. The more times the terms appeared, the higher the result appeared in the result list. And when the user typed a multi-term query, documents with all of the terms displayed higher in the result list than those with only some of the terms.
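That old-school ranking model is simple enough to sketch in a few lines. This is an illustrative toy, not any vendor's actual algorithm: documents matching more query terms win, ties are broken by raw occurrence counts, and anything with no matching term is excluded entirely.

```python
from collections import Counter

def naive_rank(query: str, documents: dict):
    """Rank documents the old-school way.

    `documents` maps a document id to its text. Documents containing all
    query terms outrank partial matches; among equals, more occurrences
    of the terms means a higher rank.
    """
    terms = query.lower().split()
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        matched = [t for t in terms if counts[t] > 0]
        if not matched:
            continue  # terms not in the document -> not in the results. Period.
        # Primary key: distinct query terms matched;
        # secondary key: total occurrences of those terms.
        score = (len(matched), sum(counts[t] for t in matched))
        scored.append((doc_id, score))
    return [d for d, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

Modern relevance functions (tf-idf, BM25, learned rankers) are far more sophisticated, but the basic intuition above is where they all started.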
And some search platforms could 'explain' why a particular document was ranked where it was. Those of us who have been in the business a while may remember the Verity Topic and Verity K2 product lines. One of the most advanced capabilities in these was the 'explain' function. It would reverse engineer the score for an individual document and report the critical 'why' the selected document was ranked as it was.
"People Like You"
Now, search results are generally better, but every now and then, you’ll see a result that surprises you, especially on sites that are enhanced with machine learning technologies. A friend of mine tells of a query she did on Google while she was looking for summer clothes, but the top result was a pair of shoes. She related her surprise: "I asked for summer clothes, and Google shows me SHOES?". But, she admitted, "Those shoes ARE pretty nice!"
How did that happen? Somewhere, deep in the data Google maintains on her search history, it concluded that "people like her" purchased that pair of shoes.
In the enterprise, we don't have the volume of content and query activity of the large Internet players, but we do tend to have more focused content and a narrower query vocabulary. ML/AI libraries like Apache Mahout and Spark's MLlib can help our search platforms generate such odd yet often relevant results; but these technologies are still limited when it comes to explaining the 'why' for a given result. And for those of us who still exhibit skepticism when it comes to computers, that capability would be nice.
Are you using (or planning to implement) ML-in-search? A skeptic? Whichever camp you're in, let me hear from you! miles.kehoe@ideaeng.com.
Lucidworks, the commercial organization with the largest pool of Solr committers, announced today a new funding round of $50M US from venture firms Top Tier Capital Partners and Silver Lake Waterman, as well as additional participation from existing investors Shasta Ventures, Granite Ventures, and Allegis Capital.
While a big funding round for a privately held company isn't uncommon here in 'the valley', what really caught my attention is where and how Lucidworks will use the new capital. Will Hayes, Lucidworks' CEO, intends to focus the investment on what he calls "smart data experiences" that go beyond simply artificial intelligence and machine learning. The challenge is to provide useful and relevant results by addressing what he calls "the last mile" problem in current AI: enabling mere mortals to find useful insights in search without having to understand the black art of data science and big data analysis. The end target is to drive better customer experiences and improved employee productivity.
A number of well-known companies utilize Lucidworks Fusion already, many along with AI and ML tools and technologies. I've long thought that to take advantage of 'big data' like Google, Amazon, and others do, you needed huge numbers of users and queries to confidently provide meaningful suggestions in search results. While that helps, Hayes explained that smaller organizations will be able to benefit from the technology in Fusion because of both smaller and more focused data sets, even with a smaller pool of queries. With the combination of these two characteristics, Lucidworks expects to deliver many of the benefits of traditional machine learning and AI-like results to enterprise-sized content. It will be interesting to see what Lucidworks does in the next several releases of Fusion!
If you’re involved in managing the enterprise search instance at your company, there’s a good chance that you’ve experienced at least some users complaining about the poor results they see. A common lament search teams hear is “Why didn’t we use Google?” Even more telling is that many organizations that used the Google Search Appliance on their sites heard the same lament.
We're often asked to help a client improve results on an internal search platform; and sometimes, the problem is the platform. Not every platform handles every use case equally, and sometimes that shows up. Occasionally, the problem is a poor or misconfigured search, or simply an instance that hasn't been managed properly. The renowned Google public search engine does well not simply because it is a great search platform; in fact, Google has become less a search platform and more a big data analytics engine.
Our business is helping clients select, implement, and manage Intranet search. Frequently, the problem is not the search platform. Rather, the culprit is poor data quality.
Enterprise data isn’t created with search in mind. There is little incentive for authors to attach quality metadata in the properties fields Adobe PDF Maker, Microsoft Office, and other document publishing tools support. To make matters worse, there may be several versions of a given document as it goes through creation, editing, and updating; and often the early drafts, as well as the final version, sit in the same directory or file share. Very rarely will a public-facing website have such issues.
We have an updated two-part series on data quality and search, starting here. We hope you find it helpful; let us know if you have any questions!