Tuesday, September 29, 2009

The Web Knows

We all get this weird feeling sometimes that the Web knows a lot about us, but there is no way to tell how much, because the Web doesn't say. In my case, it knows what magazines I read, which friends I share my photos with, where I fly, what food and clothes I buy, where I work, what movies I watch, what songs I like, what blogs I write...the list goes on, and suddenly I shout, "Jesus, the Web knows more about me than my mother." Then I calm down, knowing that it's not going to tell anybody, at least in the near future.

Sir Tim Berners-Lee thinks otherwise. He believes the Web will reveal everything once we start asking it in a structured manner. Twenty years ago, as a frustrated software engineer, Tim Berners-Lee invented the World Wide Web. Now he is frustrated again with how the Web has evolved so far, and as the head of the W3C he is evangelizing the idea of linked data and the Semantic Web. What we would get is a "Web of data" rather than a "Web of documents". I couldn't agree more with Tim: the current Web, however useful, is still a mesh of incoherent, incongruent and highly unstructured data that is bound to be replaced by Linked Data. If you are fumbling with the idea of how linked data is going to reframe the next Web, Sir Tim's talk on Linked Data (on TED.com) is a must-watch for anyone who wants to know where the Web is headed. He is behind the W3C design notes for Linked Data, some of the key points of which are:


  • Use URIs to identify things that you expose to the Web as resources.
  • Use HTTP URIs so that people can locate and look up (dereference) these things.
  • Provide useful information about the resource when its URI is dereferenced.
  • Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.
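
To see how little machinery these principles actually need, here is a minimal Python sketch that dereferences an HTTP URI and asks for machine-readable RDF via content negotiation. The DBpedia URI is only an illustrative example of a published Linked Data resource; any dereferenceable URI would do.

```python
# A minimal sketch of "dereferencing" a Linked Data URI: ask the server for
# machine-readable RDF instead of an HTML page. The DBpedia URI below is only
# an illustrative example of a published resource.
import urllib.request

uri = "http://dbpedia.org/resource/Tim_Berners-Lee"

request = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(request) as response:
    rdf_document = response.read().decode("utf-8")

# The returned RDF describes the resource and links out to related URIs,
# which is exactly the discovery mechanism the fourth principle relies on.
print(rdf_document[:500])
```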

Technically nothing new is going on here, but logically the world is changing. The transformation, however, is easier said than done. There are countless websites full of unstructured data, and they no doubt amass valuable information that can't be ignored. Currently there are two possible ways of integrating this data with the Semantic Web: either the websites themselves expose their data through web services, or third parties scrape those websites to collect and organize the data. The first option is both more sensible and easier to implement. Later in this article we will see how web scraping really works and what problems confront it. The image below depicts how many websites have opened up their data as web services and are already participating in the Semantic Web as datasets, and the number of such datasets is increasing rapidly.





So the "Web of data", as some call it Web 3.0, will eventually encourage web sites to expose themselves as Web Services. And we are now witnessing such services already surfacing on the horizon with giants like Google, Yahoo, Amazon and Thomson Reuters joining the bandwagon. Lets us take a brief look at some of these exciting webservices.


My personal favorite is OpenCalais, which is probably the best current example of Linked Data, the kind of structured data Sir Tim recommends. The OpenCalais API was launched in February 2008 by the international business and financial news giant Thomson Reuters. The reason I favor OpenCalais is the ease with which Linked Data can be generated: you pass unstructured HTML or text to the API and it comes back semantically marked up. The linking is most profound in categories such as 'people', 'places', 'companies' and a few more. This way, third-party applications and sites can build interesting new things from that data, which is one of the defining principles of Linked Data.
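
To give a feel for how simple this is in practice, here is a rough Python sketch of such a call. The endpoint and header names are placeholders from memory rather than gospel, so check the OpenCalais documentation (and get your own API key) before trying it.

```python
# A rough sketch of an OpenCalais-style call: POST raw text, get back
# semantically marked-up entities. The endpoint and header names here are
# illustrative placeholders; consult the current OpenCalais docs for the
# real ones, and supply your own API key.
import urllib.request

API_KEY = "YOUR_OPENCALAIS_KEY"                          # assumption: issued on registration
ENDPOINT = "http://api.opencalais.com/tag/rs/enrich"     # placeholder endpoint

text = b"Thomson Reuters launched the OpenCalais API in February 2008."

request = urllib.request.Request(
    ENDPOINT,
    data=text,                                # POSTing the unstructured content
    headers={
        "x-calais-licenseID": API_KEY,        # assumed header name for the key
        "Content-Type": "text/raw",
        "Accept": "application/json",
    },
)
with urllib.request.urlopen(request) as response:
    marked_up = response.read().decode("utf-8")

# The response identifies people, places and companies in the text,
# each with its own URI that other applications can link to.
print(marked_up)
```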

Thomson Reuters is not alone. In May 2009 Wolfram Research launched a "computational knowledge engine" called Wolfram|Alpha, which is not the Google killer some predicted. With a search-engine-like interface, Wolfram|Alpha accepts natural language queries like Google does, but it also performs some interesting computation on the retrieved data. Wolfram|Alpha is more inclined towards consuming structured data than generating it, and it is one of the few existing products that mark the beginning of an era in which machines consume human-generated content.
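
For the curious, Wolfram|Alpha also exposes its computed results to programs through a developer API. The sketch below is only my guess at the shape of such a query; the endpoint and parameter names are assumptions to be verified against the official documentation.

```python
# A hedged sketch of consuming computed, structured answers programmatically.
# The Wolfram|Alpha endpoint and parameter names below are assumptions based
# on its public developer API; verify them against the official docs.
import urllib.parse
import urllib.request

APP_ID = "YOUR_APP_ID"                     # assumption: obtained from Wolfram Research
query = "population of France / population of Germany"

url = "http://api.wolframalpha.com/v2/query?" + urllib.parse.urlencode(
    {"appid": APP_ID, "input": query}
)
with urllib.request.urlopen(url) as response:
    result_xml = response.read().decode("utf-8")

# The XML contains "pods" of computed results rather than links to pages:
# the machine did the computation, not the user.
print(result_xml[:500])
```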

Not quite coincidentally, also in May 2009, Google added a new feature to its core search called 'rich snippets', which is built on structured data. The feature shows a little more useful information about the pages in the results by reading structured data formats such as microformats and RDFa. Although this markup is not widespread yet, given the wide reach of Google this is surely good news for the development of the Semantic Web.
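
To see why this kind of markup matters, consider a toy Python sketch that reads an hCard-style snippet. Because each field is labeled explicitly, a program can pull the data out without knowing anything about the page's layout; the sample HTML is made up purely for illustration.

```python
# A small sketch of why embedded structured markup helps machines: the hCard
# microformat labels each field explicitly, so a parser can extract it
# without guessing anything about page layout.
from html.parser import HTMLParser

SAMPLE = """
<div class="vcard">
  <span class="fn">Tim Berners-Lee</span>,
  <span class="org">W3C</span>
</div>
"""

class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current_field = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        for name in ("fn", "org"):            # hCard property names
            if name in classes.split():
                self.current_field = name

    def handle_data(self, data):
        if self.current_field:
            self.fields[self.current_field] = data.strip()
            self.current_field = None

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.fields)   # {'fn': 'Tim Berners-Lee', 'org': 'W3C'}
```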


The above three examples are a clear indication that structured data is rapidly becoming a feature of today's, and tomorrow's, Web. Players like Thomson Reuters and Google are encouraging the generation of structured data, and products like Wolfram|Alpha will make use of it in ways we perhaps can't imagine right now. Linked data can also help businesses grow by expanding their user base and making their data more accessible. This is evident from Amazon's visionary WebOS strategy: Amazon has released a number of developer-friendly APIs to expose its infrastructure. One of the most interesting web services Amazon opened up is the E-Commerce Service, which gives access to Amazon's product catalog. Third-party developers can use this feature-rich API to work with product data, wish lists and shopping carts. Making the API completely free makes perfect business sense for Amazon, since applications built on top of it drive user traffic back to Amazon; the web service returns items with Amazon URLs.
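
As a rough illustration, a product search against that service looks something like the sketch below. The endpoint and parameter names reflect the API as I remember it, and newer versions require signed requests, so treat this purely as the shape of a request rather than working code with real credentials.

```python
# An illustrative sketch of an ItemSearch-style call against Amazon's
# e-commerce web service. Endpoint and parameter names are assumptions, and
# current versions of the API require request signing as well.
import urllib.parse
import urllib.request

params = {
    "Service": "AWSECommerceService",
    "Operation": "ItemSearch",
    "SearchIndex": "Books",
    "Keywords": "semantic web",
    "AWSAccessKeyId": "YOUR_ACCESS_KEY",   # assumption: your own key goes here
}
url = "http://ecs.amazonaws.com/onca/xml?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    catalog_xml = response.read().decode("utf-8")

# Each returned item carries an Amazon detail-page URL, which is how
# third-party applications end up driving traffic back to Amazon.
print(catalog_xml[:500])
```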

Despite the evident benefits of web services, some sites will choose not to expose their data this way, which will force third-party developers to deploy scrapers to collect the data from such websites. Web scraping is more or less reverse engineering of HTML pages, and it has the disadvantages of any other reverse-engineering technique. It is essentially parsing chunks of information out of a page. The problem with scraping pages coded in HTML is that the actual data is mingled with layout and rendering information and is not readily available to a computer. For a scraper program to get the data back from a given HTML page, it first has to learn the details of that particular markup and figure out where the actual data lives. With such a scraper it is possible to discover, say, which items on a page are tagged with a given tag, but the results may not be as accurate as those obtained through a web service.
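
To make the contrast with the hCard example above concrete, here is a bare-bones scraper sketch (in practice you would first fetch the page with urllib). The markup and class names are made up to stand in for a page we don't control: nothing labels the data semantically, so the scraper has to guess where it lives, and a redesign breaks it silently.

```python
# A bare-bones scraper sketch. The sample markup stands in for a page we do
# not control: the product name is buried in presentational markup, so the
# scraper has to *guess* that names live in <td class="item-name"> cells.
# If the site changes its layout, the scraper silently breaks.
from html.parser import HTMLParser

SCRAPED_PAGE = """
<table>
  <tr><td class="item-name">Programming the Semantic Web</td>
      <td class="item-price">$39.99</td></tr>
  <tr><td class="item-name">Semantic Web for Dummies</td>
      <td class="item-price">$24.99</td></tr>
</table>
"""

class CatalogScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.current_class = None
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_data(self, data):
        if self.current_class == "item-name" and data.strip():
            self.names.append(data.strip())
        self.current_class = None

scraper = CatalogScraper()
scraper.feed(SCRAPED_PAGE)
print(scraper.names)   # ['Programming the Semantic Web', 'Semantic Web for Dummies']
```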

Compared to scrapers, web services offer numerous advantages. To name a few: websites keep control over their data and can track its usage, along with granular details like how the data is used and by whom. Following Amazon's lead, other sites can do this in a way that encourages third-party developers to build applications which eventually drive traffic back to their sites.


In the past, websites were very conservative about the data they own, believing that closed data gave them a competitive advantage. People have now started to realize that opening up their data can open new business possibilities. Amazon, a pioneer of this change, has already proved that charging a very small amount for its data can indeed increase revenue, as more traffic is directed to its sites through non-Amazon applications.


In the future, websites will have to act as databases for other applications; exactly how they will do it is still unclear, but more or less websites will transform into web services. Web service APIs will not be available for every site, though, and that will fuel the spread of scraper programs. Some sites will fail to notice this change and will pay the price for it. Only those who understand and appreciate the importance of the Semantic Web will survive to see the dawn of "Web 3.0".

Sunday, September 20, 2009

Customer Felicity Through Usability Experience

Gone are the days when you could just meet customer requirements and call it a job well done.
The last time I checked, we are not living in the Stone Age; this is an age in which you are expected to deliver beyond what is expected. Customer expectations have soared through the roof. A spark of brilliance alone is not enough; those who bring their best every day, week in and week out, will rise to the top. As the saying goes, "What got you here won't get you there." These turbulent times have changed the way customers buy software and services, and competition is fiercer than ever. Software vendors are hunting for new ways to allure customers, be it dazzling application functionality or world-class support. Apart from these, one more factor is playing a decisive role in the success of a software product: its usability. A product's usability, the ease with which end users can be trained on and operate it, is becoming a fundamental purchasing criterion and a direct way of cutting operational costs.


Moreover, it is ludicrous to treat application functionality and usability as a trade-off anymore:

Good Functionality + Bad Usability != Bad Functionality + Good Usability

Good functionality always takes precedence, but usability is making its way into boardroom discussions. "We simply can't afford to pay for products that cost us a lot of overhead anymore," said Keith Butler, a technical fellow at Boeing's Phantom Works research and development arm. When thousands of end users are involved, design flaws can cost millions of dollars in lost time and productivity, he said. So even if you pack ocean-boiling features into your application, if it takes 12 engineers three months to log into it, you are going right out of the window. Yes, there are exceptions. Take Facebook: the results of a heuristic evaluation show that Facebook performs poorly against traditional usability guidelines. In theory, then, Facebook should not be the success it currently is, given its failure under a traditional usability evaluation method, yet it is immensely popular and gazillions of users flock to it. But not every piece of software evolves to Facebook's level. Usability is something that can no longer be compromised in favour of diverting focus to functionality. Thanks to NIST, there is now a standard for usability test reports called the CIF for Usability Test Reports.

The development of a standard for comparing product usability was spearheaded by the National Institute of Standards and Technology. Called the Common Industry Format (CIF) for Usability Test Reports, the standard outlines a format for reporting test conditions and results and gives user companies enough information about a test to replicate it. The format has already evolved into an ANSI standard.

Boeing played a lead role in the development of CIF after its experience and internal studies showed that usability plays a significant role in total cost of ownership. In one pilot of the CIF standard on a widely deployed productivity application, the Chicago-based company said improved product usability had a cost benefit of about $45 million. Butler said it is much better to have vendors refine an interface design "than to have thousands of end users doing it involuntarily on top of their jobs and then just feeling frustrated."


With CIF it is possible for vendors and users to discuss usability as a science rather than as marketing hype. Introducing summative usability testing into the development process brings several benefits:

* It provides a concrete benchmark for user performance and satisfaction, thus reducing the risk that the new system is more difficult to use (and therefore less successful) than the existing system.
* It highlights usability problems with the existing system that need to be addressed in the design of the new system.
* It provides specific goals for usability and gives developers the opportunity to become familiar with typical user task scenarios.
* It provides the framework for the more detailed usability work required by ISO 13407.


Microsoft Corp., also a major participant in CIF's development, has incorporated the usability testing it conducted on its Windows XP, Windows ME and Windows 2000 operating systems into the CIF format, said Kent Sullivan, Microsoft's usability lead for the Windows client. CIF has also been adopted by major players such as IBM, Kodak and Cisco, to name a few.

Even if the number of end users is very small, usability tests still make a lot of business sense, as those few users will have a direct say in whether their company goes for the next version or considers another vendor. Usability is all about how you offer functionality so that it can be harnessed without much effort, training or, certainly, frustration. It takes a user a long time to explore all the functionality of an application, but usability starts making a dent from day one. Spending more thought, effort and time on it will put you in the elite group of vendors who can claim to deliver customer felicity.



Further Readings:
CIF: http://www.usabilitynet.org/prue/cif.htm
http://zing.ncsl.nist.gov/iusr/documents/cifv1.1b.htm
http://portal.acm.org/citation.cfm?id=633292.633470

User Experience: http://mashable.com/2009/01/09/user-experience-design/





Courtesy: www.dilbert.com