Saturday, March 12, 2016

Big Data Basics Part - 1

The term “big data” remains difficult to understand because it can mean different things to different people. Behavioural economist Dan Ariely once compared Big Data to teenage sex: “everyone talks about it, nobody really knows how to do it, and everyone thinks everyone else is doing it, so everyone claims they are doing it.”

So what is Big Data?
Big data explained in simple terms by Bernard Marr:
The basic idea behind the phrase 'Big Data' is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse.
Big Data therefore refers to our ability to make use of the ever-increasing volumes of data.

Types of data (Datafication):
  • Activity Data - Digital music players and eBooks collect data on our activities. Your smart phone collects data on how you use it and your web browser collects information on what you are searching for.
  • Conversation Data - Most of our conversations leave a digital trail. Just think of all the conversations we have on social media sites like Facebook or Twitter. Even many of our phone conversations are now digitally recorded.
  • Photo and Video Image Data - We upload and share hundreds of thousands of photos and videos on social media sites every second. The growing number of CCTV cameras capture video images, and we upload hundreds of hours of video.
  • Sensor Data - Your smart phone contains a global positioning sensor to track exactly where you are every second of the day, and it includes an accelerometer to track the speed and direction in which you are travelling.
  • The Internet of Things Data - Smart TVs are able to collect and process data, we have smart watches, smart fridges, and smart alarms. The Internet of Things connects these devices.


Traditionally, we have needed to wait a considerable amount of time to gather data from around the world, analyze it, and take action. The process is slow and inefficient, and the contributing factors include: computer systems that are not fast enough to gather and store ever-changing data (velocity); computer systems that cannot accommodate the volume of data pouring in from all of the sources (volume); computer systems that cannot process images, media files and the like, e.g. x-rays and MP3s (variety); and the messiness or untrustworthiness of the data (veracity). Big Data technology addresses these issues by tackling the velocity-volume-variety-veracity problem.

How is it different from traditional BI?
To understand the difference between Big Data and traditional BI, let’s first look at how analytics has changed and improved over time:


The goal of any analytics solution is to provide the organization with actionable insights for smarter decisions and better business outcomes. Once you have enough data, you start to see patterns and you then start building a model of how these data work. Once you build a model, you can predict.
Different types of analytics, however, provide different types of insights (refer to the figure above). Analytics models are moving from descriptive through predictive to prescriptive.
  1. Descriptive Analytics (The first step, insight into the past). This is the simplest class of analytics; it allows you to condense data into smaller, more useful nuggets of information. It uses data aggregation and data mining techniques to summarize raw data into something humans can interpret, providing insight into the past and answering: “What has happened?”
  2. Predictive Analytics (Predict/understand the future). It utilizes a variety of statistical, modelling, data mining, and machine learning techniques to study recent and historical data, thereby allowing analysts to make predictions about the future. Predictive analytics can only forecast what might happen, because it is founded on probabilities: it uses statistical models and forecasting techniques to understand the future and answer: “What could happen?”
  3. Prescriptive Analytics (Advise on possible outcomes). The relatively new field of prescriptive analytics allows users to “prescribe” a number of different possible actions and guides them towards a solution. In a nutshell, these analytics predict not only what will happen but also why it will happen, and provide recommendations on actions that take advantage of the predictions. It uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
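
To make the three questions concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sales figures are invented, the "prediction" is a plain least-squares straight line, and the "prescription" is a toy reorder rule rather than a real optimization or simulation algorithm.

```python
from statistics import mean

# Hypothetical monthly unit sales; the figures are made up for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]


def describe(sales):
    """Descriptive: summarize what has happened."""
    return {"total": sum(sales), "average": mean(sales), "best": max(sales)}


def predict_next(sales):
    """Predictive: fit a least-squares straight line and extrapolate one step."""
    n = len(sales)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(forecast, stock_on_hand):
    """Prescriptive: recommend an action given the forecast (a toy rule)."""
    shortfall = forecast - stock_on_hand
    return f"order {round(shortfall)} units" if shortfall > 0 else "hold current stock"


if __name__ == "__main__":
    print(describe(monthly_sales))            # What has happened?
    forecast = predict_next(monthly_sales)
    print(forecast)                           # What could happen?
    print(prescribe(forecast, stock_on_hand=120))  # What should we do?
```

Each function consumes the output of the previous stage, which mirrors how the three kinds of analytics build on one another.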


Now let’s compare traditional BI and analytics (descriptive and predictive) with Big Data (plus prescriptive analytics).

Traditional business intelligence (BI) has always been top-down, putting data in the hands of executives and managers who are looking to track their businesses on the big-picture level. Big Data, on the other hand, is bottom-up. It empowers business end-users to carry out in-depth analysis to inform real-time decision-making.

BI is about making decisions and analytics is about asking questions: Which product model got the most complaints? What is the lead conversion ratio of a particular product? Which products are selling more in the north-east states? In other words, traditional BI and analytics are about getting answers you already know are important, and because you know they’re important you put mechanisms in place to produce the key metrics. Big Data, on the other hand, is about finding answers to questions you didn’t even know you had.

The scope of traditional BI is limited to structured data that can be stuffed into columns and rows on a data warehouse. BI could never have anticipated the multitude of images, MP3 files, videos and social media snippets that companies would contend with.  Big Data refers to the immense volumes of data (structured and unstructured) available online and in the cloud, which requires ever more computing power to gather and analyze.


Prescriptive analytics is the future of Big Data. Its potential is enormous, but it also requires massive amounts of data to be able to make correct decisions. You have to collect, store, analyze, organize, purge, and use the data. It's that process, from collection to use to purge, that is the great unknown of big data. I hope you find the article helpful in connecting the dots.

Monday, September 12, 2011

SharePoint 2010 Search | Fast?

This post is an attempt to answer the question “what is the right search strategy (using SharePoint 2010) for your customer”.


While looking for the right approach, I came across the phrase “actionable search”, and it is an important one to understand. Actionable search results are used directly by the searcher to complete a specific task. Take the example of an ecommerce site where you enter a product code in the search box and the search result shows the product with an “add to cart” link. Adding the product to the shopping cart is the user action directly associated with the search performed. Actionable search is less relevant in search engines like Google and Bing; however, it can have a significant impact on employee productivity in an intranet scenario. Actionable search is meaningful when there is a high degree of relevance, and it becomes more powerful when contextual actions (the options you get when hovering over a result, e.g. bookmark, see similar results, add metadata, like, etc.) are associated with the results.


I bring this up because “actionable search” is the key consideration for organizations looking specifically for an enterprise search solution to improve employee productivity.


Coming back to the search strategy question: to get an answer, it’s important to understand 1) customer requirements (functional and non-functional), 2) the existing platform and data, 3) long- and short-term goals, 4) audiences and 5) budget. These are generic parameters; a thorough analysis of the best solution to meet an organization's particular needs should be conducted before making a recommendation. On the lighter side, while I was working at Microsoft India, point 5) above was the only (or at least the main) consideration for me :)


There are two main enterprise search options coming with the SharePoint 2010 release:


1) SharePoint Server 2010 Search – the out-of-the-box SharePoint search for enterprise deployments


2) FAST Search Server 2010 for SharePoint – product based on the FAST search technology that combines the best of FAST’s high-end search capabilities with the best of SharePoint.


Please note that the various components that make up an enterprise search solution, and parameters such as performance, availability and content size, are out of scope for this article. This article aims to provide a high-level overview of the SharePoint 2010 search options and usage scenarios.



Choosing the Right Search Product (Microsoft version)


With Enterprise Search inside the firewall, there are two distinct types of search applications:


· General-purpose search applications increase employee efficiency by connecting "everyone to everything", that is, a broad set of people to a broad set of information. Intranet search is the most common example of this type of search application.


· Special-purpose search applications help a specific set of people make the most of a specific set of information. Common examples include product support applications, research portals ranging from market research to competitive analysis, knowledge centers, and customer-oriented sales and service applications. This kind of application is found in many places, with variants for essentially every role in an enterprise. These applications typically are the highest-value search applications, as they are tailored to a specific task that is usually essential to the users they serve. They are also typically the most rewarding for developers.


SharePoint Server 2010's built-in search is targeted at general-purpose search applications, and can be tailored to provide specific intranet search experiences for different organizations and situations. FAST Search for SharePoint can be used for general-purpose search applications, and can be an "upgrade" from SharePoint search to provide superior search in those applications. However, it is designed with special-purpose search applications in mind. So applications you identify as fitting the "special-purpose" category should be addressed with FAST Search for SharePoint.



For simplicity, the "special-purpose" category mentioned above can be related to the “actionable search” we discussed at the beginning, along with the importance of relevance and contextual search. FAST Search brings the following key features in addition to the out-of-the-box search offerings:


Visual Best Bets and Similar Results


Visual Best Bets are an enhanced version of Best Bets, which were introduced in SharePoint 2007. FAST Search for SharePoint provides the ability to render Visual Best Bets in the form of images and even videos. You can show Visual Best Bets to users when they search for a particular keyword.


With FAST Search Server 2010 for SharePoint, results returned by a query include links to "Similar Results." When a user clicks on the link, the search is redefined and rerun to include documents that are similar to the result in question.


Search Enhancement based on user context


The different search settings: Best Bets, Visual Best Bets, document promotions, document demotions, site promotions and site demotions, can have one or more user contexts associated with them. The user context defines rules for when the settings should be applied. User contexts match the properties defined on the user’s SharePoint User Profile page. For example, an administrator can define that a Visual Best Bet banner should be displayed only if the user who enters the query has Office Location set to Stockholm. This way, different search users will experience different search results for the same query.
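
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the FAST Search for SharePoint API: the `SearchSetting` class, the `applicable_settings` function and the sample data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchSetting:
    """A keyword-triggered setting such as a Visual Best Bet (hypothetical model)."""
    name: str
    keyword: str
    # Profile properties that must all match; an empty dict applies to everyone.
    user_context: dict = field(default_factory=dict)


def applicable_settings(settings, query, user_profile):
    """Return names of settings whose keyword and user context both match."""
    matched = []
    for setting in settings:
        if setting.keyword.lower() not in query.lower():
            continue
        if all(user_profile.get(k) == v for k, v in setting.user_context.items()):
            matched.append(setting.name)
    return matched


if __name__ == "__main__":
    settings = [
        SearchSetting("Stockholm banner", "holiday", {"Office Location": "Stockholm"}),
        SearchSetting("Global best bet", "holiday"),
    ]
    # Same query, different profiles, different results.
    print(applicable_settings(settings, "holiday calendar", {"Office Location": "Stockholm"}))
    print(applicable_settings(settings, "holiday calendar", {"Office Location": "Oslo"}))
```

The point of the sketch is that the query alone does not determine the result; the user's profile properties take part in the decision.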


Other Features


The following are the other features provided by FAST Search:


· Entity Extraction


· Advanced Sorting (by rank/ managed property/formula and random)


· Deep Refinement (Faceted Search)


· Document Preview


· Rich Web Indexing


· Structured Data Search using FAST Query Language (FQL)


· Support for high-end performance and scalability requirements.


I am not going into the details of these items, as detailed information on each is available on the net. If a specific customer requirement calls for one or more features provided only by FAST Search, a cost-versus-value analysis will be required.


Last but not least, the non-functional requirements of the desired system play a vital role in recommending a best-fit search solution. The following are a few questions that need to be answered during the analysis process:


· Data size: How many documents need to be searched?


· User Load: How many queries / second are expected?


· Performance Requirement: What is the expected query latency?


· Availability: What is the acceptable downtime per year or per month?


· Scalability: What is the expected growth rate of data?


· Security: What is the level of security required?


These are some generic questions; the full list can be much longer depending on customer requirements such as multilingual support, geography-based scenarios, offline usage, etc.
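
Two of these questions reduce to simple arithmetic, and a rough Python sketch can help when discussing them with a customer. The function names and the sample numbers below are assumptions chosen for illustration, not sizing guidance for any search product.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - availability_percent / 100)


def projected_documents(current_docs, annual_growth_rate, years):
    """Corpus size after compound growth, e.g. 0.25 for 25% per year."""
    return round(current_docs * (1 + annual_growth_rate) ** years)


if __name__ == "__main__":
    # A 99.9% availability target leaves under nine hours of downtime a year.
    print(allowed_downtime_hours(99.9))
    # One million documents growing 25% a year, three years out.
    print(projected_documents(1_000_000, 0.25, 3))
```

Turning "acceptable downtime" and "expected growth" into numbers like these makes it easier to compare them against what each search option can support.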


With virtualization support, the concept of infinite storage capacity is a reality. Software-as-a-service and infrastructure-as-a-service models can be recommended in limited-budget scenarios. It is imperative to understand the performance and search scale requirements to define the right strategy. I will try to unpack these non-functional considerations in my next post.



Tuesday, September 21, 2010

Information Architecture 4U

You are on the bar floor and the music on demand is on. You request the DJ to play “Smells Like Teen Spirit” and the lad rocks the floor in no time with this Nirvana number. That is Information Architecture for you. If you are able to find the right information at the right time, it does not happen by chance; there has to be logical and intuitive content structuring behind it. This logical and intuitive content structuring is called Information Architecture (IA). I am not attempting a formal definition of IA here; the aim is to build an intuition for it.

Information Architect is not the Designer
The difference between an information architect and a designer is similar to the difference between an apartment architect and an interior designer.
The architect’s job is to define the structure, ventilation, placement of plumbing and electrical systems, etc. The apartment might collapse, or fail to meet the needs of the family living in the building, if it’s not properly architected.
The interior designer’s job is to take care of coloring, placement, vastu and style of furnishings, textures, surfaces, etc.

The Evolution Theory
Richard Saul Wurman coined the term “Information Architecture” in 1975. Wurman’s initial definition of information architecture was “organising the patterns in data, making the complex clear”.
In 1996 library scientists Lou Rosenfeld and Peter Morville used IA as the term to define the work they were doing while structuring large-scale websites and intranets. In Information Architecture for the World Wide Web: Designing Large-Scale Web Sites they define information architecture as:
• The combination of organization, labeling, and navigation schemes within an information system.
• The structural design of an information space to facilitate task completion and intuitive access to content.
• The art and science of structuring and classifying web sites and intranets to help people find and manage information.
• An emerging discipline and community of practice focused on bringing principles of design and architecture to the digital landscape.

Elements of Information Architecture

The End User
Successful Information Architecture is all about usability. For effective usability you need to involve end users in planning the IA. There are proven games that you can play involving end users:
Card Sorting: “Card Sorting: A Definitive Guide”. Read it, learn it and use it, and you are sorted for the end-user involvement described above. The same guide references the card-based classification evaluation technique for testing the IA.

Business Value
As an information architect, you need to understand the business context of an organization and identify the value you can add while implementing the Information Architecture.
Stakeholders need to be identified and interviewed to understand business objectives and issues.

Content
You need to have the right content to interest your target audience. The content needs to be written based on your audience’s expectations and proficiencies.
Keep in mind that content is not a static object: it can change not only in structure and delivery, but from day to day as well. Being mindful of content strategy and alignment with business goals is the key to successful information architecture.
To accomplish these goals, an Architect has to run both top-down and bottom-up discovery sessions. Top-down discovery involves getting a picture of the entire information space and working down into the details. Bottom-up discovery is all about figuring out metadata for each piece of content and working up toward the general.

Quotes
Information architecture encompasses a wide range of problems. But regardless of the specific context or objectives of a given information architecture project, our concern is always with creating structures to facilitate effective communication. This notion is the core of our discipline. –– Jesse James Garrett

Flickr allows me to upload my pictures and organize them, tag them, however I see fit. There is no central authority telling me what to tag my pictures. This is partly because it’s not going to hurt anybody if I do it wrong … Flickr isn’t a mission-critical system. It’s a playful social platform…if you want a serious photo library, then use a system like the national archive or Corbis has, but not Flickr. There’s a difference between managing information, and designing the infrastructure to let others manage it themselves.

But both approaches are architectural. –– Andrew Hinton in Linkosophy

Thursday, September 2, 2010

Cloud Computing - ABC

If you only need milk, would you buy a cow? If not, please consider the cloud computing stack for your IT requirements.

There are three main flavors of cloud computing (new models such as BPaaS and DaaS are emerging): Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and the mother of them all, Software as a Service (SaaS). Let’s take a geek’s dive into these flavors.

SaaS is the most mature category of cloud services and so far the only segment of cloud computing that has proven useful as a business model. With SaaS, software applications are rented from a provider as opposed to purchased for enterprise installation and deployment. Typically the services are provided in a 'pay-as-you-go' model, with payments charged on a monthly basis based on the number of users or services consumed. The key segments within SaaS include messaging, collaboration and Customer Relationship Management (CRM). Salesforce.com is an example of SaaS, where CRM is delivered as an on-demand service. Microsoft BPOS provides SharePoint, Exchange and CRM as a service over the internet.

In simple terms, IaaS is selling hardware as a service over the cloud. The hardware includes processing power, firewalls, network load balancing, availability, storage and so on. The IaaS customer is the owner of software that has some SLAs for end users. To meet these SLAs (e.g. availability, performance, security), the software needs to be hosted in an environment that is built to provide failover, compute power and security. The IaaS customer therefore needs a service provider who can supply a hosting environment that meets the SLAs. The IaaS service provider leverages virtualization to provide the computing power. The software owner deploys the software in a virtual environment locally and then uploads the Virtual Machine to the hosting environment of the IaaS service provider. From there, the service provider configures and hosts the software application as the SLAs require.
The leading IaaS service providers are Amazon and Rackspace. Billing is generally based on processor usage, storage and data transfer. Additional billing items may include security, compliance, usage reports, etc.
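
As a rough illustration of usage-based billing, the three line items above can be added up as follows. The rates in this Python sketch are hypothetical placeholders, not the prices of Amazon, Rackspace or any other provider, and `Rates` and `monthly_bill` are names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; real providers publish their own rate cards."""
    per_cpu_hour: float = 0.10
    per_gb_stored: float = 0.05
    per_gb_transferred: float = 0.02


def monthly_bill(cpu_hours, gb_stored, gb_transferred, rates=Rates()):
    """Sum the three usage-based line items named in the text."""
    items = {
        "compute": cpu_hours * rates.per_cpu_hour,
        "storage": gb_stored * rates.per_gb_stored,
        "transfer": gb_transferred * rates.per_gb_transferred,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    # One VM running all month (720 hours), 100 GB stored, 500 GB transferred.
    for item, cost in monthly_bill(720, 100, 500).items():
        print(f"{item}: {cost:.2f}")
```

The bill moves with consumption: switch the VM off for half the month and the compute line halves, which is the contrast with buying the hardware up front.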

PaaS provides an application platform in the cloud that lets you deploy any application you develop, or any application you acquire from another vendor. PaaS offerings generally include facilities for software application design, development, testing, deployment and hosting. PaaS may also include database integration, security, scalability, storage, persistence, state management, application versioning, application instrumentation and developer community facilitation.
In short, PaaS makes available all the facilities required to support the full software life cycle of building and delivering applications through the Web, while assuring the availability of services from the Internet.
Windows Azure is an example of Platform as a Service. Azure is a foundation for running Windows applications and storing data in the cloud.
To run an application, a developer accesses the Windows Azure portal through her Web browser, signing in with a Windows Live ID. She then chooses whether to create a hosting account for running applications, a storage account for storing data, or both. Once the developer has a hosting account, she can upload her application, specifying how many instances the application needs. Windows Azure then creates the necessary Virtual Machines and runs the application.

I think it's getting complex now. I’ll take a pause and try to make it simple enough in upcoming posts.

Friday, May 21, 2010

Back after the break

I have been away for more than two years, learning to sell. I joined Microsoft India (SMSG) and it was indeed a great journey. Two years in Microsoft presales... Trust me, you are ready to run your shop anywhere in the universe. Now I need to come back to my roots in technology, and here I am, back to business: I have joined Datacraft India as a Solutions Architect.

My last post was on SaaS, and I again have the opportunity to work on the SaaS model BIG TIME. I'm good to go for blogging and will soon post a geeky note on SaaS and the cloud business.

Monday, March 17, 2008

Let’s Talk About SaaS

Before I write a blog post, I spend some time understanding the concept from a geek’s perspective. The intention is to simplify the concept and make it easier to understand and imagine (abstracting away the hi-tech lingo).

Let’s talk about SaaS… SaaS stands for Software as a Service, which is becoming a very popular trend in the software industry. Let’s try to understand how it’s different from what we have now (and had in the past).

Hotmail.com is one of the most used online email applications and is delivered as a service to end users. So what is it about Hotmail that stops me from tagging it as SaaS? Salesforce.com offers an in-the-cloud version of CRM software that can be accessed over the Internet. Neither Salesforce.com nor Hotmail requires any installation on the user side, and both serve multiple users from a single instance.

Salesforce delivers CRM, a line-of-business application, and charges $59.00 per user per month, with no upfront cost. Hotmail, on the other hand, is a generic free email service available to internet users. The nature of the service is what differentiates a SaaS application from other internet-based (consumer) applications. A line-of-business SaaS application is a paid solution that addresses a business problem and requires a customized delivery channel.

Currently, when you want a line-of-business (LOB) application like an ERP, you have to look for an ERP vendor, finalize the best-fit solution, buy the hardware to install it on, and plan for maintenance and customization. In the SaaS world, the software is paid for as it is consumed, and the end user has no software or infrastructure to buy, install and maintain.

Though the concept sounds very exciting, there are a lot of challenges that need to be addressed to achieve the SaaS delivery model. There is a complete set of lifecycle considerations, starting with discovery, SLAs, security, performance, deployment, etc., which is beyond the scope of this post; I’ll only mention the key considerations.

At the top of the list is the multitenancy concept. This is like having tenants in different apartments of a housing complex, sharing some common facilities (besides having their private amenities). In SaaS, each tenant is a customer/consumer of the hosted service instance. A multitenant-efficient SaaS offering supports a data model that allows sharing of the database and resources, while restricting each tenant’s access to its own data.

The next thing to consider is the ability to provide a customized service for each tenant while running a single instance of the service. This can be achieved by using metadata that describes the attributes of a tenant and mapping it to the service. In other words, the behavior of the service is defined by the metadata associated with each tenant. Having the metadata reduces the level of customization effort required for each tenant.
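
Both ideas, shared data with per-tenant isolation and metadata-driven behavior, can be sketched in a few lines of Python. This is a toy model built on assumptions: the tenant names, the metadata keys and the in-memory "table" are all invented for illustration, and a real offering would keep them in a database behind proper access control.

```python
# Hypothetical per-tenant metadata; in practice this would live in a database.
TENANT_METADATA = {
    "acme": {"currency": "USD", "theme": "blue", "approval_required": True},
    "globex": {"currency": "EUR", "theme": "green", "approval_required": False},
}

# One shared table for all tenants; every row carries its tenant's id.
ORDERS = [
    {"tenant_id": "acme", "order_id": 1, "amount": 250},
    {"tenant_id": "globex", "order_id": 2, "amount": 400},
    {"tenant_id": "acme", "order_id": 3, "amount": 75},
]


def orders_for(tenant_id):
    """Tenant isolation: a tenant only ever sees its own rows."""
    return [row for row in ORDERS if row["tenant_id"] == tenant_id]


def render_order(tenant_id, order):
    """Same code path for every tenant; metadata decides the behavior."""
    meta = TENANT_METADATA[tenant_id]
    status = "pending approval" if meta["approval_required"] else "confirmed"
    return f"#{order['order_id']}: {order['amount']} {meta['currency']} ({status})"


if __name__ == "__main__":
    for tenant in TENANT_METADATA:
        for order in orders_for(tenant):
            print(tenant, render_order(tenant, order))
```

Onboarding a new tenant in this model means adding a metadata entry, not deploying another copy of the service.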

Monday, March 3, 2008

Design for Upgrade

While this blog is more inclined towards Agile software implementation, an incident that happened last month has forced me to pen a few thoughts on software architecture. We developed an application with Microsoft Office SharePoint and InfoPath. The development started a year ago with Office 2003, and it has now been migrated to MOSS 2007. The application was architected simply, keeping the customer requirements in mind. As mentioned in one of my previous posts, the “tale of never ending requirements” hit us in the middle of the migration. A new product release always carries additional or improved features for addressing business problems. During a migration, it makes sense to evaluate the new features and analyze the possibility of leveraging them to address business problems in a better way. Having said this, the solution architecture must support the upgrade scenario with a reasonable amount of time and effort.

Design decisions while architecting a software solution are derived from various sources, such as availability of resources, customer needs and preferences, the business domain, etc. However, the basics remain unchallenged in every implementation scenario. One such fundamental is explained in this great quote from “Software Architecture in Practice” by Bass, Clements and Kazman:

"Architects make design decisions because of the downstream effects they will have on the system(s) they are building and these effects are known and predictable. If they were not, the process of crafting an architecture would be no better than throwing dice. "

No architecture can be perfect; in fact, you trade off software system attributes to come up with the best possible solution for specific business requirements. Thinking of an architecture generally brings to mind the concepts of layers and tiers (layers are a logical separation of components, while tiers are physical). Each layer adds an additional message call, which hurts the response time. Then why do architects still model the system in separate layers and take the performance hit? The answer is “to make complex things simple”. The more complex the system requirements, the more layers are generally included in the solution architecture.


Coming back to our scenario, how can an architecture support the technology upgrade process? I believe there cannot be a complete answer to this; unless you understand the vision of the technology or product company, it’s difficult to handle this 100% with architecture. However, there are some basic principles that remain unchallenged. XML-based function calls, loosely coupled layers, and separate tiers for application and database are a few checks that help handle the situation.
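
As a small illustration of what loose coupling buys you during an upgrade, here is a Python sketch. It is not taken from the SharePoint/InfoPath application discussed above; `DocumentStore`, `LegacyStore`, `UpgradedStore` and `ApprovalService` are hypothetical names, and the two store classes merely stand in for an old and a new product version.

```python
from typing import Protocol


class DocumentStore(Protocol):
    """The contract the business layer depends on (hypothetical interface)."""

    def load(self, doc_id: str) -> dict: ...


class LegacyStore:
    """Stands in for the old product version's storage API."""

    def load(self, doc_id: str) -> dict:
        return {"id": doc_id, "title": f"Form {doc_id}", "source": "legacy"}


class UpgradedStore:
    """Stands in for the new version; only this class changes on upgrade."""

    def load(self, doc_id: str) -> dict:
        return {"id": doc_id, "title": f"Form {doc_id}", "source": "upgraded"}


class ApprovalService:
    """Business layer: knows the contract, not the concrete store."""

    def __init__(self, store: DocumentStore):
        self.store = store

    def summary(self, doc_id: str) -> str:
        doc = self.store.load(doc_id)
        return f"{doc['title']} awaiting approval"


if __name__ == "__main__":
    # Swapping the data layer does not touch the business layer.
    print(ApprovalService(LegacyStore()).summary("42"))
    print(ApprovalService(UpgradedStore()).summary("42"))
```

The extra indirection is exactly the kind of layer that costs a call, so it earns its place only when an upgrade or replacement of the lower layer is a realistic prospect.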


One important point to consider: if the requirements are simple, then it’s not recommended to take the performance hit by including layers. Trying to oversimplify the system will make it complex.