Want to know how to choose a Machine Learning algorithm?

Machine Learning, which learns from the data provided to its algorithms, is the foundation for today’s insights on customers, products, costs and revenues.

Some of the most common examples of machine learning are Netflix’s algorithms, which suggest movies based on those you have watched in the past, and Amazon’s algorithms, which recommend products based on what other customers have bought before.

Algorithm selection can broadly be decided by answering the following questions:

  • How much data do you have, and is it continuous?
  • Is it a classification or a regression problem?
  • Are the variables predefined (labeled), unlabeled or a mix?
  • Are the data classes skewed?
  • What is the goal: to predict or to rank?
  • Does the result need to be easy to interpret?
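
Some of these questions can be answered with a quick look at the data before any model is chosen. As a minimal sketch, assuming a hypothetical list of class labels (the names and values below are made up for illustration), the class skew question could be checked like this:

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the data set, largest class first."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.most_common()}

# Hypothetical labels: 9 customers who stayed and 1 who churned.
labels = ["stayed"] * 9 + ["churned"]
shares = class_balance(labels)
print(shares)  # {'stayed': 0.9, 'churned': 0.1}
```

A split as lopsided as this one suggests the classes are skewed, which may influence both the choice of algorithm and how its results are evaluated.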

Here are the most used algorithms for various business problems:

Decision Trees: The output of a decision tree is very easy to understand, even for people from a non-analytical background, and no statistical knowledge is needed to read and interpret it. Decision trees are among the fastest ways to identify the most significant variables and the relationships between two or more variables. They are excellent tools for helping you choose between several courses of action. The most popular decision tree algorithms include CART, CHAID and C4.5.
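
To illustrate how a tree can single out the most significant variable, here is a minimal sketch that scores yes/no features by Gini impurity, the measure CART uses for classification. The loan data, feature names and helper functions are hypothetical, and a real tree would repeat this step recursively on each branch.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a mixed one."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_impurity(rows, labels, feature):
    """Weighted Gini impurity after splitting on a yes/no feature."""
    yes = [lab for row, lab in zip(rows, labels) if row[feature]]
    no = [lab for row, lab in zip(rows, labels) if not row[feature]]
    return (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)

def best_feature(rows, labels):
    """Pick the feature whose split leaves the least impurity."""
    return min(rows[0].keys(), key=lambda f: split_impurity(rows, labels, f))

# Hypothetical loan records: made-up features and outcomes.
rows = [
    {"has_job": True, "owns_home": True},
    {"has_job": True, "owns_home": False},
    {"has_job": False, "owns_home": True},
    {"has_job": False, "owns_home": False},
]
labels = ["repaid", "repaid", "defaulted", "defaulted"]

top = best_feature(rows, labels)
print(top)  # has_job
```

In this toy data set, splitting on has_job separates the two outcomes completely, so it would become the first question the tree asks.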

In general, decision trees can be used in real-world applications such as:

  • Investment decisions
  • Customer churn
  • Bank loan defaulters
  • Build vs Buy decisions
  • Company merger decisions
  • Sales lead qualifications

Logistic Regression: Logistic regression is a powerful statistical way of modeling a binary (binomial) outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative distribution function of the logistic distribution.
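
To make the logistic function concrete, the sketch below turns a weighted sum of explanatory variables into a probability between 0 and 1. The weights and bias are hypothetical, hand-picked values; a real model would estimate them from data.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive outcome for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical coefficients, chosen by hand for illustration only.
weights = [0.8, -0.5]
bias = -1.0

p = predict_probability([2.0, 1.0], weights, bias)
print(round(p, 3))  # 0.525
```

A probability above a chosen cut-off, commonly 0.5, is then read as the positive class.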

In general, regressions can be used in real-world applications such as:

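Support Vector Machines: A Support Vector Machine (SVM) is a supervised machine learning technique that is widely used in pattern recognition and classification problems. It is natively a binary classifier, suited to data with exactly two classes, although it can be extended to more classes by combining several binary models.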

In general, SVM can be used in real-world applications such as:

  • Detecting people with common diseases such as diabetes
  • Handwritten character recognition
  • Text categorization, such as news articles by topic
  • Stock market price prediction

Naive Bayes: A classification technique based on Bayes’ theorem that assumes the input features are independent of one another. It is very easy to build and particularly useful for very large data sets. Despite its simplicity, Naive Bayes can sometimes outperform far more sophisticated classification methods. It is also a good choice when CPU and memory resources are a limiting factor.
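
As an illustration of how little machinery Naive Bayes needs, here is a minimal sketch of a word-count (multinomial) variant with add-one smoothing. The training messages and the "spam" and "ham" labels are made up, and the function names are hypothetical rather than taken from any library.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in documents:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common the label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify("cash prize offer", *model))  # spam
```

Training is a single counting pass over the data, which is why the method stays cheap in CPU and memory even on large data sets.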

In general, Naive Bayes can be used in real-world applications such as:

  • Sentiment analysis and text classification
  • Recommendation systems like those of Netflix and Amazon
  • Marking an email as spam or not spam
  • Face recognition like Facebook’s

Apriori: This algorithm generates association rules from a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability.
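
The idea can be sketched in a few lines. The example below is a simplified version of Apriori, not a tuned implementation: it finds item sets that appear in at least a minimum share of transactions (the support), then measures how often B occurs when A occurs (the confidence). The shopping baskets and function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Share of transactions that contain every item in the set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Grow frequent item sets one item at a time, pruning as Apriori does."""
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while current:
        scored = {s: support(s, transactions) for s in current}
        kept = {s: sup for s, sup in scored.items() if sup >= min_support}
        frequent.update(kept)
        size += 1
        # Candidates: unions of surviving sets that have exactly `size` items.
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        # Pruning: every smaller subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return frequent

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Hypothetical shopping baskets.
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "jam"}),
]
frequent = frequent_itemsets(baskets, min_support=0.5)
rule_conf = confidence(frozenset({"butter"}), frozenset({"bread"}), baskets)
print(rule_conf)  # 1.0
```

Here every basket that contains butter also contains bread, so the rule "butter implies bread" has a confidence of 1.0, while bread and butter together appear in half of all baskets.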

In general, Apriori can be used in real-world applications such as:

  • Market basket analysis, as on Amazon: products purchased together
  • Autocomplete functionality, as on Google, to suggest words that commonly occur together
  • Identifying drugs and their effects on patients

Random Forest: An ensemble of decision trees. It can solve both regression and classification problems with large data sets. It also helps identify the most significant variables from thousands of input variables.
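
The ensemble idea can be sketched as follows. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a single split on one randomly chosen feature, trained on a random resample of the data, and the forest predicts by majority vote. The customer data, feature names and function names are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels, feature):
    """One-split tree: threshold a numeric feature at its mean."""
    threshold = sum(row[feature] for row in rows) / len(rows)
    low = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
    high = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
    fallback = majority(labels)
    return (feature, threshold,
            majority(low) if low else fallback,
            majority(high) if high else fallback)

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    features = list(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw as many rows as the data set has, with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        feature = rng.choice(features)
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest

def predict(forest, row):
    """Let every stump vote, then return the majority label."""
    votes = [low if row[feature] <= threshold else high
             for feature, threshold, low, high in forest]
    return majority(votes)

# Hypothetical customers: low activity churned, high activity stayed.
rows = [
    {"visits": 1, "spend": 1}, {"visits": 2, "spend": 2},
    {"visits": 1, "spend": 2}, {"visits": 8, "spend": 9},
    {"visits": 9, "spend": 8}, {"visits": 9, "spend": 9},
]
labels = ["churned"] * 3 + ["stayed"] * 3

forest = train_forest(rows, labels)
print(predict(forest, {"visits": 1, "spend": 2}))
```

Because each stump sees a different resample and feature, an individual stump can be wrong while the vote across the whole forest stays stable.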

In general, Random Forest can be used in real-world applications such as:

One of the most powerful forms of machine learning in use today is “Deep Learning”.

In today’s age of Digital Transformation, most businesses will tap into machine learning algorithms for their operational and customer-facing functions.

Sandeep Raut

  • 7th Rank in Global Top 100 Digital Transformation Influencers
  • Delivered speech at India Analytics & Big Data Summit at Bangalore on "How Machine Learning is helping in Digital Transformation" on 4th Feb 2016
  • Delivered Thought Leadership speech at Unicom - India Analytics & Big Data Summit on "Big Data Analytics disrupting industry"
  • Delivered speech at IIT Mumbai on "Analysing Big data for disruptive innovation"
  • Delivered a keynote speech at Rizvi College of Engineering on "Fraud Detection & Prevention using Analytics"
  • Director for Digital Transformation in Syntel.
  • Has more than 29 years of IT Services / Consulting / Off-shoring experience
  • Over 18 years in Business Intelligence space.
  • Had helped organizations in establishing the BI-Analytics Services CoEs.
  • Had spearheaded several marquee accounts and was significantly instrumental in building new business for the practice as well.
  • Had successfully initiated, mentored & deployed various strategic consulting services & solutions like Digital Transformation, BI Strategy Planning, BI Offshorization, BI Development/Deployment, Campaign Management, Inventory Optimization which resulted into multi-million dollar business.
  • Had developed & managed Customer relations with Global players across USA, UK & Asia Pacific.

Specialties: Digital Transformation, BI & Big Data Analytics Banking and Financial Services, Healthcare LifeSciences, Insurance, Retail Manufacturing - Supply Chain Management

CIO WaterCooler