07 Nov

Notebook Thoughts: Machine Learning for Dummies

Machine learning is an evolving and complex science. If one tried to take into account all possible scenarios, dependencies and models, it would be impossible to sketch an all-encompassing explanation. So I will focus on what the non-data scientist should know about machine learning: understanding the available data, the machine learning types, the algorithm models and the business applications.

1.  First, determine the nature of the available data. Do you have historical campaign data with known results, or an unstructured database of customer records? Or are you trying to make real-time decisions on streaming data?

2.  The available data, together with the use case, will determine the appropriate machine learning approach. There are three major types: supervised, unsupervised and reinforcement learning. a) Supervised Learning allows you to predict an outcome based on input and output data (e.g. churn). b) Unsupervised Learning allows you to categorize outcomes based on input data (e.g. segmentation). c) Reinforcement Learning allows you to react to an environment (e.g. driverless car).

3.  Each machine learning type draws on a number of algorithms, and there are hundreds of variations. Supervised Learning typically uses regression or classification algorithms. Unsupervised Learning often uses clustering algorithms, though it is not limited to them. Reinforcement Learning will usually use some type of neural network algorithm.

4.  The uses of machine learning are nearly limitless. Although I have listed it here as the last step, determining the use case or business application should probably be the first step. For example, we would use logistic regression to determine whether a house will sell at a certain price or not, and we would use linear regression to predict the future price of the house (see the sketch below). Keep in mind that machine learning business applications typically require more than one machine learning type as well as multiple algorithms.
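
To make the house example concrete, here is a minimal sketch in Python using scikit-learn. Only the framing of the two questions comes from the example above; the data and feature names (square footage, bedrooms) and the price threshold are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical features: square footage and number of bedrooms.
X = rng.uniform([800, 1], [4000, 6], size=(200, 2))

# Linear regression (continuous outcome): predict the future sale price of a house.
price = 50_000 + 150 * X[:, 0] + 20_000 * X[:, 1] + rng.normal(0, 25_000, 200)
linear_model = LinearRegression().fit(X, price)
print("Predicted price:", linear_model.predict([[2200, 3]]))

# Logistic regression (yes/no outcome): will the house sell above a given price?
sold_above = (price > 400_000).astype(int)
logistic_model = LogisticRegression(max_iter=1000).fit(X, sold_above)
print("Probability of selling above the price:", logistic_model.predict_proba([[2200, 3]])[0, 1])
```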

I hope this sketch serves as a useful guide. Feel free to share and use as needed.

Thoughts and comments, as well as suggestions for other sketched explanations, are welcome.

15 Oct

Notebook Thoughts: Choosing the Right AI Algorithm for the Right Problem

There are countless algorithms we can use to mathematically predict an outcome to a business challenge. However, the most widely used algorithms fall into four categories: classification, continuous estimation, clustering and recommendation.

Let’s use a real-life example to illustrate how we choose the right algorithm to solve the right problem. For illustration purposes we make a number of assumptions to keep things simple for the non-analyst, and a short code sketch follows the list below.

Let’s say that a realtor is trying to answer the following questions:

  1. Will a couple buy a house? Here we are looking for a categorical answer of Yes or No. For this we would use some kind of Classification algorithm, which could include Logistic Regression, Decision Trees or Neural Networks.
  2. How much will they pay for the house? For this question we would use Continuous estimation, as we are trying to determine a value in a sequence. In this case, one would likely use a Linear Regression algorithm.
  3. Where will they buy the house? Clustering would be the best approach to determine where they are likely to buy a house; K-means and Affinity Propagation are commonly used clustering algorithms.
  4. If they buy a house, what else will they buy? Recommender System Algorithms are commonly used to determine next best offer or next best action. The most commonly used Recommender algorithm is Collaborative Filtering: either user-to-user or item-to-item.
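
Here is the promised sketch, mapping the realtor’s four questions to the four algorithm families. It is my own illustration rather than anyone’s production code: the data, feature names and parameters are all invented, and each model is fit only to show the shape of the approach.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))            # hypothetical buyer features (income, age, etc.)

# 1. Classification: will the couple buy? (Yes/No)
will_buy = (X[:, 0] + X[:, 1] > 0).astype(int)
classifier = LogisticRegression(max_iter=1000).fit(X, will_buy)

# 2. Continuous estimation: how much will they pay for the house?
price_paid = 300_000 + 80_000 * X[:, 0] + rng.normal(0, 10_000, 300)
regressor = LinearRegression().fit(X, price_paid)

# 3. Clustering: where are they likely to buy? Group buyers by location preference.
locations = rng.normal(size=(300, 2))    # hypothetical map coordinates
neighborhoods = KMeans(n_clusters=3, n_init=10).fit_predict(locations)

# 4. Recommendation: what else will they buy? Item-to-item collaborative filtering
# on a tiny made-up purchase matrix (rows = households, columns = products).
purchases = np.array([[1, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1]])
item_similarity = cosine_similarity(purchases.T)   # products often bought together
print(item_similarity.round(2))
```
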
26 May

Notebook Thoughts: Using Social Media to Measure Brand Health

Rather than using sentiment as a proxy for brand health, we should embrace a new model that measures the health of brands in the context of the competitive set and category ecosystem. The model looks at two core areas: Perception and Engagement. On the Perception side we focus on key areas that define thoughts and feelings about the brand. The Engagement side quantifies the reach and strength of the brand and its messaging. All volumes are weighted against sentiment to ensure that brands are not rewarded for negatively driven spikes in activity. Both Perception and Engagement consist of four distinct areas of measurement (a rough scoring sketch follows the two lists below):

PERCEPTION

  • Value: perception of the usefulness and benefit of a product compared to the price charged for it
  • Quality: general level of satisfaction with the way a product works and its ability to work as intended
  • Aspiration: expressing a longing or wish to own the product or to be associated with the product’s qualities
  • Differentiation: the extent to which social media users draw distinctions between the qualities and characteristics of the brand and its competitors

ENGAGEMENT

  • Presence: the size of a brand’s owned social communities, weighted by the sentiment expressed by the community members toward the brand
  • Influence: the ability of a brand to earn unaided mentions as well as have its messaging amplified and shared by the social media community
  • Virality: the number of unique people engaged in conversations with or about the brand, weighted by the sentiment expressed by those users
  • Resonance: the ability of a brand to engage users with its content and elicit reactions from them
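
To make the weighting concrete, here is a minimal sketch of how such a score might be assembled. This is my own illustration, not the actual model: the component names come from the lists above, but the formula, volumes and sentiment values are invented.

```python
def component_score(volume, net_sentiment):
    """Weight a raw mention volume by net sentiment (-1 to +1), so that
    negatively driven spikes in activity do not inflate the score."""
    return volume * (1 + net_sentiment) / 2

# Hypothetical inputs for one brand: (mention volume, net sentiment) per area.
perception = {
    "value": (1200, 0.30), "quality": (2500, 0.10),
    "aspiration": (800, 0.55), "differentiation": (400, -0.05),
}
engagement = {
    "presence": (50_000, 0.20), "influence": (9_000, 0.35),
    "virality": (15_000, -0.10), "resonance": (3_000, 0.40),
}

# Invented aggregation: each side of the model is the average of its four areas.
perception_score = sum(component_score(v, s) for v, s in perception.values()) / len(perception)
engagement_score = sum(component_score(v, s) for v, s in engagement.values()) / len(engagement)
print(round(perception_score), round(engagement_score))
```
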
30 May

Notebook Thoughts: Understanding Marketing Attribution

Algorithmic or Probabilistic Attribution uses statistics and machine learning to determine the probability of conversion across marketing touchpoints. In other words, it estimates how much of a conversion should be attributed to each channel. To keep things simple, I randomly chose a few variables, out of dozens or more, that could go into our model. Let’s go through the variables one at a time.

Touchpoints – This is a mix that includes online as well as offline touchpoints.

Platforms – These are some of the platforms that can be used to collect the data for each touchpoint.

Cost – Cost is one of our most important variables since it helps us determine ROI. Just because something is effective does not mean it is efficient.

Frequency – How many times was our ad/content served to the prospect?

Action – Did the prospect take any action upon viewing our content (e.g. click on it)?

Duration – What was the duration of the engagement? In this example, our prospect spent 6 seconds on the landing page after clicking on a mobile ad. Also, we used technology to determine that the prospect looked at an OOH (out-of-home) ad for 2 seconds.

Recency – When was the last engagement before conversion? The closer two touchpoints are in time, the higher the weight given to the preceding one. Here we see that the prospect conducted a product search within the hour after engaging with a social ad/content. Thus, “Social” would get a larger attribution.

Quality score – This is a variable that you don’t see often but that is extremely important. What is the quality of the ad/content? Was the ad placed next to undesirable content? Was the engagement likely from a bot?

Halo – Was there a “halo effect”? That is, did the prospect take a secondary action of value to the brand? In this example, the prospect did a search for a related product and ended up buying that product in addition to the one advertised.

Data Lake / BI – This is the infrastructure needed to process the data and run the machine learning models.

Attribution – Self-explanatory.
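
As a closing illustration, here is a toy sketch of how a few of the variables above (action, duration, recency, quality score) might be scored and normalized into attribution shares. This is not a production attribution model; the channels, field names and weights are invented for illustration.

```python
# All channels, field names and weights below are invented for illustration.
touchpoints = [
    # channel, action taken?, duration (sec), hours before conversion, quality score (0-1)
    {"channel": "Mobile ad", "action": 1, "duration": 6,  "recency_hours": 30, "quality": 0.9},
    {"channel": "OOH ad",    "action": 0, "duration": 2,  "recency_hours": 72, "quality": 0.8},
    {"channel": "Social",    "action": 1, "duration": 15, "recency_hours": 1,  "quality": 0.7},
]

def touchpoint_score(tp):
    # More recent touchpoints get more weight; actions, dwell time and quality add to it.
    recency_weight = 1 / (1 + tp["recency_hours"])
    return tp["quality"] * (1 + tp["action"] + tp["duration"] / 10) * recency_weight

scores = [touchpoint_score(tp) for tp in touchpoints]
total = sum(scores)
for tp, score in zip(touchpoints, scores):
    print(f'{tp["channel"]:>10}: {score / total:.0%} of the conversion credit')
```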