ALEXA: State-of-the-art AI

If you want to see how good Alexa is at answering people's questions, sign on to Alexa Answers and look at the questions Alexa cannot answer. The site has gamified helping Alexa answer these questions. I spent a week doing this and figured out a workflow good enough to stay in the top 10 of the leaderboard.

The winning strategy is to use Google. You copy the question into Google and paste the answer Google gives back into the Alexa Answers website, to be played back to the person who asked it. The clever thing is that since it is impossible to legally web-scrape at a commercially viable rate, Amazon has found a way of harnessing the power of Google without a) having to pay, b) violating Google's TOS, and c) getting caught stealing Google's IP.

After doing this for a week, the interesting thing to note is why Alexa could not answer these questions. Most are interpretation errors: Alexa misheard the question (e.g., "connor virus" or "coronda virus" instead of "coronavirus"). The remainder occur because the question assumes context Alexa doesn't have (e.g., "Is fgtv dead?" - he's a YouTube star). Without the subject of the question being a known entity in Alexa's knowledge graph, the results are ambiguous. Rather than be wrong, Alexa declines to answer.

Obviously this is where the amazing pattern-matching abilities of the human brain come in. We can look at the subject of the question and the search results and choose the most probable correct answer. Amazon can then augment Alexa's knowledge graph with these results. This would probably violate Google's IP if Amazon intentionally set out to do it.

Having a human perform the hard task in a learning loop is something we have also employed in building our platform. Knowledge Leaps can take behavioral data and tease out price-sensitivity signals from purchase data, as well as semantic signals from survey data.

Building An Agile Market Research Tool

For the past five years we have been building our app Knowledge Leaps, an agile market research tool. We use it to power our own business serving some of the most demanding clients on the planet.

To build an innovative market research tool, I had to leave the industry. I spent 17 years working in market research and experienced an industry that struggled to innovate. There are many reasons innovation failed to flourish; one lies in the fact that it is a service industry. Service businesses succeed when they focus their human effort on revenue generation (as they should). Since the largest cost base in research is people, there is no economic incentive to invest in the long term, especially as the industry has come under economic pressure in recent years. The same could be said of many service businesses that have been disrupted by technology - taxi drivers being a good example.

This wouldn't be the first time market research innovations have come from firms outside the traditional market research category. SurveyMonkey, for example, was founded by a web developer with no prior market research experience. Qualtrics was founded by a business school professor and his son, again with no prior market research industry experience.

Stepping outside the industry and learning how other types of businesses manage data, use it, and extract information from it has been enlightening. It has also helped us build an abstracted solution. While we focus on market research use cases, we have built a platform that fosters analytics collaboration and an open-data philosophy, so finding new uses for it is a frequent occurrence.

To talk tech-speak, what we have done is productize a service. We have taken the parts of the market research process that happen frequently and are expensive and turned them into a product: a product that delivers the story in data without bias. It does it really quickly too. Visit the site or email us to find out more.

Market Research 3.0

In recent years, there has been lots of talk about incorporating Machine Learning and AI into market research. Back in 2015, I met someone at a firm who claimed to be able to scale up market research survey results from a sample of 1,000 to samples as large as 100,000 using ML and AI.

Unfortunately that firm, Philometrics, was founded by Aleksandr Kogan - the person who wrote the app for Cambridge Analytica that scraped Facebook data using quizzes. Since then, the MR world has moved pretty slowly. I have a few theories as to why, but I will save those for later posts.

Back on topic, Knowledge Leaps got a head start on this six years ago when we filed our patent for technology that automatically analyzes survey data to draw out the story. We don't eliminate human input, we just make sure computers and humans are put to their best respective uses.

We have incorporated that technology into a web-based platform. We still think we are a little early to market, but there might be enough early adopters out there now around which we can build a business.

As well as reinventing market research, we will also reinvent the market research business model. Rather than charge a service fee for analysis, we only charge a subscription for using the platform.

Obviously you still have to pay for interviews to gather the data, but you get the idea. Our new tech-enabled service will dramatically reduce the time-to-insight and the cost-of-insight in market research. If you want to be a part of this revolution, please get in touch.

No Code Data Engineering #2

We are adding to our no-code data engineering use cases. Our new Collection Manager feature plugs data pipelines into databases with no code, using a simple drag-and-drop interface.

This feature allows users with zero knowledge of databases and query languages to import data into a database. A simple UI then lets them create queries, aggregations, and extracts.

The feature can be set up to update the database with new data as it arrives from external sources, and it will also automate extract creation as new data is added.

Example use cases for this feature include creating data feeds for dashboards that auto-populate, or building custom data products delivered on a schedule with a guaranteed delivery time. This feature will also drive our retail experimentation business: we can design and set up a data framework that captures and tags the results of test-and-learn activity.
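As a minimal sketch of the kind of loop the Collection Manager automates (hypothetical, stdlib only; the table and column names are invented for illustration): newly arrived rows are loaded into a database, and an aggregated extract is regenerated after each load.

```python
# Hypothetical sketch: ingest new data drops into a database, then
# regenerate an aggregated extract. Names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")

def ingest(conn, new_rows):
    """Append newly arrived rows from an external source."""
    conn.executemany("INSERT INTO sales VALUES (?, ?)", new_rows)
    conn.commit()

def extract(conn):
    """Regenerate the aggregated extract after each ingest."""
    return conn.execute(
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()

ingest(conn, [("S1", 10.0), ("S2", 5.0)])
ingest(conn, [("S1", 2.5)])  # a later data drop arrives
# extract(conn) -> [("S1", 12.5), ("S2", 5.0)]
```

The drag-and-drop interface would generate the equivalent of the query in extract() without the user writing any SQL.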

Platforms In Data

Data-is-the-new-oil is a useful framework for describing one of the use-cases we are developing our platform for.

Rather than there being just one platform in the create-process-deliver-use data analytics pipeline, a number of different platforms are required. The reason we don't fill our cars with gasoline at the local oil rig is the same reason data distribution requires a number of different platforms.

Data Platforms

The Knowledge Leaps platform is designed to take raw data from our providers, then process and merge these different data feeds before delivering them to our customers' internal data platforms. Just as an oil refinery produces the various distillates of crude oil, the Knowledge Leaps platform can produce many different data products from single or multiple data feeds.

Using a simple UI, we can customize the processing of raw data to maximize its value to providers as well as its usefulness to users of the data products we produce.

Data Engineering & Analytics Scripting Functions

We are expanding the operational functions that can be applied to data sets on the platform. This week we pushed out another product release incorporating some new functions that are helping us standardize data streams. Over the next few weeks we will continue to broaden out the data engineering capabilities of the platform. Below is a description of what each function does to data files.

We have also completed Exavault and AWS S3 integrations - we can now upload to as well as download from these two cloud providers.

Keyword            Description
@MAPPING           Map this var value to this new var value.
@FILTER            Keep rows where this var equals this value.
@ADVERTISED LIST   Specify date + item combinations.
@GROUP             Create a group of stores, items, or countries.
@COLUMN REDUCE     Keep only these columns.
@REPLACE           Replace this unicode character with this value.
@RELABEL           Change the name of a column from this to that.
@COLUMN ORDER      Put columns into this order prior to merge.
@PRESENCE          Return a list of unique values in this column.
@SAMPLE            Keep between 0.1% and 99.9% of rows.
@FUNCTION          Apply this function to each row.
@FORMAT            Standardize the format of this column.
@MASK              Encrypt this var, salted with a value.
@COLUMN MERGE      Combine these columns into a new column.
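To make the behavior of these functions concrete, here is a hedged sketch in Python of how a few of them might act on rows of a data file. The function names and semantics below are illustrative assumptions, not the platform's actual implementation.

```python
# Illustrative (not actual) implementations of three scripting functions.

def filter_rows(rows, var, value):
    """@FILTER: keep rows where this var equals this value."""
    return [r for r in rows if r.get(var) == value]

def map_values(rows, var, mapping):
    """@MAPPING: map this var's value to a new value."""
    return [{**r, var: mapping.get(r[var], r[var])} for r in rows]

def column_reduce(rows, keep):
    """@COLUMN REDUCE: keep only these columns."""
    return [{k: r[k] for k in keep if k in r} for r in rows]

rows = [
    {"store": "S1", "country": "us", "sales": "10"},
    {"store": "S2", "country": "uk", "sales": "12"},
]
rows = filter_rows(rows, "country", "us")
rows = map_values(rows, "country", {"us": "USA"})
rows = column_reduce(rows, ["store", "country"])
# rows is now [{"store": "S1", "country": "USA"}]
```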

A Programming Language For Data Engineering

Noodling on the internet, I read this paper (Integrating UNIX Shell In A Web Browser). While it was written 18 years ago, it comes to a conclusion that is hard to argue with: graphical user interfaces slow work processes.

The authors claim that GUIs slow us down because they require a human to interact with them. Having built a GUI-led data analytics application, I am inclined to agree - the time and cost of developing a GUI increase as it is made simpler for the user.

To that end, we are creating a programming language for data engineering on our platform. Our working title for the language is wrangle (WRANgling Data Language). It will support ~20 data engineering functions (e.g., filter, mapping, transforming) and the ability to string commands together to perform more complex data engineering.
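To show what stringing such commands together might look like, here is a hedged sketch in Python of a tiny pipeline runner. The wrangle syntax is not yet public, so the command names and structure below are invented for illustration.

```python
# Hypothetical sketch of chaining wrangle-style commands into a pipeline.
# Command names and semantics are invented, not the actual language.

def run_pipeline(rows, commands, registry):
    """Apply each (name, args) command in sequence to the rows."""
    for name, args in commands:
        rows = registry[name](rows, **args)
    return rows

registry = {
    "filter": lambda rows, var, value: [r for r in rows if r[var] == value],
    "relabel": lambda rows, old, new: [
        {(new if k == old else k): v for k, v in r.items()} for r in rows
    ],
}

pipeline = [
    ("filter", {"var": "country", "value": "us"}),
    ("relabel", {"old": "sales", "new": "revenue"}),
]
data = [{"country": "us", "sales": 10}, {"country": "fr", "sales": 7}]
result = run_pipeline(data, pipeline, registry)
# result: [{"country": "us", "revenue": 10}]
```

The point of the design is the same as the Unix shell's: simple commands composed into pipelines replace repeated clicking through a GUI.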

Excerpt from paper: "The transition from command-line interfaces to graphical interfaces carries with it a significant cost. In the Unix shell, for example, programs accept plain text as input and generate plain text as output. This makes it easy to write scripts that automate user interaction. An expert Unix user can create sophisticated programs on the spur of the moment, by hooking together simpler programs with pipelines and command substitution. For example:

kill `ps ax | grep xterm | awk '{print $1;}'`

This command uses ps to list information about running processes, grep to find just the xterm processes, awk to select just the process identifiers, and finally kill to kill those processes.

These capabilities are lost in the transition to a graphical user interface (GUI). GUI programs accept mouse clicks and keystrokes as input and generate raster graphics as output. Automating graphical interfaces is hard, unfortunately, because mouse clicks and pixels are too low-level for effective automation and interprocess communication."

Machine Screws and AI


The attraction of AI is that it learns from experience. All learning requires feedback. Whether it's an animal, a human, or a computer doing the learning, the learning entity needs to explore its environment, try different behaviors, create experiences, and then learn from them.

For computers, learning about computer-based environments is relatively easy. Based on a set of instructions, a computer can be trained to learn about code that it is executing or that is being executed by another computer. The backbone of the internet uses something similar to ensure data gets from point A to point B.

For humans, learning about human environments is also easy. It is what we have been doing for tens of thousands of years.

For humans to learn about computer-based environments is hard. We need a system to translate from one domain into the other, and then a separate system to interpret what we have translated. We call this computer programming, and because we designed the computer, we have a bounded system. There is still a lot to understand, but it is finite and we know the edges of the system, since we created it.

It is much harder for computers to learn about human environments. The computer must translate real-world (human) environment data into its own environment, then decode this information and interpret what it means. Because the computer didn't design our world, it doesn't have the advantage humans have when learning about computers. It also doesn't know whether the human world is a bounded system or not. For all the computer knows, it is infinite and unbounded - which it could well be.

In the short term, we use human input to make this learning feasible. Humans help train computers to learn about real-world environments. I think one of the reasons driverless-car technology is a focus is that the road system is a finite system (essentially it's 2D) governed by a set of rules:

• Don't drive into anything, except a parking spot.

• Be considerate to other drivers, e.g. take turns at 4-way stop signs.

• Be considerate to pedestrians and cyclists.


This combination of elements and rules makes it a perfect environment for training computers to learn to drive - not so much Artificial Intelligence as Human-Assisted Intelligence. Once we have trained a computer to decode the signals from this real-world environment and make sensible decisions with good outcomes, we can apply this learning to domains with more variability, such as delivering mail and parcels.

This is very similar to the role of the machine screw in the industrial revolution. Once we had produced the first screw, we could make a machine that produced more accurate screws. The more accurate the screw, the more precise the machine, the smaller the tolerances of the components it could produce, and the better the end machine. Without the machine screw, there would have been no machine age.

This could open the door to more advanced AI, though it is some way off because of the time required to train computers to learn about different domains.

Building the Future of Machine Learning and Analytics. Right Here, Right Now.



TechCrunch recently published an article that describes what I am building with the Knowledge Leaps platform (check out the website here).

Knowledge Leaps is a soup-to-nuts data management and analytics platform. With a focus on data engineering, it is aimed at helping people prepare data for predictive modeling.

The first step to incorporating AI into an analytics process is to build an application that automates the grunt work. The effort is in cleaning data, mapping it, and converting it to the right structure for further manipulation. It's time-consuming but can be systematized. The Knowledge Leaps application does this, right now. It seamlessly converts any data structure into user-level data using a simple interface - perfect for those who aren't data scientists.

Any data can then be used in classification models, using an unbiased algorithm combined with k-fold cross-validation for rigorous, objective testing. This is just the tip of the iceberg of its current and future functionality.
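For readers unfamiliar with the testing scheme, here is a minimal sketch of how k-fold cross-validation splits are generated. This is a pure-stdlib illustration of the general technique, not the platform's code.

```python
# Sketch of k-fold cross-validation index generation: each point
# appears in exactly one test fold across the k splits.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds; yield (train, test) per fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield sorted(train), sorted(test)

splits = list(k_fold_indices(10, 5))
all_test = sorted(i for _, test in splits for i in test)
# all_test == list(range(10)), so every point is tested exactly once
```

A model is trained on each train split and scored on the corresponding held-out test split, and the k scores are averaged to give an objective performance estimate.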

Onward, to the future of analytics.

When Do We Start Working For Computers?

I have done some quick back-of-the-envelope calculations on the progress of AI, trying to estimate how much progress has been made versus how many job-related functions and activities are left to automate.

On Angel List and Crunchbase there are a total of 4,830 AI start-ups listed (assuming the two lists contain no duplicates between them). To figure out how many unique AI tools and capabilities there are, let's assume the following:

  1. All these companies have a working product,
  2. Their products are unique and have no competitors,
  3. They are all aimed at automating a specific job function, and
  4. These start-ups represent only 30% of the entire AI-focused company universe.

This gives us a pool of 16,100 unique, operational AI capabilities. These capabilities will be in deep domains (where current AI technology is most successful), such as booking a meeting between two people via email.

If we compare this to the number of domain-specific activities in the world of work, we can see how far AI has come and how far it has to go before we are all working for computers. Using US government data, there are 820 different occupations, and stock markets list 212 different industrial categories. If we make the following assumptions:

  1. 50% of all occupations exist in each industrial category,
  2. Each occupation has 50 discrete activities.

This gives us a total of 4.34 million different occupational activities that could be automated using AI. In other words, at its most optimistic, current AI tools and processes could automate 0.37% of our current job functions. We have come a long way, but there is still a long way to go before we are out of work. As William Gibson said, "The future is already here, it's just not evenly distributed."
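The arithmetic behind these figures is simple enough to reproduce:

```python
# Back-of-the-envelope arithmetic from the text above.
startups = 4830                 # AI start-ups on Angel List + Crunchbase
capabilities = startups / 0.30  # assume they are 30% of the universe
# -> 16,100 unique AI capabilities

occupations = 820               # US government occupation count
industries = 212                # stock-market industrial categories
activities = occupations * 0.5 * industries * 50
# -> 4,346,000 occupational activities (~4.34 million)

share = capabilities / activities
# -> about 0.0037, i.e. roughly 0.37% of job functions
```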