Benford’s law is one way the IRS/HMRC can tell whether the figures you submit on your tax filings are fraudulent. When people lie on their tax forms they tend to invent random-looking numbers, when in reality the distribution of leading digits should follow Benford’s law. https://en.wikipedia.org/wiki/Benford%27s_law
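To make the distribution concrete, here is a small Python sketch that compares Benford’s expected leading-digit frequencies with the leading digits of the first thousand powers of 2, a sequence known to follow the law:

```python
import math
from collections import Counter

# Benford's expected leading-digit probabilities: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Count the leading digits of 2^1 .. 2^1000, a classic Benford sequence.
leading = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))
freq = {d: leading[d] / 1000 for d in range(1, 10)}
```

Digit 1 leads roughly 30% of the time while digit 9 leads under 5% of the time; uniformly random digits (about 11% each) stand out against this pattern, which is what gives auditors a signal.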
On a recent visit to southwest Utah I saw lots of pygmy forests containing pinyon pines and small oak trees; these forests are sparse and the trees grow no more than 8-10 feet tall. The National Park literature says that these trees have adapted to low-water conditions. Contrast this with the redwood forests of coastal California, where resources (water and sunlight) are abundant. In that environment the trees are more densely packed and grow much taller.
Replace trees with firms and resources with customers, and this paragraph could describe a business landscape. To be binary for a moment, a new firm gets to choose between entering a market where resources (customers) are scarce and entering a market where customers are plentiful. Choosing a market with few customers makes it easier to differentiate your firm, but the odds of survival are worse. Choosing a market with more customers makes it harder to differentiate your firm, so the survival odds are also tough.
Unless, of course, your firm is first. In either market, being first means you get to choose the best position and consume all available resources.
These are some thoughts on what I have learnt working at a new company that is building software. A lot of what you “should” do is the wrong thing to do. Here are some reflections on building a firm in San Francisco.
Speaking to prospective customer firms will get you further, faster, than speaking to venture capital firms. Firms that have pain points will pay for solutions, and they won’t care much how many other firms share the same pain point. Venture capital firms are interested in the size of the market, the size of the outcome, the probability of success, and the experience of the team. Answering a VC’s questions won’t necessarily help you build a product and a business. If you can’t afford to build the software that will address the pain point you are targeting, work out what you can build and how you can bridge the gap by other means.
Perform The Process By Hand, Before Writing Code
The best business software is first cut by hand, like the first machine screw. If your software replaces a human business process and you can’t afford to build the software, ask yourself “how much can my firm afford to build?”
Most processes have the same elements: Task Specification, Task Execution, and Presenting Results. The most complex part is Task Execution, as it will require the most code and the most investment. As your company speaks to firms, work out whether it is possible to use humans to perform the complex Task Execution element. If it is, build a software architecture and framework that lets humans do the hard work at first. This will help you refine the use case and then build more effective, more efficient code. This wouldn’t be the first time it has been done, see here and here for more background.
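As a sketch of the idea (the names and structure here are illustrative, not our actual architecture), Task Execution can be a swappable function, so a human-backed handler can later be replaced by code without changing the rest of the pipeline:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    spec: dict                     # Task Specification
    result: Optional[str] = None   # filled in during execution

def human_executor(task: Task) -> str:
    # Placeholder for queueing the task to a human operator;
    # here we just simulate a manual answer.
    return f"manually handled: {task.spec['question']}"

def run(task: Task, execute: Callable[[Task], str]) -> str:
    task.result = execute(task)    # Task Execution (human or code)
    return task.result             # Present Results
```

Once the use case is understood, `human_executor` can be swapped for an automated function with the same signature.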
A useful piece of military wisdom is worth keeping in mind: no plan survives first contact with the enemy. While customers are certainly not the enemy, the sentiment still holds. It’s not until you put your plan into action and have firms use your product that you realize its true strengths and weaknesses. Here begins the process of iterating product development.
“Speak to people, we might learn something”
This is something my business development lead says a lot. He also asks questions that get customers and prospects talking. In these moments you will learn about the firm, the buyer, the competition, and lots of other things that will make your product and service better.
“We are just starting out”
This is another useful mantra. In lots of ways we do not know where our journey will take us. It is inspired partly by company vision and partly by customer feedback. In his book The Origin of Wealth, Eric Beinhocker likens innovation to searching a technology-solution space (an innovation map) for high points (which correlate with company profits and growth). The important part of this search process is customer feedback. What your company does determines your starting point on the innovation map; how your firm reacts to customer and market feedback determines which direction you go in, and will ultimately be a critical factor in its success.
Data-is-the-new-oil is a useful framework for describing one of the use cases we are developing our platform for.
Rather than there being just one platform in the create-process-deliver-use data analytics pipeline, a number of different platforms are required. The reason we don’t fill our cars with gasoline at the local oil rig is the same reason data distribution requires a number of different platforms.
The Knowledge Leaps platform is designed to take raw data from our providers, then process and merge these different data feeds before delivering them to our customers’ internal data platforms. Just as an oil refinery produces the various distillates of crude oil, the Knowledge Leaps platform can produce many different data products from single or multiple data feeds.
Using a simple UI, we can customize the processing of raw data to maximize its value to providers as well as its usefulness to the users of the data products we produce.
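A toy illustration of the refinery idea (invented data, not our platform’s code): one raw feed refined into two different data products for two different customers:

```python
# One raw retail feed (hypothetical rows).
raw = [{"store": "A", "item": "milk", "qty": 3},
       {"store": "A", "item": "bread", "qty": 1},
       {"store": "B", "item": "milk", "qty": 2}]

# Data product 1: quantities totaled by store.
by_store = {}
for r in raw:
    by_store[r["store"]] = by_store.get(r["store"], 0) + r["qty"]

# Data product 2: quantities totaled by item.
by_item = {}
for r in raw:
    by_item[r["item"]] = by_item.get(r["item"], 0) + r["qty"]
```

The same crude feed yields two distillates; in practice each product can be shaped to a specific customer’s schema.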
Knowledge Leaps is now Knowledge Leaps ®.
Many firms (Amazon, Google, etc.) are touting their plug-and-play AI and machine learning tool kits as a quick way for firms to adopt these new technologies without having to invest resources in building their own.
It sounds like a good idea, but I challenge that. If data is going to drive the new economy, it will be a firm’s analytics capabilities that give it a competitive advantage. In the short term, adopting a third-party analytics framework will move a firm up the learning curve faster. Over time that competitive edge becomes blunter, as more firms in a sector start to use the same frameworks in the race to be “first”.
This homogenization will be good for a sector, but firms competing in it will soon be locked back into trench warfare with their competitors. Retail distribution is a good example: do retailers use a third-party distribution network, or do they buy and maintain their own fleet? Using a third-party distributor saves upfront capex but forfeits an area of competitive advantage. Building their own fleet, while more costly, gives a retailer optionality over growth and expansion plans.
The same is true of the rush for AI/ML capabilities. While the concepts of AI/ML will be the same for all firms, their integration and application have to vary from firm to firm to preserve their potential for lasting competitive advantage. The majority of firms we have spoken to are developing their own tool kits; they might use established infrastructure providers, but everything else is custom and proprietary. This seems to be the smart way to go.
We are expanding the operational functions that can be applied to data sets on the platform. This week we pushed out another product release incorporating new functions that are helping us standardize data streams. Over the next few weeks we will continue to broaden the platform’s data engineering capabilities. Below is a description of what each function does to data files.
We have also completed ExaVault and AWS S3 integrations: we can now upload to, as well as download from, these two cloud providers.
| Function | Description |
|---|---|
| @MAPPING | Map this var value to this new var value |
| @FILTER | Keep rows where this var equals this value |
| @ADVERTISED LIST | Specify date + item combinations |
| @GROUP | Create a group of stores, items, countries |
| @COLUMN REDUCE | Keep only these columns |
| @REPLACE | Replace this unicode character with this value |
| @RELABEL | Change the name of a column from this to that |
| @COLUMN ORDER | Put columns into this order prior to merge |
| @PRESENCE | Return list of unique values in this column |
| @SAMPLE | Keep between 0.1% and 99.9% of rows |
| @FUNCTION | Apply this function for each row |
| @FORMAT | Standardize format of this column |
| @DYNAMIC DATA | Implement an API |
| @MASK | Encrypt this var salted with a value |
| @COLUMN MERGE | Combine these columns into a new column |
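As a hypothetical sketch of how a couple of these operations might behave on row data (the platform’s real implementation is not shown here):

```python
# Illustrative versions of @FILTER and @MAPPING acting on lists of dict rows.

def op_filter(rows, var, value):
    """@FILTER: keep rows where this var equals this value."""
    return [r for r in rows if r[var] == value]

def op_mapping(rows, var, mapping):
    """@MAPPING: map this var value to this new var value."""
    return [{**r, var: mapping.get(r[var], r[var])} for r in rows]

rows = [{"country": "UK", "sales": "10"},
        {"country": "US", "sales": "7"},
        {"country": "UK", "sales": "3"}]

uk = op_filter(rows, "country", "UK")
relabeled = op_mapping(uk, "country", {"UK": "United Kingdom"})
```

Because each operation takes rows in and returns rows out, operations of this shape compose naturally into processing chains.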
As we work more closely with our partner company DecaData, we are building tools and features that help bring data products to market and then deliver them to customers. A lot of this is repetitive process work, making it ideal for automation. Furthermore, if data is the new oil, we need an oil rig, a refinery, and a pipeline to manage this new commodity.
Our new feature implements these operations. Users can now create automated, time-triggered pipelines that import new data files and then perform a set of customizable operations before delivering them to customers via SFTP or to an AWS S3 bucket.
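In outline (the names here are illustrative, not our actual API), such a pipeline is just an import step, a chain of customizable operations, and a delivery step:

```python
from typing import Callable, List

def run_pipeline(importer: Callable[[], list],
                 operations: List[Callable[[list], list]],
                 deliver: Callable[[list], None]) -> None:
    """Import new data, apply each operation in order, then deliver."""
    data = importer()
    for op in operations:
        data = op(data)
    deliver(data)

# A time trigger would simply invoke run_pipeline on a schedule, e.g.
# hourly, with importer reading from SFTP and deliver pushing to S3.
```

The delivery callable is where the SFTP or S3 handoff would live; swapping destinations means swapping that one function.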
Why Human Data Is More Powerful than Tools or Platforms
At KL we realize the value of data is far greater than that of either analytic tools or platforms. As a team, we spend a lot of our time discussing data and analytics, especially analytics tools. We used to devote more time to this latter topic, both selecting existing tools and developing new ones, and less time talking about platforms and data. Over time we have come to understand that all three of data, platform, and analytics are vital ingredients in what we do. This is visualized in our logo; we are about the triangulation of all three.
On this journey, I have come to realize that some things take a long time to learn. In my case, studying engineering made me realize that the desire to make tools (in the broadest sense) is in your DNA. Not just your own, but everyone’s.
Building tools is what humans do, whether it’s a flint arrowhead, the first machine screw or a self-driving car. It’s what we have been doing for millennia and what we will continue to do.
As a species I think we are blind to tools because they are so abundant and seemingly easy to produce. In that sense they are not very interesting, and those that are interesting are soon copied and made ubiquitous.
What is true of axes, arrowheads and pottery is also true of analytics businesses. The reason it is hard to build a tool-based business is that the competition is intense. As a species, this won’t stop us trying.
In stark contrast to analytics tools stands the importance of data and platforms. If a flint arrowhead is a tool, then a cave painting is data. When I look at images of cave paintings, such as the Cave of Hands, I am in awe. A cave painting represents a data point of human history; the cave wall is the platform that allows us to view it.
This is very relevant to building a data-driven business: firms that have access to data and provide a platform to engage with it will always find more traction than those that build tools to work on top of others’ platforms and data.
Human data points are hard to substitute and, as a result, are more interesting and have a greater commercial value than tools.
In conversations with a friend from university, I learned about the No Free Lunch Theorem and how it affects the state of the art in machine learning and artificial intelligence development.
Put simply, the No Free Lunch (NFL) theorem shows that if an algorithm is good at solving one class of problems, it pays for this success by being less effective at other classes of problems: averaged over all possible problems, every optimization algorithm performs equally well.
In this regard, algorithms and machine learning solutions are like people: training to achieve mastery in one discipline doesn’t guarantee mastery in a related discipline without further training. However, unlike people, algorithm training can be a zero-sum game, with further training likely to reduce a solution’s competency in an adjacent discipline. For example, while Google’s AlphaZero can be trained to beat world champions at chess and Go, this was achieved using separate instances of the technology: a new instance was trained to win at chess rather than adapting the Go-playing one. Knowing how to win at Go doesn’t guarantee being able to win at chess without retraining.
What does this mean for the development of AI? In my opinion, while there are firms with early-mover advantage in the field, their viable AI solutions sit in very deep domains that tend to be closed systems, e.g. board games, video games, and making calendar appointments. As the technology develops, each new domain will require new effort, which is likely to lead to a large number of AI solutions and providers. So rather than an AI future dominated by corporate superpowers, there will be many providers, each with domain-distinct AI offerings.
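A toy demonstration of the trade-off (my own sketch, not the theorem’s formal statement): a deterministic hill climber, well suited to smooth single-peak objectives, finds the optimum of a unimodal function but gets trapped on a deceptive one.

```python
def hill_climb(f, start=0, lo=0, hi=99):
    """Move to the better neighbor until no neighbor improves."""
    x = start
    while True:
        neighbors = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            return x
        x = best

def smooth(x):
    # Single peak at x = 70: the kind of landscape hill climbing excels on.
    return -(x - 70) ** 2

def deceptive(x):
    # Local peak at x = 20, a valley, then the global peak at x = 90.
    if x <= 40:
        return 40 - abs(x - 20)
    if x <= 70:
        return -(x - 40)
    return 100 - abs(x - 90)
```

Starting from 0, the climber reaches 70 on `smooth` but stops at the local peak 20 on `deceptive`, whose true optimum is at 90. An algorithm that handled deceptive landscapes well (e.g. exhaustive or random search) would pay for it with wasted effort on smooth ones, which is the NFL trade-off in miniature.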