Software has been eating the world for quite some time.
In the last few years, this Marc Andreessen quote has been used by investors and tech leaders alike to describe the shifts brought about by technological advancement. One of the most common variations is perhaps the notion that machine learning (ML) is eating the world. In many ways, that is true: advancements in AI and ML are staggering. People who were amazed by an AI's ability to detect hotdogs in still images are now baffled by a Tesla driving on autopilot while its driver struggles to stay awake behind the wheel.
When a company has the resources, know-how, and talent needed to implement ML within its core business, the impact on business results is unquestionable. Unfortunately, this isn't the case for most. Even in companies that employ data teams to research and experiment, only 13% of ML projects reach production.
The challenge begins with the inherent differences between data science and software development. Building an AI/ML prototype resembles a scientist's work in a lab: it is based on notebooks, experimentation, comparison of results, and so on, and it is performed by data scientists.
Software development, on the other hand, is engineering work. Like a building under construction, every new piece of software is added to the overall structure within predefined perimeters and integrations. It is designed to stay intact, and if something breaks, a rollback is possible with the click of a button. There are no experiments in production, whereas data science is all about experimentation.
Unlike code changes, which don't necessarily alter how a program behaves, ML is based on real-life data that by definition changes the model's output and, in some cases, even the model itself.
Creating a framework that supports these constant changes and bridges the differences between data science and software development requires data scientists and developers (or data engineers, as they are known today) to work hand in hand. The various stakeholders, who in some cases also include non-technical staff such as product managers and business-line owners, do not always share a common language. They do not share the same tools, and they lack clear boundaries and a clear definition of responsibilities.
Solving the issues caused by the gap between data science and software development is not merely an operational promise. Data shows that AI companies have significantly lower gross margins than their SaaS counterparts (50%-60% compared to more than 80%). Building an ML production environment with the right framework not only unlocks new business opportunities but also holds the key to improving the business KPIs that most affect a company's value.
Just as DevOps enabled scalability, allowing companies to innovate and release new versions faster than ever, MLOps practices are being developed to address the challenges and growing pains AI companies face.
MLOps can be broadly defined as the convergence of technologies, practices, and processes for deploying, monitoring, and managing ML models in production smoothly and at scale. Yet, though it sounds simple, it is extremely complex to execute. The Internet is overflowing with blog posts explaining MLOps from various angles, so I'll just sum up the challenges of building a good product.
First, rather than serving a single type of user, you must tend to the needs of a variety of stakeholders who must understand and use your product on a daily basis. As such, MLOps is intended to be the thread that ties the entire ML lifecycle together. Second, the ML landscape is still nascent, making designing and building a framework more like shooting at many moving targets than aiming at a sitting duck. MLOps tries to reduce the time to market of ML products when, paradoxically, the plethora of new tools and common practices creates confusion that hinders the maturation of projects.
Looking at the Israeli market, we see more and more companies emerging in this space, and as an early backer of DevOps companies such as Coralogix, Epsagon, and SafeDK, to name a few, we have been closely monitoring the emergence of the MLOps space. Our first investment in the space was in DAGsHub, which is building a collaboration platform for data scientists to address the pains they face during the research phase. As more companies have gained ML momentum, we have been learning about the challenges of scaling ML production environments and were actively looking for the right team to back.
As a Seed investor, a key component in our decision-making process is Founder-Market Fit. Therefore, we look for teams with the best understanding of the problems customers face, the best knowledge of the state-of-the-art offerings available in the market (from cloud vendors, start-ups, and open-source projects), and the ability to execute an ambitious vision. A great example is Qwak, and their vision is big: to untangle the ties and dependencies between data scientists and ML/data engineers, bringing the agility of software development to managing ML in production. To execute that vision, they've built a comprehensive product addressing multiple stakeholders' needs within a unified platform. It integrates smoothly with the common tools most organizations use today, and onboarding a model onto the platform requires only a single line of code.
We are in the very early phase of the ML revolution. At StageOne, we believe it will touch almost every aspect of our lives, and we have been investing in horizontal and vertical start-ups in the space accordingly. The companies building ML-powered products and services are eager for solutions that help them scale and build their offerings well.
Nate Meir is a Principal at StageOne Ventures