What is DataOps, and why it’s a top trend

Enterprises have struggled to collaborate well around their data, which hinders their ability to adopt transformative applications like AI.

The evolution of DataOps could fix that problem. The term DataOps emerged seven years ago to refer to best practices for getting analytics right, and research firm Gartner calls it a major trend encompassing several steps in the data lifecycle.

Just as the DevOps trend led to a better process for collaboration between developers and operations teams, DataOps refers to closer collaboration between the various teams that handle data and the operations teams that deploy it into applications.

Gartner: DataOps is a major trend in 2021

Getting DataOps right is a significant challenge because of the multiple stakeholders and processes involved in the data lifecycle. In the DevOps world, enterprises can develop, test, and deploy app updates in a matter of hours. It is harder to move that fast in the data world, where it can take eight months to integrate an ML model into business workflows and deliver tangible value.

“[Creating] a common architecture pattern helps with operationalizing data science and ML pipelines and has been identified as one of the major trends for 2021,” Gartner research director Soyeb Barot said.

Gartner predicts enterprises will begin to see real gains in these efforts through the evolution and extension of DataOps to support trusted AI. The research firm expects the number of enterprises that have operationalized their AI efforts to grow from 8% in 2020 to 70% in 2025, owing to the maturity of AI orchestration platforms.

Even so, enterprises will struggle to move their predictive AI projects past the proof-of-concept stage because they have not addressed the full range of processes for collaborating across the AI lifecycle. A 2019 Gartner survey found that the top four challenges companies faced were security or privacy concerns (30%), complexity of AI integration with existing infrastructure (30%), data volume or complexity (22%), and potential risks or liabilities (22%).

Gartner argues that a more nuanced way of thinking about different types of collaboration can improve this transition. This includes extending the older idea of DataOps (data engineering) to include MLOps (machine learning development), ModelOps (AI governance), and Platform Ops (overarching AI platform management). It has characterized this entire collection of capabilities as XOps.

“These frameworks can help implement a structured process for the people involved to productionalize AI. Think of it as the assembly line of an automobile manufacturing plant, but for data,” Barot said.

Getting to DataOps

Software development was historically a slow, plodding process in which developers spent months or even years working on updates that were then thrown over the wall in bulk to testing and operations teams. In 2008, Andrew Clay Shafer and Patrick Debois began discussing how to streamline this process through better collaboration between developers, testers, and operations teams. The approach came to be known as DevOps because it improved the handoff between development and operations teams.

As the movement took hold, it led to a variety of platforms, tools, and processes that allowed teams to continuously integrate and deploy applications in small increments that could be rolled back if problems occurred. But the same kinds of innovations eluded efforts to create value from the growing volume, variety, and velocity of big data. Even as pundits proclaimed that big data was the new oil, companies struggled to operationalize it the way DevOps had improved the deployment of code.

Value is gleaned from data by creating artifacts like analytics, machine learning models, and data-driven applications. But doing these things introduced a variety of new challenges and bottlenecks outside the scope of DevOps practices. In a blog post for IBM in 2014, Lenny Liebmann, then a contributing editor at InformationWeek, introduced the notion of DataOps to characterize these challenges and suggest a path forward.

In an interview with VentureBeat, Liebmann, now a founding partner of technology adoption consultancy Morgan Armstrong, said that at the time many enterprises were trying to solve big data problems with better technology alone, without addressing the organizational and process side. He said, “People thought you could just throw big data into a magic bucket and it would work.” But they bumped up against a variety of issues connecting disparate sources and types of data to new applications and analytics.

One of the main issues he saw was that businesses would focus on the functional aspects, like moving the actual data around through better data engineering tools, without addressing non-functional issues like performance, availability, quality, scalability, security, and governance.

Many of the fundamental data engineering challenges have been solved as enterprises have moved their infrastructure to the cloud. “This is less a problem today than when I first talked about it,” Liebmann said. The next step is mapping out a strategy to address security, governance, and quality issues as companies scale their data operations.

The dawn of XOps

Barot has had many conversations with enterprises asking for DataOps tools, only to discover that they already had a strong DataOps framework in place; what they really needed was help operationalizing their AI processes. This is where Gartner’s model of XOps comes in, providing the foundation for a more comprehensive set of distinctions.

“We were looking at all these ‘ops’ terminologies in the marketplace, and there was ambiguity about what they were for and the relationship between them,” Barot said. “We wanted to set the record straight as to what they stand for and how they are related to each other as part of bigger strategic initiatives in the enterprise.”

Gartner’s model for AI includes MLOps, SecOps, DevOps, and DataOps. (Image credit: Gartner)

In this expanded taxonomy, Gartner constrains DataOps to the challenges associated with building, managing, and scaling data pipelines in a way that promotes reusability, reproducibility, and rolling back changes if problems occur. Some of these key capabilities include data extraction, integration, transformation, and analysis. Governance is constrained to the data itself.
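
To make those capabilities concrete, here is a minimal sketch of a DataOps-style extract-transform-load pipeline in Python. It is illustrative only: the orders.csv source, the field names, and the timestamp-versioned output convention are assumptions rather than anything Gartner prescribes, but they show how a pipeline can be made reusable, reproducible, and easy to roll back.

# Hypothetical DataOps-style pipeline sketch; file names and fields are illustrative.
import csv
import json
from datetime import datetime, timezone

def extract(path):
    # Extraction: pull raw records from a source system (here, a CSV file).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transformation: clean and reshape records, dropping rows that fail basic quality checks.
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"order_id": row["order_id"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine or log bad records
    return cleaned

def load(rows, out_prefix):
    # Load: write a timestamp-versioned output so a bad run can be "rolled back"
    # by repointing consumers at the previous version.
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    target = f"{out_prefix}.{version}.json"
    with open(target, "w") as f:
        json.dump(rows, f)
    return target

if __name__ == "__main__":
    print("wrote", load(transform(extract("orders.csv")), "orders_clean"))

Real DataOps platforms layer orchestration, lineage tracking, and monitoring on top of this basic shape, but small, versioned, repeatable steps are the core idea.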

MLOps focuses on improving collaboration across the development and operationalization of machine learning models. These activities typically fall outside the purview of traditional data engineering practices. Data scientists are often tasked with feature engineering, which tunes ML models to improve decision-making, surface insights, or enable new application features. MLOps makes it easier to tie these efforts to the operations teams responsible for deploying the models into production, as in the sketch below.
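
As a rough illustration of that handoff, the hypothetical scikit-learn sketch below packages feature engineering and the model into a single reproducible pipeline object and saves it as a versioned artifact for an operations team to deploy. The toy data, the churn framing, and the artifact name are assumptions made for the example.

# Hypothetical MLOps-flavored sketch: feature engineering and the model travel together.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data standing in for features engineered from enterprise data.
X = np.array([[25, 1200.0], [34, 300.0], [51, 4500.0], [29, 80.0]])
y = np.array([1, 0, 1, 0])

# Bundling the preprocessing step with the model keeps feature engineering reproducible
# when the operations team runs the same object in production.
model = Pipeline([
    ("scale", StandardScaler()),    # feature engineering step
    ("clf", LogisticRegression()),  # the model itself
])
model.fit(X, y)

# The handoff: a versioned artifact that operations can deploy behind an API
# and roll back to an earlier version if the new model misbehaves.
joblib.dump(model, "churn_model_v1.joblib")

MLOps tooling aims to systematize exactly this kind of packaging, along with tracking which data and code produced each artifact.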

ModelOps is an extension of MLOps that helps companies work with third-party AI models, which may be baked into enterprise applications or improve decision-making through tools like knowledge graphs, rules engines, or new optimization algorithms. The biggest differentiation is that ModelOps makes it easier for business experts to manage AI models with less reliance on data engineering and data science teams to implement changes.

Platform Ops provides an overarching framework to help organizations manage the activities that span all of these domains, as well as DevOps. It is also the youngest and most immature market.

AIOps would probably have been a better term to describe this overall way of thinking about AI management, Barot said. However, the term was already widely used to describe the use of AI to improve IT operations management.

While there are dozens of commercial products for the other domains, Barot said there are only four commercial Platform Ops tools today: Amazon SageMaker, Cloudera SDC, ForePaas, and OneLogic. There are also a variety of open source Platform Ops tools that commercial vendors are championing as part of their larger portfolios of AI tools. Barot expects to see intense competition among vendors rushing to become the AI orchestration platform that everything else plugs into.

Barot cautions that there are no silver bullet products. Every enterprise will need to adopt the best capabilities suited to their existing development practices and industry niche.
