Enterprise AI Can’t Succeed Without Enterprise Information Architecture

In 1967, Melvin Conway submitted a paper called “How Do Committees Invent?” to Harvard Business Review. It described correlations between innovation, the design of systems, and organizational group structures. HBR rejected it because it offered no supporting evidence. Over the years, however, one of Conway’s observations has consistently held up and is now codified as Conway’s law:

“Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.”

Organizations have since become better at building solutions around business goals and outcomes. Even so, Conway’s law means a disconnect persists between expectations and final output, driven by the communication dynamics and workflows of the organization.

In a similar vein, I argue that a corollary to Conway’s law holds in enterprise AI and ML: AI/ML outcomes are constrained to reflect an enterprise’s information architecture (IA) and the structure of its information flows. Together, an enterprise’s IA and information flows represent its information fingerprint.

There are a few patterns that I have seen over the years that illustrate this point.

The “script as you grow” pattern: Take the example of one of the iconic behemoths of IT: the Mainframe. The Mainframe was historically built as a reliable transactional system, and IT workers such as data analysts, BI analysts, and others did all their work on the Mainframe itself. When modern software applications with advanced user interfaces became popular, data from the Mainframe had to be filtered down to the databases associated with these applications.

The journey of this data from the Mainframe to these databases usually involved a proliferation of scripts and hops across servers on the enterprise network. These scripts were typically written for one need at a time, so as the applications grew, so did the scripts, either derived from existing ones or written from scratch. They kept getting patched as needs changed, and their complexity grew as scripts embedded other scripts. Modern alternatives for unloading data from the Mainframe are available today, but they have not been able to replace the complexity embedded in decades of scripting. This type of information architecture leads to the following:

  • Duplication of data
  • Information inconsistencies: Interpretation of the same data in different ways, often not compatible
  • Information silos
  • Varying “freshness” of data
  • Potential data loss: With multiple dependencies across scripts, complex scheduling, and SLAs, one failure across the load and update process along the pipeline can cause unintended data overwrites and losses
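
To make the symptoms above concrete, here is a minimal Python sketch of an audit over copies of the same record that different scripts have loaded into different databases. Everything in it is an assumption for illustration: the `RecordCopy` type, the `audit` function, the system names, and the one-day freshness threshold are hypothetical and not taken from any real pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecordCopy:
    system: str          # hypothetical downstream database name
    customer_id: str
    status: str          # each script's own encoding of the same source field
    loaded_at: datetime  # when the script last refreshed this copy


def audit(copies, now, max_age):
    """Group copies by customer and report duplication, conflicts, and stale copies."""
    by_customer = {}
    for copy in copies:
        by_customer.setdefault(copy.customer_id, []).append(copy)

    report = {}
    for customer_id, group in by_customer.items():
        statuses = {copy.status for copy in group}
        stale = sorted(c.system for c in group if now - c.loaded_at > max_age)
        report[customer_id] = {
            "copies": len(group),                # duplication
            "conflicting": len(statuses) > 1,    # inconsistent interpretation
            "stale_systems": stale,              # varying freshness
        }
    return report


# Illustrative data only: two scripts loaded the same customer differently.
now = datetime(2024, 1, 10, 12, 0)
copies = [
    RecordCopy("billing_db", "C001", "ACTIVE", now - timedelta(hours=2)),
    RecordCopy("crm_db", "C001", "A", now - timedelta(days=3)),
    RecordCopy("billing_db", "C002", "CLOSED", now - timedelta(hours=1)),
]
report = audit(copies, now, max_age=timedelta(days=1))
print(report["C001"])
```

In this toy data, customer C001 exists twice, with two encodings of the same status and one copy that is three days old. A model trained on either copy alone would silently inherit that script's interpretation.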

From an AI/ML perspective, this IA and flow result in quite a few frustrations: unreliable data, rigidity and constraints around obtaining new features to incorporate, and general resistance to new asks for additional data.

The “SOA” pattern: One can argue that the previous example belongs to an older era, although I know plenty of organizations that are still dealing with it. Let’s move on to a more evolved and modern scenario.

Many organizations today have developed an Enterprise Application Architecture (EAI / EA), often with an Enterprise Service Bus (ESB) at its core. To its credit, the ESB does a reasonable job of addressing “communication” challenges between applications. But here’s an important distinction: ESBs were designed for messaging between applications; they were not really designed for “data-driven” solutions. The net result is:

  • It’s data that’s siloed for applications
  • It’s data that’s “filtered” for the application, meaning a lot of contextual information needed for AI/ML will be missing
  • Data quality is still not consistent and isn’t given the top priority with this pattern
  • Scale continues to be an issue
  • Incorporating changes can still take a long time
  • Data formats are standardized early on, and the de facto choice has always been XML or JSON
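
The “filtered for the application” point can be illustrated with a short Python sketch. The event fields, the `BILLING_FIELDS` contract, and the `filter_for_application` helper are all hypothetical and do not describe any particular ESB product; the sketch only assumes that a bus message carries the subset of fields the consuming application asked for.

```python
import json

# Hypothetical source event as captured by the originating system.
source_event = {
    "order_id": "O-1001",
    "customer_id": "C001",
    "amount": 59.90,
    "channel": "mobile",
    "device": "android",
    "promo_code": "SPRING",
    "session_seconds": 412,
}

# Hypothetical message contract: the billing application only needs these fields.
BILLING_FIELDS = ("order_id", "customer_id", "amount")


def filter_for_application(event, fields):
    """Build the message the bus delivers: only the fields the application needs."""
    return {name: event[name] for name in fields if name in event}


billing_message = filter_for_application(source_event, BILLING_FIELDS)

# Fields that never reach anything downstream of this message.
dropped = sorted(set(source_event) - set(billing_message))

print(json.dumps(billing_message))
print(dropped)
```

The message is perfectly adequate for billing, but the dropped fields (channel, device, promotion, session length) are exactly the kind of context an AI/ML feature set would want, and they are no longer available to anyone who sources data from the bus.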

The “Digital Transformation” pattern: So I am cheating here a little, as this is not a real pattern. It is more of a present-day trend. However, many enterprises today are on this transformation journey, some ahead of others. They understand that data-driven initiatives using AI and ML can create transformative outcomes. However, AI/ML-based outcomes are only as good as the data that feeds them, and many organizations are quickly discovering that the enterprise information architecture needed for AI/ML is lacking:

  • Multiple AI/ML projects are being launched across different parts of the enterprise, many potentially requiring the same datasets, but in varying ages and formats
  • AI/ML most often requires access to production data and in the end will run against production data. This means information governance and the trustworthiness of the data are of critical importance
  • There is order-of-magnitude growth in data volumes and data types that traditional architectures are not fit to handle
  • Models themselves need to be treated like any other software engineering product: governed and managed (I wrote an earlier article on this topic: Enterprise data science is largely an engineering challenge)

Enterprise AI/ML can help realize transformative outcomes. But the success of enterprise AI/ML requires an investment in a scalable, dynamic, and resilient enterprise information architecture.

Raj Nair

VP, Data Engineering & AIML