Do you have data that supports AI?

Technology sector blogs and news sites push daily stories about AI and how critical it is for your company's success, but how do you actually take advantage of these technologies? Are your current systems even ready for the leap into AI, and do you have good enough data available for it?

Topics:

  1. What kind of data do you have?
  2. Internal data models and data flows
  3. Requirements for AI

What kind of data do you have?

How long a history you have with your different systems usually determines how much technical debt you carry with them. Having started in the mainframe era gives you more challenges than a start-up that has used the latest technologies to collect data into its systems. The limitations come in different shapes, and your approach to implementing the correct data architecture needs to reflect this. I believe there are no real limits to what you can achieve if you implement the correct processes for the data. To understand your data, you first need to map the data schemas in the different systems and document them in a central place. Then you need to understand how the data gets collected in each system, how old the data is, and how it gets refreshed.
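A lightweight way to start this mapping is a central catalog that records each system's schema, how its data is collected, and how fresh it is. The sketch below is a minimal illustration; the system names, fields and dates are invented:

```python
from datetime import date

# Hypothetical central catalog: one entry per source system,
# recording its schema, collection method, and data freshness.
catalog = {
    "crm": {
        "fields": ["customer_id", "name", "email", "created_at"],
        "collected_by": "manual entry by sales",
        "refresh": "real-time",
        "oldest_record": date(2009, 1, 15),
    },
    "inventory": {
        "fields": ["asset_id", "customer_id", "location", "status"],
        "collected_by": "nightly batch import",
        "refresh": "daily",
        "oldest_record": date(2015, 6, 1),
    },
}

def data_age_years(system, today=date(2024, 1, 1)):
    """Full years of history a system holds -- a rough proxy for
    how much technical debt may have accumulated in it."""
    oldest = catalog[system]["oldest_record"]
    return (today - oldest).days // 365

print(data_age_years("crm"))        # 14
print(data_age_years("inventory"))  # 8
```

Even this small amount of structure makes questions like "which system has the oldest data?" answerable without opening each system separately.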

  1. List all the systems involved in collecting data in your operations: all the CRM, Marketing, Inventory, Support, Contract and Asset management data. The primary objective is to understand how the different silos in your organization use and capture data in your systems.
  2. List all the data models related to the systems in step 1 and determine whether the data models have common attributes that could be used to create a unified data entity. For example, is customer X in system Y the same as in system Z? If data is replicated across systems, you need to determine the master collection point for that data.
  3. Separate the data models to different types:
    • Operational data (Sales/Orders, Deliveries, Change requests, Support requests, Tasks, Activities, Events),
    • Contact data (Customers, Employees, Companies, Locations, Teams),
    • Asset data (Physical/Virtual assets, Agreements/Contracts, Services/Products)
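The shared-attribute check in step 2 can be sketched as a small script that intersects field names across systems to surface candidate keys for a unified entity. The schemas below are invented examples:

```python
# Hypothetical field lists per system; in practice these come from
# the schema documentation gathered in step 1.
schemas = {
    "crm": {"customer_id", "email", "name", "phone"},
    "support": {"ticket_id", "customer_id", "email", "status"},
    "billing": {"invoice_id", "customer_id", "amount"},
}

def shared_attributes(schemas):
    """Attributes appearing in more than one system -- candidates
    for linking records into a unified data entity."""
    counts = {}
    for fields in schemas.values():
        for f in fields:
            counts[f] = counts.get(f, 0) + 1
    return sorted(f for f, n in counts.items() if n > 1)

print(shared_attributes(schemas))  # ['customer_id', 'email']
```

A field that appears everywhere (here `customer_id`) is a natural candidate for the master link; a field that appears in only some systems (here `email`) flags data that may need a designated master collection point.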

Internal data models and data flows

Now that you have built up an understanding of the different data models in your organization, we need to create your internal data models. But what are internal data models anyway? The basic idea is to create data models that are virtual representations of your data, so that you can unify different types of data sources into one unified model. Let's open this up with an example to make it a bit clearer. When we talk about contact data, it has certain attributes: first name, last name, initials, email, mobile, etc. All the different types of contacts can then be categorized as employees, customers (internal/external), managers and so on. These contacts can have different relationships to locations, companies, teams, assets, etc. The key to internal data modeling is to have a common language for data, so that when we supply the data to the AI it understands the context and relationships of the data. Likewise, when we do ETL (Extract, Transform, Load) operations on the data, we understand how the data will be processed or used. It is also easier to add new data sources to your data model if you have the basics covered and documented.
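One way to express such an internal model is a plain data class that source-specific records are mapped onto. The contact fields follow the example above; the CRM record layout and the `from_crm` mapper are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Contact:
    """Internal, system-agnostic contact model."""
    first_name: str
    last_name: str
    email: str
    mobile: str = ""
    categories: list = field(default_factory=list)     # e.g. "employee", "customer"
    relationships: dict = field(default_factory=dict)  # e.g. {"company": "Acme"}

def from_crm(record):
    """Map a hypothetical CRM record onto the internal model."""
    return Contact(
        first_name=record["fname"],
        last_name=record["lname"],
        email=record["mail"].lower(),  # normalize during the Transform step
        categories=["customer"],
    )

c = from_crm({"fname": "Ada", "lname": "Lovelace", "mail": "Ada@Example.com"})
print(c.email)  # ada@example.com
```

Adding a new source then only means writing one more mapper (`from_marketing`, `from_support`, ...) onto the same internal model, which is exactly the "basics covered and documented" payoff described above.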

Next we need to understand how the data flows in your systems, so that we can spot the possible pitfalls in gathering and enhancing the data. If the data master for your contact data is some external service, then we need to understand how to enhance the data to match our other systems. Combining or migrating data from different systems is always difficult, because different systems have built-in limitations. To overcome these limitations, there needs to be a systematic process in place. One approach is to use unified identifiers (UIDs). Most companies have not yet taken advantage of unified identifiers, or created a single UID for a customer or asset that is then tracked across systems. This is commonly used in marketing and advertising, but not so commonly in enterprise systems. It is vital in cases where you need to link data between systems and understand how it flows across them. Of course it is not always easy to have one UID for everything, but you could use one database where all the UIDs are collected, and then define the relationships when analyzing the data. So in system X the UID is the buyer, in system Y it's the owner, and in system Z it's the support person.
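The single-database idea can be sketched as a small registry that maps each system-local identifier (and the role that system assigns) back to one UID. All identifiers and roles below are invented for illustration:

```python
# Hypothetical UID registry: one row per (system, local id),
# all pointing back to the same real-world entity.
uid_registry = [
    {"uid": "C-1001", "system": "x", "local_id": "BUY-77", "role": "buyer"},
    {"uid": "C-1001", "system": "y", "local_id": "OWN-12", "role": "owner"},
    {"uid": "C-1001", "system": "z", "local_id": "SUP-03", "role": "support person"},
]

def resolve(system, local_id):
    """Find the unified identifier behind a system-local record."""
    for row in uid_registry:
        if row["system"] == system and row["local_id"] == local_id:
            return row["uid"]
    return None

def roles(uid):
    """All roles a single entity plays across systems."""
    return {row["system"]: row["role"] for row in uid_registry if row["uid"] == uid}

print(resolve("y", "OWN-12"))  # C-1001
print(roles("C-1001"))         # {'x': 'buyer', 'y': 'owner', 'z': 'support person'}
```

In production this registry would live in a database with proper indexing, but the shape of the lookup stays the same: local id in, unified id out.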

To understand data flows better, it is vital to visualize them at different levels: in-system, internal and external flows. These flows also have different abstraction levels, from very detailed (individual fields that need to be filled in that flow) to vague (the sales process). It is important to settle on a unified terminology for these flows, so that all parties involved in gathering data understand how the data is used, updated and created in the different data flows.
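The same flow documentation can be kept in a machine-readable form, for example as a list of edges that each record their level and abstraction, which can then be filtered or fed into a diagramming tool. The systems and flows named here are placeholders:

```python
# Each flow: source -> target, tagged with its level and abstraction.
flows = [
    {"src": "web form", "dst": "crm",        "level": "external",  "detail": "field-level"},
    {"src": "crm",      "dst": "marketing",  "level": "internal",  "detail": "entity-level"},
    {"src": "crm",      "dst": "crm dedup",  "level": "in-system", "detail": "field-level"},
]

def flows_at(level):
    """All documented flows at a given level (in-system/internal/external)."""
    return [(f["src"], f["dst"]) for f in flows if f["level"] == level]

print(flows_at("internal"))  # [('crm', 'marketing')]
```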

Requirements for AI

So now we understand how our company gathers data into different systems and we know how it flows between them. The next step is to understand how AI can help us create more value from the data we have gathered and how it can be used to enhance data quality. So what is AI (https://snips.ai/content/intro-to-ai/), and why is it important to have a good understanding of the data and data flows? It is fairly clear that AI depends on good-quality raw data, and it is also important to know the metadata of that data: the relationships and history behind it.

How does AI then learn? Put simply, we give it a lot of data that has been prepared for understanding a certain scenario. Here are some example scenarios:

  • Event data – Predict issues with monitored devices: The data would be time-serialized data on X devices, including issue category and value. AI (machine learning tools) could be used to estimate when a device is likely to have an issue, or what types of issues are likely to occur on day X.
  • Contact data – Good-quality leads: The data would be action data from different marketing and advertising systems, where a customer can be identified and linked to some type of campaign or event. Enhancing the data would be required before any kind of analytics should be attempted. Usually you can combine information on the client IP to a domain, and from there to an email address registered to an event, or calculate the likelihood of a match. Then try to understand which pages the possible lead accessed. Knowing the customer's domain, you can also analyze the company involved and estimate the types of products they might be interested in. One challenge here is being able to analyze and store the information so that it complies with the GDPR and other privacy regulations.
  • Asset data – Changes and support cases related to a device/service: The data would be time-serialized operational data related to the assets. This would allow you to analyze possible issue scenarios and correlations between changes and the support tickets related to those changes. Analyzing deliveries and support tickets by device type would also be helpful when trying to understand delivery issues and optimize the delivery process.
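As a toy version of the event-data scenario, the sketch below estimates which issue category is most likely on a given weekday from a historical, time-stamped issue log. The data and the frequency-counting approach are invented stand-ins; a real setup would use a proper time-series or ML library:

```python
from collections import Counter
from datetime import date

# Hypothetical issue log: (date, device, issue category).
events = [
    (date(2024, 1, 1),  "dev-1", "overheat"),
    (date(2024, 1, 8),  "dev-1", "overheat"),
    (date(2024, 1, 8),  "dev-2", "disk"),
    (date(2024, 1, 15), "dev-1", "overheat"),
    (date(2024, 1, 16), "dev-2", "disk"),
]

def likely_issue_on(weekday):
    """Most frequent issue category historically seen on this weekday
    (0 = Monday). A crude stand-in for a real predictive model."""
    counts = Counter(cat for d, _, cat in events if d.weekday() == weekday)
    return counts.most_common(1)[0][0] if counts else None

print(likely_issue_on(0))  # overheat -- Mondays are dominated by overheats
```

The point is not the (trivial) model but the shape of the data: once events are time-serialized and categorized as described above, this kind of question becomes a query rather than a research project.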

I think you are starting to get an idea of what AI can do for you and how to get there. This post was deliberately not limited to any specific technical tool, as tool selection is outside my domain knowledge. However, I believe these are good general steps for preparing your organization to build AI-related knowledge.

If this post gets any traction, I'll likely continue with the following topics next:

  • API strategy and how you can build your API portfolio
  • How to create value from citizen integrations
  • API Management and how to start generating revenue from data
  • 1 + 1 = 3 when it comes to integrations
  • How to enhance data with integrations

Thank you for reading my first post, have a great day!