Artificial intelligence (AI) is the development of computer systems capable of performing tasks generally associated with human intelligence. You’ve all heard of ChatGPT? It’s one of many well-known examples of AI.
In concrete terms, artificial intelligence will transform machines into intelligent assistants, opening the door to countless possibilities for businesses. AI projects in business, whether focused on research, product development or process automation, are commonplace in many industries.
As with any project, the keys to the success of an AI project lie in a number of factors (rigorous planning, effective communication, etc.), some of which are specific to the project’s field of application (working on the safety of a power plant is not the same as working on the recommendations of an online sales site, for example).
Similarly, according to Sophie, our data scientist, every AI project involves a few key notions that everyone involved (PO, project manager, analyst, developer, etc.) should have in hand in order to maximise the benefits of AI while controlling its risks.
1) Technological viability: the vagaries of performance
When talking about artificial intelligence projects, it is crucial to understand that we are venturing into a field that is intrinsically non-deterministic. An artificial intelligence project generally consists of a model or a set of models encapsulated in software. This structure is divided into two parts:
- A non-deterministic part (the model or models)
- A deterministic part (data pipeline, UI/dashboards, back end)
The biggest determinant of the “profitability” or viability of an AI project is what we commonly call performance uncertainty. Because AI operates in a probabilistic setting, some uncertainty about performance is inevitable.
Basically, no matter how good your prediction model is, it is impossible to guarantee that it will produce correct results 100% of the time. Weather forecasting is a perfect example of this.
It is therefore important that the customer understands that a model won’t always be right, and that you make sure your project is viable despite this uncertainty before you even start! What’s more, the aggressive marketing of certain suppliers sometimes makes it difficult to get a fair idea of how their products will perform on your own data.
In our customer projects, we often recommend separating the software components from the AI components. The AI part can be funded through a small initial budget dedicated to a proof of concept, carried out before any budget is committed to the deterministic parts.
This separation offers two main advantages:
- Evaluating the achievable performance on your data and on your problem.
- Determining whether an off-the-shelf AI can be used, or whether an open-source model should be re-trained or a bespoke model developed.
Before signing a contract and starting any new artificial intelligence project at Uzinakod, our team always asks the customer questions about the cost of error. These questions often enable our experts to ensure that the client fully understands the non-deterministic dimension of their project.
For example, our AI experts want to know “what is the financial, human or legal cost of an erroneous prediction?” or “who will use the results, how, and for what purposes?”. The answers often make the customer realise that they are not in a deterministic context.
In pragmatic terms, the profitability of an AI project is often assessed by weighing, on the one hand, the cost of errors plus the opportunity cost (the cost of not carrying out the project) against, on the other, the gain from correct predictions. This evaluation also helps to determine a project’s budget.
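As a rough illustration of this trade-off, the sketch below leaves the opportunity cost aside and compares the expected gain from correct predictions with the expected cost of errors. All figures, and the names `CostModel`, `expected_net_gain` and `break_even_error_rate`, are hypothetical; the sketch also assumes every prediction has the same value and every error the same cost, which real projects rarely satisfy.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical per-item figures for an AI task (all values are assumptions)."""
    n_items: int             # number of predictions the model will make
    gain_per_correct: float  # value of one correct prediction (e.g. labour saved)
    cost_per_error: float    # financial, human or legal cost of one wrong prediction


def expected_net_gain(model: CostModel, error_rate: float) -> float:
    """Expected gain from correct predictions minus expected cost of errors."""
    n_errors = model.n_items * error_rate
    n_correct = model.n_items - n_errors
    return n_correct * model.gain_per_correct - n_errors * model.cost_per_error


def break_even_error_rate(model: CostModel) -> float:
    """Error rate at which expected gains and expected error costs cancel out.

    Solving (1 - e) * g - e * c = 0 for e gives e = g / (g + c).
    """
    g, c = model.gain_per_correct, model.cost_per_error
    return g / (g + c)


# Hypothetical figures: 100 documents, $50 of labour saved per correct
# extraction, $2,000 of cost for each wrong one.
m = CostModel(n_items=100, gain_per_correct=50.0, cost_per_error=2000.0)
print(f"Break-even error rate: {break_even_error_rate(m):.2%}")
for e in (0.01, 0.05):
    print(f"error rate {e:.0%}: expected net gain {expected_net_gain(m, e):,.0f}")
```

With these made-up figures, each error wipes out the gain from 40 correct predictions, so the model would have to be wrong on fewer than about 2.4% of documents just to break even.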
Here’s a concrete example: in order to provide a public authority with information X, you need to extract personally identifiable information from around a hundred contracts in PDF format. You have two choices:
1. You ask an employee to do it manually.
2. You ask an AI to do it.
- Opportunity cost = Amount of the fine you risk by doing nothing
- Cost of error = Risk incurred if the public authority realises that you have mistakenly provided incorrect information.
- Gain from correct predictions = Salary (or salaries) of the employee(s) you don’t have to assign to this tedious task.
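The same example can be turned into a back-of-the-envelope comparison of the options. Every amount below is a hypothetical placeholder, the AI project cost is an assumption not mentioned above, and human errors are ignored for simplicity.

```python
# All figures are hypothetical placeholders, not real fines or salaries.
N_CONTRACTS = 100
FINE_IF_NOTHING_DONE = 50_000.0    # opportunity cost: fine risked by doing nothing
MANUAL_LABOUR_COST = 8_000.0       # salaries mobilised if an employee does it
AI_PROJECT_COST = 3_000.0          # assumed build/run cost of the AI option
COST_PER_WRONG_CONTRACT = 1_000.0  # assumed cost if the authority spots wrong info


def expected_costs(ai_error_rate: float) -> dict[str, float]:
    """Expected cost of each option; human errors are ignored for simplicity."""
    return {
        "do nothing": FINE_IF_NOTHING_DONE,
        "human": MANUAL_LABOUR_COST,
        "AI": AI_PROJECT_COST + N_CONTRACTS * ai_error_rate * COST_PER_WRONG_CONTRACT,
    }


def max_tolerable_ai_error_rate() -> float:
    """AI error rate above which the AI option costs more than the manual one."""
    return (MANUAL_LABOUR_COST - AI_PROJECT_COST) / (N_CONTRACTS * COST_PER_WRONG_CONTRACT)


for rate in (0.02, 0.10):
    costs = expected_costs(rate)
    best = min(costs, key=costs.get)
    print(f"AI error rate {rate:.0%}: cheapest option is {best!r} ({costs})")
print(f"AI stops paying off above a {max_tolerable_ai_error_rate():.0%} error rate")
```

Under these assumptions the AI option is cheapest at a 2% error rate but loses to the manual option at 10%; the break-even point (5% here) is precisely the kind of figure worth establishing during a proof of concept.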
Once a project is underway, it is important to keep this trade-off in mind from a management point of view. Unlike most software development tasks, where progress on each feature is usually assessed as a binary outcome (it works or it doesn’t), data science tasks are often open-ended. You need to regularly ask yourself whether it is worth continuing for perhaps 1% more performance, or whether to stop there.
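One way to make the “continue or stop” question explicit is a simple heuristic: estimate the next iteration’s gain from the recent trend and compare its business value with the cost of another iteration. This is a hypothetical sketch rather than a standard method; `value_per_point` and `cost_per_iteration` are assumed inputs that the business would have to supply.

```python
def worth_another_iteration(scores: list[float],
                            value_per_point: float,
                            cost_per_iteration: float,
                            window: int = 3) -> bool:
    """Decide whether to keep iterating on a model (hypothetical heuristic).

    scores: validation metric (e.g. accuracy in %) after each iteration.
    value_per_point: business value of one extra percentage point (assumed).
    cost_per_iteration: cost of one more iteration of work (assumed).
    The average gain over the last `window` iterations is used as a rough
    estimate of what the next iteration is likely to bring.
    """
    if len(scores) < 2:
        return True  # not enough history to judge yet
    recent = scores[-(window + 1):]
    gains = [b - a for a, b in zip(recent, recent[1:])]
    expected_gain = sum(gains) / len(gains)
    return expected_gain * value_per_point > cost_per_iteration


history = [71.0, 78.5, 82.0, 83.1, 83.6, 83.9]
print(worth_another_iteration(history, value_per_point=2_000, cost_per_iteration=1_500))
```

With this history, the last three iterations gained roughly 0.6 points each, worth about $1,270 under these assumptions, which is less than the $1,500 an iteration costs, so the function returns `False`.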
2) No data, no science
It is essential to obtain data samples from the outset of a project, even during the customer needs discovery phase. A first look at the data helps to make crucial decisions such as choosing AI models, understanding needs and issues, detecting biases, etc.
Access & Availability
It may seem obvious, but the word “data” is right there in “data science”. Doing an AI project without access to data wouldn’t be science; it would be more like magic. In short, data is not just one component of a project, it is the foundation.
- Make it clear in a project’s timescale proposal that AI-related tasks do not start when the contract is signed, but when the teams have access to the data. A delay in access means a delay in delivery.
- In practice, it is sometimes complicated to obtain access or make data available when the volume is massive. A good practice in this case is to request a sample at the start of discussions.
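When a dataset is too large to hand over easily, a random sample of rows can often be produced without loading the whole file into memory. The sketch below uses reservoir sampling (Algorithm R) on a CSV file; it assumes the export has a single header row, and the file names are hypothetical.

```python
import csv
import random
from typing import Iterable, List


def reservoir_sample(rows: Iterable[list], k: int, seed: int = 0) -> List[list]:
    """Uniform random sample of k rows from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[list] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive bounds: j in [0, i]
            if j < k:
                sample[j] = row  # keep row i with probability k / (i + 1)
    return sample


def sample_csv(src_path: str, dst_path: str, k: int = 10_000) -> None:
    """Write a random k-row sample of a large CSV, keeping the header row."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        header = next(reader)
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir_sample(reader, k))


# Example (hypothetical file names):
# sample_csv("sensor_export_full.csv", "sensor_export_sample.csv", k=5_000)
```

A fixed `seed` makes the sample reproducible, so both teams can refer to the same extract. A random sample gives a first look at values and formats, but it can miss rare events, so it does not replace access to the full data.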
All AI projects involve a data collection, cleansing and formatting phase. This stage is often underestimated, but it is nonetheless a crucial one.
A simple way of describing data quality is that data, and the models built on it, are always a more or less faithful representation of reality. For your model to meet your needs once in production, the data on which it is built and evaluated must be as representative of reality as possible. For example, IQ is often used as a proxy for people’s intelligence, but it is an imperfect reflection of what intelligence actually is.
Likewise, if you are trying to predict the sale price of your house, the way the data is processed will differ depending on whether you use the official transaction dataset or data extracted from the Kijiji real estate section, where listed prices are asking prices rather than final sale prices. Similarly, it is unwise to train an AI to detect abnormal machine operation using only data from a period when you changed the number of parts.
- Data acquisition and/or collection can be an integral part of an AI project. For example, structured data can be extracted from unstructured sources, such as the square footage, price and number of bedrooms found in hundreds of property sales contracts. We can also advise a client on free or open-source data that could be used for a case study.
- In all cases, it is essential to include data issues in your budget.
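For the property contract example, a first pass at structuring the text can be as simple as a few regular expressions. This sketch assumes the text has already been extracted from the PDF files, and the patterns and sample sentence are hypothetical; real contracts vary too much in wording for these patterns to be used as-is.

```python
import re
from typing import Optional

# Hypothetical patterns: real contracts vary widely, so these would need to be
# adapted (or replaced by a trained extraction model) for actual documents.
PATTERNS = {
    "square_feet": re.compile(r"(\d[\d,]*)\s*(?:sq\.?\s*ft|square\s+feet)", re.IGNORECASE),
    "price": re.compile(r"\$\s*(\d[\d,]*(?:\.\d{2})?)"),
    "bedrooms": re.compile(r"(\d+)\s+bedrooms?", re.IGNORECASE),
}


def _to_number(raw: str) -> float:
    """Convert '1,450' or '389,000.00' to a float by dropping thousands separators."""
    return float(raw.replace(",", ""))


def extract_fields(text: str) -> dict[str, Optional[float]]:
    """Return the first match for each field, or None when it is not found."""
    fields: dict[str, Optional[float]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = _to_number(match.group(1)) if match else None
    return fields


contract = (
    "The Seller agrees to sell the property, a 3 bedroom house of "
    "1,450 sq ft, to the Buyer for the sum of $389,000.00."
)
print(extract_fields(contract))
```

Each field comes back as `None` when no match is found, which makes gaps easy to count before deciding whether a more robust extraction approach is needed.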
Apart from the quality of the dataset, the volume of data has an impact on the architecture of the project. For example, calling a predictive model every minute on data collected from IoT sensors does not involve the same processing effort or the same architectural complexity as an analysis of a large number of CSV files updated once a year.
What’s more, the “useful” size of the available data is often overestimated. For example, in an IoT project with sensor data collected every second, you can quickly accumulate gigabytes of data. However, if the aim is to produce a predictive model that gives the probability that your machine will break down tomorrow, you will still have only one evaluation observation per day.
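The gap between raw volume and useful observations shows up as soon as per-second readings are collapsed into one row per day, the granularity at which “will it break down tomorrow?” can actually be evaluated. This standard-library sketch uses synthetic readings, and the mean/max features are an arbitrary illustrative choice (pandas offers the same kind of aggregation through `DataFrame.resample`).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

SECONDS_PER_DAY = 24 * 60 * 60


def daily_features(readings):
    """Collapse (timestamp, value) sensor readings into one feature row per day.

    readings: iterable of (datetime, float) pairs, e.g. one per second.
    Returns {date: {"mean": ..., "max": ..., "n": ...}} sorted by date.
    """
    by_day = defaultdict(list)
    for ts, value in readings:
        by_day[ts.date()].append(value)
    return {
        day: {"mean": mean(vals), "max": max(vals), "n": len(vals)}
        for day, vals in sorted(by_day.items())
    }


# Two days of synthetic per-second readings (made-up values for illustration).
start = datetime(2024, 1, 1)
readings = [(start + timedelta(seconds=s), float(s % 100))
            for s in range(2 * SECONDS_PER_DAY)]
features = daily_features(readings)
print(len(readings), "raw readings ->", len(features), "daily observations")
```

Two days of data produce 172,800 raw readings but only 2 daily observations to learn from and evaluate against.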
- Take into account at an early stage the technological requirements and associated capabilities that will be needed for big data streaming projects. A data engineer will most likely be required for data manipulation pipelines.
3) Inference vs Prediction
As a general rule, the more complex a model is, the less interpretable it is, but the greater its potential predictive power. From a modelling point of view, the distinction between inference and prediction determines whether the team will work with machine learning methods (prediction) or with “classic” statistical modelling methods (inference).
Machine learning seeks to obtain the best possible prediction, whereas statistical modelling is more concerned with the relationships between the variables and the target, or between the variables themselves.
For example, if you are planning to renovate your house, you will want to know the resale value of your property. If your aim is simply to obtain a mortgage to finance your renovation, then what you’re looking for is the best possible prediction. On the other hand, if you’re looking to find out whether it’s worth spending $5,000 to add an extra shower room, and how much it could increase the value of your property on resale, then you’re looking at inference.
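A linear regression fitted on the house example can illustrate both uses of the same model: the coefficient on shower rooms answers the inference question, while the fitted equation answers the prediction question. The data below is synthetic, generated with effects we chose ourselves, and the sketch assumes NumPy is available; a real inference study would also quantify uncertainty (for example with confidence intervals) and examine possible confounding variables.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data (made up for illustration): floor area in sq ft, number of
# shower rooms, and a sale price generated from known "true" effects plus noise.
n = 500
area = rng.uniform(800, 3000, size=n)
showers = rng.integers(1, 4, size=n)  # 1 to 3 shower rooms
price = 50_000 + 150 * area + 12_000 * showers + rng.normal(0, 20_000, size=n)

# Ordinary least squares: price ~ intercept + area + showers
X = np.column_stack([np.ones(n), area, showers])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Inference: how much does one extra shower room add, all else being equal?
print(f"Estimated value of an extra shower room: ${coef[2]:,.0f}")

# Prediction: what is this particular house likely to sell for?
house = np.array([1.0, 1_600, 2])  # intercept term, area, shower rooms
print(f"Predicted sale price: ${house @ coef:,.0f}")
```

Because the data was generated with a $12,000 effect per shower room, the estimate should land close to that value; on real data there is no known answer to check against, which is why the modelling assumptions matter so much for inference.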
Another example with predictive maintenance: let’s say you have an automated production line with several sensors per machine/station along the line. You might want a model that can accurately determine whether your line is likely to break down in the next 24 hours.
Such a model could be used to plan maintenance activities in advance, or to initiate an emergency shutdown procedure to limit the impact of an unexpected production line stoppage. On the other hand, don’t rely on a predictive model to tell you the nature of the problem (a worn part or incorrectly set parameters) or to tell you exactly which machine is causing the problem.
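Framing this as a prediction problem usually means turning the maintenance log into a binary target: for each snapshot of sensor data, did a breakdown occur within the following 24 hours? The function and data below are hypothetical. Note that the label only encodes whether a breakdown happened, not its cause or the machine involved, which is why such a model cannot answer those questions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def label_next_24h(snapshot_times, breakdown_times, horizon=timedelta(hours=24)):
    """Label each snapshot 1 if a breakdown occurs within the next `horizon`, else 0.

    snapshot_times: times at which sensor features are computed (one row each).
    breakdown_times: times of recorded breakdowns, e.g. from a maintenance log.
    The window is (t, t + horizon]: it excludes t itself and includes t + horizon.
    """
    breakdowns = sorted(breakdown_times)
    labels = []
    for t in snapshot_times:
        i = bisect_right(breakdowns, t)  # index of the first breakdown strictly after t
        hit = i < len(breakdowns) and breakdowns[i] <= t + horizon
        labels.append(int(hit))
    return labels


snapshots = [datetime(2024, 3, d, 8) for d in range(1, 6)]  # one snapshot per day at 08:00
breakdowns = [datetime(2024, 3, 3, 14)]                      # hypothetical log entry
print(label_next_24h(snapshots, breakdowns))
```

Only the 08:00 snapshot on 3 March is labelled positive, since the breakdown at 14:00 that day falls within its 24-hour window.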
Similarly, a predictive model will not be able to tell you by what percentage you can increase a machine’s output before significantly increasing its breakdown rate. All these considerations are more a matter of inference.
- Generally speaking, a client will always start by saying that what they want is a model with the strongest possible predictive power. From experience, this is rarely what they need; models with strong explanatory power are generally much more useful. This requires a strong link with the end users of the model and their business.
- In a perfect world, we would always want to make both inferences and predictions. But it’s vital to understand that each type of model really represents a different data processing project, right through to the model’s lifecycle and use.
Navigating the complex field of AI can be intimidating. That’s why it’s crucial to have a reliable partner with expertise in the field.
At Uzinakod, we guide you every step of the way, from initial planning to implementation, asking the critical questions that will ensure the success of your project.
If you’re looking to maximise the benefits of AI while controlling its risks, don’t hesitate to consult our AI experts to ensure the success of your project.