By Adrien, June 8, 2021

Artificial Intelligence Applied to Temporal Data – Part 1/2

Every day, we encounter temporal data in our environments: repeated readings or measurements of the same phenomenon over time. You observe them daily and will undoubtedly encounter them in your workplace.

What are their specifics? How can we integrate them and store them correctly in order to analyze past behaviors? Can we predict the future?

We will answer all these questions in this article, and then show you how artificial intelligence can integrate this data and extract value from it.

What Is Temporal Data?

In simple terms, temporal data is a continuous flow of observations collected over a period of time. Analyzing this type of data has become a growing area of interest in artificial intelligence, as accurate forecasts are increasingly vital across industries for making informed decisions.

In more technical language, we speak of time series: streams of values sampled over time from a particular signal. Your heart rate, for example, is a signal whose activity is measured by a stream of data points over a period of time.

Today, virtually every industry can benefit from artificial intelligence automating these predictions, from finance and business operations to manufacturing and maintenance.

Temporal Data Analysis

Temporal data analysis primarily includes clustering, classification, anomaly detection, and forecasting, each of which is particularly useful to businesses.

The Challenge of Pre-Processing

Time series data is an important form of indexed data, found in stock markets, climate data sets, and many other time-dependent data forms. Because of this dependence, time series data are susceptible to missing points due to problems with reading or recording the data.

To effectively apply future artificial intelligence models, the data sets must be continuous, as most AI models are not designed to handle missing values. Therefore, rows with missing data must be deleted or filled with the appropriate values.

A common practice is to fill in missing values with the mean or median value of the series. However, this is not always appropriate for the data being studied. To understand why, consider a temperature data set: the temperature value for February is far from its value for July. The same applies to a company's sales data set, which has some seasons with high sales and others with low or steady sales. The imputation method should thus be time-dependent.

Therefore, it is often better to impute each missing value using the average of observations taken at the same point in the cycle (the same hour of the day, for instance), or using various moving averages when possible.
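As a sketch, here is how a naive global-mean fill compares with a time-aware moving-average fill in pandas; the hourly series and its gaps are invented for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical hourly temperature series with gaps (illustrative data).
idx = pd.date_range("2021-01-01", periods=24, freq="h")
s = pd.Series(np.sin(np.arange(24) / 4.0) * 10 + 15, index=idx)
s.iloc[[5, 6, 13]] = np.nan  # simulate missing readings

# Naive, time-agnostic fill: the global mean of the series.
filled_mean = s.fillna(s.mean())

# Time-aware fill: a centered moving average computed on available points.
rolling = s.rolling(window=5, center=True, min_periods=1).mean()
filled_rolling = s.fillna(rolling)
```

The rolling fill uses only the points surrounding each gap, so it tracks the local level of the series instead of flattening gaps to the global mean.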

To better account for seasonality, interpolation-based imputation is often more suitable. These methods come in different degrees: linear, quadratic, cubic, Akima, polynomial, or spline interpolation.
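In pandas, these interpolation methods are exposed through `Series.interpolate`; a minimal sketch on invented data:

```python
import pandas as pd
import numpy as np

# Hypothetical daily series with a two-day gap (illustrative data).
idx = pd.date_range("2021-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10, dtype=float) ** 2, index=idx)
s.iloc[4:6] = np.nan

# Straight-line interpolation between the points bounding the gap.
linear = s.interpolate(method="linear")
# Higher-degree options (these require scipy): method="quadratic",
# "cubic", "akima", "polynomial" (with order=), or "spline" (with order=).
```

Here the underlying data is quadratic, so a linear fill is only an approximation; a quadratic or spline interpolation would track the curvature more closely.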

However, selecting the most effective imputation method on the first attempt is difficult. An iterative approach is to isolate the largest clean stretch of the time series (no missing values) and generate a "clone" of it by randomly removing values (between 5 and 20%). Each imputation method can then be tested by measuring how strongly the imputed series correlates with the original clean series, and the best-performing method selected.
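This comparison loop can be sketched as follows; the series, the 10% missing rate, and the three candidate methods are illustrative choices:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(0)

# A clean reference series (no gaps), standing in for the longest
# complete stretch of the real data.
clean = pd.Series(np.sin(np.linspace(0, 8 * np.pi, 200)) + rng.normal(0, 0.1, 200))

# Clone it and knock out ~10% of points at random.
clone = clean.copy()
mask = rng.random(len(clone)) < 0.10
clone[mask] = np.nan

# Score each candidate imputation by its correlation with the clean original.
candidates = {
    "mean": clone.fillna(clone.mean()),
    "ffill": clone.ffill().bfill(),
    "linear": clone.interpolate(method="linear"),
}
scores = {name: clean.corr(imputed) for name, imputed in candidates.items()}
best = max(scores, key=scores.get)
```

On smooth data like this, interpolation typically correlates better with the original than a flat mean fill, which is exactly the signal the comparison is meant to surface.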

Identifying Trends and Seasonality

Temporal data are studied both to interpret a phenomenon, identify the components of a trend, cyclicality, or seasonality, and to predict its future values.

That said, before working on designing and testing forecasting models, it is important to understand the basic steps of temporal data analysis. In particular, the following:

  • Trend analysis to determine if it is linear or not, as most models require this information as input.
  • Outlier detection to understand how to spot and manage outliers.
  • Stationarity testing to determine whether the time series can be assumed to be stationary, since stationary series are easier to predict.
  • Seasonality analysis to determine the best seasonal parameter to use in future modeling (weekly seasonality if it fluctuates every 7 days, monthly if it fluctuates every 30 days, etc.).
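The steps above can be roughed out with pandas and NumPy alone. The series below is invented, and the stationarity check is a crude stand-in for a formal test such as the augmented Dickey-Fuller test:

```python
import pandas as pd
import numpy as np

# Hypothetical daily series with a linear trend and weekly seasonality.
n = 28 * 4
idx = pd.date_range("2021-01-01", periods=n, freq="D")
s = pd.Series(0.5 * np.arange(n) + 10 * np.sin(2 * np.pi * np.arange(n) / 7),
              index=idx)

# Trend analysis: fit a straight line; a clearly nonzero slope suggests
# a linear trend component.
coeffs = np.polyfit(np.arange(n), s.values, 1)
slope = coeffs[0]

# Outlier detection: flag points more than 3 sigma from the detrended series.
detrended = s - np.polyval(coeffs, np.arange(n))
outliers = detrended[np.abs(detrended - detrended.mean()) > 3 * detrended.std()]

# Stationarity (rough check): a large mean shift between the two halves
# of the series hints that it is not stationary.
first, second = s[: n // 2], s[n // 2:]
mean_shift = abs(first.mean() - second.mean())

# Seasonality: the lag with the highest autocorrelation (after detrending)
# hints at the seasonal period; lags 2-11 are searched here for illustration.
best_lag = max(range(2, 12), key=lambda k: detrended.autocorr(lag=k))
```

With a trend of 0.5 per day and a 7-day cycle, the sketch recovers a slope near 0.5, a large mean shift between halves (non-stationary), and a dominant lag of 7.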

Selecting the Right Data

Feature selection is one of the first major steps in any artificial intelligence task. A feature, in the case of a dataset, simply means a column. When we get a dataset, not all columns (features) necessarily have an impact on the output variable, and including irrelevant features only makes the model worse. Hence the need for feature selection.

With time series data, feature selection can be done in several ways, but there are three principal methods:

  1. Filtering Method
  2. Wrapper Method
  3. Embedded Method

Filtering Method

As the name suggests, this method consists of filtering the features and keeping only the relevant subset. Filtering can be done using a correlation matrix, most often built with the Pearson correlation, which measures the degree of linear correlation between each feature and the target variable.
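A minimal sketch of such a filter with pandas, on an invented dataset where one feature is pure noise; the 0.3 threshold is an arbitrary illustrative choice:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: two informative features, one pure-noise feature.
n = 200
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
noise = rng.normal(size=n)
target = 3 * f1 - 2 * f2 + rng.normal(0, 0.5, n)
df = pd.DataFrame({"f1": f1, "f2": f2, "noise": noise, "target": target})

# Pearson correlation of each feature with the target.
corr = df.corr(method="pearson")["target"].drop("target")

# Keep only features whose absolute correlation clears the threshold.
selected = corr[corr.abs() > 0.3].index.tolist()
```

The noise column, being uncorrelated with the target, falls below the threshold and is filtered out without ever training a model.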

Wrapper Method

This method wraps a machine learning algorithm and uses its performance as the evaluation criterion: features are fed to the learning algorithm, and based on the model's performance, features are added or removed. It is an iterative and computationally expensive process, but more accurate than the filtering method.

There are different wrapper methods, such as backward elimination, forward selection, bi-directional elimination, and Recursive Feature Elimination (RFE).
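As an example, scikit-learn's RFE can be sketched on synthetic data where only two of five features carry signal:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Hypothetical feature matrix: 5 candidate features, only 2 informative.
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] - 3 * X[:, 3] + rng.normal(0, 0.5, 200)

# RFE repeatedly fits the estimator and prunes the weakest feature
# until the requested number of features remains.
selector = RFE(estimator=LinearRegression(), n_features_to_select=2)
selector.fit(X, y)
kept = np.flatnonzero(selector.support_)
```

Each elimination round requires a full model fit, which is why wrapper methods are more expensive, but also more faithful to the model's actual behavior, than a simple correlation filter.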

Embedded Method

Embedded methods also rely on a machine learning algorithm, but the selection is built into the training process itself: at each iteration, the model identifies the features that contribute most to its training. Regularization is the most commonly used embedded approach; it penalizes features by shrinking their coefficients toward zero, as in Lasso regularization.
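A minimal Lasso sketch on the same kind of synthetic data; the `alpha=0.1` penalty is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)

# Hypothetical feature matrix: 5 candidate features, only 2 informative.
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] - 3 * X[:, 3] + rng.normal(0, 0.5, 200)

# Lasso's L1 penalty shrinks uninformative coefficients to exactly zero,
# so feature selection falls out of the training itself.
model = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(model.coef_ != 0)
```

Unlike the wrapper approach, no explicit search over feature subsets is needed: inspecting the nonzero coefficients after a single training run reveals which features the model retained.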

Conclusion

We are now familiar with various methods and approaches to time series analysis. Unfortunately, or perhaps fortunately, there is no quick fix for preparing and analyzing data before the key modeling step. The methods developed in recent years are still popular today.

Historical data is only the starting point for the learning process, which is why the application of AI to temporal data analysis is one of the most exciting recent innovations. With 5G and IoT, a wealth of data is about to be unlocked across the globe, and with AI applied to these types of data, significant benefits can be realized across all sorts of enterprise verticals and for all types of use.

In the second part of this article, we will discuss the different models of artificial intelligence adapted to time series, review each model, and detail their differences and their application areas.
