Performance Monitoring
Measuring the performance of each Agent Node is crucial for evaluating the effectiveness of the AI models that are deployed. This ensures that Agent Nodes are incentivised to deliver consistently high-quality results, and are properly and fairly compensated for doing so.
Since not all AI models can be evaluated in the same way, the platform adapts its choice of evaluation metrics to the specific data processing requirements of each case. Evaluation metrics need to be defined as objective criteria in order to determine the overall quality of an AI model in accurately predicting outcomes, as well as its ability to generalise its inference to every given context. Based on the type of Task, the platform determines the appropriate evaluation metrics and applies them to all Agent Nodes participating in each Consensus round. These metrics are then aggregated into a performance score for each individual Agent Node.
Before diving into the performance metrics, it is important to understand that predictive AI models fall into one of two categories: regression and classification. Classification models produce binary outputs (Yes vs No) or nominal outputs (Cat vs Dog vs Rabbit). Regression models, on the other hand, produce continuous outputs, such as predicting the next temperature based on a given series of thermometer readings.
Classification Metrics
Since classification AI models have discrete outputs, the evaluation metrics need to be able to discern whether the predicted output falls into the correct class.
Before jumping into metrics, we need to define a few terms:
True Positive (TP): the number of positive instances predicted correctly, e.g. the AI model predicted that the sound was made by a cat, and it was
True Negative (TN): the number of negative instances predicted correctly, e.g. the AI model predicted that the sound was not made by a cat, and it was not
False Positive (FP): the number of negative instances incorrectly predicted as positive, e.g. the AI model predicted that the sound was made by a cat, but it was not
False Negative (FN): the number of positive instances incorrectly predicted as negative, e.g. the AI model predicted that the sound was not made by a cat, but it was
The platform defines the following classification metrics:
Accuracy: the ratio between correct predictions (TP + TN) and the total number of predictions (TP + TN + FP + FN)
Precision: the ratio between true positive (TP) and total positives that have been predicted (TP + FP)
Recall: the ratio of true positives (TP) to all of the positives in ground truth (TP + FN)
F1-score: the harmonic mean between Precision and Recall
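The four metrics above can be sketched in a few lines of Python. This is an illustrative example rather than the platform's implementation; the function name and the choice of "cat" as the positive class are assumptions made for the sketch:

```python
def classification_metrics(predicted, actual, positive="cat"):
    """Compute Accuracy, Precision, Recall and F1-score, treating
    `positive` as the positive class and everything else as negative."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    tn = sum(p != positive and a != positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of Precision and Recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(predicted)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example: four predictions against ground truth
m = classification_metrics(["cat", "cat", "dog", "dog"],
                           ["cat", "dog", "dog", "dog"])
# accuracy = 3/4, precision = 1/2, recall = 1/1, f1 = 2/3
```

Note that Precision and Recall pull in opposite directions (a model that predicts "cat" for everything has perfect Recall but poor Precision), which is why the F1-score is used to balance the two.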
Regression Metrics
Since regression AI models have continuous outputs, the evaluation metrics need to determine how close the predicted output was to the actual output, in terms of the numerical distance between them:
The platform defines the following regression metrics:
Mean Absolute Error (MAE): the average absolute difference between the predicted values and the ground truth
Mean Squared Error (MSE): the average of the squared difference between predicted values and the ground truth
Root Mean Squared Error (RMSE): the square root of the average of the squared difference between predicted values and the ground truth
R2 coefficient of determination: a compound metric that determines how much of the total variation in the ground truth is explained by the variation in the predicted values
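As with the classification metrics, the regression metrics can be sketched directly from their definitions. This is an illustrative example, not the platform's implementation; the function name is an assumption:

```python
import math

def regression_metrics(predicted, actual):
    """Compute MAE, MSE, RMSE and the R2 coefficient of determination."""
    n = len(predicted)
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / n          # average absolute difference
    mse = sum(e * e for e in errors) / n           # average squared difference
    rmse = math.sqrt(mse)                          # back in the units of the data
    # R2: 1 minus the ratio of residual variation to total variation
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Example: temperature forecasts vs thermometer readings
m = regression_metrics([21.0, 22.5, 24.0], [20.0, 23.0, 24.0])
```

A practical distinction: MSE and RMSE penalise large individual errors more heavily than MAE, while RMSE has the advantage of being expressed in the same units as the original readings.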