🍰 Self-Regulating Systems with Golang and AI
Let's reflect on the concept of self-regulating systems. Imagine a chain of interacting services; the first one we will call the main service. Suppose this service manages the state of some system. Besides controlling the system, the main service has built-in functionality for reading, storing, and publishing data about the system's state in the form of an array of numbers.
While the system is running, this state array is constantly updated and sent to a queue to which a multitude of services subscribe. These services analyze the incoming data stream and determine the system's state from it. If the state raises no suspicion, work continues in normal mode: the analyzing services ignore the message, and it is removed from the queue. If, however, an analyzer that has read the state array notices anomalous behavior, it signals this by publishing a message to the next queue, indicating that the state of the service needs to be corrected.
Thus, by analyzers we mean services that can
- make assumptions about the occurrence of an anomaly by evaluating the behavior of the system as a whole;
- publish messages about anomalous behavior in topics listened to by correctors.
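To make this concrete, here is a minimal sketch of the payloads such queues might carry; the struct and field names are my assumptions, not part of any fixed contract:

```go
import "time"

// StateMessage is published by the main service: a snapshot of the
// system's state as an array of numbers.
type StateMessage struct {
	Values []float64 `json:"values"`
}

// AnomalyMessage is published by an analyzer that suspects anomalous behavior.
type AnomalyMessage struct {
	DetectedAt time.Time `json:"detected_at"`
	Score      float64   `json:"score"` // how confident the analyzer is
}
```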
Analyzers know nothing about the parameters of the system or the environment in which it exists. I assume the system's state can be evaluated either with statistics or with a pre-prepared file containing a neural network trained to detect anomalies. The latter approach is preferable, since it is significantly faster and does not require the analyzer to perform complex and frequent calculations. (See the small note below on how to build your own neural network with Golang.)
When the analyzer publishes a message, it is picked up by a corrector. Correctors are services that publish recommendations to the main service with instructions for correcting the anomaly that has occurred. Conversely, if there are no anomalous states, the correctors can recommend that the main service slightly increase the process control parameters to reach optimal performance. As soon as the analyzer notices an anomaly again, the corrector processing the message will recommend reducing the control parameter to half of its current value, and so on. This algorithm is somewhat reminiscent of binary search.
Thus, by correctors we mean services that
- know the current configuration of the system being processed by the main service;
- know the configuration of the environment in which the system exists to avoid exceeding the maximum allowable values.
Correctors do not change the state of the system directly, but only give recommendations to the main service for fine-tuning, which can be either accepted or rejected.
Thus, as the data stream is updated and analyzed for the presence or absence of anomalies, the main service adjusts the control parameters so that the system operates as efficiently as possible. It is important to note that this approach makes the system's configuration dynamic: each new element added to the system incurs additional costs, and removing an element should release the resources allocated to its maintenance. Regular data updates keep the system's control parameters synchronized with the actual costs and, in theory, provide not only the best performance of the entire system but also economic benefits for maintaining the whole environment.
Let's examine each component of the scheme to create a simple but effective model of a self-regulating service.
Analyzer
As mentioned earlier, analyzing services have no information about the system's parameters or the environment in which the system exists. Nevertheless, they can relay this information from the main service to the correctors for processing. Analyzing services base their work on tools that let them evaluate the system's behavior and analyze changes in the values of the system's state data array.
To detect an abnormal state, the analyzing service needs a notion of the system's normal state. For this, an array of accumulated data can be obtained from the main service. The array can hold, say, 100, 200, or 300 points: the larger the sample, the more detailed the description of the system's state. It is worth remembering that predicting the system's state from only three points is possible, but the probability of error is then very high. In addition, Student's t-test can be used to determine the optimal number of points.
We can assume that anomalous behavior is detected when values in the sample deviate from the mean by more than the standard deviation.
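This baseline rule is easy to implement directly, even before any neural network is involved. A minimal sketch follows; the function name and the k multiplier are my additions (the rule above corresponds to k = 1, though larger multiples such as 3 are common in practice):

```go
import "math"

// isAnomalousSample reports whether any value in the sample deviates from
// the sample mean by more than k standard deviations.
func isAnomalousSample(values []float64, k float64) bool {
	if len(values) == 0 {
		return false
	}
	mean := 0.0
	for _, v := range values {
		mean += v
	}
	mean /= float64(len(values))

	variance := 0.0
	for _, v := range values {
		variance += (v - mean) * (v - mean)
	}
	sigma := math.Sqrt(variance / float64(len(values)))

	for _, v := range values {
		if math.Abs(v-mean) > k*sigma {
			return true
		}
	}
	return false
}
```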
I created a simple FeedForward neural network trained on sine wave signals with varying amplitude, frequency, and shift. The dataset is small, with only 10,000 records, each containing 100 points. Therefore, potentially, every 100 points of the system's state can be analyzed to determine abnormal states.
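A generator for such a dataset might look like the sketch below. It is an assumption about how the data could be produced, shaped as gobrain training patterns ({input, expected output} pairs): clean sine waves are labelled 0, and waves with an injected spike are labelled 1.

```go
import (
	"math"
	"math/rand"
)

// makeTrainingSet builds n labelled samples of `points` values each:
// clean sine waves (label 0) and waves with an injected spike (label 1).
// Amplitude, frequency, and phase shift vary per sample.
func makeTrainingSet(n, points int) [][][]float64 {
	sets := make([][][]float64, 0, n)
	for i := 0; i < n; i++ {
		amp := 0.5 + rand.Float64()      // amplitude in [0.5, 1.5)
		freq := 1.0 + 4.0*rand.Float64() // frequency in [1, 5)
		shift := 2 * math.Pi * rand.Float64()

		in := make([]float64, points)
		for j := range in {
			in[j] = amp * math.Sin(freq*float64(j)/float64(points)*2*math.Pi+shift)
		}

		label := 0.0
		if i%2 == 0 { // inject a spike into every second sample
			in[rand.Intn(points)] += 3 * amp
			label = 1.0
		}
		sets = append(sets, [][]float64{in, {label}})
	}
	return sets
}
```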
In fact, working with neural networks in Golang is quite simple. Many popular libraries provide a wide range of functionality for training neural networks, TensorFlow for example. However, I chose a small, easy-to-use, and well-documented library called gobrain. Creating a FeedForward neural network with it takes only the following three calls:
```go
neuralNetwork := gobrain.FeedForward{}
// Configure the input, hidden, and output layer sizes.
neuralNetwork.Init(SIZE_OF_SET, NUMBER_OF_HIDDEN_LAYERS, NUMBER_OF_OUTPUT_LAYERS)
// Train on the prepared dataset; the last argument disables debug output.
neuralNetwork.Train(sets, NUMBER_OF_ITERATIONS, RATE, FACTOR, false)
```
In this example, we create a FeedForward network with an input size of 100 points (`SIZE_OF_SET`), five neurons in the hidden layer (`NUMBER_OF_HIDDEN_LAYERS`), and one output neuron (`NUMBER_OF_OUTPUT_LAYERS`). The dataset (`sets`) was pre-generated and stored in a database.
To save the neural network to a file, you only need to specify the file name:
```go
if err := persist.Save("./dp-analyzer.v0.0.1.nya", neuralNetwork); err != nil {
	// ...
}
```
We can expand the network's capabilities by decreasing the steps at which amplitudes, frequencies, and shifts vary, and by adding new kinds of fluctuations to the library. After preparing a new version of the neural network, we simply place its file in the shared storage of the analyzing services and call the `Update` method whenever a new array of system state values (`in`) arrives.
```go
neuralNetwork := &gobrain.FeedForward{}
if err := persist.Load("./dp-analyzer.v0.0.1.nya", neuralNetwork); err != nil {
	// ...
}
out := neuralNetwork.Update(in)
if isAnomaly(out) {
	// Send a message to the Topic...
}
```
The value of `out` lies in the range from 0 to 1, where 0 corresponds to the normal state of the system and 1 to the case when the analyzer has detected an anomalous state.
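The `isAnomaly` helper is not shown in the original snippet; one possible implementation, assuming a single output neuron and a hypothetical cut-off of 0.5:

```go
// isAnomaly interprets the network's output: values close to 1 indicate an
// anomalous state. The 0.5 threshold is an assumed cut-off, not a fixed rule.
func isAnomaly(out []float64) bool {
	return len(out) > 0 && out[0] >= 0.5
}
```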
Is training the neural network only on sine waves enough, or does that undermine the whole approach? I believe sine waves are a sound choice as the base function for training. Firstly, any periodic signal can be represented as a combination of harmonic oscillations. Secondly, we are not interested in the shape of the main trend of the system's behavior, but only in fluctuations relative to it.
It should also be noted that an analyzer observing a single control parameter can track how long the system stays in a normal state. For example, it can keep a small database recording the time of each anomaly detection. If no anomalous state occurs for a sufficiently long period, the analyzer can send a message to the topic recommending an increase in the control parameter to improve the system's performance.
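A minimal sketch of this "quiet period" rule, with hypothetical names and the storage reduced to a single timestamp:

```go
import "time"

// lastAnomalyAt is updated each time an anomaly event is processed.
var lastAnomalyAt = time.Now()

// shouldRecommendIncrease reports whether no anomaly has been seen for the
// given quiet period, in which case the control parameter may be raised.
func shouldRecommendIncrease(quiet time.Duration) bool {
	return time.Since(lastAnomalyAt) >= quiet
}
```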
Corrector
As mentioned earlier, the corrector provides recommendations on what actions the main service should take to stabilize the system's operation. To do this, the corrector must be aware of the system's parameters and the environment in which the system operates. This helps the corrector accurately determine what size of control parameter change is permissible to avoid overloading the system or depriving it of the necessary resources to operate in a given environment.
Therefore, we distinguish between the critical and the maximum permissible values of the control parameter, where the latter is a property of the environment, not the system. We also assume that the critical value of the control parameter is less than or equal to the maximum permissible value, since we often provide the system with more resources than it actually requires.
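These two bounds could be carried in a small corrector-side structure; a minimal sketch, with names of my own choosing:

```go
// Limits distinguishes what the system needs from what the environment allows.
type Limits struct {
	Critical float64 // critical value of the control parameter (system property)
	Max      float64 // maximum permissible value (environment property); Critical <= Max
}

// clamp keeps a recommended value within the environment's bound.
func clamp(v float64, l Limits) float64 {
	if v > l.Max {
		return l.Max
	}
	return v
}
```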
My algorithm for finding the optimal control state is similar to binary search. Every time a new event arrives from the analyzer, the algorithm checks its type. If the event is an anomaly, the corrector recommends setting the control parameter (`currentControlParameter`) to half of its value at the time the anomalous state was registered. Otherwise, the corrector recommends increasing the control parameter, but in such a way that the new value stays within the maximum and minimum allowable bounds.
```go
delta := 0.5 * currentControlParameter
if isAnomaly() {
	// Anomaly: recommend halving the parameter, but keep the new value
	// above the minimum allowable one.
	for delta <= min {
		delta = 1.5 * delta
	}
	currentControlParameter = delta
} else {
	// No anomaly: adjust the step until the resulting value fits below
	// the maximum allowable one, then increase the parameter.
	for delta+currentControlParameter >= max {
		if delta <= min {
			delta = 1.5 * delta
		} else {
			delta = 0.5 * delta
		}
		currentControlParameter = delta
	}
	currentControlParameter += delta
}
```
For example, with `min = 10` and `max = 100`, an anomaly at `currentControlParameter = 80` halves it to 40; a subsequent quiet period raises it to 60, and so on. The algorithm can still track the optimal state of the system even if the maximum and minimum allowable control parameters and the critical permissible value are gradually changing.
Interactions between elements
So far, we have talked about the components of an automatically regulated service, but we have not yet paid enough attention to an important topic needed for a complete model: how the components interact. Each component of the model interacts with the others by publishing messages to topics and reading messages from queues. To model such interaction, a simple and efficient combination of Kafka and Zookeeper can be used.
The following code can be used as an example of writing a message (`message`) to a topic (`topic`) through brokers located at predetermined addresses (`brokersIPs`).
```go
writer := kafka.NewWriter(kafka.WriterConfig{
	Brokers: brokersIPs,
	Topic:   topic,
})
// ...
if err := writer.WriteMessages(
	*context,
	kafka.Message{
		Key:   []byte(uuid.New().String()),
		Value: []byte(message),
	},
); err != nil {
	return err
}
```
Reading a message (`message`) from a topic (`topic`) through brokers located at predetermined addresses (`brokersIPs`) is carried out similarly.
```go
resource := kafka.NewReader(kafka.ReaderConfig{
	Brokers: brokersIPs,
	Topic:   topic,
	GroupID: groupID,
})
// ...
message, err := resource.ReadMessage(context)
if err != nil {
	// Handle the error
} else {
	// Handle the message
}
```
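In a long-running analyzer or corrector, this read usually sits in a loop; one possible shape, reusing `resource` and `context` from the snippet above (the `handle` function is a hypothetical placeholder):

```go
for {
	message, err := resource.ReadMessage(context)
	if err != nil {
		break // the context was cancelled or the reader was closed
	}
	handle(message.Value) // hypothetical handler for the payload
}
```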
I would like to point out that, in Kafka terminology, the analyzer, corrector, and control service are all producers and consumers, since they both read messages from and write messages to the Kafka cluster. Therefore, when designing the system, we need to predefine a set of broker addresses and establish connections for the following triplets (a possible topic layout is sketched after the list):
- Control Service -> KafkaCluster -> Analyzers;
- Analyzers -> KafkaCluster/PublishSubscriber -> Correctors;
- Correctors -> KafkaCluster -> Control Service.
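For example, these three links might map onto three explicitly named topics; the names below are assumptions, not fixed conventions:

```go
// A hypothetical topic layout for the three links above.
const (
	topicState       = "system-state"      // Control Service -> Analyzers
	topicAnomalies   = "anomaly-events"    // Analyzers -> Correctors
	topicCorrections = "correction-advice" // Correctors -> Control Service
)
```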
Creating a set of Kafka configuration files (`*.properties`) makes this task easy to carry out.
Conclusion
We have considered the idea of creating a self-regulating system and conducted tests of the load balancing algorithm for a system with a single control parameter. However, real systems are much more complex and may include multiple parameters. Nevertheless, the approach that evaluates the system's overall state and performs automatic load balancing has many advantages, such as ease of management and configuration, economic benefits when releasing excess resources, and others.
There is no need to fear that there may be too many control parameters, nor to create a separate chain of analyzers and correctors for each one. It is sufficient to assess the importance of each of the available resources for control and train the neural network to use them effectively.
Using a neural network also frees us from running clustering algorithms and statistical anomaly detection, which can be very resource-intensive. Even a simple neural network trained to detect outliers in harmonic oscillations makes it possible to determine quickly and accurately whether anomalies are present in real systems. If you have a more complex system whose behavior you can record and replay, you can train the neural network on data that actually occurred in your system. This is much more effective than the statistical assumption that all processes tend toward a normal distribution, since in reality we cannot be sure of anything.
You can improve the model of the self-regulating system by adding storage for states that were recognized neither as anomalous nor as normal. Such situations are rare but possible. In that case, you can evaluate these data sets yourself and use them to retrain the neural network, which significantly improves the model's performance.