Detecting Channels From FBC Spectrum Data Using Machine Learning
An integral part of FBC impairment detection is channel detection. For instance, the adjacency impairment looks at neighboring channels when deciding whether an impairment is present, and most deterministic FBC impairment algorithms use channel information in one way or another. In the beginning we did not have a good algorithm for detecting channels, so we decided to let the Machine do it for us. Additionally, by detecting channels ourselves we don't need to request the details of how channels are allocated from the operator. While the customer knows this information, it is hard to obtain; moreover, our independent channel map can actually help validate the operator's target allocation against the actual channel allocation as seen in the field.
For reference, the channels detected here are downstream QAM channels (DOCSIS <=3.0 data or DVB-C); the newer DOCSIS 3.1 OFDM channels are not covered.
The figure above shows an FBC downstream spectrum containing analog TV channels (the high-variance region from data points 1000 to 3800, which we ignore) and digital channels (marked red). Incidentally, these red areas are exactly the channels detected by our Machine Learning model. For reference, we will be discussing neural network classification; there are many more classifiers out there, such as Decision Trees or Gaussian Processes. If you're curious, have a look at the list here.
While one may assume that data is king and that no Machine Learning is possible without it, that could not be further from the truth - generating synthetic data and training a model on it is definitely possible and sensible. However, this method should be used carefully: if you create a complicated algorithm just for labelling synthetic data, the optimizer will learn a model of that algorithm and will not generalize well beyond it. Thus, labelling should be done with the fewest assumptions possible.
There are many ways of generating synthetic data; data augmentation, for instance, is widely used (e.g. pictures are rotated and added as new samples). For us, the motivation was that we had very little labelled data. A sketch of what synthetic generation could look like for spectrum data follows below.
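To make this concrete, here is a minimal sketch of how synthetic spectrum windows could be generated. The reverse-U channel shape, the noise floor, and the window size of 50 points follow the description in this post, but the specific amplitudes and widths are illustrative assumptions, not our exact generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_window(contains_channel, size=50):
    """Generate one spectrum window; label is 1 if it contains a channel."""
    window = rng.normal(loc=0.1, scale=0.02, size=size)  # noise floor
    if contains_channel:
        # A channel appears as a reverse-U (flat-topped) bump with soft edges.
        start, width = rng.integers(0, 20), rng.integers(25, 30)
        bump = np.ones(width) * rng.uniform(0.6, 0.9)
        bump[0] *= 0.5   # soften the rising edge
        bump[-1] *= 0.5  # soften the falling edge
        window[start:start + width] += bump
    return window, int(contains_channel)

# Build a balanced synthetic dataset of 10k windows.
samples = [synthetic_window(i % 2 == 0) for i in range(10_000)]
x = np.stack([s[0] for s in samples])
y = np.array([s[1] for s in samples])
```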
Before even beginning with anything Machine Learning related, you should always make sure to clean the data. Most importantly, it needs to be normalized for neural networks to learn well. We tried standardization (scaling to zero mean and unit variance) as well as MinMax normalization, and stuck with the latter.
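For reference, here is a sketch of the two normalization schemes we compared. Normalizing per window is an assumption on our part; the scaling statistics could also be computed over the whole dataset.

```python
import numpy as np

def standardize(spectrum):
    # Zero mean, unit variance (the "Gaussian" scheme we tried first).
    return (spectrum - spectrum.mean()) / spectrum.std()

def minmax_normalize(spectrum):
    # Scale amplitudes into [0, 1] - the scheme we stuck with.
    lo, hi = spectrum.min(), spectrum.max()
    return (spectrum - lo) / (hi - lo)
```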
Now, the first step was to create a test dataset - we ended up with roughly 120k labelled channels in a rather efficient way (manual labelling took no more than one hour per person). Below, you can see the matplotlib application we built for labelling. With six spectrum scans and 170 channels per scan, it does not take long to amass a sizable dataset. Note that in Machine Learning there are three distinct datasets: a train set, a validation set, and a test set. The validation set is split off from the train set to help with hyperparameter tuning (what learning rate is required to make the optimizer learn anything at all? how many hidden layers do we need?), while the test set is used to check whether the model is overfitting or underfitting.
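The split itself could look like the sketch below (the `x` and `y` arrays carry over from the synthetic-data sketch above). The 80/10/10 ratios are an illustrative assumption, not our exact split.

```python
from sklearn.model_selection import train_test_split

# Carve off the test set first, then split validation from the remaining train data.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=1/9, random_state=42)
# Result: 80% train, 10% validation, 10% test.
```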
After this, a bit of an adventure begins - while there are quite a few guidelines and a lot of theory on what to do next, it is still a bit of an art. We now investigate the model architecture. Since the problem is rather simple, with an input size of just 50 and a simple binary classification (channel or not channel), hidden layer sizes below 50 should suffice, and the number of hidden layers should not exceed half a dozen. The reason for this is straightforward and outlined here - in fully connected networks the number of parameters explodes quickly, and with it overfitting becomes a major issue. Since the channels in our problem are, after all, a nicely defined reverse U-shape, CNNs (Convolutional Neural Networks) are a straightforward pick. Keep in mind that CNNs are connected locally, i.e. a node in one layer is connected to only a few nodes in the next layer rather than all of them. This decreases the number of parameters dramatically. We quickly found that the problem really did not need many parameters at all and ended up using a single CNN layer.
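As a rough illustration, a model in this spirit could look like the following sketch. Only the input size of 50, the single convolutional layer, and the binary output come from the description above; the filter count, kernel size, and pooling choice are assumptions for illustration.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # One 50-point spectrum window in, locally connected convolution over it.
    tf.keras.layers.Conv1D(16, kernel_size=5, activation="relu", input_shape=(50, 1)),
    tf.keras.layers.GlobalMaxPooling1D(),
    # Single sigmoid output: channel or not channel.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```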
Ultimately, an extremely high accuracy is really important, since the FBC impairment detection algorithms we have may break with just 2-3 channels (out of >100) missing. Thus, while most models we tried learned rather well and reached quite a high accuracy, it was still just not enough - we wanted more. An excellent way to improve at this point is to add more data. This is exactly what we did, ending up with gigabytes of data (millions of samples). Getting that last 0.1% of accuracy was the toughest part, but it proved to be the most important step towards a high quality model.
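Putting the pieces together, a training run could look like this sketch. Variable names carry over from the earlier snippets; the epoch count and batch size are illustrative, not our production settings.

```python
# Reshape the (samples, 50) windows to (samples, 50, 1) for the Conv1D input.
history = model.fit(
    x_train[..., None], y_train,
    validation_data=(x_val[..., None], y_val),
    epochs=20,
    batch_size=256,
)

# Final, unbiased check on the held-out test set.
test_loss, test_acc = model.evaluate(x_test[..., None], y_test)
print(f"Test accuracy: {test_acc:.4f}")
```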
In summary, what should one do with raw data, few or no labels, and a problem to solve? Invest time in creating a good test set. Invest a lot of time in data preparation. Keep three datasets so you don't get caught out by overfitting. Try different types of layers (fully connected, CNN) and different hyperparameters. And keep the problem size in mind.
For reference, we used TensorFlow for training and evaluation.
This wraps up the post. I hope you gained some insight into the process of training neural networks - consider it the next time you face a numerical or spatial problem.