CAIR Deep Learning Research Publications

2023

1.
Davel MH, Lotz S, Theunissen MW, et al. Knowledge Discovery in Time Series Data. In: Deep Learning Indaba 2023; 2023.

• Complex time series data often encountered in scientific and engineering domains.
• Deep learning (DL) is particularly successful here:
– large data sets, multivariate input and/or output,
– highly complex sequences of interactions.
• Model interpretability:
– Ability to understand a model’s decisions in a given context [1].
– Techniques typically not originally developed for time series data.
– Time series interpretations themselves become uninterpretable.
• Knowledge Discovery:
– DL has potential to reveal interesting patterns in large data sets.
– Potential to produce novel insights about the task itself [2, 3].
• ‘know-it’: Collaborative project that studies knowledge discovery in time series data.

@inproceedings{507,
  author = {Marelie Davel and Stefan Lotz and Marthinus Theunissen and Almaro de Villiers and Chara Grant and Randle Rabe and Stefan Schoombie and Cleo Conacher},
  title = {Knowledge Discovery in Time Series Data},
  abstract = {• Complex time series data often encountered in scientific and engineering domains.
• Deep learning (DL) is particularly successful here:
– large data sets, multivariate input and/or output,
– highly complex sequences of interactions.
• Model interpretability:
– Ability to understand a model’s decisions in a given context [1].
– Techniques typically not originally developed for time series data.
– Time series interpretations themselves become uninterpretable.
• Knowledge Discovery:
– DL has potential to reveal interesting patterns in large data sets.
– Potential to produce novel insights about the task itself [2, 3].
• ‘know-it’: Collaborative project that studies knowledge discovery in
time series data.},
  year = {2023},
  journal = {Deep Learning Indaba 2023},
  month = {September 2023},
}
1.
Olivier JC, Barnard E. Minimum phase finite impulse response filter design. The Institution of Engineering and Technology. 2023;17. doi:https://doi.org/10.1049/sil2.12166.

The design of minimum phase finite impulse response (FIR) filters is considered. The study demonstrates that the residual errors achieved by current state-of-the-art design methods are nowhere near the smallest error possible on a finite resolution digital computer. This is shown to be due to conceptual errors in the literature pertaining to what constitutes a factorable linear phase filter. This study shows that factorisation is possible with a zero residual error (in the absence of machine finite resolution error) if the linear operator or matrix representing the linear phase filter is positive definite. Methodology is proposed able to design a minimum phase filter that is optimal—in the sense that the residual error is limited only by the finite precision of the digital computer, with no systematic error. The study presents practical application of the proposed methodology by designing two minimum phase Chebyshev FIR filters. Results are compared to state-of-the-art methods from the literature, and it is shown that the proposed methodology is able to reduce currently achievable residual errors by several orders of magnitude.

@article{506,
  author = {Jan Olivier and Etienne Barnard},
  title = {Minimum phase finite impulse response filter design},
  abstract = {The design of minimum phase finite impulse response (FIR) filters is considered. The study demonstrates that the residual errors achieved by current state-of-the-art design methods are nowhere near the smallest error possible on a finite resolution digital computer. This is shown to be due to conceptual errors in the literature pertaining to what constitutes a factorable linear phase filter. This study shows that factorisation is possible with a zero residual error (in the absence of machine finite resolution error) if the linear operator or matrix representing the linear phase filter is positive definite. Methodology is proposed able to design a minimum phase filter that is optimal—in the sense that the residual error is limited only by the finite precision of the digital computer, with no systematic error. The study presents practical application of the proposed methodology by designing two minimum phase Chebyshev FIR filters. Results are compared to state-of-the-art methods from the literature, and it is shown that the proposed methodology is able to reduce currently achievable residual errors by several orders of magnitude.},
  year = {2023},
  journal = {The Institution of Engineering and Technology},
  volume = {17},
  edition = {7},
  month = {July 2023},
  doi = {https://doi.org/10.1049/sil2.12166},
}
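The factorisation at the heart of this design problem can be sketched in a few lines. The snippet below is not the authors' methodology (which concerns positive definiteness and machine-precision limits); it is the textbook root-selection view: a factorable linear-phase filter's zeros occur in reciprocal pairs, and keeping only the zeros inside the unit circle yields the minimum phase factor.

```python
import numpy as np

def minimum_phase_factor(h):
    """Extract the minimum phase factor g of a factorable linear-phase FIR
    filter h, so that np.convolve(g, g[::-1]) reconstructs h. Assumes the
    zeros of h occur in reciprocal pairs strictly off the unit circle."""
    roots = np.roots(h)
    inside = roots[np.abs(roots) < 1.0]   # zeros of the minimum phase factor
    g = np.real(np.poly(inside))          # monic polynomial with those zeros
    c = h[0] / g[-1]                      # overall gain of the factorisation
    return np.sqrt(c) * g
```

Root finding itself accumulates error for long filters, which is one reason the residual-error question studied in the paper is non-trivial.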
1.
Ngorima SA, Helberg ASJ, Davel MH. Sequence Based Deep Neural Networks for Channel Estimation in Vehicular Communication Systems. In: Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science. Vol. 1976. Springer, Cham; 2023. doi:https://doi.org/10.1007/978-3-031-49002-6_12.

Channel estimation is a critical component of vehicular communications systems, especially in high-mobility scenarios. The IEEE 802.11p standard uses preamble-based channel estimation, which is not sufficient in these situations. Recent work has proposed using deep neural networks for channel estimation in IEEE 802.11p. While these methods improved on earlier baselines, they can still perform poorly, especially in very high mobility scenarios. This study proposes a novel approach that uses two independent LSTM cells in parallel and averages their outputs to update cell states. The proposed approach improves normalised mean square error, surpassing existing deep learning approaches in very high mobility scenarios.

@inbook{504,
  author = {Simbarashe Ngorima and Albert Helberg and Marelie Davel},
  title = {Sequence Based Deep Neural Networks for Channel Estimation in Vehicular Communication Systems},
  abstract = {Channel estimation is a critical component of vehicular communications systems, especially in high-mobility scenarios. The IEEE 802.11p standard uses preamble-based channel estimation, which is not sufficient in these situations. Recent work has proposed using deep neural networks for channel estimation in IEEE 802.11p. While these methods improved on earlier baselines, they can still perform poorly, especially in very high mobility scenarios. This study proposes a novel approach that uses two independent LSTM cells in parallel and averages their outputs to update cell states. The proposed approach improves normalised mean square error, surpassing existing deep learning approaches in very high mobility scenarios.},
  year = {2023},
  journal = {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
  volume = {1976},
  pages = {176 - 186},
  month = {29 November 2023},
  publisher = {Springer, Cham},
  isbn = {978-3-031-49001-9},
  doi = {https://doi.org/10.1007/978-3-031-49002-6_12},
}
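The core architectural idea, two independent LSTM cells run in parallel with their outputs averaged to update the cell states, can be sketched directly. This NumPy sketch illustrates that averaging scheme only; the exact wiring (what is averaged and what is fed back) is an assumption here, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal NumPy LSTM cell with random (untrained) weights."""
    def __init__(self, n_in, n_hid, rng):
        s = 1.0 / np.sqrt(n_hid)
        self.W = rng.uniform(-s, s, (4 * n_hid, n_in + n_hid))
        self.b = np.zeros(4 * n_hid)
        self.n_hid = n_hid

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def dual_lstm(cell_a, cell_b, xs):
    """Run two independent LSTM cells in parallel on the same sequence and
    average their outputs at each step, feeding the averaged states back
    into both cells (a sketch of the parallel-cell idea in the abstract)."""
    h = c = np.zeros(cell_a.n_hid)
    outputs = []
    for x in xs:
        ha, ca = cell_a.step(x, h, c)
        hb, cb = cell_b.step(x, h, c)
        h, c = (ha + hb) / 2, (ca + cb) / 2   # averaged state update
        outputs.append(h)
    return np.stack(outputs)
```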
1.
Lotz S, Nel A, Wicks R, et al. The Radial Variation of the Solar Wind Turbulence Spectra near the Kinetic Break Scale from Parker Solar Probe Measurements. The Astrophysical Journal. 2023;942(2). The American Astronomical Society. doi:10.3847/1538-4357/aca903.

In this study we examine the radial dependence of the inertial and dissipation range indices, as well as the spectral break separating the inertial and dissipation range in power density spectra of interplanetary magnetic field fluctuations using Parker Solar Probe data from the fifth solar encounter between ∼0.1 and ∼0.7 au. The derived break wavenumber compares reasonably well with previous estimates at larger radial distances and is consistent with gyro-resonant damping of Alfvénic fluctuations by thermal protons. We find that the inertial scale power-law index varies between approximately −1.65 and −1.45. This is consistent with either the Kolmogorov (−5/3) or Iroshnikov–Kraichnan (−3/2) values, and has a very weak radial dependence with a possible hint that the spectrum becomes steeper closer to the Sun. The dissipation range power-law index, however, has a clear dependence on radial distance (and turbulence age), decreasing from −3 near 0.7 au (4 days) to −4 [±0.3] at 0.1 au (0.75 days) closer to the Sun.

@article{503,
  author = {Stefan Lotz and Amore Nel and Robert Wicks and Owen Roberts and Nicholas Engelbrecht and Roelf Strauss and Gert Botha and Eduard Kontar and Alexander Pitňa and Stuart Bale},
  title = {The Radial Variation of the Solar Wind Turbulence Spectra near the Kinetic Break Scale from Parker Solar Probe Measurements},
  abstract = {In this study we examine the radial dependence of the inertial and dissipation range indices, as well as the spectral break separating the inertial and dissipation range in power density spectra of interplanetary magnetic field fluctuations using Parker Solar Probe data from the fifth solar encounter between ∼0.1 and ∼0.7 au. The derived break wavenumber compares reasonably well with previous estimates at larger radial distances and is consistent with gyro-resonant damping of Alfvénic fluctuations by thermal protons. We find that the inertial scale power-law
index varies between approximately −1.65 and −1.45. This is consistent with either the Kolmogorov (−5/3) or Iroshnikov–Kraichnan (−3/2) values, and has a very weak radial dependence with a possible hint that the spectrum becomes steeper closer to the Sun. The dissipation range power-law index, however, has a clear dependence on radial distance (and turbulence age), decreasing from −3 near 0.7 au (4 days) to −4 [±0.3] at 0.1 au (0.75 days) closer to the Sun.},
  year = {2023},
  journal = {The Astrophysical Journal},
  volume = {942},
  edition = {2},
  month = {January 2023},
  publisher = {The American Astronomical Society},
  doi = {10.3847/1538-4357/aca903},
}
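The kind of analysis described, estimating an inertial-range slope, a dissipation-range slope, and the break between them, can be illustrated with a two-segment power-law fit in log-log space. This is a generic sketch, not the paper's fitting procedure:

```python
import numpy as np

def fit_spectral_break(k, psd, k_break_grid):
    """Fit a two-segment power law to a spectrum and locate the break.
    For each candidate break wavenumber, fit straight lines to the log-log
    data on either side and keep the break with the smallest total residual."""
    logk, logp = np.log10(k), np.log10(psd)
    best = None
    for kb in k_break_grid:
        lo, hi = logk < np.log10(kb), logk >= np.log10(kb)
        if lo.sum() < 3 or hi.sum() < 3:
            continue  # need enough points on both sides of the break
        s1, i1 = np.polyfit(logk[lo], logp[lo], 1)
        s2, i2 = np.polyfit(logk[hi], logp[hi], 1)
        resid = (np.sum((np.polyval([s1, i1], logk[lo]) - logp[lo]) ** 2)
                 + np.sum((np.polyval([s2, i2], logk[hi]) - logp[hi]) ** 2))
        if best is None or resid < best[0]:
            best = (resid, kb, s1, s2)
    _, kb, inertial_slope, dissipation_slope = best
    return kb, inertial_slope, dissipation_slope
```

On a synthetic spectrum with slopes of -5/3 and -3, such a fit recovers both indices and the break location, which is the shape of result the paper reports as a function of radial distance.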
1.
Ramalepe S, Modipa TI, Davel MH. The Analysis of the Sepedi-English Code-switched Radio News Corpus. Journal of the Digital Humanities Association of Southern Africa. 2023;4(01): Proceedings of the 3rd Workshop on Resources for African Indigenous Languages (RAIL). doi:https://doi.org/10.55492/dhasa.v4i01.4444.

Code-switching is a phenomenon that occurs mostly in multilingual countries where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for the generation of code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news and the frequency of code-switching on read news were considered and analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained the optimal loss rate of 2,361 with an accuracy of 66%.

@article{502,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {The Analysis of the Sepedi-English Code-switched Radio News Corpus},
  abstract = {Code-switching is a phenomenon that occurs mostly in multilingual countries where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for the generation of code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news and the frequency of code-switching on read news were considered and analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained the optimal loss rate of 2,361 with an accuracy of 66%.},
  year = {2023},
  journal = {Journal of the Digital Humanities Association of Southern Africa},
  volume = {4},
  edition = {1},
  month = {2023-01-25},
  issue = {Vol. 4 No. 01 (2022): Proceedings of the 3rd workshop on Resources for African Indigenous Languages (RAIL)},
  doi = {https://doi.org/10.55492/dhasa.v4i01.4444},
}
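The corpus statistic reported above, the fraction of the data that is code-switched (1.1%), reduces to a simple count. The sketch below measures it as the fraction of sentences containing at least one embedded-language word; the tiny word list and the Sepedi fragments are illustrative only, not the study's lexicon or data.

```python
# Illustrative embedded-language (English) word list; the study's
# actual detection of code-switched content may work differently.
ENGLISH_WORDS = {"news", "weather", "government", "budget"}

def code_switch_frequency(sentences):
    """Fraction of sentences containing at least one English word."""
    switched = sum(
        1 for s in sentences
        if any(w.lower() in ENGLISH_WORDS for w in s.split())
    )
    return switched / len(sentences)
```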
1.
Ramalepe S, Modipa TI, Davel MH. Transformer-based text generation for code-switched Sepedi-English news. In: Southern African Conference for Artificial Intelligence Research (SACAIR); 2023.

Code-switched data is rarely available in written form and this makes the development of large datasets required to train code-switched language models difficult. Currently, available Sepedi-English code-switched corpora are not large enough to train a Transformer-based model for this language pair. In prior work, larger synthetic datasets have been constructed using a combination of a monolingual and a parallel corpus to approximate authentic code-switched text. In this study, we develop and analyse a new Sepedi-English news dataset (SepEnews). We collect and curate data from local radio news bulletins and use this to augment two existing sources collected from Sepedi newspapers and news headlines, respectively. We then develop and train a Transformer-based model for generating historic code-switched news, and demonstrate and analyse the system’s performance.

@inproceedings{501,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {Transformer-based text generation for code-switched Sepedi-English news},
  abstract = {Code-switched data is rarely available in written form and this makes the development of large datasets required to train code-switched language models difficult. Currently, available Sepedi-English code-switched corpora are not large enough to train a Transformer-based model for this language pair. In prior work, larger synthetic datasets have been constructed using a combination of a monolingual and a parallel corpus to approximate authentic code-switched text. In this study, we develop and analyse a new Sepedi-English news dataset (SepEnews). We collect and curate data from local radio news bulletins and use this to augment two existing sources collected from Sepedi newspapers and news headlines, respectively. We then develop and train a Transformer-based model for generating historic code-switched news, and demonstrate and analyse the system’s performance.},
  year = {2023},
  journal = {Southern African Conference for Artificial Intelligence Research (SACAIR)},
  pages = {84 - 97},
  month = {December 2023},
}
1.
Middel C, Davel MH. Comparing Transformer-based and GBDT models on tabular data: A Rossmann Store Sales case study. In: Southern African Conference for Artificial Intelligence Research (SACAIR); 2023.

Heterogeneous tabular data is a common and important data format. This empirical study investigates how the performance of deep transformer models compares against benchmark gradient boosting decision tree (GBDT) methods, the more typical modelling approach. All models are optimised using a Bayesian hyperparameter optimisation protocol, which provides a stronger comparison than the random grid search hyperparameter optimisation utilized in earlier work. Since feature skewness is typically handled differently for GBDT and transformer-based models, we investigate the effect of a pre-processing step that normalises feature distribution on the model comparison process. Our analysis is based on the Rossmann Store Sales dataset, a widely recognized benchmark for regression tasks.

@inproceedings{500,
  author = {Coenraad Middel and Marelie Davel},
  title = {Comparing Transformer-based and GBDT models on tabular data: A Rossmann Store Sales case study},
  abstract = {Heterogeneous tabular data is a common and important data format. This empirical study investigates how the performance of deep transformer models compares against benchmark gradient boosting decision tree (GBDT) methods, the more typical modelling approach. All models are optimised using a Bayesian hyperparameter optimisation protocol, which provides a stronger comparison than the random grid search hyperparameter optimisation utilized in earlier work. Since feature skewness is typically handled differently for GBDT and transformer-based models, we investigate the effect of a pre-processing step that normalises feature distribution on the model comparison process. Our analysis is based on the Rossmann Store Sales dataset, a widely recognized benchmark for regression tasks.},
  year = {2023},
  journal = {Southern African Conference for Artificial Intelligence Research (SACAIR)},
  pages = {115 - 129},
  month = {December 2023},
}
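The pre-processing step mentioned above, normalising skewed feature distributions before comparing model families, can be sketched as a signed log transform followed by standardisation. The exact transform the study uses is not specified here, so treat this as one plausible choice:

```python
import numpy as np

def reduce_skew(x):
    """Normalise a skewed feature: signed log1p transform (handles zeros
    and negative values) followed by zero-mean, unit-variance scaling."""
    y = np.sign(x) * np.log1p(np.abs(x))
    return (y - y.mean()) / y.std()
```

Tree-based GBDT models are invariant to such monotone transforms, while neural models are not, which is why handling skewness consistently matters for a fair comparison.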

2022

1.
Ramalepe SP, Modipa TI, Davel MH. The development of a Sepedi text generation model using transformers. In: Southern Africa Telecommunication Networks and Applications Conference (SATNAC); 2022.

Text generation is one of the important sub-tasks of natural language generation (NLG), and aims to produce humanly readable text given some input text. Deep learning approaches based on neural networks have been proposed to solve text generation tasks. Although these models can generate text, they do not necessarily capture long-term dependencies accurately, making it difficult to coherently generate longer sentences. Transformer-based models have shown significant improvement in text generation. However, these models are computationally expensive and data hungry. In this study, we develop a Sepedi text generation model using a Transformer-based approach and explore its performance. The developed model has one Transformer block with causal masking on the attention layers and two separate embedding layers. To train the model, we use the National Centre for Human Language Technology (NCHLT) Sepedi text corpus. Our experimental setup varied the model embedding size, batch size and the sequence length. The final model was able to reconstruct unseen test data with 75% accuracy: the highest accuracy achieved to date, using a Sepedi corpus.

@inproceedings{511,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {The development of a Sepedi text generation model using transformers},
  abstract = {Text generation is one of the important sub-tasks of natural language generation (NLG), and aims to produce humanly readable text given some input text. Deep learning approaches based on neural networks have been proposed to solve text generation tasks. Although these models can generate text, they do not necessarily capture long-term dependencies accurately, making it difficult to coherently generate longer sentences. Transformer-based models have shown significant improvement in text generation. However, these models are computationally expensive and data hungry. In this study, we develop a Sepedi text generation model using a Transformer-based approach and explore its performance. The developed model has one Transformer block with causal masking on the attention layers and two separate embedding layers. To train the model, we use the National Centre for Human Language Technology (NCHLT) Sepedi text corpus. Our experimental setup varied the model embedding size, batch size and the sequence length. The final model was able to reconstruct unseen test data with 75% accuracy: the highest accuracy achieved to date, using a Sepedi corpus.},
  year = {2022},
  journal = {Southern Africa Telecommunication Networks and Applications Conference (SATNAC)},
  pages = {51 - 56},
  month = {August 2022},
}
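The causal masking on the attention layers is what lets such a model generate text left to right: position t may only attend to positions up to t. A minimal single-head NumPy sketch of that mechanism (illustrative random weights; the paper's block also includes two separate embedding layers, which are omitted here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a (T, d) sequence with a causal mask:
    scores above the diagonal are set to -inf, so each position attends only
    to itself and earlier positions."""
    T = x.shape[0]
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores[np.triu(np.ones((T, T), dtype=bool), 1)] = -np.inf
    return softmax(scores) @ V
```

A quick way to verify causality: perturbing the last input position must leave all earlier output positions unchanged.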
1.
Oosthuizen M, Hoffman A, Davel MH. A Comparative Study of Graph Neural Network Speed Prediction during Periods of Congestion. In: Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - NCTA. Vol. 1; 2022. doi:10.5220/0011374100003332.

Traffic speed prediction using deep learning has been the topic of many studies. In this paper, we analyse the performance of Graph Neural Network-based techniques during periods of traffic congestion. We first compare a selection of recently proposed techniques that claim to achieve good results using the METR-LA and PeMS-BAY data sets. We then investigate the performance of three of these approaches – Graph WaveNet, Spacetime Neural Network (STNN) and Spatio-Temporal Attention Wavenet (STAWnet) – during congested periods, using recurrent congestion patterns to set a threshold for general congestion through the entire traffic network. Our results show that performance deteriorates significantly during congested time periods, which is concerning, as traffic speed prediction is usually of most value during times of congestion. We also found that, while the above approaches perform almost equally in the absence of congestion, there are much bigger differences in performance during periods of congestion.

@inproceedings{510,
  author = {Marko Oosthuizen and Alwyn Hoffman and Marelie Davel},
  title = {A Comparative Study of Graph Neural Network Speed Prediction during Periods of Congestion},
  abstract = {Traffic speed prediction using deep learning has been the topic of many studies. In this paper, we analyse the performance of Graph Neural Network-based techniques during periods of traffic congestion. We first compare a selection of recently proposed techniques that claim to achieve good results using the METR-LA and PeMS-BAY data sets. We then investigate the performance of three of these approaches – Graph WaveNet, Spacetime Neural Network (STNN) and Spatio-Temporal Attention Wavenet (STAWnet) – during congested periods, using recurrent congestion patterns to set a threshold for general congestion through the entire traffic network. Our results show that performance deteriorates significantly during congested time periods, which is concerning, as traffic speed prediction is usually of most value during times of congestion. We also found that, while the above approaches perform almost equally in the absence of congestion, there are much bigger differences in performance during periods of congestion.},
  year = {2022},
  journal = {Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - NCTA},
  volume = {1},
  pages = {331 - 338},
  month = {October 2022},
  isbn = {978-989-758-611-8},
  doi = {10.5220/0011374100003332},
}
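Splitting the evaluation into congested and uncongested periods requires a congestion criterion. The paper derives its threshold from recurrent congestion patterns; the quantile-based rule below is an illustrative stand-in for that idea, not the paper's definition:

```python
import numpy as np

def congestion_mask(speeds, quantile=0.10):
    """Flag congested time steps: those where the network-average speed
    falls below a low quantile of its own history.
    speeds has shape (time_steps, sensors)."""
    mean_speed = speeds.mean(axis=1)              # average over sensors
    threshold = np.quantile(mean_speed, quantile)
    return mean_speed < threshold
```

Evaluating prediction error separately on the flagged steps is what exposes the performance drop the paper reports.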
1.
Oosthuizen AJ, Helberg ASJ, Davel MH. Adversarial training for channel state information estimation in LTE multi-antenna systems. In: Southern African Conference for Artificial Intelligence Research. Vol. 1734. Springer, Cham; 2022. doi:https://doi.org/10.1007/978-3-031-22321-1_1.

Deep neural networks can be utilised for channel state information (CSI) estimation in wireless communications. We aim to decrease the bit error rate of such networks without increasing their complexity, since the wireless environment requires solutions with high performance while constraining implementation cost. For this reason, we investigate the use of adversarial training, which has been successfully applied to image super-resolution tasks that share similarities with CSI estimation tasks. CSI estimators are usually trained in a Single-In Single-Out (SISO) configuration to estimate the channel between two specific antennas and then applied to multi-antenna configurations. We show that the performance of neural networks in the SISO training environment is not necessarily indicative of their performance in multi-antenna systems. The analysis shows that adversarial training does not provide advantages in the SISO environment; however, adversarially trained models can outperform non-adversarially trained models when applying antenna diversity to Long-Term Evolution systems. The use of a feature extractor network is also investigated in this study and is found to have the potential to enhance the performance of Multiple-In Multiple-Out antenna configurations at higher SNRs. This study emphasises the importance of testing neural networks in the context of use while also showing possible advantages of adversarial training in multi-antenna systems without necessarily increasing network complexity.

@{509,
  author = {Andrew Oosthuizen and Albert Helberg and Marelie Davel},
  title = {Adversarial training for channel state information estimation in LTE multi-antenna systems},
  abstract = {Deep neural networks can be utilised for channel state information (CSI) estimation in wireless communications. We aim to decrease the bit error rate of such networks without increasing their complexity, since the wireless environment requires solutions with high performance while constraining implementation cost. For this reason, we investigate the use of adversarial training, which has been successfully applied to image super-resolution tasks that share similarities with CSI estimation tasks. CSI estimators are usually trained in a Single-In Single-Out (SISO) configuration to estimate the channel between two specific antennas and then applied to multi-antenna configurations. We show that the performance of neural networks in the SISO training environment is not necessarily indicative of their performance in multi-antenna systems. The analysis shows that adversarial training does not provide advantages in the SISO environment; however, adversarially trained models can outperform non-adversarially trained models when applying antenna diversity to Long-Term Evolution systems. The use of a feature extractor network is also investigated in this study and is found to have the potential to enhance the performance of Multiple-In Multiple-Out antenna configurations at higher SNRs. This study emphasises the importance of testing neural networks in the context of use while also showing possible advantages of adversarial training in multi-antenna systems without necessarily increasing network complexity.},
  year = {2022},
  journal = {Southern African Conference for Artificial Intelligence Research},
  volume = {1734},
  pages = {3 - 17},
  month = {November 2022},
  publisher = {Springer, Cham},
  isbn = {978-3-031-22320-4},
  doi = {https://doi.org/10.1007/978-3-031-22321-1_1},
}
1.
Fourie E, Davel MH, Versfeld J. Neural speech processing for whale call detection. In: Southern African Conference for AI Research (SACAIR). Vol. 1734. Springer, Cham; 2022. doi:https://doi.org/10.1007/978-3-031-22321-1_19.

Passive acoustic monitoring with hydrophones makes it possible to detect the presence of marine animals over large areas. For monitoring to be cost-effective, this process should be fully automated. We explore a new approach to detecting whale calls, using an end-to-end neural architecture and traditional speech features. We compare the results of the new approach with a convolutional neural network (CNN) applied to spectrograms, currently the standard approach to whale call detection. Experiments are conducted using the “Acoustic trends for the blue and fin whale library” from the Australian Antarctic Data Centre (AADC). We experiment with different types of speech features (mel frequency cepstral coefficients and filter banks) and different ways of framing the task. We demonstrate that a time delay neural network is a viable solution for whale call detection, with the additional benefit that spectrogram tuning – required to obtain high-quality spectrograms in challenging acoustic conditions – is no longer necessary. While the initial speech feature-based system (accuracy 96%) did not outperform the CNN (accuracy 98%) when trained on exactly the same dataset, it presents a viable approach to explore further.

@inproceedings{508,
  author = {Edrich Fourie and Marelie Davel and Jaco Versfeld},
  title = {Neural speech processing for whale call detection},
  abstract = {Passive acoustic monitoring with hydrophones makes it possible to detect the presence of marine animals over large areas. For monitoring to be cost-effective, this process should be fully automated. We explore a new approach to detecting whale calls, using an end-to-end neural architecture and traditional speech features. We compare the results of the new approach with a convolutional neural network (CNN) applied to spectrograms, currently the standard approach to whale call detection. Experiments are conducted using the “Acoustic trends for the blue and fin whale library” from the Australian Antarctic Data Centre (AADC). We experiment with different types of speech features (mel frequency cepstral coefficients and filter banks) and different ways of framing the task. We demonstrate that a time delay neural network is a viable solution for whale call detection, with the additional benefit that spectrogram tuning – required to obtain high-quality spectrograms in challenging acoustic conditions – is no longer necessary. While the initial speech feature-based system (accuracy 96%) did not outperform the CNN (accuracy 98%) when trained on exactly the same dataset, it presents a viable approach to explore further.},
  year = {2022},
  journal = {Southern African Conference for AI Research (SACAIR)},
  volume = {1734},
  pages = {276 - 290},
  month = {November 2022},
  publisher = {Springer, Cham},
  doi = {https://doi.org/10.1007/978-3-031-22321-1_19},
}
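The "traditional speech features" mentioned (filter banks, and the MFCCs derived from them) can be computed with a short NumPy routine. This is a generic log mel filterbank sketch with arbitrary parameters, not the paper's feature configuration:

```python
import numpy as np

def log_mel_filterbank(signal, sr=1000, n_fft=256, n_mels=8):
    """Log mel filterbank energies for a single frame: power spectrum,
    triangular filters spaced evenly on the mel scale, then a log."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    spectrum = np.abs(np.fft.rfft(signal, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    fbank = np.zeros(n_mels)
    for i in range(n_mels):
        lo, mid, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        up = (freqs - lo) / (mid - lo)       # rising edge of the triangle
        down = (hi - freqs) / (hi - mid)     # falling edge
        weights = np.clip(np.minimum(up, down), 0.0, None)
        fbank[i] = weights @ spectrum
    return np.log(fbank + 1e-10)
```

Applying a discrete cosine transform to these log energies would give MFCCs, the other feature type the paper compares.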
1.
Theunissen MW, Mouton C, Davel MH. The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs. In: Artificial Intelligence Research (SACAIR 2022), Communications in Computer and Information Science. Vol. 1734. Springer, Cham; 2022. doi:https://doi.org/10.48550/arXiv.2302.06925.

Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. A global estimate of margin size is usually used in the literature. In this work, we point out seldom considered nuances regarding classification margins. Notably, we demonstrate that some types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.

@inbook{505,
  author = {Marthinus Theunissen and Coenraad Mouton and Marelie Davel},
  title = {The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs},
  abstract = {Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. A global estimate of margin size is usually used in the literature. In this work, we point out seldom considered nuances regarding classification margins. Notably, we demonstrate that some types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.},
  year = {2022},
  journal = {Artificial Intelligence Research (SACAIR 2022), Communications in Computer and Information Science},
  volume = {1734},
  pages = {78 - 92},
  month = {November 2022},
  publisher = {Springer, Cham},
  doi = {https://doi.org/10.48550/arXiv.2302.06925},
}
1.
Heymans W, Davel MH, Van Heerden CJ. Efficient acoustic feature transformation in mismatched environments using a Guided-GAN. Speech Communication. 2022;143. doi:https://doi.org/10.1016/j.specom.2022.07.002.

We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good quality data, and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional loss term that guides the generator to create acoustic features that are better classified by the baseline.
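The guidance idea can be sketched as a generator objective with an extra loss term supplied by the frozen baseline acoustic model. The function names, the cross-entropy form, and the weighting below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cross_entropy(probs, target):
    # Negative log-likelihood of the correct class under the frozen
    # baseline acoustic model (hypothetical guidance term).
    return -np.log(probs[target])

def generator_loss(adv_score, baseline_probs, target, guide_weight=1.0):
    """adv_score: discriminator output in (0, 1) for the generated features.
    baseline_probs: the frozen baseline model's posteriors for those features.
    The guidance term pushes the generator to produce features the baseline
    classifies well, so no parallel clean/noisy data is needed."""
    adversarial = -np.log(adv_score)                   # fool the discriminator
    guidance = cross_entropy(baseline_probs, target)   # stay classifiable
    return adversarial + guide_weight * guidance

loss = generator_loss(adv_score=0.5,
                      baseline_probs=np.array([0.1, 0.8, 0.1]),
                      target=1)   # -> approx 0.916
```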

@article{492,
  author = {Walter Heymans and Marelie Davel and Charl Van Heerden},
  title = {Efficient acoustic feature transformation in mismatched environments using a Guided-GAN},
  abstract = {We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good quality data, and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional loss term that guides the generator to create acoustic features that are better classified by the baseline.},
  year = {2022},
  journal = {Speech Communication},
  volume = {143},
  pages = {10 - 20},
  month = {09/2022},
  doi = {https://doi.org/10.1016/j.specom.2022.07.002},
}
1.
Oosthuizen AJ, Davel MH, Helberg A. Multi-Layer Perceptron for Channel State Information Estimation: Design Considerations. In: Southern Africa Telecommunication Networks and Applications Conference (SATNAC). Fancourt, George; 2022.

The accurate estimation of channel state information (CSI) is an important aspect of wireless communications. In this paper, a multi-layer perceptron (MLP) is developed as a CSI estimator in long-term evolution (LTE) transmission conditions. The representation of the CSI data is investigated in conjunction with batch normalisation and the representational ability of MLPs. It is found that discontinuities in the representational feature space can cripple an MLP’s ability to accurately predict CSI when noise is present. Different ways in which to mitigate this effect are analysed and a solution developed, initially in the context of channels that are only affected by additive white Gaussian noise. The developed architecture is then applied to more complex channels with various delay profiles and Doppler spread. The performance of the proposed MLP is shown to be comparable with LTE minimum mean squared error (MMSE), and to outperform least square (LS) estimation over a range of channel conditions.
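The discontinuity problem the abstract refers to can be illustrated with a small sketch (a toy example under assumed representations, not the paper's design): encoding a complex channel coefficient as (magnitude, phase) introduces a wrap at ±π, so two nearly identical channels can map to distant feature vectors, while a (real, imaginary) encoding stays continuous.

```python
import numpy as np

def polar(h):
    # Magnitude/phase features: discontinuous at the +/-pi phase wrap.
    return np.array([np.abs(h), np.angle(h)])

def rect(h):
    # Real/imaginary features: continuous everywhere.
    return np.array([h.real, h.imag])

# Two almost identical channel coefficients straddling the wrap point.
h1 = np.exp(1j * (np.pi - 0.01))
h2 = np.exp(1j * (-np.pi + 0.01))

gap_polar = np.linalg.norm(polar(h1) - polar(h2))  # large: ~2*pi - 0.02
gap_rect = np.linalg.norm(rect(h1) - rect(h2))     # small: ~0.02
```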

@inproceedings{491,
  author = {Andrew Oosthuizen and Marelie Davel and Albert Helberg},
  title = {Multi-Layer Perceptron for Channel State Information Estimation: Design Considerations},
  abstract = {The accurate estimation of channel state information (CSI) is an important aspect of wireless communications. In this paper, a multi-layer perceptron (MLP) is developed as a CSI estimator in long-term evolution (LTE) transmission conditions. The representation of the CSI data is investigated in conjunction with batch normalisation and the representational ability of MLPs. It is found that discontinuities in the representational feature space can cripple an MLP’s ability to accurately predict CSI when noise is present. Different ways in which to mitigate this effect are analysed and a solution developed, initially in the context of channels that are only affected by additive white Gaussian noise. The developed architecture is then applied to more complex channels with various delay profiles and Doppler spread. The performance of the proposed MLP is shown to be comparable with LTE minimum mean squared error (MMSE), and to outperform least square (LS) estimation over a range of channel conditions.},
  year = {2022},
  journal = {Southern Africa Telecommunication Networks and Applications Conference (SATNAC)},
  pages = {94 - 99},
  month = {08/2022},
  address = {Fancourt, George},
}
1.
Modipa T, Davel MH. Two Sepedi-English code-switched speech corpora. Language Resources and Evaluation. 2022;56. doi: https://doi.org/10.1007/s10579-022-09592-6 (Read here: https://rdcu.be/cO6lD).

We report on the development of two reference corpora for the analysis of Sepedi-English code-switched speech in the context of automatic speech recognition. For the first corpus, possible English events were obtained from an existing corpus of transcribed Sepedi-English speech. The second corpus is based on the analysis of radio broadcasts: actual instances of code switching were transcribed and reproduced by a number of native Sepedi speakers. We describe the process to develop and verify both corpora and perform an initial analysis of the newly produced data sets. We find that, in naturally occurring speech, the frequency of code switching is unexpectedly high for this language pair, and that the continuum of code switching (from unmodified embedded words to loanwords absorbed into the matrix language) makes this a particularly challenging task for speech recognition systems.

@article{483,
  author = {Thipe Modipa and Marelie Davel},
  title = {Two Sepedi-English code-switched speech corpora},
  abstract = {We report on the development of two reference corpora for the analysis of Sepedi-English code-switched speech in the context of automatic speech recognition. For the first corpus, possible English events were obtained from an existing corpus of transcribed Sepedi-English speech. The second corpus is based on the analysis of radio broadcasts: actual instances of code switching were transcribed and reproduced by a number of native Sepedi speakers. We describe the process to develop and verify both corpora and perform an initial analysis of the newly produced data sets. We find that, in naturally occurring speech, the frequency of code switching is unexpectedly high for this language pair, and that the continuum of code switching (from unmodified embedded words to loanwords absorbed into the matrix language) makes this a particularly challenging task for speech recognition systems.},
  year = {2022},
  journal = {Language Resources and Evaluation},
  volume = {56},
  publisher = {Springer},
  address = {South Africa},
  url = {https://rdcu.be/cO6lD},
  doi = {https://doi.org/10.1007/s10579-022-09592-6},
}