Research Publications

2017

Van Niekerk, D. R. (2017). Evaluating acoustic modelling of lexical stress for Afrikaans speech synthesis. In Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech). Bloemfontein, South Africa. http://doi.org/10.1109/RoboMech.2017.8261128

An explicit lexical stress feature is investigated for statistical parametric speech synthesis in Afrikaans: Firstly, objective measures are used to assess proposed annotation protocols and dictionaries compared to the baseline (implicit modelling) on the Lwazi 2 text-to-speech corpus. Secondly, the best candidates are evaluated on additional corpora. Finally, a comparative subjective evaluation is conducted to determine the perceptual impact on text-to-speech synthesis. The best candidate dictionary is associated with favourable objective results obtained on all corpora and was preferred in the subjective test. This suggests that it may form a basis for further refinement and work on improved prosodic models.

@inproceedings{277,
  author = {Daniel Van Niekerk},
  title = {Evaluating acoustic modelling of lexical stress for Afrikaans speech synthesis},
  abstract = {An explicit lexical stress feature is investigated for statistical parametric speech synthesis in Afrikaans: Firstly, objective measures are used to assess proposed annotation protocols and dictionaries compared to the baseline (implicit modelling) on the Lwazi 2 text-to-speech corpus. Secondly, the best candidates are evaluated on additional corpora. Finally, a comparative subjective evaluation is conducted to determine the perceptual impact on text-to-speech synthesis. The best candidate dictionary is associated with favourable objective results obtained on all corpora and was preferred in the subjective test. This suggests that it may form a basis for further refinement and work on improved prosodic models.},
  year = {2017},
  booktitle = {Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech)},
  pages = {86-91},
  address = {Bloemfontein, South Africa},
  isbn = {978-1-5386-2314-5, 978-1-5386-2313-8},
  doi = {10.1109/RoboMech.2017.8261128},
}
Van Heerden, C. J., Karakos, D., Narasimhan, K., Davel, M. H., & Schwartz, R. (2017). Constructing Sub-Word Units for Spoken Term Detection. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, Louisiana. http://doi.org/10.1109/ICASSP.2017.7953264

Spoken term detection, especially of out-of-vocabulary (OOV) key-words, benefits from the use of sub-word systems. We experiment with different language-dependent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL evaluation.

@inproceedings{276,
  author = {Charl Van Heerden and Damianos Karakos and Karthik Narasimhan and Marelie Davel and Richard Schwartz},
  title = {Constructing Sub-Word Units for Spoken Term Detection},
  abstract = {Spoken term detection, especially of out-of-vocabulary (OOV) key-words, benefits from the use of sub-word systems. We experiment with different language-dependent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL evaluation.},
  year = {2017},
  booktitle = {IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  pages = {5780-5784},
  address = {New Orleans, Louisiana},
  isbn = {9781509041176},
  doi = {10.1109/ICASSP.2017.7953264},
}
Van der Walt, C., & Barnard, E. (2017). Variable Kernel Density Estimation in High-dimensional Feature Spaces. In AAAI Conf. on Artificial Intelligence (AAAI-17).

Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the-art maximum likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.

@inproceedings{275,
  author = {Christiaan Van der Walt and Etienne Barnard},
  title = {Variable Kernel Density Estimation in High-dimensional Feature Spaces},
  abstract = {Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the-art maximum likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.},
  year = {2017},
  booktitle = {AAAI Conf. on Artificial Intelligence (AAAI-17)},
  pages = {2674-2680},
  month = {04/02-09/02},
}
Giwa, O., & Davel, M. H. (2017). The Effect of Language Identification Accuracy on Speech Recognition Accuracy of Proper Names. In Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech). Bloemfontein, South Africa. http://doi.org/10.1109/RoboMech.2017.8261145

Utilizing the known language of origin of a name can be useful when predicting the pronunciation of the name. When this language is not known, automatic language identification (LID) can be used to influence which language-specific grapheme-to-phoneme (G2P) predictor is triggered to produce a pronunciation for the name. We investigate the implications when both the LID system and the G2P system generate errors: what influence does this have on a resulting speech recognition system? We experiment with different approaches to LID-based dictionary creation and report on results in four South African languages: Afrikaans, English, Sesotho and isiZulu.

@inproceedings{274,
  author = {Oluwapelumi Giwa and Marelie Davel},
  title = {The Effect of Language Identification Accuracy on Speech Recognition Accuracy of Proper Names},
  abstract = {Utilizing the known language of origin of a name can be useful when predicting the pronunciation of the name. When this language is not known, automatic language identification (LID) can be used to influence which language-specific grapheme-to-phoneme (G2P) predictor is triggered to produce a pronunciation for the name. We investigate the implications when both the LID system and the G2P system generate errors: what influence does this have on a resulting speech recognition system? We experiment with different approaches to LID-based dictionary creation and report on results in four South African languages: Afrikaans, English, Sesotho and isiZulu.},
  year = {2017},
  booktitle = {Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech)},
  pages = {187-192},
  address = {Bloemfontein, South Africa},
  isbn = {978-1-5386-2314-5, 978-1-5386-2313-8},
  doi = {10.1109/RoboMech.2017.8261145},
}
Giwa, O., & Davel, M. H. (2017). Bilateral G2P Accuracy: Measuring the effect of variants. In Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech). Bloemfontein, South Africa. http://doi.org/10.1109/RoboMech.2017.8261149

Incorporating pronunciation variants in a dictionary is controversial, as this can be either advantageous or detrimental for a speech recognition system. Grapheme-to-phoneme (G2P) accuracy can help guide this decision, but calculating the G2P accuracy of variant-based dictionaries is not fully straightforward. We propose a variant matching technique to measure G2P accuracy in a principled way, when both the reference and hypothesized dictionaries may include variants. We use the new measure to evaluate G2P accuracy and speech recognition performance of systems developed with an existing set of dictionaries, and observe a better correlation between G2P accuracy and speech recognition performance, than when utilising alternative metrics.

@inproceedings{273,
  author = {Oluwapelumi Giwa and Marelie Davel},
  title = {Bilateral G2P Accuracy: Measuring the effect of variants},
  abstract = {Incorporating pronunciation variants in a dictionary is controversial, as this can be either advantageous or detrimental for a speech recognition system. Grapheme-to-phoneme (G2P) accuracy can help guide this decision, but calculating the G2P accuracy of variant-based dictionaries is not fully straightforward. We propose a variant matching technique to measure G2P accuracy in a principled way, when both the reference and hypothesized dictionaries may include variants. We use the new measure to evaluate G2P accuracy and speech recognition performance of systems developed with an existing set of dictionaries, and observe a better correlation between G2P accuracy and speech recognition performance, than when utilising alternative metrics.},
  year = {2017},
  booktitle = {Pattern Recognition Association of South Africa and Mechatronics International Conference (PRASA-RobMech)},
  pages = {208-213},
  address = {Bloemfontein, South Africa},
  isbn = {978-1-5386-2314-5, 978-1-5386-2313-8},
  doi = {10.1109/RoboMech.2017.8261149},
}
De Wet, F., Kleynhans, N., Van Compernolle, D., & Sahraeian, R. (2017). Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems. South African Journal of Science, 113(1/2). http://doi.org/10.17159/sajs.2017/20160038

For purposes of automated speech recognition in under-resourced environments, techniques used to share acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language. The assumption is that adding more data will increase the robustness of the statistical estimations captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish – an under-resourced and well-resourced language, respectively. Our approach was focused on the exploration of model adaptation and refinement techniques associated with hidden Markov model-based speech recognition systems to improve the benefit of sharing data. Specifically, we focused on the use of currently available techniques, some possible combinations and the exact utilisation of the techniques during the acoustic model development process. Our findings show that simply using normal approaches to adaptation and refinement does not result in any benefits when adding Flemish data to the Afrikaans training pool. The only observed improvement was achieved when developing acoustic models on all available data but estimating model refinements and adaptations on the target data only.
Significance:
• Acoustic modelling for under-resourced languages
• Automatic speech recognition for Afrikaans
• Data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans

@article{272,
  author = {Febe De Wet and Neil Kleynhans and Dirk Van Compernolle and Reza Sahraeian},
  title = {Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems},
  abstract = {For purposes of automated speech recognition in under-resourced environments, techniques used to
share acoustic data between closely related or similar languages become important. Donor languages
with abundant resources can potentially be used to increase the recognition accuracy of speech
systems developed in the resource-poor target language. The assumption is that adding more data will
increase the robustness of the statistical estimations captured by the acoustic models. In this study
we investigated data sharing between Afrikaans and Flemish – an under-resourced and well-resourced
language, respectively. Our approach was focused on the exploration of model adaptation and refinement
techniques associated with hidden Markov model-based speech recognition systems to improve the
benefit of sharing data. Specifically, we focused on the use of currently available techniques, some
possible combinations and the exact utilisation of the techniques during the acoustic model development
process. Our findings show that simply using normal approaches to adaptation and refinement does
not result in any benefits when adding Flemish data to the Afrikaans training pool. The only observed
improvement was achieved when developing acoustic models on all available data but estimating model
refinements and adaptations on the target data only.
Significance:
• Acoustic modelling for under-resourced languages
• Automatic speech recognition for Afrikaans
• Data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans},
  year = {2017},
  journal = {South African Journal of Science},
  volume = {113},
  pages = {25-33},
  number = {1/2},
  publisher = {Academy of Science for South Africa (ASSAf)},
  doi = {10.17159/sajs.2017/20160038},
}
Ogundele, O., Moodley, D., Seebregts, C., & Pillay, A. (2017). Building Semantic Causal Models to Predict Treatment Adherence for Tuberculosis Patients in Sub-Saharan Africa. In Software Engineering in Health Care, LNCS vol. 9062. Springer, Cham. http://doi.org/10.1007/978-3-319-63194-3_6

Poor adherence to prescribed treatment is a major factor contributing to tuberculosis patients developing drug resistance and failing treatment. Treatment adherence behaviour is influenced by diverse personal, cultural and socio-economic factors that vary between regions and communities. Decision network models can potentially be used to predict treatment adherence behaviour. However, determining the network structure (identifying the factors and their causal relations) and the conditional probabilities is a challenging task. To resolve the former we developed an ontology supported by current scientific literature to categorise and clarify the similarity and granularity of factors.

@inproceedings{250,
  author = {Olukunle Ogundele and Deshen Moodley and Chris Seebregts and Anban Pillay},
  title = {Building Semantic Causal Models to Predict Treatment Adherence for Tuberculosis Patients in Sub-Saharan Africa},
  abstract = {Poor adherence to prescribed treatment is a major factor contributing to tuberculosis patients developing drug resistance and failing treatment. Treatment adherence behaviour is influenced by diverse personal, cultural and socio-economic factors that vary between regions and communities. Decision network models can potentially be used to predict treatment adherence behaviour. However, determining the network structure (identifying the factors and their causal relations) and the conditional probabilities is a challenging task. To resolve the former we developed an ontology supported by current scientific literature to categorise and clarify the similarity and granularity of factors.},
  year = {2017},
  booktitle = {Software Engineering in Health Care},
  series = {Lecture Notes in Computer Science},
  volume = {9062},
  pages = {81-95},
  publisher = {Springer, Cham},
  isbn = {978-3-319-63193-6},
  doi = {10.1007/978-3-319-63194-3_6},
}
Berglund, M., & van der Merwe, B. (2017). On the semantics of regular expression parsing in the wild. Theoretical Computer Science, 679. http://doi.org/10.1016/j.tcs.2016.09.006

We introduce prioritized transducers to formalize capturing groups in regular expression matching in a way that permits straightforward modeling of capturing in Java’s regular expression library. The broader questions of parsing semantics and performance are also considered. In addition, the complexity of deciding equivalence of regular expressions with capturing groups is investigated.

@article{218,
  author = {Martin Berglund and Brink van der Merwe},
  title = {On the semantics of regular expression parsing in the wild},
  abstract = {We introduce prioritized transducers to formalize capturing groups in regular expression
matching in a way that permits straightforward modeling of capturing in Java’s regular
expression library. The broader questions of parsing semantics and performance are also
considered. In addition, the complexity of deciding equivalence of regular expressions with
capturing groups is investigated.},
  year = {2017},
  journal = {Theoretical Computer Science},
  volume = {679},
  pages = {69-82},
  publisher = {Elsevier},
  issn = {0304-3975},
  url = {https://www.sciencedirect.com/science/article/pii/S0304397516304790?via%3Dihub},
  doi = {10.1016/j.tcs.2016.09.006},
}
Watson, B., Runge, T., Schaefer, I., & Cleophas, L. (2017). Many-MADFAct: Concurrently Constructing MADFAs. In Prague Stringology Conference 2017. Prague Stringology Club. Retrieved from https://dblp.org/db/conf/stringology/stringology2017

No Abstract

@inproceedings{215,
  author = {Bruce Watson and T. Runge and I. Schaefer and L.G.W.A. Cleophas},
  title = {Many-MADFAct: Concurrently Constructing MADFAs},
  year = {2017},
  booktitle = {Prague Stringology Conference 2017},
  pages = {127-142},
  month = {28/08-30/08},
  publisher = {Prague Stringology Club},
  isbn = {978-80-01-06193-0},
  url = {https://dblp.org/db/conf/stringology/stringology2017},
}
Watson, B. (2017). Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform. In WCTA 2017 12th Workshop on Compression, Text and Algorithms. Retrieved from pages.di.unipi.it/spire2017/wcta.html

No Abstract

@inproceedings{214,
  author = {Bruce Watson},
  title = {Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform},
  year = {2017},
  booktitle = {WCTA 2017 12th Workshop on Compression, Text and Algorithms},
  pages = {1-7},
  month = {29/09},
  url = {pages.di.unipi.it/spire2017/wcta.html},
}
Watson, B., Nxumalo, M., Kourie, D., & Cleophas, L. (2017). An Assessment of Algorithms for Deriving Failure Deterministic Finite Automata. South African Computer Journal, 29(1). Retrieved from http://dx.doi.org/10.18489/sacj.v29i1.456

No Abstract

@article{213,
  author = {Bruce Watson and M. Nxumalo and D.G. Kourie and L.G.W.A. Cleophas},
  title = {An Assessment of Algorithms for Deriving Failure Deterministic Finite Automata},
  year = {2017},
  journal = {South African Computer Journal},
  volume = {29},
  pages = {43-68},
  number = {1},
  issn = {2313-7835},
  url = {http://dx.doi.org/10.18489/sacj.v29i1.456},
}
Watson, B., & Daykin, J. (2017). Indeterminate String Factorizations and Degenerate Text Transformations. Mathematics in Computer Science, 11(2). Retrieved from https://core.ac.uk/download/pdf/81595959.pdf

No Abstract

@article{212,
  author = {Bruce Watson and J.W. Daykin},
  title = {Indeterminate String Factorizations and Degenerate Text Transformations},
  year = {2017},
  journal = {Mathematics in Computer Science},
  volume = {11},
  pages = {209-218},
  number = {2},
  issn = {1661-8270},
  url = {https://core.ac.uk/download/pdf/81595959.pdf},
}
de Waal, A., Koen, H., de Villiers, J., & Roodt, H. (2017). An expert-driven causal model of the rhino poaching problem. Ecological Modelling, 347. Retrieved from https://www.sciencedirect.com/science/article/pii/S0304380016307621

A significant challenge in ecological modelling is the lack of complete sets of high-quality data. This is especially true in the rhino poaching problem where data is incomplete. Although there are many poaching attacks, they can be spread over a vast surface area such as in the case of the Kruger National Park in South Africa, which is roughly the size of Israel. Bayesian networks are useful reasoning tools and can utilise expert knowledge when data is insufficient or sparse. Bayesian networks allow the modeller to incorporate data, expert knowledge, or any combination of the two. This flexibility of Bayesian networks makes them ideal for modelling complex ecological problems. In this paper an expert-driven model of the rhino poaching problem is presented. The development as well as the evaluation of the model is performed from an expert perspective. Independent expert evaluation is performed in the form of queries that test different scenarios. Structuring the rhino poaching problem as a causal network yields a framework that can be used to reason about the problem, as well as inform the modeller of the type of data that has to be gathered.

@article{191,
  author = {Alta de Waal and Hildegarde Koen and J.P. de Villiers and Henk Roodt},
  title = {An expert-driven causal model of the rhino poaching problem},
  abstract = {A significant challenge in ecological modelling is the lack of complete sets of high-quality data. This is especially true in the rhino poaching problem where data is incomplete. Although there are many poaching attacks, they can be spread over a vast surface area such as in the case of the Kruger National Park in South Africa, which is roughly the size of Israel. Bayesian networks are useful reasoning tools and can utilise expert knowledge when data is insufficient or sparse. Bayesian networks allow the modeller to incorporate data, expert knowledge, or any combination of the two. This flexibility of Bayesian networks makes them ideal for modelling complex ecological problems. In this paper an expert-driven model of the rhino poaching problem is presented. The development as well as the evaluation of the model is performed from an expert perspective. Independent expert evaluation is performed in the form of queries that test different scenarios. Structuring the rhino poaching problem as a causal network yields a framework that can be used to reason about the problem, as well as inform the modeller of the type of data that has to be gathered.},
  year = {2017},
  journal = {Ecological Modelling},
  volume = {347},
  pages = {29-39},
  publisher = {Elsevier},
  issn = {0304-3800},
  url = {https://www.sciencedirect.com/science/article/pii/S0304380016307621},
}
Gueorguiev, V., & Moodley, D. (2017). Hyperparameter Optimization for Astronomy. University of Cape Town. Retrieved from http://projects.cs.uct.ac.za/honsproj/cgi-bin/view/2017/gueorguiev_henhaeyono_stopforth.zip/#downloads

The task of phenomenon classification in astronomy provides a novel and challenging setting for the application of state-of-the-art techniques addressing the problem of combined algorithm selection and hyperparameter optimization (CASH) of machine learning algorithms, which find local applications such as at the data-intensive Square Kilometre Array (SKA). This work will use various algorithms for CASH to explore the possibility and efficacy of hyperparameter optimization on improving performance of machine learning techniques for astronomy. Then, with focus on the Galaxy Zoo project, these algorithms will be used to conduct an in-depth comparison of state-of-the-art in hyperparameter optimization (HPO) along with techniques that aim to improve performance on large datasets and expensive function evaluations. Finally, the likelihood for an integration with a cognitive vision system for astronomy will be examined by conducting a brief exploration into different feature extraction and selection methods.

@phdthesis{180,
  author = {V. Gueorguiev and Deshen Moodley},
  title = {Hyperparameter Optimization for Astronomy},
  abstract = {The task of phenomenon classification in astronomy provides a novel and challenging setting for the application of state-of-the-art techniques addressing the problem of combined
algorithm selection and hyperparameter optimization (CASH) of machine learning algorithms, which find local applications such as at the data-intensive Square Kilometre Array
(SKA). This work will use various algorithms for CASH to explore the possibility and efficacy of hyperparameter optimization on improving performance of machine learning
techniques for astronomy. Then, with focus on the Galaxy Zoo project, these algorithms will be used to conduct an in-depth comparison of state-of-the-art in hyperparameter optimization
(HPO) along with techniques that aim to improve performance on large datasets and expensive function evaluations. Finally, the likelihood for an integration with a cognitive
vision system for astronomy will be examined by conducting a brief exploration into different feature extraction and selection methods.},
  year = {2017},
  type = {Honours project},
  school = {University of Cape Town},
  url = {http://projects.cs.uct.ac.za/honsproj/cgi-bin/view/2017/gueorguiev_henhaeyono_stopforth.zip/#downloads},
}
Watson, B., Strauss, T., Kourie, D., & Cleophas, L. (2017). CSP for Parallelising Brzozowski’s DFA Construction Algorithm. In The Role of Theory in Computer Science. World Scientific Publishing Co. Pte. Ltd. Retrieved from https://doi.org/10.1142/9789813148208_0010

No Abstract

@incollection{179,
  author = {Bruce Watson and T. Strauss and D.G. Kourie and L.G.W.A. Cleophas},
  title = {CSP for Parallelising Brzozowski’s DFA Construction Algorithm},
  year = {2017},
  booktitle = {The Role of Theory in Computer Science},
  pages = {217-243},
  publisher = {World Scientific Publishing Co. Pte. Ltd.},
  isbn = {978-981-3148-19-2},
  url = {https://doi.org/10.1142/9789813148208_0010},
}
van der Merwe, B., Weideman, N., & Berglund, M. (2017). Turning evil regexes harmless. In Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT’17). ACM. Retrieved from https://dl.acm.org/citation.cfm?id=3129416

No Abstract

@inproceedings{178,
  author = {Brink van der Merwe and N. Weideman and Martin Berglund},
  title = {Turning evil regexes harmless},
  year = {2017},
  booktitle = {Conference of the South African Institute of Computer Scientists and Information Technologists (SAICSIT'17)},
  month = {26/09-28/09},
  publisher = {ACM},
  url = {https://dl.acm.org/citation.cfm?id=3129416},
}
Berglund, M., Björklund, H., & Drewes, F. (2017). Single-rooted DAGs in regular DAG languages: Parikh image and path languages. In International Workshop on Tree Adjoining Grammars and Related Formalisms. The Association for Computational Linguistics (ACL). Retrieved from http://www.aclweb.org/anthology/W/W17/W17-62.pdf

No Abstract

@inproceedings{177,
  author = {Martin Berglund and H. Björklund and F. Drewes},
  title = {Single-rooted DAGs in regular DAG languages: Parikh image and path languages},
  year = {2017},
  booktitle = {International Workshop on Tree Adjoining Grammars and Related Formalisms},
  pages = {94-101},
  month = {04/09-06/09},
  publisher = {The Association for Computational Linguistics (ACL)},
  isbn = {978-1-945626-98-2},
  url = {http://www.aclweb.org/anthology/W/W17/W17-62.pdf},
}
Berglund, M., & van der Merwe, B. (2017). Regular Expressions with Backreferences Re-examined. In The Prague Stringology Conference (PSC 2017). Czech Technical University in Prague.

Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to a variable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.

@inproceedings{176,
  author = {Martin Berglund and Brink van der Merwe},
  title = {Regular Expressions with Backreferences Re-examined},
  abstract = {Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to a variable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.},
  year = {2017},
  booktitle = {The Prague Stringology Conference (PSC 2017)},
  pages = {30-41},
  month = {28/08-30/08},
  address = {Czech Technical University in Prague},
  isbn = {978-80-01-06193-0},
}
Berglund, M., van der Merwe, B., Watson, B., & Weideman, N. (2017). On the Semantics of Atomic Subgroups in Practical Regular Expressions. In Implementation and Application of Automata, 22nd International Conference, CIAA 2017. Marne-la-Vallee, France: Springer. Retrieved from http://www.springer.com/978-3-319-60133-5

Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in for example Perl, Java, Ruby and .NET, also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent needless backtracking, but the operators will often also change the language accepted. As such it is essential to develop a theoretically sound basis for the matching semantics of regular expressions with atomic operators. We here establish that atomic operators preserve regularity, but are exponentially more succinct for some languages. Further we investigate the state complexity of deterministic and non-deterministic finite automata accepting the language corresponding to a regular expression with atomic operators, and show that emptiness testing is PSPACE-complete.

@inproceedings{175,
  author = {Martin Berglund and Brink van der Merwe and Bruce Watson and N. Weideman},
  title = {On the Semantics of Atomic Subgroups in Practical Regular Expressions},
  abstract = {Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in for example Perl, Java, Ruby and .NET, also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent needless backtracking, but the operators will often also change the language accepted. As such it is essential to develop a theoretically sound basis for the matching semantics of regular expressions with atomic operators. We here establish that atomic operators preserve regularity, but are exponentially more succinct for some languages. Further we investigate the state complexity of deterministic and non-deterministic finite automata accepting the language corresponding to a regular expression with atomic operators, and show that emptiness testing is PSPACE-complete.},
  year = {2017},
  booktitle = {Implementation and Application of Automata, 22nd International Conference, CIAA 2017},
  pages = {14-26},
  month = {27/06-30/06},
  publisher = {Springer},
  address = {Marne-la-Vallee, France},
  isbn = {978-3-319-60133-5},
  url = {http://www.springer.com/978-3-319-60133-5},
}
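The abstract's point that atomic operators can change the language accepted can be seen in a toy backtracking matcher. The following is a minimal Python sketch, not the paper's formal semantics: it assumes Perl-style matching (alternatives tried left to right, greedy star) and models an atomic group, written `(?>...)` in Perl, Java and .NET, as committing to the first way its body matches. The node classes and the `fullmatch` helper are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Union

# A continuation answers: "can the rest of the pattern match from position j?"
Cont = Callable[[int], bool]


@dataclass
class Lit:
    ch: str


@dataclass
class Cat:
    left: Node
    right: Node


@dataclass
class Alt:
    left: Node
    right: Node


@dataclass
class Star:
    inner: Node


@dataclass
class Atomic:
    inner: Node


Node = Union[Lit, Cat, Alt, Star, Atomic]


def _match(node: Node, s: str, i: int, k: Cont) -> bool:
    """Backtracking match of node at position i, followed by continuation k."""
    if isinstance(node, Lit):
        return i < len(s) and s[i] == node.ch and k(i + 1)
    if isinstance(node, Cat):
        return _match(node.left, s, i, lambda j: _match(node.right, s, j, k))
    if isinstance(node, Alt):
        # Leftmost alternative first; backtrack into the right one on failure.
        return _match(node.left, s, i, k) or _match(node.right, s, i, k)
    if isinstance(node, Star):
        inner = node.inner

        def star(p: int) -> bool:
            # Greedy: try another iteration first (requiring progress), else stop.
            return _match(inner, s, p, lambda j: j > p and star(j)) or k(p)

        return star(i)
    if isinstance(node, Atomic):
        # Record only the first end position the body reaches, then commit to it:
        # if k fails from there, no other way of matching the body is tried.
        ends: List[int] = []

        def first(j: int) -> bool:
            ends.append(j)
            return True

        return _match(node.inner, s, i, first) and k(ends[0])
    raise TypeError(f"unknown node: {node!r}")


def fullmatch(node: Node, s: str) -> bool:
    """True if node matches the whole of s."""
    return _match(node, s, 0, lambda j: j == len(s))
```

Under these assumptions `a*a` accepts `aaa`, while `(?>a*)a` accepts no string at all, because the atomic `a*` never gives back the final `a`. Likewise `(a|ab)c` accepts `abc` but `(?>a|ab)c` does not. Python 3.11 and later also support `(?>...)` in the standard `re` module, which can serve as a cross-check.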
Fischer, B., Esterhuizen, M., & Greene, G. J. (2017). Visualizing and Exploring Software Version Control Repositories using Interactive Tag Clouds over Formal Concept Lattices. Information and Software Technology, 87, 223-241. Retrieved from https://www.sciencedirect.com/science/article/pii/S0950584916304050?via%3Dihub

Context: version control repositories contain a wealth of implicit information that can be used to answer many questions about a project’s development process. However, this information is not directly accessible in the repositories and must be extracted and visualized.
Objective: the main objective of this work is to develop a flexible and generic interactive visualization engine called ConceptCloud that supports exploratory search in version control repositories.
Method: ConceptCloud is a flexible, interactive browser for SVN and Git repositories. Its main novelty is the combination of an intuitive tag cloud visualization with an underlying concept lattice that provides a formal structure for navigation. ConceptCloud supports concurrent navigation in multiple linked but individually customizable tag clouds, which allows for multi-faceted repository browsing, and scriptable construction of unique visualizations.
Results: we describe the mathematical foundations and implementation of our approach and use ConceptCloud to quickly gain insight into the team structure and development process of three projects. We perform a user study to determine the usability of ConceptCloud. We show that untrained participants are able to answer historical questions about a software project better using ConceptCloud than using a linear list of commits.
Conclusion: ConceptCloud can be used to answer many difficult questions such as “What has happened in this project while I was away?” and “Which developers collaborate?”. Tag clouds generated from our approach provide a visualization in which version control data can be aggregated and explored interactively.

@article{174,
  author = {Bernd Fischer and M. Esterhuizen and G.J. Greene},
  title = {Visualizing and Exploring Software Version Control Repositories using Interactive Tag Clouds over Formal Concept Lattices},
  abstract = {Context: version control repositories contain a wealth of implicit information that can be used to answer many questions about a project’s development process. However, this information is not directly accessible in the repositories and must be extracted and visualized.
Objective: the main objective of this work is to develop a flexible and generic interactive visualization engine called ConceptCloud that supports exploratory search in version control repositories.
Method: ConceptCloud is a flexible, interactive browser for SVN and Git repositories. Its main novelty is the combination of an intuitive tag cloud visualization with an underlying concept lattice that provides a formal structure for navigation. ConceptCloud supports concurrent navigation in multiple linked but individually customizable tag clouds, which allows for multi-faceted repository browsing, and scriptable construction of unique visualizations.
Results: we describe the mathematical foundations and implementation of our approach and use ConceptCloud to quickly gain insight into the team structure and development process of three projects. We perform a user study to determine the usability of ConceptCloud. We show that untrained participants are able to answer historical questions about a software project better using ConceptCloud than using a linear list of commits.
Conclusion: ConceptCloud can be used to answer many difficult questions such as “What has happened in this project while I was away?” and “Which developers collaborate?”. Tag clouds generated from our approach provide a visualization in which version control data can be aggregated and explored interactively.},
  year = {2017},
  journal = {Information and Software Technology},
  volume = {87},
  pages = {223-241},
  issue = {2017},
  url = {https://www.sciencedirect.com/science/article/pii/S0950584916304050?via%3Dihub},
}
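ConceptCloud's combination of tag clouds with a concept lattice rests on formal concept analysis. The sketch below is an illustrative assumption, not ConceptCloud's implementation: it treats commits as objects and hypothetical tags such as `author:alice` as attributes, closes a tag selection to the formal concept (extent, intent) it determines, and weights the tag cloud by how many commits in the current extent carry each tag. The repository data and function names are invented for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple

# Formal context: object (here a commit id) -> the attributes (tags) it carries.
Context = Dict[str, FrozenSet[str]]


def extent(ctx: Context, attrs: Set[str]) -> Set[str]:
    """Objects that carry every attribute in attrs."""
    return {obj for obj, tags in ctx.items() if attrs <= tags}


def intent(ctx: Context, objs: Set[str]) -> FrozenSet[str]:
    """Attributes shared by all objects in objs (every attribute if objs is empty)."""
    if not objs:
        return frozenset().union(*ctx.values())
    return frozenset.intersection(*(ctx[o] for o in objs))


def concept(ctx: Context, selected: Set[str]) -> Tuple[Set[str], FrozenSet[str]]:
    """Close a tag selection to the formal concept (extent, intent) it determines."""
    objs = extent(ctx, selected)
    return objs, intent(ctx, objs)


def tag_cloud(ctx: Context, objs: Set[str]) -> Counter:
    """Tag weights for the current extent: number of objects carrying each tag."""
    return Counter(tag for o in objs for tag in ctx[o])


# Hypothetical toy repository history.
ctx: Context = {
    "c1": frozenset({"author:alice", "file:parser.py", "2016"}),
    "c2": frozenset({"author:alice", "file:lexer.py", "2016"}),
    "c3": frozenset({"author:bob", "file:parser.py", "2017"}),
}

objs, shared = concept(ctx, {"author:alice"})
print(sorted(objs), sorted(shared))
print(tag_cloud(ctx, objs).most_common())
```

Selecting only `author:alice` closes to a concept whose intent also contains `2016`, the kind of implicit relationship (in this toy data, that all of Alice's commits date from 2016) that a lattice-backed tag cloud can surface during navigation.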
Fischer, B., Dunaiski, M., & Greene, G. J. (2017). Exploratory Search of Academic Publication and Citation Data using Interactive Tag Cloud Visualizations. Scientometrics, 110(3), 1539-1571. Retrieved from https://link.springer.com/article/10.1007%2Fs11192-016-2236-3

Acquiring an overview of an unfamiliar discipline and exploring relevant papers and journals is often a laborious task for researchers. In this paper we show how exploratory search can be supported on a large collection of academic papers to allow users to answer complex scientometric questions which traditional retrieval approaches do not support optimally. We use our ConceptCloud browser, which makes use of a combination of concept lattices and tag clouds, to visually present academic publication data (specifically, the ACM Digital Library) in a browsable format that facilitates exploratory search. We augment this dataset with semantic categories, obtained through automatic keyphrase extraction from papers’ titles and abstracts, in order to provide the user with uniform keyphrases of the underlying data collection. We use the citations and references of papers to provide additional mechanisms for exploring relevant research by presenting aggregated reference and citation data not only for a single paper but also across topics, authors and journals, which is novel in our approach. We conduct a user study to evaluate our approach in which we asked 34 participants, from different academic backgrounds with varying degrees of research experience, to answer a variety of scientometric questions using our ConceptCloud browser. Participants were able to answer complex scientometric questions using our ConceptCloud browser with a mean correctness of 73%, with the user’s prior research experience having no statistically significant effect on the results.

@article{173,
  author = {Bernd Fischer and M. Dunaiski and G.J. Greene},
  title = {Exploratory Search of Academic Publication and Citation Data using Interactive Tag Cloud Visualizations},
  abstract = {Acquiring an overview of an unfamiliar discipline and exploring relevant papers and journals is often a laborious task for researchers. In this paper we show how exploratory search can be supported on a large collection of academic papers to allow users to answer complex scientometric questions which traditional retrieval approaches do not support optimally. We use our ConceptCloud browser, which makes use of a combination of concept lattices and tag clouds, to visually present academic publication data (specifically, the ACM Digital Library) in a browsable format that facilitates exploratory search. We augment this dataset with semantic categories, obtained through automatic keyphrase extraction from papers’ titles and abstracts, in order to provide the user with uniform keyphrases of the underlying data collection. We use the citations and references of papers to provide additional mechanisms for exploring relevant research by presenting aggregated reference and citation data not only for a single paper but also across topics, authors and journals, which is novel in our approach. We conduct a user study to evaluate our approach in which we asked 34 participants, from different academic backgrounds with varying degrees of research experience, to answer a variety of scientometric questions using our ConceptCloud browser. Participants were able to answer complex scientometric questions using our ConceptCloud browser with a mean correctness of 73%, with the user’s prior research experience having no statistically significant effect on the results.},
  year = {2017},
  journal = {Scientometrics (Springer)},
  volume = {110},
  pages = {1539-1571},
  issue = {3},
  address = {Netherlands},
  issn = {0138-9130},
  url = {https://link.springer.com/article/10.1007%2Fs11192-016-2236-3},
}
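Presenting reference and citation data aggregated across a topic or author, rather than for a single paper, amounts to summing over a selection of papers. The following Python sketch uses a hypothetical three-paper collection and hypothetical field names (`authors`, `keyphrases`, `refs`); it illustrates only the aggregation idea and does not reproduce the paper's ACM Digital Library data, keyphrase extraction or ConceptCloud interface.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical collection: paper id -> facet name -> list of values.
Papers = Dict[str, Dict[str, List[str]]]

papers: Papers = {
    "p1": {"authors": ["A"], "keyphrases": ["tag clouds"], "refs": []},
    "p2": {"authors": ["A", "B"], "keyphrases": ["concept lattices"], "refs": ["p1"]},
    "p3": {"authors": ["C"], "keyphrases": ["tag clouds", "concept lattices"], "refs": ["p1", "p2"]},
}


def select(papers: Papers, facet: str, value: str) -> List[str]:
    """Ids of papers whose facet (e.g. 'keyphrases', 'authors') contains value."""
    return [pid for pid, p in papers.items() if value in p[facet]]


def aggregated_references(papers: Papers, selection: List[str]) -> Counter:
    """How often each paper is referenced by the selected papers as a group."""
    return Counter(ref for pid in selection for ref in papers[pid]["refs"])


def citations_by(papers: Papers, facet: str) -> Counter:
    """Citations received within the collection, summed per facet value."""
    received = Counter(ref for p in papers.values() for ref in p["refs"])
    totals: Counter = Counter()
    for pid, p in papers.items():
        for value in p[facet]:
            totals[value] += received[pid]
    return totals


topic = select(papers, "keyphrases", "concept lattices")
print(topic, aggregated_references(papers, topic))
print(citations_by(papers, "authors"))
```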
Britz, K., & Varzinczak, I. (2017). Context-based defeasible subsumption for dSROIQ. In 13th International Symposium on Commonsense Reasoning.

The description logic dSROIQ is a decidable extension of SROIQ that supports defeasible reasoning in the KLM tradition. It features a parameterised preference order on binary relations in a domain of interpretation, which allows for the use of defeasible roles in complex concepts, as well as in defeasible concept and role subsumption, and in defeasible role assertions. In this paper, we address an important limitation both in dSROIQ and in other defeasible extensions of description logics, namely the restriction in the semantics of defeasible concept subsumption to a single preference order on objects. We do this by inducing preference orders on objects from preference orders on roles, and use these to relativise defeasible subsumption. This yields a notion of contextualised defeasible subsumption, with contexts described by roles.

@inproceedings{169,
  author = {Katarina Britz and Ivan Varzinczak},
  title = {Context-based defeasible subsumption for dSROIQ},
  abstract = {The description logic dSROIQ is a decidable extension of SROIQ that supports defeasible reasoning in the KLM tradition. It features a parameterised preference order on binary relations in a domain of interpretation, which allows for the use of defeasible roles in complex concepts, as well as in defeasible concept and role subsumption, and in defeasible role assertions. In this paper, we address an important limitation both in dSROIQ and in other defeasible extensions of description logics, namely the restriction in the semantics of defeasible concept subsumption to a single preference order on objects. We do this by inducing preference orders on objects from preference orders on roles, and use these to relativise defeasible subsumption. This yields a notion of contextualised defeasible subsumption, with contexts described by roles.},
  year = {2017},
  journal = {13th International Symposium on Commonsense Reasoning},
  month = {06/11-08/11},
}
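In the KLM-style preferential semantics that dSROIQ builds on, a defeasible subsumption between concepts C and D holds when every most preferred C-object is a D-object. The Python sketch below checks this on a small finite interpretation with the classic bird/penguin example. It does not implement the paper's construction of object orders from role orders; as a stand-in assumption, a context is modelled simply as a different preference order passed to the same check, which is enough to show why relativising subsumption to a context can change the answer.

```python
from typing import FrozenSet, Set, Tuple

# Strict preference order as pairs (x, y): "x is more normal than y".
Order = FrozenSet[Tuple[str, str]]


def minimal(objs: Set[str], order: Order) -> Set[str]:
    """Most preferred elements of objs: nothing in objs is strictly preferred to them."""
    return {x for x in objs if not any((y, x) in order for y in objs)}


def defeasibly_subsumed(c: Set[str], d: Set[str], order: Order) -> bool:
    """C defeasibly subsumed by D: every most preferred C-object is a D-object."""
    return minimal(c, order) <= d


# Hypothetical finite interpretation (concept extensions as sets of objects).
bird = {"polly", "tweety"}
penguin = {"tweety"}
flies = {"polly"}

# One order, and a second one standing in for a different context (assumption).
usual = frozenset({("polly", "tweety")})
other_context = frozenset({("tweety", "polly")})

print(defeasibly_subsumed(bird, flies, usual))
print(defeasibly_subsumed(penguin, flies, usual))
print(defeasibly_subsumed(bird, flies, other_context))
```

Under `usual`, birds typically fly while penguins do not; under the alternative order the same bird-flies subsumption fails, because the most preferred bird is now the non-flying one.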
Britz, K., & Varzinczak, I. (2017). Towards defeasible SROIQ. Retrieved from http://ceur-ws.org/Vol-1879/

We present a decidable extension of the Description Logic SROIQ that supports defeasible reasoning in the KLM tradition, and extends it through the introduction of defeasible roles. The semantics of the resulting DL dSROIQ extends the classical semantics with a parameterised preference order on binary relations in a domain of interpretation. This allows for the use of defeasible roles in complex concepts, as well as in defeasible concept and role subsumption, and in defeasible role assertions. Reasoning over dSROIQ ontologies is made possible by a translation of entailment to concept satisfiability relative to an RBox only. A tableau algorithm then decides on consistency of dSROIQ-concepts in the preferential semantics.

@misc{168,
  author = {Katarina Britz and Ivan Varzinczak},
  title = {Towards defeasible SROIQ},
  abstract = {We present a decidable extension of the Description Logic SROIQ that supports defeasible reasoning in the KLM tradition, and extends it through the introduction of defeasible roles. The semantics of the resulting DL dSROIQ extends the classical semantics with a parameterised preference order on binary relations in a domain of interpretation. This allows for the use of defeasible roles in complex concepts, as well as in defeasible concept and role subsumption, and in defeasible role assertions.  Reasoning over dSROIQ ontologies is made possible by a translation of entailment to concept satisfiability relative to an RBox only. A tableau algorithm then decides on consistency of dSROIQ-concepts in the preferential semantics.},
  year = {2017},
  issn = {1613-0073},
  url = {http://ceur-ws.org/Vol-1879/},
}
Casini, G., & Meyer, T. (2017). Belief Change in a Preferential Non-Monotonic Framework. In International Joint Conference on Artificial Intelligence (IJCAI-17).

Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we show that we can also integrate the two formalisms by studying belief change within a (preferential) non-monotonic framework. This integration relies heavily on the identification of the monotonic core of a non-monotonic framework. We consider belief change operators in a non-monotonic propositional setting with a view towards preserving consistency. These results can also be applied to the preservation of coherence—an important notion within the field of logic-based ontologies. We show that the standard AGM approach to belief change can be adapted to a preferential non-monotonic framework, with the definition of expansion, contraction, and revision operators, and corresponding representation results. Surprisingly, preferential AGM belief change, as defined here, can be obtained in terms of classical AGM belief change.

@inproceedings{167,
  author = {Giovanni Casini and Tommie Meyer},
  title = {Belief Change in a Preferential Non-Monotonic Framework},
  abstract = {Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we show that we can also integrate the two formalisms by studying belief change within a (preferential) non-monotonic framework. This integration relies heavily on the identification of the monotonic core of a non-monotonic framework. We consider belief change operators in a non-monotonic propositional setting with a view towards preserving consistency. These results can also be applied to the preservation of coherence—an important notion within the field of logic-based ontologies. We show that the standard AGM approach to belief change can be adapted to a preferential non-monotonic framework, with the definition of expansion, contraction, and revision operators, and corresponding representation results. Surprisingly, preferential AGM belief change, as defined here, can be obtained in terms of classical AGM belief change.},
  year = {2017},
  journal = {International Joint Conference on Artificial Intelligence (IJCAI-17)},
  pages = {929-935},
  month = {19/08-25/08},
  isbn = {978-0-9992411-0-3},
}
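The abstract notes that preferential AGM belief change can be obtained in terms of classical AGM belief change, so a classical baseline is a useful reference point. The Python sketch below implements classical propositional expansion, revision and contraction on sets of models, using Dalal's distance-based revision (one well-known AGM revision operator, chosen here as an assumption) and the Harper identity to derive contraction from revision. It is not the paper's preferential operators; the atoms `p` and `q` are arbitrary.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Set

ATOMS = ("p", "q")
World = FrozenSet[str]               # a world = the set of atoms true in it
Formula = Callable[[World], bool]    # a formula = its truth function on worlds

WORLDS: Set[World] = {
    frozenset(c) for n in range(len(ATOMS) + 1) for c in combinations(ATOMS, n)
}


def models(phi: Formula) -> Set[World]:
    return {w for w in WORLDS if phi(w)}


def expand(k: Set[World], phi: Formula) -> Set[World]:
    """Expansion: add phi, keeping only the K-worlds where it holds."""
    return k & models(phi)


def revise(k: Set[World], phi: Formula) -> Set[World]:
    """Dalal revision: the phi-worlds at minimal Hamming distance from K's worlds."""
    candidates = models(phi)
    if not k or not candidates:
        # Unsatisfiable phi gives the empty (inconsistent) result; if K has no
        # worlds we simplify by keeping every phi-world.
        return candidates
    dist = {w: min(len(w ^ v) for v in k) for w in candidates}
    best = min(dist.values())
    return {w for w, d in dist.items() if d == best}


def contract(k: Set[World], phi: Formula) -> Set[World]:
    """Harper identity: Mod(K - phi) = Mod(K) | Mod(K * not phi)."""
    return k | revise(k, lambda w: not phi(w))


K = models(lambda w: "p" in w and "q" in w)   # believe p and q
print(revise(K, lambda w: "q" not in w))       # accept not-q, keep p
print(contract(K, lambda w: "q" in w))         # stop believing q
```

Working on model sets keeps each operator a one-line set operation: fewer models means stronger beliefs, so expansion shrinks the set and contraction enlarges it.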
Mouton, F., Teixeira, M., & Meyer, T. (2017). Benchmarking a Mobile Implementation of the Social Engineering Prevention Training Tool. In Information Security for South Africa (ISSA).

As the nature of information stored digitally becomes more important and confidential, the security of the systems put in place to protect this information needs to be increased. The human element, however, remains a vulnerability of the system and it is this vulnerability that social engineers attempt to exploit. The Social Engineering Attack Detection Model version 2 (SEADMv2) has been proposed to help people identify malicious social engineering attacks. Prior to this study, the SEADMv2 had not been implemented as a user-friendly application or tested with real subjects. This paper describes how the SEADMv2 was implemented as an Android application. This Android application was tested on 20 subjects to determine whether it reduces the probability of a subject falling victim to a social engineering attack. The results indicated that the Android implementation of the SEADMv2 significantly reduced the number of subjects that fell victim to social engineering attacks. The Android application also significantly reduced the number of subjects that fell victim to malicious social engineering attacks, bidirectional communication social engineering attacks and indirect communication social engineering attacks. The Android application did not have a statistically significant effect on harmless scenarios and unidirectional communication social engineering attacks.

@inproceedings{166,
  author = {F. Mouton and M. Teixeira and Tommie Meyer},
  title = {Benchmarking a Mobile Implementation of the Social Engineering Prevention Training Tool},
  abstract = {As the nature of information stored digitally becomes more important and confidential, the security of the systems put in place to protect this information needs to be increased. The human element, however, remains a vulnerability of the system and it is this vulnerability that social engineers attempt to exploit. The Social Engineering Attack Detection Model version 2 (SEADMv2) has been proposed to help people identify malicious social engineering attacks. Prior to this study, the SEADMv2 had not been implemented as a user-friendly application or tested with real subjects. This paper describes how the SEADMv2 was implemented as an Android application. This Android application was tested on 20 subjects to determine whether it reduces the probability of a subject falling victim to a social engineering attack. The results indicated that the Android implementation of the SEADMv2 significantly reduced the number of subjects that fell victim to social engineering attacks. The Android application also significantly reduced the number of subjects that fell victim to malicious social engineering attacks, bidirectional communication social engineering attacks and indirect communication social engineering attacks. The Android application did not have a statistically significant effect on harmless scenarios and unidirectional communication social engineering attacks.},
  year = {2017},
  journal = {Information Security for South Africa (ISSA)},
  pages = {106-116},
  month = {16/08-17/08},
  isbn = {978-1-5386-0545-5},
}