People
She has a PhD in Philosophy in the domains of mathematical logic and the philosophy of science. Her thesis formulated a mathematical, model-theoretic analysis of the structure of scientific theories. Currently, she works on themes in the philosophy of technology relating to human-technology relations, and in AI ethics on themes in machine ethics, the ethics of social robotics, and data ethics. She also does research on technology-related policy making, focusing on generating culturally sensitive policies for trustworthy AI technologies while aiming for global regulation. In the philosophy of science, her work is centred on debates in scientific realism, the structure of scientific theories, and the status of machine learning-based methodologies in the discovery/justification debate. Her research in both the ethics of artificial intelligence and the philosophy of science includes the application of non-classical formal logics to selected problems.
Prof Ruttkamp-Bloem is a corresponding member of the International Academy for the Philosophy of Science. She has been the elected South African representative at the International Union of History and Philosophy of Science and Technology (IUHPST) since 2014. She is an associate editor of the journal Science and Engineering Ethics, a member of the editorial board of Springer’s respected Synthese Library book series, and a member of the editorial board of Acta Baltica Historiae et Philosophiae Scientiarum. She is the founder of the CAIR/UP ‘Artificial Intelligence for Society’ Symposium Series and the ‘South African Logic and Philosophy of Science’ Colloquium Series.
She currently supervises 6 PhD, 2 MA, and 2 Honours students, and has graduated 5 PhD, 9 MA, and 14 Honours students. She is the author of a book in the Springer Synthese Library Series, 15 articles, and 5 book chapters, and has collaborated on 5 intergovernmental policy documents. She has been interviewed on various media platforms, including Deutsche Welle (Germany), Tortoise Media (UK) and Business Day TV (South Africa), and has recently written an op-ed for the Mail & Guardian on technology driving inequality, as well as a piece for Acteurs Publics in France.
In her capacity as an AI ethics policy researcher, Prof Ruttkamp-Bloem is a member of the African Union Development Agency (AUDA-NEPAD) Consultative Roundtable on Ethics in Africa, a member of the African Commission on Human and Peoples’ Rights (ACHPR) task team working on the Resolution 473 study on Human and Peoples’ Rights and AI, Robotics and other New and Emerging Technologies in Africa, and the rapporteur for the UNESCO World Commission on the Ethics of Scientific Knowledge and Technology (COMEST). She was the chairperson of the Bureau of the UNESCO Ad Hoc Expert Group (AHEG) on the ethics of artificial intelligence tasked with drafting the Recommendation for a global instrument on the ethics of AI, which was adopted by UNESCO member states in November 2021 after several rounds of negotiation. She is a current member of the AHEG working on implementing the Recommendation.
In addition, Prof Ruttkamp-Bloem is the South African representative at the Responsible AI Network Africa (RAIN), a joint venture of the Technical University of Munich and the Kwame Nkrumah University of Science and Technology in Ghana. She is a member of the advisory board of the Wallenberg AI, Autonomous Systems and Software Programme (Human Sciences) hosted by Umeå University in Sweden; of the advisory board of the Global AI Ethics Institute; and, as country advisor for South Africa, of the advisory board of the International Group of Artificial Intelligence (IGOAI). She is also a member of the advisory board of the international Z-Inspection® network, whose assessment method for trustworthy AI is based on the Ethics Guidelines for Trustworthy AI of the European Commission High-Level Expert Group on Artificial Intelligence, and she serves on the advisory board for SAP GE in Germany. Prof Ruttkamp-Bloem is a collaborator on the IEEE (Institute of Electrical and Electronics Engineers) Strong Sustainability by Design and Accountable Sustainability by Design reports, part of the IEEE Planet Positive 2030 campaign, and a collaborating fellow at the International Research Center for AI Ethics and Governance, Chinese Academy of Sciences. She co-convened the Ethics Working Group at the International Atomic Energy Agency (IAEA) AI for Atoms Technical Meeting in 2021 and will do so at future IAEA-ITU events. Finally, she is a member of the GAIA (Global AI Association) Think Tank on Compassionate AI.
Latest Research Publications:
In the face of the fact that AI ethics guidelines currently, on the whole, seem to have no significant impact on AI practices, the quest of AI ethics to ensure trustworthy AI is in danger of becoming nothing more than a nice ideal. Serious work is to be done to ensure AI ethics guidelines are actionable. To this end, in this paper, I argue that AI ethics should be approached 1) in a multi-disciplinary manner focused on concrete research in the discipline of the ethics of AI and 2) as a dynamic system on the basis of virtue ethics in order to work towards enabling all AI actors to take responsibility for their own actions and to hold others accountable for theirs. In conclusion, the paper emphasises the importance of understanding AI ethics as playing out on a continuum of interconnected interests across academia, civil society, public policy-making and the private sector, and a novel notion of ‘AI ethics capital’ is put on the table as outcome of actionable AI ethics and essential ingredient for sustainable trustworthy AI.
@article{384, author = {Emma Ruttkamp-Bloem}, title = {The Quest for Actionable AI Ethics}, abstract = {In the face of the fact that AI ethics guidelines currently, on the whole, seem to have no significant impact on AI practices, the quest of AI ethics to ensure trustworthy AI is in danger of becoming nothing more than a nice ideal. Serious work is to be done to ensure AI ethics guidelines are actionable. To this end, in this paper, I argue that AI ethics should be approached 1) in a multi-disciplinary manner focused on concrete research in the discipline of the ethics of AI and 2) as a dynamic system on the basis of virtue ethics in order to work towards enabling all AI actors to take responsibility for their own actions and to hold others accountable for theirs. In conclusion, the paper emphasises the importance of understanding AI ethics as playing out on a continuum of interconnected interests across academia, civil society, public policy-making and the private sector, and a novel notion of ‘AI ethics capital’ is put on the table as outcome of actionable AI ethics and essential ingredient for sustainable trustworthy AI.}, year = {2020}, journal = {Communications in Computer and Information Science}, volume = {1342}, pages = {34-52}, publisher = {Springer}, isbn = {978-3-030-66151-9}, url = {https://link.springer.com/chapter/10.1007/978-3-030-66151-9_3}, doi = {https://doi.org/10.1007/978-3-030-66151-9_3}, }
This Introduction has two foci: the first is a discussion of the motivation for and the aims of the 2014 conference on New Thinking about Scientific Realism in Cape Town South Africa, and the second is a brief contextualization of the contributed articles in this special issue of Synthese in the framework of the conference. Each focus is discussed in a separate section.
@article{416, author = {Stathis Psillos, Emma Ruttkamp-Bloem}, title = {Scientific realism: quo vadis? Introduction: new thinking about scientific realism}, abstract = {This Introduction has two foci: the first is a discussion of the motivation for and the aims of the 2014 conference on New Thinking about Scientific Realism in Cape Town South Africa, and the second is a brief contextualization of the contributed articles in this special issue of Synthese in the framework of the conference. Each focus is discussed in a separate section.}, year = {2017}, journal = {Synthese}, volume = {194}, pages = {3187-3201}, issue = {4}, publisher = {Springer}, isbn = {0039-7857 , 1573-0964}, doi = {10.1007/s11229-017-1493-x}, }
"Naturalised realism" is presented as a version of realism which is more compatible with the history of science than convergent or explanationist forms of realism. The account is unpacked according to four theses : 1) Whether realism is warranted with regards to a particular theory depends on the kind and quality of evidence available for that theory; 2) Reference is about causal interaction with the world; 3) Most of science happens somewhere in between instrumentalism and scientific realism on a continuum of stances towards the status of theories; 4) The degree to which realism is warranted has something to do with the degree to which theories successfully refer, rather than with the truth of theories.
@article{417, author = {Emma Ruttkamp-Bloem}, title = {Repositioning realism}, abstract = {"Naturalised realism" is presented as a version of realism which is more compatible with the history of science than convergent or explanationist forms of realism. The account is unpacked according to four theses: 1) Whether realism is warranted with regards to a particular theory depends on the kind and quality of evidence available for that theory; 2) Reference is about causal interaction with the world; 3) Most of science happens somewhere in between instrumentalism and scientific realism on a continuum of stances towards the status of theories; 4) The degree to which realism is warranted has something to do with the degree to which theories successfully refer, rather than with the truth of theories.}, year = {2015}, journal = {Philosophia Scientiæ}, volume = {19}, pages = {85-98}, issue = {1}, publisher = {Université Nancy 2}, isbn = {1281-2463}, doi = {10.4000/philosophiascientiae.1042}, }
Latest Research Publications:
Aim/Purpose
The aim of this project was to explore models for stimulating health informatics innovation and capacity development in South Africa.
Background
There is generally a critical lack of health informatics innovation and capacity in South Africa and sub-Saharan Africa. This is despite the wide anticipation that digital health systems will play a fundamental role in strengthening health systems and improving service delivery.
Methodology
We established a program over four years to train Masters and Doctoral students and conducted research projects across a wide range of biomedical and health informatics technologies at a leading South African university. We also developed a Health Architecture Laboratory Innovation and Development Ecosystem (HeAL-IDE) designed to be a long-lasting and potentially reproducible output of the project.
Contribution
We were able to demonstrate a successful model for building innovation and capacity in a sustainable way. Key outputs included: (i) a successful partnership model; (ii) a sustainable HeAL-IDE; (iii) research papers; (iv) a world-class software product and several demonstrators; and (v) highly trained staff.
Findings
Our main findings are that: (i) it is possible to create a local ecosystem for innovation and capacity building that creates value for the partners (a university and a private non-profit company); (ii) the ecosystem is able to create valuable outputs that would be much less likely to have been developed singly by each partner; and (iii) the ecosystem could serve as a powerful model for adoption in other settings.
Recommendations for Practitioners
Non-profit companies and non-governmental organizations implementing health information systems in South Africa and other low-resource settings have an opportunity to partner with local universities for purposes of internal capacity development and assisting with the research, reflection and innovation aspects of their projects and programmes.
Recommendations for Researchers
Applied health informatics researchers working in low-resource settings could productively partner with local implementing organizations in order to gain a better understanding of the challenges and requirements at field sites and to accelerate the testing and deployment of health information technology solutions.
Impact on Society
This research demonstrates a model that can deliver valuable software products for public health.
Future Research
It would be useful to implement the model in other settings and research whether the model is more generally useful.
@{252, author = {Deshen Moodley, Anban Pillay, Chris Seebregts}, title = {Establishing a Health Informatics Research Laboratory in South Africa}, abstract = {Aim/Purpose The aim of this project was to explore models for stimulating health informatics innovation and capacity development in South Africa. Background There is generally a critical lack of health informatics innovation and capacity in South Africa and sub-Saharan Africa. This is despite the wide anticipation that digital health systems will play a fundamental role in strengthening health systems and improving service delivery Methodology We established a program over four years to train Masters and Doctoral students and conducted research projects across a wide range of biomedical and health informatics technologies at a leading South African university. We also developed a Health Architecture Laboratory Innovation and Development Ecosystem (HeAL-IDE) designed to be a long-lasting and potentially reproducible output of the project. Contribution We were able to demonstrate a successful model for building innovation and capacity in a sustainable way. Key outputs included: (i)a successful partnership model; (ii) a sustainable HeAL-IDE; (iii) research papers; (iv) a world-class software product and several demonstrators; and (iv) highly trained staff. Findings Our main findings are that: (i) it is possible to create a local ecosystem for innovation and capacity building that creates value for the partners (a university and a private non-profit company); (ii) the ecosystem is able to create valuable outputs that would be much less likely to have been developed singly by each partner, and; (iii) the ecosystem could serve as a powerful model for adoption in other settings. Recommendations for Practitioners Non-profit companies and non-governmental organizations implementing health information systems in South Africa and other low resource settings have an opportunity to partner with local universities for purposes of internal capacity development and assisting with the research, reflection and innovation aspects of their projects and programmes. Recommendation for Researchers Applied health informatics researchers working in low resource settings could productively partner with local implementing organizations in order to gain a better understanding of the challenges and requirements at field sites and to accelerate the testing and deployment of health information technology solutions. Impact on Society This research demonstrates a model that can deliver valuable software products for public health. Future Research It would be useful to implement the model in other settings and research whether the model is more generally useful}, year = {2018}, journal = {Digital Re-imagination Colloquium 2018}, pages = {16 - 24}, month = {13/03 - 15/03}, publisher = {NEMISA}, isbn = {978-0-6399275-0-3}, url = {http://uir.unisa.ac.za/bitstream/handle/10500/25615/Digital%20Skills%20Proceedings%202018.pdf?sequence=1&isAllowed=y}, }
• Several different paradigms and standards exist for creating digital health architectures that are mostly complementary, but sometimes contradictory.
• The potential benefits of using EA approaches and tools are that they help to ensure the appropriate use of standards for interoperability and data storage and exchange, and encourage the creation of reusable software components and metadata.
@article{162, author = {Chris Seebregts, Anban Pillay, Ryan Crichton, S. Singh, Deshen Moodley}, title = {14 Enterprise Architectures for Digital Health}, abstract = {• Several different paradigms and standards exist for creating digital health architectures that are mostly complementary, but sometimes contradictory. • The potential benefits of using EA approaches and tools are that they help to ensure the appropriate use of standards for interoperability and data storage and exchange, and encourage the creation of reusable software components and metadata.}, year = {2017}, journal = {Global Health Informatics: Principles of eHealth and mHealth to Improve Quality of Care}, pages = {173-182}, publisher = {MIT Press}, isbn = {978-0262533201}, url = {https://books.google.co.za/books?id=8p-rDgAAQBAJ&pg=PA173&lpg=PA173&dq=14+Enterprise+Architectures+for+Digital+Health&source=bl&ots=i6SQzaXiPp&sig=zDLJ6lIqt3Xox3Lt5LNCuMkUoJ4&hl=en&sa=X&ved=0ahUKEwivtK6jxPDYAhVkL8AKHXbNDY0Q6AEINDAB#v=onepage&q=14%20Enterp}, }
Poor adherence to prescribed treatment is a major factor contributing to tuberculosis patients developing drug resistance and failing treatment. Treatment adherence behaviour is influenced by diverse personal, cultural and socio-economic factors that vary between regions and communities. Decision network models can potentially be used to predict treatment adherence behaviour. However, determining the network structure (identifying the factors and their causal relations) and the conditional probabilities is a challenging task. To resolve the former, we developed an ontology supported by current scientific literature to categorise and clarify the similarity and granularity of factors.
@{158, author = {Olukunle Ogundele, Deshen Moodley, Chris Seebregts, Anban Pillay}, title = {Building Semantic Causal Models to Predict Treatment Adherence for Tuberculosis Patients in Sub-Saharan Africa}, abstract = {Poor adherence to prescribed treatment is a major factor contributing to tuberculosis patients developing drug resistance and failing treatment. Treatment adherence behaviour is influenced by diverse personal, cultural and socio-economic factors that vary between regions and communities. Decision network models can potentially be used to predict treatment adherence behaviour. However, determining the network structure (identifying the factors and their causal relations) and the conditional probabilities is a challenging task. To resolve the former we developed an ontology supported by current scientific literature to categorise and clarify the similarity and granularity of factors}, year = {2014}, journal = {4th International Symposium (FHIES 2014) and 6th International Workshop (SEHC 2014)}, pages = {81-95}, month = {17/07-18/07}, }
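To make the decision-network idea above concrete, here is a minimal sketch of a small Bayesian network for adherence prediction. It is not the authors' model: the factors (SocialSupport, AlcoholUse), the probability tables and the use of the pgmpy library are illustrative assumptions; the paper's actual factors are derived from its literature-backed ontology.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Hypothetical adherence factors; all numbers below are made up for illustration.
model = BayesianNetwork([("SocialSupport", "Adherence"), ("AlcoholUse", "Adherence")])

cpd_support = TabularCPD("SocialSupport", 2, [[0.6], [0.4]])   # P(SocialSupport)
cpd_alcohol = TabularCPD("AlcoholUse", 2, [[0.7], [0.3]])      # P(AlcoholUse)
cpd_adherence = TabularCPD(                                    # P(Adherence | parents)
    "Adherence", 2,
    [[0.1, 0.4, 0.3, 0.7],   # P(Adherence=0 | each parent combination)
     [0.9, 0.6, 0.7, 0.3]],  # P(Adherence=1 | each parent combination)
    evidence=["SocialSupport", "AlcoholUse"], evidence_card=[2, 2])

model.add_cpds(cpd_support, cpd_alcohol, cpd_adherence)
assert model.check_model()

# Query the adherence distribution for a patient with good social support.
print(VariableElimination(model).query(["Adherence"], evidence={"SocialSupport": 0}))
```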
The development of drug resistance is a major factor impeding the efficacy of antiretroviral treatment of South Africa’s HIV infected population. While genotype resistance testing is the standard method to determine resistance, access to these tests is limited in low-resource settings. In this paper we investigate machine learning techniques for drug resistance prediction from routine treatment and laboratory data to help clinicians select patients for confirmatory genotype testing. The techniques, including binary relevance, HOMER, MLkNN, predictive clustering trees (PCT), RAkEL and ensemble of classifier chains were tested on a dataset of 252 medical records of patients enrolled in an HIV treatment failure clinic in rural KwaZulu-Natal in South Africa. The PCT method performed best with a discriminant power of 1.56 for two drugs, above 1.0 for three others and a mean true positive rate of 0.68. These methods show potential for application where access to genotyping is limited.
@{97, author = {Pascal Brandt, Deshen Moodley, Anban Pillay, Chris Seebregts, T. de Oliveira}, title = {An Investigation of Classification Algorithms for Predicting HIV Drug Resistance Without Genotype Resistance Testing}, abstract = {The development of drug resistance is a major factor impeding the efficacy of antiretroviral treatment of South Africa’s HIV infected population. While genotype resistance testing is the standard method to determine resistance, access to these tests is limited in low-resource settings. In this paper we investigate machine learning techniques for drug resistance prediction from routine treatment and laboratory data to help clinicians select patients for confirmatory genotype testing. The techniques, including binary relevance, HOMER, MLkNN, predictive clustering trees (PCT), RAkEL and ensemble of classifier chains were tested on a dataset of 252 medical records of patients enrolled in an HIV treatment failure clinic in rural KwaZulu-Natal in South Africa. The PCT method performed best with a discriminant power of 1.56 for two drugs, above 1.0 for three others and a mean true positive rate of 0.68. These methods show potential for application where access to genotyping is limited.}, year = {2014}, journal = {Third International Symposium on Foundations of Health Information Engineering and Systems}, pages = {236-253}, month = {21/08-23/08}, isbn = {978-3-642-53955-8}, url = {http://link.springer.com/chapter/10.1007/978-3-642-53956-5_16}, }
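As a rough illustration of the binary-relevance strategy mentioned in the abstract (one independent classifier per drug label), here is a minimal scikit-learn sketch. The random features and labels are placeholders standing in for the study's 252 clinical records, and the random-forest base learner is an arbitrary choice, not the paper's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(252, 10))         # placeholder for routine treatment/laboratory features
Y = rng.integers(0, 2, size=(252, 5))  # one binary resistance label per drug (multi-label target)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

# Binary relevance: fit one classifier per drug label, independently of the other labels.
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, Y_tr)
Y_pred = clf.predict(X_te)

for j in range(Y.shape[1]):
    print(f"drug {j}: accuracy {accuracy_score(Y_te[:, j], Y_pred[:, j]):.2f}")
```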
eHealth governance and regulation are necessary in low resource African countries to ensure effective and equitable use of health information technology and to realize national eHealth goals such as interoperability, adoption of standards and data integration. eHealth regulatory frameworks are under-developed in low resource settings, which hampers the progression towards coherent and effective national health information systems. Ontologies have the potential to clarify issues around interoperability and the effectiveness of different standards to deal with different aspects of interoperability. Ontologies can facilitate drafting, reusing, implementing and compliance testing of eHealth regulations. In this regard, we have developed an OWL ontology to capture key concepts and relations concerning interoperability and standards. The ontology includes an operational definition for interoperability and is an initial step towards the development of a knowledge representation modeling platform for eHealth regulation and governance.
@{92, author = {Deshen Moodley, Chris Seebregts, Anban Pillay, Tommie Meyer}, title = {An Ontology for Regulating eHealth Interoperability in Developing African Countries}, abstract = {eHealth governance and regulation are necessary in low resource African countries to ensure effective and equitable use of health information technology and to realize national eHealth goals such as interoperability, adoption of standards and data integration. eHealth regulatory frameworks are under-developed in low resource settings, which hampers the progression towards coherent and effective national health information systems. Ontologies have the potential to clarify issues around interoperability and the effectiveness of different standards to deal with different aspects of interoperability. Ontologies can facilitate drafting, reusing, implementing and compliance testing of eHealth regulations. In this regard, we have developed an OWL ontology to capture key concepts and relations concerning interoperability and standards. The ontology includes an operational definition for interoperability and is an initial step towards the development of a knowledge representation modeling platform for eHealth regulation and governance.}, year = {2014}, journal = {Foundations of Health Information Engineering and Systems, Revised and Selected Papers, Lecture Notes in Computer Science Volume 7789}, pages = {107-124}, month = {15/09}, isbn = {978-3-642-53955-8}, }
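A toy example of the kind of OWL modelling described above, sketched here with rdflib rather than reproducing the actual ontology; the namespace, class names, property and individuals are hypothetical stand-ins for the concepts the paper identifies (standards and the interoperability aspects they address).

```python
from rdflib import Graph, Namespace, OWL, RDF, RDFS

EHR = Namespace("http://example.org/ehealth-interop#")  # hypothetical namespace
g = Graph()
g.bind("ehr", EHR)

# Core classes: standards, the interoperability aspects they address, and regulations.
for cls in (EHR.Standard, EHR.InteroperabilityAspect, EHR.Regulation):
    g.add((cls, RDF.type, OWL.Class))

# Object property linking a standard to the interoperability aspect it supports.
g.add((EHR.addressesAspect, RDF.type, OWL.ObjectProperty))
g.add((EHR.addressesAspect, RDFS.domain, EHR.Standard))
g.add((EHR.addressesAspect, RDFS.range, EHR.InteroperabilityAspect))

# Example individuals: an HL7 messaging standard addressing semantic interoperability.
g.add((EHR.HL7, RDF.type, EHR.Standard))
g.add((EHR.SemanticInteroperability, RDF.type, EHR.InteroperabilityAspect))
g.add((EHR.HL7, EHR.addressesAspect, EHR.SemanticInteroperability))

print(g.serialize(format="turtle"))
```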
FULL DETAIL OF TALKS:
1) Segun, S.T. (4 September 2019). “Making Afro-ethics Computational”. Paper presented at the 2019 SAHUDA Conference, themed ‘Time, Thought, Materiality: Africa and the Fourth Industrial Revolution’, University of Johannesburg, Johannesburg, South Africa;
2) Segun, S.T. (9 November 2018). “Conditions for an Explicit Ethical Agent”. Paper presented at the Forum for Artificial Intelligence Research (FAIR 2018). Hermanus, South Africa;
3) Segun, S.T. (26 October 2018). “Teaching Machines to Be Moral”. Paper presented as part of the series of academic public lectures tagged University of Johannesburg Talks. Johannesburg, South Africa;
4) Segun, S.T. (20 October 2018). “Rethinking Privilege and Otherness”. Paper presented at the Postgraduate Philosophical Society Conference. University of Pretoria, South Africa;
5) Segun, S.T. (11 May 2018) “Constructing an Afro-ethical Framework for Autonomous Intelligence Systems”. Paper presented at the AI for Social Good Working Group. University of Pretoria, South Africa;
6) Segun, S.T. (14 August 2017). “Is There a Language of Philosophy?” Paper presented at the Contemporary Language, Logic, and Metaphysics Conference (CLLMC). University of Witwatersrand, South Africa.
FULL DETAIL OF ANY PUBLICATION:
BOOK CHAPTER:
1) Segun, S. T. (2019). Neurophilosophy and the Problem of Consciousness: An Equiphenomenalist Perspective. In J.O. Chimakonam, U. Egbai, S.T. Segun and A.D. Attoe (Eds.), New Conversations on the Problems of Identity, Consciousness and Mind (pp. 33-65). Springer, Cham.
PEER-REVIEWED/REFEREED JOURNAL:
1) Segun, S.T. (2020). From Machine Ethics to Computational Ethics. AI & Society: Knowledge, Culture and Communication. Karamjit S. Gill (ed.). Springer [Scopus Accredited].
IN PROGRESS:
1) Segun, S.T. (2020). Computational Possibilities of Non-classical Ethics.
LINKS TO RESEARCH:
ResearchGate: https://www.researchgate.net/profile/Samuel_Segun2;
Google Scholar: https://scholar.google.com/citations?user=p33vnrcAAAAJ&hl=en;
Academia.edu: https://johannesburg.academia.edu/samuelsegun;
ORCID: https://orcid.org/0000-0002-1017-0906.
Latest Research Publications:
When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.
@article{484, author = {Arthur Venter, Marthinus Theunissen, Marelie Davel}, title = {Pre-interpolation loss behaviour in neural networks}, abstract = {When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.}, year = {2020}, journal = {Communications in Computer and Information Science}, volume = {1342}, pages = {296-309}, publisher = {Southern African Conference for Artificial Intelligence Research}, address = {South Africa}, isbn = {978-3-030-66151-9}, doi = {https://doi.org/10.1007/978-3-030-66151-9_19}, }
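The analysis above hinges on examining test loss per sample rather than only on average. A minimal PyTorch sketch of that measurement (not the authors' code; model and loader names are assumptions) could look like this:

```python
import torch
import torch.nn.functional as F

def per_sample_losses(model, loader, device="cpu"):
    """Collect the unreduced cross-entropy loss for every test sample."""
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            # reduction="none" keeps one loss value per sample instead of the batch mean
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.cat(losses)

# Usage sketch: compare per-sample loss distributions at two training checkpoints.
# losses_early = per_sample_losses(model_early, test_loader)
# losses_late = per_sample_losses(model_late, test_loader)
# print((losses_late > losses_early).float().mean())           # fraction of samples whose loss grew
# print(losses_late.quantile(torch.tensor([0.5, 0.99])))       # median versus extreme tail
```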
The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance trade off in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
@article{394, author = {Marthinus Theunissen, Marelie Davel, Etienne Barnard}, title = {Benign interpolation of noise in deep learning}, abstract = {The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance trade off in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.}, year = {2020}, journal = {South African Computer Journal}, volume = {32}, pages = {80-101}, issue = {2}, publisher = {South African Institute of Computer Scientists and Information Technologists}, isbn = {ISSN: 1015-7999; E:2313-7835}, doi = {https://doi.org/10.18489/sacj.v32i2.833}, }
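A simple way to probe the claim that models fit uncorrupted samples first is to corrupt a known fraction of training labels and track accuracy on the clean and corrupted subsets separately over training. The helpers below are a hedged sketch of such an experiment, not the paper's implementation; function names and defaults are illustrative.

```python
import numpy as np
import torch

def corrupt_labels(y, frac, num_classes, seed=0):
    """Randomly reassign a fraction of integer labels; returns noisy labels and a noise mask."""
    rng = np.random.default_rng(seed)
    y = y.clone()
    idx = torch.from_numpy(rng.choice(len(y), size=int(frac * len(y)), replace=False))
    y[idx] = torch.from_numpy(rng.integers(0, num_classes, size=len(idx)))
    mask = torch.zeros(len(y), dtype=torch.bool)
    mask[idx] = True
    return y, mask

def clean_vs_noisy_accuracy(model, x, y, mask):
    """Training accuracy measured separately on uncorrupted and corrupted samples."""
    with torch.no_grad():
        pred = model(x).argmax(dim=1)
    return ((pred[~mask] == y[~mask]).float().mean().item(),  # clean subset
            (pred[mask] == y[mask]).float().mean().item())    # corrupted subset
```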
A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.
@{236, author = {Marelie Davel, Marthinus Theunissen, Arnold Pretorius, Etienne Barnard}, title = {DNNs as layers of cooperating classifiers}, abstract = {A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.}, year = {2020}, journal = {The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)}, pages = {3725 - 3732}, month = {07/02-12/02/2020}, address = {New York}, }
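The "discrete" system referred to in the abstract is carried by the on/off activation pattern of the hidden ReLU nodes. The sketch below (an illustration under our own assumptions, not the paper's code) shows one way to extract those binary patterns from a small fully connected network; samples sharing a pattern are handled by the same linear region of the network.

```python
import torch
from torch import nn

class MLP(nn.Module):
    """Small fully connected ReLU network used to illustrate hidden activation patterns."""
    def __init__(self, d_in=784, d_hidden=128, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        return self.out(h2), (h1, h2)   # return hidden activations alongside the logits

def activation_patterns(model, x):
    """Binary on/off pattern of each hidden node: the 'discrete' system's view of a batch."""
    with torch.no_grad():
        _, hidden = model(x)
    return [(h > 0) for h in hidden]    # one boolean pattern tensor per hidden layer

# Usage sketch:
# model = MLP()
# patterns = activation_patterns(model, torch.randn(32, 784))
```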
The understanding of generalization in machine learning is in a state of flux. This is partly due to the relatively recent revelation that deep learning models are able to completely memorize training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about generalization. The phenomenon was brought to light and discussed in a seminal paper by Zhang et al. [24]. We expand upon this work by discussing local attributes of neural network training within the context of a relatively simple and generalizable framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the global deep learning model to generalize in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterized multilayer perceptrons and controlled noise in the training data. The main insights are that deep learning models are optimized for training data modularly, with different regions in the function space dedicated to fitting distinct kinds of sample information. Detrimental overfitting is largely prevented by the fact that different regions in the function space are used for prediction based on the similarity between new input data and that which has been optimized for.
@{284, author = {Marthinus Theunissen, Marelie Davel, Etienne Barnard}, title = {Insights regarding overfitting on noise in deep learning}, abstract = {The understanding of generalization in machine learning is in a state of flux. This is partly due to the relatively recent revelation that deep learning models are able to completely memorize training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about generalization. The phenomenon was brought to light and discussed in a seminal paper by Zhang et al. [24]. We expand upon this work by discussing local attributes of neural network training within the context of a relatively simple and generalizable framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the global deep learning model to generalize in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterized multilayer perceptrons and controlled noise in the training data. The main insights are that deep learning models are optimized for training data modularly, with different regions in the function space dedicated to fitting distinct kinds of sample information. Detrimental overfitting is largely prevented by the fact that different regions in the function space are used for prediction based on the similarity between new input data and that which has been optimized for.}, year = {2019}, journal = {South African Forum for Artificial Intelligence Research (FAIR)}, pages = {49-63}, address = {Cape Town, South Africa}, }