Algorithmic Bias: From Discrimination Discovery to Fairness-Aware Data Mining

A Tutorial at KDD'16

Welcome to the mini-website of the tutorial "Algorithmic Bias: From Discrimination Discovery to Fairness-Aware Data Mining", which will take place at KDD'16 in San Francisco, California.

Abstract

Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily (offline and online) lives, as they have become essential tools in personal finance, health care, hiring, housing, education, and policy making. Data and algorithms determine the media we consume, the stories we read, the people we meet, and the places we visit, but also whether we get a job or whether our loan request is approved. It is therefore of societal and ethical importance to ask whether these algorithms can discriminate on grounds such as gender, ethnicity, marital status, or health status. The answer is yes: for instance, recent studies have shown that Google's online advertising system displayed ads for high-income jobs to men much more often than to women, and that ads for arrest records were significantly more likely to show up on searches for distinctively black names or for a historically black fraternity.

Algorithmic bias can exist even when the developer of the algorithm has no intention to discriminate. Sometimes it is inherent to the data sources used: software that makes decisions based on data can reflect, or even amplify, the results of historical discrimination. Moreover, even when the sensitive attributes have been removed from the input, a well-trained machine learning algorithm may still discriminate on the basis of those attributes because of correlations existing in the data. One approach is to develop data mining systems that are discrimination-conscious by design. This is a novel and challenging research area for the data mining community.
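
A minimal sketch of the correlation effect described above, using a tiny hand-made dataset whose values are purely hypothetical. The decision rule never sees the sensitive attribute (`group`), yet a correlated feature (`neighborhood`) acts as a proxy for it. The gap between the groups' positive-decision rates (the risk difference, one of the measures used in discrimination discovery) is non-zero even though incomes are distributed identically in both groups. All names and thresholds are illustrative assumptions.

```python
# Hypothetical toy data: (group, neighborhood, income in thousands).
# Incomes are identically distributed across groups; only neighborhood differs.
applicants = [
    ("A", 1, 45), ("A", 1, 42), ("A", 1, 39), ("A", 2, 45),
    ("B", 2, 45), ("B", 2, 42), ("B", 2, 39), ("B", 1, 45),
]


def decide(neighborhood: int, income: int) -> bool:
    """Toy decision rule: the sensitive attribute is NOT an input,
    but applicants from neighborhood 2 face a higher income threshold."""
    threshold = 40 if neighborhood == 1 else 44
    return income >= threshold


def positive_rate(group: str) -> float:
    """Fraction of applicants in `group` who receive a positive decision."""
    outcomes = [decide(n, inc) for g, n, inc in applicants if g == group]
    return sum(outcomes) / len(outcomes)


rate_a = positive_rate("A")
rate_b = positive_rate("B")
# Risk difference: gap in positive-decision rates between the two groups.
risk_difference = rate_a - rate_b
print(f"rate A={rate_a:.2f}, rate B={rate_b:.2f}, risk difference={risk_difference:.2f}")
```

With these toy values the script prints `rate A=0.75, rate B=0.50, risk difference=0.25`. Dropping `group` from the input did not remove the disparity, because `neighborhood` carries much of the same information.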

The aim of this tutorial is to survey the different aspects of the algorithmic bias problem, presenting its most common variants, with an emphasis on the algorithmic techniques and key ideas developed to derive efficient solutions. The tutorial will cover two main complementary approaches: algorithms for discrimination discovery, and discrimination prevention by means of fairness-aware data mining. We will conclude by summarizing the most promising paths for future research.
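
As one concrete illustration of the discrimination-prevention side, here is a minimal sketch of reweighing, a preprocessing technique from the fairness-aware data mining literature (Kamiran and Calders). It is offered as an example of the general idea, not as the specific method presented in the tutorial; the dataset and function names are hypothetical. Each (group, label) cell receives the weight P(group) · P(label) / P(group, label), which makes group and label statistically independent in the weighted training data. Exact fractions are used so the result can be checked by hand.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy training data: (group, label), label 1 = favorable outcome.
data = [("A", 1)] * 6 + [("A", 0)] * 2 + [("B", 1)] * 2 + [("B", 0)] * 6


def reweigh(rows):
    """Weight each observed (group, label) cell by
    P(group) * P(label) / P(group, label) = n_g * n_y / (n * n_gy)."""
    n = len(rows)
    group_counts = Counter(g for g, _ in rows)
    label_counts = Counter(y for _, y in rows)
    cell_counts = Counter(rows)
    return {
        (g, y): Fraction(group_counts[g] * label_counts[y], n * cell_counts[(g, y)])
        for (g, y) in cell_counts
    }


def weighted_positive_rate(rows, weights, group):
    """Share of the weighted mass in `group` that carries the favorable label."""
    total = sum(weights[(g, y)] for g, y in rows if g == group)
    positive = sum(weights[(g, y)] for g, y in rows if g == group and y == 1)
    return positive / total


weights = reweigh(data)
print(weighted_positive_rate(data, weights, "A"), weighted_positive_rate(data, weights, "B"))
```

Here the unweighted favorable rates are 3/4 for group A and 1/4 for group B; after reweighing, both weighted rates equal the overall rate of 1/2, so the script prints `1/2 1/2`. A classifier trained on the weighted data no longer sees the group-label dependence, although, as noted above, correlated proxy features may still need attention.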

More details on the material we will cover are available in the 7-page outline.

Videos

Part I: Introduction and Context

Part II: Discrimination Discovery

Part III: Fairness-Aware Data Mining, and Part IV: Challenges and Directions for Future Research

Slides

Slides of Parts I and II

Slides of Parts III and IV

Instructors

Sara Hajian

Sara Hajian is a research scientist at Eurecat Technology Center, Barcelona, Spain. She received her Ph.D. degree from the Computer Engineering and Maths Department of the Universitat Rovira i Virgili (URV) in June 2013, and her M.Sc. degree in Computer Science from Iran University of Science and Technology (IUST) in 2008. She was also a member of APA-IUTcert, an academic research and development center in the area of Network Security Vulnerabilities and Incident Handling (2008-2010). Her research interests are data mining methods and algorithms, social media and social network analysis, privacy-preserving data mining and publishing, and algorithmic bias (discovery and prevention of discrimination). She was a visiting student at the Knowledge Discovery and Data Mining Laboratory (KDD-Lab), a joint research group of the Information Science and Technology Institute of the Italian National Research Council (CNR) in Pisa and the Computer Science Department of the University of Pisa (2011), and a visiting scientist at Yahoo! Labs in Barcelona (2013-2014).

Francesco Bonchi

Francesco Bonchi is Research Leader at the ISI Foundation, Turin, Italy, where he leads the "Algorithmic Data Analytics" group. Previously, he was Director of Research at Yahoo Labs in Barcelona, where he led the Web Mining Research group. He will be PC Chair of the 16th IEEE International Conference on Data Mining (ICDM 2016), to be held in Barcelona in December 2016. He is a member of the ECML PKDD Steering Committee; Associate Editor of the newly created IEEE Transactions on Big Data (TBD), of the IEEE Transactions on Knowledge and Data Engineering (TKDE), of the ACM Transactions on Intelligent Systems and Technology (TIST), and of Knowledge and Information Systems (KAIS); and a member of the Editorial Board of Data Mining and Knowledge Discovery (DMKD). He was program co-chair of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2010). Dr. Bonchi has also served as program co-chair of the first and second ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD (PinKDD 2007 and 2008), the 1st IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2006), and the 4th International Workshop on Knowledge Discovery in Inductive Databases (KDID 2005). He is co-editor of the book "Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques" (Chapman & Hall/CRC Press).

Twitter: @FrancescoBonchi

Carlos Castillo

Carlos Castillo is the Director of Research for Data Mining at Eurecat. His current research focuses on social computing, particularly the application of web mining methods to social media during disasters and humanitarian crises. Carlos is an active researcher with more than 70 papers in top-tier international conferences and journals; his publications also include an upcoming book on Big Crisis Data, a book on Information and Influence Propagation, and a monograph on Adversarial Web Search. Carlos received his Ph.D. from the University of Chile, and was a senior scientist at Yahoo! Research and a principal scientist at Qatar Computing Research Institute. He has served on the PC or SPC of all major conferences in his area (WWW, WSDM, KDD, SIGIR, CIKM) and is on the editorial committees of ACM Transactions on the Web and of Internet Research. He is Program Committee Co-Chair of ACM Digital Health 2016. He was Program Committee Co-Chair of WSDM 2014, and co-organized the Adversarial Information Retrieval Workshop and Web Spam Challenge in 2007 and 2008, the ECML/PKDD Discovery Challenge in 2010 and 2014, the Web Quality Workshop from 2011 to 2014, and the Social Web for Disaster Management Workshop in 2015.

Twitter: @ChaToX