


Algorithmic Fairness: From Traditional ML to Foundation Models

Abstract: The emergence of foundation models presents significant opportunities for advancing AI, yet it also brings forth considerable challenges, particularly in relation to existing risks and inequalities. In this talk, we focus on the complexities surrounding algorithmic fairness in the context of foundation models. We argue that disparities faced by marginalized communities – encompassing issues of performance, representation, privacy, robustness, interpretability, and safety – are not isolated concerns but interconnected elements contributing to a cascade of disparities. Through a comparative analysis with traditional ML models, we highlight the potential for exacerbating disparities against marginalized groups. We conclude the talk with future directions towards responsible development of AI.

Golnoosh Farnadi (McGill University, Canada) is an Assistant Professor at McGill University’s School of Computer Science, an Adjunct Professor at Université de Montréal, and a visiting researcher at Google. She’s a key member of MILA, holds a Canada CIFAR AI Chair, co-directs McGill’s McCAIS, and leads the EQUAL lab at Mila/McGill, focusing on algorithmic fairness and responsible AI. Previously, she was at HEC Montréal and held postdoctoral positions at Université de Montréal/MILA and the University of California, Santa Cruz. Golnoosh completed her Ph.D. at KU Leuven and Ghent University in 2017, with research visits to several prestigious institutions. She has received awards from Google and Facebook, was recognized as a Rising Star in AI Ethics, and in 2023 won a Google inclusion research award and was acknowledged as a leading woman in AI Ethics. More information is available at: Link to Website

Responsibility and Rigor in AI Research and Practice

Abstract: The Responsible AI (RAI) field has seen rapid growth over the last decade, primarily driven by growing concerns about how computational systems can exacerbate, replicate, or give rise to harms. To address these concerns, a core goal of RAI research has been incentivizing and supporting more rigor in AI research and practice (e.g., better documentation, better evaluation practices, better development practices, better understanding of impacts, clarifying problems and terminology). In this talk, I will give an overview of foundational concepts, problem formulations, and challenges in evaluating AI systems for both performance and the possible harms they might give rise to.

Alexandra Olteanu (Microsoft Research, Canada) is a researcher in computational social science and social computing, currently serving as a Principal Researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) Group. Before joining the FATE group, she was a Social Good Fellow at the IBM T.J. Watson Research Center in New York. Her research focuses on the impact of data biases and methodological limitations on our understanding of online social traces, aiming to develop safer, fairer, and less biased systems. She addresses societal challenges like hate speech, racial discrimination, climate change, and disaster relief. Her work has earned her two best paper awards (WISE 2014 and Eurosys’ SNS workshop 2012) and has been highlighted in the UN OCHA’s “World Humanitarian Data and Trends” as well as in media outlets such as The Washington Post, VentureBeat, and ZDNet.

Alexandra has contributed to the program committees of major social media and web conferences, including ICWSM, WWW, WebSci, CIKM, and SIGIR. She also serves on the steering committee of the ACM Conference on Fairness, Accountability, and Transparency (FAT*) and has been the Tutorial Co-chair for ICWSM 2018 and FAT* 2019. She earned her PhD from École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland in 2016, and her professional experience spans academic institutions and research labs in five different countries. More information is available at: Link to Website

Mutation for Assessing and Optimising AI Systems

Abstract: Traditional Mutation Testing in software testing applies a fixed set of mutation operators to generate mutants for the purpose of test assessment. However, the power of mutants extends significantly beyond mere test evaluation. In this talk, I will share my experiences in exploring the power of mutants in testing and improving AI systems, such as using mutants for improving machine translation, validating ML classifiers, as well as benchmarking and improving AI fairness.
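As a rough illustration of the classical setting the abstract starts from (a fixed mutation operator used to assess a test suite), the sketch below may help. It is not taken from the talk: the `grade` function, the single `>=` to `>` operator, and both test suites are hypothetical examples chosen to show how a mutation score separates a weak suite from a stronger one.

```python
import ast

# Hypothetical program under test (not from the talk).
SOURCE = """
def grade(score):
    if score >= 90:
        return "A"
    if score >= 50:
        return "pass"
    return "fail"
"""

# Assumed mutation operator: replace `>=` with `>` (a boundary mutation).
SWAPS = {ast.GtE: ast.Gt}


class SwapOneComparison(ast.NodeTransformer):
    """Mutate only the eligible comparison whose index equals `target`."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        op_type = type(node.ops[0])
        if op_type in SWAPS:
            if self.seen == self.target:
                node.ops[0] = SWAPS[op_type]()
            self.seen += 1
        return node


def build(tree):
    """Compile an AST and return the `grade` function it defines."""
    namespace = {}
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["grade"]


def count_sites(source):
    """Number of comparisons the operator can be applied to."""
    return sum(
        isinstance(n, ast.Compare) and type(n.ops[0]) in SWAPS
        for n in ast.walk(ast.parse(source))
    )


def passes(func, tests):
    return all(func(*args) == expected for args, expected in tests)


def mutation_score(source, tests):
    """Return (killed mutants, total mutants) for one mutant per site."""
    sites = count_sites(source)
    killed = 0
    for target in range(sites):
        mutant = build(SwapOneComparison(target).visit(ast.parse(source)))
        if not passes(mutant, tests):
            killed += 1
    return killed, sites


# A suite that avoids the boundaries, and one that checks them.
WEAK = [((95,), "A"), ((70,), "pass"), ((10,), "fail")]
STRONG = WEAK + [((90,), "A"), ((50,), "pass")]

if __name__ == "__main__":
    print(mutation_score(SOURCE, WEAK))
    print(mutation_score(SOURCE, STRONG))
```

Both suites pass on the original program, but only the suite with boundary inputs detects the mutants, which is the kind of test assessment the abstract calls the traditional use of mutation.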

Jie M. Zhang (King’s College London, England) is a lecturer (assistant professor) of computer science at King’s College London, UK. Before joining King’s she was a Research Fellow at University College London and a research consultant for Meta. She received her PhD from Peking University in 2018. Her main research interests are software testing, software engineering and AI/LLMs, and AI trustworthiness. She has published many papers in top-tier venues including ICLR, ICSE, FSE, ASE, ISSTA, TSE, and TOSEM. She is a steering committee member of ICST and AIware. She is a Program co-chair of AIware 2024, Internetware 2024, ASE 2023 NIER track, SANER 2023 Journal-First Track, PRDC 2023 Fast Abstract Track, SBST 2021, Mutation 2021&2020, and ASE 2019 Student Research Competition. Over the last three years, she has been invited to give over 20 talks at conferences, universities, and IT companies, including four keynote talks. She has also been invited as a panelist for several seminars on large language models. She has been selected as one of the top fifteen 2023 Global Chinese Female Young Scholars in interdisciplinary AI. Her research has won the 2022 Transactions on Software Engineering Best Paper award and the ICLR 2022 spotlight paper award. More information is available at: Link to Website

Towards Crystal-clear Code LLMs: StarCoder2 and Magicoder

Abstract: Proprietary Large Language Models (LLMs), such as GPT-4 and Claude-3, have shown strong performance in various programming tasks. To circumvent the privacy and cost concerns associated with using proprietary LLMs, a number of open-source code LLMs (such as Code Llama and DeepSeek Coder) have also been developed and gained widespread adoption. However, most such code LLMs have open-sourced only the model weights, while the datasets/pipelines used for pre-training or instruction tuning remain undisclosed. This opacity can significantly impede the advancement of code LLMs, affecting model replication, enhancement, interpretation, decontamination, quality assurance, and other critical areas. In this presentation, I will talk about our recent work on fully transparent pre-training and instruction tuning of code LLMs (including StarCoder2 and Magicoder), with every bit of our technical details, model weights, and training datasets publicly available. Lastly, I will also briefly discuss the promising future of leveraging LLMs to address important and challenging software engineering problems.

Lingming Zhang (University of Illinois at Urbana-Champaign, USA) is an Associate Professor in the Department of Computer Science at the University of Illinois Urbana-Champaign. His main research interests lie in Software Engineering and Programming Languages, as well as their synergy with Machine Learning, aiming to improve developer productivity and software quality. For example, his work has found over 1000 new bugs/vulnerabilities in critical software systems, including deep learning libraries/systems, C/C++ compilers, Java virtual machines, operating systems, and quantum computing systems. He is the recipient of the ACM SIGSOFT Early Career Researcher Award, the NSF CAREER Award, and the UIUC Dean’s Award for Excellence in Research. Additionally, he has received multiple distinguished/best paper awards and various awards/grants from leading tech companies such as Alibaba, Amazon, Google, Kwai Inc., Meta, NVIDIA, and Samsung. He currently serves as program co-chair for ASE 2025 and LLM4Code 2024, and associate chair for OOPSLA 2024. More information is available at: Link to Website

AIware: Re-thinking software and software engineering in the foundation model era

Abstract: “Software for all and by all” is the future of humanity. AIware, i.e., AI-powered software, is democratizing software creation. We must reimagine software and software engineering (SE), enabling individuals of all backgrounds to participate in its creation with higher reliability and quality while leveraging Foundation models (FMs), such as Large Language Models (LLMs).

The unique properties of FM applications (FMware), like prompts and agents, coupled with the intrinsic limitations of FMs (e.g., hallucination), bring a completely new set of software engineering challenges. In this talk I will discuss several key challenges that have caused enterprise FMware development to be unproductive, costly, and risky. I will also outline avenues of future research and innovation. More info at: https://arxiv.org/abs/2402.15943 and https://arxiv.org/abs/2404.10225

Ahmed E. Hassan (Queen’s, Canada) is an ACM Fellow, IEEE Fellow, and an NSERC Steacie Fellow. He is a laureate of the Mustafa Prize, a distinction equivalent in prestige to a Nobel, for founding the AI-augmented SE field and its associated Mining Software Repositories (MSR) conference. He is a Distinguished Educator of IEEE TCSE and an Influential Educator of ACM SIGSOFT. He is a recipient of the inaugural New Directions IEEE TCSE Award for his outstanding and sustained efforts in advancing the AI-augmented SE field. He is the Canada Research Chair (CRC) in Software Analytics, and the NSERC/BlackBerry Software Engineering Chair at the School of Computing at Queen’s University, Canada. More information is available at: Link to Website

Bimodality in Software

Abstract: Every day, there’s a new LLM, at a bigger size scale, with a new training method, and potentially high-water-mark SOTA performance on an ever-growing set of coding tasks. While innovations pour forth, many of the underlying ideas seem rather isolated, and sometimes even a bit arbitrary. But are there any general principles on how to use LLMs? We argue for one, peculiar to software-related LMs, arising from the Bimodality of Software, wherein software is both formal (with well-defined computational semantics) and natural (written in repetitive ways that are easy for humans to read and write). We show how bimodality is an appealing yet powerful principle that has remained durably useful over 3 generations of language models and different training approaches, including current practices of in-context learning.

Prem Devanbu (UC Davis, USA) is Distinguished Professor of Computer Science at UC Davis. He received his B.Tech from IIT Madras, and his Ph.D from Rutgers University under Alex Borgida. After a career in Industrial R&D at Bell Labs in New Jersey, he joined UC Davis. He has worked in several areas, including Software tools, Secure Data Outsourcing, Empirical Software Engineering, and the Naturalness of Software, and applications thereof. Five of his papers (in MSR 2006, MSR 2009, ESEC/FSE 2008, ESEC/FSE 2009, ESEC/FSE 2011) have won “test-of-time” or “10 year most influential paper” awards. He also won the ACM SIGSOFT Outstanding Research Award in 2021, “for profoundly changing the way researchers think about software by exploring connections between source code and natural language”. He is an ACM Fellow. More information is available at: Link to Website

Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View

Abstract: Neural Code Models (NCMs) are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions or whether practitioners trust NCMs’ outcomes. In this talk, I will introduce doCode, a post hoc interpretability framework specific to NCMs that can explain model predictions. doCode is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of doCode are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. doCode can generate causal explanations based on Abstract Syntax Tree information and software engineering-based interventions. To demonstrate the practical benefit of doCode, I will present empirical results of using doCode for detecting confounding bias in NCMs.

Denys Poshyvanyk (William & Mary, USA) is a Chancellor Professor and a Graduate Director in the Computer Science Department at William & Mary. He currently serves as a Guest Editor-in-Chief of the AI-SE Continuous Special Section at the ACM Transactions on Software Engineering and Methodology (TOSEM) and a Program Co-Chair for FSE’25. He is a recipient of multiple ACM SIGSOFT Distinguished Paper awards and the NSF CAREER award (2013). He is an IEEE Fellow and an ACM Distinguished Member. More information is available at: Link to Website

Tackling NL-to-SQL's End-to-End Cross-Cutting Issues: Strategic Preprocessing and Optimization

Abstract: For LLM-powered analytics, such as NL-to-SQL, instability arises as models are required to make large reasoning leaps. Such leaps are often required, as examples either fail to fully capture the complexity of the request or inadequately cover the breadth of the problem space. To address complexity, we propose an approach centered around the pre-processing of demonstrations into decompositions. To address coverage, we propose an optimization framework that aims to identify and fill gaps in the problem space. This presentation will explore how leveraging these approaches enhances the accuracy, efficiency, and scalability of LLM-powered analytical solutions, ensuring their broad applicability with reduced manual intervention. Throughout the talk, I will detail the hardness of the problem, the expectations of robustness by our customers, where current models fail, and highlight a specific approach we take for one of the analytic tasks, namely NL-to-SQL.

Karime Maamari (Distyl AI, USA) is a Research Engineer at Distyl AI, where he specializes in building LLM-powered structured querying and optimization systems for Fortune 500 customers. Prior to that, Karime was a Research Associate at Argonne National Laboratory, where he worked on synthetic data generation, and a Simulation Engineer at the NASA Langley Research Center, where he worked on simulation development and analysis. Karime holds a B.S. in Physics from the University of Southern California.

Efficacy, Efficiency, and Security of Code LLMs: Promises and Perils

Abstract: For decades, researchers have explored methods to automate software engineering (ASE) tasks. In recent years, we have been excited about the potential of code Large Language Models (code LLMs) for ASE tasks. This talk will discuss recent work done at the Center for Research in Intelligent Software Engineering (RISE) on code LLMs. Specifically, it will briefly introduce ‘VulMaster,’ which doubles the efficacy of vulnerability repair—a crucial and challenging task for software engineers—through LLM collaboration and data-centric innovations. It will also discuss ‘Avatar,’ designed to improve the efficiency of code LLMs, reducing the model size by 160× and significantly decreasing energy consumption (up to 184× less), carbon footprint (up to 157× less), and inference latency (up to 76× faster), with only a negligible loss in effectiveness (1.67% on average). Lastly, the talk will present ‘AFRAIDOOR,’ a stealthy backdoor attack on code models that can achieve a 7x higher success rate than state-of-the-art approaches after defense. The talk will conclude with a brief description of future work and open challenges that require effective industry-academia collaboration.

David Lo (Singapore Management University, Singapore) is the OUB Chair Professor of Computer Science and Director of the Center for Research in Intelligent Software Engineering (RISE) at Singapore Management University. Championing the area of AI for Software Engineering (AI4SE) since the mid-2000s, he has demonstrated how AI (encompassing data mining, machine learning, information retrieval, natural language processing, and search-based algorithms) can transform software engineering data into actionable insights and automation. Through surveys and interviews, he has identified practitioners’ pain points and explored the acceptance thresholds for AI-powered tools, effectively performing requirements engineering for AI4SE research. His contributions have led to over 20 awards, including two Test-of-Time awards and ten ACM/IEEE Distinguished Paper awards, accumulating more than 30k citations. An ACM Fellow, IEEE Fellow, ASE Fellow, and National Research Foundation Investigator (Senior Fellow), Lo has also served as a PC Co-Chair for ASE’20, FSE’24, and ICSE’25. More information is available at: Link to Website

Tutorial: Debugging Deep Reinforcement Learning

Abstract:
The DRLDebugger is a tool designed to help Deep Reinforcement Learning (DRL) programmers identify and fix bugs in their code. The tool uses heuristics, rules, and data to identify potential issues during training. This tutorial is an initiation into DRL, during which we’ll try to debug a DRL algorithm containing real bugs. Don’t forget to bring your laptop and your questions.

https://github.com/ahmedhajyahmed/SEMLA-Tutorial-Debugging-Deep-Reinforcement-Learning

Ahmed Haj Yahmed is currently a Master’s student in software engineering at Polytechnique Montreal, Canada, and a member of the SWAT Lab, Department of Computer and Software Engineering. He is also a Mila student and a SE4AI CREATE trainee, a Training Program on the Development, Deployment, and Servicing of Artificial Intelligence-based Software Systems. He earned his Engineer Degree in Software Engineering majoring in Data Science with honor in 2021 from the National Institute of Applied Science and Technology, affiliated with the University of Carthage, Tunisia. His research is primarily focused on Quality Assurance of Deep Learning-based Software Systems, Software Engineering for ML/AI, and dependable and trustworthy ML/AI.

Rached Bouchoucha is currently pursuing his master’s degree at Polytechnique Montreal, while also being a MILA student. He is a member of the SWAT lab and works under the supervision of Professor Foutse Khomh. He holds a Software Engineering degree with a specialization in data science from the National Institute of Applied Science and Technology. Additionally, he has obtained a master’s degree in Informatics from the National Engineering School of Tunis. His current research revolves around the development of advanced tools for quality assurance in Deep Learning, with a specific emphasis on Deep Reinforcement Learning.

Leveraging LLMs as Analogical Reasoning Engines to enhance Programming-by-Example experiences

Abstract: GPT4 excels with its emergent capability of analogical reasoning, but generating programs from examples is not its best seasoning. Python programs it can generate for many cases, but with non-trivial workflows in some places. GPT4 excels with its emergent capability of analogical reasoning, it can generate additional examples and explanations which is very appealing. This can amplify Flash Fill’s programming might,
while also enabling semantic transformations to take flight.

Bio: Sumit Gulwani is a computer scientist connecting ideas, people, research & practice. He invented the popular Flash Fill feature in Excel, which has now also found its place in middle-school computing textbooks. He leads the PROSE research and engineering team at Microsoft that develops APIs for program synthesis. He has incorporated them into various Microsoft products, including Visual Studio, Office, Notebooks, PowerQuery, PowerApps, PowerAutomate, Powershell, and SQL. He is a sponsor of storytelling training and initiatives within Microsoft. He has started a novel research fellowship program in India, a remote apprenticeship model to scale impact while nurturing globally diverse talent and growing research leaders. He has co-authored 11 award-winning papers (including three test-of-time awards from ICSE and POPL) amongst 150+ research publications across multiple computer science areas and delivered 65+ keynotes/invited talks. He was awarded the Max Planck-Humboldt medal in 2021 and the ACM SIGPLAN Robin Milner Young Researcher Award in 2014 for his pioneering contributions to program synthesis and intelligent tutoring systems. He obtained his Ph.D. in Computer Science from UC Berkeley and was awarded the ACM SIGPLAN Outstanding Doctoral Dissertation Award. He received his BTech in Computer Science and Engineering from IIT Kanpur and was awarded the President’s Gold Medal.

Towards Interpretable and Bias-free Large Language Models

Abstract:
In this talk, I will focus on two major challenges in designing trustworthy large language models (LLMs): Interpretability and Fairness. In the first half of the talk, I will talk about how existing interpretability methods for language models cannot provide faithful explanations and propose a new fine-tuning technique to design language models that can provide faithful explanations. In the second half of the talk, I will focus on fairness and I will present our recent works on improving the fairness of LLMs by using techniques such as dataset augmentation and attention modulation.

Sarath Chandar is an Assistant Professor at Polytechnique Montreal where he leads the Chandar Research Lab. He is also a core faculty member at Mila, the Quebec AI Institute and holds a Canada CIFAR AI Chair. His research interests include lifelong learning, deep learning, reinforcement learning, and natural language processing. He received his PhD from the University of Montreal where he worked with Yoshua Bengio and Hugo Larochelle. For more information about the speaker, please visit http://sarathchandar.in/.

BigCode: Open and Responsible Development of Large Language Models for Code

Raymond Li is a Research Engineer at ServiceNow Research, where he is dedicated to improving the naturalness of interactions between humans and machines through language-based interfaces. At ServiceNow Research, his work has focused on various NLP topics such as summarization and text-to-SQL. He now primarily works on Large Language Models for code. Raymond’s background is in Applied Mathematics and Computer Science. He holds a Master’s and engineering degree from École polytechnique as well as an MSc in Computer Science from Polytechnique Montreal, where he was supervised by Christopher Pal and Laurent Charlin.

Software Engineering and Foundation Models: Software 4.0 and AI teammates

Abstract:
Foundation Models (FM), commonly referred to as Large Language Models, are rapidly emerging as one of the most disruptive technologies of the past decade. The early success of GitHub Copilot has triggered a gold rush in exploring how FM can support software engineering practices (FM4SE).
In this talk I’ll briefly discuss some of the core challenges in adopting FM4SE innovations in corporate settings to produce trustworthy software. Then I’ll discuss our current efforts to re-imagine how software can be developed around FM innovations (SE4FM). I will primarily focus on a new vision of software called Software 4.0 and how SE practices and platforms need to be updated to reflect the Software 4.0 future.

Ahmed E. Hassan (Queen’s University) is an IEEE Fellow, an ACM SIGSOFT Influential Educator, an NSERC Steacie Fellow, the Canada Research Chair (CRC) in Software Analytics, and the NSERC/BlackBerry Software Engineering Chair at the School of Computing at Queen’s University, Canada. His research interests include mining software repositories, empirical software engineering, AI engineering, load testing, and log mining. He received a PhD in Computer Science from the University of Waterloo. He spearheaded the creation of the Mining Software Repositories (MSR) conference and its research community. He also serves/d on the editorial boards of IEEE Transactions on Software Engineering, Springer Journal of Empirical Software Engineering, and PeerJ Computer Science. More information at: http://sail.cs.queensu.ca/.

SE4AI: Lessons Learned from Designing SE Methods for Big Data and HW Heterogeneity

Abstract: Software developers are rapidly adopting AI to power their applications. Current software engineering techniques do not provide the same benefits to this new class of compute and data-intensive applications. To provide productivity gains that developers desire, our research group has designed a new wave of software engineering methods.

First, I will discuss the technical challenge of designing automated testing and debugging methods for big data analytics. As an example, I will share the experience of designing BigTest, a symbolic-execution-based test generation tool for Apache Spark. Second, I will discuss the technical challenges of making custom hardware accelerators accessible to software developers. As an example, I will showcase HeteroFuzz, an automated fuzz testing method for heterogeneous application development with FPGAs.

I will then share the lessons learned from designing SE methods that target big data and HW heterogeneity and discuss open problems in this data and compute-intensive domain.

Miryung Kim (University of California, Los Angeles) is a Professor and Vice Chair of Graduate Studies in the Department of Computer Science at UCLA. She has taken a leadership role in defining the emerging area of software engineering for AI. Her current research focuses on software developer tools for big data systems and heterogeneous computing. Her group created automated testing and debugging for Apache Spark and conducted the largest scale study of data scientists in industry. She was a Program Co-Chair of ESEC/FSE 2022. She was a Keynote Speaker at ASE 2019 and ISSTA 2022. She gave Distinguished Lectures at CMU, UIUC, UMN, UC Irvine, etc.

She has produced six professors (Columbia, Purdue, two at Virginia Tech, etc.). For her impact on nurturing the next generation of academics, she received the ACM SIGSOFT Influential Educator Award. She is an ACM Distinguished Member.

http://web.cs.ucla.edu/~miryung/

Exploiting The Learned Knowledge of Language Models Using Adapters

Abstract: Language models such as RoBERTa, CodeBERT, and GraphCodeBERT have gotten much attention in the past three years for various Software Engineering tasks. Though these models are proven to have state-of-the-art performance for many SE tasks, such as code summarization, they often need to be fully fine-tuned for the downstream task. Is there a better way of fine-tuning these models that requires training fewer parameters? Can we impose new information on the current models without pre-training them again? How do these models perform for different programming languages, especially low-resource ones with less training data available? How can we use the knowledge learned from other programming languages to improve the performance of low-resource languages? This talk will review a series of experiments and our contributions to answering these questions.

Fatemeh Hendijani Fard (University of British Columbia) is an Assistant Professor at The University of British Columbia, Canada, where she leads the Data Science and Software Engineering lab. Her research interests are in the intersection of Natural Language Processing and Software Engineering, focusing on code representation learning and transfer learning for low-resource languages and mining software repositories. She collaborates closely with industry and has served as a program committee member and reviewer for several journals and conferences, including TSE, FSE, and ASE. Dr. Fard is a member of the ACM and IEEE Computer Society. She gives back to the community by mentoring women interested in AI.

Optimizing Software Project Management with AI-powered Tracking Tools

Abstract: Modern software project management can be challenging due to the increasing complexity of software systems, rapidly changing technology stacks, and the need to balance competing demands of time and budget. In addition, effective communication and collaboration between stakeholders can be difficult to achieve, and changing requirements can lead to delays in software releases. In this talk, I will present how AI-powered tracking tools can assist project managers in navigating these challenges. I will focus on two critical areas of software project management: technical debt management and pull request management. Finally, I will discuss the potential risks and lessons learned from designing and implementing these tools.

Yuan Tian (Queen’s University) works as an Assistant Professor at Queen’s University in Canada. Prior to joining Queen’s, she held the position of data scientist at the Living Analytics Research Centre (LARC) at Singapore Management University. Yuan is the head of the RISE research lab, which focuses on creating dependable and intelligent assistance for software development. RISE lab focuses on various topics, including automatic management of technical debt and bugs, code change management through pull requests, software engineering approaches for data science and machine learning workflows, and social science in software engineering. She earned her Ph.D. in Information Systems from Singapore Management University in May 2017.

Out-of-distribution Analysis and Robustness of Deep Neural Networks

Abstract: Machine learning is vulnerable to possible incorrect classification of those cases that are out of the distribution observed during training. To identify these cases, we propose an approach inspired by “Surprise Adequacy” to measure the computational likelihood of classifications performed by a network.
We target the fully connected part of Convolutional Neural Networks (CNNs). We introduce a novel class-based Computational Profile Likelihood (CPL) method that estimates the conditional probability of a network’s internal neuron excitation levels during recognition. CPL distributions observed during training and tests are compared and contrasted with those observed when processing cases such as adversarial attacks and affine transformations.
Experiments have been performed using the MNIST-fashion database, which is a publicly available set of clothing images. Presented experimental results show that the computational likelihoods of the adversarial cases and affine transformations span a much wider range than those of the training set cases. Only a few of those cases lie in the training distribution range, while misclassified inputs very often correspond to Out-Of-Distribution (OOD) computational profiles that did not occur during training. Experiments show that OOD identification allows a 70% to 90% reduction of misclassifications, by filtering them out. Furthermore, experimental results indicate that not all output classes are equally sensitive and vulnerable to adversarial inputs and affine transformations. Presented results also show that identifying and disregarding OOD computations preserves a high recognition precision.
The identification of OOD computations may be beneficial in sensitive and critical domains such as aerospace, health-sciences, cyber-security, and many others, where it may be hard to forecast proper and representative samples of unknown or unexpected cases.
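
The filtering idea in this abstract can be illustrated with a small sketch. It is not the CPL method itself: as a stand-in, it assumes an independent Gaussian per neuron, fitted on the training activations of one class, and treats any input whose log-likelihood falls below the lowest value seen in training as OOD. The function names (`fit_profile`, `log_likelihood`, `is_ood`) and the toy activations are hypothetical.

```python
import math
from statistics import mean, pstdev

def fit_profile(activations):
    """Per-neuron (mean, std) estimated from the training activations of one class."""
    columns = list(zip(*activations))
    # A small floor on the std avoids division by zero for constant neurons.
    return [(mean(c), max(pstdev(c), 1e-6)) for c in columns]

def log_likelihood(profile, activation):
    """Log-likelihood of one activation vector under the assumed independent Gaussians."""
    total = 0.0
    for (mu, sigma), a in zip(profile, activation):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (a - mu) ** 2 / (2 * sigma ** 2)
    return total

def training_lower_bound(profile, activations):
    """Lowest log-likelihood observed on the training activations."""
    return min(log_likelihood(profile, a) for a in activations)

def is_ood(profile, lower_bound, activation):
    """Flag an input whose likelihood is below anything seen in training."""
    return log_likelihood(profile, activation) < lower_bound

# Toy activations of two neurons for a single class (made-up numbers).
train = [[0.9, 0.1], [1.0, 0.2], [1.1, 0.15], [0.95, 0.12]]
profile = fit_profile(train)
bound = training_lower_bound(profile, train)

near = [1.0, 0.14]   # close to the training activations
far = [3.0, 2.0]     # far from anything seen in training
flags = (is_ood(profile, bound, near), is_ood(profile, bound, far))
```

Here `flags` comes out as `(False, True)`: the nearby input is kept and the distant one is filtered. A real profile would cover many neurons per class and would likely need a more careful density estimate than the Gaussian assumed here.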

Ettore Merlo (Polytechnique Montréal) is a Full Professor in the Department of Computer and Software Engineering at Polytechnique Montréal. He received a Ph.D. in computer science from McGill University, Montreal, QC, Canada, in 1989 and the Laurea (summa cum laude) degree from the University of Turin, Turin, Italy, in 1983. He was the Lead Researcher with a software engineering group at the Computer Research Institute of Montreal until 1993 when he joined the Ecole Polytechnique de Montreal, Montreal, QC, Canada, where he is currently a Full Professor with the Department of Computer and Software Engineering. His research interests include software analysis, application security, software testing, software reengineering, user interfaces, software maintenance, artificial intelligence, and bioinformatics.

Root cause analysis of system’s event logs

Abstract: Anomaly detection plays an important role in the management of modern large-scale distributed systems. Logs, which record system runtime information, are widely used for anomaly detection. However, unsupervised anomaly detection algorithms face challenges with complex systems, which generate vast amounts of multivariate time series data. Timely anomaly detection is crucial for managing these systems effectively, minimizing downtime, and supporting incident management. To address these challenges, a method called Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) has been developed for detecting anomalies in CN PTC system logs. MSCRED uses multivariate time series data to perform anomaly detection and diagnosis. It creates multi-scale signature matrices that capture different levels of system statuses across various time steps. The method utilizes a convolutional encoder to capture inter-sensor correlations and a Convolutional Long Short-Term Memory (ConvLSTM) network with attention mechanisms to capture temporal patterns.
In summary, I will present MSCRED as a method for effectively detecting anomalies in multivariate time series data from complex systems. This approach can significantly contribute to incident management in large-scale systems by enabling prompt identification and resolution of incidents.
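
The multi-scale signature matrices mentioned in the abstract can be sketched as follows. This assumes the usual construction in which each entry is the inner product of two series over a recent window, scaled by the window length, with one matrix per window size. The function names, window sizes, and toy data are illustrative and are not taken from the speaker's implementation.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]

def signature_matrix(series: Sequence[Sequence[float]], t: int, window: int) -> Matrix:
    """Pairwise inner products of all series over the `window` steps ending at t."""
    start = t - window + 1
    if start < 0:
        raise ValueError("window reaches before the start of the series")
    segments = [s[start:t + 1] for s in series]
    n = len(segments)
    return [
        [sum(a * b for a, b in zip(segments[i], segments[j])) / window
         for j in range(n)]
        for i in range(n)
    ]

def multi_scale_signatures(series, t: int, windows=(2, 4)) -> Dict[int, Matrix]:
    """One signature matrix per window length, all ending at time step t."""
    return {w: signature_matrix(series, t, w) for w in windows}

# Three toy sensor series with four time steps each (made-up numbers).
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
sigs = multi_scale_signatures(series, t=3, windows=(2, 4))
```

For this data, `sigs[2][0][1]` is 25.0 and `sigs[4][0][0]` is 7.5. In the full method, a stack of such matrices per time step would be the input to the convolutional encoder. The short windows here only keep the arithmetic easy to follow.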

Maryam Ahmadi (Canadian National (CN)) is an expert AI and machine learning engineer with over 8 years of experience. She currently serves as an expert at CN, where she leverages her expertise in machine learning, statistical modeling, and data analysis to deliver innovative solutions. Maryam holds a master’s degree in Algorithms and Computation from the University of Tehran. In addition to her work as an engineer, she is also a mentor with Women in AI Canada, advocating for diversity in the field of AI and machine learning. Maryam is committed to pushing the boundaries of what’s possible with AI and machine learning, and is excited to share her latest projects and insights at the upcoming conference.

Enhancing Programming Experiences using AI: Leveraging LLMs (as Analogical Reasoning Engines) and Beyond

Abstract: AI can significantly improve programming experiences for a diverse range of users: from professional developers and data scientists (proficient programmers) who need help in software engineering and data wrangling, to spreadsheet users (low-code programmers) needing help in authoring formulas, and students (novice programmers) seeking hints when tackling programming homework. To effectively communicate their needs to AI, users can express their intent explicitly through input-output examples or natural language specifications, or implicitly by presenting bugs or recent code edits for AI to analyze and suggest improvements.

Analogical reasoning is at the heart of problem solving, as it allows us to make sense of new information and transfer knowledge from one domain to another. In this talk, I will demonstrate that analogical reasoning is a fundamental emergent capability of Large Language Models (LLMs) and can be utilized to enhance various types of programming experiences.

However, there is significant room for innovation in building robust experiences tailored to specific task domains. I will discuss how various methods from symbolic AI (particularly programming-by-examples-or-analogies) such as search-and-rank, failure-guided refinement, and neuro-symbolic cooperation, can help fill this gap. This comes in three forms: (a) Prompt engineering that involves synthesizing specification-rich, context-aware prompts from various sources, sometimes using the LLM itself, to elicit optimal output. (b) Post-processing techniques that guide, rank, and validate the LLM’s output, occasionally employing the LLM for these purposes. (c) Multi-turn workflows that involve multiple LLM invocations, allowing the model more time and iterations to optimize results.

Sumit Gulwani (Microsoft Research) is a computer scientist connecting ideas, people, research & practice. He invented the popular Flash Fill feature in Excel, which has now also found its place in middle-school computing textbooks. He leads the PROSE research and engineering team at Microsoft that develops APIs for program synthesis. He has incorporated them into various Microsoft products, including Visual Studio, Office, Notebooks, PowerQuery, PowerApps, PowerAutomate, Powershell, and SQL. He is a sponsor of storytelling training and initiatives within Microsoft. He has started a novel research fellowship program in India, a remote apprenticeship model to scale impact while nurturing globally diverse talent and growing research leaders. He has co-authored 11 award-winning papers (including three test-of-time awards from ICSE and POPL) amongst 150+ research publications across multiple computer science areas and delivered 65+ keynotes/invited talks. He was awarded the Max Planck-Humboldt medal in 2021 and the ACM SIGPLAN Robin Milner Young Researcher Award in 2014 for his pioneering contributions to program synthesis and intelligent tutoring systems. He obtained his Ph.D. in Computer Science from UC Berkeley and was awarded the ACM SIGPLAN Outstanding Doctoral Dissertation Award. He received his BTech in Computer Science and Engineering from IIT Kanpur and was awarded the President’s Gold Medal.

Machine Learning Documentation -- How Far Away Are We

Abstract: The documentation practice for ML models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. While previous efforts, such as the model cards proposal, have aimed to promote documentation for ML models, their actual impact on practice remains unclear. In this talk, I propose a reassessment of existing approaches to ML documentation. I will present concrete empirical evidence on the current state of ML model documentation in the field. I will then explore how we can foster more responsible and accountable documentation practices through effective tooling support. In particular, I will discuss how we can nudge data scientists to comply with the model cards proposal during model development, especially for the content related to ethics, and to assess and manage documentation quality.

Jin L.C. Guo (McGill University, MILA) is an assistant professor of Software Engineering at McGill University and an associate member of Mila. Jin’s primary area of interest lies in applying Artificial Intelligence methods to tackle problems in Software Engineering. Her most recent research centers around extracting domain-specific knowledge from software traceability data and utilizing this knowledge to enable automated SE tasks like trace retrieval and project Q&A. Jin L.C. Guo received her Ph.D. from the University of Notre Dame. Before her Ph.D., she worked in the research lab at Fuji Xerox in the areas of image processing and computer vision.

Examining How We Examine Language Technologies

Abstract: Effectively evaluating language technologies remains a persistent challenge. In this talk, I will discuss two efforts to examine how language technologies are being evaluated, the assumptions underpinning those evaluations, and their ethical implications. First, I will describe work analyzing benchmark datasets constructed to measure stereotyping for two NLP tasks—language modeling and coreference resolution—which uncovered a range of pitfalls threatening these benchmarks’ ability to effectively measure stereotyping. I will then discuss work exploring the goals, practices, assumptions, and constraints of practitioners evaluating natural language generation systems. Throughout, I will highlight the complexities of language and language technologies in their social contexts, and of deciding what our evaluations ought to be measuring and how they ought to measure it.

Su Lin Blodgett (Microsoft Research) is a researcher in the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research Montréal. Her research examines the ethical and social implications of language technologies, focusing on the complexities of language and language technologies in their social contexts, and on supporting NLP practitioners in their ethical work. She completed her Ph.D. in computer science at the University of Massachusetts Amherst, where she was supported by the NSF Graduate Research Fellowship, and has been named one of the 2022 100 Brilliant Women in AI Ethics.

Tailoring Requirements Engineering for Responsible AI

Abstract:
Requirements Engineering (RE) is the discipline for identifying, analyzing, as well as ensuring the implementation and delivery of user, technical, and societal requirements. Recently reported issues concerning the acceptance of Artificial Intelligence (AI) solutions after deployment, e.g. in the medical, automotive, or scientific domains, stress the importance of RE for designing and delivering Responsible AI systems. In this talk, I will argue that RE should not only be carefully conducted but also tailored for Responsible AI. I will outline six major directions and related challenges for research and practice.

Walid Maalej (University of Hamburg) is an award-winning software researcher, passionate educator, and creative handyman. As Professor of Informatics at the University of Hamburg in Germany, he teaches software development basics to 500-700 students using fun activities and pair-programming. His advanced software engineering courses often include a big portion of Data, Machine Learning, and Empiricism and are based on real challenges from industry and society. Prof. Maalej was previously named “The Early Stage Scientist of the Year” by academics and the German Association of University Professors (DHV). His work has been cited thousands of times and has received, among others, the ACM SIGSOFT Distinguished Paper Award, IEEE RE Best Paper Award, MSR Most Influential Paper Award, as well as awards from Google and Microsoft.
Prof. Maalej worked as a developer, manager, and consultant for numerous companies and organizations including Siemens, Tata Consultancy Services, Rohde und Schwarz, and Telekom. He received his doctoral degree from TU Munich with distinction and is also a proud alumnus of the Center for Digital Technology and Management.

Quality Assurance of AI-enabled Systems

Abstract:
AI-enabled Systems, like any other systems, need to be dependable. Automated and effective techniques for the verification and validation of such systems are therefore required. Given that part of their behaviour—often critical aspects—is learned from data as opposed to being specified and coded, this raises specific challenges that must be carefully addressed. This presentation will summarise my perspectives about the automated testing and analysis of AI models and systems, and what I believe are promising avenues of research and the main obstacles to be able to deploy these systems with sufficient confidence.

Lionel C. Briand (University of Ottawa, University of Luxembourg) is a professor of software engineering and has shared appointments between The University of Ottawa, Canada, and The SnT Centre for Security, Reliability, and Trust, University of Luxembourg. He has run many collaborative research projects with companies in the automotive, satellite, aerospace, energy, financial, and legal domains. Lionel has held various engineering, academic, and leading positions in six countries. He was one of the founders of the ICST conference (IEEE Int. Conf. on Software Testing, Verification, and Validation, a CORE A event) and its first general chair. He was also EiC of Empirical Software Engineering (Springer) for 13 years and led, in collaboration with first Victor Basili and then Tom Zimmermann, the journal to the top tier of the very best publication venues in software engineering. Lionel was elevated to the grades of IEEE Fellow and ACM Fellow for his work on software testing and verification. He was granted the IEEE Computer Society Harlan Mills award, the ACM SIGSOFT outstanding research award, and the IEEE Reliability Society engineer-of-the-year award, respectively, in 2012, 2022, and 2013. He received an ERC Advanced grant in 2016, on the topic of modeling and testing cyber-physical systems, which is the most prestigious individual research award in the European Union. He currently holds a Canada Research Chair (Tier 1) on “Intelligent Software Dependability and Compliance.”

Title: Optimizing Software Project Management with AI-powered Tracking Tools

Abstract: Modern software project management can be challenging due to the increasing complexity of software systems, rapidly changing technology stacks, and the need to balance competing demands of time and budget. In addition, effective communication and collaboration between stakeholders can be difficult to achieve, and changing requirements can lead to delays in software releases. In this talk, I will present how AI-powered tracking tools can assist project managers in navigating these challenges. I will focus on two critical areas of software project management: technical debt management and pull request management. Finally, I will discuss the potential risks and lessons learned from designing and implementing these tools.

Bio: Yuan Tian works as an Assistant Professor at Queen’s University in Canada. Prior to joining Queen’s, she held the position of data scientist at the Living Analytics Research Centre (LARC) at Singapore Management University. Yuan is the head of the RISE research lab, which focuses on creating dependable and intelligent assistance for software development. RISE lab focuses on various topics, including automatic management of technical debt and bugs, code change management through pull requests, software engineering approaches for data science and machine learning workflows, and social science in software engineering. She earned her Ph.D. in Information Systems from Singapore Management University in May 2017.

Title: Exploiting The Learned Knowledge of Language Models Using Adapters

Abstract: Language models such as RoBERTa, CodeBERT, and GraphCodeBERT have received much attention in the past three years for various Software Engineering tasks. Though these models have been shown to achieve state-of-the-art performance on many SE tasks, such as code summarization, they often need to be fully fine-tuned for the downstream task. Is there a better way of fine-tuning these models that requires training fewer parameters? Can we impose new information on the current models without pre-training them again? How do these models perform for different programming languages, especially low-resource ones with less training data available? How can we use the knowledge learned from other programming languages to improve the performance of low-resource languages? This talk will review a series of experiments and our contributions to answering these questions.

Bio: Dr. Fatemeh Hendijani Fard is an Assistant Professor at The University of British Columbia, Canada, where she leads the Data Science and Software Engineering lab. Her research interests are in the intersection of Natural Language Processing and Software Engineering, focusing on code representation learning and transfer learning for low-resource languages and mining software repositories. She collaborates closely with industry and has served as a program committee member and reviewer for several journals and conferences, including TSE, FSE, and ASE. Dr. Fard is a member of the ACM and IEEE Computer Society. She gives back to the community by mentoring women interested in AI.

Amin Nikanjam

Abstract: There is growing demand in both industry and academia for employing Deep Learning (DL) in various domains to solve real-world problems. Deep Reinforcement Learning (DRL) is the application of DL in the domain of Reinforcement Learning. Like any other software system, DRL applications can fail because of faults in their programs. However, testing DL systems is a complex task as they do not behave like traditional systems would, notably because of their stochastic nature.

In this talk, I will go over some approaches to test DRL programs. The first taxonomy of faults in DRL programs will be reviewed, in which real faults of such programs are classified into 4 main categories. Then, a preliminary technique is described for automatic detection of faults in DRL programs. We have defined a meta-model of DRL programs and developed DRLinter, a model-based fault detection approach that leverages static analysis and graph transformations. At the end, I’ll present a framework for Mutation Testing applied to RL.

Amin Nikanjam is a research associate in the SWAT research team at Polytechnique Montréal. He is studying 1) how Software Engineering practices (like testing and fault localization) can be leveraged for Machine Learning Software Systems, and 2) how Machine Learning techniques can be applied to safety-critical systems in terms of reliability, robustness, and explainability. He received his Master’s and Ph.D. in Artificial Intelligence from Iran University of Science and Technology, Iran, and his Bachelor’s in Software Engineering from University of Isfahan. Before joining Polytechnique Montréal, he was an invited researcher at University of Montréal, and before that, he was an assistant professor at K. N. Toosi University of Technology, Iran. His research interests include Systems Engineering for Machine Learning, (Deep) Reinforcement Learning, and Multi-Agent Systems.

Title: More Modular Deep Learning

Abstract: Deep learning is a class of machine learning algorithms that has received much attention in academia and industry. Deep learning has a large number of important societal applications, from self-driving cars to question-answering systems such as Siri and Alexa. A deep learning algorithm uses multiple layers of transformation functions to convert inputs to outputs, with each layer successively learning higher-level abstractions in the data. The availability of large datasets has made it feasible to train deep learning models. Since the layers are organized in the form of a network, such models are also referred to as deep neural networks (DNN). While the jury is still out on the impact of deep learning on the overall understanding of software’s behavior, a significant uptick in its usage and applications in wide-ranging areas and safety-critical systems, e.g., autonomous driving, aviation systems, medical analysis, etc., combine to warrant research on software engineering practices in the presence of deep learning. One challenge is to enable the reuse and replacement of the parts of a DNN, which has the potential to make DNN development more reliable. This talk will describe a comprehensive approach to systematically investigate the decomposition of deep neural networks into modules to enable reuse, replacement, and independent evolution of those modules. A module is an independent part of a software system that can be tested, validated, or utilized without a major change to the rest of the system. Allowing the reuse of DNN modules is expected to reduce energy- and data-intensive training efforts to construct DNN models. Allowing replacement is expected to help replace faulty functionality in DNN models without needing costly retraining steps. Our preliminary work has shown that it is possible to decompose fully connected neural networks and CNN models into modules and conceptualize the notion of modules.
A serious problem facing the current software development workforce is that deep learning is widely utilized in our software systems, but scientists and practitioners do not yet have a clear handle on critical problems such as explainability of DNN models, DNN reuse, replacement, independent testing, and independent development. There was no apparent need to investigate the notions of modularity as neural network models trained before the deep learning era were mostly small, trained on small datasets, and were mostly used as experimental features. The notion of DNN modules developed by our work is helping make significant advances on a number of open challenges in this area. DNN modules enable the reuse of already trained DNN modules in another context. Viewing a DNN as a composition of DNN modules instead of a black box enhances the explainability of a DNN’s behavior. More modular deep learning will thus have a large positive impact on the productivity of these programmers, the understandability and maintainability of the DNN models that they deploy, and the scalability and correctness of software systems that they produce.

Hridesh Rajan is the Kingland Professor and Chair in the Department of Computer Science at Iowa State University (ISU), where he has been since August 2005. He served as the Professor-In-Charge of the ISU Data Science program from 2017 to October 2019. He has held visiting positions at the University of Bristol, Harvard University, and the University of Texas, Austin. Prof. Rajan earned his Ph.D. in Computer Science from the University of Virginia. He is a AAAS Fellow, ACM Distinguished Scientist and a Fulbright Scholar. He has also been recognized by the US National Science Foundation (NSF) with a CAREER award in 2009, by the Iowa State University College of LAS with an Early Achievement in Research Award in 2010, a Big-12 Fellowship in 2012, an ACM Senior Membership in 2014, an exemplary mentor for Junior Faculty award in 2017, a Kingland Endowed Professorship in 2017, and an early achievement in departmental leadership award in 2022. Prof. Rajan specializes in data science, programming languages and software engineering. He is credited with giving the definitive treatment for how to modularly reason about crosscutting concerns, and for the design and implementation of the Boa infrastructure for large-scale analysis of open source software and its evolution. Prof. Rajan served as an associate editor for the IEEE Transactions on Software Engineering and as an associate editor for the ACM SIGSOFT Software Engineering Notes. He served as the general chair of SPLASH 2020 and SPLASH 2021, the ACM SIGPLAN conference on Systems, Programming, Languages, and Applications: Software for Humanity.

Title: Towards reproducible models and consistent interpretations for trustworthy AI engineering

Abstract: With AI being adopted in a rapidly growing number of real-world applications, the trustworthiness of AI-based systems has gained attention not only from researchers and practitioners, but also from regulatory bodies around the world. Two important topics in engineering trustworthy AI-based systems are the reproducibility of the models (i.e., given the same code and training data, can the training process be repeated to reproduce models with the same behavior), and the consistency of the interpretations of models (i.e., models that are produced to solve the same task agree with one another on feature importance), as they are closely tied to various tasks like training, testing, debugging, auditing, and decision making. However, machine learning models are challenging to reproduce due to issues like randomness in the software (e.g., optimizing algorithms) and non-determinism in the hardware (e.g., GPU). In addition, many studies violate established practices in the machine learning community when deriving interpretations, such as interpreting models with suboptimal performance, though the impact of such violations on interpretation consistency has not been studied. In this talk, we will introduce the trustworthy AI engineering research at Huawei, and dive into the specific research on model reproducibility and interpretation consistency that has been carried out at Huawei to tackle these challenges.

Dayi Lin is a Senior Researcher at the Centre for Software Excellence, Huawei Canada, where he leads the research on software engineering for AI systems. He and his team develop engineering technologies and guidelines to ensure compliance, quality, and productivity in the lifecycle of AI systems. His research interests include SE4AI, AI4SE, mining software repositories, and game engineering. His work has been published at several top-tier software engineering venues, such as TSE, ICSE, TOSEM, and EMSE, and has attracted wide media coverage. He has served as a program committee member for several conferences such as ICSE-SEIP 2023, ICSE 2022 Poster Track, and RAISE 2021. He is also the co-chair of GAS 2022. He received a Ph.D. in Computer Science from Queen’s University, Canada.

Title: Operationalising responsible AI at scale: challenges and solutions beyond algorithms

Abstract: Although artificial intelligence (AI) is solving real-world challenges and transforming industries, there are serious concerns about its ability to behave and make decisions in a responsible way. To address the responsible AI challenges, a number of AI ethics principles frameworks have been published recently, which AI systems are supposed to conform to. However, without further best practice guidance, practitioners are left with little beyond truisms. In addition, significant efforts have been put into algorithm-level solutions, which mainly focus on a subset of mathematics-amenable ethical principles (such as privacy and fairness). However, ethical issues can occur at any step of the development lifecycle, crosscutting many AI, non-AI and data components of systems beyond AI algorithms and models. In this talk, we will discuss the challenges in operationalising responsible AI at scale and end-to-end system-level solutions to tackle those challenges.

Qinghua Lu leads the Responsible AI science team at CSIRO’s Data61, Australia, where she is a principal research scientist. She received her PhD from the University of New South Wales in 2013. Her current research interests include responsible AI, software engineering for AI, software architecture, and blockchain. She has published 100+ academic papers in international journals and conferences. Her recent paper “Towards a Roadmap on Software Engineering for Responsible AI” won the ACM Distinguished Paper Award.

Sushmitha Bala

Sushmitha Bala is an AI Architect for the AI Factory team at National Bank of Canada. The team is responsible for the deployment and industrialization of AI models in production. Following a master’s degree focused on game theoretic constructs in applied economics and statistics, Sushmitha has spent the last decade in various analytics and data-centric roles in the financial services industry, delivering crucial initiatives for companies such as JP Morgan and National Bank. In her current role, she is responsible for designing AI model architecture that balances delivering value while remaining pragmatic, scalable and secure.

Emmanuel Thepie Fapi

Emmanuel Thepie Fapi is currently a Senior Data Scientist with Ericsson, GAIA AI-Hub, Canada. He obtained a bachelor’s degree in applied mathematics from Douala University, Cameroon. He holds a master’s degree in engineering mathematics and computer tools from Orleans University and a PhD in signal processing and telecommunications from IMT Atlantique in France (former ENSTB de Bretagne), with Nokia Siemens Network as host laboratory in Munich, Germany. From 2010 to 2016 he worked with GENBAND US LLC, QNX software System Limited as an audio software developer, MDA system as an analyst, and EasyG as a senior DSP engineer in Vancouver, Canada. In 2017 he joined Amazon Lab 126 in Boston, USA, as an Audio Software Developer for the 3rd-generation Echo Dot. His main areas of interest are 5G networks and beyond, Anomaly and Intrusion detection-based AI/ML, Network Observability, Predictive maintenance, Real-time embedded OS, Distributed AI/ML, IoT, and Voice and Audio Quality Enhancement. In 2022 he received the Ericsson Impact Award, and he is the SPOC of the Edge Computer Cluster project of the MITACS Program at Ericsson, with six projects in collaboration with four Canadian universities.

Title: Challenges and Opportunities for AI Software System Engineering in the Data-Driven Era

Abstract: Data-driven AI systems (e.g., machine/deep learning) continue to make substantial strides in enabling cutting-edge intelligent applications. However, the development of current data-driven AI systems still lacks systematic quality assurance and engineering support with regard to the adoption of quality, security, and reliability assurance standards, as well as mature, interpretable toolchain support. In this talk, I will provide a high-level overview of our team’s continuous efforts over the past few years, across Canada, Japan, and Singapore, to establish the early foundation of Trustworthy Data-Driven AI System Engineering. I will also give a high-level introduction to the challenges and opportunities in laying down the foundations for engineering safe, secure, and reliable systems in the data-driven era.

Lei Ma is currently an Associate Professor with shared appointments at (1) the University of Alberta, Canada, and (2) Kyushu University, Japan. He has also been selected as a Canada CIFAR AI Chair and Fellow at the Alberta Machine Intelligence Institute (Amii). Previously, he received the B.E. degree from Shanghai Jiao Tong University, Shanghai, China, and the M.E. and Ph.D. degrees from The University of Tokyo, Tokyo, Japan. His recent research centers around the interdisciplinary fields of software engineering (SE) and trustworthy artificial intelligence (AI), with a special focus on the quality, reliability, safety, and security assurance of machine learning and AI systems. For more detailed information, please visit his website, https://www.malei.xyz.

Title: On the Granularity of Fault Localization for Deep Neural Networks

Abstract: Deep Neural Networks (DNNs) are often used in safety-critical systems, such as autonomous driving. Hazards of these systems are usually linked to specific error patterns of the DNN, such as specific misclassifications. In the context of the “Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems” (eAI project), we are investigating techniques to repair a DNN to fix some given misclassifications that are considered particularly critical by stakeholders. The first step of these repair approaches consists of applying fault localization (FL) to identify the DNN components (neurons or weights) responsible for the misclassifications. However, the components responsible for one type of misclassification could be different from those responsible for another type; depending on the granularity of the analyzed dataset, FL may not reveal these differences, because failure types that are more frequent in the dataset may mask less frequent ones. The talk will present a way to perform FL for DNNs that avoids this masking effect by selecting test data in a granular way. We conducted an empirical study, using a spectrum-based FL approach for DNNs, to assess how FL results change with the granularity of the analyzed test data. Namely, we performed FL using test data with two different granularities: a state-of-the-art approach that considers all misclassifications for a given class together, and the proposed fine-grained approach. Results show that FL should be done for each misclassification, so that practitioners have a more detailed analysis of the DNN faults and can make a more informed decision on what to repair in the DNN.
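
The masking effect described in this abstract can be illustrated with a small sketch. It is not the method presented in the talk: it uses the Ochiai formula, a common spectrum-based suspiciousness score, on hypothetical activation spectra, and the neuron names, outcomes, and the choice to ignore other failure types in the fine-grained analysis are all assumptions.

```python
import math

# Toy spectra: (set of active neurons, outcome). All values are hypothetical.
TESTS = (
    [({"n0"}, "cat->dog")] * 4      # frequent misclassification
    + [({"n1"}, "cat->fox")]        # rare misclassification
    + [({"n2"}, "ok")] * 4          # correctly classified inputs
    + [({"n1", "n2"}, "ok")]
)

def ochiai(tests, is_failure):
    """Ochiai score per neuron: ef / sqrt(total_fail * (ef + ep)).

    Only passing tests and the failures selected by is_failure are analyzed;
    other failure types are ignored (an assumption of this sketch).
    """
    selected = [(a, o) for a, o in tests if o == "ok" or is_failure(o)]
    total_fail = sum(1 for _, o in selected if o != "ok")
    scores = {}
    for neuron in sorted(set().union(*(a for a, _ in selected))):
        ef = sum(1 for a, o in selected if neuron in a and o != "ok")
        ep = sum(1 for a, o in selected if neuron in a and o == "ok")
        denom = math.sqrt(total_fail * (ef + ep))
        scores[neuron] = ef / denom if denom else 0.0
    return scores

# Coarse granularity: all misclassifications analyzed together.
coarse = ochiai(TESTS, lambda o: o != "ok")
# Fine granularity: only the rare misclassification.
fine = ochiai(TESTS, lambda o: o == "cat->fox")

print("coarse:", coarse)
print("fine:  ", fine)
```

In this toy data the coarse analysis ranks `n0` highest because the frequent failure type dominates, while the fine-grained analysis ranks `n1` highest for the rare failure type.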

Paolo Arcaini is a project associate professor at the National Institute of Informatics (NII), Japan. He received a PhD in Computer Science from the University of Milan, Italy, in 2013. Before joining NII, he held an assistant professor position at Charles University, Czech Republic. His main research interests are related to search-based testing, fault-based testing, model-based testing, software product lines, and automated repair. In the context of the “Metamathematics for Systems Design” (MMSD) project, he has worked on search-based testing of autonomous driving systems. Currently, he is involved in the “Engineerable AI Techniques for Practical Applications of High-Quality Machine Learning-based Systems” project (eAI), where he works on fault localisation and automated repair for deep neural networks.

Title: Understanding and testing the complexities of autonomous driving systems

Jinqiu Yang is an Assistant Professor in the Department of Computer Science and Software Engineering at Concordia University, Montreal, Canada. Her research interests include automated program repair, software testing, quality assurance of machine learning software, and mining software repositories. Her work has been published in flagship conferences and journals such as ICSE, FSE, EMSE. She serves regularly as a program committee member of international conferences in Software Engineering, such as ASE, ICSE, ICSME and SANER. She is a regular reviewer for Software Engineering journals such as EMSE, TSE, TOSEM and JSS. Dr. Yang obtained her BEng from Nanjing University, and MSc and PhD from University of Waterloo. More information at: https://jinqiuyang.github.io/.

Foutse Khomh

Foutse Khomh is a Full Professor, a Canada CIFAR AI Chair, and FRQ-IVADO Research Chair at Polytechnique Montréal, where he heads the SWAT Lab (http://swat.polymtl.ca/). He received a Ph.D. in Software Engineering from the University of Montreal in 2011. His research interests include software maintenance and evolution, cloud engineering, machine learning systems engineering, empirical software engineering, software analytics, and dependable and trustworthy AI/ML. His work has received four ten-year Most Influential Paper (MIP) Awards and six Best/Distinguished Paper Awards. He has served on the program committees of several international conferences, including ICSE, FSE, ASE, ICSM(E), SANER, MSR, ICPC, SCAM, and ESEM, and has reviewed for top international journals such as SQJ, JSS, EMSE, TSE, TPAMI, and TOSEM. He was program chair for Satellite Events at SANER 2015; program co-chair of SCAM 2015, ICSME 2018, PROMISE 2019, and ICPC 2019; general chair of ICPC 2018 and SCAM 2020; and general co-chair of SANER 2020. He initiated and co-organizes the Software Engineering for Machine Learning Applications (SEMLA) symposium. He is one of the organizers of the RELENG workshop series (http://releng.polymtl.ca) and Associate Editor for IEEE Software, EMSE, and JSEP.

Title: Hiccups on the road to XAI: security and privacy risks of explainable AI

Abstract: Post-hoc explanation is the problem of explaining how a machine learning model — whose internal logic is hidden to the end-user and generally complex — produces its outcomes. Current approaches for solving this problem include model explanations and outcome explanations. While these techniques can be beneficial by providing interpretability, there are two fundamental threats to their deployment in real-world applications: the risk of explanation manipulation that targets the trustworthiness of post-hoc explanation techniques and the risk of model extraction that jeopardizes their privacy guarantees. In this talk, we will discuss common explanation manipulation and privacy vulnerabilities in state-of-the-art post-hoc explanation techniques as well as existing lines of research that try to make these techniques more reliable.

Ulrich Aïvodji is an Assistant Professor of Computer Science at ETS Montreal in the Software and Information Technology Engineering Department. He is also a regular member of the International Observatory on the Societal Impacts of AI and Digital Technologies. Before his current position, he was a postdoctoral researcher at UQAM, working on machine learning ethics and privacy. He earned his Ph.D. in Computer Science at Université Toulouse III. His research areas of interest are computer security, data privacy, optimization, and machine learning. His current research focuses on several aspects of trustworthy machine learning, such as fairness, privacy-preserving machine learning, and explainability.

Emad Shihab

Emad Shihab is Associate Dean of Research and Innovation and Full Professor in the Gina Cody School of Engineering and Computer Science at Concordia University. He holds a Concordia University Research Chair in Software Analytics. His research interests are in Software Engineering, Mining Software Repositories, Software Analytics, and Software Bots. Dr. Shihab received the 2019 MSR Early Career Achievement Award and the 2019 CS-CAN/INFO-CAN Outstanding Young Computer Science Researcher Prize. His work has been published in some of the most prestigious SE venues, including ICSE, ESEC/FSE, MSR, ICSME, EMSE, TOSEM, and TSE. He is recognized as a leader in the field, serving on numerous steering and organization committees of core software engineering conferences. Dr. Shihab has secured more than $2.7 Million, as PI, to support his research, including a highly competitive NSERC Discovery Accelerator Supplement. His work has been done in collaboration with world-renowned researchers from Australia, Brazil, China, Europe, Japan, the United Kingdom, Singapore and the USA and adopted by some of the biggest software companies, such as Microsoft, Avaya, BlackBerry, and Ericsson. He is a senior member of the IEEE. His homepage is: http://das.encs.concordia.ca/.

Reihaneh Rabbany

Reihaneh Rabbany is an Assistant Professor at the School of Computer Science, McGill University. She is a core faculty member of Mila – Quebec’s artificial intelligence institute, and a Canada CIFAR AI Chair. She is also a faculty member at the Center for the Study of Democratic Citizenship. Before joining McGill, she was a Postdoctoral fellow at the School of Computer Science, Carnegie Mellon University. She completed her Ph.D. in the Computing Science Department at the University of Alberta. Her research is at the intersection of network science, data mining and machine learning, with a focus on analyzing real-world interconnected data, and social good applications.

Qinghua Lu

Ipek Ozkaya

Ipek Ozkaya is the technical director of the Engineering Intelligent Software Systems group at the Carnegie Mellon University Software Engineering Institute (SEI). Her main areas of expertise and interest include software architecture, software design automation, and managing technical debt in software-reliant and AI-enabled systems. At the SEI she has worked with several government and industry organizations in domains including avionics, power and automation, IoT, healthcare, and IT. Ozkaya is the co-author of a practitioner book titled Managing Technical Debt and is the Editor-in-Chief of IEEE Software Magazine. She holds a PhD in Computational Design from Carnegie Mellon University.

Title: Unit tests for Machine Learning research code

Abstract: Unit tests are a valuable (and increasingly essential) tool when building software systems. Indeed, test-driven development is a mainstay of most modern software development processes. In research, however, unit tests are typically eschewed for the sake of expediency and uncertainty about the long-term usage of the research code. This is unfortunate, as the reliability and reproducibility of the code used for research is essential for the advancement of science. Although efforts such as reproducibility checklists and challenges help mitigate some of these concerns, they come only at the end of the software development process. In this talk I will argue for the use of unit tests when writing code for machine learning research, as a means of ensuring correctness and reliability of the code we use for scientific progress.
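
To make the argument concrete, here is a minimal sketch of what unit tests for a small piece of research code might look like. The `softmax` function and the properties checked are illustrative assumptions, not material from the talk; the tests use Python’s standard `unittest` module.

```python
import math
import unittest

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # Subtracting the max avoids overflow in exp().
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class SoftmaxTest(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(softmax([1.0, 2.0, 3.0])), 1.0)

    def test_shift_invariance(self):
        # Adding a constant to every logit must not change the output.
        a = softmax([1.0, 2.0, 3.0])
        b = softmax([101.0, 102.0, 103.0])
        for x, y in zip(a, b):
            self.assertAlmostEqual(x, y)

    def test_large_logits_do_not_overflow(self):
        out = softmax([1000.0, 1000.0])
        self.assertAlmostEqual(out[0], 0.5)

# Run the tests programmatically and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SoftmaxTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Tests like these check mathematical properties rather than exact outputs, which tends to suit research code whose numerical results change as experiments evolve.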

Pablo Samuel was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. He obtained his PhD from McGill, focusing on Reinforcement Learning under the supervision of Doina Precup and Prakash Panangaden. He has been working at Google for over 10 years, and is currently a staff research Software Developer in Google Brain in Montreal, focusing on fundamental Reinforcement Learning research, Machine Learning and Creativity, and being a regular advocate for increasing the LatinX representation in the research community. Aside from his interest in coding/AI/math, Pablo Samuel is an active musician.

Title: Next-Generation Software Testing and Verification: An AI Perspective

Abstract: Software testing is about finding failures, assuming that failures are due to faults in the system under test (SUT). Failures, however, may not always indicate SUT faults. For example, when testing is applied at the system level to complex cyber-physical systems, e.g., self-driving cars, a failure may indicate insufficiencies such as performance limitations, physical constraints, or misuse by human operators. In these situations, there is a need for techniques that not only generate individual tests leading to failures but also either explain the circumstances around failures or identify constraints that can steer the system clear of failures. In this talk, I discuss how Interpretable ML can broaden the focus of verification and testing so as to include the learning of insufficiencies caused by the SUT environment. I will present how, using Interpretable ML, one can generate environment conditions that characterize system correctness or, alternatively, explain system failures. To illustrate applications, I will use case studies from the domains of cyber-physical systems and network systems.

Shiva Nejati is an Associate Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa (uOttawa) and Co-director of uOttawa’s recently established IoT Lab (Sedna). Prior to joining the University of Ottawa, she was a Senior Scientist at the SnT Centre, University of Luxembourg and a Scientist at Simula Research Laboratory, Norway. Nejati received her Ph.D. from the University of Toronto, Canada. Her research interests are in software engineering, focussing on software testing, analysis of IoT and cyber-physical systems, search-based software engineering, applied machine learning, and formal and empirical software engineering methods. Nejati has published more than 70 scientific papers and received eight best or ACM distinguished paper awards as well as a 10-Year Most Influential Paper Award from CASCON. She serves as an Associate Editor for IEEE Transactions on Software Engineering and was PC co-chair for SSBSE 2019 and ACM/IEEE MODELS 2021. She has more than 15 years of experience conducting research in collaboration with the IoT, telecom, automotive, aerospace, maritime and energy sectors.

Title: Machine Learning for Assisting Software Development: Recent Highlights and Open Questions

Abstract: There is a great appeal in using machine learning to assist in software development, as it promises to enable the experience of one software engineer to be recorded and then generalized to provide guidance to another. I’ll give an overview of some of our recent work on using deep learning for modeling software development and assisting software engineers, and I’ll reflect on open questions for SEMLA related to developing applications in this space for use “in the wild.”

Danny Tarlow is a Research Scientist at Google Research, Brain Team in Montreal. He is primarily interested in machine learning methods for understanding and generating programs, but he has fairly broad interests across Machine Learning. On the academic side, he is also an Adjunct Professor in the School of Computer Science at McGill University and an associate member at MILA. He co-supervises a couple of PhD students at MILA. He holds a Ph.D. from the Machine Learning group at the University of Toronto (2013). Before coming to Montreal, he spent four years as a postdoc and then Researcher at Microsoft Research, Cambridge (UK).

Title: Analyzing Complex Data from Online Societies

Abstract: In this talk, I will discuss designing methods for analyzing complex data from online societies. Complex data is often interconnected, evolving, and hard to label. With my group, we work on designing methods for analyzing such data, building on techniques for graph mining, graph representation learning, unsupervised and self-supervised learning, anomaly detection, learning with weak and/or uncertain labels, etc. I will highlight one of our projects on measuring polarization in social media, which works with real-world data from online societies and in which we design methods closely with domain experts within an interdisciplinary team.

Title: Are you ready to engineer and sustain AI systems?

Abstract: AI systems are software-reliant systems which include data and components that implement algorithms mimicking learning and problem solving. The increasing availability of computing resources and off-the-shelf ML solutions gives the impression that engineering, deploying, and maintaining an AI system is trivial once the appropriate data is available. The challenges of developing and deploying ML-enabled systems have been extensively reported in the literature and in practitioner blogs and articles, with increasing emphasis on responsible AI implementations. Some of these challenges stem from characteristics inherent to ML components, such as data-dependent behavior, detecting and responding to drift over time, and timely capture of ground truth to inform retraining. The sneaky part about engineering AI systems is that they are “just like” conventional software systems we can design and reason about, until they are not. Regardless, many principles and practices of building long-lived, sustainable software systems still apply to engineering AI systems. This presentation will take a software architecture lens and introduce foundational software engineering practices and research gaps in software engineering of ML systems.

Philippe Molaret

Philippe Molaret is VP of Research & Technology at Thales Digital Solutions (TDS).
TDS is a center that supports the Thales Group’s digital transformation, built on the backbone of Montreal’s digital intelligence ecosystem. He is a cofounder of the cortAIx AI research lab at TDS and currently sponsors the creation and buildup of the Confiance.ai program at CRIM and the ENGINE NFPO for the adoption of 5G technologies. He is a member of the Ivado technology transfer committee and the Prompt board of directors. He occasionally teaches technology and innovation management in Polytechnique Montreal’s master’s and PhD programs. Between 2010 and 2012 he was ETS Research and Innovation ambassador. In 2002 he was one of the founding members of CRIAQ, and he sat on its board of directors until 2015. In 2009 he was an industry member of the MEI strategic council for research and innovation, which led to the development of the 2010-2013 Quebec Strategy for Research and Innovation.
Before joining Thales, Mr. Molaret worked at CAE for 18 years.
Mr. Molaret graduated in electrical engineering from ETS in 1990 and obtained a master’s degree in Technology and Innovation Management from Polytechnique Montreal in 2017.

Gabriela Nicolescu

Gabriela Nicolescu is a full professor and the director of the Department of Computer and Software Engineering at Polytechnique Montreal. She obtained her B. Sc. A and her MSc degree from Politechnica Bucharest. She obtained her Ph.D. degree in 2002 from INPG (Institut National Polytechnique de Grenoble) in France, with the award for Best Thesis in Microelectronics. She has been working at Ecole Polytechnique de Montréal (Canada) since August 2003, where she is a professor in the Computer and Software Engineering Department. Dr. Nicolescu’s research interests are in the field of design methodologies, programming, and security for systems with advanced technologies, such as 3D multi-processor systems-on-chip integrating liquid cooling and optical networks. She has published five books and is the author of more than a hundred articles in journals, international conferences, and book chapters.

Eric Laufer

Eric Laufer is the lead data scientist at Peritus.ai, a startup building tools to help monitor and grow online communities. After a master’s degree focusing on recommender systems at MILA, Eric has worked for the last decade as an applied scientist / ML engineer for various startups and large companies. This work includes NLP (NER/Q&A/Search) for Dow Jones, Element AI, and Peritus.ai, along with recommendation and supply chain forecasting for JDA. His main focus as an ML practitioner is to build efficient, scalable, and useful models in the context of application development.

Mélanie Bosc

Mélanie Bosc holds a diploma in training engineering from the University of Paris 1 Panthéon-Sorbonne and a DESS in training management from the University of Sherbrooke. She developed her expertise in the field of training by starting her career at the National Institute of Agricultural Research in France and then working for organizations in the banking and university fields in Quebec. In this regard, she held the position of Director of Continuing Education at the Faculty of Continuing Education of the University of Montréal. Passionate about the challenges raised by workforce and human resources issues, as well as by learning and training in all its forms, Mélanie has worked, as Executive Director of the sectoral committee of the ICT workforce, to promote the ICT sector and its workforce, as well as the digital transformation of the Quebec economy in general.

Patrick St-Amant

Patrick St-Amant is the CTO and cofounder of Zetane Systems with advanced education in mathematics. He is the inventor of Zetane’s technology and leads the development of Zetane Protector (ML models robustness testing and evaluation) and Zetane Insight Engine (models introspection 3D engine).

He has successfully led several end-to-end ML projects with industrial clients and partners in the fields of Security, Defense, Aerospace, Construction, Aviation, Simulation, and Manufacturing. This included project scoping, ML solution design, planning, data engineering, implementation, robustness testing, and client interactions. He has spent years as a researcher in number theory, set theory, and fundamentals of mathematics. He did PhD studies in mathematics, category theory, and foundations of computing at the University of Ottawa (2007). He was invited to the Institute for Advanced Study in Princeton (2006 & 2007), where he presented his work on a universal mathematical language. He has an M.Sc. degree in computer science and fundamental mathematics from UQAM and holds the patent “Scalable Transform Processing Unit for Heterogeneous Data”.

Over the last five years, he met with over 200 leaders and data scientists in the field of AI and ML. Some examples include IBM, Nvidia, Thales, Microsoft, US Department of Defense, Amazon, MILA, Polytechnique, Université de Montreal, Unity, Quantum Black, CAE, MDA, Creative Destruction Lab, CNRC and others. He presented at the World Summit AI, AI for Defense, Big Data Toronto, Deep Learning Montreal and the yearly ONNX conference.

Houari Sahraoui

Houari A. Sahraoui is a full professor at the department of computer science and operations research (GEODES, software engineering group) of the University of Montreal. Before joining the university, he held the position of lead researcher of the software engineering group at CRIM (Research center on computer science, Montreal). He holds an Engineering Diploma from the National Institute of computer science (1990), Algiers, and a Ph.D. in Computer Science, Pierre & Marie Curie University LIP6, Paris, 1995. His research interests include automated software engineering (SE), search-based SE, model-driven engineering, software visualization, program comprehension, and re-engineering. He has published around 200 papers in conferences, workshops, books, and journals, edited three books, and regularly gives invited talks. He has served as a program committee member for several IEEE and ACM conferences, as a member of the editorial boards of three journals, and as an organization member of many conferences and workshops. He was the general chair of the IEEE Automated Software Engineering Conference in 2003, PC co-chair of VISSOFT 2011, and general chair of VISSOFT 2013.

Emad Shihab

Mike Rabbat

Mike Rabbat is a Research Scientist and Manager in FAIR, the fundamental AI research group of Meta Platforms. He earned the BSc degree from the University of Illinois Urbana-Champaign, the MSc degree from Rice University, and the PhD from the University of Wisconsin-Madison, all in electrical engineering. Before joining FAIR he was a professor at McGill University, and he has held visiting positions at IMT-Atlantique (Brest, France), the Inria Bretagne-Atlantique Research Centre (Rennes, France), and KTH Royal Institute of Technology (Stockholm, Sweden). His research interests include optimization for machine learning, large-scale and distributed optimization, and federated learning.

Applying Software Engineering Principles To A Machine Learning Algorithm

Software engineering and machine learning are two different worlds. There is a lot of research on applying machine learning to software engineering, but the reverse is not true. In this poster, we present an example where principles of software engineering were applied successfully to a machine learning prototype algorithm. The machine learning developer was able to improve his workflow by applying simple heuristics borrowed from software engineering. We then highlight other common problems that can be explored with software engineering to increase the velocity of machine learning projects, and raise questions about various ways to apply software engineering to this domain.

State of the practice in service identification to support the migration to SOA in industry

The migration of legacy software systems to Service Oriented Architectures (SOA) has become a mainstream trend to modernize enterprise software systems. A key step in SOA migration is the identification of services in the target application, but it is a challenging one to the extent that the potential services (1) embody reusable functionalities, (2) can be developed in a cost-effective manner, and (3) should be easy to maintain. In this poster, we report on the state of the practice of SOA migration in industry. We surveyed 45 practitioners of legacy-to-SOA migration to understand how migration in general, and service identification in particular, are done. Key findings include: (1) reducing maintenance costs is a key driver in SOA migration, (2) domain knowledge and source code of legacy applications are most often used, respectively, in a hybrid top-down and bottom-up approach for service identification, (3) service identification focuses on domain services, as opposed to technical services, (4) the process of service identification remains essentially manual, and (5) RESTful services and microservices are the most frequent target architectures. We conclude with a set of recommendations and best practices.

Machine Learning for Log Analysis: A systematic literature review

System logs are widely used and play a critical role in system forensics. However, log analysis faces several challenges: logs are massive in volume and contain complex kinds of messages, they are unstructured and lack homogeneity, and log data does not contain explicit information for anomaly detection. It is therefore impossible to perform log analysis manually in large-scale router systems. At the same time, developers face the challenging task of choosing the most appropriate automated log analysis method, and there is a lack of literature reviews on state-of-the-art machine learning methods for log analysis. Our aim is to help developers choose the most appropriate automated log analysis method for their task and to answer the following research questions: What are the current challenges and proposals in software log analysis? What are the state-of-the-art ML methods for anomaly detection (supervised and unsupervised)? What are the uses of ML in log analysis? When should or shouldn’t ML be chosen over other practices?

Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empirical Study on Four Stack Exchange Websites

Designers of Q&A websites (e.g., Stack Overflow) have devised several incentive systems to encourage users to answer questions. However, the current incentive systems primarily focus on the quantity and quality of the answers instead of encouraging the rapid answering of questions. In this paper, we use a logistic regression model to analyze 46 factors along four dimensions in order to understand the relationship between the studied factors and the time needed to get an accepted answer. We find that (i) factors in the answerer dimension have the strongest effect on the time needed to get an accepted answer; (ii) non-frequent answerers are the bottleneck for fast answers; and (iii) the current incentive system motivates frequent answerers well, but such frequent answerers tend to answer short questions. Our findings suggest that Q&A website designers should improve their incentive systems to motivate non-frequent answerers to be more active and to answer questions fast.

Recommending Framework Extension Examples

A common way to customize a framework is by passing a framework related object as an argument to an API call. The formal parameter of the method is referred to as the extension point. Such an object can be created by subclassing an existing framework class or an interface, or by directly customizing an existing framework object. However, this requires extensive knowledge of the framework’s extension points and their interactions. We develop a technique that mines a large number of code examples to discover all extension points and patterns for each framework class. Given a framework class that is being used, our approach first recommends all extension points that are available in the class. Once the developer chooses an extension point, our approach discovers all of its usage patterns and recommends the best code examples for each pattern. We evaluate the performance of our two-step recommendation using five different frameworks.

An Empirical Study of the Long Duration of Continuous Integration Builds

Continuous Integration (CI) allows developers to generate software builds more quickly and periodically, which helps in identifying errors at early stages. When builds are generated frequently, a long build duration may keep developers from performing other development tasks. Our initial investigation shows that many projects experience long build durations (e.g., in the scale of hours). In this research, we model long CI build durations of 63 GitHub projects to study the factors that may lead to longer CI build durations. Our preliminary results indicate that common wisdom factors (e.g., lines of code and build configuration) do not fully explain long build durations. Therefore, we study the relationship of long build durations with CI, code, density, commit, and file factors. Our results show that test density and build jobs have a strong influence on build duration. Our research provides recommendations to developers on how to optimize the duration of their builds.

Learning from imbalanced data

An important challenge in many real-world machine learning applications is the imbalance between classes. Learning from imbalanced data is challenging due to the bias of performance towards the majority class rather than the minority class of interest. This bias may exist because: (1) classification systems are often optimized and compared using performance measurements that are unsuitable for imbalance problems; (2) most learning algorithms are designed and tested on a fixed imbalance level, which may differ from operational scenarios; (3) the preference of classes is different from one application to another. This poster summarizes two papers from my PhD thesis: (1) a new ensemble learning algorithm called Progressive Boosting (PBoost); and (2) a new global evaluation space for the F-measure that represents a classifier over all of its decision thresholds and a range of possible imbalance levels for the desired preference of TPR to precision.
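To make the dependence of the F-measure on the imbalance level concrete, here is a minimal sketch. It is not the evaluation space from the poster; it only uses the standard definitions of precision and the F-beta score. The function names and the operating point (TPR 0.8, FPR 0.1) are illustrative assumptions.

```python
def precision_from_rates(tpr, fpr, pos_prior):
    """Precision implied by TPR, FPR, and the proportion of positive samples."""
    true_pos = tpr * pos_prior
    false_pos = fpr * (1.0 - pos_prior)
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


def f_measure(tpr, fpr, pos_prior, beta=1.0):
    """F-beta score; beta > 1 weights TPR (recall) more heavily than precision."""
    precision = precision_from_rates(tpr, fpr, pos_prior)
    denom = beta ** 2 * precision + tpr
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * tpr / denom


# Same classifier operating point evaluated at two imbalance levels
# (hypothetical values chosen for illustration only).
balanced = f_measure(tpr=0.8, fpr=0.1, pos_prior=0.5)
skewed = f_measure(tpr=0.8, fpr=0.1, pos_prior=0.05)
print(round(balanced, 3), round(skewed, 3))
```

With TPR and FPR held fixed, the score drops as the positive class becomes rarer, which is why a single F-measure reported at one imbalance level can be misleading.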

The Impact of Feature Reduction Techniques on Defect Prediction Models

Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. Feature selection and reduction techniques can help to reduce the number of features in a model. Using a small number of features avoids the problem of multicollinearity and makes the prediction models simpler. However, while several recent studies have investigated the impact of feature selection techniques on defect prediction, no studies have investigated the impact of feature reduction techniques. In our research, we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models.

A survey on load testing of large-scale software systems

Several large-scale systems have faced system failures in the past due to their inability to handle a very large number of concurrent requests. Therefore, load tests are designed to verify the scalability, robustness, and reliability of the system (apart from the functionality) to meet the demands of millions of users. In our work, we survey the state of load testing research and practice. We compare techniques, data sources and results that are used in the three phases of a load test: Design, Execution, and Analysis. We focus on the work that was published after 2013. Our work complements existing surveys on load testing.

Studying the Dialogue Between Users and Developers of Free Apps in the Google Play Store

The popularity of mobile apps has continued to grow over the past few years. Mobile app stores, such as the Google Play Store and Apple’s App Store, provide a unique user feedback mechanism to app developers through app reviews. In the Google Play Store (and most recently in the Apple App Store), developers are able to respond to such user feedback. In our work, we analyze the dynamic nature of the review-response mechanism by studying 4.5 million reviews with 126,686 responses of 2,328 top free-to-download apps in the Google Play Store. One of the major findings of our study is that the assumption that reviews are static is incorrect. Our findings show that it can be worthwhile for app owners to respond to reviews, as responding may lead to an increase in the given rating. In addition, we identify four patterns of developers (e.g., developers who primarily respond to negative reviews).

Applying Consensus Algorithms to Generate Consensual Developer Behavior

Developer behavior is a common research topic in software engineering to support the future maintenance and evolution of software systems. Studying developers’ behavior for the purpose of recommending the most common behavior is an area of great interest. Given this interest, our work aims to apply consensus algorithms to developers’ behaviors to generate a consensual behavior. We conduct a number of experiments to analyze how developers behave while performing a programming task. We collect developers’ interaction traces (ITs) through Eclipse Mylyn and VLC video captures. To obtain the best results, we perform an in-depth comparison of the results of applying each consensus algorithm. Preliminary results show that the Kwiksort algorithm outperforms all other algorithms in producing the most common developer behavior. This study demonstrates how using consensus algorithms can help recommend to developers a consensual behavior when performing a particular programming task.
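As a rough illustration of pivot-based rank aggregation in the style of Kwiksort, the sketch below merges several ordered traces into one consensus order. The abstract does not say how interaction traces are encoded, so representing each trace as a ranking of hypothetical step names ("open", "edit", "run") is an assumption. The algorithm is usually described with a randomly chosen pivot; the first item is used here to keep the result deterministic.

```python
def prefers(rankings, a, b):
    """True if at least half of the rankings place a before b."""
    votes = sum(1 for r in rankings if r.index(a) < r.index(b))
    return votes * 2 >= len(rankings)


def kwiksort(items, rankings):
    """Aggregate rankings by recursively pivoting on majority preference."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    # Items that the majority places before the pivot go to the left side.
    before = [x for x in rest if prefers(rankings, x, pivot)]
    after = [x for x in rest if not prefers(rankings, x, pivot)]
    return kwiksort(before, rankings) + [pivot] + kwiksort(after, rankings)


# Three hypothetical traces, each an ordering of the same steps.
traces = [
    ["open", "edit", "run"],
    ["open", "run", "edit"],
    ["edit", "open", "run"],
]
consensus = kwiksort(["run", "edit", "open"], traces)
print(consensus)
```

Each pairwise comparison is decided by majority vote across the traces, so the output does not need to match any single input trace exactly.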

Log4Perf: Suggesting Logging Locations for Web-based Systems’ Performance Monitoring

Logs are widely used to monitor, understand and improve software performance. However, developers often face the challenge of making logging decisions. Prior work on automated logging guidance techniques is rather general, without considering a particular goal, such as monitoring software performance. We present Log4Perf, an automated approach that provides suggestions of where to insert logging statements with the goal of monitoring web-based systems’ software performance. In particular, our approach builds and manipulates a statistical performance model to identify the locations in the source code that have a statistically significant influence on software performance. Our evaluation results show that Log4Perf can build well-fit statistical performance models, which can be leveraged to investigate the influence of locations in the source code on performance. Also, our approach is an ideal complement to traditional approaches that are based on software metrics or performance hotspots. Log4Perf is integrated into the release engineering process of a commercial software system to provide logging suggestions on a regular basis.

DLFinder: Characterizing and Detecting Duplicate Logging Code Smells

Developers rely on software logs for a variety of tasks. Recent research on logs often only considers the appropriateness of a log as an individual item, while logs are typically analyzed in tandem. Thus we focus on studying duplicate logging code, that is, log lines that have the same static text message. Such duplication in logs is a potential indication of logging code smells, which may affect developers’ understanding of the system. We uncover five patterns of duplicate logging code smells by manually studying a statistical sample of duplicate logs from four large-scale open source systems. We further manually study all the code smell instances and identify the problematic and justifiable cases of the uncovered patterns. Then, we contact developers in order to verify our result. We integrated our manual study result and developers’ feedback into our static analysis tool, DLFinder, which helps developers identify and refactor duplicate logging code smells.
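The notion of "log lines that have the same static text message" can be illustrated with a small sketch. This is not how DLFinder works (the abstract describes it as a static analysis tool); it is a simplified, regex-based stand-in. The pattern `LOG_CALL` and the function name are hypothetical, and only the first string literal of a call is treated as the static message.

```python
import re
from collections import defaultdict

# Hypothetical pattern for calls such as log.info("text", arg) or LOG.error("text").
LOG_CALL = re.compile(
    r'\blog(?:ger)?\.(?:trace|debug|info|warn|error)\s*\(\s*"([^"]*)"',
    re.IGNORECASE,
)


def find_duplicate_log_messages(source_lines):
    """Map each static message to the 1-based lines that log it, keeping only duplicates."""
    locations = defaultdict(list)
    for lineno, line in enumerate(source_lines, start=1):
        match = LOG_CALL.search(line)
        if match:
            locations[match.group(1)].append(lineno)
    return {msg: lines for msg, lines in locations.items() if len(lines) > 1}


sample = [
    'log.error("Failed to close stream", e);',
    'int x = compute();',
    'LOG.error("Failed to close stream", e);',
    'logger.info("Started");',
]
duplicates = find_duplicate_log_messages(sample)
print(duplicates)
```

A grouping like this only surfaces candidates. Deciding whether a duplicate is problematic or justifiable is the part that the study addresses through manual analysis and developer feedback.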

An empirical study of obsolete knowledge on Stack Overflow

An enormous amount of knowledge in software engineering is accumulated on Stack Overflow. However, as time passes, knowledge embedded in answers may become obsolete. Such obsolete answers, if not identified or documented clearly, may mislead answer seekers and cause unexpected problems (e.g., using an outdated security protocol). In this paper, we study the characteristics of obsolete answers. We find that: 1) 58.4% of the obsolete answers were already obsolete when they were first posted. 2) Only 23.5% of such answers are ever updated. 3) Answers in web and mobile development tags are more likely to become obsolete. 4) 79.5% of obsolete observations are supported by evidence (e.g., version information and obsolete time). We suggest that 1) Stack Overflow should encourage the whole community to maintain obsolete answers. 2) Answerers are suggested to include the information of valid versions/time when posting answers. 3) Answer seekers are suggested to go through comments in case of answer obsolescence.

Rewarding open source developers: a case study of Bountysource bounties

Because of the voluntary nature of open source, sometimes it is hard to find a developer to work on a particular issue. However, these issues may be of high priority to others. To motivate developers to address these particular issues, people can offer monetary rewards (i.e., bounties) for addressing an issue report. To better understand how bounties can be leveraged to evolve an open source project, we investigated 3,509 GitHub projects’ issues for which bounties ($406,425 in total) were offered on Bountysource. We collect 31 factors and build a logistic regression model to understand the relationship between the bounty and the issue-addressed likelihood. We find that (1) providing a bounty for an issue earlier on and adding a bounty label are related to an increased issue-addressing likelihood. (2) The bounty value of an issue does not have a strong relationship with the likelihood of an issue being addressed.

Revisiting “Understanding the Rationale for Updating a Function’s Comment”

Code comments play a fundamental role in Software Maintenance and Evolution. As such, they need to be kept up-to-date. A decade ago, Malik et al. introduced a classification model to flag whether the comments of a function need to be updated when such a function is changed. The authors claimed that their model had an overall accuracy of 80%. We discovered and addressed eight drawbacks in the design and evaluation of their model. In particular, we noticed that the out-of-bag performance evaluation yielded unrealistic results in all cases considered. In addition, we observed that the feature ranking tends to be biased towards the features that are important for the most-frequently occurring type of comment change (i.e., either inner or outer comments). Finally, we introduce and evaluate a simpler model and conclude that its performance is statistically similar to that of the full model and that it is more easily interpretable.

Predicting and Prioritizing Tests that Manifest Performance Regressions at the Commit Level

Performance issues may compromise user experiences, increase resource costs, and cause field failures. One of the most prevalent performance issues is performance regression. Prior research proposes various automated approaches that detect performance regressions. However, performance regression detection is conducted after the system is built and deployed. Hence, large amounts of resources are still required to locate and fix performance regressions. In our paper, we propose an approach that automatically predicts whether a test would manifest performance regression in a code commit. We conduct case studies on three open-source systems. Our results show that our approach can predict performance-regression-prone tests with high AUC values. In addition, we find that traditional size metrics are still the most important factors. On the other hand, performance-related metrics that are associated with Loop and Adding Expensive Variable are also risky for introducing performance regressions. Our approach and the study results can be leveraged by practitioners to effectively cope with performance regressions in a timely and proactive manner.

Studying the Characteristics of Logging Practices in Mobile Apps: A Case Study on F-Droid

Logging is a common practice in software development and contains rich information. However, little is known about mobile apps’ logging practices. Therefore, we conduct a case study on 1,444 open source Android apps in the F-Droid repository. We find that although logging is less pervasive in mobile apps than in large software systems, logging is leveraged in almost all studied apps. We compare the log level of each logging statement and developers’ rationale for using the logs. All too often (over 30%), developers choose an inappropriate log level. Such an inappropriate log level may prevent useful run-time information from being recorded or may generate unnecessary logs, causing performance overhead and security issues. Finally, we conduct a performance evaluation by disabling logging messages in four open-source Android apps. We observe a significant performance overhead on response time, CPU and I/O. Our results imply the need for systematic guidance to assist in mobile logging practices.

Improving the Pull Requests Review Process Using Learning-to-rank Algorithms

In collaborative software development platforms (such as GitHub and GitLab), the role of reviewers is key to maintaining an effective review process for pull requests. However, the number of decisions that reviewers can make is far exceeded by the increasing number of pull request submissions. To help reviewers make more decisions, we propose a learning-to-rank (LtR) approach to recommend pull requests that can be quickly reviewed by reviewers. Our ranking approach complements the existing list of pull requests based on their likelihood of being quickly merged or rejected. We conduct empirical studies on 74 Java projects. We observe that: (1) The random forest LtR algorithm performs better than both the FIFO and the small-first baselines obtained from existing pull request prioritizing criteria, which means our LtR approach can help reviewers make more decisions and improve their productivity. (2) The contributor’s social connections are the most influential metrics to rank pull requests that can be quickly merged.

Supporting Logging Decisions Using Historical Development Knowledge

Software developers insert logging statements in their source code to record important runtime information. However, providing proper logging statements remains a challenging task. In this work, we first studied why developers make log changes in their source code. We then proposed an automated approach to provide developers with log change suggestions as soon as they commit a code change. Our automated approach can effectively suggest whether a log change is needed for a code change with an AUC of 0.84 to 0.91. We also studied how developers assign log levels to their logging statements and proposed an automated approach to help developers determine the most appropriate log level when they add a new logging statement. Our automated approach can accurately suggest the levels of logging statements with an AUC of 0.75 to 0.81.

Specific version or version range? A study of versioning strategies for dependency management on the npm ecosystem

In most software ecosystems, developers use versioning statements to inform which versions of a provider package are acceptable for fulfilling a dependency. There is an ongoing debate about the benefits and challenges of using versioning statements. On the one hand, flexible versioning statements automatically upgrade a provider’s version, helping in keeping providers up-to-date. On the other hand, flexible versioning statements can introduce unexpected breaking changes. We study three different strategies used by developers to define versioning statements, ranging from accepting a large/flexible range of provider versions to a conservative strategy. Using a flexible strategy, one can expect to have more provider upgrades than other strategies while having to modify fewer versioning statements. Maintainers of flexible packages with more than 100 providers should be aware of the possibility of larger inter-release times. Finally, the majority of the strategy shifts are from flexible to mixed and vice versa.
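The difference between a specific version and a version range can be sketched as follows. The example assumes that an exact statement such as `1.2.3` stands for a conservative strategy and a caret range such as `^1.2.3` stands for a flexible one; the study's own definitions of its three strategies may differ. The helper names are hypothetical, and the matcher is deliberately simplified: it handles only plain `MAJOR.MINOR.PATCH` versions, without pre-release tags or the other range operators that npm supports.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def satisfies(version, statement):
    """Check a version against an exact ('1.2.3') or caret ('^1.2.3') statement."""
    candidate = parse_version(version)
    if not statement.startswith("^"):
        return candidate == parse_version(statement)
    base = parse_version(statement[1:])
    # A caret range allows changes that keep the left-most non-zero part fixed.
    if base[0] > 0:
        upper = (base[0] + 1, 0, 0)
    elif base[1] > 0:
        upper = (0, base[1] + 1, 0)
    else:
        upper = (0, 0, base[2] + 1)
    return base <= candidate < upper


# A new provider release 1.4.0 is picked up by the range but not by the pin.
print(satisfies("1.4.0", "^1.2.3"), satisfies("1.4.0", "1.2.3"))
```

The range accepts the new release without any edit to the versioning statement, while the pinned statement has to be modified by hand for every upgrade. This is the trade-off between upgrades and breaking changes that the abstract describes.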

The Impact of Using Regression Models to Build Defect Classifiers

It is common practice to discretize continuous defect counts into defective and non-defective classes and use them as a target variable when building defect classifiers (discretized classifiers). However, this discretization of continuous defect counts leads to information loss that might affect the performance and interpretation of defect classifiers. Another possible approach to build defect classifiers is to use regression models and then discretize the predicted defect counts into defective and non-defective classes (regression-based classifiers). In this paper, we compare the performance and interpretation of defect classifiers that are built using both approaches (i.e., discretized classifiers and regression-based classifiers) across six commonly used machine learning classifiers and 17 datasets. We find that: i) Random forest based classifiers outperform other classifiers (best AUC) for both classifier building approaches; ii) In contrast to common practice, building a defect classifier using discretized defect counts does not always lead to better performance.
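The two labeling pipelines differ only in where the discretization happens, which a short sketch can show. The defect counts, the predicted values, and the 0.5 cut-off for predicted counts are all hypothetical; the abstract does not state which thresholds the study uses.

```python
def discretize(counts, threshold=0):
    """Label an entry defective (1) when its count exceeds the threshold, else 0."""
    return [1 if c > threshold else 0 for c in counts]


# Hypothetical data: observed defect counts per module and the counts
# that some regression model predicted for the same modules.
observed = [0, 3, 0, 1, 0]
predicted = [0.2, 2.4, 0.7, 0.4, 0.1]

# Discretized classifier: counts are turned into classes BEFORE training,
# so the classifier never sees how many defects a module had.
discretized_targets = discretize(observed)

# Regression-based classifier: a regression model is trained on the raw counts,
# and only its predictions are turned into classes afterwards.
regression_based_labels = discretize(predicted, threshold=0.5)

print(discretized_targets, regression_based_labels)
```

In the first pipeline a module with one defect and a module with three defects become indistinguishable, which is the information loss the abstract refers to.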

An Empirical Study of Early Access Games on the Steam Platform

“Early access” is a model that allows players to purchase an unfinished version of the game. In turn, players can provide developers with early feedback. Recently, the benefits of the early access model have been questioned by the community. We conducted an empirical study on 1,182 early access games on the Steam platform to understand the characteristics, advantages and limitations of the early access model. We observe that developers update their games more frequently in the early access stage. On the other hand, the reviewing activity during the early access stage is lower than that after the early access stage. However, the percentage of positive reviews is much higher during the early access stage, suggesting that players are more tolerant of imperfections in the early access stage. Hence, we suggest that developers use the early access model for eliciting early feedback and more positive reviews to attract future customers.

Exploring the Use of Automated API Migrating Techniques in Practice: An Experience Report on Android

When APIs evolve, consumers are left with the difficult task of migration. Studies on API migration often assume that software documentation lacks explicit information for migration guidance and is impractical for API consumers. Past research has shown that it is possible to present migration suggestions based on historical code-change information. Yet, the assumptions made by prior approaches have not been evaluated on large-scale practical systems. We report our recent practical experience migrating the use of Android APIs in F-Droid apps when leveraging approaches based on documentation and historical code changes. Our experiences suggest that migration through historical code changes presents various challenges and that API documentation is undervalued. More importantly, during our practice, we experienced that the challenges of API migration lie beyond migration suggestions, in aspects such as coping with parameter type changes in the new API. Future research should aim to design automated approaches to address these challenges.