Invited Speakers

Agents Day – June 17th, 2025

Xin Eric Wang

Building AI Agents that Reason and Act Like Humans: In this talk, I will present our recent progress toward building general-purpose AI agents that can reason and act like humans—from automating complex computer tasks to engaging in intuitive, grounded reasoning. We begin with Agent S, an open agentic framework for autonomous interaction with Graphical User Interfaces (GUIs). Agent S addresses three key challenges in real-world task automation: acquiring domain-specific knowledge, planning over long horizons, and adapting to diverse, dynamic interfaces. It introduces experience-augmented hierarchical planning, combining external web search with internal memory retrieval to support efficient task decomposition and execution. Our latest version, Agent S2, achieves new state-of-the-art performance across all major computer-use benchmarks with a compositional generalist-specialist architecture, bringing us closer to AI that can serve as capable virtual workers in everyday and enterprise settings. To complement these agentic capabilities, we explore advances in reasoning. We propose the GRIT method to teach Multimodal Large Language Models (MLLMs) to “think with images”, interleaving text tokens and visual anchors throughout the reasoning process. We also introduce Soft Thinking, a training-free paradigm that supports fluid, abstract thought—enabling models to explore multiple possibilities in parallel rather than committing to one rigid step at a time. Together, these lines of work lay the foundation for AI agents that can see, think, and act—seamlessly and reliably—in complex, real-world environments.
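
To make the Soft Thinking idea concrete: instead of sampling one discrete token at each reasoning step, the model can be fed the probability-weighted mixture of token embeddings. A minimal sketch, assuming access to the model's logits and embedding matrix (names are illustrative, not the authors' code):

```python
import torch

def soft_token(logits: torch.Tensor, embedding_matrix: torch.Tensor,
               temperature: float = 1.0) -> torch.Tensor:
    """Return a 'soft token': the expected embedding under the model's
    next-token distribution, keeping multiple possibilities in play
    instead of committing to a single sampled token."""
    probs = torch.softmax(logits / temperature, dim=-1)  # (vocab_size,)
    return probs @ embedding_matrix                      # (hidden_dim,)

vocab, hidden = 100, 16
print(soft_token(torch.randn(vocab), torch.randn(vocab, hidden)).shape)
```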

Alexandre Lacoste

Alexandre is a Staff Research Manager leading the UI-Assist team at ServiceNow. Since his PhD in theoretical machine learning, he has published influential work on foundation models, sequential decision-making, and causality, including several collaborations with Yoshua Bengio. Prior to ServiceNow, he was the first research scientist to join Element AI, and he worked for three years at Google. He is also a founding member of ClimateChangeAI and an author of the seminal paper Tackling Climate Change with Machine Learning.

Autonomous UI Agents: Open-Source Tools, Best Practices, and Safety

This talk presents an overview of the current open-source ecosystem for building and evaluating autonomous web agents. We will cover key platforms including BrowserGym, AgentLab, WebArena, WorkArena, and DoomArena, highlighting their design choices, capabilities, and how they can be used together to support research on UI-based decision-making. The talk is aimed at researchers interested in language agents, reinforcement learning, and interactive environments. We will also introduce core concepts for developing agents, including DOM parsing, AXTree, action spaces, and evaluation protocols. A dedicated section will address emerging security concerns such as prompt injection and unsafe tool use. The goal is to provide a clear understanding of the available tools, common workflows, and practical considerations for safely developing robust web agents.
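
As a rough illustration of how these platforms fit together, here is a minimal sketch of the gymnasium-style observe-act loop that BrowserGym-like environments expose; the environment id, observation handling, and toy policy are illustrative assumptions, not the library's exact API.

```python
import gymnasium as gym

def act(obs):
    """Toy policy: a real agent would parse the DOM snapshot or
    accessibility tree (AXTree) in `obs` and emit an action from the
    environment's action space, e.g. the string "click('42')"."""
    return "noop()"

# The environment id below is a placeholder; actual ids vary across
# BrowserGym task suites (WebArena, WorkArena, ...).
env = gym.make("browsergym/some-task")
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(act(obs))
```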

Royal Sequeira

Royal is a Machine Learning Engineer at Georgian’s AI Lab, where he helps portfolio companies build AI-driven product features and accelerate their go-to-market (GTM) strategies. He also supports sourcing and diligence for new investments. Previously, he worked at Ada Support as one of their first ML hires, LG Toronto AI Lab, and Microsoft Research India. In 2018, he founded Sushiksha, a mentorship organization that has mentored hundreds of medical and engineering students across rural India with both technical and soft skills.

Voice AI Agents: technology, players, problems, and trends

In this talk, I will present voice AI agents, beginning with the foundational technologies of automatic speech recognition (ASR) and text-to-speech (TTS) and the methodologies used to evaluate their performance. I will outline typical voice AI workflows and the current state of the market, focusing on the key players and the trends that are shaping the space. As voice AI becomes increasingly commoditized, I will discuss potential strategies for building defensibility and differentiation in an industry that is becoming more crowded. Finally, I will share insights on trends to watch as the market evolves.
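
For orientation, a canonical voice-agent turn chains ASR, a dialogue layer, and TTS; the sketch below uses stub components in place of real services, purely as an illustration of the workflow.

```python
# A canonical voice-agent turn. All components are stubs standing in
# for real ASR, LLM, and TTS services (names are illustrative).

def transcribe(audio: bytes) -> str:          # ASR: speech -> text
    return "what's the weather tomorrow?"

def respond(user_text: str) -> str:           # dialogue / LLM layer
    return f"You asked: {user_text}"

def synthesize(text: str) -> bytes:           # TTS: text -> speech
    return text.encode("utf-8")

def voice_agent_turn(audio_in: bytes) -> bytes:
    text = transcribe(audio_in)
    reply = respond(text)
    return synthesize(reply)

print(voice_agent_turn(b"..."))
```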

Tse-Hsun (Peter) Chen

Dr. Peter Chen leads the Software Performance, Analysis, and Reliability (SPEAR) Lab at Concordia University, where he works on log analysis and AIOps, performance profiling, software testing, and mining software repositories to enhance the quality of large-scale systems. His group’s research tools have been adopted by industry partners such as ERA Environmental, Ericsson, Microsoft, and BlackBerry. He has earned multiple prestigious awards and was ranked among the world’s most active software engineering researchers in a Journal of Systems and Software study. Several of his SPEAR‐trained PhD and postdoctoral alumni now hold tenure-track positions at universities worldwide.

Redefining Team Collaboration: Harnessing LLM Agents for Next-Gen Software Engineering

Software development is a collaborative process in which architects, developers, and testers design, implement, and maintain software systems. In this talk, we discuss our recent research on using large language model (LLM) agents to enhance key software engineering tasks. In particular, we will cover how LLMs can help with tasks such as code generation, debugging, and log analysis. By emulating the roles and interactions of development teams, we show that LLM agents can benefit from decades of human knowledge in software engineering and achieve state-of-the-art results.
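
As a rough illustration of the role-emulation idea, the sketch below has a "developer" agent and a "reviewer" agent iterate on a piece of code, with a stubbed model call; the roles and prompts are illustrative, not the specific frameworks covered in the talk.

```python
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stub standing in for any chat-completion API."""
    return "..."

def develop(task: str, rounds: int = 2) -> str:
    """Emulate a team loop: a developer writes code, a reviewer
    critiques it, and the developer revises based on the review."""
    code = call_llm("You are a developer. Write code.", task)
    for _ in range(rounds):
        review = call_llm("You are a code reviewer. List defects.", code)
        code = call_llm("You are a developer. Fix the reviewed defects.",
                        f"Code:\n{code}\n\nReview:\n{review}")
    return code

print(develop("Parse a log file and count error lines."))
```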

Daniel Jaroslawicz

Daniel is a Research Engineer at Distyl AI, where he develops compound AI systems to automate complex workflows at Fortune 500 companies. His work focuses on transforming organizational knowledge into structured representations that power AI-driven workflows and enable continuous improvement through both automated and expert-driven feedback loops. Prior to Distyl, he was an AI Scientist at BenevolentAI where he worked on hypothesis generation with LLMs for novel drug target discovery. He holds a BA and MS in Computer Science from Columbia University.

Teaching LLMs Your Subject Matter Expertise

While LLMs excel at diverse tasks, their performance hinges on access to relevant contextual information. In many enterprise settings, the necessary domain-specific context is fragmented across disorganized, outdated documentation or exists solely as implicit knowledge held by subject matter experts. This talk will explore strategies we have found effective for developing AI systems capable of ingesting and iteratively refining task-specific context with humans in the loop. We will discuss approaches to knowledge extraction, task representations that improve expert feedback quality, and scalability considerations.

Damien Masson

Damien Masson is an assistant professor in human-computer interaction (HCI) at the Université de Montréal, where he co-leads the Montréal HCI group. He is also an associate academic member of Mila and an IVADO professor. Previously, he was a postdoctoral researcher at the University of Toronto. He obtained his PhD in the HCI Lab at the University of Waterloo. His research focuses on building AI-infused systems that prioritize thoughtful interaction design to effectively solve tasks. Currently, his projects relate to designing intelligent systems for assisting creative writers, visualizing complex information, and making digital documents easier to understand. His work is published in top-tier HCI venues such as CHI and UIST and has received multiple honours, including a best demo award (CHI 2021), a best paper award (CHI 2023), the AFIHM dissertation award (2023), the Bill Buxton Dissertation Award (2023), and best paper honourable mentions (CHI 2024, CHI 2025).

Beyond Prompts: Designing Powerful Ways to Interact with AI

From an interaction perspective, the way we use AI is a step backward: we are constrained by slow text input that leaves much room for misinterpretation, reminiscent of the command-line interfaces of the 70s. Instead, I argue we should move away from textual interactions to explore more gestural approaches. Specifically, I show how classical HCI theories can inform the design of AI interfaces to enable more direct, intuitive, and expressive interactions. I illustrate this through my work reimagining workflows such as writing, and conclude by outlining general principles for designing better human-AI interactions.

Ian Arawjo

Ian Arawjo is an Assistant Professor of Human-Computer Interaction at the University of Montréal, in the Department of Computer Science and Operations Research (DIRO), and is affiliated with Mila. He leads the Montréal HCI group. He was previously a Postdoctoral Fellow at Harvard University and holds a Ph.D. in Information Science from Cornell University. His research explores the social and cultural dimensions of programming, combining methods such as ethnographic fieldwork, archival research, system design, and usability studies. His work has received awards at top HCI conferences including CHI, CSCW, and UIST.

He is the creator and lead developer of ChainForge, the first open-source visual programming environment for prompt engineering, developed in collaboration with colleagues at Harvard CS. He also introduced notational programming, a paradigm that blends handwritten notation with traditional code.

Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale

What’s the deal with AI memory? I present an overview of recent developments in AI memory for agents and the sometimes confusing terminology that has emerged around this concept. Then, I focus on grounding documents—documents composed of user-defined rules and preferences which ground long-term AI agent behavior—and argue why grounding documents are the future of human-AI interaction. To help understand how to robustly update grounding documents, I describe a prototype interface, SemanticCommit, aimed at helping users update lists of information while removing inconsistencies, and discuss design implications for future AI agent interfaces that seek to remain aligned with user intent over long-term usage.
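
As a rough sketch of the kind of consistency check such an interface can run when a new rule is committed to a grounding document (with a stubbed LLM judge; this is an illustration, not the paper's exact pipeline):

```python
def conflicts(rule_a: str, rule_b: str) -> bool:
    """Stub LLM judge: ask a model whether two rules contradict."""
    return False  # replace with a real LLM call

def semantic_commit(grounding_doc: list[str], new_rule: str) -> list[str]:
    """Check a new rule against every existing rule before appending,
    so the grounding document stays internally consistent over time."""
    flagged = [r for r in grounding_doc if conflicts(r, new_rule)]
    if flagged:
        # Surface conflicts to the user instead of silently appending.
        raise ValueError(f"New rule conflicts with: {flagged}")
    return grounding_doc + [new_rule]

doc = ["Always reply in French."]
print(semantic_commit(doc, "Sign every message with my initials."))
```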

Till Döhmen

Till Döhmen leads the AI team at MotherDuck in Amsterdam. He is also pursuing an external part-time PhD at BIFOLD, TU Berlin, focusing on the intersection of Data Management and ML. His background is in software engineering, with significant experience in data science and ML engineering. Over the past years, he has maintained a presence in both research and industry, embracing the combination of real-world applications and scientific insights. His journey with DuckDB began with writing the first implementation of DuckDB’s CSV sniffer, and he subsequently worked on a series of DuckDB-related projects (data quality evaluation for ML pipelines; building a DuckDB-based lakehouse query engine for the Hopsworks ML Feature Store). At MotherDuck, he is now working on harnessing the power of LLMs for database users, from Text-to-SQL to leveraging LLMs for unstructured data analysis right within the database.

Why Context Matters: Lessons from building Real-world AI Assistance for DuckDB SQL

Large language models have taken the lead on coding and text-to-SQL benchmarks and are currently the go-to choice for building state-of-the-art coding assistants, not only from a quality perspective but, perhaps surprisingly, more often than not from an economic standpoint as well. LLMs have evolved to a point where they are less bound by their intelligence than by the context we provide to them. Given the right context, we can teach LLMs the intricate specifics of DuckDB’s SQL dialect and significantly improve performance on text-to-SQL benchmarks. We argue that a principled approach to context engineering is key to building applied AI systems today, and we walk through the practical lessons we learned navigating between fine-tuning our own models and deploying LLM-based pipelines for SQL coding assistance.
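
To make the idea of context engineering concrete, here is a minimal sketch of the kind of prompt assembly this involves; the layout and names are illustrative assumptions, not MotherDuck's production pipeline.

```python
def build_prompt(question: str, schema_ddl: str, dialect_notes: str,
                 examples: list[tuple[str, str]]) -> str:
    """Assemble a text-to-SQL prompt: schema first, then DuckDB dialect
    hints, then few-shot (question, SQL) pairs, then the user question."""
    shots = "\n".join(f"-- Q: {q}\n{sql}" for q, sql in examples)
    return (
        f"Schema:\n{schema_ddl}\n\n"
        f"DuckDB dialect notes:\n{dialect_notes}\n\n"
        f"Examples:\n{shots}\n\n"
        f"-- Q: {question}\nSELECT"
    )

print(build_prompt(
    "How many orders per customer?",
    "CREATE TABLE orders (customer_id INT, amount DOUBLE);",
    "DuckDB supports GROUP BY ALL.",
    [("Total order amount?", "SELECT sum(amount) FROM orders;")],
))
```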

Intelligent Systems Day – June 18th, 2025

Abhik Roychoudhury

Abhik Roychoudhury is Provost’s Chair Professor of Computer Science at the National University of Singapore (NUS), where he leads a research team on Trustworthy and Secure Software (TSS). He is also a Senior Advisor at SonarSource, following the acquisition of AutoCodeRover, his spinoff on AI-based coding, by Sonar. Abhik received his PhD in Computer Science from Stony Brook University in 2000 and has been a faculty member at the NUS School of Computing since 2001. Abhik’s research group at NUS is known for foundational contributions to software testing and analysis. Specifically, the team has made contributions to automatic programming and automated program repair, as well as to fuzz testing for finding security vulnerabilities in software systems. These works have been honored with various awards, including an International Conference on Software Engineering (ICSE) Most Influential Paper Award (test-of-time award) for program repair and the IEEE New Directions Award 2022 (jointly with Cristian Cadar) for contributions to symbolic execution.

Abhik was the inaugural recipient of the NUS Outstanding Graduate Mentor Award 2024. Doctoral students graduated from his research team have taken up faculty positions in many academic institutions. He has served the software engineering research community in various capacities including as chair of the major conferences of the field, ICSE and FSE. Currently, he serves as chair of the FSE steering committee. He is the current Editor-in-Chief of the ACM Transactions on Software Engineering and Methodology (TOSEM), and is a member of the editorial board of Communications of the ACM. Abhik is a Fellow of the ACM.

AutoCodeRover: from research on automatic programming to spinoff acquisition

Large Language Models (LLMs) have shown surprising proficiency in generating code snippets, promising to automate large parts of software engineering via artificial intelligence (AI). We argue that successfully deploying AI software engineers requires a level of trust equal to or even greater than the trust established by human-driven software engineering practices. The recent trend toward LLM agents offers a path toward integrating the power of LLMs to create new code with the power of program analysis tools to increase trust in the code.
In this talk, we will share our experience with students in designing AutoCodeRover, an early approach to agentic AI in coding, which is seeing increased attention in 2025. The main differentiation of AutoCodeRover from other proposals was its focus on using program analysis tools autonomously. This allows AutoCodeRover to infer developer intent (specification inference) and thereby successfully conduct program improvements such as program repair or feature addition. AutoCodeRover was spun off from NUS and acquired by SonarSource in February 2025.

We will conclude the talk with a discussion on how agentic AI may be shifting the balance in programming — programming with trust becoming more important than programming at scale.

Weiyi Shang

Weiyi Shang is an Associate Professor at the University of Waterloo. His research interests include AIOps, big data software engineering, software log analytics, and software performance engineering. He serves as a steering committee member of the SPEC Research Group. He is ranked among the top software engineering researchers worldwide in a recent bibliometric assessment of software engineering scholars. He is a recipient of various premium awards, including the CS-Can/Info-Can Outstanding Early Career Computer Science Researcher Prize in 2021, SIGSOFT Distinguished Paper Awards at ICSE 2013, 2020, and 2025, the best paper award at WCRE 2011, and the Distinguished Reviewer Award for the Empirical Software Engineering journal. His research has been adopted by industrial collaborators (e.g., BlackBerry and Ericsson) to improve the quality and performance of their software systems, which are used by millions of users worldwide. Contact him at wshang@uwaterloo.ca or visit uwaterloo.ca/electrical-computer-engineering/profile/wshang.

Evaluating the efficiency of LLM-generated code

The integration of Large Language Models (LLMs) into software development holds the promise of transforming code generation processes. While AI-driven code generation presents numerous advantages for software development, code generated by large language models may be sub-optimally efficient. In this talk, I will share our recent study on the efficiency of LLM-generated code and associated benchmarks. I will also share our initial exploration of prompt engineering as a potential strategy for optimizing the efficiency of LLM-generated code.
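
As a minimal illustration of what "efficiency of generated code" means in practice, the sketch below times two functionally equivalent candidates an LLM might produce; it is a toy harness, not the benchmark used in the study.

```python
import timeit

def measure(func, *args, repeat: int = 5) -> float:
    """Best-of-N wall-clock time for one candidate implementation."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))

# Two functionally equivalent candidates an LLM might generate:
def naive(n):
    return sum([i * i for i in range(n)])   # builds an intermediate list

def better(n):
    return sum(i * i for i in range(n))     # streams a generator

assert naive(10_000) == better(10_000)      # same result,
print(measure(naive, 10_000), measure(better, 10_000))  # different cost
```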

Fabian Wenz

Fabian Wenz is a researcher at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT), where he works with Cagatay Demiralp and Michael Stonebraker. He holds bachelor’s degrees in Mathematics and Computer Science and a master’s degree in Mathematics and Data Science from the Technical University of Munich (TUM).
His research lies at the intersection of large language models (LLMs), data management, and machine learning systems. He focuses on improving LLM performance for complex reasoning over enterprise databases and structured data. He currently leads the development of BenchPress, a suite of tools and benchmarks for measuring the accuracy of LLM-generated SQL on production database logs.

BENCHPRESS: An Annotation System for Rapid Text-to-SQL Benchmark Curation

Large language models (LLMs) have been applied successfully in many domains, including software generation, document summarization, and creating simple legal documents. There has been significant work in applying them to querying databases, especially decision-support ones (warehouses), a task known as text-to-SQL. Much of this work has focused on publicly available data sets, such as Spider and Bird. In previous research, we have shown that LLMs are much less successful at querying large private enterprise data warehouses. Specifically, we constructed four benchmarks consisting of (natural language, gold SQL) pairs so we could assess accuracy on real-world applications. Our results were much less impressive than reported results on Spider and Bird. In our work, SQL logs from data warehouses were readily available. However, asking database administrators—highly trained experts with limited availability—to take on the additional work of constructing and validating corresponding natural language utterances proved both challenging and costly; in many cases, it was very difficult to pull them away from their primary responsibilities. As a result, we needed a tool to help us (and others) construct the correct natural language that corresponds to a SQL query from a warehouse log. To address this gap, we built BenchPress, an interactive, human-in-the-loop system that enables enterprises to efficiently generate domain-specific text-to-SQL benchmarks. We have evaluated BenchPress on annotated enterprise SQL logs, demonstrating that LLM-assisted annotation drastically reduces the time and effort required to create high-quality benchmarks. Our results show that combining human verification with LLM-generated suggestions enhances annotation accuracy, benchmark reliability, and model evaluation robustness. By streamlining the creation of custom benchmarks, BenchPress offers researchers and others a mechanism for assessing text-to-SQL models on a given, domain-specific workload.
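
A minimal sketch of the human-in-the-loop annotation loop described above, with a stubbed model call; the function names are illustrative and not BenchPress's actual API.

```python
def draft_utterance(sql: str) -> str:
    """Stub: ask an LLM to propose a natural-language question for `sql`."""
    return "Which customers placed more than ten orders last month?"

def annotate(log_queries: list[str]) -> list[tuple[str, str]]:
    """Build (utterance, SQL) benchmark pairs from warehouse log queries.
    The LLM drafts each utterance; a human expert verifies or edits it,
    which is far cheaper than writing utterances from scratch."""
    benchmark = []
    for sql in log_queries:
        draft = draft_utterance(sql)
        answer = input(f"SQL: {sql}\nDraft: {draft}\nEdit or press Enter: ")
        benchmark.append((answer or draft, sql))
    return benchmark
```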

Effy Xue Li

Effy Xue Li is currently a postdoc at CWI (Centrum Wiskunde & Informatica). Her research focuses on Knowledge Graph Construction from Conversational Data, with a particular interest in leveraging large language models (LLMs) to extract structured information such as entities and relations. She also explores how to make LLMs more data-efficient, robust, and adaptable, especially in the context of data management. She recently completed an internship at MotherDuck, working on the application of LLMs for data management tasks. Prior to that, she was an AI Resident at Microsoft Research Cambridge in the UK. She holds a PhD from the University of Amsterdam in the INDE Lab and a Master’s degree from the University of Edinburgh.

Efficient Use of LLMs for Data Preparation 

Data preparation accounts for 80% of a data scientist’s time and remains one of the least enjoyable yet essential tasks in the workflow. While LLMs offer new opportunities for automating structured data preparation, challenges persist in efficiency, scalability, and adaptability. In this talk, we explore the efficient use of LLMs for data preparation, including (1) generating transformation code for data wrangling tasks and (2) fine-tuning small LLMs for entity matching. We highlight how LLMs can be leveraged not just for reasoning but as scalable, cost-effective automation tools in data-cleaning pipelines. Finally, we discuss future research opportunities in the area, paving the way for more adaptable and interpretable AI-driven data science workflows.
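
A minimal sketch of pattern (1), generating transformation code: rather than paying for an LLM call per row, the model is asked once for a reusable transform that then runs locally; the stubbed call and returned snippet are illustrative assumptions.

```python
import pandas as pd

def llm_generate_transform(examples: list[tuple[str, str]]) -> str:
    """Stub: ask an LLM for Python code mapping example inputs to
    outputs. Here we hard-code what such a call might return."""
    return "lambda s: s.strip().title()"

df = pd.DataFrame({"name": ["  alice SMITH ", "BOB jones"]})

# One LLM call synthesizes the transform; execution is cheap and local.
# (In practice, sandbox generated code before executing it.)
transform = eval(llm_generate_transform([("  alice SMITH ", "Alice Smith")]))
df["name_clean"] = df["name"].map(transform)
print(df)
```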

Orlando Marquez

Orlando is a ServiceNow Lead Applied Research Scientist with a strong background in software engineering. One of his passions is shipping state-of-the-art AI to end-users through rigorous and careful experimentation as well as sound engineering. He has worked on several NLP tasks such as question answering, natural language understanding, and summarization. He most recently led the development of Flow Generation, a GenAI feature with hundreds of enterprise users that automatically generates low-code workflows from text. Nowadays, he is trying to figure out how to reliably ship web agents. He holds a Bachelor of Software Engineering from the University of Waterloo and a Master’s in Computer Science from the Université de Montréal (MILA).

Flow Generation: When Generative AI Meets Enterprise Workflows

Workflows are at the core of enterprise systems as they automate repetitive and tedious processes. However, creating them is difficult due to the amount of domain knowledge required. In this talk, I will describe how we leveraged LLMs to generate complete workflows based on user textual requirements. This feature, called Flow Generation, is part of the ServiceNow suite of Generative AI products that are used by hundreds of enterprise customers. I will explore why fine-tuning a small LLM is the way to go in domain-specific use cases such as this one, and how design patterns like Task Decomposition and RAG allow us to satisfy software quality attributes such as scalability and testability, as well as address LLM limitations such as hallucination. Lastly, I will comment on the limitations of our current approach and how web agents offer a promising yet challenging avenue to further automate the creation and execution of workflows.
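
A rough sketch of how the Task Decomposition and RAG patterns might combine in such a pipeline, with stubbed calls; this is an illustration, not ServiceNow's implementation.

```python
def retrieve(query: str, k: int = 3) -> list[str]:
    """Stub RAG step: fetch relevant workflow components and docs."""
    return ["component: approval step", "component: notification step"]

def llm(prompt: str) -> str:
    """Stub standing in for a fine-tuned small LLM."""
    return "..."

def generate_flow(requirement: str) -> list[str]:
    # Task Decomposition: split one hard generation into easier subtasks,
    # each of which is simpler to generate and to test in isolation.
    outline = llm(f"List the steps needed for: {requirement}")
    steps = []
    for step in outline.splitlines():
        # RAG: ground each step in retrieved components to curb hallucination.
        context = "\n".join(retrieve(step))
        steps.append(llm(f"Using only:\n{context}\nGenerate step: {step}"))
    return steps
```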

Nadia Nahar

Nadia Nahar is a Ph.D. candidate in Software Engineering at Carnegie Mellon University’s School of Computer Science, where she is advised by Prof. Christian Kästner. Her research focuses on software engineering for machine learning (SE4ML), interdisciplinary collaboration in ML products, and Responsible AI. Nadia investigates the socio-technical challenges of building and deploying machine learning products, employing empirical methods and designing interventions to bridge the gap between software engineers and data scientists. Her recent work explores a range of topics, including collaboration challenges in building ML products, the challenges and emerging best practices for integrating large language models (LLMs) into software systems, and the landscape of open-source ML product development. Nadia also develops LLM-powered tools to facilitate collaborative requirements elicitation, and responsible AI engagement. Her work has resulted in many publications at top-tier venues such as ICSE, FAccT, and CHI, earning accolades like the ACM SIGSOFT Distinguished Paper Award (ICSE) and the Ivica Crnkovic Early Career Researcher Award (CAIN).

Bridging the Knowledge Boundaries: Enabling Early Collaboration Between Model and Product Teams with LLM Assistants

Turning machine learning (ML) models into successful products is notoriously challenging. The challenges extend far beyond building accurate models: successful ML products require seamless integration with software systems, continuous monitoring, safeguarding against failures, and navigating ethical, regulatory, and operational concerns. These tasks demand collaboration between data science and software engineering teams—teams that often speak different languages and operate with distinct priorities that can create friction and misunderstandings. In this talk, I’ll go over why simply “throwing models over the fence” to engineers does not work and highlight the collaboration challenges in ML product development. I will share practical strategies and tools for fostering early, effective teamwork between model and product teams. In particular, I will demonstrate how large language models (LLMs) can bridge communication gaps—from generating shared stories to offering real-time feedback. I will present several interventions, including boundary objects to negotiate and document model requirements, carefully crafted stories that encourage engagement with responsible AI principles, and collaborative policies for explainable AI. All these interventions aim to empower model and product teams to help themselves and work together more effectively to deliver ML products that succeed in real-world settings.

Olivier Nguyen

Olivier is an Applied Scientist at Twitch working on the Safety ML team, building machine learning applications to detect and remove harmful content and behavior from the platform. Previously, he was an Applied Research Scientist at ServiceNow, which he joined through the acquisition of Element AI. His previous work focused on problems in NLP (question answering, semantic search, natural language understanding) and everything around it (research, engineering, scaling).

Using LLMs for content moderation at Twitch

Recent advances in Large Language Models (LLMs) have opened new possibilities for automating complex decision-making tasks that traditionally required extensive human expertise. This talk presents a case study on leveraging LLMs to enhance content moderation systems for large-scale trust and safety operations. We demonstrate how modern LLMs’ capabilities – including large context windows, multilingual understanding, and structured reasoning – can be applied to interpret nuanced platform policies and make consistent moderation decisions while providing detailed explanations. We discuss the technical architecture, prompt engineering approaches, and evaluation frameworks developed for this system. Our findings show promising results in handling complex moderation scenarios, while highlighting important considerations around scalability and edge cases. The presentation concludes with an exploration of future directions, particularly the potential of composable AI agents to handle increasingly complex trust and safety workflows.
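
A minimal sketch of the structured-output idea: the model is asked to return a machine-readable decision with its justification, so decisions can be logged, audited, and escalated. The prompt and stub below are illustrative, not Twitch's system.

```python
import json

MODERATION_PROMPT = """You are a content moderator. Apply the policy below.
Policy: {policy}
Content: {content}
Respond with JSON: {{"decision": "allow|remove", "rule": "...", "reason": "..."}}"""

def stub_llm(prompt: str) -> str:
    return '{"decision": "allow", "rule": "none", "reason": "stub"}'

def moderate(policy: str, content: str, llm=stub_llm) -> dict:
    """Ask the model for a structured decision so the verdict, the rule
    invoked, and the explanation can all be stored and reviewed."""
    raw = llm(MODERATION_PROMPT.format(policy=policy, content=content))
    return json.loads(raw)

print(moderate("No harassment.", "hello friends"))
```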

Jennifer McArthur

Dr. Jennifer McArthur is an Associate Professor in the Department of Architectural Science at Toronto Metropolitan University. She holds a PhD in Management (University of Edinburgh) and MASc and BASc in Mechanical Engineering (both University of Waterloo). Jenn also serves as Associate Chair for the Graduate Program in Project Management in the Built Environment. Jenn’s research program centers on the development of Smart & Sustainable Building Solutions including Smart and Continuous Commissioning applications, Cognitive Digital Twins for Smart Buildings, and supporting industry adoption of Smart technologies. Her other research focuses on building performance improvement through Smart and Ongoing Commissioning (SOCx), Smart Campus Integration, FM-enabled BIM, and workplace design to improve productivity and health. Currently, Jenn is working with Infrastructure Ontario to develop a provincial Digital Twins strategy. Prior to her academic career, Jenn worked for over a decade as a mechanical engineer in building design, rural development, and disaster relief on three continents.

Cognitive Digital Twins for Decarbonization – Integrating AI with Building Technologies

Cognitive Digital Twins (CDTs) are well-established in the manufacturing sector but have seen very little implementation in the buildings sector, which has a low degree of digitization. This presentation highlights the key steps taken to create the first known proof-of-concept CDT for a real-world building, the Daphne Cockwell Complex at TMU, and how it is being used to support decarbonization efforts at TMU. The CDT development to date will be presented, describing the approaches and algorithms used, results achieved, and lessons learned. Topics covered will include the integrated data model and supporting ontology; building automation system data acquisition and streaming; the data lake; event detection algorithms; integration; and visualization. Insights regarding approach selection, implementation considerations, limitations, and alternatives are presented for each to guide the remaining steps (learning to support true cognition) in the CDT development. Other decarbonization CDT projects in progress will also be presented, including those at the urban scale and at the equipment scale.
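
As a minimal illustration of the event-detection layer, the sketch below flags sensor readings that deviate sharply from a rolling window of recent building-automation data; the window size and threshold are illustrative assumptions, not the project's actual algorithms.

```python
from collections import deque
from statistics import mean, stdev

def detect_events(stream, window: int = 60, z: float = 4.0):
    """Flag readings that deviate strongly from the recent window,
    a simple stand-in for the event-detection layer of a CDT."""
    recent = deque(maxlen=window)
    for t, value in stream:
        if len(recent) >= 10:
            sigma = max(stdev(recent), 0.1)  # floor avoids zero-variance spikes
            if abs(value - mean(recent)) > z * sigma:
                yield (t, value)             # candidate fault/event
        recent.append(value)

# Steady 20.0 readings, then a sudden jump at t=100:
print(list(detect_events(enumerate([20.0] * 100 + [35.0]))))  # [(100, 35.0)]
```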

Istvan David

Dr. Istvan David is an Assistant Professor of Computer Science and Software Engineering in the Faculty of Engineering at McMaster University. His research is situated under the broader systems engineering umbrella, with a particular interest in the engineering of complex and sustainable systems through digital accelerators such as digital twins and artificial intelligence. The mission of Dr. David’s program is to promote sustainable practices in complex systems engineering, especially through modeling and simulation and model-driven engineering. He maintains additional lines of research in select topics in modeling and simulation, cyber-physical systems, automated software and systems construction, and collaborative modeling.

The perception of the value and propriety of complex systems is changing. In addition to their functional and extra-functional properties, today’s systems are also evaluated by their sustainability properties. As humankind recognizes the magnitude of contemporary environmental problems, the need for truly sustainable systems is more pronounced than ever. Advanced digital technology, such as digital twins and AI, offers unprecedented opportunities to implement more sustainable systems. However, digital technology itself is a major contributor to sustainability problems (especially environmental and social ones) and should be used with care. This talk explores the various forms of sustainability in complex engineered systems and reflects on the benefits and caveats of using digital twins to implement sustainability ambitions.

SEMTL and Tutorials – June 19th, 2025

Julien Cohen-Adad

Prof. Cohen-Adad is an MR physicist and software developer with over 15 years of experience in advanced MRI methods for quantitative assessment of brain and spinal cord structure and function. He is an associate professor at Polytechnique Montreal, adjunct professor of neuroscience at the University of Montreal, associate director of the Neuroimaging Functional Unit (Univ. Montreal), a member of Mila (Univ. Montreal), and he holds the Canada Research Chair in Quantitative Magnetic Resonance Imaging. Along with his colleagues Prof. Nikola Stikov, Prof. Alonso-Ortiz, and Prof. De Leener, he co-directs the NeuroPoly Lab (www.neuro.polymtl.ca), which includes about 30 graduate students and research associates. Prof. Cohen-Adad’s research is highly cited (Google Scholar). As a leader in the field, he has organized multiple workshops at international conferences (https://spinalcordmri.org/workshops.html). He is a frequent guest lecturer on advanced MRI methods, and he regularly serves as a consultant for companies (e.g., Biospective Inc., NeuroRx, IMEKA) and academic institutions (Harvard, U. Toronto, UCL, UCSF, etc.) on setting up MRI acquisition and image-processing protocols.

An (Un)Success-Story in the Development of Software for AI and Medical Imaging

In this presentation, Julien Cohen-Adad shares lessons learned from developing software for AI in medical imaging, focusing on the challenges of generalizing deep learning models across diverse clinical settings. He outlines key limitations in neuroimaging AI, such as small datasets, limited annotations, and lack of robustness across imaging contrasts and sites. Strategies to address these issues include domain adaptation, multi-contrast training, and injecting MR physics priors into model architectures. While some successes emerged—like improved tumor segmentation—many approaches faced limitations, especially in the face of inter-rater variability. Julien then reflects on the development and eventual sunsetting of a medical AI software framework, advocating instead for lighter, more maintainable tools that leverage open standards like BIDS and popular ecosystems like MONAI. The talk concludes with a call for reproducible research, practical innovation, and leveraging local strengths—such as MR physics knowledge and clinical collaborations—rather than competing with industry-scale AI.

Lina Marsso

Lina Marsso is an assistant professor in the software engineering department at Polytechnique Montréal and an associate academic member of the Montreal-based artificial intelligence research institute Mila. She was a post-doctoral researcher in the Department of Computer Science at the University of Toronto, working with Marsha Chechik. She received her PhD from INRIA Grenoble, France, where she was advised by Radu Mateescu and Ioannis Parissis. Her recent work combines safety with social, legal, ethical, empathetic, and cultural considerations in the verification and analysis of autonomous systems. Lina organized and served as PC chair of the first International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components and the sixth International Workshop on Automated and Verifiable Software System Development.

Toward Trustworthy AI-Based Systems

In this talk, I will show how we can integrate formal methods with software engineering techniques to build trustworthy autonomous systems. My work and research vision have three objectives: (1) specify what trustworthiness means for autonomous systems; (2) validate systems against those specifications; and (3) enable self-adaptation so that systems evolve until they satisfy the specifications.

Lili Wei

Lili Wei is an Assistant Professor in the ECE department of McGill University. Her major research interests lie in software analysis, testing, and mining code repositories, with a focus on Android applications, smart contracts, and IoT software. She was a Post-doctoral Fellow in the CASTLE group at HKUST and a recipient of the Postdoctoral Fellowship Award from the Hong Kong Research Grants Council. She completed her doctoral degree under the supervision of Prof. S.C. Cheung at HKUST in Spring 2020. Her doctoral thesis focuses on taming compatibility issues raised by Android fragmentation.

How Far are Android App Secrets from Being Stolen?

Android apps contain secret strings such as cloud service credentials or encryption keys. Leakage of such secrets can have serious consequences, such as monetary losses or exposure of private user information. In this talk, I will introduce our recent work on characterizing app secrets that are shipped in released app package files. Our study shows that exploitable app secrets can be harvested with nothing more than simple regular expressions.
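
As a minimal illustration of that point, the sketch below scans strings extracted from a decoded app package with a few generic credential patterns; the patterns are common examples, not the study's full set.

```python
import re

# Generic patterns for common credential formats (illustrative subset).
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Google API key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "Private key header": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
}

def scan_strings(decoded_apk_text: str):
    """Scan strings extracted from a decoded APK (e.g. strings.xml,
    decompiled code) for hard-coded secrets."""
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(decoded_apk_text):
            yield label, match.group()

sample = 'api_key="AIza' + "A" * 35 + '"'
print(list(scan_strings(sample)))
```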

Amine Mhedhbi

Amine Mhedhbi (https://amine.io/) is an assistant professor at Polytechnique Montréal. His interests are in building and analyzing data management systems; his work includes tackling performance considerations and debuggability, interface design, and applications. Amine received his Ph.D. in 2023 from the University of Waterloo. His research has been recognized with a VLDB Best Paper Award, a Microsoft Ph.D. Fellowship, selection as a Meta Ph.D. Fellowship finalist, and the Cheriton School of Computer Science Ph.D. Dissertation Award.

Towards Multimodal Database Management Systems

As language models become widely accessible, organizations are investing in software systems that retrieve and reason over large, semantically rich data for scientific and business workflows. These data sources are often heterogeneous and multimodal. Approximately 80% of enterprise data is unstructured, and much of it remains untapped. Despite growing investment, implementing these workflows remains engineering-heavy and costly, requiring specialized expertise. Developers are forced to make many low-level execution decisions and rely on loosely coupled systems connected by hand-written orchestration scripts. This hinders adoption, especially among small and medium-sized enterprises and government agencies. To address these challenges, I believe a new class of data systems is needed: multimodal database management systems (DBMSes). These systems would tightly fuse analytics with AI-integrated prediction and reasoning, sitting atop repositories of raw data spanning tables, documents, images, and audio. In this talk, I focus on giving a broad overview of the needed capabilities and the research challenges towards making multimodal DBMSes mature and usable.

Houssem Ben Braiek

Houssem Ben Braiek is an ML Tech Lead at Sycodal, building AI-powered vision and robotics solutions that enable local small-to-medium manufacturers to integrate versatile automation into their constantly-evolving production lines. He earned both an M.Sc. and a Ph.D. in Software Engineering from Polytechnique Montréal, with each dissertation honored by the department’s Best Thesis Award. His research tackles the practical challenges faced by AI engineers, delivering industry-adopted methods—NeuraLint, TheDeepChecker, DeepEvolution, PhysiCAL, and SmOOD—each addressing a key quality attribute of trustworthy AI systems and published in leading software-engineering venues. As part of his community service, he co-founded VerifIA, a nonprofit initiative devoted to open-source AI-verification tools, and co-created the IVADO MLOps Upskilling Program. Passionate about emerging AI advances, he often shares insights through workshops and blog posts.

This talk introduces VerifIA, an open-source, domain-aware verification framework that evaluates AI models in the staging phase—after they clear data-driven benchmarks but before deployment. At this stage, models must still align with application-specific domain knowledge. VerifIA’s inaugural release automates a battery of rule-based and search-driven verifications—covering in-domain input space—to expose brittle behaviours and inconsistencies that statistical tests miss. It then generates an interactive validation report, linked to the staging model, that equips Ops teams with clear evidence for deploy-or-iterate decisions. Built-in adapters load models from scikit-learn, LightGBM, CatBoost, XGBoost, PyTorch, and TensorFlow, and push HTML reports directly to MLflow, Comet ML, or Weights & Biases for one-click go/no-go approval. A retrieval-augmented, human-in-the-loop flow drafts domain rules automatically and lets experts refine them, trimming setup time from days to minutes. The session unpacks VerifIA’s Arrange–Act–Assert workflow, presents a live demo, and showcases end-to-end use cases—providing a practical blueprint for ensuring forecasting models in production meet domain standards and application requirements.
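
As a generic illustration of the kind of rule-based check such a framework automates, the sketch below verifies a domain monotonicity rule against a model; the function and rule are hypothetical examples, not VerifIA's actual API.

```python
import numpy as np

def check_monotone_increasing(model_predict, X: np.ndarray, feature: int,
                              delta: float = 1.0) -> np.ndarray:
    """Domain rule: increasing `feature` must not decrease the prediction
    (e.g. more floor area should not lower a predicted energy load).
    Returns a boolean mask of inputs that violate the rule."""
    X_shifted = X.copy()
    X_shifted[:, feature] += delta
    return model_predict(X_shifted) < model_predict(X)

# Toy model: weighted sum with a positive weight on feature 0,
# so the monotonicity rule should hold on every input.
model = lambda X: X @ np.array([2.0, -1.0])
X = np.random.rand(100, 2)
print(check_monotone_increasing(model, X, feature=0).any())  # -> False
```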

Ahmed Haj Yahmed

Ahmed Haj Yahmed is an MLOps Specialist at Sycodal, where he participates in the development and deployment of machine learning solutions for intelligent robotic systems. His work focuses on enhancing the perception and decision-making of collaborative robots through advanced computer vision pipelines. He holds an M.Sc. in Software Engineering from Polytechnique Montréal, where he was a member of the SWAT Lab under the supervision of Prof. Foutse Khomh. His master’s thesis received the Best Master’s Thesis Award in Computer Science and Software Engineering for 2023, as well as the “Mention Spéciale du Jury” for interdisciplinary excellence. During his studies, he was also affiliated with Mila and participated in the SE4AI CREATE program, contributing to research on the reliability and production-readiness of deep learning and deep reinforcement learning systems. Prior to his graduate studies, he earned an Engineer Degree in Software Engineering, with honors, from the National Institute of Applied Science and Technology (INSAT), affiliated with the University of Carthage, Tunisia. His academic and industrial work lies at the intersection of software engineering and artificial intelligence, with research contributions published in leading journals and conferences. His research is primarily focused on Quality Assurance of Deep Learning-based Software Systems, Software Engineering for ML/AI, and dependable and trustworthy ML/AI.

(Joint talk with Houssem Ben Braiek; see the VerifIA abstract above.)