Posters will be presented during the reception on June 18th. Selected lightning talks will also take place during SEMTL on June 19th.
Posters
Accepted Posters:
1 | AI-Driven Bidding Decisions: Enhancing Competitive Construction Strategies with Machine Learning | Competitive bidding in construction is a high-stakes process requiring accurate decision-making under uncertainty. This study revisits foundational bidding theories—Friedman’s model and Ioannou’s revised formulations—and integrates them with modern machine learning techniques to improve bid outcome predictions. Using real-world data from the Virginia Department of Transportation, we developed a Random Forest classifier that achieved 98% accuracy in distinguishing successful from unsuccessful bids. The model effectively captured non-linear relationships among features like bid amount, rank, and market conditions, and demonstrated resilience to class imbalance. Our findings reveal that AI agents can provide data-driven support for contractors, enabling more strategic and informed bid/no-bid decisions. This research offers a practical path toward integrating predictive analytics into construction management workflows. | Qadri H Shaheen | University of Maryland |
2 | Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB | Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, text-free documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and has several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management.
To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. Our demonstration showcases how FlockMTL streamlines the development of knowledge-intensive analytical application while highlighting the importance of its optimizations in easing the implementation burden. |
Anas Dorbani | Polytechnique Montréal |
3 | Bit-level Perturbation as a Defense: Mitigating LSB-based Payload Recovery in Transformer Models | Bit-level Perturbation as a Defense: Mitigating LSB-based Payload Recovery in Transformer Models investigates whether introducing low-bit perturbations can serve as a lightweight, model-agnostic defense to disrupt hidden payloads in transformer models such as BERT and ViT without degrading accuracy. We evaluate two defense mechanisms: (1) Random Flip Defense—each least significant bit of floating-point weights is independently flipped with a specified probability (for instance, 20%); (2) Pattern Defense—a fixed binary pattern (such as “1101”) replaces least significant bits across target parameters. Both techniques exploit the principle that modifying LSBs has a minimal numerical effect on weight values while critically impairing payload restoration. Through controlled perturbations applied to transformer weights, we assess impacts on payload recoverability and model performance. Experimental results demonstrate that both randomized and patterned LSB perturbations effectively hinder hidden payload extraction, achieving over 30% reduction in payload retrieval, while preserving inference accuracy within 1–2% of baseline across diverse model types. This defense requires no retraining of existing models and demands minimal computational overhead. Its generality extends across model architectures and payload embedding strategies, making it suitable for integration into real-time threat detection pipelines and supply-chain screening processes. These findings confirm that bit-level perturbation offers a practical, lightweight defense strategy compatible with transformer-based systems. | Tsung-Hsien Chuang |
Polytechnique Montréal
|
4 | Can ChatGPT Migrate My Code? | Code migration is a time-consuming process, especially when dealing with multiple versions of third-party open-source libraries. With the rise of large language models (LLMs) like ChatGPT, we are interested in their potential for automating code-related tasks, including generation, troubleshooting, and migration. This poster presents a study evaluating the reliability of ChatGPT-4 as a code migration assistant across 12 popular Java libraries, each with 10 selected versions. An initial code snippet is selected for each library, and ChatGPT is prompted to migrate the snippet across all versions. In a subsequent step, the model is provided with specific compilation, runtime or incorrect output errors from its previous outputs and asked to correct them. The results show that ChatGPT-4 can successfully migrate code across most library versions and effectively troubleshoot its errors, demonstrating potential as a tool for code migration. However, while promising, this experiment represents a simplified version of migration scenarios in real practice. Further research is needed to assess performance in more complex systems involving interdependent scripts and large systems. | Yasmine Drissi | Concordia University |
5 | CodeChat: A Large Dataset of Conversations Between Programmers and Large Language Models for Understanding Use Cases and Code Quality Issues | Large language models (LLMs) increasingly assist programmers in coding tasks through natural language interactions to generate code, discover solutions, and boost productivity. However, there remains limited understand- ing of how programmers interact with LLMs and the reliability and quality of the code produced through these interactions. This gap raises concerns regarding the overall utility and trustworthiness of LLMs in real-world soft- ware development scenarios. To address this gap, we introduce CodeChat, a large dataset comprising 82,845 real-world programmer-LLM conversations from 26,085 users, containing 368,506 code snippets generated across over 20 programming languages. We systematically analyze characteristics of these conversations, focusing on programmer queries, LLM response patterns, and generated code quality. We find that programmer queries are typically brief, while corresponding LLM responses are considerably more verbose, with a me- dian token-length ratio of approximately 15:1. Programmers frequently seek assistance in tasks related to webpage design and neural network training. In addition, our evaluation of LLM-generated code across five programming lan- guages (Python, JavaScript, C++, Java, and C#) identifies quality issues such as prevalent syntax errors, redundant and unused code elements, insufficient documentation, and deviations from best practices. Based on our findings, we recommend that researchers investigate strategies for optimizing token alloca- tion and develop benchmarks focused on tasks frequently queried by program- mers. Furthermore, we suggest that conversational code assistants integrate artifact management systems, and implement automated error detection and correction mechanisms. |
Suzhen Zhong |
Queen’s University
|
6 | Cognitive Digital Twin for Construction Decarbonization: AI-Driven Sustainability Monitoring | Construction SMEs contribute significantly to global carbon emissions, yet face expertise and cost barriers to decarbonization. This poster presents a Cognitive Digital Twin platform, inspired by SnapReport—an AI/IoT tool reducing reporting time by 80% across 10 pilot sites—for real-time sustainability monitoring. Integrating BIM (Revit) and AI (Python), the platform tracks energy consumption and material reuse, enabling a multi-stakeholder network (SMEs, contractors, policymakers). A win-win model offers SMEs free reporting tools, while anonymized data validates AI-driven decarbonization pathways. The methodology includes semi-structured interviews (20 stakeholders), thematic analysis (NVivo), and a pilot with 5 SMEs. Data verification and validation, via member checking and LCA (OpenLCA), align with SEMLA’s AgentOps focus, ensuring robust AI operations. Hypothetical outcomes include 15% waste reduction and policy incentives for net-zero. This research, developed by an independent researcher with 8 years of construction experience, offers a scalable framework for SDG 13, with potential applications in smart urban systems. Reference: Pomponi, F., & Moncaster, A. (2017). Journal of Cleaner Production, 143, 710-718. https://doi.org/10.1016/j.jclepro.2016.12.055 |
Sayyed Farid Goldoozian | Independent Researcher, Developer of SnapReport Solutions |
7 | Distill or Inject? An evaluation of different GNNs with pretrained transformers features for code related tasks | Pretrained Transformer models have demonstrated exceptional capability in capturing semantic information and achieving state‑of‑the‑art results across numerous code understanding tasks. However, they lack explicit modeling of program structure since they treat code as a sequence of tokens, while Graph Neural Networks (GNNs) excel at exploiting the syntax and topology encoded in Abstract Syntax Trees (ASTs). Bridging these modalities—semantics from Transformers and structure from GNNs—promises richer code representations, yet raises two key questions: which GNN architecture best leverages Transformer features, and should those features be injected directly or distilled via an auxiliary loss? In this work, we use exact AST node‑to‑source‑span alignment to pool Transformer hidden states into node embeddings. We then systematically benchmark four foundational GNNs (GCN, GAT, GIN, VGAE) coupled with three Transformer encoders (CodeBERT, CodeT5‑Base, UniXcoder) across three tasks—Java250 code classification, CodeSearchNet retrieval, and Devign vulnerability detection. By comparing direct feature injection against LLM‑to‑GNN distillation, we analyze and quantify the performance and efficiency trade‑offs of each strategy. Our results reveal how GNN choice, Transformer size, and fusion method interact to drive code task performance, offering insight for future Graph Neural Network and Pretrain Transformers integration works. |
Taoufik El Idrissi | Polytechnique Montréal |
8 | DoomArena: A framework for Testing AI Agents Against Evolving Security Threats | DoomArena is a security evaluation framework for AI agents designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks for web and tool-calling agents; 2) It is configurable and permits detailed threat modeling, allowing configuration of specific components of the agentic framework being attackable, and specifying targets for the attacker; 3) It is modular and decouples the development of attacks from the environment in which the agent is deployed, allowing the same attacks to be applied across multiple environments. We illustrate several advantages of our framework, including the ability to adapt to new threat models and environments easily, the ease of combining several previously published attacks to enable comprehensive and fine-grained security testing, and the ability to analyze trade-offs between various vulnerabilities and performance. We apply DoomArena to SOTA web and tool-calling agents and find the following: 1) SOTA agents have varying levels of vulnerability to different threat models (malicious user vs malicious environment), and there is no Pareto dominant agent across all threat models; 2) When multiple attacks are applied to an agent, they often combine constructively; 3) Guardrail model-based defenses are failing, while defenses based on powerful LLMs work better. | Léo Boisvert | MILA, Polytechnique Montréal, ServiceNow Research |
9 | Ethical and Empathetic Requirements for Trustworthy ChatBots | As LLM-based chatbots become prevalent across domains such as healthcare, education, and customer service, ensuring their behaviour aligns with social, ethical, empathetic, and cultural (SEEC) norms is essential for fostering trust in these systems. Yet, eliciting normative requirements for such LLM-based chatbots remains challenging, in part because of their open-ended capabilities and vast, unpredictable output space. In this paper, we propose normative requirements that describe how chatbot responses should behave, to be SEEC-compliant, across varying inputs, contexts, and user profiles in three application domains: therapy, education, and customer service. We briefly provide a landscape of existing norms, commercial-chatbots in these domains, and discuss the challenges for eliciting normative requirements that does not over restrict the core functionality of the domain-specific chatbots. |
Sophia Jit | University of Toronto/Polytechnique Montréal |
10 | Extracting Microservices from Monolithic Systems using Deep RL | The Microservice Architecture emerged as a solution to problems encountered when developing and maintaining a Monolithic system which motivated the migration of legacy monoliths into microservices. However, previous migration efforts have shown that this process can be very costly and very difficult to execute citing the decomposition as a bottleneck. Approaches that can automate and guide the developers through this phase can alleviate the issues of the migration process. However, the number of potential microservices rises exponentially with the scale of the monolith. Navigating the space of potential decompositions can be difficult due to the ambiguity of the desired qualities of the new architecture. We propose a novel Deep Reinforcement Learning based decomposition approach: RLDec. We formulate the microservices recommendation task as a Markov Decision Process and we train a Deep Neural Networks model using the quality of the generated microservices as a reward function. These quality metrics are based mainly on the structural and semantic analysis of the monolithic system. We evaluate the performance of our proposed approach using distinct metrics and different experimental settings including a comparison of different variations of RLDec, a comparison with state-of-the-art decomposition approaches and qualitative analysis of a decomposition example. | Khaled Sellami | Université Laval |
11 | LLMs for Public Sector Compliance: Automating BABA Reviews with BERT and LLaMA | Manual review of federal grant applications for Build America Buy America Act (BABA) compliance is time-consuming, inconsistent, and error-prone. This study presents an AI-powered compliance checker that integrates two large language models: a fine-tuned BERT classifier and an instruction-tuned LLaMA explanation generator. BERT determines whether a document complies with BABA, while LLaMA provides human-readable justifications for non-compliance. The system was trained on 27 real-world grant PDFs using lightweight preprocessing and tokenization. Despite the small and imbalanced dataset, results show promising classification accuracy and meaningful explanations. The dual-model approach offers scalable, transparent support for regulatory reviews and lays the groundwork for broader AI adoption in public sector compliance workflows. | Qadri H Shaheen | University of Maryland |
12 | MonoEmbed: LLM Representations for Monolith-to-Microservice Decomposition | As Monolithic applications evolve, they become increasingly difficult to maintain and improve, leading to scaling and organizational issues. The Microservices architecture, known for its modularity, flexibility and scalability, offers a solution for large-scale applications allowing them to adapt and meet the demand on an ever increasing user base. Despite its advantages, migrating from a monolithic to a microservices architecture is often costly and complex, with the decomposition step being a significant challenge. This research addresses this issue by introducing MonoEmbed, a Language Model based approach for automating the decomposition process. MonoEmbed leverages state-of-the-art Large Language Models (LLMs) and representation learning techniques to generate representation vectors for monolithic components, which are then clustered to form microservices. By evaluating various pre-trained models and applying fine-tuning techniques such as Contrastive Learning and Low Rank Adaptation (LoRA), MonoEmbed aims to optimize these representations for microservice partitioning. The evaluation of the fine-tuned models showcases that they were able to significantly improve the quality of the representation vectors when compared with pre-trained models and traditional representations. The proposed approach was benchmarked against existing decomposition methods, demonstrating superior performance in generating cohesive and balanced microservices for monolithic applications with varying scales. | Khaled Sellami | Université Laval |
13 | Online Self-Supervised Multimodal Vision Transformers for First-Person Human Action Recognition | This research focuses on developing an online model for First-Person Human Action Recognition using a vision-based approach aimed at improving model robustness, enhancing learning efficiency, and reducing computational complexity. The poster highlights key challenges such as limited paired image-text samples, data imbalance, and the high cost of generating new annotated data. These limitations drive the need for more efficient self-supervised learning (SSL) methods. The proposed research explores state-of-the-art approaches while introducing techniques that reduce reliance on costly labeled data, extract informative features, and address overfitting through advanced regularization strategies. Additionally, this work demonstrates that integrating multiple modalities such as pairing images with corresponding textual descriptions can significantly improve classification accuracy and performance in zero-shot downstream tasks. The main objective is to design a scalable, transformer-based architecture while resolving instability issues commonly observed in Vision Transformer (ViT) training, particularly those caused by vanishing gradients. This will be addressed through augmentation strategies to increase semantic diversity and avoid fixed representations that lead to model collapse. The proposed model will also incorporate methods such as feature masking and projection space learning to accelerate convergence and improve generalization. |
Armin Nabaei | Université de Sherbrooke |
14 | Towards Optimizing SQL Generation via LLM Routing | Text-to-SQL enables users to interact with databases through natural language, simplifying access to structured data. Although highly capable large language models (LLMs) achieve strong accuracy for complex queries, they incur unnecessary latency and dollar cost for simpler ones. In this paper, we introduce the first LLM routing approach for Text-to-SQL, which dynamically selects the most cost-effective LLM capable of generating accurate SQL for each query. We present two routing strategies (score- and classification-based) that achieve accuracy comparable to the most capable LLM while reducing costs. We design the routers for ease of training and efficient inference. In our experiments, we highlight a practical and explainable accuracy-cost trade-off on the BIRD dataset. | Mohammadhossein Malekpour | Mila – Quebec AI Institute / Polytechnique Montréal |
15 | Towards Reliable and Trustworthy AI Systems in Software Engineering: A Literature Review on Miscommunication in Human-AI Interaction | As large language models and conversational agents become increasingly integrated into software development tools and workflows, ensuring reliable and trustworthy interactions between users and AI systems is critical. However, communication failures— where user intentions are misunderstood or AI responses are misinterpreted—pose persistent challenges to effective human-AI collaboration. In this work, we will present a systematic literature review on miscommunication in human-AI interaction within software engineering. Drawing from software engineering, human-computer interaction, and NLP literature, we identify common sources of misunderstanding, categorize types of misunderstandings, and outline common failure modes in interaction. Further, we discuss evaluation methods used to detect and quantify miscommunication, and summarize emerging strategies for mitigating these issues. We conclude by outlining a research agenda for improving the verification, validation, and operational robustness of AI systems in software development, with specific attention to addressing miscommunication as a core reliability concern. By drawing attention to this problem, we aim to inform the design of more robust, interpretable, and user-aligned AI systems for software engineering. | Huizi Hao | Queen’s University, Kingston, Ontario |
16 | Unveiling Kubernetes Misconfigurations: Empirical Analysis and Improved Detection Approaches for Cloud-Native Infrastructures | Cloud-native computing promotes scalability, flexibility, and efficiency, driving the widespread adoption of orchestration platforms such as Kubernetes. However, the complexity of managing containerized applications in these environments increases the likelihood of misconfigurations, which may affect system security, availability, and compliance. This study examines the role of Large Language Models (LLMs) in detecting Kubernetes misconfigurations. We propose a taxonomy covering common misconfiguration types and assess the applicability of LLMs for their identification. Through empirical evaluation, we analyze the performance of existing detection tools and highlight their limitations in terms of coverage. We further investigate the distribution of misconfigurations across Kubernetes object types to identify particularly error-prone components. Our findings show that LLMs can generalize across diverse misconfiguration patterns and provide meaningful predictions even for previously unseen cases. Compared to conventional static analysis tools, LLMs offer enhanced flexibility in reasoning over complex configuration contexts. This work contributes to the understanding of how the use of LLMs can be integrated to enhance security checks in Kubernetes-based validation and operation pipelines. It provides evidence that LLM-based detection can complement existing approaches and improve the robustness of cloud-native system configurations. |
Mostafa Anouar Ghorab | Université Laval |