Research

Code Clone and Refactoring

Our research aims to bridge the gap between refactoring techniques and their application in practice. To do so, we investigate how to detect and extract code clones without disrupting the developer’s workflow, and we have developed AntiCopyPaster, an IntelliJ IDEA plugin that recommends just-in-time Extract Method refactoring.
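To illustrate the transformation behind such a recommendation, here is a minimal sketch of Extract Method applied to a duplicated fragment. The example is written in Python for brevity (AntiCopyPaster itself operates on code edited in IntelliJ IDEA), and all function names are hypothetical:

```python
# Before: a copied-and-pasted fragment appears in two places.
def report_daily(sales):
    total = sum(sales)
    avg = total / len(sales) if sales else 0.0
    print(f"total={total:.2f}, avg={avg:.2f}")

def report_weekly(sales):
    total = sum(sales)
    avg = total / len(sales) if sales else 0.0
    print(f"total={total:.2f}, avg={avg:.2f}")

# After: Extract Method replaces both clones with a call to one method.
def summarize(sales):
    total = sum(sales)
    avg = total / len(sales) if sales else 0.0
    print(f"total={total:.2f}, avg={avg:.2f}")

def report_daily_refactored(sales):
    summarize(sales)

def report_weekly_refactored(sales):
    summarize(sales)

report_daily_refactored([10.0, 12.5, 9.75])
```

The point of the just-in-time approach is to surface this suggestion at the moment the duplicate is pasted, rather than leaving the clone to accumulate in the code base.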

Large Language Models and Code Quality

Our research explores how Large Language Models (LLMs) such as ChatGPT and DeepSeek are reshaping how code quality is assessed and improved. This project investigates how AI can enhance software development by detecting code issues, refactoring code, and improving maintainability. We are interested in cutting-edge AI applications and pursue impactful research that bridges machine learning and software engineering.
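As a minimal sketch of this idea (not our actual research pipeline), the snippet below prompts an LLM to flag quality issues in a code fragment using the OpenAI Python SDK; the model name and prompt are illustrative assumptions:

```python
# Sketch: ask a chat model to act as a code reviewer for a small snippet.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

snippet = """
def get(d, k):
    if k in d.keys():
        return d[k]
    else:
        return None
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # model choice is an assumption; any chat model works
    messages=[
        {"role": "system",
         "content": "You are a code reviewer. List code smells and suggest a refactoring."},
        {"role": "user", "content": snippet},
    ],
)
print(response.choices[0].message.content)
```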

Refactoring Practices in Modern Code Review

Modern code review is a practice widely adopted in both industrial and open-source projects to improve software quality, share knowledge, and ensure adherence to coding standards and guidelines. During code review, developers can discuss refactoring activities before merging code changes into the code base. We are committed to exploring how refactoring is reviewed and what developers care about when reviewing refactored code.

Code Quality in Computing Education

Our research aims to advance the field of computing education through innovative research, collaboration, and dissemination of knowledge. The goal is to build a platform that helps educators find a wealth of resources, including the latest research findings, educational tools, and information about ongoing projects. We are dedicated to exploring effective teaching methodologies, understanding student learning processes, and developing technologies that enhance the educational experience.

Refactoring Documentation

We proposed Self-Affirmed Refactoring (SAR) to better understand developer perception of refactoring. SAR refers to developers’ own documentation of their refactoring activities, such as in commit messages. SAR is key to understanding various aspects of refactoring, including the motivation, procedure, and consequences of the performed code changes, all documented by the code authors themselves. Despite growing efforts to automate refactoring by optimizing structural metrics and removing code smells, there is very little evidence on whether developers actually follow these intentions when refactoring their code.
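As a concrete illustration, SAR instances can be surfaced by matching documentation text against refactoring-related keywords. The sketch below assumes commit messages as the documentation source and uses an illustrative keyword list, not the project's actual pattern set:

```python
import re

# Illustrative keyword patterns suggesting a commit message
# self-affirms a refactoring activity (hypothetical list).
SAR_PATTERNS = [
    r"\brefactor(?:ing|ed)?\b",
    r"\bextract(?:ed)? method\b",
    r"\brenam(?:e|ed|ing)\b",
    r"\bclean(?:ed)?[- ]?up\b",
    r"\bsimplif(?:y|ied)\b",
]

def is_self_affirmed(commit_message: str) -> bool:
    """Return True if the message matches any refactoring keyword."""
    msg = commit_message.lower()
    return any(re.search(p, msg) for p in SAR_PATTERNS)

commits = [
    "Refactor: extract duplicated validation into a helper method",
    "Fix NPE when config file is missing",
    "Renamed UserMgr to UserManager for clarity",
]
for msg in commits:
    print(is_self_affirmed(msg), "-", msg)
```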

Data Leakage in Machine Learning Pipelines

Data leakage is a critical issue in machine learning pipelines that occurs when information that should not be available at training time, such as the target variable or data from the test set, unintentionally leaks into the features used to train the model. Preventing data leakage requires a careful understanding of the data, rigorous feature engineering, and robust evaluation techniques to ensure the model’s integrity and generalizability. The goal of this project is to aid data scientists in detecting leaked data in their code. This is done through Leakage Detector, a plugin that parses Python code to search for patterns of data leakage.
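The snippet below is a minimal sketch of one common pattern of this kind (preprocessing leakage), not the plugin's actual detection logic: fitting a transformer on the full dataset before splitting lets test-set statistics leak into training.

```python
# Sketch of a leaky pipeline and its fix, using scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

# LEAKY: the scaler sees the test rows when computing mean/std.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

# CORRECT: split first, then fit the scaler on training data only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

A tool that parses Python code can flag the leaky variant by noticing that a transformer's fit call references data that later flows into the evaluation split.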