Lab for Intelligent Automation in Software EngineeRing


 Mission Statement
Software bugs and failures are inevitable, and they are costly. In 2017 alone, 606 software bugs cost the global economy about $1.7 trillion, with 3.7 billion people affected and 314 companies impacted. While traditional bugs are already ubiquitous, many novel classes of bugs (e.g., deep learning bugs) are emerging due to the widespread adoption of machine/deep learning technologies, cloud-based infrastructures and Big Data analytics within traditional software systems. Thus, software developers need to deal with thousands of known and unknown bugs, which consumes about 50% of their programming time. Furthermore, to remain competitive, they also need to respond to market demands by frequently adding new features or enhancing the existing features of their software products. Feature enhancement also consumes about 60% of the total maintenance cost. Thus, fixing bugs and enhancing features are two of the major challenges that developers face during software maintenance.

We (1) perform large-scale empirical studies to better understand the reported bugs and requested features, and (2) develop intelligent, robust and cost-effective solutions for dealing with them. We use a blend of Information Retrieval (IR), Machine/Deep Learning (ML/DL), Crowdsourced Knowledge Mining, Genetic Algorithms (GA), Natural Language Processing (NLP) and Big Data Analytics to solve real-life Software Engineering problems. To date, our work has been published in several top venues of Software Engineering: ICSE(A*), ESEC/FSE(A*), EMSE(A), ASE(A), ICSME(A) and MSR(A). If you find our work interesting and would like to join or collaborate with us, please contact us.

Research Topics

  • Bug/Fault Localization
  • Bug/Fault Reproduction
  • Concept Location
  • Internet-scale Code Search
  • Automated Query Reformulation
  • Code Review Automation
  • Crowdsourced Knowledge Mining
  • Recommendation Systems in Software Engineering
  • IDE-based Meta Search Engine
  • Exception Handling
  • Software Re-documentation
  • Code Comprehension

Research Methodologies: Information retrieval, machine learning, deep learning, genetic algorithms, static code analysis, natural language processing, data mining, Big Data analytics and empirical studies.

Message to Prospective Students!

RAISE Lab is looking for multiple highly motivated, hard-working PhD students with evidence of research excellence and communication skills (e.g., writing). If you are enthusiastic about solving real-life Software Engineering problems and your credentials meet these requirements, please send your (1) CV (including major achievements), (2) BSc/MSc transcripts, (3) IELTS/TOEFL scores (more details), (4) sample publications, and (5) a one-page summary of your past research excellence to masud[DOT]rahman[AT]dal[DOT]ca. The one-page summary should also explain how your past experience aligns with our research goals and interests. Please note that, due to the high volume of applicants, only potentially eligible candidates might be contacted for an interview. More details about the admission into Dalhousie University.

Co-supervised/Mentored Students at Polytechnique Montreal
  • Biruk Asmare Muse (PhD)
  • Mehdi Moravati (PhD)
  • Mahmood Vahedi (MSc)
  • Hadhemi Jebnoun (MSc)

Co-supervised/Mentored Students at University of Saskatchewan
  • Rodrigo Fernandes (PhD), CROKAGE
  • Saikat Mondal (MSc), Geddes Award Winner 2019

Co-supervised Students at Khulna University
  • Debasish Chakroborti (BSc)
  • Sristy Sumana Nath (BSc)
  • Ajoy Paul (BSc)
  • Sudip Sarkar (BSc)
  • Aminul Islam (BSc)

Completed Projects

BugDoctor: Intelligent Search Engine for Software Bugs and Features
  Overview: Software developers often attempt to find their code of interest (e.g., bugs, features, reusable code) using code search tools (e.g., Lucene). Unfortunately, even experienced developers often fail to construct the right queries. Even if the developers come up with a few ad hoc queries, most of them require frequent modifications, which cost significant development time and effort. Thus, constructing an appropriate query for localizing software bugs, programming concepts or even reusable code is a major challenge. BugDoctor assists developers in localizing the software code of interest (e.g., bugs, concepts and reusable code) during software maintenance. In particular, it reformulates a given search query (1) by designing a novel keyword selection algorithm (e.g., CodeRank) that outperforms the traditional alternatives (e.g., TF-IDF), (2) by leveraging the bug report quality paradigm and source document structures, which were previously overlooked, and (3) by exploiting the crowd knowledge and word semantics derived from the Stack Overflow Q&A site, which were previously untapped. An experiment using 5000+ search queries (bug reports, change requests, and ad hoc queries) suggests that BugDoctor can improve the given queries significantly through automated query reformulations. Comparison with 10+ existing studies on bug localization, concept location and Internet-scale code search suggests that BugDoctor can outperform the state-of-the-art approaches by a significant margin. Explore more ...

query-reformulation bug-localization concept-location code-search

Why Are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion [ICSME 2020]
Overview: Software developers attempt to reproduce software bugs to understand their erroneous behaviours and to fix them. Unfortunately, they often fail to reproduce (or fix) them, which leads to faulty, unreliable software systems. However, to date, little research has been done to better understand what makes software bugs non-reproducible. In this paper, we conduct a multimodal study to better understand the non-reproducibility of software bugs. First, we perform an empirical study using 576 non-reproducible bug reports from two popular software systems (Firefox, Eclipse) and identify 11 key factors that might make a reported bug non-reproducible. Second, we conduct a user study involving 13 professional developers where we investigate how the developers cope with non-reproducible bugs. We found that they either close these bugs or solicit further information, which involves long deliberations and counter-productive manual searches. Third, we offer several actionable insights on how to avoid non-reproducibility (e.g., false-positive bug report detector) and improve reproducibility of the reported bugs (e.g., sandbox for bug reproduction) by combining our analyses from multiple studies (e.g., empirical study, developer study). Explore more ...

empirical-study   data-fusion   bug-reproduction

BLIZZARD: Improving IR-Based Bug Localization with Context-Aware Query Reformulation [ESEC/FSE 2018]
Overview: BLIZZARD is a novel technique for IR-based bug localization that uses query reformulation and bug report quality dynamics. We first conduct an empirical study to analyse the report quality dynamics of bug reports and then design an IR-based bug localization technique using graph-based keyword selection, query reformulation, noise filtration, and Information Retrieval. Explore more ...

query-reformulation   bug-localization

STRICT: Search Term Identification for Concept Location using Graph-Based Term Weighting [SANER 2017]
Overview: STRICT is a novel technique for identifying appropriate search terms from a software change request. It uses graph-based term weighting (PageRank), natural language processing and Information Retrieval to identify the important keywords and then finds the code of interest (e.g., software feature). Explore more ...

query-reformulation  concept-location
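
The graph-based term weighting behind STRICT can be illustrated with a minimal sketch. It is not the published implementation: the stop-word list, the co-occurrence window, the damping factor and every function name below are assumptions made for illustration. The sketch builds a term co-occurrence graph from a change request, scores the terms with PageRank, and returns the top-scoring terms as candidate search terms.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list, not the one used by STRICT.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "on", "is", "and",
              "when", "should", "it", "for", "with", "that", "be"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS and len(t) > 2]


def build_cooccurrence_graph(tokens, window=1):
    """Link each term to the `window` terms that follow it (undirected graph)."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected term graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) / n + damping * incoming
        scores = new_scores
    return scores


def suggest_search_terms(change_request, top_k=5):
    """Return the top_k highest-scoring terms, ties broken alphabetically."""
    scores = pagerank(build_cooccurrence_graph(tokenize(change_request)))
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    request = ("Add export button to toolbar. "
               "Export dialog should save export file as PDF.")
    print(suggest_search_terms(request, top_k=3))
```

Terms that co-occur with many other terms accumulate a higher score, which is the intuition for treating them as good search keywords.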

ACER: Improved Query Reformulation for Concept Location using CodeRank and Document Structures [ASE 2017]
Overview: ACER offers effective reformulations to queries for concept location using CodeRank and source document structures. It uses graph-based keyword selection from source code, query difficulty analysis, machine learning and Information Retrieval for reformulating the queries. Explore more ...

query-reformulation concept-location

RACK: Automatic Query Reformulation for Code Search using Crowdsourced Knowledge [SANER 2016 + EMSE 2019 + ICSE 2017]
  Overview: Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between query and source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, according to existing studies, preparing an effective search query is not only challenging but also time-consuming for developers. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher respectively than those of the state of the art. Reformulations using our suggested API classes improve 64% of the natural language queries and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search. Explore more ...

query-reformulation code-search
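
The idea of keyword-API associations used by RACK can be sketched as follows. This is a simplified illustration, not the published technique: the toy posts are hypothetical stand-ins for Stack Overflow Q&A threads, plain co-occurrence counting stands in for RACK's actual ranking heuristics, and all function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (question title, API classes seen in the answer).
# Real associations would be mined from Stack Overflow posts.
TOY_POSTS = [
    ("how to read a text file line by line", ["BufferedReader", "FileReader"]),
    ("parse json string into object", ["JSONObject"]),
    ("read file into string", ["Files", "BufferedReader"]),
    ("send http get request", ["HttpURLConnection", "URL"]),
]


def build_associations(posts):
    """Count how often each question keyword co-occurs with each API class."""
    assoc = defaultdict(Counter)
    for question, api_classes in posts:
        for keyword in set(question.lower().split()):
            for api in set(api_classes):
                assoc[keyword][api] += 1
    return assoc


def suggest_api_classes(query, assoc, top_k=3):
    """Rank API classes by their summed co-occurrence with the query keywords."""
    scores = Counter()
    for keyword in set(query.lower().split()):
        scores.update(assoc.get(keyword, {}))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_k]]


def reformulate(query, assoc, top_k=3):
    """Append the suggested API classes to the natural language query."""
    suggestions = suggest_api_classes(query, assoc, top_k)
    return " ".join([query] + suggestions)


if __name__ == "__main__":
    associations = build_associations(TOY_POSTS)
    print(reformulate("read file", associations, top_k=2))
```

The reformulated query carries API class names alongside the original keywords, which is what a keyword-matching code search engine needs in order to find relevant code.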

NLP2API: Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics [ICSME 2018]
Overview: NLP2API reformulates a natural language query intended for Internet-scale code search using crowdsourced knowledge and extra-large data analytics derived from Stack Overflow Q & A site. Explore more ...

query-reformulation  code-search

BLADER: Improving Queries for IR-Based Bug Localization with Semantics-Driven Query Reformulation [Ongoing Work]
Overview: BLADER complements a bug report that does not contain any structured entities (e.g., program entity names) and uses the complemented report as a query for IR-based bug localization. It employs word embeddings derived from Stack Overflow with FastText for the query reformulation. Explore more ...

query-reformulation   bug-localization

CORRECT: Code Reviewer Recommendation in GitHub Based on Cross-Project and Technology Experience [ICSE 2016 + ASE 2016]
  Overview: Peer code review locates common coding rule violations and simple logical errors in the early phases of software development, and thus reduces overall cost. However, in GitHub, identifying an appropriate developer for code review during pull request submission is a non-trivial task. In this project, we propose a heuristic ranking technique that considers not only the cross-project work history of a developer but also her experience in certain technologies associated with a pull request for determining her expertise as a potential code reviewer. We first motivate our technique using an exploratory study with 20 commercial projects. We then evaluate the technique using 13,081 pull requests from ten projects, and report 92.15% accuracy, 85.93% precision and 81.39% recall in code reviewer recommendation, which outperforms the state-of-the-art technique. This project was funded by an NSERC Industry Engage grant and was done in collaboration with Vendasta Technologies. Explore more ...

code-review-automation data-mining
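
A heuristic ranking in the spirit of CORRECT can be sketched as below. This is only an illustration under stated assumptions: Jaccard similarity between technology sets is a stand-in scoring function, the reviewer history is hypothetical, and the function names are not taken from the CORRECT implementation. Each candidate is scored by how closely the technologies of pull requests she reviewed in the past (in any project) match those of the new pull request.

```python
from collections import Counter


def score_reviewers(pr_technologies, review_history):
    """Score each reviewer by summed Jaccard similarity with her past reviews.

    review_history maps a reviewer name to a list of technology sets, one set
    per pull request she reviewed in the past (possibly in other projects).
    """
    pr_tech = set(pr_technologies)
    scores = Counter()
    for reviewer, past_prs in review_history.items():
        for past_tech in past_prs:
            past_tech = set(past_tech)
            union = pr_tech | past_tech
            if union:
                scores[reviewer] += len(pr_tech & past_tech) / len(union)
    return scores


def recommend_reviewers(pr_technologies, review_history, top_k=3):
    """Return up to top_k reviewers with a positive score, best first."""
    scores = score_reviewers(pr_technologies, review_history)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [reviewer for reviewer, score in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    # Hypothetical review history, for illustration only.
    history = {
        "alice": [{"flask", "sqlalchemy"}, {"flask", "jinja2"}],
        "bob": [{"react", "redux"}],
        "carol": [{"flask"}],
    }
    print(recommend_reviewers({"flask", "jinja2"}, history))
```

Because the history is not restricted to a single repository, a developer who used the same libraries elsewhere can still be recommended for a project she has never reviewed before.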

RevHelper: Predicting Usefulness of Code Review Comments using Textual Features and Developer Experience [MSR 2017]
Overview: RevHelper predicts the usefulness of code review comments based on their texts and developers' experience. We first conduct an empirical study where we contrast hundreds of useful and non-useful review comments from multiple commercial projects. Then we collect features from comment texts and reviewers' experience, and apply the Random Forest algorithm to them for the usefulness prediction. Explore more ...

code-review-automation data-mining

CodeInsight: Recommending Insightful Comments for Source Code using Crowdsourced Knowledge [SCAM 2015]
Overview: CodeInsight is an automated technique for generating insightful comments for source code using crowdsourced knowledge from Stack Overflow. It uses data mining, topic modelling, sentiment analysis and heuristics for deriving the code-level insights. Explore more ...


ExcClipse: Context-Aware Meta Search Engine for Programming Errors and Exceptions
  Overview: Studies show that software developers spend about 19% of their development time on web surfing. While collecting necessary information with traditional web/code search, they face several practical challenges. Traditional search engines (1) do not capture the problem contexts (e.g., stack traces, code under editing) from the IDE, (2) force the developers to switch between the IDE and the web browser, and also (3) overwhelm the developers with thousands of search results. In this MSc thesis, we develop a context-aware, IDE-based meta search engine, ExcClipse, that delivers relevant web pages and code examples within the IDE panel for dealing with programming errors and exceptions. Once a programming error/exception is encountered, the tool (1) constructs an appropriate query by capturing the error details and metadata, (2) collects results from popular search engines (Google, Bing, Yahoo, Stack Overflow and GitHub), (3) refines and ranks the results against the context of the encountered exception, and (4) then recommends them within the IDE. We develop our solution as an Eclipse plug-in prototype. Explore more ...

recommendation-system search-engine

SurfClipse: Context-Aware IDE-Based Meta Search Engine for Programming Errors & Exceptions [CSMR-WCRE 2014 + ICSME 2014 + WCRE 2013]
  Overview: Despite the various debugging supports that existing IDEs offer for programming errors and exceptions, software developers often look to the web for working solutions or up-to-date information. Traditional web search does not consider the context of the problems for which they seek solutions, and thus it often does not help much in problem solving. In this paper, we propose a context-aware meta search tool, SurfClipse, that analyzes an encountered exception and its context in the IDE, and recommends not only suitable search queries but also relevant web pages for the exception (and its context). The tool collects results from three popular search engines and a programming Q & A site against the exception in the IDE, refines the results for relevance against the context of the exception, and then ranks them before recommendation. It provides two working modes, interactive and proactive, to meet the versatile needs of the developers, and one can browse the result pages using a customized embedded browser provided by the tool. Explore more ...

recommendation-system search-engine

Publication Stats
  • ICSE (A*) x 4
  • FSE (A*) x 1
  • EMSE (A) x 2
  • ASE (A) x 3
  • ICSME (A) x 4
  • MSR (A) x 7
  • ICPC (B) x 1
  • SANER (B) x 5
  • SCAM (C) x 2

Artifacts + Tools
  • Accepted Project Artifacts x 2
  • Accepted Tool Demos x 3


Copyright © Mohammad Masudur Rahman. Last updated on July 21, 2020