Using Machine Learning to Guide the Application of Software Refactorings: A Preliminary Exploration
Abstract: Refactorings constitute the most direct and comprehensible approach for addressing software quality issues, stemming directly from identified code smells. Nevertheless, despite their popularity in both the research and industrial communities: (a) the effect of a refactoring is not guaranteed to be successful; and (b) the plethora of available refactoring opportunities does not allow their comprehensive application. Thus, there is a need for guidance on when to apply a refactoring opportunity and when the development team should postpone it. The notion of interest forms one of the major pillars of the Technical Debt metaphor, expressing the additional maintenance effort that will be required because of the accumulated debt. To assess the benefits of refactorings and guide when a refactoring should take place, we first present the results of an empirical study assessing and quantifying the impact of various refactorings on Technical Debt interest (building a real-world training set), and then use machine learning approaches to guide the application of future refactorings. To estimate interest, we rely on the FITTED framework, which assesses, for each object-oriented class, its distance from the best-quality peer; the refactorings applied throughout the history of a software project are extracted with the RefactoringMiner tool. The dataset of this study involves 4,166 refactorings applied across 26,058 revisions of 10 Apache projects. The results suggest that the majority of refactorings reduce Technical Debt interest; however, considering all refactoring applications, it cannot be claimed that the mean impact differs from zero, confirming the results of previous studies that highlight mixed effects from the application of refactorings. To alleviate this problem, we have built an adequately accurate (~70%) model for predicting whether or not a refactoring should take place in order to reduce Technical Debt interest.
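The two findings in the abstract above (most refactorings reduce interest, yet the mean impact cannot be distinguished from zero) can coexist when a few refactorings increase interest sharply. The sketch below illustrates this with a one-sample t statistic against zero. The abstract does not say which statistical test the authors used, so the choice of test, the function name `summarize_impact`, and the toy values are all assumptions made for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_impact(deltas: Sequence[float]) -> Tuple[float, float]:
    """Return (share of refactorings that reduced interest, t statistic vs. zero).

    Each delta is the change in interest after one refactoring;
    a negative value means interest went down.
    """
    if len(deltas) < 2:
        raise ValueError("need at least two observations")
    share_reduced = sum(1 for d in deltas if d < 0) / len(deltas)
    mean = statistics.mean(deltas)
    std_err = statistics.stdev(deltas) / math.sqrt(len(deltas))
    if std_err == 0:
        raise ValueError("all observations are identical; t statistic undefined")
    return share_reduced, mean / std_err


# Hypothetical interest changes, NOT data from the study:
# seven small reductions and one large increase.
toy_deltas = [-2, -1, -1, -3, -2, -1, -2, 13]
share, t = summarize_impact(toy_deltas)
print(f"share reduced: {share:.3f}, t statistic: {t:.3f}")
```

In this toy sample, seven of eight refactorings reduce interest, but the single large increase pulls the mean close to zero and the t statistic stays small.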
Are Machine Programming Systems Using Right Source-code Measures to Select Code Repositories?
Abstract: Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing, and it aims to assist software and hardware engineers, among other applications. Along with powerful compute resources, MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming and to solve problems in areas such as debugging, code recommendation, and auto-completion. Unfortunately, several existing MP systems either do not consider the quality of code repositories or select them using quality measures that differ from those typically used in the software engineering community. As such, the impact of the quality of code repositories on the performance of these systems needs to be studied. In this preliminary paper, we evaluate the impact of sets of repositories of different quality on the performance of a candidate MP system, ControlFlag. Towards that objective, we develop a framework, named GitRank, to rank open-source repositories on quality, maintainability, and popularity by leveraging existing research on this topic. We then apply GitRank to evaluate the correlation between the quality measures used by the candidate MP system and the quality measures used by our framework. Our preliminary results reveal some correlation between the quality measures used in GitRank and ControlFlag's performance, suggesting that some of the measures used in GitRank are applicable to ControlFlag. However, the results also raise questions about the right quality measures for code repositories used in MP systems. We believe that our findings also provide interesting insights into the code quality measures that affect the performance of MP systems.
DeepCrash: Deep Metric Learning for Crash Bucketing Based on Stack Trace
Abstract: Some software projects collect vast numbers of crash reports from testing and end users, then organize them into groups to fix bugs efficiently. This task is called crash report bucketing. In particular, a high-precision and fast crash similarity measure is critical for large-scale crash bucketing. In this paper, we propose a deep learning-based crash bucketing method that maps stack traces to feature vectors and groups these feature vectors into buckets. First, we develop a frame tokenization method for stack traces, called frame2vec, to extract frame representations based on frame segmentation. Second, we propose a deep metric model to map the sequential stack trace representations into feature vectors whose similarity represents the similarity of crashes. Third, a clustering algorithm is used to rapidly group similar feature vectors into the same buckets to obtain the final result. Additionally, we evaluate our approach against seven competing methods on both private and public data sets. The results reveal that our method can speed up clustering while maintaining highly competitive precision.
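The third step above (grouping feature vectors into buckets by similarity) can be sketched as follows. This is a minimal illustration, not the DeepCrash implementation: it assumes the feature vectors have already been produced by some embedding model, and it uses a simple greedy pass with a cosine-similarity threshold because the abstract does not name the clustering algorithm. The names `cosine_similarity` and `bucket_crashes` and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bucket_crashes(vectors: Sequence[Sequence[float]],
                   threshold: float = 0.9) -> List[List[int]]:
    """Greedy bucketing: each vector joins the first bucket whose
    representative (its first member) is similar enough, else opens a new one.
    Returns buckets as lists of indices into `vectors`."""
    representatives: List[Sequence[float]] = []
    buckets: List[List[int]] = []
    for index, vector in enumerate(vectors):
        for bucket_id, representative in enumerate(representatives):
            if cosine_similarity(vector, representative) >= threshold:
                buckets[bucket_id].append(index)
                break
        else:
            # No existing bucket matched: this vector starts a new bucket.
            representatives.append(vector)
            buckets.append([index])
    return buckets


# Hypothetical 2-D feature vectors standing in for embedded stack traces.
toy_vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0]]
print(bucket_crashes(toy_vectors))
```

A single pass like this compares each crash only with bucket representatives, which is one reason embedding-based bucketing can be faster than pairwise comparison of raw stack traces.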
On the Application of Machine Learning Models to Assess and Predict Software Reusability
Abstract: Software reuse has been proven to be an effective strategy for developers to significantly increase software quality, reduce costs, and increase the effectiveness of software development. Research in software reuse typically aims to address two main hurdles: (1) reducing the time and effort required to identify reusable candidates, and (2) avoiding the selection of low-quality software components, which may lead to higher development costs (e.g., fixing bugs and errors, refactoring). Human judgment inherently falls short in terms of reliability and effectiveness. Hence, in this paper, we investigate the applicability of Machine Learning (ML) algorithms in assessing software reusability. We collected more than 32k open-source projects and used GitHub forks as the ground truth for their reusability. We developed ML classification pipelines based on both internal and external software metrics to perform software reusability prediction. Our best-performing ML classification model achieved an accuracy of 86%, outperforming existing research in both prediction performance and data coverage. Subsequently, we leverage our results to identify key software characteristics that make software highly reusable. Our results show that size-related metrics (i.e., number of setters, methods, attributes) contribute the most to the reusability of the software.
Neural Language Models for Code Quality Identification
Abstract: Neural language models for code have led to interesting applications such as code completion and bug fix generation. Another type of code-related application is the identification of code quality issues such as repetitive code and unnatural code. Neural language models contain implicit knowledge about such aspects. We propose a framework to detect code quality issues using neural language models. To handle repository-specific conventions, we use local, repository-specific models. The models are successful in detecting real-world code quality issues with a low false positive rate.
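The idea of flagging "unnatural" code with a repository-specific language model can be illustrated with a much simpler stand-in. The sketch below trains a token bigram model with add-one smoothing on a repository's own lines and scores a new line by its average cross-entropy; higher scores mean the line is less expected under the repository's conventions. It is not a neural model and not the framework proposed in the paper, and the names `train_bigram_model` and `cross_entropy` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Model = Tuple[Counter, Counter, int]
START = "<s>"  # marker placed before the first token of every line


def train_bigram_model(lines: Sequence[Sequence[str]]) -> Model:
    """Count bigrams over tokenized lines from one repository."""
    context_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    vocabulary = set()
    for tokens in lines:
        sequence = [START] + list(tokens)
        vocabulary.update(sequence)
        for previous, current in zip(sequence, sequence[1:]):
            context_counts[previous] += 1
            bigram_counts[(previous, current)] += 1
    # One extra vocabulary slot reserves probability mass for unseen tokens.
    return context_counts, bigram_counts, len(vocabulary) + 1


def cross_entropy(tokens: Sequence[str], model: Model) -> float:
    """Average bits per token under the add-one-smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    sequence = [START] + list(tokens)
    total_bits = 0.0
    for previous, current in zip(sequence, sequence[1:]):
        probability = (bigram_counts[(previous, current)] + 1) / (
            context_counts[previous] + vocab_size
        )
        total_bits -= math.log2(probability)
    return total_bits / max(len(tokens), 1)


# Hypothetical tokenized lines standing in for one repository's code.
toy_corpus: List[List[str]] = (
    [["for", "i", "in", "range", "(", "n", ")", ":"]] * 5
    + [["return", "x"]] * 3
)
model = train_bigram_model(toy_corpus)
typical = cross_entropy(["for", "i", "in", "range", "(", "n", ")", ":"], model)
unusual = cross_entropy([":", ")", "n", "for"], model)
print(f"typical line: {typical:.2f} bits/token, unusual line: {unusual:.2f} bits/token")
```

Training the model on a single repository, rather than on a global corpus, is what lets the score reflect that repository's own conventions, which mirrors the motivation for local models given in the abstract.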