At the Università della Svizzera Italiana:
At the Université de Caen:
I was also an invited lecturer at the PL 2009 summer school.
During the evolution of a software system, a large amount of information, which is not always directly related to the source code, is produced. Several researchers have provided evidence that the contents of mailing lists represent a valuable source of information: Through e-mails, developers discuss design decisions, ideas, known problems and bugs, etc., which are otherwise not to be found in the system.
A technical challenge in this context is how to establish the missing link between free-form e-mails and the system artifacts they refer to. Although the range of approaches is vast, establishing their accuracy remains a problem, as there is no benchmark against which to compare their performance.
To overcome this issue, we manually inspected a statistically significant number of e-mails pertaining to the ArgoUML system. Based on this benchmark, we present a variety of lightweight techniques to assign e-mails to software artifacts and measure their effectiveness in terms of precision and recall.
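As an illustration only, the following sketch shows what one lightweight linking technique and its evaluation could look like. It matches class names as whole words in an e-mail body and scores the result against a hand-made benchmark. The function names, the sample e-mail, and the class names are hypothetical, and this is not necessarily one of the techniques evaluated in the paper.

```python
import re

def link_email(body, class_names):
    """Return the class names that occur as whole words in the e-mail body."""
    return {name for name in class_names
            if re.search(r"\b" + re.escape(name) + r"\b", body)}

def precision_recall(predicted, expected):
    """Compare predicted links with the manually established benchmark links."""
    true_positives = len(predicted & expected)
    # Convention assumed here: an empty set counts as perfect precision/recall.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall

# Hypothetical data: class names of a system and one e-mail about it.
classes = {"Project", "FigNode", "Critic"}
email = "The Critic for FigNode fails when the Project is reloaded twice."
benchmark = {"Critic", "FigNode"}  # links a human reviewer would assign

predicted = link_email(email, classes)
precision, recall = precision_recall(predicted, benchmark)
```

In this made-up example the word "Project" is matched although the reviewer did not consider it a reference to the class, which lowers precision while recall stays perfect. Class names that double as common words are a plausible weakness of such simple matching.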
@inproceedings{BDLR-WCRE2009,
author = {Alberto Bacchelli and Marco D'Ambros and Michele Lanza and Romain Robbes},
title={Benchmarking Lightweight Techniques to Link E-Mails and Source Code},
booktitle = {WCRE 2009: Proceedings of the 16th IEEE Working Conference on Reverse Engineering},
year = {2009},
pages = {205--214},
}
E-mails concerning the development issues of a system constitute an important source of information about high-level design decisions, low-level implementation concerns, and the social structure of developers.
Establishing links between e-mails and the software artifacts they discuss is a non-trivial problem, due to the inherently informal nature of human communication. Different approaches can be brought into play to tackle this traceability issue, but the question of how they can be evaluated remains unaddressed, as there is no recognized benchmark against which they can be compared.
In this article we present such a benchmark, which we created through the manual inspection of a statistically significant number of e-mails pertaining to six unrelated software systems. We then use our benchmark to measure the effectiveness of a number of approaches, ranging from lightweight approaches based on regular expressions to full-fledged information retrieval approaches.
@inproceedings{BLR-ICSE2010,
author = {Alberto Bacchelli and Michele Lanza and Romain Robbes},
title = {Linking E-Mails and Source Code Artifacts},
booktitle = {ICSE 2010: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering},
year = {2010},
pages = {to appear}
}
While traditional approaches to code profiling help locate performance bottlenecks, they offer only limited support for removing these bottlenecks. The main reason is the lack of visual and detailed runtime information to identify and eliminate computation redundancy.
We provide two profiling blueprints which help identify and remove performance bottlenecks. The structural distribution blueprint graphically represents the CPU consumption share for each method and class of an application. The behavioral distribution blueprint depicts the distribution of CPU consumption along method invocations, and hints at method candidates for caching optimizations. These two blueprints helped us to significantly optimize Mondrian, an open source visualization engine. Our implementation is freely available for the Pharo development environment and has been evaluated in a number of different scenarios.
@inproceedings{BRB-TOOLS2010,
author = {Alexandre Bergel and Romain Robbes and Walter Binder},
title = {Visualizing Dynamic Metrics with Profiling Blueprints},
booktitle = {TOOLS 2010: Proceedings of the 48th International Conference on Objects, Models, Components, Patterns},
year = {2010},
pages = {to appear}
}
Software systems are hard to understand due to the complexity and the sheer size of the data to be analyzed. Software visualization tools are a great help as they can sum up large quantities of data in dense, meaningful pictures. Traditionally such tools come in the form of desktop applications. Modern web frameworks are about to change this status quo, as building software visualization tools as web applications can help in making them available to a larger audience in a collaborative setting. Such a migration comes with a number of promises, perils and technical implications that have to be taken into account before starting any migration process.
In this paper we share our experiences in porting two such tools to the web and discuss the promises and perils that go hand in hand with such an endeavour.
@inproceedings{DLLR-WSE2009,
author = {Marco D'Ambros and Michele Lanza and Mircea Lungu and Romain Robbes},
title={Promises and Perils of Porting Software Visualization Tools to the Web},
booktitle = {WSE 2009: Proceedings of the 11th IEEE International Symposium on Web Systems Evolution},
year = {2009},
pages = {109--118},
}
Reliably predicting software defects is one of software engineering’s holy grails. Researchers have devised and implemented a plethora of bug prediction approaches varying in terms of accuracy, complexity and the input data they require. However, the absence of an established benchmark makes it hard, if not impossible, to compare approaches.
We present a benchmark for defect prediction, in the form of a publicly available data set consisting of several software systems, and provide an extensive comparison of the explanative and predictive power of well-known bug prediction approaches, together with novel approaches we devised.
Based on the results, we discuss the performance and stability of the approaches with respect to our benchmark and deduce a number of insights on bug prediction models.
@inproceedings{DLR-MSR2010,
author = {Marco D'Ambros and Michele Lanza and Romain Robbes},
title = {An Extensive Comparison of Bug Prediction Approaches},
booktitle = {MSR 2010: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories},
year = {2010},
pages = {to appear}
}
Change coupling is the implicit relationship between two or more software artifacts that have been observed to frequently change together during the evolution of a software system. Researchers have studied this dependency and have observed that it points to design issues such as architectural decay. It is still unknown whether change coupling correlates with a tangible effect of design issues, i.e., software defects.
In this paper we analyze the relationship between change coupling and software defects on three large software systems. We investigate whether change coupling correlates with defects, and if the performance of bug prediction models based on software metrics can be improved with change coupling information.
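To make the notion of change coupling concrete, here is a minimal sketch that counts how often two files were changed in the same commit. The commit data and the function name are hypothetical, and a raw co-change count is only one possible measure. It is not claimed to be the measure used in the paper.

```python
from collections import Counter
from itertools import combinations

def change_coupling(commits):
    """Count, for every pair of files, the commits in which both changed."""
    counts = Counter()
    for files in commits:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts

# Hypothetical history: each commit is the list of files it touched.
commits = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "Ast.java"],
    ["Ast.java"],
    ["Lexer.java", "Parser.java"],
]

coupling = change_coupling(commits)
strongest_pair, shared_commits = coupling.most_common(1)[0]
```

Pairs with a high count, such as the lexer and parser in this made-up history, would be the candidates to examine for a correlation with defects.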
@inproceedings{DLR-WCRE2009,
author = {Marco D'Ambros and Michele Lanza and Romain Robbes},
title={On the Relationship Between Change Coupling and Software Defects},
booktitle = {WCRE 2009: Proceedings of the 16th IEEE Working Conference on Reverse Engineering},
year = {2009},
pages = {135--144},
}
Commit comments written by developers when they submit their changes to a versioning system are useful for a number of tasks: developers write commit comments to document changes and to communicate with the rest of the development team, while researchers mine commit-related data contained in software repositories to support software evolution and reverse engineering activities. However, the support provided by IDEs is limited in this respect, as they restrict users to plain text when documenting their changes.
We present Commit 2.0, an IDE enhancement to enrich commit comments using software visualization. Commit 2.0 generates visualizations of the performed changes at different granularity levels, and lets the user annotate them.
@inproceedings{DLR-WEB2SE2010,
author = {Marco D'Ambros and Michele Lanza and Romain Robbes},
title={Commit 2.0},
booktitle = {Web2SE 2010: Proceedings of the 1st Workshop on Web 2.0 for Software Engineering},
year = {2010},
pages = {to appear},
}
Software evolution research has focused mostly on analyzing the evolution of single software systems. However, it is rarely the case that a project exists as standalone, independent of others. Rather, projects exist in parallel within larger contexts in companies, research groups or even the open-source communities. We call these contexts software ecosystems, and in this paper we present The Small Project Observatory, a prototype tool which aims to support the analysis of project ecosystems through interactive visualization and exploration. We present a case study of exploring an ecosystem using our tool, describe the architecture of the tool, and distill the lessons learned during the tool-building experience.
@article{LLGR-SCP2010,
Author = {Mircea Lungu and Michele Lanza and Tudor G\^irba and Romain Robbes},
Title = {The {Small Project Observatory}: Visualizing Software Ecosystems},
journal = {Science of Computer Programming},
year = {2010},
volume = {75},
number = {4},
pages = {264--275},
}
Software changes. Any long-lived software system has maintenance costs dominating its initial development costs as it is adapted to new or changing requirements. Systems on which such continuous changes are performed inevitably decay, making maintenance harder. This problem is not new: The software evolution research community has been tackling it for more than two decades. However, most approaches have been targeting specific maintenance activities using an ad-hoc model of software evolution.
Instead of only addressing individual maintenance activities, we propose to take a step back and address the software evolution problem at its root by treating change as a first-class entity. We apply the strategy of reification, used with success in other branches of software engineering, to the changes software systems experience. Our thesis is that a reified change-based representation of software enables better evolution support for both reverse and forward engineering activities. To this aim, we present our approach, Change-based Software Evolution, in which first-class changes to programs are recorded as they happen.
We implemented our approach and recorded the evolution of several systems. We validated our thesis by providing support for several maintenance activities. We found that:
* Change-based Software Evolution eases the reverse engineering and program comprehension of systems by providing access to historical information that is lost by other approaches. The fine-grained change information we record, when summarized in evolutionary measurements, also gives more accurate insights about a system’s evolution.
* Change-based Software Evolution facilitates the evolution of systems by integrating program transformations, their definition, comprehension and possible evolution in the overall evolution of the system. Further, our approach is a source of fine-grained data useful to both evaluate and improve the performance of recommender systems that guide developers as they change a software system.
These results support our view that software evolution is a continuous process, alternating forward and reverse engineering activities, that requires the support of a model of software evolution integrating these activities in a harmonious whole.
@phdthesis{R-USI2008,
author = {Romain Robbes},
title = {Of Change and Software},
school = {University of Lugano},
month = {December},
year = {2008},
}
Code completion is a widely used productivity tool. It takes away the burden of remembering and typing the exact names of methods or classes: As a developer starts typing a name, it provides a progressively refined list of candidates matching the name. However, the candidate list always comes in alphabetic order, i.e., the environment is only second-guessing the name based on pattern matching. Finding the correct candidate can be cumbersome or slower than typing the full name.
We present an approach to improve code completion with program history. We define a benchmark measuring the accuracy and usefulness of a code completion engine. Further, we use the change history data to also improve the results offered by code completion tools. Finally, we propose an alternative interface for completion tools.
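The contrast with alphabetic ordering can be sketched as follows. This is a simplified illustration under the assumption that candidates are ranked by how recently they were changed. The function name, the method names, and the timestamps are hypothetical, and the abstract does not specify the actual algorithms.

```python
def complete(prefix, names, last_changed):
    """Rank candidates matching the prefix, most recently changed first.

    Names without recorded changes come last, in alphabetic order.
    """
    matches = [name for name in names if name.startswith(prefix)]
    return sorted(matches, key=lambda name: (-last_changed.get(name, -1), name))

# Hypothetical data: known method names and the time of their last change.
names = ["addAll", "addFirst", "addLast", "remove"]
last_changed = {"addLast": 42, "addAll": 7}

alphabetic = sorted(name for name in names if name.startswith("add"))
by_history = complete("add", names, last_changed)
```

With these made-up values the alphabetic list starts with `addAll`, whereas the history-based list starts with `addLast`, the method the developer touched most recently.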
@inproceedings{RL-ASE2008,
author = {Romain Robbes and Michele Lanza},
title = {How Program History Can Improve Code Completion},
booktitle = {ASE 2008: Proceedings of the 23rd ACM/IEEE International Conference on Automated Software Engineering},
year = {2008},
pages = {317--326}
}
Software evolution research is limited by the amount of information available to researchers: Current version control tools do not store all the information generated by developers. They do not record every intermediate version of the system issued, but only snapshots taken when a developer commits source code into the repository. Additionally, most software evolution analysis tools are not a part of the day-to-day programming activities, because analysis tools are resource intensive and not integrated in development environments. We propose to model development information as change operations that we retrieve directly from the programming environment the developers are using, while they are effecting changes to the system. This accurate and incremental information opens new ways for both developers and researchers to explore and evolve complex systems.
@article{RL-ENTCS2010,
author = {Romain Robbes and Michele Lanza},
title = {A Change-based Approach to Software Evolution},
journal = {Electr. Notes Theor. Comput. Sci.},
volume = {166},
year = {2007},
pages = {93--109},
}
The understanding of development sessions, the phases during which a developer actively modifies a software system, is a valuable asset for program comprehension, since the sessions directly impact the current state and future evolution of a software system. Such information is usually lost by state-of-the-art versioning systems, because of the checkin/checkout model they rely on: a developer must explicitly commit his changes to the repository. Since this happens at arbitrary and sometimes long intervals, recovering the changes between two commits is difficult and inaccurate, and recovering the order of the changes is impossible.
We have implemented an evolution monitoring prototype which records every semantic change performed on a system, and is able to completely reconstruct development sessions. In this paper we use this fine-grained information to understand and characterize the development sessions as they were carried out on two object-oriented systems.
@inproceedings{RL-ICPC2007,
author = {Romain Robbes and Michele Lanza},
title = {Characterizing and Understanding Development Sessions},
booktitle = {ICPC 2007: Proceedings of the 15th International Conference on Program Comprehension},
year = {2007},
pages = {155--166},
}
Code completion is a widely used productivity tool. It takes away the burden of remembering and typing the exact names of methods or classes: As a developer starts typing a name, it provides a progressively refined list of candidates matching the name. However, the candidate list usually comes in alphabetic order, i.e., the environment is only second-guessing the name based on pattern matching, relying on human intervention to pick the correct one. Finding the correct candidate can thus be cumbersome or slower than typing the full name.
We present an approach to improve code completion based on recorded program histories. We define a benchmarking procedure measuring the accuracy of a code completion engine and apply it to several completion algorithms on a dataset consisting of the history of several systems. Further, we use the change history data to improve the results offered by code completion tools. Finally, we propose an alternative interface for completion tools that we released to developers and evaluated.
@article{RL-JASE2010,
author = {Romain Robbes and Michele Lanza},
title = {How Program History Can Improve Code Completion},
journal = {Autom. Softw. Eng.},
year = {2010},
volume = {in press},
number = {in press},
pages = {in press},
}
Software changes. During their life cycle, software systems experience a wide spectrum of changes, from minor modifications to major architectural shifts. Small-scale changes are usually performed with text editing and refactorings, while large-scale transformations require dedicated program transformation languages. For medium-scale transformations, both approaches have disadvantages. Manual modifications may require a myriad of similar yet not identical edits, leading to errors and omissions, while program transformation languages have a steep learning curve, and thus only pay off for large-scale transformations.
We present a system supporting example-based program transformation. To define a transformation, a programmer performs an example change manually, feeds it into our system, and generalizes it to other application contexts. With time, a developer can build a palette of reusable medium-sized code transformations. We provide a detailed description of our approach and illustrate it with examples.
@inproceedings{RL-MODELS2008,
author = {Romain Robbes and Michele Lanza},
title = {Example-Based Program Transformation},
booktitle = {MoDELS 2008: Proceedings of the 11th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems},
year = {2008},
pages = {174--188},
}
Change prediction helps developers by recommending program entities that will have to be changed alongside the entities currently being changed. To evaluate their accuracy, current change prediction approaches use data from versioning systems such as CVS or SVN. These data sources provide a coarse-grained view of the development history that flattens the sequence of changes in a single commit. They are thus not a valid basis for evaluation in the case of development-style prediction, where the order of the predictions has to match the order of the changes a developer makes.
We propose a benchmark for the evaluation of change prediction approaches based on fine-grained change data recorded from IDE usage. Moreover, the change prediction approaches themselves can use the more accurate data to fine-tune their prediction. We present an evaluation procedure and use it on several change prediction approaches, both novel and from the literature, and report on the results.
@inproceedings{RPL-MSR2010,
author = {Romain Robbes and Damien Pollet and Michele Lanza},
title = {Replaying IDE Interactions to Evaluate and Improve Change Prediction},
booktitle = {MSR 2010: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories},
year = {2010},
pages = {to appear}
}