Curiosity-Driven

October 2, 2007

Visualizing Build Data Pt.2: Coverage Time-Series Multiples

Filed under: continuous_integration, web — teropa @ 5:09 pm

This post continues the exploration started in Pt.1.

The coverage histories of all packages in a project, all classes in a package, or all methods in a class could benefit from a display that puts them all on the same screen, each in its own small graph.

This technique fits a lot of data on a screen without making things crowded. It also scales quite well to larger numbers of categories.

The technique invites the user to compare categories and to identify differences and similarities, without compromising the significance of any individual time-series the way a single graph crowded with multiple categories would.

multiples.gif

This is the small multiples technique from Edward Tufte’s Envisioning Information:

“Small multiple designs, multivariate and data bountiful,
answer directly by visually enforcing comparisons of changes,
of the differences among objects, of the scope of alternatives.
For a wide range of problems in data presentation,
small multiples are the best design solution.”
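To make the idea concrete, here’s a minimal text-only sketch of small multiples in Python: one tiny sparkline per category, all drawn on the same fixed 0 to 100% scale. The function names and the sample categories are my own illustrations, not part of any real tool. The important design choice is the shared scale: because every small graph uses the same axis, a low sparkline means low coverage wherever it appears on the screen.

```python
from typing import Dict, List

# Nine levels from empty to full; one character per build.
BLOCKS = " ▁▂▃▄▅▆▇█"

def sparkline(values: List[float], lo: float = 0.0, hi: float = 100.0) -> str:
    """Draw a series as a one-line sparkline on a fixed [lo, hi] scale."""
    steps = len(BLOCKS) - 1
    chars = []
    for v in values:
        clamped = min(max(v, lo), hi)
        chars.append(BLOCKS[round((clamped - lo) / (hi - lo) * steps)])
    return "".join(chars)

def small_multiples(series_by_name: Dict[str, List[float]]) -> List[str]:
    """One labelled sparkline per category, all sharing the same 0-100% scale."""
    width = max(len(name) for name in series_by_name)
    return [f"{name.ljust(width)} {sparkline(values)}"
            for name, values in series_by_name.items()]
```

For example, `small_multiples({"core": [60, 75, 90], "ui": [20, 25, 40]})` returns one aligned line per package, ready to be printed one under the other.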

September 29, 2007

Visualizing Build Data Pt.1: Simple Coverage Time-Series

Filed under: continuous_integration, web — teropa @ 9:44 pm

Many continuous integration servers that have been running for any significant period of time have accumulated a large amount of data about project health. This is especially true for projects that use code coverage and analysis tools.

However, this data often just sits there in XML files on the build server, seen by no one. That seems like an awful waste of perfectly good data to me. So this fall I’m going to think about what can be done with it to make it more interesting. As a beginning student of information visualization, I’ll concentrate especially on the visual aspect of things, but I bet I’ll make some forays into statistical analysis and data mining as well.

Let’s start with the simplest of things: a time-series showing the trend of code coverage. What is the least chart-junky way to display this variable?

A line chart is probably the best way to visualize a time-series:

coverage_timeseries_1.gif

Code coverage is distinctly a variable of “volume”, always a fraction of some maximum volume (100%), so it might benefit from being displayed as such, by coloring the “filled” area:

coverage_timeseries_2.gif

Covered code is usually considered good and uncovered code bad, so maybe the negative quality of the negative space could be highlighted by coloring it red?

coverage_timeseries_3.gif

The colored areas now outline the data, so the original line becomes redundant, and it must go. Let’s get rid of the bounding box too. Since the fill colors are now the primary (and only) element of the graph, we can also increase their value:

coverage_timeseries_4.gif

That’s better. Although the graph now shows the general trend pretty well, it isn’t easy to make out the actual coverage value at any given time. Maybe introducing a horizontal grid line at every 20% interval would help?

coverage_timeseries_5.gif

It does help, but the grid is way too heavy: it actually takes over as the primary graphic element. That can’t be good; I don’t like things that are too heavy for their purpose. We’ll make the grid lines as thin as possible, and make them white so they fade into the background:

coverage_timeseries_6.gif

There. It is now easy to see the grid but it doesn’t get in your face.
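In SVG terms, the final graph is little more than two polygons that share the coverage curve as their common edge (the covered area closes along the bottom, the uncovered area along the top), plus a few thin white lines drawn over them. Here’s a minimal Python sketch that builds such an SVG string, assuming coverage values in percent sampled at evenly spaced builds. The function name, the green/red hex colors and the 0.5px stroke are placeholders of mine, not the exact values used in the images, and skipping the 0% and 100% grid lines (which fall on the edges of the graph) is my own choice.

```python
from typing import List

def coverage_svg(coverage: List[float], width: int = 300, height: int = 100,
                 covered_fill: str = "#7bb369", uncovered_fill: str = "#d9534f",
                 grid_step: int = 20) -> str:
    """Render coverage percentages (0-100) in the style of the final graph."""
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two builds to draw an area")

    def y_of(pct: float) -> float:
        # SVG's y axis points down, so 100% maps to y=0 and 0% to y=height.
        return height * (100 - pct) / 100

    # Evenly spaced builds along x; the coverage curve is shared by both areas.
    curve = [(i * width / (n - 1), y_of(c)) for i, c in enumerate(coverage)]
    covered = curve + [(width, height), (0, height)]  # closed along the bottom
    uncovered = curve + [(width, 0), (0, 0)]          # closed along the top (100%)

    def points(pts) -> str:
        return " ".join(f"{x:g},{y:g}" for x, y in pts)

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
             f'<polygon points="{points(covered)}" fill="{covered_fill}"/>',
             f'<polygon points="{points(uncovered)}" fill="{uncovered_fill}"/>']
    # Thin white grid lines at the interior levels (20%, 40%, ... by default).
    # They come last, so SVG paints them on top of the fills.
    for pct in range(grid_step, 100, grid_step):
        y = y_of(pct)
        parts.append(f'<line x1="0" y1="{y:g}" x2="{width}" y2="{y:g}" '
                     f'stroke="white" stroke-width="0.5"/>')
    # No data line and no bounding box: the fills and the grid are the whole graph.
    parts.append("</svg>")
    return "".join(parts)
```

Because the grid is white and painted over the fills, it reads as thin gaps in the colored areas rather than as marks of its own.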

I haven’t thought about the temporal scale here at all, nor about the different granularities of code coverage we could examine (project, package, class, method, block, or line level). That is what I’ll look at next.

September 19, 2007

Development Best Practices for a Medium-sized Rails Project

Filed under: agile, bdd, continuous_integration, rails, testing — teropa @ 8:51 am

As part of a Wicket-Spring-Hibernate vs. Rails evaluation we’re doing, I’ve been thinking about a set of best practices we could employ if we were to go for the Rails solution.

Most of these would of course be relevant for Java as well as Rails. However, being familiar with the realities of most software projects, I know some of these things are the first out the door when scheduling pressures kick in. I think the considerable productivity gain we would get from using Rails should give us an opportunity to really concentrate on improving the quality of both the software and the process.

So, here’s what I have in mind: Behaviour-driven development, automated acceptance tests, continuous integration, code reviews, shared ownership, automated deployment, enhancing communication with a wiki, a chat solution and daily meetings, and, last but not least, reflecting on how we’re doing through agile retrospectives.

Behaviour-Driven Development

We should use the wonderful rSpec library to do behaviour-driven development. BDD brings real rigor to the coding process by making us always specify the code we write before we write it.

BDD is the next natural step from test-driven development, and improves it by replacing the vocabulary of testing with that of specifying. rSpec is becoming quite mature at this point and can pretty much fully replace Test::Unit in a Rails project.

We’ll want to use rCov to keep track of code coverage. We could decide to make some level of coverage an acceptance criterion for an iteration, but once we fully embrace BDD we won’t need to do that. At that point coverage reporting becomes a tool for the developers to use to find those little nooks of code that might have slipped in untested (or unspecified).
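As a rough illustration of the two ways to use coverage numbers, here’s a Python sketch operating on hypothetical per-file counts of covered and total lines. How those counts get extracted from rCov’s reports is left out, since I’m not assuming any particular report format. `meets_criterion` is the iteration-level gate; `least_covered` is the developer tool for finding the untested nooks. Both names and any threshold you pass in are made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input: file name -> (covered lines, total lines).
FileCoverage = Dict[str, Tuple[int, int]]

def percent(covered: int, total: int) -> float:
    # A file with no executable lines has nothing left to cover.
    return 100.0 if total == 0 else 100.0 * covered / total

def meets_criterion(files: FileCoverage, threshold: float) -> bool:
    """Iteration-level gate: overall line coverage at or above `threshold`%."""
    covered = sum(c for c, _ in files.values())
    total = sum(t for _, t in files.values())
    return percent(covered, total) >= threshold

def least_covered(files: FileCoverage, limit: int = 3) -> List[Tuple[str, float]]:
    """Developer tool: the files with the lowest coverage, worst first."""
    ranked = sorted(((name, percent(c, t)) for name, (c, t) in files.items()),
                    key=lambda item: item[1])
    return ranked[:limit]
```

Once BDD is the norm, the gate becomes unnecessary, but the second function stays useful as a quick pointer to where the unspecified code is hiding.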

Automated Acceptance Tests

The acceptance criteria of each iteration should include automated acceptance tests for all implemented features. We can use Selenium for them, and the developers should write them together with the system tester and the customer representative. They could be defined and stored in a system like FitNesse, or simply in the version control system along with the code.

Automated acceptance tests make it provable that the agreed-upon features were actually implemented. They also form a powerful regression test harness, as acceptance tests from previous iterations can be run any number of times at no additional cost. Using Selenium has the added benefit that the tests double as a cross-browser sanity check.

Note that automating acceptance tests does not replace the need for a separate system testing / QA process. What it does do is free the system tester from hunting down coding errors, leaving time for exploratory testing and for finding the really hard problems caused by integration issues or miscommunication between developers and customers.

Continuous Integration

We should set up a continuous integration server, such as CruiseControl.rb, that runs all our specs on each commit. It should also run our in-browser acceptance tests.

The CI machine can also automate deployment to development / testing servers, so that they automatically get a new version of the software whenever new code is committed to version control and all automated tests pass.
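The gating logic itself is simple. Here’s a hypothetical sketch of a single polling cycle in Python. CruiseControl.rb has its own configuration and plugins for this, and nothing below is its API; passing the test suites and the deploy action in as plain functions is my simplification.

```python
from typing import Callable, List, Optional

Suite = Callable[[str], bool]  # takes a revision, returns True if it passes

def ci_cycle(last_built: Optional[str], head: str,
             suites: List[Suite], deploy: Callable[[str], None]) -> str:
    """One polling cycle of a hypothetical CI loop.

    If `head` is a new revision, run every suite against it and deploy
    only when all of them pass. Returns the revision now considered built.
    """
    if head == last_built:
        return head  # nothing new has been committed
    # all() stops at the first failing suite, so later suites are skipped.
    if all(suite(head) for suite in suites):
        deploy(head)
    return head
```

Putting the fast specs before the slower in-browser acceptance tests in `suites` means a broken spec run fails the build early, before any browser is started.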

Code Reviews

All code changes should be reviewed by other developers before they get committed to version control. We can do this in a controlled and automated fashion by using Review Board or a similar solution.

Code reviews not only provide a proven reduction in defect rates, but also enhance communication by keeping everyone up to date on what’s going on in different parts of the software.

Shared Ownership

There should be no “my code” or “your code”. All code is owned by the team. We can support this by rotating the responsibility areas for each iteration: The developer who was working on feature A in iteration 1 will work on feature B in iteration 2.

This prevents silos from forming, and helps everyone to understand all parts of the software and how they fit together.

Rotating responsibilities also gives developers an incentive to produce cleaner code, because they know the code will absolutely have to be understood and continued by someone else in the near future.
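The rotation itself can be as simple as a round-robin shift. Here’s a sketch in Python, assuming equally many developers and responsibility areas, with iterations numbered from 1; the function name and the sample names are illustrative only.

```python
from typing import Dict, List

def assign_areas(developers: List[str], areas: List[str],
                 iteration: int) -> Dict[str, str]:
    """Round-robin rotation: every developer moves to the next area each iteration."""
    if len(developers) != len(areas):
        raise ValueError("this sketch assumes one area per developer")
    shift = iteration - 1  # iteration 1 uses the original order
    # With at least two areas, nobody keeps the same area two iterations in a row.
    return {dev: areas[(i + shift) % len(areas)]
            for i, dev in enumerate(developers)}
```

With two developers and areas A and B, whoever had A in iteration 1 gets B in iteration 2, which is exactly the rotation described for shared ownership.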

Automated Deployment

This is pretty much a no-brainer for a Rails project, but I feel it needs to be said as this still often doesn’t happen in Java projects.

Deployment to development, testing and production servers should be automated so that it can be done easily by any developer, any number of times.

The whole process of deploying the application, which includes updating the code, updating the database schema and data, and restarting all server processes, should be reduced to a single command.

In a Rails project, we can of course employ Capistrano to achieve this.
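Capistrano takes care of the details, so the following is only an illustration of the principle underneath it, written in Python rather than Capistrano’s own Ruby DSL: run the deployment steps in a fixed order and stop at the first failure, so that a failed schema migration never leads to restarting servers on half-updated code. The function name and the step names are hypothetical placeholders for the real commands.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]  # (name, action returning True on success)

def deploy(steps: List[Step]) -> List[str]:
    """Run deployment steps in order and stop at the first failure.

    Returns the names of the completed steps; raises if any step fails.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"deployment stopped at {name!r}; completed: {completed}")
        completed.append(name)
    return completed
```

Wrapping a list like `[("update_code", ...), ("migrate_database", ...), ("restart_servers", ...)]` behind one entry point is what makes the whole deployment a single command that any developer can run.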

Wiki

An active wiki should be maintained. It can include any kind of information that could be of interest to developers. Here are a few examples:

  • News and announcements
  • Scheduling issues, upcoming deadlines and milestones
  • Project metrics (open defects, code coverage, server status)
  • Server infrastructure information: IPs, directories, server software
  • Development environment setup instructions
  • Contact information for developers, customer representatives and other project personnel
  • Coding conventions and standards
  • Documentation of the plugins we use, both in-house and external
  • Solutions for common problems, “knowledge base”

A rotating wiki gardener duty should be set up, so that each project member in turn is responsible, one week at a time, for keeping the wiki clean, consistent, relevant and up-to-date.
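Working out whose turn it is shouldn’t require a spreadsheet. Here’s a minimal sketch in Python, assuming a fixed, ordered list of project members and Monday-to-Sunday weeks; the function name is hypothetical.

```python
import datetime
from typing import List

def gardener_on_duty(members: List[str], day: datetime.date) -> str:
    """Return the member responsible for the wiki during the week containing `day`."""
    # date(1, 1, 1) has ordinal 1 and is a Monday, so this counts complete
    # Monday-to-Sunday weeks since then; every day of a week gets the same number.
    week_number = (day.toordinal() - 1) // 7
    # Consecutive weeks map to consecutive members, wrapping around the list.
    return members[week_number % len(members)]
```

Counting weeks continuously, rather than using the week number within the year, keeps the rotation from jumping when a new year starts.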

Chat

The team will be distributed, so we need some kind of group IM solution. We could set up an IRC channel or some web-based solution.

Everyone should be encouraged to be present in the chat whenever they are working on the project, as this will probably be the closest thing we get to a shared agile workspace.

Daily Meeting

A daily 15-minute meeting, such as a daily scrum, should be held consistently. This meeting should be face-to-face when possible, and over the phone when not.

Agile Retrospectives

In each iteration, we should spend some time reflecting on how these practices are working and what should be added, dropped or improved.

We should also do some “code strategizing”, like looking for emerging patterns in the code base which could be abstracted into reusable components. We should do this consistently in each iteration.
