D3 Sunburst Diagram Tutorial

D3 is a JavaScript library that allows you to bind data to the DOM and apply data-driven transformations to the document.

What does that mean? Essentially, D3 (which stands for Data-Driven Documents) allows you to generate HTML or SVG elements based on arrays of data. For example, you could create an array of objects that map students' names to their heights, feed that data into D3 and use it to generate a bar chart.
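To make "data drives the elements" concrete without pulling in D3 itself, here is a minimal plain-JavaScript sketch that turns an array of student heights into SVG bar markup. The student data, the function name barChartMarkup and the bar sizes are all made up for illustration. D3's real value is doing this binding against live DOM elements rather than strings.

```javascript
// Hypothetical sample data: students' names mapped to heights (in inches).
const students = [
  { name: "Ana", height: 62 },
  { name: "Ben", height: 70 },
  { name: "Cy", height: 66 },
];

// Turn each datum into one SVG <rect>: the data decides each bar's attributes.
function barChartMarkup(data, barWidth = 20, chartHeight = 100) {
  const maxHeight = Math.max(...data.map((d) => d.height));
  const bars = data.map((d, i) => {
    const h = (d.height / maxHeight) * chartHeight; // tallest bar fills the chart
    return `<rect x="${i * barWidth}" y="${chartHeight - h}" width="${barWidth - 2}" height="${h}"><title>${d.name}</title></rect>`;
  });
  return `<svg width="${data.length * barWidth}" height="${chartHeight}">${bars.join("")}</svg>`;
}

console.log(barChartMarkup(students));
```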

D3 has a lot of functionality and can lend itself to some really beautiful data visualizations. To check out some super nice examples, visit the official D3 example page.

In this post, I'll break down the code used to generate a zoomable sunburst diagram. A sunburst is similar to a tree diagram, except it uses a radial layout. The root node of the tree is at the center, with leaves on the circumference. The angle (and, within a ring, the area) of each arc is proportional to its value.

We'll be using data on New York City public housing repair violations, which I've grouped by category and subcategory. Accordingly, our sunburst diagram will have two tiers. This is part of a larger project mapping out NYCHA repair violations. The data has been pulled down from the NYC government's open database. To learn more about how to make requests to and work with data from this database, check out my post: NYC Open Gov, A Socrata API.

Although the sunburst diagram I created is 'zoomable', meaning you can zoom in on and expand each tier or node of the diagram, we'll be walking through the generation of a static sunburst diagram.

We'll be working with SVG (Scalable Vector Graphics), an XML markup language for describing two-dimensional vector graphics. SVG markup looks a lot like HTML. However, HTML lays out rectangular boxes of content, while SVG lets us draw arbitrary shapes, including the curved arcs a sunburst needs.

One important feature of SVG that we'll be utilizing is the "path" element. SVG paths represent the outline of a shape that can be stroked, filled, used as a clipping path, or any combination of all three. The shape of a path element is defined by one attribute: "d". This attribute contains a series of instructions in the SVG path mini-language, and the path element uses those instructions to draw its shape.
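Here is a small taste of the path mini-language: a sketch that builds a "d" string for a closed polygon. It uses three commands: M ("move to"), L ("line to") and Z ("close path"). The helper name polygonPath is hypothetical, and real sunburst arcs also use the A ("arc") command.

```javascript
// Hypothetical helper: build an SVG path "d" string for a closed polygon.
// M = move to the first point, L = draw a line to each later point, Z = close.
function polygonPath(points) {
  const [first, ...rest] = points;
  return (
    `M${first[0]},${first[1]}` +
    rest.map(([x, y]) => `L${x},${y}`).join("") +
    "Z"
  );
}

// A triangle; the resulting element is <path d="M0,0L10,0L5,8Z"></path>
const d = polygonPath([[0, 0], [10, 0], [5, 8]]);
console.log(`<path d="${d}"></path>`);
```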

Now that we have some of these basics covered, let's jump into the code. We'll be working our way backwards from the function that actually draws our sunburst, back through all of its constituent parts.

Here's where the magic happens:

g.append("path")
    .attr("d", arc)
    .style("fill", function(d) { return color((d.children ? d : d.parent).name); })
    .on("click", click);

This part of our D3 drawSunburst function grabs a variable called g and appends an SVG path element to each of its elements. It sets each path's "d" attribute using the arc function, colors in the resulting shape, and attaches a click handler.

In other words, we take the containers set up on the page to hold our sunburst and we draw shapes with some instructions that will be based on the data we are trying to represent.

What is 'g'? - Creating the layers of the sunburst

What is the "g" to which we are appending our path-drawn shapes?

var g = svg.selectAll("g")
      .data(partition.nodes(nychaData))
    .enter().append("g");

Here, we are selecting all of the "g" elements inside our svg and binding some data to them. Think of an SVG "g" like an HTML "div": a container for grouping other elements. We need to create containers that correspond to the data we are trying to represent. That's where D3's partition layout comes in.

Before we jump into the partition layout, let's take a look at the data we are passing into this function: the nychaData we are trying to represent. The sunburst diagram is meant to represent hierarchical data. It expects a root object in which every node has a name and each parent node has an array of children:

{"name": "NYCHA Repair Violations",
 "children": [
    {"name": "Insect/rodent infestation",
      "children": [
        {"name": "mice", "count": 30},
        {"name": "insect", "count": 14},
        {"name": "rat", "count": 6},
        {"name": null, "count": 2}]},
    {"name": "Window",
      "children": [
        {"name": "broken window guards", "count": 27},
        {"name": null, "count": 49},
        {"name": "broken glass", "count": 2}]},
    {"name": "Misc.",
      "children": [{"name": null, "count": 262}]},
    {"name": "Plaster/Paint",
      "children": [{"name": null, "count": 313}]},
    {"name": "Fire Hazard",
      "children": [{"name": "smoke detector", "count": 51}]},
    {"name": "Broken Lock",
      "children": [{"name": null, "count": 43}]},
    {"name": "CO Detector",
      "children": [{"name": null, "count": 46}]},
    {"name": "Bathroom",
      "children": [{"name": null, "count": 68}]},
    {"name": "Water Leak",
      "children": [{"name": null, "count": 49}]},
    {"name": "Water/Plumbing",
      "children": [{"name": null, "count": 15}]},
    {"name": "Mold",
      "children": [{"name": null, "count": 22}]},
    {"name": "Electrical",
      "children": [{"name": null, "count": 1}]}]
}

Here we've placed NYCHA repair violations into categories and subcategories. Not all of our categories have named subcategories, but this is detailed enough for now. This particular application uses Rails, so the data starts out as a Ruby hash and has to be serialized to JSON before JavaScript can use it. If you'd like to learn more about passing Ruby objects from a PostgreSQL database into a JavaScript function, check out my post on the gon gem. For the purposes of this tutorial, though, let's just assume we have access to the data above and that it is stored in a variable, nychaData.

Now that we have our data, we can learn how the D3 partition class uses it to build appropriately sized nodes on our sunburst diagram.

Partition Layout

The partition layout produces adjacency diagrams. These are similar to tree diagrams, but instead of drawing a line between parent and child nodes, they draw nodes as solid areas. Each node's placement relative to other nodes reveals its position in the hierarchy. The size of each node is determined by a quantitative value associated with it.

For example, our Insect/rodent infestation parent node has four children: mice, insect, rat, and one unnamed subcategory. Each of these child nodes will be drawn in relation to its parent. Its area will be determined by its count, i.e. the number of violations in that subcategory.
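To see where a parent node's size comes from, here is a minimal sketch using an excerpt of three of our twelve categories. Leaves take their value from count, and parents sum their children. This mirrors how the layout's value accessor behaves, but the function sumValues is our own, not part of D3.

```javascript
// An excerpt of the nychaData hierarchy (three of the twelve categories).
const sample = {
  name: "NYCHA Repair Violations",
  children: [
    { name: "Insect/rodent infestation", children: [
      { name: "mice", count: 30 }, { name: "insect", count: 14 },
      { name: "rat", count: 6 }, { name: null, count: 2 } ] },
    { name: "Window", children: [
      { name: "broken window guards", count: 27 }, { name: null, count: 49 },
      { name: "broken glass", count: 2 } ] },
    { name: "Mold", children: [{ name: null, count: 22 }] },
  ],
};

// Hypothetical helper: leaves are valued by count, parents by the sum of
// their children's values. Stores the result on each node as `value`.
function sumValues(node) {
  if (!node.children) return (node.value = node.count || 0);
  return (node.value = node.children.reduce((s, c) => s + sumValues(c), 0));
}

sumValues(sample);
console.log(sample.value);             // 152
console.log(sample.children[0].value); // 52 (30 + 14 + 6 + 2)
```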

Above, we're calling partition.nodes(nychaData).

This invokes the partition layout, returning the array of nodes associated with the specified root node.

You might be wondering: if we're invoking partition, where is it defined? That is the right question to ask. partition is defined above:

var partition = d3.layout.partition()
      // Size each leaf by its count; parents get the sum of their children.
      .value(function(d) { return d.count; });

The partition layout is part of D3's family of hierarchical layouts. These layouts follow the same basic structure: the input argument to the layout is the root node of the hierarchy, and the output return value is an array representing the computed positions of all nodes.

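So, when we call partition.nodes(nychaData) in the chain that builds g above, we run the partition layout over our data. It returns a flat array of nodes, sized and positioned as follows:

  1. Each leaf's value comes from its 'count' attribute in our nychaData data structure.
  2. Each parent's value is the sum of its children's values.
  3. Every node gets x, dx, y and dy coordinates that describe where its slice belongs.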
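The positioning step can be sketched in plain JavaScript. This is a simplified, hypothetical re-implementation for illustration, not D3's source. It assumes a unit-square layout (D3 v3's partition defaults to a size of [1, 1]): x and dx become angles in the sunburst, and y and dy become radii. Unlike D3's default, it does not sort siblings by value.

```javascript
// Hypothetical sketch of a partition layout over a { name, count, children } tree.
function partitionNodes(root) {
  const nodes = [];

  // Pass 1: set depth, parent and value (leaf = count, parent = sum of
  // children); return the number of levels in this subtree.
  function prepare(node, depth) {
    node.depth = depth;
    if (!node.children) {
      node.value = node.count || 0;
      return 1;
    }
    let levels = 0;
    node.value = 0;
    for (const child of node.children) {
      child.parent = node;
      levels = Math.max(levels, prepare(child, depth + 1));
      node.value += child.value;
    }
    return levels + 1;
  }

  // Pass 2: each child gets a slice of its parent's width proportional to
  // its value; every level is the same height (dy).
  function position(node, x, dx, dy) {
    Object.assign(node, { x, dx, y: node.depth * dy, dy });
    nodes.push(node);
    if (!node.children) return;
    const k = node.value ? dx / node.value : 0;
    for (const child of node.children) {
      position(child, x, child.value * k, dy);
      x += child.value * k;
    }
  }

  const levels = prepare(root, 0);
  position(root, 0, 1, 1 / levels);
  return nodes; // pre-order: root first, then each subtree in turn
}

// Tiny made-up tree: "a" has leaves worth 3 and 1, "b" has one leaf worth 4.
const tree = { name: "root", children: [
  { name: "a", children: [{ name: "a1", count: 3 }, { name: "a2", count: 1 }] },
  { name: "b", children: [{ name: "b1", count: 4 }] },
] };
const nodes = partitionNodes(tree);
// In the middle ring (y from 1/3 to 2/3), "a" spans x 0 to 0.5 and "b" spans 0.5 to 1.
```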

Let's put it all together:

var g = svg.selectAll("g")
      .data(partition.nodes(nychaData))
    .enter().append("g");

We're setting a variable, g, equal to the return value of a chain of method calls.

  1. svg.selectAll("g"): Select all of the "g" elements within svg. On the first render there aren't any yet, so this selection starts out empty. (svg is defined elsewhere in our program and has whatever width, height and radius you choose.)

  2. .data(partition.nodes(nychaData)): Bind the array of nodes produced by the partition layout to the selection, one node per "g" element. Each node carries a value computed from the count attributes in our nychaData data structure.

  3. .enter().append("g"): If there are more nodes than "g" elements, .enter() returns a placeholder for each extra node. .append("g") then creates a "g" element for each placeholder, bound to its node.

g is now a selection of "g" elements, each bound to one node of our data.
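The update/enter split can be modeled with a simplified, hypothetical sketch. It pairs data with existing elements by index, which is what .data() does when no key function is given. Anything left over on the data side lands in "enter", and leftover elements land in "exit". The function dataJoin is our own illustration, not D3's API.

```javascript
// Hypothetical, simplified model of an index-based data join.
function dataJoin(existingElements, data) {
  const update = [];
  const enter = [];
  const exit = [];
  data.forEach((datum, i) => {
    if (i < existingElements.length) {
      update.push({ element: existingElements[i], datum }); // reuse element
    } else {
      enter.push(datum); // no element yet: .enter().append("g") creates one
    }
  });
  // Elements left without a datum would end up in the exit group.
  for (let i = data.length; i < existingElements.length; i++) {
    exit.push(existingElements[i]);
  }
  return { update, enter, exit };
}

// First render: the svg holds no "g" elements, so every node is in enter.
const join = dataJoin([], ["root", "Window", "broken glass"]);
console.log(join.enter.length); // 3
```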

Now that we've built a bunch of "g" containers and associated them with our data, we have to draw the shapes that make up the actual sunburst.

Let's revisit our magic code from the beginning of this tutorial:

g.append("path")
    .attr("d", arc)
    .style("fill", function(d) { return color((d.children ? d : d.parent).name); })
    .on("click", click);

We need to take our g selection and draw a shape for each element. Together, these shapes make up our sunburst diagram. Let's take a closer look at the .append("path").attr("d", arc) portion of the above code.

SVG Paths and D3 shapes

We've already defined an SVG path, but this is a long post, so let's recap. An SVG "path" element is capable of drawing fancy shapes and its "d" attribute carries the instructions for drawing those shapes.

In the code snippet above, we are doing two things:

  1. .append("path"): append an SVG "path" element to each "g" element in g.
  2. .attr("d", arc): set the "d" attribute of each new path element to the result of calling the arc generator with that element's bound node.

Right about now you are probably wondering what the arc variable is, and how on earth D3 is going to communicate with an SVG "path" element at all.

Let's dive into the arc variable, defined elsewhere in our code:

var arc = d3.svg.arc()
      .startAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x))); })
      .endAngle(function(d) { return Math.max(0, Math.min(2 * Math.PI, x(d.x + d.dx))); })
      .innerRadius(function(d) { return Math.max(0, y(d.y)); })
      .outerRadius(function(d) { return Math.max(0, y(d.y + d.dy)); });

Okay, there is a lot going on there, but we're going to take it one step at a time.

D3 Path Data Generators

D3 includes a number of path data generators, and d3.svg.arc is one of them. Each generator is a function on which you can define accessor functions. When the generator is called with a datum, it uses those accessors to produce a string of path data suitable for a "d" attribute.
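To demystify what ends up in the "d" attribute, here is a hypothetical sketch of the kind of string an arc generator returns for one slice of a ring (an annular sector). The shape is built from an outer arc, a line inward, an inner arc back and a close. Angles follow the convention d3.svg.arc uses: radians, clockwise from 12 o'clock. The function annularSectorPath is our own and skips edge cases, such as a full 2π ring, that a real generator must handle.

```javascript
// Hypothetical sketch: SVG path data for a ring slice between two angles.
function annularSectorPath(startAngle, endAngle, innerRadius, outerRadius) {
  const round = (n) => Number(n.toFixed(6)); // tidy floating-point noise
  // Polar -> SVG coordinates (y grows downward, angle 0 points up).
  const point = (r, a) => `${round(r * Math.sin(a))},${round(-r * Math.cos(a))}`;
  const largeArc = endAngle - startAngle > Math.PI ? 1 : 0;
  return (
    `M${point(outerRadius, startAngle)}` + // start on the outer edge
    `A${outerRadius},${outerRadius} 0 ${largeArc},1 ${point(outerRadius, endAngle)}` + // outer arc, clockwise
    `L${point(innerRadius, endAngle)}` + // straight line in to the inner edge
    `A${innerRadius},${innerRadius} 0 ${largeArc},0 ${point(innerRadius, startAngle)}` + // inner arc, back
    "Z" // close the shape
  );
}

console.log(annularSectorPath(0, Math.PI / 2, 50, 100));
// M0,-100A100,100 0 0,1 100,0L50,0A50,50 0 0,0 0,-50Z
```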

In this case, we need to tell our arc path generator its start angle, end angle, inner radius and outer radius. Those accessor functions rely on two scales, x and y, that we defined elsewhere in our code:

var x = d3.scale.linear()
      .range([0, 2 * Math.PI]);

var y = d3.scale.linear()
      .range([0, radius]);

At this point, you may be wondering what a D3 scale is and how it works. Well, you're in luck, because I'm about to tell you.

D3 Scales

Scales are functions that map from an input domain to an output range, i.e. the values your visualization needs. Examples include the height of bars in a bar chart, or the angles and radii of the fan-shaped arcs in a sunburst diagram.

Think of the scale as a little machine that takes in your data and maps or scales it to a corresponding value that will be used to inform each "path" element of what to draw and how.
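A linear scale is a small "machine" of this kind, and can be sketched in a few lines. This hand-rolled linearScale is an illustration only: D3's d3.scale.linear does the same core mapping and adds extras such as ticks and clamping. The radius of 300 pixels is an assumed value.

```javascript
// Hypothetical hand-rolled linear scale: maps [d0, d1] onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const radius = 300; // assumed chart radius in pixels
const x = linearScale([0, 1], [0, 2 * Math.PI]); // position -> angle (radians)
const y = linearScale([0, 1], [0, radius]);      // position -> radius (pixels)

console.log(x(0.5)); // 3.141592653589793: halfway around the circle
console.log(y(0.5)); // 150: halfway out from the center
```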

Later in our program, when we assign our arc generator to the "d" attribute of each "path" element, our scales kick in. They'll take data about each of the nodes we created (bound to "g" elements, if you recall). From that data, they'll output the start angle, end angle, inner radius and outer radius of the shape being drawn. We never set a domain on x or y, so both use D3's default domain of [0, 1]. That matches the unit-square coordinates (x, dx, y, dy) that the partition layout assigns to each node.
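Here is a sketch of what the four arc accessors compute for a single node, with the scales inlined as toAngle and toRadius (renamed to avoid confusion with our x and y). The radius and the sample node are made-up values. The Math.max/Math.min clamps copy our arc definition. With fixed domains they rarely bite, so they are presumably there for the zoomable version, where the scales' domains change.

```javascript
// Hypothetical sketch of the arc accessors for one partition node.
const chartRadius = 300; // assumed chart radius
const toAngle = (t) => t * 2 * Math.PI;  // like x: [0, 1] -> [0, 2π]
const toRadius = (t) => t * chartRadius; // like y: [0, 1] -> [0, radius]

// Keep angles within one full turn, as our arc definition does.
const clamp = (v, lo, hi) => Math.max(lo, Math.min(hi, v));

function arcParams(d) {
  return {
    startAngle: clamp(toAngle(d.x), 0, 2 * Math.PI),
    endAngle: clamp(toAngle(d.x + d.dx), 0, 2 * Math.PI),
    innerRadius: Math.max(0, toRadius(d.y)),
    outerRadius: Math.max(0, toRadius(d.y + d.dy)),
  };
}

// A made-up node covering the second quarter of the middle ring.
console.log(arcParams({ x: 0.25, dx: 0.25, y: 1 / 3, dy: 1 / 3 }));
```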

Guess what? We're almost done! So far we've created "g" elements, associated them with nodes that correspond to our nychaData, drawn SVG shapes of the appropriate size/location for each node. Now, we have to color it in.

D3 Categorical Colors

Remember when you were a kid and you would take out your coloring book, choose your favorite Disney princess and fill it in with glorious colors of your choosing?

This is nothing like that.

Once again, let's visit our magic code from the beginning of this tutorial:

g.append("path")
    .attr("d", arc)
    .style("fill", function(d) { return color((d.children ? d : d.parent).name); })
    .on("click", click);

With the following snippet: .style("fill", function(d) { return color((d.children ? d : d.parent).name); })

we are applying a fill color to the path shape inside each "g" element. To pick that color, we invoke the color function with the name of a node.

What's the color function? Great question. We've defined it elsewhere in our program:

var color = d3.scale.category20c();

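The above code constructs a new ordinal scale with a range of twenty categorical colors. When we set the "fill" of each shape to the return value of color, we pass color a name. Here, d refers to the current node. If d has children, we use d's own name; otherwise, we use the name of d's parent. As a result, each leaf is filled with the same color as its parent category.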
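Here is a sketch of that behavior. It uses an ordinal scale with an implicit domain, where each new key gets the next color in the range and cycles when the colors run out, together with our parent-or-self naming rule. The three-color palette is a placeholder, not D3's actual category20c colors, and ordinalScale and fillFor are illustrative names.

```javascript
// Hypothetical ordinal scale: first-seen keys take successive range colors.
function ordinalScale(range) {
  const index = new Map();
  return (key) => {
    if (!index.has(key)) index.set(key, index.size);
    return range[index.get(key) % range.length];
  };
}

const color = ordinalScale(["steelblue", "orange", "seagreen"]); // placeholder palette

// Same rule as our fill function: parents use their own name, leaves borrow
// their parent's name, so a category and its subcategories share a color.
function fillFor(d) {
  return color((d.children ? d : d.parent).name);
}

const insects = { name: "Insect/rodent infestation", children: [] };
const mice = { name: "mice", parent: insects };
insects.children.push(mice);

console.log(fillFor(insects) === fillFor(mice)); // true
```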

And that's it! We're ignoring the click function for now--maybe a later post will detail the 'zoomability' of this diagram. But, as it stands, we've built a really beautiful, static sunburst together.

D3 can be complex and I hope this tutorial has been of some help, although we didn't cover everything. You can check out my code for this project to go deeper. I also strongly recommend diving into the D3 documentation. The docs have been the main source for this post and they are extremely clear and thorough.

Enjoy!

Sophie DeBenedetto
