NYC Open Data: A Socrata API

As a member of a workers center in Manhattan, The National Mobilization Against Sweatshops, I've organized with community members to fight displacement and stop the city's neglect of public housing. Hearing NYCHA residents on the Lower East Side describe the many repair issues they lived with, I became curious about such issues around the city.

I wanted to create an web app that could access and visualize this data using Javascript's D3 library (you can learn more about D3 in my post on the D3 sunburst diagram). At first I had low expectations of what it would be like to access this information. Anyone who has worked with APIs to access data sets knows how frustrating it can be to parse dense, sometimes confusing, documentation and learn the the ins and outs of these resources. Add that to the fact that governments don't always have the tech resources they need and I thought I was going to have to endure a big headache to find what I was looking for.

Well...I was wrong! NYC Open Data is one of a few hundred Socrata APIs. Socrata hosts over one hundred different data catalogs for governments, non-profits, and NGOs. The Socrata Open Data API (SODA) allows you to programmatically access a ton of open data resources from these sources. Socrata has extensive and clear documentation of utilizing their APIs with myriad languages, including Ruby.

How does it work?

First, visit easy-to-use and nicely designed NYC Open Data website:

There, you can search for datasets that interest you--anything from health, housing, public safety and much more.

Once you're viewing a data set, click the "export" button on the top right of the document. From there, you can jump into the API documentation specific to this data set and start accessing data programmatically.

You'll need to register for an API key here, then you're ready to the query the API.


Socrata APIs are accessed via the Socrata Query Language (SoQL). SoQL is a very straightforward query language, based on SQL, that is specifically designed to make it easy to access data on the web.

The API follows REST-ful paradigms. All requests are made through HTTPS and available response types include JSON, XML, and CSV--selectable by the "extension" (.json, for example) on the request.

The following example would return the first thousand records from the data set recording NYCHA repair violations:

For the purposes of my application, however, I needed to examine violations by borough. SoQL makes such query filters incredibly simple.$where=boroid = 1

The above query would return the first thousand records for the borough of Manhattan. The SoQL Querying documentation is extensive, but to give you a preview:

"SoQL statements are broken into “parameters” similar to clauses in SQL statements. Each clause can be expressed either directly as a URL parameter or as a SoQL statement."

Using Socrata APIs with Ruby

Working with the NYC Open Data API is even easier if you're programming with Ruby. There's a gem for it! The Soda Ruby Gem is very easy to use.

  1. In your gem file, include the following line: gem 'soda-ruby', :require => 'soda'

  2. Create a new client: client ={:domain => "", :app_token => ENV["OPEN_DATA_APP_TOKEN"]})

  3. Issue a query: client.get("wvxf-dwi5", {:boroid => 1})

The two arguments above are the resource code (code of the NYCHA repair violations data set) and the filter of borough id (1 represents the borough of Manhattan).

Populating our database

For this Rails app, I created a Postgresql database and used the following code in my seed file to populate it with records or NYCHA repair violations:

client ={:domain => "", :app_token => ENV["OPEN_DATA_APP_TOKEN"]})

#manhattan borough data, id = 1
#bk borough data, id = 3
#bronx borough data, id = 2

records = []
records << client.get("wvxf-dwi5", {:boroid => 1})
records << client.get("wvxf-dwi5", {:boroid => 2})
records << client.get("wvxf-dwi5", {:boroid => 3})

records.each do |record|
  record.each do |r|
    violation = Violation.find_or_create_by(violation_id: r.violationid, building_id: r.buildingid, house_number: r.housenumber,
      street_name: r.streetname, apt_number: r.apartment, zip_code: r["zip"], approval_date: r.approveddate,
      current_status_date: r.currentstatusdate, current_status: r.currentstatus, boro: r.boro, 
      description: r.novdescription[20..-1].downcase, boro_id: r.boroid )

    if r.novdescription.downcase.include?("mold")
      violation.keyword = "Mold"
    elsif r.novdescription.downcase.include?("water leak")
      violation.keyword = "Water Leak"
    elsif r.novdescription.downcase.include?("plastered") ||r.novdescription.downcase.include?("paint")
      violation.keyword = "Plaster/Paint"
    elsif r.novdescription.downcase.include?("lock")
      violation.keyword = "Broken Lock"
    elsif r.novdescription.downcase.include?("smoke detector")
      violation.keyword = "Fire Hazard"
    elsif r.novdescription.downcase.include?("mice") || r.novdescription.downcase.include?("roaches") || r.novdescription.downcase.include?("rodents") || r.novdescription.downcase.include?("vermin")
      violation.keyword = "Insect/rodent infestation"
    elsif r.novdescription.downcase.include?("electric outlet") || r.novdescription.downcase.include?("light switch")
      violation.keyword = "Electrical"
    elsif r.novdescription.downcase.include?("window") || r.novdescription.downcase.include?("missing glass") || r.novdescription.downcase.include?("window guards")
      violation.keyword = "Window"
    elsif r.novdescription.downcase.include?("lead paint")
      violation.keyword = "Lead Paint"
    elsif r.novdescription.downcase.include?("carbon monoxide")
      violation.keyword = "CO Detector"
    elsif r.novdescription.downcase.include?("bathroom") 
      violation.keyword = "Bathroom"
    elsif r.novdescription.downcase.include?("defective faucets") || r.novdescription.downcase.include?("hot water") || r.novdescription.downcase.include?("cold water")
      violation.keyword = "Water/Plumbing"
      violation.keyword = "Misc."

Let's break this down.

  1. Instantiate the client
  2. Make requests to the API for records from each borough and push them into an array of "records".
  3. Iterate over the array, turn each record that has come down from the API into an instance of "Violation" (for which you have a model and a migration in your rails app).
  4. Store that Violation in your database.

You probably notice that there is still something else going on there. Something to do with keywords and descriptions. For my data visualizations, I wanted to represent categories of repair issues--i.e. fire hazard, infestation, etc. So, in creating my Violations and storing them, I check the content of each record's description for certain tags and assign them a category accordingly.

And that's it! I now have a starting point of 3,000 records to visualize.

Before you go, let me just recommend again that you dive into the NYC (or any!) Open Data Socrata API. These APIs are easy to use (especially for anyone who has used SQL), free to access and provide a wealth of information about our (and many other) cities.

subscribe and never miss a post!

Blog Logo

Sophie DeBenedetto

comments powered by Disqus
comments powered by Disqus