Elixir Linter: Building an OTP Elixir Code Quality Engine with Credo

While learning Elixir, I've found that its oftentimes similar feel to Ruby can be both a blessing and a curse. On the one hand, its semantic and eloquent style makes it feel familiar and manageable. On the other hand, being a part of the well-established Ruby community has spoiled me when it comes to support systems and tools. It's easy to forget that Elixir is still in its infancy and to feel surprised when the community doesn't provide quite the level of support that Ruby does.

For example, as a Rubyist, I'm used to having my pick of code quality tools––Rubocop, Rubycritic and Flog are just a few that come to mind while I write this. Elixir doesn't yet provide such a wealth of options, but one code quality tool has proven to be well documented, well maintained and comprehensive. Credo can

show you refactoring opportunities in your code, complex and duplicated code fragments, warn you about common mistakes, show inconsistencies in your naming scheme and - if needed - help you enforce a desired coding style.

Credo is easy to use--simply add it to the dependencies of any Elixir project and run it from the command line like this:

$ mix credo

It will run through all of the files specified in your project's config/.credo.exs and check them against various code quality metrics. It will then print out a nicely formatted and prettily colored report marking the areas of your code that need improvement, explaining why those areas are not up to code (get it?) and suggesting changes.
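
If your project doesn't have a Credo config yet, `mix credo gen.config` will generate one. Its shape is roughly the following; the included paths and the check shown here are just illustrative, not a prescription:

# config/.credo.exs
%{
  configs: [
    %{
      name: "default",
      files: %{
        included: ["lib/", "test/"],
        excluded: []
      },
      checks: [
        # enable, disable or configure individual checks here
        {Credo.Check.Readability.ModuleDoc, false}
      ]
    }
  ]
}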

Credo is a great tool, but I wanted a way to quickly and easily run it against a number of projects, without needing to go through the work of cloning down the repo, adding Credo as a dependency, creating a Credo config file for the project, and running it myself. I wanted a quick, easy and modular way to execute Credo against any project that could be run from the command line or be packaged into a web application.

Enter Elixir Linter, an OTP application for running Credo against any GitHub repo from your command line. In this post, we'll walk through the build of this application step-by-step, paying particular attention to how OTP supervisors are leveraged to design a fast and fault-tolerant engine for running Credo.

In my next post, we'll take our OTP app and make it command line executable with the help of a CLI module and the Mix Escripts task. Then, stay tuned for a later post on incorporating our OTP app in a Phoenix application so we can run our tool on the web.

Let's get started!

OTP Architecture

Program Requirements

Before we start writing any code, let's lay out the architecture of our OTP application. We'll begin by breaking down the jobs that our app will have to be responsible for.

  1. Given a repo name, our app will need to clone it down.
  2. Given a cloned-down repo, our app will need to run Credo against that repo and collect the results.
  3. Once the repo has been linted for code quality, our app will need to output those results to the command line.
  4. Also once the repo has been linted for code quality, the application will need to remove the cloned repo from its cloned location.

Now that we understand the basic responsibilities of our application, let's identify some of the opportunities for the program to fail. This will help us to design our supervisor tree.

  1. The application could fail to clone down the repo.
  2. The application could fail to properly lint the repo.
  3. The application could fail to properly output the results of the lint.
  4. The application could fail to remove the cloned repo.

Our supervisor tree should account for these failure points by having a supervisor dispatch each discrete task. That way, if any task fails, its parent will be able to spin it back up and try again.

The Supervisor Tree

We will start up our OTP app by starting our top-level supervisor with an argument of the name of the repo we want to clone and lint.

We will have a top-level supervisor that runs two children: a worker in charge of storing the name of the repo we are trying to clone (our "repo name store") and a sub-supervisor.

This sub-supervisor will know the PID of the repo name store, and pass it to a child worker that does the heavy lifting of cloning and linting the repo. We'll call this our "server".

Our server will in turn supervise the main tasks of:

  • Cloning the repo to a temp folder
  • Linting the repo
  • Outputting the results
  • Removing the cloned repo from the temp folder
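
Sketched out (with the module names we'll define below), the tree we're aiming for looks roughly like this:

ElixirLinter.Supervisor
├── ElixirLinter.Store          (worker: holds the repo name)
└── ElixirLinter.SubSupervisor  (supervisor)
    └── ElixirLinter.Server     (worker: our "server")
        └── Task.Supervisor
            ├── clone-the-repo task
            ├── lint-the-repo task
            ├── output-the-results task
            └── clean-up task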

Ready to lint some code? Let's go.

Generating an OTP Application

Mix makes it easy to generate the skeleton for a basic OTP application.

$ mix new elixir_linter --sup

This generates a project with the following structure:

└── elixir_linter
    ├── README.md
    ├── config
    │   └── config.exs
    ├── lib
    │   └── elixir_linter.ex
    ├── mix.exs
    └── test
        ├── elixir_linter_test.exs
        └── test_helper.exs

The lib/elixir_linter.ex file is the entry point of our app. The call to ElixirLinter.start will start up our entire supervisor tree, so this is where we'll kick off our top-level supervisor by calling on its module. We'll keep it simple:

defmodule ElixirLinter do
  use Application

  def start(_type, repo_name) do
    {:ok, _pid} =   
      ElixirLinter.Supervisor.start_link([repo_name])
  end
end
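
Once all of the modules below are in place, we could exercise the whole tree from iex by calling start/2 ourselves with a hypothetical repo name (in the next post we'll drive this from the command line instead):

iex> ElixirLinter.start(:normal, "octocat/Hello-World")
# returns {:ok, pid} and kicks off the clone -> lint -> output -> clean-up pipeline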

We'll begin with our Supervisor module.

Building the Top-Level Supervisor

We'll define our top-level supervisor in a new subdirectory, lib/elixir_linter/.

# lib/elixir_linter/supervisor.ex

defmodule ElixirLinter.Supervisor do 
  use Supervisor

  def start_link(repo) do 
   # start children 
  end

  def init(_) do 
    # start supervising and institute supervision strategy
  end
end

First, we pull in the Supervisor module with the help of the use Supervisor line.

Then, in order to conform to and leverage that module, we define two functions: #start_link and #init.

We call the #start_link function to start up our supervisor, which will in turn invoke the #init callback function.

# lib/elixir_linter/supervisor.ex
...

def start_link(repo) do 
  result = {:ok, sup} = Supervisor.start_link(__MODULE__, [repo])
  start_workers(sup, repo)
  result
end

def start_workers(sup, repo) do 
  {:ok, store} = Supervisor.start_child(sup, worker(ElixirLinter.Store, [repo]))
  Supervisor.start_child(sup, supervisor(ElixirLinter.SubSupervisor, [store]))
end

def init(_) do 
  supervise [], strategy: :one_for_one
end

Let's break this down.

  • First, our start_link function is designed to take an argument of a GitHub repository name, which will eventually be passed in from the command line in the format: owner/repo_name.

  • start_link calls on Supervisor.start_link, passing it the name of the module, and the repo name as an argument. This starts our supervisor running.

  • Then we call on a helper function, start_workers, to start the children of our supervisor tree. One of our children is a worker process running our ElixirLinter.Store module (coming soon!), in which we will store the repo name. It gets started up like this:

Supervisor.start_child(sup, worker(ElixirLinter.Store, [repo]))

To the Supervisor.start_child call, we pass in our supervisor process PID, sup, and the worker it will supervise--the worker running ElixirLinter.Store. This will start the ElixirLinter.Store module running via ElixirLinter.Store.start_link(repo). We'll come back to that when we're ready to build the store.

Our start-up of the store worker returns a tuple with {:ok, store}, in which store represents the PID of the worker running the ElixirLinter.Store module.

Our second child process is itself a supervisor. When we build out that sub-supervisor in a bit, we'll teach it to run our server module, which will in turn run several tasks, as laid out in our earlier diagram. It gets started up by this line:

Supervisor.start_child(sup, supervisor(ElixirLinter.SubSupervisor, [store]))

Here, we pass Supervisor.start_child our supervisor process PID, sup, and the child spec supervisor(ElixirLinter.SubSupervisor, [store]), which starts up a child supervisor running our ElixirLinter.SubSupervisor module. That module will be started up automatically via its own #start_link function, with an argument of the store PID.

Now that our top-level supervisor knows how to get started up, and in turn start the worker running the store module and the sub-supervisor that will run the server module, we're ready to build out the next level of our supervisor tree.

The ElixirLinter.Store Module

Why Do we Need a Store? Fault Tolerance!

The purpose of this module is to act as a central repository for the name of the repo passed in to our supervisor tree when it starts up. We want to isolate the repo name and store it here in the second level of our supervisor tree so that the sub-supervisor child process can always retrieve the repo name should any of its children fail and need to be restarted. If a failure occurs that causes the server module running our repo-cloning code to die, for example, the sub-supervisor will be able to start it back up and retrieve the repo name from the store module.

Now that we know why we need it, let's build it!

Using Agents to Maintain State

Our store module has just one responsibility––hold on to the repo name that gets passed in to our top-level supervisor.

We'll use Elixir's Agent module to store the repo name as state.

# lib/elixir_linter/store.ex
defmodule ElixirLinter.Store do 
  def start_link(repo) do
    Agent.start_link(fn -> %{repo_name: repo} end, name: __MODULE__)
  end
end

When the store module is started up via start_link, it starts an agent running.

Agent.start_link takes in a first argument of a function, and the return value of that function becomes the state of the module.
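
If you haven't worked with Agents before, here's the pattern in miniature, outside of our app, using a made-up repo name:

iex> {:ok, pid} = Agent.start_link(fn -> %{repo_name: "octocat/Hello-World"} end)
iex> Agent.get(pid, fn state -> state[:repo_name] end)
"octocat/Hello-World"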

We know that our store module will need to respond with the repo name when asked by the sub-supervisor, so we'll build out a get_repo method that does just that.

# lib/elixir_linter/store.ex
defmodule ElixirLinter.Store do 
  def start_link(repo) do
    Agent.start_link(fn -> %{repo_name: repo} end, name: __MODULE__)
  end

  def get_repo(pid) do 
    Agent.get(pid, fn dict -> dict[:repo_name] end)
  end
end

Our get_repo function wraps a call to Agent.get, passing in the PID of the process whose state we want to query. Agent.get takes in a second argument of a function, which will be automatically passed an argument of the current state of the process whose PID is the first argument. Since the state of our store module process is the map returned by the function we passed to Agent.start_link, we can use the dict[:repo_name] access syntax to retrieve the repo name we stored earlier.

Now we're ready to build out the sub-supervisor process that will take in the store module process's PID and use it to retrieve the repo name, clone down the repo and lint it for code quality.

The Sub-Supervisor Module

The sub-supervisor itself is fairly simple: its job is to start up the main work horse of our application, the server module, and supervise it.

Recall that the top-level supervisor starts up the store worker and captures its PID to pass to the start-up of the sub-supervisor:

# lib/elixir_linter/supervisor.ex
{:ok, store} = Supervisor.start_child(sup, worker(ElixirLinter.Store, [repo]))
Supervisor.start_child(sup, supervisor(ElixirLinter.SubSupervisor, [store]))

This will automatically call ElixirLinter.SubSupervisor.start_link(store). So let's build out the sub-supervisor's start_link function to take in the store process's PID.

defmodule ElixirLinter.SubSupervisor do 
  use Supervisor 

  def start_link(store_pid) do 
    {:ok, _pid} = Supervisor.start_link(__MODULE__, store_pid)
  end

  def init(store_pid) do 
    child_processes = [worker(ElixirLinter.Server, [store_pid])]
    supervise child_processes, strategy: :one_for_one
  end
end

Our start_link function starts up the supervisor, passing in an argument of the PID of the store worker.

Calling Supervisor.start_link inside our own ElixirLinter.SubSupervisor module invokes our init function, which starts a worker running our server module and tells our sub-supervisor to supervise it.

Now that we see how and when our ElixirLinter.Server module gets started with an argument of the store worker's PID, let's build out our work horse module, ElixirLinter.Server.

The Work-Horse: ElixirLinter.Server

Before we start writing code, let's remind ourselves what our server module has to do for us.

It needs to:

  • Clone down the given repo to the tmp/ directory
  • Lint that repo for code quality (with the help of Credo)
  • Output the results to the command line
  • Remove the cloned down repo from the tmp/ directory

Of course it would be madness to have just one module handle all of these responsibilities, so we will rely on a number of helper modules. But our server module will be responsible for spinning up these helpers and supervising them as needed.

First things first, our start_link function:

# lib/elixir_linter/server.ex
defmodule ElixirLinter.Server do
  def start_link(store_pid) do
    repo_name = List.first(ElixirLinter.Store.get_repo(store_pid))
    {:ok, task_supervisor_pid} = Task.Supervisor.start_link()
    Agent.start_link(fn -> %{repo_name: repo_name, task_supervisor: task_supervisor_pid} end, name: __MODULE__)
  end

We're doing a few things here:

  • Retrieve the repo name from the worker running the ElixirLinter.Store module.
  • Start another supervisor process; this one will be used to supervise the cloning and linting tasks.
  • Store the repo name and the task supervisor process PID in this module's state, with the help of Agent.start_link.

Next up, we'll want to invoke functions that run those tasks. Our goal is for the following code to work:

# lib/elixir_linter/server.ex
defmodule ElixirLinter.Server do
  def start_link(store_pid) do
    repo_name = List.first(ElixirLinter.Store.get_repo(store_pid))
    {:ok, task_supervisor_pid} = Task.Supervisor.start_link()
    Agent.start_link(fn -> %{repo_name: repo_name, task_supervisor: task_supervisor_pid} end, name: __MODULE__)
    fetch_repo
    |> lint_repo
    |> process_lint
  end

Let's build out those functions now, then we'll build the modules that support them.

# lib/elixir_linter/server.ex
...

def fetch_repo do
  get_task_supervisor
  |> Task.Supervisor.async(fn ->
    get_repo
    |> ElixirLinter.RepoFetcher.fetch
  end)
  |> Task.await
end

def get_repo do 
  Agent.get(__MODULE__, fn dict -> dict[:repo_name] end)
end

def get_task_supervisor do 
  Agent.get(__MODULE__, fn dict -> dict[:task_supervisor] end)
end

Let's break down our fetch_repo function.

  • First, it uses a helper function, get_task_supervisor, to retrieve the task supervisor PID from state.
  • Then, it tells the task supervisor to supervise the execution of an anonymous function that does two things:
    • Fetch the repo name from state
    • Pass that repo name to the execution of our ElixirLinter.RepoFetcher.fetch function. This is the module where we'll build the "cloning down the repo" functionality. More on this later.
  • Lastly, we pass the return of calling Task.Supervisor.async, a Task struct, to Task.await, which will return to us the result of the anonymous function that our task supervisor executes. This will, in effect, return the return of ElixirLinter.RepoFetcher.fetch. We'll code this function later on to return the destination to which we've cloned the repo, i.e. the filepath to the cloned directory. In this way, we will be able to pass that filepath to our linting function so Credo knows which files to lint.

It's important to understand that we used Task.Supervisor.async and Task.await to wait on the execution of our asynchronous repo-cloning code and return the result of that code's execution.
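
If the async/await dance is new to you, here it is in miniature, outside of our app:

{:ok, sup} = Task.Supervisor.start_link()

task = Task.Supervisor.async(sup, fn -> 1 + 1 end)
Task.await(task)
#=> 2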

This return value gets piped into our lint_repo function.

Let's take a look at that function now.

...
def lint_repo(filepath) do
  get_task_supervisor
  |> Task.Supervisor.async(fn ->
    ElixirLinter.Linter.lint(filepath)
  end)
  |> Task.await
end

lint_repo behaves similarly to fetch_repo. It retrieves the task supervisor PID from state and uses it to spin up and supervise the execution of our linter module's lint method. We'll build out that module in a bit.

This function also uses Task.await to wait for and capture the return of executing our linting code so that it can be piped into our next function, the process_lint function, which will be responsible for outputting the results to the command line. So, when we build out our ElixirLinter.Linter.lint function, we'll need to make sure it returns the results of the code quality checks.

Lastly, our process_lint function:

...
def process_lint(results) do
  repo_name = get_repo
  worker = get_task_supervisor
  |> Task.Supervisor.async(fn -> 
    ElixirLinter.RepoFetcher.clean_up(repo_name)
  end)
  ElixirLinter.Cli.print_to_command_line(results)
  Task.await(worker)
end

This function will take in the results of the code quality checks run by Credo, and pass them to the ElixirLinter.Cli module to be output to the terminal. We'll also build out this module shortly.

This function is also responsible for running some code to "clean up" after ourselves, i.e. remove the cloned repo from the tmp/ directory into which we originally cloned it.

This functionality is once again managed by our task supervisor process.

Now that we have our server module spec-ed out, we'll build our helper modules.

Cloning the Repo

The ElixirLinter.RepoFetcher module is responsible for cloning down the given repo, as well as removing it after the code quality lint is complete.

We'll clone down our repo with the help of the Porcelain Elixir library, which will allow us to execute shell commands from inside our Elixir program. Think of it as the equivalent of Ruby's system method.
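
Once Porcelain is installed (below), running a shell command looks roughly like this; the result struct hands us the command's output and exit status:

iex> Porcelain.shell("echo hello")
%Porcelain.Result{err: nil, out: "hello\n", status: 0}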

First things first, we'll need to add Porcelain to our application's dependencies and make sure our app starts up the Porcelain application when it starts itself up.

# mix.exs
defmodule ElixirLinter.Mixfile do
  use Mix.Project

  def project do
    [app: :elixir_linter,
     version: "0.1.0",
     elixir: "~> 1.3",
     build_embedded: Mix.env == :prod,
     start_permanent: Mix.env == :prod,
     deps: deps()]
  end

  def application do
    [applications: [:logger, :porcelain]]
  end

  defp deps do
    [
      {:porcelain, "~> 2.0"}
    ]
  end
end

Then we'll run mix deps.get to install our new dependency.

Now we're ready to build out our RepoFetcher module.

Our main function is fetch. It needs to clone down the repo into a tmp/ directory, which we'll create as an empty directory in the root of our project.

 # lib/elixir_linter/repo_fetcher.ex

defmodule ElixirLinter.RepoFetcher do
  @dir "tmp"

  def fetch(repo) do
    repo
    |> get_repo_name
    |> clone_repo_to_tmp(repo)
  end

  def get_repo_name(repo) do
    String.split(repo, "/")
    |> List.last
  end

  def clone_repo_to_tmp(repo_name, repo) do
    Porcelain.shell("git clone https://#{Application.get_env(:elixir_linter, :github_oauth_token)}:x-oauth-basic@github.com/#{repo} #{@dir}/#{repo_name}")
    "#{@dir}/#{repo_name}"
  end

Notice that we've set a module attribute to store the name of the directory to which we are cloning repos: tmp.
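
Notice too that the clone URL reads a GitHub OAuth token from the application environment via Application.get_env(:elixir_linter, :github_oauth_token). One way to supply that value (this setup is my assumption, not something the libraries require) is from an environment variable in config/config.exs:

# config/config.exs
use Mix.Config

config :elixir_linter,
  github_oauth_token: System.get_env("GITHUB_OAUTH_TOKEN")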

The fetch function pipes the full name of the repo, owner/repo_name, into the get_repo_name function. This function grabs just the name of the repo, minus the owner's name, so that we are able to return the final destination of the repo, its cloned location in tmp/.

fetch calls on a helper function, clone_repo_to_tmp, which uses Porcelain to execute the git clone shell command.
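
With the token configured and network access available, calling it with a hypothetical repo would behave roughly like this:

iex> ElixirLinter.RepoFetcher.fetch("octocat/Hello-World")
"tmp/Hello-World"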

And that's it. Our server module's call to ElixirLinter.RepoFetcher.fetch will clone the repo to tmp/ and return the path to the cloned repo: tmp/some_great_elixir_repo. This filepath gets piped into the server module's call to ElixirLinter.Linter.lint. Let's build that out now.

Linting the Repo with Credo

Our Linter module will rely on Credo to check the repo for code quality. First things first, let's include Credo in our application dependencies and run mix deps.get.

defmodule ElixirLinter.Mixfile do
  use Mix.Project

  def project do
    [app: :elixir_linter,
     version: "0.1.0",
     elixir: "~> 1.3",
     build_embedded: Mix.env == :prod,
     start_permanent: Mix.env == :prod,
     deps: deps()]
  end

  def application do
    [applications: [:logger, :porcelain, :credo]]
  end

  defp deps do
    [
      {:credo, "~> 0.5", only: [:dev, :test]},
      {:porcelain, "~> 2.0"}
    ]
  end
end

Our Linter module will wrap up a call to Credo's Check.Runner module:

Credo.Check.Runner.run(parsed_source_files, config)

So, we'll need to pass in a list of the Elixir files in the given project, parsed by Credo.SourceFile.parse (which breaks them down into their constituent lines), along with the Credo config map that is the result of running the following Credo code with the filepath to a project that includes a valid Credo config file:

Credo.Config.read_or_default(filepath, nil, true)

This call to Credo.Config.read_or_default takes in an argument of a path to a directory. If the directory contains a config/.credo.exs file, Credo will read that file. If not, it will look for a config/.credo.exs file in the current project. This is perfect for us because it allows us to include a default config file in our own application, while respecting the Credo configuration that the individual projects we clone down may include.

Let's take this one step at a time. First, we'll collect a list of all the Elixir files in the given directory.

Listing All the Project Files

We'll build a function, list_all, that takes in the path to the project and iterates over its directories, recursively listing all the files and collecting them in a new list, provided they end in .ex or .exs.

# lib/elixir_linter/linter.ex

defmodule ElixirLinter.Linter do 
  def lint(filepath) do 
    source_files = list_all(filepath)
    |> Enum.map(&Credo.SourceFile.parse(File.read!(&1), &1))
  end

  def list_all(filepath) do
    _list_all(filepath)
  end

  defp _list_all(filepath) do
    cond do
      String.contains?(filepath, ".git") -> []
      true -> expand(File.ls(filepath), filepath)
    end
  end

  defp expand({:ok, files}, path) do
    files
    |> Enum.flat_map(&_list_all("#{path}/#{&1}"))
  end

  defp expand({:error, _}, path) do
    collect_file({is_elixir_file?(path), path})
  end

  defp collect_file({true, path}), do: [path]

  defp collect_file({false, _path}), do: []

  defp is_elixir_file?(path) do
    String.contains?(path, ".ex") || String.contains?(path, ".exs")
  end
end

We won't spend a lot of time going over our list_all code here. If you want to dig in deeper, check out my post on Building a Recursive Function To List All Files in a Directory.
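
To give a sense of what it produces, running it against a hypothetical cloned repo would return something like:

iex> ElixirLinter.Linter.list_all("tmp/Hello-World")
["tmp/Hello-World/mix.exs",
 "tmp/Hello-World/lib/hello_world.ex",
 "tmp/Hello-World/test/hello_world_test.exs"]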

Next up, we'll construct the Credo config map.

Credo Config

# lib/elixir_linter/linter.ex

defmodule ElixirLinter.Linter do 
  def lint(filepath) do 
    source_files = list_all(filepath)
    |> Enum.map(&Credo.SourceFile.parse(File.read!(&1), &1))
    
    config = Credo.Config.read_or_default(filepath, nil, true)
      |> Map.merge(%{skipped_checks: [], color: true})
  end

Running Credo Checks

Now we have the parsed source files and the config with which to run Credo:

# lib/elixir_linter/linter.ex

defmodule ElixirLinter.Linter do 
  def lint(filepath) do 
    source_files = list_all(filepath)
    |> Enum.map(&Credo.SourceFile.parse(File.read!(&1), &1))
    
    config = Credo.Config.read_or_default(filepath, nil, true)
      |> Map.merge(%{skipped_checks: [], color: true})

    Credo.Check.Runner.run(source_files, config)
  end

This returns the results of the check in the following format:

{results, config}

In which results is a list of maps, each map representing a single checked file with keys like :filename and :issues. The :issues key points to another list of maps, each map representing a found issue and including its line number of origin, name and description.
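
Sketched out using the key names just described (Credo's internals may name things slightly differently, so treat this as an illustration rather than the exact shape), the tuple looks something like this:

{[%{filename: "lib/some_module.ex",
    issues: [
      %{line_no: 1,
        name: "Credo.Check.Readability.ModuleDoc",
        description: "Modules should have a @moduledoc tag."}
    ]}],
 config}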

This tuple will get passed by our server module to our Cli module for output to the terminal via ElixirLinter.Cli.print_to_command_line({results, config}).

Printing The Results

Later on, our Cli module will grow to take input from the command line. For now, we'll focus on the print_to_command_line function which takes in an argument of the result tuple from our Linter.lint function call.

# lib/elixir_linter/cli.ex
defmodule ElixirLinter.Cli do
  def print_to_command_line({results, config}) do
    output = Credo.CLI.Output.IssuesByScope
    output.print_before_info(results, config)
    output.print_after_info(results, config, 0, 0)
  end
end

Our function is simple; it really just wraps up some existing Credo CLI code.

Removing the Cloned Repo

We're almost done! We just need to build the RepoFetcher.clean_up function, which will remove the cloned repo after Credo is done checking it.

# lib/elixir_linter/repo_fetcher.ex
...
def clean_up(repo) do
  IO.puts "Removing repo #{repo}......"
  repo
  |> get_repo_name
  |> delete_repo_if_cloned
end

def delete_repo_if_cloned(repo_name) do
  File.ls!(@dir)
  |> Enum.member?(repo_name)
  |> remove_repo(repo_name)
end

def remove_repo(true, repo_name), do: File.rm_rf("#{@dir}/#{repo_name}")
def remove_repo(false, _repo_name), do: nil

Here, we:

  • List the files in the tmp/ directory.
  • Check whether the repo name is among those files and pipe the resulting boolean into the remove_repo function, which uses pattern matching to either remove the repo if it is present, or do nothing if it is not.
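
Calling it with the hypothetical repo name from earlier prints the status line and, when the clone exists, returns the result of File.rm_rf (an {:ok, removed_paths} tuple):

iex> ElixirLinter.RepoFetcher.clean_up("octocat/Hello-World")
Removing repo octocat/Hello-World......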

And that's it!

Conclusion

In my next post, we'll turn our application into an executable with the help of Escripts. In the meantime, let's sum up what we've covered here.

We:

  • Built an OTP application that runs a supervisor tree.
  • Used that tree to store repository name data and run a server that fetches that data and uses it to clone down repositories and check them for code quality.
  • Used that tree to run and supervise tasks.
  • Integrated a third-party application, Porcelain, and leveraged it to clone down GitHub repos.
  • Integrated a third-party application, Credo, and leveraged it to lint and output code quality issues to the terminal.

One last disclaimer--I'm still very new to Elixir, and I'm sure there are improvements that can be made to the pattern laid out here. Feel free to share pointers or questions in the comments. Thanks and happy coding!
