When I push a commit to main, I want my changes to be deployed automatically. For this, we need a CI/CD tool. Obviously we could use an existing one, but where's the fun in that?
┌─────┐            ┌────────────┐           ┌────────────┐
│ You │  commit()  │ Repository │  magic()  │ Deployment │
│     ├───────────►│            ├──────────►│            │
└─────┘            └────────────┘           └────────────┘
The basic idea of CI/CD is simple: you commit something to your repository, some magic happens, and your code is live on production. What steps the magic consists of differs per CI tool, and also depends on how closely the runner is integrated with the repository host. When they are closely integrated (for instance, when using GitLab and GitLab Runner), you could have the Git host process an incoming commit, parse a job or pipeline definition (usually a YAML file), put the resulting CI jobs in a queue, and then have the runner pull jobs from the queue and execute them. The flow looks roughly like this:
┌─────┐            ┌────────────┐              ┌───────────┐
│ You │  commit()  │ Repository │  add-jobs()  │ Job Queue │
│     ├───────────►│            ├─────────────►│           │
└─────┘            └────────────┘              └────▲──────┘
                                                    │
                                               pull-job()
                                                    │
                                               ┌────┴──────┐             ┌────────────┐
                                               │ Runner    │  run-job()  │ Deployment │
                                               │           ├────────────►│            │
                                               └───────────┘             └────────────┘
As we are building a standalone runner, not a Git host, we have to process our job definitions on the runner side. So rather than the above, our flow will look roughly like this:
                      ┌────────────┐
                      │ Repository │
                      │            │
                      └─────▲──────┘
                            │
                    pull-repository()
                            │
┌────────┐             ┌────┴──────┐             ┌────────────┐
│ You or │  trigger()  │ Runner    │  run-job()  │ Deployment │
│ CRON   ├────────────►│           ├────────────►│            │
└────────┘             └───────────┘             └────────────┘
To keep our sanity intact, we will start this exercise with a rather limited scope: for now we will trigger our runner manually. The runner will check our repositories hosted on Sourcehut for build definitions, then run the appropriate jobs. These jobs will not be much more than shell scripts.
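In code, the runner's top-level flow boils down to something like the sketch below. The types and function names here are placeholders for the steps just described, not the runner's actual API.

// Rough sketch of the runner's top-level flow; types and functions are
// placeholders, not the actual API.
struct Repository;
struct Job;

fn fetch_repositories_with_job_definitions() -> Vec<Repository> { todo!() }
fn clone_or_update(_repository: &Repository) { todo!() }
fn parse_job_definition(_repository: &Repository) -> Job { todo!() }
fn run_job(_repository: &Repository, _job: &Job) { todo!() }

fn main() {
    // Ask sourcehut which of our repositories contain a Runner.toml
    for repository in fetch_repositories_with_job_definitions() {
        // Make sure we have an up-to-date local copy of the repository
        clone_or_update(&repository);

        // Parse Runner.toml and run the job it defines
        let job = parse_job_definition(&repository);
        run_job(&repository, &job);
    }
}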
The project layout:
runner/
├─ config/
│  ├─ token          <-- an access token for our git host (sourcehut)
│
├─ graphql/          <-- we need graphql for our sourcehut client
│
├─ src/
│  ├─ clients/
│  │  ├─ git.rs
│  │  ├─ sourcehut.rs
│  │
│  ├─ model/
│  │  ├─ job.rs
│  │  ├─ repository.rs
│  │
│  ├─ main.rs        <-- main fetches a list of repositories with job definitions
│  ├─ runner.rs      <-- runner actually runs the jobs
│
├─ Runner.toml
!!! During the building of this project I've done some things in a way that I would not recommend for any kind of serious project. For instance, the first version of the sourcehut client used std::process::Command to call a curl binary, rather than using an HTTP client. It does not anymore; the git client, however, still uses Command to run git commands. You have been warned! !!!
To find out what jobs we actually want to run, we query Sourcehut. For this we need a GraphQL query like the one listed below and an access token (SGFoYSBubywgdGhpcyBpcyBub3QgYW4gYWN0dWFsIGFjY2VzcyB0b2tlbiwgYnV0IG5pY2UgdHJ5Lg==). The query returns a list of repositories for the user the access token is associated with (me), with an entry for path whenever a repository contains a Runner.toml file, our job definition. We also select the user's canonicalName, which we will need later on when turning the results into repositories.
query RepositoriesQuery {
  me {
    canonicalName
    repositories {
      results {
        name
        path(revspec: "HEAD", path: "Runner.toml") {
          id
        }
      }
    }
  }
}
We then use the graphql_client crate to build the query. graphql_client uses a derive macro to generate structs, so that we can interact with GraphQL in a typed way. The derive generates a repositories_query module containing, among other things, the Variables and ResponseData types used below.
#[derive(GraphQLQuery)]
#[graphql(
    schema_path = "./graphql/schema.graphql",
    query_path = "./graphql/repositories.graphql"
)]
struct RepositoriesQuery;
We then use ureq to send the request. I chose ureq as it is a simple HTTP client that avoids async.
let token = fs::read_to_string("./config/token").expect("missing token file");
// trim the trailing newline most editors add, or the Authorization header breaks
let token = token.trim();

let request_body =
    json! { RepositoriesQuery::build_query(repositories_query::Variables {}) };
let request_body_string = request_body.to_string();

let response = ureq::post("https://git.sr.ht/query")
    .set("Authorization", &format!("Bearer {}", &token))
    .set("Content-Type", "application/json")
    .send_string(&request_body_string)
    .unwrap()
    .into_string()
    .unwrap();

let response_body: Response<repositories_query::ResponseData> =
    serde_json::from_str(&response).unwrap();
let data: RepositoriesQueryMe = response_body.data.unwrap().me;
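Note that unwrapping data will blow up whenever the query fails. A GraphQL response can carry an errors array alongside (or instead of) data, so a slightly more careful version, sketched below, would check graphql_client's errors field first; the runner doesn't do this yet.

// Sketch: report GraphQL-level errors before unwrapping the data field.
if let Some(errors) = &response_body.errors {
    for error in errors {
        eprintln!("graphql error: {}", error.message);
    }
}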
Finally we filter the response, collecting only the repositories that contain a Runner.toml file.
let repos_with_job = data
    .repositories
    .results
    .iter()
    .filter(|repo| repo.path.is_some())
    .map(|repo| Repository::new(&data.canonical_name, &repo.name))
    .collect::<Vec<_>>();
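Repository is our own model from src/model/repository.rs. Its exact definition isn't shown here, but conceptually it just derives a clone URI and a local working folder from the owner and repository names, roughly along these lines (the field names and the URL format are assumptions for illustration):

use std::path::PathBuf;

// Sketch of the Repository model; field names and the git.sr.ht URL
// format are assumptions, not the actual definition.
pub struct Repository {
    pub uri: String,
    pub local_path: PathBuf,
}

impl Repository {
    pub fn new(owner: &str, name: &str) -> Self {
        Repository {
            // owner is the user's canonical name, e.g. "~someone"
            uri: format!("https://git.sr.ht/{}/{}", owner, name),
            local_path: PathBuf::from("./repositories").join(name),
        }
    }
}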
We use our git "client" to clone repositories. At the moment, for the sake of simplicity, we "just" clone the repositories to a subfolder. We don't delete the repositories just yet, so we run git clone / fetch / pull to be sure we have the latest version of the main branch. To run git commands we use Command to invoke the git binary present on the machine, where local_path is the subfolder to clone the repository into.
let output = Command::new("git")
.arg("clone")
.arg(&repository.uri)
.arg(repository.local_path.as_os_str())
.output()
.unwrap();
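The fetch / pull part works the same way when the subfolder already exists; for instance (a sketch, and --ff-only is my own choice here, not necessarily what the runner uses):

// Sketch: if the repository was cloned on a previous run, update it
// instead of cloning again. --ff-only avoids accidental merge commits.
if repository.local_path.exists() {
    Command::new("git")
        .current_dir(&repository.local_path)
        .args(["pull", "--ff-only"])
        .status()
        .unwrap();
}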
Once we have the repository we can deserialize Runner.toml using our Job struct (the Deserialize derive is what the toml crate hooks into):
#[derive(Deserialize)]
pub struct Job {
    pub script: Option<String>,
}
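Reading and parsing the file is then only a few lines with the toml crate (a sketch, with error handling as blunt as everywhere else in this post):

// Sketch: read Runner.toml from the cloned repository and deserialize
// it into our Job struct using the toml crate.
let job: Job = toml::from_str(
    &fs::read_to_string(repository.local_path.join("Runner.toml"))
        .expect("missing Runner.toml"),
)
.expect("invalid Runner.toml");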
Then we use Command again to run the script, where sourcehut_access_token is our one and only "pipeline variable".
let output = Command::new("/bin/sh")
.current_dir(&repository.local_path)
.env("sourcehut_access_token", &token)
.arg("-c")
.arg(format!("{}", build.script.unwrap()))
.output()
.unwrap();
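output captures the script's stdout, stderr and exit status, and we should at least check that exit status so a failing job doesn't masquerade as a successful deployment. A minimal sketch:

// Sketch: surface the job's output and fail loudly on a non-zero exit
// code, so a broken job doesn't silently pass.
print!("{}", String::from_utf8_lossy(&output.stdout));
eprint!("{}", String::from_utf8_lossy(&output.stderr));
if !output.status.success() {
    panic!("job failed with {}", output.status);
}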
We get to write our first job now! We use the environment variable set by calling Command::new(..).env() to authenticate our call to Sourcehut, and upload a tarball containing our generated blog.
script = """
cd generator
cargo +nightly run
cd ..
tar -cvzf site.tar.gz -C www .
curl --oauth2-bearer "${sourcehut_access_token}" -Fcontent=@site.tar.gz https://pages.sr.ht/publish/ron.srht.site
"""
Never! But we now have our blog with a static site generator, and a custom tool to deploy it.
The code for this runner lives in the runner repository.