Mark III Systems Blog

Bio-IT World 2019 Hackathon: Building Pipelines and having a BLAST in Boston!

On Monday and Tuesday, 4/15 and 4/16, I had the opportunity to participate in a hackathon at the Bio-IT World conference in Boston.  The experience was amazing.  The hackathon was put on by the NCBI hackathon team of Ben Busby (@DCGenomics), Kaitlyn Barago (@KaitlynMBarago), and Allissa Dillman (@DCHackathons).  We have worked with this team at other hackathons that Mark III Systems has sponsored.

The overarching theme of the hackathon was FAIR data principles in science.  FAIR stands for:

F – Data is Findable

A – Data is Accessible

I – Data is Interoperable

R – Data is Re-usable

(https://www.force11.org/group/fairgroup/fairprinciples)

My team’s topic was “BLAST, Pipelines, and FAIR”.  If you are not familiar with BLAST, it is a free software product, developed by NCBI, that allows you to search for DNA or protein sequences in pre-made or custom databases.  BLAST was by far the software package I used most in my graduate work.  I am a huge fan.

(https://blast.ncbi.nlm.nih.gov/Blast.cgi)
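For readers who have only used the web version, here is a minimal sketch of what a command-line search can look like with `blastn` from the standalone BLAST+ suite. The file names, the database name, and the helper function are hypothetical placeholders, not anything from our hackathon project, and the e-value cutoff is just an example.

```python
import shlex


def build_blastn_command(query_fasta, database, output_path, evalue=1e-5):
    """Return the argument list for a nucleotide search with BLAST+ blastn.

    All paths and the database name are supplied by the caller; the values
    used below are hypothetical examples.
    """
    return [
        "blastn",
        "-query", query_fasta,    # FASTA file with the sequences to search for
        "-db", database,          # pre-made or custom BLAST database
        "-evalue", str(evalue),   # only report hits at or below this e-value
        "-outfmt", "6",           # tabular output, one hit per line
        "-out", output_path,      # where the hits are written
    ]


if __name__ == "__main__":
    cmd = build_blastn_command("query.fasta", "my_custom_db", "hits.tsv")
    # Print the command instead of running it, since BLAST+ may not be installed.
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could be handed to `subprocess.run` later without any shell quoting problems.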

I was fortunate that our team leader was Tom Madden, who is the head of the BLAST team at NCBI!

Our project was to create a re-usable workflow that automates a bioinformatics pipeline so that anyone can run it in a standard environment, such as Linux.  The only thing that has to change is the set of input files.  For more detail on the actual pipeline and our final presentation, see our team’s GitHub page.

(https://github.com/NCBI-Hackathons/BLAST-Pipelines-and-FAIR)

The presentation is in the Slides folder.

Our pipeline was developed using CWL (Common Workflow Language).  CWL is an open standard for describing workflows built from command-line tools.  All configuration information, such as data file paths, is stored in YAML files.  YAML is widely used for configuration files in other tools as well (for example, Docker Compose and Kubernetes).

In the end, we had one very nice CWL file that ran the entire pipeline and three YAML configuration files.  The whole workflow could be run with a single command from the command line.
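To give a feel for how a CWL file and a YAML configuration file fit together, here is a rough sketch. It assumes the workflow is run with `cwltool`, the CWL reference runner (this post does not say which runner we used), and the file names and the `query` input name are made up for illustration. The real files are in the GitHub repository.

```python
def render_job_yaml(file_inputs):
    """Render a CWL job file that maps input names to File objects.

    Assumes plain paths with no characters that need YAML quoting.
    """
    lines = []
    for name, path in file_inputs.items():
        lines.append(f"{name}:")
        lines.append("  class: File")    # tells the runner this input is a file
        lines.append(f"  path: {path}")  # the only part a new user has to edit
    return "\n".join(lines) + "\n"


def build_cwltool_command(workflow, job_file):
    """Return the single command that runs a workflow with one job file."""
    return ["cwltool", workflow, job_file]


if __name__ == "__main__":
    # Hypothetical names; swap in the real workflow and inputs.
    print(render_job_yaml({"query": "query.fasta"}))
    print(" ".join(build_cwltool_command("pipeline.cwl", "job.yml")))
```

The point of the split is that the CWL file stays the same from run to run, while the YAML file carries everything that changes, such as the input paths.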

Overall, it was a great experience.  I met some great people from varied backgrounds and enjoyed the camaraderie and cooperation between all of the teams.  I can’t wait for next year’s hackathon!

Big thanks to all of my teammates!

Amanda Ruby, Software Engineer/Bioinformatics Analyst at Rheonix, Inc.  @AmandaRubyBio 

Tom Madden, Team Lead for BLAST at the NCBI.  @tom6931 

Alexander Jung, Head of Digitalization Biologicals Development CMC at Boehringer Ingelheim

Matt Doherty, Founder at Resolute.ai, @ResoluteAI

Jody Burks, Developer Advocate, Quantum Computing Ambassador IBM, @JodyBurksPhD