fast fasta processing in go

This thing I built: go-fasta

github.com/CNuge/go-fasta

I spend a lot of time working with fasta files, the standard format for storing biological sequence information (things like DNA or protein sequences). For a little side project, I wanted to create a tool to streamline how I manipulate these files. Some tasks, like merging files, are relatively simple, but splitting files, summarizing sequence lengths and similar tasks require more complex UNIX commands that I would have to dig up every time I needed them. I also wanted to streamline getting new data in fasta format: instead of downloading a few dozen separate files and then merging them, I wanted to specify a set of unique sequence IDs and have the sequences downloaded into a single file in my current working directory. With this in mind, I set out to design a command line executable named go-fasta that would allow me to do the following:

• fasta file merging (take many files and create a single one)
• fasta file splitting (split a big file into single-sequence fasta files)
• fasta file retrieval from NCBI (query NCBI and download new sequence data)
• fasta file summary (report each sequence's type, DNA or protein, its length and, for nucleic acids, its GC content)
• fasta file sorting (alphabetically sort a fasta file)

I programmed this project in Go because I am currently developing my skills in the language. This had the pleasant side effect of letting me use one of Go's big strengths, concurrency, to parallelize the reading, writing and calculations on multiple files. I have made a version of the program (and the underlying Go package that it uses) available on my GitHub page, where you can find installation instructions and documentation on how to use the program's command line flags to interact with fasta files.

Cameron Nugent
Research Scientist