In this vignette we demonstrate how to construct customized genome and annotation files using reform.

In this example, we will add a small construct that contains the fluorescent protein gene mCitrine and a selectable marker, kanMX, and specify its integration proximate to the GAP1 locus in yeast.

Step 1: Generating input files

In our lab, we use SnapGene to generate the map of the novel sequence and the corresponding sequence in fasta format. The Export Feature Data function in SnapGene requires a registered copy, so alternative tools should be used if one is not available.

Novel features map

This map includes all annotated features in the novel DNA. Inclusion of these annotations ensures that they will appear in the GFF.

Novel sequence

The sequence corresponding to the novel features should be in fasta format.
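As a quick sanity check before continuing, the sketch below counts the records in a fasta file and the total length of its sequence. The file name `novel_example.fa` and its contents are invented for illustration; substitute the file exported for your own construct. For a single construct we would expect one record.

```shell
#!/usr/bin/env bash
# Hypothetical toy fasta file; replace with your own exported sequence.
workdir=$(mktemp -d)
fasta="$workdir/novel_example.fa"
printf '>novel_example\nATGGTGAGCAAGGGC\nGAGGAGCTG\n' > "$fasta"

# Number of records = number of header lines starting with ">".
n_records=$(grep -c '^>' "$fasta")

# Sequence length = characters on non-header lines, with line breaks removed.
seq_len=$(grep -v '^>' "$fasta" | tr -d '\n\r' | wc -c | tr -d ' ')

echo "records: $n_records"
echo "length:  $seq_len bp"
```

The reported length can later be compared with the size that reform prints when it inserts the sequence.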

Generating the annotation file for novel features.

This is a two-step process.

First, the list of features is exported from SnapGene.

Features -> Export Feature Data...

The exported file should be saved as a .csv file.

Second, use the make_gff_from_snap() function to create a gff

The first time the code is run, you will need to download and install labtools from GitHub using the following code:

#library(devtools)
#install_github("GreshamLab/labtools")

Once installed, use the make_gff_from_snap function to generate the correctly formatted gff.

To learn more about the function use ?make_gff_from_snap

library(labtools)
# Convert the exported SnapGene feature table (.csv) into a gff for chromosome XI
labtools::make_gff_from_snap(
  "Features_from_mCitrine_KanMX.csv",
  chromosome = "XI",
  feature_source = "mCitrine_KanMX",
  output = "./mCitrine_KanMX_GAP1.gff"
)

The resulting gff should be correctly formatted and look like this:
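Independently of its exact contents, a GFF file should have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with start no greater than end. The sketch below checks this on a toy file. The two records and their coordinates are invented for illustration and are not the output of make_gff_from_snap; point `gff` at your own file instead.

```shell
#!/usr/bin/env bash
# Hypothetical toy GFF; the records and coordinates are invented.
workdir=$(mktemp -d)
gff="$workdir/example.gff"
printf 'XI\tmCitrine_KanMX\tgene\t1\t717\t.\t+\t.\tID=mCitrine\n' > "$gff"
printf 'XI\tmCitrine_KanMX\tgene\t1500\t2309\t.\t+\t.\tID=kanMX\n' >> "$gff"

# Count non-comment, non-empty lines that do not have exactly 9 fields
# or whose start (column 4) is greater than their end (column 5).
bad=$(awk -F'\t' '!/^#/ && NF > 0 && (NF != 9 || $4 + 0 > $5 + 0) { n++ } END { print n + 0 }' "$gff")

echo "malformed lines: $bad"
```

A count of zero only shows that the layout is plausible; it does not confirm that the feature coordinates are correct.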

Step 2: Generating a reformed genome.

The next step is to use the novel gff and novel fasta files as input to reform, which will insert them into the reference genome and generate genome and annotation files that incorporate the new features. In this case we will provide a unique upstream sequence and a unique downstream sequence, each as a fasta file. reform will identify the insertion site using these sequences.

less up.fa
echo
less down.fa
>upstream
ATACATCATTTACACCTCGCTCTGGGTCAAGTAATCAAAAAATACCTCGT
>downstream
CGAATATCTTCGACAAATCTGTCGCTTGGTTTATGTTTGACCTGATGTAT
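Because the insertion site is located by matching these sequences, it can be worth confirming that a flanking sequence really occurs only once. The sketch below counts forward-strand occurrences of a flank in a toy reference; the file `toy_ref.fa`, its contents, and the shortened flank are invented for illustration. It joins all sequence lines before searching, so for a real multi-chromosome reference one would first extract the chromosome of interest.

```shell
#!/usr/bin/env bash
# Hypothetical toy reference with a single record; replace with the
# sequence of the target chromosome.
workdir=$(mktemp -d)
ref="$workdir/toy_ref.fa"
printf '>toy\nTTGACCATAC\nATCATTTACA\nCCTCGGGATC\n' > "$ref"

# Shortened, illustrative flanking sequence.
flank="ATACATCATTTACACC"

# Join the sequence lines so matches that span line breaks are found,
# then count non-overlapping matches on the forward strand only.
count=$(grep -v '^>' "$ref" | tr -d '\n\r' | grep -o "$flank" | wc -l | tr -d ' ')

echo "occurrences: $count"
```

A count of 1 is what we would want here; 0 or more than 1 suggests the flank needs to be changed or lengthened. This check ignores the reverse strand and overlapping matches.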

We now run reform, providing values for all the required variables. It is also possible to specify the coordinate of the insertion site directly. Details on running reform can be found at https://gencore.bio.nyu.edu/reform/

module load reform/1

reform.py \
  --chrom="XI" \
  --upstream_fasta="Data/up.fa" \
  --downstream_fasta="Data/down.fa" \
  --in_fasta="Data/mCitrine_KanMX.fa" \
  --in_gff="Data/mCitrine_KanMX_GAP1.gff" \
  --ref_fasta="/scratch/work/cgsb/genomes/Public/Fungi/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa" \
  --ref_gff="/scratch/work/cgsb/genomes/Public/Fungi/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Saccharomyces_cerevisiae.R64-1-1.34.gff3"
No valid position specified, checking for upstream and downstream sequence
Proceeding to insert sequence mCitrine_KanMX  (3375 bp) from Data/mCitrine_KanMX.fa at position 513945 on chromsome XI
New fasta file created: Saccharomyces_cerevisiae.R64-1-1.dna.toplevel_reformed.fa
Preparing to create new annotation file
New .GFF3 file created: Saccharomyces_cerevisiae.R64-1-1.34_reformed.gff3

Step 3: Confirm files have been modified correctly.

The reformed GFF contains the new features at the appropriate location:
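One way to list the new features without opening the file is to filter on the source column. The sketch below does this on a toy GFF3 file whose records are invented for illustration. It assumes that the value passed as feature_source ends up in column 2 of the GFF, which is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical toy annotation; records and coordinates are invented.
workdir=$(mktemp -d)
gff="$workdir/toy_reformed.gff3"
printf 'XI\tsgd\tgene\t100\t900\t.\t+\t.\tID=geneA\n' > "$gff"
printf 'XI\tmCitrine_KanMX\tgene\t1000\t1716\t.\t+\t.\tID=mCitrine\n' >> "$gff"
printf 'XI\tmCitrine_KanMX\tgene\t2500\t3309\t.\t+\t.\tID=kanMX\n' >> "$gff"

# Assumed: the source column (column 2) carries the feature_source label.
src="mCitrine_KanMX"

# Print the matching records, then count them.
awk -F'\t' -v src="$src" '$2 == src' "$gff"
n_new=$(awk -F'\t' -v src="$src" '$2 == src' "$gff" | wc -l | tr -d ' ')

echo "new features: $n_new"
```

The start and end columns of the printed records can then be compared with the insertion position that reform reported.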

And the reformed fasta contains the novel sequence at the appropriate location, which is adjacent to the upstream sequence:

less up.fa
>upstream
ATACATCATTTACACCTCGCTCTGGGTCAAGTAATCAAAAAATACCTCGT
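The same check can be scripted by searching for the junction, that is, the upstream sequence followed directly by the first bases of the novel sequence. The sketch below uses a toy file; `reformed.fa`, the shortened upstream sequence, and the novel-sequence prefix are all invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical toy "reformed" fasta; replace with the real output file.
workdir=$(mktemp -d)
printf '>toy_reformed\nGGGATACATC\nAAACCCGGGT\nTTACGT\n' > "$workdir/reformed.fa"

# Illustrative values: end of the upstream flank and start of the insert.
upstream="ATACATC"
novel_start="AAACCC"

# Join sequence lines so a junction that spans a line break is still found.
joined=$(grep -v '^>' "$workdir/reformed.fa" | tr -d '\n\r')

if printf '%s' "$joined" | grep -q "${upstream}${novel_start}"; then
  junction="found"
else
  junction="missing"
fi

echo "junction: $junction"
```

For a full genome, extracting the target chromosome first keeps the joined string small and avoids matches that run across record boundaries.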
