Chapter 1 Scotti et al. (in prep) scaffolds preparation

This chapter prepares the Guianan scaffolds from Scotti et al. (in prep).

1.1 Filtering scaffolds over \(1kbp\)

Because the assembly of the scaffolds from Scotti et al. (in prep) may be of poor quality, we first keep only scaffolds longer than 1000 bp.
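The length filter can be sketched as follows. This is a minimal illustration in plain Python, not the script used for the analysis: the function names `read_fasta` and `filter_by_length` and the toy sequences are hypothetical, and the input is assumed to be a standard FASTA file read line by line.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length: int = 1000) -> List[Tuple[str, str]]:
    """Keep records whose sequence is strictly longer than min_length bp."""
    return [(h, s) for h, s in records if len(s) > min_length]


# Toy input (hypothetical): one 40 bp scaffold and one 1200 bp scaffold.
toy = [">short", "ACGT" * 10, ">long", "ACGT" * 300]
kept = filter_by_length(read_fasta(toy))
print([header for header, _ in kept])
```

On a real file, the same functions could be applied to an open file handle instead of the toy list.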

1.4 Removing scaffolds with multiple BLAST matches to consensus sequences from Torroba-Balmori et al. (unpublished)

We used the consensus sequences for French Guianan reads from Torroba-Balmori et al. (unpublished), previously assembled with ipyrad. As a first, crude approach, we keep only the first sequence of each locus in the consensus loci file and convert it to FASTA (see script below). We then align those consensus sequences against the merged scaffolds from Scotti et al. (in prep) with blastn, in order to detect scaffolds with repetitive regions, revealed by consensus sequences that map multiple times. Those scaffolds are saved as a list to be removed from the final list of selected scaffolds.
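The conversion step can be sketched as below. The original script is not reproduced here; this is a minimal sketch under stated assumptions. It assumes the usual ipyrad `.loci` layout, in which each sample line holds a sample name followed by whitespace and the sequence, and each locus is closed by a line starting with `//`. The function names, the `locus_<n>` naming scheme, and the removal of alignment gaps (`-`) are illustrative choices, not part of the source.

```python
def first_sequences(loci_lines):
    """Return one (name, sequence) pair per locus: the first sequence listed."""
    records = []
    first = None  # first sequence seen in the current locus
    for line in loci_lines:
        if line.startswith("//"):
            # Separator line: the current locus is complete.
            if first is not None:
                records.append((f"locus_{len(records)}", first))
            first = None
        elif line.strip() and first is None:
            # Sample line: "<sample name> <whitespace> <sequence>".
            # Assumption: alignment gaps are dropped before blasting.
            first = line.split()[-1].replace("-", "")
    return records


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy .loci content (hypothetical sample names and sequences).
toy = [
    "ind1     ACGTAC-GT",
    "ind2     ACGTACAGT",
    "//             *    |0|",
    "ind2     TTGACA",
    "ind3     TTGACC",
    "//            *  |1|",
]
fasta_text = to_fasta(first_sequences(toy))
print(fasta_text)
```

The resulting FASTA text could then be written to a file and used as the query for blastn against the merged scaffolds.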

In total, 542 scaffolds from Scotti et al. (in prep) match consensus sequences from Torroba-Balmori et al. (unpublished). However, several scaffolds obtained multiple matches, which we cannot use for probes. We therefore exclude the whole scaffold if it is shorter than 2000 bp, or only the scaffold region matching the read if the scaffold is longer than 2000 bp.
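The exclusion rule above can be sketched as follows, assuming the blastn results were written in tabular format (`-outfmt 6`), whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. Here "multiple matches" is interpreted as more than one hit on the same scaffold, which is an assumption; the function name, scaffold names, and coordinates are hypothetical. Note that for hits on the minus strand, sstart is larger than send, so the coordinates are reordered before use.

```python
from collections import defaultdict


def classify_multimatch(hit_lines, scaffold_lengths, min_cut_length=2000):
    """Split multi-matched scaffolds into those removed whole and those cut.

    Returns (remove, cut): a list of scaffold names to drop entirely and a
    dict mapping scaffold name to the sorted list of regions to cut out.
    """
    hits = defaultdict(list)
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        scaffold = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits have sstart > send: store as (low, high).
        hits[scaffold].append((min(sstart, send), max(sstart, send)))

    remove, cut = [], {}
    for scaffold, regions in hits.items():
        if len(regions) < 2:
            continue  # single match: scaffold kept as is
        if scaffold_lengths[scaffold] < min_cut_length:
            remove.append(scaffold)
        else:
            cut[scaffold] = sorted(regions)
    return remove, cut


def hit(query, scaffold, sstart, send):
    """Build one toy tabular BLAST line (other fields are placeholders)."""
    return "\t".join([query, scaffold, "98.0", "81", "1", "0", "1", "81",
                      str(sstart), str(send), "1e-30", "140"])


# Toy data (hypothetical).
lengths = {"scafA": 1500, "scafB": 3000, "scafC": 2500}
toy_hits = [
    hit("read1", "scafA", 100, 180),
    hit("read2", "scafA", 400, 480),
    hit("read3", "scafB", 2900, 2980),
    hit("read4", "scafB", 667, 586),   # minus-strand hit
    hit("read5", "scafC", 10, 90),
]
remove, cut = classify_multimatch(toy_hits, lengths)
print(remove)
print(cut)
```

If "multiple matches" instead means a consensus sequence hitting several scaffolds, the grouping key would be the query (first column) rather than the scaffold.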

Figure 1.1: Number of matches with Torroba consensus reads vs gene width.

Table 1.1: Scaffolds to cut due to multiple read matches.
Scaffold width remove cut
Ivan_2018_sympho47_2L1_012_scaffold197676__8.6 4993 4993-4932
Ivan_2018_sympho47_2L1_012_scaffold246452__6.6 3103 2980-3058
Ivan_2018_sympho47_2L1_012_scaffold26367__7.7 2168 2118-2168
Ivan_2018_sympho47_2L1_012_scaffold463128__4.7 2188 667-586
Ivan_2018_sympho47_2L1_008_scaffold309475__2.1 3525 342-292

The following scaffolds will be removed because they have multiple matches and a length \(<2000bp\): 2L1_012_scaffold645876__7.5, 2L1_012_scaffold176548__7.1, 2L1_012_scaffold21882__4.9, 2L1_012_scaffold9236__6.0. The others will be cut (see Table 1.1).

1.5 Total filtered scaffolds