POPPI (Pipeline for Orthology-based Primer PIcking) is a pipeline that first looks for possible intron positions according to a template sequence, and then designs primers near the positions found, in order to detect polymorphisms in the introns or the exons. The template sequence must be an ortholog of the sequence of interest. POPPI works with a transcript sequence or a peptide sequence, but we recommend using both and preferring the results obtained with the peptide sequence.
Three 'methods' are available to look for primers:
- method1: Look for primers in the exon
- method2: Look for primers surrounding one intron
- method3: Look for primers surrounding one exon
POPPI is executed on the results of FrameDP, a pipeline that predicts peptides from transcript sequences such as EST clusters.
POPPI runs GenomeThreader to find the intron positions. GenomeThreader is distributed under a restrictive licence: you must obtain a licence, and GenomeThreader must be configured before you use POPPI.
POPPI runs primer3 to find primers. A version of primer3 is included in the package; it has been compiled for x86_64 (AMD64) processors.
fastacmd and formatdb are also used by POPPI and are provided in the package too.
- Download POPPI-1.3
wget http://www.heliagene.org/POPPI/POPPI-Linux.1.3.tar.gz
- Get FrameDP
wget http://iant.toulouse.inra.fr/FrameDP/download/framedp-Linux-x86_64.1.0.3.tar.gz
gzip -cd framedp-Linux-x86_64.1.0.3.tar.gz | tar xvf -
ln -s framedp-1.0.3 FrameDP
cd FrameDP
setenv FRAMEDP $PWD (csh/tcsh) or export FRAMEDP=$PWD (sh/bash)
- Test the installation
$FRAMEDP/bin/FrameDP.pl --infile $FRAMEDP/data/HuSep2007Test --outdir $FRAMEDP/test
more $FRAMEDP/test/framedp.*.summary
more $FRAMEDP/test/framedp.*.pepdb.fa
- Install POPPI
cd ..
gzip -cd POPPI-Linux.1.3.tar.gz | tar xvf -
ln -s POPPI-1.3 POPPI
cd POPPI
setenv POPPI $PWD (csh/tcsh) or export POPPI=$PWD (sh/bash)
- Run this script to update the paths to the executables and files in the configuration files POPPI.cfg, GetExonsPositions.cfg and PrimerDesign.cfg.
$POPPI/bin/int/misc/config-pl
- Run the demo test; it runs POPPI on some clusters from the FrameDP test data
$POPPI/bin/int/POPPI.pl --project_dir $POPPI/test --cfg $POPPI/cfg/POPPI.cfg
- Verification: look at these files
ls $POPPI/test/GetExonsPositions
more $POPPI/test/GetExonsPositions/AT2G39730.HuCL00001C091.gff
ls $POPPI/test/PrimerDesign/postprocess
more $POPPI/test/PrimerDesign/postprocess/All_sequences_primers.xls
If everything is OK, you can adapt the pipeline to your needs.
- Command Line Interface (CLI)
The default configuration file is $POPPI/cfg/POPPI.cfg. Copy this file and edit your own copy.
The fields you are expected to change are:
- db: the path to the template database, a multi-FASTA file. The forward strand of each sequence must be the coding sequence
- source: the path to the orthology file: on each line, the id of the template sequence and the id of the orthologous sequence of interest, separated by a tab
AT2G39730 HuCL00001C091
AT3G01540 HuCL00001C057
- frameDP: the path to the output directory of FrameDP
- primer_cfg: file with parameters for primer3. See the primer3 manual
- cdna_gth_param: alignment parameters passed to gth for the genomic sequence; must be quoted when given on the command line. See the gth manual
- protein_gth_param: alignment parameters passed to gth for the peptide sequence; must be quoted when given on the command line. See the gth manual
- gep_options: parameters for GetExonsPositions.pl; must be quoted when given on the command line. See the GetExonsPositions.pl options
- pd_options: parameters for PrimerDesign.pl; must be quoted when given on the command line. See the PrimerDesign.pl options
!! cdna_gth_param, protein_gth_param, gep_options and pd_options must NOT be quoted in the configuration file.
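The orthology file described for the 'source' field can be checked before a run. The following is a minimal Python sketch, not part of POPPI: the function name read_orthology_pairs is hypothetical, and skipping blank lines is an assumption about what POPPI tolerates. It only verifies that each line holds exactly two tab-separated ids.

```python
def read_orthology_pairs(lines):
    """Parse 'template_id<TAB>sequence_id' lines into a list of pairs.

    Hypothetical helper, not shipped with POPPI.
    """
    pairs = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are harmless and can be skipped
        fields = line.split("\t")
        if len(fields) != 2 or not all(field.strip() for field in fields):
            raise ValueError(
                f"line {number}: expected two tab-separated ids, got {line!r}"
            )
        pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


# The two example pairs from this README, written with real tab characters.
example = ["AT2G39730\tHuCL00001C091\n", "AT3G01540\tHuCL00001C057\n"]
pairs = read_orthology_pairs(example)
print(pairs)
```

A line whose ids are separated by spaces instead of a tab raises ValueError, which is the most common way such a file gets damaged by copy and paste.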
- The methods of PrimerDesign.pl
For method1 (large exons), all regions of the sequence outside the exon are excluded, so primers are searched only inside the exon.
For method2, the last base of the exon is targeted, so that the primers surround the intron.
For method3 (small exons), the exon itself is targeted, so no primer is placed inside it; the aim is to surround the exon and the introns on either side.
With 'all', the three methods are run.
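The difference between the three methods can be pictured as target and excluded regions on the cluster sequence. The Python sketch below is only an illustration under stated assumptions: the function method_regions is hypothetical, coordinates are taken as 1-based and inclusive, and regions are written as (start, length) pairs, the form primer3 uses for targets and excluded regions. It does not reproduce the actual code of PrimerDesign.pl.

```python
def method_regions(method, exon_start, exon_end, seq_length):
    """Return (targets, excluded) as lists of (start, length) pairs.

    Hypothetical illustration; coordinates are 1-based and inclusive.
    """
    if not (1 <= exon_start <= exon_end <= seq_length):
        raise ValueError("exon coordinates must lie inside the sequence")
    if method == "method1":
        # Exclude everything outside the exon: primers stay inside it.
        excluded = []
        if exon_start > 1:
            excluded.append((1, exon_start - 1))
        if exon_end < seq_length:
            excluded.append((exon_end + 1, seq_length - exon_end))
        return [], excluded
    if method == "method2":
        # Target the last base of the exon: primers must flank the junction.
        return [(exon_end, 1)], []
    if method == "method3":
        # Target the whole exon: no primer can be placed inside it.
        return [(exon_start, exon_end - exon_start + 1)], []
    raise ValueError(f"unknown method: {method!r}")


# Made-up example: an exon covering bases 101..250 of a 600-base cluster.
for name in ("method1", "method2", "method3"):
    print(name, method_regions(name, 101, 250, 600))
```

With these made-up coordinates, method1 excludes (1, 100) and (251, 350), method2 targets (250, 1), and method3 targets (101, 150).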
The identifier of each primer is made of:
- The name of the transcript sequence
- The identifier of the method (X, Y and Z stand for the exon numbers):
    EX      for method1
    EXEY    for method2
    EXEYEZ  for method3
- The position of the primer on the transcript (5'->3')
- The length of the primer and the strand
All fields are separated by underscores.
Example: HuCL00023C004_E1E2_0078_21F
This primer was designed for the cluster HuCL00023C004 with method2 (surrounding the intron between E1 and E2); it starts at position 78 on the cluster, is 21 bases long, and lies on the forward strand.
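For downstream scripts it can be handy to split an identifier back into its fields. The following Python sketch is an assumption-laden illustration, not part of POPPI: the function parse_primer_id is hypothetical, the method is inferred from the number of E-regions in the second field, and the reverse strand is assumed to be written 'R' (only 'F' appears in this README).

```python
import re

# name _ regions (E1, E1E2, E1E2E3, E0..., ...EE) _ position _ length + strand
ID_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<regions>(?:E(?:\d+|E))+)"
    r"_(?P<position>\d+)_(?P<length>\d+)(?P<strand>[FR])$"
)


def parse_primer_id(primer_id):
    """Split a primer identifier into its fields (hypothetical helper)."""
    match = ID_PATTERN.match(primer_id)
    if match is None:
        raise ValueError(f"unrecognised primer identifier: {primer_id!r}")
    regions = re.findall(r"E(?:\d+|E)", match.group("regions"))
    if len(regions) > 3:
        raise ValueError(f"too many regions in {primer_id!r}")
    return {
        "name": match.group("name"),
        "regions": regions,
        "method": f"method{len(regions)}",  # EX, EXEY, EXEYEZ
        "position": int(match.group("position")),
        "length": int(match.group("length")),
        "strand": match.group("strand"),  # 'R' for reverse is an assumption
    }


parsed = parse_primer_id("HuCL00023C004_E1E2_0078_21F")
print(parsed)
```

For the example identifier this gives the cluster HuCL00023C004, regions E1 and E2, method2, position 78, length 21 and strand F, matching the description above.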
- Remarks
The script is case sensitive for the ids. They must be identical in the source file and in the directories produced by FrameDP.
POPPI formats the database itself, but if you want to format your database before using POPPI:
$POPPI/bin/ext/ncbi-blast/formatdb -i your_database -oT -pF
If FrameDP predicts more than one peptide for a sequence, the longest one is always used.
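The "longest peptide" rule can be illustrated on a small FASTA text. This Python sketch is not taken from POPPI: the function longest_record and the record names pep_a and pep_b are hypothetical, and how POPPI breaks ties between peptides of equal length is not documented here (this sketch keeps the first one).

```python
def longest_record(fasta_text):
    """Return (header, sequence) of the longest record in a FASTA text.

    Hypothetical helper; on ties the first record encountered is kept.
    """
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA record found")
    return max(records, key=lambda record: len(record[1]))


# Made-up peptides, for illustration only.
example = ">pep_a\nMKT\n>pep_b\nMKTAYIAK\nQR\n"
best = longest_record(example)
print(best)
```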
The reference for all the notations is the forward strand.
In the identifiers of the primers you will see two unusual notations: E0 and EE. They are NOT exons. E0 is the 5' part of the sequence before exon 1 (E1). It may be the 5'UTR, another exon, a part of E1, or anything else. EE is the same on the 3' end, after the last known exon. We therefore recommend NOT using a primer with E0 or EE in its identifier unless you have no other choice. Here is a small diagram that may make this clearer:
5' E0 E1 E2 EE 3'
_________________|_________________________|_______________________________|________________ cluster sequence
/////////////////////////// \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
/////////////////////////// \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
5' ___________///////////////////////////_____\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\______________________ 3'
| E1 | I1 | E2 | template sequence
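Primers that touch E0 or EE can be filtered out automatically. The Python sketch below is a hypothetical helper, not part of POPPI; it assumes the identifier layout described in this README (name, regions, position, length and strand, separated by underscores), and the two flagged identifiers in the example are invented for illustration.

```python
import re


def uses_uncertain_flank(primer_id):
    """Return True if the primer identifier involves E0 or EE.

    Hypothetical helper; assumes the layout name_regions_position_lengthStrand.
    """
    parts = primer_id.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unrecognised primer identifier: {primer_id!r}")
    # Split the regions field into tokens such as E0, E1, E12 or EE.
    tokens = re.findall(r"E(?:\d+|E)", parts[1])
    return any(token in ("E0", "EE") for token in tokens)


ids = [
    "HuCL00023C004_E1E2_0078_21F",  # example from this README
    "HuCL00023C004_E0E1_0012_20F",  # invented identifier
    "HuCL00023C004_E2EE_0400_22F",  # invented identifier
]
flags = [uses_uncertain_flank(primer_id) for primer_id in ids]
print(flags)
```

Tokenising the regions field, rather than searching for the substring "E0", avoids flagging an exon number such as E10 by mistake.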
- Dependencies
XML::Twig
Class::XML
Data::Dumper
Cwd
File::Basename
HTML::Entities
Getopt::Long
Bio::SeqIO
Bio::Seq
iANT libraries (included in the archive)
fastacmd (included in the archive)
formatdb (included in the archive)
primer3 (included in the archive)
sputnik (included in the archive)
genomethreader