Vect Cookbook

Anchored substring extraction:

You have a short key string, say 20 bases, that you would like to find in longer input sequences. Once you have found each instance of it, you would like to extract a substring anchored at that location (say 100 bases upstream and 100 bases downstream). Assume you have separately downloaded the related data files and saved them into a single directory. Here is what can be done:

0. Load a sample file into Vect;
1. Create rules that will concatenate all DNA sequence fragments into single-line sequences, assuming you have multiple sequences in your files;
2. Pipe your source sequence data into a simple user rule with the following content:

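sub { # Do not name your subroutine to avoid name conflicts.

   my ($source) = @_;
   return if !defined($source);

   my @outputs;
   my $start = 0;
   # Find each occurrence of the key string, scanning from left to right.
   while (($start=index($source, "tttattaa", $start))>=0) {
      my $left = $start-100;            # up to 100 bases upstream of the match
      $left = 0 if $left<0;             # clamp at the start of the sequence
      my $length = $start-$left+120;    # upstream part + 20-base key + 100 downstream
      push @outputs, substr($source, $left, $length);
      $start += 120;                    # resume the search past the extracted region
   }
   return (@outputs);
}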

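Just change "tttattaa" to your 20-base substring. The constant 120 is the 20-base key plus 100 downstream bases, so adjust it if your key has a different length. Note that index() matches case-sensitively.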

3. Pipe the data from the above rule to the output, wrapping lines as needed.
4. Generate your Perl code.
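
The steps above can also be sketched in plain Perl, outside of Vect, to check the logic on a small input. This is only an illustration under stated assumptions: the helper names join_fragments, extract_anchored and wrap_sequence are hypothetical and are not what Vect generates, the fragment lines are assumed to contain only bases and whitespace, and the key and flank sizes are shortened so the result is easy to verify by eye.

```perl
use strict;
use warnings;

# All three helpers are hypothetical illustrations, not Vect's generated code.

# Step 1: join the fragment lines of one sequence into a single line.
sub join_fragments {
    my (@lines) = @_;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;                     # drop newlines and spacing
    return $seq;
}

# Step 2: return a window around every occurrence of $key, with up to
# $flank bases on each side of the match.
sub extract_anchored {
    my ($source, $key, $flank) = @_;
    return () if !defined $source || !defined $key || !length $key;

    my @windows;
    my $span  = length($key) + $flank;    # key plus downstream flank
    my $start = 0;
    while (($start = index($source, $key, $start)) >= 0) {
        my $left = $start - $flank;
        $left = 0 if $left < 0;           # clamp at the sequence start
        push @windows, substr($source, $left, $start - $left + $span);
        $start += $span;                  # resume after the extracted region
    }
    return @windows;
}

# Step 3: wrap a sequence into lines of at most $width characters.
sub wrap_sequence {
    my ($seq, $width) = @_;
    return unpack "(A$width)*", $seq;
}

# Toy input: a 4-base key with 3 flanking bases on each side.
my $seq   = join_fragments("aaccggtt\n", "acgtttgg\n", "ccaa\n");
my @found = extract_anchored($seq, "acgt", 3);
for my $window (@found) {
    print "$_\n" for wrap_sequence($window, 5);
}
# prints "gttac" then "gtttg"
```

Passing the key and flank size as arguments avoids the hard-coded 120 of the user rule, at the cost of no longer being a drop-in anonymous subroutine for Vect.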

Note that Vect does not create files; it always sends its output to the console. You can, however, redirect the output of the Vect-generated Perl program to a file using the I/O redirection operator '>', for example

          perl_program file1 file2 file3 ... > single_output_file

on a Unix-like machine (including Mac OS X).
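
Redirection works because the program writes everything to standard output, whichever input file a line came from. The following is a minimal sketch of that pattern, not Vect's actual generated code: process_handle is a hypothetical name, and printing each line's length stands in for the real processing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generated program: read lines from an input
# handle and print one result line per input line to an output handle.
sub process_handle {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} length($line), "\n";   # placeholder processing
    }
}

# Process every file named on the command line; all results go to STDOUT,
# so a single '>' redirect collects them into one output file.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot open $file: $!";
    process_handle($in, \*STDOUT);
    close $in;
}
```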

An example is attached here. Download it to your computer, open it in Vect, and then load a GenBank report file to see the results.

Last modified June 13, 2008. All rights reserved.