Fou++

The fou++ code (download) applies spectral analysis to tiling array time series data. Smoothing is performed by grouping probes in bins of a fixed size in base pairs (cf. the variable $size in the run.pl script below). The analysis can be run in two modes:

"F" (free): the annotation is not used. Instead, a sliding window is run across the whole chromosome and returns a score for each genomic position.
"A" (annotation): exons are treated as single blocks, i.e. all probes in any given exon are smoothed together. Intergenic and intronic regions are treated as in the "F" mode.
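
The difference between the two modes can be illustrated with a minimal Python sketch. It is not the fou++ implementation: the function name group_probes, the (position, exon_id) tuple layout, and the choice of a window centred on each probe are assumptions made for illustration only.

```python
from collections import defaultdict


def group_probes(probes, size=200, mode="F"):
    """Return, for each probe, the indices of the probes smoothed with it.

    probes: list of (position, exon_id) tuples; exon_id is None for
            intergenic or intronic probes (hypothetical layout).
    size:   window width in base pairs (plays the role of $size).
    mode:   "F" ignores exon_id; "A" pools all probes of an exon.
    """
    # Index the probes of every exon once, so "A" mode can pool them.
    by_exon = defaultdict(list)
    for i, (_, exon_id) in enumerate(probes):
        if exon_id is not None:
            by_exon[exon_id].append(i)

    half = size / 2
    groups = []
    for pos, exon_id in probes:
        if mode == "A" and exon_id is not None:
            # Annotation mode: the whole exon is treated as one block.
            groups.append(list(by_exon[exon_id]))
        else:
            # Free mode, and non-exonic probes in "A" mode.
            # Assumption: the window is centred on the probe position.
            groups.append([j for j, (p, _) in enumerate(probes)
                           if abs(p - pos) <= half])
    return groups


probes = [(100, None), (180, None),
          (400, "exon1"), (450, "exon1"), (700, "exon1")]
free = group_probes(probes, size=200, mode="F")
annotated = group_probes(probes, size=200, mode="A")
```

In this toy input the probe at position 700 is smoothed alone in "F" mode, whereas in "A" mode it is pooled with the two other probes of the same exon.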

./run.pl is a wrapper that processes the sample data files (included in the .tgz archive):

LLCrick.txt+A7.sample
LLWatson.txt+A7.sample

Note: these files contain the normalized raw data (cf. e.g. LLWatson_all_chrom.txt), to which annotation has been added in columns 14-20.

and produces output files:

LLWatson.txt.fou.A.V7.RW200
LLCrick.txt.fou.A.V7.RW200

Note: in this sample run, probes were pruned when the mean expression was below 3 and the standard deviation was below 0.25. See the fou command in the script below for how these cutoffs are specified.
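
The pruning rule can be written as a small predicate. This is a sketch under stated assumptions, not the code in fou.cc: it reads the "and" in the note literally (both conditions must hold), uses the sample standard deviation, and the function name is_pruned is hypothetical. The exonic example row listed further down (mean 2.745, sd 0.387, positive chromosome number) is consistent with this reading, since a low mean alone did not cause pruning there.

```python
from statistics import mean, stdev


def is_pruned(expression, mean_cutoff=3.0, sd_cutoff=0.25):
    """Return True if the mean AND the standard deviation are both below cutoff.

    expression: expression values of one probe across the time points.
    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return mean(expression) < mean_cutoff and stdev(expression) < sd_cutoff


# Twelve time points, low and flat: pruned.
flat_low = [2.0, 2.1, 1.9] * 4
# Low mean but strongly varying: kept under the "and" reading.
low_but_variable = [1.0, 4.0] * 6

flat_low_pruned = is_pruned(flat_low)
variable_pruned = is_pruned(low_but_variable)
```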

The columns contain (column number, example value, description; here the example of an intergenic probe):

1 6 probe number

2 all identifier (all for intergenic)

3 all dummy identifier

4 none dummy identifier

5 ig secondary identifier (ig for intergenic)

6 500 position on the chromosome

7 3.332 mean of expression across the 12 time points

8 0.643 sd of expression

9 1 chromosome (negative if the probe was pruned)

10 5.000 number of probes smoothed at that position

11 0.010 F24 score

12 21.727 phase in hours

13 7.580e-01 p-value associated with the F24 score

14 N non-coding (N) or coding (C)

The columns contain (column number, example value, description; here the example of an exonic probe):

1 678 probe number

2 AT1G01070 identifier (here the gene name)

3 all dummy identifier

4 none dummy identifier

5 tu6 secondary identifier (tu for transcription unit, intron for introns)

6 40660 position on the chromosome

7 2.745 mean of expression across the 12 time points

8 0.387 sd of expression

9 1 chromosome (negative if the probe was pruned)

10 3.000 number of probes smoothed at that position

11 0.066 F24 score

12 20.172 phase in hours

13 3.522e-01 p-value associated with the F24 score

14 C non-coding (N) or coding (C)
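
For downstream analysis, the 14 columns can be read into named fields. The following Python sketch assumes that the output files are whitespace-separated with one probe per line, which the description above does not state explicitly. The names FouRow and parse_row are hypothetical, and the input line is assembled from the exonic example values listed above.

```python
from typing import NamedTuple


class FouRow(NamedTuple):
    probe: int
    identifier: str      # gene name, or "all" for intergenic probes
    secondary: str       # e.g. "ig", "tu6"
    position: int
    mean: float
    sd: float
    chromosome: int      # always positive here; the sign is moved to `pruned`
    pruned: bool
    n_smoothed: float
    f24: float
    phase_hours: float
    p_value: float
    coding: bool         # True for "C", False for "N"


def parse_row(line):
    """Parse one output line; assumes 14 whitespace-separated fields."""
    f = line.split()
    if len(f) != 14:
        raise ValueError(f"expected 14 columns, got {len(f)}")
    chrom = int(f[8])
    return FouRow(
        probe=int(f[0]), identifier=f[1], secondary=f[4],
        position=int(f[5]), mean=float(f[6]), sd=float(f[7]),
        chromosome=abs(chrom), pruned=chrom < 0,
        n_smoothed=float(f[9]), f24=float(f[10]),
        phase_hours=float(f[11]), p_value=float(f[12]),
        coding=(f[13] == "C"),
    )


row = parse_row(
    "678 AT1G01070 all none tu6 40660 2.745 0.387 1 3.000 "
    "0.066 20.172 3.522e-01 C"
)
```

The two dummy identifier columns (3 and 4) are skipped, and the sign convention of column 9 is split into a positive chromosome number and a separate pruned flag.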

++++++++++++++++
This is the run.pl wrapper (included in the fou_v1.tgz distribution):
#!/usr/bin/env perl
# runs the entire sequence of steps necessary to generate the cycling scores
use warnings;
use strict;

my $size  = 200;                     # smoothing window (in base pairs)
my @chrs  = (1,2,3,4,5);             # list of chromosomes
my $file  = "";
my $type  = "A";                     # type of analysis: "A" uses the annotation-based binning, "F" ignores the annotation
my @bases = ("LLWatson","LLCrick");  # which files to process

# compile before we start
#`g++ fou.cc -o fou -ggdb`;
`make fou`;

foreach my $base (@bases) {
    # the original large data file
    my $orig  = "$base.txt+A7.sample";
    my $slope = "none";
    #my $slope = "slopes/win$size.chr$chr.$type.exp";
    if (1) {
        # output file with the scores, e.g. LLWatson.txt.fou.A.V7.RW200
        my $data = "$base.txt.fou.$type.V7.RW$size";
        # fou now processes all chromosomes at once
        if (1) {
            # stderr of fou is redirected to this file
            my $f24s_file = "$base.win$size.f24s";
            my $fou_command =
                # use this version for the mean<3, sd<0.25 cutoff
                "./fou 1 0 3.0 $size 0 $type 0 $slope 0.25 $orig 2> $f24s_file 1> $data";
            print "$fou_command\n";
            `$fou_command`;
        }