
Instructions

Description

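The variable function analysis performs a statistical test (Mann-Whitney U) on a given set of data. For each variable, it compares the VALUEs of the lines that contain that variable with the VALUEs of the lines that do not.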

Data Input

It is easiest to copy and paste data directly from Excel, because Excel inserts the tab characters for you.

The data must be in the form "VAR1 VAR2 VAR3 VALUE", where the gaps are TABS and each line ends in a numeric value. Each VAR can be anything: it could be single base pairs of a gene, e.g. "A C T G 0.4", or amino acids, allele variants, etc. The VALUE must be the last item on each line and must be a number. It measures the function of that sequence of VARs as a whole.
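As an illustration of the format rules above, here is a minimal Python sketch that splits pasted text into its VARs and a numeric VALUE per line. The function name `parse_input` is hypothetical, and the sketch assumes that blank lines (for example a trailing newline copied from Excel) should simply be skipped; the tool itself may handle them differently.

```python
def parse_input(text):
    """Split pasted text into (VARs, VALUE) pairs, one per non-blank line.

    Hypothetical helper. Assumption: blank lines are skipped; every other line
    must hold at least one VAR followed by a numeric VALUE, separated by TABs.
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline copied from Excel
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_number}: no TAB found; expected VARs followed by a VALUE")
        *variables, value = fields  # the VALUE is always the last field
        try:
            rows.append((variables, float(value)))
        except ValueError:
            raise ValueError(f"line {line_number}: last field {value!r} is not a number") from None
    return rows


print(parse_input("A\tC\tT\tG\t0.4\n"))  # [(['A', 'C', 'T', 'G'], 0.4)]
```

A line separated by spaces instead of TABs raises an error here rather than being misread as a single VAR.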

Note: the Mann-Whitney U test uses approximate p-values under either of the following conditions:

  • A VALUE is non-unique (e.g. multiple lines contain '0').
  • The smaller of the two populations (n-with, n-without) has N=100 or more.
Whether each test was done with the exact or approximate p-value method is noted in the rightmost column.
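The two conditions above can be expressed as a small decision function. This is a hypothetical sketch (`p_value_method` is not part of the tool), and it assumes `all_values` holds every VALUE that enters a given test, i.e. the with and without groups combined.

```python
def p_value_method(all_values, n_with, n_without):
    """Return "approximate" or "exact" according to the two conditions in the note.

    Hypothetical helper. all_values: every VALUE entering the test
    (with and without groups combined).
    """
    has_ties = len(set(all_values)) < len(all_values)  # some VALUE occurs on more than one line
    large_sample = min(n_with, n_without) >= 100        # smaller population has N >= 100
    return "approximate" if has_ties or large_sample else "exact"


print(p_value_method([0.3, 0.7, 0.3], n_with=1, n_without=2))  # approximate (0.3 is repeated)
print(p_value_method([0.2, 0.7, 0.3], n_with=1, n_without=2))  # exact
```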

Example

Say you are trying to determine which HLA types are associated with higher CD4 counts. The variables you record per individual are:

  1. A1
  2. A2
  3. B1
  4. B2
  5. C1
  6. C2

For each individual's set of variables A1-C2, you also have a function value that measures the ratio of CD4 compared to a normal healthy cell. All of these individuals are infected with HIV.

A1	A2	B1	B2	C1	C2	% CD4
A02:26	A03:01:01G	B07:02:01G	B40:01:01G	C03:04:01G	C07:02:01G	0.3
A01:01:01G	A02:01:01G	B08:01:01G	B15:01:01G	C03:04:01G	C07:01:01G	0.7
A01:01:01G	A02:01:01G	B08:01:01G	B57:01:01G	C06:02:01G	C07:01:01G	0.8
A02:01:01G	A03:01:01G	B14:02:01	B15:34	C03:04:01G	C08:02:01	0.3
A02:01:01G	A24:03:01G	B38:01:01	B51:01:01G	C12:03:01G	C14:02:01	0.45
A02:01:01G	A02:01:01G	B14:02:01	B40:01:01G	C03:04:01G	C08:02:01	0.3
A01:01:01G	A01:01:01G	B08:01:01G	B57:01:01G	C06:02:01G	C07:01:01G	0.75
A11:01:01G	A23:01:01G	B07:02:01G	B51:01:01G	C04:01:01G	C15:02:01G	0.2
A01:01:01G	A03:01:01G	B27:05:02G	B57:01:01G	C01:02:01G	C06:02:01G	0.8
A01:01:01G	A02:01:01G	B08:01:01G	B44:02:01G	C02:02:02G	C07:01:01G	0.7
A01:01:01G	A11:01:01G	B08:01:01G	B35:01:01G	C04:01:01G	C07:64	0.9
A02:01:01G	A24:02:01G	B15:01:01G	B15:07:01G	C01:02:01G	C03:03:01G	0.4
A01:01:01G	A25:01:01G	B08:01:01G	B39:01:01G	C07:01:01G	C12:03:01G	0.6

You can copy and paste the table (not including the headers) into the text area and click submit. The output lists each category with its medians, counts, and a p-value indicating whether the presence or absence of that variable makes a significant difference to the function.
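The following Python sketch illustrates the per-category with/without comparison. It is not the tool's implementation, and several details are assumptions: a category is any VAR string that appears anywhere on a line (a position-specific version would key on (column, VAR) pairs instead), categories present on every line are skipped, and p-values always come from the normal approximation to U with a tie correction and no continuity correction. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mann_whitney_u(xs, ys):
    """U for xs: number of pairs (x, y) with x > y, counting ties as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def normal_approx_p(xs, ys):
    """Two-sided p-value from the normal approximation to U, with a tie correction.

    Assumption: no continuity correction is applied.
    """
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: each group of t identical VALUEs contributes t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(list(xs) + list(ys)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every VALUE is identical, so the groups cannot differ
    z = (mann_whitney_u(xs, ys) - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def analyze(rows):
    """Compare VALUEs of lines with vs. without each VAR.

    rows: list of (list_of_VARs, VALUE) tuples.
    Assumption: a category is any VAR string appearing anywhere on a line.
    """
    results = []
    for category in sorted({var for variables, _ in rows for var in variables}):
        with_vals = [value for variables, value in rows if category in variables]
        without_vals = [value for variables, value in rows if category not in variables]
        if not without_vals:
            continue  # present on every line: nothing to compare against
        results.append({
            "category": category,
            "n_with": len(with_vals),
            "n_without": len(without_vals),
            "median_with": median(with_vals),
            "median_without": median(without_vals),
            "p_value": normal_approx_p(with_vals, without_vals),
        })
    return results


# A1 and A2 columns of four rows from the example table.
rows = [
    (["A01:01:01G", "A02:01:01G"], 0.7),
    (["A01:01:01G", "A03:01:01G"], 0.8),
    (["A02:01:01G", "A03:01:01G"], 0.3),
    (["A11:01:01G", "A23:01:01G"], 0.2),
]
for r in analyze(rows):
    print(r["category"], r["n_with"], r["n_without"],
          r["median_with"], r["median_without"], round(r["p_value"], 3))
```

Using the normal approximation for every test keeps the sketch short. For small samples without ties the tool reports exact p-values instead, so its numbers can differ from this sketch.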