neat is the R package that implements NEAT, the Network Enrichment Analysis Test which is presented in Signorelli, M., Vinciotti, V., Wit, E. C. (2016). NEAT: an efficient network enrichment analysis test. BMC Bioinformatics, 17:352.
The article is freely available from the website of BMC Bioinformatics.
Network enrichment analysis is an extension of traditional gene enrichment analysis (GEA) tests, which are typically used to provide a characterization of a target gene set by relating it to gene sets (such as Gene Ontologies or KEGG pathways) whose function is already known.
A known limitation of GEA tests is that they ignore associations and dependences between genes. The purpose of network enrichment analysis is thus to integrate GEA tests with information on known relations between genes, represented by means of a gene network.
Loosely speaking, we can say that network enrichment analysis incorporates genetic networks, with their information on gene dependences, into gene enrichment tests. Hence, the name “network” enrichment analysis.
Let’s first have a quick look at an example of how a network enrichment analysis can be carried out with NEAT.
The analysis will typically consist of three steps: preparation of the data, computation of the test and inspection of the results.
Let’s start by loading yeast
, a list which contains the
data that we will need for the analysis:
## [1] "esr1" "esr2" "goslimproc" "kegg" "yeastnet"
## [6] "ynetgenes"
Let’s say that we are interested to know whether a set of
differentially expressed genes, yeast$esr2
, can be related
to some functional gene sets contained in yeast$goslimproc
.
Let’s focus the attention on two of these processes, namely ‘response to
heat’ and ‘response to starvation’.
Before we can proceed with the analysis, we have to create two lists
of gene sets, one (which we will call induced_genes
)
containing the set of differentially expressed genes and the other
(called functional_sets
) with the functional sets of
interest:
induced_genes = list('ESR 2' = yeast$esr2) # set of differentially expressed genes
#(ESR 2 is the set of induced ESR genes)
functional_sets = yeast$goslimproc[c(72,75)] # two functional gene sets of interest:
#response to heat and to starvation
Besides these two lists, we will need two further objects:
yeast$yeastnet
, a two-column matrix that contains
YeastNet (a network incorporating known functional couplings between
yeast genes, see the help page ?yeast
for more
details);yeast$ynetgenes
, a vector containing the names of all
the genes that are present in the network.The idea behind NEAT is that if two gene sets are related, then in the network we expect them to be connected by a larger (or smaller) number of links than we would expect to observe by chance. Our null hypothesis, thus, is that if A and B are unrelated, then links are randomly placed between the two groups, so that the total number of links between A and B can be assumed to follow an hypergeometric distribution.
If, however, the number of links that we actually observe between A and B turns out to be significantly different from what we would expect to get if links were placed randomly, then we take this fact as potential evidence of a relation between the two groups and we say that there is ``enrichment” between them.
The computation of the test can be done with the function
neat
as follows:
The results are now saved in the object test
, which we
can display with the command print
:
## A B nab expected_nab pvalue adjusted_p
## 1 ESR 2 response_to_heat 86 96.9 0.2518 0.2518
## 2 ESR 2 response_to_starvation 459 331.4 0.0000 0.0000
## conclusion
## 1 No enrichment
## 2 Overenrichment
From the table we can see that the set of differentially expressed
genes (ESR 2
) is not enriched with respect to the set of
genes involved in response to heating, whereas it is overenriched with
respect to the set of genes that are responsible for response to
starvation (that is to say, the observed number of links, 459, is
significantly higher than what we would expect to get by chance,
i.e. 331). Thus, we can conclude that genes in ESR 2
are
regulated when the yeast cell is exposed to starvation, but not when
exposed to heating.
The core of the package is the function neat
:
neat(alist, blist, network, nettype, nodes,
alpha = NULL, mtc.type = 'fdr',
anames = NULL, bnames = NULL)
The fundamental arguments of the function are:
alist
and blist
, two lists of gene
sets;
network
, which can be specified in three different
formats;
nettype
, either 'undirected'
or
'directed'
;
nodes
, a vector containing the names of all nodes in
the network.
Moreover, three optional arguments are alpha
, which
allows to specify the significance level of the test, and
anames
and bnames
(they can be used to name
the elements of alist
and blist
, if not
already named). Starting from package version 1.2.0, a further argument
mtc.type
has been added to allow computation of multiple
testing corrections within neat.
As a (toy) example, let’s consider a partially directed network with 7 nodes defined by the following adjacency matrix
A = matrix(0, nrow=7, ncol=7)
labels = letters[1:7]
rownames(A) = labels; colnames(A) = labels
A[1,c(2,3)]=1; A[2,c(5,7)]=1;A[3,c(1,4)]=1;A[4,c(2,5,7)]=1;A[6,c(2,5)]=1;A[7,4]=1
print(A)
## a b c d e f g
## a 0 1 1 0 0 0 0
## b 0 0 0 0 1 0 1
## c 1 0 0 1 0 0 0
## d 0 1 0 0 1 0 1
## e 0 0 0 0 0 0 0
## f 0 1 0 0 1 0 0
## g 0 0 0 1 0 0 0
Let’s consider three sets of genes {a,e}, {c,g} and {d,f} and suppose we want to test whether there is enrichment from the first two sets to the third one.
First of all, let’s create a vector for each of the three sets:
As we want to know whether there is enrichment from set1
and set2
to set3
, we can create two gene
lists, one (alist
) containing set1
and
set2
and the other (blist
) containing
set3
:
Above we have defined the network with its adjacency matrix
A
. However, the network can be passed to neat
in three alternative formats:
## 7 x 7 sparse Matrix of class "dgCMatrix"
## a b c d e f g
## a . 1 1 . . . .
## b . . . . 1 . 1
## c 1 . . 1 . . .
## d . 1 . . 1 . 1
## e . . . . . . .
## f . 1 . . 1 . .
## g . . . 1 . . .
an igraph
graph;
a two-column matrix where every row represents and edge (for directed and mixed networks, parent nodes must be in the first column, and child nodes in the second), e.g.:
## [,1] [,2]
## [1,] "a" "b"
## [2,] "a" "c"
## [3,] "b" "e"
## [4,] "b" "g"
## [5,] "c" "a"
## [6,] "c" "d"
## [7,] "d" "b"
## [8,] "d" "e"
## [9,] "d" "g"
## [10,] "f" "b"
## [11,] "f" "e"
## [12,] "g" "d"
Set the argument nettype
equal to
'undirected'
if an undirected network is at hand, and equal
to 'directed'
if you are considering a directed or
partially directed network.
Once you have prepared the lists of gene sets and the network, what
you need is to run neat
, without forgetting to specify the
correct nettype
(here nettype = 'directed'
)
and the labels of nodes (here nodes = labels
):
test1 = neat(alist = alist, blist = blist, network = A,
nettype = 'directed', nodes = labels)
print(test1)
## A B nab expected_nab pvalue adjusted_p
## 1 set 1 set 3 0 0.3333 0.68181818 0.68181818
## 2 set 2 set 3 2 0.5000 0.04545455 0.09090909
If you want to add to the results a column specifying the conclusion
of the test (overenrichment, no enrichment or underenrichment) for a
given significance level, you use the option alpha
. Note
that the conclusion will obviously depend on the type of multiple
correction you choose (compare test2 and test3 below):
# results using FDR correction (default since version 1.2.0)
test2 = neat(alist = alist, blist = blist, network = A,
nettype = 'directed', nodes = labels, alpha = 0.05, mtc.type = 'fdr')
print(test2)
## A B nab expected_nab pvalue adjusted_p conclusion
## 1 set 1 set 3 0 0.3333 0.68181818 0.68181818 No enrichment
## 2 set 2 set 3 2 0.5000 0.04545455 0.09090909 No enrichment
# results without multiple testing correction (mtc.type = 'none')
test3 = neat(alist = alist, blist = blist, network = A,
nettype = 'directed', nodes = labels, alpha = 0.05, mtc.type = 'none')
print(test3)
## A B nab expected_nab pvalue adjusted_p conclusion
## 1 set 1 set 3 0 0.3333 0.68181818 0.68181818 No enrichment
## 2 set 2 set 3 2 0.5000 0.04545455 0.04545455 Overenrichment
The aim of this vignette is to provide a quick introduction to the computation of NEAT using R. Here I focused my attention on the fundamental aspects that one needs to use the package.
Further details, functions and examples can be found in the manual of the package.
The description of the method is available in an article which you can read here.