
Generalised Coupled Tensor Factorisation

Y. Kenan Yılmaz, A. Taylan Cemgil, Umut Şimşekli
Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
[email protected], {taylan.cemgil, umut.simsekli}@boun.edu.tr

Abstract
We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions, including special cases such as Tweedie's distributions corresponding to β-divergences. By bounding the step size of the Fisher scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is then easily extended to address problems where multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem.

1 Introduction
A fruitful modelling approach for extracting meaningful information from highly structured multivariate datasets is based on matrix factorisations (MFs). In fact, many standard data processing methods of machine learning and statistics, such as clustering, source separation, independent component analysis (ICA), nonnegative matrix factorisation (NMF) and latent semantic indexing (LSI), can be expressed and understood as MF problems. These MF models also have well-understood probabilistic interpretations as probabilistic generative models; indeed, many of the standard algorithms mentioned above can be derived as maximum likelihood or maximum a-posteriori parameter estimation procedures. It is also possible to do a full Bayesian treatment for model selection [1]. Tensors appear as a natural generalisation of matrix factorisation when observed data and/or a latent representation have several semantically meaningful dimensions. Before giving a formal definition, consider the following motivating example:
X_1(i,j,k) ≈ Σ_r Z_1(i,r) Z_2(j,r) Z_3(k,r),      X_2(j,p) ≈ Σ_r Z_2(j,r) Z_4(p,r),      X_3(j,q) ≈ Σ_r Z_2(j,r) Z_5(q,r)      (1)

where X_1 is an observed 3-way array and X_2, X_3 are 2-way arrays, while Z_α for α = 1…5 are the latent 2-way arrays. Here the 2-way arrays are just matrices, but this is easily extended to objects having an arbitrary number of indices. As the term N-way array is awkward, we prefer the more convenient term tensor. Here, Z_2 is a shared factor coupling all models. As the first model is a CP (Parafac) model while the second and the third are MFs, we call the combined factorisation the CP/MF/MF model. Such models are of interest when one can obtain different views of the same piece of information (here Z_2) under different experimental conditions. Singh and Gordon [2] focused on a similar problem called collective matrix factorisation (CMF) or multi-matrix factorisation for relational learning, but only for matrix factors and observations; in addition, their generalised Bregman divergence minimisation procedure assumes matching link and loss functions. For coupled matrix and tensor factorisation (CMTF), [3] recently proposed a gradient-based all-at-once optimisation method as an alternative to alternating least squares (ALS) optimisation and demonstrated their approach for a CP/MF coupled model. Similar models are used for protein-protein interaction (PPI) problems in gene regulation [4]. The main motivation of the current paper is to construct a general and practical framework for the computation of tensor factorisations (TF), by extending the well-established theory of Generalised Linear Models (GLM). Our approach is also partially inspired by probabilistic graphical models: our computation procedures for a given factorisation have a natural message passing interpretation. This provides a structured and efficient approach that enables very easy development of application-specific custom models, priors or error measures, as well as algorithms for joint factorisations where an arbitrary set of tensors can be factorised simultaneously. Well-known models of multiway analysis (Parafac, Tucker [5]) appear as special cases, and novel models and associated inference algorithms can automatically be developed. In [6], the authors take a similar approach to tensor factorisations as ours, but that work is limited to KL and Euclidean costs, generalising the MF models of [7] to the tensor case. It is possible to generalise this line of work to β-divergences [8], but none of these works addresses the coupled factorisation case, and they consider only a restricted class of cost functions.

2 Generalised Linear Models for Matrix/Tensor Factorisation


To set the notation and our approach, we briefly review GLMs, following closely the original notation of [9, ch. 5]. A GLM assumes that a data vector x has conditionally independently drawn components x_i according to an exponential family density

p(x_i) = exp{ (x_i θ_i − b(θ_i))/σ² + c(x_i, σ²) },      ⟨x_i⟩ ≡ x̂_i = ∂b(θ_i)/∂θ_i,      var(x_i) = σ² ∂²b(θ_i)/∂θ_i²      (2)

Here, θ_i are canonical parameters and σ² is a known dispersion parameter, x̂_i is the expectation of x_i, and b(·) is the log partition function, enforcing normalisation. The canonical parameters are not estimated directly; instead one assumes a link function g(·) that links the mean x̂_i of the distribution to a linear predictor, g(x̂_i) = l_i^T z, where l_i^T is the i-th row of a known model matrix L, z is the parameter vector to be estimated, and A^T denotes the matrix transpose of A. The model is linear in the sense that a function of the mean is linear in the parameters, i.e., g(x̂) = Lz. A Linear Model (LM) is the special case of a GLM that assumes normality, x_i ∼ N(x_i; x̂_i, σ²), as well as linearity, which implies the identity link g(x̂_i) = x̂_i = l_i^T z with l_i known. A log-linear model (e.g., Poisson regression) assumes a log link, g(x̂_i) = log x̂_i = l_i^T z; here log x̂_i and z have a linear relationship [9].

The goal in classical GLM is to estimate the parameter vector z. This is typically achieved via a Gauss-Newton method (Fisher scoring). The necessary objects for this computation are the log likelihood, its gradient and the Fisher information (the negative expected Hessian of the log likelihood). These are easily derived as

L = Σ_i { c(x_i, σ²) + [x_i θ_i − b(θ_i)]/σ² },      ∂L/∂z = (1/σ²) Σ_i (x_i − x̂_i) w_i g_x̂(x̂_i) l_i      (3)

−⟨∂²L/∂z∂z^T⟩ = (1/σ²) L^T D L,      ∂L/∂z = (1/σ²) L^T D G (x − x̂)      (4)

where w is the vector with elements w_i, and D and G are the diagonal matrices D = diag(w), G = diag(g_x̂(x̂_i)), with

w_i = 1 / ( v(x̂_i) g_x̂(x̂_i)² ),      g_x̂(x̂_i) ≡ ∂g(x̂_i)/∂x̂_i      (5)

where v(x̂_i) is the variance function, related to the observation variance by var(x_i) = σ² v(x̂_i). Via Fisher scoring, the general update equation in matrix form is

z ← z + (L^T D L)⁻¹ L^T D G (x − x̂)      (6)

Although this formulation is somewhat abstract, it covers a very broad range of model classes used in practice. For example, an important special case appears when the variance function has the form v(x̂) = x̂^p. Setting p = {0, 1, 2, 3} gives the Gaussian, Poisson, Exponential/Gamma and Inverse Gaussian distributions [10, pp. 30], which are special cases of the exponential family of distributions for any p, named Tweedie's family [11]. Those for p = {0, 1, 2}, in turn, correspond to the EU, KL and IS cost functions often used for NMF decompositions [12, 7].
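As a concrete, purely illustrative sketch (not from the paper; data, sizes and variable names are ours), the following numpy snippet runs the Fisher scoring recursion of eq. (6) for a Poisson GLM with a log link:

```python
import numpy as np

# Illustrative sketch: Fisher scoring, eq. (6), for a Poisson GLM with
# log link g(xhat) = log(xhat) = L z. All values below are made up.
rng = np.random.default_rng(0)
n, d = 200, 3
L = rng.normal(size=(n, d))                 # model matrix
z_true = np.array([0.5, -0.3, 0.2])
x = rng.poisson(np.exp(L @ z_true))         # observations

z = np.zeros(d)
for _ in range(25):
    xhat = np.exp(L @ z)                    # mean under the current estimate
    g_prime = 1.0 / xhat                    # g'(xhat) for the log link
    v = xhat                                # Poisson variance function v(xhat) = xhat
    w = 1.0 / (v * g_prime ** 2)            # eq. (5)
    D, G = np.diag(w), np.diag(g_prime)
    # z <- z + (L' D L)^{-1} L' D G (x - xhat), eq. (6)
    z = z + np.linalg.solve(L.T @ D @ L, L.T @ D @ G @ (x - xhat))
print(np.round(z, 2))                       # should be close to z_true
```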

2.1 Tensor Factorisations (TF) as GLMs

The key observation for expressing a TF model as a GLM is to identify the multilinear structure and to use an alternating optimisation approach. To hide the notational complexity, we give an example with a simple matrix factorisation model; the extension to tensors requires heavier notation, but is otherwise conceptually straightforward. Consider the MF model g(X̂) = Z_1 Z_2^T, in scalar form

g(X̂)_{i,j} = Σ_r Z_1(i,r) Z_2(j,r)      (7)

where Z_1, Z_2 and g(X̂) are matrices of compatible sizes. Indeed, by applying the vec operator (vectorisation, stacking the columns of a matrix to obtain a vector) to both sides of (7), we obtain an equivalent representation of the same system

vec(g(X̂)) = (I_{|j|} ⊗ Z_1) vec(Z_2^T) = ∇_2 vec(Z_2^T),      ∇_2 ≡ ∂ vec g(X̂) / ∂ vec Z_2^T      (8)

where I_{|j|} denotes the |j| × |j| identity matrix and ⊗ denotes the Kronecker product [13]. Clearly, this is a GLM in which ∇_2 plays the role of the model matrix and vec(Z_2^T) is the parameter vector. By alternating between Z_1 and Z_2, we can maximise the log-likelihood iteratively; indeed, this alternating maximisation is standard for solving matrix factorisation problems. In the sequel, we show that a much broader range of algorithms can readily be derived in the GLM framework.
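The vectorised identity in (8) can be checked numerically; the snippet below (with arbitrary small sizes of our choosing, not from the paper) confirms that the element-wise and Kronecker forms agree:

```python
import numpy as np

# Numerical check of eq. (8): vec(g(Xhat)) = (I_|j| kron Z1) vec(Z2'),
# for g(Xhat)_{ij} = sum_r Z1(i,r) Z2(j,r). Sizes are arbitrary examples.
rng = np.random.default_rng(1)
I, J, R = 4, 5, 2
Z1, Z2 = rng.normal(size=(I, R)), rng.normal(size=(J, R))

Xhat = Z1 @ Z2.T                                  # matrix form of (7)
lhs = Xhat.reshape(-1, order="F")                 # vec: stack the columns
rhs = np.kron(np.eye(J), Z1) @ Z2.reshape(-1)     # rows of Z2 stacked, i.e. vec(Z2')
print(np.allclose(lhs, rhs))                      # True
```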

2.2 Generalised Tensor Factorisation

We define a tensor Λ as a multiway array with an index set V = {i_1, i_2, …, i_{|V|}}, where each index i_n for n = 1…|V| runs as i_n = 1…|i_n|. An element of the tensor Λ is a scalar that we denote by Λ(i_1, i_2, …, i_{|V|}), by Λ_{i_1,i_2,…,i_{|V|}}, or as a shorthand by Λ(v), with v being a particular configuration of the indices. |v| denotes the number of all distinct configurations for V; e.g., if V = {i_1, i_2} then |v| = |i_1||i_2|. We call the form Λ(v) element-wise; the notation [·] yields a tensor by enumerating all the indices, i.e., Λ = [Λ_{i_1,i_2,…,i_{|V|}}] or Λ = [Λ(v)]. For any two tensors X and Y of compatible order, X ∘ Y is the element-wise product and, if not explicitly stressed otherwise, X/Y is the element-wise division. 1 is an object of all ones whose order depends on the context in which it is used.

A generalised tensor factorisation problem is specified by an observed tensor X (with possibly missing entries, to be treated later), a collection of latent tensors to be estimated, Z_{1:|α|} = {Z_α} for α = 1…|α|, and an exponential family of the form (2). The index set of X is denoted by V_0 and the index set of each Z_α by V_α. The set of all model indices is V = ∪_{α=1}^{|α|} V_α. We use v_α (or v_0) to denote a particular configuration of the indices for Z_α (or X), while v̄_α denotes a configuration of the complement V̄_α = V/V_α. The goal is to find the latent Z_α that maximise the likelihood p(X | Z_{1:|α|}), where ⟨X⟩ = X̂ is given via

g(X̂(v_0)) = Σ_{v̄_0} Π_α Z_α(v_α)      (9)

To clarify our notation with an example, we express the CP (Parafac) model, defined as X̂(i,j,k) = Σ_r Z_1(i,r) Z_2(j,r) Z_3(k,r). In our notation, we take the identity link g(X̂) = X̂ and the index sets V = {i, j, k, r}, V_0 = {i, j, k}, V̄_0 = {r}, V_1 = {i, r}, V_2 = {j, r} and V_3 = {k, r}. Our notation deliberately follows that of graphical models; the reader might find it useful to associate indices with discrete random variables and factors with probability tables [14]. Obviously, while a TF model does not represent a discrete probability measure, the algebraic structure is nevertheless analogous.

To extend the discussion in Section 2.1 to the tensor case, we need the equivalent of the model matrix when updating Z_α. This is obtained by summing over the product of all remaining factors

g(X̂(v_0)) = Σ_{v̄_0 ∩ v_α} Z_α(v_α) Σ_{v̄_0 ∩ v̄_α} Π_{α' ≠ α} Z_{α'}(v_{α'}) = Σ_{v̄_0 ∩ v_α} Z_α(v_α) L_α(o_α)

L_α(o_α) = Σ_{v̄_0 ∩ v̄_α} Π_{α' ≠ α} Z_{α'}(v_{α'})      with o_α ≡ (v_0 ∩ v̄_α) ∪ (v̄_0 ∩ v_α)
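For instance, for the CP model above, the object L_α needed when updating Z_1 can be formed with a single contraction; the snippet below is an illustrative sketch with made-up sizes, not the authors' code:

```python
import numpy as np

# Sketch: L_1(o_1) for the CP model Xhat(i,j,k) = sum_r Z1(i,r) Z2(j,r) Z3(k,r),
# used when updating Z1. Here o_1 = {j, k, r}.
rng = np.random.default_rng(2)
I, J, K, R = 3, 4, 5, 2
Z1, Z2, Z3 = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

L1 = np.einsum('jr,kr->jkr', Z2, Z3)          # product of the remaining factors
Xhat = np.einsum('ir,jkr->ijk', Z1, L1)       # sum over the shared index r
print(np.allclose(Xhat, np.einsum('ir,jr,kr->ijk', Z1, Z2, Z3)))   # True
```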

One related quantity to L_α is the derivative of the tensor g(X̂) with respect to the latent tensor Z_α, denoted ∇_α and defined as (following the convention of [13, pp. 196])

∇_α ≡ ∂g(X̂)/∂Z_α = I_{|v_0 ∩ v_α|} ⊗ L_α      with L_α ∈ R^{|v_0 ∩ v̄_α| × |v̄_0 ∩ v_α|}      (10)

The importance of L_α is that all the update rules can be formulated by a product and subsequent contraction of L_α with another tensor Q having exactly the same index set as the observed tensor X. As a notational abstraction, it is useful to formulate the following function.

Definition 1. The tensor-valued function Δ_α^q(Q) : R^{|v_0|} → R^{|v_α|} is defined as

Δ_α^q(Q) = [ Σ_{v_0 ∩ v̄_α} Q(v_0) ∘ L_α(o_α)^q ]      (11)

with Δ_α^q(Q) being an object of the same order as Z_α and o_α ≡ (v_0 ∩ v̄_α) ∪ (v̄_0 ∩ v_α). Here, on the right-hand side, the nonnegative integer q denotes the element-wise power, not to be confused with an index; on the left, it should be interpreted as a parameter of the function. Arguably, the Δ_α^q function abstracts away all the tedious reshape and unfolding operations [5]. This abstraction also has an important practical facet: the computation of Δ_α^q is algebraically (almost) equivalent to the computation of marginal quantities on a factor graph, for which efficient message passing algorithms exist [14].

Example 1. TUCKER3 is defined as X̂(i,j,k) = Σ_{p,q,r} A(i,p) B(j,q) C(k,r) G(p,q,r), with V = {i, j, k, p, q, r}, V_0 = {i, j, k}, V_A = {i, p}, V_B = {j, q}, V_C = {k, r} and V_G = {p, q, r}. Then, for the first factor A, the objects L_A and Δ_A^q(·) are computed as follows

L_A(j,k,p) = Σ_{q,r} B(j,q) C(k,r) G(p,q,r) = ((C ⊗ B) G^T)_{(k,j),p}      (12)

Δ_A^q(Q) = [ Σ_{j,k} Q(i,j,k) ∘ L_A(j,k,p)^q ]      (13)

The index sets marginalised out for L_A and Δ_A are V̄_0 ∩ V̄_A = {p, q, r} ∩ {j, q, k, r} = {q, r} and V_0 ∩ V̄_A = {i, j, k} ∩ {j, q, k, r} = {j, k}, respectively. We also verify the order of the gradient ∇_A in (10): I_{|i|} ⊗ L_A gives an object with row indices (i, k, j) and column indices (i, p), which conforms to the matrix derivative convention of [13, pp. 196].
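The quantities in Example 1 can be formed with plain tensor contractions; the following numpy sketch (illustrative sizes, not the authors' implementation) builds L_A and Δ_A^q(Q) directly from their definitions:

```python
import numpy as np

# Sketch of Example 1 (TUCKER3): forming L_A and Delta_A^q(Q) by contraction.
# Sizes are arbitrary illustrative values; this is not the authors' code.
rng = np.random.default_rng(3)
I, J, K, P, Qd, R = 3, 4, 5, 2, 3, 2
A, B, C = rng.random((I, P)), rng.random((J, Qd)), rng.random((K, R))
G = rng.random((P, Qd, R))

# L_A(j,k,p) = sum_{q,r} B(j,q) C(k,r) G(p,q,r)                       (eq. 12)
L_A = np.einsum('jq,kr,pqr->jkp', B, C, G)
# sanity check: contracting A against L_A reproduces the TUCKER3 model
Xhat = np.einsum('ip,jkp->ijk', A, L_A)
print(np.allclose(Xhat, np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)))  # True

# Delta_A^q(Q) = sum_{j,k} Q(i,j,k) L_A(j,k,p)^q                      (eq. 13)
def delta_A(Q, q=1):
    return np.einsum('ijk,jkp->ip', Q, L_A ** q)

print(delta_A(rng.random((I, J, K))).shape)      # (3, 2): same order as A
```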

2.3 Iterative Solution for GTF

As we have now established a one-to-one relationship between GLM and GTF objects, namely the observation x ≡ vec X, the mean (and model estimate) x̂ ≡ vec X̂, the model matrix L ≡ ∇_α and the parameter vector z ≡ vec Z_α, we can write directly from (6)

Z_α ← Z_α + (∇_α^T D ∇_α)⁻¹ ∇_α^T D G (X − X̂)      with ∇_α = ∂ vec g(X̂) / ∂ vec Z_α      (14)

There are at least two ways in which this update can be further simplified. We may assume an identity link function, or alternatively we may choose matching link and loss functions such that they cancel each other smoothly [2]. In the sequel we consider the identity link g(X̂) = X̂, which gives g_X̂(X̂) = 1 and implies that G is the identity, G = I. We define a tensor W that plays the same role as w in (5); here it becomes simply the precision (inverse variance function), i.e. W = 1/v(X̂), where for the Gaussian, Poisson, Exponential and Inverse Gaussian distributions we have simply W = X̂^(−p) with p = {0, 1, 2, 3} [10, pp. 30]. Then, the update (14) reduces to

Z_α ← Z_α + (∇_α^T D ∇_α)⁻¹ ∇_α^T D (X − X̂)      (15)

After this simplification we obtain two update rules for GTF, one for non-negative and one for real data. The update (15) can be used to derive the multiplicative update rules (MUR) popularised by [15] for nonnegative matrix factorisation (NMF). MUR equations ensure non-negative parameter updates as long as the initial values are non-negative.

Theorem 1. The update equation (15) for nonnegative GTF reduces to the multiplicative form

Z_α ← Z_α ∘ Δ_α(W ∘ X) / Δ_α(W ∘ X̂)      s.t. Z_α(v_α) > 0      (16)

(Proof sketch) Due to space limitations we omit the full details; the idea is that the inverse of H = ∇_α^T D ∇_α is identified as a step size, which is bounded using the Perron-Frobenius theorem [16, pp. 125] as

η_α = Z_α / Δ_α(W ∘ X̂) < 2/λ_max(H)      since λ_max(H) ≤ max_{v_α} Δ_α(W ∘ X̂)(v_α) / Z_α(v_α)      (17)

For the special case of the Tweedie family, where the precision is a function of the mean, W = X̂^(−p) with p = {0, 1, 2, 3}, the update (15) reduces to

Z_α ← Z_α ∘ Δ_α(X̂^(−p) ∘ X) / Δ_α(X̂^(1−p))      (18)

For example, to update Z_2 in the NMF model X̂ = Z_1 Z_2, we have Δ_2(Q) = Z_1^T Q. Then for the Gaussian (p = 0) the update reduces to NMF-EU, Z_2 ← Z_2 ∘ (Z_1^T X)/(Z_1^T X̂), while for the Poisson (p = 1) it reduces to NMF-KL, Z_2 ← Z_2 ∘ (Z_1^T (X/X̂))/(Z_1^T 1) [15].
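As an illustration of Theorem 1 and eq. (18), the snippet below runs the resulting multiplicative updates for plain NMF; here Z_2 is stored as a |j| × |r| matrix so that X̂ = Z_1 Z_2^T, and the data and sizes are synthetic, purely illustrative:

```python
import numpy as np

# Sketch: multiplicative GTF updates (eq. 18) for NMF, Xhat = Z1 Z2',
# under the Tweedie family with p = 0 (EU) or p = 1 (KL). Synthetic data.
rng = np.random.default_rng(4)
I, J, R = 20, 15, 4
X = rng.random((I, J)) + 1e-3
Z1, Z2 = rng.random((I, R)), rng.random((J, R))

p = 1                                       # 0: Euclidean, 1: KL
for _ in range(200):
    Xhat = Z1 @ Z2.T
    W = Xhat ** (-p)                        # precision W = Xhat^(-p)
    # Delta_1(Q) = Q Z2 for this model; update Z1 by eq. (16)/(18)
    Z1 *= ((W * X) @ Z2) / ((W * Xhat) @ Z2)
    Xhat = Z1 @ Z2.T
    W = Xhat ** (-p)
    # Delta_2(Q) = Q' Z1; update Z2
    Z2 *= ((W * X).T @ Z1) / ((W * Xhat).T @ Z1)

print(np.linalg.norm(X - Z1 @ Z2.T))        # fit error after the sweeps
```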

By dropping the non-negativity requirement we obtain the following update equation.

Theorem 2. The update equation for GTF with real data can be expressed as

Z_α ← Z_α + Δ_α(W ∘ (X − X̂)) / ( κ_{α/0} Δ_α²(W) )      with κ_{α/0} = |v_α ∩ v̄_0|      (19)

(Proof sketch) Again skipping the full details: as part of the proof we set Z_α = 1 in (17) and replace the matrix product ∇_α^T D ∇_α 1 by κ_{α/0} Δ_α²(W), which completes the proof. Here the multiplier κ_{α/0} is the cardinality arising from the fact that only κ_{α/0} elements are non-zero in a row of D∇_α. As an example, if V_α ∩ V̄_0 = {p, q} then κ_{α/0} = |p||q|, the number of all distinct configurations for the index set {p, q}.

Missing data can be handled easily by dropping the missing data terms from the likelihood [17]. The net effect is the addition of an indicator variable m_i to the gradient, ∂L/∂z = (1/σ²) Σ_i (x_i − x̂_i) m_i w_i g_x̂(x̂_i) l_i, with m_i = 1 if x_i is observed and m_i = 0 otherwise. Hence we simply define a mask tensor M of the same order as the observation X, where M(v_0) is 1 if X(v_0) is observed and zero otherwise. In the update equations, we merely replace W with W ∘ M.
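A possible way to build the mask tensor M in code (a trivial sketch with dummy values of our choosing):

```python
import numpy as np

# Sketch: missing entries are handled by masking the precision, W <- W * M.
# Dummy 2 x 2 example; NaN marks an unobserved entry.
X = np.array([[1.0, 2.0], [np.nan, 0.5]])
M = (~np.isnan(X)).astype(float)     # M(v0) = 1 if X(v0) is observed, else 0
X = np.nan_to_num(X)                 # the filled value is irrelevant once masked
# ...then replace W by W * M (and use the filled X) in the updates above
```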

3 Coupled Tensor Factorisation


Here we address the problem where multiple observed tensors X_ν for ν = 1…|ν| are factorised simultaneously. Each observed tensor X_ν now has a corresponding index set V_{0,ν}, and a particular configuration will be denoted by v_{0,ν} ≡ u_ν. Next, we define a |ν| × |α| coupling matrix R where

R^{ν,α} = 1 if X_ν and Z_α are connected, 0 otherwise,      and      X̂_ν(u_ν) = Σ_{ū_ν} Π_α Z_α(v_α)^{R^{ν,α}}      (20)

For the coupled factorisation, the derivative of the log likelihood becomes

∂L/∂Z_α(v_α) = Σ_ν R^{ν,α} Σ_{u_ν ∩ v̄_α} [ X_ν(u_ν) − X̂_ν(u_ν) ] W_ν(u_ν) ∂X̂_ν(u_ν)/∂Z_α(v_α)      (21)

where W_ν ≡ W(X̂_ν(u_ν)) are the precisions. Then, proceeding as in Section 2.3 (i.e., computing the Hessian and finding the Fisher information), we arrive at the update rule in vector form

Z_α ← Z_α + ( Σ_ν R^{ν,α} ∇_{ν,α}^T D_ν ∇_{ν,α} )⁻¹ ( Σ_ν R^{ν,α} ∇_{ν,α}^T D_ν (X_ν − X̂_ν) )      (22)


Figure 1: (Left) The coupled factorisation structure, where an arrow indicates that the latent tensor Z_α influences the observed tensor X_ν. (Right) The CP/MF/MF coupled factorisation problem of (1).

Here ∇_{ν,α} = ∂ vec g(X̂_ν)/∂ vec Z_α in (22). The update equations for the coupled case are quite intuitive: we calculate the Δ_{α,ν} functions, defined as

Δ_{α,ν}^q(Q) = [ Σ_{u_ν ∩ v̄_α} Q(u_ν) ( Π_{α' ≠ α} Z_{α'}(v_{α'})^{R^{ν,α'}} )^q ]      (23)

for each submodel, and add the results:

Lemma 1. The update for non-negative CTF is

Z_α ← Z_α ∘ [ Σ_ν R^{ν,α} Δ_{α,ν}(W_ν ∘ X_ν) ] / [ Σ_ν R^{ν,α} Δ_{α,ν}(W_ν ∘ X̂_ν) ]      (24)

In the special case of a Tweedie family, i.e. for distributions whose precision is W_ν = X̂_ν^(−p), the update is Z_α ← Z_α ∘ [ Σ_ν R^{ν,α} Δ_{α,ν}(X̂_ν^(−p) ∘ X_ν) ] / [ Σ_ν R^{ν,α} Δ_{α,ν}(X̂_ν^(1−p)) ].

Lemma 2. The general update for CTF is

Z_α ← Z_α + (1/κ_{α/0}) [ Σ_ν R^{ν,α} Δ_{α,ν}(W_ν ∘ (X_ν − X̂_ν)) ] / [ Σ_ν R^{ν,α} Δ_{α,ν}²(W_ν) ]      (25)

For the special case of the Tweedie family we plug in W_ν = X̂_ν^(−p) and obtain the corresponding formula.
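A schematic sketch of Lemma 1, accumulating the Δ_{α,ν} terms over the observations coupled to each factor, is given below; the callables and argument names are ours, not the paper's:

```python
# Schematic sketch of the non-negative coupled update (eq. 24). The callables
# `xhat(v, Z)` (model prediction for observation v) and `delta(a, v, Q, Z)`
# (implementing eq. 23 with q = 1) are assumed to be supplied by the user.
def coupled_mur_sweep(Z, X, R, xhat, delta, p=1):
    """One multiplicative sweep over all factors.
    Z: list of factor arrays, X: list of observed arrays,
    R: |nu| x |alpha| 0/1 coupling matrix (nested lists or array)."""
    for a in range(len(Z)):
        num, den = 0.0, 0.0
        for v in range(len(X)):
            if not R[v][a]:
                continue
            Xh = xhat(v, Z)                   # current prediction for X_v
            W = Xh ** (-p)                    # Tweedie precision, W_v = Xh^(-p)
            num = num + delta(a, v, W * X[v], Z)
            den = den + delta(a, v, W * Xh, Z)
        Z[a] = Z[a] * (num / den)
    return Z
```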

4 Experiments
Here we solve the CTF problem introduced in (1), which is a coupled CP/MF/MF problem

X̂_1(i,j,k) = Σ_r A(i,r) B(j,r) C(k,r),      X̂_2(j,p) = Σ_r B(j,r) D(p,r),      X̂_3(j,q) = Σ_r B(j,r) E(q,r)      (26)

where we employ the symbols A, B, C, D, E for the latent tensors instead of Z_α. This factorisation problem has the following R matrix, with |α| = 5 and |ν| = 3:

R = [ 1 1 1 0 0
      0 1 0 1 0
      0 1 0 0 1 ]      with      X̂_1 = A^1 B^1 C^1 D^0 E^0,  X̂_2 = A^0 B^1 C^0 D^1 E^0,  X̂_3 = A^0 B^1 C^0 D^0 E^1      (27)

We want to use the general update equation (25). This requires the derivation of Δ_{α,ν}(·) for ν = 1 (CP) and ν = 2 (MF), but not for ν = 3, since Δ_{α,3}(·) has the same shape as Δ_{α,2}(·). Here we show the computation for B, i.e. for Z_2, which is the common factor:

Δ_{B,1}(Q) = [ Σ_{i,k} Q(i,j,k) A(i,r) C(k,r) ] = Q_{(2)} (C ⊙ A)      (28)

Δ_{B,2}(Q) = [ Σ_p Q(j,p) D(p,r) ] = Q D      (29)

with Q_{(n)} being the mode-n unfolding operation that turns a tensor into a matrix [5] and ⊙ the Khatri-Rao (column-wise Kronecker) product. In addition, for ν = 1 the required scalar value κ_{B/0} is |r|, since V_B ∩ V̄_0 = {j, r} ∩ {r} = {r}; the value of κ_{B/0} is the same for ν = 2, 3. The simulated data size for the observables is |i| = |j| = |k| = |p| = |q| = 30, while the latent dimension is |r| = 5. We run 1000 iterations with the Euclidean cost; the experiment produced similar results for the KL cost, as shown in Figure 2.
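Putting the pieces together for the shared factor B, the following synthetic sketch applies the coupled KL update (Lemma 1 with p = 1) to the CP/MF/MF model; only B is re-estimated and the data are simulated, so this is illustrative rather than a reproduction of the experiment:

```python
import numpy as np

# Sketch: coupled multiplicative (KL, p = 1) update of the shared factor B in
# the CP/MF/MF model (eqs. 26-29), on synthetic data. Illustrative only.
rng = np.random.default_rng(5)
n, r = 30, 5
A, C, D, E = (rng.random((n, r)) for _ in range(4))
B_true = rng.random((n, r))
X1 = np.einsum('ir,jr,kr->ijk', A, B_true, C)
X2, X3 = B_true @ D.T, B_true @ E.T

B = rng.random((n, r))                        # re-estimate B from scratch
for _ in range(200):
    X1h = np.einsum('ir,jr,kr->ijk', A, B, C)
    X2h, X3h = B @ D.T, B @ E.T
    # Delta_{B,1}(Q) = sum_{i,k} Q(i,j,k) A(i,r) C(k,r); Delta_{B,2}(Q) = Q D; Delta_{B,3}(Q) = Q E
    num = (np.einsum('ijk,ir,kr->jr', X1 / X1h, A, C)
           + (X2 / X2h) @ D + (X3 / X3h) @ E)
    den = (np.einsum('ijk,ir,kr->jr', np.ones_like(X1), A, C)
           + np.ones_like(X2) @ D + np.ones_like(X3) @ E)
    B *= num / den

print(np.linalg.norm(X1 - np.einsum('ir,jr,kr->ijk', A, B, C)))  # should be small
```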

Figure 2: The figure compares the original, the initial (start-up) and the final (estimated) factors for Z_α = A, B, C, D, E. Only the first column, i.e. Z_α(1:10, 1), is plotted. Note that the CP factorisation is unique up to permutation and scaling [5], while the MF factorisation is not unique; but when coupled with CP it recovers the original data, as shown in the figure. For visualisation, to find the correct permutation, the matching permutation between the original and the estimate is found for each Z_α by solving an orthogonal Procrustes problem [18, pp. 601].

4.1 Audio Experiments

In this section, we illustrate a real data application of our approach, where we reconstruct missing parts of an audio spectrogram X(f,t) that represents the STFT coefficient magnitude at frequency bin f and time frame t of a piano piece; see the top left panel of Fig. 3. This is a difficult matrix completion problem: as entire time frames (columns of X) are missing, low-rank reconstruction techniques are likely to be ineffective. Yet such missing data patterns arise often in practice, e.g., when packets are dropped during digital communication. We develop here a novel approach, expressed as a coupled TF model. In particular, the reconstruction will be aided by an approximate musical score, not necessarily belonging to the played piece, and by spectra of isolated piano sounds.

The pioneering work of [19] demonstrated that, when an audio spectrogram of music is decomposed using NMF as X(f,t) ≈ X̂_1(f,t) = Σ_i D(f,i) E(i,t), the computed factors D and E tend to be semantically meaningful and correlate well with the intuitive notions of spectral templates (harmonic profiles of musical notes) and a musical score (reminiscent of a piano-roll representation such as a MIDI file). However, as time frames are modelled conditionally independently, it is impossible to reconstruct audio with this model when entire time frames are missing.

In order to restore the missing parts of the audio, we form a model that can incorporate musical information about chord structures and how they evolve in time. To achieve this, we hierarchically decompose the excitation matrix E as a convolution of some basis matrices and their weights: E(i,t) = Σ_{k,τ} B(i,τ,k) C(k, t−τ). Here the basis tensor B encapsulates both vertical and temporal information about the notes that are likely to be used in a musical piece; the musical piece to be reconstructed will share B, possibly played at different times or tempi as modelled by G. After replacing E with the decomposed version, we get the following model:

X̂_1(f,t) = Σ_{i,τ,k,d} D(f,i) B(i,τ,k) C(k,d) Z(d,t,τ)      (test file)      (30)

X̂_2(i,n) = Σ_{τ,k,m} B(i,τ,k) G(k,m) Y(m,n,τ)      (MIDI file)      (31)

X̂_3(f,p) = Σ_i D(f,i) F(i,p) T(i,p)      (merged training files)      (32)

Here we have introduced new dummy indices d and m, and new (fixed) factors Z(d,t,τ) = δ(d − t + τ) and Y(m,n,τ) = δ(m − n + τ), in order to express this model in our framework. In (32), while forming X_3 we concatenate isolated recordings corresponding to different notes. Besides, T is a 0-1 matrix, where T(i,p) = 1 (0) if note i is played (not played) during time frame p, and F models the time-varying amplitudes of the training data. The R matrix for this model is

R = [ 1 1 1 1 0 0 0 0
      0 1 0 0 1 1 0 0
      1 0 0 0 0 0 1 1 ]      with      X̂_1 = D^1 B^1 C^1 Z^1 G^0 Y^0 F^0 T^0,  X̂_2 = D^0 B^1 C^0 Z^0 G^1 Y^1 F^0 T^0,  X̂_3 = D^1 B^0 C^0 Z^0 G^0 Y^0 F^1 T^1      (33)
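The fixed shift factors Z and Y above are just delta tensors; the following small sketch (dummy sizes of our choosing) builds Z(d,t,τ) = δ(d−t+τ) and verifies that contracting it with C realises the convolution E(i,t) = Σ_{k,τ} B(i,τ,k) C(k,t−τ):

```python
import numpy as np

# Sketch: build the fixed factor Z(d,t,tau) = delta(d - t + tau) of eq. (30)
# and check that it turns the contraction into the convolution defining E.
def shift_tensor(n_d, n_t, n_tau):
    d = np.arange(n_d)[:, None, None]
    t = np.arange(n_t)[None, :, None]
    tau = np.arange(n_tau)[None, None, :]
    return (d - t + tau == 0).astype(float)       # delta(d - t + tau)

rng = np.random.default_rng(6)
n_i, n_k, n_t, n_tau = 4, 2, 8, 3
B = rng.random((n_i, n_tau, n_k))                 # B(i, tau, k)
C = rng.random((n_k, n_t))                        # C(k, d)
Z = shift_tensor(n_t, n_t, n_tau)                 # Z(d, t, tau)

E_tensor = np.einsum('isk,kd,dts->it', B, C, Z)   # sum over k, tau, d
E_direct = np.zeros((n_i, n_t))
for tau in range(n_tau):
    for t in range(n_t):
        if t - tau >= 0:
            E_direct[:, t] += B[:, tau, :] @ C[:, t - tau]
print(np.allclose(E_tensor, E_direct))            # True
```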

Figure 3 illustrates the performance of the model, using the KL cost (W = X̂^(−1)) on a 30-second piano recording where 70% of the data is missing; we obtain about 5 dB SNR improvement, gracefully degrading from 10% to 80% missing data. The results are encouraging, as quite long portions of the audio are missing; see the bottom right panel of Fig. 3.
Figure 3: Top row, left to right, the observed matrices: X_1, the spectrogram of the piano performance, where darker colours imply higher magnitude (the missing data, 70%, are shown in white); X_2, a piano roll obtained from a musical score of the piece; X_3, the spectra of 88 isolated notes from a piano. Bottom row: the reconstructed X_1, the ground truth, and the SNR results with increasing amounts of missing data. The initial SNR is computed by substituting 0 for the missing values.

5 Discussion
This paper establishes a link between GLMs and TFs and provides a general solution for the computation of arbitrary coupled TFs, using message passing primitives. The current treatment focused on ML estimation; as immediate future work, the probabilistic interpretation is to be extended to full Bayesian inference with appropriate priors and inference methods. A powerful aspect, which we have not been able to summarise here, is assigning different cost functions, i.e. distributions, to different observation tensors in a coupled factorisation model; this requires only minor modifications to the update equations. We believe that, as a whole, the GCTF framework covers a broad range of models that can be useful in many different application areas beyond audio processing, such as network analysis, bioinformatics or collaborative filtering.

Acknowledgements: This work is funded by the TÜBİTAK grant number 110E292, Bayesian matrix and tensor factorisations (BAYTEN), and the Boğaziçi University research fund BAP 5723. Umut Şimşekli is also supported by a Ph.D. scholarship from TÜBİTAK. We would also like to thank Evrim Acar for fruitful discussions.

References
[1] A. T. Cemgil, Bayesian inference for nonnegative matrix factorisation models, Computational Intelligence and Neuroscience 2009 (2009) 1-17.
[2] A. P. Singh, G. J. Gordon, A unified view of matrix factorization models, in: ECML PKDD 2008, Part II, no. 5212, Springer, 2008, pp. 358-373.
[3] E. Acar, T. G. Kolda, D. M. Dunlavy, All-at-once optimization for coupled matrix and tensor factorizations, CoRR abs/1105.3422, arXiv:1105.3422.
[4] Q. Xu, E. W. Xiang, Q. Yang, Protein-protein interaction prediction via collective matrix factorization, in: Proc. of the IEEE International Conference on BIBM, 2010, pp. 62-67.
[5] T. G. Kolda, B. W. Bader, Tensor decompositions and applications, SIAM Review 51 (3) (2009) 455-500.
[6] Y. K. Yılmaz, A. T. Cemgil, Probabilistic latent tensor factorization, in: Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2010, Springer-Verlag, 2010, pp. 346-353.
[7] C. Fevotte, A. T. Cemgil, Nonnegative matrix factorisations as probabilistic inference in composite models, in: Proc. 17th EUSIPCO, 2009.
[8] Y. K. Yılmaz, A. T. Cemgil, Algorithms for probabilistic latent tensor factorization, Signal Processing (2011), doi:10.1016/j.sigpro.2011.09.033.
[9] C. E. McCulloch, S. R. Searle, Generalized, Linear, and Mixed Models, Wiley, 2001.
[10] P. McCullagh, J. A. Nelder, Generalized Linear Models, 2nd Edition, Chapman and Hall, 1989.
[11] R. Kaas, Compound Poisson distributions and GLM's, Tweedie's distribution, Tech. rep., Lecture, Royal Flemish Academy of Belgium for Science and the Arts, 2005.
[12] A. Cichocki, R. Zdunek, A. H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorization, Wiley, 2009.
[13] J. R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd Edition, Wiley, 2007.
[14] M. Wainwright, M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (2008) 1-305.
[15] D. D. Lee, H. S. Seung, Algorithms for non-negative matrix factorization, in: NIPS, Vol. 13, 2001, pp. 556-562.
[16] M. Marcus, H. Minc, A Survey of Matrix Theory and Matrix Inequalities, Dover, 1992.
[17] R. Salakhutdinov, A. Mnih, Probabilistic matrix factorization, in: Advances in Neural Information Processing Systems, Vol. 20, 2008.
[18] G. H. Golub, C. F. Van Loan, Matrix Computations, 3rd Edition, Johns Hopkins UP, 1996.
[19] P. Smaragdis, J. C. Brown, Non-negative matrix factorization for polyphonic music transcription, in: WASPAA, 2003, pp. 177-180.
