where the column vectors of the matrix factors (A, B, C) have unit norm, Σ is a diagonal matrix whose diagonal elements are the coefficients of the vector σ, and α > 0 is a penalty parameter that allows the importance of the sparseness constraints on the weight vectors (g, σ), modeled by means of the l1 norm, to be increased or decreased. The advantage of merging EEG and fMRI data with the criterion [I.3] is that the two acquisition and observation methods are complementary in terms of resolution: EEG signals have high temporal resolution but low spatial resolution, while fMRI imaging provides high spatial resolution;
– in the case of the CSTF model (Li et al. 2018), the tensor of high-resolution hyperspectral images (HR-HSI) is represented using a third-order Tucker model that has a sparse core, with the following modes: space (width) × space (height) × spectral bands. The matrices W, H and S denote the dictionaries for the width, height and spectral modes, composed of nw, nh and ns atoms, respectively, and the core tensor contains the coefficients relative to these three dictionaries. The matrices W∗, H∗ and S∗ are spatially and spectrally subsampled versions of W, H and S with respect to each mode. The term λ is a regularization parameter for the sparseness constraint on the core tensor, expressed in terms of its l1 norm.
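To make the Tucker structure underlying such models concrete, the following sketch reconstructs a tensor from a core tensor and three mode dictionaries via mode-n products, here written with NumPy's einsum. All sizes and variable names are illustrative (not taken from the CSTF paper), and the core is drawn dense for brevity even though CSTF assumes it sparse:

```python
import numpy as np

# Hypothetical sizes: small dictionaries for a toy example
nw, nh, ns = 4, 4, 3              # numbers of atoms per mode (illustrative)
rng = np.random.default_rng(0)
W = rng.standard_normal((16, nw))  # width-mode dictionary
H = rng.standard_normal((16, nh))  # height-mode dictionary
S = rng.standard_normal((8, ns))   # spectral-mode dictionary
G = rng.standard_normal((nw, nh, ns))  # core tensor (sparse in CSTF; dense here)

# Tucker reconstruction: X = G x_1 W x_2 H x_3 S
X = np.einsum('ijk,ai,bj,ck->abc', G, W, H, S)
print(X.shape)  # (16, 16, 8): width x height x spectral bands
```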
7 See definition [3.41] of the unfolding Xn, and definitions [1.65] and [1.67] of the Frobenius norm (||·||F) and the nuclear norm (||·||∗) of a matrix; for the tensor case, see section 3.16.
The criteria listed in Table I.4 can be globally minimized using a nonlinear optimization method such as a gradient descent algorithm (with fixed or optimal step size), or the Gauss–Newton and Levenberg–Marquardt algorithms, the latter being a regularized form of the former. In the case of constrained optimization, the augmented Lagrangian method is very often used, as it allows the constrained optimization problem to be transformed into a sequence of unconstrained optimization problems.
The drawbacks of these optimization methods include slow convergence for gradient-type algorithms and high numerical complexity for the Gauss–Newton and Levenberg–Marquardt algorithms due to the need to compute the Jacobian matrix of the criterion w.r.t. the parameters being estimated, as well as the inverse of a large matrix.
Alternating optimization methods are therefore often used instead of a global optimization w.r.t. all matrix and tensor factors to be estimated. These iterative methods perform a sequence of separate optimizations of criteria linear in each unknown factor while fixing the other factors with the values estimated at previous iterations. An example is the standard ALS (alternating least squares) algorithm, presented in Chapter 5 for estimating PARAFAC models. For constrained optimization, the alternating direction method of multipliers (ADMM) is often used (Boyd et al. 2011).
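As a minimal sketch of this alternating scheme, the following NumPy code implements a plain ALS loop for a rank-R PARAFAC model of a third-order tensor: each factor is updated by a linear least squares solve while the other two are held fixed. The function names, unfolding conventions and stopping rule (a fixed iteration count, with no convergence test or normalization) are simplifications for illustration:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R)."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def als_cp(X, R, n_iter=50):
    """Plain ALS for a rank-R PARAFAC model of a third-order tensor X."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    # Mode-n unfoldings (row index = mode n, columns = remaining modes combined)
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        # Each update is a linear least squares problem in one factor
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

On noiseless data of exact rank R, this loop typically recovers the model up to the usual permutation and scaling ambiguities of the factor columns.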
To complete this introductory chapter, let us outline the key knowledge needed to employ tensor tools, whose presentation constitutes the main objective of this second volume:
– arrangement (also called reshaping) operations that express the data tensor as a vector (vectorization), a matrix (matricization), or a lower order tensor by combining modes; conversely, the tensorization and Hankelization operations allow us to construct tensors from data contained in large vectors or matrices;
– tensor operations such as transposition, symmetrization, Hadamard and Kronecker products, inversion and pseudo-inversion;
– the notions of eigenvalue and singular value of a tensor;
– tensor decompositions/models, and their uniqueness properties;
– algorithms used to solve dimensionality reduction problems and, hence, best low-rank approximation, parameter estimation and missing data imputation. This algorithmic aspect linked to tensors will be explored in more depth in Volume 3.
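The reshaping operations listed first are easy to demonstrate in NumPy, keeping in mind that the ordering of the combined modes is a convention that varies from author to author; the sketch below uses NumPy's default row-major ordering:

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)   # third-order tensor, modes (I, J, K)

# Vectorization: all entries stacked into a single vector
x = X.reshape(-1)

# Mode-1 matricization: mode 1 indexes the rows, modes (2, 3) are combined
X1 = X.reshape(2, 3 * 4)

# Tensorization: the tensor is rebuilt from the vector, given the mode sizes
X_back = x.reshape(2, 3, 4)
print(np.array_equal(X_back, X))  # True: reshaping loses no information
```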
I.6. Brief description of content
Tensor operations and decompositions often use matrix tools, so we will begin by reviewing some matrix decompositions in Chapter 1, going into further detail on eigenvalue decomposition (EVD) and SVD, as well as a few of their applications.
The Hadamard, Kronecker and Khatri–Rao matrix products are presented in detail in Chapter 2, together with many of their properties and a few relations between them. To illustrate these operations, we will use them to represent first-order partial derivatives of a function, and to solve matrix equations, such as the Sylvester and Lyapunov equations. This chapter also introduces an index convention that is very useful for tensor computations. This convention, which generalizes Einstein’s summation convention (Pollock 2011), will be used to represent various matrix products and to prove some matrix product vectorization formulae, as well as various relations between the Kronecker, Khatri–Rao and Hadamard products. It will be used in Chapter 3 for tensor matricization and vectorization in an original way, as well as in Chapter 5 to establish matrix forms of the Tucker and PARAFAC decompositions.
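A classical example of such a vectorization formula, which the reader can check numerically, is vec(A diag(d) Bᵀ) = (B ⊙ A) d, where ⊙ denotes the Khatri–Rao product and vec is the column-major vectorization. The sketch below verifies it with NumPy (the helper function name is ours):

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product: column r is kron(X[:, r], Y[:, r])."""
    I, R = X.shape
    J, _ = Y.shape
    return (X[:, None, :] * Y[None, :, :]).reshape(I * J, R)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 4))
d = rng.standard_normal(4)

# Vectorization identity: vec(A diag(d) B^T) = (B ⊙ A) d
M = A @ np.diag(d) @ B.T
lhs = M.reshape(-1, order='F')        # column-major vectorization
rhs = khatri_rao(B, A) @ d
print(np.allclose(lhs, rhs))          # True

# Related identity: (A ⊙ B)^T (A ⊙ B) = (A^T A) * (B^T B)  (* = Hadamard)
KR = khatri_rao(A, B)
print(np.allclose(KR.T @ KR, (A.T @ A) * (B.T @ B)))  # True
```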
Chapter 3 presents various sets of tensors before introducing the notions of matrix and tensor slices and of mode combination on which reshaping operations are based. The key tensor operations listed above are then presented. Several links between products of tensors and systems of tensor equations are also outlined, and some of these systems are solved with the least squares method.
Chapter 4 is dedicated to introducing the notions of eigenvalue and singular value for tensors. The problem of the best rank-one approximation of a tensor is also considered.
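A standard algorithm for this last problem is the higher-order power iteration (ALS with rank one), sketched below for a third-order tensor. The function name and the fixed iteration count are our choices for illustration; a practical implementation would add a convergence test:

```python
import numpy as np

def best_rank_one(X, n_iter=100):
    """Higher-order power iteration for a rank-one approximation
    lam * (u o v o w) of a third-order tensor X (o = outer product)."""
    I, J, K = X.shape
    u = np.ones(I) / np.sqrt(I)
    v = np.ones(J) / np.sqrt(J)
    w = np.ones(K) / np.sqrt(K)
    for _ in range(n_iter):
        # Contract X with two unit vectors, then renormalize the third
        u = np.einsum('ijk,j,k->i', X, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', X, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', X, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', X, u, v, w)
    return lam, u, v, w
```

For a tensor that is exactly rank one, a single sweep already aligns the vectors with the true factors (up to sign).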
In Chapter 5, we will give a detailed presentation of various tensor decompositions, with a particular focus on the basic Tucker and CPD decompositions, which can be viewed as generalizations of matrix SVD to tensors of order greater than two. Block tensor models and constrained tensor models will also be described, as well as certain variants, such as HOSVD and BTD (block term decomposition). CPD-type decompositions are generally used to estimate latent parameters, whereas Tucker decomposition is often used to estimate modal subspaces and reduce the dimensionality via low multilinear rank approximation and truncated HOSVD.
A description of the ALS algorithm for parameter estimation of PARAFAC models will also be given. The uniqueness properties of the Tucker and CPD decompositions will be presented, as well as the various notions of the rank of a tensor. The chapter will end with illustrations of BTD and CPD decompositions for the tensor modeling of multidimensional harmonics, the problem of source separation in an instantaneous linear mixture and the modeling and estimation of a finite impulse response (FIR) linear system, using a tensor of fourth-order cumulants of the system output.
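The truncated HOSVD mentioned above admits a compact sketch: each factor matrix collects the leading left singular vectors of the corresponding mode-n unfolding, and the core is obtained by contracting the tensor with these factors. The function name and unfolding convention below are our illustrative choices:

```python
import numpy as np

def thosvd(X, ranks):
    """Truncated HOSVD of a third-order tensor: U_n holds the leading left
    singular vectors of the mode-n unfolding; core G = X contracted with the U_n."""
    U = []
    for n, r in enumerate(ranks):
        Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)  # mode-n unfolding
        Un, _, _ = np.linalg.svd(Xn, full_matrices=False)
        U.append(Un[:, :r])
    G = np.einsum('ijk,ia,jb,kc->abc', X, U[0], U[1], U[2])
    return G, U
```

With full multilinear ranks, the decomposition is exact; truncating the ranks yields the low multilinear rank approximation used for dimensionality reduction.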
High-order cumulants of random signals, which can be viewed as tensors, play a central role in various signal processing applications, as illustrated in Chapter 5. This motivated us to include an Appendix presenting a brief overview of some basic results concerning the higher order statistics (HOS) of random signals, with two applications to the HOS-based estimation of a linear time-invariant system and a homogeneous quadratic system.
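As a minimal numerical illustration of why HOS are informative, the sample fourth-order (zero-lag) cumulant of a real zero-mean signal, cum4 = E[x⁴] − 3(E[x²])², vanishes for Gaussian signals but not for non-Gaussian ones. The function name below is ours:

```python
import numpy as np

def cum4_zero_lag(x):
    """Sample fourth-order zero-lag cumulant of a real signal:
    cum4 = E[x^4] - 3 (E[x^2])^2, computed after centering.
    It is zero (up to estimation error) for Gaussian signals."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                 # enforce zero mean empirically
    m2 = np.mean(x ** 2)
    m4 = np.mean(x ** 4)
    return m4 - 3.0 * m2 ** 2

rng = np.random.default_rng(0)
g = rng.standard_normal(200_000)     # Gaussian: cum4 close to 0
u = rng.uniform(-1, 1, 200_000)      # uniform: cum4 close to -2/15
```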
1 Scalars, vectors and matrices are written in lowercase, bold lowercase and bold uppercase letters, respectively: a, a, A.
2 A decomposition satisfies the essential uniqueness property if it is unique up to permutation and scaling factors in the columns of its factor matrices.
3 Big data is characterized by the 3Vs (Volume, Variety, Velocity), linked to the size of the data set, the heterogeneity of the data and the rate at which it is captured, stored and processed.