Compressed Dictionary Learning Toolbox

The compressed dictionary learning toolbox provides functions to learn dictionaries from large-scale synthetic and audio training data with the Iterative Compressed-Thresholding and K-Residual-Means (IcTKM) algorithm introduced in [1]. In this documentation we provide an overview of the toolbox and how to get started on a simple dictionary learning example. We also discuss how the various toolbox scripts and functions are organized and how they can be used to learn dictionaries. Links to the audio files containing the sonification of dictionaries learned on audio training data are provided. Lastly, we provide instructions for reproducing the numerical results presented in [1].

Contents

Overview

Sparsity in dictionary is an effective and widely used low-complexity model of high-dimensional data. In the sparse model we assume the existence of a dictionary in which every signal in the data class at hand can be approximated with a sparse linear expansion. Mathematically, we say that there exists a set of $K$ unit-norm vectors $\phi_k \in \Re^d$ referred to as atoms, such that every signal $y \in \Re^d$ can be approximately represented in the dictionary as $y \approx \sum_{i \in I}x(i) \phi_i$, where $I$ is an index set and $x \in \Re^{K}$ is a sparse coefficient vector with $|I| = S$ and $S \ll d$. In dictionary learning we address the problem of (algorithmically) finding a dictionary which provides sparse representations of a set of signals. In its most general form dictionary learning can be seen as a matrix factorization problem. Given a set of $N$ signals represented by the $d \times N$ training data matrix $Y = (y_1, \ldots, y_N)$, decompose it into a $d \times K$ dictionary matrix $\Phi = (\phi_1, \ldots, \phi_K)$ and a $K \times N$ coefficient matrix $X = (x_1, \ldots x_N)$, i.e., find $Y=\Phi X$ where every coefficient vector $x_k$ in $X$ is sparse.

The IcTKM algorithm is an alternating-minimization algorithm for dictionary learning which alternates between 2 main steps: updating the sparse coefficients based on the current version of the dictionary, and updating the dictionary based on the current version of the coefficients. For sparsity levels $S$ of practical interest, the most computationally expensive step in the alternating minimization algorithm is the sparse coefficient update, which entails the calculation of the matrix product between the training data $Y$ and the updated dictionary $\Psi$. In the IcTKM algorithm, fast dictionary learning is achieved by compressing $Y$ and $\Psi$ into $m \ll d$ dimensions with a Johnson-Lindenstrauss randomized transformation $\Gamma$. More specifically, the matrix product $\Psi^{*}Y$ is replaced by its compressed counterpart $\Psi^{*}\Gamma^{*}\Gamma Y$ in the sparse coefficient update step. When the Johnson-Lindestrauss transformation is carried out with fast algorithms, the computational complexity of the costly update is reduced by a factor of the order $O(m/d)$. Given an input dictionary $\Psi$, a training data matrix $Y$, and an embedding dimension $m$, each iteration of the IcTKM algorithm is defined by the following steps:

  1. Draw the randomized Johnson-Lindenstrauss transform $\Gamma$.
  2. For each training signals $y_n$ in the training set $Y$, find its sparse support $I^{ct}_{\Psi,n}$ via compressed thresholding: $I^{ct}_{\Psi,n} = \arg \max_{I:|I|=S} || \Psi_{I}^{*}\Gamma^{*}\Gamma y_n ||_1$.
  3. For each atom $\bar{\psi}_k$ in the updated dictionary $\bar{\Psi}$, compute $K$-residual means: $\bar{\psi}_k = \frac{1}{N} \sum_n{ sign(\langle \psi_k, y_n \rangle ) } \cdot 1_{I^{ct}_{\Psi,n}}(k) \cdot ( y_n - P(\Psi_{I^{ct}_{\Psi,n}}) y_n + P(\psi_k) y_n )$.
  4. Normalize the updated atoms: $\bar{\Psi} = ( \bar{\psi}_1/||\bar{\psi}_1||_2, \ldots, \bar{\psi}_K/|| \bar{\psi}_K ||_2 )$.

In this toolbox we provide the ability to efficiently learn dictionaries from high-dimensional audio and synthetic training data with IcTKM. Efficiency is achieved by using randomized Johnson-Lindestrauss transforms which can be carried out with the fast Fourier transform (FFT). Furthermore, we provide the ability to fully utilize the available hardware with the toolbox. In particular, parallelization techniques can be used to process multiple training signals with the available CPU cores. Furthermore, CUDA® capable graphics processing units (GPUs) can be used to speed-up the most computationally demanding vectorized calculations.

The synthetic data is generated by using a sparse signal model with coefficients that follow a well-behaved distribution, i.e., the non-zero values follow a geometric sequence which gracefully decays in magnitude. Mathematically, we let each coordinate of our coefficient vector be defined as $x_{c,p,\sigma}(k) = \sigma(k) c(p(k))$, where $p$ is a permutation sequence, $\sigma \in \{-1,1 \}^K$ is a sign sequence, and $c$ is a geometric sequence defined as $c(k)=c_b^k/||c||_2$ for $k \leq S$ and $c(k)=0$ for $k > S$ with $c_b$ uniformly distributed in $[1-b,1]$ for some $0 < b < 1$. The synthetic signal then takes the form $y = (\Phi x_{c,p,\sigma} + r)/\sqrt{1 + ||r||_2^2}$, where $r \in \Re^d$ is noise defined by a Gaussian random vector with mean zero and variance $\rho^2$, see [1] for details.

The audio training data can be input as an audio file in WAVE, FLAC, MP3, or OGG formats. The (potentially long) signal in the input audio file is partitioned into smaller signals which are allowed to overlap. Dictionary learning from audio data is carried out directly from the time-domain audio samples with no frequency transformation taking place, as opposed to what is usually done in common audio tasks.

Getting Started

In the following example code we will see how to use IctKM to learn dictionaries from synthetic training data. First we define the dictionary learning problem parameters, i.e., the ambient dimension $d$, the embedding dimension $m$, and the number of atoms $K$ in the generating dictionary $\Phi$. In the example below we use $m=d/2$, which amounts to a data compression ratio of $2:1$, and an overcomplete generating dictionary with $K=(3/2)d$.

d = 256;                % signal dimension
m = ceil(d/2);          % embedding dimension
K = ceil(3*d/2);        % Number of atoms in the dictionary

Next, we define the training data generation parameters, i.e., the sparsity level $S$, the Gaussian noise level $\rho$, the parameter $b$ for the geometric sequence, and the number of training signals $N$. We also set the random number generator seed for reproducible results.

S = 2;                  % sparsity level (4,8,12,16)
rho = 0;                % noise level (0,1/sqrt(d))
b = 0.1;                % coefficient decay parameter (0,1)
N = 100000;             % number of training signals per iteration
rng(1);                 % random generator seed

The generating dictionary $\Phi$ is defined as the union of two bases in $\Re^d$, the Dirac basis and the first half elements of the discrete cosine transform basis.

dico = [eye(d),dct(eye(d))];
dico(:,K+1:2*d) = [];

To generate the synthetic training signals we invoke the toolbox function make_sparse_signal.m

Y = make_sparse_signal(dico,N,S,S,b,rho);

The training signals are now stored in the data matrix $Y$. We now set the initialization dictionary $\Psi$ uniformly at random from the unit sphere in $\Re^d$ and run $100$ iterations of the IcTKM algorithm with the toolbox function ictkm_simplified.m

dinit = randn(d,K);                                     % Create input dictionary
dinit = dinit*diag(1./sqrt(sum(dinit.*dinit)));         % Normalize input dictionary
maxit = 100;                                            % Set maximum number of iterations
rdico = ictkm_simplified(Y, K, S, m, maxit, dinit);    % Run the IcTKM algorithm

The output (recovered) dictionary $\bar{\Psi}$ is now stored in the matrix rdico. We now evaluate the recovery performance of IcTKM. Given an atom $\bar{\psi}_l$ from $\bar{\Psi}$, the criteria for declaring the generating atom $\phi_k$ as recovered is obtained by computing the dictionary correlation inner product $\langle \bar{\psi}_l, \phi_k \rangle$, more specifically via $\max_l | \langle \bar{\psi}_l, \phi_k \rangle | \geq 0.99$. In the example code below we count the number of recovered atoms following this criteria and compute the percentage of recovered atoms.

rcorr=max(abs(rdico'*dico));
rec_count = find(rcorr > 0.99);                 % Count the number of recovered atoms
percent_recov = (length(rec_count)/K)*100;      % Calculate percentage of recovered atoms
display(['percent of recovered atoms: ' num2str(percent_recov) '%']);
percent of recovered atoms: 91.6667%

To run this dictionary learning example and for more details, refer to the toolbox script ictkm_example_script.m

Listening to the Audio Dictionaries

For the audio training data we have used several audio recordings from the RWC database. The recordings were comprised of stereo audio signals sampled at 44.1 KHz. We have first summed the audio signals to mono and then partioned the resulting signals into smaller blocks of $0.25$ and $1$ second durations. The blocks were allowed to overlap such that the maximally amount of overlap of one block with a shifted version of itself varied from $95\%$ for the $0.25$ seconds block, up to $98.75\%$ for the $1$ second block, see Section 4.2 of [1] for more details. We ran $200$ iterations of IcTKM using a compression ratio of $5:1$ to learn the dictionaries. The sparsity level was fixed at $S=4$ and we have learned dictionaries of size $K=64$ and $K=256$ atoms. To create an audio representation of the learned dictionaries, we have first sorted the atoms by their fundamental frequency, normalized their entries in the range $(-1,1)$, and then concatenated the renormalized atoms in a long array. A pause of $0.5$ seconds has also been added between each atom for an improved listening experience. The data in the long array was saved to audio files, which can be listened by following the links below:

Organization of the Toolbox

The compressed dictionary learning toolbox contains several folders, which have been organized with functions and scripts of related purpose. We now list these folders and describe their most important functions. For more details refer to the toolbox m-files, which can be accessed by following the links below.

In the following figure we illustrate the dependencies between the most important functions in the toolbox.

Reproducing the Numerical Results

As discussed in the previous section, the toolbox folder simulation contains all the required scripts for reproducing the numerical simulations of Section 4 in [1]. Specifically, the results for the synthetic training data in Sections 4.1.1 to 4.1.3 can be reproduced by using ictkm_synthetic_simulation.m. The scalability experiments of Section 4.1.4 can be reproduced by using ictkm_scalability_simulation.m. Lastly, the audio experiments of Sections 4.2.1 and 4.2.2 can be reproduced by using ictkm_audio_simulation.m. For complete reference, we now describe in details the options structure used by the main simulation script ictkm_simulation.m:

results = ictkm_simulation(options);

The IcTKM algorithm can be configured with the option structure fields in the following table. Here we can set the number of iterations, whether the dictionary correlation inner products $\langle \bar{\psi}_l, \phi_k \rangle$ are stored in the results structure for further inspection, and if the algorithm displays its status in the command window.

IcTKM Algorithm Configuration
Field Description
options.IcTKM_numIter Sets the number of iterations
options.IcTKM_storeRecCorr Set to 1 to return the dictionary correlation inner products
options.IcTKM_verbose Set to 1 to display the algorithm progress and status in the console

The run mode of the IcTKM algorithm can be configured with the struct fields described in the next table. The algorithm can be configured to run in parallel mode where the dictionary update loop is parallelized via 'parfor'. The algorithm can also be set to run in CUDA® enabled GPUs such that all vectorized computations will be performed directly in the GPU.

IcTKM Run Mode Configuration
Field Description
options.IcTKM_runMode_parallel Set to 1 to enable parallelization of dictionary updates
options.IcTKM_runMode_parallel_boost_workers Set to 1 to use the maximum number of available threads
options.IcTKM_runMode_useGPU Set to 1 to use a CUDA capable GPU to speed-up vectorized calculations

The configuration details of the Johnson-Lindenstrauss (JL) embedding are described in the following table. Here we can use JL embeddings based on orthogonal transforms such as the unitary discrete Fourier and cosine matrices, and also circulant matrix constructions. The algorithm can also be set to run with no data embedding.

Johnson-Lindenstrauss Embedding Configuration
Field Description
options.IcTKM_embedding_PartialFourier Set to 1 to use a JL matrix based on the partial discrete Fourier construction
options.IcTKM_embedding_PartialDCT Set to 1 to use a JL matrix based on the partial discrete cosine construction.
options.IcTKM_embedding_PartialCirculant Set to 1 to use a JL matrix based on the circulant construction
options.IcTKM_embedding_dimension Sets the dimension of the embedded space
options.IcTKM_embedding_disabled Set to 1 to disable data embedding

The dictionary learning configuration parameters are described next. In particular, here we can set the ambient dimension $d$, the dictionary overcomplete ratio $K/d$, the sample size ratio $\alpha$, where $N= \alpha \cdot K \log K$, and the toolbox random number generator seed for reproducible results.

Simulation Configuration
Field Description
options.ambientDimension Sets the learning problem ambient dimension
options.overcompletenessRatio Sets the learning problem overcompleteness ratio
options.SampleSizeRatio Sets the learning problem sample size ratio
options.randomSeed Sets the random seed for reproducible results

The struct fields for configuring the synthetic training data generation are described next. Here we can set the training data sparsity level $S$, the noise level $\rho$, and whether a 'flat' generating sequence $c$ is used where $c(k)=1/\sqrt{S}$ for $k \leq S$ and $c(k) = 0$ for $k > S$, or a geometric sequence $c$ is used where $c(k)=c_b^k/||c||_2$ for $k \leq S$ and $c(k)=0$ for $k > S$ with $c_b$ uniformly distributed in $[1-b,1]$ for some $0 < b < 1$. Here we can also set the training data dynamic range and the signal-to-noise ratio $E( || \Phi x_{c,p,\sigma} ||_2^2) / E( ||r ||_2^2 )$. Lastly, we may also set the type of generating dictionary $\Phi$; uniformly at random from the unit sphere in $\Re^d$ and as the union of two bases in $\Re^d$, i.e., the Dirac basis and the first half elements of the discrete cosine (or Hadamard) transform basis.

Synthetic Training Data Configuration
Field Description
options.DataModeSynthetic Set to 1 to use synthetic training data in the simulation
options.SyntheticData_S Sets the synthetic data sparsity level
options.SyntheticData_noiseLevel Sets the synthetic data noise level
options.SyntheticData_flatCoeff Set to 1 to use data with flat coefficients
options.SyntheticData_fixedCoeffDecay Set to 1 to use data with a fixed coefficient decay
options.SyntheticData_dynamicRange Sets the synthetic data dynamic range in dB
options.SyntheticData_origDicoRandom Set to 1 to use the random dictionary for generating the synthetic data
options.SyntheticData_origDicoDiracDCT Set to 1 to use the Dirac+DCT dictionary for generating the synthetic data
options.SyntheticData_origDicoDiracHadamard Set to 1 to use the Dirac+Hadamard dictionary for generating the synthetic data

The the audio training data configuration is described in the next table. Here we can set the training data sparsity level $S$ and the audio file path. As we partition the audio file in smaller blocks (frames), we can also set the size of the audio frames and the amount of overlap of one frame to the next.

Audio Training Data Configuration
Field Description
options.DataModeAudio Set to 1 to use audio training data in the simulation
options.AudioData_S Set the audio data sparsity level
options.AudioData_frameLengthInSeconds Set the audio data frame length in seconds
options.AudioData_frameOffsetInSeconds Set the frame offset duration in seconds
options.AudioData_fileName Set the location of the audio training file

Lastly, in the following table we describe the struct fields for configuring the dictionary initialization for the IcTKM algorithm.

Dictionary Initialization Configuration
Field Description
options.DicoInit_random Set to 1 to use a random input dictionary
options.DicoInit_oracle Set to 1 to use the generating dictionary as the input dictionary

References

[1]
F. Teixeira and K. Schnass, "Compressed Dictionary Learning," arXiv 1805.00692, May, 2018.