Compressed Dictionary Learning Toolbox
The compressed dictionary learning toolbox provides functions to learn dictionaries from large-scale synthetic and audio training data with the Iterative Compressed-Thresholding and K-Residual-Means (IcTKM) algorithm introduced in [1]. This documentation gives an overview of the toolbox and walks through a simple dictionary learning example to get started. We also discuss how the various toolbox scripts and functions are organized and how they can be used to learn dictionaries. Links to the audio files containing the sonification of dictionaries learned on audio training data are provided. Lastly, we provide instructions for reproducing the numerical results presented in [1].
Overview
Sparsity in a dictionary is an effective and widely used low-complexity model of high-dimensional data. In the sparse model we assume the existence of a dictionary in which every signal in the data class at hand can be approximated with a sparse linear expansion. Mathematically, we say that there exists a set of unit-norm vectors $\phi_1, \ldots, \phi_K \in \mathbb{R}^d$, referred to as atoms, such that every signal $y \in \mathbb{R}^d$ can be approximately represented in the dictionary as $y \approx \sum_{k \in I} x(k)\,\phi_k$, where $I \subseteq \{1, \ldots, K\}$ is an index set and $x$ is a sparse coefficient vector with $|I| = S$ and $S \ll d$. In dictionary learning we address the problem of (algorithmically) finding a dictionary which provides sparse representations of a set of signals. In its most general form dictionary learning can be seen as a matrix factorization problem. Given a set of $N$ signals represented by the $d \times N$ training data matrix $Y = (y_1, \ldots, y_N)$, decompose it into a $d \times K$ dictionary matrix $\Phi = (\phi_1, \ldots, \phi_K)$ and a $K \times N$ coefficient matrix $X = (x_1, \ldots, x_N)$, i.e., find $Y \approx \Phi X$, where every coefficient vector $x_n$ in $X$ is sparse.
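As a quick illustration of this factorization view, the following minimal sketch (with hypothetical problem sizes, not toolbox code) builds a random unit-norm dictionary and a sparse coefficient matrix, and synthesizes the corresponding training data:
% Minimal sketch of the factorization Y = Phi*X (illustrative, not toolbox code)
d = 32; K = 48; N = 500; S = 4;        % hypothetical problem sizes
Phi = randn(d,K);                      % random dictionary
Phi = Phi*diag(1./sqrt(sum(Phi.^2)));  % normalize the atoms to unit norm
X = zeros(K,N);                        % sparse coefficient matrix
for n = 1:N
    I = randperm(K,S);                 % random support of size S
    X(I,n) = randn(S,1);               % arbitrary non-zero coefficients
end
Y = Phi*X;                             % every column of Y is S-sparse in Phi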
The IcTKM algorithm is an alternating-minimization algorithm for dictionary learning which alternates between two main steps: updating the sparse coefficients based on the current version of the dictionary, and updating the dictionary based on the current version of the coefficients. For sparsity levels of practical interest, the most computationally expensive step in the alternating minimization is the sparse coefficient update, which entails the calculation of the matrix product between the training data $Y$ and the current dictionary $\Psi$. In the IcTKM algorithm, fast dictionary learning is achieved by compressing $Y$ and $\Psi$ into $m \ll d$ dimensions with a randomized Johnson-Lindenstrauss transform $\Gamma \in \mathbb{R}^{m \times d}$. More specifically, the matrix product $\Psi^\star Y$ is replaced by its compressed counterpart $(\Gamma\Psi)^\star(\Gamma Y)$ in the sparse coefficient update step. When the Johnson-Lindenstrauss transform is carried out with fast algorithms, the computational complexity of the costly update is reduced by a factor of the order $d/m$, up to logarithmic factors incurred by the fast transform. Given an input dictionary $\Psi = (\psi_1, \ldots, \psi_K)$, a training data matrix $Y = (y_1, \ldots, y_N)$, and an embedding dimension $m$, each iteration of the IcTKM algorithm is defined by the following steps:
- Draw the randomized Johnson-Lindenstrauss transform $\Gamma$.
- For each training signal $y_n$ in the training set $Y$, find its sparse support $I_n$ via compressed thresholding: $I_n = \arg\max_{I : |I| = S} \|(\Gamma\Psi_I)^\star \Gamma y_n\|_1$.
- For each atom $\psi_k$ in the dictionary, compute the K-residual means: $\bar\psi_k = \sum_{n : k \in I_n} \operatorname{sign}(\langle \Gamma\psi_k, \Gamma y_n \rangle)\,\big( y_n - P(\Psi_{I_n}) y_n + P(\psi_k) y_n \big)$, where $P(\cdot)$ denotes the orthogonal projection onto the span of its argument.
- Normalize the updated atoms: $\psi_k \leftarrow \bar\psi_k / \|\bar\psi_k\|_2$.
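To make these steps concrete, the sketch below implements a single IcTKM iteration on given training data and a current dictionary. It is an illustrative, unoptimized rendition that forms a partial-DCT Johnson-Lindenstrauss matrix explicitly; the toolbox implementation in ictkm.m and ictkm_core.m applies fast transforms without forming the matrix and may differ in detail:
% One IcTKM iteration: minimal sketch (see ictkm.m for the toolbox implementation)
% Assumes: Y (d x N) training data, Psi (d x K) dictionary with unit-norm
% columns, sparsity level S, and embedding dimension m are given.
[d,N] = size(Y); K = size(Psi,2);

% Step 1: draw the randomized JL transform (partial DCT with random sign flips)
C = dct(eye(d));                             % orthonormal DCT matrix
Gamma = sqrt(d/m)*C(randperm(d,m),:)*diag(sign(randn(d,1)));

% Step 2: compressed thresholding to find the sparse supports
ip = (Gamma*Psi)'*(Gamma*Y);                 % compressed inner products (K x N)
[~,order] = sort(abs(ip),1,'descend');
supports = order(1:S,:);                     % support of each training signal

% Step 3: compute the residual means
Psi_new = zeros(d,K);
for n = 1:N
    I = supports(:,n); yn = Y(:,n);
    res = yn - Psi(:,I)*(Psi(:,I)\yn);       % residual after projecting onto span(Psi_I)
    for k = I'
        Psi_new(:,k) = Psi_new(:,k) + ...
            sign(ip(k,n))*(res + Psi(:,k)*(Psi(:,k)'*yn));
    end
end

% Step 4: normalize the updated atoms (zero atoms would need to be redrawn)
Psi = Psi_new*diag(1./sqrt(sum(Psi_new.^2)));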
In this toolbox we provide the ability to efficiently learn dictionaries from high-dimensional audio and synthetic training data with IcTKM. Efficiency is achieved by using randomized Johnson-Lindenstrauss transforms which can be carried out with the fast Fourier transform (FFT). The toolbox can also fully utilize the available hardware. In particular, parallelization techniques can be used to process multiple training signals with the available CPU cores, and CUDA® capable graphics processing units (GPUs) can be used to speed up the most computationally demanding vectorized calculations.
The synthetic data is generated by using a sparse signal model with coefficients that follow a well-behaved distribution, i.e., the non-zero values follow a geometric sequence which gracefully decays in magnitude. Mathematically, we let each coordinate of our coefficient vector be defined as $x(p(j)) = \epsilon_j c_j$, where $p$ is a permutation sequence, $\epsilon_j$ is a random sign sequence, and $c_j$ is a geometric sequence defined as $c_j = c^j$ for $j \le S$ and $c_j = 0$ for $j > S$, with $c$ uniformly distributed in $[1-b, 1]$ for some $0 \le b < 1$. The synthetic signal then takes the form $y = \Phi x + r$, where $r$ is noise defined by a Gaussian random vector with mean zero and variance $\rho^2$, see [1] for details.
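A minimal sketch of this coefficient model is shown below. It is illustrative only; the toolbox generates its training data with make_sparse_signal.m, and the placeholder dictionary and unit-norm coefficient normalization used here are assumptions:
% Sketch of the synthetic signal model (illustrative; see make_sparse_signal.m)
d = 256; K = 384; S = 4; b = 0.1; rho = 0; % hypothetical model parameters
Phi = randn(d,K);                          % placeholder generating dictionary
Phi = Phi*diag(1./sqrt(sum(Phi.^2)));      % normalize the atoms
p = randperm(K);                           % random permutation sequence
epsj = sign(randn(S,1));                   % random sign sequence
c = 1 - b*rand;                            % decay factor, uniform in [1-b,1]
cj = c.^(1:S)'; cj = cj/norm(cj);          % geometric sequence (unit norm assumed)
x = zeros(K,1); x(p(1:S)) = epsj.*cj;      % S-sparse coefficient vector
y = Phi*x + rho*randn(d,1);                % sparse expansion plus Gaussian noise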
The audio training data can be input as an audio file in WAVE, FLAC, MP3, or OGG format. The (potentially long) signal in the input audio file is partitioned into smaller signals which are allowed to overlap. Dictionary learning from audio data is carried out directly on the time-domain audio samples with no frequency transformation taking place, as opposed to what is usually done in common audio processing tasks.
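As an illustration of this partitioning step, the sketch below cuts a mono version of an input audio file into overlapping time-domain blocks; the file name, block length, and overlap are hypothetical choices, and the toolbox implementation lives in its audio data class:
% Sketch: partition an audio file into overlapping time-domain training signals
[xa,fs] = audioread('training.wav');       % hypothetical input audio file
xa = mean(xa,2);                           % sum the channels to mono
blockLen = round(0.25*fs);                 % block duration in samples
hop = round(blockLen/2);                   % 50% overlap between blocks (assumed)
numBlocks = floor((length(xa)-blockLen)/hop) + 1;
Y = zeros(blockLen,numBlocks);
for n = 1:numBlocks
    Y(:,n) = xa((n-1)*hop + (1:blockLen)); % each column is one training signal
end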
Getting Started
In the following example code we will see how to use IcTKM to learn dictionaries from synthetic training data. First we define the dictionary learning problem parameters, i.e., the ambient dimension $d$, the embedding dimension $m$, and the number of atoms $K$ in the generating dictionary $\Phi$. In the example below we use $m = d/2$, which amounts to a data compression ratio of $2$, and an overcomplete generating dictionary with $K = 3d/2$.
d = 256;         % signal dimension
m = ceil(d/2);   % embedding dimension
K = ceil(3*d/2); % number of atoms in the dictionary
Next, we define the training data generation parameters, i.e., the sparsity level $S$, the Gaussian noise level $\rho$, the decay parameter $b$ for the geometric sequence, and the number of training signals $N$. We also set the random number generator seed for reproducible results.
S = 2;      % sparsity level (4,8,12,16)
rho = 0;    % noise level (0,1/sqrt(d))
b = 0.1;    % coefficient decay parameter (0,1)
N = 100000; % number of training signals per iteration
rng(1);     % random generator seed
The generating dictionary $\Phi$ is defined as the union of two bases in $\mathbb{R}^d$, the Dirac basis and the first half of the elements of the discrete cosine transform basis.
dico = [eye(d),dct(eye(d))]; % Dirac basis plus DCT basis
dico(:,K+1:2*d) = [];        % keep only the first K atoms
To generate the synthetic training signals we invoke the toolbox function make_sparse_signal.m
Y = make_sparse_signal(dico,N,S,S,b,rho);
The training signals are now stored in the data matrix $Y$. We now draw the initialization dictionary $\Psi$ uniformly at random from the unit sphere in $\mathbb{R}^d$ and run $100$ iterations of the IcTKM algorithm with the toolbox function ictkm_simplified.m
dinit = randn(d,K);                                 % create input dictionary
dinit = dinit*diag(1./sqrt(sum(dinit.*dinit)));     % normalize input dictionary
maxit = 100;                                        % set maximum number of iterations
rdico = ictkm_simplified(Y, K, S, m, maxit, dinit); % run the IcTKM algorithm
The output (recovered) dictionary is now stored in the matrix rdico. We now evaluate the recovery performance of IcTKM. Given a generating atom $\phi_j$ from $\Phi$, the criterion for declaring $\phi_j$ as recovered is based on the dictionary correlation inner products $\langle \psi_k, \phi_j \rangle$ with the atoms $\psi_k$ of the recovered dictionary, more specifically via $\max_k |\langle \psi_k, \phi_j \rangle| > 0.99$. In the example code below we count the number of recovered atoms following this criterion and compute the percentage of recovered atoms.
rcorr = max(abs(rdico'*dico));             % correlation of each generating atom with its best match
rec_count = find(rcorr > 0.99);            % count the number of recovered atoms
percent_recov = (length(rec_count)/K)*100; % calculate percentage of recovered atoms
display(['percent of recovered atoms: ' num2str(percent_recov) '%']);
percent of recovered atoms: 91.6667%
To run this dictionary learning example and for more details, refer to the toolbox script ictkm_example_script.m
Listening to the Audio Dictionaries
For the audio training data we have used several audio recordings from the RWC database. The recordings comprised stereo audio signals sampled at 44.1 kHz. We first summed the audio signals to mono and then partitioned the resulting signals into smaller blocks of $0.25$ and $1$ second duration. The blocks were allowed to overlap, with the maximal amount of overlap of one block with a shifted version of itself depending on the block duration; see Section 4.2 of [1] for the exact overlaps, the number of IcTKM iterations, and the compression ratio used to learn the dictionaries. The sparsity level was fixed and we have learned dictionaries of size $K = 64$ and $K = 256$ atoms. To create an audio representation of the learned dictionaries, we first sorted the atoms by their fundamental frequency, normalized their entries to the range $[-1, 1]$, and then concatenated the renormalized atoms into a long array. A short pause has also been added between the atoms for an improved listening experience (a minimal sketch of this sonification step is shown after the list below). The data in the long array was saved to audio files, which can be listened to by following the links below:
- classical piano recordings: 0.25 second training blocks (K=64 and K=256) and 1 second training blocks (K=64 and K=256)
- flamenco recordings: 0.25 second training blocks (K=64 and K=256) and 1 second training blocks (K=64 and K=256)
- Japanese folk recordings: 0.25 second training blocks (K=64 and K=256) and 1 second training blocks (K=64 and K=256)
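The sketch below illustrates this sonification procedure for a learned dictionary stored in a matrix rdico (as in the Getting Started example); the fundamental frequency estimate, the pause length, and the output file name are hypothetical choices, and the toolbox's own synthesis utility in the utils folder may differ:
% Sketch: sonify a learned dictionary rdico (d x K) at sampling rate fs
fs = 44100;
[d,K] = size(rdico);
spec = abs(fft(rdico));                              % magnitude spectra of the atoms
[~,f0] = max(spec(1:floor(d/2),:),[],1);             % crude fundamental frequency estimate
[~,order] = sort(f0);                                % sort the atoms by fundamental frequency
atoms = rdico(:,order);
atoms = bsxfun(@rdivide,atoms,max(abs(atoms),[],1)); % normalize entries to [-1,1]
sil = zeros(round(0.5*fs),1);                        % pause between atoms (length assumed)
out = reshape([atoms; repmat(sil,1,K)],[],1);        % concatenate atoms and pauses
audiowrite('dictionary_sonification.wav',out,fs);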
Organization of the Toolbox
The compressed dictionary learning toolbox contains several folders, each grouping functions and scripts of related purpose. We now list these folders and describe their most important functions. For more details refer to the toolbox m-files, which can be accessed by following the links below.
- algorithm: This directory contains the IcTKM algorithm and auxiliary scripts. In ictkm_redraw_embedding.m we have the function to construct randomized Johnson-Lindenstrauss embeddings based on the Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), and the Circulant Rademacher Transform (CRT). In ictkm_core.m and ictkm_core_parfor.m we have the functions to update the coefficients and dictionary, sequentially or in parallel by using the parfor construction. Lastly, in ictkm.m we have the IcTKM algorithm implementation with the main algorithm flow.
- data: This directory contains the MATLAB classes for generating synthetic training signals ictkm_synthetic_data.m and audio training signals ictkm_audio_data.m. These two classes inherit from the abstract data class ictkm_data.m, which contains the interface getDataPoint() for online data generation. With this abstract class interface we can remove the dependence of the IcTKM algorithm on a particular training data type (a rough sketch of this interface is shown after the list below).
- example: Contains the scripts to reproduce the dictionary learning example in the Getting Started section of this documentation.
- init: Contains functions for creating an input dictionary for the IcTKM algorithm, such as in ictkm_dico_init.m, as well as miscellaneous initialization functions used in the toolbox.
- simulation: Contains the scripts for reproducing the numerical simulations presented in [1]. In particular, ictkm_synthetic_simulation.m and ictkm_scalability_simulation.m can be used for running the dictionary learning simulations with synthetic data in Secs. 4.1.1 to 4.1.3 and the scalability simulations of Section 4.1.4, respectively. Similarly, ictkm_audio_simulation.m has been used to run the audio simulations of Sec. 4.2. Lastly, ictkm_simulation.m is the main simulation function which is used by the specific simulation scripts in the folder.
- utils: Contains utility scripts, such as functions for synthesizing the learned dictionaries into audio files, console formatting functions, a fast DCT implementation, and utility functions for properly initializing and using GPU acceleration.
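As a rough sketch of the data class design mentioned above (the actual class definitions in ictkm_data.m and its subclasses may differ, and the method signature here is an assumption), an abstract data interface in MATLAB could look as follows:
% Sketch of an abstract data interface (illustrative; see ictkm_data.m).
% In MATLAB, a classdef must live in its own file, e.g. example_data.m.
classdef (Abstract) example_data
    methods (Abstract)
        % Return the n-th training signal as a d x 1 vector; synthetic and
        % audio subclasses generate it online from their respective models.
        y = getDataPoint(obj, n)
    end
end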
In the following figure we illustrate the dependencies between the most important functions in the toolbox.
Reproducing the Numerical Results
As discussed in the previous section, the toolbox folder simulation contains all the required scripts for reproducing the numerical simulations of Section 4 in [1]. Specifically, the results for the synthetic training data in Sections 4.1.1 to 4.1.3 can be reproduced with ictkm_synthetic_simulation.m. The scalability experiments of Section 4.1.4 can be reproduced with ictkm_scalability_simulation.m. Lastly, the audio experiments of Sections 4.2.1 and 4.2.2 can be reproduced with ictkm_audio_simulation.m. For complete reference, we now describe in detail the options structure used by the main simulation script ictkm_simulation.m:
results = ictkm_simulation(options);
The IcTKM algorithm can be configured with the option structure fields in the following table. Here we can set the number of iterations, whether the dictionary correlation inner products are stored in the results structure for further inspection, and if the algorithm displays its status in the command window.
Field | Description |
---|---|
options.IcTKM_numIter | Sets the number of iterations |
options.IcTKM_storeRecCorr | Set to 1 to return the dictionary correlation inner products |
options.IcTKM_verbose | Set to 1 to display the algorithm progress and status in the console |
The run mode of the IcTKM algorithm can be configured with the struct fields described in the next table. The algorithm can be configured to run in parallel mode, where the dictionary update loop is parallelized via 'parfor'. The algorithm can also be set to run on CUDA® enabled GPUs such that all vectorized computations are performed directly on the GPU.
Field | Description |
---|---|
options.IcTKM_runMode_parallel | Set to 1 to enable parallelization of dictionary updates |
options.IcTKM_runMode_parallel_boost_workers | Set to 1 to use the maximum number of available threads |
options.IcTKM_runMode_useGPU | Set to 1 to use a CUDA capable GPU to speed-up vectorized calculations |
The configuration details of the Johnson-Lindenstrauss (JL) embedding are described in the following table. Here we can use JL embeddings based on orthogonal transforms such as the unitary discrete Fourier and cosine matrices, and also circulant matrix constructions. The algorithm can also be set to run with no data embedding.
Field | Description |
---|---|
options.IcTKM_embedding_PartialFourier | Set to 1 to use a JL matrix based on the partial discrete Fourier construction |
options.IcTKM_embedding_PartialDCT | Set to 1 to use a JL matrix based on the partial discrete cosine construction |
options.IcTKM_embedding_PartialCirculant | Set to 1 to use a JL matrix based on the circulant construction |
options.IcTKM_embedding_dimension | Sets the dimension of the embedded space |
options.IcTKM_embedding_disabled | Set to 1 to disable data embedding |
The dictionary learning configuration parameters are described next. In particular, here we can set the ambient dimension $d$, the dictionary overcompleteness ratio $K/d$, the sample size ratio, which determines the number of training signals $N$ used per iteration, and the toolbox random number generator seed for reproducible results.
Field | Description |
---|---|
options.ambientDimension | Sets the learning problem ambient dimension |
options.overcompletenessRatio | Sets the learning problem overcompleteness ratio |
options.SampleSizeRatio | Sets the learning problem sample size ratio |
options.randomSeed | Sets the random seed for reproducible results |
The struct fields for configuring the synthetic training data generation are described next. Here we can set the training data sparsity level $S$, the noise level $\rho$, and whether a 'flat' generating sequence $c$ is used, where $c_j = 1/\sqrt{S}$ for $j \le S$ and $c_j = 0$ for $j > S$, or a geometric sequence $c$ is used, where $c_j = c^j$ for $j \le S$ and $c_j = 0$ for $j > S$ with $c$ uniformly distributed in $[1-b, 1]$ for some $0 \le b < 1$. Here we can also set the training data dynamic range and the signal-to-noise ratio. Lastly, we may also set the type of generating dictionary $\Phi$: either drawn uniformly at random from the unit sphere in $\mathbb{R}^d$, or built as the union of two bases in $\mathbb{R}^d$, i.e., the Dirac basis and the first half of the elements of the discrete cosine (or Hadamard) transform basis.
Field | Description |
---|---|
options.DataModeSynthetic | Set to 1 to use synthetic training data in the simulation |
options.SyntheticData_S | Sets the synthetic data sparsity level |
options.SyntheticData_noiseLevel | Sets the synthetic data noise level |
options.SyntheticData_flatCoeff | Set to 1 to use data with flat coefficients |
options.SyntheticData_fixedCoeffDecay | Set to 1 to use data with a fixed coefficient decay |
options.SyntheticData_dynamicRange | Sets the synthetic data dynamic range in dB |
options.SyntheticData_origDicoRandom | Set to 1 to use the random dictionary for generating the synthetic data |
options.SyntheticData_origDicoDiracDCT | Set to 1 to use the Dirac+DCT dictionary for generating the synthetic data |
options.SyntheticData_origDicoDiracHadamard | Set to 1 to use the Dirac+Hadamard dictionary for generating the synthetic data |
The audio training data configuration is described in the next table. Here we can set the training data sparsity level $S$ and the audio file path. As we partition the audio file into smaller blocks (frames), we can also set the size of the audio frames and the amount of overlap of one frame with the next.
Field | Description |
---|---|
options.DataModeAudio | Set to 1 to use audio training data in the simulation |
options.AudioData_S | Sets the audio data sparsity level |
options.AudioData_frameLengthInSeconds | Sets the audio data frame length in seconds |
options.AudioData_frameOffsetInSeconds | Sets the frame offset duration in seconds |
options.AudioData_fileName | Sets the location of the audio training file |
Lastly, in the following table we describe the struct fields for configuring the dictionary initialization for the IcTKM algorithm.
Field | Description |
---|---|
options.DicoInit_random | Set to 1 to use a random input dictionary |
options.DicoInit_oracle | Set to 1 to use the generating dictionary as the input dictionary |
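Putting the tables together, a hypothetical call to the main simulation function could look as follows; the field values below are examples only, not recommended settings:
% Illustrative options structure for ictkm_simulation.m (example values only)
options = struct();
options.IcTKM_numIter = 100;                % number of iterations
options.IcTKM_storeRecCorr = 0;             % do not store the correlation inner products
options.IcTKM_verbose = 1;                  % display progress in the console
options.IcTKM_runMode_parallel = 1;         % parallelize the dictionary update
options.IcTKM_runMode_useGPU = 0;           % run on the CPU
options.IcTKM_embedding_PartialDCT = 1;     % JL embedding via the partial DCT
options.IcTKM_embedding_dimension = 128;    % embedding dimension m
options.ambientDimension = 256;             % ambient dimension d
options.overcompletenessRatio = 1.5;        % overcompleteness ratio K/d
options.SampleSizeRatio = 1;                % sample size ratio (example value)
options.randomSeed = 1;                     % reproducible results
options.DataModeSynthetic = 1;              % use synthetic training data
options.SyntheticData_S = 4;                % sparsity level
options.SyntheticData_noiseLevel = 0;       % noiseless training signals
options.SyntheticData_fixedCoeffDecay = 1;  % geometric coefficient decay
options.SyntheticData_origDicoDiracDCT = 1; % Dirac+DCT generating dictionary
options.DicoInit_random = 1;                % random input dictionary
results = ictkm_simulation(options);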
References
[1] K. Schnass and F. Teixeira. Compressed dictionary learning. Journal of Fourier Analysis and Applications, 2020.