# Research...

...or what I do between surfing the internet, ferrying around kids, and drinking coffee.

In order to avoid having to deal with my proposal about 10 more times (2 kids :D), the Austrian Science Fund (FWF) granted me a START-project called 'Optimisation Principles, Models & Algorithms for Dictionary Learning' (Y760), which started 1st of June 2015.

For more details have a look at the START project page.

Before that I was investigating another FWF-project called 'Dictionary Learning for Biometric Data', see Dictionary Learning.

If dictionary learning is not your thing I'm still happy to talk about Function Identification, or the Old Sparse Stuff if this is more interesting to you.

In general, I'm quite happy to mess around with linear algebra, probability/measure theory, functional analysis, optimisation and numerics, severely traumatised (but sometimes strangely fascinated) by combinatorics, differential geometry and topology, and absolutely useless for all other areas of math.

# Dictionary Learning

If you have read any compressed sensing or sparse approximation papers, you will most likely have run into the statement 'you have a signal y which is sparse in a given basis/dictionary'. The question is just who gave you that basis/dictionary :D? In dictionary learning the tasks is exactly this. You have a bunch of signals, let's say K of them stored in a matrix Y, and you want to find a dictionary D with N atoms, to approximate all your signals with S atoms. So you want to find a factorisation of the matrix Y into the dictionary D and coefficients X, with S nonzeros in every column, such that the error in E is small.

Y=D*X + E, Y...(dxK), D....(dxN), X...(NxK), E...(dxK)

Then you can vary the question, how many atoms do I need for a given sparsity level and error, which sparsity level can I achieve, given the number of atoms and error? Also you can become more philosophic, how is the coherence of the atoms related to this, how can we characterise the usefulness of an atom?

As mentioned before these are all hard problems! Why? They are hard to treat theoretically because they are so non-linear and they are even hard to treat numerically as at some point you need to find sparse approximations to calculate the error, which is time consuming and depends on the algorithm you used.

The good news is that more and more people are interested, eg. the partners of the SMALL Project, and that the theory is catching up with the algorithms. So there are now theoretical results for l_1 minimisation (our old paper, John Wright et al., and Remi's new paper) and, I have to admit I'm proud of that, a theoretical result for the K-SVD minimisation principle, Karin's new paper.

If you read the new papers you can probably guess that a lot of the techniques can be recycled for other minimisation principles, which is what I'm working on now, and who knows maybe we can also tackle MOD, MAP, etc.

## January'14 update:

Dictionary identification has officially become a hot topic with lots of new developments, first this June'12 paper by Spielman et al., which I missed last time I updated the research page, and then we have Arora et al. from August'13, Agarwal et al. from September'13, Agarwal et al., from October'13, Arora et al., from January'14, and last week I finally managed to write up my May results.

## January'15 update:

Dictionary identification results are popping up like mushrooms. I have made one last attempt at collecting results (guaranteed incompleteness): A personal introduction to theoretical dictionary learning explains the topic and can be enjoyed with the whole family.

## Function Identification

This is what Massimo was paying me to work on, more details should still be available at START-Project - "Sparse Approximation and Optimization in High Dimensions".

Assume you want to learn a very high-dimensional function f(x) that maps the d-dimensional Euclidean ball to the real line. Well, without any other assumption you are in trouble and you will need an amount of samples exponential in d to approximate it - not good. However, if you assume some additional structure on the function, things look at lot better. So if your function can be written as f(x)= g(Ax), where A is a kxd matrix, k much smaller than d, with orthonormal, compressible rows, and g some function from the k-dimensional Euclidean ball to the real line, you can find some suggestions for learning it here. If you are in the lucky case that your A is actually part of the identity matrix, ie. g depends only on k out of d coordinates, an even more efficient strategy can be found here.

## Old Sparse Stuff

If you are interested in the things I did during my PhD, you are now obliged to read my thesis :D, since EPFL restructured their webpage and got rid of junk like my old research page.