|dc.description.abstract||Multitask learning (MTL) was originally defined by Caruana (1997) as "an approach to inductive transfer that improves learning for one task by using the information contained in the training signals of other related tasks". In the linear model setting, this is often realized as joint feature selection across tasks, where features (but not necessarily coefficient values) are shared across tasks. In later work related to MTL, Jalali (2010) observed that sharing all features across all tasks is too restrictive in some cases, because commonly used composite absolute penalties (CAPs), such as the l(1,∞) norm, encourage not only common feature selection but also common parameter values across tasks. Jalali therefore proposed an alternative "dirty model" that can leverage shared features even when not all features are shared across tasks. The dirty model decomposes the coefficient matrix Θ into a row-sparse matrix B and an elementwise-sparse matrix S in order to better capture structural differences between tasks.
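For reference, the dirty model objective can be written roughly as follows. This is our reading of Jalali (2010), in notation introduced here for illustration only: K settings with n samples each, design matrices X^{(k)} and responses y^{(k)} for setting k, B^{(k)} and S^{(k)} the k-th columns of B and S, and regularization weights λ_b and λ_s.

$$
\min_{B,\,S}\;\sum_{k=1}^{K}\frac{1}{2n}\Bigl\lVert y^{(k)} - X^{(k)}\bigl(B^{(k)} + S^{(k)}\bigr)\Bigr\rVert_2^2
\;+\;\lambda_b\,\lVert B\rVert_{1,\infty}\;+\;\lambda_s\,\lVert S\rVert_{1,1},
\qquad \Theta = B + S,
$$

$$
\lVert B\rVert_{1,\infty} = \sum_{j}\max_{k}\,\lvert B_{jk}\rvert,
\qquad
\lVert S\rVert_{1,1} = \sum_{j,k}\lvert S_{jk}\rvert .
$$

The l(1,∞) term charges each active row of B only for its largest entry, so entries below that maximum are free; this is what pushes the shared part toward common values. The l(1,1) term charges every nonzero entry of S individually.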
Multitask learning problems arise in many contexts, and one of the most pertinent is healthcare, where data from multiple patients must be used to learn a common predictive model. Often it is impossible to gather enough data from any one patient to accurately train a full predictive model for that patient. Learning in this context is further complicated by the presence of both individual differences between patients and population-wide effects common to most patients, which calls for a dirty model. Methods applied in the healthcare setting face two additional challenges: they must scale to big data, and they must produce interpretable models. Although Jalali's dirty model captures this mixture of shared and individual structure, it does not scale as well as many commonly used methods such as the Lasso, and it lacks a clean interpretation. This is particularly problematic in healthcare, because the model does not let us interpret a coefficient in relation to all settings: since the B coefficients for a given feature are not required to be equal across settings, a departure from the global model may be captured in either B or S, leading to ambiguity in interpreting potential main effects.
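The ambiguity between B and S can be made concrete with a toy coefficient matrix. The numbers below are hypothetical and not data from this work; the helper names `l1_inf`, `l1` and `penalty` and the weights `lam_b` and `lam_s` are illustrative assumptions. Both candidate decompositions reproduce Θ exactly, so they fit any data equally well, and between these two candidates the dirty-model penalty λ_b·l(1,∞)(B) + λ_s·l(1,1)(S) alone decides where the departure ends up.

```python
import numpy as np

def l1_inf(B):
    """l(1,inf) norm: sum over rows of the largest absolute entry."""
    return float(np.abs(B).max(axis=1).sum())

def l1(S):
    """Elementwise l1 norm."""
    return float(np.abs(S).sum())

def penalty(B, S, lam_b, lam_s):
    """Dirty-model style penalty (loss omitted: both candidates fit equally)."""
    return lam_b * l1_inf(B) + lam_s * l1(S)

# Hypothetical coefficients: rows = features, columns = settings.
theta = np.array([
    [1.0, 1.0, 1.0],  # feature 0: identical effect in every setting
    [2.0, 2.0, 3.0],  # feature 1: shared, but setting 2 departs upward
    [0.0, 0.0, 0.0],  # feature 2: inactive everywhere
])

# Candidate 1: the departure is absorbed by B; S is empty.
B1, S1 = theta.copy(), np.zeros_like(theta)

# Candidate 2: B holds one common value per row; the departure sits in S.
B2 = np.array([[1.0, 1.0, 1.0],
               [2.0, 2.0, 2.0],
               [0.0, 0.0, 0.0]])
S2 = theta - B2  # single nonzero entry: S2[1, 2] == 1.0

# Both candidates reproduce theta exactly.
assert np.allclose(B1 + S1, theta) and np.allclose(B2 + S2, theta)

for lam_b, lam_s in [(1.0, 0.5), (1.0, 2.0)]:
    p1 = penalty(B1, S1, lam_b, lam_s)
    p2 = penalty(B2, S2, lam_b, lam_s)
    stored_in = "B" if p1 < p2 else "S"
    print(f"lam_b={lam_b}, lam_s={lam_s}: departure stored in {stored_in}")
# prints:
# lam_b=1.0, lam_s=0.5: departure stored in S
# lam_b=1.0, lam_s=2.0: departure stored in B
```

Of the two candidates, the cheaper one stores the departure in S when lam_s < lam_b and in B when lam_s > lam_b, even though Θ itself never changes.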
We propose a "cleaner" dirty model, gLOP (global/LOcal Penalty), that represents global effects shared across settings as well as local setting-specific effects, much like the analysis of variance (ANOVA) in inferential statistics. However, the goal of ANOVA is not to build an accurate predictive model but to identify coefficients that are non-zero at a given level of statistical significance. By combining the dirty model's decomposition of Θ with the idea underlying ANOVA, we get the best of both worlds: an interpretable predictive model that can accurately recover the underlying structure of a given problem. gLOP is formulated as a block coordinate minimization problem that decomposes Θ into a global vector of coefficients g and a matrix of local setting-specific coefficients L. At each step, g is updated by applying the standard Lasso to the composite global design matrix, in which the design matrices of all settings are stacked vertically. In contrast, L is updated at each step by applying the standard Lasso separately to each setting. Another significant advantage of gLOP over previous dirty models is that it works with off-the-shelf Lasso implementations instead of the less frequently implemented penalties of the CAP family, such as the l(1,∞) norm. gLOP also has a significant advantage in computation time, as it takes larger steps toward the global optimum at each iteration. We present experimental results comparing gLOP to Jalali's dirty model in terms of both runtime and recovered structure.||en
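The alternating scheme described above can be sketched in Python. This is a minimal sketch under stated assumptions rather than the reference gLOP code: the helper names `lasso_cd` and `glop`, the fixed iteration counts, the penalty values `lam_g` and `lam_l`, and the toy data are all hypothetical. Each block update is an ordinary Lasso fit on partial residuals; a small coordinate-descent Lasso stands in for a standard Lasso package so that the block runs with NumPy alone.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n))||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Stand-in for a standard Lasso solver; scikit-learn's Lasso uses the same
    objective scaling (with fit_intercept=False).
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - Xw with w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]       # remove feature j's current contribution
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]       # add back the updated contribution
    return w

def glop(Xs, ys, lam_g, lam_l, n_outer=30):
    """Hypothetical gLOP sketch: alternate a global Lasso and per-setting Lassos."""
    K, p = len(Xs), Xs[0].shape[1]
    g, L = np.zeros(p), np.zeros((p, K))
    X_all = np.vstack(Xs)             # composite global design matrix
    for _ in range(n_outer):
        # Global step: fit what the local coefficients leave unexplained.
        r_all = np.concatenate(
            [y - X @ L[:, k] for k, (X, y) in enumerate(zip(Xs, ys))])
        g = lasso_cd(X_all, r_all, lam_g)
        # Local step: a separate Lasso per setting on the global residual.
        for k, (X, y) in enumerate(zip(Xs, ys)):
            L[:, k] = lasso_cd(X, y - X @ g, lam_l)
    return g, L

# Hypothetical toy problem: features 0 and 1 are shared by all settings,
# feature 2 matters only in setting 2.
rng = np.random.default_rng(0)
p, K, n = 5, 3, 100
g_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
L_true = np.zeros((p, K))
L_true[2, 2] = 3.0
Xs = [rng.standard_normal((n, p)) for _ in range(K)]
ys = [X @ (g_true + L_true[:, k]) + 0.1 * rng.standard_normal(n)
      for k, X in enumerate(Xs)]

g_hat, L_hat = glop(Xs, ys, lam_g=0.05, lam_l=0.1)
print(np.round(g_hat, 2))
print(np.round(L_hat, 2))
```

With these settings one would expect the shared effects to land in `g_hat` (slightly shrunk by the Lasso) and the setting-specific effect to land in column 2 of `L_hat`. Replacing `lasso_cd` with a library Lasso keeps the same two-step structure.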