Statistics and Actuarial Science
http://hdl.handle.net/10012/9934
Thu, 22 Oct 2020 21:50:51 GMT
Asymptotic Distribution of the Optimal Value in Random Linear Programs: Application to Maximum Expected Shortfall
http://hdl.handle.net/10012/16434
Hall, Jesse
The properties of risk measures are of fundamental concern in quantitative finance, particularly in times of uncertainty. We study the behaviour of the asymptotic distribution of the maximum expected shortfall of a portfolio that has both market and credit risk, where the marginal distributions of the risk factors are known but their joint distribution is unknown. Because the maximum expected shortfall has a form similar to an optimal transport problem with a stochastic cost function, we study the limiting behaviour of linear programs and derive a result for the asymptotic distribution of the optimal solution that is similar to the central limit theorem. We then present simulations of maximum expected shortfall for a portfolio consisting of two counterparties subject to credit and market risk, using the Basel IRB approach and a Merton single-factor copula model for portfolio losses. We observe that the histogram of maximum expected shortfall is well described by a generalized extreme value distribution with a negative shape parameter.
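As a rough illustration of why the problem resembles optimal transport, the sketch below poses a worst-case expected shortfall over all couplings of two discrete marginals as a linear program. It relies on the standard dual representation ES_alpha(L) = max{E[L Z] : 0 <= Z <= 1/(1 - alpha), E[Z] = 1}. The function name `max_expected_shortfall`, the toy marginals, and the loss matrix are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np
from scipy.optimize import linprog


def max_expected_shortfall(p, r, loss, alpha):
    """Worst-case ES_alpha of loss[i, j] over couplings pi with marginals p and r.

    Decision variables: pi (the coupling) and q (the tail-reweighted measure),
    both flattened row-major, stacked as x = [pi, q].
    """
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    loss = np.asarray(loss, dtype=float)
    n, m = loss.shape
    k = n * m

    # linprog minimises, so negate the objective sum(q * loss).
    c = np.concatenate([np.zeros(k), -loss.ravel()])

    # Equalities: row sums of pi = p, column sums of pi = r, sum(q) = 1.
    A_eq = np.zeros((n + m + 1, 2 * k))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j:k:m] = 1.0
    A_eq[n + m, k:] = 1.0
    b_eq = np.concatenate([p, r, [1.0]])

    # Inequalities: q <= pi / (1 - alpha), written as q - pi / (1 - alpha) <= 0.
    A_ub = np.hstack([-np.eye(k) / (1.0 - alpha), np.eye(k)])
    b_ub = np.zeros(k)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun


# Toy example: two risk factors, each 0 or 1 with probability 1/2, loss = X + Y.
worst_case_es = max_expected_shortfall(
    p=[0.5, 0.5], r=[0.5, 0.5], loss=[[0.0, 1.0], [1.0, 2.0]], alpha=0.5
)
```

In this toy case the comonotone coupling attains the maximum, so the worst-case value coincides with ES(X) + ES(Y) = 2. In the setting the abstract describes, the cost function is stochastic, which is what motivates studying the distribution of the optimal value.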
Thu, 08 Oct 2020 00:00:00 GMT

Cost-Efficient Contingent Claims with Choquet Pricing
http://hdl.handle.net/10012/16420
Zhu, Michael
We examine a problem in which an investor seeks the cheapest contingent claim that achieves a minimum performance subject to a maximum allowed risk exposure. Specifically, our problem minimizes a non-linear cost functional, subject to both a minimum performance measure and a maximum risk measure, where all expectations are taken in the sense of Choquet. Solutions to our problem are called cost-efficient claims and possess a desirable monotonicity property: the claims are anti-comonotonic with respect to the underlying asset, and therefore a hedge against its risk. By viewing our problem in the context of convex optimization, we apply a Karush-Kuhn-Tucker theorem to give necessary and sufficient conditions for cost efficiency. Such conditions also hold when the distortion functions are assumed to be absolutely continuous, but not necessarily continuously differentiable. This allows us to consider a broader set of risk measures, including the popular conditional value at risk (a.k.a. the expected shortfall). Under some additional assumptions, we explicitly characterize cost-efficient claims in closed form, thereby extending the results of Ghossoub. Finally, a numerical example is provided to illustrate our results in full detail.
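To make "expectation in the sense of Choquet" concrete, here is a minimal sketch for a discrete random variable under a distorted probability T∘P. The helper name `choquet_expectation` and the toy distribution are assumptions made for illustration; the thesis works in a more general setting. The second distortion, T(u) = min(u / (1 - alpha), 1), is absolutely continuous but not continuously differentiable, and it recovers the conditional value at risk.

```python
import numpy as np


def choquet_expectation(values, probs, distortion):
    """Choquet integral of a discrete random variable w.r.t. distortion(P).

    `distortion` maps [0, 1] to [0, 1], is non-decreasing, and satisfies
    distortion(0) = 0 and distortion(1) = 1.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)           # sort outcomes in ascending order
    x, p = values[order], probs[order]
    survival = np.cumsum(p[::-1])[::-1]  # P(X >= x_i) for each sorted outcome
    t = distortion(np.append(survival, 0.0))
    weights = t[:-1] - t[1:]             # distorted probability mass on x_i
    return float(np.dot(x, weights))


values = [1.0, 2.0, 3.0, 4.0]
probs = [0.25, 0.25, 0.25, 0.25]
alpha = 0.5

# Identity distortion: the Choquet integral reduces to the ordinary mean.
plain_mean = choquet_expectation(values, probs, lambda u: u)

# CVaR distortion: average of the worst (largest) 1 - alpha fraction of outcomes.
cvar = choquet_expectation(
    values, probs, lambda u: np.minimum(u / (1.0 - alpha), 1.0)
)
```

Here `plain_mean` is 2.5 and `cvar` is 3.5, the average of the two largest outcomes.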
Wed, 30 Sep 2020 00:00:00 GMT

Peptide Sequencing with Deep Learning
http://hdl.handle.net/10012/16412
Qiao, Rui
In shotgun proteomics, de novo peptide sequencing from tandem mass spectrometry data is the key technology for finding new peptide or protein sequences. It has successful applications in assembling monoclonal antibody sequences and great potential for identifying neoantigens for personalized cancer vaccines. In this thesis, I propose a novel deep neural network-based de novo peptide sequencing model: PointNovo. The proposed PointNovo model not only outperforms the previous state-of-the-art model by a significant margin but also solves the long-standing accuracy–speed/memory trade-off problem that exists in previous de novo peptide sequencing tools. Further, our experimental results show that even though PointNovo is not trained to distinguish between true and false peptide-spectrum matches, its resulting log probability score can be used as a scoring function to perform database searching. On several different datasets, we show that PointNovo, when used as a database search engine, can achieve an identification rate that is at least comparable to that of existing popular database search software.
We also extend and adapt an existing model to process Data Independent Acquisition (DIA) data and propose the first de novo peptide sequencing algorithm for DIA tandem mass spectra.
Finally, we develop a workflow that can identify tumor-specific antigens directly and purely from mass spectrometry data of tumor tissues and test it on a published dataset of tumor samples from melanoma patients. Our workflow applies de novo peptide sequencing to detect mutated endogenous peptides, in contrast to the prevalent indirect approach of combining exome sequencing, somatic mutation calling, and epitope prediction in existing methods. More importantly, we develop machine learning models that are tailored to each patient based on their own MS data. Such a personalized approach enables accurate identification of neoantigens for the development of personalized cancer vaccines. We applied the workflow to datasets of five melanoma patients and expanded their immunopeptidomes by 5% to 15%. Subsequently, we discovered 17 neoantigens of both HLA–I and HLA–II, including those with validated T cell responses and novel neoantigens that had not been reported in previous studies.
Wed, 30 Sep 2020 00:00:00 GMT

Matrix-Variate Regression with Measurement Error
http://hdl.handle.net/10012/16391
Fang, Junhan
Matrix-variate regression models are useful for modeling data with a matrix structure, such as brain imaging data. However, those methods do not apply to data with measurement error or misclassification. While mismeasurement is an inevitable issue in the data collection process, little research is available on handling matrix-variate regression with mismeasurement. In this thesis, we explore several important problems concerning matrix-variate regression with error-contaminated data.
In Chapter 1, we provide a brief introduction to matrix-variate data and review relevant topics including logistic regression analysis, measurement error/misclassification mechanisms, regularization methods, and Bayesian inference procedures.
In Chapter 2, we discuss matrix-variate logistic regression for handling error-contaminated data. Measurement error in covariates has been extensively studied in many conventional regression settings where covariate information is typically expressed in a vector form. However, there has been little work on error-prone matrix-variate data, which commonly arise in studies involving imaging or spatio-temporal structures. We particularly focus on matrix-variate logistic measurement error models. We examine the biases induced by the naive analysis, which ignores measurement error. Two measurement error correction methods are developed to adjust for measurement error effects. The proposed methods are justified both theoretically and empirically. We analyze a data set arising from a study examining electroencephalography (EEG) correlates of genetic predisposition to alcoholism with the proposed methods.
In Chapter 3, we consider a problem complementary to that in Chapter 2. Instead of examining mismeasurement in covariates, here we study mismeasurement in binary responses. We particularly investigate the effects of response misclassification on the matrix-variate logistic regression model. Matrix-variate logistic regression is useful for modeling the relationship between a binary response and matrix-variates, which arise commonly in medical imaging research. However, such a model is impaired by the presence of response misclassification. It is imperative to account for misclassification effects when employing matrix-variate logistic regression to handle such data. In this chapter, we develop two inferential methods which account for misclassification effects. The first method is an imputation method which replaces the response variable with an observed and unbiased pseudo-response variable in the estimation procedure. The second method is derived from the likelihood function for the observed response surrogate. Our development is carried out for two settings where misclassification rates are either known or estimated from validation data. The proposed methods are justified both theoretically and empirically. We analyze the breast cancer Wisconsin prognostic data with the proposed methods.
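The two ideas in this chapter can be illustrated with a common textbook construction for a misclassified binary response; the thesis may define its estimators differently. The sketch assumes known misclassification rates gamma0 = P(Y* = 1 | Y = 0) and gamma1 = P(Y* = 0 | Y = 1) with gamma0 + gamma1 < 1, where Y* is the observed surrogate and pi is the true success probability given the covariates. All names and numbers below are hypothetical.

```python
import math


def observed_success_prob(pi, gamma0, gamma1):
    """P(Y* = 1 | X) implied by the true probability pi = P(Y = 1 | X)."""
    return gamma0 + (1.0 - gamma0 - gamma1) * pi


def pseudo_response(y_star, gamma0, gamma1):
    """Rescaled surrogate whose conditional mean equals pi (unbiased for Y)."""
    return (y_star - gamma0) / (1.0 - gamma0 - gamma1)


def surrogate_log_likelihood(y_stars, pis, gamma0, gamma1):
    """Log-likelihood of the observed surrogates given the true probabilities."""
    total = 0.0
    for y_star, pi in zip(y_stars, pis):
        p_obs = observed_success_prob(pi, gamma0, gamma1)
        total += y_star * math.log(p_obs) + (1 - y_star) * math.log(1.0 - p_obs)
    return total


# Check unbiasedness of the pseudo-response for illustrative values.
pi, gamma0, gamma1 = 0.3, 0.1, 0.2
p_obs = observed_success_prob(pi, gamma0, gamma1)
expected_pseudo = (p_obs * pseudo_response(1, gamma0, gamma1)
                   + (1.0 - p_obs) * pseudo_response(0, gamma0, gamma1))
```

With these values `p_obs` is 0.31 rather than 0.3, which is the distortion a naive analysis would absorb into its estimates, while `expected_pseudo` recovers 0.3.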
Chapter 4 is a continuation and extension of Chapter 3. We consider regularized matrix-variate logistic regression with response misclassification, where matrix-variate data may assume a sparsity structure. With a limited sample size, the presence of a large number of redundant parameters makes estimation difficult. In this chapter, we develop inferential methods which account for misclassification effects in combination with the inclusion of penalty functions to deal with the sparsity of matrix-variate data. We examine the biases induced by the naive analysis, which ignores the response misclassification. Our development is carried out for two settings where misclassification rates are either known or estimated from validation data. The proposed methods are justified both theoretically and empirically. We analyze the breast cancer Wisconsin prognostic data with the proposed methods.
In Chapter 5, we shift our attention to the Bayesian framework. We consider applying Bayesian analysis to matrix-variate logistic regression. We propose a Bayesian algorithm that estimates the matrix-variate parameters element-wise in combination with a horseshoe shrinkage prior. We investigate the influence on parameter estimation of ignoring the response misclassification and propose an algorithm to accommodate the effects of response misclassification. The performance of the proposed method is evaluated through numerical studies. We analyze the Lee Silverman voice treatment (LSVT) Companion data with the proposed method.
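For reference, the horseshoe prior in its usual hierarchical form is written out below for a generic coefficient. The symbols (beta_j for a coefficient, lambda_j for its local scale, tau for the global scale) are notation chosen here, and the half-Cauchy hyperprior on tau is one common choice; the abstract does not state which hyperpriors the thesis uses.

```latex
\begin{align*}
  \beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^{2}\,\tau^{2}\right), \\
  \lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
  \tau &\sim \mathrm{C}^{+}(0, 1),
\end{align*}
```

Here C^{+}(0, 1) denotes the standard half-Cauchy distribution. The heavy tails of the local scales leave large coefficients nearly unshrunk, while the mass near zero shrinks small coefficients strongly, which suits sparse matrix-variate parameters.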
Finally, Chapter 6 summarizes the thesis work and presents some future work.
Mon, 28 Sep 2020 00:00:00 GMT