Machine Learning Methods for Annual Influenza Vaccine Update
Influenza is a public health problem that causes serious illness and deaths all over the world. Vaccination has been shown to be the most effective mean to prevent infection. The primary component of influenza vaccine is the weakened strains. Vaccination triggers the immune system to develop antibodies against those strains whose viral surface glycoprotein hemagglutinin (HA) is similar to that of vaccine strains. However, influenza vaccine must be updated annually since the antigenic structure of HA is constantly mutation. Hemagglutination inhibition (HI) assay is a laboratory procedure frequently applied to evaluate the antigenic relationships of the influenza viruses. It enables the World Health Organization (WHO) to recommend appropriate updates on strains that will most likely be protective against the circulating influenza strains. However, HI assay is labour intensive and time-consuming since it requires several controls for standardization. We use two machine-learning methods, i.e. Artificial Neural Network (ANN) and Logistic Regression, and a Mixed-Integer Optimization Model to predict antigenic variety. The ANN generalizes the input data to patterns inherent in the data, and then uses these patterns to make predictions. The logistic regression model identifies and selects the amino acid positions, which contribute most significantly to antigenic difference. The output of the logistic regression model will be used to predict the antigenic variants based on the predicted probability. The Mixed-Integer Optimization Model is formulated to find hyperplanes that enable binary classification. The performances of our models are evaluated by cross validation.