|dc.description.abstract||The massive amount of current data has led to many different forms of data analysis processes that aim to explore this data to uncover valuable insights such as trends, anomalies and patterns. These processes support decision makers in their analysis of varied and changing data ranging from financial transactions to customer interactions and social network postings. These data analysis processes use a wide variety of methods, including machine learning, in several domains such as business, finance, health and smart cities.
Several data analysis processes have been proposed by academia and industry, including CRISP-DM and SEMMA, to describe the phases that data analysis experts go through when solving their problems. Specifically, CRISP-DM has modeling as one of its phases, which involves selecting a modeling technique, generating a test design, building a model, and assessing the model. However, automating these data analysis modeling processes faces numerous challenges, from a software engineering perspective. First, software users expect increased flexibility from the software as to the possible variations in techniques, types of data, and parameter settings. The software is required to accommodate complex usage and deployment variations, which are difficult for non-experts. Second, variability in functionality or quality attributes increases the complexity of these systems and makes them harder to design and implement. There is a lack of a framework design that takes variability into account. Third, the lack of a more comprehensive analysis of variability makes it difficult to evaluate opportunities for automating data analysis modeling.
This thesis proposes a variability-aware design approach to the data analysis modeling process. The approach involves: (i) the assessment of the variabilities inherent in CRISP-DM data analysis modeling and the provision of feature models that represent these variabilities; (ii) the definition of a preliminary framework design that captures the identified variabilities; and (iii) evaluation of the framework design in terms of possibilities of automation. Overall, this work presents, to the best of our knowledge, the first approach based on variability assessment to design data modeling process such as CRISP-DM. The approach advances the state of the art by offering a variability-aware design a solution that can enhance system flexibility and a novel software design framework to support data analysis modeling.||en