Guler, Goksen
2021-09-17
2021-08-30
http://hdl.handle.net/10012/17415

Technological advancements and the COVID-19 pandemic accelerated the adoption of technologies, services, and computers. Public cloud services saw a 17% increase, and national and industrial actors increasingly adopted software services such as online video conferencing tools. As the adoption, connectivity, and cybersecurity risks of services and systems grew, security became a crucial concern. The heightened interest in security from individuals, organizations, and national actors is not without cause: security breaches by malicious actors surged in parallel. Security researchers and experts apply their expertise to counter these threats, and the side-channel domain is an active research topic among them. Side-channel information is information that a system leaks involuntarily, and it can represent a vulnerability for corporations and individuals alike. Security researchers and malicious actors have shown that side-channel information can be used both to attack and to protect systems. For instance, a malicious actor can attack a system by extracting secrets from side-channel information such as power consumption or electromagnetic emissions. Conversely, side channels such as cache behavior and power consumption can help protect a system by detecting malware and attacks against it. Side-channel information can be analyzed with different methodologies, such as machine learning. Studies have shown that machine-learning models can process side-channel information and achieve the analysis goals with high accuracy and precision. However, machine-learning algorithms require large datasets, which in this case means a large number of samples from the side channels in use.
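A rough sense of how large such datasets become can be obtained by multiplying the number of targets, the channels per target, the sampling rate, and the storage size of each sample. The following is a hypothetical back-of-the-envelope sketch in Python; `AcquisitionConfig`, `bytes_per_second`, `dataset_size_bytes`, and every parameter value are illustrative assumptions, not figures or code from the thesis.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionConfig:
    # Hypothetical acquisition parameters; the values used below are
    # illustrative assumptions, not measurements from the thesis.
    targets: int
    channels_per_target: int
    sample_rate_hz: float
    bits_per_sample: int


def bytes_per_second(cfg: AcquisitionConfig) -> float:
    # Assume each sample is stored byte-aligned, so round bits up to whole bytes.
    bytes_per_sample = (cfg.bits_per_sample + 7) // 8
    # Raw data rate grows linearly with every factor.
    return cfg.targets * cfg.channels_per_target * cfg.sample_rate_hz * bytes_per_sample


def dataset_size_bytes(cfg: AcquisitionConfig, duration_s: float) -> float:
    # Total raw (uncompressed) volume for a capture of the given duration.
    return bytes_per_second(cfg) * duration_s


if __name__ == "__main__":
    # Assumed setup: 10 targets, 2 channels each, 1 MS/s, 12-bit samples.
    cfg = AcquisitionConfig(targets=10, channels_per_target=2,
                            sample_rate_hz=1e6, bits_per_sample=12)
    print(bytes_per_second(cfg) / 1e6)          # MB/s, prints 40.0
    print(dataset_size_bytes(cfg, 3600) / 1e9)  # GB per hour, prints 144.0
```

Because the raw data rate is linear in each factor, doubling the sampling rate or the number of targets doubles the bandwidth and storage that a collection setup must sustain.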
The need for such datasets motivates this thesis, which discusses the challenges of, and an approach to, collecting large datasets of side-channel information from multiple systems. The challenge of reliably capturing side-channel information for later analysis grows with the number of assessed targets, the number of channels, the sampling rate, and the resolution of each sample. Because side-channel data acquisition relies on physical access to the target systems, collecting data from several devices is challenging. Therefore, to support machine-learning models and a robust analysis process, side-channel data acquisition requires a scalable, decentralized, and consistent approach to data collection. To address the scalability of collecting side-channel information from several systems, we propose a data pipeline architecture for side-channel data acquisition that fulfills quality attributes such as maintainability, reusability, reliability, and scalability.

Language: en
Keywords: side-channel; data pipeline; data acquisition; object-storage; scalability; saam; architecture; machine learning
Decentralized Data Acquisition Pipeline with Machine Learning For Side-Channel Information
Master Thesis