THESIS PRESENTATION

Title:  Scientific Data Augmentation using controllable GAN

Student Name:  Banooqa Banday

Advisor:  Dr. Tanzima Islam

Date/Time: December 2, 2022/11:00 a.m.

 

Location: Zoom (Online): https://txstate.zoom.us/j/98698827193

 

Abstract: Insufficient data is one of the most common challenges in implementing machine learning models, whose performance depends almost entirely on the quantity and quality of the available data. As a result, data augmentation using generative models, specifically Generative Adversarial Networks (GANs), has become increasingly common in the past few years. It has played an important role in improving the performance and accuracy of machine learning models by scaling up the data size without exhausting and costly collection processes. In the paper "Performance Optimality or Reproducibility: That is the Question," the authors discuss at length the challenges they faced while collecting a dataset that captures the performance of applications at various scales, network bandwidth levels, placement styles, and power limits. Collecting such scientific data requires dedicated time on a cluster. Even when that dedicated time is granted, as it was for the authors of that paper after months of waiting, it is still limited and does not ensure that all possible data points are collected. Augmenting the limited existing data with samples generated by a GAN would be a good alternative. However, most of the features in scientific data are nominal, and GANs are reported to perform poorly when the data is not continuous. We propose using controllable generation to augment scientific data and evaluate its effectiveness in improving the quality of predictive modeling with the augmented data.

Deadline: Dec. 31, 2022, midnight