About me

I'm a May 2020 graduate of Texas State University with a Master's in Computer Science. I have a passion for engineering and data-driven work, and I'm actively looking for a role where I can build a career in data engineering and analytics. While my background has primarily been in hardware development, I've decided to shift into software-oriented work to become more effective in a "big data" world. My studies at Texas State University focused on data-driven research methods and software implementations. My coursework includes an independent study in the Efficient Computing Laboratory with Martin Burtscher as my advisor.

In line with these goals, I've also taken courses in parallel programming, machine learning, data mining, and information retrieval. I'm familiar with exploratory data analysis using Python packages such as Pandas, scikit-learn, and Gensim. Currently, I'm learning modern big data technology stacks where I can, and to sharpen my developer skills, I'm contributing to a Django-built web application that uses Elasticsearch. Please visit the GitHub link for more technical details and other projects.

Research projects

Spring 2020. Improvement on fast unfolding of communities in large networks. This study investigates ways to speed up community detection in graph networks with the Louvain method, a modularity optimization algorithm. I am exploring how parallel processing can improve the clustering of community structures in very large networks, either by reducing overall computation time or by making reasonably accurate results available at intermediate time steps.

Community discovery is used to analyze embedded groupings of nodes at varying granularities, for example social communities at local, city, and national levels. Since manual inspection is not feasible with big data, a fast method for community discovery is beneficial when decisions must be made in a reasonable amount of time.
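
For readers unfamiliar with modularity, the quantity the Louvain method tries to maximize, here is a minimal sketch of how it can be computed for a given partition. This is an illustration only, not code from the project: the function name `modularity`, the edge-list input format, and the toy graph are all assumptions made for the example, and it handles only undirected, unweighted graphs without self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once (hypothetical format).
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints inside community c
    degree_sum = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    # Q = sum over communities of (internal edge share - expected share at random)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "a" for node in range(6)}

print(round(modularity(edges, split), 3))   # one community per triangle: 0.357
print(round(modularity(edges, single), 3))  # everything in one community: 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of improvement the Louvain method searches for by repeatedly moving nodes between communities.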

>>See this project's progress blog<<

Fall 2019. Data science pipeline for Twitter data. This work encompassed exploratory data analysis on 3.1 million Twitter documents (stored in MongoDB) relating to a broad social media movement. I used the scikit-learn and Gensim toolkits to find user communities and extract the specific topic phrases occurring within them.

Reach out

sbs98 at txstate dot edu

LinkedIn | GitHub