Consent of the instructor.
Upon completing the course, students will be able to:
• Formulate concrete research questions to address business or scientific objectives
• Identify or collect data to answer research questions
• Design tools to process, clean, and organize data for subsequent analysis
• Create and run data processing and analysis pipelines to compute statistical results over large-scale data sets using modern high-performance computing infrastructure such as Apache Spark
• Present results clearly using data visualizations and written prose
• Interpret analysis results and identify their implications for business or scientific concerns
• Determine the appropriate data processing technology to support a desired analysis method
New course effective Fall 2017. Available only to computer science majors.