Haibo Yang (Spring 2024)
Personnel
Instructor: Haibo Yang, Ph.D., Assistant Professor, Dept. of Computing and Information Sciences
Contact: Rm 74-1073, hbycis@rit.edu
Time & Location: TuTh 2:00PM -- 3:15PM, Golisano Hall (GOL)-2455
Office Hours: Th 3:15PM -- 4:15PM
Course Description
This course introduces algorithm design and convergence analysis in nonconvex optimization, with a strong emphasis on practical applications to contemporary challenges in machine learning and data science. The goal of this course is to equip graduate students with a solid theoretical foundation at the intersection of optimization and machine learning, so that they can use optimization to solve advanced machine learning problems and/or conduct advanced research in related fields. The course focuses on topics in nonconvex optimization that are of special interest to the machine learning community. Topics covered include large-scale distributed learning (for foundation models and large language models), federated learning, multi-task learning, as well as private and robust machine learning.
Course Materials
There is no required textbook. Most of the material covered in the class will be based on recently published papers, relevant monographs, or classical books. A list of recently trending or historically important papers on nonconvex optimization theory for machine learning will be provided.
Paper Reading Assignments
There will be approximately three paper reading assignments, one assigned during each topic set. Reading assignments must be typeset in NeurIPS format.
In each reading assignment, each student writes a review of a set of related papers from that topic set, published at recent major machine learning venues (e.g., ICML, NeurIPS, ICLR, AAAI) or on arXiv. Some papers may be drawn from those covered in class.
The reviews may include the following: 1) a summary of the papers and their connections/relationships; 2) strengths/weaknesses of the papers from the following aspects: soundness of assumptions/theorems, empirical evaluation, novelty, and significance, etc.; 3) which parts are difficult to understand, questions about proofs/results/experiments (if there are any); and 4) how the papers can be improved and extended.
Final Project
Students may complete the project individually or in a team of no more than two. Final reports will be due after the project presentations in the final week and should follow the NeurIPS format. Each project requires a 20-minute presentation in the final week, and attendance at your fellow students' presentations is required. Potential project ideas include, but are not limited to: 1) a nontrivial extension of results introduced in class; 2) a novel application in your own research area; 3) a new theoretical analysis of an existing algorithm.
Each project should contain something new. It is important that you justify its novelty.
Prerequisites
Working knowledge of probability and linear algebra. Prior exposure to convex/nonlinear optimization is a plus but not necessary.
Grading Policy
Class Participation: 10%; Paper Reading Assignments: 45%; Project: 45%.
Schedule
- Weeks 1--2: Fundamentals of Optimization
- Basic analysis and linear algebra
- Foundations of convex analysis
- Concept of convergence and its metrics
- Weeks 2--7: First-Order Methods (see the SGD sketch after this schedule)
- Gradient descent and stochastic gradient methods
- Variance-reduction methods
- Adaptive methods
- Parameter-free methods
- Case study: algorithm design in deep learning
- Weeks 8--12: Distributed and Federated Learning (see the local-SGD and top-k sketches after this schedule)
- Synchronous and asynchronous SGD
- Local update SGD and federated learning
- Communication efficient methods: quantization and sparsification
- Case study: large-scale distributed learning for large language models (LLMs); differentially private distributed learning; robust distributed learning
- Weeks 13--16: Multi-Objective Learning (see the multi-gradient sketch after this schedule)
- Classical MOO: weighted sum and constraint methods
- Multiple (stochastic) gradient descent methods
- Case study: multi-task learning
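
To make the schedule concrete, a few small sketches follow; all objectives, constants, and noise models in them are illustrative assumptions, not course materials. First, for the first-order methods of Weeks 2--7: a minimal sketch of stochastic gradient descent (SGD) on a simple nonconvex objective, tracking the squared gradient norm, the standard stationarity metric in nonconvex convergence analysis.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad(x):
        # Gradient of the nonconvex objective f(x) = sum(x_i^2 / (1 + x_i^2))
        return 2 * x / (1 + x**2) ** 2

    x = rng.normal(size=5)   # initial iterate (dimension assumed)
    eta = 0.1                # constant step size (assumed)
    for t in range(1000):
        # Stochastic gradient: exact gradient plus small Gaussian noise
        g = grad(x) + 0.01 * rng.normal(size=x.shape)
        x -= eta * g
    print("final squared gradient norm:", np.linalg.norm(grad(x)) ** 2)

In the nonconvex setting one typically proves that the (expected) squared gradient norm of the iterates decays at a certain rate, rather than convergence of the objective to a global minimum; the print statement reports exactly that metric for the last iterate.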
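Next, for Weeks 8--12: a minimal sketch of local-update SGD, the core of federated averaging (FedAvg). Each client runs K local SGD steps on its own objective, and the server averages the resulting models; the quadratic client objectives here are assumed purely for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    num_clients, K, eta, rounds = 4, 5, 0.05, 50

    # Client i holds the quadratic f_i(x) = 0.5 * ||x - c_i||^2 with its own center c_i
    centers = rng.normal(size=(num_clients, 3))

    x_global = np.zeros(3)
    for r in range(rounds):
        local_models = []
        for c in centers:
            x = x_global.copy()
            for _ in range(K):                            # K local SGD steps
                g = (x - c) + 0.01 * rng.normal(size=3)   # noisy local gradient
                x -= eta * g
            local_models.append(x)
        x_global = np.mean(local_models, axis=0)          # server-side averaging
    print("global model:", x_global)
    print("average of client optima:", centers.mean(axis=0))

With identical quadratic curvatures, the averaged model approaches the mean of the client optima; quantifying the drift introduced by K > 1 local steps under heterogeneous objectives is one of the analysis questions this topic set studies.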
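Also for Weeks 8--12: a minimal sketch of top-k gradient sparsification with error feedback, one form of the communication-efficient compression listed above. Only the k largest-magnitude gradient coordinates are "transmitted" each step, and the dropped coordinates are accumulated and retried later; the objective and all constants are again assumptions for illustration.

    import numpy as np

    def top_k(v, k):
        # Keep the k largest-magnitude entries of v; zero out the rest
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    rng = np.random.default_rng(2)
    d, k, eta = 20, 4, 0.1
    x = rng.normal(size=d)
    err = np.zeros(d)                      # error-feedback memory
    for t in range(100):
        g = x + 0.01 * rng.normal(size=d)  # gradient of f(x) = 0.5 * ||x||^2, plus noise
        c = top_k(g + err, k)              # compress the gradient plus accumulated error
        err = (g + err) - c                # remember what was dropped
        x -= eta * c                       # apply only the transmitted part
    print("||x|| after compressed SGD:", np.linalg.norm(x))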
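Finally, for Weeks 13--16: a minimal sketch of multiple-gradient descent for two objectives. The update direction is the minimum-norm convex combination of the two gradients, which for two tasks has a closed-form weight; following it decreases both objectives until a Pareto-stationary point is reached. The two quadratic objectives are assumed for illustration.

    import numpy as np

    def grads(x):
        g1 = x - np.array([1.0, 0.0])  # gradient of f1(x) = 0.5 * ||x - (1, 0)||^2
        g2 = x - np.array([0.0, 1.0])  # gradient of f2(x) = 0.5 * ||x - (0, 1)||^2
        return g1, g2

    x = np.array([2.0, 2.0])
    for t in range(200):
        g1, g2 = grads(x)
        diff = g1 - g2
        denom = diff @ diff
        # Closed-form minimum-norm weight for two gradients, clipped to [0, 1]
        alpha = 1.0 if denom < 1e-12 else float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
        d = alpha * g1 + (1 - alpha) * g2  # common descent direction
        x -= 0.1 * d
    print("approximate Pareto-stationary point:", x)  # approaches (0.5, 0.5) here

At a Pareto-stationary point the minimum-norm combination of gradients is zero, so the iteration stalls on the Pareto set between the two task optima.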
Academic Integrity
As an institution of higher learning, RIT expects students to behave honestly and ethically at all times, especially when submitting work for evaluation in conjunction with any course or degree requirement. The Golisano College of Computing and Information Sciences encourages all students to become familiar with the RIT Honor Code and with RIT's Academic Integrity Policy. Students may discuss assignments with others, including classmates, tutors, and SLI instructors. After any such discussions, students must discard all written notes, pictures, etc. Submitting any work written by others or as an unsanctioned team is considered an act of academic dishonesty. Team-developed work must also be created solely by the team members and not copied from others or other sources. Work copied from GitHub or other similar sources will be subject to prosecution for breach of academic integrity.