ExCom NH Section Aug 20, 7pm

Virtual: https://events.vtools.ieee.org/m/461769

August NH Section ExCom meeting. Virtual: https://events.vtools.ieee.org/m/461769

Foundational Speech Models and their Efficient Training with NVIDIA NeMo {AI Talks with Tea/Coffee #30}

Virtual: https://events.vtools.ieee.org/m/495161

https://landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025

The intersection of speech and language models offers unique opportunities and challenges. This talk provides a comprehensive walkthrough of speech-language model research from NVIDIA NeMo. We cover several types of models, such as the attention-encoder-decoder Canary-1B and LLM-based architectures such as SALM and BESTOW. In particular, we highlight the challenges in training and inference efficiency of such models and propose robust solutions via 2D bucketing and the batch size OOMptimizer. Finally, we highlight the difficulty of preserving text-domain capabilities in speech-augmented training and present several possible solutions: EMMeTT, VoiceTextBlender, and Canary-Qwen-2.5B.

About the Presenter: Piotr Żelasko received the B.S. and M.Sc. degrees in acoustic engineering and the Ph.D. in electronic engineering from AGH University of Krakow, Poland, in 2013, 2014, and 2019, respectively. He is currently a research scientist at NVIDIA NeMo, building multitask and multimodal models and efficient training infrastructure. He previously held a research scientist position at JHU’s CLSP and developed speech technology at several companies (Techmo, Avaya, Meaning.Team). Dr. Żelasko is a co-author of the next-generation Kaldi toolkit (k2) and the maintainer of Lhotse.

Speaker(s): Piotr Zelasko

Agenda: https://landing.signalprocessingsociety.org/ieee-sps-webinars-27-aug-2025

Please register both via the SPS link above and on vTools.

Virtual: https://events.vtools.ieee.org/m/495161
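For readers unfamiliar with the 2D bucketing technique named in the abstract, the sketch below is an illustrative intuition only, not the NeMo or Lhotse implementation: the Utterance class and the duration_bins, token_bins, and batch_size parameters are hypothetical names chosen for this example. The idea is to group training examples by both input audio duration and output token count, so that each mini-batch pads as little as possible along both axes.

```python
# Illustrative sketch of 2D bucketing (NOT the NeMo/Lhotse API):
# bucket utterances by (audio duration, target token count) before batching.
from dataclasses import dataclass
from typing import Dict, List, Tuple
import bisect
import random


@dataclass
class Utterance:
    audio_seconds: float  # length of the input audio
    num_tokens: int       # length of the target token sequence


def bucket_key(utt: Utterance,
               duration_bins: List[float],
               token_bins: List[int]) -> Tuple[int, int]:
    """Map an utterance to a (duration-bin, token-bin) index pair."""
    d = bisect.bisect_left(duration_bins, utt.audio_seconds)
    t = bisect.bisect_left(token_bins, utt.num_tokens)
    return d, t


def make_2d_batches(utts: List[Utterance],
                    duration_bins: List[float],
                    token_bins: List[int],
                    batch_size: int) -> List[List[Utterance]]:
    """Group utterances into 2D buckets, then cut each bucket into mini-batches."""
    buckets: Dict[Tuple[int, int], List[Utterance]] = {}
    for utt in utts:
        buckets.setdefault(bucket_key(utt, duration_bins, token_bins), []).append(utt)
    batches: List[List[Utterance]] = []
    for group in buckets.values():
        random.shuffle(group)
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    random.shuffle(batches)  # mix buckets across the epoch
    return batches


if __name__ == "__main__":
    random.seed(0)
    data = [Utterance(random.uniform(1, 30), random.randint(5, 200)) for _ in range(1000)]
    batches = make_2d_batches(data,
                              duration_bins=[5, 10, 20, 30],
                              token_bins=[25, 50, 100, 200],
                              batch_size=16)
    print(f"{len(batches)} batches; first batch (duration, tokens):",
          [(round(u.audio_seconds, 1), u.num_tokens) for u in batches[0][:4]])
```

Within any one batch, the audio lengths and token lengths are similar, which is what keeps padding (and therefore wasted compute) low for encoder-decoder and LLM-based speech models.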

Man-Machine Collaboration: Intelligent Systems as Robots (I, II)

Virtual: https://events.vtools.ieee.org/m/493063

Speaker(s): Dr. Sridhar Raghavan

My long-term research has been on intelligent man-machine collaboration, especially for creative problem-solving activities such as decision making, software development, and music; needless to say, these are tasks that involve a substantial amount of digressive, tedious chores from a human perspective. This reiterates an important point: machines have to be subservient to humans (in responsible ways) across the full continuum from idiot savants to loosely coupled autonomous systems. In this regard, user-directed automation and user productivity are far more critical than mere usability. While "robots" conjures up the inevitable stereotype of electro-mechanical systems, I am far more interested in getting software systems to deliver productivity and performance in all collaboration tasks, which may not resemble robotics domains at all. My central conjecture is that systems are actually robots and robots are systems, and that what they have in common about collaboration and productivity can, and should, be applied to both seamlessly. I will illustrate and drive this perspective shift, and its importance, through many common use-case examples as well as projections of this shift into the future, especially for AI, LLM, and agentic systems. This webinar is the first part; the second part is slated for the October timeframe.

Virtual: https://events.vtools.ieee.org/m/493063
