2023

ADELT: Transpilation Between Deep Learning Frameworks

Linyuan Gong, Jiayi Wang, Alvin Cheung

Building Code Transpilers for Domain-Specific Languages Using Program Synthesis (Experience Paper)

Sahil Bhatia, Sumer Kohli, Sanjit A. Seshia, Alvin Cheung

A PACTful Agenda for Cloud Programming Research (Invited Talk)

Alvin Cheung

Distributed Matrix-Based Sampling for Graph Neural Network Training

Alok Tripathy, Katherine Yelick, Aydın Buluç

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydın Buluç

Scaling Generalized N-Body Problems, A Case Study from Genomics

Marquita Ellis, Aydın Buluç, Katherine Yelick

BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper

Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, Aydın Buluç

Fast multiplication of random dense matrices with fixed sparse matrices

Tianyu Liang, Riley Murray, Aydın Buluç, James Demmel

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems

Younghyun Cho, James W. Demmel, Michał Dereziński, Haoyun Li, Hengrui Luo, Michael W. Mahoney, Riley J. Murray

Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition

Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Laura Grigori, Aydın Buluç, James Demmel

Harnessing the Crowd for Autotuning High-Performance Computing Applications

Younghyun Cho, James W. Demmel, Jacob King, Xiaoye S. Li, Yang Liu, Hengrui Luo

Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

Vivek Bharadwaj, Osman Asif Malik, Riley Murray, Aydın Buluç, James Demmel

Hybrid Models for Mixed Variables in Bayesian Optimization

Hengrui Luo, Younghyun Cho, James W. Demmel, Xiaoye S. Li, Yang Liu

GPTune: multitask learning for autotuning exascale applications

Yang Liu, Wissam M. Sid-Lakhdar, Osni Marques, Xinran Zhu, Chang Meng, James W. Demmel, Xiaoye S. Li

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms

Jingyi Xu, Sehoon Kim, Borivoje Nikolic

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert J. Ou, Colin Schmidt, Samuel Steffl, John Charles Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET

Abraham Gonzalez, Jerry Zhao, Ben Korpan, Hasan Genc, Colin Schmidt, John Charles Wright, Ayan Biswas, Alon Amid, Farhana Sheikh, Anton Sorokin, Sirisha Kale, Mani Yalamanchi, Ramya Yarlagadda, Mark Flannigan, Larry Abramowitz, Elad Alon, Yakun Sophia Shao, Krste Asanovic, Borivoje Nikolic

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

Qijing Huang, Aravind Kalaiah, Minwoo Kang, James Demmel, Grace Dinh, John Wawrzynek, Thomas Norell, Yakun Sophia Shao

Verifying RISC-V Physical Memory Protection

Kevin Cheang, Cameron Rasmussen, Dayeol Lee, David W. Kohlbrenner, Krste Asanovic, Sanjit A. Seshia

Profiling Hyperscale Big Data Processing

Abraham Gonzalez, Aasheesh Kolli, Samira Manabi Khan, Sihang Liu, Vidushi Dadu, Sagar Karandikar, Jichuan Chang, Krste Asanovic, Parthasarathy Ranganathan

Hammer: a modular and reusable physical design flow tool: invited

Harrison Liew, Daniel Grubb, John Wright, Colin Schmidt, Nayiri Krzysztofowicz, Adam M. Izraelevitz, Edward Wang, Krste Asanovic, Jonathan Bachrach, Borivoje Nikolic

An Eight-Core 1.44GHz RISC-V Vector Machine in 16nm FinFET

Colin Schmidt, John Charles Wright, Zhongkai Wang, Eric Chang, Albert J. Ou, Woo-Rham Bae, Sean Huang, Anita Flynn, Brian C. Richards, Krste Asanovic, Elad Alon

COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors

Jerry Zhao, Abraham Gonzalez, Alon Amid, Sagar Karandikar, Krste Asanovic

Simulator Independent Coverage for RTL Hardware Languages

Kevin Laeufer, Vighnesh Iyer, David Biancolin, Jonathan Bachrach

A Hardware Accelerator for Protocol Buffers

Sagar Karandikar, Chris Leary, Chris Kennelly, Jerry Zhao, Dinesh Parimi, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan

An Automated and Process-Portable Generator for Phase-Locked Loop

Zhongkai Wang, Minsoo Choi, Eric Chang, John Charles Wright, Woo-Rham Bae, Sijun Du, Zhaokai Liu, Nathan Narevsky, Colin Schmidt, Ayan Biswas, Borivoje Nikolic, Elad Alon

Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim

David Biancolin, Albert Magyar, Sagar Karandikar, Alon Amid, Borivoje Nikolic, Jonathan Bachrach, Krste Asanovic

Code Transpilation for Hardware Accelerators

Yuto Nishida, Sahil Bhatia, Shadaj Laddad, Hasan Genc, Yakun Sophia Shao, Alvin Cheung

Full Stack Optimization of Transformer Inference: a Survey

Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation

Dima Nikiforov, Shengjun Chris Dong, Chengyi Lux Zhang, Seah Kim, Borivoje Nikolic, Yakun Sophia Shao

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems

Sagar Karandikar, Aniruddha N. Udipi, Junsun Choi, Joonho Whangbo, Jerry Zhao, Svilen Kanev, Edwin Lim, Jyrki Alakuijala, Vrishab Madduri, Yakun Sophia Shao, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators

Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect...

Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs

Alon Amid, Albert J. Ou, Krste Asanovic, Yakun Sophia Shao, Borivoje Nikolic