🔬 AI Researcher
Ph.D. Student @ HKU | Research Associate @ UCSB | Specializing in Efficient AI, LLM Optimization, and Hardware-Aware Training
🚀 Actively seeking full-time opportunities in AI research, ML systems, or hardware-software co-design starting in 2025/2026.
📫 zhoutomas177@gmail.com (Personal) |
LinkedIn |
✉️ ryjjc@connect.hku.hk |
📍 Mountain View, California (Current)
🎓 Education
Ph.D., EEE | The University of Hong Kong (December 2025)
- Focus: Efficient LLM training/inference, quantization, reasoning, and edge deployment
- Supervised by Prof. Ngai Wong
- Publications in NLP- and FPGA-related conferences and journals
M.S., IC Design | HKUST (December 2019)
- Advisor: Prof. Mansun Chan
- Focus: VLSI Design, Embedded Systems, Semiconductor Devices
B.Eng., IC Design | National Huaqiao University (June 2018)
- Multiple scholarships and an exchange-student program in Taiwan
💼 Experience
Research Intern @ Samsung Research America
May 2025 – Present
Research Associate @ UCSB
Sep. 2023 – Apr. 2025 | Advisor: Prof. Zheng Zhang
- Developed low-bit quantized fine-tuning techniques for LLMs (QuZO, LoRA variants); see the sketch after this role
- Collaborated with the Amazon AGI team on scalable training paradigms
- NAACL 2024 spotlight paper (LoRETTA): demonstrated stronger scalability than other parameter-efficient tuning baselines
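
For readers curious about the mechanics, below is a minimal sketch of the two-point zeroth-order gradient estimator that this line of work builds on; `zo_step`, `loss_fn`, and the hyperparameters are illustrative names, and QuZO's actual quantized recipe is more involved than this plain-precision version.

```python
import torch

@torch.no_grad()
def zo_step(model, loss_fn, lr=1e-4, eps=1e-3, seed=0):
    """One two-point zeroth-order update: probe the loss along a shared
    random direction z with two forward passes, then step along z.
    No backward pass, so no activation memory is held for autograd."""
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        torch.manual_seed(seed)  # same seed => identical z each call, so z is never stored
        for p in params:
            p.add_(scale * torch.randn_like(p))

    perturb(+eps)                                # theta + eps * z
    loss_plus = loss_fn(model)
    perturb(-2 * eps)                            # theta - eps * z
    loss_minus = loss_fn(model)
    perturb(+eps)                                # restore theta
    g = (loss_plus - loss_minus) / (2 * eps)     # scalar estimate of grad(L) . z
    perturb(-lr * g)                             # theta <- theta - lr * g * z
    return loss_plus
```

Because the update touches the weights only through forward passes and in-place additions, it composes naturally with low-bit quantized inference, which is the regime QuZO targets.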
Research Assistant @ CUHK
Apr. 2021 – Dec. 2021 | Advisor: Prof. Guoliang Xing
- Co-designed FPGA-GPU hybrid acceleration schemes
- Led NFC wireless charging system project from concept to prototype
Mixed-Signal IC Design Engineer @ ASTRI
Sep. 2019 – Mar. 2021 | Technology Co-Design Group
- Designed key analog IPs, including ADCs, comparators, and amplifiers
- Delivered a taped-out chip with a 10-bit ADC and PMU subsystems
🧠 Selected Publications
- QuZO: Quantized Zeroth-Order Fine-Tuning for LLMs – Under Review
- LoRETTA: Tensor-Train Adaptation for LLMs – NAACL 2024
- DyBit: Dynamic Bit-Precision Inference – IEEE TCAD 2023
- MSD: Mixing Signed Digits on FPGAs – FCCM 2023
- NoiseZO: RRAM-Driven ZO Optimization – DAC 2025
- HKLUT: Hundred-Kilobyte Lookup Tables for Super-Resolution – IJCAI 2024
- PECAN: Product-Quantized CAM Network – DATE 2023
- Lite It Fly: All-Deformable Butterfly Network – IEEE TNNLS (brief)
📚 Full publication list on Google Scholar
🔍 Research Highlights
My research spans machine learning and systems, with a focus on efficient training and inference:
- Efficient LLM Fine-Tuning: Developed the QuZO and LoRETTA frameworks to push the limits of parameter-efficient and quantized tuning; see the low-rank adapter sketch after this list.
- Hardware-Aware ML: Designed acceleration methods on FPGAs and NPU chips for DNN inference and edge AI.
- Algorithm/Hardware Co-Design: Collaborated on hardware compiler optimization spanning algorithms and model-level simulation.
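
To make the first bullet concrete, here is a minimal, self-contained sketch of the LoRA-style low-rank adapter idea (LoRETTA pushes this further by factorizing the update into tensor-train cores); `LoRALinear` and its hyperparameters are illustrative, not the frameworks' actual APIs.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x), where only A and B are trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():         # freeze the pretrained weights
            p.requires_grad_(False)
        self.scale = alpha / r
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # zero init: adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Wrap one projection and train only the adapter parameters A and B.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 16, 768))             # (batch, seq, hidden)
```

Each wrapped layer adds only r * (in_features + out_features) trainable parameters; tensor-train factorizations like LoRETTA's shrink that count further.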
📷 Fun Fact
I enjoy exploring the intersection of AI algorithms and hardware, whether it's crafting efficient LLMs, squeezing memory on an edge chip, or analyzing training efficiency.
🤝 Academic
I'm passionate about bridging academia and decentralized technology, whether it's co-authoring papers on efficient LLM training, collaborating with global research labs, or exploring blockchain projects that bring AI infrastructure and intelligent agents on-chain.
🧰 Technical Skills
Languages: Python, C/C++, MATLAB, Verilog
Frameworks & Platforms: PyTorch, TensorFlow (incl. Lite & Keras), CUDA
Tools: Cadence, Xilinx Vivado & ISE, HSPICE, ModelSim, VS Code
