About
I build AI systems across the full stack, from GPU kernels to AI product leadership. At NVIDIA, I focus on AI-driven GPU kernel generation and programming models that push the boundaries of what's possible in AI infrastructure.
Previously at IBM Research, I led the teams developing Watson Code Assistant and received an Outstanding Technical Achievement Award for creating the product's first generative model. My work spans the full lifecycle of high-performance AI systems — from silicon-level optimization to developer-facing tools.
Before IBM, at Oak Ridge National Laboratory, I contributed to the system software powering Summit and Sierra, the world's two fastest supercomputers in 2018. At Pacific Northwest National Laboratory, I built distributed runtime systems for massively multithreaded graph analytics. I also created NYU Courant's first high-performance machine learning course, bridging HPC and modern AI.
Education
- Ph.D. in Computer Architecture — Polytechnic University of Catalonia (UPC), Barcelona
- M.Sc. in Computer Engineering — University of Rome Tor Vergata, Rome
- B.Sc. in Computer Engineering — Roma Tre University, Rome
Featured Work
CUDA Tile Programming Model
GPU kernel optimization and new programming models for AI workloads at NVIDIA.
Watson Code Assistant
Led creation of the first generative model for IBM's AI-powered code assistant product.
Summit & Sierra Supercomputers
System software for the world's #1 and #2 fastest supercomputers (2018 TOP500).
Awards
- IBM Outstanding Technical Achievement Award (2023) — For leading the creation of the first generative model for Watson Code Assistant
- IBM Research Division Award (2022) — For contributions to AI for code
- HPCwire Editors' Choice Award (2018) — For Summit supercomputer
- R&D 100 Award Finalist (2017) — For GEMS graph analytics framework
- PNNL Outstanding Performance Award (2015) — For contributions to extreme-scale computing
- IPDPS Best Paper Award (2012) — For research on scaling irregular applications on massively multithreaded systems
- HiPEAC Paper Award (2010) — For research on TLB misses in chip multiprocessors
In the Press
CUDA Tile Programming Model (2025)
- NVIDIA Introduces CUDA 13.1 with CUDA Tile — InsideHPC
- NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming — NVIDIA Developer Blog
AI for Code / Watson Code Assistant (2022–2023)
Summit & Sierra Supercomputers (2018)
- Two DOE Supercomputers Top List of World's Fastest — U.S. Department of Energy
- Sierra Honored as Top Supercomputing Achievement — LLNL / HPCwire
- Summit Supercomputer Is Already Making Its Mark on Science — HPCwire
GEMS Massively Multithreaded Graph Runtime (2014–2019)
- Startup Trovares Brings HPC to Graph Analytics — HPCwire
- GEMS: A Framework for Extreme-Scale Graph Analytics — PNNL Science Highlights
- RDF Dictionary Encoding for High-Performance Computing — PNNL Science Highlights
Selected Papers
AI and Machine Learning
High Performance Computing and Systems
- Scaling Semantic Graph Databases in Size and Performance — IEEE Micro, 2014
- In-Memory Graph Databases for Web-Scale Data — IEEE Computer, 2015
- Quantitative Analysis of Operating System Noise — IPDPS 2011
- Evaluating the Impact of TLB Misses on Future HPC Systems — IPDPS 2012
- Scaling Irregular Applications Through Data Aggregation and Software Multithreading — IPDPS 2014, Best Paper Award
