Li, Yi ()

I am an Associate Professor in the Division of Mathematical Sciences at Nanyang Technological University, and jointly appointed in the College of Computing and Data Science.

Before joining NTU, I was a postdoc at Harvard University (hosted by Jelani Nelson), a postdoc at the Max-Planck Institute for Informatics in Saarbrücken, Germany and a research fellow at the Simons Institute for the Theory of Computing. I received a PhD from the University of Michigan in 2013 and a BEng from the ACM Class of Shanghai Jiaotong University in 2008, both in computer science and engineering.

Postal address: SPMS-MAS-05-17, Division of Mathematical Sciences, 21 Nanyang Link, Singapore 637371.

Email: yili@ntu.edu.sg

Research Interests

Algorithms for massive data sets, data streaming algorithms
Low-distortion metric embeddings
Compressive sensing and signal processing
Theoretical computer science

Manuscripts

Journal Publications

  1. Zhengyang Guo, Yi Li, and Shaoyu Pei. Expected Size of Random Tukey Layers and Convex Layers.
    Computational Geometry: Theory and Applications 103, Article No. 101856, Apr 2022. pdf
  2. Yi Li, Ruosong Wang, and David Woodruff. Tight Bounds for the Subspace Sketch Problem with Applications.
    SIAM J. Comp. 50(4), pp 1287--1335, 2021. pdf
    (This version supersedes [C19])
  3. Yi Li and Vasileios Nakos. Sublinear-Time Algorithms for Compressive Phase Retrieval.
    IEEE Trans. Info. Theory, 66(11), pp 7302--7310, 2020. pdf
    (This paper supersedes, and is radically different from, [C14])
  4. Yi Li, Huy Le Nguyen, and David Woodruff. On Approximating Matrix Norms in a Stream.
    SIAM J. Comput., 48(6), pp 1643--1697, 2019. pdf
    (This version supersedes [C4], [C9] and part of [C11])
  5. Sudipto Guha, Yi Li, and Qin Zhang. Distributed Partial Clustering.
    ACM Transactions on Parallel Computing 6(3), pp 11:1--11:20, October 2019. (Special issue on SPAA 2017) pdf
    (This version supersedes [C12])
  6. Anna Gilbert, Yi Li, Ely Porat, and Martin Strauss. For-all Sparse Recovery in Near-Optimal Time.
    ACM Transactions on Algorithms 13(3), pp 32:1--32:26, August 2017. pdf
    (This version supersedes [C6])
  7. Petros Boufounos, Volkan Cevher, Anna Gilbert, Yi Li, and Martin Strauss. What's the Frequency, Kenneth?: Sublinear Fourier Sampling Off the Grid.
    Algorithmica 73(2), pp 261-288, 2015. pdf
    (This version supersedes [C2])
    Update: A small tweak in the hashing lemma shows that diluting S^1 by k/η, instead of 1/η, would be enough. The sampling duration can be brought down to 1/η from k/η.
  8. Anna Gilbert, Yi Li, Ely Porat, and Martin Strauss. Approximate Sparse Recovery: Optimizing Time and Measurements.
    SIAM J. Comput. 41(2), pp 436-453, 2012. pdf
    (This version supersedes [C1])

Conference Publications

  1. Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Yi Li, Christopher Musco. Agnostic Active Learning of Single Index Models with Linear Sample Complexity.
    Proceedings of Machine Learning Research 247:1715--1754, 2024. (Proceedings of COLT 2024) arXiv:2405.09312
  2. Sheng-Jun Huang, Yi Li, Yiming Sun, and Ying-Peng Tang. One-shot Active Learning Based on Lewis Weight Sampling for Multiple Deep Models.
    Proceedings of ICLR 2024. (No physical publication) Open Review link arXiv:2405.14121
  3. Yi Li, Honghao Lin, and David Woodruff. Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms.
    Proceedings of ICLR 2024. (No physical publication) Open Review link
  4. Yi Li, Honghao Lin, and David P. Woodruff. p-Regression in the Arbitrary Partition Model of Communication.
    Proceedings of Machine Learning Research 195:4902-4928, 2023. (Proceedings of COLT 2023) arXiv:2307.05117
  5. Yi Li, Honghao Lin, Simin Liu, Ali Vakilian, and David Woodruff. Learning the Positions in CountSketch.
    Proceedings of ICLR 2023. (No physical publication) Open Review link arXiv:2306.06611
  6. Yi Li, Honghao Lin, and David P. Woodruff. The p-Subspace Sketch Problem in Small Dimensions with Applications to Support Vector Machines.
    Proceedings of SODA 2023, pp 850--877. arXiv:2211.07132
  7. Yi Li, Honghao Lin, David Woodruff, and Yuheng Zhang. Streaming Algorithms with Large Approximation Factors.
    Proceedings of APPROX/RANDOM 2022, LIPIcs Vol. 245, pp 13:1--13:23. arXiv:2207.08075
  8. Cheng Chen, Yi Li, and Yiming Sun. Online Active Regression.
    Proceedings of Machine Learning Research 162:3320--3335, 2022. (Proceedings of ICML 2022) Long presentation at ICML 2022.
  9. Yi Li and Mingmou Liu. Lower Bounds for Sparse Oblivious Subspace Embeddings.
    Proceedings of PODS 2022, pp 251--260. arXiv:2112.10987
  10. Yi Li, Yan Song, and Qin Zhang. Learning to Cluster via Same-Cluster Queries.
    Proceedings of CIKM 2021, pp 978--987. arXiv:2108.07383
  11. Yi Li and David P. Woodruff. The Product of Gaussian Matrices is Close to Gaussian.
    Proceedings of APPROX/RANDOM 2021, LIPIcs Vol. 207, pp 35:1--35:22. arXiv:2108.09887
  12. Yi Li, David P. Woodruff, and Taisuke Yasuda. Exponentially Improved Dimensionality Reduction for 1: Subspace Embeddings and Independence Testing.
    Proceedings of Machine Learning Research 134:3111--3195, 2021. (Proceedings of COLT 2021) arXiv:2104.12946
  13. Yifei Jiang, Yi Li, Yiming Sun, Jiaxin Wang, and David Woodruff. Single Pass Entrywise-Transformed Low Rank Approximation.
    Proceedings of Machine Learning Research 139:4982--4991, 2021. (Proceedings of ICML 2021) arxiv:2107.07889
  14. Zhengyang Guo and Yi Li. Geometric Cover with Outliers Removal.
    Proceedings of STACS 2021, LIPIcs Vol. 187, pp 39:1--39:15. pdf
  15. Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, and David Woodruff. Streaming Complexity of SVMs.
    Proceedings of APPROX/RANDOM 2020, LIPIcs Vol. 176, pp 50:1--50:22. arxiv:2007.03633
  16. Yi Li, Ruosong Wang, Lin Yang, and Hanrui Zhang. Nearly Linear Row Sampling Algorithm for Quantile Regression.
    Proceedings of Machine Learning Research 119:1888--1898, 2020. (Proceedings of ICML 2020). arxiv:2006.08397
  17. Yi Li and David Woodruff. Input-Sparsity Low Rank Approximation in Schatten Norm.
    Proceedings of Machine Learning Research 119:11124--11132, 2020. (Proceedings of ICML 2020). arxiv:2004.12646
  18. Yi Li and Vasileios Nakos. Deterministic Sparse Fourier Transform with an Guarantee.
    Proceedings of ICALP 2020, LIPIcs Vol. 168, pp 77:1--77:14. arxiv:1903.00995
  19. Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, and David Woodruff. Learning-Augmented Data Stream Algorithms.
    Proceedings of ICLR 2020. (No physical publication) Open Review link
  20. Yi Li, Ruosong Wang, and David Woodruff. Tight Bounds for the Subspace Sketch Problem with Applications.
    Proceedings of SODA 2020, pp 1655--1674. arxiv:1904.05543
  21. Maria-Florina Balcan, Yi Li, David Woodruff, and Hongyang Zhang. Testing Matrix Rank, Optimally.
    Proceedings of SODA 2019, pp 727--746. arxiv:1810.08171
  22. Yi Li and Vasileios Nakos. Deterministic Heavy Hitters with Sublinear Query Time.
    Proceedings of APPROX/RANDOM 2018, LIPIcs Vol. 116, pp 18:1--18:18. arxiv:1712.01971
  23. Yi Li, Vasileios Nakos, and David Woodruff. On Low-Risk Heavy Hitters and Sparse Recovery Schemes.
    Proceedings of APPROX/RANDOM 2018, LIPIcs Vol. 116, pp 19:1--19:13. arxiv:1709.02919
  24. Vladimir Braverman, Stephen Chestnut, Robert Krauthgamer, Yi Li, David Woodruff, and Lin Yang. Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order.
    Proceedings of Machine Learning Research 80:648-657, 2018. (Proceedings of ICML 2018) arxiv:1609.05885
  25. Yi Li and Vasileios Nakos. Sublinear-Time Algorithms for Compressive Phase Retrieval.
    Proceedings of ISIT 2018, pp 2301--2305.
  26. Yi Li and David Woodruff. Embeddings of Schatten Norms with Applications to Data Streams.
    Proceedings of ICALP 2017, pp 60:1--60:14. pdf
  27. Sudipto Guha, Yi Li, and Qin Zhang. Distributed Partial Clustering.
    Proceedings of SPAA 2017, pp 143--152. Co-winner of the Best Paper award.
  28. Yi Li and David Woodruff. Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings.
    Proceedings of APPROX/RANDOM 2016, LIPIcs Vol. 60, 39:1--39:11. arXiv:2202.09797
  29. Yuqing Ai, Wei Hu, Yi Li, and David Woodruff. New Characterizations in Turnstile Streams with Applications.
    Proceedings of CCC 2016, pp 20:1--20:22. pdf
  30. Yi Li and David Woodruff. On Approximating Functions of the Singular Values in a Stream.
    Proceedings of STOC 2016, pp 767--780.
  31. Yi Li, Xiaoming Sun, Chengu Wang, and David Woodruff. On The Communication Complexity of Linear Algebraic Problems in the Message Passing Model.
    Proceedings of DISC 2014, pp 499--513. Full version: arxiv:1407.4755
  32. Yi Li, Zhengyu Wang, and David Woodruff. Improved Testing of Low Rank Matrices.
    Proceedings of SIGKDD 2014, pp 691--700. One of nine best papers.
  33. Anna Gilbert, Yi Li, Ely Porat, and Martin Strauss. For-all Sparse Recovery in Near-Optimal Time.
    Proceedings of ICALP 2014, LNCS 8572, pp 538--550
  34. Yi Li, Huy Le Nguyen, and David Woodruff. Turnstile Streaming Algorithms Might as Well Be Linear Sketches.
    Proceedings of STOC 2014, pp 174--183. pdf
  35. Yi Li, Huy Le Nguyen, and David Woodruff. On Sketching Matrix Norms and the Top Singular Vector.
    Proceedings of SODA 2014, pp 1562--1581.
  36. Yi Li and David Woodruff. A Tight Lower Bound for High Frequency Moment Estimation for Small Error.
    Proceedings of APPROX/RANDOM 2013, LNCS 8906, pp 623--638. pdf of full version
  37. Petros Boufounos, Volkan Cevher, Anna Gilbert, Yi Li, and Martin Strauss. What's the Frequency, Kenneth?: Sublinear Fourier Sampling Off the Grid.
    Proceedings of APPROX/RANDOM 2012, LNCS 7408, pp 61-72.
  38. Anna Gilbert, Yi Li, Ely Porat, and Martin Strauss. Approximate Sparse Recovery: Optimizing Time and Measurements.
    Proceedings of STOC 2010, pp 475-484.

Talks (excluding conference presentations)

  • The lp-Subspace Sketch Problem. National Taiwan University, Dec 2024.
  • Near-optimal Active Regression of Single-Index Models. National Centre for Theoretical Sciences Seminar on Scientific Computing, Taipei, Dec 2024
  • Subspace Embedding and Linear Regression Problems. Shanghai Jiaotong University, Dec 2024.
  • Subspace Embedding and Linear Regression Problems. Oppenheim Workshop 2024 - Machine Learning and AI: Theory and Practice. National University of Singapore, Dec 2024.
  • The lp-Subspace Sketch Problem. Basic Algorithms Research Copenhagen (BARC), University of Copenhagen, Mar 2023.
  • Lower Bounds for Sparse Oblivious Subspace Embeddings. Information Theory and Data Science Workshop. Institute of Mathematical Sciences, NUS, Jan 2023.
  • The lp-Subspace Sketch Problem. National University of Singapore, Dec 2022.
  • Lower Bounds for Sparse Oblivious Subspace Embeddings. Workshop on Algorithms and Foundations for Data Science. Institute of Mathematical Sciences, NUS, Jun 2022.
  • Some Linear Algebraic Problems from TCS Perspective. Shanghai Jiaotong University, Nov 2020.
  • The lp-Subspace Sketch Problem with Applications to Non-embeddability. Xiamen Univeresity, Jun 2019
  • The lp-Subspace Sketch Problem with Applications to Non-embeddability. Shanghai Jiaotong University, May 2019
  • The lp-Subspace Sketch Problem with Applications. Nanjing University, May 2019.
  • Matrix-related Problems in Data Streams. Workshop on Random Matrices, Stochastic Geometry and Related Topics. Institute of Mathematical Sciences, NUS, 2019.
  • Estimating the lp and Schatten Norms in Data Stream Model. NUS Department of Statistics Seminar Talk, 2019.
  • Estimating the Schatten Norms in Streaming Model. Shanghai Jiaotong University, 2018.
  • Estimating the Schatten Norms in Streaming Model. NII Shonan Meeting "Processing Big Data Streams", 2017.
  • Improved Sparse Recovery Algorithms. Dagstuhl Seminar 17181 "Theory and Applications of Hashing", 2017.
  • Sublinear-time Algorithms for the Sparse Recovery Problem. Workshop on Sparse Representations & Compressive Sensing, 2017.
  • Estimating the Schatten Norms in Streaming Model. DIMACS Workshop on E+M=C2, 2017.
  • Introduction to the Data Stream Algorithms. Xiamen University, 2016
  • Introduction to the Sparse Recovery Problem. Fuzhou University, 2016
  • Introduction to the Data Stream Algorithms. Math Colloquium, Nanyang Technological University, 2016.
  • Data Streaming Algorithms. MAS Seminar, Nanyang Technological University, 2015.
  • For-all Sparse Recovery in Near-Optimal Time. Theory of Computation Seminar, Harvard University, 2015.
  • A Brief Introduction to the Sublinear-time Sparse Recovery Problem. MPI for Informatics, 2014.
  • Sublinear Fourier Sampling Off the Grid. Workshop on Sparse Fourier Transform, MIT, 2013.
  • Approximate Sparse Recovery: Optimizing Time and Measurements. Minisymposium on Combinatorics and Data Science, Shanghai Jiaotong University, 2011.
  • Approximate Sparse Recovery: Optimizing Time and Measurements. DIMACS Workshop on Network Data Streaming and Compressive Sensing, 2010.
  • Teaching

    Grants

    Services

    Miscellaneous