publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. ICML 2025
    Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
    Fan Zhou*, Zengzhi Wang*, Qian Liu, Junlong Li, and Pengfei Liu
    In International Conference on Machine Learning, 2025
  2. Preprint 2025
    MegaMath: Pushing the Limits of Open Math Corpora
    Fan Zhou*, Zengzhi Wang*, Nikhil Ranjan, Zhoujun Cheng, Liping Tang, Guowei He, Zhengzhong Liu, and Eric P. Xing
    Preprint, 2025
  3. Preprint 2025
    OctoThinker: Revisiting Mid-Training In the Era of RL Scaling
    Zengzhi Wang*, Fan Zhou*, Xuefeng Li*, and Pengfei Liu
    2025
    Notion Blog

2024

  1. NeurIPS D&B 2024
    MathPile: A Billion-Token-Scale Pretraining Corpus for Math
    Zengzhi Wang, Xuefeng Li, Rui Xia, and Pengfei Liu
    In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
  2. COLM 2024
    Is ChatGPT a Good Sentiment Analyzer?
    Zengzhi Wang, Qiming Xie, Yi Feng, Zixiang Ding, Zinong Yang, and Rui Xia
    In First Conference on Language Modeling, 2024
  3. NeurIPS D&B 2024
    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
    Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, and Pengfei Liu
    In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
  4. Preprint 2024
    Benchmarking Benchmark Leakage in Large Language Models
    Ruijie Xu*, Zengzhi Wang*, Run-Ze Fan*, and Pengfei Liu
    Preprint, 2024
  5. ACL 2024
    Ask Again, Then Fail: Large Language Models’ Vacillations in Judgment
    Qiming Xie*, Zengzhi Wang*, Yi Feng, and Rui Xia
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
  6. IEEE TKDE 2024
    Unified ABSA via Annotation-Decoupled Multi-Task Instruction Tuning
    Zengzhi Wang, Rui Xia, and Jianfei Yu
    IEEE Transactions on Knowledge and Data Engineering, Aug 2024

2023

  1. SIGIR 2023
    A Simple yet Effective Framework for Few-Shot Aspect-Based Sentiment Analysis
    Zengzhi Wang, Qiming Xie, and Rui Xia
    In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, Aug 2023