Harold Benoit

GitHub / LinkedIn / CV / Google Scholar / Email: (my first name)_(my last name)@hotmail.ch

Currently working on LLMs @ Swiss AI. I also have learning notes that some people find useful.

News:

  • [April 2024] Finished 1st place in the LLM training hackathon.
  • [January 2024] Diversification methods paper accepted at ICLR 2024.

    Experience

  • [2023-2024] Research Associate at VILAB, supervised by Amir Zamir.
  • [2021-2023] M.Sc. degree in Data Science (ranked 3rd in year) at .
  • [2023] Research Intern at Research, supervised by Mattia Rigotti.
  • [2022] Quantitative Research Intern at G-Research.
  • [2018-2021] B.Sc. degree in Computer Science & Communication Systems at .

    What I enjoy

    • Good engineering, e.g., training deep neural nets and keeping GPUs busy.
    • Good research. Lately, I've done more "data-focused" research, exploring scalable ways to identify or synthesize high-quality data, with the aim of making models more general and adaptable to new environments.


    Publications

    Controlled Training Data Generation with Diffusion Models
    Teresa Yeo*, Andrei Atanov*, Harold Benoit^, Aleksandr Alekseev^, Ruchira Ray, Pooya Akhoondi, Amir Zamir
    In review, 2024
    arXiv / GitHub / project page

    We propose a method to generate tailored synthetic training data, i.e., specifically useful for a given supervised model and target deployment domain. We introduce two feedback mechanisms to guide the generation: 1) model-based and 2) target domain-based.

    Unraveling the Key Components of OOD Generalization via Diversification
    Harold Benoit*, Liangze Jiang*, Andrei Atanov*, Oğuzhan Fatih Kar, Mattia Rigotti, Amir Zamir
    ICLR, 2024
    arXiv / OpenReview

    We distill the critical design factors of current state-of-the-art methods (multi-hypotheses/diversification methods) for settings with spurious correlations.