Hello. I am Guijin Son, co-founder at OneLine AI and lead of HAE-RAE, an open-source research group focused on Korean NLP. I am interested in AI for Science, with a focus on evaluation and reasoning with language models. My current goal is to build stronger reasoning models and the measurements that prove real progress.
I am also interested in multimodal reasoning and agentic systems. Past projects include analyzing Korean knowledge and professional benchmarks, evaluating reward models, and exploring financial applications of LLMs.
I also teach and mentor: lectures at Fast Campus and SSAFY, curriculum work with Codeit and Code States, and mentorship at Upstage.
Yonsei University, Underwood International College (UIC) 2020 – 2025
B.S., Economics.
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Amit Agarwal, Hyunwoo Ko, Chanuk Lim, Srikant Panda, Minhyuk Kim, Nikunj Drolia, Dasol Choi, Kyong-Ha Lee, Youngjae Yu
Preprint Under Review
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu, Stella Biderman
Preprint Under Review
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
Guijin Son*, Dongkeun Yoon*, Juyoung Suk, Javier Aula-Blasco, Mano Aslan, Vu Trong Kim, Shayekh Bin Islam, Jaume Prats-Cristià, Lucía Tormo-Bañuelos, Seungone Kim
Preprint Under Review
From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee
EMNLP 2025
Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?
Luca Moroni, Javier Aula-Blasco, Simone Conia, Irene Baucells, Naiara Perez, Silvia Paniagua Suárez, Anna Sallés, Malte Ostendorff, Júlia Falcão, Guijin Son, Aitor Gonzalez-Agirre, Roberto Navigli, Marta Villegas
EMNLP 2025
On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong, Noah Lee, Eunki Kim, Guijin Son, Woojin Chung, Aman Gupta, Shao Tang, James Thorne
ICML 2025
FINKRX: Establishing Best Practices for Korean Financial NLP
Guijin Son, Hyunwoo Ko, Hanearl Jung, Chami Hwang
ACL 2025 Industry Track
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son, Jiwoo Hong, Hyunwoo Ko, James Thorne
ACL 2025 (Oral)
KMMLU: Measuring Massive Multitask Language Understanding in Korean
Guijin Son, Hanwool Lee, Sungdong Kim, Seungone Kim, Niklas Muennighoff, Taekyoon Choi, Cheonbok Park, Kang Min Yoo, Stella Biderman
NAACL 2025
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, Seonghyeon Ye, Bill Yuchen Lin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo
NAACL 2025 (Best Paper)
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Guijin Son, Sangwon Baek, Sangdae Nam, Ilgyun Jeong, Seungone Kim
ACL 2024
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models
Guijin Son, Hanwool Lee, Suwan Kim, Huiseo Kim, Jaecheol Lee, Je Won Yeom, Jihyu Jung, Jung Woo Kim, Songseong Kim
LREC-COLING 2024
Full CV in PDF.