JP-TL-Bench: AI Paper Writing

Learn how to leverage AI tools to efficiently write an arXiv paper detailing a novel Japanese/English translation evaluation methodology in a single day.

Overview

Recently, we open-sourced several of the evals we used for developing our Shisa V2 models. One of the most useful was JP-TL-Bench, our Japanese/English translation eval. It’s notable because it introduces a new scoring methodology: it keeps the discriminative power of pairwise comparisons while avoiding both the quadratic scaling and the score drift that come with conventional rating schemes such as Elo. It’s worth writing a paper about. How can we best use AI to help us write a paper efficiently, without it being complete slop?
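To make the scoring idea concrete, here is a minimal sketch of fitting one new model against a fixed anchor set under a Bradley-Terry model. It is not the JP-TL-Bench implementation: the function name `fit_against_anchors`, the anchor strengths, the search bounds, and the win/loss counts are all hypothetical and chosen only for illustration. The point it shows is that each new model is compared only with the anchors (linear cost instead of all-pairs), and because the anchor strengths are held fixed, a model's score does not move when other models are added later.

```python
import math


def sigmoid(x: float) -> float:
    """Bradley-Terry win probability for a strength difference x (log scale)."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_against_anchors(anchor_strengths, wins, losses,
                        lo=-10.0, hi=10.0, iters=60):
    """Maximum-likelihood strength of one model versus fixed anchors.

    anchor_strengths: fixed log-strengths of the anchors (assumed known).
    wins[i], losses[i]: judged wins/losses of the new model against anchor i.
    The bounds lo/hi are an illustrative assumption; they also cap the
    estimate when a model wins or loses every comparison.
    """
    def gradient(theta: float) -> float:
        # d/d(theta) of the log-likelihood: observed wins minus expected wins.
        return sum(w - (w + l) * sigmoid(theta - a)
                   for a, w, l in zip(anchor_strengths, wins, losses))

    # The gradient is strictly decreasing in theta, so bisection finds its root.
    if gradient(lo) <= 0:
        return lo
    if gradient(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gradient(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical anchors and judged outcomes, for illustration only.
    anchors = [-1.0, 0.0, 1.0]
    score = fit_against_anchors(anchors, wins=[9, 6, 3], losses=[1, 4, 7])
    print(f"estimated log-strength: {score:.3f}")
```

Scoring a second model reuses the same anchors and the same function, so the two fits are independent of each other, which is where the absence of drift comes from in this simplified picture.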

Links

https://shisa.ai/posts/jp-tl-bench/
JP-TL-Bench uses anchored pairwise LLM comparison and Bradley-Terry modeling for discriminating JPN-ENG translation.
https://github.com/shisa-ai/jp-tl-bench
https://arxiv.org/abs/2601.00223

Tech stack