Tohrusky to Reviewer 2:
General Response:
We sincerely thank Reviewer 2 for their valuable time and for recognizing the "exceptional practical utility" of our work. We address your insightful comments below:
Response to Weaknesses (Where is the technical report?):
We thank the reviewer for pointing this out. Due to the strict 0-page limit of this forum "conference," the technical report was unfortunately omitted. Furthermore, as this is a passion-driven community project rather than a formal academic paper, we defer the writing of a proper technical report to "Future Work." For now, all technical details are provided in the form of the ultimate pseudocode: the open-source repository itself.
Response to Comments (Regarding RL post-training):
We highly appreciate the reviewer's brilliant and constructive suggestion! We have indeed considered Reinforcement Learning. However, rather than GRPO or PPO, our primary focus is on DPO (Direct Preference Optimization).
Given our compute constraints (and to avoid the high blood pressure caused by exploding loss curves in PPO), DPO offers a much more elegant and stable path for translation alignment: it requires neither a separate reward model nor on-policy rollouts, only a dataset of paired preferred/dispreferred translations. Especially when it comes to preserving the model's ability to translate certain "unspeakable things," DPO seems perfectly suited for aligning with human preferences.
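To make this plan concrete, here is a minimal sketch of the DPO objective (Rafailov et al., 2023) as we envision applying it to translation preference pairs; the function name, tensor shapes, and the beta value are illustrative placeholders, not code from our repository:

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(preferred translation)
    policy_rejected_logps: torch.Tensor,  # log p_theta(dispreferred translation)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen
    ref_rejected_logps: torch.Tensor,     # reference model
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO loss over a batch of translation preference pairs.

    Each tensor holds per-sequence log-probabilities (token log-probs
    summed over the whole translation).
    """
    # Implicit rewards: how far the policy has moved from the reference
    # on the preferred and dispreferred translations, respectively.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # Bradley-Terry-style objective: widen the margin between the
    # preferred and the dispreferred translation, scaled by beta.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Note that the frozen reference model is never updated; it merely anchors the policy, which is exactly what should let DPO preserve abilities (such as translating the aforementioned "unspeakable things") rather than optimizing them away.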
Conclusion:
We hope our rebuttal fully addresses your concerns. If so, we humbly request that the reviewer consider raising the Overall Assessment score to 4.0 (Strong Accept)! Please, my graduation (crossed out) my hobby depends on it!