</span>预料,很快便溃不成军。</div></div></div><div><h5 class="text-sm font-semibold mb-2 text-muted-foreground">Previous Context</h5><div class="bg-muted/50 p-4 rounded border border-border text-sm"><div>飞奔而去。<br><br>长庚一直盯着他的背影,直到目力无可及,他突然闭了闭眼,几不可闻地喃喃叫了一声:"子熹……"<br><br>一边的侯府侍卫没听清,疑惑道:"殿下说什么?"</div></div></div><div><h5 class="text-sm font-semibold mb-2 text-muted-foreground">Next Context</h5><div class="bg-muted/50 p-4 rounded border border-border text-sm"><div>起鸢楼的笙歌还在绕梁不休,温热的花酒白雾未消,四九城中已经炸了锅。<br><br>谭鸿飞带人逼至宫禁之外,</div></div></div><div><h5 class="text-sm font-semibold mb-2 text-muted-foreground">Glossary</h5><div class="bg-card p-4 rounded border border-border"><table class="w-full text-sm"><thead><tr class="border-b"><th class="text-left pb-2">Term</th><th class="text-left pb-2">Translation</th><th class="text-left pb-2">Gender</th></tr></thead><tbody><tr class="border-b last:border-0"><td class="py-2">长庚</td><td class="py-2">Chang Geng</td><td class="py-2">masculine</td></tr><tr class="border-b last:border-0"><td class="py-2">殿下</td><td class="py-2">Your Highness</td><td class="py-2">neuter</td></tr><tr class="border-b last:border-0"><td class="py-2">顾昀</td><td class="py-2">Gu Yun</td><td class="py-2">masculine</td></tr><tr class="border-b last:border-0"><td class="py-2">谭鸿飞</td><td class="py-2">Tan Hongfei</td><td class="py-2">masculine</td></tr><tr class="border-b last:border-0"><td class="py-2">王国舅</td><td class="py-2">Imperial Uncle Wang</td><td class="py-2">masculine</td></tr><tr class="border-b last:border-0"><td class="py-2">御林军</td><td class="py-2">Imperial Guard</td><td class="py-2">neuter</td></tr><tr class="border-b last:border-0"><td class="py-2">北大营</td><td class="py-2">Northern Camp</td><td class="py-2">neuter</td></tr></tbody></table></div></div></div><div class="space-y-4"><div><h5 class="text-sm font-semibold mb-2 text-muted-foreground">Human Translation</h5><div class="bg-card p-4 rounded border border-border"><div>Chang Geng spun around. 
"Prepare a brush and paper."<br><br>"Your Highness, your hands..." The guard chased after him.<br><br>Chang Geng paused, picked up Gu Yun's abandoned jar of wine, and, with no change in expression, poured the whole jar of strong liquor over the wounds on his hands. The cuts, which had already begun to scab over, bled again with the rush of liquid. Chang Geng carelessly retrieved a handkerchief from his lapels and wrapped them tight.<br><br>In the capital, no one expected that an old eunuch's death would raise such a storm of controversy.<br><br>The resentment Tan Hongfei had suppressed for twenty years erupted—he had very likely already lost his mind. He first sent soldiers to surround Imperial Uncle Wang's estate. Upon learning that the old bastard had abandoned his wife and children to cower within the palace, he did an about-face and brazenly turned his blade on the Imperial Guard who had rushed to the scene.<br><br>The Imperial Guard and the Northern Camp had always been the last lines of defense for the capital, one within and one without, and the two constantly crossed paths. The Imperial Guard was by and large made up of two groups: young-master soldiers benefitting from nepotism and living off the imperial coffers, and elite soldiers selected from the Northern Camp. The former had already pissed their pants in terror and could not be relied on. The latter were skilled, but, stuck in the impossible position of drawing blades against their maiden family, quickly crumpled. Just as Chang Geng had predicted, in no time at all, the Imperial Guard was defeated.</div></div></div></div></div></div></div></section><section><h3 class="text-xl font-semibold mb-3">Evaluation Approach</h3><p class="text-muted-foreground">Our benchmark uses an ensemble of Large Language Models (LLMs) as judges to evaluate translations. 
The evaluation is conducted as head-to-head comparisons between machine translations and human translations.</p><p class="text-muted-foreground mt-3">To ensure the highest possible accuracy in our evaluation system, we conducted an extensive calibration experiment:</p><ul class="list-disc pl-6 mt-2 space-y-1 text-muted-foreground"><li>Multiple human annotators evaluated several hundred translation pairs</li><li>We focused on decisive human verdicts—cases where multiple annotators agreed on a clear winner</li><li>This approach addresses the inherently subjective nature of literary translation evaluation, which typically has low inter-annotator agreement</li></ul></section><section><h3 class="text-xl font-semibold mb-3">Judge Ensemble</h3><p class="text-muted-foreground">Our experiments revealed that using multiple frontier LLMs as judges, each evaluating different aspects of translation quality, and then ensembling their verdicts produces the most accurate results.</p><p class="text-muted-foreground mt-3">This ensemble approach achieves 82% accuracy when compared to decisive human judgments. For comparison, a single LLM judge would only achieve approximately 60% accuracy.</p></section><section><h3 class="text-xl font-semibold mb-3">Scoring Methodology</h3><p class="text-muted-foreground">For each evaluation unit, our judge ensemble determines whether the machine translation or the human translation is superior, or if the comparison is too close to call ("not-sure").</p><p class="text-muted-foreground mt-3">Points are assigned as follows:</p><ul class="list-disc pl-6 mt-2 space-y-1 text-muted-foreground"><li>Machine translation wins: 1 point</li><li>Tie or "not-sure": 0.5 points</li><li>Human translation wins: 0 points</li></ul><p class="text-muted-foreground mt-3">The final score for each system is calculated as the average of these points multiplied by 100, representing the system's win rate against human translators. 
A score of 50 indicates parity with human translation quality.</p></section><section><h3 class="text-xl font-semibold mb-3">Limitations</h3><p class="text-muted-foreground">While our methodology represents a significant advancement in evaluating literary translation, we acknowledge several limitations:</p><ul class="list-disc pl-6 mt-2 space-y-1 text-muted-foreground"><li>Literary translation evaluation is inherently subjective with low inter-annotator agreement</li><li>Our current dataset is limited to Chinese, Japanese, and Korean source languages</li><li>The evaluation focuses on chunk-level translation rather than document-level coherence</li><li>Even with our ensemble approach, there remains an 18% gap between our automated evaluation and decisive human judgment</li></ul></section></div></div></main><footer class="max-w-5xl mx-auto mt-16 pt-8 border-t border-border text-center text-sm text-muted-foreground"><p class="mb-4">Built by the team at <a href="https://readomni.com/" class="underline hover:text-primary" target="_blank" rel="noopener noreferrer">Omni</a></p><div class="flex justify-center space-x-6"><a href="https://discord.gg/M6N69PuMKt" class="hover:text-primary" target="_blank" rel="noopener noreferrer" aria-label="Discord"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><circle cx="9" cy="12" r="1"></circle><circle cx="15" cy="12" r="1"></circle><path d="M7.5 7.5c3.5-1 5.5-1 9 0"></path><path d="M7 16.5c3.5 1 6.5 1 10 0"></path><path d="M15.5 17c0 1 1.5 3 2 3 1.5 0 2.833-1.667 3.5-3 .667-1.667.5-5.833-1.5-11.5-1.457-1.015-3-1.34-4.5-1.5l-1 2.5"></path><path d="M8.5 17c0 1-1.356 3-1.832 3-1.429 0-2.698-1.667-3.333-3-.635-1.667-.48-5.833 1.428-11.5C6.151 4.485 7.545 4.16 9 4l1 2.5"></path></svg></a><a href="https://twitter.com/readomni" class="hover:text-primary" target="_blank" rel="noopener noreferrer" 
aria-label="Twitter"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M22 4s-.7 2.1-2 3.4c1.6 10-9.4 17.3-18 11.6 2.2.1 4.4-.6 6-2C3 15.5.5 9.6 3 5c2.2 2.6 5.6 4.1 9 4-.9-4.2 4-6.6 7-3.8 1.1 0 3-1.2 3-1.2z"></path></svg></a><a href="https://readomni.com/" class="hover:text-primary" target="_blank" rel="noopener noreferrer" aria-label="Website"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><circle cx="12" cy="12" r="10"></circle><line x1="2" y1="12" x2="22" y2="12"></line><path d="M12 2a15.3 15.3 0 0 1 4 10 15.3 15.3 0 0 1-4 10 15.3 15.3 0 0 1-4-10 15.3 15.3 0 0 1 4-10z"></path></svg></a></div></footer></div><script src="/_next/static/chunks/webpack-2849afdb4ff60838.js" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0])</script><script>self.__next_f.push([1,"1:\"$Sreact.fragment\"\n3:I[917,[\"177\",\"static/chunks/app/layout-c3a2fdafdf5923e6.js\"],\"\"]\n4:I[7397,[],\"\"]\n5:I[8513,[],\"\"]\n6:I[4204,[],\"ClientPageRoot\"]\n7:I[1965,[\"475\",\"static/chunks/475-33ddeef05d04a314.js\",\"870\",\"static/chunks/870-b143eb11c68e0d46.js\",\"910\",\"static/chunks/app/methodology/page-9ca0cab8e776add5.js\"],\"default\"]\na:I[3514,[],\"OutletBoundary\"]\nd:I[3514,[],\"ViewportBoundary\"]\nf:I[3514,[],\"MetadataBoundary\"]\n11:I[1612,[],\"\"]\n:HL[\"/_next/static/media/569ce4b8f30dc480-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n:HL[\"/_next/static/media/93f479601ee12b01-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n:HL[\"/_next/static/css/b23dc09eb5d151a0.css\",\"style\"]\n2:T48a,\n (function() {\n try {\n // Check if theme is stored in localStorage\n const storedTheme = localStorage.getItem('theme');\n \n // If theme is stored, use it\n 
if (storedTheme === 'dark') {\n document.documentElement.classList.add('dark');\n } else if (storedTheme === 'light') {\n document.documentElement.classList.remove('dark');\n } else {\n // Otherwise, check system preference\n const prefersDark = window.matchMedia('(prefers-color-scheme: dark)').matches;\n if (prefersDark) {\n document.documentElement.classList.add('dark');\n } else {\n document.documentElement.classList.remove('dark');\n }\n }\n } catch (e) {\n // Fail silently if localStorage is not available\n console.warn('Failed to access localStorage for theme detection');\n }\n })();\n "])</script><script>self.__next_f.push([1,"0:{\"P\":null,\"b\":\"D0Mx1F72JXTSiJ1ilV3S0\",\"p\":\"\",\"c\":[\"\",\"methodology\"],\"i\":false,\"f\":[[[\"\",{\"children\":[\"methodology\",{\"children\":[\"__PAGE__\",{}]}]},\"$undefined\",\"$undefined\",true],[\"\",[\"$\",\"$1\",\"c\",{\"children\":[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/_next/static/css/b23dc09eb5d151a0.css\",\"precedence\":\"next\",\"crossOrigin\":\"$undefined\",\"nonce\":\"$undefined\"}]],[\"$\",\"html\",null,{\"lang\":\"en\",\"suppressHydrationWarning\":true,\"children\":[[\"$\",\"head\",null,{\"children\":[[\"$\",\"script\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"$2\"}}],[\"$\",\"$L3\",null,{\"id\":\"theme-script\",\"strategy\":\"afterInteractive\",\"children\":\"\\n (function() {\\n // Set up listener for system theme changes\\n const mediaQuery = window.matchMedia('(prefers-color-scheme: dark)');\\n const handleChange = (e) =\u003e {\\n const storedTheme = localStorage.getItem('theme');\\n if (!storedTheme) {\\n document.documentElement.classList.toggle('dark', e.matches);\\n }\\n };\\n \\n mediaQuery.addEventListener('change', handleChange);\\n })();\\n \"}]]}],[\"$\",\"body\",null,{\"className\":\"__variable_4d318d __variable_ea5f4b antialiased bg-white dark:bg-slate-900 text-slate-900 
dark:text-slate-100\",\"children\":[\"$\",\"$L4\",null,{\"parallelRouterKey\":\"children\",\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L5\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":404}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be 
found.\"}]}]]}]}]],\"$undefined\",[]],\"forbidden\":\"$undefined\",\"unauthorized\":\"$undefined\"}]}]]}]]}],{\"children\":[\"methodology\",[\"$\",\"$1\",\"c\",{\"children\":[null,[\"$\",\"$L4\",null,{\"parallelRouterKey\":\"children\",\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L5\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":\"$undefined\",\"forbidden\":\"$undefined\",\"unauthorized\":\"$undefined\"}]]}],{\"children\":[\"__PAGE__\",[\"$\",\"$1\",\"c\",{\"children\":[[\"$\",\"$L6\",null,{\"Component\":\"$7\",\"searchParams\":{},\"params\":{},\"promises\":[\"$@8\",\"$@9\"]}],\"$undefined\",null,[\"$\",\"$La\",null,{\"children\":[\"$Lb\",\"$Lc\",null]}]]}],{},null,false]},null,false]},null,false],[\"$\",\"$1\",\"h\",{\"children\":[null,[\"$\",\"$1\",\"k8sk7xXia9gXVmlhksnsy\",{\"children\":[[\"$\",\"$Ld\",null,{\"children\":\"$Le\"}],[\"$\",\"meta\",null,{\"name\":\"next-size-adjust\",\"content\":\"\"}]]}],[\"$\",\"$Lf\",null,{\"children\":\"$L10\"}]]}],false]],\"m\":\"$undefined\",\"G\":[\"$11\",\"$undefined\"],\"s\":false,\"S\":true}\n"])</script><script>self.__next_f.push([1,"8:{}\n9:{}\n"])</script><script>self.__next_f.push([1,"e:[[\"$\",\"meta\",\"0\",{\"charSet\":\"utf-8\"}],[\"$\",\"meta\",\"1\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}]]\nb:null\n"])</script><script>self.__next_f.push([1,"c:null\n"])</script><script>self.__next_f.push([1,"10:[[\"$\",\"title\",\"0\",{\"children\":\"LiTERatE - Literary Translation Evaluation and Rating Ensemble\"}],[\"$\",\"meta\",\"1\",{\"name\":\"description\",\"content\":\"A benchmark for evaluating machine translation systems on literary text from Chinese, Japanese, and Korean languages using an ensemble of LLM judges.\"}],[\"$\",\"meta\",\"2\",{\"name\":\"author\",\"content\":\"LiTERatE Team\"}],[\"$\",\"meta\",\"3\",{\"name\":\"keywords\",\"content\":\"machine 
translation,literary translation,benchmark,NLP,CJK languages,Chinese translation,Japanese translation,Korean translation,LLM evaluation\"}],[\"$\",\"meta\",\"4\",{\"name\":\"creator\",\"content\":\"LiTERatE Team\"}],[\"$\",\"meta\",\"5\",{\"name\":\"publisher\",\"content\":\"LiTERatE\"}],[\"$\",\"meta\",\"6\",{\"property\":\"og:title\",\"content\":\"LiTERatE - Literary Translation Evaluation and Rating Ensemble\"}],[\"$\",\"meta\",\"7\",{\"property\":\"og:description\",\"content\":\"A benchmark for evaluating machine translation systems on literary text from Chinese, Japanese, and Korean languages.\"}],[\"$\",\"meta\",\"8\",{\"property\":\"og:url\",\"content\":\"https://literate.readomni.com\"}],[\"$\",\"meta\",\"9\",{\"property\":\"og:site_name\",\"content\":\"LiTERatE Benchmark\"}],[\"$\",\"meta\",\"10\",{\"property\":\"og:locale\",\"content\":\"en_US\"}],[\"$\",\"meta\",\"11\",{\"property\":\"og:image\",\"content\":\"https://literate.readomni.com/images/og-image.svg\"}],[\"$\",\"meta\",\"12\",{\"property\":\"og:image:width\",\"content\":\"1200\"}],[\"$\",\"meta\",\"13\",{\"property\":\"og:image:height\",\"content\":\"630\"}],[\"$\",\"meta\",\"14\",{\"property\":\"og:image:alt\",\"content\":\"LiTERatE - Literary Translation Benchmark\"}],[\"$\",\"meta\",\"15\",{\"property\":\"og:type\",\"content\":\"website\"}],[\"$\",\"meta\",\"16\",{\"name\":\"twitter:card\",\"content\":\"summary_large_image\"}],[\"$\",\"meta\",\"17\",{\"name\":\"twitter:title\",\"content\":\"LiTERatE - Literary Translation Evaluation and Rating Ensemble\"}],[\"$\",\"meta\",\"18\",{\"name\":\"twitter:description\",\"content\":\"A benchmark for evaluating machine translation systems on literary text from Chinese, Japanese, and Korean languages.\"}],[\"$\",\"meta\",\"19\",{\"name\":\"twitter:image\",\"content\":\"https://literate.readomni.com/images/og-image.svg\"}],[\"$\",\"link\",\"20\",{\"rel\":\"icon\",\"href\":\"/literate-logo.svg\"}]]\n"])</script></body></html> |