[{"data":1,"prerenderedAt":18139},["ShallowReactive",2],{"post-\u002Fblog\u002F2025\u002F2025-05-08-revisit-the-transformer":3},{"id":4,"title":5,"body":6,"cardimage":18123,"description":18124,"draft":18125,"enableComment":18126,"extension":18127,"image":18128,"meta":18129,"navigation":18126,"onday":18130,"path":18131,"seo":18132,"stem":18133,"summary":18134,"tags":18135,"__hash__":18138},"blog\u002Fblog\u002F2025\u002F2025-05-08-revisit-the-transformer.md","Rereading “Attention is All You Need”",{"type":7,"value":8,"toc":18109},"minimark",[9,24,27,30,34,173,180,183,249,259,262,295,298,306,309,312,860,863,1367,1370,1373,1478,1795,1935,2316,2319,2351,2354,2392,2395,2407,2413,2416,2419,2455,2458,2498,2684,2876,2879,3149,3228,3827,3902,4349,4352,4355,4361,4364,4367,4370,4375,4766,5080,5154,5157,5391,5594,5597,5782,6020,6023,6029,6098,6295,6779,7429,7432,7435,7438,7441,7455,7458,7491,7907,7910,7913,7916,8157,8189,8355,8606,8669,8847,8924,9203,9206,9385,9554,9846,9849,10044,10048,10241,10465,10734,10766,10961,11069,11261,11264,11511,11515,11756,11954,12057,12060,12314,12317,12328,12332,12335,12338,12393,12404,12407,12415,12418,13020,13123,13129,13445,13582,13586,13927,13933,14115,14118,14711,14716,15362,15365,16008,16011,16328,16620,16624,16639,16642,16645,16665,16668,17237,17262,17265,17268,17487,17498,18106],[10,11,12,13,23],"p",{},"In 2017, Ashish Vaswani et al. published ",[14,15,16],"em",{},[17,18,22],"a",{"href":19,"rel":20},"https:\u002F\u002Fproceedings.neurips.cc\u002Fpaper_files\u002Fpaper\u002F2017\u002Ffile\u002F3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf",[21],"nofollow","Attention is all you need"," at NeurIPS. It was the first paper to build a sequence model entirely on attention mechanisms. The design originally targeted the inefficiency of processing sequences serially during training: it made parallel training on large-scale GPUs possible, and years later it became the foundation of modern LLMs.",[10,25,26],{},"Frankly, though, the paper is not well written. Its wording and figures are rather obscure (something I have grumbled about for a long time), it does not match how most readers approach a text, and it is even less suitable for beginners.",
[10,28,29],{},"In this post, I follow the development of NLP and unpack the Transformer step by step: why it is designed the way it is, and how each formula is derived. Even so, you will still need basic linear algebra and some general machine learning background.",[31,32,33],"h2",{"id":33},"语义与含义",[10,35,36,37,168,169],{},"Following the manifold hypothesis in machine learning, we treat the human thought space as a high-dimensional Euclidean space in which a word can be represented by a vector ",[38,39,43,81],"span",{"className":40,"translate":42},[41],"katex","no",[38,44,47],{"className":45},[46],"katex-mathml",[48,49,51],"math",{"xmlns":50},"http:\u002F\u002Fwww.w3.org\u002F1998\u002FMath\u002FMathML",[52,53,54,76],"semantics",{},[55,56,57,62,66],"mrow",{},[58,59,61],"mi",{"mathvariant":60},"bold-italic","v",[63,64,65],"mo",{},"∈",[67,68,69,73],"msup",{},[58,70,72],{"mathvariant":71},"double-struck","R",[58,74,75],{},"d",[77,78,80],"annotation",{"encoding":79},"application\u002Fx-tex","\\boldsymbol{v} \\in \\mathbb{R}^d",[38,82,86,119],{"className":83,"ariaHidden":85},[84],"katex-html","true",[38,87,90,95,107,112,116],{"className":88},[89],"base",[38,91],{"className":92,"style":94},[93],"strut","height:0.5782em;vertical-align:-0.0391em;",[38,96,99],{"className":97},[98],"mord",[38,100,102],{"className":101},[98],[38,103,61],{"className":104,"style":106},[98,105],"boldsymbol","margin-right:0.03704em;",[38,108],{"className":109,"style":111},[110],"mspace","margin-right:0.2778em;",[38,113,65],{"className":114},[115],"mrel",[38,117],{"className":118,"style":111},[110],[38,120,122,126],{"className":121},[89],[38,123],{"className":124,"style":125},[93],"height:0.8491em;",[38,127,129,133],{"className":128},[98],[38,130,72],{"className":131},[98,132],"mathbb",[38,134,137],{"className":135},[136],"msupsub",[38,138,141],{"className":139},[140],"vlist-t",[38,142,145],{"className":143},[144],"vlist-r",[38,146,149],{"className":147,"style":125},[148],"vlist",[38,150,152,157],{"style":151},"top:-3.063em;margin-right:0.05em;",[38,153],{"className":154,"style":156},[155],"pstrut","height:2.7em;",[38,158,164],{"className":159},[160,161,162,163],"sizing","reset-size6","size3","mtight",[38,165,75],{"className":166},[98,167,163],"mathnormal",
". A word's semantics are usually rich, however: one word may carry several semantic components. We can say that ",[170,171,172],"strong",{},"the meaning of a word is a combination of several semantic components.",[10,174,175,176,179],{},"In the thought space, we treat the notions of meaning and word as equivalent. Put another way, ",[170,177,178],{},"a dimension is a unit of semantics, and also the smallest unit from which a meaning is built",".",[10,181,182],{},"To keep the explanation simple, consider the simplest case: a color can be described with the three primary colors (R, G, B), so its semantics split into three mutually orthogonal dimensions: redness, greenness, and blueness.",[10,184,185,186,248],{},"We can set up a three-dimensional space ",[38,187,189,209],{"className":188,"translate":42},[41],[38,190,192],{"className":191},[46],[48,193,194],{"xmlns":50},[52,195,196,206],{},[55,197,198],{},[67,199,200,202],{},[58,201,72],{"mathvariant":71},[203,204,205],"mn",{},"3",[77,207,208],{"encoding":79},"\\mathbb{R}^3",[38,210,212],{"className":211,"ariaHidden":85},[84],[38,213,215,219],{"className":214},[89],[38,216],{"className":217,"style":218},[93],"height:0.8141em;",[38,220,222,225],{"className":221},[98],[38,223,72],{"className":224},[98,132],[38,226,228],{"className":227},[136],[38,229,231],{"className":230},[140],[38,232,234],{"className":233},[144],[38,235,237],{"className":236,"style":218},[148],[38,238,239,242],{"style":151},[38,240],{"className":241,"style":156},[155],[38,243,245],{"className":244},[160,161,162,163],[38,246,205],{"className":247},[98,163],", in which a word describing a color contains the three simplest semantic components: R, G, and B. This gives vector representations for a few simple color words:",[250,251,252],"blockquote",{},[10,253,254,255,258],{},"Pure red (255,0,0)      Pure green (0,255,0)  ",[256,257],"br",{},"\nPink (255,192,203)  Sky blue (135,206,235)",[10,260,261],{},"The semantic components along the three dimensions R, G, and B together make up the meaning of a color.",[10,263,264,265,294],{},"The combination of the numbers along each dimension of the vector makes up a meaning and can therefore express a word; a word expresses ",[38,266,268,281],{"className":267,"translate":42},[41],[38,269,271],{"className":270},[46],[48,272,273],{"xmlns":50},[52,274,275,279],{},[55,276,277],{},[58,278,75],{},[77,280,75],{"encoding":79},[38,282,284],{"className":283,"ariaHidden":85},[84],[38,285,287,291],{"className":286},[89],[38,288],{"className":289,"style":290},[93],"height:0.6944em;",[38,292,75],{"className":293},[98,167]," semantic components. Words are discrete, so meaning is carried by a “distributed representation”: it is spread across all the dimensions rather than tied to a single one.",[10,296,297],{},"On this basis, we make two assumptions:",[250,299,300],{},[10,301,302,303,305],{},"A weighted combination of several meanings can be composed into one meaning.  
",[256,304],{},"\nOne meaning can be decomposed into a weighted combination of several meanings.",[10,307,308],{},"The second assumption is easy to grasp. Take queen, for example, whose meaning is “the king's wife”.",[10,310,311],{},"From the decomposition point of view, this can be written formally as:",[38,313,316],{"className":314,"translate":42},[315],"katex-display",[38,317,319,408],{"className":318,"translate":42},[41],[38,320,322],{"className":321},[46],[48,323,325],{"xmlns":50,"display":324},"block",[52,326,327,405],{},[55,328,329,338,341,349,352,359,362,369,371,378,381,384,390,392,398,402],{},[330,331,332,334],"msub",{},[58,333,61],{"mathvariant":60},[335,336,337],"mtext",{},"queen",[63,339,340],{},"=",[330,342,343,346],{},[58,344,345],{},"w",[203,347,348],{},"1",[63,350,351],{"separator":85},"⋅",[330,353,354,356],{},[58,355,61],{"mathvariant":60},[335,357,358],{},"king",[63,360,361],{},"+",[330,363,364,366],{},[58,365,345],{},[203,367,368],{},"2",[63,370,351],{"separator":85},[330,372,373,375],{},[58,374,61],{"mathvariant":60},[335,376,377],{},"wife",[63,379,380],{"separator":85},",",[110,382],{"width":383},"1em",[330,385,386,388],{},[58,387,345],{},[203,389,348],{},[63,391,351],{"separator":85},[330,393,394,396],{},[58,395,345],{},[203,397,368],{},[63,399,401],{"mathvariant":400},"normal","≠",[203,403,404],{},"0",[77,406,407],{"encoding":79},"\\boldsymbol{v}_\\text{queen} = w_1 · \\boldsymbol{v}_\\text{king} + w_2 · \\boldsymbol{v}_\\text{wife}, \\quad w_1 · w_2 \\neq 
0",[38,409,411,483,603,850],{"className":410,"ariaHidden":85},[84],[38,412,414,418,474,477,480],{"className":413},[89],[38,415],{"className":416,"style":417},[93],"height:0.7305em;vertical-align:-0.2861em;",[38,419,421,430],{"className":420},[98],[38,422,424],{"className":423},[98],[38,425,427],{"className":426},[98],[38,428,61],{"className":429,"style":106},[98,105],[38,431,433],{"className":432},[136],[38,434,437,465],{"className":435},[140,436],"vlist-t2",[38,438,440,460],{"className":439},[144],[38,441,444],{"className":442,"style":443},[148],"height:0.1514em;",[38,445,447,450],{"style":446},"top:-2.55em;margin-right:0.05em;",[38,448],{"className":449,"style":156},[155],[38,451,453],{"className":452},[160,161,162,163],[38,454,457],{"className":455},[98,456,163],"text",[38,458,337],{"className":459},[98,163],[38,461,464],{"className":462},[463],"vlist-s","​",[38,466,468],{"className":467},[144],[38,469,472],{"className":470,"style":471},[148],"height:0.2861em;",[38,473],{},[38,475],{"className":476,"style":111},[110],[38,478,340],{"className":479},[115],[38,481],{"className":482,"style":111},[110],[38,484,486,490,534,538,542,592,596,600],{"className":485},[89],[38,487],{"className":488,"style":489},[93],"height:0.8694em;vertical-align:-0.2861em;",[38,491,493,497],{"className":492},[98],[38,494,345],{"className":495,"style":496},[98,167],"margin-right:0.02691em;",[38,498,500],{"className":499},[136],[38,501,503,525],{"className":502},[140,436],[38,504,506,522],{"className":505},[144],[38,507,510],{"className":508,"style":509},[148],"height:0.3011em;",[38,511,513,516],{"style":512},"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;",[38,514],{"className":515,"style":156},[155],[38,517,519],{"className":518},[160,161,162,163],[38,520,348],{"className":521},[98,163],[38,523,464],{"className":524},[463],[38,526,528],{"className":527},[144],[38,529,532],{"className":530,"style":531},[148],"height:0.15em;",[38,533],{},[38,535,351],{"className":536},[537],"mpunct",[
38,539],{"className":540,"style":541},[110],"margin-right:0.1667em;",[38,543,545,554],{"className":544},[98],[38,546,548],{"className":547},[98],[38,549,551],{"className":550},[98],[38,552,61],{"className":553,"style":106},[98,105],[38,555,557],{"className":556},[136],[38,558,560,584],{"className":559},[140,436],[38,561,563,581],{"className":562},[144],[38,564,567],{"className":565,"style":566},[148],"height:0.3361em;",[38,568,569,572],{"style":446},[38,570],{"className":571,"style":156},[155],[38,573,575],{"className":574},[160,161,162,163],[38,576,578],{"className":577},[98,456,163],[38,579,358],{"className":580},[98,163],[38,582,464],{"className":583},[463],[38,585,587],{"className":586},[144],[38,588,590],{"className":589,"style":471},[148],[38,591],{},[38,593],{"className":594,"style":595},[110],"margin-right:0.2222em;",[38,597,361],{"className":598},[599],"mbin",[38,601],{"className":602,"style":595},[110],[38,604,606,610,650,653,656,705,708,712,715,755,758,761,801,804,847],{"className":605},[89],[38,607],{"className":608,"style":609},[93],"height:0.8889em;vertical-align:-0.1944em;",[38,611,613,616],{"className":612},[98],[38,614,345],{"className":615,"style":496},[98,167],[38,617,619],{"className":618},[136],[38,620,622,642],{"className":621},[140,436],[38,623,625,639],{"className":624},[144],[38,626,628],{"className":627,"style":509},[148],[38,629,630,633],{"style":512},[38,631],{"className":632,"style":156},[155],[38,634,636],{"className":635},[160,161,162,163],[38,637,368],{"className":638},[98,163],[38,640,464],{"className":641},[463],[38,643,645],{"className":644},[144],[38,646,648],{"className":647,"style":531},[148],[38,649],{},[38,651,351],{"className":652},[537],[38,654],{"className":655,"style":541},[110],[38,657,659,668],{"className":658},[98],[38,660,662],{"className":661},[98],[38,663,665],{"className":664},[98],[38,666,61],{"className":667,"style":106},[98,105],[38,669,671],{"className":670},[136],[38,672,674,697],{"className":673},[140,436],[38
,675,677,694],{"className":676},[144],[38,678,680],{"className":679,"style":566},[148],[38,681,682,685],{"style":446},[38,683],{"className":684,"style":156},[155],[38,686,688],{"className":687},[160,161,162,163],[38,689,691],{"className":690},[98,456,163],[38,692,377],{"className":693},[98,163],[38,695,464],{"className":696},[463],[38,698,700],{"className":699},[144],[38,701,703],{"className":702,"style":531},[148],[38,704],{},[38,706,380],{"className":707},[537],[38,709],{"className":710,"style":711},[110],"margin-right:1em;",[38,713],{"className":714,"style":541},[110],[38,716,718,721],{"className":717},[98],[38,719,345],{"className":720,"style":496},[98,167],[38,722,724],{"className":723},[136],[38,725,727,747],{"className":726},[140,436],[38,728,730,744],{"className":729},[144],[38,731,733],{"className":732,"style":509},[148],[38,734,735,738],{"style":512},[38,736],{"className":737,"style":156},[155],[38,739,741],{"className":740},[160,161,162,163],[38,742,348],{"className":743},[98,163],[38,745,464],{"className":746},[463],[38,748,750],{"className":749},[144],[38,751,753],{"className":752,"style":531},[148],[38,754],{},[38,756,351],{"className":757},[537],[38,759],{"className":760,"style":541},[110],[38,762,764,767],{"className":763},[98],[38,765,345],{"className":766,"style":496},[98,167],[38,768,770],{"className":769},[136],[38,771,773,793],{"className":772},[140,436],[38,774,776,790],{"className":775},[144],[38,777,779],{"className":778,"style":509},[148],[38,780,781,784],{"style":512},[38,782],{"className":783,"style":156},[155],[38,785,787],{"className":786},[160,161,162,163],[38,788,368],{"className":789},[98,163],[38,791,464],{"className":792},[463],[38,794,796],{"className":795},[144],[38,797,799],{"className":798,"style":531},[148],[38,800],{},[38,802],{"className":803,"style":111},[110],[38,805,807,840,844],{"className":806},[115],[38,808,810],{"className":809},[115],[38,811,814],{"className":812},[98,813],"vbox",[38,815,818],{"className":816},[817],"
thinbox",[38,819,822,825,836],{"className":820},[821],"rlap",[38,823],{"className":824,"style":609},[93],[38,826,829],{"className":827},[828],"inner",[38,830,832],{"className":831},[98],[38,833,835],{"className":834},[115],"\ue020",[38,837],{"className":838},[839],"fix",[38,841],{"className":842},[110,843],"nobreak",[38,845,340],{"className":846},[115],[38,848],{"className":849,"style":111},[110],[38,851,853,857],{"className":852},[89],[38,854],{"className":855,"style":856},[93],"height:0.6444em;",[38,858,404],{"className":859},[98],[10,861,862],{},"Or, the other way around, from the composition point of view:",[38,864,866],{"className":865,"translate":42},[315],[38,867,869,941],{"className":868,"translate":42},[41],[38,870,872],{"className":871},[46],[48,873,874],{"xmlns":50,"display":324},[52,875,876,938],{},[55,877,878,884,886,892,894,900,902,908,910,916,918,920,926,928,934,936],{},[330,879,880,882],{},[58,881,345],{},[203,883,348],{},[63,885,351],{"separator":85},[330,887,888,890],{},[58,889,61],{"mathvariant":60},[335,891,358],{},[63,893,361],{},[330,895,896,898],{},[58,897,345],{},[203,899,368],{},[63,901,351],{"separator":85},[330,903,904,906],{},[58,905,61],{"mathvariant":60},[335,907,377],{},[63,909,340],{},[330,911,912,914],{},[58,913,61],{"mathvariant":60},[335,915,337],{},[63,917,380],{"separator":85},[110,919],{"width":383},[330,921,922,924],{},[58,923,345],{},[203,925,348],{},[63,927,351],{"separator":85},[330,929,930,932],{},[58,931,345],{},[203,933,368],{},[63,935,401],{"mathvariant":400},[203,937,404],{},[77,939,940],{"encoding":79},"w_1 · \\boldsymbol{v}_\\text{king} + w_2 · \\boldsymbol{v}_\\text{wife} = \\boldsymbol{v}_\\text{queen}, \\quad w_1 · w_2 \\neq 
0",[38,942,944,1054,1165,1358],{"className":943,"ariaHidden":85},[84],[38,945,947,950,990,993,996,1045,1048,1051],{"className":946},[89],[38,948],{"className":949,"style":489},[93],[38,951,953,956],{"className":952},[98],[38,954,345],{"className":955,"style":496},[98,167],[38,957,959],{"className":958},[136],[38,960,962,982],{"className":961},[140,436],[38,963,965,979],{"className":964},[144],[38,966,968],{"className":967,"style":509},[148],[38,969,970,973],{"style":512},[38,971],{"className":972,"style":156},[155],[38,974,976],{"className":975},[160,161,162,163],[38,977,348],{"className":978},[98,163],[38,980,464],{"className":981},[463],[38,983,985],{"className":984},[144],[38,986,988],{"className":987,"style":531},[148],[38,989],{},[38,991,351],{"className":992},[537],[38,994],{"className":995,"style":541},[110],[38,997,999,1008],{"className":998},[98],[38,1000,1002],{"className":1001},[98],[38,1003,1005],{"className":1004},[98],[38,1006,61],{"className":1007,"style":106},[98,105],[38,1009,1011],{"className":1010},[136],[38,1012,1014,1037],{"className":1013},[140,436],[38,1015,1017,1034],{"className":1016},[144],[38,1018,1020],{"className":1019,"style":566},[148],[38,1021,1022,1025],{"style":446},[38,1023],{"className":1024,"style":156},[155],[38,1026,1028],{"className":1027},[160,161,162,163],[38,1029,1031],{"className":1030},[98,456,163],[38,1032,358],{"className":1033},[98,163],[38,1035,464],{"className":1036},[463],[38,1038,1040],{"className":1039},[144],[38,1041,1043],{"className":1042,"style":471},[148],[38,1044],{},[38,1046],{"className":1047,"style":595},[110],[38,1049,361],{"className":1050},[599],[38,1052],{"className":1053,"style":595},[110],[38,1055,1057,1061,1101,1104,1107,1156,1159,1162],{"className":1056},[89],[38,1058],{"className":1059,"style":1060},[93],"height:0.5945em;vertical-align:-0.15em;",[38,1062,1064,1067],{"className":1063},[98],[38,1065,345],{"className":1066,"style":496},[98,167],[38,1068,1070],{"className":1069},[136],[38,1071,1073,1
093],{"className":1072},[140,436],[38,1074,1076,1090],{"className":1075},[144],[38,1077,1079],{"className":1078,"style":509},[148],[38,1080,1081,1084],{"style":512},[38,1082],{"className":1083,"style":156},[155],[38,1085,1087],{"className":1086},[160,161,162,163],[38,1088,368],{"className":1089},[98,163],[38,1091,464],{"className":1092},[463],[38,1094,1096],{"className":1095},[144],[38,1097,1099],{"className":1098,"style":531},[148],[38,1100],{},[38,1102,351],{"className":1103},[537],[38,1105],{"className":1106,"style":541},[110],[38,1108,1110,1119],{"className":1109},[98],[38,1111,1113],{"className":1112},[98],[38,1114,1116],{"className":1115},[98],[38,1117,61],{"className":1118,"style":106},[98,105],[38,1120,1122],{"className":1121},[136],[38,1123,1125,1148],{"className":1124},[140,436],[38,1126,1128,1145],{"className":1127},[144],[38,1129,1131],{"className":1130,"style":566},[148],[38,1132,1133,1136],{"style":446},[38,1134],{"className":1135,"style":156},[155],[38,1137,1139],{"className":1138},[160,161,162,163],[38,1140,1142],{"className":1141},[98,456,163],[38,1143,377],{"className":1144},[98,163],[38,1146,464],{"className":1147},[463],[38,1149,1151],{"className":1150},[144],[38,1152,1154],{"className":1153,"style":531},[148],[38,1155],{},[38,1157],{"className":1158,"style":111},[110],[38,1160,340],{"className":1161},[115],[38,1163],{"className":1164,"style":111},[110],[38,1166,1168,1172,1221,1224,1227,1230,1270,1273,1276,1316,1319,1355],{"className":1167},[89],[38,1169],{"className":1170,"style":1171},[93],"height:0.9805em;vertical-align:-0.2861em;",[38,1173,1175,1184],{"className":1174},[98],[38,1176,1178],{"className":1177},[98],[38,1179,1181],{"className":1180},[98],[38,1182,61],{"className":1183,"style":106},[98,105],[38,1185,1187],{"className":1186},[136],[38,1188,1190,1213],{"className":1189},[140,436],[38,1191,1193,1210],{"className":1192},[144],[38,1194,1196],{"className":1195,"style":443},[148],[38,1197,1198,1201],{"style":446},[38,1199],{"className":1
200,"style":156},[155],[38,1202,1204],{"className":1203},[160,161,162,163],[38,1205,1207],{"className":1206},[98,456,163],[38,1208,337],{"className":1209},[98,163],[38,1211,464],{"className":1212},[463],[38,1214,1216],{"className":1215},[144],[38,1217,1219],{"className":1218,"style":471},[148],[38,1220],{},[38,1222,380],{"className":1223},[537],[38,1225],{"className":1226,"style":711},[110],[38,1228],{"className":1229,"style":541},[110],[38,1231,1233,1236],{"className":1232},[98],[38,1234,345],{"className":1235,"style":496},[98,167],[38,1237,1239],{"className":1238},[136],[38,1240,1242,1262],{"className":1241},[140,436],[38,1243,1245,1259],{"className":1244},[144],[38,1246,1248],{"className":1247,"style":509},[148],[38,1249,1250,1253],{"style":512},[38,1251],{"className":1252,"style":156},[155],[38,1254,1256],{"className":1255},[160,161,162,163],[38,1257,348],{"className":1258},[98,163],[38,1260,464],{"className":1261},[463],[38,1263,1265],{"className":1264},[144],[38,1266,1268],{"className":1267,"style":531},[148],[38,1269],{},[38,1271,351],{"className":1272},[537],[38,1274],{"className":1275,"style":541},[110],[38,1277,1279,1282],{"className":1278},[98],[38,1280,345],{"className":1281,"style":496},[98,167],[38,1283,1285],{"className":1284},[136],[38,1286,1288,1308],{"className":1287},[140,436],[38,1289,1291,1305],{"className":1290},[144],[38,1292,1294],{"className":1293,"style":509},[148],[38,1295,1296,1299],{"style":512},[38,1297],{"className":1298,"style":156},[155],[38,1300,1302],{"className":1301},[160,161,162,163],[38,1303,368],{"className":1304},[98,163],[38,1306,464],{"className":1307},[463],[38,1309,1311],{"className":1310},[144],[38,1312,1314],{"className":1313,"style":531},[148],[38,1315],{},[38,1317],{"className":1318,"style":111},[110],[38,1320,1322,1349,1352],{"className":1321},[115],[38,1323,1325],{"className":1324},[115],[38,1326,1328],{"className":1327},[98,813],[38,1329,1331],{"className":1330},[817],[38,1332,1334,1337,1346],{"className":1333},[82
1],[38,1335],{"className":1336,"style":609},[93],[38,1338,1340],{"className":1339},[828],[38,1341,1343],{"className":1342},[98],[38,1344,835],{"className":1345},[115],[38,1347],{"className":1348},[839],[38,1350],{"className":1351},[110,843],[38,1353,340],{"className":1354},[115],[38,1356],{"className":1357,"style":111},[110],[38,1359,1361,1364],{"className":1360},[89],[38,1362],{"className":1363,"style":856},[93],[38,1365,404],{"className":1366},[98],[10,1368,1369],{},"Seen this way, a meaning can be formed as a weighted combination of several other meanings, and it can also be combined with other meanings, again with weights, to form a new one.",[10,1371,1372],{},"Now let us extend the view from a single word to a sentence. A sentence is just like a word: it too can be regarded as “the expression of a meaning”.",[10,1374,1375,1376,1477],{},"A sentence is made up of an ordered sequence of words ",[38,1377,1379,1410],{"className":1378,"translate":42},[41],[38,1380,1382],{"className":1381},[46],[48,1383,1384],{"xmlns":50},[52,1385,1386,1407],{},[55,1387,1388,1391,1393],{},[58,1389,1390],{},"V",[63,1392,65],{},[67,1394,1395,1397],{},[58,1396,72],{"mathvariant":71},[55,1398,1399,1402,1405],{},[58,1400,1401],{},"n",[63,1403,1404],{},"×",[58,1406,75],{},[77,1408,1409],{"encoding":79},"V \\in \\mathbb{R}^{n \\times 
d}",[38,1411,1413,1433],{"className":1412,"ariaHidden":85},[84],[38,1414,1416,1420,1424,1427,1430],{"className":1415},[89],[38,1417],{"className":1418,"style":1419},[93],"height:0.7224em;vertical-align:-0.0391em;",[38,1421,1390],{"className":1422,"style":1423},[98,167],"margin-right:0.22222em;",[38,1425],{"className":1426,"style":111},[110],[38,1428,65],{"className":1429},[115],[38,1431],{"className":1432,"style":111},[110],[38,1434,1436,1439],{"className":1435},[89],[38,1437],{"className":1438,"style":125},[93],[38,1440,1442,1445],{"className":1441},[98],[38,1443,72],{"className":1444},[98,132],[38,1446,1448],{"className":1447},[136],[38,1449,1451],{"className":1450},[140],[38,1452,1454],{"className":1453},[144],[38,1455,1457],{"className":1456,"style":125},[148],[38,1458,1459,1462],{"style":151},[38,1460],{"className":1461,"style":156},[155],[38,1463,1465],{"className":1464},[160,161,162,163],[38,1466,1468,1471,1474],{"className":1467},[98,163],[38,1469,1401],{"className":1470},[98,167,163],[38,1472,1404],{"className":1473},[599,163],[38,1475,75],{"className":1476},[98,167,163]," 
(one row per word vector). To keep things simple, we ignore their order for now. We have:",[38,1479,1481],{"className":1480,"translate":42},[315],[38,1482,1484,1546],{"className":1483,"translate":42},[41],[38,1485,1487],{"className":1486},[46],[48,1488,1489],{"xmlns":50,"display":324},[52,1490,1491,1543],{},[55,1492,1493,1495,1497,1501,1507,1509,1515,1517,1523,1525,1528,1530,1532,1534,1540],{},[58,1494,1390],{},[63,1496,340],{},[63,1498,1500],{"stretchy":1499},"false","(",[330,1502,1503,1505],{},[58,1504,61],{"mathvariant":60},[203,1506,348],{},[63,1508,380],{"separator":85},[330,1510,1511,1513],{},[58,1512,61],{"mathvariant":60},[203,1514,368],{},[63,1516,380],{"separator":85},[330,1518,1519,1521],{},[58,1520,61],{"mathvariant":60},[203,1522,205],{},[63,1524,380],{"separator":85},[58,1526,1527],{"mathvariant":400},".",[58,1529,1527],{"mathvariant":400},[58,1531,1527],{"mathvariant":400},[63,1533,380],{"separator":85},[330,1535,1536,1538],{},[58,1537,61],{"mathvariant":60},[58,1539,1401],{},[63,1541,1542],{"stretchy":1499},")",[77,1544,1545],{"encoding":79},"V = 
(\\boldsymbol{v}_1,\\boldsymbol{v}_2,\\boldsymbol{v}_3,...,\\boldsymbol{v}_n)",[38,1547,1549,1568],{"className":1548,"ariaHidden":85},[84],[38,1550,1552,1556,1559,1562,1565],{"className":1551},[89],[38,1553],{"className":1554,"style":1555},[93],"height:0.6833em;",[38,1557,1390],{"className":1558,"style":1423},[98,167],[38,1560],{"className":1561,"style":111},[110],[38,1563,340],{"className":1564},[115],[38,1566],{"className":1567,"style":111},[110],[38,1569,1571,1575,1579,1625,1628,1631,1677,1680,1683,1729,1732,1735,1739,1742,1745,1791],{"className":1570},[89],[38,1572],{"className":1573,"style":1574},[93],"height:1em;vertical-align:-0.25em;",[38,1576,1500],{"className":1577},[1578],"mopen",[38,1580,1582,1591],{"className":1581},[98],[38,1583,1585],{"className":1584},[98],[38,1586,1588],{"className":1587},[98],[38,1589,61],{"className":1590,"style":106},[98,105],[38,1592,1594],{"className":1593},[136],[38,1595,1597,1617],{"className":1596},[140,436],[38,1598,1600,1614],{"className":1599},[144],[38,1601,1603],{"className":1602,"style":509},[148],[38,1604,1605,1608],{"style":446},[38,1606],{"className":1607,"style":156},[155],[38,1609,1611],{"className":1610},[160,161,162,163],[38,1612,348],{"className":1613},[98,163],[38,1615,464],{"className":1616},[463],[38,1618,1620],{"className":1619},[144],[38,1621,1623],{"className":1622,"style":531},[148],[38,1624],{},[38,1626,380],{"className":1627},[537],[38,1629],{"className":1630,"style":541},[110],[38,1632,1634,1643],{"className":1633},[98],[38,1635,1637],{"className":1636},[98],[38,1638,1640],{"className":1639},[98],[38,1641,61],{"className":1642,"style":106},[98,105],[38,1644,1646],{"className":1645},[136],[38,1647,1649,1669],{"className":1648},[140,436],[38,1650,1652,1666],{"className":1651},[144],[38,1653,1655],{"className":1654,"style":509},[148],[38,1656,1657,1660],{"style":446},[38,1658],{"className":1659,"style":156},[155],[38,1661,1663],{"className":1662},[160,161,162,163],[38,1664,368],{"className":1665},[98,163
],[38,1667,464],{"className":1668},[463],[38,1670,1672],{"className":1671},[144],[38,1673,1675],{"className":1674,"style":531},[148],[38,1676],{},[38,1678,380],{"className":1679},[537],[38,1681],{"className":1682,"style":541},[110],[38,1684,1686,1695],{"className":1685},[98],[38,1687,1689],{"className":1688},[98],[38,1690,1692],{"className":1691},[98],[38,1693,61],{"className":1694,"style":106},[98,105],[38,1696,1698],{"className":1697},[136],[38,1699,1701,1721],{"className":1700},[140,436],[38,1702,1704,1718],{"className":1703},[144],[38,1705,1707],{"className":1706,"style":509},[148],[38,1708,1709,1712],{"style":446},[38,1710],{"className":1711,"style":156},[155],[38,1713,1715],{"className":1714},[160,161,162,163],[38,1716,205],{"className":1717},[98,163],[38,1719,464],{"className":1720},[463],[38,1722,1724],{"className":1723},[144],[38,1725,1727],{"className":1726,"style":531},[148],[38,1728],{},[38,1730,380],{"className":1731},[537],[38,1733],{"className":1734,"style":541},[110],[38,1736,1738],{"className":1737},[98],"...",[38,1740,380],{"className":1741},[537],[38,1743],{"className":1744,"style":541},[110],[38,1746,1748,1757],{"className":1747},[98],[38,1749,1751],{"className":1750},[98],[38,1752,1754],{"className":1753},[98],[38,1755,61],{"className":1756,"style":106},[98,105],[38,1758,1760],{"className":1759},[136],[38,1761,1763,1783],{"className":1762},[140,436],[38,1764,1766,1780],{"className":1765},[144],[38,1767,1769],{"className":1768,"style":443},[148],[38,1770,1771,1774],{"style":446},[38,1772],{"className":1773,"style":156},[155],[38,1775,1777],{"className":1776},[160,161,162,163],[38,1778,1401],{"className":1779},[98,167,163],[38,1781,464],{"className":1782},[463],[38,1784,1786],{"className":1785},[144],[38,1787,1789],{"className":1788,"style":531},[148],[38,1790],{},[38,1792,1542],{"className":1793},[1794],"mclose",[10,1796,1797,1798,1896,1897,1934],{},"A sentence can also express a meaning. Let the weight matrix (here a column vector) be ",
[38,1799,1801,1830],{"className":1800,"translate":42},[41],[38,1802,1804],{"className":1803},[46],[48,1805,1806],{"xmlns":50},[52,1807,1808,1827],{},[55,1809,1810,1813,1815],{},[58,1811,1812],{},"W",[63,1814,65],{},[67,1816,1817,1819],{},[58,1818,72],{"mathvariant":71},[55,1820,1821,1823,1825],{},[58,1822,1401],{},[63,1824,1404],{},[203,1826,348],{},[77,1828,1829],{"encoding":79},"W \\in \\mathbb{R}^{n \\times 1}",[38,1831,1833,1852],{"className":1832,"ariaHidden":85},[84],[38,1834,1836,1839,1843,1846,1849],{"className":1835},[89],[38,1837],{"className":1838,"style":1419},[93],[38,1840,1812],{"className":1841,"style":1842},[98,167],"margin-right:0.13889em;",[38,1844],{"className":1845,"style":111},[110],[38,1847,65],{"className":1848},[115],[38,1850],{"className":1851,"style":111},[110],[38,1853,1855,1858],{"className":1854},[89],[38,1856],{"className":1857,"style":218},[93],[38,1859,1861,1864],{"className":1860},[98],[38,1862,72],{"className":1863},[98,132],[38,1865,1867],{"className":1866},[136],[38,1868,1870],{"className":1869},[140],[38,1871,1873],{"className":1872},[144],[38,1874,1876],{"className":1875,"style":218},[148],[38,1877,1878,1881],{"style":151},[38,1879],{"className":1880,"style":156},[155],[38,1882,1884],{"className":1883},[160,161,162,163],[38,1885,1887,1890,1893],{"className":1886},[98,163],[38,1888,1401],{"className":1889},[98,167,163],[38,1891,1404],{"className":1892},[599,163],[38,1894,348],{"className":1895},[98,163],"; the meaning of the sentence can then be written as a weighted combination of the individual meaning vectors, namely the vector ",
[38,1898,1900,1915],{"className":1899,"translate":42},[41],[38,1901,1903],{"className":1902},[46],[48,1904,1905],{"xmlns":50},[52,1906,1907,1912],{},[55,1908,1909],{},[58,1910,1911],{"mathvariant":60},"s",[77,1913,1914],{"encoding":79},"\\boldsymbol{s}",[38,1916,1918],{"className":1917,"ariaHidden":85},[84],[38,1919,1921,1925],{"className":1920},[89],[38,1922],{"className":1923,"style":1924},[93],"height:0.4444em;",[38,1926,1928],{"className":1927},[98],[38,1929,1931],{"className":1930},[98],[38,1932,1911],{"className":1933},[98,105],":",[38,1936,1938],{"className":1937,"translate":42},[315],[38,1939,1941,2009],{"className":1940,"translate":42},[41],[38,1942,1944],{"className":1943},[46],[48,1945,1946],{"xmlns":50,"display":324},[52,1947,1948,2006],{},[55,1949,1950,1952,1954,1961,1963,1965,1967,1984,1990,1992,1998,2000],{},[58,1951,1911],{"mathvariant":60},[63,1953,340],{},[67,1955,1956,1958],{},[58,1957,1390],{},[58,1959,1960],{},"T",[63,1962,351],{"separator":85},[58,1964,1812],{},[63,1966,340],{},[1968,1969,1970,1973,1982],"munderover",{},[63,1971,1972],{},"∑",[55,1974,1975,1978,1980],{},[58,1976,1977],{},"i",[63,1979,340],{},[203,1981,348],{},[58,1983,1401],{},[330,1985,1986,1988],{},[58,1987,345],{},[58,1989,1977],{},[63,1991,351],{"separator":85},[330,1993,1994,1996],{},[58,1995,61],{"mathvariant":60},[58,1997,1977],{},[63,1999,65],{},[67,2001,2002,2004],{},[58,2003,72],{"mathvariant":71},[58,2005,75],{},[77,2007,2008],{"encoding":79},"\\boldsymbol{s} = V^T · W = \\sum_{i=1}^{n} w_i · \\boldsymbol{v}_i \\in 
\\mathbb{R}^d",[38,2010,2012,2036,2091,2280],{"className":2011,"ariaHidden":85},[84],[38,2013,2015,2018,2027,2030,2033],{"className":2014},[89],[38,2016],{"className":2017,"style":1924},[93],[38,2019,2021],{"className":2020},[98],[38,2022,2024],{"className":2023},[98],[38,2025,1911],{"className":2026},[98,105],[38,2028],{"className":2029,"style":111},[110],[38,2031,340],{"className":2032},[115],[38,2034],{"className":2035,"style":111},[110],[38,2037,2039,2043,2073,2076,2079,2082,2085,2088],{"className":2038},[89],[38,2040],{"className":2041,"style":2042},[93],"height:0.8913em;",[38,2044,2046,2049],{"className":2045},[98],[38,2047,1390],{"className":2048,"style":1423},[98,167],[38,2050,2052],{"className":2051},[136],[38,2053,2055],{"className":2054},[140],[38,2056,2058],{"className":2057},[144],[38,2059,2061],{"className":2060,"style":2042},[148],[38,2062,2064,2067],{"style":2063},"top:-3.113em;margin-right:0.05em;",[38,2065],{"className":2066,"style":156},[155],[38,2068,2070],{"className":2069},[160,161,162,163],[38,2071,1960],{"className":2072,"style":1842},[98,167,163],[38,2074,351],{"className":2075},[537],[38,2077],{"className":2078,"style":541},[110],[38,2080,1812],{"className":2081,"style":1842},[98,167],[38,2083],{"className":2084,"style":111},[110],[38,2086,340],{"className":2087},[115],[38,2089],{"className":2090,"style":111},[110],[38,2092,2094,2098,2175,2178,2219,2222,2225,2271,2274,2277],{"className":2093},[89],[38,2095],{"className":2096,"style":2097},[93],"height:2.9291em;vertical-align:-1.2777em;",[38,2099,2103],{"className":2100},[2101,2102],"mop","op-limits",[38,2104,2106,2166],{"className":2105},[140,436],[38,2107,2109,2163],{"className":2108},[144],[38,2110,2113,2135,2148],{"className":2111,"style":2112},[148],"height:1.6514em;",[38,2114,2116,2120],{"style":2115},"top:-1.8723em;margin-left:0em;",[38,2117],{"className":2118,"style":2119},[155],"height:3.05em;",[38,2121,2123],{"className":2122},[160,161,162,163],[38,2124,2126,2129,2132],{"className"
:2125},[98,163],[38,2127,1977],{"className":2128},[98,167,163],[38,2130,340],{"className":2131},[115,163],[38,2133,348],{"className":2134},[98,163],[38,2136,2138,2141],{"style":2137},"top:-3.05em;",[38,2139],{"className":2140,"style":2119},[155],[38,2142,2143],{},[38,2144,1972],{"className":2145},[2101,2146,2147],"op-symbol","large-op",[38,2149,2151,2154],{"style":2150},"top:-4.3em;margin-left:0em;",[38,2152],{"className":2153,"style":2119},[155],[38,2155,2157],{"className":2156},[160,161,162,163],[38,2158,2160],{"className":2159},[98,163],[38,2161,1401],{"className":2162},[98,167,163],[38,2164,464],{"className":2165},[463],[38,2167,2169],{"className":2168},[144],[38,2170,2173],{"className":2171,"style":2172},[148],"height:1.2777em;",[38,2174],{},[38,2176],{"className":2177,"style":541},[110],[38,2179,2181,2184],{"className":2180},[98],[38,2182,345],{"className":2183,"style":496},[98,167],[38,2185,2187],{"className":2186},[136],[38,2188,2190,2211],{"className":2189},[140,436],[38,2191,2193,2208],{"className":2192},[144],[38,2194,2197],{"className":2195,"style":2196},[148],"height:0.3117em;",[38,2198,2199,2202],{"style":512},[38,2200],{"className":2201,"style":156},[155],[38,2203,2205],{"className":2204},[160,161,162,163],[38,2206,1977],{"className":2207},[98,167,163],[38,2209,464],{"className":2210},[463],[38,2212,2214],{"className":2213},[144],[38,2215,2217],{"className":2216,"style":531},[148],[38,2218],{},[38,2220,351],{"className":2221},[537],[38,2223],{"className":2224,"style":541},[110],[38,2226,2228,2237],{"className":2227},[98],[38,2229,2231],{"className":2230},[98],[38,2232,2234],{"className":2233},[98],[38,2235,61],{"className":2236,"style":106},[98,105],[38,2238,2240],{"className":2239},[136],[38,2241,2243,2263],{"className":2242},[140,436],[38,2244,2246,2260],{"className":2245},[144],[38,2247,2249],{"className":2248,"style":2196},[148],[38,2250,2251,2254],{"style":446},[38,2252],{"className":2253,"style":156},[155],[38,2255,2257],{"className":2256},[160,
161,162,163],[38,2258,1977],{"className":2259},[98,167,163],[38,2261,464],{"className":2262},[463],[38,2264,2266],{"className":2265},[144],[38,2267,2269],{"className":2268,"style":531},[148],[38,2270],{},[38,2272],{"className":2273,"style":111},[110],[38,2275,65],{"className":2276},[115],[38,2278],{"className":2279,"style":111},[110],[38,2281,2283,2287],{"className":2282},[89],[38,2284],{"className":2285,"style":2286},[93],"height:0.8991em;",[38,2288,2290,2293],{"className":2289},[98],[38,2291,72],{"className":2292},[98,132],[38,2294,2296],{"className":2295},[136],[38,2297,2299],{"className":2298},[140],[38,2300,2302],{"className":2301},[144],[38,2303,2305],{"className":2304,"style":2286},[148],[38,2306,2307,2310],{"style":2063},[38,2308],{"className":2309,"style":156},[155],[38,2311,2313],{"className":2312},[160,161,162,163],[38,2314,75],{"className":2315},[98,167,163],[10,2317,2318],{},"同理，以上过程还可以推广到一个段落、一篇文章、一本文集。",[10,2320,2321,2322,2350],{},"我们也可以知道，如果我们能把一个含义分解的语义颗粒度越细，那么这个词汇能表达的含义也就越细腻精确，即维度 ",[38,2323,2325,2338],{"className":2324,"translate":42},[41],[38,2326,2328],{"className":2327},[46],[48,2329,2330],{"xmlns":50},[52,2331,2332,2336],{},[55,2333,2334],{},[58,2335,75],{},[77,2337,75],{"encoding":79},[38,2339,2341],{"className":2340,"ariaHidden":85},[84],[38,2342,2344,2347],{"className":2343},[89],[38,2345],{"className":2346,"style":290},[93],[38,2348,75],{"className":2349},[98,167]," 越高，词汇的含义表达越强。",[31,2352,2353],{"id":2353},"上下文关联",[10,2355,2356,2357,2391],{},"在上面的演示中，我们假设的是一个词汇对应着一个唯一的向量，表达的是唯一的含义，所以我们忽略了句子 
",[38,2358,2360,2373],{"className":2359,"translate":42},[41],[38,2361,2363],{"className":2362},[46],[48,2364,2365],{"xmlns":50},[52,2366,2367,2371],{},[55,2368,2369],{},[58,2370,1911],{"mathvariant":60},[77,2372,1914],{"encoding":79},[38,2374,2376],{"className":2375,"ariaHidden":85},[84],[38,2377,2379,2382],{"className":2378},[89],[38,2380],{"className":2381,"style":1924},[93],[38,2383,2385],{"className":2384},[98],[38,2386,2388],{"className":2387},[98],[38,2389,1911],{"className":2390},[98,105]," 里的词向量顺序。",[10,2393,2394],{},"但是实际上，一个句子中，同一个词，在不同的位置，也可能具有不同的含义。所以，词汇的含义和其所在的序列顺序也是有关的。举个例子：",[250,2396,2397],{},[10,2398,2399,2400,2403,2404,2406],{},"I went to the ",[170,2401,2402],{},"bank"," to sit by the ",[170,2405,2402],{}," of the river.",[10,2408,2409,2410,2412],{},"第一个 bank 指的是银行（金融机构），第二个 bank 指的是河岸（一处地点）。同一个词 bank 却有两种截然不同的含义，而造就含义不同的原因，就是因为它们所在的句子中词序位置不同。这就是 ",[170,2411,2353],{},"，指的是一个词的含义会随着序列的位置变化而变化。位置变了，含义也就变了。",[10,2414,2415],{},"此时，“一个词汇对应着一个唯一的向量，表达的是唯一的含义”这个假设就完全不适用了。此时就变成了“一个句子中，词的含义不仅和本身有关，还和其位置有关”。",[10,2417,2418],{},"我们必须做点什么，把位置信息也加入到词汇向量里面。",[10,2420,2421,2422,2425,2426,2454],{},"我们再细想一下，",[170,2423,2424],{},"其实「词序位置」本身也是一种含义","，同样也可以像词向量那样，用 ",[38,2427,2429,2442],{"className":2428,"translate":42},[41],[38,2430,2432],{"className":2431},[46],[48,2433,2434],{"xmlns":50},[52,2435,2436,2440],{},[55,2437,2438],{},[58,2439,75],{},[77,2441,75],{"encoding":79},[38,2443,2445],{"className":2444,"ariaHidden":85},[84],[38,2446,2448,2451],{"className":2447},[89],[38,2449],{"className":2450,"style":290},[93],[38,2452,75],{"className":2453},[98,167]," 维向量来表达，也可以与词向量进行加权组合，形成新的含义。",[10,2456,2457],{},"假如没有位置编码，那么一个词就只能有一种含义，就无法处理一个句子中“一词多义”的情况。",[10,2459,2460,2461,2497],{},"我们为词汇位置也定义一种含义，可以用向量 
",[38,2462,2464,2478],{"className":2463,"translate":42},[41],[38,2465,2467],{"className":2466},[46],[48,2468,2469],{"xmlns":50},[52,2470,2471,2475],{},[55,2472,2473],{},[58,2474,10],{"mathvariant":60},[77,2476,2477],{"encoding":79},"\\boldsymbol{p}",[38,2479,2481],{"className":2480,"ariaHidden":85},[84],[38,2482,2484,2488],{"className":2483},[89],[38,2485],{"className":2486,"style":2487},[93],"height:0.6389em;vertical-align:-0.1944em;",[38,2489,2491],{"className":2490},[98],[38,2492,2494],{"className":2493},[98],[38,2495,10],{"className":2496},[98,105]," 表示。所以，一个词的含义加上位置的含义，就组成了一个新的含义：",[38,2499,2501],{"className":2500,"translate":42},[315],[38,2502,2504,2536],{"className":2503,"translate":42},[41],[38,2505,2507],{"className":2506},[46],[48,2508,2509],{"xmlns":50,"display":324},[52,2510,2511,2533],{},[55,2512,2513,2520,2522,2524,2526],{},[330,2514,2515,2517],{},[58,2516,61],{"mathvariant":60},[335,2518,2519],{},"new meaning",[63,2521,340],{},[58,2523,10],{"mathvariant":60},[63,2525,361],{},[330,2527,2528,2530],{},[58,2529,61],{"mathvariant":60},[335,2531,2532],{},"old meaning",[77,2534,2535],{"encoding":79},"\\boldsymbol{v}_\\text{new meaning} = \\boldsymbol{p} + \\boldsymbol{v}_\\text{old 
meaning}",[38,2537,2539,2604,2629],{"className":2538,"ariaHidden":85},[84],[38,2540,2542,2545,2595,2598,2601],{"className":2541},[89],[38,2543],{"className":2544,"style":417},[93],[38,2546,2548,2557],{"className":2547},[98],[38,2549,2551],{"className":2550},[98],[38,2552,2554],{"className":2553},[98],[38,2555,61],{"className":2556,"style":106},[98,105],[38,2558,2560],{"className":2559},[136],[38,2561,2563,2587],{"className":2562},[140,436],[38,2564,2566,2584],{"className":2565},[144],[38,2567,2570],{"className":2568,"style":2569},[148],"height:0.3175em;",[38,2571,2572,2575],{"style":446},[38,2573],{"className":2574,"style":156},[155],[38,2576,2578],{"className":2577},[160,161,162,163],[38,2579,2581],{"className":2580},[98,456,163],[38,2582,2519],{"className":2583},[98,163],[38,2585,464],{"className":2586},[463],[38,2588,2590],{"className":2589},[144],[38,2591,2593],{"className":2592,"style":471},[148],[38,2594],{},[38,2596],{"className":2597,"style":111},[110],[38,2599,340],{"className":2600},[115],[38,2602],{"className":2603,"style":111},[110],[38,2605,2607,2611,2620,2623,2626],{"className":2606},[89],[38,2608],{"className":2609,"style":2610},[93],"height:0.7778em;vertical-align:-0.1944em;",[38,2612,2614],{"className":2613},[98],[38,2615,2617],{"className":2616},[98],[38,2618,10],{"className":2619},[98,105],[38,2621],{"className":2622,"style":595},[110],[38,2624,361],{"className":2625},[599],[38,2627],{"className":2628,"style":595},[110],[38,2630,2632,2635],{"className":2631},[89],[38,2633],{"className":2634,"style":417},[93],[38,2636,2638,2647],{"className":2637},[98],[38,2639,2641],{"className":2640},[98],[38,2642,2644],{"className":2643},[98],[38,2645,61],{"className":2646,"style":106},[98,105],[38,2648,2650],{"className":2649},[136],[38,2651,2653,2676],{"className":2652},[140,436],[38,2654,2656,2673],{"className":2655},[144],[38,2657,2659],{"className":2658,"style":566},[148],[38,2660,2661,2664],{"style":446},[38,2662],{"className":2663,"style":156},[155],[38,2
665,2667],{"className":2666},[160,161,162,163],[38,2668,2670],{"className":2669},[98,456,163],[38,2671,2532],{"className":2672},[98,163],[38,2674,464],{"className":2675},[463],[38,2677,2679],{"className":2678},[144],[38,2680,2682],{"className":2681,"style":471},[148],[38,2683],{},[10,2685,2686,2687,2716,2717,2794,2795,2875],{},"同理，在一个句子中，我们为每个位置 ",[38,2688,2690,2703],{"className":2689,"translate":42},[41],[38,2691,2693],{"className":2692},[46],[48,2694,2695],{"xmlns":50},[52,2696,2697,2701],{},[55,2698,2699],{},[58,2700,1977],{},[77,2702,1977],{"encoding":79},[38,2704,2706],{"className":2705,"ariaHidden":85},[84],[38,2707,2709,2713],{"className":2708},[89],[38,2710],{"className":2711,"style":2712},[93],"height:0.6595em;",[38,2714,1977],{"className":2715},[98,167]," 的词汇 ",[38,2718,2720,2738],{"className":2719,"translate":42},[41],[38,2721,2723],{"className":2722},[46],[48,2724,2725],{"xmlns":50},[52,2726,2727,2735],{},[55,2728,2729],{},[330,2730,2731,2733],{},[58,2732,61],{"mathvariant":60},[58,2734,1977],{},[77,2736,2737],{"encoding":79},"\\boldsymbol{v}_i",[38,2739,2741],{"className":2740,"ariaHidden":85},[84],[38,2742,2744,2748],{"className":2743},[89],[38,2745],{"className":2746,"style":2747},[93],"height:0.5944em;vertical-align:-0.15em;",[38,2749,2751,2760],{"className":2750},[98],[38,2752,2754],{"className":2753},[98],[38,2755,2757],{"className":2756},[98],[38,2758,61],{"className":2759,"style":106},[98,105],[38,2761,2763],{"className":2762},[136],[38,2764,2766,2786],{"className":2765},[140,436],[38,2767,2769,2783],{"className":2768},[144],[38,2770,2772],{"className":2771,"style":2196},[148],[38,2773,2774,2777],{"style":446},[38,2775],{"className":2776,"style":156},[155],[38,2778,2780],{"className":2779},[160,161,162,163],[38,2781,1977],{"className":2782},[98,167,163],[38,2784,464],{"className":2785},[463],[38,2787,2789],{"className":2788},[144],[38,2790,2792],{"className":2791,"style":531},[148],[38,2793],{}," ，其位置上的含义为 
",[38,2796,2798,2816],{"className":2797,"translate":42},[41],[38,2799,2801],{"className":2800},[46],[48,2802,2803],{"xmlns":50},[52,2804,2805,2813],{},[55,2806,2807],{},[330,2808,2809,2811],{},[58,2810,10],{"mathvariant":60},[58,2812,1977],{},[77,2814,2815],{"encoding":79},"\\boldsymbol{p}_i",[38,2817,2819],{"className":2818,"ariaHidden":85},[84],[38,2820,2822,2826],{"className":2821},[89],[38,2823],{"className":2824,"style":2825},[93],"height:0.6886em;vertical-align:-0.2441em;",[38,2827,2829,2838],{"className":2828},[98],[38,2830,2832],{"className":2831},[98],[38,2833,2835],{"className":2834},[98],[38,2836,10],{"className":2837},[98,105],[38,2839,2841],{"className":2840},[136],[38,2842,2844,2866],{"className":2843},[140,436],[38,2845,2847,2863],{"className":2846},[144],[38,2848,2851],{"className":2849,"style":2850},[148],"height:0.2175em;",[38,2852,2854,2857],{"style":2853},"top:-2.4559em;margin-right:0.05em;",[38,2855],{"className":2856,"style":156},[155],[38,2858,2860],{"className":2859},[160,161,162,163],[38,2861,1977],{"className":2862},[98,167,163],[38,2864,464],{"className":2865},[463],[38,2867,2869],{"className":2868},[144],[38,2870,2873],{"className":2871,"style":2872},[148],"height:0.2441em;",[38,2874],{},"，加权组合后，我们就可以为这个句子里每一个词汇都建立起上下文关联。",[10,2877,2878],{},"在原文中，使用的是直接相加的组合。",[38,2880,2882],{"className":2881,"translate":42},[315],[38,2883,2885,2933],{"className":2884,"translate":42},[41],[38,2886,2888],{"className":2887},[46],[48,2889,2890],{"xmlns":50,"display":324},[52,2891,2892,2930],{},[55,2893,2894,2896,2898,2912,2914,2920,2922,2928],{},[58,2895,1911],{"mathvariant":60},[63,2897,340],{},[1968,2899,2900,2902,2910],{},[63,2901,1972],{},[55,2903,2904,2906,2908],{},[58,2905,1977],{},[63,2907,340],{},[203,2909,348],{},[58,2911,1401],{},[63,2913,1500],{"stretchy":1499},[330,2915,2916,2918],{},[58,2917,61],{"mathvariant":60},[58,2919,1977],{},[63,2921,361],{},[330,2923,2924,2926],{},[58,2925,10],{"mathvariant":60},[58,2927,1977],{},[63,2929,1542],{"stretc
hy":1499},[77,2931,2932],{"encoding":79},"\\boldsymbol{s} = \\sum_{i=1}^{n} (\\boldsymbol{v}_i + \\boldsymbol{p}_{i})",[38,2934,2936,2960,3091],{"className":2935,"ariaHidden":85},[84],[38,2937,2939,2942,2951,2954,2957],{"className":2938},[89],[38,2940],{"className":2941,"style":1924},[93],[38,2943,2945],{"className":2944},[98],[38,2946,2948],{"className":2947},[98],[38,2949,1911],{"className":2950},[98,105],[38,2952],{"className":2953,"style":111},[110],[38,2955,340],{"className":2956},[115],[38,2958],{"className":2959,"style":111},[110],[38,2961,2963,2966,3033,3036,3082,3085,3088],{"className":2962},[89],[38,2964],{"className":2965,"style":2097},[93],[38,2967,2969],{"className":2968},[2101,2102],[38,2970,2972,3025],{"className":2971},[140,436],[38,2973,2975,3022],{"className":2974},[144],[38,2976,2978,2998,3008],{"className":2977,"style":2112},[148],[38,2979,2980,2983],{"style":2115},[38,2981],{"className":2982,"style":2119},[155],[38,2984,2986],{"className":2985},[160,161,162,163],[38,2987,2989,2992,2995],{"className":2988},[98,163],[38,2990,1977],{"className":2991},[98,167,163],[38,2993,340],{"className":2994},[115,163],[38,2996,348],{"className":2997},[98,163],[38,2999,3000,3003],{"style":2137},[38,3001],{"className":3002,"style":2119},[155],[38,3004,3005],{},[38,3006,1972],{"className":3007},[2101,2146,2147],[38,3009,3010,3013],{"style":2150},[38,3011],{"className":3012,"style":2119},[155],[38,3014,3016],{"className":3015},[160,161,162,163],[38,3017,3019],{"className":3018},[98,163],[38,3020,1401],{"className":3021},[98,167,163],[38,3023,464],{"className":3024},[463],[38,3026,3028],{"className":3027},[144],[38,3029,3031],{"className":3030,"style":2172},[148],[38,3032],{},[38,3034,1500],{"className":3035},[1578],[38,3037,3039,3048],{"className":3038},[98],[38,3040,3042],{"className":3041},[98],[38,3043,3045],{"className":3044},[98],[38,3046,61],{"className":3047,"style":106},[98,105],[38,3049,3051],{"className":3050},[136],[38,3052,3054,3074],{"className":3053},
[140,436],[38,3055,3057,3071],{"className":3056},[144],[38,3058,3060],{"className":3059,"style":2196},[148],[38,3061,3062,3065],{"style":446},[38,3063],{"className":3064,"style":156},[155],[38,3066,3068],{"className":3067},[160,161,162,163],[38,3069,1977],{"className":3070},[98,167,163],[38,3072,464],{"className":3073},[463],[38,3075,3077],{"className":3076},[144],[38,3078,3080],{"className":3079,"style":531},[148],[38,3081],{},[38,3083],{"className":3084,"style":595},[110],[38,3086,361],{"className":3087},[599],[38,3089],{"className":3090,"style":595},[110],[38,3092,3094,3097,3146],{"className":3093},[89],[38,3095],{"className":3096,"style":1574},[93],[38,3098,3100,3109],{"className":3099},[98],[38,3101,3103],{"className":3102},[98],[38,3104,3106],{"className":3105},[98],[38,3107,10],{"className":3108},[98,105],[38,3110,3112],{"className":3111},[136],[38,3113,3115,3138],{"className":3114},[140,436],[38,3116,3118,3135],{"className":3117},[144],[38,3119,3121],{"className":3120,"style":2850},[148],[38,3122,3123,3126],{"style":2853},[38,3124],{"className":3125,"style":156},[155],[38,3127,3129],{"className":3128},[160,161,162,163],[38,3130,3132],{"className":3131},[98,163],[38,3133,1977],{"className":3134},[98,167,163],[38,3136,464],{"className":3137},[463],[38,3139,3141],{"className":3140},[144],[38,3142,3144],{"className":3143,"style":2872},[148],[38,3145],{},[38,3147,1542],{"className":3148},[1794],[10,3150,3151,3152,3227],{},"在原文中， 
",[38,3153,3155,3172],{"className":3154,"translate":42},[41],[38,3156,3158],{"className":3157},[46],[48,3159,3160],{"xmlns":50},[52,3161,3162,3170],{},[55,3163,3164],{},[330,3165,3166,3168],{},[58,3167,10],{"mathvariant":60},[58,3169,1977],{},[77,3171,2815],{"encoding":79},[38,3173,3175],{"className":3174,"ariaHidden":85},[84],[38,3176,3178,3181],{"className":3177},[89],[38,3179],{"className":3180,"style":2825},[93],[38,3182,3184,3193],{"className":3183},[98],[38,3185,3187],{"className":3186},[98],[38,3188,3190],{"className":3189},[98],[38,3191,10],{"className":3192},[98,105],[38,3194,3196],{"className":3195},[136],[38,3197,3199,3219],{"className":3198},[140,436],[38,3200,3202,3216],{"className":3201},[144],[38,3203,3205],{"className":3204,"style":2850},[148],[38,3206,3207,3210],{"style":2853},[38,3208],{"className":3209,"style":156},[155],[38,3211,3213],{"className":3212},[160,161,162,163],[38,3214,1977],{"className":3215},[98,167,163],[38,3217,464],{"className":3218},[463],[38,3220,3222],{"className":3221},[144],[38,3223,3225],{"className":3224,"style":2872},[148],[38,3226],{}," 用的是三角函数 PE 
编码，定义为",[38,3229,3231],{"className":3230,"translate":42},[315],[38,3232,3234,3358],{"className":3233,"translate":42},[41],[38,3235,3237],{"className":3236},[46],[48,3238,3239],{"xmlns":50,"display":324},[52,3240,3241,3355],{},[55,3242,3243,3249,3251],{},[330,3244,3245,3247],{},[58,3246,10],{"mathvariant":60},[58,3248,1977],{},[63,3250,340],{},[55,3252,3253,3255,3353],{},[63,3254,1500],{"fence":85},[3256,3257,3260,3276,3288,3300,3314,3327,3341],"mtable",{"rowspacing":3258,"columnalign":3259,"columnspacing":383},"0.16em","center",[3261,3262,3263],"mtr",{},[3264,3265,3266],"mtd",{},[3267,3268,3269],"mstyle",{"scriptlevel":404,"displaystyle":1499},[330,3270,3271,3274],{},[58,3272,3273],{},"r",[203,3275,348],{},[3261,3277,3278],{},[3264,3279,3280],{},[3267,3281,3282],{"scriptlevel":404,"displaystyle":1499},[330,3283,3284,3286],{},[58,3285,3273],{},[203,3287,368],{},[3261,3289,3290],{},[3264,3291,3292],{},[3267,3293,3294],{"scriptlevel":404,"displaystyle":1499},[330,3295,3296,3298],{},[58,3297,3273],{},[203,3299,205],{},[3261,3301,3302],{},[3264,3303,3304],{},[3267,3305,3306],{"scriptlevel":404,"displaystyle":1499},[55,3307,3308,3310,3312],{},[58,3309,1527],{"mathvariant":400},[58,3311,1527],{"mathvariant":400},[58,3313,1527],{"mathvariant":400},[3261,3315,3316],{},[3264,3317,3318],{},[3267,3319,3320],{"scriptlevel":404,"displaystyle":1499},[330,3321,3322,3324],{},[58,3323,3273],{},[58,3325,3326],{},"j",[3261,3328,3329],{},[3264,3330,3331],{},[3267,3332,3333],{"scriptlevel":404,"displaystyle":1499},[55,3334,3335,3337,3339],{},[58,3336,1527],{"mathvariant":400},[58,3338,1527],{"mathvariant":400},[58,3340,1527],{"mathvariant":400},[3261,3342,3343],{},[3264,3344,3345],{},[3267,3346,3347],{"scriptlevel":404,"displaystyle":1499},[330,3348,3349,3351],{},[58,3350,3273],{},[58,3352,75],{},[63,3354,1542],{"fence":85},[77,3356,3357],{"encoding":79},"\\boldsymbol{p}_i=\\begin{pmatrix}\nr_1 \\\\\nr_2 \\\\\nr_3 \\\\\n... \\\\\nr_j \\\\\n... 
\\\\\nr_d\n\\end{pmatrix}",[38,3359,3361,3422],{"className":3360,"ariaHidden":85},[84],[38,3362,3364,3367,3413,3416,3419],{"className":3363},[89],[38,3365],{"className":3366,"style":2825},[93],[38,3368,3370,3379],{"className":3369},[98],[38,3371,3373],{"className":3372},[98],[38,3374,3376],{"className":3375},[98],[38,3377,10],{"className":3378},[98,105],[38,3380,3382],{"className":3381},[136],[38,3383,3385,3405],{"className":3384},[140,436],[38,3386,3388,3402],{"className":3387},[144],[38,3389,3391],{"className":3390,"style":2850},[148],[38,3392,3393,3396],{"style":2853},[38,3394],{"className":3395,"style":156},[155],[38,3397,3399],{"className":3398},[160,161,162,163],[38,3400,1977],{"className":3401},[98,167,163],[38,3403,464],{"className":3404},[463],[38,3406,3408],{"className":3407},[144],[38,3409,3411],{"className":3410,"style":2872},[148],[38,3412],{},[38,3414],{"className":3415,"style":111},[110],[38,3417,340],{"className":3418},[115],[38,3420],{"className":3421,"style":111},[110],[38,3423,3425,3429],{"className":3424},[89],[38,3426],{"className":3427,"style":3428},[93],"height:8.4001em;vertical-align:-3.9501em;",[38,3430,3433,3484,3789],{"className":3431},[3432],"minner",[38,3434,3436],{"className":3435},[1578],[38,3437,3441],{"className":3438},[3439,3440],"delimsizing","mult",[38,3442,3444,3475],{"className":3443},[140,436],[38,3445,3447,3472],{"className":3446},[144],[38,3448,3451],{"className":3449,"style":3450},[148],"height:4.4499em;",[38,3452,3454,3458],{"style":3453},"top:-6.4499em;",[38,3455],{"className":3456,"style":3457},[155],"height:10.4em;",[38,3459,3461],{"style":3460},"width:0.875em;height:8.400em;",[3462,3463,3468],"svg",{"xmlns":3464,"width":3465,"height":3466,"viewBox":3467},"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg","0.875em","8.400em","0 0 875 
8400",[3469,3470],"path",{"d":3471},"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 l0,4884c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-4892c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z",[38,3473,464],{"className":3474},[463],[38,3476,3478],{"className":3477},[144],[38,3479,3482],{"className":3480,"style":3481},[148],"height:3.9501em;",[38,3483],{},[38,3485,3487],{"className":3486},[98],[38,3488,3490],{"className":3489},[3256],[38,3491,3494],{"className":3492},[3493],"col-align-c",[38,3495,3497,3780],{"className":3496},[140,436],[38,3498,3500,3777],{"className":3499},[144],[38,3501,3504,3556,3605,3654,3666,3716,3728],{"className":3502,"style":3503},[148],"height:4.45em;",[38,3505,3507,3511],{"style":3506},"top:-6.61em;",[38,3508],{"className":3509,"style":3510},[155],"height:3em;",[38,3512,3514],{"className":3513},[98],[38,3515,3517,3521],{"className":3516},[98],[38,3518,3273],{"className":3519,"style":3520},[98,167],"margin-right:0.02778em;",[38,3522,3524],{"className":3523},[136],[38,3525,3527,3548],{"className":3526},[140,436],[38,3528,3530,3545],{"className":3529},[144],[38,3531,3533],{"className":3532,"style":509},[148],[38,3534,3536,3539],{"style":3535},"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;",[38,3537],{"className":3538,"style":156},[155],[38,3540,3542],{"className":3541},[160,161,162,163],[38,3543,348],{"className":3544},[98,163],[38,3546,464],{"className":3547},[463],[38,3549,3551],{"className":3550},[144],[38,3552,3554],{"className":3553,"style":531},[148],[38,3555],{},[38,3557,3559,3562],{"style":3558},"top:-5.41em;",[38,3560],{"className"
:3561,"style":3510},[155],[38,3563,3565],{"className":3564},[98],[38,3566,3568,3571],{"className":3567},[98],[38,3569,3273],{"className":3570,"style":3520},[98,167],[38,3572,3574],{"className":3573},[136],[38,3575,3577,3597],{"className":3576},[140,436],[38,3578,3580,3594],{"className":3579},[144],[38,3581,3583],{"className":3582,"style":509},[148],[38,3584,3585,3588],{"style":3535},[38,3586],{"className":3587,"style":156},[155],[38,3589,3591],{"className":3590},[160,161,162,163],[38,3592,368],{"className":3593},[98,163],[38,3595,464],{"className":3596},[463],[38,3598,3600],{"className":3599},[144],[38,3601,3603],{"className":3602,"style":531},[148],[38,3604],{},[38,3606,3608,3611],{"style":3607},"top:-4.21em;",[38,3609],{"className":3610,"style":3510},[155],[38,3612,3614],{"className":3613},[98],[38,3615,3617,3620],{"className":3616},[98],[38,3618,3273],{"className":3619,"style":3520},[98,167],[38,3621,3623],{"className":3622},[136],[38,3624,3626,3646],{"className":3625},[140,436],[38,3627,3629,3643],{"className":3628},[144],[38,3630,3632],{"className":3631,"style":509},[148],[38,3633,3634,3637],{"style":3535},[38,3635],{"className":3636,"style":156},[155],[38,3638,3640],{"className":3639},[160,161,162,163],[38,3641,205],{"className":3642},[98,163],[38,3644,464],{"className":3645},[463],[38,3647,3649],{"className":3648},[144],[38,3650,3652],{"className":3651,"style":531},[148],[38,3653],{},[38,3655,3657,3660],{"style":3656},"top:-3.01em;",[38,3658],{"className":3659,"style":3510},[155],[38,3661,3663],{"className":3662},[98],[38,3664,1738],{"className":3665},[98],[38,3667,3669,3672],{"style":3668},"top:-1.81em;",[38,3670],{"className":3671,"style":3510},[155],[38,3673,3675],{"className":3674},[98],[38,3676,3678,3681],{"className":3677},[98],[38,3679,3273],{"className":3680,"style":3520},[98,167],[38,3682,3684],{"className":3683},[136],[38,3685,3687,3708],{"className":3686},[140,436],[38,3688,3690,3705],{"className":3689},[144],[38,3691,3693],{"className":3692,"style
":2196},[148],[38,3694,3695,3698],{"style":3535},[38,3696],{"className":3697,"style":156},[155],[38,3699,3701],{"className":3700},[160,161,162,163],[38,3702,3326],{"className":3703,"style":3704},[98,167,163],"margin-right:0.05724em;",[38,3706,464],{"className":3707},[463],[38,3709,3711],{"className":3710},[144],[38,3712,3714],{"className":3713,"style":471},[148],[38,3715],{},[38,3717,3719,3722],{"style":3718},"top:-0.61em;",[38,3720],{"className":3721,"style":3510},[155],[38,3723,3725],{"className":3724},[98],[38,3726,1738],{"className":3727},[98],[38,3729,3731,3734],{"style":3730},"top:0.59em;",[38,3732],{"className":3733,"style":3510},[155],[38,3735,3737],{"className":3736},[98],[38,3738,3740,3743],{"className":3739},[98],[38,3741,3273],{"className":3742,"style":3520},[98,167],[38,3744,3746],{"className":3745},[136],[38,3747,3749,3769],{"className":3748},[140,436],[38,3750,3752,3766],{"className":3751},[144],[38,3753,3755],{"className":3754,"style":566},[148],[38,3756,3757,3760],{"style":3535},[38,3758],{"className":3759,"style":156},[155],[38,3761,3763],{"className":3762},[160,161,162,163],[38,3764,75],{"className":3765},[98,167,163],[38,3767,464],{"className":3768},[463],[38,3770,3772],{"className":3771},[144],[38,3773,3775],{"className":3774,"style":531},[148],[38,3776],{},[38,3778,464],{"className":3779},[463],[38,3781,3783],{"className":3782},[144],[38,3784,3787],{"className":3785,"style":3786},[148],"height:3.95em;",[38,3788],{},[38,3790,3792],{"className":3791},[1794],[38,3793,3795],{"className":3794},[3439,3440],[38,3796,3798,3819],{"className":3797},[140,436],[38,3799,3801,3816],{"className":3800},[144],[38,3802,3804],{"className":3803,"style":3450},[148],[38,3805,3806,3809],{"style":3453},[38,3807],{"className":3808,"style":3457},[155],[38,3810,3811],{"style":3460},[3462,3812,3813],{"xmlns":3464,"width":3465,"height":3466,"viewBox":3467},[3469,3814],{"d":3815},"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,22
8.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,4809\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-4944c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z",[38,3817,464],{"className":3818},[463],[38,3820,3822],{"className":3821},[144],[38,3823,3825],{"className":3824,"style":3481},[148],[38,3826],{},[10,3828,3829,3830,3901],{},"其中每一个维度的分量 ",[38,3831,3833,3851],{"className":3832,"translate":42},[41],[38,3834,3836],{"className":3835},[46],[48,3837,3838],{"xmlns":50},[52,3839,3840,3848],{},[55,3841,3842],{},[330,3843,3844,3846],{},[58,3845,3273],{},[58,3847,3326],{},[77,3849,3850],{"encoding":79},"r_j",[38,3852,3854],{"className":3853,"ariaHidden":85},[84],[38,3855,3857,3861],{"className":3856},[89],[38,3858],{"className":3859,"style":3860},[93],"height:0.7167em;vertical-align:-0.2861em;",[38,3862,3864,3867],{"className":3863},[98],[38,3865,3273],{"className":3866,"style":3520},[98,167],[38,3868,3870],{"className":3869},[136],[38,3871,3873,3893],{"className":3872},[140,436],[38,3874,3876,3890],{"className":3875},[144],[38,3877,3879],{"className":3878,"style":2196},[148],[38,3880,3881,3884],{"style":3535},[38,3882],{"className":3883,"style":156},[155],[38,3885,3887],{"className":3886},[160,161,162,163],[38,3888,3326],{"className":3889,"style":3704},[98,167,163],[38,3891,464],{"className":3892},[463],[38,3894,3896],{"className":3895},[144],[38,3897,3899],{"className":3898,"style":471},[148],[38,3900],{}," 
为",[38,3903,3905],{"className":3904,"translate":42},[315],[38,3906,3908,4036],{"className":3907,"translate":42},[41],[38,3909,3911],{"className":3910},[46],[48,3912,3913],{"xmlns":50,"display":324},[52,3914,3915,4033],{},[55,3916,3917,3923,3925],{},[330,3918,3919,3921],{},[58,3920,3273],{},[58,3922,3326],{},[63,3924,340],{},[55,3926,3927,3930],{},[63,3928,3929],{"fence":85},"{",[3256,3931,3934,3985],{"rowspacing":3932,"columnalign":3933,"columnspacing":383},"0.36em","left left",[3261,3935,3936],{},[3264,3937,3938],{},[3267,3939,3940],{"scriptlevel":404,"displaystyle":1499},[55,3941,3942,3945,3948,3950,3952,3955,3972,3974,3976,3978,3980,3982],{},[58,3943,3944],{},"sin",[63,3946,3947],{},"⁡",[63,3949,1500],{"stretchy":1499},[58,3951,3273],{},[58,3953,3954],{"mathvariant":400},"\u002F",[67,3956,3957,3960],{},[203,3958,3959],{},"10000",[55,3961,3962,3968,3970],{},[55,3963,3964,3966],{},[203,3965,368],{},[58,3967,3326],{},[58,3969,3954],{"mathvariant":400},[58,3971,75],{},[63,3973,1542],{"stretchy":1499},[63,3975,380],{"separator":85},[110,3977],{"width":383},[58,3979,3326],{},[110,3981],{"width":383},[335,3983,3984],{},"is even",[3261,3986,3987],{},[3264,3988,3989],{},[3267,3990,3991],{"scriptlevel":404,"displaystyle":1499},[55,3992,3993,3996,3998,4000,4002,4004,4020,4022,4024,4026,4028,4030],{},[58,3994,3995],{},"cos",[63,3997,3947],{},[63,3999,1500],{"stretchy":1499},[58,4001,3273],{},[58,4003,3954],{"mathvariant":400},[67,4005,4006,4008],{},[203,4007,3959],{},[55,4009,4010,4016,4018],{},[55,4011,4012,4014],{},[203,4013,368],{},[58,4015,3326],{},[58,4017,3954],{"mathvariant":400},[58,4019,75],{},[63,4021,1542],{"stretchy":1499},[63,4023,380],{"separator":85},[110,4025],{"width":383},[58,4027,3326],{},[110,4029],{"width":383},[335,4031,4032],{},"is odd",[77,4034,4035],{"encoding":79},"{r}_j =\\begin{cases}\n\\sin (r \u002F 10000^{{2j}\u002F{d}}),  \\quad j \\quad \\text{is even}  \\\\\n\\cos ({r}\u002F{10000^{{2j}\u002F{d}}}),  \\quad j \\quad \\text{is 
odd}\n\\end{cases} ",[38,4037,4039,4097],{"className":4038,"ariaHidden":85},[84],[38,4040,4042,4045,4088,4091,4094],{"className":4041},[89],[38,4043],{"className":4044,"style":3860},[93],[38,4046,4048,4054],{"className":4047},[98],[38,4049,4051],{"className":4050},[98],[38,4052,3273],{"className":4053,"style":3520},[98,167],[38,4055,4057],{"className":4056},[136],[38,4058,4060,4080],{"className":4059},[140,436],[38,4061,4063,4077],{"className":4062},[144],[38,4064,4066],{"className":4065,"style":2196},[148],[38,4067,4068,4071],{"style":446},[38,4069],{"className":4070,"style":156},[155],[38,4072,4074],{"className":4073},[160,161,162,163],[38,4075,3326],{"className":4076,"style":3704},[98,167,163],[38,4078,464],{"className":4079},[463],[38,4081,4083],{"className":4082},[144],[38,4084,4086],{"className":4085,"style":471},[148],[38,4087],{},[38,4089],{"className":4090,"style":111},[110],[38,4092,340],{"className":4093},[115],[38,4095],{"className":4096,"style":111},[110],[38,4098,4100,4104],{"className":4099},[89],[38,4101],{"className":4102,"style":4103},[93],"height:3em;vertical-align:-1.25em;",[38,4105,4107,4116,4345],{"className":4106},[3432],[38,4108,4112],{"className":4109,"style":4111},[1578,4110],"delimcenter","top:0em;",[38,4113,3929],{"className":4114},[3439,4115],"size4",[38,4117,4119],{"className":4118},[98],[38,4120,4122],{"className":4121},[3256],[38,4123,4126],{"className":4124},[4125],"col-align-l",[38,4127,4129,4336],{"className":4128},[140,436],[38,4130,4132,4333],{"className":4131},[144],[38,4133,4136,4231],{"className":4134,"style":4135},[148],"height:1.69em;",[38,4137,4139,4143],{"style":4138},"top:-3.69em;",[38,4140],{"className":4141,"style":4142},[155],"height:3.008em;",[38,4144,4146,4149,4152,4155,4159,4207,4210,4213,4216,4219,4222,4225],{"className":4145},[98],[38,4147,3944],{"className":4148},[2101],[38,4150,1500],{"className":4151},[1578],[38,4153,3273],{"className":4154,"style":3520},[98,167],[38,4156,4158],{"className":4157},[98],"\u002F10
00",[38,4160,4162,4165],{"className":4161},[98],[38,4163,404],{"className":4164},[98],[38,4166,4168],{"className":4167},[136],[38,4169,4171],{"className":4170},[140],[38,4172,4174],{"className":4173},[144],[38,4175,4178],{"className":4176,"style":4177},[148],"height:0.888em;",[38,4179,4180,4183],{"style":151},[38,4181],{"className":4182,"style":156},[155],[38,4184,4186],{"className":4185},[160,161,162,163],[38,4187,4189,4198,4201],{"className":4188},[98,163],[38,4190,4192,4195],{"className":4191},[98,163],[38,4193,368],{"className":4194},[98,163],[38,4196,3326],{"className":4197,"style":3704},[98,167,163],[38,4199,3954],{"className":4200},[98,163],[38,4202,4204],{"className":4203},[98,163],[38,4205,75],{"className":4206},[98,167,163],[38,4208,1542],{"className":4209},[1794],[38,4211,380],{"className":4212},[537],[38,4214],{"className":4215,"style":711},[110],[38,4217],{"className":4218,"style":541},[110],[38,4220,3326],{"className":4221,"style":3704},[98,167],[38,4223],{"className":4224,"style":711},[110],[38,4226,4228],{"className":4227},[98,456],[38,4229,3984],{"className":4230},[98],[38,4232,4234,4237],{"style":4233},"top:-2.25em;",[38,4235],{"className":4236,"style":4142},[155],[38,4238,4240,4243,4246,4252,4255,4309,4312,4315,4318,4321,4324,4327],{"className":4239},[98],[38,4241,3995],{"className":4242},[2101],[38,4244,1500],{"className":4245},[1578],[38,4247,4249],{"className":4248},[98],[38,4250,3273],{"className":4251,"style":3520},[98,167],[38,4253,3954],{"className":4254},[98],[38,4256,4258,4262],{"className":4257},[98],[38,4259,4261],{"className":4260},[98],"1000",[38,4263,4265,4268],{"className":4264},[98],[38,4266,404],{"className":4267},[98],[38,4269,4271],{"className":4270},[136],[38,4272,4274],{"className":4273},[140],[38,4275,4277],{"className":4276},[144],[38,4278,4280],{"className":4279,"style":4177},[148],[38,4281,4282,4285],{"style":151},[38,4283],{"className":4284,"style":156},[155],[38,4286,4288],{"className":4287},[160,161,162,163],[38,4289,42
91,4300,4303],{"className":4290},[98,163],[38,4292,4294,4297],{"className":4293},[98,163],[38,4295,368],{"className":4296},[98,163],[38,4298,3326],{"className":4299,"style":3704},[98,167,163],[38,4301,3954],{"className":4302},[98,163],[38,4304,4306],{"className":4305},[98,163],[38,4307,75],{"className":4308},[98,167,163],[38,4310,1542],{"className":4311},[1794],[38,4313,380],{"className":4314},[537],[38,4316],{"className":4317,"style":711},[110],[38,4319],{"className":4320,"style":541},[110],[38,4322,3326],{"className":4323,"style":3704},[98,167],[38,4325],{"className":4326,"style":711},[110],[38,4328,4330],{"className":4329},[98,456],[38,4331,4032],{"className":4332},[98],[38,4334,464],{"className":4335},[463],[38,4337,4339],{"className":4338},[144],[38,4340,4343],{"className":4341,"style":4342},[148],"height:1.19em;",[38,4344],{},[38,4346],{"className":4347},[1794,4348],"nulldelimiter",[31,4350,4351],{"id":4351},"注意力机制",[10,4353,4354],{},"The attention mechanism imitates how a person comes to understand each word while reading a text.",[10,4356,4357,4360],{},[170,4358,4359],{},"In fact, a person reading a text does not simply scan through it once, word by word."," Instead, reading happens in two steps.",[10,4362,4363],{},"Step one: scan word by word and load every word of the sentence into short-term memory. This corresponds to the process described above: building contextual relations by blending each word's meaning with its positional meaning to obtain a new meaning.",[10,4365,4366],{},"Step two: scan again. When interpreting each word, attend to all of the other words at the same time and use them to work out its meaning.",[10,4368,4369],{},"That may sound abstract, but it becomes clear when put another way:",[250,4371,4372],{},[10,4373,4374],{},"The meaning of each word in a sentence can be viewed as a weighted combination of the meanings of all the other words.",[10,4376,4377,4378,4689,4690,4765],{},"Formally, in a sentence 
",[38,4379,4381,4439],{"className":4380,"translate":42},[41],[38,4382,4384],{"className":4383},[46],[48,4385,4386],{"xmlns":50},[52,4387,4388,4436],{},[55,4389,4390,4392,4394,4396,4402,4404,4410,4412,4418,4420,4422,4424,4426,4428,4434],{},[58,4391,1911],{"mathvariant":60},[63,4393,340],{},[63,4395,1500],{"stretchy":1499},[330,4397,4398,4400],{},[58,4399,61],{"mathvariant":60},[203,4401,348],{},[63,4403,380],{"separator":85},[330,4405,4406,4408],{},[58,4407,61],{"mathvariant":60},[203,4409,368],{},[63,4411,380],{"separator":85},[330,4413,4414,4416],{},[58,4415,61],{"mathvariant":60},[203,4417,205],{},[63,4419,380],{"separator":85},[58,4421,1527],{"mathvariant":400},[58,4423,1527],{"mathvariant":400},[58,4425,1527],{"mathvariant":400},[63,4427,380],{"separator":85},[330,4429,4430,4432],{},[58,4431,61],{"mathvariant":60},[58,4433,1401],{},[63,4435,1542],{"stretchy":1499},[77,4437,4438],{"encoding":79},"\\boldsymbol{s} = (\\boldsymbol{v}_1,\\boldsymbol{v}_2,\\boldsymbol{v}_3,...,\\boldsymbol{v}_n)",[38,4440,4442,4466],{"className":4441,"ariaHidden":85},[84],[38,4443,4445,4448,4457,4460,4463],{"className":4444},[89],[38,4446],{"className":4447,"style":1924},[93],[38,4449,4451],{"className":4450},[98],[38,4452,4454],{"className":4453},[98],[38,4455,1911],{"className":4456},[98,105],[38,4458],{"className":4459,"style":111},[110],[38,4461,340],{"className":4462},[115],[38,4464],{"className":4465,"style":111},[110],[38,4467,4469,4472,4475,4521,4524,4527,4573,4576,4579,4625,4628,4631,4634,4637,4640,4686],{"className":4468},[89],[38,4470],{"className":4471,"style":1574},[93],[38,4473,1500],{"className":4474},[1578],[38,4476,4478,4487],{"className":4477},[98],[38,4479,4481],{"className":4480},[98],[38,4482,4484],{"className":4483},[98],[38,4485,61],{"className":4486,"style":106},[98,105],[38,4488,4490],{"className":4489},[136],[38,4491,4493,4513],{"className":4492},[140,436],[38,4494,4496,4510],{"className":4495},[144],[38,4497,4499],{"className":4498,"style":509},[148],[38,450
0,4501,4504],{"style":446},[38,4502],{"className":4503,"style":156},[155],[38,4505,4507],{"className":4506},[160,161,162,163],[38,4508,348],{"className":4509},[98,163],[38,4511,464],{"className":4512},[463],[38,4514,4516],{"className":4515},[144],[38,4517,4519],{"className":4518,"style":531},[148],[38,4520],{},[38,4522,380],{"className":4523},[537],[38,4525],{"className":4526,"style":541},[110],[38,4528,4530,4539],{"className":4529},[98],[38,4531,4533],{"className":4532},[98],[38,4534,4536],{"className":4535},[98],[38,4537,61],{"className":4538,"style":106},[98,105],[38,4540,4542],{"className":4541},[136],[38,4543,4545,4565],{"className":4544},[140,436],[38,4546,4548,4562],{"className":4547},[144],[38,4549,4551],{"className":4550,"style":509},[148],[38,4552,4553,4556],{"style":446},[38,4554],{"className":4555,"style":156},[155],[38,4557,4559],{"className":4558},[160,161,162,163],[38,4560,368],{"className":4561},[98,163],[38,4563,464],{"className":4564},[463],[38,4566,4568],{"className":4567},[144],[38,4569,4571],{"className":4570,"style":531},[148],[38,4572],{},[38,4574,380],{"className":4575},[537],[38,4577],{"className":4578,"style":541},[110],[38,4580,4582,4591],{"className":4581},[98],[38,4583,4585],{"className":4584},[98],[38,4586,4588],{"className":4587},[98],[38,4589,61],{"className":4590,"style":106},[98,105],[38,4592,4594],{"className":4593},[136],[38,4595,4597,4617],{"className":4596},[140,436],[38,4598,4600,4614],{"className":4599},[144],[38,4601,4603],{"className":4602,"style":509},[148],[38,4604,4605,4608],{"style":446},[38,4606],{"className":4607,"style":156},[155],[38,4609,4611],{"className":4610},[160,161,162,163],[38,4612,205],{"className":4613},[98,163],[38,4615,464],{"className":4616},[463],[38,4618,4620],{"className":4619},[144],[38,4621,4623],{"className":4622,"style":531},[148],[38,4624],{},[38,4626,380],{"className":4627},[537],[38,4629],{"className":4630,"style":541},[110],[38,4632,1738],{"className":4633},[98],[38,4635,380],{"className":4636
},[537],[38,4638],{"className":4639,"style":541},[110],[38,4641,4643,4652],{"className":4642},[98],[38,4644,4646],{"className":4645},[98],[38,4647,4649],{"className":4648},[98],[38,4650,61],{"className":4651,"style":106},[98,105],[38,4653,4655],{"className":4654},[136],[38,4656,4658,4678],{"className":4657},[140,436],[38,4659,4661,4675],{"className":4660},[144],[38,4662,4664],{"className":4663,"style":443},[148],[38,4665,4666,4669],{"style":446},[38,4667],{"className":4668,"style":156},[155],[38,4670,4672],{"className":4671},[160,161,162,163],[38,4673,1401],{"className":4674},[98,167,163],[38,4676,464],{"className":4677},[463],[38,4679,4681],{"className":4680},[144],[38,4682,4684],{"className":4683,"style":531},[148],[38,4685],{},[38,4687,1542],{"className":4688},[1794],", each word ",[38,4691,4693,4710],{"className":4692,"translate":42},[41],[38,4694,4696],{"className":4695},[46],[48,4697,4698],{"xmlns":50},[52,4699,4700,4708],{},[55,4701,4702],{},[330,4703,4704,4706],{},[58,4705,61],{"mathvariant":60},[58,4707,1977],{},[77,4709,2737],{"encoding":79},[38,4711,4713],{"className":4712,"ariaHidden":85},[84],[38,4714,4716,4719],{"className":4715},[89],[38,4717],{"className":4718,"style":2747},[93],[38,4720,4722,4731],{"className":4721},[98],[38,4723,4725],{"className":4724},[98],[38,4726,4728],{"className":4727},[98],[38,4729,61],{"className":4730,"style":106},[98,105],[38,4732,4734],{"className":4733},[136],[38,4735,4737,4757],{"className":4736},[140,436],[38,4738,4740,4754],{"className":4739},[144],[38,4741,4743],{"className":4742,"style":2196},[148],[38,4744,4745,4748],{"style":446},[38,4746],{"className":4747,"style":156},[155],[38,4749,4751],{"className":4750},[160,161,162,163],[38,4752,1977],{"className":4753},[98,167,163],[38,4755,464],{"className":4756},[463],[38,4758,4760],{"className":4759},[144],[38,4761,4763],{"className":4762,"style":531},[148],[38,4764],{}," can be written as",[38,4767,4769],{"className":4768,"translate":42},[315],[38,4770,4772,4818],{"className":4771,"tran
slate":42},[41],[38,4773,4775],{"className":4774},[46],[48,4776,4777],{"xmlns":50,"display":324},[52,4778,4779,4815],{},[55,4780,4781,4787,4789,4803,4809],{},[330,4782,4783,4785],{},[58,4784,61],{"mathvariant":60},[58,4786,1977],{},[63,4788,340],{},[1968,4790,4791,4793,4801],{},[63,4792,1972],{},[55,4794,4795,4797,4799],{},[58,4796,3326],{},[63,4798,401],{"mathvariant":400},[58,4800,1977],{},[58,4802,1401],{},[330,4804,4805,4807],{},[58,4806,345],{},[58,4808,3326],{},[330,4810,4811,4813],{},[58,4812,61],{"mathvariant":60},[58,4814,3326],{},[77,4816,4817],{"encoding":79},"\\boldsymbol{v}_i = \\sum_{j \\neq i}^{n} w_j \\boldsymbol{v}_j",[38,4819,4821,4882],{"className":4820,"ariaHidden":85},[84],[38,4822,4824,4827,4873,4876,4879],{"className":4823},[89],[38,4825],{"className":4826,"style":2747},[93],[38,4828,4830,4839],{"className":4829},[98],[38,4831,4833],{"className":4832},[98],[38,4834,4836],{"className":4835},[98],[38,4837,61],{"className":4838,"style":106},[98,105],[38,4840,4842],{"className":4841},[136],[38,4843,4845,4865],{"className":4844},[140,436],[38,4846,4848,4862],{"className":4847},[144],[38,4849,4851],{"className":4850,"style":2196},[148],[38,4852,4853,4856],{"style":446},[38,4854],{"className":4855,"style":156},[155],[38,4857,4859],{"className":4858},[160,161,162,163],[38,4860,1977],{"className":4861},[98,167,163],[38,4863,464],{"className":4864},[463],[38,4866,4868],{"className":4867},[144],[38,4869,4871],{"className":4870,"style":531},[148],[38,4872],{},[38,4874],{"className":4875,"style":111},[110],[38,4877,340],{"className":4878},[115],[38,4880],{"className":4881,"style":111},[110],[38,4883,4885,4889,4991,4994,5034],{"className":4884},[89],[38,4886],{"className":4887,"style":4888},[93],"height:3.0896em;vertical-align:-1.4382em;",[38,4890,4892],{"className":4891},[2101,2102],[38,4893,4895,4982],{"className":4894},[140,436],[38,4896,4898,4979],{"className":4897},[144],[38,4899,4901,4955,4965],{"className":4900,"style":2112},[148],[38,4902,4904,4907]
,{"style":4903},"top:-1.8479em;margin-left:0em;",[38,4905],{"className":4906,"style":2119},[155],[38,4908,4910],{"className":4909},[160,161,162,163],[38,4911,4913,4916,4952],{"className":4912},[98,163],[38,4914,3326],{"className":4915,"style":3704},[98,167,163],[38,4917,4919,4946,4949],{"className":4918},[115,163],[38,4920,4922],{"className":4921},[115,163],[38,4923,4925],{"className":4924},[98,813,163],[38,4926,4928],{"className":4927},[817,163],[38,4929,4931,4934,4943],{"className":4930},[821,163],[38,4932],{"className":4933,"style":609},[93],[38,4935,4937],{"className":4936},[828],[38,4938,4940],{"className":4939},[98,163],[38,4941,835],{"className":4942},[115,163],[38,4944],{"className":4945},[839],[38,4947],{"className":4948},[110,843,163],[38,4950,340],{"className":4951},[115,163],[38,4953,1977],{"className":4954},[98,167,163],[38,4956,4957,4960],{"style":2137},[38,4958],{"className":4959,"style":2119},[155],[38,4961,4962],{},[38,4963,1972],{"className":4964},[2101,2146,2147],[38,4966,4967,4970],{"style":2150},[38,4968],{"className":4969,"style":2119},[155],[38,4971,4973],{"className":4972},[160,161,162,163],[38,4974,4976],{"className":4975},[98,163],[38,4977,1401],{"className":4978},[98,167,163],[38,4980,464],{"className":4981},[463],[38,4983,4985],{"className":4984},[144],[38,4986,4989],{"className":4987,"style":4988},[148],"height:1.4382em;",[38,4990],{},[38,4992],{"className":4993,"style":541},[110],[38,4995,4997,5000],{"className":4996},[98],[38,4998,345],{"className":4999,"style":496},[98,167],[38,5001,5003],{"className":5002},[136],[38,5004,5006,5026],{"className":5005},[140,436],[38,5007,5009,5023],{"className":5008},[144],[38,5010,5012],{"className":5011,"style":2196},[148],[38,5013,5014,5017],{"style":512},[38,5015],{"className":5016,"style":156},[155],[38,5018,5020],{"className":5019},[160,161,162,163],[38,5021,3326],{"className":5022,"style":3704},[98,167,163],[38,5024,464],{"className":5025},[463],[38,5027,5029],{"className":5028},[144],[38,5030,5
032],{"className":5031,"style":471},[148],[38,5033],{},[38,5035,5037,5046],{"className":5036},[98],[38,5038,5040],{"className":5039},[98],[38,5041,5043],{"className":5042},[98],[38,5044,61],{"className":5045,"style":106},[98,105],[38,5047,5049],{"className":5048},[136],[38,5050,5052,5072],{"className":5051},[140,436],[38,5053,5055,5069],{"className":5054},[144],[38,5056,5058],{"className":5057,"style":2196},[148],[38,5059,5060,5063],{"style":446},[38,5061],{"className":5062,"style":156},[155],[38,5064,5066],{"className":5065},[160,161,162,163],[38,5067,3326],{"className":5068,"style":3704},[98,167,163],[38,5070,464],{"className":5071},[463],[38,5073,5075],{"className":5074},[144],[38,5076,5078],{"className":5077,"style":471},[148],[38,5079],{},[10,5081,5082,5083,5153],{},"The question is: how do we determine the weight ",[38,5084,5086,5104],{"className":5085,"translate":42},[41],[38,5087,5089],{"className":5088},[46],[48,5090,5091],{"xmlns":50},[52,5092,5093,5101],{},[55,5094,5095],{},[330,5096,5097,5099],{},[58,5098,345],{},[58,5100,3326],{},[77,5102,5103],{"encoding":79},"w_j",[38,5105,5107],{"className":5106,"ariaHidden":85},[84],[38,5108,5110,5113],{"className":5109},[89],[38,5111],{"className":5112,"style":3860},[93],[38,5114,5116,5119],{"className":5115},[98],[38,5117,345],{"className":5118,"style":496},[98,167],[38,5120,5122],{"className":5121},[136],[38,5123,5125,5145],{"className":5124},[140,436],[38,5126,5128,5142],{"className":5127},[144],[38,5129,5131],{"className":5130,"style":2196},[148],[38,5132,5133,5136],{"style":512},[38,5134],{"className":5135,"style":156},[155],[38,5137,5139],{"className":5138},[160,161,162,163],[38,5140,3326],{"className":5141,"style":3704},[98,167,163],[38,5143,464],{"className":5144},[463],[38,5146,5148],{"className":5147},[144],[38,5149,5151],{"className":5150,"style":471},[148],[38,5152],{},"? In other words, how do we measure the share that each remaining word contributes to the meaning of the target word?",[10,5155,5156],{},"We can propose a hypothesis:",[250,5158,5159],{},[10,5160,5161,5162,5237,5238,5314,5315,5390],{},"The more similar the meaning of a word 
",[38,5163,5165,5182],{"className":5164,"translate":42},[41],[38,5166,5168],{"className":5167},[46],[48,5169,5170],{"xmlns":50},[52,5171,5172,5180],{},[55,5173,5174],{},[330,5175,5176,5178],{},[58,5177,61],{"mathvariant":60},[58,5179,1977],{},[77,5181,2737],{"encoding":79},[38,5183,5185],{"className":5184,"ariaHidden":85},[84],[38,5186,5188,5191],{"className":5187},[89],[38,5189],{"className":5190,"style":2747},[93],[38,5192,5194,5203],{"className":5193},[98],[38,5195,5197],{"className":5196},[98],[38,5198,5200],{"className":5199},[98],[38,5201,61],{"className":5202,"style":106},[98,105],[38,5204,5206],{"className":5205},[136],[38,5207,5209,5229],{"className":5208},[140,436],[38,5210,5212,5226],{"className":5211},[144],[38,5213,5215],{"className":5214,"style":2196},[148],[38,5216,5217,5220],{"style":446},[38,5218],{"className":5219,"style":156},[155],[38,5221,5223],{"className":5222},[160,161,162,163],[38,5224,1977],{"className":5225},[98,167,163],[38,5227,464],{"className":5228},[463],[38,5230,5232],{"className":5231},[144],[38,5233,5235],{"className":5234,"style":531},[148],[38,5236],{}," is to that of another word 
",[38,5239,5241,5259],{"className":5240,"translate":42},[41],[38,5242,5244],{"className":5243},[46],[48,5245,5246],{"xmlns":50},[52,5247,5248,5256],{},[55,5249,5250],{},[330,5251,5252,5254],{},[58,5253,61],{"mathvariant":60},[58,5255,3326],{},[77,5257,5258],{"encoding":79},"\\boldsymbol{v}_j",[38,5260,5262],{"className":5261,"ariaHidden":85},[84],[38,5263,5265,5268],{"className":5264},[89],[38,5266],{"className":5267,"style":417},[93],[38,5269,5271,5280],{"className":5270},[98],[38,5272,5274],{"className":5273},[98],[38,5275,5277],{"className":5276},[98],[38,5278,61],{"className":5279,"style":106},[98,105],[38,5281,5283],{"className":5282},[136],[38,5284,5286,5306],{"className":5285},[140,436],[38,5287,5289,5303],{"className":5288},[144],[38,5290,5292],{"className":5291,"style":2196},[148],[38,5293,5294,5297],{"style":446},[38,5295],{"className":5296,"style":156},[155],[38,5298,5300],{"className":5299},[160,161,162,163],[38,5301,3326],{"className":5302,"style":3704},[98,167,163],[38,5304,464],{"className":5305},[463],[38,5307,5309],{"className":5308},[144],[38,5310,5312],{"className":5311,"style":471},[148],[38,5313],{},", the more the latter contributes to the meaning of 
",[38,5316,5318,5335],{"className":5317,"translate":42},[41],[38,5319,5321],{"className":5320},[46],[48,5322,5323],{"xmlns":50},[52,5324,5325,5333],{},[55,5326,5327],{},[330,5328,5329,5331],{},[58,5330,61],{"mathvariant":60},[58,5332,1977],{},[77,5334,2737],{"encoding":79},[38,5336,5338],{"className":5337,"ariaHidden":85},[84],[38,5339,5341,5344],{"className":5340},[89],[38,5342],{"className":5343,"style":2747},[93],[38,5345,5347,5356],{"className":5346},[98],[38,5348,5350],{"className":5349},[98],[38,5351,5353],{"className":5352},[98],[38,5354,61],{"className":5355,"style":106},[98,105],[38,5357,5359],{"className":5358},[136],[38,5360,5362,5382],{"className":5361},[140,436],[38,5363,5365,5379],{"className":5364},[144],[38,5366,5368],{"className":5367,"style":2196},[148],[38,5369,5370,5373],{"style":446},[38,5371],{"className":5372,"style":156},[155],[38,5374,5376],{"className":5375},[160,161,162,163],[38,5377,1977],{"className":5378},[98,167,163],[38,5380,464],{"className":5381},[463],[38,5383,5385],{"className":5384},[144],[38,5386,5388],{"className":5387,"style":531},[148],[38,5389],{},". The weight can therefore be written as the inner-product similarity of the two word vectors.",[38,5392,5394],{"className":5393,"translate":42},[315],[38,5395,5397,5431],{"className":5396,"translate":42},[41],[38,5398,5400],{"className":5399},[46],[48,5401,5402],{"xmlns":50,"display":324},[52,5403,5404,5428],{},[55,5405,5406,5412,5414,5420,5422],{},[330,5407,5408,5410],{},[58,5409,345],{},[58,5411,3326],{},[63,5413,340],{},[330,5415,5416,5418],{},[58,5417,61],{"mathvariant":60},[58,5419,1977],{},[63,5421,351],{"separator":85},[330,5423,5424,5426],{},[58,5425,61],{"mathvariant":60},[58,5427,3326],{},[77,5429,5430],{"encoding":79},"w_j = \\boldsymbol{v}_i · 
\\boldsymbol{v}_j",[38,5432,5434,5489],{"className":5433,"ariaHidden":85},[84],[38,5435,5437,5440,5480,5483,5486],{"className":5436},[89],[38,5438],{"className":5439,"style":3860},[93],[38,5441,5443,5446],{"className":5442},[98],[38,5444,345],{"className":5445,"style":496},[98,167],[38,5447,5449],{"className":5448},[136],[38,5450,5452,5472],{"className":5451},[140,436],[38,5453,5455,5469],{"className":5454},[144],[38,5456,5458],{"className":5457,"style":2196},[148],[38,5459,5460,5463],{"style":512},[38,5461],{"className":5462,"style":156},[155],[38,5464,5466],{"className":5465},[160,161,162,163],[38,5467,3326],{"className":5468,"style":3704},[98,167,163],[38,5470,464],{"className":5471},[463],[38,5473,5475],{"className":5474},[144],[38,5476,5478],{"className":5477,"style":471},[148],[38,5479],{},[38,5481],{"className":5482,"style":111},[110],[38,5484,340],{"className":5485},[115],[38,5487],{"className":5488,"style":111},[110],[38,5490,5492,5496,5542,5545,5548],{"className":5491},[89],[38,5493],{"className":5494,"style":5495},[93],"height:0.7306em;vertical-align:-0.2861em;",[38,5497,5499,5508],{"className":5498},[98],[38,5500,5502],{"className":5501},[98],[38,5503,5505],{"className":5504},[98],[38,5506,61],{"className":5507,"style":106},[98,105],[38,5509,5511],{"className":5510},[136],[38,5512,5514,5534],{"className":5513},[140,436],[38,5515,5517,5531],{"className":5516},[144],[38,5518,5520],{"className":5519,"style":2196},[148],[38,5521,5522,5525],{"style":446},[38,5523],{"className":5524,"style":156},[155],[38,5526,5528],{"className":5527},[160,161,162,163],[38,5529,1977],{"className":5530},[98,167,163],[38,5532,464],{"className":5533},[463],[38,5535,5537],{"className":5536},[144],[38,5538,5540],{"className":5539,"style":531},[148],[38,5541],{},[38,5543,351],{"className":5544},[537],[38,5546],{"className":5547,"style":541},[110],[38,5549,5551,5560],{"className":5550},[98],[38,5552,5554],{"className":5553},[98],[38,5555,5557],{"className":5556},[98],[38,5558,61],{"c
lassName":5559,"style":106},[98,105],[38,5561,5563],{"className":5562},[136],[38,5564,5566,5586],{"className":5565},[140,436],[38,5567,5569,5583],{"className":5568},[144],[38,5570,5572],{"className":5571,"style":2196},[148],[38,5573,5574,5577],{"style":446},[38,5575],{"className":5576,"style":156},[155],[38,5578,5580],{"className":5579},[160,161,162,163],[38,5581,3326],{"className":5582,"style":3704},[98,167,163],[38,5584,464],{"className":5585},[463],[38,5587,5589],{"className":5588},[144],[38,5590,5592],{"className":5591,"style":471},[148],[38,5593],{},[10,5595,5596],{},"As an example, let us analyze the sentence “Cat sat seat”.",[10,5598,5599,5600,5781],{},"Start with the word “Cat” and compute its similarity to each of the remaining words. Suppose its similarities to “sat” and “seat” are ",[38,5601,5603,5641],{"className":5602,"translate":42},[41],[38,5604,5606],{"className":5605},[46],[48,5607,5608],{"xmlns":50},[52,5609,5610,5638],{},[55,5611,5612,5619,5621,5624,5626,5633,5635],{},[330,5613,5614,5616],{},[58,5615,345],{},[335,5617,5618],{},"sat",[63,5620,340],{},[203,5622,5623],{},"0.1",[63,5625,380],{"separator":85},[330,5627,5628,5630],{},[58,5629,345],{},[335,5631,5632],{},"seat",[63,5634,340],{},[203,5636,5637],{},"0.05",[77,5639,5640],{"encoding":79},"w_\\text{sat}=0.1,w_\\text{seat}=0.05",[38,5642,5644,5704,5772],{"className":5643,"ariaHidden":85},[84],[38,5645,5647,5651,5695,5698,5701],{"className":5646},[89],[38,5648],{"className":5649,"style":5650},[93],"height:0.5806em;vertical-align:-0.15em;",[38,5652,5654,5657],{"className":5653},[98],[38,5655,345],{"className":5656,"style":496},[98,167],[38,5658,5660],{"className":5659},[136],[38,5661,5663,5687],{"className":5662},[140,436],[38,5664,5666,5684],{"className":5665},[144],[38,5667,5670],{"className":5668,"style":5669},[148],"height:0.2806em;",[38,5671,5672,5675],{"style":512},[38,5673],{"className":5674,"style":156},[155],[38,5676,5678],{"className":5677},[160,161,162,163],[38,5679,5681],{"className":5680},[98,456,163],[38,5682,5618],{"className":5683},[98,163],[38,5685,464],{"className":5686},[463]
,[38,5688,5690],{"className":5689},[144],[38,5691,5693],{"className":5692,"style":531},[148],[38,5694],{},[38,5696],{"className":5697,"style":111},[110],[38,5699,340],{"className":5700},[115],[38,5702],{"className":5703,"style":111},[110],[38,5705,5707,5711,5714,5717,5720,5763,5766,5769],{"className":5706},[89],[38,5708],{"className":5709,"style":5710},[93],"height:0.8389em;vertical-align:-0.1944em;",[38,5712,5623],{"className":5713},[98],[38,5715,380],{"className":5716},[537],[38,5718],{"className":5719,"style":541},[110],[38,5721,5723,5726],{"className":5722},[98],[38,5724,345],{"className":5725,"style":496},[98,167],[38,5727,5729],{"className":5728},[136],[38,5730,5732,5755],{"className":5731},[140,436],[38,5733,5735,5752],{"className":5734},[144],[38,5736,5738],{"className":5737,"style":5669},[148],[38,5739,5740,5743],{"style":512},[38,5741],{"className":5742,"style":156},[155],[38,5744,5746],{"className":5745},[160,161,162,163],[38,5747,5749],{"className":5748},[98,456,163],[38,5750,5632],{"className":5751},[98,163],[38,5753,464],{"className":5754},[463],[38,5756,5758],{"className":5757},[144],[38,5759,5761],{"className":5760,"style":531},[148],[38,5762],{},[38,5764],{"className":5765,"style":111},[110],[38,5767,340],{"className":5768},[115],[38,5770],{"className":5771,"style":111},[110],[38,5773,5775,5778],{"className":5774},[89],[38,5776],{"className":5777,"style":856},[93],[38,5779,5637],{"className":5780},[98],". Then we can write",[38,5783,5785],{"className":5784,"translate":42},[315],[38,5786,5788,5827],{"className":5787,"translate":42},[41],[38,5789,5791],{"className":5790},[46],[48,5792,5793],{"xmlns":50,"display":324},[52,5794,5795,5824],{},[55,5796,5797,5804,5806,5808,5814,5816,5818],{},[330,5798,5799,5801],{},[58,5800,61],{"mathvariant":60},[335,5802,5803],{},"cat",[63,5805,340],{},[203,5807,5623],{},[330,5809,5810,5812],{},[58,5811,61],{"mathvariant":60},[335,5813,5618],{},[63,5815,361],{},[203,5817,5637],{},[330,5819,5820,5822],{},[58,5821,61],{"mathvarian
t":60},[335,5823,5632],{},[77,5825,5826],{"encoding":79},"\\boldsymbol{v}_\\text{cat} = 0.1 \\boldsymbol{v}_\\text{sat} + 0.05 \\boldsymbol{v}_\\text{seat}",[38,5828,5830,5894,5962],{"className":5829,"ariaHidden":85},[84],[38,5831,5833,5836,5885,5888,5891],{"className":5832},[89],[38,5834],{"className":5835,"style":2747},[93],[38,5837,5839,5848],{"className":5838},[98],[38,5840,5842],{"className":5841},[98],[38,5843,5845],{"className":5844},[98],[38,5846,61],{"className":5847,"style":106},[98,105],[38,5849,5851],{"className":5850},[136],[38,5852,5854,5877],{"className":5853},[140,436],[38,5855,5857,5874],{"className":5856},[144],[38,5858,5860],{"className":5859,"style":5669},[148],[38,5861,5862,5865],{"style":446},[38,5863],{"className":5864,"style":156},[155],[38,5866,5868],{"className":5867},[160,161,162,163],[38,5869,5871],{"className":5870},[98,456,163],[38,5872,5803],{"className":5873},[98,163],[38,5875,464],{"className":5876},[463],[38,5878,5880],{"className":5879},[144],[38,5881,5883],{"className":5882,"style":531},[148],[38,5884],{},[38,5886],{"className":5887,"style":111},[110],[38,5889,340],{"className":5890},[115],[38,5892],{"className":5893,"style":111},[110],[38,5895,5897,5901,5904,5953,5956,5959],{"className":5896},[89],[38,5898],{"className":5899,"style":5900},[93],"height:0.7944em;vertical-align:-0.15em;",[38,5902,5623],{"className":5903},[98],[38,5905,5907,5916],{"className":5906},[98],[38,5908,5910],{"className":5909},[98],[38,5911,5913],{"className":5912},[98],[38,5914,61],{"className":5915,"style":106},[98,105],[38,5917,5919],{"className":5918},[136],[38,5920,5922,5945],{"className":5921},[140,436],[38,5923,5925,5942],{"className":5924},[144],[38,5926,5928],{"className":5927,"style":5669},[148],[38,5929,5930,5933],{"style":446},[38,5931],{"className":5932,"style":156},[155],[38,5934,5936],{"className":5935},[160,161,162,163],[38,5937,5939],{"className":5938},[98,456,163],[38,5940,5618],{"className":5941},[98,163],[38,5943,464],{"className":5944},
[463],[38,5946,5948],{"className":5947},[144],[38,5949,5951],{"className":5950,"style":531},[148],[38,5952],{},[38,5954],{"className":5955,"style":595},[110],[38,5957,361],{"className":5958},[599],[38,5960],{"className":5961,"style":595},[110],[38,5963,5965,5968,5971],{"className":5964},[89],[38,5966],{"className":5967,"style":5900},[93],[38,5969,5637],{"className":5970},[98],[38,5972,5974,5983],{"className":5973},[98],[38,5975,5977],{"className":5976},[98],[38,5978,5980],{"className":5979},[98],[38,5981,61],{"className":5982,"style":106},[98,105],[38,5984,5986],{"className":5985},[136],[38,5987,5989,6012],{"className":5988},[140,436],[38,5990,5992,6009],{"className":5991},[144],[38,5993,5995],{"className":5994,"style":5669},[148],[38,5996,5997,6000],{"style":446},[38,5998],{"className":5999,"style":156},[155],[38,6001,6003],{"className":6002},[160,161,162,163],[38,6004,6006],{"className":6005},[98,456,163],[38,6007,5632],{"className":6008},[98,163],[38,6010,464],{"className":6011},[463],[38,6013,6015],{"className":6014},[144],[38,6016,6018],{"className":6017,"style":531},[148],[38,6019],{},[10,6021,6022],{},"That is, the meaning of “Cat” is made up of 10% of “sat” and 5% of “seat”. The same applies to the other words, “sat” and “seat”.",[10,6024,6025,6026],{},"The procedure above computes, for each word, ",[170,6027,6028],{},"“in what proportions the remaining words make up its meaning”; this is the prototype of the attention mechanism.",[10,6030,6031,6032,6061,6062,6097],{},"However, the weights ",[38,6033,6035,6048],{"className":6034,"translate":42},[41],[38,6036,6038],{"className":6037},[46],[48,6039,6040],{"xmlns":50},[52,6041,6042,6046],{},[55,6043,6044],{},[58,6045,345],{},[77,6047,345],{"encoding":79},[38,6049,6051],{"className":6050,"ariaHidden":85},[84],[38,6052,6054,6058],{"className":6053},[89],[38,6055],{"className":6056,"style":6057},[93],"height:0.4306em;",[38,6059,345],{"className":6060,"style":496},[98,167]," above do not sum to 
",[38,6063,6065,6083],{"className":6064,"translate":42},[41],[38,6066,6068],{"className":6067},[46],[48,6069,6070],{"xmlns":50},[52,6071,6072,6080],{},[55,6073,6074,6077],{},[203,6075,6076],{},"100",[58,6078,6079],{"mathvariant":400},"%",[77,6081,6082],{"encoding":79},"100\\%",[38,6084,6086],{"className":6085,"ariaHidden":85},[84],[38,6087,6089,6093],{"className":6088},[89],[38,6090],{"className":6091,"style":6092},[93],"height:0.8056em;vertical-align:-0.0556em;",[38,6094,6096],{"className":6095},[98],"100%",", so to keep the values numerically consistent we normalize the weights with softmax:",[38,6099,6101],{"className":6100,"translate":42},[315],[38,6102,6104,6143],{"className":6103,"translate":42},[41],[38,6105,6107],{"className":6106},[46],[48,6108,6109],{"xmlns":50,"display":324},[52,6110,6111,6140],{},[55,6112,6113,6125,6127,6130,6132,6138],{},[330,6114,6115,6123],{},[6116,6117,6118,6120],"mover",{"accent":85},[58,6119,345],{},[63,6121,6122],{},"~",[58,6124,3326],{},[63,6126,340],{},[335,6128,6129],{},"softmax",[63,6131,1500],{"stretchy":1499},[330,6133,6134,6136],{},[58,6135,345],{},[58,6137,3326],{},[63,6139,1542],{"stretchy":1499},[77,6141,6142],{"encoding":79},"\\tilde{w}_j = 
\\text{softmax}(w_j)",[38,6144,6146,6236],{"className":6145,"ariaHidden":85},[84],[38,6147,6149,6153,6227,6230,6233],{"className":6148},[89],[38,6150],{"className":6151,"style":6152},[93],"height:0.954em;vertical-align:-0.2861em;",[38,6154,6156,6193],{"className":6155},[98],[38,6157,6160],{"className":6158},[98,6159],"accent",[38,6161,6163],{"className":6162},[140],[38,6164,6166],{"className":6165},[144],[38,6167,6170,6179],{"className":6168,"style":6169},[148],"height:0.6679em;",[38,6171,6173,6176],{"style":6172},"top:-3em;",[38,6174],{"className":6175,"style":3510},[155],[38,6177,345],{"className":6178,"style":496},[98,167],[38,6180,6182,6185],{"style":6181},"top:-3.35em;",[38,6183],{"className":6184,"style":3510},[155],[38,6186,6190],{"className":6187,"style":6189},[6188],"accent-body","left:-0.1667em;",[38,6191,6122],{"className":6192},[98],[38,6194,6196],{"className":6195},[136],[38,6197,6199,6219],{"className":6198},[140,436],[38,6200,6202,6216],{"className":6201},[144],[38,6203,6205],{"className":6204,"style":2196},[148],[38,6206,6207,6210],{"style":512},[38,6208],{"className":6209,"style":156},[155],[38,6211,6213],{"className":6212},[160,161,162,163],[38,6214,3326],{"className":6215,"style":3704},[98,167,163],[38,6217,464],{"className":6218},[463],[38,6220,6222],{"className":6221},[144],[38,6223,6225],{"className":6224,"style":471},[148],[38,6226],{},[38,6228],{"className":6229,"style":111},[110],[38,6231,340],{"className":6232},[115],[38,6234],{"className":6235,"style":111},[110],[38,6237,6239,6243,6249,6252,6292],{"className":6238},[89],[38,6240],{"className":6241,"style":6242},[93],"height:1.0361em;vertical-align:-0.2861em;",[38,6244,6246],{"className":6245},[98,456],[38,6247,6129],{"className":6248},[98],[38,6250,1500],{"className":6251},[1578],[38,6253,6255,6258],{"className":6254},[98],[38,6256,345],{"className":6257,"style":496},[98,167],[38,6259,6261],{"className":6260},[136],[38,6262,6264,6284],{"className":6263},[140,436],[38,6265,6267,6281],{"clas
sName":6266},[144],[38,6268,6270],{"className":6269,"style":2196},[148],[38,6271,6272,6275],{"style":512},[38,6273],{"className":6274,"style":156},[155],[38,6276,6278],{"className":6277},[160,161,162,163],[38,6279,3326],{"className":6280,"style":3704},[98,167,163],[38,6282,464],{"className":6283},[463],[38,6285,6287],{"className":6286},[144],[38,6288,6290],{"className":6289,"style":471},[148],[38,6291],{},[38,6293,1542],{"className":6294},[1794],[10,6296,6297,6298,6526,6527],{},"这样使得 ",[38,6299,6301,6341],{"className":6300,"translate":42},[41],[38,6302,6304],{"className":6303},[46],[48,6305,6306],{"xmlns":50},[52,6307,6308,6338],{},[55,6309,6310,6320,6322,6332,6334,6336],{},[330,6311,6312,6318],{},[6116,6313,6314,6316],{"accent":85},[58,6315,345],{},[63,6317,6122],{},[335,6319,5618],{},[63,6321,361],{},[330,6323,6324,6330],{},[6116,6325,6326,6328],{"accent":85},[58,6327,345],{},[63,6329,6122],{},[335,6331,5632],{},[63,6333,340],{},[203,6335,6076],{},[58,6337,6079],{"mathvariant":400},[77,6339,6340],{"encoding":79},"\\tilde{w}_\\text{sat} + \\tilde{w}_\\text{seat} = 
100\\%",[38,6342,6344,6431,6517],{"className":6343,"ariaHidden":85},[84],[38,6345,6347,6351,6422,6425,6428],{"className":6346},[89],[38,6348],{"className":6349,"style":6350},[93],"height:0.8179em;vertical-align:-0.15em;",[38,6352,6354,6385],{"className":6353},[98],[38,6355,6357],{"className":6356},[98,6159],[38,6358,6360],{"className":6359},[140],[38,6361,6363],{"className":6362},[144],[38,6364,6366,6374],{"className":6365,"style":6169},[148],[38,6367,6368,6371],{"style":6172},[38,6369],{"className":6370,"style":3510},[155],[38,6372,345],{"className":6373,"style":496},[98,167],[38,6375,6376,6379],{"style":6181},[38,6377],{"className":6378,"style":3510},[155],[38,6380,6382],{"className":6381,"style":6189},[6188],[38,6383,6122],{"className":6384},[98],[38,6386,6388],{"className":6387},[136],[38,6389,6391,6414],{"className":6390},[140,436],[38,6392,6394,6411],{"className":6393},[144],[38,6395,6397],{"className":6396,"style":5669},[148],[38,6398,6399,6402],{"style":512},[38,6400],{"className":6401,"style":156},[155],[38,6403,6405],{"className":6404},[160,161,162,163],[38,6406,6408],{"className":6407},[98,456,163],[38,6409,5618],{"className":6410},[98,163],[38,6412,464],{"className":6413},[463],[38,6415,6417],{"className":6416},[144],[38,6418,6420],{"className":6419,"style":531},[148],[38,6421],{},[38,6423],{"className":6424,"style":595},[110],[38,6426,361],{"className":6427},[599],[38,6429],{"className":6430,"style":595},[110],[38,6432,6434,6437,6508,6511,6514],{"className":6433},[89],[38,6435],{"className":6436,"style":6350},[93],[38,6438,6440,6471],{"className":6439},[98],[38,6441,6443],{"className":6442},[98,6159],[38,6444,6446],{"className":6445},[140],[38,6447,6449],{"className":6448},[144],[38,6450,6452,6460],{"className":6451,"style":6169},[148],[38,6453,6454,6457],{"style":6172},[38,6455],{"className":6456,"style":3510},[155],[38,6458,345],{"className":6459,"style":496},[98,167],[38,6461,6462,6465],{"style":6181},[38,6463],{"className":6464,"style":3510},[155],[
38,6466,6468],{"className":6467,"style":6189},[6188],[38,6469,6122],{"className":6470},[98],[38,6472,6474],{"className":6473},[136],[38,6475,6477,6500],{"className":6476},[140,436],[38,6478,6480,6497],{"className":6479},[144],[38,6481,6483],{"className":6482,"style":5669},[148],[38,6484,6485,6488],{"style":512},[38,6486],{"className":6487,"style":156},[155],[38,6489,6491],{"className":6490},[160,161,162,163],[38,6492,6494],{"className":6493},[98,456,163],[38,6495,5632],{"className":6496},[98,163],[38,6498,464],{"className":6499},[463],[38,6501,6503],{"className":6502},[144],[38,6504,6506],{"className":6505,"style":531},[148],[38,6507],{},[38,6509],{"className":6510,"style":111},[110],[38,6512,340],{"className":6513},[115],[38,6515],{"className":6516,"style":111},[110],[38,6518,6520,6523],{"className":6519},[89],[38,6521],{"className":6522,"style":6092},[93],[38,6524,6096],{"className":6525},[98],"，计算得到 ",[38,6528,6530,6580],{"className":6529,"translate":42},[41],[38,6531,6533],{"className":6532},[46],[48,6534,6535],{"xmlns":50},[52,6536,6537,6577],{},[55,6538,6539,6549,6551,6554,6556,6558,6560,6570,6572,6575],{},[330,6540,6541,6547],{},[6116,6542,6543,6545],{"accent":85},[58,6544,345],{},[63,6546,6122],{},[335,6548,5618],{},[63,6550,340],{},[203,6552,6553],{},"51.25",[58,6555,6079],{"mathvariant":400},[63,6557,380],{"separator":85},[110,6559],{"width":383},[330,6561,6562,6568],{},[6116,6563,6564,6566],{"accent":85},[58,6565,345],{},[63,6567,6122],{},[335,6569,5632],{},[63,6571,340],{},[203,6573,6574],{},"48.75",[58,6576,6079],{"mathvariant":400},[77,6578,6579],{"encoding":79},"\\tilde{w}_\\text{sat} = 51.25\\%, \\quad \\tilde{w}_\\text{seat} = 
48.75\\%",[38,6581,6583,6669,6769],{"className":6582,"ariaHidden":85},[84],[38,6584,6586,6589,6660,6663,6666],{"className":6585},[89],[38,6587],{"className":6588,"style":6350},[93],[38,6590,6592,6623],{"className":6591},[98],[38,6593,6595],{"className":6594},[98,6159],[38,6596,6598],{"className":6597},[140],[38,6599,6601],{"className":6600},[144],[38,6602,6604,6612],{"className":6603,"style":6169},[148],[38,6605,6606,6609],{"style":6172},[38,6607],{"className":6608,"style":3510},[155],[38,6610,345],{"className":6611,"style":496},[98,167],[38,6613,6614,6617],{"style":6181},[38,6615],{"className":6616,"style":3510},[155],[38,6618,6620],{"className":6619,"style":6189},[6188],[38,6621,6122],{"className":6622},[98],[38,6624,6626],{"className":6625},[136],[38,6627,6629,6652],{"className":6628},[140,436],[38,6630,6632,6649],{"className":6631},[144],[38,6633,6635],{"className":6634,"style":5669},[148],[38,6636,6637,6640],{"style":512},[38,6638],{"className":6639,"style":156},[155],[38,6641,6643],{"className":6642},[160,161,162,163],[38,6644,6646],{"className":6645},[98,456,163],[38,6647,5618],{"className":6648},[98,163],[38,6650,464],{"className":6651},[463],[38,6653,6655],{"className":6654},[144],[38,6656,6658],{"className":6657,"style":531},[148],[38,6659],{},[38,6661],{"className":6662,"style":111},[110],[38,6664,340],{"className":6665},[115],[38,6667],{"className":6668,"style":111},[110],[38,6670,6672,6676,6680,6683,6686,6689,6760,6763,6766],{"className":6671},[89],[38,6673],{"className":6674,"style":6675},[93],"height:0.9444em;vertical-align:-0.1944em;",[38,6677,6679],{"className":6678},[98],"51.25%",[38,6681,380],{"className":6682},[537],[38,6684],{"className":6685,"style":711},[110],[38,6687],{"className":6688,"style":541},[110],[38,6690,6692,6723],{"className":6691},[98],[38,6693,6695],{"className":6694},[98,6159],[38,6696,6698],{"className":6697},[140],[38,6699,6701],{"className":6700},[144],[38,6702,6704,6712],{"className":6703,"style":6169},[148],[38,6705,6706,67
09],{"style":6172},[38,6707],{"className":6708,"style":3510},[155],[38,6710,345],{"className":6711,"style":496},[98,167],[38,6713,6714,6717],{"style":6181},[38,6715],{"className":6716,"style":3510},[155],[38,6718,6720],{"className":6719,"style":6189},[6188],[38,6721,6122],{"className":6722},[98],[38,6724,6726],{"className":6725},[136],[38,6727,6729,6752],{"className":6728},[140,436],[38,6730,6732,6749],{"className":6731},[144],[38,6733,6735],{"className":6734,"style":5669},[148],[38,6736,6737,6740],{"style":512},[38,6738],{"className":6739,"style":156},[155],[38,6741,6743],{"className":6742},[160,161,162,163],[38,6744,6746],{"className":6745},[98,456,163],[38,6747,5632],{"className":6748},[98,163],[38,6750,464],{"className":6751},[463],[38,6753,6755],{"className":6754},[144],[38,6756,6758],{"className":6757,"style":531},[148],[38,6759],{},[38,6761],{"className":6762,"style":111},[110],[38,6764,340],{"className":6765},[115],[38,6767],{"className":6768,"style":111},[110],[38,6770,6772,6775],{"className":6771},[89],[38,6773],{"className":6774,"style":6092},[93],[38,6776,6778],{"className":6777},[98],"48.75%",[10,6780,6781,6782,6861,6862,6898,6899,7041,7042,7079,7080,7393,7394,1527],{},"在注意力机制中，我们把上面的 
",[38,6783,6785,6803],{"className":6784,"translate":42},[41],[38,6786,6788],{"className":6787},[46],[48,6789,6790],{"xmlns":50},[52,6791,6792,6800],{},[55,6793,6794],{},[330,6795,6796,6798],{},[58,6797,61],{"mathvariant":60},[335,6799,5803],{},[77,6801,6802],{"encoding":79},"\\boldsymbol{v}_\\text{cat}",[38,6804,6806],{"className":6805,"ariaHidden":85},[84],[38,6807,6809,6812],{"className":6808},[89],[38,6810],{"className":6811,"style":2747},[93],[38,6813,6815,6824],{"className":6814},[98],[38,6816,6818],{"className":6817},[98],[38,6819,6821],{"className":6820},[98],[38,6822,61],{"className":6823,"style":106},[98,105],[38,6825,6827],{"className":6826},[136],[38,6828,6830,6853],{"className":6829},[140,436],[38,6831,6833,6850],{"className":6832},[144],[38,6834,6836],{"className":6835,"style":5669},[148],[38,6837,6838,6841],{"style":446},[38,6839],{"className":6840,"style":156},[155],[38,6842,6844],{"className":6843},[160,161,162,163],[38,6845,6847],{"className":6846},[98,456,163],[38,6848,5803],{"className":6849},[98,163],[38,6851,464],{"className":6852},[463],[38,6854,6856],{"className":6855},[144],[38,6857,6859],{"className":6858,"style":531},[148],[38,6860],{}," 作为查询向量 ",[38,6863,6865,6880],{"className":6864,"translate":42},[41],[38,6866,6868],{"className":6867},[46],[48,6869,6870],{"xmlns":50},[52,6871,6872,6877],{},[55,6873,6874],{},[58,6875,6876],{"mathvariant":60},"q",[77,6878,6879],{"encoding":79},"\\boldsymbol{q}",[38,6881,6883],{"className":6882,"ariaHidden":85},[84],[38,6884,6886,6889],{"className":6885},[89],[38,6887],{"className":6888,"style":2487},[93],[38,6890,6892],{"className":6891},[98],[38,6893,6895],{"className":6894},[98],[38,6896,6876],{"className":6897,"style":106},[98,105],"，剩余词汇 
",[38,6900,6902,6928],{"className":6901,"translate":42},[41],[38,6903,6905],{"className":6904},[46],[48,6906,6907],{"xmlns":50},[52,6908,6909,6925],{},[55,6910,6911,6917,6919],{},[330,6912,6913,6915],{},[58,6914,61],{"mathvariant":60},[335,6916,5618],{},[63,6918,380],{"separator":85},[330,6920,6921,6923],{},[58,6922,61],{"mathvariant":60},[335,6924,5632],{},[77,6926,6927],{"encoding":79},"\\boldsymbol{v}_\\text{sat}, \\boldsymbol{v}_\\text{seat}",[38,6929,6931],{"className":6930,"ariaHidden":85},[84],[38,6932,6934,6937,6986,6989,6992],{"className":6933},[89],[38,6935],{"className":6936,"style":2487},[93],[38,6938,6940,6949],{"className":6939},[98],[38,6941,6943],{"className":6942},[98],[38,6944,6946],{"className":6945},[98],[38,6947,61],{"className":6948,"style":106},[98,105],[38,6950,6952],{"className":6951},[136],[38,6953,6955,6978],{"className":6954},[140,436],[38,6956,6958,6975],{"className":6957},[144],[38,6959,6961],{"className":6960,"style":5669},[148],[38,6962,6963,6966],{"style":446},[38,6964],{"className":6965,"style":156},[155],[38,6967,6969],{"className":6968},[160,161,162,163],[38,6970,6972],{"className":6971},[98,456,163],[38,6973,5618],{"className":6974},[98,163],[38,6976,464],{"className":6977},[463],[38,6979,6981],{"className":6980},[144],[38,6982,6984],{"className":6983,"style":531},[148],[38,6985],{},[38,6987,380],{"className":6988},[537],[38,6990],{"className":6991,"style":541},[110],[38,6993,6995,7004],{"className":6994},[98],[38,6996,6998],{"className":6997},[98],[38,6999,7001],{"className":7000},[98],[38,7002,61],{"className":7003,"style":106},[98,105],[38,7005,7007],{"className":7006},[136],[38,7008,7010,7033],{"className":7009},[140,436],[38,7011,7013,7030],{"className":7012},[144],[38,7014,7016],{"className":7015,"style":5669},[148],[38,7017,7018,7021],{"style":446},[38,7019],{"className":7020,"style":156},[155],[38,7022,7024],{"className":7023},[160,161,162,163],[38,7025,7027],{"className":7026},[98,456,163],[38,7028,5632],{"className":702
9},[98,163],[38,7031,464],{"className":7032},[463],[38,7034,7036],{"className":7035},[144],[38,7037,7039],{"className":7038,"style":531},[148],[38,7040],{}," 作为键向量 ",[38,7043,7045,7060],{"className":7044,"translate":42},[41],[38,7046,7048],{"className":7047},[46],[48,7049,7050],{"xmlns":50},[52,7051,7052,7057],{},[55,7053,7054],{},[58,7055,7056],{"mathvariant":60},"k",[77,7058,7059],{"encoding":79},"\\boldsymbol{k}",[38,7061,7063],{"className":7062,"ariaHidden":85},[84],[38,7064,7066,7069],{"className":7065},[89],[38,7067],{"className":7068,"style":290},[93],[38,7070,7072],{"className":7071},[98],[38,7073,7075],{"className":7074},[98],[38,7076,7056],{"className":7077,"style":7078},[98,105],"margin-right:0.01852em;","，而加权组合 ",[38,7081,7083,7129],{"className":7082,"translate":42},[41],[38,7084,7086],{"className":7085},[46],[48,7087,7088],{"xmlns":50},[52,7089,7090,7126],{},[55,7091,7092,7102,7108,7110,7120],{},[330,7093,7094,7100],{},[6116,7095,7096,7098],{"accent":85},[58,7097,345],{},[63,7099,6122],{},[335,7101,5618],{},[330,7103,7104,7106],{},[58,7105,61],{"mathvariant":60},[335,7107,5618],{},[63,7109,361],{},[330,7111,7112,7118],{},[6116,7113,7114,7116],{"accent":85},[58,7115,345],{},[63,7117,6122],{},[335,7119,5632],{},[330,7121,7122,7124],{},[58,7123,61],{"mathvariant":60},[335,7125,5632],{},[77,7127,7128],{"encoding":79},"\\tilde{w}_\\text{sat} \\boldsymbol{v}_\\text{sat} + \\tilde{w}_\\text{seat} 
\\boldsymbol{v}_\\text{seat}",[38,7130,7132,7267],{"className":7131,"ariaHidden":85},[84],[38,7133,7135,7138,7209,7258,7261,7264],{"className":7134},[89],[38,7136],{"className":7137,"style":6350},[93],[38,7139,7141,7172],{"className":7140},[98],[38,7142,7144],{"className":7143},[98,6159],[38,7145,7147],{"className":7146},[140],[38,7148,7150],{"className":7149},[144],[38,7151,7153,7161],{"className":7152,"style":6169},[148],[38,7154,7155,7158],{"style":6172},[38,7156],{"className":7157,"style":3510},[155],[38,7159,345],{"className":7160,"style":496},[98,167],[38,7162,7163,7166],{"style":6181},[38,7164],{"className":7165,"style":3510},[155],[38,7167,7169],{"className":7168,"style":6189},[6188],[38,7170,6122],{"className":7171},[98],[38,7173,7175],{"className":7174},[136],[38,7176,7178,7201],{"className":7177},[140,436],[38,7179,7181,7198],{"className":7180},[144],[38,7182,7184],{"className":7183,"style":5669},[148],[38,7185,7186,7189],{"style":512},[38,7187],{"className":7188,"style":156},[155],[38,7190,7192],{"className":7191},[160,161,162,163],[38,7193,7195],{"className":7194},[98,456,163],[38,7196,5618],{"className":7197},[98,163],[38,7199,464],{"className":7200},[463],[38,7202,7204],{"className":7203},[144],[38,7205,7207],{"className":7206,"style":531},[148],[38,7208],{},[38,7210,7212,7221],{"className":7211},[98],[38,7213,7215],{"className":7214},[98],[38,7216,7218],{"className":7217},[98],[38,7219,61],{"className":7220,"style":106},[98,105],[38,7222,7224],{"className":7223},[136],[38,7225,7227,7250],{"className":7226},[140,436],[38,7228,7230,7247],{"className":7229},[144],[38,7231,7233],{"className":7232,"style":5669},[148],[38,7234,7235,7238],{"style":446},[38,7236],{"className":7237,"style":156},[155],[38,7239,7241],{"className":7240},[160,161,162,163],[38,7242,7244],{"className":7243},[98,456,163],[38,7245,5618],{"className":7246},[98,163],[38,7248,464],{"className":7249},[463],[38,7251,7253],{"className":7252},[144],[38,7254,7256],{"className":7255,"style":5
31},[148],[38,7257],{},[38,7259],{"className":7260,"style":595},[110],[38,7262,361],{"className":7263},[599],[38,7265],{"className":7266,"style":595},[110],[38,7268,7270,7273,7344],{"className":7269},[89],[38,7271],{"className":7272,"style":6350},[93],[38,7274,7276,7307],{"className":7275},[98],[38,7277,7279],{"className":7278},[98,6159],[38,7280,7282],{"className":7281},[140],[38,7283,7285],{"className":7284},[144],[38,7286,7288,7296],{"className":7287,"style":6169},[148],[38,7289,7290,7293],{"style":6172},[38,7291],{"className":7292,"style":3510},[155],[38,7294,345],{"className":7295,"style":496},[98,167],[38,7297,7298,7301],{"style":6181},[38,7299],{"className":7300,"style":3510},[155],[38,7302,7304],{"className":7303,"style":6189},[6188],[38,7305,6122],{"className":7306},[98],[38,7308,7310],{"className":7309},[136],[38,7311,7313,7336],{"className":7312},[140,436],[38,7314,7316,7333],{"className":7315},[144],[38,7317,7319],{"className":7318,"style":5669},[148],[38,7320,7321,7324],{"style":512},[38,7322],{"className":7323,"style":156},[155],[38,7325,7327],{"className":7326},[160,161,162,163],[38,7328,7330],{"className":7329},[98,456,163],[38,7331,5632],{"className":7332},[98,163],[38,7334,464],{"className":7335},[463],[38,7337,7339],{"className":7338},[144],[38,7340,7342],{"className":7341,"style":531},[148],[38,7343],{},[38,7345,7347,7356],{"className":7346},[98],[38,7348,7350],{"className":7349},[98],[38,7351,7353],{"className":7352},[98],[38,7354,61],{"className":7355,"style":106},[98,105],[38,7357,7359],{"className":7358},[136],[38,7360,7362,7385],{"className":7361},[140,436],[38,7363,7365,7382],{"className":7364},[144],[38,7366,7368],{"className":7367,"style":5669},[148],[38,7369,7370,7373],{"style":446},[38,7371],{"className":7372,"style":156},[155],[38,7374,7376],{"className":7375},[160,161,162,163],[38,7377,7379],{"className":7378},[98,456,163],[38,7380,5632],{"className":7381},[98,163],[38,7383,464],{"className":7384},[463],[38,7386,7388],{"className":738
7},[144],[38,7389,7391],{"className":7390,"style":531},[148],[38,7392],{}," 则为值向量 ",[38,7395,7397,7411],{"className":7396,"translate":42},[41],[38,7398,7400],{"className":7399},[46],[48,7401,7402],{"xmlns":50},[52,7403,7404,7408],{},[55,7405,7406],{},[58,7407,61],{"mathvariant":60},[77,7409,7410],{"encoding":79},"\\boldsymbol{v}",[38,7412,7414],{"className":7413,"ariaHidden":85},[84],[38,7415,7417,7420],{"className":7416},[89],[38,7418],{"className":7419,"style":1924},[93],[38,7421,7423],{"className":7422},[98],[38,7424,7426],{"className":7425},[98],[38,7427,61],{"className":7428,"style":106},[98,105],[10,7430,7431],{},"Note that the examples above are all simplified and involve no linear transformations.",[31,7433,7434],{"id":7434},"点积注意力公式",[10,7436,7437],{},"The examples above already demonstrate a simple attention mechanism. Still, it feels as though something is missing.",[10,7439,7440],{},"Back when I was taking English classes, I read this joke:",[250,7442,7443,7446,7449,7452],{},[10,7444,7445],{},"A college student, Xiao Ming, went abroad for graduate school. While choosing courses he noticed one called “Options, Futures and Other Derivatives”.",[10,7447,7448],{},"Translating in his head, Xiao Ming thought: nice, this course is about choices (选择), the future (未来), and what derives from them (衍生). It is obviously a philosophy-of-life course that teaches people how to make choices in life. That's the one, sign me up!",[10,7450,7451],{},"But on the first day of class he was dumbfounded: what is the teacher talking about? Why can't I understand a single sentence? Why are there so many math formulas? Do life choices really need math too?",[10,7453,7454],{},"Only the next day did he realize that the course's actual title was “期权，期货和其他衍生品”: financial options, futures, and other derivatives.",[10,7456,7457],{},"The joke illustrates a real issue: in an actual corpus, a single word often has several sub-meanings. Here, “options” has two: the financial contract (期权) and “choices” (选择). Sometimes we need to favor one of them; here the financial sense should dominate, and the word should not be read in its other sub-meaning, “choices”.",[10,7459,7460,7461,7490],{},"From the first section, on semantics and meaning, we already know that a word vector (a meaning) can be written as a weighted combination of several word vectors (sub-meanings). Suppose a word's full meaning can be split into ",[38,7462,7464,7477],{"className":7463,"translate":42},[41],[38,7465,7467],{"className":7466},[46],[48,7468,7469],{"xmlns":50},[52,7470,7471,7475],{},[55,7472,7473],{},[58,7474,7056],{},[77,7476,7056],{"encoding":79},[38,7478,7480],{"className":7479,"ariaHidden":85},[84],[38,7481,7483,7486],{"className":7482},[89],[38,7484],{"className":7485,"style":290},[93],[38,7487,7056],{"className":7488,"style":7489},[98,167],"margin-right:0.03148em;"," 
sub-meanings, each corresponding to a vector, and that the final meaning is obtained as their weighted sum:",[38,7492,7494],{"className":7493,"translate":42},[315],[38,7495,7497,7570],{"className":7496,"translate":42},[41],[38,7498,7500],{"className":7499},[46],[48,7501,7502],{"xmlns":50,"display":324},[52,7503,7504,7567],{},[55,7505,7506,7508,7510,7524,7530,7536,7538,7540,7543,7557,7563,7565],{},[58,7507,61],{"mathvariant":60},[63,7509,340],{},[1968,7511,7512,7514,7522],{},[63,7513,1972],{},[55,7515,7516,7518,7520],{},[58,7517,1977],{},[63,7519,340],{},[203,7521,348],{},[58,7523,7056],{},[330,7525,7526,7528],{},[58,7527,345],{},[58,7529,1977],{},[330,7531,7532,7534],{},[58,7533,61],{"mathvariant":60},[58,7535,1977],{},[63,7537,380],{"separator":85},[110,7539],{"width":383},[335,7541,7542],{},"s.t.",[1968,7544,7545,7547,7555],{},[63,7546,1972],{},[55,7548,7549,7551,7553],{},[58,7550,1977],{},[63,7552,340],{},[203,7554,348],{},[58,7556,7056],{},[330,7558,7559,7561],{},[58,7560,345],{},[58,7562,1977],{},[63,7564,340],{},[203,7566,348],{},[77,7568,7569],{"encoding":79},"\\boldsymbol{v} = \\sum^{k}_{i=1} w_i\\boldsymbol{v}_i, \\quad \\text{s.t.} \\sum_{i=1}^{k} w_i = 
1",[38,7571,7573,7597,7898],{"className":7572,"ariaHidden":85},[84],[38,7574,7576,7579,7588,7591,7594],{"className":7575},[89],[38,7577],{"className":7578,"style":1924},[93],[38,7580,7582],{"className":7581},[98],[38,7583,7585],{"className":7584},[98],[38,7586,61],{"className":7587,"style":106},[98,105],[38,7589],{"className":7590,"style":111},[110],[38,7592,340],{"className":7593},[115],[38,7595],{"className":7596,"style":111},[110],[38,7598,7600,7604,7672,7675,7715,7761,7764,7767,7770,7776,7779,7846,7849,7889,7892,7895],{"className":7599},[89],[38,7601],{"className":7602,"style":7603},[93],"height:3.1138em;vertical-align:-1.2777em;",[38,7605,7607],{"className":7606},[2101,2102],[38,7608,7610,7664],{"className":7609},[140,436],[38,7611,7613,7661],{"className":7612},[144],[38,7614,7617,7637,7647],{"className":7615,"style":7616},[148],"height:1.8361em;",[38,7618,7619,7622],{"style":2115},[38,7620],{"className":7621,"style":2119},[155],[38,7623,7625],{"className":7624},[160,161,162,163],[38,7626,7628,7631,7634],{"className":7627},[98,163],[38,7629,1977],{"className":7630},[98,167,163],[38,7632,340],{"className":7633},[115,163],[38,7635,348],{"className":7636},[98,163],[38,7638,7639,7642],{"style":2137},[38,7640],{"className":7641,"style":2119},[155],[38,7643,7644],{},[38,7645,1972],{"className":7646},[2101,2146,2147],[38,7648,7649,7652],{"style":2150},[38,7650],{"className":7651,"style":2119},[155],[38,7653,7655],{"className":7654},[160,161,162,163],[38,7656,7658],{"className":7657},[98,163],[38,7659,7056],{"className":7660,"style":7489},[98,167,163],[38,7662,464],{"className":7663},[463],[38,7665,7667],{"className":7666},[144],[38,7668,7670],{"className":7669,"style":2172},[148],[38,7671],{},[38,7673],{"className":7674,"style":541},[110],[38,7676,7678,7681],{"className":7677},[98],[38,7679,345],{"className":7680,"style":496},[98,167],[38,7682,7684],{"className":7683},[136],[38,7685,7687,7707],{"className":7686},[140,436],[38,7688,7690,7704],{"className":7689},[144],[
38,7691,7693],{"className":7692,"style":2196},[148],[38,7694,7695,7698],{"style":512},[38,7696],{"className":7697,"style":156},[155],[38,7699,7701],{"className":7700},[160,161,162,163],[38,7702,1977],{"className":7703},[98,167,163],[38,7705,464],{"className":7706},[463],[38,7708,7710],{"className":7709},[144],[38,7711,7713],{"className":7712,"style":531},[148],[38,7714],{},[38,7716,7718,7727],{"className":7717},[98],[38,7719,7721],{"className":7720},[98],[38,7722,7724],{"className":7723},[98],[38,7725,61],{"className":7726,"style":106},[98,105],[38,7728,7730],{"className":7729},[136],[38,7731,7733,7753],{"className":7732},[140,436],[38,7734,7736,7750],{"className":7735},[144],[38,7737,7739],{"className":7738,"style":2196},[148],[38,7740,7741,7744],{"style":446},[38,7742],{"className":7743,"style":156},[155],[38,7745,7747],{"className":7746},[160,161,162,163],[38,7748,1977],{"className":7749},[98,167,163],[38,7751,464],{"className":7752},[463],[38,7754,7756],{"className":7755},[144],[38,7757,7759],{"className":7758,"style":531},[148],[38,7760],{},[38,7762,380],{"className":7763},[537],[38,7765],{"className":7766,"style":711},[110],[38,7768],{"className":7769,"style":541},[110],[38,7771,7773],{"className":7772},[98,456],[38,7774,7542],{"className":7775},[98],[38,7777],{"className":7778,"style":541},[110],[38,7780,7782],{"className":7781},[2101,2102],[38,7783,7785,7838],{"className":7784},[140,436],[38,7786,7788,7835],{"className":7787},[144],[38,7789,7791,7811,7821],{"className":7790,"style":7616},[148],[38,7792,7793,7796],{"style":2115},[38,7794],{"className":7795,"style":2119},[155],[38,7797,7799],{"className":7798},[160,161,162,163],[38,7800,7802,7805,7808],{"className":7801},[98,163],[38,7803,1977],{"className":7804},[98,167,163],[38,7806,340],{"className":7807},[115,163],[38,7809,348],{"className":7810},[98,163],[38,7812,7813,7816],{"style":2137},[38,7814],{"className":7815,"style":2119},[155],[38,7817,7818],{},[38,7819,1972],{"className":7820},[2101,2146,2147],[
38,7822,7823,7826],{"style":2150},[38,7824],{"className":7825,"style":2119},[155],[38,7827,7829],{"className":7828},[160,161,162,163],[38,7830,7832],{"className":7831},[98,163],[38,7833,7056],{"className":7834,"style":7489},[98,167,163],[38,7836,464],{"className":7837},[463],[38,7839,7841],{"className":7840},[144],[38,7842,7844],{"className":7843,"style":2172},[148],[38,7845],{},[38,7847],{"className":7848,"style":541},[110],[38,7850,7852,7855],{"className":7851},[98],[38,7853,345],{"className":7854,"style":496},[98,167],[38,7856,7858],{"className":7857},[136],[38,7859,7861,7881],{"className":7860},[140,436],[38,7862,7864,7878],{"className":7863},[144],[38,7865,7867],{"className":7866,"style":2196},[148],[38,7868,7869,7872],{"style":512},[38,7870],{"className":7871,"style":156},[155],[38,7873,7875],{"className":7874},[160,161,162,163],[38,7876,1977],{"className":7877},[98,167,163],[38,7879,464],{"className":7880},[463],[38,7882,7884],{"className":7883},[144],[38,7885,7887],{"className":7886,"style":531},[148],[38,7888],{},[38,7890],{"className":7891,"style":111},[110],[38,7893,340],{"className":7894},[115],[38,7896],{"className":7897,"style":111},[110],[38,7899,7901,7904],{"className":7900},[89],[38,7902],{"className":7903,"style":856},[93],[38,7905,348],{"className":7906},[98],[10,7908,7909],{},"For example, the full meaning of the word “Options” can essentially be split into a weighted combination of two sub-meanings, “choices” and “financial options”.",[10,7911,7912],{},"As in the example above, in a financial context the word should be understood with a larger weight on the “financial options” sub-meaning, while the weight on the “choices” sub-meaning should be smaller to avoid interference.",[10,7914,7915],{},"Earlier, we treated a sentence as an ordered sequence of word elements, where each element is a “new meaning” formed by combining the word's meaning with the meaning of its position (here, by simple addition):",[38,7917,7919],{"className":7918,"translate":42},[315],[38,7920,7922,7961],{"className":7921,"translate":42},[41],[38,7923,7925],{"className":7924},[46],[48,7926,7927],{"xmlns":50,"display":324},[52,7928,7929,7958],{},[55,7930,7931,7942,7944,7950,7952],{},[7932,7933,7934,7936,7938],"msubsup",{},[58,7935,61],{"mathvariant":60},[58,7937,1977],{},[63,7939,7941],{"mathvariant":400,"lspace":7940,"rspace":7940},"0em","′",[63,7943,340],{},[330,7945,7946,7948],{},[5
8,7947,61],{"mathvariant":60},[58,7949,1977],{},[63,7951,361],{},[330,7953,7954,7956],{},[58,7955,10],{"mathvariant":60},[58,7957,1977],{},[77,7959,7960],{"encoding":79},"\\boldsymbol{v}_i' = \\boldsymbol{v}_i + \\boldsymbol{p}_i",[38,7962,7964,8043,8105],{"className":7963,"ariaHidden":85},[84],[38,7965,7967,7971,8034,8037,8040],{"className":7966},[89],[38,7968],{"className":7969,"style":7970},[93],"height:1.0489em;vertical-align:-0.247em;",[38,7972,7974,7983],{"className":7973},[98],[38,7975,7977],{"className":7976},[98],[38,7978,7980],{"className":7979},[98],[38,7981,61],{"className":7982,"style":106},[98,105],[38,7984,7986],{"className":7985},[136],[38,7987,7989,8025],{"className":7988},[140,436],[38,7990,7992,8022],{"className":7991},[144],[38,7993,7996,8008],{"className":7994,"style":7995},[148],"height:0.8019em;",[38,7997,7999,8002],{"style":7998},"top:-2.453em;margin-right:0.05em;",[38,8000],{"className":8001,"style":156},[155],[38,8003,8005],{"className":8004},[160,161,162,163],[38,8006,1977],{"className":8007},[98,167,163],[38,8009,8010,8013],{"style":2063},[38,8011],{"className":8012,"style":156},[155],[38,8014,8016],{"className":8015},[160,161,162,163],[38,8017,8019],{"className":8018},[98,163],[38,8020,7941],{"className":8021},[98,163],[38,8023,464],{"className":8024},[463],[38,8026,8028],{"className":8027},[144],[38,8029,8032],{"className":8030,"style":8031},[148],"height:0.247em;",[38,8033],{},[38,8035],{"className":8036,"style":111},[110],[38,8038,340],{"className":8039},[115],[38,8041],{"className":8042,"style":111},[110],[38,8044,8046,8050,8096,8099,8102],{"className":8045},[89],[38,8047],{"className":8048,"style":8049},[93],"height:0.7333em;vertical-align:-0.15em;",[38,8051,8053,8062],{"className":8052},[98],[38,8054,8056],{"className":8055},[98],[38,8057,8059],{"className":8058},[98],[38,8060,61],{"className":8061,"style":106},[98,105],[38,8063,8065],{"className":8064},[136],[38,8066,8068,8088],{"className":8067},[140,436],[38,8069,8071,8085],{"cl
assName":8070},[144],[38,8072,8074],{"className":8073,"style":2196},[148],[38,8075,8076,8079],{"style":446},[38,8077],{"className":8078,"style":156},[155],[38,8080,8082],{"className":8081},[160,161,162,163],[38,8083,1977],{"className":8084},[98,167,163],[38,8086,464],{"className":8087},[463],[38,8089,8091],{"className":8090},[144],[38,8092,8094],{"className":8093,"style":531},[148],[38,8095],{},[38,8097],{"className":8098,"style":595},[110],[38,8100,361],{"className":8101},[599],[38,8103],{"className":8104,"style":595},[110],[38,8106,8108,8111],{"className":8107},[89],[38,8109],{"className":8110,"style":2825},[93],[38,8112,8114,8123],{"className":8113},[98],[38,8115,8117],{"className":8116},[98],[38,8118,8120],{"className":8119},[98],[38,8121,10],{"className":8122},[98,105],[38,8124,8126],{"className":8125},[136],[38,8127,8129,8149],{"className":8128},[140,436],[38,8130,8132,8146],{"className":8131},[144],[38,8133,8135],{"className":8134,"style":2850},[148],[38,8136,8137,8140],{"style":2853},[38,8138],{"className":8139,"style":156},[155],[38,8141,8143],{"className":8142},[160,161,162,163],[38,8144,1977],{"className":8145},[98,167,163],[38,8147,464],{"className":8148},[463],[38,8150,8152],{"className":8151},[144],[38,8153,8155],{"className":8154,"style":2872},[148],[38,8156],{},[10,8158,8159,8160,8188],{},"一个“含义”既然可以拆分成多个“子含义”的加权组合，假设拆成 ",[38,8161,8163,8176],{"className":8162,"translate":42},[41],[38,8164,8166],{"className":8165},[46],[48,8167,8168],{"xmlns":50},[52,8169,8170,8174],{},[55,8171,8172],{},[58,8173,7056],{},[77,8175,7056],{"encoding":79},[38,8177,8179],{"className":8178,"ariaHidden":85},[84],[38,8180,8182,8185],{"className":8181},[89],[38,8183],{"className":8184,"style":290},[93],[38,8186,7056],{"className":8187,"style":7489},[98,167]," 
个子含义，那么我们就有",[38,8190,8192],{"className":8191,"translate":42},[315],[38,8193,8195,8226],{"className":8194,"translate":42},[41],[38,8196,8198],{"className":8197},[46],[48,8199,8200],{"xmlns":50,"display":324},[52,8201,8202,8223],{},[55,8203,8204,8212,8214,8221],{},[7932,8205,8206,8208,8210],{},[58,8207,61],{"mathvariant":60},[58,8209,1977],{},[63,8211,7941],{"mathvariant":400,"lspace":7940,"rspace":7940},[63,8213,340],{},[330,8215,8216,8219],{},[58,8217,8218],{},"x",[58,8220,1977],{},[58,8222,1812],{},[77,8224,8225],{"encoding":79},"\\boldsymbol{v}_i' = x_iW",[38,8227,8229,8304],{"className":8228,"ariaHidden":85},[84],[38,8230,8232,8235,8295,8298,8301],{"className":8231},[89],[38,8233],{"className":8234,"style":7970},[93],[38,8236,8238,8247],{"className":8237},[98],[38,8239,8241],{"className":8240},[98],[38,8242,8244],{"className":8243},[98],[38,8245,61],{"className":8246,"style":106},[98,105],[38,8248,8250],{"className":8249},[136],[38,8251,8253,8287],{"className":8252},[140,436],[38,8254,8256,8284],{"className":8255},[144],[38,8257,8259,8270],{"className":8258,"style":7995},[148],[38,8260,8261,8264],{"style":7998},[38,8262],{"className":8263,"style":156},[155],[38,8265,8267],{"className":8266},[160,161,162,163],[38,8268,1977],{"className":8269},[98,167,163],[38,8271,8272,8275],{"style":2063},[38,8273],{"className":8274,"style":156},[155],[38,8276,8278],{"className":8277},[160,161,162,163],[38,8279,8281],{"className":8280},[98,163],[38,8282,7941],{"className":8283},[98,163],[38,8285,464],{"className":8286},[463],[38,8288,8290],{"className":8289},[144],[38,8291,8293],{"className":8292,"style":8031},[148],[38,8294],{},[38,8296],{"className":8297,"style":111},[110],[38,8299,340],{"className":8300},[115],[38,8302],{"className":8303,"style":111},[110],[38,8305,8307,8311,8352],{"className":8306},[89],[38,8308],{"className":8309,"style":8310},[93],"height:0.8333em;vertical-align:-0.15em;",[38,8312,8314,8317],{"className":8313},[98],[38,8315,8218],{"className":8316},[98,167
],[38,8318,8320],{"className":8319},[136],[38,8321,8323,8344],{"className":8322},[140,436],[38,8324,8326,8341],{"className":8325},[144],[38,8327,8329],{"className":8328,"style":2196},[148],[38,8330,8332,8335],{"style":8331},"top:-2.55em;margin-left:0em;margin-right:0.05em;",[38,8333],{"className":8334,"style":156},[155],[38,8336,8338],{"className":8337},[160,161,162,163],[38,8339,1977],{"className":8340},[98,167,163],[38,8342,464],{"className":8343},[463],[38,8345,8347],{"className":8346},[144],[38,8348,8350],{"className":8349,"style":531},[148],[38,8351],{},[38,8353,1812],{"className":8354,"style":1842},[98,167],[10,8356,8357,8358,8576,8577,8605],{},"其中 ",[38,8359,8361,8411],{"className":8360,"translate":42},[41],[38,8362,8364],{"className":8363},[46],[48,8365,8366],{"xmlns":50},[52,8367,8368,8408],{},[55,8369,8370,8372,8374,8386,8388,8394,8396],{},[58,8371,1812],{},[63,8373,65],{},[67,8375,8376,8378],{},[58,8377,72],{"mathvariant":71},[55,8379,8380,8382,8384],{},[58,8381,75],{},[63,8383,1404],{},[58,8385,7056],{},[63,8387,380],{"separator":85},[330,8389,8390,8392],{},[58,8391,8218],{},[58,8393,1977],{},[63,8395,65],{},[67,8397,8398,8400],{},[58,8399,72],{"mathvariant":71},[55,8401,8402,8404,8406],{},[203,8403,348],{},[63,8405,1404],{},[58,8407,75],{},[77,8409,8410],{"encoding":79},"W \\in \\mathbb{R}^{d \\times k},x_i \\in \\mathbb{R}^{1 \\times 
d}",[38,8412,8414,8432,8532],{"className":8413,"ariaHidden":85},[84],[38,8415,8417,8420,8423,8426,8429],{"className":8416},[89],[38,8418],{"className":8419,"style":1419},[93],[38,8421,1812],{"className":8422,"style":1842},[98,167],[38,8424],{"className":8425,"style":111},[110],[38,8427,65],{"className":8428},[115],[38,8430],{"className":8431,"style":111},[110],[38,8433,8435,8439,8477,8480,8483,8523,8526,8529],{"className":8434},[89],[38,8436],{"className":8437,"style":8438},[93],"height:1.0435em;vertical-align:-0.1944em;",[38,8440,8442,8445],{"className":8441},[98],[38,8443,72],{"className":8444},[98,132],[38,8446,8448],{"className":8447},[136],[38,8449,8451],{"className":8450},[140],[38,8452,8454],{"className":8453},[144],[38,8455,8457],{"className":8456,"style":125},[148],[38,8458,8459,8462],{"style":151},[38,8460],{"className":8461,"style":156},[155],[38,8463,8465],{"className":8464},[160,161,162,163],[38,8466,8468,8471,8474],{"className":8467},[98,163],[38,8469,75],{"className":8470},[98,167,163],[38,8472,1404],{"className":8473},[599,163],[38,8475,7056],{"className":8476,"style":7489},[98,167,163],[38,8478,380],{"className":8479},[537],[38,8481],{"className":8482,"style":541},[110],[38,8484,8486,8489],{"className":8485},[98],[38,8487,8218],{"className":8488},[98,167],[38,8490,8492],{"className":8491},[136],[38,8493,8495,8515],{"className":8494},[140,436],[38,8496,8498,8512],{"className":8497},[144],[38,8499,8501],{"className":8500,"style":2196},[148],[38,8502,8503,8506],{"style":8331},[38,8504],{"className":8505,"style":156},[155],[38,8507,8509],{"className":8508},[160,161,162,163],[38,8510,1977],{"className":8511},[98,167,163],[38,8513,464],{"className":8514},[463],[38,8516,8518],{"className":8517},[144],[38,8519,8521],{"className":8520,"style":531},[148],[38,8522],{},[38,8524],{"className":8525,"style":111},[110],[38,8527,65],{"className":8528},[115],[38,8530],{"className":8531,"style":111},[110],[38,8533,8535,8538],{"className":8534},[89],[38,8536],{"classNa
me":8537,"style":125},[93],[38,8539,8541,8544],{"className":8540},[98],[38,8542,72],{"className":8543},[98,132],[38,8545,8547],{"className":8546},[136],[38,8548,8550],{"className":8549},[140],[38,8551,8553],{"className":8552},[144],[38,8554,8556],{"className":8555,"style":125},[148],[38,8557,8558,8561],{"style":151},[38,8559],{"className":8560,"style":156},[155],[38,8562,8564],{"className":8563},[160,161,162,163],[38,8565,8567,8570,8573],{"className":8566},[98,163],[38,8568,348],{"className":8569},[98,163],[38,8571,1404],{"className":8572},[599,163],[38,8574,75],{"className":8575},[98,167,163],". Now, if we have a sentence of length ",[38,8578,8580,8593],{"className":8579,"translate":42},[41],[38,8581,8583],{"className":8582},[46],[48,8584,8585],{"xmlns":50},[52,8586,8587,8591],{},[55,8588,8589],{},[58,8590,1401],{},[77,8592,1401],{"encoding":79},[38,8594,8596],{"className":8595,"ariaHidden":85},[84],[38,8597,8599,8602],{"className":8598},[89],[38,8600],{"className":8601,"style":6057},[93],[38,8603,1401],{"className":8604},[98,167],", then after splitting every word into sub-meanings we can write the whole sequence in matrix form:",[38,8607,8609],{"className":8608,"translate":42},[315],[38,8610,8612,8634],{"className":8611,"translate":42},[41],[38,8613,8615],{"className":8614},[46],[48,8616,8617],{"xmlns":50,"display":324},[52,8618,8619,8631],{},[55,8620,8621,8624,8626,8629],{},[58,8622,8623],{},"M",[63,8625,340],{},[58,8627,8628],{},"X",[58,8630,1812],{},[77,8632,8633],{"encoding":79},"M = 
XW",[38,8635,8637,8656],{"className":8636,"ariaHidden":85},[84],[38,8638,8640,8643,8647,8650,8653],{"className":8639},[89],[38,8641],{"className":8642,"style":1555},[93],[38,8644,8623],{"className":8645,"style":8646},[98,167],"margin-right:0.10903em;",[38,8648],{"className":8649,"style":111},[110],[38,8651,340],{"className":8652},[115],[38,8654],{"className":8655,"style":111},[110],[38,8657,8659,8662,8666],{"className":8658},[89],[38,8660],{"className":8661,"style":1555},[93],[38,8663,8628],{"className":8664,"style":8665},[98,167],"margin-right:0.07847em;",[38,8667,1812],{"className":8668,"style":1842},[98,167],[10,8670,8357,8671,1527],{},[38,8672,8674,8720],{"className":8673,"translate":42},[41],[38,8675,8677],{"className":8676},[46],[48,8678,8679],{"xmlns":50},[52,8680,8681,8717],{},[55,8682,8683,8685,8687,8699,8701,8703,8705],{},[58,8684,1812],{},[63,8686,65],{},[67,8688,8689,8691],{},[58,8690,72],{"mathvariant":71},[55,8692,8693,8695,8697],{},[58,8694,75],{},[63,8696,1404],{},[58,8698,7056],{},[63,8700,380],{"separator":85},[58,8702,8628],{},[63,8704,65],{},[67,8706,8707,8709],{},[58,8708,72],{"mathvariant":71},[55,8710,8711,8713,8715],{},[58,8712,1401],{},[63,8714,1404],{},[58,8716,75],{},[77,8718,8719],{"encoding":79},"W \\in \\mathbb{R}^{d \\times k},X \\in \\mathbb{R}^{n \\times 
d}",[38,8721,8723,8741,8803],{"className":8722,"ariaHidden":85},[84],[38,8724,8726,8729,8732,8735,8738],{"className":8725},[89],[38,8727],{"className":8728,"style":1419},[93],[38,8730,1812],{"className":8731,"style":1842},[98,167],[38,8733],{"className":8734,"style":111},[110],[38,8736,65],{"className":8737},[115],[38,8739],{"className":8740,"style":111},[110],[38,8742,8744,8747,8785,8788,8791,8794,8797,8800],{"className":8743},[89],[38,8745],{"className":8746,"style":8438},[93],[38,8748,8750,8753],{"className":8749},[98],[38,8751,72],{"className":8752},[98,132],[38,8754,8756],{"className":8755},[136],[38,8757,8759],{"className":8758},[140],[38,8760,8762],{"className":8761},[144],[38,8763,8765],{"className":8764,"style":125},[148],[38,8766,8767,8770],{"style":151},[38,8768],{"className":8769,"style":156},[155],[38,8771,8773],{"className":8772},[160,161,162,163],[38,8774,8776,8779,8782],{"className":8775},[98,163],[38,8777,75],{"className":8778},[98,167,163],[38,8780,1404],{"className":8781},[599,163],[38,8783,7056],{"className":8784,"style":7489},[98,167,163],[38,8786,380],{"className":8787},[537],[38,8789],{"className":8790,"style":541},[110],[38,8792,8628],{"className":8793,"style":8665},[98,167],[38,8795],{"className":8796,"style":111},[110],[38,8798,65],{"className":8799},[115],[38,8801],{"className":8802,"style":111},[110],[38,8804,8806,8809],{"className":8805},[89],[38,8807],{"className":8808,"style":125},[93],[38,8810,8812,8815],{"className":8811},[98],[38,8813,72],{"className":8814},[98,132],[38,8816,8818],{"className":8817},[136],[38,8819,8821],{"className":8820},[140],[38,8822,8824],{"className":8823},[144],[38,8825,8827],{"className":8826,"style":125},[148],[38,8828,8829,8832],{"style":151},[38,8830],{"className":8831,"style":156},[155],[38,8833,8835],{"className":8834},[160,161,162,163],[38,8836,8838,8841,8844],{"className":8837},[98,163],[38,8839,1401],{"className":8840},[98,167,163],[38,8842,1404],{"className":8843},[599,163],[38,8845,75],{"className":
8846},[98,167,163],[10,8848,8849,8850,8923],{},"If we fold this projection layer into the attention mechanism, then in the procedure above, ",[38,8851,8853,8875],{"className":8852,"translate":42},[41],[38,8854,8856],{"className":8855},[46],[48,8857,8858],{"xmlns":50},[52,8859,8860,8872],{},[55,8861,8862,8864,8866,8868,8870],{},[58,8863,6876],{"mathvariant":60},[63,8865,380],{"separator":85},[58,8867,7056],{"mathvariant":60},[63,8869,380],{"separator":85},[58,8871,61],{"mathvariant":60},[77,8873,8874],{"encoding":79},"\\boldsymbol{q},\\boldsymbol{k},\\boldsymbol{v}",[38,8876,8878],{"className":8877,"ariaHidden":85},[84],[38,8879,8881,8884,8893,8896,8899,8908,8911,8914],{"className":8880},[89],[38,8882],{"className":8883,"style":609},[93],[38,8885,8887],{"className":8886},[98],[38,8888,8890],{"className":8889},[98],[38,8891,6876],{"className":8892,"style":106},[98,105],[38,8894,380],{"className":8895},[537],[38,8897],{"className":8898,"style":541},[110],[38,8900,8902],{"className":8901},[98],[38,8903,8905],{"className":8904},[98],[38,8906,7056],{"className":8907,"style":7078},[98,105],[38,8909,380],{"className":8910},[537],[38,8912],{"className":8913,"style":541},[110],[38,8915,8917],{"className":8916},[98],[38,8918,8920],{"className":8919},[98],[38,8921,61],{"className":8922,"style":106},[98,105],
" actually operate on the sub-meanings of each word. We then have",[38,8925,8927],{"className":8926,"translate":42},[315],[38,8928,8930,8988],{"className":8929,"translate":42},[41],[38,8931,8933],{"className":8932},[46],[48,8934,8935],{"xmlns":50,"display":324},[52,8936,8937,8985],{},[55,8938,8939,8942,8944,8946,8952,8954,8956,8959,8961,8963,8969,8971,8973,8975,8977,8979],{},[58,8940,8941],{},"Q",[63,8943,340],{},[58,8945,8628],{},[330,8947,8948,8950],{},[58,8949,1812],{},[58,8951,6876],{},[63,8953,380],{"separator":85},[110,8955],{"width":383},[58,8957,8958],{},"K",[63,8960,340],{},[58,8962,8628],{},[330,8964,8965,8967],{},[58,8966,1812],{},[58,8968,7056],{},[63,8970,380],{"separator":85},[110,8972],{"width":383},[58,8974,1390],{},[63,8976,340],{},[58,8978,8628],{},[330,8980,8981,8983],{},[58,8982,1812],{},[58,8984,61],{},[77,8986,8987],{"encoding":79},"Q = XW_q, \\quad K = XW_k, \\quad V = XW_v",[38,8989,8991,9010,9084,9154],{"className":8990,"ariaHidden":85},[84],[38,8992,8994,8998,9001,9004,9007],{"className":8993},[89],[38,8995],{"className":8996,"style":8997},[93],"height:0.8778em;vertical-align:-0.1944em;",[38,8999,8941],{"className":9000},[98,167],[38,9002],{"className":9003,"style":111},[110],[38,9005,340],{"className":9006},[115],[38,9008],{"className":9009,"style":111},[110],[38,9011,9013,9017,9020,9062,9065,9068,9071,9075,9078,9081],{"className":9012},[89],[38,9014],{"className":9015,"style":9016},[93],"height:0.9694em;vertical-align:-0.2861em;",[38,9018,8628],{"className":9019,"style":8665},[98,167],[38,9021,9023,9026],{"className":9022},[98],[38,9024,1812],{"className":9025,"style":1842},[98,167],[38,9027,9029],{"className":9028},[136],[38,9030,9032,9054],{"className":9031},[140,436],[38,9033,9035,9051],{"className":9034},[144],[38,9036,9038],{"className":9037,"style":443},[148],[38,9039,9041,9044],{"style":9040},"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;",[38,9042],{"className":9043,"style":156},[155],[38,9045,9047],{"className":9046},[160,161,162,163],[38,9048,6876],{"class
Name":9049,"style":9050},[98,167,163],"margin-right:0.03588em;",[38,9052,464],{"className":9053},[463],[38,9055,9057],{"className":9056},[144],[38,9058,9060],{"className":9059,"style":471},[148],[38,9061],{},[38,9063,380],{"className":9064},[537],[38,9066],{"className":9067,"style":711},[110],[38,9069],{"className":9070,"style":541},[110],[38,9072,8958],{"className":9073,"style":9074},[98,167],"margin-right:0.07153em;",[38,9076],{"className":9077,"style":111},[110],[38,9079,340],{"className":9080},[115],[38,9082],{"className":9083,"style":111},[110],[38,9085,9087,9090,9093,9133,9136,9139,9142,9145,9148,9151],{"className":9086},[89],[38,9088],{"className":9089,"style":8997},[93],[38,9091,8628],{"className":9092,"style":8665},[98,167],[38,9094,9096,9099],{"className":9095},[98],[38,9097,1812],{"className":9098,"style":1842},[98,167],[38,9100,9102],{"className":9101},[136],[38,9103,9105,9125],{"className":9104},[140,436],[38,9106,9108,9122],{"className":9107},[144],[38,9109,9111],{"className":9110,"style":566},[148],[38,9112,9113,9116],{"style":9040},[38,9114],{"className":9115,"style":156},[155],[38,9117,9119],{"className":9118},[160,161,162,163],[38,9120,7056],{"className":9121,"style":7489},[98,167,163],[38,9123,464],{"className":9124},[463],[38,9126,9128],{"className":9127},[144],[38,9129,9131],{"className":9130,"style":531},[148],[38,9132],{},[38,9134,380],{"className":9135},[537],[38,9137],{"className":9138,"style":711},[110],[38,9140],{"className":9141,"style":541},[110],[38,9143,1390],{"className":9144,"style":1423},[98,167],[38,9146],{"className":9147,"style":111},[110],[38,9149,340],{"className":9150},[115],[38,9152],{"className":9153,"style":111},[110],[38,9155,9157,9160,9163],{"className":9156},[89],[38,9158],{"className":9159,"style":8310},[93],[38,9161,8628],{"className":9162,"style":8665},[98,167],[38,9164,9166,9169],{"className":9165},[98],[38,9167,1812],{"className":9168,"style":1842},[98,167],[38,9170,9172],{"className":9171},[136],[38,9173,9175,9195]
,{"className":9174},[140,436],[38,9176,9178,9192],{"className":9177},[144],[38,9179,9181],{"className":9180,"style":443},[148],[38,9182,9183,9186],{"style":9040},[38,9184],{"className":9185,"style":156},[155],[38,9187,9189],{"className":9188},[160,161,162,163],[38,9190,61],{"className":9191,"style":9050},[98,167,163],[38,9193,464],{"className":9194},[463],[38,9196,9198],{"className":9197},[144],[38,9199,9201],{"className":9200,"style":531},[148],[38,9202],{},[10,9204,9205],{},"This gives the attention expression:",[38,9207,9209],{"className":9208,"translate":42},[315],[38,9210,9212,9265],{"className":9211,"translate":42},[41],[38,9213,9215],{"className":9214},[46],[48,9216,9217],{"xmlns":50,"display":324},[52,9218,9219,9262],{},[55,9220,9221,9224,9226,9228,9230,9232,9234,9236,9238,9240,9242,9244,9256,9258,9260],{},[335,9222,9223],{},"Attention",[63,9225,1500],{"stretchy":1499},[58,9227,8941],{},[63,9229,380],{"separator":85},[58,9231,8958],{},[63,9233,380],{"separator":85},[58,9235,1390],{},[63,9237,1542],{"stretchy":1499},[63,9239,340],{},[335,9241,6129],{},[63,9243,1500],{"stretchy":1499},[55,9245,9246,9248,9250],{},[58,9247,8941],{},[63,9249,351],{"separator":85},[67,9251,9252,9254],{},[58,9253,8958],{},[58,9255,1960],{},[63,9257,1542],{"stretchy":1499},[63,9259,351],{"separator":85},[58,9261,1390],{},[77,9263,9264],{"encoding":79},"\\text{Attention}(Q,K,V) = \\text{softmax}({Q·K^T}) · 
V",[38,9266,9268,9316],{"className":9267,"ariaHidden":85},[84],[38,9269,9271,9274,9280,9283,9286,9289,9292,9295,9298,9301,9304,9307,9310,9313],{"className":9270},[89],[38,9272],{"className":9273,"style":1574},[93],[38,9275,9277],{"className":9276},[98,456],[38,9278,9223],{"className":9279},[98],[38,9281,1500],{"className":9282},[1578],[38,9284,8941],{"className":9285},[98,167],[38,9287,380],{"className":9288},[537],[38,9290],{"className":9291,"style":541},[110],[38,9293,8958],{"className":9294,"style":9074},[98,167],[38,9296,380],{"className":9297},[537],[38,9299],{"className":9300,"style":541},[110],[38,9302,1390],{"className":9303,"style":1423},[98,167],[38,9305,1542],{"className":9306},[1794],[38,9308],{"className":9309,"style":111},[110],[38,9311,340],{"className":9312},[115],[38,9314],{"className":9315,"style":111},[110],[38,9317,9319,9323,9329,9332,9373,9376,9379,9382],{"className":9318},[89],[38,9320],{"className":9321,"style":9322},[93],"height:1.1413em;vertical-align:-0.25em;",[38,9324,9326],{"className":9325},[98,456],[38,9327,6129],{"className":9328},[98],[38,9330,1500],{"className":9331},[1578],[38,9333,9335,9338,9341,9344],{"className":9334},[98],[38,9336,8941],{"className":9337},[98,167],[38,9339,351],{"className":9340},[537],[38,9342],{"className":9343,"style":541},[110],[38,9345,9347,9350],{"className":9346},[98],[38,9348,8958],{"className":9349,"style":9074},[98,167],[38,9351,9353],{"className":9352},[136],[38,9354,9356],{"className":9355},[140],[38,9357,9359],{"className":9358},[144],[38,9360,9362],{"className":9361,"style":2042},[148],[38,9363,9364,9367],{"style":2063},[38,9365],{"className":9366,"style":156},[155],[38,9368,9370],{"className":9369},[160,161,162,163],[38,9371,1960],{"className":9372,"style":1842},[98,167,163],[38,9374,1542],{"className":9375},[1794],[38,9377,351],{"className":9378},[537],[38,9380],{"className":9381,"style":541},[110],[38,9383,1390],{"className":9384,"style":1423},[98,167],[10,9386,9387,9388,9462,9463,9553],{},
"However, to keep the entries of ",[38,9389,9391,9413],{"className":9390,"translate":42},[41],[38,9392,9394],{"className":9393},[46],[48,9395,9396],{"xmlns":50},[52,9397,9398,9410],{},[55,9399,9400,9402,9404],{},[58,9401,8941],{},[63,9403,351],{"separator":85},[67,9405,9406,9408],{},[58,9407,8958],{},[58,9409,1960],{},[77,9411,9412],{"encoding":79},"Q·K^T",[38,9414,9416],{"className":9415,"ariaHidden":85},[84],[38,9417,9419,9423,9426,9429,9432],{"className":9418},[89],[38,9420],{"className":9421,"style":9422},[93],"height:1.0358em;vertical-align:-0.1944em;",[38,9424,8941],{"className":9425},[98,167],[38,9427,351],{"className":9428},[537],[38,9430],{"className":9431,"style":541},[110],[38,9433,9435,9438],{"className":9434},[98],[38,9436,8958],{"className":9437,"style":9074},[98,167],[38,9439,9441],{"className":9440},[136],[38,9442,9444],{"className":9443},[140],[38,9445,9447],{"className":9446},[144],[38,9448,9451],{"className":9449,"style":9450},[148],"height:0.8413em;",[38,9452,9453,9456],{"style":151},[38,9454],{"className":9455,"style":156},[155],[38,9457,9459],{"className":9458},[160,161,162,163],[38,9460,1960],{"className":9461,"style":1842},[98,167,163]," from growing so large in magnitude that the softmax saturates, we rescale the scores by hand, dividing each matrix element by 
",[38,9464,9466,9483],{"className":9465,"translate":42},[41],[38,9467,9469],{"className":9468},[46],[48,9470,9471],{"xmlns":50},[52,9472,9473,9480],{},[55,9474,9475],{},[9476,9477,9478],"msqrt",{},[58,9479,75],{},[77,9481,9482],{"encoding":79},"\\sqrt{d}",[38,9484,9486],{"className":9485,"ariaHidden":85},[84],[38,9487,9489,9493],{"className":9488},[89],[38,9490],{"className":9491,"style":9492},[93],"height:1.04em;vertical-align:-0.1078em;",[38,9494,9497],{"className":9495},[98,9496],"sqrt",[38,9498,9500,9544],{"className":9499},[140,436],[38,9501,9503,9541],{"className":9502},[144],[38,9504,9507,9521],{"className":9505,"style":9506},[148],"height:0.9322em;",[38,9508,9511,9514],{"className":9509,"style":6172},[9510],"svg-align",[38,9512],{"className":9513,"style":3510},[155],[38,9515,9518],{"className":9516,"style":9517},[98],"padding-left:0.833em;",[38,9519,75],{"className":9520},[98,167],[38,9522,9524,9527],{"style":9523},"top:-2.8922em;",[38,9525],{"className":9526,"style":3510},[155],[38,9528,9532],{"className":9529,"style":9531},[9530],"hide-tail","min-width:0.853em;height:1.08em;",[3462,9533,9538],{"xmlns":3464,"width":9534,"height":9535,"viewBox":9536,"preserveAspectRatio":9537},"400em","1.08em","0 0 400000 1080","xMinYMin slice",[3469,9539],{"d":9540},"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 
80h400000v40h-400000z",[38,9542,464],{"className":9543},[463],[38,9545,9547],{"className":9546},[144],[38,9548,9551],{"className":9549,"style":9550},[148],"height:0.1078em;",[38,9552],{},". Strictly speaking, the dimension that matters here is the width of the query and key vectors (d_k in the original paper; k here, if W_q and W_k share the shape of W above). This gives the final attention formula:",[38,9555,9557],{"className":9556,"translate":42},[315],[38,9558,9560,9619],{"className":9559,"translate":42},[41],[38,9561,9563],{"className":9562},[46],[48,9564,9565],{"xmlns":50,"display":324},[52,9566,9567,9616],{},[55,9568,9569,9571,9573,9575,9577,9579,9581,9583,9585,9587,9589,9591,9610,9612,9614],{},[335,9570,9223],{},[63,9572,1500],{"stretchy":1499},[58,9574,8941],{},[63,9576,380],{"separator":85},[58,9578,8958],{},[63,9580,380],{"separator":85},[58,9582,1390],{},[63,9584,1542],{"stretchy":1499},[63,9586,340],{},[335,9588,6129],{},[63,9590,1500],{"stretchy":1499},[9592,9593,9594,9606],"mfrac",{},[55,9595,9596,9598,9600],{},[58,9597,8941],{},[63,9599,351],{"separator":85},[67,9601,9602,9604],{},[58,9603,8958],{},[58,9605,1960],{},[9476,9607,9608],{},[58,9609,75],{},[63,9611,1542],{"stretchy":1499},[63,9613,351],{"separator":85},[58,9615,1390],{},[77,9617,9618],{"encoding":79},"\\text{Attention}(Q,K,V) = \\text{softmax}(\\frac{Q·K^T}{\\sqrt{d}}) · 
V",[38,9620,9622,9670],{"className":9621,"ariaHidden":85},[84],[38,9623,9625,9628,9634,9637,9640,9643,9646,9649,9652,9655,9658,9661,9664,9667],{"className":9624},[89],[38,9626],{"className":9627,"style":1574},[93],[38,9629,9631],{"className":9630},[98,456],[38,9632,9223],{"className":9633},[98],[38,9635,1500],{"className":9636},[1578],[38,9638,8941],{"className":9639},[98,167],[38,9641,380],{"className":9642},[537],[38,9644],{"className":9645,"style":541},[110],[38,9647,8958],{"className":9648,"style":9074},[98,167],[38,9650,380],{"className":9651},[537],[38,9653],{"className":9654,"style":541},[110],[38,9656,1390],{"className":9657,"style":1423},[98,167],[38,9659,1542],{"className":9660},[1794],[38,9662],{"className":9663,"style":111},[110],[38,9665,340],{"className":9666},[115],[38,9668],{"className":9669,"style":111},[110],[38,9671,9673,9677,9683,9686,9834,9837,9840,9843],{"className":9672},[89],[38,9674],{"className":9675,"style":9676},[93],"height:2.4483em;vertical-align:-0.93em;",[38,9678,9680],{"className":9679},[98,456],[38,9681,6129],{"className":9682},[98],[38,9684,1500],{"className":9685},[1578],[38,9687,9689,9692,9831],{"className":9688},[98],[38,9690],{"className":9691},[1578,4348],[38,9693,9695],{"className":9694},[9592],[38,9696,9698,9822],{"className":9697},[140,436],[38,9699,9701,9819],{"className":9700},[144],[38,9702,9705,9761,9772],{"className":9703,"style":9704},[148],"height:1.5183em;",[38,9706,9708,9711],{"style":9707},"top:-2.1778em;",[38,9709],{"className":9710,"style":3510},[155],[38,9712,9714],{"className":9713},[98],[38,9715,9717],{"className":9716},[98,9496],[38,9718,9720,9753],{"className":9719},[140,436],[38,9721,9723,9750],{"className":9722},[144],[38,9724,9726,9738],{"className":9725,"style":9506},[148],[38,9727,9729,9732],{"className":9728,"style":6172},[9510],[38,9730],{"className":9731,"style":3510},[155],[38,9733,9735],{"className":9734,"style":9517},[98],[38,9736,75],{"className":9737},[98,167],[38,9739,9740,9743],{"style":9523}
,[38,9741],{"className":9742,"style":3510},[155],[38,9744,9746],{"className":9745,"style":9531},[9530],[3462,9747,9748],{"xmlns":3464,"width":9534,"height":9535,"viewBox":9536,"preserveAspectRatio":9537},[3469,9749],{"d":9540},[38,9751,464],{"className":9752},[463],[38,9754,9756],{"className":9755},[144],[38,9757,9759],{"className":9758,"style":9550},[148],[38,9760],{},[38,9762,9764,9767],{"style":9763},"top:-3.23em;",[38,9765],{"className":9766,"style":3510},[155],[38,9768],{"className":9769,"style":9771},[9770],"frac-line","border-bottom-width:0.04em;",[38,9773,9775,9778],{"style":9774},"top:-3.677em;",[38,9776],{"className":9777,"style":3510},[155],[38,9779,9781,9784,9787,9790],{"className":9780},[98],[38,9782,8941],{"className":9783},[98,167],[38,9785,351],{"className":9786},[537],[38,9788],{"className":9789,"style":541},[110],[38,9791,9793,9796],{"className":9792},[98],[38,9794,8958],{"className":9795,"style":9074},[98,167],[38,9797,9799],{"className":9798},[136],[38,9800,9802],{"className":9801},[140],[38,9803,9805],{"className":9804},[144],[38,9806,9808],{"className":9807,"style":9450},[148],[38,9809,9810,9813],{"style":151},[38,9811],{"className":9812,"style":156},[155],[38,9814,9816],{"className":9815},[160,161,162,163],[38,9817,1960],{"className":9818,"style":1842},[98,167,163],[38,9820,464],{"className":9821},[463],[38,9823,9825],{"className":9824},[144],[38,9826,9829],{"className":9827,"style":9828},[148],"height:0.93em;",[38,9830],{},[38,9832],{"className":9833},[1794,4348],[38,9835,1542],{"className":9836},[1794],[38,9838,351],{"className":9839},[537],[38,9841],{"className":9842,"style":541},[110],[38,9844,1390],{"className":9845,"style":1423},[98,167],[10,9847,9848],{},"In other words, attention computes its matching scores at the level of the words' sub-meanings: how similar two sub-meanings are determines the weight with which information is passed along.",[10,9850,9851,9852,10043],{},"The matrices 
",[38,9853,9855,9893],{"className":9854,"translate":42},[41],[38,9856,9858],{"className":9857},[46],[48,9859,9860],{"xmlns":50},[52,9861,9862,9890],{},[55,9863,9864,9866,9868,9874,9876,9882,9884],{},[58,9865,8628],{},[63,9867,380],{"separator":85},[330,9869,9870,9872],{},[58,9871,1812],{},[58,9873,6876],{},[63,9875,380],{"separator":85},[330,9877,9878,9880],{},[58,9879,1812],{},[58,9881,7056],{},[63,9883,380],{"separator":85},[330,9885,9886,9888],{},[58,9887,1812],{},[58,9889,61],{},[77,9891,9892],{"encoding":79},"X,W_q,W_k,W_v",[38,9894,9896],{"className":9895,"ariaHidden":85},[84],[38,9897,9899,9902,9905,9908,9911,9951,9954,9957,9997,10000,10003],{"className":9898},[89],[38,9900],{"className":9901,"style":9016},[93],[38,9903,8628],{"className":9904,"style":8665},[98,167],[38,9906,380],{"className":9907},[537],[38,9909],{"className":9910,"style":541},[110],[38,9912,9914,9917],{"className":9913},[98],[38,9915,1812],{"className":9916,"style":1842},[98,167],[38,9918,9920],{"className":9919},[136],[38,9921,9923,9943],{"className":9922},[140,436],[38,9924,9926,9940],{"className":9925},[144],[38,9927,9929],{"className":9928,"style":443},[148],[38,9930,9931,9934],{"style":9040},[38,9932],{"className":9933,"style":156},[155],[38,9935,9937],{"className":9936},[160,161,162,163],[38,9938,6876],{"className":9939,"style":9050},[98,167,163],[38,9941,464],{"className":9942},[463],[38,9944,9946],{"className":9945},[144],[38,9947,9949],{"className":9948,"style":471},[148],[38,9950],{},[38,9952,380],{"className":9953},[537],[38,9955],{"className":9956,"style":541},[110],[38,9958,9960,9963],{"className":9959},[98],[38,9961,1812],{"className":9962,"style":1842},[98,167],[38,9964,9966],{"className":9965},[136],[38,9967,9969,9989],{"className":9968},[140,436],[38,9970,9972,9986],{"className":9971},[144],[38,9973,9975],{"className":9974,"style":566},[148],[38,9976,9977,9980],{"style":9040},[38,9978],{"className":9979,"style":156},[155],[38,9981,9983],{"className":9982},[160,161,162,163],
[38,9984,7056],{"className":9985,"style":7489},[98,167,163],[38,9987,464],{"className":9988},[463],[38,9990,9992],{"className":9991},[144],[38,9993,9995],{"className":9994,"style":531},[148],[38,9996],{},[38,9998,380],{"className":9999},[537],[38,10001],{"className":10002,"style":541},[110],[38,10004,10006,10009],{"className":10005},[98],[38,10007,1812],{"className":10008,"style":1842},[98,167],[38,10010,10012],{"className":10011},[136],[38,10013,10015,10035],{"className":10014},[140,436],[38,10016,10018,10032],{"className":10017},[144],[38,10019,10021],{"className":10020,"style":443},[148],[38,10022,10023,10026],{"style":9040},[38,10024],{"className":10025,"style":156},[155],[38,10027,10029],{"className":10028},[160,161,162,163],[38,10030,61],{"className":10031,"style":9050},[98,167,163],[38,10033,464],{"className":10034},[463],[38,10036,10038],{"className":10037},[144],[38,10039,10041],{"className":10040,"style":531},[148],[38,10042],{}," above are all learnable: they are randomly initialized when training starts, and their entries are then updated by backpropagation and gradient descent. This is a single attention head.",[31,10045,10047],{"id":10046},"encoder-和多头注意力",
"Encoder and Multi-Head Attention",[10,10049,10050,10240],{},[38,10051,10053,10090],{"className":10052,"translate":42},[41],[38,10054,10056],{"className":10055},[46],[48,10057,10058],{"xmlns":50},[52,10059,10060,10088],{},[55,10061,10062,10064,10066,10072,10074,10080,10082],{},[58,10063,8628],{},[63,10065,380],{"separator":85},[330,10067,10068,10070],{},[58,10069,1812],{},[58,10071,6876],{},[63,10073,380],{"separator":85},[330,10075,10076,10078],{},[58,10077,1812],{},[58,10079,7056],{},[63,10081,380],{"separator":85},[330,10083,10084,10086],{},[58,10085,1812],{},[58,10087,61],{},[77,10089,9892],{"encoding":79},[38,10091,10093],{"className":10092,"ariaHidden":85},[84],[38,10094,10096,10099,10102,10105,10108,10148,10151,10154,10194,10197,10200],{"className":10095},[89],[38,10097],{"className":10098,"style":9016},[93],[38,10100,8628],{"className":10101,"style":8665},[98,167],[38,10103,380],{"className":10104},[537],[38,10106],{"className":10107,"style":541},[110],[38,10109,10111,10114],{"className":10110},[98],[38,10112,1812],{"className":10113,"style":1842},[98,167],[38,10115,10117],{"className":10116},[136],[38,10118,10120,10140],{"className":10119},[140,436],[38,10121,10123,10137],{"className":10122},[144],[38,10124,10126],{"className":10125,"style":443},[148],[38,10127,10128,10131],{"style":9040},[38,10129],{"className":10130,"style":156},[155],[38,10132,10134],{"className":10133},[160,161,162,163],[38,10135,6876],{"className":10136,"style":9050},[98,167,163],[38,10138,464],{"className":10139},[463],[38,10141,10143],{"className":10142},[144],[38,10144,10146],{"className":10145,"style":471},[148],[38,10147],{},[38,10149,380],{"className":10150},[537],[38,10152],{"className":10153,"style":541},[110],[38,10155,10157,10160],{"className":10156},[98],[38,10158,1812],{"className":10159,"style":1842},[98,167],[38,10161,10163],{"className":10162},[136],[38,10164,10166,10186],{"className":10165},[140,436],[38,10167,10169,10183],{"className":10168},[144],[38,10170,10172],{"className":10171,"style":566}
,[148],[38,10173,10174,10177],{"style":9040},[38,10175],{"className":10176,"style":156},[155],[38,10178,10180],{"className":10179},[160,161,162,163],[38,10181,7056],{"className":10182,"style":7489},[98,167,163],[38,10184,464],{"className":10185},[463],[38,10187,10189],{"className":10188},[144],[38,10190,10192],{"className":10191,"style":531},[148],[38,10193],{},[38,10195,380],{"className":10196},[537],[38,10198],{"className":10199,"style":541},[110],[38,10201,10203,10206],{"className":10202},[98],[38,10204,1812],{"className":10205,"style":1842},[98,167],[38,10207,10209],{"className":10208},[136],[38,10210,10212,10232],{"className":10211},[140,436],[38,10213,10215,10229],{"className":10214},[144],[38,10216,10218],{"className":10217,"style":443},[148],[38,10219,10220,10223],{"style":9040},[38,10221],{"className":10222,"style":156},[155],[38,10224,10226],{"className":10225},[160,161,162,163],[38,10227,61],{"className":10228,"style":9050},[98,167,163],[38,10230,464],{"className":10231},[463],[38,10233,10235],{"className":10234},[144],[38,10236,10238],{"className":10237,"style":531},[148],[38,10239],{}," are all learnable matrices that are randomly initialized at training time, which means that the parameters any single training run arrives at are not guaranteed to be optimal. Training on a different corpus also yields different parameters.",[10,10242,10243,10244,10434,10435,10464],{},"So we take the matrices 
",[38,10245,10247,10284],{"className":10246,"translate":42},[41],[38,10248,10250],{"className":10249},[46],[48,10251,10252],{"xmlns":50},[52,10253,10254,10282],{},[55,10255,10256,10258,10260,10266,10268,10274,10276],{},[58,10257,8628],{},[63,10259,380],{"separator":85},[330,10261,10262,10264],{},[58,10263,1812],{},[58,10265,6876],{},[63,10267,380],{"separator":85},[330,10269,10270,10272],{},[58,10271,1812],{},[58,10273,7056],{},[63,10275,380],{"separator":85},[330,10277,10278,10280],{},[58,10279,1812],{},[58,10281,61],{},[77,10283,9892],{"encoding":79},[38,10285,10287],{"className":10286,"ariaHidden":85},[84],[38,10288,10290,10293,10296,10299,10302,10342,10345,10348,10388,10391,10394],{"className":10289},[89],[38,10291],{"className":10292,"style":9016},[93],[38,10294,8628],{"className":10295,"style":8665},[98,167],[38,10297,380],{"className":10298},[537],[38,10300],{"className":10301,"style":541},[110],[38,10303,10305,10308],{"className":10304},[98],[38,10306,1812],{"className":10307,"style":1842},[98,167],[38,10309,10311],{"className":10310},[136],[38,10312,10314,10334],{"className":10313},[140,436],[38,10315,10317,10331],{"className":10316},[144],[38,10318,10320],{"className":10319,"style":443},[148],[38,10321,10322,10325],{"style":9040},[38,10323],{"className":10324,"style":156},[155],[38,10326,10328],{"className":10327},[160,161,162,163],[38,10329,6876],{"className":10330,"style":9050},[98,167,163],[38,10332,464],{"className":10333},[463],[38,10335,10337],{"className":10336},[144],[38,10338,10340],{"className":10339,"style":471},[148],[38,10341],{},[38,10343,380],{"className":10344},[537],[38,10346],{"className":10347,"style":541},[110],[38,10349,10351,10354],{"className":10350},[98],[38,10352,1812],{"className":10353,"style":1842},[98,167],[38,10355,10357],{"className":10356},[136],[38,10358,10360,10380],{"className":10359},[140,436],[38,10361,10363,10377],{"className":10362},[144],[38,10364,10366],{"className":10365,"style":566},[148],[38,10367,10368,10371],{"
style":9040},[38,10369],{"className":10370,"style":156},[155],[38,10372,10374],{"className":10373},[160,161,162,163],[38,10375,7056],{"className":10376,"style":7489},[98,167,163],[38,10378,464],{"className":10379},[463],[38,10381,10383],{"className":10382},[144],[38,10384,10386],{"className":10385,"style":531},[148],[38,10387],{},[38,10389,380],{"className":10390},[537],[38,10392],{"className":10393,"style":541},[110],[38,10395,10397,10400],{"className":10396},[98],[38,10398,1812],{"className":10399,"style":1842},[98,167],[38,10401,10403],{"className":10402},[136],[38,10404,10406,10426],{"className":10405},[140,436],[38,10407,10409,10423],{"className":10408},[144],[38,10410,10412],{"className":10411,"style":443},[148],[38,10413,10414,10417],{"style":9040},[38,10415],{"className":10416,"style":156},[155],[38,10418,10420],{"className":10419},[160,161,162,163],[38,10421,61],{"className":10422,"style":9050},[98,167,163],[38,10424,464],{"className":10425},[463],[38,10427,10429],{"className":10428},[144],[38,10430,10432],{"className":10431,"style":531},[148],[38,10433],{}," are copied several times and randomly initialized with different seeds, so that several attention heads are trained simultaneously. If there are ",[38,10436,10438,10452],{"className":10437,"translate":42},[41],[38,10439,10441],{"className":10440},[46],[48,10442,10443],{"xmlns":50},[52,10444,10445,10450],{},[55,10446,10447],{},[58,10448,10449],{},"h",[77,10451,10449],{"encoding":79},[38,10453,10455],{"className":10454,"ariaHidden":85},[84],[38,10456,10458,10461],{"className":10457},[89],[38,10459],{"className":10460,"style":290},[93],[38,10462,10449],{"className":10463},[98,167]," 
heads, the attention of each head is computed as",[38,10466,10468],{"className":10467,"translate":42},[315],[38,10469,10471,10520],{"className":10470,"translate":42},[41],[38,10472,10474],{"className":10473},[46],[48,10475,10476],{"xmlns":50,"display":324},[52,10477,10478,10517],{},[55,10479,10480,10487,10489,10491,10493,10499,10501,10507,10509,10515],{},[330,10481,10482,10485],{},[335,10483,10484],{},"head",[58,10486,1977],{},[63,10488,340],{},[335,10490,9223],{},[63,10492,1500],{"stretchy":1499},[330,10494,10495,10497],{},[58,10496,8941],{},[58,10498,1977],{},[63,10500,380],{"separator":85},[330,10502,10503,10505],{},[58,10504,8958],{},[58,10506,1977],{},[63,10508,380],{"separator":85},[330,10510,10511,10513],{},[58,10512,1390],{},[58,10514,1977],{},[63,10516,1542],{"stretchy":1499},[77,10518,10519],{"encoding":79},"\\text{head}_i = \\text{Attention}(Q_i,K_i,V_i)",[38,10521,10523,10582],{"className":10522,"ariaHidden":85},[84],[38,10524,10526,10530,10573,10576,10579],{"className":10525},[89],[38,10527],{"className":10528,"style":10529},[93],"height:0.8444em;vertical-align:-0.15em;",[38,10531,10533,10539],{"className":10532},[98],[38,10534,10536],{"className":10535},[98,456],[38,10537,10484],{"className":10538},[98],[38,10540,10542],{"className":10541},[136],[38,10543,10545,10565],{"className":10544},[140,436],[38,10546,10548,10562],{"className":10547},[144],[38,10549,10551],{"className":10550,"style":2196},[148],[38,10552,10553,10556],{"style":446},[38,10554],{"className":10555,"style":156},[155],[38,10557,10559],{"className":10558},[160,161,162,163],[38,10560,1977],{"className":10561},[98,167,163],[38,10563,464],{"className":10564},[463],[38,10566,10568],{"className":10567},[144],[38,10569,10571],{"className":10570,"style":531},[148],[38,10572],{},[38,10574],{"className":10575,"style":111},[110],[38,10577,340],{"className":10578},[115],[38,10580],{"className":10581,"style":111},[110],[38,10583,10585,10588,10594,10597,10637,10640,10643,10684,10687,10690,10731],{"className":10584},[89],[38,10586],{"
className":10587,"style":1574},[93],[38,10589,10591],{"className":10590},[98,456],[38,10592,9223],{"className":10593},[98],[38,10595,1500],{"className":10596},[1578],[38,10598,10600,10603],{"className":10599},[98],[38,10601,8941],{"className":10602},[98,167],[38,10604,10606],{"className":10605},[136],[38,10607,10609,10629],{"className":10608},[140,436],[38,10610,10612,10626],{"className":10611},[144],[38,10613,10615],{"className":10614,"style":2196},[148],[38,10616,10617,10620],{"style":8331},[38,10618],{"className":10619,"style":156},[155],[38,10621,10623],{"className":10622},[160,161,162,163],[38,10624,1977],{"className":10625},[98,167,163],[38,10627,464],{"className":10628},[463],[38,10630,10632],{"className":10631},[144],[38,10633,10635],{"className":10634,"style":531},[148],[38,10636],{},[38,10638,380],{"className":10639},[537],[38,10641],{"className":10642,"style":541},[110],[38,10644,10646,10649],{"className":10645},[98],[38,10647,8958],{"className":10648,"style":9074},[98,167],[38,10650,10652],{"className":10651},[136],[38,10653,10655,10676],{"className":10654},[140,436],[38,10656,10658,10673],{"className":10657},[144],[38,10659,10661],{"className":10660,"style":2196},[148],[38,10662,10664,10667],{"style":10663},"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;",[38,10665],{"className":10666,"style":156},[155],[38,10668,10670],{"className":10669},[160,161,162,163],[38,10671,1977],{"className":10672},[98,167,163],[38,10674,464],{"className":10675},[463],[38,10677,10679],{"className":10678},[144],[38,10680,10682],{"className":10681,"style":531},[148],[38,10683],{},[38,10685,380],{"className":10686},[537],[38,10688],{"className":10689,"style":541},[110],[38,10691,10693,10696],{"className":10692},[98],[38,10694,1390],{"className":10695,"style":1423},[98,167],[38,10697,10699],{"className":10698},[136],[38,10700,10702,10723],{"className":10701},[140,436],[38,10703,10705,10720],{"className":10704},[144],[38,10706,10708],{"className":10707,"style":2196},[148],
[38,10709,10711,10714],{"style":10710},"top:-2.55em;margin-left:-0.2222em;margin-right:0.05em;",[38,10712],{"className":10713,"style":156},[155],[38,10715,10717],{"className":10716},[160,161,162,163],[38,10718,1977],{"className":10719},[98,167,163],[38,10721,464],{"className":10722},[463],[38,10724,10726],{"className":10725},[144],[38,10727,10729],{"className":10728,"style":531},[148],[38,10730],{},[38,10732,1542],{"className":10733},[1794],[10,10735,10736,10737,10765],{},"The outputs of all ",[38,10738,10740,10753],{"className":10739,"translate":42},[41],[38,10741,10743],{"className":10742},[46],[48,10744,10745],{"xmlns":50},[52,10746,10747,10751],{},[55,10748,10749],{},[58,10750,10449],{},[77,10752,10449],{"encoding":79},[38,10754,10756],{"className":10755,"ariaHidden":85},[84],[38,10757,10759,10762],{"className":10758},[89],[38,10760],{"className":10761,"style":290},[93],[38,10763,10449],{"className":10764},[98,167]," heads are then concatenated:",[38,10767,10769],{"className":10768,"translate":42},[315],[38,10770,10772,10818],{"className":10771,"translate":42},[41],[38,10773,10775],{"className":10774},[46],[48,10776,10777],{"xmlns":50,"display":324},[52,10778,10779,10815],{},[55,10780,10781,10784,10786,10789,10791,10797,10799,10801,10803,10805,10807,10813],{},[335,10782,10783],{},"MultiHead",[63,10785,340],{},[335,10787,10788],{},"Concat",[63,10790,1500],{"stretchy":1499},[330,10792,10793,10795],{},[335,10794,10484],{},[203,10796,348],{},[63,10798,380],{"separator":85},[58,10800,1527],{"mathvariant":400},[58,10802,1527],{"mathvariant":400},[58,10804,1527],{"mathvariant":400},[63,10806,380],{"separator":85},[330,10808,10809,10811],{},[335,10810,10484],{},[58,10812,10449],{},[63,10814,1542],{"stretchy":1499},[77,10816,10817],{"encoding":79},"\\text{MultiHead} = 
\\text{Concat}(\\text{head}_1,...,\\text{head}_h)",[38,10819,10821,10842],{"className":10820,"ariaHidden":85},[84],[38,10822,10824,10827,10833,10836,10839],{"className":10823},[89],[38,10825],{"className":10826,"style":290},[93],[38,10828,10830],{"className":10829},[98,456],[38,10831,10783],{"className":10832},[98],[38,10834],{"className":10835,"style":111},[110],[38,10837,340],{"className":10838},[115],[38,10840],{"className":10841,"style":111},[110],[38,10843,10845,10848,10854,10857,10900,10903,10906,10909,10912,10915,10958],{"className":10844},[89],[38,10846],{"className":10847,"style":1574},[93],[38,10849,10851],{"className":10850},[98,456],[38,10852,10788],{"className":10853},[98],[38,10855,1500],{"className":10856},[1578],[38,10858,10860,10866],{"className":10859},[98],[38,10861,10863],{"className":10862},[98,456],[38,10864,10484],{"className":10865},[98],[38,10867,10869],{"className":10868},[136],[38,10870,10872,10892],{"className":10871},[140,436],[38,10873,10875,10889],{"className":10874},[144],[38,10876,10878],{"className":10877,"style":509},[148],[38,10879,10880,10883],{"style":446},[38,10881],{"className":10882,"style":156},[155],[38,10884,10886],{"className":10885},[160,161,162,163],[38,10887,348],{"className":10888},[98,163],[38,10890,464],{"className":10891},[463],[38,10893,10895],{"className":10894},[144],[38,10896,10898],{"className":10897,"style":531},[148],[38,10899],{},[38,10901,380],{"className":10902},[537],[38,10904],{"className":10905,"style":541},[110],[38,10907,1738],{"className":10908},[98],[38,10910,380],{"className":10911},[537],[38,10913],{"className":10914,"style":541},[110],[38,10916,10918,10924],{"className":10917},[98],[38,10919,10921],{"className":10920},[98,456],[38,10922,10484],{"className":10923},[98],[38,10925,10927],{"className":10926},[136],[38,10928,10930,10950],{"className":10929},[140,436],[38,10931,10933,10947],{"className":10932},[144],[38,10934,10936],{"className":10935,"style":566},[148],[38,10937,10938,10941],{"style"
:446},[38,10939],{"className":10940,"style":156},[155],[38,10942,10944],{"className":10943},[160,161,162,163],[38,10945,10449],{"className":10946},[98,167,163],[38,10948,464],{"className":10949},[463],[38,10951,10953],{"className":10952},[144],[38,10954,10956],{"className":10955,"style":531},[148],[38,10957],{},[38,10959,1542],{"className":10960},[1794],[10,10962,10963,10964,11039,11040,11068],{},"At this point, however, the multi-head output is very wide, with shape ",[38,10965,10967,10993],{"className":10966,"translate":42},[41],[38,10968,10970],{"className":10969},[46],[48,10971,10972],{"xmlns":50},[52,10973,10974,10990],{},[55,10975,10976,10978,10980,10982,10984,10986,10988],{},[58,10977,1401],{},[63,10979,1404],{},[63,10981,1500],{"stretchy":1499},[58,10983,75],{},[63,10985,351],{"separator":85},[58,10987,10449],{},[63,10989,1542],{"stretchy":1499},[77,10991,10992],{"encoding":79},"n \\times (d · h)",[38,10994,10996,11015],{"className":10995,"ariaHidden":85},[84],[38,10997,10999,11003,11006,11009,11012],{"className":10998},[89],[38,11000],{"className":11001,"style":11002},[93],"height:0.6667em;vertical-align:-0.0833em;",[38,11004,1401],{"className":11005},[98,167],[38,11007],{"className":11008,"style":595},[110],[38,11010,1404],{"className":11011},[599],[38,11013],{"className":11014,"style":595},[110],[38,11016,11018,11021,11024,11027,11030,11033,11036],{"className":11017},[89],[38,11019],{"className":11020,"style":1574},[93],[38,11022,1500],{"className":11023},[1578],[38,11025,75],{"className":11026},[98,167],[38,11028,351],{"className":11029},[537],[38,11031],{"className":11032,"style":541},[110],[38,11034,10449],{"className":11035},[98,167],[38,11037,1542],{"className":11038},[1794],". The Transformer later adds residual connections and a feed-forward network, which require the output feature dimension to be the same as that of the input 
",[38,11041,11043,11056],{"className":11042,"translate":42},[41],[38,11044,11046],{"className":11045},[46],[48,11047,11048],{"xmlns":50},[52,11049,11050,11054],{},[55,11051,11052],{},[58,11053,8628],{},[77,11055,8628],{"encoding":79},[38,11057,11059],{"className":11058,"ariaHidden":85},[84],[38,11060,11062,11065],{"className":11061},[89],[38,11063],{"className":11064,"style":1555},[93],[38,11066,8628],{"className":11067,"style":8665},[98,167],".",[10,11070,11071,11072,11231,11232,11260],{},"We therefore introduce a learnable linear projection matrix ",[38,11073,11075,11116],{"className":11074,"translate":42},[41],[38,11076,11078],{"className":11077},[46],[48,11079,11080],{"xmlns":50},[52,11081,11082,11113],{},[55,11083,11084,11091,11093],{},[330,11085,11086,11088],{},[58,11087,1812],{},[58,11089,11090],{},"O",[63,11092,65],{},[67,11094,11095,11097],{},[58,11096,72],{"mathvariant":71},[55,11098,11099,11101,11103,11105,11107,11109,11111],{},[63,11100,1500],{"stretchy":1499},[58,11102,10449],{},[63,11104,351],{},[58,11106,75],{},[63,11108,1542],{"stretchy":1499},[63,11110,1404],{},[58,11112,75],{},[77,11114,11115],{"encoding":79},"W_O \\in \\mathbb{R}^{(h \\cdot d) \\times 
d}",[38,11117,11119,11175],{"className":11118,"ariaHidden":85},[84],[38,11120,11122,11125,11166,11169,11172],{"className":11121},[89],[38,11123],{"className":11124,"style":8310},[93],[38,11126,11128,11131],{"className":11127},[98],[38,11129,1812],{"className":11130,"style":1842},[98,167],[38,11132,11134],{"className":11133},[136],[38,11135,11137,11158],{"className":11136},[140,436],[38,11138,11140,11155],{"className":11139},[144],[38,11141,11144],{"className":11142,"style":11143},[148],"height:0.3283em;",[38,11145,11146,11149],{"style":9040},[38,11147],{"className":11148,"style":156},[155],[38,11150,11152],{"className":11151},[160,161,162,163],[38,11153,11090],{"className":11154,"style":3520},[98,167,163],[38,11156,464],{"className":11157},[463],[38,11159,11161],{"className":11160},[144],[38,11162,11164],{"className":11163,"style":531},[148],[38,11165],{},[38,11167],{"className":11168,"style":111},[110],[38,11170,65],{"className":11171},[115],[38,11173],{"className":11174,"style":111},[110],[38,11176,11178,11181],{"className":11177},[89],[38,11179],{"className":11180,"style":4177},[93],[38,11182,11184,11187],{"className":11183},[98],[38,11185,72],{"className":11186},[98,132],[38,11188,11190],{"className":11189},[136],[38,11191,11193],{"className":11192},[140],[38,11194,11196],{"className":11195},[144],[38,11197,11199],{"className":11198,"style":4177},[148],[38,11200,11201,11204],{"style":151},[38,11202],{"className":11203,"style":156},[155],[38,11205,11207],{"className":11206},[160,161,162,163],[38,11208,11210,11213,11216,11219,11222,11225,11228],{"className":11209},[98,163],[38,11211,1500],{"className":11212},[1578,163],[38,11214,10449],{"className":11215},[98,167,163],[38,11217,351],{"className":11218},[599,163],[38,11220,75],{"className":11221},[98,167,163],[38,11223,1542],{"className":11224},[1794,163],[38,11226,1404],{"className":11227},[599,163],[38,11229,75],{"className":11230},[98,167,163],", which maps the concatenated multi-head output back to the original model dimension 
",[38,11233,11235,11248],{"className":11234,"translate":42},[41],[38,11236,11238],{"className":11237},[46],[48,11239,11240],{"xmlns":50},[52,11241,11242,11246],{},[55,11243,11244],{},[58,11245,75],{},[77,11247,75],{"encoding":79},[38,11249,11251],{"className":11250,"ariaHidden":85},[84],[38,11252,11254,11257],{"className":11253},[89],[38,11255],{"className":11256,"style":290},[93],[38,11258,75],{"className":11259},[98,167],". This matrix is also updated by backpropagation.",[10,11262,11263],{},"Concatenating the heads and then applying this linear map gives",[38,11265,11267],{"className":11266,"translate":42},[315],[38,11268,11270,11322],{"className":11269,"translate":42},[41],[38,11271,11273],{"className":11272},[46],[48,11274,11275],{"xmlns":50,"display":324},[52,11276,11277,11319],{},[55,11278,11279,11281,11283,11285,11287,11293,11295,11297,11299,11301,11303,11309,11311,11313],{},[335,11280,10783],{},[63,11282,340],{},[335,11284,10788],{},[63,11286,1500],{"stretchy":1499},[330,11288,11289,11291],{},[335,11290,10484],{},[203,11292,348],{},[63,11294,380],{"separator":85},[58,11296,1527],{"mathvariant":400},[58,11298,1527],{"mathvariant":400},[58,11300,1527],{"mathvariant":400},[63,11302,380],{"separator":85},[330,11304,11305,11307],{},[335,11306,10484],{},[58,11308,10449],{},[63,11310,1542],{"stretchy":1499},[63,11312,351],{"separator":85},[330,11314,11315,11317],{},[58,11316,1812],{},[58,11318,11090],{},[77,11320,11321],{"encoding":79},"\\text{MultiHead} = \\text{Concat}(\\text{head}_1,...,\\text{head}_h) · 
W_O",[38,11323,11325,11346],{"className":11324,"ariaHidden":85},[84],[38,11326,11328,11331,11337,11340,11343],{"className":11327},[89],[38,11329],{"className":11330,"style":290},[93],[38,11332,11334],{"className":11333},[98,456],[38,11335,10783],{"className":11336},[98],[38,11338],{"className":11339,"style":111},[110],[38,11341,340],{"className":11342},[115],[38,11344],{"className":11345,"style":111},[110],[38,11347,11349,11352,11358,11361,11404,11407,11410,11413,11416,11419,11462,11465,11468,11471],{"className":11348},[89],[38,11350],{"className":11351,"style":1574},[93],[38,11353,11355],{"className":11354},[98,456],[38,11356,10788],{"className":11357},[98],[38,11359,1500],{"className":11360},[1578],[38,11362,11364,11370],{"className":11363},[98],[38,11365,11367],{"className":11366},[98,456],[38,11368,10484],{"className":11369},[98],[38,11371,11373],{"className":11372},[136],[38,11374,11376,11396],{"className":11375},[140,436],[38,11377,11379,11393],{"className":11378},[144],[38,11380,11382],{"className":11381,"style":509},[148],[38,11383,11384,11387],{"style":446},[38,11385],{"className":11386,"style":156},[155],[38,11388,11390],{"className":11389},[160,161,162,163],[38,11391,348],{"className":11392},[98,163],[38,11394,464],{"className":11395},[463],[38,11397,11399],{"className":11398},[144],[38,11400,11402],{"className":11401,"style":531},[148],[38,11403],{},[38,11405,380],{"className":11406},[537],[38,11408],{"className":11409,"style":541},[110],[38,11411,1738],{"className":11412},[98],[38,11414,380],{"className":11415},[537],[38,11417],{"className":11418,"style":541},[110],[38,11420,11422,11428],{"className":11421},[98],[38,11423,11425],{"className":11424},[98,456],[38,11426,10484],{"className":11427},[98],[38,11429,11431],{"className":11430},[136],[38,11432,11434,11454],{"className":11433},[140,436],[38,11435,11437,11451],{"className":11436},[144],[38,11438,11440],{"className":11439,"style":566},[148],[38,11441,11442,11445],{"style":446},[38,11443],{"className
":11444,"style":156},[155],[38,11446,11448],{"className":11447},[160,161,162,163],[38,11449,10449],{"className":11450},[98,167,163],[38,11452,464],{"className":11453},[463],[38,11455,11457],{"className":11456},[144],[38,11458,11460],{"className":11459,"style":531},[148],[38,11461],{},[38,11463,1542],{"className":11464},[1794],[38,11466,351],{"className":11467},[537],[38,11469],{"className":11470,"style":541},[110],[38,11472,11474,11477],{"className":11473},[98],[38,11475,1812],{"className":11476,"style":1842},[98,167],[38,11478,11480],{"className":11479},[136],[38,11481,11483,11503],{"className":11482},[140,436],[38,11484,11486,11500],{"className":11485},[144],[38,11487,11489],{"className":11488,"style":11143},[148],[38,11490,11491,11494],{"style":9040},[38,11492],{"className":11493,"style":156},[155],[38,11495,11497],{"className":11496},[160,161,162,163],[38,11498,11090],{"className":11499,"style":3520},[98,167,163],[38,11501,464],{"className":11502},[463],[38,11504,11506],{"className":11505},[144],[38,11507,11509],{"className":11508,"style":531},[148],[38,11510],{},[31,11512,11514],{"id":11513},"高阶注意力","*Higher-Order Attention",[10,11516,11517,11518,11614,11615,11643,11644,11672,11673,11724,11725,11755],{},"After the multi-head step above, we obtain an output ",[38,11519,11521,11549],{"className":11520,"translate":42},[41],[38,11522,11524],{"className":11523},[46],[48,11525,11526],{"xmlns":50},[52,11527,11528,11546],{},[55,11529,11530,11532,11534],{},[58,11531,11090],{},[63,11533,65],{},[67,11535,11536,11538],{},[58,11537,72],{"mathvariant":71},[55,11539,11540,11542,11544],{},[58,11541,1401],{},[63,11543,1404],{},[58,11545,75],{},[77,11547,11548],{"encoding":79},"O \\in \\mathbb{R}^{n \\times 
d}",[38,11550,11552,11570],{"className":11551,"ariaHidden":85},[84],[38,11553,11555,11558,11561,11564,11567],{"className":11554},[89],[38,11556],{"className":11557,"style":1419},[93],[38,11559,11090],{"className":11560,"style":3520},[98,167],[38,11562],{"className":11563,"style":111},[110],[38,11565,65],{"className":11566},[115],[38,11568],{"className":11569,"style":111},[110],[38,11571,11573,11576],{"className":11572},[89],[38,11574],{"className":11575,"style":125},[93],[38,11577,11579,11582],{"className":11578},[98],[38,11580,72],{"className":11581},[98,132],[38,11583,11585],{"className":11584},[136],[38,11586,11588],{"className":11587},[140],[38,11589,11591],{"className":11590},[144],[38,11592,11594],{"className":11593,"style":125},[148],[38,11595,11596,11599],{"style":151},[38,11597],{"className":11598,"style":156},[155],[38,11600,11602],{"className":11601},[160,161,162,163],[38,11603,11605,11608,11611],{"className":11604},[98,163],[38,11606,1401],{"className":11607},[98,167,163],[38,11609,1404],{"className":11610},[599,163],[38,11612,75],{"className":11613},[98,167,163]," whose feature dimension matches that of the input ",[38,11616,11618,11631],{"className":11617,"translate":42},[41],[38,11619,11621],{"className":11620},[46],[48,11622,11623],{"xmlns":50},[52,11624,11625,11629],{},[55,11626,11627],{},[58,11628,8628],{},[77,11630,8628],{"encoding":79},[38,11632,11634],{"className":11633,"ariaHidden":85},[84],[38,11635,11637,11640],{"className":11636},[89],[38,11638],{"className":11639,"style":1555},[93],[38,11641,8628],{"className":11642,"style":8665},[98,167],". We can therefore take the output 
",[38,11645,11647,11660],{"className":11646,"translate":42},[41],[38,11648,11650],{"className":11649},[46],[48,11651,11652],{"xmlns":50},[52,11653,11654,11658],{},[55,11655,11656],{},[58,11657,11090],{},[77,11659,11090],{"encoding":79},[38,11661,11663],{"className":11662,"ariaHidden":85},[84],[38,11664,11666,11669],{"className":11665},[89],[38,11667],{"className":11668,"style":1555},[93],[38,11670,11090],{"className":11671,"style":3520},[98,167]," as the new input ",[38,11674,11676,11694],{"className":11675,"translate":42},[41],[38,11677,11679],{"className":11678},[46],[48,11680,11681],{"xmlns":50},[52,11682,11683,11691],{},[55,11684,11685,11687,11689],{},[58,11686,8628],{},[63,11688,340],{},[58,11690,11090],{},[77,11692,11693],{"encoding":79},"X = O",[38,11695,11697,11715],{"className":11696,"ariaHidden":85},[84],[38,11698,11700,11703,11706,11709,11712],{"className":11699},[89],[38,11701],{"className":11702,"style":1555},[93],[38,11704,8628],{"className":11705,"style":8665},[98,167],[38,11707],{"className":11708,"style":111},[110],[38,11710,340],{"className":11711},[115],[38,11713],{"className":11714,"style":111},[110],[38,11716,11718,11721],{"className":11717},[89],[38,11719],{"className":11720,"style":1555},[93],[38,11722,11090],{"className":11723,"style":3520},[98,167]," and run another round of multi-head attention. Repeating this for ",[38,11726,11728,11742],{"className":11727,"translate":42},[41],[38,11729,11731],{"className":11730},[46],[48,11732,11733],{"xmlns":50},[52,11734,11735,11740],{},[55,11736,11737],{},[58,11738,11739],{},"l",[77,11741,11739],{"encoding":79},[38,11743,11745],{"className":11744,"ariaHidden":85},[84],[38,11746,11748,11751],{"className":11747},[89],[38,11749],{"className":11750,"style":290},[93],[38,11752,11739],{"className":11753,"style":11754},[98,167],"margin-right:0.01968em;"," 
consecutive rounds yields the outputs:",[38,11757,11759],{"className":11758,"translate":42},[315],[38,11760,11762,11804],{"className":11761,"translate":42},[41],[38,11763,11765],{"className":11764},[46],[48,11766,11767],{"xmlns":50,"display":324},[52,11768,11769,11801],{},[55,11770,11771,11777,11779,11785,11787,11789,11791,11793,11795],{},[330,11772,11773,11775],{},[58,11774,11090],{},[203,11776,348],{},[63,11778,380],{"separator":85},[330,11780,11781,11783],{},[58,11782,11090],{},[203,11784,368],{},[63,11786,380],{"separator":85},[58,11788,1527],{"mathvariant":400},[58,11790,1527],{"mathvariant":400},[58,11792,1527],{"mathvariant":400},[63,11794,380],{"separator":85},[330,11796,11797,11799],{},[58,11798,11090],{},[58,11800,11739],{},[77,11802,11803],{"encoding":79},"O_1,O_2,...,O_l",[38,11805,11807],{"className":11806,"ariaHidden":85},[84],[38,11808,11810,11813,11853,11856,11859,11899,11902,11905,11908,11911,11914],{"className":11809},[89],[38,11811],{"className":11812,"style":8997},[93],[38,11814,11816,11819],{"className":11815},[98],[38,11817,11090],{"className":11818,"style":3520},[98,167],[38,11820,11822],{"className":11821},[136],[38,11823,11825,11845],{"className":11824},[140,436],[38,11826,11828,11842],{"className":11827},[144],[38,11829,11831],{"className":11830,"style":509},[148],[38,11832,11833,11836],{"style":3535},[38,11834],{"className":11835,"style":156},[155],[38,11837,11839],{"className":11838},[160,161,162,163],[38,11840,348],{"className":11841},[98,163],[38,11843,464],{"className":11844},[463],[38,11846,11848],{"className":11847},[144],[38,11849,11851],{"className":11850,"style":531},[148],[38,11852],{},[38,11854,380],{"className":11855},[537],[38,11857],{"className":11858,"style":541},[110],[38,11860,11862,11865],{"className":11861},[98],[38,11863,11090],{"className":11864,"style":3520},[98,167],[38,11866,11868],{"className":11867},[136],[38,11869,11871,11891],{"className":11870},[140,436],[38,11872,11874,11888],{"className":11873},[144],[38,11875,11877],{"className":11876,"
style":509},[148],[38,11878,11879,11882],{"style":3535},[38,11880],{"className":11881,"style":156},[155],[38,11883,11885],{"className":11884},[160,161,162,163],[38,11886,368],{"className":11887},[98,163],[38,11889,464],{"className":11890},[463],[38,11892,11894],{"className":11893},[144],[38,11895,11897],{"className":11896,"style":531},[148],[38,11898],{},[38,11900,380],{"className":11901},[537],[38,11903],{"className":11904,"style":541},[110],[38,11906,1738],{"className":11907},[98],[38,11909,380],{"className":11910},[537],[38,11912],{"className":11913,"style":541},[110],[38,11915,11917,11920],{"className":11916},[98],[38,11918,11090],{"className":11919,"style":3520},[98,167],[38,11921,11923],{"className":11922},[136],[38,11924,11926,11946],{"className":11925},[140,436],[38,11927,11929,11943],{"className":11928},[144],[38,11930,11932],{"className":11931,"style":566},[148],[38,11933,11934,11937],{"style":3535},[38,11935],{"className":11936,"style":156},[155],[38,11938,11940],{"className":11939},[160,161,162,163],[38,11941,11739],{"className":11942,"style":11754},[98,167,163],[38,11944,464],{"className":11945},[463],[38,11947,11949],{"className":11948},[144],[38,11950,11952],{"className":11951,"style":531},[148],[38,11953],{},[10,11955,11956,11957,12027,12028,12056],{},"The final output 
",[38,11958,11960,11978],{"className":11959,"translate":42},[41],[38,11961,11963],{"className":11962},[46],[48,11964,11965],{"xmlns":50},[52,11966,11967,11975],{},[55,11968,11969],{},[330,11970,11971,11973],{},[58,11972,11090],{},[58,11974,11739],{},[77,11976,11977],{"encoding":79},"O_l",[38,11979,11981],{"className":11980,"ariaHidden":85},[84],[38,11982,11984,11987],{"className":11983},[89],[38,11985],{"className":11986,"style":8310},[93],[38,11988,11990,11993],{"className":11989},[98],[38,11991,11090],{"className":11992,"style":3520},[98,167],[38,11994,11996],{"className":11995},[136],[38,11997,11999,12019],{"className":11998},[140,436],[38,12000,12002,12016],{"className":12001},[144],[38,12003,12005],{"className":12004,"style":566},[148],[38,12006,12007,12010],{"style":3535},[38,12008],{"className":12009,"style":156},[155],[38,12011,12013],{"className":12012},[160,161,162,163],[38,12014,11739],{"className":12015,"style":11754},[98,167,163],[38,12017,464],{"className":12018},[463],[38,12020,12022],{"className":12021},[144],[38,12023,12025],{"className":12024,"style":531},[148],[38,12026],{}," is the model's higher-order representation of the sequence, combining the complex semantics and dependencies captured over multiple rounds of attention. Roughly speaking, the first multi-head attention layer captures local, short-range dependencies; the second builds on the previous layer's representation to capture higher-level compositional patterns; and so on up to layer ",[38,12029,12031,12044],{"className":12030,"translate":42},[41],[38,12032,12034],{"className":12033},[46],[48,12035,12036],{"xmlns":50},[52,12037,12038,12042],{},[55,12039,12040],{},[58,12041,11739],{},[77,12043,11739],{"encoding":79},[38,12045,12047],{"className":12046,"ariaHidden":85},[84],[38,12048,12050,12053],{"className":12049},[89],[38,12051],{"className":12052,"style":290},[93],[38,12054,11739],{"className":12055,"style":11754},[98,167],", which yields vector representations carrying global, higher-order semantics.",[10,12058,12059],{},"In practice, each layer usually comes with a residual connection + 
LayerNorm:",[38,12061,12063],{"className":12062,"translate":42},[315],[38,12064,12066,12111],{"className":12065,"translate":42},[41],[38,12067,12069],{"className":12068},[46],[48,12070,12071],{"xmlns":50,"display":324},[52,12072,12073,12108],{},[55,12074,12075,12085,12087,12090,12092,12098,12100,12106],{},[330,12076,12077,12083],{},[6116,12078,12079,12081],{"accent":85},[58,12080,11090],{},[63,12082,6122],{},[58,12084,1977],{},[63,12086,340],{},[335,12088,12089],{},"LayerNorm",[63,12091,1500],{"stretchy":1499},[330,12093,12094,12096],{},[58,12095,8628],{},[58,12097,1977],{},[63,12099,361],{},[330,12101,12102,12104],{},[58,12103,11090],{},[58,12105,1977],{},[63,12107,1542],{"stretchy":1499},[77,12109,12110],{"encoding":79},"\\tilde{O}_i = \\text{LayerNorm}(X_i + O_i)",[38,12112,12114,12200,12265],{"className":12113,"ariaHidden":85},[84],[38,12115,12117,12121,12191,12194,12197],{"className":12116},[89],[38,12118],{"className":12119,"style":12120},[93],"height:1.0702em;vertical-align:-0.15em;",[38,12122,12124,12157],{"className":12123},[98],[38,12125,12127],{"className":12126},[98,6159],[38,12128,12130],{"className":12129},[140],[38,12131,12133],{"className":12132},[144],[38,12134,12137,12145],{"className":12135,"style":12136},[148],"height:0.9202em;",[38,12138,12139,12142],{"style":6172},[38,12140],{"className":12141,"style":3510},[155],[38,12143,11090],{"className":12144,"style":3520},[98,167],[38,12146,12148,12151],{"style":12147},"top:-3.6023em;",[38,12149],{"className":12150,"style":3510},[155],[38,12152,12154],{"className":12153,"style":6189},[6188],[38,12155,6122],{"className":12156},[98],[38,12158,12160],{"className":12159},[136],[38,12161,12163,12183],{"className":12162},[140,436],[38,12164,12166,12180],{"className":12165},[144],[38,12167,12169],{"className":12168,"style":2196},[148],[38,12170,12171,12174],{"style":3535},[38,12172],{"className":12173,"style":156},[155],[38,12175,12177],{"className":12176},[160,161,162,163],[38,12178,1977],{"className":12179},
[98,167,163],[38,12181,464],{"className":12182},[463],[38,12184,12186],{"className":12185},[144],[38,12187,12189],{"className":12188,"style":531},[148],[38,12190],{},[38,12192],{"className":12193,"style":111},[110],[38,12195,340],{"className":12196},[115],[38,12198],{"className":12199,"style":111},[110],[38,12201,12203,12206,12212,12215,12256,12259,12262],{"className":12202},[89],[38,12204],{"className":12205,"style":1574},[93],[38,12207,12209],{"className":12208},[98,456],[38,12210,12089],{"className":12211},[98],[38,12213,1500],{"className":12214},[1578],[38,12216,12218,12221],{"className":12217},[98],[38,12219,8628],{"className":12220,"style":8665},[98,167],[38,12222,12224],{"className":12223},[136],[38,12225,12227,12248],{"className":12226},[140,436],[38,12228,12230,12245],{"className":12229},[144],[38,12231,12233],{"className":12232,"style":2196},[148],[38,12234,12236,12239],{"style":12235},"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em;",[38,12237],{"className":12238,"style":156},[155],[38,12240,12242],{"className":12241},[160,161,162,163],[38,12243,1977],{"className":12244},[98,167,163],[38,12246,464],{"className":12247},[463],[38,12249,12251],{"className":12250},[144],[38,12252,12254],{"className":12253,"style":531},[148],[38,12255],{},[38,12257],{"className":12258,"style":595},[110],[38,12260,361],{"className":12261},[599],[38,12263],{"className":12264,"style":595},[110],[38,12266,12268,12271,12311],{"className":12267},[89],[38,12269],{"className":12270,"style":1574},[93],[38,12272,12274,12277],{"className":12273},[98],[38,12275,11090],{"className":12276,"style":3520},[98,167],[38,12278,12280],{"className":12279},[136],[38,12281,12283,12303],{"className":12282},[140,436],[38,12284,12286,12300],{"className":12285},[144],[38,12287,12289],{"className":12288,"style":2196},[148],[38,12290,12291,12294],{"style":3535},[38,12292],{"className":12293,"style":156},[155],[38,12295,12297],{"className":12296},[160,161,162,163],[38,12298,1977],{"className":12299},
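[98,167,163],[38,12301,464],{"className":12302},[463],[38,12304,12306],{"className":12305},[144],[38,12307,12309],{"className":12308,"style":531},[148],[38,12310],{},[38,12312,1542],{"className":12313},[1794],[10,12315,12316],{},"This keeps gradients stable and lets information flow through the stack. Higher-order attention is simply multi-head attention layers stacked in sequence, so that the model can capture increasingly complex and abstract semantics and dependencies layer by layer.",[10,12318,12319,12320,12327],{},"Research suggests that lower layers mainly capture syntactic, grammatical, and lexical structure, while higher layers mainly capture semantic information. The paper ",[14,12321,12322],{},[17,12323,12326],{"href":12324,"rel":12325},"https:\u002F\u002Faclanthology.org\u002FP19-1452.pdf",[21],"BERT Rediscovers the Classical NLP Pipeline"," reports that the layer-wise representations of a Transformer progress from syntax to semantics.",[31,12329,12331],{"id":12330},"decoder-和自回归生成","The Decoder and Autoregressive Generation",[10,12333,12334],{},"The previous sections covered the Encoder in detail: its job is to encode an input sequence (say, an English sentence) into a series of higher-order representation vectors rich in contextual information. That raises a question: what if we want to generate a new sequence, for example translating English into Chinese or continuing a piece of text?",[10,12336,12337],{},"The Encoder's goal is to “understand”; the Decoder's goal is to “generate”. In the original Transformer architecture (built for machine translation), the Decoder looks very much like the Encoder, with three key differences:",[12339,12340,12341,12349,12387],"ol",{},[12342,12343,12344,12345,12348],"li",{},"The Decoder is ",[170,12346,12347],{},"autoregressive",": it emits one token at a time, predicting only the next token at each step, and it takes all previously generated tokens as input when doing so.",[12342,12350,12351,12352,12355,12356,12386],{},"The Decoder includes ",[170,12353,12354],{},"Masked Attention",": when generating the token at position ",[38,12357,12359,12373],{"className":12358,"translate":42},[41],[38,12360,12362],{"className":12361},[46],[48,12363,12364],{"xmlns":50},[52,12365,12366,12371],{},[55,12367,12368],{},[58,12369,12370],{},"t",[77,12372,12370],{"encoding":79},[38,12374,12376],{"className":12375,"ariaHidden":85},[84],[38,12377,12379,12383],{"className":12378},[89],[38,12380],{"className":12381,"style":12382},[93],"height:0.6151em;",[38,12384,12370],{"className":12385},[98,167],", it cannot “peek” at future tokens.",[12342,12388,12351,12389,12392],{},[170,12390,12391],{},"Cross-Attention",": besides attending to what it has already generated, it also attends at every step to the source-language information output by the Encoder.",[10,12394,12395,12396,12400,12401,179],{},"Suppose we are translating from Chinese to English: the input is ",[12397,12398,12399],"code",{},"我 爱 机 器 学 习"," and the expected output is ",[12397,12402,12403],{},"I love machine 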
learning",[10,12405,12406],{},"During training we can compute all positions in parallel, because the reference answer is known. At inference time, when the model is actually used, it has to generate the answer by itself. This process is autoregressive generation.",[250,12408,12409],{},[10,12410,12411,12414],{},[170,12412,12413],{},"Autoregressive assumption",": the output at the current step depends on the outputs of all previous steps.",[10,12416,12417],{},"Formally:",[38,12419,12421],{"className":12420,"translate":42},[315],[38,12422,12424,12547],{"className":12423,"translate":42},[41],[38,12425,12427],{"className":12426},[46],[48,12428,12429],{"xmlns":50,"display":324},[52,12430,12431,12544],{},[55,12432,12433,12436,12438,12445,12447,12453,12455,12457,12459,12461,12463,12469,12471,12473,12488,12490,12492,12498,12501,12507,12509,12515,12517,12519,12521,12523,12525,12538,12540,12542],{},[58,12434,12435],{},"P",[63,12437,1500],{"stretchy":1499},[330,12439,12440,12443],{},[58,12441,12442],{},"y",[203,12444,348],{},[63,12446,380],{"separator":85},[330,12448,12449,12451],{},[58,12450,12442],{},[203,12452,368],{},[63,12454,380],{"separator":85},[58,12456,1527],{"mathvariant":400},[58,12458,1527],{"mathvariant":400},[58,12460,1527],{"mathvariant":400},[63,12462,380],{"separator":85},[330,12464,12465,12467],{},[58,12466,12442],{},[58,12468,1960],{},[63,12470,1542],{"stretchy":1499},[63,12472,340],{},[1968,12474,12475,12478,12486],{},[63,12476,12477],{},"∏",[55,12479,12480,12482,12484],{},[58,12481,12370],{},[63,12483,340],{},[203,12485,348],{},[58,12487,1960],{},[58,12489,12435],{},[63,12491,1500],{"stretchy":1499},[330,12493,12494,12496],{},[58,12495,12442],{},[58,12497,12370],{},[63,12499,12500],{},"∣",[330,12502,12503,12505],{},[58,12504,12442],{},[203,12506,348],{},[63,12508,380],{"separator":85},[330,12510,12511,12513],{},[58,12512,12442],{},[203,12514,368],{},[63,12516,380],{"separator":85},[58,12518,1527],{"mathvariant":400},[58,12520,1527],{"mathvariant":400},[58,12522,1527],{"mathvariant":400},[63,12524,380],{"separator":85},[330,12526,12527,12529],{},[58,12528,12442],{},[55,12530,12531,12533,12536],{},[58,12532,12370],{},[63,12534,12535],{},"−",[203,12537,348],{},[63,12539,380],{"separa
tor":85},[58,12541,8628],{},[63,12543,1542],{"stretchy":1499},[77,12545,12546],{"encoding":79},"P(y_1, y_2, ..., y_T) = \\prod_{t=1}^{T} P(y_t \\mid y_1, y_2, ..., y_{t-1}, X)",[38,12548,12550,12716,12851],{"className":12549,"ariaHidden":85},[84],[38,12551,12553,12556,12559,12562,12603,12606,12609,12649,12652,12655,12658,12661,12664,12704,12707,12710,12713],{"className":12552},[89],[38,12554],{"className":12555,"style":1574},[93],[38,12557,12435],{"className":12558,"style":1842},[98,167],[38,12560,1500],{"className":12561},[1578],[38,12563,12565,12568],{"className":12564},[98],[38,12566,12442],{"className":12567,"style":9050},[98,167],[38,12569,12571],{"className":12570},[136],[38,12572,12574,12595],{"className":12573},[140,436],[38,12575,12577,12592],{"className":12576},[144],[38,12578,12580],{"className":12579,"style":509},[148],[38,12581,12583,12586],{"style":12582},"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;",[38,12584],{"className":12585,"style":156},[155],[38,12587,12589],{"className":12588},[160,161,162,163],[38,12590,348],{"className":12591},[98,163],[38,12593,464],{"className":12594},[463],[38,12596,12598],{"className":12597},[144],[38,12599,12601],{"className":12600,"style":531},[148],[38,12602],{},[38,12604,380],{"className":12605},[537],[38,12607],{"className":12608,"style":541},[110],[38,12610,12612,12615],{"className":12611},[98],[38,12613,12442],{"className":12614,"style":9050},[98,167],[38,12616,12618],{"className":12617},[136],[38,12619,12621,12641],{"className":12620},[140,436],[38,12622,12624,12638],{"className":12623},[144],[38,12625,12627],{"className":12626,"style":509},[148],[38,12628,12629,12632],{"style":12582},[38,12630],{"className":12631,"style":156},[155],[38,12633,12635],{"className":12634},[160,161,162,163],[38,12636,368],{"className":12637},[98,163],[38,12639,464],{"className":12640},[463],[38,12642,12644],{"className":12643},[144],[38,12645,12647],{"className":12646,"style":531},[148],[38,12648],{},[38,12650,380],{"classN
ame":12651},[537],[38,12653],{"className":12654,"style":541},[110],[38,12656,1738],{"className":12657},[98],[38,12659,380],{"className":12660},[537],[38,12662],{"className":12663,"style":541},[110],[38,12665,12667,12670],{"className":12666},[98],[38,12668,12442],{"className":12669,"style":9050},[98,167],[38,12671,12673],{"className":12672},[136],[38,12674,12676,12696],{"className":12675},[140,436],[38,12677,12679,12693],{"className":12678},[144],[38,12680,12682],{"className":12681,"style":11143},[148],[38,12683,12684,12687],{"style":12582},[38,12685],{"className":12686,"style":156},[155],[38,12688,12690],{"className":12689},[160,161,162,163],[38,12691,1960],{"className":12692,"style":1842},[98,167,163],[38,12694,464],{"className":12695},[463],[38,12697,12699],{"className":12698},[144],[38,12700,12702],{"className":12701,"style":531},[148],[38,12703],{},[38,12705,1542],{"className":12706},[1794],[38,12708],{"className":12709,"style":111},[110],[38,12711,340],{"className":12712},[115],[38,12714],{"className":12715,"style":111},[110],[38,12717,12719,12723,12793,12796,12799,12802,12842,12845,12848],{"className":12718},[89],[38,12720],{"className":12721,"style":12722},[93],"height:3.0954em;vertical-align:-1.2671em;",[38,12724,12726],{"className":12725},[2101,2102],[38,12727,12729,12784],{"className":12728},[140,436],[38,12730,12732,12781],{"className":12731},[144],[38,12733,12736,12757,12767],{"className":12734,"style":12735},[148],"height:1.8283em;",[38,12737,12739,12742],{"style":12738},"top:-1.8829em;margin-left:0em;",[38,12740],{"className":12741,"style":2119},[155],[38,12743,12745],{"className":12744},[160,161,162,163],[38,12746,12748,12751,12754],{"className":12747},[98,163],[38,12749,12370],{"className":12750},[98,167,163],[38,12752,340],{"className":12753},[115,163],[38,12755,348],{"className":12756},[98,163],[38,12758,12759,12762],{"style":2137},[38,12760],{"className":12761,"style":2119},[155],[38,12763,12764],{},[38,12765,12477],{"className":12766},[2101,2146,
2147],[38,12768,12769,12772],{"style":2150},[38,12770],{"className":12771,"style":2119},[155],[38,12773,12775],{"className":12774},[160,161,162,163],[38,12776,12778],{"className":12777},[98,163],[38,12779,1960],{"className":12780,"style":1842},[98,167,163],[38,12782,464],{"className":12783},[463],[38,12785,12787],{"className":12786},[144],[38,12788,12791],{"className":12789,"style":12790},[148],"height:1.2671em;",[38,12792],{},[38,12794],{"className":12795,"style":541},[110],[38,12797,12435],{"className":12798,"style":1842},[98,167],[38,12800,1500],{"className":12801},[1578],[38,12803,12805,12808],{"className":12804},[98],[38,12806,12442],{"className":12807,"style":9050},[98,167],[38,12809,12811],{"className":12810},[136],[38,12812,12814,12834],{"className":12813},[140,436],[38,12815,12817,12831],{"className":12816},[144],[38,12818,12820],{"className":12819,"style":5669},[148],[38,12821,12822,12825],{"style":12582},[38,12823],{"className":12824,"style":156},[155],[38,12826,12828],{"className":12827},[160,161,162,163],[38,12829,12370],{"className":12830},[98,167,163],[38,12832,464],{"className":12833},[463],[38,12835,12837],{"className":12836},[144],[38,12838,12840],{"className":12839,"style":531},[148],[38,12841],{},[38,12843],{"className":12844,"style":111},[110],[38,12846,12500],{"className":12847},[115],[38,12849],{"className":12850,"style":111},[110],[38,12852,12854,12857,12897,12900,12903,12943,12946,12949,12952,12955,12958,13008,13011,13014,13017],{"className":12853},[89],[38,12855],{"className":12856,"style":1574},[93],[38,12858,12860,12863],{"className":12859},[98],[38,12861,12442],{"className":12862,"style":9050},[98,167],[38,12864,12866],{"className":12865},[136],[38,12867,12869,12889],{"className":12868},[140,436],[38,12870,12872,12886],{"className":12871},[144],[38,12873,12875],{"className":12874,"style":509},[148],[38,12876,12877,12880],{"style":12582},[38,12878],{"className":12879,"style":156},[155],[38,12881,12883],{"className":12882},[160,161,162,163
],[38,12884,348],{"className":12885},[98,163],[38,12887,464],{"className":12888},[463],[38,12890,12892],{"className":12891},[144],[38,12893,12895],{"className":12894,"style":531},[148],[38,12896],{},[38,12898,380],{"className":12899},[537],[38,12901],{"className":12902,"style":541},[110],[38,12904,12906,12909],{"className":12905},[98],[38,12907,12442],{"className":12908,"style":9050},[98,167],[38,12910,12912],{"className":12911},[136],[38,12913,12915,12935],{"className":12914},[140,436],[38,12916,12918,12932],{"className":12917},[144],[38,12919,12921],{"className":12920,"style":509},[148],[38,12922,12923,12926],{"style":12582},[38,12924],{"className":12925,"style":156},[155],[38,12927,12929],{"className":12928},[160,161,162,163],[38,12930,368],{"className":12931},[98,163],[38,12933,464],{"className":12934},[463],[38,12936,12938],{"className":12937},[144],[38,12939,12941],{"className":12940,"style":531},[148],[38,12942],{},[38,12944,380],{"className":12945},[537],[38,12947],{"className":12948,"style":541},[110],[38,12950,1738],{"className":12951},[98],[38,12953,380],{"className":12954},[537],[38,12956],{"className":12957,"style":541},[110],[38,12959,12961,12964],{"className":12960},[98],[38,12962,12442],{"className":12963,"style":9050},[98,167],[38,12965,12967],{"className":12966},[136],[38,12968,12970,12999],{"className":12969},[140,436],[38,12971,12973,12996],{"className":12972},[144],[38,12974,12976],{"className":12975,"style":509},[148],[38,12977,12978,12981],{"style":12582},[38,12979],{"className":12980,"style":156},[155],[38,12982,12984],{"className":12983},[160,161,162,163],[38,12985,12987,12990,12993],{"className":12986},[98,163],[38,12988,12370],{"className":12989},[98,167,163],[38,12991,12535],{"className":12992},[599,163],[38,12994,348],{"className":12995},[98,163],[38,12997,464],{"className":12998},[463],[38,13000,13002],{"className":13001},[144],[38,13003,13006],{"className":13004,"style":13005},[148],"height:0.2083em;",[38,13007],{},[38,13009,380],{"cla
ssName":13010},[537],[38,13012],{"className":13013,"style":541},[110],[38,13015,8628],{"className":13016,"style":8665},[98,167],[38,13018,1542],{"className":13019},[1794],[10,13021,8357,13022,13050,13051,13122],{},[38,13023,13025,13038],{"className":13024,"translate":42},[41],[38,13026,13028],{"className":13027},[46],[48,13029,13030],{"xmlns":50},[52,13031,13032,13036],{},[55,13033,13034],{},[58,13035,8628],{},[77,13037,8628],{"encoding":79},[38,13039,13041],{"className":13040,"ariaHidden":85},[84],[38,13042,13044,13047],{"className":13043},[89],[38,13045],{"className":13046,"style":1555},[93],[38,13048,8628],{"className":13049,"style":8665},[98,167]," is the Encoder output (the source-language information), and ",[38,13052,13054,13072],{"className":13053,"translate":42},[41],[38,13055,13057],{"className":13056},[46],[48,13058,13059],{"xmlns":50},[52,13060,13061,13069],{},[55,13062,13063],{},[330,13064,13065,13067],{},[58,13066,12442],{},[58,13068,12370],{},[77,13070,13071],{"encoding":79},"y_t",[38,13073,13075],{"className":13074,"ariaHidden":85},[84],[38,13076,13078,13082],{"className":13077},[89],[38,13079],{"className":13080,"style":13081},[93],"height:0.625em;vertical-align:-0.1944em;",[38,13083,13085,13088],{"className":13084},[98],[38,13086,12442],{"className":13087,"style":9050},[98,167],[38,13089,13091],{"className":13090},[136],[38,13092,13094,13114],{"className":13093},[140,436],[38,13095,13097,13111],{"className":13096},[144],[38,13098,13100],{"className":13099,"style":5669},[148],[38,13101,13102,13105],{"style":12582},[38,13103],{"className":13104,"style":156},[155],[38,13106,13108],{"className":13107},[160,161,162,163],[38,13109,12370],{"className":13110},[98,167,163],[38,13112,464],{"className":13113},[463],[38,13115,13117],{"className":13116},[144],[38,13118,13120],{"className":13119,"style":531},[148],[38,13121],{}," is the word generated at the current step.",[10,13124,13125,13128],{},[170,13126,13127],{},"For example",":",[12339,13130,13131,13234,13336,13438],{},[12342,13132,13133,13134,13137,13138,179],{},"Given 
",[12397,13135,13136],{},"\u003CSTART>"," (the start token), the model predicts the first word: ",[38,13139,13141,13164],{"className":13140,"translate":42},[41],[38,13142,13144],{"className":13143},[46],[48,13145,13146],{"xmlns":50},[52,13147,13148,13161],{},[55,13149,13150,13156,13158],{},[330,13151,13152,13154],{},[58,13153,12442],{},[203,13155,348],{},[63,13157,340],{},[335,13159,13160],{},"‘I’",[77,13162,13163],{"encoding":79},"y_1 = \\text{`I'}",[38,13165,13167,13222],{"className":13166,"ariaHidden":85},[84],[38,13168,13170,13173,13213,13216,13219],{"className":13169},[89],[38,13171],{"className":13172,"style":13081},[93],[38,13174,13176,13179],{"className":13175},[98],[38,13177,12442],{"className":13178,"style":9050},[98,167],[38,13180,13182],{"className":13181},[136],[38,13183,13185,13205],{"className":13184},[140,436],[38,13186,13188,13202],{"className":13187},[144],[38,13189,13191],{"className":13190,"style":509},[148],[38,13192,13193,13196],{"style":12582},[38,13194],{"className":13195,"style":156},[155],[38,13197,13199],{"className":13198},[160,161,162,163],[38,13200,348],{"className":13201},[98,163],[38,13203,464],{"className":13204},[463],[38,13206,13208],{"className":13207},[144],[38,13209,13211],{"className":13210,"style":531},[148],[38,13212],{},[38,13214],{"className":13215,"style":111},[110],[38,13217,340],{"className":13218},[115],[38,13220],{"className":13221,"style":111},[110],[38,13223,13225,13228],{"className":13224},[89],[38,13226],{"className":13227,"style":290},[93],[38,13229,13231],{"className":13230},[98,456],[38,13232,13160],{"className":13233},[98],[12342,13235,13133,13236,13239,13240,179],{},[12397,13237,13238],{},"\u003CSTART> 
I",", the model predicts the second word: ",[38,13241,13243,13266],{"className":13242,"translate":42},[41],[38,13244,13246],{"className":13245},[46],[48,13247,13248],{"xmlns":50},[52,13249,13250,13263],{},[55,13251,13252,13258,13260],{},[330,13253,13254,13256],{},[58,13255,12442],{},[203,13257,368],{},[63,13259,340],{},[335,13261,13262],{},"‘love’",[77,13264,13265],{"encoding":79},"y_2 = \\text{`love'}",[38,13267,13269,13324],{"className":13268,"ariaHidden":85},[84],[38,13270,13272,13275,13315,13318,13321],{"className":13271},[89],[38,13273],{"className":13274,"style":13081},[93],[38,13276,13278,13281],{"className":13277},[98],[38,13279,12442],{"className":13280,"style":9050},[98,167],[38,13282,13284],{"className":13283},[136],[38,13285,13287,13307],{"className":13286},[140,436],[38,13288,13290,13304],{"className":13289},[144],[38,13291,13293],{"className":13292,"style":509},[148],[38,13294,13295,13298],{"style":12582},[38,13296],{"className":13297,"style":156},[155],[38,13299,13301],{"className":13300},[160,161,162,163],[38,13302,368],{"className":13303},[98,163],[38,13305,464],{"className":13306},[463],[38,13308,13310],{"className":13309},[144],[38,13311,13313],{"className":13312,"style":531},[148],[38,13314],{},[38,13316],{"className":13317,"style":111},[110],[38,13319,340],{"className":13320},[115],[38,13322],{"className":13323,"style":111},[110],[38,13325,13327,13330],{"className":13326},[89],[38,13328],{"className":13329,"style":290},[93],[38,13331,13333],{"className":13332},[98,456],[38,13334,13262],{"className":13335},[98],[12342,13337,13133,13338,13341,13342,179],{},[12397,13339,13340],{},"\u003CSTART> I love",", the model predicts the third word: ",[38,13343,13345,13368],{"className":13344,"translate":42},[41],[38,13346,13348],{"className":13347},[46],[48,13349,13350],{"xmlns":50},[52,13351,13352,13365],{},[55,13353,13354,13360,13362],{},[330,13355,13356,13358],{},[58,13357,12442],{},[203,13359,205],{},[63,13361,340],{},[335,13363,13364],{},"‘machine’",[77,13366,13367],{"encoding":79},"y_3 = 
\\text{`machine'}",[38,13369,13371,13426],{"className":13370,"ariaHidden":85},[84],[38,13372,13374,13377,13417,13420,13423],{"className":13373},[89],[38,13375],{"className":13376,"style":13081},[93],[38,13378,13380,13383],{"className":13379},[98],[38,13381,12442],{"className":13382,"style":9050},[98,167],[38,13384,13386],{"className":13385},[136],[38,13387,13389,13409],{"className":13388},[140,436],[38,13390,13392,13406],{"className":13391},[144],[38,13393,13395],{"className":13394,"style":509},[148],[38,13396,13397,13400],{"style":12582},[38,13398],{"className":13399,"style":156},[155],[38,13401,13403],{"className":13402},[160,161,162,163],[38,13404,205],{"className":13405},[98,163],[38,13407,464],{"className":13408},[463],[38,13410,13412],{"className":13411},[144],[38,13413,13415],{"className":13414,"style":531},[148],[38,13416],{},[38,13418],{"className":13419,"style":111},[110],[38,13421,340],{"className":13422},[115],[38,13424],{"className":13425,"style":111},[110],[38,13427,13429,13432],{"className":13428},[89],[38,13430],{"className":13431,"style":290},[93],[38,13433,13435],{"className":13434},[98,456],[38,13436,13364],{"className":13437},[98],[12342,13439,13440,13441,13444],{},"This continues until the model predicts ",[12397,13442,13443],{},"\u003CEND>"," (the end token).",[10,13446,13447,13448,13476,13477,13581],{},"Note: when generating the ",[38,13449,13451,13464],{"className":13450,"translate":42},[41],[38,13452,13454],{"className":13453},[46],[48,13455,13456],{"xmlns":50},[52,13457,13458,13462],{},[55,13459,13460],{},[58,13461,12370],{},[77,13463,12370],{"encoding":79},[38,13465,13467],{"className":13466,"ariaHidden":85},[84],[38,13468,13470,13473],{"className":13469},[89],[38,13471],{"className":13472,"style":12382},[93],[38,13474,12370],{"className":13475},[98,167],"-th word, the Decoder cannot yet see the words at positions 
",[38,13478,13480,13514],{"className":13479,"translate":42},[41],[38,13481,13483],{"className":13482},[46],[48,13484,13485],{"xmlns":50},[52,13486,13487,13511],{},[55,13488,13489,13491,13493,13495,13497,13499,13501,13503,13505,13507,13509],{},[58,13490,12370],{},[63,13492,361],{},[203,13494,348],{},[63,13496,380],{"separator":85},[58,13498,12370],{},[63,13500,361],{},[203,13502,368],{},[63,13504,380],{"separator":85},[58,13506,1527],{"mathvariant":400},[58,13508,1527],{"mathvariant":400},[58,13510,1527],{"mathvariant":400},[77,13512,13513],{"encoding":79},"t+1, t+2, ...",[38,13515,13517,13536,13563],{"className":13516,"ariaHidden":85},[84],[38,13518,13520,13524,13527,13530,13533],{"className":13519},[89],[38,13521],{"className":13522,"style":13523},[93],"height:0.6984em;vertical-align:-0.0833em;",[38,13525,12370],{"className":13526},[98,167],[38,13528],{"className":13529,"style":595},[110],[38,13531,361],{"className":13532},[599],[38,13534],{"className":13535,"style":595},[110],[38,13537,13539,13542,13545,13548,13551,13554,13557,13560],{"className":13538},[89],[38,13540],{"className":13541,"style":5710},[93],[38,13543,348],{"className":13544},[98],[38,13546,380],{"className":13547},[537],[38,13549],{"className":13550,"style":541},[110],[38,13552,12370],{"className":13553},[98,167],[38,13555],{"className":13556,"style":595},[110],[38,13558,361],{"className":13559},[599],[38,13561],{"className":13562,"style":595},[110],[38,13564,13566,13569,13572,13575,13578],{"className":13565},[89],[38,13567],{"className":13568,"style":5710},[93],[38,13570,368],{"className":13571},[98],[38,13573,380],{"className":13574},[537],[38,13576],{"className":13577,"style":541},[110],[38,13579,1738],{"className":13580},[98]," and beyond. This makes sense: it is like a fill-in-the-blank exam where you can see only the question and what you have already written, and you cannot flip ahead to the answers.",[31,13583,13585],{"id":13584},"decoder-训练与掩码注意力","Decoder Training and Masked Attention",[10,13587,13588,13589,13784,13785,13855,13856,13926],{},"This raises a problem: when training the Decoder we want to compute in parallel (otherwise GPU utilization is too low). But if we feed the entire target sequence 
",[38,13590,13592,13634],{"className":13591,"translate":42},[41],[38,13593,13595],{"className":13594},[46],[48,13596,13597],{"xmlns":50},[52,13598,13599,13631],{},[55,13600,13601,13607,13609,13615,13617,13619,13621,13623,13625],{},[330,13602,13603,13605],{},[58,13604,12442],{},[203,13606,348],{},[63,13608,380],{"separator":85},[330,13610,13611,13613],{},[58,13612,12442],{},[203,13614,368],{},[63,13616,380],{"separator":85},[58,13618,1527],{"mathvariant":400},[58,13620,1527],{"mathvariant":400},[58,13622,1527],{"mathvariant":400},[63,13624,380],{"separator":85},[330,13626,13627,13629],{},[58,13628,12442],{},[58,13630,1960],{},[77,13632,13633],{"encoding":79},"y_1, y_2, ..., y_T",[38,13635,13637],{"className":13636,"ariaHidden":85},[84],[38,13638,13640,13643,13683,13686,13689,13729,13732,13735,13738,13741,13744],{"className":13639},[89],[38,13641],{"className":13642,"style":13081},[93],[38,13644,13646,13649],{"className":13645},[98],[38,13647,12442],{"className":13648,"style":9050},[98,167],[38,13650,13652],{"className":13651},[136],[38,13653,13655,13675],{"className":13654},[140,436],[38,13656,13658,13672],{"className":13657},[144],[38,13659,13661],{"className":13660,"style":509},[148],[38,13662,13663,13666],{"style":12582},[38,13664],{"className":13665,"style":156},[155],[38,13667,13669],{"className":13668},[160,161,162,163],[38,13670,348],{"className":13671},[98,163],[38,13673,464],{"className":13674},[463],[38,13676,13678],{"className":13677},[144],[38,13679,13681],{"className":13680,"style":531},[148],[38,13682],{},[38,13684,380],{"className":13685},[537],[38,13687],{"className":13688,"style":541},[110],[38,13690,13692,13695],{"className":13691},[98],[38,13693,12442],{"className":13694,"style":9050},[98,167],[38,13696,13698],{"className":13697},[136],[38,13699,13701,13721],{"className":13700},[140,436],[38,13702,13704,13718],{"className":13703},[144],[38,13705,13707],{"className":13706,"style":509},[148],[38,13708,13709,13712],{"style":12582},[38,13710],{"classNa
me":13711,"style":156},[155],[38,13713,13715],{"className":13714},[160,161,162,163],[38,13716,368],{"className":13717},[98,163],[38,13719,464],{"className":13720},[463],[38,13722,13724],{"className":13723},[144],[38,13725,13727],{"className":13726,"style":531},[148],[38,13728],{},[38,13730,380],{"className":13731},[537],[38,13733],{"className":13734,"style":541},[110],[38,13736,1738],{"className":13737},[98],[38,13739,380],{"className":13740},[537],[38,13742],{"className":13743,"style":541},[110],[38,13745,13747,13750],{"className":13746},[98],[38,13748,12442],{"className":13749,"style":9050},[98,167],[38,13751,13753],{"className":13752},[136],[38,13754,13756,13776],{"className":13755},[140,436],[38,13757,13759,13773],{"className":13758},[144],[38,13760,13762],{"className":13761,"style":11143},[148],[38,13763,13764,13767],{"style":12582},[38,13765],{"className":13766,"style":156},[155],[38,13768,13770],{"className":13769},[160,161,162,163],[38,13771,1960],{"className":13772,"style":1842},[98,167,163],[38,13774,464],{"className":13775},[463],[38,13777,13779],{"className":13778},[144],[38,13780,13782],{"className":13781,"style":531},[148],[38,13783],{}," to the Decoder all at once, as we do with the Encoder, then while computing 
",[38,13786,13788,13806],{"className":13787,"translate":42},[41],[38,13789,13791],{"className":13790},[46],[48,13792,13793],{"xmlns":50},[52,13794,13795,13803],{},[55,13796,13797],{},[330,13798,13799,13801],{},[58,13800,12442],{},[203,13802,348],{},[77,13804,13805],{"encoding":79},"y_1",[38,13807,13809],{"className":13808,"ariaHidden":85},[84],[38,13810,13812,13815],{"className":13811},[89],[38,13813],{"className":13814,"style":13081},[93],[38,13816,13818,13821],{"className":13817},[98],[38,13819,12442],{"className":13820,"style":9050},[98,167],[38,13822,13824],{"className":13823},[136],[38,13825,13827,13847],{"className":13826},[140,436],[38,13828,13830,13844],{"className":13829},[144],[38,13831,13833],{"className":13832,"style":509},[148],[38,13834,13835,13838],{"style":12582},[38,13836],{"className":13837,"style":156},[155],[38,13839,13841],{"className":13840},[160,161,162,163],[38,13842,348],{"className":13843},[98,163],[38,13845,464],{"className":13846},[463],[38,13848,13850],{"className":13849},[144],[38,13851,13853],{"className":13852,"style":531},[148],[38,13854],{}," it can already see 
",[38,13857,13859,13877],{"className":13858,"translate":42},[41],[38,13860,13862],{"className":13861},[46],[48,13863,13864],{"xmlns":50},[52,13865,13866,13874],{},[55,13867,13868],{},[330,13869,13870,13872],{},[58,13871,12442],{},[203,13873,368],{},[77,13875,13876],{"encoding":79},"y_2",[38,13878,13880],{"className":13879,"ariaHidden":85},[84],[38,13881,13883,13886],{"className":13882},[89],[38,13884],{"className":13885,"style":13081},[93],[38,13887,13889,13892],{"className":13888},[98],[38,13890,12442],{"className":13891,"style":9050},[98,167],[38,13893,13895],{"className":13894},[136],[38,13896,13898,13918],{"className":13897},[140,436],[38,13899,13901,13915],{"className":13900},[144],[38,13902,13904],{"className":13903,"style":509},[148],[38,13905,13906,13909],{"style":12582},[38,13907],{"className":13908,"style":156},[155],[38,13910,13912],{"className":13911},[160,161,162,163],[38,13913,368],{"className":13914},[98,163],[38,13916,464],{"className":13917},[463],[38,13919,13921],{"className":13920},[144],[38,13922,13924],{"className":13923,"style":531},[148],[38,13925],{},", which is called “information leakage” (cheating).",[10,13928,13929,13930,13932],{},"The solution is ",[170,13931,12354],{},", also called Causal Attention.",[10,13934,13935,13936,14007,14008,14043,14044,14114],{},"The idea is simple: when computing the attention score matrix 
",[38,13937,13939,13960],{"className":13938,"translate":42},[41],[38,13940,13942],{"className":13941},[46],[48,13943,13944],{"xmlns":50},[52,13945,13946,13958],{},[55,13947,13948,13950,13952],{},[58,13949,8941],{},[63,13951,351],{"separator":85},[67,13953,13954,13956],{},[58,13955,8958],{},[58,13957,1960],{},[77,13959,9412],{"encoding":79},[38,13961,13963],{"className":13962,"ariaHidden":85},[84],[38,13964,13966,13969,13972,13975,13978],{"className":13965},[89],[38,13967],{"className":13968,"style":9422},[93],[38,13970,8941],{"className":13971},[98,167],[38,13973,351],{"className":13974},[537],[38,13976],{"className":13977,"style":541},[110],[38,13979,13981,13984],{"className":13980},[98],[38,13982,8958],{"className":13983,"style":9074},[98,167],[38,13985,13987],{"className":13986},[136],[38,13988,13990],{"className":13989},[140],[38,13991,13993],{"className":13992},[144],[38,13994,13996],{"className":13995,"style":9450},[148],[38,13997,13998,14001],{"style":151},[38,13999],{"className":14000,"style":156},[155],[38,14002,14004],{"className":14003},[160,161,162,163],[38,14005,1960],{"className":14006,"style":1842},[98,167,163],", we set the scores at all “future” positions to ",[38,14009,14011,14028],{"className":14010,"translate":42},[41],[38,14012,14014],{"className":14013},[46],[48,14015,14016],{"xmlns":50},[52,14017,14018,14025],{},[55,14019,14020,14022],{},[63,14021,12535],{},[58,14023,14024],{"mathvariant":400},"∞",[77,14026,14027],{"encoding":79},"-\\infty",[38,14029,14031],{"className":14030,"ariaHidden":85},[84],[38,14032,14034,14037,14040],{"className":14033},[89],[38,14035],{"className":14036,"style":11002},[93],[38,14038,12535],{"className":14039},[98],[38,14041,14024],{"className":14042},[98]," (or a very large negative number such as 
",[38,14045,14047,14069],{"className":14046,"translate":42},[41],[38,14048,14050],{"className":14049},[46],[48,14051,14052],{"xmlns":50},[52,14053,14054,14066],{},[55,14055,14056,14058],{},[63,14057,12535],{},[67,14059,14060,14063],{},[203,14061,14062],{},"10",[203,14064,14065],{},"9",[77,14067,14068],{"encoding":79},"-10^9",[38,14070,14072],{"className":14071,"ariaHidden":85},[84],[38,14073,14075,14079,14082,14085],{"className":14074},[89],[38,14076],{"className":14077,"style":14078},[93],"height:0.8974em;vertical-align:-0.0833em;",[38,14080,12535],{"className":14081},[98],[38,14083,348],{"className":14084},[98],[38,14086,14088,14091],{"className":14087},[98],[38,14089,404],{"className":14090},[98],[38,14092,14094],{"className":14093},[136],[38,14095,14097],{"className":14096},[140],[38,14098,14100],{"className":14099},[144],[38,14101,14103],{"className":14102,"style":218},[148],[38,14104,14105,14108],{"style":151},[38,14106],{"className":14107,"style":156},[155],[38,14109,14111],{"className":14110},[160,161,162,163],[38,14112,14065],{"className":14113},[98,163],"), so that after softmax their attention weights become 0.",[10,14116,14117],{},"For a sequence of length 4 the mask looks like this, with every entry above the diagonal masked out:",[38,14119,14121],{"className":14120,"translate":42},[315],[38,14122,14124,14278],{"className":14123,"translate":42},[41],[38,14125,14127],{"className":14126},[46],[48,14128,14129],{"xmlns":50,"display":324},[52,14130,14131,14275],{},[55,14132,14133,14136,14138],{},[335,14134,14135],{},"Mask",[63,14137,340],{},[55,14139,14140,14142,14273],{},[63,14141,1500],{"fence":85},[3256,14143,14145,14183,14217,14247],{"rowspacing":3258,"columnalign":14144,"columnspacing":383},"center center center 
center",[3261,14146,14147,14153,14163,14173],{},[3264,14148,14149],{},[3267,14150,14151],{"scriptlevel":404,"displaystyle":1499},[203,14152,404],{},[3264,14154,14155],{},[3267,14156,14157],{"scriptlevel":404,"displaystyle":1499},[55,14158,14159,14161],{},[63,14160,12535],{},[58,14162,14024],{"mathvariant":400},[3264,14164,14165],{},[3267,14166,14167],{"scriptlevel":404,"displaystyle":1499},[55,14168,14169,14171],{},[63,14170,12535],{},[58,14172,14024],{"mathvariant":400},[3264,14174,14175],{},[3267,14176,14177],{"scriptlevel":404,"displaystyle":1499},[55,14178,14179,14181],{},[63,14180,12535],{},[58,14182,14024],{"mathvariant":400},[3261,14184,14185,14191,14197,14207],{},[3264,14186,14187],{},[3267,14188,14189],{"scriptlevel":404,"displaystyle":1499},[203,14190,404],{},[3264,14192,14193],{},[3267,14194,14195],{"scriptlevel":404,"displaystyle":1499},[203,14196,404],{},[3264,14198,14199],{},[3267,14200,14201],{"scriptlevel":404,"displaystyle":1499},[55,14202,14203,14205],{},[63,14204,12535],{},[58,14206,14024],{"mathvariant":400},[3264,14208,14209],{},[3267,14210,14211],{"scriptlevel":404,"displaystyle":1499},[55,14212,14213,14215],{},[63,14214,12535],{},[58,14216,14024],{"mathvariant":400},[3261,14218,14219,14225,14231,14237],{},[3264,14220,14221],{},[3267,14222,14223],{"scriptlevel":404,"displaystyle":1499},[203,14224,404],{},[3264,14226,14227],{},[3267,14228,14229],{"scriptlevel":404,"displaystyle":1499},[203,14230,404],{},[3264,14232,14233],{},[3267,14234,14235],{"scriptlevel":404,"displaystyle":1499},[203,14236,404],{},[3264,14238,14239],{},[3267,14240,14241],{"scriptlevel":404,"displaystyle":1499},[55,14242,14243,14245],{},[63,14244,12535],{},[58,14246,14024],{"mathvariant":400},[3261,14248,14249,14255,14261,14267],{},[3264,14250,14251],{},[3267,14252,14253],{"scriptlevel":404,"displaystyle":1499},[203,14254,404],{},[3264,14256,14257],{},[3267,14258,14259],{"scriptlevel":404,"displaystyle":1499},[203,14260,404],{},[3264,14262,14263],{},[3267,14264,14265],{"scrip
tlevel":404,"displaystyle":1499},[203,14266,404],{},[3264,14268,14269],{},[3267,14270,14271],{"scriptlevel":404,"displaystyle":1499},[203,14272,404],{},[63,14274,1542],{"fence":85},[77,14276,14277],{"encoding":79},"\\text{Mask} = \\begin{pmatrix}\n0 & -\\infty & -\\infty & -\\infty \\\\\n0 & 0 & -\\infty & -\\infty \\\\\n0 & 0 & 0 & -\\infty \\\\\n0 & 0 & 0 & 0\n\\end{pmatrix}",[38,14279,14281,14302],{"className":14280,"ariaHidden":85},[84],[38,14282,14284,14287,14293,14296,14299],{"className":14283},[89],[38,14285],{"className":14286,"style":290},[93],[38,14288,14290],{"className":14289},[98,456],[38,14291,14135],{"className":14292},[98],[38,14294],{"className":14295,"style":111},[110],[38,14297,340],{"className":14298},[115],[38,14300],{"className":14301,"style":111},[110],[38,14303,14305,14309],{"className":14304},[89],[38,14306],{"className":14307,"style":14308},[93],"height:4.8em;vertical-align:-2.15em;",[38,14310,14312,14357,14673],{"className":14311},[3432],[38,14313,14315],{"className":14314},[1578],[38,14316,14318],{"className":14317},[3439,3440],[38,14319,14321,14348],{"className":14320},[140,436],[38,14322,14324,14345],{"className":14323},[144],[38,14325,14328],{"className":14326,"style":14327},[148],"height:2.65em;",[38,14329,14331,14335],{"style":14330},"top:-4.65em;",[38,14332],{"className":14333,"style":14334},[155],"height:6.8em;",[38,14336,14338],{"style":14337},"width:0.875em;height:4.800em;",[3462,14339,14342],{"xmlns":3464,"width":3465,"height":14340,"viewBox":14341},"4.800em","0 0 875 4800",[3469,14343],{"d":14344},"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,1284c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-1292c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z",[38,14346,464],{"className":14347},[463],[38,14349,14351],{"className":14350},[144],[38,14352,14355],{"className":14353,"style":14354},[148],"height:2.15em;",[38,14356],{},[38,14358,14360],{"className":14359},[98],[38,14361,14363,14434,14439,14442,14512,14515,14518,14591,14594,14597],{"className":14362},[3256],[38,14364,14366],{"className":14365},[3493],[38,14367,14369,14426],{"className":14368},[140,436],[38,14370,14372,14423],{"className":14371},[144],[38,14373,14375,14387,14399,14411],{"className":14374,"style":14327},[148],[38,14376,14378,14381],{"style":14377},"top:-4.81em;",[38,14379],{"className":14380,"style":3510},[155],[38,14382,14384],{"className":14383},[98],[38,14385,404],{"className":14386},[98],[38,14388,14390,14393],{"style":14389},"top:-3.61em;",[38,14391],{"className":14392,"style":3510},[155],[38,14394,14396],{"className":14395},[98],[38,14397,404],{"className":14398},[98],[38,14400,14402,14405],{"style":14401},"top:-2.41em;",[38,14403],{"className":14404,"style":3510},[155],[38,14406,14408],{"className":14407},[98],[38,14409,404],{"className":14410},[98],[38,14412,14414,14417],{"style":14413},"top:-1.21em;",[38,14415],{"className":14416,"style":3510},[155],[38,14418,14420],{"className":14419},[98],[38,14421,404],{"className":14422},[98],[38,14424,464],{"className":14425},[463],[38,14427,14429],{"className":14428},[144],[38,14430,14432],{"className":14431,"style":14354},[148],[38,14433],{},[38,14435],{"className":14436,"style":14438},[14437],"arraycolsep","width:0.5em;",[38,14440],{"className":14441,"style":14438},[14437],[38,14443,14445],{"className":14
444},[3493],[38,14446,14448,14504],{"className":14447},[140,436],[38,14449,14451,14501],{"className":14450},[144],[38,14452,14454,14468,14479,14490],{"className":14453,"style":14327},[148],[38,14455,14456,14459],{"style":14377},[38,14457],{"className":14458,"style":3510},[155],[38,14460,14462,14465],{"className":14461},[98],[38,14463,12535],{"className":14464},[98],[38,14466,14024],{"className":14467},[98],[38,14469,14470,14473],{"style":14389},[38,14471],{"className":14472,"style":3510},[155],[38,14474,14476],{"className":14475},[98],[38,14477,404],{"className":14478},[98],[38,14480,14481,14484],{"style":14401},[38,14482],{"className":14483,"style":3510},[155],[38,14485,14487],{"className":14486},[98],[38,14488,404],{"className":14489},[98],[38,14491,14492,14495],{"style":14413},[38,14493],{"className":14494,"style":3510},[155],[38,14496,14498],{"className":14497},[98],[38,14499,404],{"className":14500},[98],[38,14502,464],{"className":14503},[463],[38,14505,14507],{"className":14506},[144],[38,14508,14510],{"className":14509,"style":14354},[148],[38,14511],{},[38,14513],{"className":14514,"style":14438},[14437],[38,14516],{"className":14517,"style":14438},[14437],[38,14519,14521],{"className":14520},[3493],[38,14522,14524,14583],{"className":14523},[140,436],[38,14525,14527,14580],{"className":14526},[144],[38,14528,14530,14544,14558,14569],{"className":14529,"style":14327},[148],[38,14531,14532,14535],{"style":14377},[38,14533],{"className":14534,"style":3510},[155],[38,14536,14538,14541],{"className":14537},[98],[38,14539,12535],{"className":14540},[98],[38,14542,14024],{"className":14543},[98],[38,14545,14546,14549],{"style":14389},[38,14547],{"className":14548,"style":3510},[155],[38,14550,14552,14555],{"className":14551},[98],[38,14553,12535],{"className":14554},[98],[38,14556,14024],{"className":14557},[98],[38,14559,14560,14563],{"style":14401},[38,14561],{"className":14562,"style":3510},[155],[38,14564,14566],{"className":14565},[98],[38,14567,404],{"class
Name":14568},[98],[38,14570,14571,14574],{"style":14413},[38,14572],{"className":14573,"style":3510},[155],[38,14575,14577],{"className":14576},[98],[38,14578,404],{"className":14579},[98],[38,14581,464],{"className":14582},[463],[38,14584,14586],{"className":14585},[144],[38,14587,14589],{"className":14588,"style":14354},[148],[38,14590],{},[38,14592],{"className":14593,"style":14438},[14437],[38,14595],{"className":14596,"style":14438},[14437],[38,14598,14600],{"className":14599},[3493],[38,14601,14603,14665],{"className":14602},[140,436],[38,14604,14606,14662],{"className":14605},[144],[38,14607,14609,14623,14637,14651],{"className":14608,"style":14327},[148],[38,14610,14611,14614],{"style":14377},[38,14612],{"className":14613,"style":3510},[155],[38,14615,14617,14620],{"className":14616},[98],[38,14618,12535],{"className":14619},[98],[38,14621,14024],{"className":14622},[98],[38,14624,14625,14628],{"style":14389},[38,14626],{"className":14627,"style":3510},[155],[38,14629,14631,14634],{"className":14630},[98],[38,14632,12535],{"className":14633},[98],[38,14635,14024],{"className":14636},[98],[38,14638,14639,14642],{"style":14401},[38,14640],{"className":14641,"style":3510},[155],[38,14643,14645,14648],{"className":14644},[98],[38,14646,12535],{"className":14647},[98],[38,14649,14024],{"className":14650},[98],[38,14652,14653,14656],{"style":14413},[38,14654],{"className":14655,"style":3510},[155],[38,14657,14659],{"className":14658},[98],[38,14660,404],{"className":14661},[98],[38,14663,464],{"className":14664},[463],[38,14666,14668],{"className":14667},[144],[38,14669,14671],{"className":14670,"style":14354},[148],[38,14672],{},[38,14674,14676],{"className":14675},[1794],[38,14677,14679],{"className":14678},[3439,3440],[38,14680,14682,14703],{"className":14681},[140,436],[38,14683,14685,14700],{"className":14684},[144],[38,14686,14688],{"className":14687,"style":14327},[148],[38,14689,14690,14693],{"style":14330},[38,14691],{"className":14692,"style":14334},[155
],[38,14694,14695],{"style":14337},[3462,14696,14697],{"xmlns":3464,"width":3465,"height":14340,"viewBox":14341},[3469,14698],{"d":14699},"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,1209\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-1344c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z",[38,14701,464],{"className":14702},[463],[38,14704,14706],{"className":14705},[144],[38,14707,14709],{"className":14708,"style":14354},[148],[38,14710],{},[10,14712,14713,13128],{},[170,14714,14715],{},"解释",[14717,14718,14719,15041],"ul",{},[12342,14720,14721,14722,14791,14792,14861,14862,179],{},"第一行（",[38,14723,14725,14742],{"className":14724,"translate":42},[41],[38,14726,14728],{"className":14727},[46],[48,14729,14730],{"xmlns":50},[52,14731,14732,14740],{},[55,14733,14734],{},[330,14735,14736,14738],{},[58,14737,12442],{},[203,14739,348],{},[77,14741,13805],{"encoding":79},[38,14743,14745],{"className":14744,"ariaHidden":85},[84],[38,14746,14748,14751],{"className":14747},[89],[38,14749],{"className":14750,"style":13081},[93],[38,14752,14754,14757],{"className":14753},[98],[38,14755,12442],{"className":14756,"style":9050},[98,167],[38,14758,14760],{"className":14759},[136],[38,14761,14763,14783],{"className":14762},[140,436],[38,14764,14766,14780],{"className":14765},[144],[38,14767,14769],{"className":14768,"style":509},[148],[38,14770,14771,14774],{"style":12582},[38,14772],{"className":14773,"style":156},[155],[38,14775,14777],{"className":14776},[160,161,162,163],[38,14778,348],{"className":1
4779},[98,163],[38,14781,464],{"className":14782},[463],[38,14784,14786],{"className":14785},[144],[38,14787,14789],{"className":14788,"style":531},[148],[38,14790],{}," 的位置）：只能看到 ",[38,14793,14795,14812],{"className":14794,"translate":42},[41],[38,14796,14798],{"className":14797},[46],[48,14799,14800],{"xmlns":50},[52,14801,14802,14810],{},[55,14803,14804],{},[330,14805,14806,14808],{},[58,14807,12442],{},[203,14809,348],{},[77,14811,13805],{"encoding":79},[38,14813,14815],{"className":14814,"ariaHidden":85},[84],[38,14816,14818,14821],{"className":14817},[89],[38,14819],{"className":14820,"style":13081},[93],[38,14822,14824,14827],{"className":14823},[98],[38,14825,12442],{"className":14826,"style":9050},[98,167],[38,14828,14830],{"className":14829},[136],[38,14831,14833,14853],{"className":14832},[140,436],[38,14834,14836,14850],{"className":14835},[144],[38,14837,14839],{"className":14838,"style":509},[148],[38,14840,14841,14844],{"style":12582},[38,14842],{"className":14843,"style":156},[155],[38,14845,14847],{"className":14846},[160,161,162,163],[38,14848,348],{"className":14849},[98,163],[38,14851,464],{"className":14852},[463],[38,14854,14856],{"className":14855},[144],[38,14857,14859],{"className":14858,"style":531},[148],[38,14860],{}," 自己，不能看到 ",[38,14863,14865,14900],{"className":14864,"translate":42},[41],[38,14866,14868],{"className":14867},[46],[48,14869,14870],{"xmlns":50},[52,14871,14872,14897],{},[55,14873,14874,14880,14882,14888,14890],{},[330,14875,14876,14878],{},[58,14877,12442],{},[203,14879,368],{},[63,14881,380],{"separator":85},[330,14883,14884,14886],{},[58,14885,12442],{},[203,14887,205],{},[63,14889,380],{"separator":85},[330,14891,14892,14894],{},[58,14893,12442],{},[203,14895,14896],{},"4",[77,14898,14899],{"encoding":79},"y_2, y_3, 
y_4",[38,14901,14903],{"className":14902,"ariaHidden":85},[84],[38,14904,14906,14909,14949,14952,14955,14995,14998,15001],{"className":14905},[89],[38,14907],{"className":14908,"style":13081},[93],[38,14910,14912,14915],{"className":14911},[98],[38,14913,12442],{"className":14914,"style":9050},[98,167],[38,14916,14918],{"className":14917},[136],[38,14919,14921,14941],{"className":14920},[140,436],[38,14922,14924,14938],{"className":14923},[144],[38,14925,14927],{"className":14926,"style":509},[148],[38,14928,14929,14932],{"style":12582},[38,14930],{"className":14931,"style":156},[155],[38,14933,14935],{"className":14934},[160,161,162,163],[38,14936,368],{"className":14937},[98,163],[38,14939,464],{"className":14940},[463],[38,14942,14944],{"className":14943},[144],[38,14945,14947],{"className":14946,"style":531},[148],[38,14948],{},[38,14950,380],{"className":14951},[537],[38,14953],{"className":14954,"style":541},[110],[38,14956,14958,14961],{"className":14957},[98],[38,14959,12442],{"className":14960,"style":9050},[98,167],[38,14962,14964],{"className":14963},[136],[38,14965,14967,14987],{"className":14966},[140,436],[38,14968,14970,14984],{"className":14969},[144],[38,14971,14973],{"className":14972,"style":509},[148],[38,14974,14975,14978],{"style":12582},[38,14976],{"className":14977,"style":156},[155],[38,14979,14981],{"className":14980},[160,161,162,163],[38,14982,205],{"className":14983},[98,163],[38,14985,464],{"className":14986},[463],[38,14988,14990],{"className":14989},[144],[38,14991,14993],{"className":14992,"style":531},[148],[38,14994],{},[38,14996,380],{"className":14997},[537],[38,14999],{"className":15000,"style":541},[110],[38,15002,15004,15007],{"className":15003},[98],[38,15005,12442],{"className":15006,"style":9050},[98,167],[38,15008,15010],{"className":15009},[136],[38,15011,15013,15033],{"className":15012},[140,436],[38,15014,15016,15030],{"className":15015},[144],[38,15017,15019],{"className":15018,"style":509},[148],[38,15020,15021,15024]
,{"style":12582},[38,15022],{"className":15023,"style":156},[155],[38,15025,15027],{"className":15026},[160,161,162,163],[38,15028,14896],{"className":15029},[98,163],[38,15031,464],{"className":15032},[463],[38,15034,15036],{"className":15035},[144],[38,15037,15039],{"className":15038,"style":531},[148],[38,15040],{},[12342,15042,15043,15044,14791,15113,15237,15238,179],{},"第二行（",[38,15045,15047,15064],{"className":15046,"translate":42},[41],[38,15048,15050],{"className":15049},[46],[48,15051,15052],{"xmlns":50},[52,15053,15054,15062],{},[55,15055,15056],{},[330,15057,15058,15060],{},[58,15059,12442],{},[203,15061,368],{},[77,15063,13876],{"encoding":79},[38,15065,15067],{"className":15066,"ariaHidden":85},[84],[38,15068,15070,15073],{"className":15069},[89],[38,15071],{"className":15072,"style":13081},[93],[38,15074,15076,15079],{"className":15075},[98],[38,15077,12442],{"className":15078,"style":9050},[98,167],[38,15080,15082],{"className":15081},[136],[38,15083,15085,15105],{"className":15084},[140,436],[38,15086,15088,15102],{"className":15087},[144],[38,15089,15091],{"className":15090,"style":509},[148],[38,15092,15093,15096],{"style":12582},[38,15094],{"className":15095,"style":156},[155],[38,15097,15099],{"className":15098},[160,161,162,163],[38,15100,368],{"className":15101},[98,163],[38,15103,464],{"className":15104},[463],[38,15106,15108],{"className":15107},[144],[38,15109,15111],{"className":15110,"style":531},[148],[38,15112],{},[38,15114,15116,15142],{"className":15115,"translate":42},[41],[38,15117,15119],{"className":15118},[46],[48,15120,15121],{"xmlns":50},[52,15122,15123,15139],{},[55,15124,15125,15131,15133],{},[330,15126,15127,15129],{},[58,15128,12442],{},[203,15130,348],{},[63,15132,380],{"separator":85},[330,15134,15135,15137],{},[58,15136,12442],{},[203,15138,368],{},[77,15140,15141],{"encoding":79},"y_1, 
y_2",[38,15143,15145],{"className":15144,"ariaHidden":85},[84],[38,15146,15148,15151,15191,15194,15197],{"className":15147},[89],[38,15149],{"className":15150,"style":13081},[93],[38,15152,15154,15157],{"className":15153},[98],[38,15155,12442],{"className":15156,"style":9050},[98,167],[38,15158,15160],{"className":15159},[136],[38,15161,15163,15183],{"className":15162},[140,436],[38,15164,15166,15180],{"className":15165},[144],[38,15167,15169],{"className":15168,"style":509},[148],[38,15170,15171,15174],{"style":12582},[38,15172],{"className":15173,"style":156},[155],[38,15175,15177],{"className":15176},[160,161,162,163],[38,15178,348],{"className":15179},[98,163],[38,15181,464],{"className":15182},[463],[38,15184,15186],{"className":15185},[144],[38,15187,15189],{"className":15188,"style":531},[148],[38,15190],{},[38,15192,380],{"className":15193},[537],[38,15195],{"className":15196,"style":541},[110],[38,15198,15200,15203],{"className":15199},[98],[38,15201,12442],{"className":15202,"style":9050},[98,167],[38,15204,15206],{"className":15205},[136],[38,15207,15209,15229],{"className":15208},[140,436],[38,15210,15212,15226],{"className":15211},[144],[38,15213,15215],{"className":15214,"style":509},[148],[38,15216,15217,15220],{"style":12582},[38,15218],{"className":15219,"style":156},[155],[38,15221,15223],{"className":15222},[160,161,162,163],[38,15224,368],{"className":15225},[98,163],[38,15227,464],{"className":15228},[463],[38,15230,15232],{"className":15231},[144],[38,15233,15235],{"className":15234,"style":531},[148],[38,15236],{},"，不能看到 ",[38,15239,15241,15267],{"className":15240,"translate":42},[41],[38,15242,15244],{"className":15243},[46],[48,15245,15246],{"xmlns":50},[52,15247,15248,15264],{},[55,15249,15250,15256,15258],{},[330,15251,15252,15254],{},[58,15253,12442],{},[203,15255,205],{},[63,15257,380],{"separator":85},[330,15259,15260,15262],{},[58,15261,12442],{},[203,15263,14896],{},[77,15265,15266],{"encoding":79},"y_3, 
y_4",[38,15268,15270],{"className":15269,"ariaHidden":85},[84],[38,15271,15273,15276,15316,15319,15322],{"className":15272},[89],[38,15274],{"className":15275,"style":13081},[93],[38,15277,15279,15282],{"className":15278},[98],[38,15280,12442],{"className":15281,"style":9050},[98,167],[38,15283,15285],{"className":15284},[136],[38,15286,15288,15308],{"className":15287},[140,436],[38,15289,15291,15305],{"className":15290},[144],[38,15292,15294],{"className":15293,"style":509},[148],[38,15295,15296,15299],{"style":12582},[38,15297],{"className":15298,"style":156},[155],[38,15300,15302],{"className":15301},[160,161,162,163],[38,15303,205],{"className":15304},[98,163],[38,15306,464],{"className":15307},[463],[38,15309,15311],{"className":15310},[144],[38,15312,15314],{"className":15313,"style":531},[148],[38,15315],{},[38,15317,380],{"className":15318},[537],[38,15320],{"className":15321,"style":541},[110],[38,15323,15325,15328],{"className":15324},[98],[38,15326,12442],{"className":15327,"style":9050},[98,167],[38,15329,15331],{"className":15330},[136],[38,15332,15334,15354],{"className":15333},[140,436],[38,15335,15337,15351],{"className":15336},[144],[38,15338,15340],{"className":15339,"style":509},[148],[38,15341,15342,15345],{"style":12582},[38,15343],{"className":15344,"style":156},[155],[38,15346,15348],{"className":15347},[160,161,162,163],[38,15349,14896],{"className":15350},[98,163],[38,15352,464],{"className":15353},[463],[38,15355,15357],{"className":15356},[144],[38,15358,15360],{"className":15359,"style":531},[148],[38,15361],{},[10,15363,15364],{},"以此类推。",[10,15366,15367,15368,15401,15402,15495,15496,15565,15566,15743,15744,15813,15814,15883,15884,16007],{},"当 Softmax 
处理这些分数时，",[38,15369,15371,15386],{"className":15370,"translate":42},[41],[38,15372,15374],{"className":15373},[46],[48,15375,15376],{"xmlns":50},[52,15377,15378,15384],{},[55,15379,15380,15382],{},[63,15381,12535],{},[58,15383,14024],{"mathvariant":400},[77,15385,14027],{"encoding":79},[38,15387,15389],{"className":15388,"ariaHidden":85},[84],[38,15390,15392,15395,15398],{"className":15391},[89],[38,15393],{"className":15394,"style":11002},[93],[38,15396,12535],{"className":15397},[98],[38,15399,14024],{"className":15400},[98]," 会被映射成 0（因为 ",[38,15403,15405,15432],{"className":15404,"translate":42},[41],[38,15406,15408],{"className":15407},[46],[48,15409,15410],{"xmlns":50},[52,15411,15412,15429],{},[55,15413,15414,15425,15427],{},[67,15415,15416,15419],{},[58,15417,15418],{},"e",[55,15420,15421,15423],{},[63,15422,12535],{},[58,15424,14024],{"mathvariant":400},[63,15426,340],{},[203,15428,404],{},[77,15430,15431],{"encoding":79},"e^{-\\infty} = 0",[38,15433,15435,15486],{"className":15434,"ariaHidden":85},[84],[38,15436,15438,15442,15477,15480,15483],{"className":15437},[89],[38,15439],{"className":15440,"style":15441},[93],"height:0.7713em;",[38,15443,15445,15448],{"className":15444},[98],[38,15446,15418],{"className":15447},[98,167],[38,15449,15451],{"className":15450},[136],[38,15452,15454],{"className":15453},[140],[38,15455,15457],{"className":15456},[144],[38,15458,15460],{"className":15459,"style":15441},[148],[38,15461,15462,15465],{"style":151},[38,15463],{"className":15464,"style":156},[155],[38,15466,15468],{"className":15467},[160,161,162,163],[38,15469,15471,15474],{"className":15470},[98,163],[38,15472,12535],{"className":15473},[98,163],[38,15475,14024],{"className":15476},[98,163],[38,15478],{"className":15479,"style":111},[110],[38,15481,340],{"className":15482},[115],[38,15484],{"className":15485,"style":111},[110],[38,15487,15489,15492],{"className":15488},[89],[38,15490],{"className":15491,"style":856},[93],[38,15493,404],{"className":15494},[98
],"）。这样一来，Decoder 在移动到 ",[38,15497,15499,15516],{"className":15498,"translate":42},[41],[38,15500,15502],{"className":15501},[46],[48,15503,15504],{"xmlns":50},[52,15505,15506,15514],{},[55,15507,15508],{},[330,15509,15510,15512],{},[58,15511,12442],{},[203,15513,348],{},[77,15515,13805],{"encoding":79},[38,15517,15519],{"className":15518,"ariaHidden":85},[84],[38,15520,15522,15525],{"className":15521},[89],[38,15523],{"className":15524,"style":13081},[93],[38,15526,15528,15531],{"className":15527},[98],[38,15529,12442],{"className":15530,"style":9050},[98,167],[38,15532,15534],{"className":15533},[136],[38,15535,15537,15557],{"className":15536},[140,436],[38,15538,15540,15554],{"className":15539},[144],[38,15541,15543],{"className":15542,"style":509},[148],[38,15544,15545,15548],{"style":12582},[38,15546],{"className":15547,"style":156},[155],[38,15549,15551],{"className":15550},[160,161,162,163],[38,15552,348],{"className":15553},[98,163],[38,15555,464],{"className":15556},[463],[38,15558,15560],{"className":15559},[144],[38,15561,15563],{"className":15562,"style":531},[148],[38,15564],{}," 
，计算其含义表示时，",[38,15567,15569,15602],{"className":15568,"translate":42},[41],[38,15570,15572],{"className":15571},[46],[48,15573,15574],{"xmlns":50},[52,15575,15576,15600],{},[55,15577,15578,15584,15586,15592,15594],{},[330,15579,15580,15582],{},[58,15581,12442],{},[203,15583,368],{},[63,15585,380],{"separator":85},[330,15587,15588,15590],{},[58,15589,12442],{},[203,15591,205],{},[63,15593,380],{"separator":85},[330,15595,15596,15598],{},[58,15597,12442],{},[203,15599,14896],{},[77,15601,14899],{"encoding":79},[38,15603,15605],{"className":15604,"ariaHidden":85},[84],[38,15606,15608,15611,15651,15654,15657,15697,15700,15703],{"className":15607},[89],[38,15609],{"className":15610,"style":13081},[93],[38,15612,15614,15617],{"className":15613},[98],[38,15615,12442],{"className":15616,"style":9050},[98,167],[38,15618,15620],{"className":15619},[136],[38,15621,15623,15643],{"className":15622},[140,436],[38,15624,15626,15640],{"className":15625},[144],[38,15627,15629],{"className":15628,"style":509},[148],[38,15630,15631,15634],{"style":12582},[38,15632],{"className":15633,"style":156},[155],[38,15635,15637],{"className":15636},[160,161,162,163],[38,15638,368],{"className":15639},[98,163],[38,15641,464],{"className":15642},[463],[38,15644,15646],{"className":15645},[144],[38,15647,15649],{"className":15648,"style":531},[148],[38,15650],{},[38,15652,380],{"className":15653},[537],[38,15655],{"className":15656,"style":541},[110],[38,15658,15660,15663],{"className":15659},[98],[38,15661,12442],{"className":15662,"style":9050},[98,167],[38,15664,15666],{"className":15665},[136],[38,15667,15669,15689],{"className":15668},[140,436],[38,15670,15672,15686],{"className":15671},[144],[38,15673,15675],{"className":15674,"style":509},[148],[38,15676,15677,15680],{"style":12582},[38,15678],{"className":15679,"style":156},[155],[38,15681,15683],{"className":15682},[160,161,162,163],[38,15684,205],{"className":15685},[98,163],[38,15687,464],{"className":15688},[463],[38,15690,15692],{"cla
ssName":15691},[144],[38,15693,15695],{"className":15694,"style":531},[148],[38,15696],{},[38,15698,380],{"className":15699},[537],[38,15701],{"className":15702,"style":541},[110],[38,15704,15706,15709],{"className":15705},[98],[38,15707,12442],{"className":15708,"style":9050},[98,167],[38,15710,15712],{"className":15711},[136],[38,15713,15715,15735],{"className":15714},[140,436],[38,15716,15718,15732],{"className":15717},[144],[38,15719,15721],{"className":15720,"style":509},[148],[38,15722,15723,15726],{"style":12582},[38,15724],{"className":15725,"style":156},[155],[38,15727,15729],{"className":15728},[160,161,162,163],[38,15730,14896],{"className":15731},[98,163],[38,15733,464],{"className":15734},[463],[38,15736,15738],{"className":15737},[144],[38,15739,15741],{"className":15740,"style":531},[148],[38,15742],{}," 的含义贡献就是 0。当 Decoder 移动到 ",[38,15745,15747,15764],{"className":15746,"translate":42},[41],[38,15748,15750],{"className":15749},[46],[48,15751,15752],{"xmlns":50},[52,15753,15754,15762],{},[55,15755,15756],{},[330,15757,15758,15760],{},[58,15759,12442],{},[203,15761,368],{},[77,15763,13876],{"encoding":79},[38,15765,15767],{"className":15766,"ariaHidden":85},[84],[38,15768,15770,15773],{"className":15769},[89],[38,15771],{"className":15772,"style":13081},[93],[38,15774,15776,15779],{"className":15775},[98],[38,15777,12442],{"className":15778,"style":9050},[98,167],[38,15780,15782],{"className":15781},[136],[38,15783,15785,15805],{"className":15784},[140,436],[38,15786,15788,15802],{"className":15787},[144],[38,15789,15791],{"className":15790,"style":509},[148],[38,15792,15793,15796],{"style":12582},[38,15794],{"className":15795,"style":156},[155],[38,15797,15799],{"className":15798},[160,161,162,163],[38,15800,368],{"className":15801},[98,163],[38,15803,464],{"className":15804},[463],[38,15806,15808],{"className":15807},[144],[38,15809,15811],{"className":15810,"style":531},[148],[38,15812],{}," 时，其含义表示为自身含义与 
",[38,15815,15817,15834],{"className":15816,"translate":42},[41],[38,15818,15820],{"className":15819},[46],[48,15821,15822],{"xmlns":50},[52,15823,15824,15832],{},[55,15825,15826],{},[330,15827,15828,15830],{},[58,15829,12442],{},[203,15831,348],{},[77,15833,13805],{"encoding":79},[38,15835,15837],{"className":15836,"ariaHidden":85},[84],[38,15838,15840,15843],{"className":15839},[89],[38,15841],{"className":15842,"style":13081},[93],[38,15844,15846,15849],{"className":15845},[98],[38,15847,12442],{"className":15848,"style":9050},[98,167],[38,15850,15852],{"className":15851},[136],[38,15853,15855,15875],{"className":15854},[140,436],[38,15856,15858,15872],{"className":15857},[144],[38,15859,15861],{"className":15860,"style":509},[148],[38,15862,15863,15866],{"style":12582},[38,15864],{"className":15865,"style":156},[155],[38,15867,15869],{"className":15868},[160,161,162,163],[38,15870,348],{"className":15871},[98,163],[38,15873,464],{"className":15874},[463],[38,15876,15878],{"className":15877},[144],[38,15879,15881],{"className":15880,"style":531},[148],[38,15882],{}," 
含义的加权组合，",[38,15885,15887,15912],{"className":15886,"translate":42},[41],[38,15888,15890],{"className":15889},[46],[48,15891,15892],{"xmlns":50},[52,15893,15894,15910],{},[55,15895,15896,15902,15904],{},[330,15897,15898,15900],{},[58,15899,12442],{},[203,15901,205],{},[63,15903,380],{"separator":85},[330,15905,15906,15908],{},[58,15907,12442],{},[203,15909,14896],{},[77,15911,15266],{"encoding":79},[38,15913,15915],{"className":15914,"ariaHidden":85},[84],[38,15916,15918,15921,15961,15964,15967],{"className":15917},[89],[38,15919],{"className":15920,"style":13081},[93],[38,15922,15924,15927],{"className":15923},[98],[38,15925,12442],{"className":15926,"style":9050},[98,167],[38,15928,15930],{"className":15929},[136],[38,15931,15933,15953],{"className":15932},[140,436],[38,15934,15936,15950],{"className":15935},[144],[38,15937,15939],{"className":15938,"style":509},[148],[38,15940,15941,15944],{"style":12582},[38,15942],{"className":15943,"style":156},[155],[38,15945,15947],{"className":15946},[160,161,162,163],[38,15948,205],{"className":15949},[98,163],[38,15951,464],{"className":15952},[463],[38,15954,15956],{"className":15955},[144],[38,15957,15959],{"className":15958,"style":531},[148],[38,15960],{},[38,15962,380],{"className":15963},[537],[38,15965],{"className":15966,"style":541},[110],[38,15968,15970,15973],{"className":15969},[98],[38,15971,12442],{"className":15972,"style":9050},[98,167],[38,15974,15976],{"className":15975},[136],[38,15977,15979,15999],{"className":15978},[140,436],[38,15980,15982,15996],{"className":15981},[144],[38,15983,15985],{"className":15984,"style":509},[148],[38,15986,15987,15990],{"style":12582},[38,15988],{"className":15989,"style":156},[155],[38,15991,15993],{"className":15992},[160,161,162,163],[38,15994,14896],{"className":15995},[98,163],[38,15997,464],{"className":15998},[463],[38,16000,16002],{"className":16001},[144],[38,16003,16005],{"className":16004,"style":531},[148],[38,16006],{}," 的含义贡献就是 
0。",[10,16009,16010],{},"掩码注意力的公式可以表示为：",[38,16012,16014],{"className":16013,"translate":42},[315],[38,16015,16017,16081],{"className":16016,"translate":42},[41],[38,16018,16020],{"className":16019},[46],[48,16021,16022],{"xmlns":50,"display":324},[52,16023,16024,16078],{},[55,16025,16026,16028,16030,16032,16034,16036,16038,16040,16042,16044,16046,16074,16076],{},[335,16027,9223],{},[63,16029,1500],{"stretchy":1499},[58,16031,8941],{},[63,16033,380],{"separator":85},[58,16035,8958],{},[63,16037,380],{"separator":85},[58,16039,1390],{},[63,16041,1542],{"stretchy":1499},[63,16043,340],{},[335,16045,6129],{},[55,16047,16048,16050,16072],{},[63,16049,1500],{"fence":85},[9592,16051,16052,16068],{},[55,16053,16054,16056,16058,16064,16066],{},[58,16055,8941],{},[63,16057,351],{"separator":85},[67,16059,16060,16062],{},[58,16061,8958],{},[58,16063,1960],{},[63,16065,361],{},[58,16067,8623],{},[9476,16069,16070],{},[58,16071,75],{},[63,16073,1542],{"fence":85},[63,16075,351],{"separator":85},[58,16077,1390],{},[77,16079,16080],{"encoding":79},"\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{Q·K^T + M}{\\sqrt{d}}\\right) · 
V",[38,16082,16084,16132],{"className":16083,"ariaHidden":85},[84],[38,16085,16087,16090,16096,16099,16102,16105,16108,16111,16114,16117,16120,16123,16126,16129],{"className":16086},[89],[38,16088],{"className":16089,"style":1574},[93],[38,16091,16093],{"className":16092},[98,456],[38,16094,9223],{"className":16095},[98],[38,16097,1500],{"className":16098},[1578],[38,16100,8941],{"className":16101},[98,167],[38,16103,380],{"className":16104},[537],[38,16106],{"className":16107,"style":541},[110],[38,16109,8958],{"className":16110,"style":9074},[98,167],[38,16112,380],{"className":16113},[537],[38,16115],{"className":16116,"style":541},[110],[38,16118,1390],{"className":16119,"style":1423},[98,167],[38,16121,1542],{"className":16122},[1794],[38,16124],{"className":16125,"style":111},[110],[38,16127,340],{"className":16128},[115],[38,16130],{"className":16131,"style":111},[110],[38,16133,16135,16139,16145,16148,16316,16319,16322,16325],{"className":16134},[89],[38,16136],{"className":16137,"style":16138},[93],"height:2.4684em;vertical-align:-0.95em;",[38,16140,16142],{"className":16141},[98,456],[38,16143,6129],{"className":16144},[98],[38,16146],{"className":16147,"style":541},[110],[38,16149,16151,16157,16310],{"className":16150},[3432],[38,16152,16154],{"className":16153,"style":4111},[1578,4110],[38,16155,1500],{"className":16156},[3439,162],[38,16158,16160,16163,16307],{"className":16159},[98],[38,16161],{"className":16162},[1578,4348],[38,16164,16166],{"className":16165},[9592],[38,16167,16169,16299],{"className":16168},[140,436],[38,16170,16172,16296],{"className":16171},[144],[38,16173,16175,16230,16238],{"className":16174,"style":9704},[148],[38,16176,16177,16180],{"style":9707},[38,16178],{"className":16179,"style":3510},[155],[38,16181,16183],{"className":16182},[98],[38,16184,16186],{"className":16185},[98,9496],[38,16187,16189,16222],{"className":16188},[140,436],[38,16190,16192,16219],{"className":16191},[144],[38,16193,16195,16207],{"className":16194,"s
tyle":9506},[148],[38,16196,16198,16201],{"className":16197,"style":6172},[9510],[38,16199],{"className":16200,"style":3510},[155],[38,16202,16204],{"className":16203,"style":9517},[98],[38,16205,75],{"className":16206},[98,167],[38,16208,16209,16212],{"style":9523},[38,16210],{"className":16211,"style":3510},[155],[38,16213,16215],{"className":16214,"style":9531},[9530],[3462,16216,16217],{"xmlns":3464,"width":9534,"height":9535,"viewBox":9536,"preserveAspectRatio":9537},[3469,16218],{"d":9540},[38,16220,464],{"className":16221},[463],[38,16223,16225],{"className":16224},[144],[38,16226,16228],{"className":16227,"style":9550},[148],[38,16229],{},[38,16231,16232,16235],{"style":9763},[38,16233],{"className":16234,"style":3510},[155],[38,16236],{"className":16237,"style":9771},[9770],[38,16239,16240,16243],{"style":9774},[38,16241],{"className":16242,"style":3510},[155],[38,16244,16246,16249,16252,16255,16284,16287,16290,16293],{"className":16245},[98],[38,16247,8941],{"className":16248},[98,167],[38,16250,351],{"className":16251},[537],[38,16253],{"className":16254,"style":541},[110],[38,16256,16258,16261],{"className":16257},[98],[38,16259,8958],{"className":16260,"style":9074},[98,167],[38,16262,16264],{"className":16263},[136],[38,16265,16267],{"className":16266},[140],[38,16268,16270],{"className":16269},[144],[38,16271,16273],{"className":16272,"style":9450},[148],[38,16274,16275,16278],{"style":151},[38,16276],{"className":16277,"style":156},[155],[38,16279,16281],{"className":16280},[160,161,162,163],[38,16282,1960],{"className":16283,"style":1842},[98,167,163],[38,16285],{"className":16286,"style":595},[110],[38,16288,361],{"className":16289},[599],[38,16291],{"className":16292,"style":595},[110],[38,16294,8623],{"className":16295,"style":8646},[98,167],[38,16297,464],{"className":16298},[463],[38,16300,16302],{"className":16301},[144],[38,16303,16305],{"className":16304,"style":9828},[148],[38,16306],{},[38,16308],{"className":16309},[1794,4348],[38,16311,1
6313],{"className":16312,"style":4111},[1794,4110],[38,16314,1542],{"className":16315},[3439,162],[38,16317],{"className":16318,"style":541},[110],[38,16320,351],{"className":16321},[537],[38,16323],{"className":16324,"style":541},[110],[38,16326,1390],{"className":16327,"style":1423},[98,167],[10,16329,8357,16330,16358,16359,16460,16461,16515,16516,14114],{},[38,16331,16333,16346],{"className":16332,"translate":42},[41],[38,16334,16336],{"className":16335},[46],[48,16337,16338],{"xmlns":50},[52,16339,16340,16344],{},[55,16341,16342],{},[58,16343,8623],{},[77,16345,8623],{"encoding":79},[38,16347,16349],{"className":16348,"ariaHidden":85},[84],[38,16350,16352,16355],{"className":16351},[89],[38,16353],{"className":16354,"style":1555},[93],[38,16356,8623],{"className":16357,"style":8646},[98,167]," 就是掩码矩阵（",[38,16360,16362,16388],{"className":16361,"translate":42},[41],[38,16363,16365],{"className":16364},[46],[48,16366,16367],{"xmlns":50},[52,16368,16369,16385],{},[55,16370,16371,16381,16383],{},[330,16372,16373,16375],{},[58,16374,8623],{},[55,16376,16377,16379],{},[58,16378,1977],{},[58,16380,3326],{},[63,16382,340],{},[203,16384,404],{},[77,16386,16387],{"encoding":79},"M_{ij} = 
0",[38,16389,16391,16451],{"className":16390,"ariaHidden":85},[84],[38,16392,16394,16397,16442,16445,16448],{"className":16393},[89],[38,16395],{"className":16396,"style":9016},[93],[38,16398,16400,16403],{"className":16399},[98],[38,16401,8623],{"className":16402,"style":8646},[98,167],[38,16404,16406],{"className":16405},[136],[38,16407,16409,16434],{"className":16408},[140,436],[38,16410,16412,16431],{"className":16411},[144],[38,16413,16415],{"className":16414,"style":2196},[148],[38,16416,16418,16421],{"style":16417},"top:-2.55em;margin-left:-0.109em;margin-right:0.05em;",[38,16419],{"className":16420,"style":156},[155],[38,16422,16424],{"className":16423},[160,161,162,163],[38,16425,16427],{"className":16426},[98,163],[38,16428,16430],{"className":16429,"style":3704},[98,167,163],"ij",[38,16432,464],{"className":16433},[463],[38,16435,16437],{"className":16436},[144],[38,16438,16440],{"className":16439,"style":471},[148],[38,16441],{},[38,16443],{"className":16444,"style":111},[110],[38,16446,340],{"className":16447},[115],[38,16449],{"className":16450,"style":111},[110],[38,16452,16454,16457],{"className":16453},[89],[38,16455],{"className":16456,"style":856},[93],[38,16458,404],{"className":16459},[98]," 如果 ",[38,16462,16464,16483],{"className":16463,"translate":42},[41],[38,16465,16467],{"className":16466},[46],[48,16468,16469],{"xmlns":50},[52,16470,16471,16480],{},[55,16472,16473,16475,16478],{},[58,16474,1977],{},[63,16476,16477],{},"≥",[58,16479,3326],{},[77,16481,16482],{"encoding":79},"i \\ge 
j",[38,16484,16486,16505],{"className":16485,"ariaHidden":85},[84],[38,16487,16489,16493,16496,16499,16502],{"className":16488},[89],[38,16490],{"className":16491,"style":16492},[93],"height:0.7955em;vertical-align:-0.136em;",[38,16494,1977],{"className":16495},[98,167],[38,16497],{"className":16498,"style":111},[110],[38,16500,16477],{"className":16501},[115],[38,16503],{"className":16504,"style":111},[110],[38,16506,16508,16512],{"className":16507},[89],[38,16509],{"className":16510,"style":16511},[93],"height:0.854em;vertical-align:-0.1944em;",[38,16513,3326],{"className":16514,"style":3704},[98,167],"，否则 ",[38,16517,16519,16547],{"className":16518,"translate":42},[41],[38,16520,16522],{"className":16521},[46],[48,16523,16524],{"xmlns":50},[52,16525,16526,16544],{},[55,16527,16528,16538,16540,16542],{},[330,16529,16530,16532],{},[58,16531,8623],{},[55,16533,16534,16536],{},[58,16535,1977],{},[58,16537,3326],{},[63,16539,340],{},[63,16541,12535],{},[58,16543,14024],{"mathvariant":400},[77,16545,16546],{"encoding":79},"M_{ij} = 
-\\infty",[38,16548,16550,16608],{"className":16549,"ariaHidden":85},[84],[38,16551,16553,16556,16599,16602,16605],{"className":16552},[89],[38,16554],{"className":16555,"style":9016},[93],[38,16557,16559,16562],{"className":16558},[98],[38,16560,8623],{"className":16561,"style":8646},[98,167],[38,16563,16565],{"className":16564},[136],[38,16566,16568,16591],{"className":16567},[140,436],[38,16569,16571,16588],{"className":16570},[144],[38,16572,16574],{"className":16573,"style":2196},[148],[38,16575,16576,16579],{"style":16417},[38,16577],{"className":16578,"style":156},[155],[38,16580,16582],{"className":16581},[160,161,162,163],[38,16583,16585],{"className":16584},[98,163],[38,16586,16430],{"className":16587,"style":3704},[98,167,163],[38,16589,464],{"className":16590},[463],[38,16592,16594],{"className":16593},[144],[38,16595,16597],{"className":16596,"style":471},[148],[38,16598],{},[38,16600],{"className":16601,"style":111},[110],[38,16603,340],{"className":16604},[115],[38,16606],{"className":16607,"style":111},[110],[38,16609,16611,16614,16617],{"className":16610},[89],[38,16612],{"className":16613,"style":11002},[93],[38,16615,12535],{"className":16616},[98],[38,16618,14024],{"className":16619},[98],[31,16621,16623],{"id":16622},"decoder-输出与交叉注意力","Decoder Output and Cross-Attention",[10,16625,16626,16627,16630,16631,16634,16635,16638],{},"When generating a translation, the Decoder cannot just talk to itself; it has to consult the Encoder's encoding of the source sentence. For example, when translating ",[12397,16628,16629],{},"我爱机器学习",", the Decoder, while producing ",[12397,16632,16633],{},"machine learning",", needs the representation the Encoder built for ",[12397,16636,16637],{},"机器学习",".",[10,16640,16641],{},"This is where the Cross-Attention layer comes in.",[10,16643,16644],{},"In Cross-Attention, Q, K and V come from different sources than in the Encoder's Self-Attention:",[14717,16646,16647,16653,16659],{},[12342,16648,16649,16652],{},[170,16650,16651],{},"Q (Query)",": comes from the Decoder's current word representation, which is a weighted combination of that word's meaning and the meanings of all preceding words.",[12342,16654,16655,16658],{},[170,16656,16657],{},"K (Key)",": comes from the Encoder's output (that is, “what information in the source language is available for me to query”).",[12342,16660,16661,16664],{},[170,16662,16663],{},"V (Value)",": also comes from the Encoder's 
output (that is, “what that information actually contains”).",[10,16666,16667],{},"The formula is similar to the earlier one:",[38,16669,16671],{"className":16670,"translate":42},[315],[38,16672,16674,16759],{"className":16673,"translate":42},[41],[38,16675,16677],{"className":16676},[46],[48,16678,16679],{"xmlns":50,"display":324},[52,16680,16681,16756],{},[55,16682,16683,16686,16688,16695,16697,16704,16706,16712,16714,16716,16718,16748,16750],{},[335,16684,16685],{},"CrossAttention",[63,16687,1500],{"stretchy":1499},[330,16689,16690,16692],{},[58,16691,8941],{},[335,16693,16694],{},"dec",[63,16696,380],{"separator":85},[330,16698,16699,16701],{},[58,16700,8958],{},[335,16702,16703],{},"enc",[63,16705,380],{"separator":85},[330,16707,16708,16710],{},[58,16709,1390],{},[335,16711,16703],{},[63,16713,1542],{"stretchy":1499},[63,16715,340],{},[335,16717,6129],{},[55,16719,16720,16722,16746],{},[63,16721,1500],{"fence":85},[9592,16723,16724,16742],{},[55,16725,16726,16732,16734],{},[330,16727,16728,16730],{},[58,16729,8941],{},[335,16731,16694],{},[63,16733,351],{"separator":85},[7932,16735,16736,16738,16740],{},[58,16737,8958],{},[335,16739,16703],{},[58,16741,1960],{},[9476,16743,16744],{},[58,16745,75],{},[63,16747,1542],{"fence":85},[63,16749,351],{"separator":85},[330,16751,16752,16754],{},[58,16753,1390],{},[335,16755,16703],{},[77,16757,16758],{"encoding":79},"\\text{CrossAttention}(Q_{\\text{dec}}, K_{\\text{enc}}, V_{\\text{enc}}) = \\text{softmax}\\left(\\frac{Q_{\\text{dec}}·K_{\\text{enc}}^T}{\\sqrt{d}}\\right) · 
V_{\\text{enc}}",[38,16760,16762,16939],{"className":16761,"ariaHidden":85},[84],[38,16763,16765,16768,16774,16777,16823,16826,16829,16875,16878,16881,16927,16930,16933,16936],{"className":16764},[89],[38,16766],{"className":16767,"style":1574},[93],[38,16769,16771],{"className":16770},[98,456],[38,16772,16685],{"className":16773},[98],[38,16775,1500],{"className":16776},[1578],[38,16778,16780,16783],{"className":16779},[98],[38,16781,8941],{"className":16782},[98,167],[38,16784,16786],{"className":16785},[136],[38,16787,16789,16815],{"className":16788},[140,436],[38,16790,16792,16812],{"className":16791},[144],[38,16793,16795],{"className":16794,"style":566},[148],[38,16796,16797,16800],{"style":8331},[38,16798],{"className":16799,"style":156},[155],[38,16801,16803],{"className":16802},[160,161,162,163],[38,16804,16806],{"className":16805},[98,163],[38,16807,16809],{"className":16808},[98,456,163],[38,16810,16694],{"className":16811},[98,163],[38,16813,464],{"className":16814},[463],[38,16816,16818],{"className":16817},[144],[38,16819,16821],{"className":16820,"style":531},[148],[38,16822],{},[38,16824,380],{"className":16825},[537],[38,16827],{"className":16828,"style":541},[110],[38,16830,16832,16835],{"className":16831},[98],[38,16833,8958],{"className":16834,"style":9074},[98,167],[38,16836,16838],{"className":16837},[136],[38,16839,16841,16867],{"className":16840},[140,436],[38,16842,16844,16864],{"className":16843},[144],[38,16845,16847],{"className":16846,"style":443},[148],[38,16848,16849,16852],{"style":10663},[38,16850],{"className":16851,"style":156},[155],[38,16853,16855],{"className":16854},[160,161,162,163],[38,16856,16858],{"className":16857},[98,163],[38,16859,16861],{"className":16860},[98,456,163],[38,16862,16703],{"className":16863},[98,163],[38,16865,464],{"className":16866},[463],[38,16868,16870],{"className":16869},[144],[38,16871,16873],{"className":16872,"style":531},[148],[38,16874],{},[38,16876,380],{"className":16877},[537],[38,16879],{"c
lassName":16880,"style":541},[110],[38,16882,16884,16887],{"className":16883},[98],[38,16885,1390],{"className":16886,"style":1423},[98,167],[38,16888,16890],{"className":16889},[136],[38,16891,16893,16919],{"className":16892},[140,436],[38,16894,16896,16916],{"className":16895},[144],[38,16897,16899],{"className":16898,"style":443},[148],[38,16900,16901,16904],{"style":10710},[38,16902],{"className":16903,"style":156},[155],[38,16905,16907],{"className":16906},[160,161,162,163],[38,16908,16910],{"className":16909},[98,163],[38,16911,16913],{"className":16912},[98,456,163],[38,16914,16703],{"className":16915},[98,163],[38,16917,464],{"className":16918},[463],[38,16920,16922],{"className":16921},[144],[38,16923,16925],{"className":16924,"style":531},[148],[38,16926],{},[38,16928,1542],{"className":16929},[1794],[38,16931],{"className":16932,"style":111},[110],[38,16934,340],{"className":16935},[115],[38,16937],{"className":16938,"style":111},[110],[38,16940,16942,16945,16951,16954,17182,17185,17188,17191],{"className":16941},[89],[38,16943],{"className":16944,"style":16138},[93],[38,16946,16948],{"className":16947},[98,456],[38,16949,6129],{"className":16950},[98],[38,16952],{"className":16953,"style":541},[110],[38,16955,16957,16963,17176],{"className":16956},[3432],[38,16958,16960],{"className":16959,"style":4111},[1578,4110],[38,16961,1500],{"className":16962},[3439,162],[38,16964,16966,16969,17173],{"className":16965},[98],[38,16967],{"className":16968},[1578,4348],[38,16970,16972],{"className":16971},[9592],[38,16973,16975,17165],{"className":16974},[140,436],[38,16976,16978,17162],{"className":16977},[144],[38,16979,16981,17036,17044],{"className":16980,"style":9704},[148],[38,16982,16983,16986],{"style":9707},[38,16984],{"className":16985,"style":3510},[155],[38,16987,16989],{"className":16988},[98],[38,16990,16992],{"className":16991},[98,9496],[38,16993,16995,17028],{"className":16994},[140,436],[38,16996,16998,17025],{"className":16997},[144],[38,16999,1700
1,17013],{"className":17000,"style":9506},[148],[38,17002,17004,17007],{"className":17003,"style":6172},[9510],[38,17005],{"className":17006,"style":3510},[155],[38,17008,17010],{"className":17009,"style":9517},[98],[38,17011,75],{"className":17012},[98,167],[38,17014,17015,17018],{"style":9523},[38,17016],{"className":17017,"style":3510},[155],[38,17019,17021],{"className":17020,"style":9531},[9530],[3462,17022,17023],{"xmlns":3464,"width":9534,"height":9535,"viewBox":9536,"preserveAspectRatio":9537},[3469,17024],{"d":9540},[38,17026,464],{"className":17027},[463],[38,17029,17031],{"className":17030},[144],[38,17032,17034],{"className":17033,"style":9550},[148],[38,17035],{},[38,17037,17038,17041],{"style":9763},[38,17039],{"className":17040,"style":3510},[155],[38,17042],{"className":17043,"style":9771},[9770],[38,17045,17046,17049],{"style":9774},[38,17047],{"className":17048,"style":3510},[155],[38,17050,17052,17098,17101,17104],{"className":17051},[98],[38,17053,17055,17058],{"className":17054},[98],[38,17056,8941],{"className":17057},[98,167],[38,17059,17061],{"className":17060},[136],[38,17062,17064,17090],{"className":17063},[140,436],[38,17065,17067,17087],{"className":17066},[144],[38,17068,17070],{"className":17069,"style":566},[148],[38,17071,17072,17075],{"style":8331},[38,17073],{"className":17074,"style":156},[155],[38,17076,17078],{"className":17077},[160,161,162,163],[38,17079,17081],{"className":17080},[98,163],[38,17082,17084],{"className":17083},[98,456,163],[38,17085,16694],{"className":17086},[98,163],[38,17088,464],{"className":17089},[463],[38,17091,17093],{"className":17092},[144],[38,17094,17096],{"className":17095,"style":531},[148],[38,17097],{},[38,17099,351],{"className":17100},[537],[38,17102],{"className":17103,"style":541},[110],[38,17105,17107,17110],{"className":17106},[98],[38,17108,8958],{"className":17109,"style":9074},[98,167],[38,17111,17113],{"className":17112},[136],[38,17114,17116,17154],{"className":17115},[140,436],[38,17
117,17119,17151],{"className":17118},[144],[38,17120,17122,17140],{"className":17121,"style":9450},[148],[38,17123,17125,17128],{"style":17124},"top:-2.453em;margin-left:-0.0715em;margin-right:0.05em;",[38,17126],{"className":17127,"style":156},[155],[38,17129,17131],{"className":17130},[160,161,162,163],[38,17132,17134],{"className":17133},[98,163],[38,17135,17137],{"className":17136},[98,456,163],[38,17138,16703],{"className":17139},[98,163],[38,17141,17142,17145],{"style":151},[38,17143],{"className":17144,"style":156},[155],[38,17146,17148],{"className":17147},[160,161,162,163],[38,17149,1960],{"className":17150,"style":1842},[98,167,163],[38,17152,464],{"className":17153},[463],[38,17155,17157],{"className":17156},[144],[38,17158,17160],{"className":17159,"style":8031},[148],[38,17161],{},[38,17163,464],{"className":17164},[463],[38,17166,17168],{"className":17167},[144],[38,17169,17171],{"className":17170,"style":9828},[148],[38,17172],{},[38,17174],{"className":17175},[1794,4348],[38,17177,17179],{"className":17178,"style":4111},[1794,4110],[38,17180,1542],{"className":17181},[3439,162],[38,17183],{"className":17184,"style":541},[110],[38,17186,351],{"className":17187},[537],[38,17189],{"className":17190,"style":541},[110],[38,17192,17194,17197],{"className":17193},[98],[38,17195,1390],{"className":17196,"style":1423},[98,167],[38,17198,17200],{"className":17199},[136],[38,17201,17203,17229],{"className":17202},[140,436],[38,17204,17206,17226],{"className":17205},[144],[38,17207,17209],{"className":17208,"style":443},[148],[38,17210,17211,17214],{"style":10710},[38,17212],{"className":17213,"style":156},[155],[38,17215,17217],{"className":17216},[160,161,162,163],[38,17218,17220],{"className":17219},[98,163],[38,17221,17223],{"className":17222},[98,456,163],[38,17224,16703],{"className":17225},[98,163],[38,17227,464],{"className":17228},[463],[38,17230,17232],{"className":17231},[144],[38,17233,17235],{"className":17234,"style":531},[148],[38,17236],{},[12339
,17238,17239,17242,17253],{},[12342,17240,17241],{},"Suppose the Decoder is about to generate the word “love”; its Q matrix carries the intent “I am translating 「爱」”.",[12342,17243,17244,17245,17248,17249,17252],{},"This Q is compared against the Encoder's K, and the comparison finds that ",[12397,17246,17247],{},"love"," is highly similar in meaning to the source-language word ",[12397,17250,17251],{},"爱",".",[12342,17254,17255,17256,17258,17259,17261],{},"The attention weights therefore concentrate on the position of ",[12397,17257,17251],{},", and the V at that position (the semantic vector of ",[12397,17260,17251],{},") is extracted as a weighted sum and merged into the Decoder's current representation.",[10,17263,17264],{},"With this, the Decoder knows both “which words I have already generated” (via Masked Self-Attention) and “what the source sentence says” (via Cross-Attention), which is what it needs to carry out the translation.",[31,17266,17267],{"id":17267},"可视化翻译步骤",[10,17269,17270,17271,17274,17275,179],{},"Suppose the Encoder receives the input ",[12397,17272,17273],{},"我 爱 AI"," and, after encoding it, produces the output ",[38,17276,17278,17321],{"className":17277,"translate":42},[41],[38,17279,17281],{"className":17280},[46],[48,17282,17283],{"xmlns":50},[52,17284,17285,17318],{},[55,17286,17287,17290,17292,17294,17300,17302,17308,17310,17316],{},[58,17288,17289],{},"E",[63,17291,340],{},[63,17293,1500],{"stretchy":1499},[330,17295,17296,17298],{},[58,17297,15418],{},[203,17299,348],{},[63,17301,380],{"separator":85},[330,17303,17304,17306],{},[58,17305,15418],{},[203,17307,368],{},[63,17309,380],{"separator":85},[330,17311,17312,17314],{},[58,17313,15418],{},[203,17315,205],{},[63,17317,1542],{"stretchy":1499},[77,17319,17320],{"encoding":79},"E = (e_1, e_2, 
e_3)",[38,17322,17324,17343],{"className":17323,"ariaHidden":85},[84],[38,17325,17327,17330,17334,17337,17340],{"className":17326},[89],[38,17328],{"className":17329,"style":1555},[93],[38,17331,17289],{"className":17332,"style":17333},[98,167],"margin-right:0.05764em;",[38,17335],{"className":17336,"style":111},[110],[38,17338,340],{"className":17339},[115],[38,17341],{"className":17342,"style":111},[110],[38,17344,17346,17349,17352,17392,17395,17398,17438,17441,17444,17484],{"className":17345},[89],[38,17347],{"className":17348,"style":1574},[93],[38,17350,1500],{"className":17351},[1578],[38,17353,17355,17358],{"className":17354},[98],[38,17356,15418],{"className":17357},[98,167],[38,17359,17361],{"className":17360},[136],[38,17362,17364,17384],{"className":17363},[140,436],[38,17365,17367,17381],{"className":17366},[144],[38,17368,17370],{"className":17369,"style":509},[148],[38,17371,17372,17375],{"style":8331},[38,17373],{"className":17374,"style":156},[155],[38,17376,17378],{"className":17377},[160,161,162,163],[38,17379,348],{"className":17380},[98,163],[38,17382,464],{"className":17383},[463],[38,17385,17387],{"className":17386},[144],[38,17388,17390],{"className":17389,"style":531},[148],[38,17391],{},[38,17393,380],{"className":17394},[537],[38,17396],{"className":17397,"style":541},[110],[38,17399,17401,17404],{"className":17400},[98],[38,17402,15418],{"className":17403},[98,167],[38,17405,17407],{"className":17406},[136],[38,17408,17410,17430],{"className":17409},[140,436],[38,17411,17413,17427],{"className":17412},[144],[38,17414,17416],{"className":17415,"style":509},[148],[38,17417,17418,17421],{"style":8331},[38,17419],{"className":17420,"style":156},[155],[38,17422,17424],{"className":17423},[160,161,162,163],[38,17425,368],{"className":17426},[98,163],[38,17428,464],{"className":17429},[463],[38,17431,17433],{"className":17432},[144],[38,17434,17436],{"className":17435,"style":531},[148],[38,17437],{},[38,17439,380],{"className":17440},[537],[38,1
7442],{"className":17443,"style":541},[110],[38,17445,17447,17450],{"className":17446},[98],[38,17448,15418],{"className":17449},[98,167],[38,17451,17453],{"className":17452},[136],[38,17454,17456,17476],{"className":17455},[140,436],[38,17457,17459,17473],{"className":17458},[144],[38,17460,17462],{"className":17461,"style":509},[148],[38,17463,17464,17467],{"style":8331},[38,17465],{"className":17466,"style":156},[155],[38,17468,17470],{"className":17469},[160,161,162,163],[38,17471,205],{"className":17472},[98,163],[38,17474,464],{"className":17475},[463],[38,17477,17479],{"className":17478},[144],[38,17480,17482],{"className":17481,"style":531},[148],[38,17483],{},[38,17485,1542],{"className":17486},[1794],[10,17488,17489,17490,17493,17494,17497],{},"The Decoder is generating the third word (it has already produced ",[12397,17491,17492],{},"I love"," and now has to produce ",[12397,17495,17496],{},"AI","):",[12339,17499,17500,17524,17955],{},[12342,17501,17502,13128,17505,17508,17509,17511,17512,17514,17515,17517,17518,17520,17521,17523],{},[170,17503,17504],{},"Masked Self-Attention",[12397,17506,17507],{},"I"," and ",[12397,17510,17247],{}," pass through the causal mask. ",[12397,17513,17507],{}," attends only to itself and never sees ",[12397,17516,17247],{},", while ",[12397,17519,17247],{}," can attend to ",[12397,17522,17507],{}," and absorbs its meaning. Neither can see the future third position (it has not been generated yet).",[12342,17525,17526,17529,17530,17532,17533,17535,17536,17564,17565,17743,17744,17814,17815,17884,17885,17954],{},[170,17527,17528],{},"Cross-Attention",": the Decoder takes its current word representation (that of ",[12397,17531,17247],{},", a weighted mix of its own meaning and the meaning of ",[12397,17534,17507],{},") as ",[38,17537,17539,17552],{"className":17538,"translate":42},[41],[38,17540,17542],{"className":17541},[46],[48,17543,17544],{"xmlns":50},[52,17545,17546,17550],{},[55,17547,17548],{},[58,17549,8941],{},[77,17551,8941],{"encoding":79},[38,17553,17555],{"className":17554,"ariaHidden":85},[84],[38,17556,17558,17561],{"className":17557},[89],[38,17559],{"className":17560,"style":8997},[93],[38,17562,8941],{"className":17563},[98,167]," and compares it with the Encoder's three positions 
",[38,17566,17568,17602],{"className":17567,"translate":42},[41],[38,17569,17571],{"className":17570},[46],[48,17572,17573],{"xmlns":50},[52,17574,17575,17599],{},[55,17576,17577,17583,17585,17591,17593],{},[330,17578,17579,17581],{},[58,17580,15418],{},[203,17582,348],{},[63,17584,380],{"separator":85},[330,17586,17587,17589],{},[58,17588,15418],{},[203,17590,368],{},[63,17592,380],{"separator":85},[330,17594,17595,17597],{},[58,17596,15418],{},[203,17598,205],{},[77,17600,17601],{"encoding":79},"e_1, e_2, e_3",[38,17603,17605],{"className":17604,"ariaHidden":85},[84],[38,17606,17608,17611,17651,17654,17657,17697,17700,17703],{"className":17607},[89],[38,17609],{"className":17610,"style":13081},[93],[38,17612,17614,17617],{"className":17613},[98],[38,17615,15418],{"className":17616},[98,167],[38,17618,17620],{"className":17619},[136],[38,17621,17623,17643],{"className":17622},[140,436],[38,17624,17626,17640],{"className":17625},[144],[38,17627,17629],{"className":17628,"style":509},[148],[38,17630,17631,17634],{"style":8331},[38,17632],{"className":17633,"style":156},[155],[38,17635,17637],{"className":17636},[160,161,162,163],[38,17638,348],{"className":17639},[98,163],[38,17641,464],{"className":17642},[463],[38,17644,17646],{"className":17645},[144],[38,17647,17649],{"className":17648,"style":531},[148],[38,17650],{},[38,17652,380],{"className":17653},[537],[38,17655],{"className":17656,"style":541},[110],[38,17658,17660,17663],{"className":17659},[98],[38,17661,15418],{"className":17662},[98,167],[38,17664,17666],{"className":17665},[136],[38,17667,17669,17689],{"className":17668},[140,436],[38,17670,17672,17686],{"className":17671},[144],[38,17673,17675],{"className":17674,"style":509},[148],[38,17676,17677,17680],{"style":8331},[38,17678],{"className":17679,"style":156},[155],[38,17681,17683],{"className":17682},[160,161,162,163],[38,17684,368],{"className":17685},[98,163],[38,17687,464],{"className":17688},[463],[38,17690,17692],{"className":17691},[144],[38
,17693,17695],{"className":17694,"style":531},[148],[38,17696],{},[38,17698,380],{"className":17699},[537],[38,17701],{"className":17702,"style":541},[110],[38,17704,17706,17709],{"className":17705},[98],[38,17707,15418],{"className":17708},[98,167],[38,17710,17712],{"className":17711},[136],[38,17713,17715,17735],{"className":17714},[140,436],[38,17716,17718,17732],{"className":17717},[144],[38,17719,17721],{"className":17720,"style":509},[148],[38,17722,17723,17726],{"style":8331},[38,17724],{"className":17725,"style":156},[155],[38,17727,17729],{"className":17728},[160,161,162,163],[38,17730,205],{"className":17731},[98,163],[38,17733,464],{"className":17734},[463],[38,17736,17738],{"className":17737},[144],[38,17739,17741],{"className":17740,"style":531},[148],[38,17742],{}," by computing similarity scores. If the model has learned well, it should find that the source word still to be translated, “AI” (",[38,17745,17747,17765],{"className":17746,"translate":42},[41],[38,17748,17750],{"className":17749},[46],[48,17751,17752],{"xmlns":50},[52,17753,17754,17762],{},[55,17755,17756],{},[330,17757,17758,17760],{},[58,17759,15418],{},[203,17761,205],{},[77,17763,17764],{"encoding":79},"e_3",[38,17766,17768],{"className":17767,"ariaHidden":85},[84],[38,17769,17771,17774],{"className":17770},[89],[38,17772],{"className":17773,"style":5650},[93],[38,17775,17777,17780],{"className":17776},[98],[38,17778,15418],{"className":17779},[98,167],[38,17781,17783],{"className":17782},[136],[38,17784,17786,17806],{"className":17785},[140,436],[38,17787,17789,17803],{"className":17788},[144],[38,17790,17792],{"className":17791,"style":509},[148],[38,17793,17794,17797],{"style":8331},[38,17795],{"className":17796,"style":156},[155],[38,17798,17800],{"className":17799},[160,161,162,163],[38,17801,205],{"className":17802},[98,163],[38,17804,464],{"className":17805},[463],[38,17807,17809],{"className":17808},[144],[38,17810,17812],{"className":17811,"style":531},[148],[38,17813],{},"), is the most relevant, so the attention weights concentrate on 
",[38,17816,17818,17835],{"className":17817,"translate":42},[41],[38,17819,17821],{"className":17820},[46],[48,17822,17823],{"xmlns":50},[52,17824,17825,17833],{},[55,17826,17827],{},[330,17828,17829,17831],{},[58,17830,15418],{},[203,17832,205],{},[77,17834,17764],{"encoding":79},[38,17836,17838],{"className":17837,"ariaHidden":85},[84],[38,17839,17841,17844],{"className":17840},[89],[38,17842],{"className":17843,"style":5650},[93],[38,17845,17847,17850],{"className":17846},[98],[38,17848,15418],{"className":17849},[98,167],[38,17851,17853],{"className":17852},[136],[38,17854,17856,17876],{"className":17855},[140,436],[38,17857,17859,17873],{"className":17858},[144],[38,17860,17862],{"className":17861,"style":509},[148],[38,17863,17864,17867],{"style":8331},[38,17865],{"className":17866,"style":156},[155],[38,17868,17870],{"className":17869},[160,161,162,163],[38,17871,205],{"className":17872},[98,163],[38,17874,464],{"className":17875},[463],[38,17877,17879],{"className":17878},[144],[38,17880,17882],{"className":17881,"style":531},[148],[38,17883],{}," and the features of 
",[38,17886,17888,17905],{"className":17887,"translate":42},[41],[38,17889,17891],{"className":17890},[46],[48,17892,17893],{"xmlns":50},[52,17894,17895,17903],{},[55,17896,17897],{},[330,17898,17899,17901],{},[58,17900,15418],{},[203,17902,205],{},[77,17904,17764],{"encoding":79},[38,17906,17908],{"className":17907,"ariaHidden":85},[84],[38,17909,17911,17914],{"className":17910},[89],[38,17912],{"className":17913,"style":5650},[93],[38,17915,17917,17920],{"className":17916},[98],[38,17918,15418],{"className":17919},[98,167],[38,17921,17923],{"className":17922},[136],[38,17924,17926,17946],{"className":17925},[140,436],[38,17927,17929,17943],{"className":17928},[144],[38,17930,17932],{"className":17931,"style":509},[148],[38,17933,17934,17937],{"style":8331},[38,17935],{"className":17936,"style":156},[155],[38,17938,17940],{"className":17939},[160,161,162,163],[38,17941,205],{"className":17942},[98,163],[38,17944,464],{"className":17945},[463],[38,17947,17949],{"className":17948},[144],[38,17950,17952],{"className":17951,"style":531},[148],[38,17953],{}," are extracted.",[12342,17956,17957,17960,17961,18030,18031,18100,18101,18103,18104,179],{},[170,17958,17959],{},"FFN",": given the features extracted from 
",[38,17962,17964,17981],{"className":17963,"translate":42},[41],[38,17965,17967],{"className":17966},[46],[48,17968,17969],{"xmlns":50},[52,17970,17971,17979],{},[55,17972,17973],{},[330,17974,17975,17977],{},[58,17976,15418],{},[203,17978,205],{},[77,17980,17764],{"encoding":79},[38,17982,17984],{"className":17983,"ariaHidden":85},[84],[38,17985,17987,17990],{"className":17986},[89],[38,17988],{"className":17989,"style":5650},[93],[38,17991,17993,17996],{"className":17992},[98],[38,17994,15418],{"className":17995},[98,167],[38,17997,17999],{"className":17998},[136],[38,18000,18002,18022],{"className":18001},[140,436],[38,18003,18005,18019],{"className":18004},[144],[38,18006,18008],{"className":18007,"style":509},[148],[38,18009,18010,18013],{"style":8331},[38,18011],{"className":18012,"style":156},[155],[38,18014,18016],{"className":18015},[160,161,162,163],[38,18017,205],{"className":18018},[98,163],[38,18020,464],{"className":18021},[463],[38,18023,18025],{"className":18024},[144],[38,18026,18028],{"className":18027,"style":531},[148],[38,18029],{},", the model finds that the meaning carried by 
",[38,18032,18034,18051],{"className":18033,"translate":42},[41],[38,18035,18037],{"className":18036},[46],[48,18038,18039],{"xmlns":50},[52,18040,18041,18049],{},[55,18042,18043],{},[330,18044,18045,18047],{},[58,18046,15418],{},[203,18048,205],{},[77,18050,17764],{"encoding":79},[38,18052,18054],{"className":18053,"ariaHidden":85},[84],[38,18055,18057,18060],{"className":18056},[89],[38,18058],{"className":18059,"style":5650},[93],[38,18061,18063,18066],{"className":18062},[98],[38,18064,15418],{"className":18065},[98,167],[38,18067,18069],{"className":18068},[136],[38,18070,18072,18092],{"className":18071},[140,436],[38,18073,18075,18089],{"className":18074},[144],[38,18076,18078],{"className":18077,"style":509},[148],[38,18079,18080,18083],{"style":8331},[38,18081],{"className":18082,"style":156},[155],[38,18084,18086],{"className":18085},[160,161,162,163],[38,18087,205],{"className":18088},[98,163],[38,18090,464],{"className":18091},[463],[38,18093,18095],{"className":18094},[144],[38,18096,18098],{"className":18097,"style":531},[148],[38,18099],{}," matches the word ",[12397,18102,17496],{}," best, so it outputs ",[12397,18105,17496],{},[10,18107,18108],{},"That is the whole process by which the Encoder and Decoder jointly carry out a translation.",{"title":18110,"searchDepth":18111,"depth":18111,"links":18112},"",2,[18113,18114,18115,18116,18117,18118,18119,18120,18121,18122],{"id":33,"depth":18111,"text":33},{"id":2353,"depth":18111,"text":2353},{"id":4351,"depth":18111,"text":4351},{"id":7434,"depth":18111,"text":7434},{"id":10046,"depth":18111,"text":10047},{"id":11513,"depth":18111,"text":11514},{"id":12330,"depth":18111,"text":12331},{"id":13584,"depth":18111,"text":13585},{"id":16622,"depth":18111,"text":16623},{"id":17267,"depth":18111,"text":17267},"https:\u002F\u002Fimage-assets.dreams.plus\u002F202605172331014.png","在 2017 年 Ashish Vaswani 等人在 NeurIPS 发表了 Attention is all you need 。这篇文章首次提出注意力机制进行序列建模，尽管 attention 机制在当初为了解决序列串行训练中的低效问题，实现了在大规模 GPU 上的并行训练，并在多年后成为了现代 LLM 
的基础。",false,true,"md",null,{},"2025-05-08","\u002Fblog\u002F2025\u002F2025-05-08-revisit-the-transformer",{"title":5,"description":18124},"blog\u002F2025\u002F2025-05-08-revisit-the-transformer","在本文中，我会按照 NLP 发展的脉络，一步一步拆解 Transformer 机制",[18136,18137],"machine-learning","nlp","j-LmgzO9pqcGwQDMlcqbW-AsNgmsNAErVuDek51vhd8",1779032385652]