Abstract: Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), ...
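For reference, a common way the forward KL objective is written in such token-level distillation setups is sketched below; the notation ($p$ for the teacher distribution, $q_\theta$ for the student, $x$ for the context, $y$ for the next token) is assumed for illustration and is not taken from this abstract:
$$
\mathcal{D}_{\mathrm{KL}}\!\left(p \,\|\, q_\theta\right) \;=\; \sum_{y \in \mathcal{V}} p(y \mid x)\,\log \frac{p(y \mid x)}{q_\theta(y \mid x)},
$$
where the sum runs over the vocabulary $\mathcal{V}$ and the student parameters $\theta$ are trained to minimize this divergence averaged over contexts.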