#author("2024-10-18T14:51:54+00:00","default:yoya","yoya")
#author("2024-12-18T07:31:34+00:00","default:yoya","yoya")
- https://arxiv.org/abs/2402.17764
-- https://arxiv.org/pdf/2402.17764.pdf
- https://github.com/Beomi/BitNet-Transformers/
- https://github.com/frodo821/BitNet-Transformers
>
https://twitter.com/BoufrawFrodo2/status/1763435835935047789
Modified the original BitNet to use ternary values, following the 1.58b paper
- https://github.com/microsoft/BitNet
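The ternary conversion mentioned in the quoted tweet above is the absmean quantization from the b1.58 paper: scale each weight matrix by its mean absolute value, then round and clip into {-1, 0, +1} (hence log2(3) ≈ 1.58 bits per weight). A minimal NumPy sketch of that formula (function name and eps are illustrative, not from the released code):

 import numpy as np
 
 def absmean_quantize(W, eps=1e-6):
     # b1.58 absmean quantization: scale by mean(|W|), round,
     # then clip each weight into the ternary set {-1, 0, +1}.
     gamma = np.abs(W).mean() + eps
     Wq = np.clip(np.round(W / gamma), -1, 1)
     return Wq, gamma  # gamma is kept to rescale outputs at matmul time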
* Models [#df035b0a]
- https://huggingface.co/tiiuae/Falcon3-10B-Instruct-1.58bit
* Articles [#ue8ff37d]
- Is the era of 1-bit LLMs coming?
-- https://note.com/ipsj/n/ncbe5746f71fb
- Microsoft releases a 1.58-bit large language model; matrix computations reduce to addition, slashing compute costs
-- https://gigazine.net/news/20240229-microsoft-1bit-llm/
- The shock of 1-bit LLMs! 8.9x faster at 70B, all inference with addition only; GPUs could become unnecessary
-- https://topics.smt.docomo.ne.jp/article/wirelesswire/business/wirelesswire-20240286094?
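The "matrix multiplication becomes addition" claim in the headlines above follows directly from the ternary weights: a dot product against weights in {-1, 0, +1} is just a signed sum of activations. A toy NumPy illustration of the idea (conceptual only, not the optimized kernel):

 import numpy as np
 
 def ternary_matvec(Wq, x):
     # Each row's dot product is a signed sum: add x where the weight
     # is +1, subtract where it is -1, skip where it is 0.
     # No multiplications are needed.
     return np.array([x[row == 1].sum() - x[row == -1].sum() for row in Wq])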
- https://twitter.com/andrew_n_carr/status/1770487200234213758?s=12
>
1.58 bit code is out (in an appendix)
https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
https://pbs.twimg.com/media/GJIIs1ZbcAA7hzM?format=jpg&name=small#.jpg
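The training-tips PDF linked above addresses how training works despite the non-differentiable rounding step; the standard trick is a straight-through estimator, where the forward pass uses the quantized weights but gradients flow to the latent full-precision weights. A hedged PyTorch sketch of that idea (not copied from the released code):

 import torch
 
 def ste_weight(w, eps=1e-6):
     # Forward: absmean-scaled ternary weights.
     # Backward: detach() hides the rounding from autograd, so
     # gradients reach the latent full-precision weights unchanged.
     gamma = w.abs().mean() + eps
     wq = (w / gamma).round().clamp(-1, 1) * gamma
     return w + (wq - w).detach()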
- https://twitter.com/teortaxesTex/status/1773861506674741570
>
It seems that results of that Microsoft paper about ternary LLMs can be replicated after all – for 3B@100B at least.
https://huggingface.co/1bitLLM/bitnet_b1_58-3B
https://pbs.twimg.com/media/GJ4E0F4XgAAcKal?format=jpg&name=small#.jpg
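Assuming the replication checkpoint above follows the usual Hugging Face layout, it should load through the standard transformers API; trust_remote_code=True is an assumption here, since a ternary architecture likely ships custom modeling code:

 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 model_id = "1bitLLM/bitnet_b1_58-3B"
 tok = AutoTokenizer.from_pretrained(model_id)
 # trust_remote_code is an assumption: custom architectures usually need it
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
 inputs = tok("The era of 1-bit LLMs", return_tensors="pt")
 out = model.generate(**inputs, max_new_tokens=32)
 print(tok.decode(out[0], skip_special_tokens=True))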