

[feed-forward sublayers] … these layers are the mechanism used to gradually learn a general, increasingly abstract understanding of the entire text being processed.
in my opinion, this is the part that people who hates LLMs (large language models) chooses to ignore.






expensive and useless, unfortunately.