Some examples of feature extraction include transforming raw text into word embeddings and extracting important features such as edges or shapes from image/video data.
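As a concrete illustration, a hedged sketch of text feature extraction using the feature-extraction pipeline in Transformers, which returns a model's raw hidden states as embeddings (the checkpoint choice here is just an example):

```python
from transformers import pipeline

# Turn raw text into contextual word embeddings; the checkpoint is only
# an example, any encoder model would work.
extractor = pipeline("feature-extraction", model="google-bert/bert-base-uncased")
features = extractor("Transformers turn text into features.")

# Nested list of shape [1, num_tokens, hidden_size] (hidden_size is 768
# for this model).
print(len(features[0]), len(features[0][0]))
```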
feed forward chunking
In each residual attention block in transformers, the self-attention layer is usually followed by two feed forward layers. The intermediate embedding size of the feed forward layers is often larger than the hidden size of the model (e.g., 3072 vs. 768 for google-bert/bert-base-uncased). For an input of size [batch_size, sequence_length], the memory required to store the intermediate feed forward embeddings of size [batch_size, sequence_length, config.intermediate_size] can account for a large fraction of the total memory use. Because the feed forward layers operate on each position independently, the computation can be chunked along the sequence_length dimension: the output is computed for a fixed number of positions at a time and the results are concatenated, which is mathematically equivalent but trades increased compute time for reduced peak memory.
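A minimal sketch of the idea, assuming a standard two-layer feed forward block; this mirrors, but is not, the actual Transformers implementation (which applies chunking through a helper such as apply_chunking_to_forward()):

```python
import torch
import torch.nn as nn

class ChunkedFeedForward(nn.Module):
    """Feed forward block that processes the sequence in chunks.

    Only [batch_size, chunk_size, intermediate_size] intermediate
    activations exist at any one time, instead of the full
    [batch_size, sequence_length, intermediate_size] tensor.
    """

    def __init__(self, hidden_size=768, intermediate_size=3072, chunk_size=128):
        super().__init__()
        self.dense_in = nn.Linear(hidden_size, intermediate_size)
        self.act = nn.GELU()
        self.dense_out = nn.Linear(intermediate_size, hidden_size)
        self.chunk_size = chunk_size  # 0 disables chunking

    def forward(self, hidden_states):
        # hidden_states: [batch_size, sequence_length, hidden_size]
        if self.chunk_size == 0:
            return self.dense_out(self.act(self.dense_in(hidden_states)))
        # The feed forward layers act on each position independently, so
        # splitting along sequence_length gives an identical result.
        chunks = hidden_states.split(self.chunk_size, dim=1)
        outputs = [self.dense_out(self.act(self.dense_in(c))) for c in chunks]
        return torch.cat(outputs, dim=1)

ff = ChunkedFeedForward()
out = ff(torch.randn(2, 512, 768))  # same output as the unchunked computation
```

Smaller chunk sizes lower peak memory further but launch more, smaller matrix multiplications, so the chunk size is a memory/speed trade-off rather than a free win.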