Commit History

Merge pull request #2 from seanpedrick-case/dev
13d6f5f

Sean Pedrick-Case committed on

Updated README.md to latest gradio version
d95309d

seanpedrickcase committed on

Merge pull request #1 from seanpedrick-case/dev
ccb541a

Sean Pedrick-Case committed on

Fixes on representation model, visualisations, and embeddings in CPU mode. Package updates and optimisation for compatibility
db3eaec

seanpedrickcase committed on

Changed llama-cpp version for cpu
39f4270

seanpedrickcase committed on

Corrected reference to sentence transformers dependency. Updated Dockerfile packages
611584f

seanpedrickcase committed on

Fix on returning GPU tensors to main function after embedding with zeroGPU. Representation model put under ZeroGPU spaces
02721f3

seanpedrickcase committed on

Removed explicit references to cuda in functions where spaces GPU are loaded
cb7a4c9

seanpedrickcase committed on

zeroGPU spaces duration can now be defined from environment variable
38198b1

seanpedrickcase committed on

Added random seed to topic_core_funcs
f957de1

seanpedrickcase committed on

Trying to load in cuda only within spaces environment to enable zero GPU space to run successfully
71afe01

seanpedrickcase committed on

Updated package versions in requirements files
5814ab0

seanpedrickcase committed on

Adjusted requirements for max available for Huggingface python==3.10 platform
6bf616b

seanpedrickcase committed on

Test update main requirements file for huggingface compatibility
9a4b420

seanpedrickcase committed on

Re-added TruncatedSVD dependency to topic_core_funcs.py
f42e3d1

seanpedrickcase committed on

Debugged reference to random_seed in vectorisation and reference to torch in representation_model.py
8216d8c

seanpedrickcase committed on

Importing space package near start of app now to avoid issue with cuda being initialised before
9e84863

seanpedrickcase committed on

Llama-cpp-python in GPU mode doesn't seem to work well with BERTopic on Huggingface, so downgrading that to CPU version
88d81fa

seanpedrickcase committed on

Rearranged functions for embeddings creation to be compatible with zero GPU space. Updated packages.
cc495e1

seanpedrickcase committed on

Added and replaced relevant files to download in download_model.py to allow for app use on AWS
49e0db8

seanpedrickcase committed on

Updated Dockerfile with latest packages
08eb30d

seanpedrickcase committed on

Added example of how to run function from command line. Updated packages. Embedding model default now smaller and at fp16.
34f1e83

seanpedrickcase committed on

Improved initial clean options. Now has option to return embeddings only.
89c4d20

seanpedrickcase committed on

Corrected minor Dockerfile package version issue
593153e

seanpedrickcase committed on

App now retains original index following cleaning to allow for referring back to original data
90553eb

seanpedrickcase committed on

Now installed dependencies into correct folder in Dockerfile
5888649

seanpedrickcase committed on

Finally managed to enforce cpu torch install in Dockerfile
97913c4

seanpedrickcase committed on

Further optimised Dockerfile and requirements (smaller torch installation now hopefully)
00db72b

seanpedrickcase committed on

Transferring across installed packages from build stage in Dockerfile
c9da99d

seanpedrickcase committed on

Changed Dockerfile to multi-stage build to further reduce size
0fd155c

seanpedrickcase committed on

Trying to make container image smaller through Dockerfile
7d5387e

seanpedrickcase committed on

Minor changes to reduce Dockerfile size
b767539

seanpedrickcase committed on

Updated download_model.py to download pytorch .bin file
1c0bfd4

seanpedrickcase committed on

Removed some requirements from Dockerfile for AWS deployment to reduce container size
51ba1cb

seanpedrickcase committed on

Added NUMBA_CACHE_DIR to Docker environmental variables
cd6a3e0

seanpedrickcase committed on

Allowed for app running on AWS to use smaller embedding model and not to load representation LLM (due to size restrictions).
22ca76e

seanpedrickcase committed on

Dockerfile now installs models directly into user folder instead of moving from base folder
3c1c3de

seanpedrickcase committed on

Updated Gradio version for spaces. Updated Dockerfile to enable Llama.cpp build with Cmake
d34af22

seanpedrickcase committed on

Only aggregate topics not 'other', allowed for minimum sentence length, default max_topics now will auto aggregate topics. Added Cognito Auth functionality (boto3 with AWS).
1e2bb3e

seanpedrickcase committed on

Can split passages into sentences. Improved embedding, LLM representation models, improved zero shot capabilities
55f0ce3

seanpedrickcase committed on

Updated packages. Improve hierarchy vis. Better models - mixedbread and phi3. Now option to split texts into sentences before modelling.
04a15c5

seanpedrickcase committed on

Minor cleaning, csv formatting changes
d80c8f5

Sean-Case committed on

Reduce outliers now more efficient and relabels with correct vectoriser. Default topic labels now tidier. Hierarchical topics outputs more useful for joining to df afterwards. Switched low resource reduction algorithm to UMAP as default is not good.
e1c1f68

Sonnyjim committed on

Should now parse custom regex correctly. Will now wipe previously created embeddings if 'low resource mode' option switched.
0a543a0

Sean-Case committed on

Allowed for uploading custom regex for cleaning. Fixed calculate all probabilities, reduce outliers. Added text tree for hierarchical modelling.
381f959

Sonnyjim committed on

Upgraded to Gradio 4.16.0. Guide for converting to exe added.
0a177ca

Sonnyjim committed on

Hopefully now LLM download from hub should work
cdcd7af

Sonnyjim committed on

Note about LLM not working now successfully added!
e2dfc1e

Sean-Case committed on

Added note to say that LLM representation is not currently working on the HF website
3b4333f

Sean-Case committed on