language:
- en
Massive thank you to @silveroxides for phenomenal work collecting pristine state dicts and related information.
MIR (Machine Intelligence Resource)
A naming schema for AIGC/ML work.
The MIR classification format seeks to standardize and complete a hyperlinked network of model information, improving accessibility and reproducibility across the AI community.
The work is inspired by:
Example:
mir : model . transformer . clip-l : stable-diffusion-xl
mir : model . lora . hyper : flux-1
↑ ↑ ↑ ↑ ↑
[URI]:[Domain].[Architecture].[Series]:[Compatibility]
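The layout above can be split mechanically into its fields. The following is a minimal sketch, not an official MIR parser: the names `MirId` and `parse_mir` are hypothetical, and it assumes the spaces in the examples are only for alignment and that no field contains `:` or `.`.

```python
from typing import NamedTuple


class MirId(NamedTuple):
    """Hypothetical container for the four fields that follow the `mir` prefix."""
    domain: str
    architecture: str
    series: str
    compatibility: str


def parse_mir(identifier: str) -> MirId:
    """Split '[URI]:[Domain].[Architecture].[Series]:[Compatibility]' into fields."""
    # Assumption: whitespace is only visual alignment, so remove all of it.
    compact = "".join(identifier.split())
    parts = compact.split(":")
    if len(parts) != 3 or parts[0] != "mir":
        raise ValueError(f"not a MIR identifier: {identifier!r}")
    _, body, compatibility = parts
    fields = body.split(".")
    # Assumption: exactly three dot-separated fields, none of them empty.
    if len(fields) != 3 or not all(fields) or not compatibility:
        raise ValueError(f"expected domain.architecture.series:compatibility in {identifier!r}")
    domain, architecture, series = fields
    return MirId(domain, architecture, series, compatibility)


if __name__ == "__main__":
    print(parse_mir("mir : model . lora . hyper : flux-1"))
```

Stripping whitespace first lets the spaced examples and a compact form such as `mir:model.lora.hyper:flux-1` parse to the same result.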
Definitions:
Like other URI schemes, the order of the identifiers roughly indicates their specificity, from left (broad) to right (narrow).
Domain
dev
: Varying local neural network layers, in-training, pre-release, items under evaluation, likely in unexpected formats
model
: Static local neural network layers. Publicly released machine learning models with an identifier in the database
operations
: Varying global neural network attributes, algorithms, optimizations and procedures on models
info
: Static global neural network attributes, metadata with an identifier in the database
Architecture
Broad and general terms for system architectures.
dit
: Diffusion transformer, typically Vision Synthesis
unet
: U-Net diffusion structure
art
: Autoregressive transformer, typically LLMs
lora
: Low-Rank Adapter (may work with dit or transformer)
vae
: Variational Autoencoder
etc
Series
Foundational network and technique types.
Compatibility
Implementation details based on version-breaking changes, configuration inconsistencies, or other conflicting indicators that have practical application.
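As an illustration of how these definitions might be enforced in code, here is a rough sketch. It assumes that only the four Domain terms listed above are a closed vocabulary (the Architecture list ends in "etc", so it is left open). `MirTag`, `DOMAINS`, and `prefixes` are hypothetical names, not part of any published MIR tooling.

```python
from dataclasses import dataclass

# Domain vocabulary copied from the definitions above.
DOMAINS = ("dev", "model", "operations", "info")


@dataclass(frozen=True)
class MirTag:
    """Hypothetical value object for one MIR identifier."""
    domain: str
    architecture: str
    series: str
    compatibility: str

    def __post_init__(self) -> None:
        # Only the domain is checked; the other fields are treated as open-ended.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain {self.domain!r}; expected one of {DOMAINS}")

    def __str__(self) -> str:
        return f"mir:{self.domain}.{self.architecture}.{self.series}:{self.compatibility}"

    def prefixes(self) -> list[str]:
        """Return identifiers from broadest to narrowest, left to right."""
        broad = f"mir:{self.domain}"
        mid = f"{broad}.{self.architecture}"
        narrow = f"{mid}.{self.series}"
        return [broad, mid, narrow, f"{narrow}:{self.compatibility}"]


if __name__ == "__main__":
    tag = MirTag("model", "lora", "hyper", "flux-1")
    for prefix in tag.prefixes():
        print(prefix)
```

The `prefixes` helper mirrors the left-to-right ordering: each successive string narrows the previous one by a single field.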
Goals
- Standard identification scheme for ALL fields of ML-related development
- Simplification of code for model-related logistics
- Rapid retrieval of resources and metadata
- Efficient and reliable compatibility checks
- Organized hyperparameter management
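To make the retrieval and compatibility goals concrete, the sketch below indexes identifiers by their trailing Compatibility field. It is only an illustration under stated assumptions: `MirRegistry` is a hypothetical in-memory class, the metadata values are placeholders, and treating "same Compatibility field" as "compatible" is a simplification rather than a rule taken from the schema.

```python
from collections import defaultdict


class MirRegistry:
    """Hypothetical in-memory index keyed by full MIR identifier."""

    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}
        self._by_compat: dict[str, set[str]] = defaultdict(set)

    def add(self, identifier: str, metadata: dict) -> None:
        # The Compatibility field is everything after the last ':'.
        compatibility = identifier.rsplit(":", 1)[-1]
        self._entries[identifier] = metadata
        self._by_compat[compatibility].add(identifier)

    def get(self, identifier: str) -> dict:
        """Direct lookup of metadata by full identifier."""
        return self._entries[identifier]

    def compatible_with(self, compatibility: str) -> list[str]:
        """All registered identifiers sharing the given Compatibility field."""
        return sorted(self._by_compat.get(compatibility, ()))


if __name__ == "__main__":
    registry = MirRegistry()
    # Metadata below is placeholder content, not real model information.
    registry.add("mir:model.lora.hyper:flux-1", {"note": "placeholder"})
    registry.add("mir:model.transformer.clip-l:stable-diffusion-xl", {"note": "placeholder"})
    print(registry.compatible_with("flux-1"))
```

Keeping a second index by Compatibility means a compatibility query does not need to scan every entry.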
Why not use `diffusion`/`sgm`, `ldm`/`text`/hf.co folder-structure/brand or trade name/preprint paper/development house/algorithm?
- The format here isn't finalized, but overlapping resource definitions or complicated categories that are difficult to narrow have been pruned
- Likewise, definitions that are too specific have also been trimmed
- HF.CO becomes inconsistent across folders/files, and metadata enforcement is often neglected for many important developments
- Development credit is often shared, and the paper heredity tree is super complicated
- Algorithms (esp. in application) are less common knowledge and vague, and I'm too smooth-brain.
- Overall, an attempt at impartiality and neutrality with regard to brand/territory origins
Why `unet`, `dit`, `lora` over alternatives?
- UNET/DiT/Transformer are shared enough to be genre-ish but not too narrowly specific
- Very similar technical process on this level
- Functional and efficient for random lookups
- Short to type
Roadmap
- Decide on `@` or `:` delimiters (like `@8cfg` for an indistinguishable 8-step LoRA that requires CFG): a crucial spec element, or an optional, MIR app-determined feature?
- Proof of concept generative model registry
- Ensure compatibility/integration/cross-pollination with OECD AI Classifications
- Ensure compatibility/integration/cross-pollination with NIST AI 200-1 NIST Trustworthy and Responsible AI