TL;DR: TextGraphs
This space uses spaCy + SpanMarkerNER to construct a lemma graph. This is a prelude to inferring the nodes, edges, properties, and probabilities for building a knowledge graph from raw unstructured text sources.
The open source library is used in production, though it also provides a playground for prototyping and evaluating abstractions based on "Graph Levels Of Detail".
Analysis is intended to run on a stream of paragraphs, taking into account where/how components of spaCy pipelines tend to work more efficiently and can be augmented with LLMs, graph algorithms, graph ML, etc. The process is designed to be iterative and the results are therefore cumulative.
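As a rough illustration of what "iterative and cumulative" can mean here, the sketch below folds a stream of paragraphs into one growing co-occurrence graph. It is a minimal sketch, not the library's implementation: the `LemmaGraph` class and its methods are hypothetical names, the input is hand-written lists of lemmas standing in for a spaCy parse, and plain counters stand in for a NetworkX graph to keep the example dependency-free.

```python
from collections import Counter
from itertools import combinations


class LemmaGraph:
    """Hypothetical helper: accumulates weighted co-occurrence edges
    across a stream of paragraphs."""

    def __init__(self):
        self.node_counts = Counter()   # lemma -> number of sentences seen in
        self.edge_weights = Counter()  # (lemma_a, lemma_b) -> co-occurrences

    def add_paragraph(self, paragraph):
        """Fold one paragraph (a list of sentences, each a list of lemmas)
        into the graph built so far."""
        for sentence in paragraph:
            lemmas = sorted(set(sentence))
            self.node_counts.update(lemmas)
            # link every pair of distinct lemmas that share a sentence;
            # sorting keeps each undirected edge under a single key
            for pair in combinations(lemmas, 2):
                self.edge_weights[pair] += 1


# Assumed input: lemmas already extracted and filtered by an upstream parser.
stream = [
    [["graph", "build", "text"], ["parse", "text"]],
    [["graph", "text", "rank"]],
]

graph = LemmaGraph()
for paragraph in stream:
    graph.add_paragraph(paragraph)  # each call adds to the earlier results

print(graph.edge_weights.most_common(3))
```

Because each paragraph only increments counters, the graph can be inspected or ranked at any point in the stream without reprocessing earlier text.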
This demo includes multiple steps:
- use spaCy to parse a document, with SpanMarkerNER LLM assist
- build a lemma graph in NetworkX from the parse results
- use OpenNRE to infer relations among entities (optional)
- use DBPedia Spotlight to perform entity linking and some graph inference
- run a modified textrank algorithm plus graph analytics
- approximate a Pareto archive (hypervolume) to re-rank extracted entities
- visualize the lemma graph interactively in
PyVis
- cluster communities within the lemma graph
- apply topological transforms to enhance embeddings (in progress)
- run graph representation learning on the graph of relations (in progress)
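To make the ranking step more concrete, here is a minimal sketch of the idea behind textrank: run PageRank over a graph of lemmas so that well-connected lemmas score highest. This is plain PageRank by power iteration, not the modified algorithm the demo uses. The `pagerank` function, the `lemma_edges` data, and the edge weights are all illustrative assumptions, and a dictionary stands in for the NetworkX graph.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected weighted graph.

    `edges` maps (node_a, node_b) pairs to positive weights.
    """
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    nodes = sorted(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # each neighbor passes on its rank in proportion to edge weight
            incoming = sum(
                rank[other] * weight / sum(neighbors[other].values())
                for other, weight in neighbors[node].items()
            )
            updated[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


# Hypothetical lemma graph: co-occurrence counts between lemmas.
lemma_edges = {
    ("graph", "text"): 2,
    ("build", "graph"): 1,
    ("build", "text"): 1,
    ("parse", "text"): 1,
    ("graph", "rank"): 1,
    ("rank", "text"): 1,
}

scores = pagerank(lemma_edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With NetworkX available, `networkx.pagerank` on a weighted `Graph` would be the more usual choice; the hand-rolled loop is only there to show what the ranking computes.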
One important insight (based on following the textgraph research community for the past ~15 years or so) is that having a domain-specific knowledge graph available a priori for sampling during the parse (e.g., for semantic field random walks) provides multiple benefits:
- faster/better convergence for extracting and ranking the key phrases in a raw text
- entity linking as a by-product of NLP parsing
- big steps toward semi-automated knowledge graph construction from large collections of unstructured text sources
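One way to picture sampling from an a priori knowledge graph is a set of short random walks that start at an entity found in the text and tally which neighboring concepts they reach. The sketch below is only an illustration under stated assumptions: `domain_graph` is a toy, made-up graph, `semantic_field` is a hypothetical helper, and none of this is taken from the library's code.

```python
import random
from collections import Counter

# Toy, hypothetical domain graph as an adjacency list; a real one would be
# loaded from a curated domain-specific knowledge graph.
domain_graph = {
    "espresso": ["coffee", "caffeine"],
    "coffee": ["espresso", "caffeine", "beverage"],
    "caffeine": ["coffee", "espresso", "stimulant", "tea"],
    "beverage": ["coffee", "tea"],
    "tea": ["beverage", "caffeine"],
    "stimulant": ["caffeine"],
}


def semantic_field(graph, seed, walks=200, length=4, rng=None):
    """Count how often short random walks starting at `seed` visit each node."""
    if rng is None:
        rng = random.Random(0)
    visits = Counter()
    for _ in range(walks):
        node = seed
        for _ in range(length):
            node = rng.choice(graph[node])  # step to a uniformly chosen neighbor
            visits[node] += 1
    return visits


field = semantic_field(domain_graph, "espresso", rng=random.Random(42))
print(field.most_common(3))
```

Nodes that the walks visit often could then serve as candidate links or as priors when ranking phrases, which is one plausible reading of the benefits listed above.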
To these ends, this library is exploring the use of graph foundation models on the resulting lemma graph to augment approaches to graph representation learning, as a step toward providing graph levels of detail.
Overall, the outcomes for this library include ranked extracted key phrases plus a graph which can be used to construct or augment a knowledge graph.