DerwenAI/textgraphs · TL;DR: TextGraphs

Derwen, Inc. org Dec 2, 2023

edited Dec 2, 2023

This space uses spaCy + SpanMarkerNER to construct a lemma graph. This is a prelude to inferring the nodes, edges, properties, and probabilities for building a knowledge graph from raw unstructured text sources. The open source library is used in production, though it also provides a playground to prototype and evaluate abstractions based on "Graph Levels Of Detail".

Analysis is intended to run on a stream of paragraphs, taking into account where/how components of spaCy pipelines tend to work more efficiently and can be augmented with LLMs, graph algorithms, graph ML, etc. The process is designed to be iterative and the results are therefore cumulative.
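To illustrate the iterative, cumulative idea, here is a minimal sketch that folds a stream of paragraphs into one growing co-occurrence graph. It uses only the Python standard library: the regex tokenizer is a crude stand-in for spaCy lemmatization, and the `CumulativeGraph` class and its fields are hypothetical names, not part of the TextGraphs API.

```python
import re
from collections import Counter
from itertools import combinations


class CumulativeGraph:
    """Accumulates an undirected co-occurrence graph across paragraphs."""

    def __init__(self) -> None:
        self.node_counts: Counter = Counter()
        self.edge_counts: Counter = Counter()

    def add_paragraph(self, text: str) -> None:
        # Assumption: lowercase alphabetic tokens stand in for lemmas.
        for sentence in re.split(r"[.!?]+", text):
            tokens = re.findall(r"[a-z]+", sentence.lower())
            self.node_counts.update(tokens)
            # Link each pair of distinct tokens that share a sentence.
            for a, b in combinations(sorted(set(tokens)), 2):
                self.edge_counts[(a, b)] += 1


graph = CumulativeGraph()
stream = [
    "Graphs model text. Text contains entities.",
    "Entities link graphs.",
]
for paragraph in stream:
    # Each paragraph adds to the counts gathered so far.
    graph.add_paragraph(paragraph)
```

Because the counters persist between calls, later paragraphs reinforce nodes and edges seen earlier rather than replacing them.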

This demo includes multiple steps:

use spaCy to parse a document, with SpanMarkerNER LLM assist
build a lemma graph in NetworkX from the parse results
use OpenNRE to infer relations among entities (optional)
use DBpedia Spotlight to perform entity linking and some graph inference
run a modified TextRank algorithm plus graph analytics
approximate a Pareto archive (hypervolume) to re-rank extracted entities
visualize the lemma graph interactively in PyVis
cluster communities within the lemma graph
apply topological transforms to enhance embeddings (in progress)
run graph representation learning on the graph of relations (in progress)
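The ranking step in the list above can be sketched as PageRank over a small lemma graph. This is plain power-iteration PageRank on an undirected, unweighted graph, not the modified TextRank variant the library uses; the edge list is made up for illustration, and the sketch avoids NetworkX only to stay dependency-free.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over an undirected, unweighted graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = sorted(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            updated[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


# Hypothetical lemma graph: a hub lemma "graph" plus a short chain to "link".
lemma_edges = [
    ("graph", "knowledge"),
    ("graph", "lemma"),
    ("graph", "entity"),
    ("entity", "link"),
]
scores = pagerank(lemma_edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here the best-connected lemma ends up first in `ranked`, which is the intuition behind using graph centrality to surface key phrases.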

One important insight (based on following the textgraph research community for the past ~15 years or so) is that having a domain-specific knowledge graph available a priori for sampling during the parse (e.g., for semantic field random walks) provides multiple benefits:

faster/better convergence for extracting and ranking the key phrases in a raw text
entity linking as a by-product of NLP parsing
big steps toward semi-automated knowledge graph construction from large collections of unstructured text sources

To these ends, this library is exploring the use of graph foundation models on the resulting lemma graph to augment approaches to graph representation learning, as a step toward providing graph levels of detail.

Overall, the outcomes for this library include ranked extracted key phrases plus a graph which can be used to construct or augment a knowledge graph.
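As a rough illustration of how ranked phrases might be re-ranked on more than one criterion, the sketch below sorts candidates into successive Pareto fronts. It is simple non-dominated sorting, not the hypervolume-based approximation mentioned earlier, and the two metrics (a rank score and a mention count) and all values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(candidates):
    """Split {name: metrics} into successive non-dominated fronts."""
    remaining = dict(candidates)
    fronts = []
    while remaining:
        # A candidate joins the current front if nothing remaining dominates it.
        front = sorted(
            name
            for name, metrics in remaining.items()
            if not any(dominates(other, metrics) for other in remaining.values())
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical (rank score, mention count) pairs; higher is better for both.
phrases = {
    "knowledge graph": (0.30, 4),
    "lemma graph": (0.25, 6),
    "entity linking": (0.20, 3),
    "paragraph": (0.05, 1),
}
fronts = pareto_fronts(phrases)
```

Phrases in the first front are not beaten on both metrics by any other candidate, so they would be ranked ahead of those in later fronts.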
