Patent analytics, text mining & network science, straight from R

Author

JP van der Pol

Published

March 25, 2026

🕸️ NetworkIsLifeR

Your all-in-one R toolkit for patent analytics, topic modelling, and network science.
Built for Utrecht University's Innometrics and Data Analytics for Sustainability courses, and useful well beyond the classroom.

v0.0.1 · MIT License · Early Development


Why NetworkIsLifeR?

Modern innovation research sits at the intersection of three messy worlds: unstructured patent text, large bibliographic datasets, and complex collaboration networks. Most R packages solve one piece of the puzzle. NetworkIsLifeR bundles them all into a coherent, course-tested workflow.

Whether you're mapping technology landscapes from Lens.org exports, clustering research topics with BERTopic-style embeddings, or visualising co-inventor networks, this package gets you from raw data to insight in far fewer steps than wiring the individual packages together by hand.


Core Capabilities

📄 Patent & Publication Parsing

Read and flatten Lens.org JSONL / JSONL.GZ exports, including nested inventor, assignee, and citation fields, into tidy R data frames ready for analysis.

🧠 Topic Modelling

A BERTopic-inspired pipeline: sentence embeddings β†’ UMAP dimensionality reduction β†’ HDBSCAN clustering. No Python environment required for the core flow.

🏷️ Topic Representation

Label your clusters with human-readable terms using udpipe / quanteda (pure R) or spaCy via the reticulate bridge.

🏒 Organisation Cleaning

Fuzzy-match and classify messy company/assignee name strings β€” a perennial headache in patent data, finally tamed.

🕸️ Network Helpers

Lightweight igraph-based utilities for building co-inventor, co-assignee, and citation networks from parsed patent data.


Installation

NetworkIsLifeR is not yet on CRAN. Install the development version directly from GitHub:

# Install remotes if you don't have it
install.packages("remotes")

# Install NetworkIsLifeR from GitHub
remotes::install_github("JPvdP/NetworkIsLifeR")

Then load it like any other package:

library(NetworkIsLifeR)

⚠️ Early Development Notice
This package is at version 0.0.1. The API may change between releases. Pin your dependency to a specific commit in production workflows:
remotes::install_github("JPvdP/NetworkIsLifeR", ref = "abc1234")


A Typical Workflow

The diagram below shows how the package's modules connect in a real innovation-analytics pipeline.

Lens.org export (.jsonl.gz)
        │
        ▼
  parse_lens_jsonl()          ← patent / publication parsing
        │
        ├─── tidy patent data frame
        │           │
        │     clean_org_names()      ← organisation normalisation
        │           │
        │     build_network()        ← co-inventor / co-assignee graph
        │           │
        │      igraph analysis ──────────────────────────────► Network viz
        │
        └─── abstract / claim text
                    │
              embed_texts()          ← sentence embeddings
                    │
               umap_reduce()         ← dimensionality reduction
                    │
              hdbscan_cluster()      ← topic clustering
                    │
            represent_topics()       ← udpipe / spaCy labels
                    │
                Topic map / wordclouds

Quick-Start Examples

1 · Parse a Lens.org Export

library(NetworkIsLifeR)

# Point to your downloaded .jsonl or .jsonl.gz file
patents <- parse_lens_jsonl("my_export.jsonl.gz")

# Returns a tidy tibble with one row per patent
head(patents)
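
The internals of parse_lens_jsonl() are not shown on this page. As a rough sketch of what reading and flattening a gzipped JSONL file involves, the example below uses jsonlite directly on a tiny hand-written file. The field names (lens_id, title, inventors) are assumptions made for illustration and are not the real Lens.org schema.

```r
library(jsonlite)

# Two hand-written records standing in for an export.
# Field names are illustrative assumptions, not the Lens.org schema.
records <- c(
  '{"lens_id":"A1","title":"Wind turbine blade","inventors":[{"name":"A. Jansen"},{"name":"B. de Vries"}]}',
  '{"lens_id":"B2","title":"Battery electrode","inventors":[{"name":"C. Bakker"}]}'
)
path <- tempfile(fileext = ".jsonl.gz")
con <- gzfile(path, "w")
writeLines(records, con)
close(con)

# Read the compressed file back: one JSON record per line
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
parsed <- lapply(lines, fromJSON, simplifyVector = FALSE)

# Flatten to one row per patent; nested inventors are collapsed to a string
patents_demo <- data.frame(
  lens_id   = vapply(parsed, function(r) r$lens_id, character(1)),
  title     = vapply(parsed, function(r) r$title, character(1)),
  inventors = vapply(parsed, function(r) {
    paste(vapply(r$inventors, function(p) p$name, character(1)),
          collapse = "; ")
  }, character(1)),
  stringsAsFactors = FALSE
)
print(patents_demo)
```

For large exports, jsonlite::stream_in() reads newline-delimited JSON in pages and is usually the better starting point than readLines().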

2 · Clean Organisation Names

Assignee fields in patent data are notoriously noisy: the same company can appear as "IBM Corp.", "I.B.M.", or "International Business Machines". The cleaning helper standardises them:

patents_clean <- patents |>
  dplyr::mutate(assignee_clean = clean_org_names(assignee))
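
To make the idea concrete, here is a minimal base-R sketch of rule-based name normalisation. It is not the implementation of clean_org_names(); the helper normalise_org(), the suffix list, and the alias table are all hypothetical and only illustrate the kind of steps such a cleaner typically performs.

```r
# Hypothetical helper, NOT the package's clean_org_names() implementation
normalise_org <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)   # "I.B.M." -> "IBM"
  # Strip a few common legal-form suffixes (illustrative, incomplete list)
  x <- gsub("\\b(CORP|CORPORATION|INC|LTD|LLC|GMBH|BV|NV)\\b", "", x,
            perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)          # collapse repeated spaces
  trimws(x)
}

raw_names <- c("IBM Corp.", "I.B.M.", "International Business Machines")
cleaned <- normalise_org(raw_names)

# String rules cannot link a spelled-out name to its acronym,
# so a small hand-made alias table covers that case.
aliases <- c("INTERNATIONAL BUSINESS MACHINES" = "IBM")
hit <- cleaned %in% names(aliases)
cleaned[hit] <- unname(aliases[cleaned[hit]])

print(cleaned)
```

Rule-based cleaning handles punctuation and legal suffixes; near-duplicates caused by typos usually need an edit-distance step on top, for which base R offers utils::adist().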

3 · Build a Co-Inventor Network

library(igraph)

g <- build_network(patents_clean, type = "co_inventor")

# Basic summary
summary(g)

# Plot with igraph
plot(g,
     vertex.size   = 5,
     vertex.label  = NA,
     edge.color    = "grey70",
     main          = "Co-Inventor Network")
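
How build_network() constructs the graph is not documented on this page. The sketch below shows one common way to derive a co-inventor network with igraph alone: list every pair of inventors who share a patent, count the pairs, and use the counts as edge weights. The toy table and its column names (patent, inventor) are assumptions for illustration.

```r
library(igraph)

# Toy patent-inventor table; column names are assumptions
pat_inv <- data.frame(
  patent   = c("P1", "P1", "P1", "P2", "P2", "P3"),
  inventor = c("Ann", "Bob", "Cem", "Ann", "Bob", "Dee"),
  stringsAsFactors = FALSE
)

# All unordered inventor pairs within each patent
pairs_list <- lapply(split(pat_inv$inventor, pat_inv$patent), function(inv) {
  inv <- sort(unique(inv))
  if (length(inv) < 2) return(NULL)   # single-inventor patents add no edges
  data.frame(t(combn(inv, 2)), stringsAsFactors = FALSE)
})
pairs <- do.call(rbind, Filter(Negate(is.null), pairs_list))
names(pairs) <- c("from", "to")

# Edge weight = number of patents a pair has in common
pairs$weight <- 1
edges <- aggregate(weight ~ from + to, data = pairs, FUN = sum)

# Pass the full inventor list so that solo inventors remain as isolates
g_demo <- graph_from_data_frame(
  edges,
  directed = FALSE,
  vertices = data.frame(name = unique(pat_inv$inventor))
)
print(g_demo)
```

Keeping isolates in the vertex list matters for measures such as density or the share of inventors who collaborate at all.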

4 · BERTopic-Style Topic Modelling

# Step 1: embed abstracts (uses sentence-transformers via reticulate,
#         or a pure-R fallback)
embeddings <- embed_texts(patents_clean$abstract)

# Step 2: reduce to 2D
reduced <- umap_reduce(embeddings)

# Step 3: cluster
clusters <- hdbscan_cluster(reduced, min_cluster_size = 10)

# Step 4: label clusters
topics <- represent_topics(patents_clean$abstract, clusters,
                           method = "udpipe")

# Inspect top terms per topic
print(topics)
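
For readers who want to see the reduction and clustering steps without the package wrappers, the sketch below calls the two libraries listed under Dependencies directly: uwot for UMAP and dbscan for HDBSCAN. Whether umap_reduce() and hdbscan_cluster() use these exact calls or defaults is an assumption. The "embeddings" are random numbers generated on the spot, so the result only demonstrates the mechanics.

```r
library(uwot)
library(dbscan)

set.seed(42)

# Synthetic stand-in for sentence embeddings:
# two well-separated groups of 50 points in 20 dimensions
emb <- rbind(
  matrix(rnorm(50 * 20, mean = 0), ncol = 20),
  matrix(rnorm(50 * 20, mean = 6), ncol = 20)
)

# Reduce to 2 dimensions with UMAP
coords <- uwot::umap(emb, n_neighbors = 15, n_components = 2)

# Density-based clustering; in dbscan's output, cluster 0 means "noise"
fit <- dbscan::hdbscan(coords, minPts = 10)

# Cluster sizes
print(table(fit$cluster))
```

The minPts argument of dbscan::hdbscan() plays a role similar to min_cluster_size above: larger values produce fewer, broader topics and push more documents into the noise cluster.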

Data Included

The package ships with a sample dataset (NW_IPC_NL_regions.csv) containing Dutch regional IPC (International Patent Classification) patent counts. It is handy for testing network and regional-innovation workflows without needing a Lens.org account.

# Load the sample IPC data
ipc <- read.csv(system.file("NW_IPC_NL_regions.csv",
                            package = "NetworkIsLifeR"))
head(ipc)
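
The column layout of NW_IPC_NL_regions.csv is not described here, so the following sketch uses a small made-up table instead. Assuming a long format with one row per region and IPC class (columns region, ipc, and n are hypothetical), it shows how such counts can be turned into a region-to-region network in which two regions are linked when they patent in the same IPC class.

```r
library(igraph)

# Made-up data; the real CSV may be structured differently
ipc_demo <- data.frame(
  region = c("Utrecht", "Utrecht", "Noord-Holland", "Noord-Holland",
             "Zuid-Holland"),
  ipc    = c("H01", "G06", "G06", "A61", "A61"),
  n      = c(12, 30, 25, 8, 40),
  stringsAsFactors = FALSE
)

# Two-mode (bipartite) graph: regions on one side, IPC classes on the other
g_bip <- graph_from_data_frame(ipc_demo, directed = FALSE)
V(g_bip)$type <- V(g_bip)$name %in% ipc_demo$ipc   # TRUE marks IPC classes

# Project onto the region side (type == FALSE);
# edge weight = number of IPC classes two regions share
g_regions <- bipartite_projection(g_bip, which = "false")
print(g_regions)
```

The same projection with which = "true" gives a network of IPC classes linked by the regions they co-occur in.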

Dependencies at a Glance

Purpose                     Package(s)
Data wrangling              dplyr, tidyr, jsonlite
Network analysis            igraph
Text / NLP (R-native)       udpipe, quanteda
Dimensionality reduction    uwot (UMAP)
Clustering                  dbscan (HDBSCAN)
Python bridge (optional)    reticulate + spaCy

Who Is This For?

  • Students in Utrecht University's Innometrics and Data Analytics for Sustainability courses looking for a single package that covers the full patent-analytics pipeline.
  • Innovation researchers who need reproducible, R-native workflows for technology landscape mapping.
  • Data scientists exploring bibliometric or scientometric datasets from Lens.org.

Contributing & Feedback

The project is in active early development and contributions are warmly welcomed.

  1. Fork the repository on GitHub.
  2. Create a feature branch: git checkout -b feat/my-feature.
  3. Open a pull request with a clear description of your changes.

Bug reports and feature requests can be filed as GitHub Issues.


Citation

If you use NetworkIsLifeR in your research, please cite:

van der Pol, J. (2024). NetworkIsLifeR: Utilities for patent analytics,
  topic modelling and network science. R package version 0.0.1.
  https://github.com/JPvdP/NetworkIsLifeR

License

NetworkIsLifeR is released under the MIT License. See LICENSE.md for details.


Built with ❤️ at Utrecht University · Rendered with Quarto