NetworkIsLifeR

Patent analytics, text mining & network science — straight from R

Author

JP van der Pol

Published

March 25, 2026

🕸️ NetworkIsLifeR

Your all-in-one R toolkit for patent analytics, topic modelling, and network science.
Built for Utrecht University’s Innometrics and Data Analytics for Sustainability courses — and useful well beyond the classroom.

v0.0.1 · MIT License · Early Development

Why NetworkIsLifeR?

Modern innovation research sits at the intersection of three messy worlds: unstructured patent text, large bibliographic datasets, and complex collaboration networks. Most R packages solve one piece of the puzzle. NetworkIsLifeR bundles them all into a coherent, course-tested workflow.

Whether you’re mapping technology landscapes from Lens.org exports, clustering research topics with BERTopic-style embeddings, or visualising co-inventor networks, this package gets you from raw data to insight without stitching several separate packages together by hand.

Core Capabilities

📄 Patent & Publication Parsing

Read and flatten Lens.org JSONL / JSONL.GZ exports — including nested inventor, assignee, and citation fields — into tidy R data frames ready for analysis.

🧠 Topic Modelling

A BERTopic-inspired pipeline: sentence embeddings → UMAP dimensionality reduction → HDBSCAN clustering. No Python environment required for the core flow.

🏷️ Topic Representation

Label your clusters with human-readable terms using udpipe / quanteda (pure R) or spaCy via the reticulate bridge.
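One common way to turn clusters into term lists, and the one BERTopic itself uses, is class-based TF-IDF (c-TF-IDF). This page does not say how NetworkIsLifeR scores terms, so the base-R sketch below is only an illustration of the idea. The toy documents, the whitespace tokeniser, and all variable names are assumptions made for the example.

```r
# Toy documents with cluster labels already assigned (illustrative data only)
docs <- c("solar panel energy storage", "solar cell energy efficiency",
          "wind turbine blade design", "wind turbine energy output")
cluster <- c(1, 1, 2, 2)

# Tokenise on whitespace and pool all tokens belonging to each cluster
tokens <- strsplit(tolower(docs), "\\s+")
by_cluster <- split(unlist(tokens), rep(cluster, lengths(tokens)))

# Term-frequency matrix: rows = clusters, columns = vocabulary terms
vocab <- sort(unique(unlist(tokens)))
tf <- t(sapply(by_cluster, function(tok) table(factor(tok, levels = vocab))))

# c-TF-IDF: normalised in-cluster frequency times log(1 + A / f),
# where A is the average word count per cluster and f the overall term count
A <- mean(rowSums(tf))
f <- colSums(tf)
ctfidf <- sweep(tf / rowSums(tf), 2, log(1 + A / f), "*")

# Two highest-scoring terms per cluster (one column per cluster)
top_terms <- apply(ctfidf, 1, function(s) names(sort(s, decreasing = TRUE))[1:2])
print(top_terms)
```

Because "energy" occurs in both clusters, it receives a lower weight than a cluster-specific term with the same count, which is the behaviour that makes the resulting labels distinctive.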

🏢 Organisation Cleaning

Fuzzy-match and classify messy company/assignee name strings — a perennial headache in patent data, finally tamed.

🕸️ Network Helpers

Lightweight igraph-based utilities for building co-inventor, co-assignee, and citation networks from parsed patent data.

Installation

NetworkIsLifeR is not yet on CRAN. Install the development version directly from GitHub:


# Install remotes if you don't have it
install.packages("remotes")

# Install NetworkIsLifeR from GitHub
remotes::install_github("JPvdP/NetworkIsLifeR")

Then load it like any other package:


library(NetworkIsLifeR)

⚠️ Early Development Notice
This package is at version 0.0.1. The API may change between releases. For production workflows, pin your dependency to a specific commit (replace the placeholder abc1234 with a real commit SHA from the repository):
remotes::install_github("JPvdP/NetworkIsLifeR", ref = "abc1234")

A Typical Workflow

The diagram below shows how the package’s modules connect in a real innovation-analytics pipeline.

Lens.org export (.jsonl.gz)
        │
        ▼
  parse_lens_jsonl()          ← patent / publication parsing
        │
        ├─── tidy patent data frame
        │           │
        │     clean_org_names()      ← organisation normalisation
        │           │
        │     build_network()        ← co-inventor / co-assignee graph
        │           │
        │      igraph analysis ──────────────────────────────► Network viz
        │
        └─── abstract / claim text
                    │
              embed_texts()          ← sentence embeddings
                    │
               umap_reduce()         ← dimensionality reduction
                    │
              hdbscan_cluster()      ← topic clustering
                    │
            represent_topics()       ← udpipe / spaCy labels
                    │
                Topic map / wordclouds

Quick-Start Examples

1 · Parse a Lens.org Export


library(NetworkIsLifeR)

# Point to your downloaded .jsonl or .jsonl.gz file
patents <- parse_lens_jsonl("my_export.jsonl.gz")

# Returns a tidy tibble with one row per patent
head(patents)
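
For readers curious what flattening a JSONL export involves, here is a minimal sketch that uses jsonlite directly. It is not the implementation of parse_lens_jsonl(). The field names (lens_id, title, inventors) are invented for illustration and do not reflect the real Lens.org schema.

```r
library(jsonlite)

# Two made-up records; field names are hypothetical, not the Lens.org schema
records <- c(
  '{"lens_id": "A1", "title": "Widget", "inventors": ["Ada", "Grace"]}',
  '{"lens_id": "B2", "title": "Gadget", "inventors": ["Linus"]}'
)

# Write them to a gzipped JSONL file (one JSON object per line)
path <- tempfile(fileext = ".jsonl.gz")
con <- gzfile(path, "w")
writeLines(records, con)
close(con)

# stream_in() reads line-delimited JSON; gzfile() decompresses on the fly
raw <- stream_in(gzfile(path), verbose = FALSE)

# The nested inventor array arrives as a list column;
# expand it to one row per patent-inventor pair
inventors_long <- data.frame(
  lens_id  = rep(raw$lens_id, lengths(raw$inventors)),
  inventor = unlist(raw$inventors),
  stringsAsFactors = FALSE
)
print(inventors_long)
```

A long table like this, with one row per patent and inventor, is a convenient starting point for network construction, while the one-row-per-patent table suits text analysis.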

2 · Clean Organisation Names

Assignee fields in patent data are notoriously noisy: the same company may appear as "IBM Corp.", "I.B.M.", or "International Business Machines". The cleaning helper standardises them:


patents_clean <- patents |>
  dplyr::mutate(assignee_clean = clean_org_names(assignee))
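
To give a feel for what name standardisation can involve, the base-R sketch below normalises strings and then groups near-identical names by edit distance. The helper normalise_org() is hypothetical and is not the package's clean_org_names(). The suffix list and the distance threshold are arbitrary choices for the example.

```r
# Hypothetical helper: upper-case, drop punctuation and common legal suffixes
normalise_org <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", "", x)                       # "I.B.M." -> "IBM"
  x <- gsub("\\b(CORP|CORPORATION|INC|LTD|BV|GMBH)\\b", "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

raw_names <- c("IBM Corp.", "I.B.M.", "Philips B.V.", "Phillips BV")
norm <- normalise_org(raw_names)

# Pairwise Levenshtein distances between the normalised names
d <- adist(norm)

# Single-linkage clustering: names within edit distance 1 share a group
groups <- cutree(hclust(as.dist(d), method = "single"), h = 1)
print(data.frame(raw_names, norm, groups))
```

Edit distance catches spelling variants such as "Philips" and "Phillips", but it cannot link an abbreviation to its expansion ("IBM" and "International Business Machines"). Cases like that usually need a lookup table or manual review.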

3 · Build a Co-Inventor Network


library(igraph)

g <- build_network(patents_clean, type = "co_inventor")

# Basic summary
summary(g)

# Plot with igraph
plot(g,
     vertex.size   = 5,
     vertex.label  = NA,
     edge.color    = "grey70",
     main          = "Co-Inventor Network")
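
Conceptually, a co-inventor network links every pair of inventors who share a patent. The sketch below builds such a weighted edge list in base R from toy data, then hands it to igraph if that package is installed. The data frame layout and column names are assumptions, and this is not the code behind build_network().

```r
# Toy data: one row per patent-inventor pair (column names are assumptions)
pat_inv <- data.frame(
  patent   = c("P1", "P1", "P1", "P2", "P2", "P3"),
  inventor = c("Ada", "Grace", "Linus", "Ada", "Grace", "Linus"),
  stringsAsFactors = FALSE
)

# For each patent, enumerate every unordered pair of its inventors
pairs <- lapply(split(pat_inv$inventor, pat_inv$patent), function(inv) {
  inv <- sort(unique(inv))
  if (length(inv) < 2) return(NULL)          # solo patents add no edges
  as.data.frame(t(combn(inv, 2)), stringsAsFactors = FALSE)
})
edges <- do.call(rbind, pairs)
names(edges) <- c("from", "to")

# Collapse repeated pairs: weight = number of patents the pair shares
edges$weight <- 1
edges <- aggregate(weight ~ from + to, data = edges, FUN = sum)
print(edges)

# Optional: turn the edge list into an undirected igraph object
if (requireNamespace("igraph", quietly = TRUE)) {
  g_toy <- igraph::graph_from_data_frame(edges, directed = FALSE)
  print(igraph::degree(g_toy))
}
```

Sorting inventor names within each patent keeps every pair in a single orientation, so the same two people are never counted as two different edges.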

4 · BERTopic-Style Topic Modelling


# Step 1 — embed abstracts (uses sentence-transformers via reticulate,
#           or a pure-R fallback)
embeddings <- embed_texts(patents_clean$abstract)

# Step 2 — reduce to 2D
reduced <- umap_reduce(embeddings)

# Step 3 — cluster
clusters <- hdbscan_cluster(reduced, min_cluster_size = 10)

# Step 4 — label clusters
topics <- represent_topics(patents_clean$abstract, clusters,
                           method = "udpipe")

# Inspect top terms per topic
print(topics)
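
Steps 2 and 3 correspond to the uwot and dbscan packages listed under Dependencies. The sketch below calls those packages directly on synthetic data standing in for sentence embeddings. Whether umap_reduce() and hdbscan_cluster() use these particular arguments internally is an assumption, and the parameter values are illustrative only.

```r
library(uwot)
library(dbscan)

set.seed(42)

# Stand-in for sentence embeddings: three Gaussian blobs in 50 dimensions
n_per <- 60
dims  <- 50
centres <- matrix(rnorm(3 * dims, sd = 5), nrow = 3)
emb <- do.call(rbind, lapply(1:3, function(k) {
  sweep(matrix(rnorm(n_per * dims), nrow = n_per), 2, centres[k, ], "+")
}))

# Reduce to two dimensions with UMAP
low <- umap(emb, n_neighbors = 15, n_components = 2, min_dist = 0)

# Density-based clustering; in dbscan's convention, cluster 0 means noise
cl <- hdbscan(low, minPts = 10)
print(table(cl$cluster))
```

Unlike k-means, HDBSCAN does not need the number of clusters in advance and may leave some points unassigned, which suits patent corpora where many documents fit no clear topic.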

Data Included

The package ships with a sample dataset (NW_IPC_NL_regions.csv) containing Dutch regional IPC (International Patent Classification) patent counts — handy for testing network and regional-innovation workflows without needing a Lens.org account.


# Load the sample IPC data
ipc <- read.csv(system.file("NW_IPC_NL_regions.csv",
                            package = "NetworkIsLifeR"))
head(ipc)

Dependencies at a Glance

Purpose	Package(s)
Data wrangling	`dplyr`, `tidyr`, `jsonlite`
Network analysis	`igraph`
Text / NLP (R-native)	`udpipe`, `quanteda`
Dimensionality reduction	`uwot` (UMAP)
Clustering	`dbscan` (HDBSCAN)
Python bridge (optional)	`reticulate` + `spaCy`

Who Is This For?

- Students in Utrecht University’s Innometrics and Data Analytics for Sustainability courses looking for a single package that covers the full patent-analytics pipeline.
- Innovation researchers who need reproducible, R-native workflows for technology landscape mapping.
- Data scientists exploring bibliometric or scientometric datasets from Lens.org.

Contributing & Feedback

The project is in active early development and contributions are warmly welcomed.

1. Fork the repository on GitHub.
2. Create a feature branch: git checkout -b feat/my-feature.
3. Open a pull request with a clear description of your changes.

Bug reports and feature requests can be filed as GitHub Issues.

Citation

If you use NetworkIsLifeR in your research, please cite:

van der Pol, J. (2024). NetworkIsLifeR: Utilities for patent analytics,
  topic modelling and network science. R package version 0.0.1.
  https://github.com/JPvdP/NetworkIsLifeR

License

NetworkIsLifeR is released under the MIT License. See LICENSE.md for details.

Built with ❤️ at Utrecht University · Rendered with Quarto
