JP van der Pol
March 25, 2026
---
title: "NetworkIsLifeR"
subtitle: "Patent analytics, text mining & network science, straight from R"
author: "JP van der Pol"
date: today
sidebar: r-package
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
    toc-title: "On this page"
    code-fold: true
    code-tools: true
    highlight-style: github
    smooth-scroll: true
    css: |
      body { font-family: 'Inter', sans-serif; }
      .hero { background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%);
              color: white; padding: 3rem 2rem; border-radius: 12px; margin-bottom: 2rem; }
      .hero h1 { font-size: 2.5rem; font-weight: 700; margin-bottom: 0.5rem; }
      .hero p { font-size: 1.1rem; opacity: 0.85; }
      .badge { display: inline-block; background: #e63946; color: white;
               font-size: 0.75rem; padding: 2px 8px; border-radius: 20px;
               vertical-align: middle; margin-left: 6px; }
      .feature-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr));
                      gap: 1.2rem; margin: 1.5rem 0; }
      .feature-card { border: 1px solid #dee2e6; border-radius: 10px; padding: 1.2rem;
                      background: #f8f9fa; }
      .feature-card h4 { margin-top: 0; color: #0f3460; }
      .callout-custom { border-left: 4px solid #e63946; padding: 0.8rem 1rem;
                        background: #fff5f5; border-radius: 0 8px 8px 0; margin: 1rem 0; }
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE,
                      fig.width = 8, fig.height = 5)
```
::: {.hero}
# 🕸️ NetworkIsLifeR
**Your all-in-one R toolkit for patent analytics, topic modelling, and network science.**
Built for Utrecht University's *Innometrics* and *Data Analytics for Sustainability* courses, and useful well beyond the classroom.
<span style="opacity:0.7; font-size:0.9rem;">v0.0.1 · MIT License · Early Development</span>
:::
---
## Why NetworkIsLifeR?
Modern innovation research sits at the intersection of three messy worlds: **unstructured patent text**, **large bibliographic datasets**, and **complex collaboration networks**. Most R packages solve one piece of the puzzle. NetworkIsLifeR bundles them all into a coherent, course-tested workflow.
Whether you're mapping technology landscapes from Lens.org exports, clustering research topics with BERTopic-style embeddings, or visualising co-inventor networks, this package gets you from raw data to insight in far fewer steps.
---
## Core Capabilities
::: {.feature-grid}
::: {.feature-card}
#### Patent & Publication Parsing
Read and flatten Lens.org JSONL / JSONL.GZ exports (including nested inventor, assignee, and citation fields) into tidy R data frames ready for analysis.
:::
::: {.feature-card}
#### 🧠 Topic Modelling
A BERTopic-inspired pipeline: sentence embeddings → UMAP dimensionality reduction → HDBSCAN clustering. No Python environment required for the core flow.
:::
::: {.feature-card}
#### 🏷️ Topic Representation
Label your clusters with human-readable terms using `udpipe` / `quanteda` (pure R) or `spaCy` via the `reticulate` bridge.
:::
::: {.feature-card}
#### 🏢 Organisation Cleaning
Fuzzy-match and classify messy company/assignee name strings: a perennial headache in patent data, finally tamed.
:::
::: {.feature-card}
#### 🕸️ Network Helpers
Lightweight `igraph`-based utilities for building co-inventor, co-assignee, and citation networks from parsed patent data.
:::
:::
---
## Installation
NetworkIsLifeR is not yet on CRAN. Install the development version directly from GitHub:
```{r install, eval=FALSE}
# Install remotes if you don't have it
install.packages("remotes")
# Install NetworkIsLifeR from GitHub
remotes::install_github("JPvdP/NetworkIsLifeR")
```
Then load it like any other package:
```{r load, eval=FALSE}
library(NetworkIsLifeR)
```
::: {.callout-custom}
**⚠️ Early Development Notice**
This package is at version 0.0.1. The API may change between releases. In production workflows, pin your dependency to a specific commit (replace the placeholder `abc1234` with a real commit SHA):
`remotes::install_github("JPvdP/NetworkIsLifeR", ref = "abc1234")`
:::
---
## A Typical Workflow
The diagram below shows how the package's modules connect in a real innovation-analytics pipeline.
```
Lens.org export (.jsonl.gz)
        │
        ▼
parse_lens_jsonl()            ← patent / publication parsing
        │
        ├─── tidy patent data frame
        │         │
        │    clean_org_names()     ← organisation normalisation
        │         │
        │    build_network()       ← co-inventor / co-assignee graph
        │         │
        │    igraph analysis ──────────────────────────────► Network viz
        │
        └─── abstract / claim text
                  │
             embed_texts()         ← sentence embeddings
                  │
             umap_reduce()         ← dimensionality reduction
                  │
             hdbscan_cluster()     ← topic clustering
                  │
             represent_topics()    ← udpipe / spaCy labels
                  │
             Topic map / wordclouds
```
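The topic-modelling branch of the diagram can be approximated directly with the `uwot` and `dbscan` packages listed under dependencies. The sketch below is only an illustration of the UMAP → HDBSCAN idea on synthetic data: the random matrix stands in for real sentence embeddings, and nothing here is claimed to match how `umap_reduce()` or `hdbscan_cluster()` are implemented. Note that `dbscan::hdbscan()` names its size parameter `minPts`.

```r
# Assumes the uwot and dbscan packages are installed.
set.seed(42)

# Stand-in for sentence embeddings: two Gaussian blobs in 16 dimensions
emb <- rbind(
  matrix(rnorm(60 * 16, mean = 0), ncol = 16),
  matrix(rnorm(60 * 16, mean = 4), ncol = 16)
)

# Reduce to 2 dimensions, then cluster the reduced coordinates
reduced <- uwot::umap(emb, n_neighbors = 15, n_components = 2)
fit <- dbscan::hdbscan(reduced, minPts = 10)

# Cluster label 0, if present, marks points treated as noise
table(fit$cluster)
```

Because UMAP is stochastic, the exact layout and cluster counts can vary between runs and package versions.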
---
## Quick-Start Examples
### 1 · Parse a Lens.org Export
```{r parse-example, eval=FALSE}
library(NetworkIsLifeR)
# Point to your downloaded .jsonl or .jsonl.gz file
patents <- parse_lens_jsonl("my_export.jsonl.gz")
# Returns a tidy tibble with one row per patent
head(patents)
```
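To make "flattening" concrete, here is a minimal sketch of the general technique using `jsonlite` alone. The two records and their field names (`lens_id`, `applicant`, and so on) are made up for illustration and are not the actual Lens.org schema, and this is not necessarily how `parse_lens_jsonl()` works internally.

```r
library(jsonlite)

# Two made-up records in JSON Lines form (one JSON object per line).
# Field names are hypothetical, chosen only to show a nested structure.
records <- c(
  '{"lens_id": "A1", "title": "Battery cell", "applicant": {"name": "Acme BV", "country": "NL"}}',
  '{"lens_id": "B2", "title": "Turbine blade", "applicant": {"name": "Windco", "country": "DK"}}'
)
path <- tempfile(fileext = ".jsonl")
writeLines(records, path)

# stream_in() parses one record per line; nested objects arrive as
# data-frame columns, which flatten() spreads into "applicant.name" etc.
nested <- stream_in(file(path), verbose = FALSE)
flat <- flatten(nested)
names(flat)
```

Array-valued fields (for example several inventors per patent) become list columns and need an extra unnesting step, which is the kind of work a dedicated parser takes off your hands.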
### 2 · Clean Organisation Names
Assignee name fields in patent data are notoriously noisy (`"IBM Corp."`, `"I.B.M."`, `"International Business Machines"`). The cleaning helper standardises them:
```{r clean-example, eval=FALSE}
patents_clean <- patents |>
  dplyr::mutate(assignee_clean = clean_org_names(assignee))
```
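For intuition, a rule-based normalisation pass can be sketched in base R as below. The function `normalise_org()` and its suffix list are hypothetical, written only for this example; they are not part of the package and do not describe what `clean_org_names()` does internally.

```r
# Hypothetical helper: upper-case, drop punctuation, strip common legal suffixes
normalise_org <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)   # "I.B.M." becomes "IBM"
  x <- gsub("\\b(CORP|CORPORATION|INC|LTD|BV|NV|GMBH)\\b", "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

raw <- c("IBM Corp.", "I.B.M.", "International Business Machines", "Philips N.V.")
data.frame(raw = raw, cleaned = normalise_org(raw))
```

Rules like these merge the first two variants, but the spelled-out name still stays separate. Linking abbreviations to full names needs a lookup table or fuzzy matching on top, which is where a dedicated cleaning helper earns its keep.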
### 3 · Build a Co-Inventor Network
```{r network-example, eval=FALSE}
library(igraph)
g <- build_network(patents_clean, type = "co_inventor")
# Basic summary
summary(g)
# Plot with igraph
plot(g,
     vertex.size = 5,
     vertex.label = NA,
     edge.color = "grey70",
     main = "Co-Inventor Network")
```
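The idea behind a co-inventor network is that two inventors are linked whenever they appear on the same patent. The sketch below builds such a graph by hand with `igraph` from a toy patent-inventor table; the data and column names are invented for illustration, and the construction is an assumption about the general approach rather than the internals of `build_network()`.

```r
library(igraph)

# Toy long-format table: one row per (patent, inventor) pair
inv <- data.frame(
  patent   = c("P1", "P1", "P1", "P2", "P2", "P3"),
  inventor = c("Ann", "Bob", "Cem", "Ann", "Bob", "Dee"),
  stringsAsFactors = FALSE
)

# All inventor pairs within each patent; single-inventor patents yield none
pair_list <- lapply(split(inv$inventor, inv$patent), function(v) {
  v <- sort(unique(v))
  if (length(v) < 2) return(NULL)
  as.data.frame(t(combn(v, 2)), stringsAsFactors = FALSE)
})
pairs <- do.call(rbind, unname(Filter(Negate(is.null), pair_list)))
names(pairs) <- c("from", "to")
pairs$weight <- 1

# Collapse repeated pairs so the edge weight counts shared patents
edges <- aggregate(weight ~ from + to, data = pairs, FUN = sum)

# Pass the full inventor list so solo inventors are kept as isolated nodes
g <- graph_from_data_frame(
  edges, directed = FALSE,
  vertices = data.frame(name = sort(unique(inv$inventor)))
)
degree(g)
```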
### 4 · BERTopic-Style Topic Modelling
```{r topic-example, eval=FALSE}
# Step 1: embed abstracts (uses sentence-transformers via reticulate,
# or a pure-R fallback)
embeddings <- embed_texts(patents_clean$abstract)
# Step 2: reduce to 2D
reduced <- umap_reduce(embeddings)
# Step 3: cluster
clusters <- hdbscan_cluster(reduced, min_cluster_size = 10)
# Step 4: label clusters
topics <- represent_topics(patents_clean$abstract, clusters,
                           method = "udpipe")
# Inspect top terms per topic
print(topics)
```
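One common way to turn clusters into readable labels is the class-based TF-IDF weighting popularised by BERTopic: a term scores highly for a cluster when it is frequent there and rare overall. The base-R sketch below illustrates that weighting on four toy abstracts. It is a swapped-in illustration, not a description of what `represent_topics()` computes, and the documents and cluster labels are invented.

```r
# Toy abstracts and the cluster each was assigned to (made-up data)
docs <- c("solar cell panel efficiency", "solar panel coating",
          "wind turbine blade design",   "turbine blade noise")
cluster <- c(1, 1, 2, 2)

# Term counts per cluster: rows = clusters, columns = terms
tokens <- strsplit(tolower(docs), "[^a-z]+")
tf <- unclass(table(cluster = rep(cluster, lengths(tokens)),
                    term    = unlist(tokens)))

# Class-based TF-IDF: within-cluster term share times log(1 + A / f_t),
# where A is the average word count per cluster and f_t the total
# frequency of term t across all clusters
A <- mean(rowSums(tf))
idf <- log(1 + A / colSums(tf))
ctfidf <- sweep(tf / rowSums(tf), 2, idf, `*`)

# Two highest-weighted terms per cluster
top_terms <- lapply(rownames(ctfidf), function(k) {
  names(sort(ctfidf[k, ], decreasing = TRUE))[1:2]
})
names(top_terms) <- rownames(ctfidf)
top_terms
```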
---
## Data Included
The package ships with a sample dataset (`NW_IPC_NL_regions.csv`) containing Dutch regional IPC (International Patent Classification) patent counts. It is handy for testing network and regional-innovation workflows without needing a Lens.org account.
```{r sample-data, eval=FALSE}
# Load the sample IPC data
ipc <- read.csv(system.file("NW_IPC_NL_regions.csv",
                            package = "NetworkIsLifeR"))
head(ipc)
```
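`system.file()` returns an empty string, rather than raising an error, when the package or file cannot be found. A slightly more defensive variant of the snippet above checks for that first; this is just a sketch of the guard, using the same file name as above.

```r
path <- system.file("NW_IPC_NL_regions.csv", package = "NetworkIsLifeR")

if (nzchar(path)) {
  ipc <- read.csv(path)
  str(ipc)
} else {
  # Empty path: the package is not installed or the file is stored elsewhere
  message("Sample file not found; check that NetworkIsLifeR is installed.")
}
```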
---
## Dependencies at a Glance
| Purpose | Package(s) |
|---|---|
| Data wrangling | `dplyr`, `tidyr`, `jsonlite` |
| Network analysis | `igraph` |
| Text / NLP (R-native) | `udpipe`, `quanteda` |
| Dimensionality reduction | `uwot` (UMAP) |
| Clustering | `dbscan` (HDBSCAN) |
| Python bridge (optional) | `reticulate` + `spaCy` |
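To see which of these packages are already available in your library, a quick check along the following lines can help. The vector simply lists the R packages from the table above; `spaCy` is a Python library and is not covered by this check.

```r
deps <- c("dplyr", "tidyr", "jsonlite", "igraph",
          "udpipe", "quanteda", "uwot", "dbscan", "reticulate")

# TRUE where the package can be loaded, FALSE where it is missing
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing <- names(installed)[!installed]

if (length(missing) > 0) {
  message("Not installed: ", paste(missing, collapse = ", "))
}
```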
---
## Who Is This For?
- **Students** in Utrecht University's Innometrics and Data Analytics for Sustainability courses looking for a single package that covers the full patent-analytics pipeline.
- **Innovation researchers** who need reproducible, R-native workflows for technology landscape mapping.
- **Data scientists** exploring bibliometric or scientometric datasets from Lens.org.
---
## Contributing & Feedback
The project is in active early development and contributions are warmly welcomed.
1. Fork the repository on [GitHub](https://github.com/JPvdP/NetworkIsLifeR).
2. Create a feature branch: `git checkout -b feat/my-feature`.
3. Open a pull request with a clear description of your changes.
Bug reports and feature requests can be filed as [GitHub Issues](https://github.com/JPvdP/NetworkIsLifeR/issues).
---
## Citation
If you use NetworkIsLifeR in your research, please cite:
```
van der Pol, J. (2024). NetworkIsLifeR: Utilities for patent analytics,
topic modelling and network science. R package version 0.0.1.
https://github.com/JPvdP/NetworkIsLifeR
```
---
## License
NetworkIsLifeR is released under the **MIT License**. See [LICENSE.md](https://github.com/JPvdP/NetworkIsLifeR/blob/main/LICENSE.md) for details.
---
<p style="text-align:center; color:#888; font-size:0.85rem;">
Built with ❤️ at Utrecht University · Rendered with <a href="https://quarto.org">Quarto</a>
</p>