Understanding the Power of Unification with Unified.js
Unified.js is a library for working with content as structured data. The short version: it parses content (markdown, HTML, or plain prose) into syntax trees, lets you transform those trees, and compiles them back to text. A large community maintains plugins that serve as building blocks for working with those trees.
The best example I have, and the reason I started playing with it, is this blog: I use it to turn markdown into HTML, highlighting code snippets, calculating the reading time, and generating a short description along the way.
In this post, I will share a quick review and also show what I'm doing for my personal blog with it.
What is Unified.js?
At its core, Unified.js is an interface for processing content through syntax trees. A parser plugin turns raw text into an abstract syntax tree (AST), a structured representation of the document: its headings, paragraphs, links, and so on. With this AST, you can easily manipulate the data and extract meaningful information.
For example, let's say you have a blog post written in markdown. With Unified.js, you can parse the markdown text into an AST, then extract the headings, links, and other important information. This information can then be used to create a table of contents, generate a summary, or build any number of other useful applications.
To parse content with Unified.js, you create a processor with unified(), register a parser that matches the format of the data with use(), and pass the text to parse(). For example, to parse a markdown file, you would use the remark-parse parser.
Here's a code sample that demonstrates how to parse a markdown file with Unified.js:
import { unified } from "unified";
import remarkParse from "remark-parse";

const markdownText = "# Hello, World!\n\nThis is a sample markdown file.";

// Build a processor with the markdown parser, then parse the text into a tree.
const ast = unified().use(remarkParse).parse(markdownText);
In this example, we first import the unified and remark-parse modules (recent releases of both are ESM-only, so import is used rather than require). Then we create a processor with unified(), register the markdown parser with use(), and call parse() to turn the text into an AST.
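Once you have a tree, extracting headings for a table of contents is mostly a matter of filtering nodes. The following is a minimal sketch, not the output of a real parse: the tree is written by hand in the shape a markdown parser would roughly produce for the sample text (position data omitted), and collectHeadings is a hypothetical helper name.

```javascript
// Hand-written tree (assumption): roughly what parsing
// "# Hello, World!\n\nThis is a sample markdown file." produces.
const sampleTree = {
  type: "root",
  children: [
    {
      type: "heading",
      depth: 1,
      children: [{ type: "text", value: "Hello, World!" }],
    },
    {
      type: "paragraph",
      children: [{ type: "text", value: "This is a sample markdown file." }],
    },
  ],
};

// Hypothetical helper: collect { depth, text } for every top-level heading.
function collectHeadings(root) {
  return root.children
    .filter((node) => node.type === "heading")
    .map((node) => ({
      depth: node.depth,
      // Joins direct text children only; nested inline nodes are ignored here.
      text: node.children
        .filter((child) => child.type === "text")
        .map((child) => child.value)
        .join(""),
    }));
}

const headings = collectHeadings(sampleTree);
console.log(headings);
```

Each entry carries the heading depth, which is what a table of contents needs to decide on nesting.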
Transforming Natural Language Data
Once you've parsed natural language data into an AST, you can then use Unified.js to transform this data in a variety of ways. With its flexible architecture, you can create custom transformations that manipulate the AST and extract information in a way that is tailored to your specific needs.
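A custom transformation is usually written as a plugin: a function that returns a transformer, which receives the tree and changes it in place. The sketch below shows that shape with a hypothetical demoteHeadings plugin. To stay free of dependencies it is called by hand on a hand-written tree; in a real pipeline it would be registered with use().

```javascript
// Hypothetical plugin: demotes every heading by one level (h1 -> h2, ...), capped at 6.
function demoteHeadings() {
  // The returned function is the transformer; it mutates the tree in place.
  return (tree) => {
    const walk = (node) => {
      if (node.type === "heading") {
        node.depth = Math.min(node.depth + 1, 6);
      }
      (node.children || []).forEach(walk);
    };
    walk(tree);
  };
}

// Hand-written tree (assumption), so the sketch runs without a parser.
const headingDoc = {
  type: "root",
  children: [
    { type: "heading", depth: 1, children: [{ type: "text", value: "Title" }] },
    { type: "heading", depth: 6, children: [{ type: "text", value: "Deep" }] },
  ],
};

// Calling the plugin yields the transformer, which is then applied to the tree.
demoteHeadings()(headingDoc);
const depths = headingDoc.children.map((node) => node.depth);
console.log(depths);
```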
For example, let's say you have a large collection of customer reviews, and you want to extract the sentiment of each review. With Unified.js, you can create a custom transformation that analyzes the sentiment of each review and categorizes it as positive, negative, or neutral.
Here's a code sample that demonstrates how to perform sentiment analysis with Unified.js:
import { unified } from "unified";
import remarkParse from "remark-parse";
import Sentiment from "sentiment";

const markdownText =
  "# Customer Reviews\n\nI loved this product!\n\nI hated this product.";
const ast = unified().use(remarkParse).parse(markdownText);
// Recent versions of sentiment export a class; analyze() returns an object with a score.
const sentiment = new Sentiment();
ast.children.forEach((node) => {
  if (node.type === "paragraph") {
    // Reads only the first child, which is enough for these plain-text paragraphs.
    const text = node.children[0].value;
    const { score } = sentiment.analyze(text);
    if (score > 0) {
      console.log(`Positive review: ${text}`);
    } else if (score < 0) {
      console.log(`Negative review: ${text}`);
    } else {
      console.log(`Neutral review: ${text}`);
    }
  }
});
In this example, we use the sentiment module to perform sentiment analysis on each paragraph node in the AST. The analyze() method returns an object whose score indicates the sentiment of the text: positive scores indicate positive sentiment, negative scores indicate negative sentiment, and a score of zero is treated as neutral.
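The example above reads only the first child of each paragraph, so a review containing emphasis or a link would be scored on a fragment of its text. One way around that is to gather the text of all descendants first. The sketch below uses a hypothetical nodeToText helper and a hand-written paragraph node; the mdast-util-to-string package offers a maintained version of the same idea.

```javascript
// Hypothetical helper: recursively concatenate the text of a node and its descendants.
function nodeToText(node) {
  if (typeof node.value === "string") return node.value;
  return (node.children || []).map(nodeToText).join("");
}

// Hand-written paragraph (assumption), equivalent to "I **loved** this product!".
const reviewNode = {
  type: "paragraph",
  children: [
    { type: "text", value: "I " },
    { type: "strong", children: [{ type: "text", value: "loved" }] },
    { type: "text", value: " this product!" },
  ],
};

const reviewText = nodeToText(reviewNode);
console.log(reviewText);
```

The resulting string can then be passed to the sentiment analysis in place of the first child's value.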
My own personal experience
As I mentioned before, I'm using it for this blog to process the markdown files and generate what you are reading. The following is the full pipeline.
import { unified } from "unified";
import remarkParse from "remark-parse";
import rehypeInferReadingTimeMeta from "rehype-infer-reading-time-meta";
import remarkGfm from "remark-gfm";
import remarkRehype from "remark-rehype";
import rehypeRaw from "rehype-raw";
import rehypeInferDescriptionMeta from "rehype-infer-description-meta";
import prism from "remark-prism";
import rehypeStringify from "rehype-stringify";
import transformImgSrc from "./remark-transformImgSrc";

// Metadata is assumed to be a type declared elsewhere in the project.
export default async function markdownToHtml(slug: string, markdown: string) {
  const result = await unified()
    .use(remarkParse) // markdown text -> markdown tree
    .use(remarkGfm) // GitHub Flavored Markdown syntax
    .use(transformImgSrc, { slug }) // custom: rewrite image paths
    .use(prism) // highlight code blocks
    .use(remarkRehype, { allowDangerousHtml: true }) // markdown tree -> HTML tree
    .use(rehypeRaw) // parse raw HTML embedded in the markdown
    .use(rehypeInferReadingTimeMeta) // fills in the reading time metadata
    .use(rehypeInferDescriptionMeta) // fills in the description metadata
    .use(rehypeStringify) // HTML tree -> HTML text
    .process(markdown);
  return { content: result.toString(), meta: result.data.meta } as {
    content: string;
    meta: Metadata;
  };
}
Note that I'm using a lot of transformations, including some custom ones (e.g., transformImgSrc, which rewrites every image URL so that it points inside Next.js' public folder). You can see the transformImgSrc code below. I'm using the visit function to simplify navigating the AST: it finds every node of type image, and the plugin then replaces that node's path.
import { visit } from "unist-util-visit";

const imgDirInsidePublic = "assets";

export default function transformImgSrc({ slug }) {
  return (tree, _file) => {
    // Visit image nodes directly so every image is rewritten, wherever it is nested.
    visit(tree, "image", (image) => {
      // Strip only a leading "./" from the relative path.
      const fileName = image.url.replace(/^\.\//, "");
      image.url = `/${imgDirInsidePublic}/${slug}/${fileName}`;
    });
  };
}
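To make the traversal concrete, here is a minimal sketch of what a helper like visit does: walk the tree depth-first and call a function for each node of a given type. visitByType is a simplified, hypothetical stand-in (the real unist-util-visit package supports more, such as richer tests and control over the traversal), and the tree and the slug "my-post" are made up for illustration.

```javascript
// Simplified stand-in for a visit helper (assumption): depth-first, match by type only.
function visitByType(node, type, visitor) {
  if (node.type === type) visitor(node);
  (node.children || []).forEach((child) => visitByType(child, type, visitor));
}

// Hand-written tree: one image nested inside a link, one directly in the paragraph.
const postTree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        {
          type: "link",
          url: "https://example.com",
          children: [{ type: "image", url: "./a.png" }],
        },
        { type: "image", url: "./b.png" },
      ],
    },
  ],
};

const rewrittenUrls = [];
visitByType(postTree, "image", (image) => {
  // Same kind of rewrite as the plugin, with the hypothetical slug "my-post".
  image.url = `/assets/my-post/${image.url.replace(/^\.\//, "")}`;
  rewrittenUrls.push(image.url);
});
console.log(rewrittenUrls);
```

Matching on the image type reaches both images, including the one nested inside the link.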
Other interesting transformations are the support for GitHub Flavored Markdown through remark-gfm and the reading time calculation with rehype-infer-reading-time-meta. As mentioned before, and as you can see, I'm also highlighting code snippets with remark-prism. That's a lot of power from a few dependencies and chained calls.
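For intuition about what a reading time estimate involves, the idea can be sketched as a word count divided by a reading speed. This is not the plugin's actual algorithm: estimateReadingMinutes is a hypothetical helper, and the default of 200 words per minute is an assumption.

```javascript
// Hypothetical helper; 200 words per minute is an assumed reading speed.
function estimateReadingMinutes(text, wordsPerMinute = 200) {
  // Split on whitespace and drop empty strings so blank input counts as zero words.
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  // Round up, and never report less than one minute.
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// 450 words at 200 words per minute is 2.25 minutes, rounded up to 3.
const sampleText = Array(450).fill("word").join(" ");
const minutes = estimateReadingMinutes(sampleText);
console.log(minutes);
```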
Conclusion
The benefits of using Unified.js are many, but some of the most notable include:
- Flexibility: Unified.js works with a range of content formats, including markdown, HTML, and plain text, through the remark, rehype, and retext ecosystems. This makes it a versatile tool for a variety of applications.
- Ease of Use: Despite its power, Unified.js is easy to pick up. Its API is small: most pipelines are a chain of use() calls followed by process().
- Performance: Plugins operate on one shared syntax tree, so a pipeline parses the content once rather than once per step.
- Extensibility: Unified.js is highly extensible, and its modular design makes it easy to add custom functionality. Whether you need to add support for a new data format or create a custom transformation, a plugin is the place to do it.
Unified.js is a powerful and flexible library that makes it possible to work with content in a standardized, tree-based format. With its ease of use and extensibility, it's no wonder that Unified.js has become a popular choice among developers. So why not give it a try? You might just be surprised at what you can build!