How I Publish Obsidian Files to the Web

10/19/2022

If you're like me, and you've come to appreciate the brilliance of Obsidian as a knowledge management tool, you might have wondered, "How can I share my beautifully interconnected thoughts with the world?"

Well, I've spent some time tinkering with this, and I've found a way to automatically publish my notes from Obsidian to the web.

I've explored various solutions, including pulling notes from Dropbox on build and Obsidian Publish, but for the past year, I've been copying them from my Obsidian vault into my NextJS content directory (version controlled with git).

In this article, I'll give you everything you need to publish your own Obsidian files to the web if you're using NextJS.

Before you venture down this path though, here's a quick list of why it's working for me:

I can easily update a bunch of notes in Obsidian and run my script to copy them into my git repo and push. They're deployed as a part of my existing site.
Page loads are significantly faster because the pages are rendered at build time rather than request time.
Pages are rendered to HTML ahead of time rather than in the browser, so they're better optimized for search engines and Open Graph sharing.
I have total control over how the page renders
I already have a website that I can add notes to rather than having a second place to point people
I can set up my own URL paths so that I use a flat-file approach in the URL schema (i.e., /notes/<NOTE-NAME>) and display the hierarchy in the note sidebar (if a note lives 3 folders deep, it displays as 3 folders deep).

If you want to skip to the whole script, you can find it in the section "The Script to Copy Obsidian `Published` Notes".

Prerequisites

Before we dive into the depths of this automation process, it will be beneficial to have:

A working knowledge of Obsidian and its note-taking features.
Basic familiarity with JavaScript and NodeJS for understanding the script we will be using.
Understanding of NextJS, a popular React framework we'll use to serve our notes.
Comfort with file system operations as we'll be copying and moving files around.

Objective

In this post, I'm going to share a script that automates the publishing of your Obsidian content to the web. This script primarily does two things:

It fetches all the files that are marked as 'published' in the frontmatter.
It looks for any embedded images in these files and copies them into a public assets folder.

Understanding the Script

In this section, I'm going to break down the various parts of the script.

Here is a list of the packages I import and what I use them for:

fs - working with the filesystem
gray-matter - extract frontmatter from the files
date-fns - format the date for processing easily

import fs from "fs";
import matter from "gray-matter";
import { format } from "date-fns";

Regular Expression

To determine whether a note embeds any other files (such as images), we need a regular expression. I declare it as a constant.

const EMBED_LINK_REGEX = /!\[\[([a-zA-ZÀ-ÿ0-9-'?%.():&,+/€_! ]+)\]\]/g;
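To see what this pattern picks up, here is a minimal sketch that pulls the embed names out of a string. The helper name extractEmbedNames and the sample text are made up for illustration; the regular expression is the one above. One thing to be aware of: the character class has no | or #, so an embed with a size or heading suffix (for example ![[photo.png|300]]) would not match.

```typescript
const EMBED_LINK_REGEX = /!\[\[([a-zA-ZÀ-ÿ0-9-'?%.():&,+/€_! ]+)\]\]/g;

// Hypothetical helper: returns the names found inside ![[...]] embeds.
function extractEmbedNames(content: string): string[] {
  // Work on a copy so the shared constant's lastIndex is never touched.
  const regex = new RegExp(EMBED_LINK_REGEX.source, "g");
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(content)) !== null) {
    names.push(match[1]); // capture group 1 is the text between the brackets
  }
  return names;
}

const sample =
  "Intro ![[my diagram.png]] and a plain [[wikilink]] plus ![[chart-2.jpg]]";
console.log(extractEmbedNames(sample));
```

The plain [[wikilink]] is skipped because the pattern requires the leading exclamation mark.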

I also declare the image directory as a constant so it's easy to find.

You'll notice two things about this code:

There is an environment variable called OBSIDIAN_VAULT_PATH_FROM_HOME. I pass this in to the script when I run it, in case I move my vault. It also makes the script easier for you to reuse.
The path after the environment variable is where I store all of my images. You need to update that part to your image path.

Here is what that looks like:

const imagesDir = `${process.env.OBSIDIAN_VAULT_PATH_FROM_HOME}/00 - Meta/02 - Media`;
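If the environment variable is missing, the template literal above quietly produces a path that starts with the text "undefined". A minimal sketch of a guard you could put in front of it follows; requireEnv and getImagesDir are hypothetical helper names, not part of my script, and the folder suffix is the one from my vault.

```typescript
// Hypothetical helper: read a required environment variable or fail loudly.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`${name} is not set; pass it in when running the script`);
  }
  return value;
}

// Hypothetical helper: build the image directory only after the check passes.
function getImagesDir(): string {
  return `${requireEnv("OBSIDIAN_VAULT_PATH_FROM_HOME")}/00 - Meta/02 - Media`;
}
```

Failing early here is a design choice: a clear error message is easier to debug than a "file not found" on a path that begins with undefined.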

I also create a top level path for where posts are going to be created:

const folderPath = `${process.cwd()}/content/posts`;

Now I set up my async function for getting the contents of a directory. This function takes a directory as its only argument.

async function getDirContents(directory: string) {
	// rest is here
}

Next we get the contents of the directory that's passed in. This gives us a list of contents back:

async function getDirContents(directory: string) {
	const contents = fs.readdirSync(directory);
	// rest is here
}

Now that we have the contents, we're going to loop over them, calling each entry filename. Inside the loop, we'll build the filepath and get the stats of the file.

The stats are important because they'll help us determine if the path that we're looking at is a directory or a file.

async function getDirContents(directory: string) {
  const contents = fs.readdirSync(directory);
  for (const filename of contents) {
    const filepath = `${directory}/${filename}`;
    const filestat = fs.statSync(filepath);
    // rest is here
  }
}

You may have wondered why getDirContents takes a directory as an argument instead of using a hard-coded path.

The reason is recursion: if the path we're on in the for loop is a directory, we can run getDirContents again on that directory.

async function getDirContents(directory: string) {
  const contents = fs.readdirSync(directory);

  for (const filename of contents) {
    const filepath = `${directory}/${filename}`;
    const filestat = fs.statSync(filepath);

    if (filestat.isDirectory()) {
      await getDirContents(filepath);
    } else if (filepath.match(/\.mdx?$/)) {
      // rest is here
    }
  }
}
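If you only want to see the recursion on its own, here is a minimal standalone sketch. It is a variation, not what my script does: it asks readdirSync for directory entries (withFileTypes) so it doesn't need a separate statSync call, and it returns a list instead of processing files in place. The name listFiles is made up for this example.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper: recursively collect every file path under a directory.
function listFiles(directory: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(directory, { withFileTypes: true })) {
    const entryPath = path.join(directory, entry.name);
    if (entry.isDirectory()) {
      // Same idea as the script: call the function again on the subfolder.
      files.push(...listFiles(entryPath));
    } else if (entry.isFile()) {
      files.push(entryPath);
    }
    // Symlinks and other special entries are skipped in this sketch.
  }
  return files;
}

// Usage (the path is a placeholder): listFiles("/path/to/vault")
```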

If filestat is not a directory, we know that we have a file, and we need to check whether it's a .md or .mdx file, so we use a regular expression.
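A detail worth knowing about extension checks: a pattern without an end anchor, such as /\.mdx?/, only looks for .md somewhere in the path, so it also matches names like notes.md.bak. Here is a minimal sketch of a stricter check; isMarkdownFile is a hypothetical helper name, and the case-insensitive flag is my own addition for this example.

```typescript
// Hypothetical helper: true only when the name ends in .md or .mdx.
function isMarkdownFile(filename: string): boolean {
  // $ pins the match to the end of the name; i also accepts .MD and .MDX
  return /\.mdx?$/i.test(filename);
}

console.log(isMarkdownFile("My Note.md"));
console.log(isMarkdownFile("notes.md.bak"));
```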

I have a few .mdx files in my Obsidian vault just for publishing, but that's a post for a different time (if you're interested in learning more about it, tweet at me that you want to read about how I use .mdx!)

If the file matches the regular expression, we're going to read the file and get the frontmatter with matter from the gray-matter package:

async function getDirContents(directory: string) {
  const contents = fs.readdirSync(directory);

  for (const filename of contents) {
    const filepath = `${directory}/${filename}`;
    const filestat = fs.statSync(filepath);

    if (filestat.isDirectory()) {
      await getDirContents(filepath);
    } else if (filepath.match(/\.mdx?$/)) {
      const content = fs.readFileSync(filepath, "utf-8"); // read the contents
      const result = matter(content); // get the metadata
    }
  }
}

Now it's time to check whether there is a status field in the metadata and, if so, whether its lowercased value is published:

async function getDirContents(directory: string) {
  const contents = fs.readdirSync(directory);

  for (const filename of contents) {
    const filepath = `${directory}/${filename}`;
    const filestat = fs.statSync(filepath);

    if (filestat.isDirectory()) {
      await getDirContents(filepath);
    } else if (filepath.match(/\.mdx?$/)) {
      const content = fs.readFileSync(filepath, "utf-8"); // read the contents
      const result = matter(content); // get the metadata

      // the check lives inside this branch, where result is in scope
      if (
        result.data.status &&
        result.data.status.toLowerCase() === "published"
      ) {
        // do something
      }
    }
  }
}
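The status check can also be pulled out into a small function, which makes it easy to try on its own. This is a minimal sketch under a couple of assumptions: isPublished is a hypothetical name, the data argument stands in for the parsed frontmatter object, and the typeof guard and trim are extras that my script doesn't have (they protect against a status that YAML parsed as a number or a list, or one with stray spaces).

```typescript
// Hypothetical helper: decide whether parsed frontmatter marks a note as published.
function isPublished(data: Record<string, unknown>): boolean {
  const status = data["status"];
  // Frontmatter values are not always strings, so check the type first.
  return typeof status === "string" && status.trim().toLowerCase() === "published";
}

console.log(isPublished({ status: "Published" }));
console.log(isPublished({ status: "draft" }));
```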

Now that we know we have a file to process, it's time to do a little "sizzle". Instead of keeping date and modified in my frontmatter, I've started just using the value from the filestat.

If I have a post where I want to manually override either of those, I'll add them to the metadata. This code sets the date and modified if they're not set in the metadata:

async function getDirContents(directory: string) {
  // removed what's happening above for brevity
  if (!result.data.date) {
    result.data.date = format(filestat.birthtime, "yyyy-MM-dd");
  }
  if (!result.data.modified) {
    result.data.modified = format(filestat.mtime, "yyyy-MM-dd");
  }
  // rest is here
}
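Here is the same fallback logic as a self-contained sketch you can run without any packages. The names FileTimes, toIsoDay, and withDefaultDates are made up for this example, and toIsoDay is a hand-rolled stand-in for the date-fns format call (it also uses local time). Keep in mind that, per the Node.js documentation, birthtime is not available on every filesystem and may hold the ctime or the Unix epoch instead, so it's worth checking what your machine reports.

```typescript
// Hypothetical shape: the two timestamps we need from fs.Stats.
interface FileTimes {
  birthtime: Date;
  mtime: Date;
}

// Stand-in for format(date, "yyyy-MM-dd"), using local time.
function toIsoDay(d: Date): string {
  const pad = (n: number): string => (n < 10 ? `0${n}` : `${n}`);
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// Hypothetical helper: keep frontmatter dates when present, else fall back to file times.
function withDefaultDates(
  data: Record<string, unknown>,
  times: FileTimes
): Record<string, unknown> {
  return {
    ...data,
    date: data["date"] || toIsoDay(times.birthtime),
    modified: data["modified"] || toIsoDay(times.mtime),
  };
}

const exampleTimes: FileTimes = {
  birthtime: new Date(2022, 9, 19), // month is zero-based: 9 is October
  mtime: new Date(2023, 0, 5),
};
console.log(withDefaultDates({ title: "Example" }, exampleTimes));
```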

Now I want to check for any embedded images, because I want to pull those into my NextJS app's public assets folder.

Here's how I'm processing embedded images:

check for a match in the content of the file
if there are matches, then for each one:
- strip the ![[ and ]] to get the image name
- build the path to the image in my vault's image directory
- read the image and write it to the public/img/notes folder

async function getDirContents(directory: string) {
  // removed what's happening above for brevity
  const foundEmbed = content.match(EMBED_LINK_REGEX);

  if (foundEmbed?.length) {
    for (const image of foundEmbed) {
      const sanitizedImageName = image
        .replace(/^\!\[\[/, "")
        .replace(/\]\]$/, "");
      const path = `${imagesDir}/${sanitizedImageName}`;
      const file = fs.readFileSync(path);
      fs.writeFileSync(
        `${process.cwd()}/public/img/notes/${sanitizedImageName}`,
        file
      );
    }
  }
}
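Two things can trip up the copy step: readFileSync throws if the embedded file isn't in the image directory (the pattern matches any embed, not only images), and writeFileSync throws if the target folder doesn't exist yet. Here is a more defensive sketch, offered as one possible variation rather than what my script does. The name copyEmbeddedImage and its parameters are hypothetical.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical helper: copy one embedded file; returns false when the source is missing.
function copyEmbeddedImage(
  imagesDir: string,
  outputDir: string,
  imageName: string
): boolean {
  const source = path.join(imagesDir, imageName);
  if (!fs.existsSync(source)) {
    console.warn(`skipping missing embed: ${source}`);
    return false;
  }
  const destination = path.join(outputDir, imageName);
  // Create the target folder (and any parents) if it does not exist yet.
  fs.mkdirSync(path.dirname(destination), { recursive: true });
  fs.copyFileSync(source, destination);
  return true;
}

// Usage (paths are placeholders):
// copyEmbeddedImage("/path/to/vault/media", "public/img/notes", "my diagram.png");
```

Whether to skip or to fail on a missing image is a judgment call; skipping keeps the build going, while throwing makes a broken embed impossible to miss.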

Now for the most important part: writing the Markdown file to my NextJS content directory.

Here's how I process the markdown file to copy into my NextJS content directory:

split the filepath
get the filename from the filepath
make a directory for the folderPath if it doesn't exist yet
rename the file with + for any spaces (this makes it easier for me to process)
write the file to the new path


async function getDirContents(directory: string) {
  // removed what's happening above for brevity
  const splitPath = filepath.split("/");
  const filename = splitPath.pop();
  if (!filename) {
    throw new Error(`unable to get filename from ${filepath}`);
  }

  fs.mkdirSync(folderPath, { recursive: true });
  const newPath = filename.replace(/\s+/g, "+");

  fs.writeFileSync(
    `${folderPath}/${newPath}`,
    matter.stringify(result.content, result.data)
  );
}
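The renaming step is easy to try in isolation. Below is a minimal sketch that uses path.basename in place of splitting on "/" and then swaps whitespace for +; toOutputName is a hypothetical name for this example. Because the output folder is flat, two notes with the same filename in different vault folders would end up at the same output path, so it's worth keeping published note names unique.

```typescript
import * as path from "path";

// Hypothetical helper: derive the flat output filename for a note.
function toOutputName(filepath: string): string {
  const filename = path.basename(filepath);
  if (!filename) {
    throw new Error(`unable to get filename from ${filepath}`);
  }
  // Each run of whitespace becomes a single +
  return filename.replace(/\s+/g, "+");
}

console.log(toOutputName("/vault/10 - Notes/How I Publish.md"));
```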

And voila! This is how all of my Obsidian vault posts (including the one you're reading now) are copied into my NextJS blog repository.

The Script to Copy Obsidian `Published` Notes

In my repository, I have a file at the path scripts/get-notes/index.ts that looks like this:

import fs from "fs";
import matter from "gray-matter";
import { format } from "date-fns";

const EMBED_LINK_REGEX = /!\[\[([a-zA-ZÀ-ÿ0-9-'?%.():&,+/€_! ]+)\]\]/g;

const imagesDir = `${process.env.OBSIDIAN_VAULT_PATH_FROM_HOME}/00 - Meta/02 - Media`;

const folderPath = `${process.cwd()}/content/posts`;

// running totals, reported by main() once the walk finishes
const copied = { notes: 0, images: 0 };

async function getDirContents(directory: string) {
  const contents = fs.readdirSync(directory);

  for (const filename of contents) {
    const filepath = `${directory}/${filename}`;
    const filestat = fs.statSync(filepath);

    if (filestat.isDirectory()) {
      await getDirContents(filepath);
    } else if (filepath.match(/\.mdx?$/)) {
      const content = fs.readFileSync(filepath, "utf-8");
      const result = matter(content);

      if (
        result.data.status &&
        result.data.status.toLowerCase() === "published"
      ) {
        if (!result.data.date) {
          result.data.date = format(filestat.birthtime, "yyyy-MM-dd");
        }
        if (!result.data.modified) {
          result.data.modified = format(filestat.mtime, "yyyy-MM-dd");
        }

        const foundEmbed = content.match(EMBED_LINK_REGEX);

        if (foundEmbed?.length) {
          for (const image of foundEmbed) {
            const sanitizedImageName = image
              .replace(/^\!\[\[/, "")
              .replace(/\]\]$/, "");
            const path = `${imagesDir}/${sanitizedImageName}`;
            const file = fs.readFileSync(path);
            fs.writeFileSync(
              `${process.cwd()}/public/img/notes/${sanitizedImageName}`,
              file
            );
            copied.images += 1;
          }
        }

        const splitPath = filepath.split("/");
        const filename = splitPath.pop();
        if (!filename) {
          throw new Error(`unable to get filename from ${filepath}`);
        }

        fs.mkdirSync(folderPath, { recursive: true });
        const newPath = filename.replace(/\s+/g, "+");

        fs.writeFileSync(
          `${folderPath}/${newPath}`,
          matter.stringify(result.content, result.data)
        );
        copied.notes += 1;
      }
    }
  }
}

async function main() {
  // fail early if the vault path was not passed in
  const vaultPath = process.env.OBSIDIAN_VAULT_PATH_FROM_HOME;
  if (!vaultPath) {
    throw new Error("OBSIDIAN_VAULT_PATH_FROM_HOME is not set");
  }

  await getDirContents(vaultPath);
  console.log(`Notes copied: ${copied.notes}`);
  console.log(`Images copied: ${copied.images}`);
}

main();

In my package.json, I have a prebuild:cpFiles script that clears the content directory, compiles this file with tsc, and then runs the compiled output:

"scripts": {
    "prebuild:cpFiles": "rm -rf content/* && tsc --outDir scripts/get-notes/dist --p scripts/get-notes && env OBSIDIAN_VAULT_PATH_FROM_HOME=$HOME/my/obsidian/vault/path node scripts/get-notes/dist/index.js"
}

Extra Credit

After running this script manually for a few months, I noticed I could make the workflow a little better.

I schedule and publish every day with a GitHub Action.

Now, instead of running it only for published files and having to wait to switch drafts to published when they're supposed to go out, I run it for anything marked published, and the GitHub Action does the heavy lifting of deciding whether it has been scheduled.

This allows me to write a lot of posts in advance and set it and forget it.
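The scheduling check itself can be small. As a minimal sketch (an assumption for illustration, not the actual Action), suppose the date field in the frontmatter doubles as the publish date and is always written as yyyy-MM-dd. Then deciding whether a note is due is a string comparison. The name isDue is hypothetical.

```typescript
// Hypothetical helper: a note is due once its date is today or earlier.
// Assumes both arguments are yyyy-MM-dd strings.
function isDue(date: string, today: string): boolean {
  // yyyy-MM-dd strings sort in chronological order, so comparing text is enough
  return date <= today;
}

console.log(isDue("2022-10-19", "2022-10-20"));
console.log(isDue("2022-10-21", "2022-10-20"));
```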

Conclusion

There you have it!

This is how I automate publishing my content from Obsidian to the web.

The script makes it easier to share your interconnected thoughts, maintaining their structure and linked images, and it ensures that your brilliant insights reach those who need them.

Keep exploring, keep automating, and keep sharing!

Stay curious and happy sharing!