If you’ve been keeping up, you’ll remember I wrote a few tutorials around converting your MongoDB-powered Node.js applications to Couchbase.  These included a MongoDB Query Language to N1QL tutorial as well as a Mongoose to Ottoman tutorial.  Those were great migration tutorials from an application perspective, but they didn’t really explain how to take your existing MongoDB Collections, export them as JSON, and get them into Couchbase.

So, in this tutorial, we’re going to explore how to import MongoDB collection data into Couchbase.  The development language doesn’t really matter, but Golang is very fast and very powerful, making it a perfect candidate for the job.

Before we worry about writing a data migration script, let’s settle on a sample dataset to work with.  The goal is for the script to be universal, but it helps to have an example.

The MongoDB Collection Model

Let’s assume we have a Collection called courses that holds information about courses offered by a school.  The document model for any one of these documents might look something like the following:
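
For illustration, a course document might carry a name, an instructor, and a list of enrolled student ids; the field names here are made up purely for the example:

```
{
    "_id": ObjectId("59b7bdbe4d72e2eef8ecb686"),
    "name": "Computer Science 101",
    "instructor": "Jane Doe",
    "students": [
        ObjectId("59b7bdbe4d72e2eef8ecb687"),
        ObjectId("59b7bdbe4d72e2eef8ecb688")
    ]
}
```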

Each document would represent a single course with a list of enrolled students.  Each document has an id value and the enrolled students reference documents from another collection with matching id values.

With MongoDB installed, you have access to its mongoexport utility.  This will allow us to export the documents that exist in any Collection to a JSON data file.

For example, we could run the following command against our MongoDB database:
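
A command along these lines does the trick:

```bash
mongoexport --db example --collection courses --out courses.json
```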

The database in question would be example and we’re exporting the courses collection to a file called courses.json.  If we try to open this JSON file, we’d see data that looks similar to the following:
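
Using the sample course document from above, each exported line would look roughly like this:

```json
{"_id":{"$oid":"59b7bdbe4d72e2eef8ecb686"},"name":"Computer Science 101","instructor":"Jane Doe","students":[{"$oid":"59b7bdbe4d72e2eef8ecb687"},{"$oid":"59b7bdbe4d72e2eef8ecb688"}]}
```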

Each document will be a new line in the file; however, it won’t be exactly how our schema was modeled. MongoDB will take all document references and wrap them in an $oid property, which represents an object id.

So where does this leave us?

Planning the Couchbase Bucket Model

As you’re probably already aware, Couchbase does not use Collections, but instead Buckets.  However, Buckets do not function the same way as Collections.  Instead of having one Bucket for every document type, the way MongoDB has one Collection per document type, you’ll typically have one Bucket for the entire application.

This means we’ll need to make some changes to the MongoDB export so it makes sense inside of Couchbase.

In Couchbase it is normal for every document to have a property that represents the type of document it is.  Lucky for us, we know the name of the former Collection and can work some magic.  As an end result, our Couchbase documents should look something like this:
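
Continuing with the sample course document, the Couchbase version would look roughly like this:

```json
{
    "_id": "59b7bdbe4d72e2eef8ecb686",
    "_type": "courses",
    "name": "Computer Science 101",
    "instructor": "Jane Doe",
    "students": [
        "59b7bdbe4d72e2eef8ecb687",
        "59b7bdbe4d72e2eef8ecb688"
    ]
}
```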

In the above example we have compressed all of the $oid values, leaving _id as a plain string, and added a _type property that holds the former Collection name.

Developing the Golang Collection Import Script

Now that we know where we’re headed, we can focus on the script that will do the manipulation and loading.  First, though, let’s think through the Golang logic needed to accomplish the job.

We know we’re going to be reading line by line from a JSON file.  For every line read we need to manipulate it, then save it.  Reading from a file and inserting into Couchbase are both blocking operations.  While reading is quite fast, inserting a single document at a time in a blocking fashion for terabytes of data can be quite slow.  This means we should start goroutines to do things in parallel.

Create a new project somewhere in your $GOPATH and create a file called main.go with the following code:
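
A skeleton along these lines will do, assuming the gopkg.in/couchbase/gocb.v1 SDK for the Couchbase connection; each stub gets fleshed out as we go:

```go
package main

import (
	"sync"

	"gopkg.in/couchbase/gocb.v1"
)

// The open Bucket is shared by every goroutine.
var bucket *gocb.Bucket

// compressObjectIds strips the {"$oid": "..."} wrappers from a document (filled in later).
func compressObjectIds(data map[string]interface{}) {
}

// cbimport transforms one exported line and saves it to Couchbase (filled in later).
func cbimport(document string, collection string) error {
	return nil
}

// worker consumes lines from the channel until it closes (filled in later).
func worker(collection string, lines chan string, wg *sync.WaitGroup) {
	defer wg.Done()
}

// main parses flags, connects to Couchbase, and feeds the file to the workers (filled in later).
func main() {
}
```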

The above code is merely a blueprint of what we’re going to accomplish.  The main function will be responsible for starting several goroutines and reading our JSON file.  We don’t want the application to exit before the goroutines have finished, so we use a WaitGroup.  This will prevent the application from ending until all of the goroutines have ended.

Each goroutine will run the worker function, which calls cbimport, which in turn calls compressObjectIds to swap out any $oid with the compressed equivalent.  By compressed, I mean the value won’t include a wrapping $oid property.

So let’s look at that main function:
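
Here is a sketch of main; the flag names are hypothetical, and in addition to the blueprint’s imports it uses bufio, flag, log, and os from the standard library:

```go
func main() {
	// Hypothetical flag names; adjust them to suit your environment.
	host := flag.String("host", "couchbase://localhost", "Couchbase cluster address")
	bucketName := flag.String("bucket", "default", "Destination Bucket")
	password := flag.String("password", "", "Bucket password")
	filePath := flag.String("file", "courses.json", "Exported JSON file")
	collection := flag.String("collection", "courses", "Name of the source MongoDB Collection")
	workers := flag.Int("workers", 4, "Number of import goroutines")
	flag.Parse()

	// Connect to the destination Couchbase Server and open the Bucket.
	cluster, err := gocb.Connect(*host)
	if err != nil {
		log.Fatal(err)
	}
	bucket, err = cluster.OpenBucket(*bucketName, *password)
	if err != nil {
		log.Fatal(err)
	}

	// Open the file produced by mongoexport.
	input, err := os.Open(*filePath)
	if err != nil {
		log.Fatal(err)
	}
	defer input.Close()

	// Every line read from the file is queued on this channel for the workers.
	lines := make(chan string, 1000)

	var wg sync.WaitGroup
	for i := 0; i < *workers; i++ {
		wg.Add(1)
		go worker(*collection, lines, &wg)
	}

	// Read the export line by line, then close the channel so the workers can finish.
	scanner := bufio.NewScanner(input)
	for scanner.Scan() {
		lines <- scanner.Text()
	}
	close(lines)

	// Block until every goroutine has drained the channel.
	wg.Wait()
}
```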

The above function will take a set of command line flags that will be used in the configuration of the application.  The connection to the destination Couchbase Server and Bucket will be established and the input file will be opened.

Because we’re using goroutines, we need to use channel variables to avoid locking scenarios.  All lines read will be queued up in the channel that each goroutine reads from.

After spinning up the goroutines, the file will be read and the channel will be populated.  After the file is completely read, the channel will close.  This means that once the goroutines have read all of the data, they will be able to end.  We’ll be waiting until the goroutines end before ending the application.

Now let’s take a look at the worker function:
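
A minimal worker might look like this:

```go
func worker(collection string, lines chan string, wg *sync.WaitGroup) {
	defer wg.Done()
	// Ranging over the channel keeps the worker alive until main closes the
	// channel and every queued line has been consumed.
	for line := range lines {
		if err := cbimport(line, collection); err != nil {
			log.Println(err)
		}
	}
}
```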

The MongoDB Collection name will be passed to each worker, and the worker will remain in its loop until the channel closes and is drained.

For every document read from the channel, the cbimport function will be called:
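
A sketch of cbimport, which also needs encoding/json; keying the document on the former Collection name plus the compressed _id is my own choice here, and any unique key scheme would work just as well:

```go
func cbimport(document string, collection string) error {
	// Each line of the export file is one standalone JSON document.
	var data map[string]interface{}
	if err := json.Unmarshal([]byte(document), &data); err != nil {
		return err
	}

	// Remember which MongoDB Collection this document came from.
	data["_type"] = collection

	// Strip every {"$oid": "..."} wrapper, leaving plain id strings behind.
	compressObjectIds(data)

	// Key the document on the former Collection name plus the compressed _id.
	id, _ := data["_id"].(string)
	_, err := bucket.Upsert(collection+"::"+id, data, 0)
	return err
}
```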

Each line of the file will be a string that we need to unmarshal into a map of interfaces.  We know the Collection name, so we can create a property that will hold that particular name.  Then we can pass the entire map into the compressObjectIds function to get rid of any $oid wrappers.

The compressObjectIds function looks like the following:
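
One way to write it looks like this:

```go
func compressObjectIds(data map[string]interface{}) {
	for key, value := range data {
		switch typed := value.(type) {
		case map[string]interface{}:
			// An object whose only key is $oid is a wrapped id, so collapse it
			// to the plain string; otherwise keep recursing into the object.
			if oid, ok := typed["$oid"].(string); ok && len(typed) == 1 {
				data[key] = oid
			} else {
				compressObjectIds(typed)
			}
		case []interface{}:
			// Arrays can hold wrapped ids or nested objects as well.
			for i, item := range typed {
				if nested, ok := item.(map[string]interface{}); ok {
					if oid, ok := nested["$oid"].(string); ok && len(nested) == 1 {
						typed[i] = oid
					} else {
						compressObjectIds(nested)
					}
				}
			}
		}
	}
}
```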

In the above we are essentially looping through every key in the document.  If the value is a nested object or JSON array, we recursively do the same thing until we hit a string value with a key of $oid.  If this condition is met, we make sure it is the only key at that level of the document.  This lets us know that it is an id that we can safely compress.

Not so bad right?

Running the MongoDB to Couchbase Importer

Assuming you have the Go programming language installed and configured, we need to build this application.

From the command line, you’ll need to get all the dependencies.  With the project as your current working directory, execute the following:
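
Something like this works with the classic GOPATH tooling:

```bash
go get ./...
```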

The above command will get any dependencies found in our Go files.

Now the application can be built and run, or just run.  The steps aren’t really any different, but we’re just going to run the code.

From the command line, execute the following:
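
Using the hypothetical flag names from the main sketch above, the command might look like this:

```bash
go run main.go -host couchbase://localhost -bucket example -file courses.json -collection courses -workers 4
```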

The above command will allow us to pass any flags into the application such as Couchbase Server information, number of worker goroutines, etc.

If successful, the MongoDB export should now be present in your Couchbase NoSQL database.

Conclusion

You just saw how to use Golang to import your MongoDB collection data into Couchbase.  Sure, the code we saw could be optimized, but from a simplicity standpoint I’m sure you can see what we were trying to accomplish.

Want to download this importer project and try it out for yourself?  I’ve gone and uploaded it to GitHub, with further instructions for running.  Keep in mind that it is unofficial and hasn’t been tested for massive amounts of data.  Treat it as an example for learning how to meet your data migration needs.

If you’re interested in learning more about Couchbase Server or the Golang SDK, check out the Couchbase Developer Portal.

Author

Posted by Madhuram Gupta
