At Rivet Logic, we’ve always been big believers and adopters of NoSQL database technologies such as MongoDB. Now, leading organizations worldwide are using these technologies to create data-driven solutions to help them gain valuable insight into their business and customers. However, selecting a new technology can turn into an over engineered process of check boxes and tradeoffs. In a recent webinar, we shared our experiences, thought processes and lessons learned building apps on NoSQL databases.
The Database Debate
The database debate is never ending, where each type of database has its own pros and cons. Amongst the multitude of databases, some of the top technologies we’ve seen out in the marketing include:
- MongoDB – Document database
- Neo4j – Graph based relationship
- Riak – Key value data store
- Cassandra – Wide column database
When it comes to NoSQL databases, it’s important to think non-relational. With NoSQL databases, there’s no SQL query language or joins. It also doesn’t serve as a drop-in replacement for Relational Databases, as they are two completely different approaches to storing and accessing data.
Another key component to consider is normalized vs. denormalized data. Whereas data is normalized in relational databases, it’s not a necessity or important design consideration for NoSQL databases. In addition, you can’t use the same tools, although that’s improving and technology companies are heavily investing in making their tools integrate with various database technologies. Lastly, you need to understand your data access patterns, and what it looks like from the application level down to the DB.
Also keep in mind your expectations and make sure they’re realistic. Whereas the Relational model is over 30 years old, the NoSQL model is much younger at approximately 7 years, and enterprise adoption occurring within the last 5 years. Given the differences in maturity, NoSQL tools aren’t going to have the same level of maturity as those of Relational DB’s.
When evaluating new DB technologies, you need to understand the tradeoffs and what you’re willing to give up – whether it be data consistency, availability, or other features core to the DB – and determine if the benefits outweigh the tradeoffs. And all these DB’s aren’t created equally – they’re built off of different models for data store and access, use different language – which all require a ramp up.
In addition, keep in mind that scale and speed are all relative to your needs. Understanding all of these factors in the front end will help you make the right decision for the near and long term.
Questions to Ask Yourself
If you’re trying to determine if NoSQL would be a good fit for a new application you’re designing, here are some questions to ask yourself:
- Will the requirements evolve? Most likely they will, rarely are all requirements provided upfront.
- Do I understand the tradeoffs? Understand your must have vs. like to have.
- What are the expectations of the data and patterns? Read vs. write, and how you handle analytics (understand operational vs. analytics DB and where the overlap is)
- Build vs. Buy behavior? Understand what you’re working with internally and that changing internal culture is a process
- Is the ops team on board? When introducing new DB technologies, it’s much easier when the ops team is on board to make sure the tools are properly optimized.
Schema Design Tidbits
Schema is one the most critical things to understand when designing applications for these new databases. Ultimately the data access patterns should drive your design. We’ll use MongoDB and Cassandra as examples as they’re leading NoSQL databases with different models.
When designing your schema for MongoDB, it’s important to balance your app needs, performance and data retrieval. Your schema doesn’t have to be defined day 1, which is a benefit of MongoDB’s flexible schema. MongoDB also contains collections, which are similar to tables in relational DB’s, where documents are stored. However, the collections don’t enforce structure. In addition, you have the option of embedding data within a document, which depending on your use case, could be highly recommended.
Another technology to think about is Cassandra, a wide column database where you model around your queries. By understanding the access patterns, and the types of questions your users are asking the DB, then you can design your schema to be more accurate. You also want to distribute data evenly across nodes. Lastly, you want to minimize partition (groups of rows that share the same key) reads.
MongoDB has a primary-secondary architecture, where the secondary would become the primary if it ever failed, resulting in the notion of never having a DB offline. There are also rights, consistency, and durability, with primaries replicating to the secondaries. So in this model, the database is always available, where data is consistent and replicated across nodes, all performed in the backend by MongoDB. In terms of scalability, you’re scaling horizontally, with nodes being added as you go, which introduces a new concept of sharding, involving how data dynamically scales as the app grows.
On the other hand, Cassandra has a ring-based architecture, where data is distributed across nodes, similar to MongoDB’s sharding. There are similar patterns, but implemented differently within technologies. The diagram below illustrates architectural examples of MongoDB and Cassandra. All of these can be distributed globally, with dynamic scalability, the benefit being you can add nodes effortlessly as you grow.
NoSQL Data Solution Examples
Some of the NoSQL solutions we’ve recently built include:
Data Hub (aka 360 view, omni-channel) – A collection of various data sources pooled into a central location (in this case we used MongoDB), where use cases are built around the data. This enables new business units to access data they might not previously have access to, empowering them to build new products, understand how other teams operate, and ultimately lead to new revenue generating opportunities and improved processes across the organization
User Generated Content (UGC) & Analytics – Storing UGC sessions (e.g. blog comments and shares) that need to be stored and analyzed in the backend. A lot of times the Document model makes sense for this type of solution. However, as technologists continue to increase their NoSQL skill sets, there’s going to be an increasing amount of overlap of similar uses cases being built across various NoSQL DB types.
User Data Management – Also known as Profile Management, and storing information about the user, what they recently viewed, products bought, etc. With a Document model, the flexibility really becomes powerful to evolve the application as you can add attributes as you go, without the need to have all requirements defined out of the gate.
When talking about successful deployments, some of the lessons learned we’ve noticed include:
- Schema design is an ongoing process – From a Data Hub perspective, defining that “golden record” is not always necessary, as long as you define consistent fields that can be applied everywhere.
- Optimization is a team effort – It’s not just the developer’s job to optimize the schema, just like it’s not just the Ops team’s job to make sure the DB is always on. NoSQL is going to give you tunability across these, and the best performance and results
- Test your shard keys (MongoDB) – If sharding is a new concept for you, make sure you do your homework, understand and validate with someone that knows the DB very well.
- Don’t skimp on testing and use production data – Don’t always assume that the outcome is going to be the same in production.
- Shared resources will impact performance – Keep in mind if you’re deploying in the cloud that shared resources will impact distributed systems. This is where working with your Ops team will really help and eliminate frustrations.
- Understand what tools are available and where they are in maturity – Don’t assume existing tools (reporting, security, monitoring, etc.) will work in the same capacity as with Relational DB’s, and understand the maturity of the integration.
- Don’t get lost in the hype – Do your homework.
- Enable the “data consumer” – Enable the person that’s going to interact with the DB (e.g. data analyst) to make them comfortable working with the data.
- JSON is beautiful
To summarize, education will eliminate hesitation, and don’t get lost in the marketing fluff. Get Ops involved, the earlier and more often you work with your Ops team, the easier and more successful your application and your experience with these technologies will be. Lastly, keep in mind that these are just DB tools, so you’ll still need to build a front end.
Click here to see a recording of the webinar.
Click here for the webinar slides.