GraphQL at Treebo

Ankita Masand
Published in Treebo Tech Blog
Oct 23, 2020 · 15 min read


GraphQL was publicly released in 2015, and it has been one of the most talked-about subjects in the tech community ever since. It was introduced at React.js Conf 2015, where Daniel Schafer and Jing Chen explained how Facebook was using GraphQL for data fetching in some of its front-end applications. The idea of declaratively specifying data requirements and sending only as much payload over the wire as required was compelling, and people started experimenting with this new way of communicating with the server. GraphQL is a specification that describes how to build a strong type system and how to validate and execute queries against it.

Big companies like Twitter, GitHub, and Shopify started using GraphQL in production, and their experiences with it are worth reading! Treebo started using GraphQL in mid-2018 while building a property management system for hotel chains. We now use GraphQL extensively in two of our prime products: the Property Management System and the Discounts & Pricing Configuration Dashboard. Lately, we've been working on making the property management system multi-tenant, and we faced quite a few challenges along the way. In this article, I'll first explain what a property management system is and how we use GraphQL as an aggregation layer to build a smooth experience from booking to checkout. I'll then cover the most interesting part: how we built a multi-tenant property management system.

Let’s get going!

Hotel Property Management Systems (PMS)

A Hotel Property Management System is software used by a hotel or a group of hotels to manage day-to-day operations such as booking reservations, room prices, check-in/check-out, add-ons to a booking (food, extra bed, early check-in/checkout), invoicing, and cash management (inflow and outflow). It is also used to view consolidated reports on bookings, guests, earnings, etc. Treebo's PMS is a suite that helps hotel chains run these daily operations smoothly.

Treebo’s Property Management System Architecture

We follow a domain-driven design (DDD) microservices architecture for our back-end services. The GraphQL layer talks to the different microservices and returns the response requested by the front-end clients.

The website search results page, Property Management System, and Discount/Pricing dashboard are the clients of the GraphQL layer. These clients are not aware of the back-end microservices.

The microservices in the above image are largely self-explanatory, but here is a brief description of each to make things clearer:

  • Booking Reservations Service: It takes care of all booking-related data. It stores the hotel, stay dates, add-ons, and guests for each booking.
  • Payments/Pricing Service: As the name suggests, it handles the pricing of rooms and the extra charges for add-ons like an extra bed, food, etc. It decides prices based on input parameters such as the number of occupants, stay dates, and booking platform.
  • Notification Service: It notifies users about the status of their booking via email and SMS.
  • Catalog Service: It stores all hotel-related information, such as the name, location, and room types of a hotel.
  • Inventory Throttling Service: It tracks how many rooms are available for the given stay dates, taking various other parameters into account.

There are other services as well, but we only need to understand these for the purpose of this post.

Let’s now take an example to understand how a booking is created and saved in the system:

The data displayed on the above user interface comes from multiple microservices.

The Booking Channel and Sub Channel under the Booking Type section come from the Catalog service, room inventory numbers for the selected dates come from the Inventory Throttling service, and room prices come from the Pricing service.

After the user selects a room, we ask for user details like name, phone, and email. The user details are saved in the User Profile service, and the booking information is stored in the Booking Reservations service. The front-end client that creates this booking doesn't know about these different microservices. It declaratively specifies the information it needs from the back-end and sends a JSON payload with the aggregated booking information to create the booking. GraphQL handles the rest!

Here’s a simple query to fetch room types and their prices:

The query roomTypePriceAndAvailability takes in hotelId, occupancies, dates, channel and UTM parameters to fetch the room types and their prices.
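As a rough illustration, a browser client could send this query in a single POST request like the one below. The variable names (checkin, checkout, utmSource), the OccupancyInput type, and the selected fields are assumptions made for this sketch. They are not Treebo's actual schema.

```javascript
// Hypothetical query: argument and field names are illustrative only.
const ROOM_TYPES_QUERY = `
  query RoomTypes(
    $hotelId: ID!
    $checkin: String!
    $checkout: String!
    $occupancies: [OccupancyInput!]!
    $channel: String
    $utmSource: String
  ) {
    roomTypePriceAndAvailability(
      hotelId: $hotelId
      checkin: $checkin
      checkout: $checkout
      occupancies: $occupancies
      channel: $channel
      utmSource: $utmSource
    ) {
      roomType { id name }        # resolved via the Catalog service
      availableCount              # resolved via the Inventory Throttling service
      price { preTax postTax }    # resolved via the Pricing service
    }
  }
`;

// Builds the one POST request the browser sends to the GraphQL endpoint.
function buildRoomTypesRequest(variables) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: ROOM_TYPES_QUERY, variables }),
  };
}
```

The client would pass the result straight to `fetch(graphqlUrl, buildRoomTypesRequest({...}))`, where graphqlUrl is whatever endpoint the GraphQL server exposes. The browser never addresses the three services individually.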

Each field in the query is resolved by calling the appropriate service. These fields are resolved in parallel, so the execution time of the query depends on the slowest call.

We get hotel-related information from the Catalog service, inventory information from the Inventory Throttling service, and room-stay pricing from the Pricing service. All of this data is collated in a single API call from the browser!
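The effect of parallel resolution can be sketched with plain promises. In the real server each service call sits behind its own field resolver. Here the three calls are collapsed into one Promise.all, and the service functions, delays, and payloads are made-up stand-ins.

```javascript
// Resolves with `value` after `ms` milliseconds, simulating a network call.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Hypothetical stand-ins for the Catalog, Inventory Throttling, and Pricing services.
const services = {
  catalog: (hotelId) => delay(100, { hotelId, roomTypes: ["Oak", "Maple"] }),
  inventory: (_hotelId) => delay(200, { Oak: 3, Maple: 0 }),
  pricing: (_hotelId) => delay(300, { Oak: 2500, Maple: 3200 }),
};

// Starts all three calls at once; total latency tracks the slowest call (~300 ms),
// not the sum of all three (~600 ms).
async function resolveRoomTypes(hotelId) {
  const [catalog, inventory, prices] = await Promise.all([
    services.catalog(hotelId),
    services.inventory(hotelId),
    services.pricing(hotelId),
  ]);
  // Collate the three responses into one result for the client.
  return catalog.roomTypes.map((name) => ({
    name,
    available: inventory[name],
    price: prices[name],
  }));
}
```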

Here are the clear benefits of fetching this information the GraphQL way:

Strong type system: The types of the fields are specified in the schema, so the GraphQL server knows exactly what a front-end client expects.

Abstraction of microservices from the front-end client: The front-end clients don't have to know about the different microservices on the back-end.

Avoiding multiple HTTP calls: The GraphQL layer calls multiple services concurrently, so the browser makes one request instead of n HTTP calls to these services.

How is Treebo using GraphQL?

This property management system works well for Treebo as a hotel chain.

Treebo is now expanding its horizons and offering a SaaS solution to multiple hotel chains. A lot of hard work goes on behind the scenes to make the PMS a multi-tenant system. The challenges in the back-end services are intriguing and deserve a post of their own. In this article, we'll discuss some of the challenges we faced and a few techniques we used to build a tenant-driven GraphQL layer and a multi-tenant front-end architecture.

Let's first understand what it means to have a multi-tenant system!

Multi-tenant Systems

Multi-tenancy means a single instance of the software serves multiple customers.

For example, you'll see Treebo's PMS on the hosts abcd.com and xyz.com. abcd and xyz are two different tenants, and neither is aware of the other's existence. The software code and functionality are the same for both tenants, and the multi-tenant system maintains data isolation between them. Later in the article, we'll see how we built a tenant-driven GraphQL layer to make Treebo's PMS a multi-tenant system.

Tenants A, B, C, and D are different tenants using the same GraphQL layer. Tenant A has multiple applications, but only App1 is multi-tenant, because it is also used by Tenants B, C, and D.

In other words, the PMS (Property Management System) is a multi-tenant application used by all of the tenants, and GraphQL gracefully handles the multi-tenant PMS as well as the other applications hosted by Tenant A.

Here’s the GraphQL stack being used at Treebo:

GraphQL Tech Stack

We use apollo-server-express to implement the GraphQL server, redis as the cache layer, and amqplib to implement a custom pubsub asyncIterator.

We’ll look at the implementation for the caching layer and the subscription part later in the article.

Let’s first understand what it means to have a tenant-driven GraphQL layer!

Tenant-driven GraphQL Layer

Let's start with an example.

Side note: the GraphQL server uses redis for caching and RabbitMQ for pubsub-based subscriptions.

The property management system has two tenants (hotel chains) — Pearl and Platinum.

The Pearl tenant has a different redisURL and rmqURL from the Platinum tenant. Each tenant can also have its own data source configuration. For example, the pricing data source should work for Pearl but be disabled for Platinum.

The GraphQL server instance should be aware of different tenants beforehand!

How does the GraphQL server know that it has to use a particular redisURL or rmqURL based on a particular tenant?

Can we get rid of the complexity and just have as many GraphQL servers as the number of tenants?

This would mean running 10 GraphQL servers for 10 tenants and customizing each server to its tenant's requirements. Optimistically, the business could grow to 50 tenants, which would mean 50 GraphQL servers. Most of the time, the only differences between these servers would be data isolation and, occasionally, different data source configurations.

Does it make sense to maintain separate GraphQL servers just to meet these requirements? Clearly, this does not scale, and a single GraphQL server can handle multiple tenants efficiently. Keep reading to see how we did it!

Maintaining Tenant Configurations

We store some project-level configurations in AWS Secrets Manager. For example, the redis and RabbitMQ configurations are fetched from AWS Secrets Manager at build time.

We use webpack for bundling the assets and building the project configuration.

Here’s a simple problem statement for having tenant-based configurations:

  • Fetch the list of tenants
  • Fetch the configurations (Redis, RabbitMQ) for each tenant at build time
  • Create a tenant config map and inject it as a global __CONFIG__ variable that is accessible across the project

These steps should run only once, at build time. There is no point in calling Secrets Manager or rebuilding the config object on every incoming request.

Each of these steps is executed while we build the webpack server configuration. Typically, webpack.server.config.js is a simple JavaScript file that exports an object like this:

We define global variables using webpack's DefinePlugin. Notice how the global constant __CONFIG__ is injected here, making it available as a global variable throughout the application. Our aim is to include the tenant-level configurations in this __CONFIG__ object, and then we're good to go!
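One detail worth spelling out: DefinePlugin replaces identifiers with source text at build time, so an object has to be serialized before it is handed over. A minimal sketch follows. The config object and the helper name toDefinePluginValues are hypothetical.

```javascript
// DefinePlugin substitutes each key with the *source code* given as its value,
// so the config object is serialized to a JavaScript-compatible string first.
function toDefinePluginValues(config) {
  return { __CONFIG__: JSON.stringify(config) };
}

// In webpack.server.config.js this would be used roughly as:
//   plugins: [new webpack.DefinePlugin(toDefinePluginValues(config))]
// after which application code can read __CONFIG__ as a plain object.
const definitions = toDefinePluginValues({
  env: "production",
  tenants: { treebo: { redisURL: "redis://treebo-cache:6379" } },
});
```

Without JSON.stringify, webpack would try to inline the value as raw code rather than as an object literal.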

Let’s break the problem into smaller chunks and solve them one by one.

We want to fetch and process the configuration inside an async function, so we need a way to export a function from module.exports in the webpack configuration file webpack.server.config.js. The Configuration Types page of the webpack documentation says that a configuration file can export a function, and can also export a promise. This solves the first problem: finding a place to do the computations and build the config object. The tenant-level computations are done inside a function, and we return the webpack config object from that function. Let's move ahead.

The next step is to get the list of tenants we're building this configuration for. We call the tenants back-end service to get the list of tenants and then, for each tenant, fetch its configuration from AWS Secrets Manager. Sleek and simple!

We first fetch the tenants from a tenants microservice. The function getTenantBasedConfig reads the static configurations from a file and then fetches the tenant-level configurations from Secrets Manager. It then merges the two, and the final config object (__CONFIG__) has this shape:
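Here is a sketch of that build-time flow, under a few assumptions. fetchTenants and fetchTenantSecrets are hypothetical stand-ins for the tenants service and the AWS Secrets Manager lookup. STATIC_TENANT_CONFIG plays the role of the static file. The getTenantBasedConfig below is a guess at the function's shape rather than the actual implementation. The tenant names follow the Pearl/Platinum example from earlier.

```javascript
// Static per-tenant settings, standing in for the checked-in config file (hypothetical values).
const STATIC_TENANT_CONFIG = {
  pearl: { dataSources: { pricing: { enabled: true } } },
  platinum: { dataSources: { pricing: { enabled: false } } },
};

// Hypothetical stand-in for the call to the tenants microservice.
async function fetchTenants() {
  return [{ id: "pearl" }, { id: "platinum" }];
}

// Hypothetical stand-in for reading one tenant's secrets from AWS Secrets Manager.
async function fetchTenantSecrets(tenantId) {
  return {
    redisURL: `redis://${tenantId}-cache:6379`,
    rmqURL: `amqp://${tenantId}-mq:5672`,
  };
}

// Fetches every tenant's secrets concurrently and merges them with the static
// settings, producing one object keyed by tenant id.
async function getTenantBasedConfig() {
  const tenants = await fetchTenants();
  const entries = await Promise.all(
    tenants.map(async ({ id }) => {
      const secrets = await fetchTenantSecrets(id);
      return [id, { ...STATIC_TENANT_CONFIG[id], ...secrets }];
    })
  );
  return { tenants: Object.fromEntries(entries) };
}
```

Inside the exported async webpack config function, the result would be awaited once and handed to DefinePlugin, so these calls happen once per build rather than per request.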

Please note: the __CONFIG__ object is created at build time.

Perfect! We now understand how the global, tenant-based __CONFIG__ object is created and injected into the application scope.

The next step is to understand how the GraphQL server identifies a tenant from the incoming request and accordingly uses the configurations of that particular tenant.

Identifying the tenant from the incoming request to the GraphQL server

Here’s how we build the GraphQL server using the apollo-server-express package:

getHostname is a simple function that sanitizes the URL, removes special characters, and returns the hostname. We use this hostname to get the tenantId and then fetch the tenant details based on that tenantId.

Later in the article, we'll see how we use caching (Redis) to look up the tenant information in Redis first, before making a trip to the server!
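The Redis-first lookup mentioned here is the classic cache-aside pattern. The minimal sketch below assumes a cache client with async get/set and a tenants service exposing a hypothetical findByHostname method. Both are in-memory stand-ins so the example runs on its own.

```javascript
// In-memory stand-in for a Redis client; like Redis, it returns null on a miss.
class InMemoryCache {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  async set(key, value) {
    this.store.set(key, value);
  }
}

// Returns a lookup that checks the cache first, calls the tenants service only
// on a miss, and stores the result for subsequent requests.
function createTenantLookup(cache, tenantsService) {
  return async function getTenantByHostname(hostname) {
    const key = `tenant-by-host:${hostname}`;
    const cached = await cache.get(key);
    if (cached !== null) return JSON.parse(cached);

    const tenant = await tenantsService.findByHostname(hostname); // network call on a miss
    await cache.set(key, JSON.stringify(tenant));
    return tenant;
  };
}
```

Values are stored as JSON strings because Redis stores strings. After the first request, repeat lookups for the same host skip the tenants service entirely.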

As the code snippet above shows, the tenant object is returned from the context function along with the other common properties.

The context object is available throughout the request lifecycle until the response is sent back to the client. Schema resolvers can access it and use its values to resolve schema fields.
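The minimal sketch below shows how the context function, getHostname, and a resolver could fit together, under these assumptions:
  • The tenant is derived from the request's Origin header, falling back to Host.
  • A static HOST_TO_TENANT map stands in for the tenants-service lookup.
  • This getHostname is our own guess at such a helper.
  • currentTenant is a hypothetical field.
The context({ req }) signature is the one apollo-server-express passes to the context option.

```javascript
// Hypothetical host-to-tenant map standing in for the tenants-service (or Redis) lookup.
const HOST_TO_TENANT = {
  "pms.treebo.com": "treebo",
  "pms.pearlhotels.com": "pearl",
};

// Normalizes a full URL or a bare "host:port" value into a lowercase hostname.
function getHostname(rawValue) {
  const value = String(rawValue || "").trim();
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(value) ? value : `http://${value}`;
  return new URL(withScheme).hostname.toLowerCase();
}

// Passed as the `context` option: new ApolloServer({ typeDefs, resolvers, context }).
// apollo-server-express calls it once per request with { req, res }; the returned
// object becomes the third argument of every resolver for that request.
function context({ req }) {
  const hostname = getHostname(req.headers.origin || req.headers.host);
  const tenantId = HOST_TO_TENANT[hostname];
  if (!tenantId) {
    throw new Error(`Unknown tenant for host ${hostname}`);
  }
  return { tenant: { id: tenantId, hostname } };
}

// Resolvers read the tenant from context; the client never sends a tenant id itself.
const resolvers = {
  Query: {
    currentTenant: (_parent, _args, ctx) => ctx.tenant.id,
  },
};
```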

Please note: the front-end client doesn't send the tenantId directly to the GraphQL server.

This is how the GraphQL server builds the request context object and makes the tenant data available to all the schema resolvers.

If you've been following along, you might have this question in mind:

GraphQL is meant to serve different applications. Not all of these applications are multi-tenant, and some don't care about tenant information at all. How does this GraphQL layer handle those front-end clients?

Let’s drill down and get the answer to the above question!

Apart from the multi-tenant property management system, other front-end clients such as the website, mobile site, React Native app, and pricing dashboards also use the GraphQL layer. These front-end clients are not multi-tenant and don't really care about tenant information!

Please note: these front-end clients are not multi-tenant, but they all belong to the tenant treebo. To put it simply, the tenant treebo has multiple hosts (or applications), such as the website, m-site, and property management system, while the other tenants have a single host: the property management system. The applications should be aware of their parent tenant and behave accordingly.

The JSON snippet below should make this clearer:

Both treebo and pearl have a PMS application, so pms is multi-tenant. The web application is used only by treebo, so it is not multi-tenant in the true sense!
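The following is a hypothetical host-tenant mapping in this spirit. It comes with two helpers: one resolves a hostname to its tenant and application, and the other decides whether an application is multi-tenant. The hostnames and exact structure are invented for illustration.

```javascript
// Hypothetical mapping: treebo owns several applications, pearl only runs the PMS.
const TENANT_APPS = {
  treebo: { web: "www.treebo.com", msite: "m.treebo.com", pms: "pms.treebo.com" },
  pearl: { pms: "pms.pearlhotels.com" },
};

// Inverts the mapping so a request hostname resolves to both its tenant and its app.
function indexByHost(tenantApps) {
  const byHost = {};
  for (const [tenantId, apps] of Object.entries(tenantApps)) {
    for (const [app, host] of Object.entries(apps)) {
      byHost[host] = { tenantId, app };
    }
  }
  return byHost;
}

// An application counts as multi-tenant when more than one tenant runs it.
function isMultiTenant(app, tenantApps = TENANT_APPS) {
  return Object.values(tenantApps).filter((apps) => app in apps).length > 1;
}
```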

Now that the tenant-level configurations and the host-tenant mapping are clear, let's move on. In the next section, we'll see how we pass tenant information to the different back-end microservices.

Passing down tenantId to back-end microservices

This is the easiest of all!
We append an X-Tenant-Id header to every request made to the microservices. The willSendRequest method of the RESTDataSource class in the apollo-datasource-rest module makes this easy!

We append the tenant header and a few other custom headers in this method. It is a hook that is called before every request a data source sends from the GraphQL server.
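Here is a minimal sketch of the hook. To keep it runnable without dependencies, a small stand-in mimics the two RESTDataSource members it relies on: initialize, which stores the request context, and willSendRequest. In a real server you would extend RESTDataSource from apollo-datasource-rest instead. The class name PricingAPI and the context.tenant shape are assumptions carried over from the context function described earlier.

```javascript
// Stand-in for RESTDataSource (apollo-datasource-rest) so the sketch runs on its own.
// Only the members used below are mimicked.
class RESTDataSourceStandIn {
  initialize(config) {
    this.context = config.context; // the per-request context built by Apollo Server
  }
  willSendRequest(_request) {}
}

// Hypothetical data source for the pricing service.
class PricingAPI extends RESTDataSourceStandIn {
  // Runs before every outgoing HTTP call made by this data source.
  willSendRequest(request) {
    request.headers.set("X-Tenant-Id", this.context.tenant.id);
  }
}
```

Because the header is added in the data source rather than by the client, every back-end call carries the tenant derived on the server.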

The back-end microservices respond according to the X-Tenant-Id header. For example, the authentication service authenticates incoming user credentials based on it. We keep replicas of the database schemas for different tenants, and the tenant header directs each request to the correct schema and database. If the tenantId is pearl, the authentication service looks up the user credentials in the pearl databases.

If the tenant-specific pieces make sense so far, let's move on to the more interesting and difficult challenge!

Handling the Redis caching layer based on a Tenant

There must be clear data isolation between tenants, so we cannot cache data of multiple tenants in the same caching layer. This is how ApolloServer is instantiated in a typical setup:

It takes a property called cache. By default, Apollo Server uses an in-memory LRU cache at the application level, but we can also provide a custom caching layer. Apollo Server expects the caching layer to implement the KeyValueCache interface. Here's what this interface looks like:

We have a custom RedisCachePool class that implements the KeyValueCache interface. It is a pool of Redis instances, and it picks the appropriate instance based on the tenantId of the incoming request. Let's look at this in detail!

const redisCachePool = new RedisCachePool();

We first instantiate RedisCachePool while creating the server, and the redisCachePool object is passed to the Apollo GraphQL server.

The tenant-based Redis instances are created inside the RedisCachePool class. This is similar to how a MySQL connection pool works: if your application allows up to 10 workers to run in parallel, each worker uses its own connection to run its queries.

RedisCachePool opens the connections to the tenants' Redis instances and stores them in an instance variable.

RedisCachePool implements the get and set functions defined in the KeyValueCache interface. Keys are always stored in the format ${tenantId}::${keyName}. The get function receives the key in this format, and a helper function extracts the tenantId from it. This tenantId then identifies the Redis instance to look in!

The set function works the same way. It receives the key in the format ${tenantId}::${keyName}, which tells RedisCachePool where to store the value!
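The sketch below shows a tenant-aware pool exposing the same get, set (with an optional { ttl } argument), and delete surface as Apollo's KeyValueCache. In-memory maps stand in for the per-tenant Redis clients. With real clients, for example one connection per tenant redisURL from __CONFIG__, the method bodies would call those clients instead. The createClient factory and the error messages are hypothetical.

```javascript
// In-memory stand-in for a Redis client (null on a miss, del returns a count).
class InMemoryRedisStandIn {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.has(key) ? this.store.get(key) : null; }
  async set(key, value) { this.store.set(key, value); }
  async del(key) { return this.store.delete(key) ? 1 : 0; }
}

// Pulls the tenant id out of keys shaped `${tenantId}::${keyName}`. It also tolerates
// an extra `prefix:` that a caching wrapper may prepend ahead of the tenant id.
function extractTenantId(key) {
  const separator = key.indexOf("::");
  if (separator === -1) throw new Error(`Cache key without tenant id: ${key}`);
  const head = key.slice(0, separator);
  return head.slice(head.lastIndexOf(":") + 1);
}

class RedisCachePool {
  // tenantIds would come from __CONFIG__; createClient is a hypothetical factory.
  constructor(tenantIds, createClient = () => new InMemoryRedisStandIn()) {
    this.clients = new Map(tenantIds.map((id) => [id, createClient(id)]));
  }

  // Routes a key to the Redis instance of the tenant it belongs to.
  clientFor(key) {
    const tenantId = extractTenantId(key);
    const client = this.clients.get(tenantId);
    if (!client) throw new Error(`No cache configured for tenant ${tenantId}`);
    return client;
  }

  async get(key) {
    const value = await this.clientFor(key).get(key);
    return value === null ? undefined : value; // KeyValueCache expects undefined on a miss
  }

  async set(key, value, _options) {
    // _options.ttl (seconds) would map to a Redis expiry; the stand-in ignores it.
    await this.clientFor(key).set(key, value);
  }

  async delete(key) {
    return (await this.clientFor(key).del(key)) > 0;
  }
}
```

Routing on the key alone keeps the pool stateless per request: it needs no access to the request context, which the KeyValueCache interface doesn't provide.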

The difficult problem of maintaining data isolation now boils down to identifying tenantIds from key names and picking the correct Redis instance from the pool.

How do we format key names to include tenantIds?

The RESTDataSource class has a function called cacheKeyFor, which builds the key used for caching. We can override this function in a data source to return the cache key in whatever format we need. Simple!

All we have to do is prepend the tenantId to the cache key, and Apollo Server handles the rest: the keys reach the pool's get and set functions in the right format.
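Here is a minimal sketch of the override. A stand-in base class mimics the two RESTDataSource members involved, so the example runs without apollo-datasource-rest. In the real library, the default cacheKeyFor keys on the request URL. CatalogAPI and the context.tenant shape are assumptions.

```javascript
// Stand-in for RESTDataSource; in a real server, extend RESTDataSource instead.
class RESTDataSourceStandIn {
  initialize(config) {
    this.context = config.context;
  }
  // Mirrors the library default: the cache key is the request URL.
  cacheKeyFor(request) {
    return request.url;
  }
}

// Hypothetical catalog data source whose cache keys carry the tenant id.
class CatalogAPI extends RESTDataSourceStandIn {
  // Produces `${tenantId}::${url}` so the cache pool can route the key.
  cacheKeyFor(request) {
    return `${this.context.tenant.id}::${super.cacheKeyFor(request)}`;
  }
}
```

Calling super.cacheKeyFor keeps whatever the base class considers a unique key, and only adds the tenant prefix on top.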

This is how you can implement a custom RedisCachePool with multiple Redis instances and pick the appropriate instance for each request.

If all this makes sense so far, let's look at a few business-specific use cases we had to build for Treebo's PMS. They were fun to work on, so I wanted to include them here; they might help you solve a problem similar to ours!

Support for disabled data sources and mocking a data source schema

The pricing microservice returns the prices of hotel rooms and of add-ons such as food, extra bed, early check-in, and late checkout. In short, the pricing service computes the prices of all entities based on the requested parameters. For example, it returns room prices based on the start and end dates of a booking, the booking platform, and the room type.

Pricing is largely a Treebo-specific use case, and one tenant (let's call it Pearl) wanted to build a pricing data source of its own. That would take some time, and they didn't want to miss out on the other features of our SaaS offering while waiting for it. In the meantime, prices would be entered manually in a text box instead of being fetched from the back-end.

Here’s the problem statement:

The pricing data source should be disabled for the pearl tenant, and it should keep working as before for Treebo and all the other tenants.

What’s the challenge here?

We don't want to add dozens of tenant-specific conditionals to the front-end client to handle this behavior. The Hotel type schema looks like this:

The front-end client (PMS) queries postTax in many places, wherever we show room pricing. We cannot simply return null for postTax for the Pearl tenant, because it is a non-null field.

One option is to write tenant-based queries on the front-end client: for the Pearl tenant, the client would not request the postTax field, which solves the problem. The same handling would also be needed in a few other front-end components that use fields from the pricing data source.

Even this doesn't solve the problem easily! The front-end components expect the hotel object in a specific shape (which includes the pricing fields), and we cannot change those expectations per tenant. Moreover, if this had to be done for other front-end clients as well, we'd have to repeat the same logic in each of them.

Shouldn't there be a simple solution: just disable this data source and solve the problem once and for all?

Not quite! The schema declares this as a non-nullable field, so if a front-end client requests it, the GQL layer has to return something.

But we can do something! We can return a mocked response that matches the pricing schema, and the TenantDirective does exactly that. The field-level directive on the field priceAcrossDates intercepts the request before it reaches the resolver. It checks the tenant-level config to see whether the pricing data source is disabled. If it is, the directive returns the mocked response instead of calling the back-end pricing service. Otherwise, it forwards the request to the regular priceAcrossDates resolver. This simple directive is extensible and can easily be applied to any field or type. It makes things much easier, since we don't have to modify front-end queries in multiple places based on the tenant.
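In Apollo Server 2, a directive like this is typically written as a SchemaDirectiveVisitor subclass. Its visitFieldDefinition(field) replaces field.resolve with a wrapper. The sketch below isolates just that wrapper, so it runs without graphql-tools. The @tenant directive name, TENANT_CONFIG, the argument shapes, and the mock values are all hypothetical.

```javascript
// Hypothetical tenant config (in the real server this would come from __CONFIG__).
const TENANT_CONFIG = {
  treebo: { dataSources: { pricing: { enabled: true } } },
  pearl: { dataSources: { pricing: { enabled: false } } },
};

// Wraps a field resolver: if the tenant has this data source disabled, return the mock
// (which still satisfies the non-null schema); otherwise call the real resolver.
// Schema usage might look like:
//   priceAcrossDates(dates: [String!]!): [Price!]! @tenant(dataSource: "pricing")
function withTenantDataSource(resolve, { dataSource, mock }) {
  return function tenantAwareResolve(parent, args, context, info) {
    const settings = TENANT_CONFIG[context.tenant.id]?.dataSources?.[dataSource];
    if (settings && settings.enabled === false) {
      return typeof mock === "function" ? mock(parent, args) : mock;
    }
    return resolve(parent, args, context, info);
  };
}

// Example field: the inner resolver stands in for a call to the pricing service.
const priceAcrossDates = withTenantDataSource(
  async (_hotel, { dates }) => dates.map((date) => ({ date, postTax: 2500 })),
  { dataSource: "pricing", mock: (_hotel, { dates }) => dates.map((date) => ({ date, postTax: 0 })) }
);
```

The mock returns a value of the same shape as the real response. As a result, existing front-end queries keep working unchanged for the disabled tenant.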

Next on the list is handling tenant-based GraphQL subscriptions. This is the toughest of the lot, but an interesting use case to look at!

Handling GraphQL Subscriptions

We’re using RabbitMQ for implementing pubsub-based subscriptions.

Subscriptions are a vast topic. We'll cover the challenges we faced while building them, and while making them work with the multi-tenant system, in upcoming articles.

Conclusion

In this article, we learned about property management systems and looked at some of the challenges we faced while building a tenant-driven GraphQL layer.

The awesome engineers who contributed to building the multi-tenant GraphQL layer and helped us progress on our GraphQL journey: Dhwanil Vyas, Dhanya Rai, Dhruv Patel, Jainam Shah, Kapil Matani, Praneeth Kumar, Gaurav Kinra, Varun Nahata, Prateek Mittal, and myself.

Thanks to Rohit Jain, Mayank Khandelwal, and Arun Midha for helping with the infrastructural decisions around Multi-Tenancy.
