Understanding Dataloaders

Dhruv Patel
Published in Treebo Tech Blog · Mar 29, 2021

If you’re starting out with GraphQL, or are already working on optimizing your queries, you have almost certainly come across the “N+1 problem”.

Let’s see how Dataloader can help us solve the “N+1 problem”.

What are Dataloaders?

A Dataloader is a small utility that batches data-fetching calls: you ask it for values one key at a time, it hands back a promise per key, and under the hood it resolves all of those promises from a single batched request. We will look at this visually to understand the concept better as we go.
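
Before diving into our use case, here is a minimal, self-contained sketch of that contract; the string values are placeholders, not a real data source:

import Dataloader from 'dataloader';

// The batch function receives every key requested within one tick of
// the event loop and must return one value per key, in the same order.
const loader = new Dataloader(async (keys) => keys.map((k) => `value-for-${k}`));

async function demo() {
  // Both loads below are served by a single call to the batch function.
  const [a, b] = await Promise.all([loader.load(1), loader.load(2)]);
  console.log(a, b); // value-for-1 value-for-2
}

demo();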

This article is based on a real-world problem we faced after integrating a feature called “contactless check-in” with Treebo’s PMS. The PMS is a suite that lets hotel chains manage daily operations smoothly: booking reservations, room prices, check-in/check-out, add-ons to a booking (food, extra bed, early check-in/check-out facilities), invoicing, cash management (inflow and outflow), and so on.

The Problem

So let us first have a look at the actual problem for which Dataloaders were introduced.

import { gql } from 'apollo-server';

const typeDefs = gql`
  type WebCheckIn {
    id: ID!
    guests: [ID!]
    status: String
  }

  type Booking {
    id: ID!
    referenceNumber: String
    roomStays: [BookingRoomStay!]!
    source: BookingSource!
    status: String!
    version: ZeroPositiveInt!
    webCheckIn: WebCheckIn
  }

  type Hotel {
    id: ID!
    name: String!
    phone: Phone
    email: String
    bookings: [Booking]!
  }
`;

This is a small snippet of the Hotel, Booking, and WebCheckIn schema (types like BookingRoomStay, BookingSource, Phone, and ZeroPositiveInt are defined elsewhere in our schema). Each hotel has an array of type Booking, and each booking can have a web check-in.
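
For example, a query shaped like the one below asks for the web check-in of every booking of a hotel (the hotel root field here is hypothetical; the snippet above omits our Query type):

query {
  hotel(id: "H1") {
    name
    bookings {
      referenceNumber
      webCheckIn {
        status
      }
    }
  }
}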

[Image: Normal resolvers loading data in GraphQL]

In the image above, a Hotel is fetched with one SQL query, followed by a bulk query for its N bookings, where N is the limit, i.e. the number of bookings to fetch. Once the bookings are fetched, each booking’s field resolvers come into action and start fetching data individually, one web check-in query per booking.

Resolvers always execute in isolation, so a resolver has no idea how to fetch its data efficiently in concert with its siblings. It also has no way of knowing whether the data it is about to fetch has already been fetched.

Now imagine the number of bookings in a hotel at scale: fetching the web check-in for each booking individually like this becomes quite expensive. In code, the naive version looks something like the sketch below.
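
A hedged sketch of the naive resolver; getWebCheckInByBookingId is a hypothetical helper that runs one SELECT per call:

// Booking.js (naive version)
const resolver = {
  Booking: {
    // For N bookings this fires N separate queries, on top of the one
    // query that fetched the bookings themselves: N+1 queries in total.
    webCheckIn: (booking, args, context) =>
      context.getWebCheckInByBookingId(booking.booking_id),
  },
};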

The Solution

We already know the shape of the solution: batching. But let’s see exactly how it works.

[Image: Ideal flow for better performance]

A simple solution to this problem is to batch the individual SQL calls used to fetch data from the database into one single SQL query, which would look something like this:

SELECT * FROM WebCheckIns WHERE booking_id IN (B1, B2, B3);

What we are actually doing here: instead of firing one query per booking, we wait for all the booking ids we want web check-ins for, then fire just one query that fetches the web check-ins for all of those bookings.

Let’s see how this would translate in terms of code using Dataloader.

// WebCheckInLoader.js
import Dataloader from 'dataloader';
// getBulkWebCheckIns fetches all web check-ins for a set of booking ids
// in one query; here we assume it can be imported from our data layer.
import { getBulkWebCheckIns } from './webCheckInService';

const batchWebCheckIns = async (ids) => {
  /* ids is an array because Dataloader combines every booking id
     requested in one tick into a single array */
  const webCheckIns = await getBulkWebCheckIns({ ids });

  /* Map the data by booking_id so that the loader returns the correct
     data for the corresponding booking id when needed */
  const webCheckInMap = {};
  webCheckIns.forEach((res) => {
    webCheckInMap[res.booking_id] = res;
  });

  return ids.map((id) => webCheckInMap[id]);
};

export const webCheckInLoader = () => new Dataloader(batchWebCheckIns);

Okay, what are we doing here?

A Dataloader expects you to provide a batching function that accepts an array of keys.

A Dataloader instance’s lifetime is a single request. Within one frame of execution, it groups all the ids given to it into an array and runs the batching function just once.

Another question that might come up: why do we need a mapping object here?

Dataloader’s documentation mentions two constraints on the batching function.

  1. The array of values must be of the same length as the array of keys.
  2. Each index in the array of values must correspond to the same index in the array of keys.

The second constraint is the important one: we can never rely on the order in which the database returns the data, so to maintain the correct order we map each response against its unique key.
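
Here is a small, hypothetical illustration of why the mapping matters; the rows are made up:

const ids = ['B1', 'B2', 'B3'];

// The database may return rows in any order, and may skip bookings
// that have no web check-in at all:
const rows = [
  { booking_id: 'B3', status: 'DONE' },
  { booking_id: 'B1', status: 'PENDING' },
];

// Re-align the rows with the keys (missing keys resolve to undefined):
const byId = Object.fromEntries(rows.map((r) => [r.booking_id, r]));
const values = ids.map((id) => byId[id]);
// values[1] is undefined because B2 has no web check-in.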

Once our loader is ready, remember to initialize it in your server’s entry file, where your context-creation code lives. Since context is built per request, every request gets its own loader instance (and its own cache), and context is how we reach the loader utility at the resolver level.

// server.js
import { ApolloServer } from 'apollo-server';
import { webCheckInLoader } from './WebCheckInLoader';

const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: () => ({
    // ...anything else you want available globally,
    webCheckInLoader: webCheckInLoader(),
  }),
});

Let’s see how we can use our Dataloader at the resolver level now.

// Booking.js
/* An example resolver for the Booking type from the problem section */
const resolver = {
  Booking: {
    id: (booking) => booking.booking_id,
    webCheckIn: (booking, args, context) =>
      context.webCheckInLoader.load(booking.booking_id),
    // ...other fields to be resolved
  },
};

Take a close look at this resolver: we are not calling the batch function directly; we call the loader’s .load() method. Each call to .load() records its key and returns a promise. On the next tick of the event loop, the loader passes all the collected keys to the batch function at once. The batch function resolves them to values, which are matched back to their keys, and finally each promise resolves to the value for its own key.
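
To see this in action, here is a hedged sketch using the loader from earlier, outside a real request, just to show the timing:

// Three .load() calls in the same tick trigger the batch function once.
async function demo() {
  const loader = webCheckInLoader(); // instance from WebCheckInLoader.js
  const results = await Promise.all([
    loader.load('B1'),
    loader.load('B2'),
    loader.load('B1'), // duplicate key, served from the loader's cache
  ]);
  // batchWebCheckIns ran exactly once, with ids = ['B1', 'B2'].
}

demo();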

Conclusion

In this article, we learned how we can optimize our resolvers using Dataloaders to load data efficiently.

The awesome engineers who worked on identifying this issue and solving it: Praneeth Kumar, Kapil Matani, and myself.
