
How to secure a GraphQL service using persisted queries

30.4.2020 | 9 minutes of reading time

GraphQL is a rising query language that gives clients the power to ask for what they need and get exactly that in a single request. In theory, this leads to efficient and flexible client-server communication. But adopting new technology always comes with new challenges. One challenge we recently dealt with was limiting the data exposed by an existing GraphQL server. In this tutorial you will learn how persisted queries can improve the security and performance of your application without sacrificing the great developer experience of GraphQL and its tooling.

What are persisted queries and what makes them secure?

Persisted GraphQL queries modify the communication between a GraphQL client and its server. Instead of a whole query, the client only sends a hash of it to the server. The server has a list of known hashes and uses the related query. This improves the performance as well as the security, as the server only responds to a limited list of queries.
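
To make the idea concrete, here is a minimal sketch in plain Node.js. It assumes the hash is a SHA-256 digest of the raw query text (real tools typically hash a normalized form of the query), and the names knownQueries, hashQuery and resolveQuery are made up for illustration.

```javascript
const crypto = require("crypto")

// Hypothetical allowlist: the only queries this server is willing to run.
const knownQueries = [
  "query Articles { articles { title } }",
  "query Article($id: ID!) { article(id: $id) { title body } }",
]

// Assumption: the hash is the SHA-256 hex digest of the raw query text.
function hashQuery(queryText) {
  return crypto.createHash("sha256").update(queryText).digest("hex")
}

// Built once at startup: hash -> query text.
const queryByHash = new Map(knownQueries.map((q) => [hashQuery(q), q]))

// Returns the known query for a hash, or undefined for anything else.
function resolveQuery(hash) {
  return queryByHash.get(hash)
}

const articlesHash = hashQuery(knownQueries[0])
console.log(resolveQuery(articlesHash)) // the Articles query text
console.log(resolveQuery(hashQuery("query Authors { authors { email } }"))) // undefined
```

A query that is not on the list, such as the Authors query in the last line, has no entry in the map and therefore never reaches the GraphQL service.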

Why do we need to make our queries more secure?

Persisting queries reduces the exposed interface and data to what your application really needs. Imagine using a traditional CMS like Drupal via its GraphQL interface. Do you really want all data to be accessible to the public, including the email addresses of your author accounts? Drupal might provide a way to secure that data at some point. But with new security vulnerabilities popping up constantly, I would rather move the whole Drupal service into a secured network than apply the next security patch on the night it gets released.

What are the limits of this approach?

If you have a public GraphQL API that is used by multiple clients outside of your responsibility, this approach is not for you. Securing your application this way only works because your server already knows which queries it can expect.

Using persisted queries will also not limit the exposed data based on the current user. This is still the responsibility of the server. However, there are ways of extending the solution we are going to build to achieve this. Is this worth another blog post? Feel free to leave a comment below and let me know what problems you currently deal with.

Why another tutorial?

If you search for persisted queries, you will find a bunch of tutorials. So why do we need a new one? A common approach is to use the persistgraphql library, which is archived and hasn’t been updated since 2018. The official approach as of now is to use automatic persisted queries, which optimize the query performance on the fly. But there are no security benefits when the server automatically hashes every query that pops up. Restricting the allowed queries is now called safe listing and is part of the paid Apollo Platform plans. So is there no simple way of securing my queries anymore? Hold on.

Let’s get started implementing persisted queries

The idea is slightly different from what is out there using libraries like persistgraphql. But it is pretty straightforward, and this image shows what we will do.

Instead of talking directly to the GraphQL service from the client, we will introduce a server in between. This server will handle all GraphQL requests, but only allow known queries. This works because the client and the server share the same query sources and both create the hashes when importing them. Creating the hashes directly from the source code that is shipped with the server ensures that only intended queries get a valid hash. These hashes are used for the public communication between client and server. The server knows the correct query for a valid hash and sends it to the GraphQL service. The GraphQL service can now safely restrict its access to requests from the new server.

We will start with the client-side by creating a hash and sending it instead of the whole query. Then we will switch over to the server and handle this sent hash. We will benefit from using JavaScript in the client and server as we can use the same approach to create the hashes.

If you want to follow along or jump right into the source code, feel free to check out this repository with the complete example implementation.

Used libraries and tools

We will not reinvent the wheel and will use existing open source libraries where possible.

Create and send the hash in the client

We need to create a unique hash for every query we are going to send. Therefore we use the Apollo link for persisted queries which does exactly that out of the box:

import { ApolloClient } from "apollo-client"
import { createHttpLink } from "apollo-link-http"
import { InMemoryCache } from "apollo-cache-inmemory"
import { ApolloLink } from "apollo-link"
import { createPersistedQueryLink } from "apollo-link-persisted-queries"

// Sends requests to the endpoint that will resolve our hashes
const httpLink = createHttpLink({
  uri: "/graphql",
})
// Replaces the query in outgoing requests with its hash
const automaticPersistedQueryLink = createPersistedQueryLink()
const apolloClient = new ApolloClient({
  // The persisted query link has to come before the terminating HTTP link
  link: ApolloLink.from([automaticPersistedQueryLink, httpLink]),
  cache: new InMemoryCache(),
})

Instead of sending the full query, Apollo will now only send its hash in the POST body:

{
  "extensions": {
    "persistedQuery": {
      "version": 1,
      "sha256Hash": "fcf31818e50ac3e818ca4bdbc433d6ab73176f0b9d5f9d5ad17e200cdab6fba4"
    }
  }
}
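
For illustration, a body like this can also be built by hand. The helper name buildPersistedQueryBody is hypothetical; the shape follows the example above, plus the variables and operationName fields that a GraphQL POST body usually carries.

```javascript
// Hypothetical helper: builds the POST body for a persisted query request.
// Note that the body deliberately contains no "query" field.
function buildPersistedQueryBody(sha256Hash, variables = {}, operationName) {
  return {
    operationName,
    variables,
    extensions: {
      persistedQuery: { version: 1, sha256Hash },
    },
  }
}

const body = buildPersistedQueryBody(
  "fcf31818e50ac3e818ca4bdbc433d6ab73176f0b9d5f9d5ad17e200cdab6fba4",
  { id: "42" },
  "Article"
)
console.log(JSON.stringify(body, null, 2))
```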

Note: You can now also switch to GET requests, which helps if you want to cache your query results (see Apollo link options).
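
As a rough sketch of what such a GET request could look like, the following helper encodes the hash and the variables as JSON in the query string. The function name buildPersistedQueryUrl and the example values are assumptions for illustration; the exact parameters your client sends depend on the link configuration.

```javascript
// Hypothetical helper: encodes a persisted query request as a GET URL.
function buildPersistedQueryUrl(endpoint, sha256Hash, variables = {}) {
  const params = new URLSearchParams({
    variables: JSON.stringify(variables),
    extensions: JSON.stringify({
      persistedQuery: { version: 1, sha256Hash },
    }),
  })
  return `${endpoint}?${params.toString()}`
}

const url = buildPersistedQueryUrl("/graphql", "abc123", { id: "42" })
console.log(url)
```

Because the URL is identical for the same hash and variables, an HTTP cache or CDN in front of the server can store the response.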

One drawback of using the stated Apollo link is that it is done primarily for performance and creating hashes on the fly. A side effect of this is that variables are also part of the hash to provide a unique response for every hash. This will not work for us as we do not always know all the dynamic variables at build time. Therefore we need to modify the hash function to include only static sources. We make use of a webpack loader called graphql-persisted-document-loader which will automatically create a hash for every imported query source.

Make sure to list graphql-persisted-document-loader before graphql-tag/loader in the use array. Webpack applies loaders in reverse order, from right to left, so graphql-tag/loader runs first and the hash is created only after all fragments required by the query have been resolved.

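module.exports = {
  module: {
    rules: [
      {
        test: /\.graphql$/,
        exclude: /node_modules/,
        // Applied right to left: graphql-tag/loader parses the query first,
        // then graphql-persisted-document-loader adds the hash
        use: ["graphql-persisted-document-loader", "graphql-tag/loader"],
      },
    ],
  },
}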
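
The ordering can be illustrated with two stand-in functions. This is not webpack's actual loader API, just a sketch of the right-to-left chaining; parseLoader, hashLoader and runLoaders are made-up names.

```javascript
const crypto = require("crypto")

// Stand-in for graphql-tag/loader: turns source text into a "document".
const parseLoader = (source) => ({ kind: "Document", source })

// Stand-in for graphql-persisted-document-loader: needs a parsed document
// and attaches a hash to it as documentId.
const hashLoader = (doc) => {
  if (typeof doc !== "object" || doc === null) {
    throw new Error("expected a parsed document")
  }
  const documentId = crypto.createHash("sha256").update(doc.source).digest("hex")
  return { ...doc, documentId }
}

// Like webpack, apply the list from right to left.
const runLoaders = (loaders, input) =>
  loaders.reduceRight((result, loader) => loader(result), input)

// Same order as in the config: the hash loader is listed first, so it runs last.
const doc = runLoaders([hashLoader, parseLoader], "query Articles { articles { title } }")
console.log(doc.documentId.length) // 64
```

Swapping the two entries would hand the raw source text to hashLoader, which fails in this sketch because there is no parsed document yet.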

Now we need to tell our Apollo link to use the created hash, which is stored in the query as documentId.

const automaticPersistedQueryLink = createPersistedQueryLink({
  // Use the hash the webpack loader attached to the imported query
  generateHash: ({ documentId }) => documentId,
})
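
If a query ever reaches the client without going through the loader (for example one defined inline in a component), its documentId would be undefined. A slightly more defensive variant could fail loudly in that case. This is only a sketch; the function name and the error message are made up.

```javascript
// Hypothetical, more defensive generateHash for createPersistedQueryLink.
function generateHashFromDocumentId(document) {
  if (!document || typeof document.documentId !== "string") {
    throw new Error(
      "Query has no documentId; was it imported through graphql-persisted-document-loader?"
    )
  }
  return document.documentId
}

// Usage, assuming the same link factory as above:
// createPersistedQueryLink({ generateHash: generateHashFromDocumentId })
console.log(generateHashFromDocumentId({ documentId: "abc123" })) // abc123
```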

That’s it from the client. Now every sent GraphQL request contains a hash instead of the query.

Note: One side effect of using the Apollo link for persisted queries is that the client automatically resends the request with the full query if the server could not resolve the hash. You can create your own link to avoid this, but it is not a big deal: the second request simply fails as well when the server does not handle full queries. You can even use this behavior as a fallback mechanism when switching an existing application to persisted queries. That is how we made sure everything worked as expected before we started blocking requests that contain full queries.
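
Such a migration phase could be sketched as follows. The function classifyRequest, its parameters and the three result names are hypothetical; the idea is to log requests that still need the full query instead of rejecting them, until no such requests show up anymore.

```javascript
// Hypothetical classification of an incoming request body.
// knownHashes: Set of hashes the server can resolve.
// enforce: when false (migration phase), full queries are still accepted.
function classifyRequest(body, knownHashes, enforce) {
  const persistedQuery = body && body.extensions && body.extensions.persistedQuery
  const hash = persistedQuery && persistedQuery.sha256Hash

  if (typeof hash === "string" && knownHashes.has(hash)) return "persisted"
  if (!enforce && body && typeof body.query === "string") return "fallback"
  return "rejected"
}

const knownHashes = new Set(["abc123"])
const retryWithQuery = {
  query: "query Articles { articles { title } }",
  extensions: { persistedQuery: { version: 1, sha256Hash: "unknown" } },
}

console.log(classifyRequest(retryWithQuery, knownHashes, false)) // fallback
console.log(classifyRequest(retryWithQuery, knownHashes, true)) // rejected
```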

Resolve the hash to its query on the server

Instead of the full query, our GraphQL server now only receives a hash of it. So it needs to know which query belongs to that hash. The trick to get the exact same hash as the client is to use the same mechanism. We will load the same queries using the same webpack loaders.

There are several ways of implementing this concept for the server. You could create a middleware for your GraphQL service, add the functionality to your existing backend, introduce a new microservice for this, or extend a server which already combines several GraphQL services using Apollo Federation. We chose the microservice approach, as we work with a legacy GraphQL service which we want to hide completely from the public. That’s the setting for the server part of this tutorial. However, the concept is the same for the other approaches.

The new service uses the same GraphQL loader as the client above. Only how we import the queries is slightly different, as we need to load all possible queries to create a map from hash to its query. This is the code we run when starting the server to import them all. We use a monorepo to import the queries straight from the client folder.

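// Collect every .graphql file in the client's query folder (non-recursive)
const queries = require.context("../../client/queries", false, /\.graphql$/)

// Import each query through the webpack loaders, which attach its documentId
const resolvedQueries = queries.keys().map((key) => queries(key))

module.exports = resolvedQueries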

Note: Using require.context might create problems when executing tests. I wrote a follow-up post about testing this service with Jest that also covers how to avoid such issues.

The next step is to create the endpoint which uses these queries. We will create a small Express service for that.

const express = require("express")
const app = express()
const GraphqlRequestHandler = require("./graphqlRequestHandler")
const queries = require("./loadQueries")

// Parse JSON request bodies so the handler can read the hash and variables
app.use(express.json())

// The constructor returns the actual request handler function
app.post("/graphql", new GraphqlRequestHandler(queries))

app.listen(8082, function () {
  console.log("Example app listening on port 8082!")
})

The queries are loaded once on server startup and passed to the handler. The constructor of this handler will create a map to get the query for every hash and return the actual handler.

module.exports = class GraphqlRequestHandler {
  constructor(queries) {
    // Map from hash (documentId) to the imported query document
    this.hashToQueryMap = {}
    queries.forEach((query) => {
      this.hashToQueryMap[query.documentId] = query
    })
    // Return a function so the instance can be used directly as a route handler
    return (req, res, next) => {
      this.handle(req, res, next)
    }
  }
  // ...
}
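
When building the map, it can help to fail fast at startup if a query has no documentId or if two different queries end up with the same hash. A sketch with a hypothetical helper buildHashToQueryMap, using a Map instead of a plain object:

```javascript
// Hypothetical helper: builds the hash -> query lookup and validates the input.
function buildHashToQueryMap(queries) {
  const hashToQuery = new Map()
  for (const query of queries) {
    const hash = query.documentId
    if (typeof hash !== "string" || hash.length === 0) {
      throw new Error("Found a query without documentId; was it loaded through the hashing loader?")
    }
    if (hashToQuery.has(hash) && hashToQuery.get(hash) !== query) {
      throw new Error(`Two different queries share the hash ${hash}`)
    }
    hashToQuery.set(hash, query)
  }
  return hashToQuery
}

const exampleMap = buildHashToQueryMap([
  { kind: "Document", documentId: "abc123" },
  { kind: "Document", documentId: "def456" },
])
console.log(exampleMap.size) // 2
```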

For every incoming request to this endpoint, the handler looks up the corresponding query in the hashToQueryMap and reads the variables from the request. It then sends this request to the GraphQL service and handles error cases.

const apolloClient = require("./apolloClient")

module.exports = class GraphqlRequestHandler {
  // ...
  async handle(req, res, next) {
    // Only the hash selects the query; variables are passed through as sent
    const query = this.getQueryForHash(req)
    const variables = req.body.variables

    if (query) {
      try {
        console.log("sending graphql query")
        const response = await apolloClient.query({ query, variables })
        console.log("returning graphql response")
        res.send(response)
      } catch (error) {
        console.log("error while sending graphql query")
        // Let the Express error handling deal with upstream failures
        next(error)
      }
    } else {
      // Unknown or missing hash: never forward anything to the GraphQL service
      console.log("no matching query for hash found")
      res.status(400).send()
    }
  }
  // ...
}

Getting the query for the current request could look like the following.

module.exports = class GraphqlRequestHandler {
  // ...
  getQueryForHash(req) {
    const persistedQueryHash =
      req.body.extensions &&
      req.body.extensions.persistedQuery &&
      req.body.extensions.persistedQuery.sha256Hash
    if (persistedQueryHash) {
      console.log("search query for provided hash " + persistedQueryHash)
      // Only accept own keys, so that values like "constructor" or "toString"
      // do not resolve to properties inherited from Object.prototype
      const isKnownHash = Object.prototype.hasOwnProperty.call(
        this.hashToQueryMap,
        persistedQueryHash
      )
      return isKnownHash ? this.hashToQueryMap[persistedQueryHash] : undefined
    } else {
      console.log("no hash provided")
      return undefined
    }
  }
  // ...
}
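
Since the hash comes straight from the request, it may be worth checking its shape before the lookup. The following sketch assumes that hashes are 64-character lowercase hex digests, as in the example request shown earlier; extractHash is a hypothetical helper and not part of the example repository.

```javascript
// Assumption: hashes are SHA-256 digests in lowercase hex notation.
const SHA256_HEX = /^[0-9a-f]{64}$/

// Hypothetical helper: returns the hash from a request body,
// or undefined if it is missing or malformed.
function extractHash(body) {
  const hash = body?.extensions?.persistedQuery?.sha256Hash
  return typeof hash === "string" && SHA256_HEX.test(hash) ? hash : undefined
}

const validBody = {
  extensions: {
    persistedQuery: {
      version: 1,
      sha256Hash: "fcf31818e50ac3e818ca4bdbc433d6ab73176f0b9d5f9d5ad17e200cdab6fba4",
    },
  },
}

console.log(extractHash(validBody) !== undefined) // true
console.log(extractHash({ extensions: { persistedQuery: { sha256Hash: "__proto__" } } })) // undefined
```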

Last but not least, the Apollo client that sends the actual query to the GraphQL service is created as shown below. We could also send the query using a simple POST request. But using the Apollo client helps to get the same result the client would expect.

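const { InMemoryCache } = require("apollo-cache-inmemory")
const { ApolloClient } = require("apollo-client")
const { createHttpLink } = require("apollo-link-http")
// Node has no built-in fetch here, so we pass an implementation to the link
const fetch = require("cross-fetch")

// Points to the internal GraphQL service that is hidden from the public
const httpLink = createHttpLink({
  uri: "http://graphql.service.url/",
  fetch,
})
// Note: with the default fetch policy, repeated identical queries
// are answered from this in-memory cache
module.exports = new ApolloClient({
  link: httpLink,
  cache: new InMemoryCache(),
})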

Additional thoughts

That’s it for now. Are you already using persisted queries? What is holding you back? Here are some things that came to my mind while working on this blog post.

Take care of your deployment

As the query sources are shared between the client and server, it is crucial to deploy updates simultaneously. Meaning, if you change or add a query, make sure to deploy the client and the server together. This makes sure that the hashes stay in sync and the server can respond to each of them.
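
One way to catch a mismatch before it reaches users is a check in the build pipeline that compares the hashes both sides were built with. The sketch below assumes that each build can export its list of hashes; findMissingHashes and the example values are hypothetical.

```javascript
// Hypothetical check: which hashes does the client use that the server cannot resolve?
function findMissingHashes(clientHashes, serverHashes) {
  const known = new Set(serverHashes)
  return clientHashes.filter((hash) => !known.has(hash))
}

const clientHashes = ["abc123", "def456", "0a1b2c"]
const serverHashes = ["abc123", "def456"]

const missing = findMissingHashes(clientHashes, serverHashes)
if (missing.length > 0) {
  console.log("Server is missing queries for hashes:", missing.join(", "))
}
```

A non-empty result means the client was built from newer query sources than the server and the two should not be deployed in this combination.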

Some words about testing this service

Do you want to learn more about testing the result? I published another blog article about the tests I wrote while developing this service. It contains example tests and a few things to keep in mind. Feel free to check it out.
