Beliebte Suchanfragen
//

Count your queries! Repository integration tests with Hibernate Statistics

7.8.2023 | 6 minutes of reading time

If you are using Spring Data JPA as a data access framework, Hibernate is almost certainly hiding under the hood. And although this setup takes a lot of work off your hands by doing a lot of awesome things, the final outcome should better be checked. In this article we want to show you how to avoid pitfalls such as the N+1 problem by counting executed queries within automated tests using the Hibernate Statistics.

Still a thing: The N+1 problem

The N+1 problem describes the behavior of an ORM tool such as Hibernate (or similar data access frameworks, it's not bound to this specific product) to reload related entities by executing “unexpected”, additional SQL queries when loading data that was supposed to be loaded together in one go.

Of course, loading data little by little can be intentional within a lazy loading context. For example, when related entities are associated with the fetch type LAZY, the additional information should be reloaded only when it’s needed, which inevitably requires additional queries in the aftermath. However, there might be a good reason to load the linked data upfront. Therefore, the fetch type EAGER is used and one could expect all data being loaded together, for example using an SQL join to avoid additional database round trips.

Sadly, this is not the case when using abstractions such as Spring Data JPA, its feature to derive queries by repository method names and Hibernate in combination. Due to the lack of Hibernate's opportunities for optimization within this constellation, a lot of unwanted queries are executed, for example, one query to fetch the initial data collection (having the size N) and another query for each of the N collection items to retrieve the related items. After all, N + 1 queries hit the database although only a single query was expected. In the worst case this can have a huge performance impact.

An example

Let's suppose we have a very simple example without any practical relevance, serving only for demonstration purposes. We have a primary entity with a one-to-one relationship (whose default fetch type is EAGER) to a secondary entity.

1@Entity
2public class PrimaryEntity {
3
4    @Id
5    private Long id;
6
7    private String category;
8
9    @OneToOne
10    private SecondaryEntity secondaryEntity;
11
12    public SecondaryEntity getSecondaryEntity() {
13        return secondaryEntity;
14    }
15
16}

Furthermore, we use a very simple Spring Data repository to retrieve primary entities by its category.

1interface PrimaryEntityRepository extends JpaRepository<PrimaryEntity, Long> {
2
3    List<PrimaryEntity> findByCategory(String category);
4
5}

If we now invoke this method – having the eager fetching in mind – we might expect the ORM framework to use a join to query the data as a whole. Surprisingly Hibernate performs a lot more SQL queries.

Take a look inside!

To check SQL statements generated by Spring Data JPA you can activate the SQL log.

spring.jpa.show-sql=true

Executing the findByCategory(String category) method of our example repository which for instance returns a list of three primary entities, you will see the following:

Hibernate: select p1_0.id,p1_0.category,p1_0.secondary_entity_id from primary_entity p1_0 where p1_0.category=?
Hibernate: select s1_0.id,s1_0.additional_data from secondary_entity s1_0 where s1_0.id=?
Hibernate: select s1_0.id,s1_0.additional_data from secondary_entity s1_0 where s1_0.id=?
Hibernate: select s1_0.id,s1_0.additional_data from secondary_entity s1_0 where s1_0.id=?

But why are there four database queries? We configured an eager fetching strategy, didn’t we?

This happens because Spring Data JPA leverages the JPA Criteria API to access data and due to its explicit fetch plan Hibernate cannot add the crucial JOIN FETCH to this plan afterwards. So for every of our three primary entities another query is triggered to fetch the related secondary entity.

We are on our own. JPQL to the rescue

Circumventing this problem is quite simple. To fix it, we just have to neglect the derived Spring Data queries by using a JPQL directly and adding the missing JOIN FETCH by ourselves. Let’s add a second method which retrieves the same data but this time at one stroke.

1@Query("FROM PrimaryEntity p JOIN FETCH p.secondaryEntity s WHERE p.category=:category")
2List<PrimaryEntity> findByCategoryJoinFetch(String category);

After that, the data is fetched together in one single query by joining the two tables:

Hibernate: select p1_0.id,p1_0.category,s1_0.id,s1_0.additional_data from primary_entity p1_0 join secondary_entity s1_0 on s1_0.id=p1_0.secondary_entity_id where p1_0.category=?

How to avoid regression?

Checking the queries via logfiles is a good start. But although it was easy to fix this issue, it would be more sustainable to ensure this behavior with an automated test, wouldn't it? One of the simplest solutions would be to count the queries generated by the framework and compare them with our expectation. And this is where Hibernate Statistics enters the stage.

The Hibernate Statistics interface

When you're using Hibernate and you have to gain insights about the database footprint of your application, there is a nice tool for that. The Hibernate Statistics interface is a very powerful helper you should definitely get familiar with. It offers a lot of features such as second-level cache and concurrency-control metrics, but that is beyond what we would need for our simple query count test. Luckily, it also provides 'simpler' information like the amount of executed queries and additional fetch operations. That is exactly what we need. And the best part is, compared to other libraries such as the DataSourceProxy shown in our article about Hibernate caching, it's built-in.

You can just activate Hibernate Statistics for your Spring Boot project as follows:

spring.jpa.properties.hibernate.generate_statistics=true

After that, the org.hibernate.stat.Statistics interface – obtained via an instance of the Hibernate SessionFactory – can be used to write some simple integration tests.

Among others, it offers the following extremely useful methods to count the executed queries.

1/**
2* The global number of executed queries.
3*/
4long getQueryExecutionCount();

This method tells us how many queries are executed to load our primary entities.

1/**
2* The global number of entity fetches.
3*/
4long getEntityFetchCount();

An indicator if there are additional queries to fetch related secondary entities.

1/**
2* The number of prepared statements that were acquired.
3*/
4long getPrepareStatementCount();

Provides information about how many statements are executed at all. In other words, the sum of the both methods above.

With that tooling, it is very easy to write tests for our repositories. At the end of the day, in our use case, the method getQueryExecution() should always return 1 while getEntityFetchCount() should always return 0. And to rule out anything else happening on the database, getPreparedStatementCount() should sum to both. A corresponding test could therefore look like this.

1@DataJpaTest(properties = "spring.jpa.properties.hibernate.generate_statistics=true")
2class PrimaryEntityRepositoryTest {
3
4    @Autowired
5    public PrimaryEntityRepositoryTest(PrimaryEntityRepository primaryEntityRepository, SessionFactory sessionFactory) {
6        this.primaryEntityRepository = primaryEntityRepository;
7        this.sessionFactory = sessionFactory;
8    }
9
10    private final PrimaryEntityRepository primaryEntityRepository;
11
12    private final SessionFactory sessionFactory;
13
14    private Statistics stats;
15
16    @BeforeEach
17    void setUp() {
18        stats = sessionFactory.getStatistics();
19        stats.clear();
20    }
21
22    @Test
23    void derivedQueryRunsIntoNPlusOneProblem() {
24        final List<PrimaryEntity> primaryEntities = primaryEntityRepository.findByCategory("CAT1");
25
26        assertThat(primaryEntities).hasSize(3);
27        assertThat(primaryEntities.stream().map(primaryEntity -> primaryEntity.getSecondaryEntity().getAdditionalData()).collect(Collectors.toSet())).contains("A", "B", "C");
28        assertThat(stats.getQueryExecutionCount()).isOne();
29        assertThat(stats.getEntityFetchCount()).isEqualTo(primaryEntities.size());
30        assertThat(stats.getPrepareStatementCount()).isEqualTo(stats.getQueryExecutionCount() + stats.getEntityFetchCount());
31    }
32
33    @Test
34    void jpqlWithJoinFetchExecutesSingleQuery() {
35        final List<PrimaryEntity> primaryEntities = primaryEntityRepository.findByCategoryJoinFetch("CAT1");
36
37        assertThat(primaryEntities).hasSize(3);
38        assertThat(primaryEntities.stream().map(primaryEntity -> primaryEntity.getSecondaryEntity().getAdditionalData()).collect(Collectors.toSet())).contains("A", "B", "C");
39        assertThat(stats.getQueryExecutionCount()).isOne();
40        assertThat(stats.getEntityFetchCount()).isZero();
41        assertThat(stats.getPrepareStatementCount()).isEqualTo(stats.getQueryExecutionCount() + stats.getEntityFetchCount());
42    }
43
44}

Implementing such tests for all your critical database operations should help prevent you from falling into performance traps caused by unexpected operations.

You can check out the complete example from our GitHub repository.

Takeaway: Do not believe in any magic!

As the saying goes, 'trust is good, control is better'. This is true in many cases, especially when working with JPA. While I love Spring Data, JPA, Hibernate and related ORM stuff, no tool is perfect, in particular not when several abstractions sit on top of each other. In the best case they all do their job well, but unfortunately you get into trouble faster than you think if you do not know what happens behind the scenes. It is really important to validate generated database statements, so take a look at Hibernate Statistics!

share post

//

More articles in this subject area

Discover exciting further topics and let the codecentric world inspire you.

//

Gemeinsam bessere Projekte umsetzen.

Wir helfen deinem Unternehmen.

Du stehst vor einer großen IT-Herausforderung? Wir sorgen für eine maßgeschneiderte Unterstützung. Informiere dich jetzt.

Hilf uns, noch besser zu werden.

Wir sind immer auf der Suche nach neuen Talenten. Auch für dich ist die passende Stelle dabei.