Spring Batch 2.2 – JavaConfig Part 6: Partitioning and Multi-threaded Step

7.7.2013 | 4 minutes reading time

Finally, the last part of the blog series! Today we’ll take a quick look at scaling batch jobs via partitioning and multi-threaded steps.
This is the sixth post about the new Java-based configuration features in Spring Batch 2.2. The previous posts cover a comparison between the new Java DSL and XML, JobParameters, ExecutionContexts and StepScope, profiles and environments, job inheritance, and modular configurations. You can find the JavaConfig code examples on GitHub.

Partitioning

I won’t explain partitioning in detail here, just this: with partitioning you need to find a way to split your data into partitions. Each partition gets its own StepExecution, and the partitions are executed concurrently, each in a thread of its own as long as the thread pool is large enough. The most important interface here is the Partitioner.
Of course, when working with several threads, we need a source of those threads, and that will be a TaskExecutor. Since that’s a very low-level component, we add it to the InfrastructureConfiguration interface:

public interface InfrastructureConfiguration {

    @Bean
    public abstract DataSource dataSource();

    @Bean
    public abstract TaskExecutor taskExecutor();

}

For test environments, an implementation can look like this:

@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {

    @Bean
    public DataSource dataSource(){
        // In-memory HSQL database with the Spring Batch metadata tables plus schema-partner.sql
        EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
        return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                .addScript("classpath:schema-partner.sql")
                .setType(EmbeddedDatabaseType.HSQL)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // The default core pool size is 1 and the default queue is unbounded, so without
        // a core size the pool would never grow towards maxPoolSize.
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(4);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

}
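Pool sizing deserves a second look: ThreadPoolTaskExecutor wraps a java.util.concurrent.ThreadPoolExecutor, and that executor only creates threads beyond its core size once its queue is full. With an unbounded queue, the maximum pool size alone therefore has no effect. The following plain-JDK sketch shows this behaviour; the class PoolSizingDemo and its helper method are made up for illustration and are not part of the example project.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingDemo {

    /**
     * Submits the given number of blocking tasks to a pool with an unbounded queue
     * and returns the largest number of threads the pool ever created.
     */
    public static int largestPoolSize(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    // keep the thread busy until all tasks have been submitted
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return largest;
    }

    public static void main(String[] args) {
        System.out.println("core=1, max=4: " + largestPoolSize(1, 4, 8)); // prints "core=1, max=4: 1"
        System.out.println("core=4, max=4: " + largestPoolSize(4, 4, 8)); // prints "core=4, max=4: 4"
    }
}
```

With a core size of 1, the seven remaining tasks simply wait in the queue behind a single thread, which is why a core pool size matters when four partitions should really run in parallel.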

The job that I used as an example in the previous blog posts reads data from one file and writes that data to a database. Now we want to read data from more than one file, and we want one partition for each file.
Let’s take a look at the important parts of the job configuration:

    @Bean
    public Job flatfileToDbPartitioningJob(){
        return jobBuilders.get("flatfileToDbPartitioningJob")
                .listener(protocolListener())
                .start(partitionStep())
                .build();
    }

    @Bean
    public Step partitionStep(){
        return stepBuilders.get("partitionStep")
                // the step that is executed once per partition
                .partitioner(flatfileToDbStep())
                // step name used for the partition StepExecutions, and the Partitioner itself
                .partitioner("flatfileToDbStep", partitioner())
                // source of the threads that run the partitions
                .taskExecutor(infrastructureConfiguration.taskExecutor())
                .build();
    }

    @Bean
    public Step flatfileToDbStep(){
        return stepBuilders.get("flatfileToDbStep")
                .<Partner,Partner>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .listener(logProcessListener())
                .build();
    }

    @Bean
    public Partitioner partitioner(){
        // creates one partition per resource that matches the pattern
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        Resource[] resources;
        try {
            resources = resourcePatternResolver.getResources("file:src/test/resources/*.csv");
        } catch (IOException e) {
            throw new RuntimeException("I/O problems when resolving the input file pattern.",e);
        }
        partitioner.setResources(resources);
        return partitioner;
    }

We defined a Partitioner that looks for CSV files in a given location and creates one partition for each file. We defined the step as we did in the other examples, and then we defined a special partitionStep that combines our standard step, the Partitioner and the TaskExecutor. Finally, the job uses that partitionStep.
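To make the mechanics a bit more tangible, here is a minimal plain-JDK sketch of the same idea: one named partition per input file, each handed to a thread pool as an independent unit of work. It is a simplified stand-in, not Spring Batch code. The class PartitionSketch, its methods and the file names are hypothetical, and the "partition" + index naming only mirrors the convention of MultiResourcePartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionSketch {

    /** Creates one named partition per input file (partition name -> file name). */
    public static Map<String, String> partition(List<String> fileNames) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (int i = 0; i < fileNames.size(); i++) {
            partitions.put("partition" + i, fileNames.get(i));
        }
        return partitions;
    }

    /** Runs one worker per partition on a pool of the given size and collects the results in order. */
    public static List<String> runPartitioned(List<String> fileNames, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Map.Entry<String, String> p : partition(fileNames).entrySet()) {
                // each partition only knows its own file, so workers do not share state
                futures.add(executor.submit(() -> p.getKey() + " processed " + p.getValue()));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException("A partition failed.", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("partner-1.csv", "partner-2.csv", "partner-3.csv");
        for (String line : runPartitioned(files, 4)) {
            System.out.println(line);
        }
    }
}
```

The design point carried over from the real job is that a worker needs nothing but the data of its own partition, which is what makes it safe to run the partitions side by side.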

Multi-threaded step

This is a fairly simple way of scaling: it just adds more threads to the processing of a single step, so the reader, processor and writer are called concurrently. Reading from a file isn’t suitable for this kind of scaling, because the file reader is stateful and not thread-safe. So we need a new use case: reading from a queue and writing to a log file. We need some more infrastructure for it:

public interface InfrastructureConfiguration {

    @Bean
    public abstract DataSource dataSource();

    @Bean
    public abstract TaskExecutor taskExecutor();

    @Bean
    public abstract ConnectionFactory connectionFactory();

    @Bean
    public abstract Queue queue();

    @Bean
    public abstract JmsTemplate jmsTemplate();

}

In the test environment we use ActiveMQ and start the broker as part of the configuration:

@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {

    @Bean
    public DataSource dataSource(){
        EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
        return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
                .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                .addScript("classpath:schema-partner.sql")
                .setType(EmbeddedDatabaseType.HSQL)
                .build();
    }

    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // The default core pool size is 1 and the default queue is unbounded, so without
        // a core size the pool would never grow towards maxPoolSize.
        taskExecutor.setCorePoolSize(4);
        taskExecutor.setMaxPoolSize(4);
        taskExecutor.afterPropertiesSet();
        return taskExecutor;
    }

    @Bean
    public ConnectionFactory connectionFactory() {
        return new ActiveMQConnectionFactory("tcp://localhost:61616");
    }

    @Bean
    public Queue queue() {
        return new ActiveMQQueue("queueName");
    }

    @Bean
    public BrokerService broker() throws Exception{
        // broker running inside the test JVM, listening on the URL used by the ConnectionFactory
        BrokerService broker = new BrokerService();
        // configure the broker
        broker.addConnector("tcp://localhost:61616");
        broker.start();
        return broker;
    }

    @Bean
    public JmsTemplate jmsTemplate(){
        JmsTemplate jmsTemplate = new JmsTemplate(connectionFactory());
        jmsTemplate.setDefaultDestination(queue());
        // after 500 ms without a message receive() returns null, which ends the reading
        jmsTemplate.setReceiveTimeout(500);
        return jmsTemplate;
    }

}

The job configuration is quite simple then:

@Configuration
public class MultiThreadedStepJobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilders;

    @Autowired
    private StepBuilderFactory stepBuilders;

    @Autowired
    private InfrastructureConfiguration infrastructureConfiguration;

    @Bean
    public Job multiThreadedStepJob(){
        return jobBuilders.get("multiThreadedStepJob")
                .listener(protocolListener())
                .start(step())
                .build();
    }

    @Bean
    public Step step(){
        return stepBuilders.get("step")
                .<String,String>chunk(1)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .taskExecutor(infrastructureConfiguration.taskExecutor())
                .throttleLimit(4)
                .build();
    }

    @Bean
    public JmsItemReader<String> reader(){
        JmsItemReader<String> itemReader = new JmsItemReader<String>();
        itemReader.setJmsTemplate(infrastructureConfiguration.jmsTemplate());
        return itemReader;
    }

    @Bean
    public ItemProcessor<String,String> processor(){
        return new LogItemProcessor<String>();
    }

    @Bean
    public ItemWriter<String> writer(){
        return new LogItemWriter<String>();
    }

    @Bean
    public ProtocolListener protocolListener(){
        return new ProtocolListener();
    }

}

The only difference from a job without any scaling is the calls to taskExecutor and throttleLimit in the step definition.
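Why a queue fits this kind of scaling can be shown without any JMS at all. In the following sketch, assuming a java.util.concurrent.BlockingQueue as a stand-in for the JMS queue, several worker threads pull from the same source, and a worker stops when a read times out, much like a reader that returns null once no message arrives within the receive timeout. The class MultiThreadedReadSketch and its method are hypothetical and only illustrate the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MultiThreadedReadSketch {

    /** Drains the messages with the given number of worker threads and returns everything that was processed. */
    public static List<String> process(List<String> messages, int threads, long receiveTimeoutMillis) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        List<String> processed = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                try {
                    String item;
                    // null after the timeout means "no more input" and ends this worker
                    while ((item = queue.poll(receiveTimeoutMillis, TimeUnit.MILLISECONDS)) != null) {
                        processed.add(item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> done = process(Arrays.asList("a", "b", "c", "d", "e", "f"), 4, 500);
        System.out.println(done.size() + " messages processed"); // prints "6 messages processed"
    }
}
```

Every message is handed to exactly one worker because the queue itself is thread-safe, but the order in which messages are processed is no longer guaranteed. That trade-off applies to a multi-threaded step as well.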

Conclusion

Configuring scalability in Spring Batch jobs is easy with Java-based configuration. And again, you can see the advantage of having an interface for the infrastructure configuration: it makes switching between environments easy.
I hope this blog series was useful to you, and if you have any questions, don’t hesitate to comment on the blog posts!

Blog author

Tobias Flohre
