This is the first part of a series on Java-based configuration in Spring Batch. Spring Batch 2.2 will be out in a few weeks (update: it was released on 6/6), and it will have a Java DSL for Spring Batch, including its own @Enable annotation. In Spring Core I prefer Java-based configuration over XML, but Spring Batch has a really good XML namespace. Is the Java-based approach really better? Time to take a deep look into the new features!
In this first post I will introduce the Java DSL and compare it to the XML version, but there's more to come. In future posts I will talk about JobParameters, ExecutionContexts and StepScope, profiles and environments, job inheritance, modular configurations, and partitioning and multi-threaded steps, everything regarding Java-based configuration, of course. You can find the JavaConfig code examples on GitHub. If you want to know when a new blog post is available, just follow me on Twitter (@TobiasFlohre) or Google+.
Back in the days – a simple configuration in XML
Before we start looking at the new Java DSL, I'll introduce you to the job we'll translate to Java-based configuration. It's a common use case, not trivial, but simple enough to understand in a reasonable amount of time. It's the job's job to import partner data (name, email address, gender) from a file into a database. Each line in the file is one dataset, and the different properties are delimited by a comma. We use the FlatFileItemReader to read the data from the file, and we use the JdbcBatchItemWriter to write the data to the database.
We split the configuration in two parts: the infrastructure configuration and the job configuration. It always makes sense to do that, because you may want to switch the infrastructure configuration for different environments (test, production), and you may have more than one job configuration.
An infrastructure configuration in XML for a test environment looks like this:
```xml
<context:annotation-config/>

<batch:job-repository/>

<jdbc:embedded-database id="dataSource" type="HSQL">
	<jdbc:script location="classpath:org/springframework/batch/core/schema-hsqldb.sql"/>
	<jdbc:script location="classpath:schema-partner.sql"/>
</jdbc:embedded-database>

<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
	<property name="dataSource" ref="dataSource" />
</bean>

<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
	<property name="jobRepository" ref="jobRepository" />
</bean>
```
Note that we create our domain database tables here as well (schema-partner.sql), and note that it's done in an in-memory database. That's a perfect scenario for JUnit integration tests.
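Such an integration test could look roughly like the following sketch. The class name FlatfileJobTests and the configuration file names are made up for illustration; they are not part of the example project. The test autowires the JobLauncher and the job from the context and checks that a run completes:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

// Hypothetical test class; the XML file names are placeholders for the
// infrastructure and job configuration shown in this post.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:infrastructure-config.xml", "classpath:job-config.xml" })
public class FlatfileJobTests {

	@Autowired
	private JobLauncher jobLauncher;

	@Autowired
	private Job flatfileJob;

	@Test
	public void jobCompletesSuccessfully() throws Exception {
		// Each run needs distinct JobParameters, otherwise the JobRepository
		// treats it as the same, already completed job instance.
		JobExecution execution = jobLauncher.run(flatfileJob,
				new JobParametersBuilder().addLong("run.id", System.currentTimeMillis()).toJobParameters());
		assertEquals(BatchStatus.COMPLETED, execution.getStatus());
	}
}
```

Because the embedded database is created and populated on context startup, the test needs no external setup at all.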
Now let’s take a look at the job configuration:
```xml
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
	<property name="resource" value="classpath:partner-import.csv"/>
	<property name="lineMapper" ref="lineMapper"/>
</bean>

<bean id="lineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
	<property name="lineTokenizer">
		<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
			<property name="names" value="name,email"/>
			<property name="includedFields" value="0,2"/>
		</bean>
	</property>
	<property name="fieldSetMapper">
		<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
			<property name="targetType" value="de.codecentric.batch.domain.Partner"/>
		</bean>
	</property>
</bean>

<bean id="processor" class="de.codecentric.batch.LogItemProcessor"/>

<bean id="writer" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
	<property name="sql" value="INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)"/>
	<property name="dataSource" ref="dataSource"/>
	<property name="itemSqlParameterSourceProvider">
		<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider"/>
	</property>
</bean>

<batch:job id="flatfileJob">
	<batch:step id="step">
		<batch:tasklet>
			<batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="3" />
		</batch:tasklet>
	</batch:step>
</batch:job>
```
Note that we use almost only standard Spring Batch components, with the exception of the LogItemProcessor and, of course, our domain class Partner.
Java – and only Java
Now it's time for the Java-based configuration style. You can find all the examples used in this blog post series here.
Infrastructure configuration
First, we'll take a look at the infrastructure configuration. Following one of the patterns I described here, I provide an interface for the InfrastructureConfiguration to make it easier to switch it in different environments:

```java
public interface InfrastructureConfiguration {

	@Bean
	public abstract DataSource dataSource();

}
```
Our first implementation will be one for testing purposes:
```java
@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {

	@Bean
	public DataSource dataSource(){
		EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
		return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
				.addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
				.addScript("classpath:schema-partner.sql")
				.setType(EmbeddedDatabaseType.HSQL)
				.build();
	}

}
```
All we need here is our DataSource and the small annotation @EnableBatchProcessing. If you're familiar with Spring Batch, you know that the minimum for running jobs is a PlatformTransactionManager, a JobRepository and a JobLauncher, plus a DataSource if you want to persist job meta data. All we have right now is a DataSource, so what about the rest? The annotation @EnableBatchProcessing creates those components for us. It takes the DataSource and creates a DataSourceTransactionManager working on it, it creates a JobRepository working with the transaction manager and the DataSource, and it creates a JobLauncher using the JobRepository. In addition it registers the StepScope for usage on batch components and a JobRegistry to find jobs by name.
Of course you're not always happy with a DataSourceTransactionManager, for example when running inside an application server. We'll cover that in a future post. The usage of the StepScope will be covered in a future post as well.
I left out two new components that are registered in the application context as well: a JobBuilderFactory and a StepBuilderFactory. Of course we may autowire all of those components into other Spring components, and that's what we're going to do now in our job configuration with the JobBuilderFactory and the StepBuilderFactory.
Job configuration
```java
@Configuration
public class FlatfileToDbJobConfiguration {

	@Autowired
	private JobBuilderFactory jobBuilders;

	@Autowired
	private StepBuilderFactory stepBuilders;

	@Autowired
	private InfrastructureConfiguration infrastructureConfiguration;

	@Bean
	public Job flatfileToDbJob(){
		return jobBuilders.get("flatfileToDbJob")
				.listener(protocolListener())
				.start(step())
				.build();
	}

	@Bean
	public Step step(){
		return stepBuilders.get("step")
				.<Partner,Partner>chunk(1)
				.reader(reader())
				.processor(processor())
				.writer(writer())
				.listener(logProcessListener())
				.build();
	}

	@Bean
	public FlatFileItemReader<Partner> reader(){
		FlatFileItemReader<Partner> itemReader = new FlatFileItemReader<Partner>();
		itemReader.setLineMapper(lineMapper());
		itemReader.setResource(new ClassPathResource("partner-import.csv"));
		return itemReader;
	}

	@Bean
	public LineMapper<Partner> lineMapper(){
		DefaultLineMapper<Partner> lineMapper = new DefaultLineMapper<Partner>();
		DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
		lineTokenizer.setNames(new String[]{"name","email"});
		lineTokenizer.setIncludedFields(new int[]{0,2});
		BeanWrapperFieldSetMapper<Partner> fieldSetMapper = new BeanWrapperFieldSetMapper<Partner>();
		fieldSetMapper.setTargetType(Partner.class);
		lineMapper.setLineTokenizer(lineTokenizer);
		lineMapper.setFieldSetMapper(fieldSetMapper);
		return lineMapper;
	}

	@Bean
	public ItemProcessor<Partner,Partner> processor(){
		return new LogItemProcessor();
	}

	@Bean
	public ItemWriter<Partner> writer(){
		JdbcBatchItemWriter<Partner> itemWriter = new JdbcBatchItemWriter<Partner>();
		itemWriter.setSql("INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)");
		itemWriter.setDataSource(infrastructureConfiguration.dataSource());
		itemWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Partner>());
		return itemWriter;
	}

	@Bean
	public ProtocolListener protocolListener(){
		return new ProtocolListener();
	}

	@Bean
	public LogProcessListener logProcessListener(){
		return new LogProcessListener();
	}

}
```
Looking at the code you'll find the ItemReader, ItemProcessor and ItemWriter definitions identical to the XML version, just done in Java-based configuration. I added two listeners to the configuration, the ProtocolListener and the LogProcessListener.
The interesting part is the configuration of the Step and the Job. In the Java DSL we use builders for building Steps and Jobs. Since every Step needs access to the PlatformTransactionManager and the JobRepository, and every Job needs access to the JobRepository, we use the StepBuilderFactory to create a StepBuilder that already uses the configured JobRepository and PlatformTransactionManager, and we use the JobBuilderFactory to create a JobBuilder that already uses the configured JobRepository. Those factories are there for our convenience; it would be totally okay to create the builders ourselves.
Now that we have a StepBuilder, we can call all kinds of methods on it to configure our Step, from setting the chunk size over reader, processor and writer to listeners and much more. Just explore it for yourself. Note that the type of the builder may change in your builder chain according to your needs. For example, when calling the chunk method, you switch from a StepBuilder to a parameterized SimpleStepBuilder, because from then on the builder knows that you want to build a chunk-based Step. The StepBuilder doesn't have methods for adding a reader or writer, but the SimpleStepBuilder has those methods. Because the SimpleStepBuilder is typesafe regarding the item type, you need to parameterize the call to the chunk method, as is done in the example with the item type Partner. Normally you won't notice the switching of builder types when constructing a builder chain, but it's good to know how it works.
The same holds for the JobBuilder for configuring Jobs. You can define all kinds of properties important for the Job, and you may define a Step flow with multiple Steps, and again, according to your needs, the type of the builder may change in your builder chain. In our example we define a simple Job with one Step and one JobExecutionListener.
Connecting infrastructure and job configuration
One more thing about the job configuration: we need the DataSource in the JdbcBatchItemWriter, but we defined it in the infrastructure configuration. That's a good thing, because it is very low-level, and of course we don't want to define something like that in the job configuration. So how do we get the DataSource? We know that we'll start the application context with an infrastructure configuration and one or more job configurations, so one option would be to autowire the DataSource directly into the job configuration. I didn't do that, because I believe that minimizing autowiring magic is important in the enterprise world, and I could do better: instead of injecting the DataSource I injected the InfrastructureConfiguration itself, getting the DataSource from there. Now it's a thousand times easier to understand where the DataSource comes from when looking at the job configuration. Note that the InfrastructureConfiguration is an interface, so we don't bind the job configuration to a certain infrastructure configuration. Still, there'll be only two or three implementations, and it's easy to see which one is used under which circumstances.
Fault-tolerant steps: skipping and retrying items
If you want to use skip and/or retry functionality, you'll need to activate fault tolerance on the builder, which is done with the method faultTolerant. As explained above, the builder type switches, this time to FaultTolerantStepBuilder, and a bunch of new methods appear, like skip, skipLimit, retry, retryLimit and so on. A Step configuration may look like this:
```java
@Bean
public Step step(){
	return stepBuilders.get("step")
			.<Partner,Partner>chunk(1)
			.reader(reader())
			.processor(processor())
			.writer(writer())
			.listener(logProcessListener())
			.faultTolerant()
			.skipLimit(10)
			.skip(UnknownGenderException.class)
			.listener(logSkipListener())
			.build();
	}
```
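Retry works the same way. The following variant is only a sketch of how the retry methods could be combined with the configuration above; it assumes that transient database deadlocks are the failure you want to retry (DeadlockLoserDataAccessException comes from org.springframework.dao):

```java
@Bean
public Step step(){
	return stepBuilders.get("step")
			.<Partner,Partner>chunk(1)
			.reader(reader())
			.processor(processor())
			.writer(writer())
			.faultTolerant()
			// Retry an item up to three times when the database reports a deadlock.
			.retryLimit(3)
			.retry(DeadlockLoserDataAccessException.class)
			.build();
	}
```

Skip and retry settings can also be combined on the same FaultTolerantStepBuilder chain.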
Conclusion
The Spring Batch XML namespace for configuring jobs and steps is a little more concise than its Java counterpart; that's a plus on that side. The Java DSL has the advantage of type safety and perfect IDE support regarding refactoring, auto-completion, finding usages etc. So you might say it's just a matter of taste whether you pick one or the other, but I say it's more than that.
90% of all batch applications reside in the enterprise, at big companies like insurance or financial services firms. Batch applications are at the heart of their business, and they are business-critical. Every such company using Java for batch processing has its own little framework or library around solutions like Spring Batch to adapt it to its needs. And when it comes to building frameworks and libraries, Java-based configuration is way ahead of XML, and here are some of the reasons:
- We want to do some basic configuration in the framework. People add a dependency to our framework library and import those configurations according to their needs. If these configurations were written in XML, they would have a hard time opening them to look at what they are doing. No problem in Java. An important point for transparency and maintainability.
- There's no navigability in XML. That may be okay as long as you don't have too many XML files and all of them are in your workspace, because then you can take advantage of the Spring IDE support. But a framework library usually should not be added as a project to the workspace. When using Java-based configuration you can jump right into framework configuration classes. I will talk more about this subject in a following blog post.
- In a framework you often have requirements the user of the library has to fulfil in order to make everything work, for example the need for a DataSource, a PlatformTransactionManager and a thread pool. The implementation doesn't matter from the perspective of the framework; they just need to be there. In XML you have to write documentation for the users of the framework, telling them they need to add this and this and this Spring bean under this name to the ApplicationContext. In Java you just write an interface describing that contract, and people using the library implement that interface and add it as a configuration class to the ApplicationContext. That's what I did with the interface InfrastructureConfiguration above, and I will talk more about it in a future post.
All these advantages become even more important when there's not only one common library but a hierarchy of libraries, for example one for the basic stuff and then one for a certain division. You really need to be able to navigate through everything to keep it understandable. And Java-based configuration makes that possible.
Blog author: Tobias Flohre, Senior Software Developer