Is Spring Batch Right for Your Project?

First of all, what is batch processing? According to the Spring Batch website:

Batch processing is the processing of a finite amount of data in a manner that does not require external interaction or interruption.

In other words, batch processing is what you need when data has to be processed without user interaction, or when user interaction is only required to start a process that then runs completely autonomously. This is especially true when the processing is expected to take a relatively long time and we cannot keep users waiting for it to finish before they can continue with other actions.

In this post, we’ll take a look at the case where user interaction is required in order to start off a batch processing job, and we’ll enable users to do so through a REST API.

Key Points in Understanding Spring Batch

This post is meant to be a basic guide on using Spring Batch in your projects, but as with any technology, the official documentation can take you much further. So if you need additional reading on the topic, check out the Spring Batch documentation.

Basics of Spring Batch

The first step in understanding Spring Batch is getting familiar with its basic terminology and organization.

Basic Spring Batch Stereotypes

In Spring Batch, a job is the top-level container for a batch process. Each job can have one or multiple steps, so when we define a batch job, we can logically subdivide the work into steps. This makes for better code organization, readability, and, most importantly, isolation, because each step is executed in its own transaction and can be separately committed, canceled, or retried on failure. As for the item reader, processor, and writer, these are convenience abstractions that Spring Batch provides for chunk-oriented processing (for example, reading and writing files), but we can execute any code as part of a step.
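
To make the reader/processor/writer idea concrete, here is a minimal sketch of a chunk-oriented step. It is illustrative only: the Person type, the in-memory reader, and the console writer are assumptions, and the rest of this post uses tasklet-based steps instead.

// Illustrative only: a chunk-oriented step that reads, processes, and writes items in chunks of 10.
// The Person type and the in-memory data are assumptions, not part of this post's example project.
@Bean
protected Step importPersonsStep(StepBuilderFactory stepBuilderFactory) {
  return stepBuilderFactory
        .get("importPersonsStep")
        .<Person, Person>chunk(10)
        .reader(new ListItemReader<>(Arrays.asList(new Person("Jane"), new Person("John"))))
        .processor(person -> new Person(person.getName().toUpperCase()))
        .writer(items -> items.forEach(System.out::println))
        .build();
}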

Spring Batch Meta-Data Schema

The next step in understanding the inner workings of Spring Batch is getting to know the metadata tables it uses in the background. These tables track metadata about the jobs, steps, and all of their components.

(Figure: the Spring Batch meta-data schema diagram.)

From these tables, we can learn several things about how Spring Batch operates, but more importantly, about how to utilize it in a project.

Note: Before continuing, it is important to note that the “BATCH_” prefix is user-defined. So in the following text, I’ll reference the tables without the mentioned prefix.

We can see that the top-level table is JOB_INSTANCE. This is where the basic information about the jobs is stored. Next, we have the JOB_EXECUTION table. The reason the metadata about a job is broken down into two tables is that a job can be executed multiple times; each execution of a single job creates a new record in the JOB_EXECUTION table. Finally, there is the JOB_EXECUTION_PARAMS table, which stores the parameters a job execution was started with. Saving the parameters gives us a trail of how a job was started, but more importantly, it allows us to restart a job if needed or to use the job parameters in other parts of the batch process. Also, when we have multiple steps, we often need to pass information between them. This is where the JOB_EXECUTION_CONTEXT comes in: through this table, one step can place one or more values in the context, while the following steps extract this data and use it as needed.

For the individual steps of each job, we have a similar table organization. An important thing to know, however, is the purpose of the STEP_EXECUTION_CONTEXT table. This is where we can store data that leaves a trail of what each step did.
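
To make the two context tables more concrete, here is a sketch of a tasklet (a type of step covered later in this post) that writes to both contexts. The key names and values are arbitrary examples, not part of the original example; a following step could read "sharedValue" back from the job execution context.

public class ContextWritingTasklet implements Tasklet {

   @Override
   public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
       StepExecution stepExecution = chunkContext.getStepContext().getStepExecution();

       // Stored in STEP_EXECUTION_CONTEXT: a trail of what this particular step did.
       stepExecution.getExecutionContext().put("processedCount", 42);

       // Stored in JOB_EXECUTION_CONTEXT: visible to the steps that follow in the same job execution.
       stepExecution.getJobExecution().getExecutionContext().put("sharedValue", "some-result");

       return RepeatStatus.FINISHED;
   }
}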

Initial Spring Batch Configuration

After adding Spring Batch as a dependency and making sure it is pulled into your local repository, you can start using it by creating the base configuration, which will let us add steps and jobs later.

@Configuration
@EnableBatchProcessing
public class SpringBatchConfig implements BatchConfigurer {

  private JobRepository jobRepository;

  private JobExplorer jobExplorer;

  private JobLauncher jobLauncher;

  // The data source holding the Spring Batch meta-data tables.
  @Autowired
  private DataSource dataSource;

  @Autowired
  private PlatformTransactionManager transactionManager;

  // Builder factories provided by @EnableBatchProcessing; used later to define steps and jobs.
  @Autowired
  private JobBuilderFactory jobs;

  @Autowired
  private StepBuilderFactory steps;

  @Override
  public JobRepository getJobRepository() {
    return jobRepository;
  }

  @Override
  public PlatformTransactionManager getTransactionManager() {
    return transactionManager;
  }

  @Override
  @Bean
  public JobLauncher getJobLauncher() {
    return jobLauncher;
  }

  @Override
  @Bean
  public JobExplorer getJobExplorer() {
    return jobExplorer;
  }

}

Enabling Asynchronous Execution

Now that we have the basic configuration down, we need to extend it to fit our needs. Because we'll be starting jobs from a REST controller, we need to return to the user immediately rather than block while the job runs. For this, we'll start the jobs on separate threads. The following configuration achieves that behavior.

// SpringBatchConfig.java

@PostConstruct
public void initialize() {
  try {
    this.jobRepository = createJobRepository();

    JobExplorerFactoryBean jobExplorerFactoryBean = new JobExplorerFactoryBean();
    jobExplorerFactoryBean.setDataSource(this.dataSource);
    jobExplorerFactoryBean.afterPropertiesSet();
    this.jobExplorer = jobExplorerFactoryBean.getObject();

    this.jobLauncher = createJobLauncher();
  } catch (Exception e) {
    throw new BatchConfigurationException(e);
  }
}

// Create and use an async JobLauncher
private JobLauncher createJobLauncher() throws Exception {
  SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
  jobLauncher.setJobRepository(jobRepository);
  jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
  jobLauncher.afterPropertiesSet();
  return jobLauncher;
}
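
One thing the snippet above references but does not show is createJobRepository(). A minimal sketch, assuming the autowired dataSource and transactionManager from the configuration class, could look like this (the commented-out line shows where a custom table prefix would go):

// SpringBatchConfig.java

// A possible createJobRepository() implementation (a sketch, not from the original post).
private JobRepository createJobRepository() throws Exception {
  JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
  factory.setDataSource(dataSource);
  factory.setTransactionManager(transactionManager);
  // factory.setTablePrefix("CUSTOM_");  // optional: use a custom prefix instead of the default "BATCH_"
  factory.afterPropertiesSet();
  return factory.getObject();
}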

Defining a Job and Multiple Steps

As mentioned in the basics of Spring Batch section, the item reader, processor, and writer are readily available implementations for processing files. They are well documented and easy to use. However, we often need to cover a very specific case that must be executed as a step in the batch process. This is where tasklets come in: a Tasklet allows us to execute arbitrary code as a step.

public class TaskletOne implements Tasklet {   

   @Override
   @Nullable
   public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
       System.out.println("Executing TaskletOne.");
       return RepeatStatus.FINISHED;
   }
}

Next, we can use this tasklet when defining a job.

// SpringBatchConfig.java

@Bean
protected Step stepOne() {
 return steps
       .get("stepOne")
       .tasklet(taskletOne())
       .build();
}

private Tasklet taskletOne() {
 return new TaskletOne();
}

Under the assumption that we defined a similar TaskletTwo and stepTwo, we can define a job that ties these two steps together.

// SpringBatchConfig.java

@Bean
public Job jobOne() {
 return jobs
      .get("jobOne")
      .start(stepOne())
      .next(stepTwo())
      .build();
}

Starting a Job From a REST API Application

We previously defined multiple steps, a job, and an async launcher that starts the job on a separate thread and returns immediately with a status. All that is left to do now is to start the job when a user requests it. The following is a simple example of starting a job from a REST controller; the asyncJobLauncher and jobOne fields refer to the launcher and job configured earlier. Note that in a real application this should be done via a service that handles all possible errors and provides the user with an appropriate message.

// JobController.java

@Autowired
private JobLauncher asyncJobLauncher;

@Autowired
private Job jobOne;

@PostMapping(value = "/startJobOne")
public ResponseEntity<String> startJob(@RequestParam(required = true) Long jobParameter) {

   // The third argument marks the parameter as identifying, i.e. part of the job instance identity.
   JobParameters jobParameters = new JobParametersBuilder()
           .addLong("ExampleParameter", jobParameter, true)
           .toJobParameters();
   try {
       asyncJobLauncher.run(jobOne, jobParameters);
   } catch (Exception e) {
       e.printStackTrace();
       return ResponseEntity.status(500).body("Could not start job.");
   }

   return ResponseEntity.accepted().body("Job successfully started.");
}

Wrapping Up

Spring Batch offers much more than the example described in this post, so be sure to refer to the official documentation. Here we looked at how to define steps and jobs, and how to run jobs asynchronously from a REST controller. What you might want to do next is provide a means for API users to follow a job's progress. This can be done with short polling, long polling, or WebSockets, depending on your specific use case. Whichever option you choose, Spring Batch offers a JobOperator to provide insights into your past and current jobs.
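
As a starting point for polling, here is a sketch of a simple status endpoint. It is not from the original post: it uses the JobExplorer configured earlier rather than a JobOperator, and it assumes the client knows the JobExecution id (for example, returned by the start endpoint); the path and naming are purely illustrative.

// JobController.java (a sketch, not from the original post)

@Autowired
private JobExplorer jobExplorer;

@GetMapping(value = "/jobStatus/{executionId}")
public ResponseEntity<String> jobStatus(@PathVariable Long executionId) {
  // Look up the execution in the Spring Batch meta-data tables and report its current status.
  JobExecution execution = jobExplorer.getJobExecution(executionId);
  if (execution == null) {
    return ResponseEntity.notFound().build();
  }
  return ResponseEntity.ok(execution.getStatus().toString());
}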


