March 14

Optimizing repository.saveAll() JPA Hibernate

Problem. Invoking repository.saveAll() has to insert into DB by batches, but it doesn't.

Result of the investigation.
Current realisation inserts new records one by one. Batching doesn't work because of IDENTITY id generator.

Hibernate batch processing works only with SEQUENCE id generator. And with properties:
spring.jpa.properties.hibernate.jdbc.batch_size=30 spring.jpa.properties.hibernate.order_inserts=true

Links: https://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch15.html

Solution:
Move to SEQUENCE id generator.
Create the my_table_id_seq for table my_table. Or create new in case there is not exist.
Then restart it from the latest existing primary key.

SELECT SETVAL('my_table_id_seq', (SELECT MAX(id) + 1 FROM my_table));

Test 1: saving of 50_000 records took 20 seconds.
Set up with identity generator:
1) MyEntity
  @Id   @GeneratedValue(strategy = GenerationType.IDENTITY)   @Column(name = "id")   private Long id;

2) properties.yaml
spring.jpa.properties.hibernate.jdbc.batch_size=30 spring.jpa.properties.hibernate.order_inserts=true

In the log you can see that batchSize = 0. There is 50_000 logs entries. Batching doesn't work.

Log:

{"name":"Batch-Insert-Logger", "connection":5, "time":0, "success":true, "type":"Prepared", "batch":false, "querySize":1, "batchSize":0, "query":["insert into my_table (created,id,full_name,updated) values (?,?,?,?,)"], "params":[["2025-03-06 12:07:07.010665","698606","John Doe","2025-03-06 12:07:07.010666"]]}

Test 2: saving of 50_000 records took 6 seconds.
Set up with sequence generator:
1) MyEntity
  @Id   @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "mySeqGen")   @SequenceGenerator(name = "mySeqGen", sequenceName = "my_table_id_seq", allocationSize = 1)   @Column(name = "id")   private Long id;

2) properties.yaml
spring.jpa.properties.hibernate.jdbc.batch_size=30 spring.jpa.properties.hibernate.order_inserts=true

In the log you can see that batchSize = 30. There is 117 logs entries with batch size = 30 and 1 log entry with batch size = 20. Batching works.

Log:

{"name":"Batch-Insert-Logger", "connection":5, "time":0, "success":true, "type":"Prepared", "batch":true, "querySize":1, "batchSize":30, "query":["insert into my_table (created,id,full_name,updated) values (?,?,?,?)"], "params": []}