RAG using Spring AI
Retrieval-Augmented Generation (RAG) is a technique for giving an LLM access to specific data at query time, without retraining the model. In this post, we will build a RAG application using Spring AI and PostgreSQL with the pgvector extension.
What is RAG?
RAG combines the power of LLMs with specialized data. Instead of training a model on your data, you store your documents in a vector database. When a user asks a question, the system retrieves relevant chunks of data and provides them to the LLM as context.
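To make the retrieval step concrete, here is a minimal sketch in plain Java of what a vector store does conceptually: it ranks stored chunks by how similar their embedding is to the query embedding. The class name ToyRetriever and the tiny hand-written vectors are assumptions for illustration only. In the real application the embeddings come from the embedding model and the similarity search runs inside PostgreSQL.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy in-memory retriever (hypothetical, for illustration only). */
final class ToyRetriever {

    /** Cosine similarity of two equal-length vectors; 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the texts of the k chunks most similar to the query vector, best first. */
    static List<String> topK(Map<String, double[]> chunks, double[] query, int k) {
        return chunks.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

The chunks returned by a search like this are what gets handed to the LLM as context alongside the user's question.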
What RAG Looks Like in Practice
I didn’t want to fine-tune a model or manage embeddings manually. The goal was to:
Upload documents (PDFs, text files, etc.)
Chunk and embed them automatically
Query them using metadata-based filtering
Let the LLM answer questions using retrieved context
Spring AI does most of the heavy lifting, which keeps the codebase small and readable.
Maven Dependencies
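We need Spring AI modules for OpenAI, PGVector, and document parsing. No versions are listed below, so the snippet assumes they are managed elsewhere in the POM, for example by importing the Spring AI BOM. Depending on your Spring AI version, QuestionAnswerAdvisor may also require the separate spring-ai-advisors-vector-store artifact.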
<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
</dependencies>
Configuration
We configure the OpenAI model and PostgreSQL connection in application.yaml.
spring:
  application:
    name: rag
  ai:
    openai:
      api-key: ${OPENAI_KEY}
      chat:
        options:
          model: gpt-5-nano
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        table-name: rag
        schema-name: store
        initialize-schema: true
  datasource:
    url: ${POSTGRES_URL}
    username: ${POSTGRES_USERNAME}
    password: ${POSTGRES_PASSWORD}
  servlet:
    multipart:
      max-file-size: 50MB
Document Upload
The UploadController reads the uploaded file with Apache Tika, splits the extracted text into chunks with TokenTextSplitter, and adds the chunks to the vector store, which embeds and persists them.
@PostMapping("/file")
public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file) {
    try {
        // Tika extracts plain text from PDFs, text files, and other formats.
        TikaDocumentReader documentReader = new TikaDocumentReader(toResource(file));
        // Split the extracted text into token-bounded chunks (default splitter settings).
        TextSplitter textSplitter = new TokenTextSplitter();
        // The vector store embeds each chunk and persists it in PostgreSQL.
        vectorStore.add(textSplitter.split(documentReader.read()));
        return ResponseEntity.ok("file successfully uploaded and processed");
    } catch (IOException e) {
        return ResponseEntity.status(500).body("Error processing file: " + e.getMessage());
    } catch (Exception e) {
        return ResponseEntity.status(500).body("Error: " + e.getMessage());
    }
}

// Wraps the upload as a Resource that reports the original filename and size.
// The filename matters: the chat endpoint filters chunks on their source metadata.
public Resource toResource(MultipartFile file) throws IOException {
    return new InputStreamResource(file.getInputStream()) {
        @Override
        public String getFilename() {
            return file.getOriginalFilename();
        }

        @Override
        public long contentLength() {
            return file.getSize();
        }
    };
}
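To illustrate what the splitting step does conceptually, here is a simplified sketch in plain Java. It is not TokenTextSplitter's implementation: it treats whitespace-separated words as a rough stand-in for model tokens, and the class name ToyChunker and the maxWords parameter are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy fixed-size chunker (hypothetical); words approximate tokens. */
final class ToyChunker {

    /** Splits text into consecutive chunks of at most maxWords whitespace-separated words. */
    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to embed
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

Each chunk is embedded and stored as its own row, so chunk size is a trade-off: smaller chunks retrieve more precisely, while larger chunks carry more surrounding context into the prompt.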
Chat with Context
The QuestionController retrieves relevant documents from PostgreSQL using QuestionAnswerAdvisor and generates a response from the LLM.
The SearchRequest limits retrieval to chunks from one uploaded file: the notebook request parameter carries the filename, and the filter expression matches it against the source metadata stored with each chunk.
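Conceptually, the advisor takes the retrieved chunks and folds them into the prompt before the model sees it. The plain-Java sketch below shows that idea only. The class ToyPromptBuilder and the prompt wording are assumptions for illustration, not the template Spring AI actually uses.

```java
import java.util.List;

/** Toy prompt augmentation (hypothetical); shows the idea of "stuffing" context into a prompt. */
final class ToyPromptBuilder {

    /** Builds a single prompt containing the retrieved chunks followed by the user's question. */
    static String augment(String question, List<String> retrievedChunks) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer the question using only the context below. ")
          .append("If the context is not sufficient, say that you do not know.\n\n")
          .append("Context:\n");
        for (String chunk : retrievedChunks) {
            sb.append("- ").append(chunk).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }
}
```

Because the retrieved text becomes part of the prompt, it also counts toward the prompt tokens reported by the model.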
@GetMapping("/rag/chat")
public String chat(@RequestParam(value = "prompt", defaultValue = "Tell me a joke.") String prompt,
                   @RequestParam(value = "notebook", defaultValue = "embed.txt") String notebook) {
    long startTime = System.currentTimeMillis();

    // Only retrieve chunks whose source metadata equals the requested filename.
    // Note: notebook comes straight from the request, so validate it before use in production.
    SearchRequest searchRequest = SearchRequest.builder()
            .filterExpression(String.format("source == '%s'", notebook))
            .build();

    // The advisor runs the similarity search and adds the retrieved chunks to the prompt.
    ChatResponse chatResponse = this.chatClient.prompt()
            .user(prompt)
            .advisors(QuestionAnswerAdvisor
                    .builder(vectorStore)
                    .searchRequest(searchRequest)
                    .build())
            .call()
            .chatResponse();

    // Explicit check instead of assert, which is disabled unless the JVM runs with -ea.
    if (chatResponse == null) {
        throw new IllegalStateException("No response received from the model");
    }

    Usage usage = chatResponse.getMetadata().getUsage();
    Integer promptTokens = usage.getPromptTokens();
    Integer completionTokens = usage.getCompletionTokens();
    long responseTime = System.currentTimeMillis() - startTime;

    return String.format("""
            Response from LLM is\s
            %s
            Prompt tokens used %s
            Completion tokens used %s
            Response time %s ms""",
            chatResponse.getResult().getOutput().getText(),
            promptTokens, completionTokens, responseTime);
}
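Because the notebook value is placed inside a quoted filter string, a name containing a single quote could change the meaning of the filter. One hedged way to guard against that is to accept only filename-like values before building the expression. The helper below is a sketch: the class name SourceFilters and the allowed character set are assumptions, so adjust the pattern to the filenames you actually accept. Spring AI also provides FilterExpressionBuilder, which builds the expression programmatically instead of from a string.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds the source filter only from filename-like values. */
final class SourceFilters {

    // Letters, digits, dot, underscore, space, and hyphen; no quotes or operators.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9._ -]{1,255}");

    /** Returns a filter string such as source == 'embed.txt', or throws if the name looks unsafe. */
    static String sourceEquals(String notebook) {
        if (notebook == null || !SAFE_NAME.matcher(notebook).matches()) {
            throw new IllegalArgumentException("Unsupported notebook name");
        }
        return "source == '" + notebook + "'";
    }
}
```

In the controller, the result could be passed to filterExpression(...) in place of the String.format call, with the IllegalArgumentException mapped to a 400 response.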
Conclusion
Spring AI makes it easy to integrate vector stores and LLMs into a Java application. By using PgVectorStore and QuestionAnswerAdvisor, we can implement a full RAG pipeline with very little code.

