RAG using Spring AI

Retrieval-Augmented Generation (RAG) is a technique for giving LLMs access to data they were not trained on. In this post, we will build a RAG application using Spring AI and PostgreSQL.

What is RAG?

RAG combines the power of LLMs with specialized data. Instead of training a model on your data, you store your documents in a vector database. When a user asks a question, the system retrieves relevant chunks of data and provides them to the LLM as context.
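
To make the retrieval step concrete, here is a minimal sketch in plain Java, independent of Spring AI. The class RetrievalSketch, its Chunk record, and the tiny hand-written vectors are hypothetical and only for illustration. In a real setup the embeddings come from an embedding model and the vector database performs the similarity search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of "retrieve relevant chunks, then pass them as context".
class RetrievalSketch {

    // A stored chunk: its text plus a (hand-written, hypothetical) embedding vector.
    record Chunk(String text, double[] embedding) {}

    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the k chunks whose embeddings are most similar to the query embedding.
    static List<Chunk> topK(List<Chunk> chunks, double[] query, int k) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator
                .comparingDouble((Chunk c) -> cosine(c.embedding(), query))
                .reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Put the retrieved chunks in front of the user's question.
    static String buildPrompt(String question, List<Chunk> context) {
        StringBuilder sb = new StringBuilder("Context:\n");
        for (Chunk c : context) {
            sb.append("- ").append(c.text()).append('\n');
        }
        return sb.append("Question: ").append(question).toString();
    }
}
```

The string returned by buildPrompt is what the LLM ends up seeing: the question plus only the chunks that scored highest against it.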

What RAG Looks Like in Practice

I didn’t want to fine-tune a model or manage embeddings manually. The goal was to:

Upload documents (PDFs, text files, etc.)
Chunk and embed them automatically
Query them using metadata-based filtering
Let the LLM answer questions using retrieved context

Spring AI does most of the heavy lifting, which keeps the codebase small and readable.

Maven Dependencies

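We need the Spring AI starters for OpenAI and PGVector, plus the Tika document reader. The snippet omits version tags, which assumes the Spring AI BOM is imported in dependencyManagement. Depending on the Spring AI version, QuestionAnswerAdvisor may live in a separate spring-ai-advisors-vector-store artifact that has to be added as well.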

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
</dependencies>

Configuration

We configure the OpenAI model and PostgreSQL connection in application.yaml.

spring:
  application:
    name: rag
  ai:
    openai:
      api-key: ${OPENAI_KEY}
      chat:
        options:
          model: gpt-5-nano
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        table-name: rag
        schema-name: store
        initialize-schema: true
  datasource:
    url: ${POSTGRES_URL}
    username: ${POSTGRES_USERNAME}
    password: ${POSTGRES_PASSWORD}
  servlet:
    multipart:
      max-file-size: 50MB

Document Upload

The UploadController reads each uploaded file with Apache Tika, splits the extracted text into token-based chunks, and stores the chunks in the vector store.

@PostMapping("/file")
public ResponseEntity<String> uploadFile(@RequestParam("file") MultipartFile file) {
    try {
        // Tika extracts plain text from PDFs, Office documents, text files, etc.
        TikaDocumentReader documentReader = new TikaDocumentReader(toResource(file));
        // Split the extracted text into token-bounded chunks suitable for embedding.
        TextSplitter textSplitter = new TokenTextSplitter();
        // add() embeds each chunk and writes it to the vector store.
        vectorStore.add(textSplitter.split(documentReader.read()));
        return ResponseEntity.ok("file successfully uploaded and processed");
    } catch (IOException e) {
        return ResponseEntity.status(500).body("Error processing file: " + e.getMessage());
    } catch (Exception e) {
        return ResponseEntity.status(500).body("Error: " + e.getMessage());
    }
}

// Wraps the upload as a Resource that reports the original file name and size,
// so the reader can record where each chunk came from.
public Resource toResource(MultipartFile file) throws IOException {
    return new InputStreamResource(file.getInputStream()) {
        @Override
        public String getFilename() {
            return file.getOriginalFilename();
        }

        @Override
        public long contentLength() throws IOException {
            // Report the known size instead of consuming the stream to measure it.
            return file.getSize();
        }
    };
}
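
To give a feel for what the splitting step does, here is a simplified sketch that cuts text into fixed-size chunks. It is not the TokenTextSplitter algorithm: it counts whitespace-separated words rather than model tokens, and the class name ChunkingSketch and the chunkSize parameter are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for a text splitter: fixed-size chunks of words.
class ChunkingSketch {

    // Split text into consecutive chunks of at most chunkSize words each.
    static List<String> chunk(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

Each chunk is then embedded and stored on its own, which is why chunk size matters: chunks that are too large dilute the match, and chunks that are too small lose context.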

Chat with Context

The QuestionController retrieves relevant documents from PostgreSQL using QuestionAnswerAdvisor and generates a response from the LLM.

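The SearchRequest restricts retrieval to the chunks of a single uploaded file. Its filter expression matches the source metadata field against the file name passed in the notebook request parameter. This relies on the document reader recording the resource's file name as source, which is why toResource overrides getFilename().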

@GetMapping("/rag/chat")
public String chat(@RequestParam(value = "prompt", defaultValue = "Tell me a joke.") String prompt,
                   @RequestParam(value = "notebook", defaultValue = "embed.txt") String notebook) {
    long startTime = System.currentTimeMillis();
    // Only consider chunks whose "source" metadata equals the requested file name.
    SearchRequest searchRequest = SearchRequest.builder()
            .filterExpression(String.format("source == '%s'", notebook))
            .build();
    // The advisor runs the similarity search and adds the matching chunks to the prompt.
    ChatResponse chatResponse = this.chatClient.prompt()
            .user(prompt)
            .advisors(QuestionAnswerAdvisor
                    .builder(vectorStore)
                    .searchRequest(searchRequest)
                    .build())
            .call()
            .chatResponse();
    assert chatResponse != null;
    // Token usage as reported by the model provider.
    Usage usage = chatResponse.getMetadata().getUsage();
    int promptTokens = usage.getPromptTokens();
    int completionTokens = usage.getCompletionTokens();
    long endTime = System.currentTimeMillis();
    long responseTime = endTime - startTime;
    return String.format("""
                    Response from LLM is\s

                    %s

                    Prompt tokens used %s
                    Completion tokens used %s
                    Response time %s ms""",
            chatResponse.getResult().getOutput().getText(),
            promptTokens, completionTokens, responseTime);
}
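
One thing to keep in mind: the notebook value comes straight from the request and is formatted into the filter expression, so a name containing a quote could change the expression. A minimal sketch of one way to guard against that is to validate the name before building the filter. The class FilterSketch, the method sourceFilter, and the allowed character set are assumptions for this example, not part of Spring AI.

```java
import java.util.regex.Pattern;

// Validates a file name before it is placed inside a filter expression.
class FilterSketch {

    // Hypothetical policy: letters, digits, dot, underscore, hyphen and space only.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9._ -]{1,255}");

    // Build the same "source == '...'" expression used above, rejecting unsafe names.
    static String sourceFilter(String filename) {
        if (filename == null || !SAFE_NAME.matcher(filename).matches()) {
            throw new IllegalArgumentException("Unsupported file name");
        }
        return String.format("source == '%s'", filename);
    }
}
```

In the controller, the result of sourceFilter(notebook) would take the place of the inline String.format call, and the IllegalArgumentException could be mapped to a 400 response.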

Conclusion

Spring AI makes it easy to integrate vector stores and LLMs into a Java application. By using PgVectorStore and QuestionAnswerAdvisor, we can implement a full RAG pipeline with very little code.
