Fine-Tuning T5: A University Cafeteria Chatbot
Training a Text-to-Text Transfer Transformer (T5) to answer questions about the UAM cafeteria menus using Hugging Face and PyTorch.
Project Overview
This project demonstrates the process of fine-tuning a pre-trained language model to specialize in a specific domain. By using a “text-to-text” approach, we transform a general-purpose model into a specialized assistant capable of answering questions about the menus, prices, and services of the Universidad Autónoma de Madrid (UAM) cafeterias.
Technologies Used
- Python: The core programming language.
- Hugging Face Transformers: Library for accessing the T5 model and the Trainer API.
- PyTorch: Deep learning framework used for tensor manipulation and hardware acceleration (CUDA/MPS).
- SentencePiece: Tokenization backend for the T5 model.
- JSON: Format used for the custom Q&A dataset.
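The README does not reproduce the schema of the Q&A file, so the shape below is an assumption for illustration (the "question"/"answer" keys are hypothetical, not confirmed by the project):

```python
import json

# Hypothetical shape for cafeteria.json; the "question"/"answer" keys
# are illustrative assumptions, not the project's confirmed schema.
pairs = [
    {"question": "¿Cuánto cuesta el café con leche?", "answer": "1,20 €"},
    {"question": "¿El bocata de jamón es vegano?", "answer": "NO"},
]

with open("cafeteria.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
```

A flat list of pairs like this maps directly onto T5's text-to-text format: each question becomes the input string and each answer the target string.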
Methodology
The workflow follows the standard pipeline for Natural Language Processing (NLP) adaptation:
- Conceptual Setup: Understanding that T5 (Text-to-Text Transfer Transformer) views every task as a string-to-string problem.
- Data Tokenization: Converting Spanish natural language queries into numerical “tokens” that the neural network can process.
- Custom Dataset Creation: Implementing a CafeteriaDataset class in PyTorch to handle input/target encoding, padding, and truncation.
- Model Training: Using the Trainer API to perform backpropagation over 10 epochs, adjusting the model's internal weights (parameters) to minimize the error between generated and expected answers.
- Inference Pipeline: Developing a function to load the fine-tuned weights and generate real-time responses using Beam Search for better text quality.
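The dataset and inference steps above can be sketched as follows. This is a minimal sketch, not the project's exact code: the method signatures, field names, and hyperparameters (max length, beam count) are assumptions.

```python
import torch
from torch.utils.data import Dataset


class CafeteriaDataset(Dataset):
    """Wraps (question, answer) pairs for T5's text-to-text training."""

    def __init__(self, pairs, tokenizer, max_len=64):
        self.pairs = pairs          # list of {"question": ..., "answer": ...}
        self.tokenizer = tokenizer  # e.g. T5TokenizerFast.from_pretrained("t5-small")
        self.max_len = max_len

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        item = self.pairs[idx]
        enc = self.tokenizer(item["question"], max_length=self.max_len,
                             padding="max_length", truncation=True,
                             return_tensors="pt")
        tgt = self.tokenizer(item["answer"], max_length=self.max_len,
                             padding="max_length", truncation=True,
                             return_tensors="pt")
        labels = tgt["input_ids"].squeeze(0)
        # Pad positions are set to -100 so the cross-entropy loss ignores them.
        labels[labels == self.tokenizer.pad_token_id] = -100
        return {"input_ids": enc["input_ids"].squeeze(0),
                "attention_mask": enc["attention_mask"].squeeze(0),
                "labels": labels}


def answer(question, model, tokenizer, num_beams=4, max_new_tokens=32):
    """Generate an answer from a fine-tuned T5 model using beam search."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=num_beams,
                                max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With this in place, training reduces to handing the dataset and a TrainingArguments object (with num_train_epochs=10) to Hugging Face's Trainer and calling its train() method.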
Key Concepts: The “Attention” Mechanism
Unlike older models that read text strictly word-by-word, the Transformer architecture uses Attention. This allows the model to:
- Identify that in the question “How much does a coffee cost?”, the words “cost” and “coffee” are the most relevant.
- Ignore filler characters or punctuation that don’t add semantic value to the price inquiry.
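The core computation behind this weighting can be illustrated with a minimal scaled dot-product self-attention pass in PyTorch. The embeddings here are random toy values, not the model's real weights, and real Transformers project separate query/key/value matrices first:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim = 8
x = torch.randn(1, 4, dim)  # 4 toy "token" embeddings: (batch, seq, dim)

# Self-attention: queries, keys, and values all come from the same tokens.
scores = x @ x.transpose(-2, -1) / dim ** 0.5  # pairwise relevance scores
weights = F.softmax(scores, dim=-1)            # each row sums to 1
context = weights @ x                          # tokens mixed by relevance
```

The softmax rows are exactly the "attention" distribution: for each token, they say how much every other token (e.g. "coffee" for "cost") contributes to its updated representation.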
Results
By the end of the training process, the model successfully transitioned from general language understanding to specific domain expertise. It can now accurately handle queries such as:
- Input: “¿Cuánto cuesta el café con leche?” (How much is a latte?)
- Output: The specific price defined in the cafeteria.json dataset.
- Input: “¿El bocata de jamón es vegano?” (Is the ham sandwich vegan?)
- Output: “NO”, a context-aware refusal based on the ingredients.