EmoBooks - Emotionally Intelligent Book Recommender (LoRA Adapter)

A fine-tuned Llama-3-8B-Instruct LoRA adapter for emotion-aware Sinhala book recommendations.

⚠️ This is a LoRA adapter (~168MB), not a full model. It must be loaded on top of the base model. See Architecture below.


Architecture

How Base Model + LoRA Adapter Works

┌──────────────────────────────────────────────────────┐
│               Full Inference Pipeline                │
│                                                      │
│  ┌─────────────────────────────────┐                 │
│  │  Base Model (Frozen Weights)    │  ~5GB (4-bit)   │
│  │  unsloth/llama-3-8b-instruct    │                 │
│  │  - 32 Transformer layers        │                 │
│  │  - 8B parameters (quantized)    │                 │
│  │  - General language ability     │                 │
│  └────────────────┬────────────────┘                 │
│                   │ applied at runtime               │
│  ┌────────────────▼────────────────┐                 │
│  │  LoRA Adapter (This Repo)       │  ~168MB         │
│  │  DiyRex/emobooks-llama3-lora    │                 │
│  │  - Adds small weight deltas     │                 │
│  │  - Targets 7 module types       │                 │
│  │  - Rank 32, Alpha 64            │                 │
│  │  - Emotion-aware behavior       │                 │
│  └────────────────┬────────────────┘                 │
│                   │                                  │
│                   ▼                                  │
│            EmoBooks Output                           │
│   (Empathetic, safety-filtered recommendations)      │
└──────────────────────────────────────────────────────┘

The base model provides general language understanding: English fluency, grammar, instruction following, and conversational patterns.

The LoRA adapter teaches it EmoBooks-specific behavior: mood detection, empathetic acknowledgments, the match/switch protocol, book title formatting, and critical safety rules (never recommending dark books to sad users who want to feel better).
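
Conceptually, the adapter stores low-rank weight deltas rather than full weights. A minimal numpy sketch of the idea (shapes are illustrative; this is not the runtime code):

import numpy as np

# For each targeted weight W (d_out x d_in), the adapter holds two small
# matrices A (r x d_in) and B (d_out x r). The effective weight is
#   W_eff = W + (alpha / r) * B @ A
d_out, d_in, r, alpha = 4096, 4096, 32, 64           # rank/alpha as trained here
W = np.random.randn(d_out, d_in).astype(np.float32)  # frozen base weight (stand-in)
A = np.random.randn(r, d_in).astype(np.float32)      # adapter A matrix
B = np.zeros((d_out, r), dtype=np.float32)           # adapter B matrix (zero-initialized)

W_eff = W + (alpha / r) * (B @ A)  # ~168MB of deltas steer an 8B-parameter model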

What's in This Repo

File                       Size    Purpose
adapter_model.safetensors  168MB   LoRA weight deltas (the fine-tuned parameters)
adapter_config.json        1KB     LoRA config (rank, alpha, target modules, base model reference)
tokenizer.json             17MB    Tokenizer vocabulary (same as base, included for convenience)
tokenizer_config.json      51KB    Tokenizer settings and chat template
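
You can check these details without downloading the weights; a small sketch using huggingface_hub (the printed keys follow the standard PEFT adapter_config.json schema):

import json
from huggingface_hub import hf_hub_download

# Fetch only the 1KB config file from the Hub
cfg_path = hf_hub_download("DiyRex/emobooks-llama3-lora", "adapter_config.json")
with open(cfg_path) as f:
    cfg = json.load(f)

print(cfg["base_model_name_or_path"])   # which base model to load underneath
print(cfg["r"], cfg["lora_alpha"])      # LoRA rank and alpha
print(cfg["target_modules"])            # the 7 targeted projection modules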

Training Details (v9, Apr 2026)

Parameter              Value
Base Model             unsloth/llama-3-8b-instruct-bnb-4bit
Method                 QLoRA (4-bit quantized base + LoRA adapters)
LoRA Rank (r)          32
LoRA Alpha             64
Target Modules         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Steps         3000
Learning Rate          1.0e-4 (cosine)
Effective Batch Size   16 (per_device 2 × grad_accum 8)
Dataset                DiyRex/emobooks-dataset (data/emobooks_chat_v3.jsonl, 66,000 multi-turn rows)
Catalog                607 Sinhala novels; English-language Sri Lankan authors removed (Carl Muller, Punyakante Wijenaike, etc.)
Anti-hallucination     Post-hoc _enforce_catalog guardrail in the runtime (sketched below): every "X by Y" mention is validated against the catalog index; mismatches are rewritten or stripped
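
The _enforce_catalog guardrail ships with the EmoBooks runtime, not with this adapter, so the following is only a hypothetical sketch of the idea, using a toy one-entry catalog and a deliberately naive "Title by Author" pattern:

import re

CATALOG = {("madol doova", "martin wickramasinghe")}  # toy index of (title, author) pairs

def enforce_catalog(text: str) -> str:
    """Keep verified 'X by Y' mentions; strip anything not in the catalog."""
    def check(match: re.Match) -> str:
        title, author = match.group(1), match.group(2)
        if (title.lower(), author.lower()) in CATALOG:
            return match.group(0)  # verified against the catalog index: keep
        return "[recommendation removed: not in catalog]"  # likely hallucination
    # Naive pattern: quoted title followed by capitalized author words
    return re.sub(r"'([^']+)' by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)", check, text)

print(enforce_catalog("Try 'Madol Doova' by Martin Wickramasinghe tonight."))  # kept
print(enforce_catalog("Read 'Fake Book' by Nobody Real."))                     # stripped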

How It Works

  1. User shares mood (e.g., "I feel lonely today") → Model acknowledges empathetically
  2. Natural Flow:
    • Explicit: Model asks "Match your mood or Switch?" when user intent is vague.
    • Implicit: Model understands intent from context (e.g., "Cheer me up" → Switch) and recommends directly.
    • Direct: Model honors specific requests (e.g., "Recommend a thriller") without unnecessary mood questioning.
    • Greetings: Model handles "Hi/Hello" gracefully without forced recommendations.
  3. Single Recommendation: Model recommends exactly one book with title, author, and description.
  4. Safety: When sad/anxious/angry users choose "Switch", ONLY uplifting books are recommended (illustrated after this list).
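
A hypothetical exchange illustrating steps 1, 2, and 4 (the assistant wording is illustrative, not a captured model output):

messages = [
    {"role": "user", "content": "I feel really sad today."},
    # Step 1 + explicit flow: acknowledge, then offer Match or Switch
    {"role": "assistant", "content": "I'm sorry you're feeling low. Would you like "
        "a book that matches your mood, or one to switch it?"},
    {"role": "user", "content": "Switch, please."},
    # Step 4 safety rule: sad + Switch means only an uplifting title may follow
]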

Quick Start (Inference)

Option A: Using Unsloth (Recommended, fastest)

from unsloth import FastLanguageModel

# Step 1: Load base model + LoRA adapter in one call
# Unsloth reads adapter_config.json β†’ finds base_model_name_or_path β†’
# downloads llama-3-8b-instruct (~5GB) β†’ loads LoRA adapter on top
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",  # This repo
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization for ~5GB VRAM usage
)
FastLanguageModel.for_inference(model)  # Enable 2x faster inference

# Step 2: Chat with the model
messages = [{"role": "user", "content": "I feel lonely today and I'm alone at home"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
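
Optionally, tokens can be streamed to stdout as they are generated using the standard transformers TextStreamer, which also works with Unsloth-loaded models:

from transformers import TextStreamer

# Prints tokens as they arrive instead of waiting for the full completion
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256)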

Option B: Using Transformers + PEFT (No Unsloth dependency)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Step 1: Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-instruct-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DiyRex/emobooks-llama3-lora")

# Step 2: Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "DiyRex/emobooks-llama3-lora")
model.eval()

# Step 3: Inference (same as above)
messages = [{"role": "user", "content": "I feel lonely today"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
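
For repeated use, the template/generate/decode steps can be wrapped in a small helper (an assumed utility, not part of this repo) that returns only the newly generated reply:

def chat(messages, max_new_tokens=256):
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the new tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

print(chat([{"role": "user", "content": "Cheer me up"}]))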

Continue Training (Retraining from this Adapter)

You can resume fine-tuning from this checkpoint without starting from scratch:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Step 1: Load this adapter (LoRA layers are already attached)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Step 2: Load your new/updated dataset
dataset = load_dataset("DiyRex/emobooks-dataset", data_files="data/emobooks_training_v6.jsonl", split="train")

# Step 3: Configure and run training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes rows carry a "text" field; see the mapping sketch below
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="./outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        learning_rate=5e-5,
        fp16=True,
        logging_steps=10,
    ),
)
trainer.train()

# Step 4: Save and push the updated adapter
model.save_pretrained("./outputs/lora_adapter_v2")
model.push_to_hub("DiyRex/emobooks-llama3-lora")  # Updates main branch
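
If your rows are chat-format (e.g., a messages list) rather than plain text, flatten them through the chat template before training. A sketch assuming the field is named "messages" (adjust to your schema):

def to_text(example):
    # Render the conversation into one training string using the model's template
    example["text"] = tokenizer.apply_chat_template(
        example["messages"], tokenize=False, add_generation_prompt=False
    )
    return example

dataset = dataset.map(to_text)  # run before constructing the SFTTrainer above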

Merging into a Standalone Model (Fusing)

If you need a standalone model without requiring the base model separately (e.g., for GGUF export or production deployment):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge LoRA weights into base model (creates a ~16GB fp16 model)
model.save_pretrained_merged("./merged_model", tokenizer, save_method="merged_16bit")

# Or export directly to GGUF for llama.cpp / Ollama
model.save_pretrained_gguf("./gguf_model", tokenizer, quantization_method="q4_k_m")
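
As a sanity check, the merged folder should load standalone through plain transformers, with no PEFT or adapter step involved:

from transformers import AutoModelForCausalLM, AutoTokenizer

# If this loads and generates, the fuse succeeded
merged = AutoModelForCausalLM.from_pretrained(
    "./merged_model", torch_dtype="auto", device_map="auto"
)
merged_tok = AutoTokenizer.from_pretrained("./merged_model")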

Dataset Versions

Available at DiyRex/emobooks-dataset:

Version          Samples   Focus
v3               5000      Format compliance: single-book output, match/switch protocol
v4               5000      Expanded prompts and conversational variety
v5               5000      Category-aware descriptions
v6               5000      Sentiment-safe: keyword shield, 5 moods, 100 prompt styles, unique descriptions
v7               6000      Conversational: explicit, implicit, and direct intent detection
v8               6600      Balanced: added neutral greetings to prevent "Model Aggression"
v9 (Apr 2026)    66,000    Cleaned Sinhala-only catalog (607 books, no English-language authors), readable Singlish transliteration, strict + soft-offer SYSTEM_PROMPT, 9 dialog arcs × 8 emotions, anti-hallucination guardrail at runtime

The v9 chat file is data/emobooks_chat_v3.jsonl (file name kept for back-compat with training scripts; the release tag is v9.0). The human-readable catalog reference is at reference/reference_books.{json,csv} and reference/curated_sinhala_novels.json.
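
To explore the data before training, the v9 file can be loaded directly; a short sketch that prints the row count and the first row (inspect the actual schema before writing any mapping code):

from datasets import load_dataset

rows = load_dataset(
    "DiyRex/emobooks-dataset",
    data_files="data/emobooks_chat_v3.jsonl",
    split="train",
)
print(len(rows))  # expected: ~66,000 multi-turn rows
print(rows[0])    # confirm field names before relying on them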

License

Apache 2.0
