EmoBooks - Emotionally Intelligent Book Recommender (LoRA Adapter)

A fine-tuned Llama-3-8B-Instruct LoRA adapter for emotion-aware Sinhala book recommendations.

⚠️ This is a LoRA adapter (~168MB), not a full model. It must be loaded on top of the base model. See Architecture below.


Architecture

How Base Model + LoRA Adapter Works

┌──────────────────────────────────────────────────────┐
│               Full Inference Pipeline                │
│                                                      │
│  ┌─────────────────────────────────┐                 │
│  │  Base Model (Frozen Weights)    │  ~5GB (4-bit)   │
│  │  unsloth/llama-3-8b-instruct    │                 │
│  │  - 32 Transformer layers        │                 │
│  │  - 8B parameters (quantized)    │                 │
│  │  - General language ability     │                 │
│  └────────────────┬────────────────┘                 │
│                   │ applied at runtime               │
│  ┌────────────────▼────────────────┐                 │
│  │  LoRA Adapter (This Repo)       │  ~168MB         │
│  │  DiyRex/emobooks-llama3-lora    │                 │
│  │  - Adds small weight deltas     │                 │
│  │  - Targets 7 module types       │                 │
│  │  - Rank 32, Alpha 64            │                 │
│  │  - Emotion-aware behavior       │                 │
│  └────────────────┬────────────────┘                 │
│                   │                                  │
│                   ▼                                  │
│            EmoBooks Output                           │
│   (Empathetic, safety-filtered recommendations)      │
└──────────────────────────────────────────────────────┘

The base model provides general language understanding: English fluency, grammar, instruction following, and conversational patterns.

The LoRA adapter teaches it EmoBooks-specific behavior: mood detection, empathetic acknowledgments, the match/switch protocol, book title formatting, and critical safety rules (never recommending dark books to sad users who want to feel better).
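
Conceptually, the adapter stores low-rank weight deltas rather than full weights. A minimal numpy sketch of the idea (shapes are illustrative; this is not the runtime code):

import numpy as np

# For each targeted weight W (d_out x d_in), the adapter holds two small
# matrices A (r x d_in) and B (d_out x r). The effective weight is
#   W_eff = W + (alpha / r) * B @ A
d_out, d_in, r, alpha = 4096, 4096, 32, 64           # rank/alpha as trained here
W = np.random.randn(d_out, d_in).astype(np.float32)  # frozen base weight (stand-in)
A = np.random.randn(r, d_in).astype(np.float32)      # adapter A matrix
B = np.zeros((d_out, r), dtype=np.float32)           # adapter B matrix (zero-initialized)

W_eff = W + (alpha / r) * (B @ A)  # ~168MB of deltas steer an 8B-parameter model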

What's in This Repo

File                       Size    Purpose
adapter_model.safetensors  168MB   LoRA weight deltas (the fine-tuned parameters)
adapter_config.json        1KB     LoRA config (rank, alpha, target modules, base model reference)
tokenizer.json             17MB    Tokenizer vocabulary (same as base, included for convenience)
tokenizer_config.json      51KB    Tokenizer settings and chat template
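
You can check these details without downloading the weights; a small sketch using huggingface_hub (the printed keys follow the standard PEFT adapter_config.json schema):

import json
from huggingface_hub import hf_hub_download

# Fetch only the 1KB config file from the Hub
cfg_path = hf_hub_download("DiyRex/emobooks-llama3-lora", "adapter_config.json")
with open(cfg_path) as f:
    cfg = json.load(f)

print(cfg["base_model_name_or_path"])   # which base model to load underneath
print(cfg["r"], cfg["lora_alpha"])      # LoRA rank and alpha
print(cfg["target_modules"])            # the 7 targeted projection modules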

Training Details (v9, Apr 2026)

Parameter              Value
Base Model             unsloth/llama-3-8b-instruct-bnb-4bit
Method                 QLoRA (4-bit quantized base + LoRA adapters)
LoRA Rank (r)          32
LoRA Alpha             64
Target Modules         q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Steps         3000
Learning Rate          1.0e-4 (cosine)
Effective Batch Size   16 (per_device 2 × grad_accum 8)
Dataset                DiyRex/emobooks-dataset (data/emobooks_chat_v3.jsonl, 66,000 multi-turn rows)
Catalog                607 Sinhala novels; English-language Sri Lankan authors removed (Carl Muller, Punyakante Wijenaike, etc.)
Anti-hallucination     Post-hoc _enforce_catalog guardrail in the runtime (sketched below): every "X by Y" mention is validated against the catalog index; mismatches are rewritten or stripped
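
The _enforce_catalog guardrail ships with the EmoBooks runtime, not with this adapter, so the following is only a hypothetical sketch of the idea, using a toy one-entry catalog and a deliberately naive "Title by Author" pattern:

import re

CATALOG = {("madol doova", "martin wickramasinghe")}  # toy index of (title, author) pairs

def enforce_catalog(text: str) -> str:
    """Keep verified 'X by Y' mentions; strip anything not in the catalog."""
    def check(match: re.Match) -> str:
        title, author = match.group(1), match.group(2)
        if (title.lower(), author.lower()) in CATALOG:
            return match.group(0)  # verified against the catalog index: keep
        return "[recommendation removed: not in catalog]"  # likely hallucination
    # Naive pattern: quoted title followed by capitalized author words
    return re.sub(r"'([^']+)' by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)", check, text)

print(enforce_catalog("Try 'Madol Doova' by Martin Wickramasinghe tonight."))  # kept
print(enforce_catalog("Read 'Fake Book' by Nobody Real."))                     # stripped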

How It Works

  1. User shares mood (e.g., "I feel lonely today") → Model acknowledges empathetically
  2. Natural Flow:
    • Explicit: Model asks "Match your mood or Switch?" when user intent is vague.
    • Implicit: Model understands intent from context (e.g., "Cheer me up" → Switch) and recommends directly.
    • Direct: Model honors specific requests (e.g., "Recommend a thriller") without unnecessary mood questioning.
    • Greetings: Model handles "Hi/Hello" gracefully without forced recommendations.
  3. Single Recommendation: Model recommends exactly one book with title, author, and description.
  4. Safety: When sad/anxious/angry users choose "Switch", ONLY uplifting books are recommended (illustrated after this list).
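
A hypothetical exchange illustrating steps 1, 2, and 4 (the assistant wording is illustrative, not a captured model output):

messages = [
    {"role": "user", "content": "I feel really sad today."},
    # Step 1 + explicit flow: acknowledge, then offer Match or Switch
    {"role": "assistant", "content": "I'm sorry you're feeling low. Would you like "
        "a book that matches your mood, or one to switch it?"},
    {"role": "user", "content": "Switch, please."},
    # Step 4 safety rule: sad + Switch means only an uplifting title may follow
]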

Quick Start (Inference)

Option A: Using Unsloth (Recommended, fastest)

from unsloth import FastLanguageModel

# Step 1: Load base model + LoRA adapter in one call
# Unsloth reads adapter_config.json β†’ finds base_model_name_or_path β†’
# downloads llama-3-8b-instruct (~5GB) β†’ loads LoRA adapter on top
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",  # This repo
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization for ~5GB VRAM usage
)
FastLanguageModel.for_inference(model)  # Enable 2x faster inference

# Step 2: Chat with the model
messages = [{"role": "user", "content": "I feel lonely today and I'm alone at home"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
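
Optionally, tokens can be streamed to stdout as they are generated using the standard transformers TextStreamer, which also works with Unsloth-loaded models:

from transformers import TextStreamer

# Prints tokens as they arrive instead of waiting for the full completion
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=256)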

Option B: Using Transformers + PEFT (No Unsloth dependency)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Step 1: Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3-8b-instruct-bnb-4bit",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DiyRex/emobooks-llama3-lora")

# Step 2: Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "DiyRex/emobooks-llama3-lora")
model.eval()

# Step 3: Inference (same as above)
messages = [{"role": "user", "content": "I feel lonely today"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
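
For repeated use, the template/generate/decode steps can be wrapped in a small helper (an assumed utility, not part of this repo) that returns only the newly generated reply:

def chat(messages, max_new_tokens=256):
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the new tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

print(chat([{"role": "user", "content": "Cheer me up"}]))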

Continue Training (Retraining from this Adapter)

You can resume fine-tuning from this checkpoint without starting from scratch:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Step 1: Load this adapter (LoRA layers are already attached)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Step 2: Load your new/updated dataset
dataset = load_dataset("DiyRex/emobooks-dataset", data_files="data/emobooks_training_v6.jsonl", split="train")

# Step 3: Configure and run training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes rows carry a "text" field; see the mapping sketch below
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="./outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        learning_rate=5e-5,
        fp16=True,
        logging_steps=10,
    ),
)
trainer.train()

# Step 4: Save and push the updated adapter
model.save_pretrained("./outputs/lora_adapter_v2")
model.push_to_hub("DiyRex/emobooks-llama3-lora")  # Updates main branch
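
If your rows are chat-format (e.g., a messages list) rather than plain text, flatten them through the chat template before training. A sketch assuming the field is named "messages" (adjust to your schema):

def to_text(example):
    # Render the conversation into one training string using the model's template
    example["text"] = tokenizer.apply_chat_template(
        example["messages"], tokenize=False, add_generation_prompt=False
    )
    return example

dataset = dataset.map(to_text)  # run before constructing the SFTTrainer above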

Merging into a Standalone Model (Fusing)

If you need a standalone model without requiring the base model separately (e.g., for GGUF export or production deployment):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="DiyRex/emobooks-llama3-lora",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge LoRA weights into base model (creates a ~16GB fp16 model)
model.save_pretrained_merged("./merged_model", tokenizer, save_method="merged_16bit")

# Or export directly to GGUF for llama.cpp / Ollama
model.save_pretrained_gguf("./gguf_model", tokenizer, quantization_method="q4_k_m")
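
As a sanity check, the merged folder should load standalone through plain transformers, with no PEFT or adapter step involved:

from transformers import AutoModelForCausalLM, AutoTokenizer

# If this loads and generates, the fuse succeeded
merged = AutoModelForCausalLM.from_pretrained(
    "./merged_model", torch_dtype="auto", device_map="auto"
)
merged_tok = AutoTokenizer.from_pretrained("./merged_model")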

Dataset Versions

Available at DiyRex/emobooks-dataset:

Version          Samples   Focus
v3               5000      Format compliance: single-book output, match/switch protocol
v4               5000      Expanded prompts and conversational variety
v5               5000      Category-aware descriptions
v6               5000      Sentiment-safe: keyword shield, 5 moods, 100 prompt styles, unique descriptions
v7               6000      Conversational: explicit, implicit, and direct intent detection
v8               6600      Balanced: added neutral greetings to prevent "Model Aggression"
v9 (Apr 2026)    66,000    Cleaned Sinhala-only catalog (607 books, no English-language authors), readable Singlish transliteration, strict + soft-offer SYSTEM_PROMPT, 9 dialog arcs × 8 emotions, anti-hallucination guardrail at runtime

The v9 chat file is data/emobooks_chat_v3.jsonl (file name kept for back-compat with training scripts; the release tag is v9.0). The human-readable catalog reference is at reference/reference_books.{json,csv} and reference/curated_sinhala_novels.json.
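
To explore the data before training, the v9 file can be loaded directly; a short sketch that prints the row count and the first row (inspect the actual schema before writing any mapping code):

from datasets import load_dataset

rows = load_dataset(
    "DiyRex/emobooks-dataset",
    data_files="data/emobooks_chat_v3.jsonl",
    split="train",
)
print(len(rows))  # expected: ~66,000 multi-turn rows
print(rows[0])    # confirm field names before relying on them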

License

Apache 2.0
