How to use AlgorithmicResearchGroup/led_base_16384_billsum_summarization with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline
pipe = pipeline("summarization", model="AlgorithmicResearchGroup/led_base_16384_billsum_summarization")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("AlgorithmicResearchGroup/led_base_16384_billsum_summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("AlgorithmicResearchGroup/led_base_16384_billsum_summarization")

This model is a fine-tuned version of led-base-16384 on the billsum dataset.
As described in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan, led-base-16384 was initialized from bart-base since both models share the exact same architecture. To be able to process 16K tokens, bart-base's position embedding matrix was simply copied 16 times.
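The copying step described above can be illustrated with a toy sketch in plain Python. It assumes a base matrix of 1024 positions (so that 16 copies give 16384) and uses a made-up 2-dimensional embedding. The helper `extend_position_embeddings` is hypothetical and is not the code used to build led-base-16384.

```python
def extend_position_embeddings(base_rows, copies=16):
    """Tile a position-embedding matrix (a list of rows) `copies` times."""
    extended = []
    for _ in range(copies):
        # Copy each row so the extended matrix does not alias the base rows.
        extended.extend(list(row) for row in base_rows)
    return extended


# Toy stand-in for the base matrix: 1024 positions, 2 dimensions (assumed sizes).
base = [[float(pos), float(pos) * 0.5] for pos in range(1024)]
extended = extend_position_embeddings(base, copies=16)

# Position 1024 in the extended matrix reuses the embedding of position 0.
print(len(extended), extended[1024] == base[0])
```

After tiling, every block of 1024 positions starts out with the same embeddings, which fine-tuning on long documents can then differentiate.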
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Artifact-AI/led_base_16384_billsum_summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("Artifact-AI/led_base_16384_billsum_summarization")
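The loading code above stops before inference. The following is a minimal sketch of generating a summary, not the author's evaluation setup. It assumes `torch` is installed, and it places global attention on the first token only, which is a common convention for LED summarization rather than something this card specifies. The helper names, the beam count, and the length limits are illustrative assumptions.

```python
from typing import List


def build_global_attention_mask(seq_len: int) -> List[int]:
    """Return a 0/1 mask that puts global attention on the first token only."""
    mask = [0] * seq_len
    if seq_len > 0:
        mask[0] = 1
    return mask


def summarize(
    text: str,
    model_id: str = "AlgorithmicResearchGroup/led_base_16384_billsum_summarization",
    max_input_tokens: int = 16384,
    max_summary_tokens: int = 256,  # assumed limit, not taken from the card
) -> str:
    """Summarize `text` with the fine-tuned LED checkpoint (downloads weights)."""
    # Heavy imports are kept inside the function so the helper above
    # can be used without torch or transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    seq_len = inputs["input_ids"].shape[1]
    global_attention_mask = torch.tensor([build_global_attention_mask(seq_len)])

    with torch.no_grad():
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            max_new_tokens=max_summary_tokens,
            num_beams=4,  # assumed beam width
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling `summarize(bill_text)` on the full text of a bill returns the generated summary as a string. The first call downloads the model weights.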
| Model | Rouge-1 | Rouge-2 | Rouge-L | Rouge-Lsum |
|---|---|---|---|---|
| LED Large | 47.843 | 26.342 | 34.230 | 41.689 |
| LED Base | 47.672 | 26.737 | 34.568 | 41.529 |
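For readers unfamiliar with the metric, here is a simplified sketch of ROUGE-1 F1 as clipped unigram overlap. It uses lowercase whitespace tokenization with no stemming, so it will not reproduce the numbers in the table, which appear to be scaled by 100 and were presumably computed with a standard ROUGE implementation that the card does not name. The function `rouge1_f1` is a hypothetical helper.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "the bill amends the tax code",
    "the bill changes the tax code",
)
print(round(100 * score, 3))
```

ROUGE-2 applies the same idea to bigrams, and ROUGE-L replaces n-gram overlap with the longest common subsequence.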
The model was trained on the BillSum summarization dataset (FiscalNote/billsum).
Please find a notebook to test the model below:
@misc{led_base_16384_billsum_summarization,
title={led_base_16384_billsum_summarization},
author={Matthew Kenney},
year={2023}
}