La Liga Score Predictor

What This Release Is

This is a public football prediction bundle for Spanish La Liga.

It is built for simple pre-match score prediction from structured historical match data.

Public package version:

2026.04.1

Scope:

supported competition: Spanish La Liga only
intended usage: pre-match prediction for La Liga fixtures only

It does not include:

application or app-specific code unrelated to prediction
other internal model lines
private ingestion code
private infrastructure

Competition Scope

This release supports Spanish La Liga matches only.
It is not packaged or validated for other leagues.

Who This Is For

developers evaluating a football prediction model package
teams integrating pre-match score prediction into their own apps or internal tools
ML or analytics users who want a documented La Liga inference bundle

This release is strongest as:

a technical package
an integration starting point
a reproducible model bundle for experimentation

It is not positioned as:

a hosted prediction API
a live data service
a no-input prediction engine that already knows the context of every future fixture

The simplest way to think about it is:

this is a model-and-inference bundle
it is not a bundled football data service

Important Reality Check

This bundle ships:

the trained champion model
the feature-building wrapper
synthetic sample CSVs
runnable examples

This bundle does not ship:

a full production historical La Liga dataset
a built-in live data feed
a guarantee of matching an internal/private prediction environment exactly

That means:

predict_match(...) still needs compatible historical match data
the included sample CSVs are for demonstration and onboarding
exact outputs depend on the history data provided to the feature builder
using different history data can lead to different predictions, even with the same model artifact

For real upcoming-match prediction, users must supply compatible historical match data so the feature builder can compute pre-match context.

What It Supports

Predicting home goals
Predicting away goals
Predicting final scoreline
Predicting home / draw / away probabilities
Returning confidence level
Returning confidence score
Returning confidence margin
Returning abstain / score-range signals for fragile matches
Predicting from a full numeric feature row
Predicting from home_team, away_team, and match_date when a compatible history CSV is available
Batch prediction from a CSV of fixtures

How The 48-Signal Model Relates To The CSV

The trained model uses 48 numeric signals at inference time.

Those 48 signals do not mean your history CSV must literally contain 48 raw columns.

In the public package, those signals come from a mix of:

values directly present in the history CSV
rolling features derived by the wrapper from past match rows
fallback defaults when richer optional columns are not available

That means:

the package can still run with a thinner history CSV
prediction quality is better when the history CSV is richer
the included sample demonstrates the recommended, richer shape, not just the minimum runnable shape
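
As a rough illustration of the "rolling features derived from past match rows" source, the sketch below computes one rolling signal from rows that hold only the minimum columns. The function name `avg_goals_last5`, the `default` value, and the use of `datetime.date` objects are assumptions made for this example; the bundled feature_builder.py may compute its signals differently.

```python
from datetime import date

def avg_goals_last5(history, team, match_date, default=1.3):
    """Average goals scored by `team` in its last 5 matches before `match_date`.

    `history` is a list of dicts with date, home_team, away_team,
    home_goals, away_goals. `default` is a hypothetical fallback used
    when the team has no earlier rows.
    """
    # Only rows strictly before the fixture date count as pre-match context.
    past = sorted(
        (row for row in history if row["date"] < match_date),
        key=lambda row: row["date"],
    )
    goals = []
    for row in past:
        if row["home_team"] == team:
            goals.append(row["home_goals"])
        elif row["away_team"] == team:
            goals.append(row["away_goals"])
    last5 = goals[-5:]
    return sum(last5) / len(last5) if last5 else default

history = [
    {"date": date(2026, 3, 1), "home_team": "Athletic", "away_team": "Osasuna",
     "home_goals": 2, "away_goals": 1},
    {"date": date(2026, 3, 8), "home_team": "Mallorca", "away_team": "Athletic",
     "home_goals": 0, "away_goals": 1},
]

print(avg_goals_last5(history, "Athletic", date(2026, 4, 21)))  # 1.5
print(avg_goals_last5(history, "Valencia", date(2026, 4, 21)))  # 1.3 (fallback)
```

The same rows can feed many signals, which is why a thin CSV can still yield a full feature row.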

Public API Methods

The public Python package exposes these main methods:

`predict_match(home_team, away_team, match_date)`

best for normal application use
builds features from a compatible history CSV
returns the full response shape, including advanced fields

Typical fields returned:

model_version
expected_home_goals
expected_away_goals
predicted_home_goals
predicted_away_goals
predicted_score
result_probabilities
raw_result_probabilities
confidence_level
confidence_score
confidence_margin
abstain_recommended
predicted_score_range when triggered
decoder_diagnostics
request

`predict_match_simple(home_team, away_team, match_date)`

best for product-facing score cards and lighter UI integrations
builds features from a compatible history CSV
returns the smaller public response shape

Typical fields returned:

model_version
predicted_home_goals
predicted_away_goals
predicted_score
result_probabilities
confidence_level
confidence_score
confidence_margin
abstain_recommended
predicted_score_range when triggered
request

`predict_features(features)`

best for advanced users who already manage engineered features themselves
expects the full numeric feature row
returns the full response shape, including advanced fields

Typical fields returned:

model_version
expected_home_goals
expected_away_goals
predicted_home_goals
predicted_away_goals
predicted_score
result_probabilities
raw_result_probabilities
confidence_level
confidence_score
confidence_margin
abstain_recommended
predicted_score_range when triggered
decoder_diagnostics

`predict_features_simple(features)`

best for advanced users who want the raw-feature path with a smaller response
expects the full numeric feature row
returns the smaller public response shape

Typical fields returned:

model_version
predicted_home_goals
predicted_away_goals
predicted_score
result_probabilities
confidence_level
confidence_score
confidence_margin
abstain_recommended
predicted_score_range when triggered
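
Going by the field lists above, the smaller response shape looks like the full shape with the advanced fields left out. The sketch below shows that relationship on a plain dictionary. `SIMPLE_FIELDS` and `to_simple` are hypothetical names for illustration and are not part of the package; in practice the `*_simple` methods return this shape directly.

```python
# Fields listed for the smaller public response shape in this document.
SIMPLE_FIELDS = (
    "model_version",
    "predicted_home_goals",
    "predicted_away_goals",
    "predicted_score",
    "result_probabilities",
    "confidence_level",
    "confidence_score",
    "confidence_margin",
    "abstain_recommended",
    "predicted_score_range",  # only present when triggered
    "request",                # only present on the predict_match path
)

def to_simple(full_result):
    """Keep only the public fields that are present in a full response."""
    return {key: full_result[key] for key in SIMPLE_FIELDS if key in full_result}

# Hand-written example response; the values are illustrative, not model output.
full = {
    "model_version": "example-version",
    "expected_home_goals": 1.4,
    "expected_away_goals": 0.9,
    "predicted_score": "1-0",
    "raw_result_probabilities": {"home_win": 0.5, "draw": 0.3, "away_win": 0.2},
    "decoder_diagnostics": {"xg_delta": 0.5},
}

print(sorted(to_simple(full)))  # ['model_version', 'predicted_score']
```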

Public Files


  README.md
  LICENSE
  CHANGELOG.md
  RELEASE_GUIDE.md
  MODEL_CARD.md
  EVALUATION_SUMMARY.md
  DATA_FORMAT.md
  FAQ.md
  QUICK_PUBLISH_CHECKLIST.md
  ARTIFACTS_SHA256.txt
  requirements.txt
  pyproject.toml
  sample_history.csv
  sample_fixtures.csv
  predict_one.py
  predict_batch.py
  demo_cli.py
  smoke_test.py
  demo_notebook.ipynb
  la_liga_score_predictor/
    __init__.py
    predictor.py
    feature_builder.py
    artifacts/
      la_liga_score_predictor.json
      home_goals_model.cbm
      away_goals_model.cbm
      outcome_model.cbm

Installation

From inside the bundle's root directory (the folder that contains pyproject.toml):

python3 -m venv .venv
source .venv/bin/activate
pip install .

If you prefer, the dependency list is also available in requirements.txt.

Run The Included Example

This bundle includes an expanded synthetic sample_history.csv so the wrapper can be demonstrated without private data.

The included sample now provides:

120 synthetic historical match rows
20 Spanish La Liga team names
35 CSV columns
enough match depth for rolling-form features in the demo flow
richer optional fields such as player aggregates and tactic-stability values

It is still:

synthetic
limited
not a substitute for a production historical dataset

From inside the bundle's root directory:

PYTHONPATH=. python3 predict_one.py

Expected result style:

request summary
predicted score
home/draw/away probabilities
confidence level
abstain flag

Batch Example

From inside the bundle's root directory:

PYTHONPATH=. python3 predict_batch.py
cat predictions_output.csv
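
For readers who want to write their own batch loop, the sketch below shows one possible shape. It is not the contents of predict_batch.py. The fixture column names (`home_team`, `away_team`, `match_date`), the output columns, and the `StubPredictor` class are assumptions; the stub stands in for `LaLigaScorePredictor` so the example runs without the bundle installed.

```python
import csv
import io

class StubPredictor:
    """Stand-in for LaLigaScorePredictor; returns a fixed, made-up response."""

    def predict_match_simple(self, home_team, away_team, match_date):
        return {
            "predicted_score": "1-0",
            "confidence_level": "medium",
            "abstain_recommended": False,
        }

OUTPUT_FIELDS = ["home_team", "away_team", "match_date",
                 "predicted_score", "confidence_level", "abstain_recommended"]

def predict_fixtures(predictor, fixtures_file, output_file):
    """Read fixtures as CSV, predict each row, write one output row per fixture."""
    reader = csv.DictReader(fixtures_file)
    writer = csv.DictWriter(output_file, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    count = 0
    for row in reader:
        result = predictor.predict_match_simple(
            row["home_team"], row["away_team"], row["match_date"]
        )
        # Copy the fixture columns, then the selected result fields.
        out_row = {key: row[key] for key in ("home_team", "away_team", "match_date")}
        out_row.update({key: result[key] for key in OUTPUT_FIELDS[3:]})
        writer.writerow(out_row)
        count += 1
    return count

fixtures = io.StringIO(
    "home_team,away_team,match_date\n"
    "Athletic,Osasuna,2026-04-21\n"
    "Mallorca,Valencia,2026-04-21\n"
)
output = io.StringIO()
print(predict_fixtures(StubPredictor(), fixtures, output))  # 2
print(output.getvalue())
```

Swapping the stub for a real predictor and the in-memory buffers for opened files keeps the loop unchanged.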

Smoke Test

From inside the bundle's root directory:

PYTHONPATH=. python3 smoke_test.py

This confirms that:

the package imports
the bundled model files load
a sample prediction runs end to end
the output has the expected public fields
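
A check in the same spirit can be applied to any response inside your own tests. The sketch below is not the bundled smoke_test.py. The `EXPECTED_FIELDS` set and the rule that the three outcome probabilities sum to 1 are assumptions chosen for this example.

```python
import math

# Hypothetical minimum set of fields an application might rely on.
EXPECTED_FIELDS = {
    "predicted_score",
    "result_probabilities",
    "confidence_level",
    "abstain_recommended",
}

def check_public_result(result):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = EXPECTED_FIELDS - set(result)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    probs = result.get("result_probabilities", {})
    if probs and not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-6):
        problems.append("probabilities do not sum to 1")
    return problems

good = {
    "predicted_score": "1-0",
    "result_probabilities": {"home_win": 0.46, "draw": 0.31, "away_win": 0.23},
    "confidence_level": "medium",
    "abstain_recommended": False,
}
print(check_public_result(good))  # []
```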

CLI Demo

From inside the bundle's root directory:

PYTHONPATH=. python3 demo_cli.py \
  --home-team 'Girona FC' \
  --away-team 'Mallorca' \
  --match-date '2026-05-01' \
  --dataset-csv sample_history.csv \
  --pretty

Notebook Demo

A notebook starter is included for Kaggle or local notebook use:

demo_notebook.ipynb

Easiest Interface

The easiest interface is:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="sample_history.csv"
)

result = predictor.predict_match(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)

print(result["predicted_score"])
print(result["result_probabilities"])
print(result["confidence_level"])

What To Expect In Real Use

For a real application, the cleanest usage pattern is:

load the predictor
point it at your compatible historical match dataset
call predict_match(...) for upcoming fixtures

Keep expectations clear:

same model + different history data = potentially different prediction
sample CSVs are for demos, tests, and onboarding
production-grade reproducibility requires production-grade historical context

What Makes A Good History CSV

At a practical level, a strong history CSV should provide:

enough historical depth
- not just a few rows
- enough prior matches per team for rolling last-5 and last-10 features
stable team naming
- use one consistent naming style
- avoid mixing many variants for the same club
final scores for past matches
- these are essential because the wrapper derives rolling form from them
richer optional context where possible
- team IDs
- Elo values
- tactic IDs
- coach IDs
- player aggregate columns
- tactic stability columns
coverage for the teams you want to predict
- if a team is absent from the history CSV, predict_match(...) will fail

In short:

the minimum CSV shape lets the package run
a richer CSV shape gives the model more of the context it expects, so predictions are less reliant on fallback defaults
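
Before calling predict_match(...), it can help to check depth and coverage yourself. The sketch below counts each team's matches before a fixture date and flags teams with fewer than 10, the largest rolling window mentioned above. The function names, the threshold parameter, and the assumption that dates are ISO `YYYY-MM-DD` strings are all choices made for this example.

```python
from collections import Counter

def prior_match_counts(history, match_date):
    """Count matches per team strictly before `match_date`.

    Assumes ISO-formatted date strings, which sort chronologically
    when compared as plain text.
    """
    counts = Counter()
    for row in history:
        if row["date"] < match_date:
            counts[row["home_team"]] += 1
            counts[row["away_team"]] += 1
    return counts

def thin_teams(history, teams, match_date, min_matches=10):
    """Return {team: prior_matches} for teams below `min_matches`."""
    counts = prior_match_counts(history, match_date)
    return {team: counts[team] for team in teams if counts[team] < min_matches}

history = [
    {"date": "2026-03-01", "home_team": "Athletic", "away_team": "Osasuna"},
    {"date": "2026-03-08", "home_team": "Mallorca", "away_team": "Athletic"},
    {"date": "2026-05-01", "home_team": "Athletic", "away_team": "Valencia"},
]

print(thin_teams(history, ["Athletic", "Real Madrid"], "2026-04-21"))
# {'Athletic': 2, 'Real Madrid': 0}
```

A count of 0 corresponds to the absent-team case, where predict_match(...) is documented to fail.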

If you only want the product-facing fields, call predict_match_simple on the predictor created in the Easiest Interface example:

simple_result = predictor.predict_match_simple(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)

Real Usage With Your Own History CSV

For real usage, replace sample_history.csv with your own compatible historical match CSV:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="/path/to/your/history.csv"
)

result = predictor.predict_match(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)

The same smaller response shape is available here too:

simple_result = predictor.predict_match_simple(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)

Raw Feature Interface

If you already have the numeric features, use the lower-level interface:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults()

features = {
    "home_avg_goals_last5_all": 1.4,
    "away_avg_goals_last5_all": 1.1,
    "home_avg_goals_last5_home": 1.6,
    "away_avg_goals_last5_away": 1.0,
    "home_avg_conceded_last5_all": 0.9,
    "away_avg_conceded_last5_all": 1.2,
    "home_avg_conceded_last5_home": 0.8,
    "away_avg_conceded_last5_away": 1.3,
    "home_win_rate_last10_all": 0.5,
    "away_win_rate_last10_all": 0.4,
    "home_win_rate_last10_home": 0.6,
    "away_win_rate_last10_away": 0.3,
    "home_draw_rate_last10": 0.2,
    "away_draw_rate_last10": 0.3,
    "home_goal_diff_last5": 2.0,
    "away_goal_diff_last5": -1.0,
    "home_rest_days": 6.0,
    "away_rest_days": 5.0,
    "home_elo_pre": 1715.0,
    "away_elo_pre": 1662.0,
    "elo_diff_pre": 53.0,
    "home_team_id": 12.0,
    "away_team_id": 19.0,
    "home_player_minutes_total_prev5": 4050.0,
    "away_player_minutes_total_prev5": 3970.0,
    "home_player_goals_total_prev5": 6.0,
    "away_player_goals_total_prev5": 4.0,
    "home_player_assists_total_prev5": 4.0,
    "away_player_assists_total_prev5": 3.0,
    "home_player_yellow_cards_total_prev5": 8.0,
    "away_player_yellow_cards_total_prev5": 10.0,
    "home_player_red_cards_total_prev5": 0.0,
    "away_player_red_cards_total_prev5": 0.0,
    "home_player_starters_count_prev5": 55.0,
    "away_player_starters_count_prev5": 55.0,
    "home_player_used_count_prev5": 76.0,
    "away_player_used_count_prev5": 73.0,
    "home_player_injured_count_prev5": 1.0,
    "away_player_injured_count_prev5": 2.0,
    "home_player_suspended_count_prev5": 0.0,
    "away_player_suspended_count_prev5": 1.0,
    "home_tactic_id": 4.0,
    "away_tactic_id": 7.0,
    "home_coach_id": 1012.0,
    "away_coach_id": 1048.0,
    "home_tactic_stability_last5": 0.8,
    "away_tactic_stability_last5": 0.4,
    "tactic_matchup_code": 4007.0,
}

result = predictor.predict_features(features)
print(result["predicted_score"])

For a smaller response:

simple_result = predictor.predict_features_simple(features)
print(simple_result)
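
Because this path skips the feature builder, a quick sanity check on a hand-built row can catch mistakes early. The sketch below is a hypothetical helper, not part of the package. It checks the row size, that every value is numeric, and that `elo_diff_pre` equals `home_elo_pre - away_elo_pre`. That last relationship is inferred from the example values above (1715.0 - 1662.0 = 53.0) and is an assumption about how the signal is defined.

```python
def feature_row_problems(features, expected_count=48):
    """Return a list of problems found in a raw feature row."""
    problems = []
    if len(features) != expected_count:
        problems.append(f"expected {expected_count} features, got {len(features)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    non_numeric = sorted(
        key for key, value in features.items()
        if isinstance(value, bool) or not isinstance(value, (int, float))
    )
    if non_numeric:
        problems.append(f"non-numeric values: {non_numeric}")
    elo_keys = ("home_elo_pre", "away_elo_pre", "elo_diff_pre")
    if all(key in features and key not in non_numeric for key in elo_keys):
        expected_diff = features["home_elo_pre"] - features["away_elo_pre"]
        if abs(features["elo_diff_pre"] - expected_diff) > 1e-9:
            problems.append("elo_diff_pre does not match home_elo_pre - away_elo_pre")
    return problems

# A three-value excerpt of the row above, checked with a matching count.
excerpt = {"home_elo_pre": 1715.0, "away_elo_pre": 1662.0, "elo_diff_pre": 53.0}
print(feature_row_problems(excerpt, expected_count=3))  # []
print(feature_row_problems(excerpt))  # ['expected 48 features, got 3']
```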

Dataset Requirement For `predict_match()`

For predict_match(home_team, away_team, match_date) to work, the predictor needs a compatible historical match CSV.

Minimum required columns:

date
home_team
away_team
home_goals
away_goals

Better results if your CSV also includes:

Elo columns
team IDs
player rolling aggregates
tactic IDs
coach IDs
tactic stability fields

If advanced columns are missing, the wrapper falls back to dataset-level defaults. That keeps the interface runnable, but prediction quality may be weaker than in the full training environment.
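
The sketch below illustrates both ideas on an in-memory CSV: rejecting a file that lacks the minimum columns, and computing a dataset-level default for an optional column. The `home_elo` column name, the helper names, and the choice of the column mean as the default are assumptions for illustration; the wrapper's actual defaults are not documented here.

```python
import csv
import io
from statistics import mean

REQUIRED_COLUMNS = ("date", "home_team", "away_team", "home_goals", "away_goals")

def load_history(csv_file):
    """Read history rows, failing early if a minimum column is absent."""
    reader = csv.DictReader(csv_file)
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"history CSV is missing required columns: {missing}")
    return list(reader)

def column_default(rows, column, fallback):
    """Mean of an optional numeric column, or `fallback` if it has no values."""
    values = [float(row[column]) for row in rows if row.get(column) not in (None, "")]
    return mean(values) if values else fallback

text = (
    "date,home_team,away_team,home_goals,away_goals,home_elo\n"
    "2026-03-01,Athletic,Osasuna,2,1,1700\n"
    "2026-03-08,Mallorca,Athletic,0,1,1600\n"
)
rows = load_history(io.StringIO(text))
print(column_default(rows, "home_elo", fallback=1500.0))       # 1650.0
print(column_default(rows, "home_coach_id", fallback=-1.0))    # -1.0
```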

Output Shape

Typical output:

{
  "predicted_score": "1-0",
  "result_probabilities": {
    "home_win": 0.46,
    "draw": 0.31,
    "away_win": 0.23
  },
  "confidence_level": "medium",
  "confidence_score": 0.46,
  "confidence_margin": 0.15,
  "abstain_recommended": false
}
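
The two confidence numbers in this example follow from the probabilities, using the Field Glossary definitions: confidence_score is the top outcome probability and confidence_margin is the gap between the top two. The short sketch below reproduces them. The function name is hypothetical, and rounding to two decimals is only there to match the display above.

```python
def confidence_from_probabilities(probs):
    """Return (confidence_score, confidence_margin) from outcome probabilities."""
    ranked = sorted(probs.values(), reverse=True)
    score = ranked[0]               # highest outcome probability
    margin = ranked[0] - ranked[1]  # gap to the second-highest
    return round(score, 2), round(margin, 2)

probs = {"home_win": 0.46, "draw": 0.31, "away_win": 0.23}
print(confidence_from_probabilities(probs))  # (0.46, 0.15)
```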

Responsible Use

This package is designed for pre-match football prediction only.
It offers guidance, not guaranteed outcomes.
It is not validated for live in-play forecasting.
It is not validated for competitions outside Spanish La Liga.

Simple vs Advanced Output

For most web or mobile products, the main fields to show are:

predicted_score
result_probabilities
confidence_level
abstain_recommended
predicted_score_range when present

The helper methods for this are:

predict_match_simple(...)
predict_features_simple(...)

Advanced fields are also returned for developers and power users:

expected_home_goals
expected_away_goals
confidence_score
confidence_margin
raw_result_probabilities
decoder_diagnostics
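
One way a product might turn the main display fields into a score card is sketched below. The `score_card` function and its wording are hypothetical. Since the exact structure of predicted_score_range is not specified here, the sketch prints it as-is rather than assuming a layout.

```python
def score_card(result):
    """Format the main display fields of a prediction as plain text."""
    probs = result["result_probabilities"]
    lines = [
        f"Predicted score: {result['predicted_score']}",
        "Home {home_win:.0%} / Draw {draw:.0%} / Away {away_win:.0%}".format(**probs),
        f"Confidence: {result['confidence_level']}",
    ]
    # Soften the exact-score claim when the model flags a fragile fixture.
    if result.get("abstain_recommended"):
        lines.append("Note: treat the exact score with caution.")
    if "predicted_score_range" in result:
        lines.append(f"Score range: {result['predicted_score_range']}")
    return "\n".join(lines)

result = {
    "predicted_score": "1-0",
    "result_probabilities": {"home_win": 0.46, "draw": 0.31, "away_win": 0.23},
    "confidence_level": "medium",
    "abstain_recommended": False,
}
print(score_card(result))
```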

Field Glossary

predicted_score
- the final exact score chosen by the model
predicted_home_goals
- the home-goal side of the chosen scoreline
predicted_away_goals
- the away-goal side of the chosen scoreline
result_probabilities
- the calibrated probabilities for home_win, draw, and away_win
raw_result_probabilities
- the pre-calibration probabilities before temperature scaling
expected_home_goals
- the model's expected goals estimate for the home team before final score decoding
expected_away_goals
- the model's expected goals estimate for the away team before final score decoding
confidence_level
- a simple label: high, medium, or low
confidence_score
- the top outcome probability after calibration
confidence_margin
- the gap between the highest and second-highest outcome probabilities
abstain_recommended
- true when the fixture is fragile enough that an exact-score claim should be treated cautiously
predicted_score_range
- an optional home/away score band returned when the fixture is fragile
decoder_diagnostics
- advanced explanation fields for developers
top_outcome
- the outcome direction with the highest calibrated probability
top_outcome_probability
- the probability of that top outcome
second_outcome_probability
- the probability of the second-strongest outcome
draw_probability
- the calibrated draw probability
xg_delta
- expected_home_goals - expected_away_goals
- positive values lean home
- negative values lean away
- near-zero values indicate a more balanced match
close_call_draw_override
- true when a near-tied outcome distribution and small expected-goal gap push the decoder toward a draw
outcome_enforced
- true when the outcome model is strong enough that the decoder forces the final score to match that direction
specialist_rule_triggered
- true when an internal score-adjustment rule fires
specialist_rule_name
- the name of that rule, if one was used
request
- echoes the home_team, away_team, and match_date passed to predict_match() or predict_match_simple()
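
The glossary describes raw_result_probabilities as the values before temperature scaling. For readers unfamiliar with the technique, the sketch below shows textbook temperature scaling applied to a probability vector: divide the log-probabilities by a temperature T, then renormalise. The temperature of 1.5 is made up for illustration, and whether the bundle applies the scaling in exactly this form is an assumption.

```python
import math

def temperature_scale(raw_probs, temperature):
    """Apply softmax(log(p) / T) to a dict of strictly positive probabilities."""
    logits = {key: math.log(p) / temperature for key, p in raw_probs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {key: math.exp(value - peak) for key, value in logits.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}

raw = {"home_win": 0.5, "draw": 0.3, "away_win": 0.2}
softened = temperature_scale(raw, temperature=1.5)

# T > 1 pulls the distribution toward uniform; T = 1 leaves it unchanged.
print(softened["home_win"] < raw["home_win"])  # True
```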

Example Supported Usage

predict_match("Athletic", "Osasuna", "2026-04-21")
predict_match("Girona FC", "Real Betis", "2026-04-21")
predict_match("Mallorca", "Valencia", "2026-04-21")
batch CSV prediction for a fixture list
direct feature-row inference for power users

What This Model Does Not Do

It does not fetch fresh match history by itself.
It is not a general multi-league model release.
It cannot resolve team names or build pre-match context without a compatible history CSV.
It does not include application-layer components such as app or database logic.
It is not a betting guarantee engine.

Versioning

This bundle ships the model artifact:

la_liga_score_predictor

This bundle ships the public package version:

2026.04.1

Planned public cadence:

two releases per month

Companion Documents

MODEL_CARD.md
RELEASE_GUIDE.md
EVALUATION_SUMMARY.md
DATA_FORMAT.md
FAQ.md
QUICK_PUBLISH_CHECKLIST.md
CHANGELOG.md
ARTIFACTS_SHA256.txt
