La Liga Score Predictor

What This Release Is

This is a public football prediction bundle for Spanish La Liga.

It is built for simple pre-match score prediction from structured historical match data.

Public package version:

  • 2026.04.1

Scope:

  • supported competition: Spanish La Liga only
  • intended usage: pre-match prediction for La Liga fixtures only

It does not include:

  • non-prediction application code
  • other internal model lines
  • private ingestion code
  • private infrastructure

Competition Scope

  • This release supports Spanish La Liga matches only.
  • It is not packaged or validated for other leagues.

Who This Is For

  • developers evaluating a football prediction model package
  • teams integrating pre-match score prediction into their own apps or internal tools
  • ML or analytics users who want a documented La Liga inference bundle

This release is strongest as:

  • a technical package
  • an integration starting point
  • a reproducible model bundle for experimentation

It is not positioned as:

  • a hosted prediction API
  • a live data service
  • a no-input prediction engine that already knows every future fixture context

The simplest way to think about it is:

  • this is a model-and-inference bundle
  • it is not a bundled football data service

Important Reality Check

This bundle ships:

  • the trained champion model
  • the feature-building wrapper
  • synthetic sample CSVs
  • runnable examples

This bundle does not ship:

  • a full production historical La Liga dataset
  • a built-in live data feed
  • a guarantee of matching an internal/private prediction environment exactly

That means:

  • predict_match(...) still needs compatible historical match data
  • the included sample CSVs are for demonstration and onboarding
  • exact outputs depend on the history data provided to the feature builder
  • using different history data can lead to different predictions, even with the same model artifact

For real upcoming-match prediction, users must supply compatible historical match data so the feature builder can compute pre-match context.

What It Supports

  • Predicting home goals
  • Predicting away goals
  • Predicting final scoreline
  • Predicting home / draw / away probabilities
  • Returning confidence level
  • Returning confidence score
  • Returning confidence margin
  • Returning abstain / score-range signals for fragile matches
  • Predicting from a full numeric feature row
  • Predicting from home_team, away_team, and match_date when a compatible history CSV is available
  • Batch prediction from a CSV of fixtures

How The 48-Signal Model Relates To The CSV

The trained model uses 48 numeric signals at inference time.

Those 48 signals do not mean your history CSV must literally contain 48 raw columns.

In the public package, those signals come from a mix of:

  • values directly present in the history CSV
  • rolling features derived by the wrapper from past match rows
  • fallback defaults when richer optional columns are not available

That means:

  • the package can still run with a thinner history CSV
  • prediction quality is better when the history CSV is richer
  • the included sample demonstrates the recommended shape, not just the minimum runnable one
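
As a rough illustration of how a rolling signal can be derived from raw match rows, with a fallback when no history exists, here is a minimal sketch. The function name avg_goals_last_n, the row layout, and the default value of 1.0 are all assumptions made for this example; the wrapper's real feature logic and defaults may differ.

```python
from datetime import date

# Hypothetical history rows: (match_date, home_team, away_team, home_goals, away_goals)
HISTORY = [
    (date(2026, 3, 1), "Athletic", "Osasuna", 2, 0),
    (date(2026, 3, 8), "Mallorca", "Athletic", 1, 1),
    (date(2026, 3, 15), "Athletic", "Valencia", 3, 1),
]


def avg_goals_last_n(history, team, before, n=5, default=1.0):
    """Average goals scored by `team` in its last `n` matches before `before`.

    Returns `default` (an assumed fallback value) when the team has no
    earlier matches in the history.
    """
    scored = []
    for match_date, home, away, home_goals, away_goals in sorted(history):
        if match_date >= before:
            continue  # use only matches strictly before the fixture date
        if home == team:
            scored.append(home_goals)
        elif away == team:
            scored.append(away_goals)
    recent = scored[-n:]
    return sum(recent) / len(recent) if recent else default


home_form = avg_goals_last_n(HISTORY, "Athletic", date(2026, 4, 21))
unknown_form = avg_goals_last_n(HISTORY, "Girona FC", date(2026, 4, 21))
print(home_form, unknown_form)
```

The second call shows the fallback path: a team with no prior rows still yields a number, which keeps the pipeline runnable but carries no real information about that team.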

Public API Methods

The public Python package exposes these main methods:

predict_match(home_team, away_team, match_date)

  • best for normal application use
  • builds features from a compatible history CSV
  • returns the full response shape, including advanced fields

Typical fields returned:

  • model_version
  • expected_home_goals
  • expected_away_goals
  • predicted_home_goals
  • predicted_away_goals
  • predicted_score
  • result_probabilities
  • raw_result_probabilities
  • confidence_level
  • confidence_score
  • confidence_margin
  • abstain_recommended
  • predicted_score_range when triggered
  • decoder_diagnostics
  • request

predict_match_simple(home_team, away_team, match_date)

  • best for product-facing score cards and lighter UI integrations
  • builds features from a compatible history CSV
  • returns the smaller public response shape

Typical fields returned:

  • model_version
  • predicted_home_goals
  • predicted_away_goals
  • predicted_score
  • result_probabilities
  • confidence_level
  • confidence_score
  • confidence_margin
  • abstain_recommended
  • predicted_score_range when triggered
  • request

predict_features(features)

  • best for advanced users who already manage engineered features themselves
  • expects the full numeric feature row
  • returns the full response shape, including advanced fields

Typical fields returned:

  • model_version
  • expected_home_goals
  • expected_away_goals
  • predicted_home_goals
  • predicted_away_goals
  • predicted_score
  • result_probabilities
  • raw_result_probabilities
  • confidence_level
  • confidence_score
  • confidence_margin
  • abstain_recommended
  • predicted_score_range when triggered
  • decoder_diagnostics

predict_features_simple(features)

  • best for advanced users who want the raw-feature path with a smaller response
  • expects the full numeric feature row
  • returns the smaller public response shape

Typical fields returned:

  • model_version
  • predicted_home_goals
  • predicted_away_goals
  • predicted_score
  • result_probabilities
  • confidence_level
  • confidence_score
  • confidence_margin
  • abstain_recommended
  • predicted_score_range when triggered
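
The relationship between the full and the smaller response shape can be pictured as a field filter. The sketch below is only an illustration built from the field lists above; to_simple is a hypothetical helper, the sample values are made up, and the package's own *_simple methods may build their response differently.

```python
# Field names taken from the "Typical fields returned" lists above.
SIMPLE_FIELDS = (
    "model_version",
    "predicted_home_goals",
    "predicted_away_goals",
    "predicted_score",
    "result_probabilities",
    "confidence_level",
    "confidence_score",
    "confidence_margin",
    "abstain_recommended",
    "predicted_score_range",  # only present when triggered
    "request",                # only present on the predict_match path
)


def to_simple(full_response):
    """Keep only the smaller public fields that are present in the response."""
    return {key: full_response[key] for key in SIMPLE_FIELDS if key in full_response}


# Made-up full response used purely to exercise the helper.
full = {
    "model_version": "2026.04.1",
    "expected_home_goals": 1.4,
    "expected_away_goals": 0.9,
    "predicted_home_goals": 1,
    "predicted_away_goals": 0,
    "predicted_score": "1-0",
    "result_probabilities": {"home_win": 0.46, "draw": 0.31, "away_win": 0.23},
    "raw_result_probabilities": {"home_win": 0.48, "draw": 0.29, "away_win": 0.23},
    "confidence_level": "medium",
    "confidence_score": 0.46,
    "confidence_margin": 0.15,
    "abstain_recommended": False,
    "decoder_diagnostics": {},
}

simple = to_simple(full)
print(sorted(simple))
```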

Public Files


  README.md
  LICENSE
  CHANGELOG.md
  RELEASE_GUIDE.md
  MODEL_CARD.md
  EVALUATION_SUMMARY.md
  DATA_FORMAT.md
  FAQ.md
  QUICK_PUBLISH_CHECKLIST.md
  ARTIFACTS_SHA256.txt
  requirements.txt
  pyproject.toml
  sample_history.csv
  sample_fixtures.csv
  predict_one.py
  predict_batch.py
  demo_cli.py
  smoke_test.py
  demo_notebook.ipynb
  la_liga_score_predictor/
    __init__.py
    predictor.py
    feature_builder.py
    artifacts/
      la_liga_score_predictor.json
      home_goals_model.cbm
      away_goals_model.cbm
      outcome_model.cbm

Installation

From inside the bundle directory:

python3 -m venv .venv
source .venv/bin/activate
pip install .

If you prefer, the dependency list is also available in requirements.txt.

Run The Included Example

This bundle includes an expanded synthetic sample_history.csv so the wrapper can be demonstrated without private data.

The included sample now provides:

  • 120 synthetic historical match rows
  • 20 Spanish La Liga team names
  • 35 CSV columns
  • enough match depth for rolling-form features in the demo flow
  • richer optional fields such as player aggregates and tactic-stability values

It is still:

  • synthetic
  • limited
  • not a substitute for a production historical dataset

From inside the bundle directory:

PYTHONPATH=. python3 predict_one.py

Expected result style:

  • request summary
  • predicted score
  • home/draw/away probabilities
  • confidence level
  • abstain flag

Batch Example

From inside the bundle directory:

PYTHONPATH=. python3 predict_batch.py
cat predictions_output.csv
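
For readers who want to wire batch prediction into their own code, the loop can be sketched as below. This is not the content of predict_batch.py: the fixture column names, the predict_fixtures helper, and the fake_predict stand-in are assumptions for illustration. In real use, the stand-in would be replaced by a bound method such as predictor.predict_match_simple.

```python
import csv
import io

# Assumed fixture layout; check sample_fixtures.csv for the real column names.
FIXTURES_CSV = """home_team,away_team,match_date
Athletic,Osasuna,2026-04-21
Girona FC,Real Betis,2026-04-21
"""


def predict_fixtures(fixtures_file, predict_fn):
    """Run predict_fn for every fixture row and collect flat output rows."""
    rows = []
    for fixture in csv.DictReader(fixtures_file):
        result = predict_fn(
            fixture["home_team"], fixture["away_team"], fixture["match_date"]
        )
        rows.append(
            {
                **fixture,
                "predicted_score": result["predicted_score"],
                "abstain_recommended": result["abstain_recommended"],
            }
        )
    return rows


def fake_predict(home_team, away_team, match_date):
    # Stand-in for a real predictor call; always returns the same values.
    return {"predicted_score": "1-0", "abstain_recommended": False}


predictions = predict_fixtures(io.StringIO(FIXTURES_CSV), fake_predict)
for row in predictions:
    print(row)
```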

Smoke Test

From inside the bundle directory:

PYTHONPATH=. python3 smoke_test.py

This confirms that:

  • the package imports
  • the bundled model files load
  • a sample prediction runs end to end
  • the output has the expected public fields
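
A similar field check can be added to an integration's own tests. The sketch below is a hypothetical helper, not the code in smoke_test.py. It assumes the always-present fields of the smaller predict_match_simple response listed earlier and treats predicted_score_range as optional.

```python
# Always-present fields of the smaller predict_match response shape,
# as listed under "Public API Methods"; predicted_score_range is optional.
REQUIRED_FIELDS = {
    "model_version",
    "predicted_home_goals",
    "predicted_away_goals",
    "predicted_score",
    "result_probabilities",
    "confidence_level",
    "confidence_score",
    "confidence_margin",
    "abstain_recommended",
    "request",
}


def missing_public_fields(result):
    """Return the required field names that are absent from a response dict."""
    return sorted(REQUIRED_FIELDS - set(result))


# Made-up, deliberately incomplete response to show a failing check.
incomplete = {"model_version": "2026.04.1", "predicted_score": "1-0"}
print(missing_public_fields(incomplete))
```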

CLI Demo

From inside the bundle directory:

PYTHONPATH=. python3 demo_cli.py \
  --home-team 'Girona FC' \
  --away-team 'Mallorca' \
  --match-date '2026-05-01' \
  --dataset-csv sample_history.csv \
  --pretty

Notebook Demo

A notebook starter is included for Kaggle or local notebook use:

demo_notebook.ipynb

Easiest Interface

The easiest interface is:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="sample_history.csv"
)

result = predictor.predict_match(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)

print(result["predicted_score"])
print(result["result_probabilities"])
print(result["confidence_level"])

What To Expect In Real Use

For a real application, the cleanest usage pattern is:

  1. load the predictor
  2. point it at your compatible historical match dataset
  3. call predict_match(...) for upcoming fixtures

Keep expectations clear:

  • same model + different history data = potentially different prediction
  • sample CSVs are for demos, tests, and onboarding
  • production-grade reproducibility requires production-grade historical context

What Makes A Good History CSV

At a practical level, a strong history CSV should provide:

  1. enough historical depth
    • not just a few rows
    • enough prior matches per team for rolling last-5 and last-10 features
  2. stable team naming
    • use one consistent naming style
    • avoid mixing many variants for the same club
  3. final scores for past matches
    • these are essential because the wrapper derives rolling form from them
  4. richer optional context where possible
    • team IDs
    • Elo values
    • tactic IDs
    • coach IDs
    • player aggregate columns
    • tactic stability columns
  5. coverage for the teams you want to predict
    • if a team is absent from the history CSV, predict_match(...) will fail
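
The depth and coverage points above can be checked before any prediction is attempted. The following is a minimal sketch with hypothetical helper names. The threshold of 10 matches is an assumption chosen to line up with the last-10 rolling features, not a documented requirement.

```python
from collections import Counter


def history_depth(rows):
    """Count how many history rows each team appears in, home or away."""
    counts = Counter()
    for row in rows:
        counts[row["home_team"]] += 1
        counts[row["away_team"]] += 1
    return counts


def teams_lacking_depth(rows, teams, min_matches=10):
    """Map each team with fewer than min_matches rows to its row count.

    A count of 0 means the team is absent from the history altogether.
    """
    counts = history_depth(rows)
    return {team: counts[team] for team in teams if counts[team] < min_matches}


# Tiny made-up history, far too shallow for real use.
rows = [
    {"home_team": "Athletic", "away_team": "Osasuna"},
    {"home_team": "Osasuna", "away_team": "Athletic"},
]
shallow = teams_lacking_depth(rows, ["Athletic", "Mallorca"], min_matches=2)
print(shallow)
```

Because the counts are keyed by exact team name, the same check also surfaces naming variants: a club spelled two ways shows up as two shallow teams.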

In short:

  • the minimum CSV shape lets the package run
  • a richer CSV shape gives the model real values for more of its 48 signals instead of fallback defaults

If you only want the product-facing fields, call predict_match_simple(...) on the same predictor instance:

simple_result = predictor.predict_match_simple(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)

Real Usage With Your Own History CSV

For real usage, replace sample_history.csv with your own compatible historical match CSV:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="/path/to/your/history.csv"
)

result = predictor.predict_match(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)

The same smaller response shape is available here too:

simple_result = predictor.predict_match_simple(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)

Raw Feature Interface

If you already have the numeric features, use the lower-level interface:

from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults()

features = {
    "home_avg_goals_last5_all": 1.4,
    "away_avg_goals_last5_all": 1.1,
    "home_avg_goals_last5_home": 1.6,
    "away_avg_goals_last5_away": 1.0,
    "home_avg_conceded_last5_all": 0.9,
    "away_avg_conceded_last5_all": 1.2,
    "home_avg_conceded_last5_home": 0.8,
    "away_avg_conceded_last5_away": 1.3,
    "home_win_rate_last10_all": 0.5,
    "away_win_rate_last10_all": 0.4,
    "home_win_rate_last10_home": 0.6,
    "away_win_rate_last10_away": 0.3,
    "home_draw_rate_last10": 0.2,
    "away_draw_rate_last10": 0.3,
    "home_goal_diff_last5": 2.0,
    "away_goal_diff_last5": -1.0,
    "home_rest_days": 6.0,
    "away_rest_days": 5.0,
    "home_elo_pre": 1715.0,
    "away_elo_pre": 1662.0,
    "elo_diff_pre": 53.0,
    "home_team_id": 12.0,
    "away_team_id": 19.0,
    "home_player_minutes_total_prev5": 4050.0,
    "away_player_minutes_total_prev5": 3970.0,
    "home_player_goals_total_prev5": 6.0,
    "away_player_goals_total_prev5": 4.0,
    "home_player_assists_total_prev5": 4.0,
    "away_player_assists_total_prev5": 3.0,
    "home_player_yellow_cards_total_prev5": 8.0,
    "away_player_yellow_cards_total_prev5": 10.0,
    "home_player_red_cards_total_prev5": 0.0,
    "away_player_red_cards_total_prev5": 0.0,
    "home_player_starters_count_prev5": 55.0,
    "away_player_starters_count_prev5": 55.0,
    "home_player_used_count_prev5": 76.0,
    "away_player_used_count_prev5": 73.0,
    "home_player_injured_count_prev5": 1.0,
    "away_player_injured_count_prev5": 2.0,
    "home_player_suspended_count_prev5": 0.0,
    "away_player_suspended_count_prev5": 1.0,
    "home_tactic_id": 4.0,
    "away_tactic_id": 7.0,
    "home_coach_id": 1012.0,
    "away_coach_id": 1048.0,
    "home_tactic_stability_last5": 0.8,
    "away_tactic_stability_last5": 0.4,
    "tactic_matchup_code": 4007.0,
}

result = predictor.predict_features(features)
print(result["predicted_score"])

For a smaller response:

simple_result = predictor.predict_features_simple(features)
print(simple_result)
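
Because this path expects the full numeric feature row, a quick sanity check before calling predict_features(...) can catch obvious mistakes. The helper below is a hypothetical sketch: it checks only the count and numeric type of the values, not the feature names, and how the package itself reacts to a malformed row is not covered here.

```python
import math


def check_feature_row(features, expected_count=48):
    """Return a list of human-readable problems found in a feature dict."""
    problems = []
    if len(features) != expected_count:
        problems.append(f"expected {expected_count} features, got {len(features)}")
    for name, value in features.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: not numeric")
        elif math.isnan(value) or math.isinf(value):
            problems.append(f"{name}: not finite")
    return problems


# Deliberately broken two-feature row to show the reported problems.
bad_row = {"home_rest_days": float("nan"), "away_rest_days": "5"}
print(check_feature_row(bad_row, expected_count=2))
```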

Dataset Requirement For predict_match()

For predict_match(home_team, away_team, match_date) to work, the predictor needs a compatible historical match CSV.

Minimum required columns:

  • date
  • home_team
  • away_team
  • home_goals
  • away_goals

Better results if your CSV also includes:

  • Elo columns
  • team ids
  • player rolling aggregates
  • tactic ids
  • coach ids
  • tactic stability fields

If advanced columns are missing, the wrapper falls back to dataset-level defaults. That keeps the interface runnable, but prediction quality may be weaker than in the full training environment.
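
A header check against the minimum required columns listed above is easy to run before handing a CSV to the predictor. This sketch uses only the standard library, and missing_columns is a hypothetical helper name. It assumes the column names appear in the first row exactly as written above.

```python
import csv
import io

# Minimum required columns, as listed in this section.
REQUIRED_COLUMNS = ("date", "home_team", "away_team", "home_goals", "away_goals")


def missing_columns(csv_file):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [column for column in REQUIRED_COLUMNS if column not in header]


# In real use, pass an open file instead: open("history.csv", newline="")
sample = io.StringIO("date,home_team,away_team,home_goals\n2026-03-01,Athletic,Osasuna,2\n")
print(missing_columns(sample))
```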

Output Shape

Typical output:

{
  "predicted_score": "1-0",
  "result_probabilities": {
    "home_win": 0.46,
    "draw": 0.31,
    "away_win": 0.23
  },
  "confidence_level": "medium",
  "confidence_score": 0.46,
  "confidence_margin": 0.15,
  "abstain_recommended": false
}
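
The confidence numbers in this sample follow the definitions in the Field Glossary: confidence_score is the top calibrated outcome probability and confidence_margin is the gap to the second-highest. The sketch below recomputes both from result_probabilities. The helper name and the rounding to four decimals are choices made for this example.

```python
def confidence_from_probabilities(result_probabilities):
    """Return (top probability, gap between top and second-highest)."""
    ranked = sorted(result_probabilities.values(), reverse=True)
    score = round(ranked[0], 4)
    margin = round(ranked[0] - ranked[1], 4)
    return score, margin


score, margin = confidence_from_probabilities(
    {"home_win": 0.46, "draw": 0.31, "away_win": 0.23}
)
print(score, margin)
```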

Responsible Use

  • This package is designed for pre-match football prediction only.
  • Its outputs are guidance, not guarantees.
  • It is not validated for live in-play forecasting.
  • It is not validated for competitions outside Spanish La Liga.

Simple vs Advanced Output

For most web or mobile products, the main fields to show are:

  • predicted_score
  • result_probabilities
  • confidence_level
  • abstain_recommended
  • predicted_score_range when present

The helper methods for this are:

  • predict_match_simple(...)
  • predict_features_simple(...)

Advanced fields are also returned for developers and power users:

  • expected_home_goals
  • expected_away_goals
  • confidence_score
  • confidence_margin
  • raw_result_probabilities
  • decoder_diagnostics

Recommended product approach:

  • use the simple fields in the main UI
  • keep advanced fields for debug, analytics, or an expandable details view
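
One way to turn the simple fields into a compact score card is sketched below. The layout, wording, and the score_card helper are illustrative assumptions. Because the exact structure of predicted_score_range is not specified here, the sketch prints it as-is when present.

```python
def score_card(result):
    """Format the main product-facing fields as a short text card."""
    probs = result["result_probabilities"]
    lines = [
        f"Predicted score: {result['predicted_score']}",
        f"Home {probs['home_win']:.0%} / Draw {probs['draw']:.0%} / Away {probs['away_win']:.0%}",
        f"Confidence: {result['confidence_level']}",
    ]
    if result.get("abstain_recommended"):
        lines.append("Note: fragile fixture, treat the exact score with caution")
    if "predicted_score_range" in result:
        lines.append(f"Score range: {result['predicted_score_range']}")
    return "\n".join(lines)


card = score_card(
    {
        "predicted_score": "1-0",
        "result_probabilities": {"home_win": 0.46, "draw": 0.31, "away_win": 0.23},
        "confidence_level": "medium",
        "abstain_recommended": False,
    }
)
print(card)
```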

Field Glossary

  • predicted_score
    • the final exact score chosen by the model
  • predicted_home_goals
    • the home-goal side of the chosen scoreline
  • predicted_away_goals
    • the away-goal side of the chosen scoreline
  • result_probabilities
    • the calibrated probabilities for home_win, draw, and away_win
  • raw_result_probabilities
    • the pre-calibration probabilities before temperature scaling
  • expected_home_goals
    • the model's expected goals estimate for the home team before final score decoding
  • expected_away_goals
    • the model's expected goals estimate for the away team before final score decoding
  • confidence_level
    • a simple label: high, medium, or low
  • confidence_score
    • the top outcome probability after calibration
  • confidence_margin
    • the gap between the highest and second-highest outcome probabilities
  • abstain_recommended
    • true when the fixture is fragile enough that an exact-score claim should be treated cautiously
  • predicted_score_range
    • an optional home/away score band returned when the fixture is fragile
  • decoder_diagnostics
    • advanced explanation fields for developers
  • top_outcome
    • the outcome direction with the highest calibrated probability
  • top_outcome_probability
    • the probability of that top outcome
  • second_outcome_probability
    • the probability of the second-strongest outcome
  • draw_probability
    • the calibrated draw probability
  • xg_delta
    • expected_home_goals - expected_away_goals
    • positive values lean home
    • negative values lean away
    • near-zero values indicate a more balanced match
  • close_call_draw_override
    • true when a near-tied outcome distribution and small expected-goal gap push the decoder toward a draw
  • outcome_enforced
    • true when the outcome model is strong enough that the decoder forces the final score to match that direction
  • specialist_rule_triggered
    • true when an internal score-adjustment rule fires
  • specialist_rule_name
    • the name of that rule, if one was used
  • request
    • echoes the home_team, away_team, and match_date used in predict_match()
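
As a small worked example of the xg_delta entry above, the sketch below computes the delta and labels its direction. The balance_threshold of 0.15 is an arbitrary value chosen for illustration. The package does not document a cutoff for "near-zero", so treat it as an assumption.

```python
def xg_lean(expected_home_goals, expected_away_goals, balance_threshold=0.15):
    """Return (xg_delta, lean) where lean is 'home', 'away', or 'balanced'."""
    xg_delta = expected_home_goals - expected_away_goals
    if abs(xg_delta) < balance_threshold:
        lean = "balanced"
    elif xg_delta > 0:
        lean = "home"
    else:
        lean = "away"
    return xg_delta, lean


for home_xg, away_xg in [(1.6, 1.0), (1.0, 1.05), (0.8, 1.5)]:
    delta, lean = xg_lean(home_xg, away_xg)
    print(f"{home_xg} vs {away_xg}: delta {delta:+.2f}, lean {lean}")
```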

Example Supported Usage

  • predict_match("Athletic", "Osasuna", "2026-04-21")
  • predict_match("Girona FC", "Real Betis", "2026-04-21")
  • predict_match("Mallorca", "Valencia", "2026-04-21")
  • batch CSV prediction for a fixture list
  • direct feature-row inference for power users

What This Model Does Not Do

  • It does not fetch fresh match history by itself.
  • It is not a general multi-league model release.
  • It cannot turn team names and dates into features without a compatible history CSV.
  • It does not include application-layer components such as app or database logic.
  • It is not a betting guarantee engine.

Versioning

This bundle ships the model artifact:

  • la_liga_score_predictor

This bundle ships the public package version:

  • 2026.04.1

Planned public cadence:

  • two releases per month

Companion Documents

  • MODEL_CARD.md
  • RELEASE_GUIDE.md
  • EVALUATION_SUMMARY.md
  • DATA_FORMAT.md
  • FAQ.md
  • QUICK_PUBLISH_CHECKLIST.md
  • CHANGELOG.md
  • ARTIFACTS_SHA256.txt