La Liga Score Predictor
What This Release Is
This is a public football prediction bundle for Spanish La Liga.
It is built for simple pre-match score prediction from structured historical match data.
Public package version:
2026.04.1
Scope:
- supported competition: Spanish La Liga only
- intended usage: pre-match prediction for La Liga fixtures only
It does not include:
- non-prediction application code
- other internal model lines
- private ingestion code
- private infrastructure
- app-specific code
Competition Scope
- This release supports Spanish La Liga matches only.
- It is not packaged or validated for other leagues.
Who This Is For
- developers evaluating a football prediction model package
- teams integrating pre-match score prediction into their own apps or internal tools
- ML or analytics users who want a documented La Liga inference bundle
This release is strongest as:
- a technical package
- an integration starting point
- a reproducible model bundle for experimentation
It is not positioned as:
- a hosted prediction API
- a live data service
- a no-input prediction engine that already knows every future fixture context
The simplest way to think about it is:
- this is a model-and-inference bundle
- it is not a bundled football data service
Important Reality Check
This bundle ships:
- the trained champion model
- the feature-building wrapper
- synthetic sample CSVs
- runnable examples
This bundle does not ship:
- a full production historical La Liga dataset
- a built-in live data feed
- a guarantee of matching an internal/private prediction environment exactly
That means:
- predict_match(...) still needs compatible historical match data
- the included sample CSVs are for demonstration and onboarding
- exact outputs depend on the history data provided to the feature builder
- using different history data can lead to different predictions, even with the same model artifact
For real upcoming-match prediction, users must supply compatible historical match data so the feature builder can compute pre-match context.
What It Supports
- Predicting home goals
- Predicting away goals
- Predicting final scoreline
- Predicting home / draw / away probabilities
- Returning confidence level
- Returning confidence score
- Returning confidence margin
- Returning abstain / score-range signals for fragile matches
- Predicting from a full numeric feature row
- Predicting from home_team, away_team, and match_date when a compatible history CSV is available
- Batch prediction from a CSV of fixtures
How The 48-Signal Model Relates To The CSV
The trained model uses 48 numeric signals at inference time.
Those 48 signals do not mean your history CSV must literally contain 48 raw columns.
In the public package, those signals come from a mix of:
- values directly present in the history CSV
- rolling features derived by the wrapper from past match rows
- fallback defaults when richer optional columns are not available
That means:
- the package can still run with a thinner history CSV
- prediction quality is better when the history CSV is richer
- the included sample demonstrates a better recommended shape, not just a minimum runnable shape
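As a rough illustration of the last two sources, the sketch below derives one rolling signal from past match rows and reads one optional column with a fallback. The helper names avg_goals_last_n and optional_value, the 1500.0 placeholder default, and the assumption of ISO-formatted dates are all hypothetical; this is not the wrapper's actual implementation.

```python
import csv
import io

# Tiny synthetic history using only the minimum required columns.
HISTORY_CSV = """date,home_team,away_team,home_goals,away_goals
2026-03-01,Athletic,Osasuna,2,0
2026-03-08,Valencia,Athletic,1,1
2026-03-15,Athletic,Mallorca,3,1
"""


def avg_goals_last_n(rows, team, before, n=5):
    """Average goals scored by `team` in its last `n` matches before `before`.

    Hypothetical helper. Dates are assumed to be ISO strings (YYYY-MM-DD),
    which sort and compare chronologically as plain text.
    """
    scored = []
    for row in sorted(rows, key=lambda r: r["date"]):
        if row["date"] >= before:
            continue  # only matches strictly before the fixture date
        if row["home_team"] == team:
            scored.append(int(row["home_goals"]))
        elif row["away_team"] == team:
            scored.append(int(row["away_goals"]))
    if not scored:
        raise ValueError(f"no prior matches for {team!r}")
    recent = scored[-n:]
    return sum(recent) / len(recent)


def optional_value(row, column, default):
    """Read an optional numeric column, or `default` when it is absent or empty."""
    raw = row.get(column)
    return float(raw) if raw not in (None, "") else default


rows = list(csv.DictReader(io.StringIO(HISTORY_CSV)))
signals = {
    # rolling feature derived from past rows
    "home_avg_goals_last5_all": avg_goals_last_n(rows, "Athletic", "2026-04-21"),
    # optional column missing from this thin CSV, so the placeholder is used
    "home_elo_pre": optional_value(rows[-1], "home_elo_pre", default=1500.0),
}
print(signals)
```

With a richer CSV that actually contains an Elo column, the second signal would come straight from the data instead of the placeholder.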
Public API Methods
The public Python package exposes these main methods:
predict_match(home_team, away_team, match_date)
- best for normal application use
- builds features from a compatible history CSV
- returns the full response shape, including advanced fields
Typical fields returned:
- model_version
- expected_home_goals
- expected_away_goals
- predicted_home_goals
- predicted_away_goals
- predicted_score
- result_probabilities
- raw_result_probabilities
- confidence_level
- confidence_score
- confidence_margin
- abstain_recommended
- predicted_score_range when triggered
- decoder_diagnostics
- request
predict_match_simple(home_team, away_team, match_date)
- best for product-facing score cards and lighter UI integrations
- builds features from a compatible history CSV
- returns the smaller public response shape
Typical fields returned:
- model_version
- predicted_home_goals
- predicted_away_goals
- predicted_score
- result_probabilities
- confidence_level
- confidence_score
- confidence_margin
- abstain_recommended
- predicted_score_range when triggered
- request
predict_features(features)
- best for advanced users who already manage engineered features themselves
- expects the full numeric feature row
- returns the full response shape, including advanced fields
Typical fields returned:
- model_version
- expected_home_goals
- expected_away_goals
- predicted_home_goals
- predicted_away_goals
- predicted_score
- result_probabilities
- raw_result_probabilities
- confidence_level
- confidence_score
- confidence_margin
- abstain_recommended
- predicted_score_range when triggered
- decoder_diagnostics
predict_features_simple(features)
- best for advanced users who want the raw-feature path with a smaller response
- expects the full numeric feature row
- returns the smaller public response shape
Typical fields returned:
- model_version
- predicted_home_goals
- predicted_away_goals
- predicted_score
- result_probabilities
- confidence_level
- confidence_score
- confidence_margin
- abstain_recommended
- predicted_score_range when triggered
Public Files
README.md
LICENSE
CHANGELOG.md
RELEASE_GUIDE.md
MODEL_CARD.md
EVALUATION_SUMMARY.md
DATA_FORMAT.md
FAQ.md
QUICK_PUBLISH_CHECKLIST.md
ARTIFACTS_SHA256.txt
requirements.txt
pyproject.toml
sample_history.csv
sample_fixtures.csv
predict_one.py
predict_batch.py
demo_cli.py
smoke_test.py
demo_notebook.ipynb
la_liga_score_predictor/
__init__.py
predictor.py
feature_builder.py
artifacts/
la_liga_score_predictor.json
home_goals_model.cbm
away_goals_model.cbm
outcome_model.cbm
Installation
From inside ``:
python3 -m venv .venv
source .venv/bin/activate
pip install .
If you prefer, the dependency list is also available in requirements.txt.
Run The Included Example
This bundle includes an expanded synthetic sample_history.csv so the wrapper can be demonstrated without private data.
The included sample now provides:
- 120 synthetic historical match rows
- 20 Spanish La Liga team names
- 35 CSV columns
- enough match depth for rolling-form features in the demo flow
- richer optional fields such as player aggregates and tactic-stability values
It is still:
- synthetic
- limited
- not a substitute for a production historical dataset
From inside ``:
PYTHONPATH=. python3 predict_one.py
Expected result style:
- request summary
- predicted score
- home/draw/away probabilities
- confidence level
- abstain flag
Batch Example
From inside ``:
PYTHONPATH=. python3 predict_batch.py
cat predictions_output.csv
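For readers who want to drive batch prediction from their own code, here is a minimal sketch of a fixture loop. It is not the contents of predict_batch.py: the fixture column names (home_team, away_team, match_date), the predict_fixtures helper, and the stand-in fake_predict function are assumptions made for illustration.

```python
import csv
import io

# Hypothetical fixture list; the real sample_fixtures.csv may use other columns.
FIXTURES_CSV = """home_team,away_team,match_date
Athletic,Osasuna,2026-04-21
Unknown FC,Valencia,2026-04-21
"""


def predict_fixtures(fixture_rows, predict_fn):
    """Call `predict_fn` for each fixture and collect flat output rows.

    A failing fixture is recorded with its error message instead of
    stopping the whole batch.
    """
    out = []
    for row in fixture_rows:
        try:
            result = predict_fn(row["home_team"], row["away_team"], row["match_date"])
            out.append({**row, "predicted_score": result["predicted_score"], "error": ""})
        except Exception as exc:  # broad on purpose: keep the batch running
            out.append({**row, "predicted_score": "", "error": str(exc)})
    return out


def fake_predict(home_team, away_team, match_date):
    """Stand-in for a real predictor so the sketch runs without the bundle."""
    if home_team == "Unknown FC":
        raise ValueError(f"team not in history: {home_team}")
    return {"predicted_score": "1-0"}


fixtures = list(csv.DictReader(io.StringIO(FIXTURES_CSV)))
batch = predict_fixtures(fixtures, fake_predict)
for line in batch:
    print(line)
```

With the real package loaded, predictor.predict_match_simple could be passed in place of fake_predict, since it takes home_team, away_team, and match_date in that order.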
Smoke Test
From inside ``:
PYTHONPATH=. python3 smoke_test.py
This confirms that:
- the package imports
- the bundled model files load
- a sample prediction runs end to end
- the output has the expected public fields
CLI Demo
From inside ``:
PYTHONPATH=. python3 demo_cli.py \
  --home-team 'Girona FC' \
  --away-team 'Mallorca' \
  --match-date '2026-05-01' \
  --dataset-csv sample_history.csv \
  --pretty
Notebook Demo
A notebook starter is included for Kaggle or local notebook use:
demo_notebook.ipynb
Easiest Interface
The easiest interface is:
from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="sample_history.csv"
)
result = predictor.predict_match(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)
print(result["predicted_score"])
print(result["result_probabilities"])
print(result["confidence_level"])
What To Expect In Real Use
For a real application, the cleanest usage pattern is:
- load the predictor
- point it at your compatible historical match dataset
- call predict_match(...) for upcoming fixtures
Keep expectations clear:
- same model + different history data = potentially different prediction
- sample CSVs are for demos, tests, and onboarding
- production-grade reproducibility requires production-grade historical context
What Makes A Good History CSV
At a practical level, a strong history CSV should provide:
- enough historical depth
  - not just a few rows
  - enough prior matches per team for rolling last-5 and last-10 features
- stable team naming
  - use one consistent naming style
  - avoid mixing many variants for the same club
- final scores for past matches
  - these are essential because the wrapper derives rolling form from them
- richer optional context where possible
  - team IDs
  - Elo values
  - tactic IDs
  - coach IDs
  - player aggregate columns
  - tactic stability columns
- coverage for the teams you want to predict
  - if a team is absent from the history CSV, predict_match(...) will fail
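A quick pre-flight check along these lines can catch thin or missing team coverage before calling predict_match(...). This is a sketch under stated assumptions: the helper names are hypothetical, the threshold of 10 prior matches is simply chosen to match the last-10 window, and dates are assumed to be ISO strings (YYYY-MM-DD).

```python
import csv
import io
from collections import Counter

HISTORY_CSV = """date,home_team,away_team,home_goals,away_goals
2026-03-01,Athletic,Osasuna,2,0
2026-03-08,Valencia,Athletic,1,1
2026-03-15,Athletic,Mallorca,3,1
"""


def prior_match_counts(rows, before_date):
    """Count matches per team played strictly before `before_date`."""
    counts = Counter()
    for row in rows:
        if row["date"] < before_date:  # ISO dates compare correctly as text
            counts[row["home_team"]] += 1
            counts[row["away_team"]] += 1
    return counts


def coverage_problems(rows, home_team, away_team, match_date, min_matches=10):
    """Return human-readable warnings about thin or missing team history."""
    counts = prior_match_counts(rows, match_date)
    problems = []
    for team in (home_team, away_team):
        n = counts.get(team, 0)
        if n == 0:
            problems.append(f"{team}: absent from history")
        elif n < min_matches:
            problems.append(f"{team}: only {n} prior matches (< {min_matches})")
    return problems


rows = list(csv.DictReader(io.StringIO(HISTORY_CSV)))
problems = coverage_problems(rows, "Athletic", "Girona FC", "2026-04-21")
for p in problems:
    print(p)
```

Because the check matches team names exactly, it also surfaces naming drift: a club that appears under two spellings will show up as two thin histories.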
In short:
- minimum CSV shape lets the package run
- richer CSV shape lets the model behave more like a serious prediction engine
If you only want the product-facing fields, use:
simple_result = predictor.predict_match_simple(
    home_team="Athletic",
    away_team="Osasuna",
    match_date="2026-04-21",
)
Real Usage With Your Own History CSV
For real usage, replace sample_history.csv with your own compatible historical match CSV:
from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults(
    dataset_csv_path="/path/to/your/history.csv"
)
result = predictor.predict_match(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)
The same smaller response shape is available here too:
simple_result = predictor.predict_match_simple(
    home_team="Real Madrid",
    away_team="Valencia",
    match_date="2026-05-10",
)
Raw Feature Interface
If you already have the numeric features, use the lower-level interface:
from la_liga_score_predictor import LaLigaScorePredictor

predictor = LaLigaScorePredictor.from_defaults()

# All 48 numeric signals the model expects at inference time.
features = {
    "home_avg_goals_last5_all": 1.4,
    "away_avg_goals_last5_all": 1.1,
    "home_avg_goals_last5_home": 1.6,
    "away_avg_goals_last5_away": 1.0,
    "home_avg_conceded_last5_all": 0.9,
    "away_avg_conceded_last5_all": 1.2,
    "home_avg_conceded_last5_home": 0.8,
    "away_avg_conceded_last5_away": 1.3,
    "home_win_rate_last10_all": 0.5,
    "away_win_rate_last10_all": 0.4,
    "home_win_rate_last10_home": 0.6,
    "away_win_rate_last10_away": 0.3,
    "home_draw_rate_last10": 0.2,
    "away_draw_rate_last10": 0.3,
    "home_goal_diff_last5": 2.0,
    "away_goal_diff_last5": -1.0,
    "home_rest_days": 6.0,
    "away_rest_days": 5.0,
    "home_elo_pre": 1715.0,
    "away_elo_pre": 1662.0,
    "elo_diff_pre": 53.0,
    "home_team_id": 12.0,
    "away_team_id": 19.0,
    "home_player_minutes_total_prev5": 4050.0,
    "away_player_minutes_total_prev5": 3970.0,
    "home_player_goals_total_prev5": 6.0,
    "away_player_goals_total_prev5": 4.0,
    "home_player_assists_total_prev5": 4.0,
    "away_player_assists_total_prev5": 3.0,
    "home_player_yellow_cards_total_prev5": 8.0,
    "away_player_yellow_cards_total_prev5": 10.0,
    "home_player_red_cards_total_prev5": 0.0,
    "away_player_red_cards_total_prev5": 0.0,
    "home_player_starters_count_prev5": 55.0,
    "away_player_starters_count_prev5": 55.0,
    "home_player_used_count_prev5": 76.0,
    "away_player_used_count_prev5": 73.0,
    "home_player_injured_count_prev5": 1.0,
    "away_player_injured_count_prev5": 2.0,
    "home_player_suspended_count_prev5": 0.0,
    "away_player_suspended_count_prev5": 1.0,
    "home_tactic_id": 4.0,
    "away_tactic_id": 7.0,
    "home_coach_id": 1012.0,
    "away_coach_id": 1048.0,
    "home_tactic_stability_last5": 0.8,
    "away_tactic_stability_last5": 0.4,
    "tactic_matchup_code": 4007.0,
}

result = predictor.predict_features(features)
print(result["predicted_score"])
For a smaller response:
simple_result = predictor.predict_features_simple(features)
print(simple_result)
Dataset Requirement For predict_match()
For predict_match(home_team, away_team, match_date) to work, the predictor needs a compatible historical match CSV.
Minimum required columns:
- date
- home_team
- away_team
- home_goals
- away_goals
Better results if your CSV also includes:
- Elo columns
- team ids
- player rolling aggregates
- tactic ids
- coach ids
- tactic stability fields
If advanced columns are missing, the wrapper falls back to dataset-level defaults. That keeps the interface runnable, but prediction quality may be weaker than the full training environment.
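Before pointing the predictor at a new history file, it can help to confirm the minimum columns are present. The sketch below checks only the five required column names listed above; the missing_required_columns helper is a hypothetical name, not part of the package.

```python
import csv
import io

# Minimum required columns, as listed in this README.
REQUIRED_COLUMNS = ("date", "home_team", "away_team", "home_goals", "away_goals")


def missing_required_columns(csv_text):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


good_csv = "date,home_team,away_team,home_goals,away_goals\n2026-03-01,Athletic,Osasuna,2,0\n"
bad_csv = "date,home_team,away_team,home_goals\n2026-03-01,Athletic,Osasuna,2\n"

missing_good = missing_required_columns(good_csv)
missing_bad = missing_required_columns(bad_csv)
print(missing_good, missing_bad)
```

For a file on disk, the same check works by reading the file contents and passing the text in.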
Output Shape
Typical output:
{
"predicted_score": "1-0",
"result_probabilities": {
"home_win": 0.46,
"draw": 0.31,
"away_win": 0.23
},
"confidence_level": "medium",
"confidence_score": 0.46,
"confidence_margin": 0.15,
"abstain_recommended": false
}
Responsible Use
- This package is designed for pre-match football prediction only.
- It is guidance software, not a guarantee tool.
- It is not validated for live in-play forecasting.
- It is not validated for competitions outside Spanish La Liga.
Simple vs Advanced Output
For most web or mobile products, the main fields to show are:
- predicted_score
- result_probabilities
- confidence_level
- abstain_recommended
- predicted_score_range when present
The helper methods for this are:
- predict_match_simple(...)
- predict_features_simple(...)
Advanced fields are also returned for developers and power users:
- expected_home_goals
- expected_away_goals
- confidence_score
- confidence_margin
- raw_result_probabilities
- decoder_diagnostics
Recommended product approach:
- use the simple fields in the main UI
- keep advanced fields for debug, analytics, or an expandable details view
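One way to apply that split in application code is to separate a full response into a main-UI view and a details view. This is an illustrative sketch only: split_for_ui is a hypothetical helper, and the sample response values are made up.

```python
# Fields this README recommends showing in the main UI.
SIMPLE_FIELDS = (
    "predicted_score",
    "result_probabilities",
    "confidence_level",
    "abstain_recommended",
    "predicted_score_range",  # only present when triggered
)


def split_for_ui(result):
    """Split a full response dict into (main_view, details_view)."""
    main = {key: result[key] for key in SIMPLE_FIELDS if key in result}
    details = {key: value for key, value in result.items() if key not in SIMPLE_FIELDS}
    return main, details


# Made-up response for illustration; real values come from the predictor.
full_result = {
    "predicted_score": "1-0",
    "result_probabilities": {"home_win": 0.46, "draw": 0.31, "away_win": 0.23},
    "confidence_level": "medium",
    "confidence_score": 0.46,
    "confidence_margin": 0.15,
    "abstain_recommended": False,
    "expected_home_goals": 1.4,
    "expected_away_goals": 1.1,
}

main_view, details_view = split_for_ui(full_result)
print(main_view)
print(details_view)
```

When only the main view is ever needed, calling predict_match_simple(...) or predict_features_simple(...) avoids the split altogether.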
Field Glossary
- predicted_score: the final exact score chosen by the model
- predicted_home_goals: the home-goal side of the chosen scoreline
- predicted_away_goals: the away-goal side of the chosen scoreline
- result_probabilities: the calibrated probabilities for home_win, draw, and away_win
- raw_result_probabilities: the pre-calibration probabilities before temperature scaling
- expected_home_goals: the model's expected goals estimate for the home team before final score decoding
- expected_away_goals: the model's expected goals estimate for the away team before final score decoding
- confidence_level: a simple label: high, medium, or low
- confidence_score: the top outcome probability after calibration
- confidence_margin: the gap between the highest and second-highest outcome probabilities
- abstain_recommended: true when the fixture is fragile enough that an exact-score claim should be treated cautiously
- predicted_score_range: an optional home/away score band returned when the fixture is fragile
- decoder_diagnostics: advanced explanation fields for developers
  - top_outcome: the outcome direction with the highest calibrated probability
  - top_outcome_probability: the probability of that top outcome
  - second_outcome_probability: the probability of the second-strongest outcome
  - draw_probability: the calibrated draw probability
  - xg_delta: expected_home_goals - expected_away_goals
    - positive values lean home
    - negative values lean away
    - near-zero values indicate a more balanced match
  - close_call_draw_override: true when a near-tied outcome distribution and small expected-goal gap push the decoder toward a draw
  - outcome_enforced: true when the outcome model is strong enough that the decoder forces the final score to match that direction
  - specialist_rule_triggered: true when an internal score-adjustment rule fires
  - specialist_rule_name: the name of that rule, if one was used
- request: echoes the home_team, away_team, and match_date used in predict_match()
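The definitions of confidence_score and confidence_margin can be checked by hand against the numbers in the Output Shape example (0.46, 0.31, 0.23). The sketch below recomputes both from result_probabilities; the function name is hypothetical, the rounding is only there to hide floating-point noise, and the expected-goals values used for xg_delta are made up.

```python
def confidence_from_probabilities(probabilities):
    """Derive confidence_score and confidence_margin as defined in the glossary."""
    ranked = sorted(probabilities.values(), reverse=True)
    return {
        # top outcome probability
        "confidence_score": ranked[0],
        # gap between the highest and second-highest probabilities
        "confidence_margin": round(ranked[0] - ranked[1], 6),
    }


result_probabilities = {"home_win": 0.46, "draw": 0.31, "away_win": 0.23}
confidence = confidence_from_probabilities(result_probabilities)
print(confidence)

# xg_delta follows the same spirit: home expectation minus away expectation.
expected_home_goals, expected_away_goals = 1.4, 1.1  # illustrative values
xg_delta = round(expected_home_goals - expected_away_goals, 6)
print(xg_delta)  # positive, so this example leans home
```

The results agree with the confidence_score of 0.46 and confidence_margin of 0.15 shown in the typical output.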
Example Supported Usage
- predict_match("Athletic", "Osasuna", "2026-04-21")
- predict_match("Girona FC", "Real Betis", "2026-04-21")
- predict_match("Mallorca", "Valencia", "2026-04-21")
- batch CSV prediction for a fixture list
- direct feature-row inference for power users
What This Model Does Not Do
- It does not fetch fresh match history by itself.
- It is not a general multi-league model release.
- It does not know team names and dates magically without a compatible history CSV.
- It does not include application-layer components.
- It does not include app or database logic.
- It is not a betting guarantee engine.
Versioning
This bundle ships the model artifact:
la_liga_score_predictor
This bundle ships the public package version:
2026.04.1
Planned public cadence:
- two releases per month
Companion Documents
- MODEL_CARD.md
- RELEASE_GUIDE.md
- EVALUATION_SUMMARY.md
- DATA_FORMAT.md
- FAQ.md
- QUICK_PUBLISH_CHECKLIST.md
- CHANGELOG.md
- ARTIFACTS_SHA256.txt