Real-Time Scoreboard Data & Model Accuracy #1

Open
ethan-haas opened this issue May 21, 2023 · 0 comments

ethan-haas commented May 21, 2023

Have you explored the ESPN scoreboard and noticed that its updates lag behind the odds changes on DraftKings? Both ESPN and MLB seem to be falling significantly behind in providing real-time game statistics. Given this, I anticipate that the accuracy of any model trained with this data will be compromised.

One approach I've considered is to source the season records from ESPN, since DraftKings doesn't provide this information, and then extract everything else directly from DraftKings. You could iterate over the HTML source of the main page (the user interface tabs for each game) and record each event ID. You can then use these event IDs to fetch live scores with a simple synchronous request to each game URL.

Here's the main URL for reference: https://live.draftkings.com/sports/mlb/seasons/2023/date/20230521/games/all
Here's a game URL for reference: https://live.draftkings.com/sports/mlb/seasons/2023/date/20230521/games/5923551
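
To make the idea concrete, here is a minimal sketch of the event ID step. It assumes the page source contains links of the form `/games/<numeric id>`, which I have not verified against the live page; `extract_event_ids`, `game_url`, and the second ID in the sample HTML are made up for illustration. The actual request is left out on purpose, since any HTTP client would do.

```python
import re

# URL pattern taken from the two reference URLs above.
GAME_URL = (
    "https://live.draftkings.com/sports/mlb/seasons/"
    "{season}/date/{date}/games/{event_id}"
)

# Assumption: each game tab links to ".../games/<numeric id>" in the HTML source.
GAME_LINK = re.compile(r"/games/(\d+)")


def extract_event_ids(html):
    """Return the unique event IDs found in the HTML, in order of first appearance."""
    seen = {}
    for event_id in GAME_LINK.findall(html):
        seen.setdefault(event_id, None)
    return list(seen)


def game_url(season, date, event_id):
    """Build the per-game URL for one event ID."""
    return GAME_URL.format(season=season, date=date, event_id=event_id)


# Hypothetical snippet of page source; 5923552 is an invented ID.
sample_html = (
    '<a href="/sports/mlb/seasons/2023/date/20230521/games/5923551">Game 1</a>'
    '<a href="/sports/mlb/seasons/2023/date/20230521/games/5923552">Game 2</a>'
    '<a href="/sports/mlb/seasons/2023/date/20230521/games/5923551">Game 1</a>'
)

for event_id in extract_event_ids(sample_html):
    print(game_url(2023, "20230521", event_id))
```

If the IDs turn out to be embedded in a script tag or JSON blob instead of links, only the regular expression would need to change.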

Also, in my experience with data collection and model training, I've noticed that the model tends to overfit this dataset significantly. It's unnecessary to run 150 epochs; this merely results in the model memorizing the training set and consequently underperforming on new or live data (like the kind you'd update in the user interface). I've found that using fewer epochs, with early stopping, helps to curb this issue, especially as the amount of training data grows. I've graphed the training and validation loss to check whether the model is overfitting or underfitting. Even with 40k examples, or the 70k I have now, I can see the model overfitting after roughly 40-50 epochs. If you're interested, I'd be happy to share my code.
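
The early stopping rule I mean can be sketched independently of any framework. This is only an illustration: `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are made up for the example and are not taken from my training runs or from this repository.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best validation loss: remember where it happened.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop here.
            return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_losses)


# Made-up validation losses: improving for 4 epochs, then getting worse.
losses = [0.70, 0.65, 0.62, 0.61, 0.63, 0.64, 0.66, 0.67]
best, stopped = early_stopping_epoch(losses, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stopped}")
```

In practice you would keep the weights from the best epoch rather than the last one. Most training frameworks ship an equivalent callback, so this logic rarely needs to be written by hand.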

ESPN lags behind, so when the odds change, the mismatch hurts the model's performance. If you have time, I would appreciate it if you could redo the scoreboard and implement a way to get real-time scores and data. We would have to restart data capture from scratch. I don't mind capturing data for several weeks, or even months, to build the dataset and share it with the community.

Another problem is that some captured rows are saved with 0.0 for the predictions, which will throw off the model. There is also a class imbalance in the dataset. I found that two-thirds of my samples favor the home team, so every new prediction is now biased towards the home team, which will surely not be correct for every home team.
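
Both problems can be handled before training. The sketch below is one possible approach, not what the repository currently does: it drops rows whose prediction was saved as exactly 0.0 and computes inverse-frequency class weights. The field names `prediction` and `favored`, the helper names, and the sample rows are all hypothetical.

```python
from collections import Counter


def drop_zero_predictions(rows, field="prediction"):
    """Keep only rows whose prediction field is not exactly 0.0.

    Rows that lack the field entirely are kept; handle those separately.
    """
    return [row for row in rows if row.get(field) != 0.0]


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Made-up rows: one bad capture (0.0) and a 2:1 home/away imbalance.
rows = [
    {"prediction": 0.61, "favored": "home"},
    {"prediction": 0.00, "favored": "home"},
    {"prediction": 0.58, "favored": "home"},
    {"prediction": 0.47, "favored": "away"},
    {"prediction": 0.66, "favored": "home"},
    {"prediction": 0.44, "favored": "away"},
    {"prediction": 0.55, "favored": "home"},
]

clean = drop_zero_predictions(rows)
weights = balanced_class_weights([row["favored"] for row in clean])
print(len(clean), weights)
```

With a 2:1 split, the minority class ends up with twice the weight of the majority class. These weights can be passed to the loss function; resampling the dataset is an alternative.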

I have been keeping track of the moneyline picks the NN makes before each game for about 5 days now, and it has achieved over 64% accuracy. This is decent given the data; even the sportsbooks make mistakes on their odds daily. I think that pairing live scores with live odds will greatly improve the model's understanding of why the odds have risen or fallen due to an event in the game.

Another thing I am curious about: what do you mean by "When a batter walks, ESPN will mark it with 4 balls in the count AND a man on second temporarily. If there are 4 balls and a man is on, count it as 0 balls."? Please explain this in more detail, because as written it does not make sense in baseball terms. With a clearer explanation, I will be able to come up with a solution.
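
For reference, this is how I read the quoted rule if taken literally. It is only my interpretation and may not match what the scoreboard code actually does; `normalize_balls` and its parameters are hypothetical names.

```python
def normalize_balls(balls, runners_on):
    """Literal reading of the quoted rule.

    If the feed reports 4 balls while at least one runner is on base,
    treat the ball count as 0. Otherwise leave it unchanged.
    """
    if balls >= 4 and runners_on > 0:
        return 0
    return balls


print(normalize_balls(4, runners_on=1))  # walk artifact from the feed
print(normalize_balls(2, runners_on=1))  # ordinary count
```

If this reading is right, the rule is a workaround for a transient state in the ESPN feed rather than a statement about baseball itself.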

You may also reach out to me on LinkedIn if you have any questions, ideas, or even disagreements: https://www.linkedin.com/in/ethan-z-haas/

Thanks,

Ethan

@ethan-haas ethan-haas changed the title Scoreboard Scoreboard Data & Model Accuracy May 21, 2023
@ethan-haas ethan-haas changed the title Scoreboard Data & Model Accuracy Real-Time Scoreboard Data & Model Accuracy May 21, 2023