Post
2017
Introducing the github-top-code dataset: a curated dataset of 1.3M+ source code files from GitHub's top-ranked developers.
I collected the best source code files from GitHub's highest-trending developers of all time and compiled them into a dataset for training LLMs to write well-structured, production-grade code.
#dataset #codedataset #pretraining
ronantakizawa/github-top-code