82% of code on GitHub is copied from existing files
That is the finding of a recent study by researchers from the University of California, Irvine, Microsoft Research, Czech Technical University, and Northeastern University.
They studied 4.5 million original projects on GitHub, containing 482 million files in total. Of those, only 85 million files (17.63%) are original; the rest are copies.
Most JavaScript projects contain copied files
The study looks only at projects written in C++, Java, JavaScript, and Python. JavaScript comes first: 94% of its files are exact clones of another file (based on file hashes). C++ ranks second with 73%, followed by Python with 71% and Java with 40%. The researchers also compared file contents (based on token hashes), and the results were similar.
Figure: the number of files copied on GitHub
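The two comparisons mentioned above, file hash and token hash, can be illustrated with a small sketch. This is not the researchers' tooling: the tokenizer, the function names, and the sample files are all hypothetical, and the duplicate ratio is assumed to mean "files beyond the first copy in each group, divided by all files".

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical tokenizer: a run of word characters, or one non-space symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def file_hash(text):
    """Hash the raw contents: only character-for-character copies match."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def token_hash(text):
    """Hash the token sequence, so whitespace and layout changes are ignored."""
    tokens = TOKEN_RE.findall(text)
    return hashlib.sha256("\x00".join(tokens).encode("utf-8")).hexdigest()

def duplicate_ratio(files, hasher):
    """Share of files that repeat a hash already seen (assumed definition)."""
    groups = defaultdict(list)
    for path, text in files.items():
        groups[hasher(text)].append(path)
    return (len(files) - len(groups)) / len(files)

# Made-up sample data: two exact copies, one reformatted copy, one unique file.
files = {
    "a/util.js": "function add(a, b) { return a + b; }\n",
    "b/util.js": "function add(a, b) { return a + b; }\n",
    "c/util.js": "function add(a,b){\n  return a+b;\n}\n",
    "d/main.js": "console.log(add(1, 2));\n",
}

print(duplicate_ratio(files, file_hash))   # 0.25
print(duplicate_ratio(files, token_hash))  # 0.5
```

The file hash catches only the exact copy, while the token hash also catches the reformatted one. In this toy sample the two measures differ; the study reports that on real GitHub data they gave similar results.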
The reason is NPM
NPM is a package manager for libraries (packages or modules) used on both the client and the server in JavaScript projects. It is currently the world's largest package registry, with over 350,000 libraries, more than double the runner-up, Apache Maven.
Because NPM offers so many useful libraries, many developers rely on it. As a result, JavaScript projects import more libraries than projects in other languages, and the amount of reused code is correspondingly higher.
Why this research matters for other studies
The researchers note that Git, the source control system behind GitHub, is built to encourage "forking" (copying) projects. But a lot of code is copied without forking at all, by copying individual files or even entire libraries.
They consider the result important for two reasons. First, GitHub may need to reassess the real scale of its data. Second, a growing body of research relies on the huge number of open source projects available on GitHub, and this kind of code copying will affect the results of those studies.
The raw data for this study can be downloaded here: http://mondego.ics.uci.edu/projects/dejavu/
The full study is available at http://janvitek.org/pubs/oopsla17b.pdf and https://dl.acm.org/citation.cfm?doid=3152284.3133908