82% of the code on GitHub consists of copies of existing files
Researchers studied 4.5 million original (non-fork) projects on GitHub, containing 482 million files in total. Of these, only 85 million, or 17.63%, are original.
Most JavaScript projects contain copied files
The study only looks at projects written in C++, Java, JavaScript and Python. In JavaScript, 94% of files are exact clones of another file (based on file hashes). C++ ranks second with 73%, followed by Python and Java with 71% and 40% respectively. The researchers also compared file contents (based on token hashes), and the results were similar.
Figure: results for the number of files copied on GitHub
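The difference between the file-hash and token-hash comparisons mentioned above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the regex tokenizer, the function names (file_hash, token_hash, clone_share) and the sample files are assumptions made for illustration, and "clone share" is taken here to mean the fraction of files that are not the first occurrence of their hash.

```python
import hashlib
import re
from collections import defaultdict

# Crude, language-agnostic tokenizer (an assumption, not the study's):
# words and single punctuation characters, with all whitespace discarded.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def file_hash(text: str) -> str:
    """Hash the raw content: only byte-identical files share a hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def token_hash(text: str) -> str:
    """Hash the token sequence: files differing only in layout share a hash."""
    tokens = TOKEN_RE.findall(text)
    return hashlib.sha256("\x00".join(tokens).encode("utf-8")).hexdigest()


def clone_share(files: dict, key) -> float:
    """Fraction of files that duplicate an earlier file under the given hash."""
    groups = defaultdict(list)
    for path, text in files.items():
        groups[key(text)].append(path)
    return 1 - len(groups) / len(files)


# Hypothetical sample corpus: two byte-identical files, one reformatted copy,
# and one unrelated file.
files = {
    "a/util.js": "function add(a, b) { return a + b; }\n",
    "b/util.js": "function add(a, b) { return a + b; }\n",
    "c/util.js": "function add(a,b){\n  return a+b;\n}\n",
    "d/main.js": "console.log(1);\n",
}

print(clone_share(files, file_hash))   # 0.25: only b/util.js repeats a/util.js
print(clone_share(files, token_hash))  # 0.5: c/util.js also matches once layout is ignored
```

A file hash only catches exact copies, while a token hash also catches copies that were reindented or reformatted. According to the article, the two measures gave similar results in the study.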
The reason is NPM
NPM is a package management tool (for libraries, also called packages or modules) used in both client-side and server-side JavaScript projects. NPM is currently the world's largest package management tool, with over 350,000 libraries, more than double the second-largest, Apache Maven.
NPM hosts many useful libraries, so many developers rely on it. As a result, JavaScript projects import more libraries than projects in other languages, and the amount of reused code is correspondingly higher.
Research on code copying is important for other studies
'Git, the source control system on GitHub, is built to encourage forking (copying) projects. But a lot of code is copied without forking, by copying files and even entire libraries.'
This result is important because 'firstly, GitHub may need to reassess the real scale of its data, and secondly, more and more research relies on the huge number of open source projects available on GitHub. Copied code of this kind will affect the results of those studies.' The raw data for this study can be downloaded here: http://mondego.ics.uci.edu/projects/dejavu/ The full study is available at http://janvitek.org/pubs/oopsla17b.pdf and https://dl.acm.org/citation.cfm?doid=3152284.3133908