82% of code on GitHub is copied from existing files

That is the finding of a recent study by researchers from the University of California, Irvine, Microsoft Research, Czech Technical University and Northeastern University.

They studied 4.5 million original (non-fork) projects on GitHub, containing 482 million files in total. Of these, only 85 million files, or 17.63%, are original.
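The headline figure follows directly from the numbers quoted above. A minimal sketch of the arithmetic, using only the totals given in this article (the variable names are illustrative):

```python
# Figures as quoted in the article, in millions of files.
total_files = 482
original_files = 85

# Share of files that are original, and share that are copies.
original_share = original_files / total_files * 100
copy_share = (total_files - original_files) / total_files * 100

print(f"original: {original_share:.2f}%")  # original: 17.63%
print(f"copies:   {copy_share:.2f}%")      # copies:   82.37%
```

Rounded down, the copy share is the 82% in the title.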

Most JavaScript files are copies

The study only looks at projects written in C++, Java, JavaScript and Python. Of these, JavaScript has the highest share of duplicates: 94% of its files are exact clones of another file (based on file hashes). C++ ranked second with 73%, followed by Python and Java with 71% and 40% respectively. The researchers also compared the content of the files (based on token hashes), but the results were similar.
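To illustrate the difference between the two comparisons mentioned above, here is a minimal sketch, not the researchers' actual tooling. It assumes that a file hash covers the raw bytes, while a token hash covers the sequence of tokens and therefore ignores whitespace and layout. The tokenizer, the function names and the sample files are all hypothetical and heavily simplified.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, simplified tokenizer: runs of word characters,
# or any single non-space symbol. Real tokenizers are language-aware.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def file_hash(text):
    """Hash the raw content: only byte-for-byte identical files match."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def token_hash(text):
    """Hash the token sequence, so whitespace differences are ignored."""
    tokens = TOKEN_RE.findall(text)
    return hashlib.sha256("\x00".join(tokens).encode("utf-8")).hexdigest()


def clone_groups(files, key):
    """Group file paths whose content maps to the same hash under `key`."""
    groups = defaultdict(list)
    for path, text in files.items():
        groups[key(text)].append(path)
    # Keep only groups with more than one member, i.e. actual clones.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Made-up sample files: two exact copies, one reformatted copy, one different file.
files = {
    "a/util.js": "function add(a, b) { return a + b; }\n",
    "b/util.js": "function add(a, b) { return a + b; }\n",
    "c/util.js": "function add(a,b){\n  return a+b;\n}\n",
    "d/other.js": "function sub(a, b) { return a - b; }\n",
}

print(clone_groups(files, file_hash))   # [['a/util.js', 'b/util.js']]
print(clone_groups(files, token_hash))  # [['a/util.js', 'b/util.js', 'c/util.js']]
```

In this toy example the token hash also catches the reformatted copy, which the file hash misses. In the study, the two measures gave similar results.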

Figure: The number of copied files on GitHub

The reason is NPM

NPM is a package manager for libraries (packages or modules) in JavaScript projects, on both the client and the server. It is currently the world's largest package manager, with over 350,000 libraries, more than double the number offered by the runner-up, Apache Maven.

NPM contains many useful libraries, so many developers use it. As a result, JavaScript projects import more libraries than projects in other languages, and the amount of reused code is correspondingly higher.

Code duplication matters for other studies

According to the researchers, Git, the source control system behind GitHub, is built to encourage forking (copying) projects. But much of the code is copied without forking, by copying individual files and even entire libraries.

The result is important for two reasons. First, GitHub may need to reassess the real scale of its data. Second, more and more research relies on the huge number of open source projects available on GitHub, and this level of copying will affect the results of those studies.

The raw data for this study can be downloaded from http://mondego.ics.uci.edu/projects/dejavu/. The full study is available at http://janvitek.org/pubs/oopsla17b.pdf and https://dl.acm.org/citation.cfm?doid=3152284.3133908.

Update 24 May 2019