Jin Xu
University of Notre Dame
jxu1@nd.edu
Gregory Madey
University of Notre Dame
gmadey@nd.edu
Abstract
The OSS community can be considered a complex, self-organizing system. Such
systems are typically composed of large numbers of locally interacting elements.
Developers are the main components of this network. The interaction between developers
forms a collaborative social network. Studying the roles of developers and their
activities can help us understand how projects develop. In this paper, we perform a
quantitative analysis of Open Source Software developers by studying the whole
developer community at SourceForge. Our research provides topological and
evolutionary statistics for the OSS developer social network, which help in
understanding the OSS phenomenon. Our work shows that the OSS developer network is a
scale-free network.
Exploration of the Open Source Software Community
Jin Xu, Gregory Madey
The OSS movement is a phenomenon that challenges many traditional theories in economics, software engineering, business strategy, and IT management. The OSS community has developed a substantial amount of the infrastructure of the Internet, and has several outstanding technical achievements, including Apache, Perl, Linux, etc. These programs were written, developed, and debugged largely by part-time contributors, who in most cases were not paid for their work, and without the benefit of any traditional project management techniques. A research study of how the OSS community functions may help IT planners make more informed decisions and develop more effective strategies for using OSS software.
The OSS community can be considered a complex, self-organizing system [Madey 2004]. Such systems are typically composed of large numbers of locally interacting elements. The Open Source Software (OSS) development movement is a classic example of a dynamic social network; it is also a prototype of a complex evolving network. Developers are the main components of this network. As shown in Figure 1, many developers may participate in one project, and a developer may join many projects. The interaction between developers forms a collaborative social network. Studying the roles of developers and their activities can help us understand how projects develop.
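The linking rule described above (two developers are connected when they share at least one project) can be sketched as follows. This is only an illustrative sketch: the function name `build_developer_network`, the project names, and the developer names are hypothetical and are not taken from the SourceForge data.

```python
from collections import defaultdict
from itertools import combinations


def build_developer_network(project_members):
    """Link every pair of developers who share at least one project.

    project_members maps a project name to a list of developer names.
    Returns a dict mapping each developer to the set of co-developers.
    """
    neighbors = defaultdict(set)
    for members in project_members.values():
        unique_members = sorted(set(members))
        for dev in unique_members:
            neighbors[dev]  # make sure solo developers appear with no links
        for a, b in combinations(unique_members, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


# Hypothetical membership data, for illustration only.
example_projects = {
    "project_1": ["dev_a", "dev_b", "dev_c"],
    "project_2": ["dev_c", "dev_d"],
    "project_3": ["dev_e"],
}
network = build_developer_network(example_projects)
for dev in sorted(network):
    print(dev, "is linked to", sorted(network[dev]))
```

In this toy input, dev_c belongs to two projects and therefore connects the members of both, which is how joint membership produces clusters like the one in Figure 1.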
Some researchers have begun to study OSS developers. Nakakoji et al. [Nakakoji 2002] classify OSS community members into different roles and study the influence of different members on the OSS system and the community in three OSS projects. A modified classification is presented by Xu [Xu 2003] to redefine OSS member roles, which will be discussed in the next section. Crowston et al. [Crowston 2002] studied OSS development teams with respect to success factors for distributed work teams. By studying Linux Software Maps (LSMs), Dempsey et al. [Dempsey 2002] analyze the body of all extant LSMs at a Linux site to obtain information on the nature of Linux contributions and their contributors. Data mining techniques were used by Xu et al. to find patterns in the OSS developers' community [Xu1 2003]. Gao et al. [Gao 2003, Xu2 2003] simulate activities of core developers on SourceForge hosted projects.

Figure 1: Developer Social Network, Linked by Joint Project Membership (Cluster of Size 16). This graph was drawn using UCINet [Ucinet].
All of these previous studies are either qualitative classifications or are performed on a small set of sample projects. In this paper, we perform a quantitative analysis of Open Source Software developers by studying the whole developer community at SourceForge. Our research provides topological and evolutionary statistics for the OSS developer social network, which help in understanding the OSS phenomenon. The work in this paper is the preliminary stage of our OSS community study. Based on these statistical data, we will develop agent-based models to simulate the development of the OSS community.
The rest of this paper is organized as follows: the next section describes the properties of the OSS developer network; the third section classifies the roles of developers by their activities in projects; then, the data collection and mining process is presented; based on the collected data, a statistical analysis is performed on the SourceForge developer community; lastly, conclusions and future work are given.
OSS Developer Network
The OSS developer network is a scale-free network whose degree distribution follows a power law. According to Barabasi and Albert [Barabasi 1999], such a network possesses two properties:
Unlike random networks, which have a fixed number of nodes that are randomly connected, the network grows by the sequential addition of new nodes. In our OSS developer network, developers join projects sequentially as the projects develop.
Unlike random networks, in which the probability of two nodes being connected is independent of the nodes' degree, scale-free networks exhibit a "rich get richer" phenomenon. The probability of two nodes being connected depends on the nodes' degree, which is called preferential attachment. In OSS, developers tend to choose more popular projects to participate in.
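The two properties above, growth and preferential attachment, can be illustrated with a small simulation. This is a simplified sketch and not the authors' model: the function name `simulate_joins`, the `new_project_prob` parameter, and all numeric values are assumptions chosen for illustration. Each arriving developer either starts a new project or joins an existing one with probability proportional to its current size.

```python
import random
from collections import Counter


def simulate_joins(num_developers, new_project_prob=0.2, seed=0):
    """Simulate developers arriving one at a time (growth).

    Each newcomer starts a new project with probability new_project_prob
    (an assumed value); otherwise the newcomer joins an existing project
    chosen in proportion to its size (preferential attachment).
    Returns a Counter mapping project id to number of developers.
    """
    rng = random.Random(seed)
    memberships = [0]  # one entry per developer; project 0 has one founder
    num_projects = 1
    for _ in range(num_developers - 1):
        if rng.random() < new_project_prob:
            project = num_projects
            num_projects += 1
        else:
            # Copying the project of a uniformly chosen earlier developer
            # picks each project with probability proportional to its size.
            project = rng.choice(memberships)
        memberships.append(project)
    return Counter(memberships)


sizes = simulate_joins(10_000)
largest = max(sizes.values())
single = sum(1 for s in sizes.values() if s == 1)
print(f"projects: {len(sizes)}, largest project: {largest}, "
      f"single-developer projects: {single}")
```

Under these assumptions, a few early projects accumulate most of the developers while many projects keep a single member, which is the qualitative "rich get richer" pattern the text describes.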
Analysis of the SourceForge Developer Community
We classified developer roles in SourceForge as follows: project leaders are the administrators of each project; core developers are members who control CVS releases and are listed in each project; co-developers (central and peripheral developers) are people who are assigned to tasks such as bug fixing and document writing, but are not listed as project leaders or core developers; active users are those who submit requests and post messages, but are not project leaders, core developers, or co-developers; passive users are obtained by excluding all developers from all users. Figure 3 shows the distribution of developers in the whole SourceForge community. About 65% of the community are passive users who make no direct contribution to the development of projects. Among developers, there are 28.4% project leaders, 15.5% core developers, 33.9% central/peripheral developers, and 22.2% active users. We observed that the central/peripheral developers have almost the same percentage as the sum of project leaders and core developers. This is because a large portion of the projects on SourceForge are not very popular, so almost all of their developers are initiators. (Detailed analysis of specific projects is under investigation.)

Figure 3: Distribution of SourceForge Community
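The role definitions above are exclusive and apply in order of precedence, which can be sketched as a simple rule chain. The flag names (`is_admin`, `controls_cvs`, `has_assigned_task`, `has_posted`) are hypothetical stand-ins for whatever fields the collected data actually contains. The second part rescales the reported within-developer percentages to the whole community, assuming that roughly 35% of members are not passive users.

```python
def classify_member(is_admin, controls_cvs, has_assigned_task, has_posted):
    """Assign one role per member, checking the most central role first."""
    if is_admin:
        return "project leader"
    if controls_cvs:
        return "core developer"
    if has_assigned_task:
        return "co-developer"
    if has_posted:
        return "active user"
    return "passive user"


# Percentages among developers, as reported in the text.
share_among_developers = {
    "project leader": 28.4,
    "core developer": 15.5,
    "co-developer": 33.9,
    "active user": 22.2,
}

# Assumption: about 65% of the community is passive, so about 35% is not.
developer_fraction = 0.35

for role, pct in share_among_developers.items():
    whole = pct * developer_fraction
    print(f"{role}: {pct}% of developers, roughly {whole:.1f}% of the community")
```

Checking the roles from most to least central means that, for example, an administrator who also posts messages is counted only once, as a project leader.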
The degree distribution describes how frequently each degree value occurs across the network. Degree distribution was believed to be a normal distribution, but Albert and Barabasi recently found that it fits a power law distribution in many real networks [Albert 1999]. Figure 4 gives the developer distributions in the SourceForge community. The X coordinate is the number of projects in which each developer participated, and the Y coordinate is the number of developers in the corresponding category. The right sub-graph shows the distribution on a log scale. From the figure, we can observe that the developer distribution matches a power law. Such a power law distribution indicates that the SourceForge developer network is a scale-free network. In this network, developers sequentially choose more popular projects to join. Thus, a popular project tends to attract more and more developers, while a less popular project sometimes cannot even survive for long. (More results will be presented during the conference.)

Figure 4: Degree Distribution of Developers
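A power law of the form P(k) proportional to k^(-gamma) appears as a straight line with slope -gamma on log-log axes, which is what the right sub-graph of Figure 4 is meant to show. The sketch below fits that slope by ordinary least squares on the logarithms. It is a rough check rather than a rigorous statistical test, the function name `fit_power_law` is an assumption, and the counts are synthetic values generated from an exact power law, not SourceForge data.

```python
import math


def fit_power_law(counts):
    """Least-squares fit of log(count) = intercept + slope * log(degree).

    counts maps a degree k (here, number of projects) to how many
    developers have that degree. Returns (slope, intercept).
    """
    points = [(math.log(k), math.log(n))
              for k, n in counts.items() if k > 0 and n > 0]
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, hypothetical counts following count = 1000 * k^(-2).
synthetic_counts = {k: 1000.0 * k ** -2.0 for k in range(1, 21)}
slope, intercept = fit_power_law(synthetic_counts)
print(f"estimated exponent gamma: {-slope:.3f}")
```

On real data the tail of the distribution is noisy, so the fitted slope is sensitive to which degrees are included; the sketch is intended only to show what "matches the power law" means operationally.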
Conclusions
In this paper, we classify and study the Open Source Software developer network of SourceForge. The data collection design and process are described. By gathering data from the SourceForge 2003 data dump, we perform a quantitative analysis of the OSS developers' community. Our research provides useful information for studying the development of OSS projects. Future work will focus on the simulation of the OSS developer network based on the statistical results in this paper.
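Reflections
This article describes the OSS (open source software) community as a large, complex, self-organizing system. It has essentially no central shared working environment, yet large teams still form around particular goals and design software with outstanding functionality. Representative examples include Apache, Perl, the Linux kernel, and PostgreSQL. Their organization can be said to take the form of a collaborative social network, somewhat like a religious group: the more people there are, the more likely it is to produce exceptional results. In fact, it exhibits the characteristics of a scale-free network. The authors believe that understanding this structure would help with many decisions when developing large-scale projects. There are also many aspects of this that I would like to understand myself, especially how so many people actually manage to work together.
Why does OSS show the distribution of a scale-free network rather than that of a random network? On SourceForge, when a project is opened, the developers who join it are mostly people with a strong interest in that area. The more popular the topic, the more people it attracts, so the distribution is not random. In addition, users of the project's software report the problems they encounter, or those problems become discussion threads on the mailing list. This not only helps the software find a better direction but also creates a kind of advertising effect among users.
The achievements of OSS naturally depend on many supporting software tools. One example is CVS, which provides good support for dividing up work and for subsequent maintenance, tracking, and upgrades of the software. Moreover, even the adoption of CVS itself seems to show a scale-free network distribution effect.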