Bioconductor: Difference between revisions

From Wikipedia, the free encyclopedia
OAbot (talk | contribs)
m Open access bot: add pmc identifier to citation with #oabot.
m clean up, typo(s) fixed: e.g → e.g.
(35 intermediate revisions by 22 users not shown)
Line 7: Line 7:
| screenshot =
| screenshot =
| developer =
| developer =
| latest_release_version = 3.9
| latest_release_version = 3.19
| latest_release_date = {{Release date and age|2019|05|03|df=yes}}
| latest_release_date = {{Start date and age|2024|05|01|df=yes}}
| operating_system = [[Linux]], [[macOS]], [[Microsoft Windows|Windows]]
| operating_system = [[Linux]], [[macOS]], [[Microsoft Windows|Windows]]
| platform = [[R programming language]]
| platform = [[R programming language]]
Line 17: Line 17:
'''Bioconductor''' is a [[Free software|free]], [[Open-source software|open source]] and [[Open source software development|open development]] software project for the analysis and comprehension of [[Genome|genomic]] data generated by [[Wet laboratory|wet lab]] experiments in [[molecular biology]].
'''Bioconductor''' is a [[Free software|free]], [[Open-source software|open source]] and [[Open source software development|open development]] software project for the analysis and comprehension of [[Genome|genomic]] data generated by [[Wet laboratory|wet lab]] experiments in [[molecular biology]].


Bioconductor is based primarily on the [[statistics|statistical]] [[R (programming language)|R programming language]], but does contain contributions in other programming languages. It has two [[Software release life cycle|releases]] each year that follow the semiannual releases of R. At any one time there is a [[Software versioning|release version]], which corresponds to the released version of R, and a [[Software versioning|development version]], which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are a large number of [[genome annotation]] packages available that are mainly, but not solely, oriented towards different types of [[microarray]]s.
Bioconductor is based primarily on the [[statistics|statistical]] [[R (programming language)|R programming language]], but does contain contributions in other programming languages. It has two [[Software release life cycle|releases]] each year that follow the semiannual releases of R. At any one time there is a [[Software versioning|release version]], which corresponds to the released version of R, and a [[Software versioning|development version]], which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many [[genome annotation]] packages available that are mainly, but not solely, oriented towards different types of [[microarray]]s.

While computational methods continue to be developed to interpret biological data, the Bioconductor project is an open source software repository that hosts a wide range of statistical tools developed in the R programming environment. Utilizing a rich array of statistical and graphical features in R, many Bioconductor packages have been developed to meet various data analysis needs. The use of these packages provides a basic understanding of the R programming / command language. As a result, R and Bioconductor packages, which have a strong computing background, are used by most biologists who will benefit significantly from their ability to analyze datasets. All these results provide biologists with easy access to the analysis of genomic data without requiring programming [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559603/ expertise].


The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the [[Fred Hutchinson Cancer Research Center]], with other members coming from international institutions.
The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the [[Fred Hutchinson Cancer Research Center]], with other members coming from international institutions.


== Packages ==
== Packages ==
Most Bioconductor components are distributed as [[R (programming language)|R packages]], which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel [[Affymetrix]] and two or more channel [[Complementary DNA|cDNA]]/[[Oligonucleotide|Oligo]] microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, [[Sequence (biology)|sequence]], or [[Single nucleotide polymorphism|SNP]] data.
Most Bioconductor components are distributed as [[R packages]], which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel [[Affymetrix]] and two or more channel [[Complementary DNA|cDNA]]/[[Oligonucleotide|Oligo]] microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, [[Sequence (biology)|sequence]], or [[Single nucleotide polymorphism|SNP]] data.


== Goals ==
== Goals ==
The broad goals of the projects are to:
The broad goals of the projects are to:
* Provide widespread access to a broad range of powerful [[Statistics|statistical]] and [[Statistical graphics|graphical]] methods for the analysis of genomic data.
* Provide widespread access to a broad range of powerful [[Statistics|statistical]] and [[Statistical graphics|graphical]] methods for the analysis of genomic data.
* Facilitate the inclusion of [[genome annotation|biological metadata]] in the analysis of genomic data, e.g. literature data from [[PubMed]], annotation data from LocusLink/Entrez.
* Facilitate the inclusion of [[genome annotation|biological metadata]] in the analysis of genomic data, e.g. literature data from [[PubMed]], annotation data from LocusLink/[[Entrez]].
* Provide a common [[Computing platform|software platform]] that enables the rapid [[Software development|development]] and [[Software deployment|deployment]] of [[Plug-in (computing)|plug-able]], [[Scalability|scalable]], and [[Interoperability|interoperable]] software.
* Provide a common [[Computing platform|software platform]] that enables the rapid [[Software development|development]] and [[Software deployment|deployment]] of [[Plug-in (computing)|plug-able]], [[Scalability|scalable]], and [[Interoperability|interoperable]] software.
* Further scientific understanding by producing high-quality [[Tutorial|documentation and reproducible research]].
* Further scientific understanding by producing high-quality [[Tutorial|documentation and reproducible research]].
Line 33: Line 35:


== Main features ==
== Main features ==
* '''[[Tutorial|Documentation and reproducible research]].''' Each Bioconductor package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "[[How-to]]"s that are designed to demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or might even discuss general issues related to the package. In the future, the Bioconductor project is looking towards providing vignettes that are not specifically tied to a package, but rather are demonstrating more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.
* '''[[Tutorial|Documentation and reproducible research]].''' Each Bioconductor package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "[[wikt:how-to|How-to]]"s that are designed to demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or might even discuss general issues related to the package. In the future, the Bioconductor project is looking towards providing vignettes that are not specifically tied to a package, but rather are demonstrating more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.
* '''[[Statistics|Statistical and graphical methods]].''' The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for: pre-processing [[Affymetrix]] and [[Illumina (company)|Illumina]], [[Complementary DNA|cDNA]] array data; identifying [[Expression profiling|differentially expressed genes]]; graph theoretical analyses; plotting genomic data. In addition, the R package system itself provides implementations for a broad range of state-of-the-art [[Statistics|statistical]] and [[Statistical graphics|graphical]] techniques, including [[Linear regression|linear]] and [[Nonlinear regression|non-linear]] modeling, [[cluster analysis]], [[prediction]], [[Resampling (statistics)|resampling]], [[survival analysis]], and [[time series]] analysis.
* '''[[Statistics|Statistical and graphical methods]].''' The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for: pre-processing [[Affymetrix]] and [[Illumina (company)|Illumina]], [[Complementary DNA|cDNA]] array data; identifying [[Expression profiling|differentially expressed genes]]; graph theoretical analyses; plotting genomic data. In addition, the R package system itself provides implementations for a broad range of state-of-the-art [[Statistics|statistical]] and [[Statistical graphics|graphical]] techniques, including [[Linear regression|linear]] and [[Nonlinear regression|non-linear]] modeling, [[cluster analysis]], [[prediction]], [[Resampling (statistics)|resampling]], [[survival analysis]], and [[time series]] analysis.
* '''[[Genome annotation|Genome Annotation]].''' The Bioconductor project provides software for associating microarray and other genomic data in real time to biological metadata from web databases such as [[GenBank]], LocusLink and [[PubMed]] (annotate package). Functions are also provided for incorporating the results of statistical analysis in HTML reports with links to annotation WWW resources. Software tools are available for assembling and processing genomic annotation data, from databases such as [[GenBank]], the [[Gene Ontology|Gene Ontology Consortium]], LocusLink, [[UniGene]], the [[Human Genome Project|UCSC Human Genome Project]] and others with the AnnotationDbi package. Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, [[PubMed]]). Customized annotation libraries can also be assembled.
* '''[[Genome annotation]].''' The Bioconductor project provides software for associating microarray and other genomic data in real time to biological metadata from web databases such as [[GenBank]], LocusLink and [[PubMed]] (annotate package). Functions are also provided for incorporating the results of statistical analysis in HTML reports with links to annotation WWW resources. Software tools are available for assembling and processing genomic annotation data, from databases such as [[GenBank]], the [[Gene Ontology|Gene Ontology Consortium]], LocusLink, [[UniGene]], the [[Human Genome Project|UCSC Human Genome Project]] and others with the AnnotationDbi package. Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, [[PubMed]]). Customized annotation libraries can also be assembled.This project also contain several functions for genomic analysis and phylogenetic (e.g. [http://bioconductor.org/packages/release/bioc/html/ggtree.html ggtree], [https://cran.r-project.org/web/packages/phytools/index.html phytools] packages ..).
* '''[[Open-source software|Open source]].''' The Bioconductor project has a commitment to full open source discipline, with distribution via a [[SourceForge.net]]-like platform. All contributions are expected to exist under an [[open source license]] such as [[Artistic License|Artistic 2.0]], [[GNU General Public License|GPL2]], or [[Berkeley Software Distribution|BSD]]. There are many different reasons why open-source software is beneficial to the analysis of microarray data and to computational biology in general. The reasons include:
* '''[[Open-source software|Open source]].''' The Bioconductor project has a commitment to full open source discipline, with distribution via a [[SourceForge.net]]-like platform. All contributions are expected to exist under an [[open source license]] such as [[Artistic License|Artistic 2.0]], [[GNU General Public License|GPL2]], or [[Berkeley Software Distribution|BSD]]. There are many different reasons why open-source software is beneficial to the analysis of microarray data and to [[computational biology]] in general. The reasons include:
** To provide full access to [[algorithm]]s and their implementation
** To provide full access to [[algorithm]]s and their implementation
** To facilitate software improvements through [[Software bug|bug fixing]] and [[Plug-in (computing)|plug-ins]]
** To facilitate software improvements through [[Software bug|bug fixing]] and [[Plug-in (computing)|plug-ins]]
Line 47: Line 49:


== Milestones ==
== Milestones ==
Each release of Bioconductor is developed to work best with a chosen version of R.<ref name="BioCReleasePage">{{cite web |title=Bioconductor - Release Announcements |url=https://bioconductor.org/about/release-announcements/ |website=bioconductor.org |publisher=Bioconductor |accessdate=28 May 2019}}</ref> In addition to bugfixes and updates, a new release typically adds packages. The major and current releases are
Each release of Bioconductor is developed to work best with a chosen version of R.<ref name="BioCReleasePage">{{cite web |title=Bioconductor Release Announcements |url=https://bioconductor.org/about/release-announcements/ |website=bioconductor.org |publisher=Bioconductor |accessdate=28 May 2019}}</ref> In addition to bugfixes and updates, a new release typically adds packages. The table below maps a Bioconductor release to a R version and shows the number of available Bioconductor software packages for that release.
{| class="wikitable"
{| class="wikitable"
|-
|-
! Version
! Version
! Release Date
! Release date
! Package Count
! Package count
! R dependency
! Dependency
|-
|-
| align="center" | 1.0
| align="center" | 3.19
| align="right" | {{dts|2002-05-01|abbr=on|format=dmy}}
| align="right" | {{dts|2024-05-01|abbr=on|format=dmy}}
| align="right" | 15
| align="right" | 2300
| align="center" | [[R (programming language)|R 1.5]]
| align="center" | [[R (programming language)|R 4.4]]
|-
|-
| align="center" | 2.0
| align="center" | 3.18
| align="right" | {{dts|2007-04-26|abbr=on|format=dmy}}
| align="right" | {{dts|2023-10-25|abbr=on|format=dmy}}
| align="right" | 214
| align="right" | 2266
| align="center" | [[R (programming language)|R 2.5]]
| align="center" | [[R (programming language)|R 4.3]]
|-
| align="center" | 3.16
| align="right" | {{dts|2022-11-02|abbr=on|format=dmy}}
| align="right" | 2183
| align="center" | [[R (programming language)|R 4.2]]
|-
| align="center" | 3.14
| align="right" | {{dts|2021-10-27|abbr=on|format=dmy}}
| align="right" | 2083
| align="center" | [[R (programming language)|R 4.1]]
|-
| align="center" | 3.11
| align="right" | {{dts|2020-04-28|abbr=on|format=dmy}}
| align="right" | 1903
| align="center" | [[R (programming language)|R 4.0]]
|-
| align="center" | 3.10
| align="right" | {{dts|2019-10-30|abbr=on|format=dmy}}
| align="right" | 1823
| align="center" | [[R (programming language)|R 3.6]]
|-
| align="center" | 3.8
| align="right" | {{dts|2018-10-31|abbr=on|format=dmy}}
| align="right" | 1649
| align="center" | [[R (programming language)|R 3.5]]
|-
| align="center" | 3.6
| align="right" | {{dts|2017-10-31|abbr=on|format=dmy}}
| align="right" | 1473
| align="center" | [[R (programming language)|R 3.4]]
|-
| align="center" | 3.4
| align="right" | {{dts|2016-10-18|abbr=on|format=dmy}}
| align="right" | 1296
| align="center" | [[R (programming language)|R 3.3]]
|-
| align="center" | 3.2
| align="right" | {{dts|2015-10-14|abbr=on|format=dmy}}
| align="right" | 1104
| align="center" | [[R (programming language)|R 3.2]]
|-
|-
| align="center" | 3.0
| align="center" | 3.0
Line 70: Line 112:
| align="center" | [[R (programming language)|R 3.1]]
| align="center" | [[R (programming language)|R 3.1]]
|-
|-
| align="center" | 3.9
| align="center" | 2.13
| align="right" | {{dts|2019-05-03|abbr=on|format=dmy}}
| align="right" | {{dts|2013-10-15|abbr=on|format=dmy}}
| align="right" | 1741
| align="right" | 749
| align="center" | [[R (programming language)|R 3.6]]
| align="center" | [[R (programming language)|R 3.0]]
|-
| align="center" | 2.11
| align="right" | {{dts|2012-10-03|abbr=on|format=dmy}}
| align="right" | 610
| align="center" | [[R (programming language)|R 2.15]]
|-
| align="center" | 2.9
| align="right" | {{dts|2011-11-01|abbr=on|format=dmy}}
| align="right" | 517
| align="center" | [[R (programming language)|R 2.14]]
|-
| align="center" | 2.8
| align="right" | {{dts|2011-04-14|abbr=on|format=dmy}}
| align="right" | 466
| align="center" | [[R (programming language)|R 2.13]]
|-
| align="center" | 2.7
| align="right" | {{dts|2010-11-18|abbr=on|format=dmy}}
| align="right" | 418
| align="center" | [[R (programming language)|R 2.12]]
|-
| align="center" | 2.6
| align="right" | {{dts|2010-04-23|abbr=on|format=dmy}}
| align="right" | 389
| align="center" | [[R (programming language)|R 2.11]]
|-
| align="center" | 2.5
| align="right" | {{dts|2009-10-28|abbr=on|format=dmy}}
| align="right" | 352
| align="center" | [[R (programming language)|R 2.10]]
|-
| align="center" | 2.4
| align="right" | {{dts|2009-04-21|abbr=on|format=dmy}}
| align="right" | 320
| align="center" | [[R (programming language)|R 2.9]]
|-
| align="center" | 2.3
| align="right" | {{dts|2008-10-22|abbr=on|format=dmy}}
| align="right" | 294
| align="center" | [[R (programming language)|R 2.8]]
|-
| align="center" | 2.2
| align="right" | {{dts|2008-05-01|abbr=on|format=dmy}}
| align="right" | 260
| align="center" | [[R (programming language)|R 2.7]]
|-
| align="center" | 2.1
| align="right" | {{dts|2007-10-08|abbr=on|format=dmy}}
| align="right" | 233
| align="center" | [[R (programming language)|R 2.6]]
|-
| align="center" | 2.0
| align="right" | {{dts|2007-04-26|abbr=on|format=dmy}}
| align="right" | 214
| align="center" | [[R (programming language)|R 2.5]]
|-
| align="center" | 1.9
| align="right" | {{dts|2006-10-04|abbr=on|format=dmy}}
| align="right" | 188
| align="center" | [[R (programming language)|R 2.4]]
|-
| align="center" | 1.8
| align="right" | {{dts|2006-04-27|abbr=on|format=dmy}}
| align="right" | 172
| align="center" | [[R (programming language)|R 2.3]]
|-
| align="center" | 1.7
| align="right" | {{dts|2005-10-14|abbr=on|format=dmy}}
| align="right" | 141
| align="center" | [[R (programming language)|R 2.2]]
|-
| align="center" | 1.6
| align="right" | {{dts|2005-05-18|abbr=on|format=dmy}}
| align="right" | 123
| align="center" | [[R (programming language)|R 2.1]]
|-
| align="center" | 1.5
| align="right" | {{dts|2004-10-25|abbr=on|format=dmy}}
| align="right" | 100
| align="center" | [[R (programming language)|R 2.0]]
|-
| align="center" | 1.4
| align="right" | {{dts|2004-05-17|abbr=on|format=dmy}}
| align="right" | 81
| align="center" | [[R (programming language)|R 1.9]]
|-
| align="center" | 1.3
| align="right" | {{dts|2003-10-30|abbr=on|format=dmy}}
| align="right" | 49
| align="center" | [[R (programming language)|R 1.8]]
|-
| align="center" | 1.2
| align="right" | {{dts|2003-05-29|abbr=on|format=dmy}}
| align="right" | 30
| align="center" | [[R (programming language)|R 1.7]]
|-
| align="center" | 1.1
| align="right" | {{dts|2002-10-19|abbr=on|format=dmy}}
| align="right" | 20
| align="center" | [[R (programming language)|R 1.6]]
|-
| align="center" | 1.0
| align="right" | {{dts|2002-05-01|abbr=on|format=dmy}}
| align="right" | 15
| align="center" | [[R (programming language)|R 1.5]]
|}
|}


== Resources ==
== Resources ==
*{{cite book |last=Gentleman |first=R. |author2=Carey, V. |author3=Huber, W. |author4=Irizarry, R. |author5=Dudoit, S.|author5-link=Sandrine Dudoit |year=2005 |title=Bioinformatics and Computational Biology Solutions Using R and Bioconductor |publisher=Springer |isbn=978-0-387-25146-2}}
*{{cite book |last1=Gentleman |first1=R. |author1-link=Robert Gentleman (statistician)|author2=Carey, V. |author3=Huber, W.|author3-link=
Wolfgang Huber (scientist) |author4=Irizarry, R. |author4-link= Rafael Irizarry (scientist) |author5=Dudoit, S.|author5-link=Sandrine Dudoit |year=2005 |title=Bioinformatics and Computational Biology Solutions Using R and Bioconductor |publisher=Springer |isbn=978-0-387-25146-2}}
*{{cite book |last=Gentleman |first=R. |year=2008 |title=R Programming for Bioinformatics |publisher=Chapman & Hall/CRC |isbn=1-4200-6367-7 |url=https://books.google.com/books?id=34Y6WjJy8zEC}}
*{{cite book |last=Gentleman |first=R. |author-link=Robert Gentleman (statistician)|year=2008 |title=R Programming for Bioinformatics |publisher=Chapman & Hall/CRC |isbn=978-1-4200-6367-7 |url=https://books.google.com/books?id=34Y6WjJy8zEC}}
*{{cite book |last=Hahne |first=F. |author2=Huber, W. |author3=Gentleman, R. |author4= Falcon, S. |year=2008 |title=Bioconductor Case Studies |publisher=Springer |isbn=978-0-387-77239-4 |url=https://books.google.com/books?id=F3tAehmRHSwC}}
*{{cite book |last=Hahne |first=F. |author2=Huber, W. |author2-link=
Wolfgang Huber (scientist) |author3=Gentleman, R.|author3-link=Robert Gentleman (statistician)|author4= Falcon, S. |year=2008 |title=Bioconductor Case Studies |publisher=Springer |isbn=978-0-387-77239-4 |url=https://books.google.com/books?id=F3tAehmRHSwC}}
*{{cite journal|last1=Gentleman|first1=Robert C.|author1-link=Robert Gentleman (statistician)|last2=Carey|first2=Vincent J.|last3=Bates|first3=Douglas M.|last4=Bolstad|first4=Ben|last5=Dettling|first5=Marcel|last6=Dudoit|first6=Sandrine|author6-link=Sandrine Dudoit|last7=Ellis|first7=Byron|last8=Gautier|first8=Laurent|last9=Ge|first9=Yongchao|last10=Gentry|first10=Jeff|last11=Hornik|first11=Kurt|last12=Hothorn|first12=Torsten|last13=Huber|first13=Wolfgang|last14=Iacus|first14=Stefano|last15=Irizarry|first15=Rafael|last16=Leisch|first16=Friedrich|last17=Li|first17=Cheng|last18=Maechler|first18=Martin|last19=Rossini|first19=Anthony J.|last20=Sawitzki|first20=Gunther|last21=Smith|first21=Colin|last22=Smyth|first22=Gordon|last23=Tierney|first23=Luke|last24=Yang|first24=Jean Y. H.|author24-link=Jean Yang|last25=Zhang|first25=Jianhua|doi=10.1186/gb-2004-5-10-r80|issue=10|journal=[[Genome Biology]]|page=R80|title=Bioconductor: open software development for computational biology and bioinformatics|volume=5|year=2004|pmc=545600}}
*{{cite journal|last1=Gentleman|first1=Robert C.|author1-link=Robert Gentleman (statistician)|last2=Carey|first2=Vincent J.|last3=Bates|first3=Douglas M.|last4=Bolstad|first4=Ben|last5=Dettling|first5=Marcel|author5-link=Marcel Dettling|last6=Dudoit|first6=Sandrine|author6-link=Sandrine Dudoit|last7=Ellis|first7=Byron|last8=Gautier|first8=Laurent|last9=Ge|first9=Yongchao|last10=Gentry|first10=Jeff|last11=Hornik|first11=Kurt|last12=Hothorn|first12=Torsten|last13=Huber|first13=Wolfgang|author13-link=
Wolfgang Huber (scientist) |last14=Iacus|first14=Stefano|last15=Irizarry|first15=Rafael |author15-link= Rafael Irizarry (scientist) |last16=Leisch|first16=Friedrich|last17=Li|first17=Cheng|last18=Maechler|first18=Martin|last19=Rossini|first19=Anthony J. |last20=Sawitzki|first20=Gunther|last21=Smith|first21=Colin |last22=Smyth|first22=Gordon |last23=Tierney|first23=Luke|author23-link= Luke Tierney|last24=Yang|first24=Jean Y. H.|author24-link=Jean Yang|last25=Zhang|first25=Jianhua|doi=10.1186/gb-2004-5-10-r80|issue=10|journal=[[Genome Biology]]|page=R80|title=Bioconductor: open software development for computational biology and bioinformatics|volume=5|year=2004|pmc=545600|pmid=15461798 |doi-access=free }}


==See also==
==See also==
{{Portal|Free and open-source software|Biology}}
{{Portal|Free and open-source software|Biology}}
*[[Computational Biology]]
*[[Computational biology]]
*[[Bioinformatics]]
*[[Bioinformatics]]
*[[List of open source bioinformatics software]]
*[[List of open source bioinformatics software]]
*[[List of sequence alignment software]]
*[[List of sequence alignment software]]
*[[R (programming language)]]
*[[R (programming language)]]
*[[DNA Microarray]]
*[[DNA microarray]]
*[[Affymetrix]], a microarray technology platform
*[[Affymetrix]], a microarray technology platform


Line 98: Line 248:
* {{Official website|//www.bioconductor.org}}
* {{Official website|//www.bioconductor.org}}
* [http://www.r-project.org The R Project] [[GNU]] R is a programming language for statistical computing.
* [http://www.r-project.org The R Project] [[GNU]] R is a programming language for statistical computing.
* [http://www.bioconductor.org/about/release-announcements/ Bioconductor Releases]
* The community of the [[Debian|Debian GNU/Linux]] distribution strives towards an [http://wiki.debian.org/AliothPkgBioc automated building of BioConductor packages] for their distribution. [https://web.archive.org/web/20051124175716/http://bioknoppix.hpcf.upr.edu/ BioKnoppix] and [http://dirk.eddelbuettel.com/quantian.html Quantian] are projects extending [[Knoppix]] that have contributed bootable [[Debian|Debian GNU/Linux]] CDs providing BioConductor installations.
* The community of the [[Debian|Debian GNU/Linux]] distribution strives towards an [http://wiki.debian.org/AliothPkgBioc automated building of BioConductor packages] {{Webarchive|url=https://web.archive.org/web/20070811135011/http://wiki.debian.org/AliothPkgBioc |date=2007-08-11 }} for their distribution. [https://web.archive.org/web/20051124175716/http://bioknoppix.hpcf.upr.edu/ BioKnoppix] and [http://dirk.eddelbuettel.com/quantian.html Quantian] are projects extending [[Knoppix]] that have contributed bootable [[Debian|Debian GNU/Linux]] CDs providing BioConductor installations.


[[Category:Bioinformatics software]]
[[Category:Free bioinformatics software]]
[[Category:Free bioinformatics software]]
[[Category:Free science software]]
[[Category:Free R (programming language) software]]
[[Category:Free R (programming language) software]]
[[Category:Science software for MacOS]]
[[Category:Science software for macOS]]
[[Category:Science software for Windows]]
[[Category:Science software for Windows]]
[[Category:Science software for Linux]]
[[Category:Science software for Linux]]

Revision as of 21:49, 22 July 2024

Bioconductor
Stable release: 3.19 / 1 May 2024 (2024-05-01)
Operating system: Linux, macOS, Windows
Platform: R programming language
Type: Bioinformatics
License: Artistic License 2.0
Website: www.bioconductor.org

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Bioconductor is based primarily on the statistical R programming language, but does contain contributions in other programming languages. It has two releases each year that follow the semiannual releases of R. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many genome annotation packages available that are mainly, but not solely, oriented towards different types of microarrays.

While computational methods for interpreting biological data continue to be developed, the Bioconductor project provides an open source software repository that hosts a wide range of statistical tools developed in the R programming environment. Many Bioconductor packages build on R's rich statistical and graphical features to meet various data analysis needs. Using these packages also provides a basic understanding of the R programming and command language. As a result, R and Bioconductor packages, which rest on a strong computing foundation, are widely used by biologists, who benefit significantly from the ability to analyze their datasets. Together, these features give biologists easy access to the analysis of genomic data without requiring programming expertise.

The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the Fred Hutchinson Cancer Research Center, with other members coming from international institutions.

Packages

Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two or more channel cDNA/Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, sequence, or SNP data.

Goals

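The broad goals of the project are to:

  • Provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
  • Facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. literature data from PubMed, annotation data from LocusLink/Entrez.
  • Provide a common software platform that enables the rapid development and deployment of plug-able, scalable, and interoperable software.
  • Further scientific understanding by producing high-quality documentation and reproducible research.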

Main features

  • Documentation and reproducible research. Each Bioconductor package contains at least one vignette, a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "How-to"s that demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or discuss general issues related to it. The Bioconductor project also aims to provide vignettes that are not tied to a specific package but instead demonstrate more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.
  • Statistical and graphical methods. The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for pre-processing Affymetrix, Illumina, and cDNA array data; identifying differentially expressed genes; graph-theoretical analyses; and plotting genomic data. In addition, the R package system itself provides implementations of a broad range of state-of-the-art statistical and graphical techniques, including linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time series analysis.
  • Genome annotation. The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, LocusLink and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analysis into HTML reports with links to annotation web resources. With the AnnotationDbi package, software tools are available for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, the UCSC Human Genome Project and others. Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed). Customized annotation libraries can also be assembled. The project also contains several functions for genomic and phylogenetic analysis (e.g. the ggtree and phytools packages).
  • Open source. The Bioconductor project has a commitment to full open source discipline, with distribution via a SourceForge.net-like platform. All contributions are expected to exist under an open source license such as Artistic 2.0, GPL2, or BSD. There are many different reasons why open-source software is beneficial to the analysis of microarray data and to computational biology in general. The reasons include:
    • To provide full access to algorithms and their implementation
    • To facilitate software improvements through bug fixing and plug-ins
    • To encourage good scientific computing and statistical practice by providing appropriate tools and instruction
    • To provide a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data
    • To ensure that the international scientific community is the owner of the software tools needed to carry out research
    • To lead and encourage commercial support and development of those tools that are successful
    • To promote reproducible research by providing open and accessible tools with which to carry out that research (reproducible research is distinct from independent verification)
  • Open development. Users are encouraged to become developers by contributing either Bioconductor-compliant packages or documentation. Additionally, Bioconductor provides a mechanism for linking together different groups with common goals to foster collaboration on software, possibly at the level of shared development.
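
The genome annotation feature above mentions data packages that map between different identifiers. As a rough illustration of that idea only, the Python sketch below models such a mapping as a plain dictionary. The identifiers (`probe_0001`, `gene_A`, and so on) and the helper names `annotate` and `probes_by_gene` are invented for this example; this is not how Bioconductor's annotation packages are implemented or accessed.

```python
# Hypothetical identifiers, invented for illustration; real mappings are
# distributed in Bioconductor annotation data packages.
PROBE_TO_GENE = {
    "probe_0001": "gene_A",
    "probe_0002": "gene_B",
    "probe_0003": "gene_A",  # assumed case: two probes mapped to one gene
}


def annotate(probe_ids, mapping):
    """Pair each probe identifier with its mapped gene, or None when unmapped."""
    return [(probe, mapping.get(probe)) for probe in probe_ids]


def probes_by_gene(mapping):
    """Invert the mapping so that each gene lists the probes mapped to it."""
    genes = {}
    for probe, gene in mapping.items():
        genes.setdefault(gene, []).append(probe)
    return genes
```

Calling `annotate(["probe_0001", "probe_9999"], PROBE_TO_GENE)` pairs the first probe with `gene_A` and the unknown probe with `None`, which keeps unmapped identifiers visible instead of silently dropping them.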

Milestones

Each release of Bioconductor is developed to work best with a chosen version of R.[1] In addition to bugfixes and updates, a new release typically adds packages. The table below maps each Bioconductor release to an R version and shows the number of Bioconductor software packages available for that release.

Version  Release date  Package count  R dependency
3.19     1 May 2024    2300           R 4.4
3.18     25 Oct 2023   2266           R 4.3
3.16     2 Nov 2022    2183           R 4.2
3.14     27 Oct 2021   2083           R 4.1
3.11     28 Apr 2020   1903           R 4.0
3.10     30 Oct 2019   1823           R 3.6
3.8      31 Oct 2018   1649           R 3.5
3.6      31 Oct 2017   1473           R 3.4
3.4      18 Oct 2016   1296           R 3.3
3.2      14 Oct 2015   1104           R 3.2
3.0      14 Oct 2014   934            R 3.1
2.13     15 Oct 2013   749            R 3.0
2.11     3 Oct 2012    610            R 2.15
2.9      1 Nov 2011    517            R 2.14
2.8      14 Apr 2011   466            R 2.13
2.7      18 Nov 2010   418            R 2.12
2.6      23 Apr 2010   389            R 2.11
2.5      28 Oct 2009   352            R 2.10
2.4      21 Apr 2009   320            R 2.9
2.3      22 Oct 2008   294            R 2.8
2.2      1 May 2008    260            R 2.7
2.1      8 Oct 2007    233            R 2.6
2.0      26 Apr 2007   214            R 2.5
1.9      4 Oct 2006    188            R 2.4
1.8      27 Apr 2006   172            R 2.3
1.7      14 Oct 2005   141            R 2.2
1.6      18 May 2005   123            R 2.1
1.5      25 Oct 2004   100            R 2.0
1.4      17 May 2004   81             R 1.9
1.3      30 Oct 2003   49             R 1.8
1.2      29 May 2003   30             R 1.7
1.1      19 Oct 2002   20             R 1.6
1.0      1 May 2002    15             R 1.5
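
The release table can also be read as a lookup from a Bioconductor version to its R dependency. The Python sketch below shows one way to do that with a handful of rows copied from the table; the names `RELEASES`, `r_version_for` and `packages_added` are illustrative choices for this example and are not part of any Bioconductor tooling.

```python
from datetime import date

# A few rows copied from the table above:
# (Bioconductor version, release date, package count, R version).
RELEASES = [
    ("3.19", date(2024, 5, 1), 2300, "4.4"),
    ("3.18", date(2023, 10, 25), 2266, "4.3"),
    ("3.10", date(2019, 10, 30), 1823, "3.6"),
    ("2.0", date(2007, 4, 26), 214, "2.5"),
    ("1.0", date(2002, 5, 1), 15, "1.5"),
]


def r_version_for(bioc_version):
    """Return the R version listed for a release, or None if the row is absent."""
    for version, _released, _count, r_version in RELEASES:
        if version == bioc_version:
            return r_version
    return None


def packages_added(older, newer):
    """Return the difference in package count between two listed releases."""
    counts = {version: count for version, _released, count, _r in RELEASES}
    return counts[newer] - counts[older]
```

Versions are stored as strings rather than floats because a numeric type would treat 3.10 and 3.1 as the same value.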

Resources

  • Gentleman, R.; Carey, V.; Huber, W.; Irizarry, R.; Dudoit, S. (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer. ISBN 978-0-387-25146-2.
  • Gentleman, R. (2008). R Programming for Bioinformatics. Chapman & Hall/CRC. ISBN 978-1-4200-6367-7.
  • Hahne, F.; Huber, W.; Gentleman, R.; Falcon, S. (2008). Bioconductor Case Studies. Springer. ISBN 978-0-387-77239-4.
  • Gentleman, Robert C.; Carey, Vincent J.; Bates, Douglas M.; Bolstad, Ben; Dettling, Marcel; Dudoit, Sandrine; Ellis, Byron; Gautier, Laurent; Ge, Yongchao; Gentry, Jeff; Hornik, Kurt; Hothorn, Torsten; Huber, Wolfgang; Iacus, Stefano; Irizarry, Rafael; Leisch, Friedrich; Li, Cheng; Maechler, Martin; Rossini, Anthony J.; Sawitzki, Gunther; Smith, Colin; Smyth, Gordon; Tierney, Luke; Yang, Jean Y. H.; Zhang, Jianhua (2004). "Bioconductor: open software development for computational biology and bioinformatics". Genome Biology. 5 (10): R80. doi:10.1186/gb-2004-5-10-r80. PMC 545600. PMID 15461798.

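See also

  • Computational biology
  • Bioinformatics
  • List of open source bioinformatics software
  • List of sequence alignment software
  • R (programming language)
  • DNA microarray
  • Affymetrix, a microarray technology platform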

References

  1. ^ "Bioconductor – Release Announcements". bioconductor.org. Bioconductor. Retrieved 28 May 2019.