Array Studio Expression Training OmicSoft July 28 2010
Outline • Concepts • Suggested workflow • Studio concepts • Server concepts • Hands on exercise (Microarray) • Feature introduction
Suggested workflow (diagram: client side Array Studio, server side Array Server, client side Array Studio) • Raw data: Xpress data, Affymetrix CEL files, raw text/Excel files (stored in a local or shared folder) • Array Viewer • Search and search results • Analysis and analysis results • Central storage (stored in the server): projects, metadata, shared views, lists • Optional server-side processing: shared-folder raw files into server projects • Download: Array Studio projects (stored in a local or shared folder) • Publish/share: projects and shared views
Studio concepts: solution • Distributed project: saves all data/lists in a folder (recommended for Exon Array/SNP/CNV) • Simple project: saves all data/lists in a single file (recommended for Microarray/Taqman)
Studio concepts: L-shape structure (diagram: Design (X), Measurement (Y), Annotation (A), Observations) • Design: sample description / phenotype data • Annotation: variable annotation (i.e., probeset/gene annotation) • Measurement • Microarray data • Taqman data • Exon data • SNP data • SNP allele signal data • SNP dose/probability data • Genotype data • CNV log2 ratio data • CNV allele difference data • CNV LOH data • CNV transcript level data • Methylation data
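To make the L-shape concrete, here is a minimal Python sketch of how the three blocks relate. The class and field names are hypothetical, not Array Studio's internal format. The only assumptions are that Y has one row per variable and one column per observation (as in the TableView), that the design (X) is keyed by observation, and that the annotation (A) is keyed by variable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OmicData:
    """Hypothetical L-shape container: measurement Y, design X, annotation A."""

    measurements: list[list[float]]  # Y: one row per variable, one column per observation
    variable_ids: list[str]  # row keys of Y (e.g. probeset IDs)
    observation_ids: list[str]  # column keys of Y (e.g. chip names)
    design: dict[str, dict[str, str]]  # X: observation ID -> design columns
    annotation: dict[str, dict[str, str]]  # A: variable ID -> annotation columns

    def validate(self) -> None:
        """Check that X, Y, and A line up along the shared axes."""
        if len(self.measurements) != len(self.variable_ids):
            raise ValueError("Y must have one row per variable")
        if any(len(row) != len(self.observation_ids) for row in self.measurements):
            raise ValueError("Y must have one column per observation")
        missing_x = [o for o in self.observation_ids if o not in self.design]
        if missing_x:
            raise ValueError(f"observations without design rows: {missing_x}")
        missing_a = [v for v in self.variable_ids if v not in self.annotation]
        if missing_a:
            raise ValueError(f"variables without annotation rows: {missing_a}")

    def value(self, variable_id: str, observation_id: str) -> float:
        """Look up one measurement by variable and observation."""
        i = self.variable_ids.index(variable_id)
        j = self.observation_ids.index(observation_id)
        return self.measurements[i][j]
```

The design point is that Y never carries its own labels: every row is described by A and every column by X, which is why a missing design table (see the "Attach design table" step) leaves the dataset incomplete.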
Studio concepts: solution (diagram) • Solution > Project > Data folder, List folder, and Views • Data types: –Omic data (contains data matrix, design, and annotation) and Table data • List types: Variable list, Observation list, Row list, Column list, General list
Studio concepts: user interface (screenshot labels): Data viewer, Solution Explorer, View Controller, Details window
Workflows Array Studio has workflows for CNV, SNP, MicroArray, Taqman and Exon array
Solution explorer • You can open multiple projects in the solution • Each project can contain multiple datasets • You can easily organize your data and lists in folders • You can rename any data/view/folder • Many context-sensitive functions are available by right-clicking • Commonly used right-click functions: • Add view • Import design • Import annotation • New folder • Copy/paste views • Export • View audit trail • View source
Data viewer • Views differ from static graphs: they are fully interactive and customizable • View state is saved with the project • You can open/close views at any time • Most views can be saved as PDF/EMF/PowerPoint/Excel • Views are shown as tabs, but you can float any view • Drag the tabs to split the viewer • Press F10 to float tabs • Mouse over to show the project name and data name • Your active view (not the active project) determines the default selected data
View controller • Always use the View Controller to customize your view • Task tab: view-sensitive menus to customize your view • Variable tab: filter the variables (-omic data) • Observation tab: filter the observations (-omic data) • Filter tab: filter the observations (table data) • Legend: shows legend information • Filter status and customized filters are saved with projects, and the filters may be inherited when generating new data!
Details window The Details window shows the details for the selected variables or observations (depending on the context).
Studio concepts: interactivity • Array Studio is a fully interactive visualization package (a high dimensional version of SpotFire) • Interactivity concepts • Filtering • Selection (click, drag or lasso) • Hot track • Broadcasting • View customization (from task or legend) • Exporting • Quick demo of the interactivity concepts
Studio concepts: Selection vs Filtering • Selection shows details on demand for a particular selected row/variable or column/observation (and highlights the selected items in each view for that dataset). • Use the Selection menu to clear row and column selections. • Filtering “filters” a particular dataset (and all accompanying views) using a set of criteria.
Server concepts: what does the server do? This table only displays selected features
Statistical algorithms for Expression/Microarray data in Array Studio • Omicsoft’s implementation is independent of SAS/R • Array Command (Array Studio’s command module) can be run under Linux (Array Studio itself cannot be run under Linux) • All the implementations are exact (i.e., not approximations) • Proc GLM: One-way ANOVA, Two-way ANOVA, Two-way nested ANOVA, General Linear Model (fixed, mixed, or random models) • Survival Model: Proportional hazard regression (Proc tphreg) • Logistic Regression: Proportion data logistic regression (Proc logistic)
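As an illustration of what an exact (closed-form) computation means for the simplest model in this list, the sketch below computes the one-way ANOVA F statistic and its degrees of freedom in plain Python. It is the textbook formula, not Omicsoft's code, and it stops at the F statistic: turning F into a p-value needs the F distribution (for example from a statistics library), which is left out to keep the example dependency-free.

```python
from __future__ import annotations


def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, one_way_anova([[1, 2, 3], [4, 5, 6]]) gives F = 13.5 with 1 and 4 degrees of freedom.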
PART II Exercises for Microarray data analysis
The hands-on training will focus on • Usability • If you know how to use Microsoft Office, you will be able to use Array Studio • Interactivity • We will have lots of exercises to interact with different views • Performance • Array Studio is usually 10-1000 times faster than its competitors
Keys for Today/Keys for Success with Array Studio The goal is not to familiarize you with every command in Array Studio that you will ever use. Instead, we hope to give you a good start that you can build on yourself. The key to learning Array Studio, like any complicated software, is to practice with your own data. Don’t worry about “hurting” things: clicking and trying out new options can only help you learn the software better. That said, you can save your data and always return to a previously saved version (using the Save As command). If you don’t see something, or can’t figure something out, don’t hesitate to ask. First, consult the Online Help and Frequently Asked Questions database (http://www.omicsoft.com/help.php), then ask a power user; if they cannot help or are not available, call Omicsoft Support (484-918-0515) or send them an email. Web Chat and Remote Support are also available at http://www.omicsoft.com using the Live Help button.
List of features to exercise • Linear modeling and result exploration • Linear modeling-2 Way ANOVA • Volcano Plot • Summarize Inference Report • Venn diagram • Interpretation • Hierarchical clustering • Molecular signatures analysis • Pattern and power • Find neighbors • Audit trail • Signal extraction • RMA extraction • Attach design table • Raw data visualization • Web details on demand • Observation table view • VariableView • Quality Control • PairwiseScatterView • PCA
Launch Array Studio Launch Array Studio now.
Workflow Window The Workflow Window can be found on the left-hand side of the screen the first time the user starts Array Studio. Workflows are used as a starting place for first-time and novice users of Array Studio. Array Studio offers workflows for Microarray, Taqman, Exon, CNV, and Genotyping analysis. Microarray workflow includes sections for Getting Started, Manage data, Preprocess, Quality Control, Statistical Inference, and Pattern Recognition. The workflows do not contain all the commands and analyses that can be run in Array Studio, but should give the user a good start.
Create a New Project A project contains all the datasets, results, reports, views, lists, etc. in a single file (for “simple projects”, i.e., for microarray and Taqman data). It is perfectly fine to share/transfer the project file to another user, and the other user will be able to open the project immediately (Array Studio is required). When you create a new project, the project exists only in memory until you save it. Now create a new project by pushing the New Project button in the Microarray Workflow. Array Studio will prompt you to choose a type of project. For Microarray data, it is recommended to create a “simple project”. Click the Browse button, name the project, and select a save location. Click OK to continue. Note: Alternatively, to create a New Project, go to File Menu | New Project or click the New button in the toolbar.
Adding Microarray Data/Chip Normalization Choose Add Microarray data from the workflow. Select Affymetrix .CEL files as the source. Add all 24 .CEL files and push the Submit button. Array Studio provides fast RMA/GCRMA/MAS5 implementations: the results are benchmarked against the R packages (max difference < 1e-7), thousands of chips can be processed in a few hours, processing the 24 .CEL files takes ~30 seconds depending on the computer speed, and there are no memory problems. Alternatively, data can be added by going to File Menu | Add Data | Add Microarray Data or clicking the Add Data button on the toolbar.
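RMA combines background correction, quantile normalization across chips, and median-polish summarization of log2 probe intensities into probeset values. The sketch below shows only the quantile normalization step in plain Python, to illustrate what chip normalization means here. It is not Array Studio's implementation, and it breaks ties by position rather than averaging them as some implementations do.

```python
from __future__ import annotations


def quantile_normalize(chips: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize intensities so every chip shares one distribution.

    chips: one list of probe intensities per chip, all the same length.
    Each value is replaced by the mean (across chips) of the values holding
    the same rank. Ties are broken by position, not averaged.
    """
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips must have the same number of probes")
    # Reference distribution: average the sorted chips rank by rank
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [sum(chip[r] for chip in sorted_chips) / len(chips) for r in range(n)]
    normalized = []
    for chip in chips:
        # order[r] is the index of the r-th smallest value on this chip
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For example, quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]) returns [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]: both chips now contain exactly the values 1.5, 3.5 and 5.5, each in its original rank position.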
Attach design table .CEL files generate the Y block (signal matrix). Array Studio automatically attaches the annotation block (A), but the design block (X) still needs to be attached to the dataset. Array Studio prompts the user to attach the Design Table upon import of data. Click Yes to import the Design Table. Choose Tab delimited file and select dbpts.design.txt to attach the Design Table to the dataset. Rename MicroArrayData to DBPTS (right-click and choose Rename). If you choose No upon import, you can always attach the Design Table later by right-clicking the Design node for your dataset (in the Solution Explorer) and choosing Import.
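Attaching a design table amounts to matching each chip in the Y block to one row of a tab-delimited file. The sketch below parses such a file with Python's csv module and checks that every chip has exactly one design row. The column names (chip, treatment, time, group) and the sample rows are hypothetical; the real layout of dbpts.design.txt is not shown in this training.

```python
from __future__ import annotations

import csv
import io


def read_design(text: str, id_column: str) -> dict[str, dict[str, str]]:
    """Parse a tab-delimited design table into {observation ID: row}."""
    design: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = row[id_column]
        if key in design:
            raise ValueError(f"duplicate design row for {key!r}")
        design[key] = row
    return design


def attach_design(chip_ids: list[str], design: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Return design rows in the same order as the chips (columns of Y)."""
    missing = [c for c in chip_ids if c not in design]
    if missing:
        raise ValueError(f"chips without a design row: {missing}")
    return [design[c] for c in chip_ids]


# Hypothetical design file contents (columns and values are made up)
SAMPLE = "chip\ttreatment\ttime\tgroup\n22A\tDBP\t18\tDBP.t18\n1C\tcontrol\t1\tcontrol.t1\n"
```

Returning the rows in chip order keeps X aligned with the columns of Y, which is what later filtering by design columns relies on.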
The Solution Explorer Switch to the Solution Explorer by clicking its tab at the bottom of the Workflow Window (or by going to View Menu | Show Solution Explorer). The Solution Explorer is used to organize all the data and views in your project, and allows you to keep multiple projects open at a time. Imported microarray/genotyping/Taqman data is organized in the –Omic data section. Generated results will usually be shown in the Table data section. Other important sections include the List section (for creating lists of genes/probesets/etc.), as well as a QC section, Table section, Inference section, etc. (not shown). In Array Studio 3.6, most sections are just “folders” and can easily be changed, but the important thing to remember is that there is an –Omic data section and a Table data section. For each Data, Table, Inference Report, etc., the Solution Explorer also maintains the views. Notice the Table view under DBPTS. These views can be closed and reopened, and all settings are retained. Try closing the DBPTS\Table View now, then reopening it by double-clicking it in the Solution Explorer.
The TableView/View Controller The TableView shows the microarray data, with the columns representing chips and the rows representing probesets. The View Controller is found on the right-hand side of Array Studio and is responsible for the customization of all views. Switch to the Variable Tab. The Variable Tab and Observation Tab are used for filtering data: the Variable Tab filters on the columns of the attached gene annotation, while the Observation Tab filters on the columns of the attached Design Table. Type ^egr1$ into the Gene Symbol filter to filter the TableView for only the gene egr1 (the filter uses regular expressions). The Observation Tab can also be used to filter the data. Switch to it now, and filter treatment to control. Notice that the TableView is updated to reflect the filter. Note: right-clicking on treatment offers three different types of filters (radio, checkbox, and string). Clear the Observation Tab filter by clicking the (All) radio button or selecting Reset All Filters.
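The Gene Symbol filter takes a regular expression, which is why ^egr1$ (anchored at both ends) isolates one gene while a bare egr1 would also match any longer symbol containing it. The Python sketch below mimics that behavior on a few made-up rows. Case-insensitive matching is an assumption here, not documented Array Studio behavior.

```python
import re


def filter_rows(rows, column, pattern):
    """Keep rows whose `column` value matches the regular expression `pattern`.

    Matching is case-insensitive here (an assumption, not documented behavior).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if regex.search(row[column])]


# Made-up annotation rows for illustration
ROWS = [
    {"Probe Set ID": "probe_1", "Gene Symbol": "Egr1"},
    {"Probe Set ID": "probe_2", "Gene Symbol": "Egr2"},
    {"Probe Set ID": "probe_3", "Gene Symbol": "Egr1-like"},
]
```

With ^egr1$ only probe_1 survives; with egr1, probe_3 (Egr1-like) is kept as well.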
Details Window In Array Studio, all views are interactive. Selecting a column header or a row header in the TableView brings up details in the Details Window (found at the bottom of the screen): the Design Table information for the selected observation (chip) or the gene annotation for the selected variable (probeset). The Details Window lets the user find information about individual probesets, chips, etc. on the fly.
Web Details Web Details provides users with on-demand web information about particular variables/probesets. Right-click on the selected probeset in the Details Window or the main view window. This brings up a list of websites the user can choose from to find information about that probeset. Select Entrez and one of the gene identifiers. Internet Explorer should open, showing the web details. Web Details also allows easy access to Array Server (via Search Variable Profile and Search Variable Data, to be shown later), as well as to GeneGo and Ingenuity’s GeneView and Gene Neighborhood functionality.
VariableView • What is the variable view? • Variable view is a highly customizable view designed for high dimensional data. It provides auto-trellis for each variable and shows the profile of each variable in its own pane • Why does Omicsoft think variable view is the most important feature of the software? • It is unique • It addresses the needs of most biologists: look at the gene profiles • It is highly optimized • It has many special features that other views do not have, e.g. confidence intervals
VariableView To add a new view to the DBPTS dataset, right-click on the DBPTS node of the Solution Explorer. Click Add View, then select VariableView from the ensuing window. (Alternatively, just choose Add View from the toolbar.) Scroll through all ~16000 charts, one for each gene. This view can be customized. Re-filter using the Variable Tab for ^egr1$ so that only one chart is showing. Using the Task Tab of the View Controller, customize this view: set Title Columns to include Gene Symbol along with probeset, set Profile column to Time, set Split column to Treatment, and set Transformation to Exp2. Why does the X-axis look strange? What are we looking at? The Column Type is wrong for time.
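Conceptually, the Profile and Split settings group one variable's values by two design columns: the profile column supplies the X-axis positions and the split column supplies the separate series, while the Exp2 transformation maps log2 signal back to the linear scale (2 to the power x). The sketch below shows that grouping in Python. The function name and the design values are hypothetical, and how Array Studio draws or summarizes the replicate points is not modeled.

```python
from __future__ import annotations

from collections import defaultdict


def exp2(x: float) -> float:
    """Undo a log2 transform: 2 ** x."""
    return 2.0 ** x


def group_profile(values, design, profile_col, split_col, transform=None):
    """Group one variable's values into {split level: {profile level: [values]}}.

    values: observation ID -> signal for a single variable (e.g. one probeset)
    design: observation ID -> design row (dict of column -> value)
    """
    series: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for obs_id, value in values.items():
        row = design[obs_id]
        y = transform(value) if transform else value
        series[row[split_col]][row[profile_col]].append(y)
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {split: dict(profiles) for split, profiles in series.items()}
```

Here time values stay strings, so each time point is a separate level rather than a position on a numeric scale, which is the behavior the Column Type change below is meant to produce.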
Column Type The VariableView’s X-axis appears to show time on an integer scale. We’d rather it show each time point (1, 3, 6, 18 hrs) as an individual factor level. This can be changed by opening the Design node of the Solution Explorer for the dataset, then double-clicking the Table view. Column properties can be edited by going to Table Menu | Columns | Column Properties (alternatively, right-click on the design column in the table view and choose Column Properties). Select the time column, then change Column Type to Factor.
VariableView Now switch back to the VariableView. Notice the X-axis is now correct. Now click the Show Summary Information button in the Task tab of the View Controller. On-the-fly p-value information is shown for time (profile column), treatment (split column), and the interaction of the two factors. This should not replace a formal analysis, but can be used as a way to quickly find out if a gene is significantly changing. Click the Change Profile Gallery button in the Task tab of the View Controller, switch to a different view (choose Bar as the gallery type), then click the Show Error Bars button. Switch to the Legend tab of the View Controller to see the legend for the chart. Any chart can be opened in PowerPoint at any point. Reset all Variable Tab filters now.
Variable view: other features • Lasso selection: right-click and drag • Control-key selection: for choosing multiple points • F10: pops the view out (good for multiple screens) • Open in Excel • Most of these features also apply to other plots
PairwiseScatterView PairwiseScatterView can be used for QC purposes, to compare biological/technical replicates. It shows a ScatterView comparing each pair of chips (bottom left of the view), as well as the MA plot for each chip comparison. Add a new view, PairwiseScatterView, using the same method used earlier for VariableView. In the Observation Tab, filter the group column to DBP.t18. The PairwiseScatterView is updated to show only the 3 chips belonging to the DBP treatment at time point 18. Notice that one chip, 22A, appears to correlate more poorly with the other chips. This is the first indication that it is an outlier chip.
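For two chips on the log2 scale (as RMA output is), an MA plot shows M = log2(a) - log2(b) against A = (log2(a) + log2(b)) / 2 for every probeset, while the scatter panel plots one chip against the other. A chip that agrees poorly with its replicates shows a wider M spread and a lower correlation. This Python sketch computes both quantities. The function names are illustrative, and Pearson correlation is our choice of summary rather than something the view is documented to report.

```python
import math


def ma_values(chip_a, chip_b):
    """M (difference) and A (average) for two chips already on the log2 scale."""
    m = [a - b for a, b in zip(chip_a, chip_b)]
    a_avg = [(a + b) / 2 for a, b in zip(chip_a, chip_b)]
    return m, a_avg


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```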
Principal Component Analysis (2 components) Choose Principal Component Analysis from the Quality Control section of the microarray workflow (alternatively, choose Microarray | QC | Principal Component Analysis from the menu). Make sure that Demonstration is selected as the project and DBPTS is selected as the Data. Ensure that 2 components are generated, that group is selected for Group, and that Calculate Hotelling T2 is selected. Click Submit.
Principal Component Analysis (2 components) A PCA with two components is generated. The legend is available in the Legend tab, and points are colored automatically based on the Group setting. Customize the chart using Change Symbol Properties: change Labels to All, and By to chip. The chart is updated, indicating that chip 22A appears to be an outlier. Select chip 22A: notice the Details Window, and the point should turn red. Click Exclude Selection in the Task tab of the View Controller. This re-runs the PCA and creates a list, DBPTS.Observation23, which contains the 23 “good” chips and will be used for further analysis.
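PCA treats each chip as a point in gene space and finds the directions of largest variance. Hotelling's T2 then scores each chip as the sum, over the retained components, of its squared score divided by that component's variance, so a chip far out along a high-variance direction (like 22A here) gets a large value. The sketch below does this in plain Python with power iteration and deflation. That is fine for a toy matrix but not how a production tool would handle thousands of genes; it is an illustration, not Array Studio's algorithm.

```python
import math


def _center(rows):
    """Subtract the column mean from every value (rows = observations)."""
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def _covariance(x):
    """Sample covariance matrix (p x p) of centered data x (n x p)."""
    n, p = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def _top_eigenpair(c, iterations=1000):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p  # fixed start vector, adequate for a sketch
    for _ in range(iterations):
        w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector v
    eigenvalue = sum(v[a] * sum(c[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigenvalue, v


def pca_with_t2(rows, n_components):
    """Return (eigenvalues, scores, hotelling_t2) for the first n_components."""
    x = _center(rows)
    c = _covariance(x)
    eigenvalues, vectors = [], []
    for _ in range(n_components):
        lam, v = _top_eigenpair(c)
        eigenvalues.append(lam)
        vectors.append(v)
        # Deflate: remove the found component before searching for the next one
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(len(c))] for a in range(len(c))]
    scores = [[sum(xi[j] * v[j] for j in range(len(xi))) for v in vectors] for xi in x]
    # Hotelling T2: squared scores scaled by each component's variance
    t2 = [sum(s * s / lam for s, lam in zip(row, eigenvalues)) for row in scores]
    return eigenvalues, scores, t2
```

A useful sanity check: with k components the T2 values of the n observations sum to k(n - 1), because each component's squared scores sum to (n - 1) times its variance.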
Lists • What is a list in Array Studio? • A flat list of probesets, chips, genes, etc. • Lists can be reused in other projects • Lists can be used to filter • Lists can be used to limit an analysis when running analysis modules • Variable lists, Observation lists, Row, Column, or General lists: Array Studio is smart and only shows context-specific lists
Principal Component Analysis (3-D) Choose Principal Component Analysis from the Quality Control section of the microarray workflow (alternatively, choose Microarray | QC | Principal Component Analysis from the menu). Make sure that Demonstration is selected as the project and DBPTS is selected as the Data. Ensure that 3 components are generated and that group is selected for Group. Click Submit.
Principal Component Analysis (3-D) A fully interactive 3-D PCA is returned. Includes trackball tool, panning/zooming tool, and selection tool for interacting with the graph. Functions the same as 2-D plot (changing coloring, excluding selection, etc.)
Differential Expression/Two-Way ANOVA Using the workflow, select Two-Way ANOVA from the Statistical Inference section. Set Data to DBPTS. Ensure that all Variables are selected, but use the list DBPTS.Observation23 for Observations. The design of this experiment is 4 time points, with a treatment and a control at each time point. Thus, contrasts should be generated for each time point, comparing the treatment (DBP) to control. To figure out the comparisons, read from top to bottom: for each time, Compare to control will create 4 comparisons. Other options include generating F-test p-values (time, treatment, time*treatment), generating LSMean data, appending LSMean data to the inference report, and generating estimate data. Click Submit to run the module.
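In a two-way model with a time*treatment interaction, each time-by-treatment cell has its own LS-mean, so the estimate for a "t DBP vs Control" comparison equals the difference between the DBP and control cell means at that time point. The Python sketch below computes those estimates for one probeset. It leaves out the standard errors and p-values, which come from the model's pooled residual variance, and the design column names and level labels are hypothetical.

```python
from __future__ import annotations


def per_time_contrasts(values, design, treated="DBP", control="Control"):
    """Difference of cell means (treated - control) at each time point.

    values: observation ID -> log2 signal for one probeset
    design: observation ID -> {"time": ..., "treatment": ...}  (hypothetical columns)
    """
    cells: dict[tuple[str, str], list[float]] = {}
    for obs_id, value in values.items():
        row = design[obs_id]
        cells.setdefault((row["time"], row["treatment"]), []).append(value)

    def mean(xs):
        return sum(xs) / len(xs)

    estimates = {}
    # Sort time levels numerically so "18" comes after "6"
    for time in sorted({t for t, _ in cells}, key=float):
        if (time, treated) in cells and (time, control) in cells:
            label = f"{time} {treated} vs {control}"
            estimates[label] = mean(cells[(time, treated)]) - mean(cells[(time, control)])
    return estimates
```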
General Linear Model Demonstration of the General Linear Model module. Two-way ANOVA gives equivalent results, but the General Linear Model provides much more power and flexibility.
Results of Statistical Inference The Two-Way ANOVA generates a table called DBPTS.Tests in the Inference folder in the Tables section. It includes two generated views: Report and Volcano view. In addition, a list is generated for each comparison, using the alpha level (p-value cutoff). A 5th list is generated, containing all the significant probesets in the Two-Way ANOVA. Note: lists are generated using the adjusted p-value column, because a multiplicity adjustment was set in the Two-Way ANOVA window.
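The training does not say which multiplicity adjustment was selected. As one common choice, the Benjamini-Hochberg false discovery rate adjustment is sketched below in Python, together with turning the adjusted p-values into a list at a given alpha. Treat the method choice and the function names as assumptions.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_ids(ids, adjusted, alpha=0.05):
    """IDs whose adjusted p-value is below alpha (one list per comparison)."""
    return [i for i, p in zip(ids, adjusted) if p < alpha]
```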
Volcano plots Volcano plots give a nice overview of the modeling results. Array Studio automatically sets the layout of the plots to show as much information as possible on one screen; for this particular data, a 2*2 layout is set (2 rows, 2 columns). All the plots are linked (both hot track and selection). A uniform scale could be more informative, and details on demand could be useful. Select a probeset in the top right corner of the 1 DBP vs Control plot and notice that the Details Window provides on-demand gene annotation info, including p-values, estimates, etc. If you do not see anything on the volcano plot, reset your filter.
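Each point in a volcano plot is one probeset, placed by its effect size and its significance. Assuming the usual convention of the estimate (a log2 difference) on the X-axis and -log10(p-value) on the Y-axis, the Python sketch below computes those coordinates. Under that convention the top right corner holds probesets that are both strongly up-regulated and highly significant.

```python
import math


def volcano_points(estimates, pvalues):
    """(x, y) = (estimate, -log10(p)) for each probeset."""
    if any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    return [(est, -math.log10(p)) for est, p in zip(estimates, pvalues)]
```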
Table reports • The volcano plot is one way to view the modeling results. The table view is another way (so is the chromosome view). • Usually a table with everything is too big to explore, so filtering is essential. • To view the table reports, double-click the table view generated by the modeling process. • Use Group By Mode to arrange the filters so all the raw p-values are grouped together (as are the adjusted p-values, estimates, etc.) • Create a list that contains probesets significant in all treatments: • Filter 1 DBP vs. Control.RawPValue < 0.05 • Filter 3 DBP vs. Control.RawPValue < 0.05 • Filter 6 DBP vs. Control.RawPValue < 0.05 • The final number should be 78 rows. • Click Add Item, then Add List From Visible Rows, then choose List Source as Probe Set ID.
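The three stacked filters keep only rows whose raw p-value is below 0.05 in every chosen comparison, and the list is then built from the Probe Set ID of the visible rows. The same logic in Python, on made-up rows (the column names follow the slide; the probe set IDs and values are invented):

```python
def significant_in_all(rows, pvalue_columns, alpha=0.05):
    """Probe Set IDs of rows with p < alpha in every listed column."""
    return [row["Probe Set ID"] for row in rows
            if all(row[col] < alpha for col in pvalue_columns)]


COLUMNS = [
    "1 DBP vs. Control.RawPValue",
    "3 DBP vs. Control.RawPValue",
    "6 DBP vs. Control.RawPValue",
]

# Invented example rows
ROWS = [
    {"Probe Set ID": "probe_1", COLUMNS[0]: 0.01, COLUMNS[1]: 0.02, COLUMNS[2]: 0.001},
    {"Probe Set ID": "probe_2", COLUMNS[0]: 0.01, COLUMNS[1]: 0.20, COLUMNS[2]: 0.001},
]
```

Only probe_1 passes all three filters; probe_2 fails at the 3-hour comparison.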
Broadcasting • What if you’ve filtered one dataset, and want to look at the filtered results in other open tables or datasets? • Options: • Create a list, then filter in that other dataset by that list. • Broadcast the results to all the other open datasets. • Cross-Platform broadcasting • Uses Array Server to map to a “master ID” and then “broadcasts” to the other platforms. Use when looking at multiple platforms (or species). • Broadcast your results now using Current Filter->Filter all Opened Views • Return to the previously created Variable View
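Cross-platform broadcasting maps the selected IDs on one platform to a shared master ID and then back out to the IDs of another platform. A minimal Python sketch of that two-step lookup, with hypothetical mapping tables standing in for what Array Server provides:

```python
def broadcast_ids(selected, source_to_master, master_to_target):
    """Map IDs from a source platform to a target platform via master IDs.

    IDs with no master mapping, or masters with no target IDs, are dropped.
    """
    masters = {source_to_master[i] for i in selected if i in source_to_master}
    return sorted({t for m in masters for t in master_to_target.get(m, [])})


# Hypothetical mapping tables
SOURCE_TO_MASTER = {"probe_1": "GENE_A", "probe_2": "GENE_B"}
MASTER_TO_TARGET = {"GENE_A": ["other_1", "other_2"], "GENE_C": ["other_3"]}
```

Going through a master ID means each platform only needs one mapping table, instead of one table for every pair of platforms.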
Venn diagram view • Generate a Venn diagram view • Right-click on Solution Explorer | Data | DBPTS | Views and choose Add View • Choose VennDiagramView from the list • Select three of your lists from the Solution Explorer and drag and drop them into the view • Advanced features: change the title of the plot • The Venn diagram is also interactive • Hint: to compare more than 4 lists, you can use the Compare Lists feature • Remember, our 3 lists were generated with the adjusted p-values, so the number of probesets common to all three lists should not match the previously created filtered list
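Each region of a Venn diagram counts the items that belong to exactly one combination of lists. The Python sketch below computes those region counts for any number of lists; the list names and contents are made up.

```python
from itertools import product


def venn_regions(named_sets):
    """Count items in each exclusive region: {tuple of list names: count}."""
    names = list(named_sets)
    universe = set().union(*named_sets.values())
    regions = {}
    for membership in product([True, False], repeat=len(names)):
        if not any(membership):
            continue  # items in none of the lists are not drawn
        inside = tuple(n for n, flag in zip(names, membership) if flag)
        count = sum(
            1 for item in universe
            if all((item in named_sets[n]) == flag for n, flag in zip(names, membership))
        )
        regions[inside] = count
    return regions
```

The number of regions grows as 2^k - 1 for k lists, which is why drawing the diagram stops being practical after a few lists.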
Summarize Inference Report Summarize Inference Report is used to count the number of variables meeting certain criteria. Go to Summarize Inference Report in the microarray workflow, under Statistical Inference (alternatively, go to Microarray Menu | Inference | Summarize Inference Report). Select DBPTS.Tests, all Variables, and all 4 estimates. In the Options section, build the conditions: use Raw Pvalue < 0.05 for all conditions, combined with FC > 2, FC > 3, FC < -2, and FC < -3 respectively, one condition each. Make sure to name each condition. A table is generated, giving a count for each condition/estimate. Notice the interactivity of the table.
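The slide does not define FC. Assuming the estimates are log2 differences and FC is the usual signed fold change (2^estimate when the estimate is positive, -2^(-estimate) when it is negative), the counting step looks like the Python sketch below. The condition names mirror the slide, while the row values are invented.

```python
def signed_fold_change(log2_estimate):
    """Signed fold change from a log2 difference: 1.0 -> 2.0, -1.0 -> -2.0."""
    if log2_estimate >= 0:
        return 2.0 ** log2_estimate
    return -(2.0 ** -log2_estimate)


CONDITIONS = {
    "FC>2": lambda fc: fc > 2,
    "FC>3": lambda fc: fc > 3,
    "FC<-2": lambda fc: fc < -2,
    "FC<-3": lambda fc: fc < -3,
}


def count_conditions(rows, conditions, alpha=0.05):
    """Count rows with raw p < alpha whose fold change meets each condition.

    rows: list of (log2 estimate, raw p-value) pairs for one comparison
    """
    return {
        name: sum(1 for est, p in rows if p < alpha and test(signed_fold_change(est)))
        for name, test in conditions.items()
    }
```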
Hierarchical clustering Select Hierarchical clustering from the Pattern Recognition section of the microarray workflow (alternatively, choose Microarray Menu | Pattern | Hierarchical clustering). Make sure DBPTS is the data to be analyzed. Select 18 DBP vs control.Sig379 as the working variable set and DBPTS.Observation23 as the working observation set. Check Compute variable tree and Generate classic dendrogram view. Push the Submit button.
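Building the variable tree is agglomerative hierarchical clustering: start with each probeset as its own cluster and repeatedly merge the two closest clusters, recording the merge heights that the dendrogram draws. The Python sketch below uses 1 - Pearson correlation as the distance and average linkage. Both are common choices but assumptions here, since the training does not say which settings Array Studio uses.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation (0 for identical shapes, 2 for opposite shapes)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)  # undefined for constant profiles


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = {name: [name] for name in profiles}

    def distance(a, b):
        # Average of all pairwise distances between members of the two clusters
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(pearson_distance(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        candidates = [(a, b) for k, a in enumerate(keys) for b in keys[k + 1:]]
        a, b = min(candidates, key=lambda pair: distance(*pair))
        height = distance(a, b)
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
    return merges
```

This naive version recomputes distances on every pass, which is fine for a handful of profiles but far too slow for hundreds of probesets.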
Dendrogram • Interacts with heatmap table view • Adjust thumbnail width • Adjust thumbnail cell sizes • Fit thumbnails into window • Change color properties • Select branches • Select thumbnail blocks • Change color bars • Adjust heatmap cell sizes • Specify annotation columns • Select Gene Symbol Star