Questions tagged [data-visualization]
Constructing and interpreting meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)
3,115 questions
4 votes, 4 answers, 658 views
Does a boxplot assume interval data?
Does a boxplot assume interval data? If not, is it then fine to use a box plot to represent Likert-scale (ordinal) data?
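A boxplot is drawn from quartiles, and most software interpolates them. With ordinal codes, that means the box edges or the median can land between categories. The minimal Python sketch below assumes hypothetical 5-point Likert responses coded 1 to 5. It contrasts interpolated quartiles with inverse-ECDF quartiles, which always equal an observed category.

```python
import math
import statistics

def interpolated_quartiles(codes):
    """Quartiles by linear interpolation, as many boxplot routines compute them."""
    return statistics.quantiles(codes, n=4, method="inclusive")

def category_quartiles(codes):
    """Inverse-ECDF quartiles: smallest observed category whose cumulative share reaches q."""
    s = sorted(codes)
    n = len(s)
    return [s[math.ceil(q * n) - 1] for q in (0.25, 0.5, 0.75)]

# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree)
responses = [1, 2, 2, 3, 3, 4]
print(interpolated_quartiles(responses))
print(category_quartiles(responses))
```

For these responses, the interpolated median is 2.5, a value no respondent can give. The inverse-ECDF version reports category 2.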
1 vote, 1 answer, 68 views
Distributional visualization of small discrete values
I have barely under 40 small integer counts per measurement type from a study subject. The counts start from zero, and the distributions are highly positively skewed. See the annexed image. Without ...
0 votes, 1 answer, 16 views
SPSS: related-samples Friedman's two-way analysis of variance by ranks, extra bar in graph [closed]
I'm new here and somewhat of a lay user in terms of statistics, and I hope this question is in the appropriate place.
I'm running the mentioned analysis in SPSS on 3 iterations of an experiment ...
0 votes, 0 answers, 5 views
Regression table for fixest and plm models?
I am working with fixest and plm models. I usually use fixest's etable for fixest objects and stargazer for plm. The problem is ...
0 votes, 0 answers, 17 views
How to aggregate daily sales data to weekly for thousands of products? [closed]
I have a dataset with daily sales and prices of three thousand products for 5 years and three stores. I want to visualize price and sales trends during weeks of a year. I was thinking of creating a ...
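One way to aggregate daily records to weeks is to key each row by product, store, and ISO calendar week. Units are then summed and prices averaged within each key. The minimal standard-library sketch below assumes each row is a hypothetical `(day, product, store, units, price)` tuple. It uses an unweighted mean of daily prices; a units-weighted average may suit revenue questions better.

```python
from collections import defaultdict
from datetime import date

def weekly_totals(rows):
    """Aggregate daily rows to (product, store, ISO year, ISO week).

    rows: iterable of (day, product, store, units, price) tuples (hypothetical layout).
    Returns {key: (total units, mean daily price)}.
    """
    units = defaultdict(float)
    price_sum = defaultdict(float)
    n_days = defaultdict(int)
    for day, product, store, qty, price in rows:
        iso_year, iso_week, _ = day.isocalendar()  # ISO weeks run Monday to Sunday
        key = (product, store, iso_year, iso_week)
        units[key] += qty
        price_sum[key] += price
        n_days[key] += 1
    return {key: (units[key], price_sum[key] / n_days[key]) for key in units}

rows = [
    (date(2024, 1, 1), "A", "S1", 3, 2.0),  # Monday, ISO week 1 of 2024
    (date(2024, 1, 7), "A", "S1", 5, 4.0),  # Sunday, same ISO week
    (date(2024, 1, 8), "A", "S1", 2, 3.0),  # Monday, ISO week 2
]
print(weekly_totals(rows))
```

The ISO week-numbering year can differ from the calendar year for days around 1 January. Keying on the ISO year keeps those weeks intact.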
0 votes, 0 answers, 20 views
Approach for characterizing 'shape' of plotted time-series data
I have a collection of time-series data aggregated at 1 second intervals over several days.
I'm attempting to formalize different 'regimes' that may exist/appear via shape of the plotted data. For ...
21 votes, 4 answers, 3k views
Is it (always) better to build a model prior to viewing the data?
When it comes to data exploration, aside from checking for outliers (human error), correlated covariates, and missing values, is there a downside to viewing relationships between a response variable ...
1 vote, 0 answers, 28 views
T-SNE on OCTMNIST gives bad results no matter the parameters [closed]
I am trying to implement t-SNE on OCTMNIST. However I change the parameters, I never get a good result. This is my code:
...
0 votes, 1 answer, 35 views
covariate balance loveplot and distance
What is the distance parameter that is plotted on the love.plot? I am not able to find any reference or explanation for this in the cobalt package. Any help understanding this value "distance"...
1 vote, 0 answers, 27 views
data science for economics textbook suggestion
I'm looking for a textbook for data science course in graduate economics degree program. It has to be based on Python and have a lot of examples. I'd appreciate any suggestions.
Naturally, the grad ...
1 vote, 2 answers, 42 views
Best way to display two variables changing over the span of a number of years?
I'm trying to display both the median rent in a city as well as its population on the y axis in relationship to the year in the x axis.
On first instinct I'd go with a bubble chart, where the y axis ...
0 votes, 1 answer, 75 views
Figure that shows both mean, standard deviation, and standard error?
I’m a novice working with a set of behavior data (e.g. duration) that’s non-normal due to A LOT of variability. The data also has a lot of zeroes, which is why I plan to present it using means as ...
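The three quantities are related. SD describes the spread of the data, while SE = SD / sqrt(n) describes the precision of the mean. The minimal sketch below computes all three from one sample. It assumes the sample SD (n - 1 denominator) is wanted.

```python
import math
import statistics

def mean_sd_se(values):
    """Return the mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)       # sample SD, n - 1 denominator
    se = sd / math.sqrt(len(values))    # standard error of the mean
    return mean, sd, se

# Hypothetical durations with several zeros
print(mean_sd_se([0, 0, 2, 4, 4]))
```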
3 votes, 0 answers, 50 views
Why does a random term in a large GAMM model make the curves spiky and wiggly?
I am creating several GAMM models with similar structures to dynamically model an acoustic parameter across realizations from multiple subjects, who are included as random smooths in my models.
In the ...
8 votes, 3 answers, 581 views
Opinion / input on visualization needed
I am about to submit my first paper and have such a graph (in the paper as a $2\times 2$ plot). The plot gives quite a lot. I have a model with two varying parameters, alpha and beta, and am examining ...
4 votes, 1 answer, 54 views
Clarifying the default "standard error" for error bars in Microsoft Excel/Powerpoint plots (calculated without N or SD) [closed]
I have noticed that Excel allows you to toggle "error bars" for any given plot and one of the options is to have the error bars denote standard errors. This is peculiar since if you do a ...
2 votes, 2 answers, 547 views
Where is the inflection point here in this elbow chart?
Where do you think is the inflection point on this chart?
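Without the chart, this cannot be answered directly. One common heuristic places the "elbow" at the point farthest from the straight line joining the first and last points. Both axes are rescaled to [0, 1] first, so the answer does not depend on units. Strictly, this finds a point of high curvature rather than an inflection point. The minimal sketch below uses hypothetical within-cluster sums of squares for k = 1 to 6.

```python
import math

def elbow_index(y):
    """Index of the point farthest from the chord joining the first and last points.

    x is taken as 0, 1, ..., len(y) - 1; both axes are rescaled to [0, 1] first
    so the result does not depend on the units of either axis.
    """
    n = len(y)
    lo, hi = min(y), max(y)
    if n < 3 or hi == lo:
        raise ValueError("need at least 3 points with varying y")
    xs = [i / (n - 1) for i in range(n)]
    ys = [(v - lo) / (hi - lo) for v in y]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    # Perpendicular distance of each point from the chord
    dists = [abs(dy * (a - xs[0]) - dx * (b - ys[0])) / norm for a, b in zip(xs, ys)]
    return max(range(n), key=dists.__getitem__)

# Hypothetical within-cluster sum of squares for k = 1..6
wss = [100, 40, 20, 15, 12, 10]
print(elbow_index(wss))  # 2, i.e. k = 3
```

The heuristic depends on the range of k plotted, so treat its output as a suggestion rather than a rule.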
1 vote, 0 answers, 28 views
Confidence bands for binary time series data
Context: I have binary data $x_{it}\in\{0,1\}$ where $i\in\{1,...,N\}$ indexes trials and $t\in\{1,...,T\}$ indexes time (independent across trials; not independent across time). It's from a ...
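Because trials are independent, a pointwise interval at each time t can treat the N values $x_{1t},\dots,x_{Nt}$ as binomial. The minimal sketch below uses the Wilson score interval, a swapped-in choice; other binomial intervals would also work. These are pointwise intervals, not a simultaneous band, because dependence across time is ignored. A simultaneous band would need something that respects it, for example resampling whole trials.

```python
import math

def wilson_pointwise(x, z=1.96):
    """Pointwise Wilson score intervals for P(x_t = 1), one per time point.

    x: list of N trials, each a list of T values in {0, 1}; trials assumed independent.
    Returns a list of (p_hat, lower, upper) tuples of length T.
    """
    n = len(x)
    out = []
    for t in range(len(x[0])):
        k = sum(trial[t] for trial in x)  # successes at time t across trials
        p = k / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
        out.append((p, centre - half, centre + half))
    return out

# Hypothetical data: N = 4 trials, T = 2 time points
print(wilson_pointwise([[0, 1], [0, 1], [0, 0], [0, 1]]))
```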
2 votes, 1 answer, 59 views
Discrepancy Between KS Test Results and CDF Visualizations in Neuronal Correlation Analysis
I am analyzing Pearson Correlation Coefficient (PCC) values computed pairwise for a set of neurons measured ...
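The two-sample KS statistic D is the largest vertical gap between the two empirical CDFs. A useful check is therefore to compute D and the x at which it occurs, then compare that location with the plotted CDFs. The minimal sketch below uses hypothetical PCC values. Two caveats apply. With many pairs, even a small D can be significant. Pairwise correlations that share neurons are also not independent, which the KS p-value assumes.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic D and the value where the ECDF gap is largest."""
    a, b = sorted(a), sorted(b)
    best_d, best_x = 0.0, None
    for v in sorted(set(a) | set(b)):
        # ECDF values: fraction of each sample that is <= v
        gap = abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        if gap > best_d:
            best_d, best_x = gap, v
    return best_d, best_x

# Hypothetical PCC values for two conditions
print(ks_statistic([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```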
1 vote, 1 answer, 47 views
Interaction Plotting
How do I draw an interaction diagram like the ones in research articles? I am trying to plot an interaction diagram of a continuous predictor variable and a binary interaction variable on a ...
3 votes, 1 answer, 125 views
How is a PR curve plotted?
Whilst reading Machine Learning by Zhi-Hua Zhou (pp. 34-35), I was a little confused about the method used to plot a PR curve and was hoping you could help me become a little less confused.
In the book it ...
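A common construction ranks samples by predicted score. The top 1, 2, ..., m samples are then treated as predicted positives in turn, and (recall, precision) is recorded at each cut-off. The minimal sketch below follows that procedure with hypothetical scores and labels. Tied scores are not grouped in this sketch.

```python
def pr_curve(scores, labels):
    """(recall, precision) after each cut-off, ranking samples by decreasing score.

    labels: 1 = positive, 0 = negative. Tied scores are not grouped in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for i in order:
        # Treat the current sample and all higher-ranked ones as predicted positive
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

print(pr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```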
1 vote, 0 answers, 33 views
Non-overlapping, non-stacked area plots or non-branching, non-constant quantity flow diagrams
Considering visualization for data over a small number of timepoints and a large number of series, in this case:
...
0 votes, 0 answers, 30 views
A discrepancy between the ARIMA model and plot
I ran the ARIMA model and estimated the fitted values. My constant value in the ARIMA model is 153. Since the time variable (t_centered) was centered at zero, the constant indicates a predicted ...
2 votes, 1 answer, 26 views
How Can I Simplify a Radial Graph Network While Preserving Key Information?
I have created a radial graph network to visualize connections between brain regions. Each region is represented by a circle, with unique colors used for the regions. The circle border indicates group/...
0 votes, 2 answers, 31 views
On choice of y-axis range for visualization
Let's say that a disease has low prevalence. The prevalence is the proportion which should range from 0% to 100%. I want to visualize the trend of prevalence over the time. (Let's assume that ...
1 vote, 0 answers, 26 views
Bishop Gaussian Basis
In Pattern Recognition and Machine Learning, Christopher Bishop says in Section 3.3.2, titled Predictive distribution:
If we used localised basis functions such as Gaussians, then in regions away
...
12 votes, 3 answers, 715 views
Visual assessment of scatterplots acceptable?
I have a fairly basic question about analyzing a dataset of measurements taken on a number of fish, which I’m doing as part of a student project. So I have measurements of four species of fish of ...
14 votes, 5 answers, 2k views
In what instances are 3-D charts appropriate?
All the data analysis-related texts I have read over the years recommend against using three-dimensional charts in almost all cases. To quote one of them, "Never use a 3-D chart when a two-...
0 votes, 1 answer, 50 views
How to derive points for each variable in nomogram by hand?
I'm dealing with nomograms at the moment, but I don't really understand how to derive the exact points for each variable by hand (or in a mathematical way).
To be concrete, here is a simple example:...
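A common way to build nomogram axes from a linear predictor, such as a logistic-regression logit, has three steps. First, compute each predictor's contribution beta_j * x_j over its plotted range. Second, shift it so its smallest value gets 0 points. Third, scale all predictors by one common factor, chosen so the predictor with the widest contribution range spans 0 to 100 points. The minimal sketch below uses hypothetical coefficients and ranges; `age` and `dose` are illustrative names, not from the question.

```python
def make_points_function(coefs, ranges, max_points=100):
    """coefs: {name: beta}; ranges: {name: (low, high)} plotted for each predictor.

    Returns (points, scale), where points(name, value) uses one common scale.
    """
    spans = {name: abs(beta) * (ranges[name][1] - ranges[name][0]) for name, beta in coefs.items()}
    scale = max_points / max(spans.values())  # widest predictor spans 0..max_points

    def points(name, value):
        beta = coefs[name]
        low, high = ranges[name]
        ref = low if beta >= 0 else high  # value with the smallest contribution gets 0 points
        return scale * beta * (value - ref)

    return points, scale

# Hypothetical model: logit = b0 + 0.05 * age - 0.8 * dose
points, scale = make_points_function({"age": 0.05, "dose": -0.8}, {"age": (20, 80), "dose": (0, 2)})
total = points("age", 50) + points("dose", 1)
print(total)
```

To read a total back onto the linear-predictor axis, divide it by `scale`. Then add the intercept plus each predictor's beta_j times its reference value.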
2 votes, 1 answer, 36 views
How to visualise the value of one predictor in a multiple linear regression
I'm looking for confirmation on whether the approach I have is statistically correct / straightforward, and if there might be any references supporting this line of thinking on how to visualise ...
5 votes, 2 answers, 90 views
Interpreting random effect in GAM output: gaussian quantiles
Model runs are fine. Interpreting effects is easier for factors. What is not straightforward are the outputs/plots for variables with factor levels assigned a random effect. The plots I am trying to ...
1 vote, 0 answers, 15 views
Discrepancy in MSE Cell Frequencies and Entropy Estimation for Various Estimators
I've been trying to replicate the results from the paper "Entropy Inference and the James-Stein Estimator, With Application to Nonlinear Gene Association Networks" in R. The goal is to ...
2 votes, 0 answers, 63 views
Is It Okay to Scale up Principal Component Analysis Loadings to Fill the Plotting Region When Plotting Principal Component Analysis Scores [duplicate]
I have the following plot, where I've plotted principal component analysis scores.
I overlaid the loadings in the plotting region, but they're all very short and you can't see the loadings nor their ...
1 vote, 0 answers, 38 views
How to tell if an interaction term should be included based on residual plot?
Applied Linear Regression Models says:
Also, residuals should be plotted against interaction terms for potential interaction effects not included in the regression model, such as against $X_1X_2$, $...
1 vote, 1 answer, 48 views
How to plot posterior marginal means for categorical variables?
I ran this model. Each categorical variable has three levels. How can I plot the results, such as posterior marginal means for each of the six levels, by country?
...
0 votes, 0 answers, 14 views
Best Approach to Smooth ETA Transitions in Delivery Prediction Models
I'm working on displaying the ETA (Estimated Time of Arrival) for an item delivery using a predictive model that evolves over time. The issue I'm facing involves smoothing the transition between ...
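One simple option is exponential smoothing, a swapped-in technique rather than something the question specifies. Each displayed ETA moves only a fraction alpha of the way toward the latest model prediction. This damps jumps at the cost of some lag. The minimal sketch below assumes ETAs are numbers of minutes and that alpha is a hypothetical tuning parameter.

```python
def smooth_etas(raw_etas, alpha=0.3):
    """Exponentially smoothed ETAs: each shown value moves alpha of the way to the newest prediction."""
    shown = []
    current = None
    for eta in raw_etas:
        current = eta if current is None else current + alpha * (eta - current)
        shown.append(current)
    return shown

# Hypothetical model predictions in minutes, with one sudden jump
print(smooth_etas([30, 30, 45, 44, 43], alpha=0.3))
```

A larger alpha tracks the model more closely; a smaller one gives a steadier display that reacts later to genuine delays.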
2 votes, 1 answer, 46 views
Plotting/modelling a histogram with large bins of small values
I'm wondering about best practices or approaches for data where, as in the image below, the low-value bins are most common, but you are interested in the whole distribution.
Here is a ...
1 vote, 2 answers, 152 views
Relative vs. absolute error bars in log-scaled plots
There's conflicting info from seemingly knowledgeable sources about the correct way to show error bars on a log-scaled plot.
$\log_{10}(x \pm \Delta x)$ shows the absolute error. On the one hand, it's ...
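The asymmetry is easy to see numerically. Plotting x ± Δx on a log10 axis gives a lower bar of log10(x) - log10(x - Δx) and an upper bar of log10(x + Δx) - log10(x). These differ unless Δx is small relative to x. The minimal sketch below assumes 0 < Δx < x.

```python
import math

def log10_error_bar(x, dx):
    """Lower and upper bar lengths, in log10 units, when x ± dx is drawn on a log axis."""
    if not 0 < dx < x:
        raise ValueError("this sketch requires 0 < dx < x")
    centre = math.log10(x)
    lower = centre - math.log10(x - dx)
    upper = math.log10(x + dx) - centre
    return lower, upper

print(log10_error_bar(100, 50))  # lower bar is longer than upper bar
```

For small Δx/x, both bars approach Δx / (x ln 10), which depends only on the relative error Δx/x.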
4 votes, 3 answers, 456 views
Any tips to differentiate zero and non-zero values on a map with a continuous color scale?
This is more a theoretical/best practices question than a practical one, but imagine I have to make a map with points representing cities, and some variable is tied to the points color. The value ...
8 votes, 3 answers, 470 views
I think standard deviation of y is related to size of x. How do I create a model for this / test this?
I have a sample of data $(x_i, y_i)$. I hypothesize that $y_i$ is not dependent on $x_i$, but the standard deviation of $y_i$ depends on $x_i$.
More concretely, say I assume $\textrm{Var}(y_i | x_i) = ...
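One standard check for this hypothesis is a Breusch-Pagan-type test. First fit y on x, then regress the squared residuals on x. In Koenker's studentized version, n * R^2 from that auxiliary regression is compared with a chi-square distribution with 1 degree of freedom. The minimal standard-library sketch below uses hypothetical data. An alternative is to fit a model for the variance as a function of x by maximum likelihood, which also gives an estimate, not just a test.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y on x; returns (intercept, slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot

def koenker_bp_test(x, y):
    """Studentized Breusch-Pagan test with x as the only variance regressor.

    Returns (LM, p_value), where LM = n * R^2 from regressing squared OLS
    residuals on x, referred to a chi-square distribution with 1 df.
    """
    b0, b1, _ = simple_ols(x, y)
    sq_resid = [(b - b0 - b1 * a) ** 2 for a, b in zip(x, y)]
    _, _, r2 = simple_ols(x, sq_resid)
    lm = len(x) * r2
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square(1)
    return lm, p_value

# Hypothetical data: no trend in y, but its spread grows with x
x = list(range(1, 41))
y = [(-1) ** i * i for i in x]
print(koenker_bp_test(x, y))
```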
1 vote, 0 answers, 51 views
seeking suggestions for visualizing relationships between extracted topics and ratings in R
I have a set of product online reviews data. The dataset contains review text and 1-5 star ratings.
I've extracted 5 prevalent topics using the R stm package. They are price, design, packaging, promotion ...
0 votes, 1 answer, 60 views
Plot hazard ratio and 95% CI against a continuous variable
Can R generate a figure like this?
x-axis = age (continuous), y-axis = hazard ratio for each count?
Our dataset: mydata, event=death, time to event=years, var=age, group=treatment(1) vs control(0)
0 votes, 3 answers, 144 views
False Negative vs False Positive for Multiclass classification
Suppose I have three classes: 1, 2, 3.
And there's an evaluation like the one below, where the second element is a false prediction: the model predicts class 3 while the ground truth is 2.
...
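In the one-vs-rest view, a single misclassification counts twice. A sample whose truth is 2 but is predicted as 3 is a false negative for class 2 and a false positive for class 3. The minimal sketch below illustrates this with hypothetical labels.

```python
from collections import Counter

def per_class_counts(y_true, y_pred, classes):
    """One-vs-rest TP, FP and FN counts for each class."""
    counts = {c: Counter() for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["TP"] += 1
        else:
            counts[p]["FP"] += 1  # predicted p, but the truth was another class
            counts[t]["FN"] += 1  # the true class t was missed
    return counts

# Hypothetical labels; the second sample is truth 2, predicted 3
counts = per_class_counts([1, 2, 3, 2], [1, 3, 3, 2], classes=[1, 2, 3])
for c in (1, 2, 3):
    print(c, dict(counts[c]))
```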
0 votes, 0 answers, 69 views
How can I make a simple visualization of my result from a two-way fixed effects regression model?
I am trying to understand the effect of the variable x1 on the dependent variable y1, and for that I have panel data. I decided to use a two-way fixed effects model to estimate the effect of variable ...
2 votes, 1 answer, 37 views
Representing growth in learning outcomes
The average score of my class of students at the beginning of the year was 50/100 and at the end of the year, it was 80/100.
The percentage growth is 60%.
Can I write there is 1.6x growth in average ...
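The arithmetic supports three distinct statements. The end average is 1.6 times the start (80/50). The increase is 60% of the starting value. The gain is 30 points on a 100-point scale. "1.6x growth" can be read as the growth itself being 1.6 times the start, so the ratio wording is less ambiguous. A minimal sketch:

```python
def describe_change(before, after):
    """Return (ratio, percent change, absolute change) between two averages."""
    ratio = after / before                        # end value as a multiple of the start
    pct_change = (after - before) * 100 / before  # growth relative to the start
    points = after - before                       # gain on the original score scale
    return ratio, pct_change, points

print(describe_change(50, 80))
```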
2 votes, 3 answers, 141 views
How to work out the expected rate of success when there is a guaranteed success on the nth attempt?
I'm looking to work out the expected success rate given a per-attempt success rate, where, after n-1 failed attempts, the success rate for the nth attempt is 100%.
Intuitively I ...
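The minimal sketch below rests on three assumptions. Attempts are independent with success probability p, attempt n always succeeds, and the counter resets after each success. Under these, the number of attempts per success is min(G, n), with G geometric. That gives E[attempts] = sum over k = 0..n-1 of (1 - p)^k = (1 - (1 - p)^n) / p. The long-run success rate per attempt is the reciprocal of that.

```python
def expected_attempts(p, n):
    """Expected attempts per success when attempt n is guaranteed (assumes 0 < p <= 1, n >= 1)."""
    # E[min(G, n)] = sum of P(G > k) for k = 0 .. n-1, and P(G > k) = (1 - p) ** k
    return sum((1 - p) ** k for k in range(n))

def long_run_success_rate(p, n):
    """Successes per attempt over many cycles (renewal-reward argument)."""
    return 1 / expected_attempts(p, n)

print(expected_attempts(0.1, 10), long_run_success_rate(0.1, 10))
```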
0 votes, 0 answers, 66 views
How to express the uncertainty in the OR of a right-tailed Fisher's exact test?
Many OR (odds ratio) visualizations show 95% confidence intervals expressing uncertainty like on the left panel in figure below.
However, in genomic overrepresentation analysis, the right-tailed Fisher ...
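For the right-tailed test, the p-value is a hypergeometric tail probability with all margins fixed. The matching interval is one-sided. For example, R's `fisher.test(..., alternative = "greater")` reports an interval of the form [lower, Inf) around the conditional MLE of the odds ratio. The minimal standard-library sketch below computes the tail probability for a hypothetical table [[a, b], [c, d]]. It also returns the simple sample odds ratio ad/(bc), which differs from the conditional MLE.

```python
from math import comb

def fisher_right_tail(a, b, c, d):
    """One-sided (greater) Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P(X >= a)) where X is hypergeometric with the
    table's margins fixed. The sample odds ratio is inf when b * c == 0.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as the observed one
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, tail / total

print(fisher_right_tail(3, 0, 0, 3))
```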
1 vote, 1 answer, 375 views
Best solution was not repeated - no reliable result for metaMDS?
I have abundance data of spiders that I captured in different forest stands and I want to perform an NMDS. It generally works, although the stress value is relatively high. However, I get the message ...
0 votes, 1 answer, 59 views
Parallel Mediation Model Diagram
I am building a parallel mediation model diagram but am not sure what one of the numbers signifies, despite 30 minutes of Googling.
The number in question is what's in the parenthesis for the c' path —...
0 votes, 0 answers, 28 views
I want to plot the decision boundaries of an SVM model with more than 2 variables
I understand that that is impossible to visualize, so I went in and PCA-transformed the variables. The problem is that I still need more than 2 principal components to get "good" ...
3 votes, 2 answers, 95 views
Intro to Stats: professor's mistake in an exercise on types of diagrams?
The Question
Parking at a university has become a problem. University administrators are interested in determining the average time it takes students to find a parking spot. An administrator ...