TFM Miguel Perez Mateo


Universidad Politécnica de Madrid
Escuela Técnica Superior de
Ingenieros Informáticos

Master in Digital
Innovation / Health

Master Thesis

Medical Supply Management through Data Visualization Tool

Author: Miguel Pérez Mateo


Supervisor: Sergio Paraíso Medina

Co-supervisor: Fco. Javier Fernández Martínez

Madrid, July 2021


This Master Thesis has been deposited in ETSI Informáticos de la Universidad
Politécnica de Madrid.

Master Thesis
Master in Digital Innovation / Health

Title: Medical Supply Management through Data Visualization Tool


July 2021

Author: Miguel Pérez Mateo

Supervisor: Sergio Paraíso Medina
PhD in Computer Science
Universidad Politécnica de Madrid
E.T.S de ingenieros informáticos

Co-supervisor: Francisco Javier Fernández Martínez
PhD in Molecular Biology
Universidad Autónoma de Madrid
Hospital 12 de Octubre


Resumen
The increasing digitalization of medical resource management is a trend that can help to monitor the state of supplies in a hospital environment efficiently [1], providing a solid tool to know the surplus and deficit of the different medical materials and the efficiency of the different departments, providers, and products.

Data visualization tools have previously been developed to visualize medical data in other domains with positive results.

In this master thesis, we aim to develop a data visualization tool to analyze the costs and consumption of the materials used to perform the tests at the Hospital 12 de Octubre, whose staff provided us with the data and the knowledge needed to carry out this work.

To do so, we used Shiny, an R package designed to build interactive web applications, which allows us to present the results as graphs so that the staff of the laboratory section can securely access their data and draw conclusions from it.

The results made it possible to process multiple complex csv files, to draw simple conclusions from them, and to navigate between filters to carry out an intensive exploration of the data, which revealed the low-efficiency elements. In this way, once these less efficient elements are identified, appropriate management measures can be adopted, saving costs for the hospital and, therefore, for society as a whole.
Abstract
The digitalization of medical resource management is a growing trend that can help to monitor the state of supplies in a hospital environment efficiently [1], providing a solid tool to be aware of the surplus and deficit of the different medical materials and of the efficiency of different departments, providers, and products.
Data visualization tools have previously been developed to visualize medical data in other domains with positive results.
In this master thesis, we aim to develop a data visualization tool to analyze the costs and consumption of the materials employed to perform tests at the Hospital 12 de Octubre, whose personnel provided us with the data and knowledge needed to carry out this work.
We used Shiny, an R package designed to build interactive web apps, which allows us to plot the results in graphs so that the staff of the laboratory section can securely access their data and draw conclusions from it.
The results allowed us to process multiple complex csv files, to draw simple conclusions from them and to navigate between filters to perform an intensive exploration of the data, in which low-efficiency elements were made clear.
In this way, these less efficient elements can be identified so that the proper measures can be adopted, saving costs for the hospital and therefore for society as a whole.

Table of contents
1 Introduction
2 State of the art
2.1 Data Visualization Tools
2.2 Medical DataViz Examples
3 Requirements and Design
3.1 Data Requirements
3.2 Dashboard Requirements
4 Developments
4.1 Original Datasets
4.1.1 Mappings
4.1.2 Activities
4.1.3 Purchases
4.2 Data cleaning
4.3 Data merging
4.4 Shiny Layout
4.5 Remarks about the tests
5 Results
5.1 Data manipulation results
5.2 UI Dashboard Results
6 Conclusions and future lines
7 Innovation – Entrepreneurship
8 Bibliography

Table of Figures
Figure 1: Example of possible dashboard generated with Tableau, extracted from the official website.
Figure 2: Power BI dashboard provided as an example of the possible graphs to develop.
Figure 3: Pancreatic cancer genome viewer.
Figure 4: US Emergency Department visits related to suicide, per 100,000 inhabitants.
Figure 5
Figure 6: Coronavirus Dashboard from Informationisbeautiful.
Figure 7: US covid interactive map
Figure 8: Washington State Department of Health dashboard: HIV information statewide.
Figure 9: WHO dashboard concerning air pollution.
Figure 10: CDC's US map of drug overdose cases
Figure 11: A clinical dashboard of DataViz. In this tool, the x axis always represents time, while each graph represents a variable observed during that time.
Figure 12: Dashboard designed as an example for a healthcare donation campaign.
Figure 13: An example of the Cambridge Intelligence tool based on customer360.
Figure 14: AccuGest dashboard.
Figure 15: Diagram of data manipulation.
Figure 16: Relation between articles and statistical concept code
Figure 17: Example of relation between article and statistical concept code.
Figure 18: Relation between statistical concept code and LOINC code
Figure 19: Dashboard's login menu
Figure 20: Dashboard's Home Tab with top 10 efficiency bar plots
Figure 21: Dashboard's providers tab with consumption bar plots
Figure 22: Dashboard's Article tab.
Figure 23: Report generated from the providers tab.
Figure 24: Example of ARIMA model with estimated area in grey

Acknowledgements
In memory of my grandfather Jose Luis Mateo.

1 Introduction
This thesis was developed for the hospital “Hospital 12 de Octubre”, located in
Madrid.
The hospital’s laboratory section has limited funds to acquire the materials and supplies that the machines need to perform all the clinical tests for the patients. It is important to mention that they must spend a specified amount of money, and no less, as stipulated in their provider’s contract.
To use these funds more efficiently, they try to spend the money carefully, so that materials expire or remain unused in the inventory as little as possible.
Since they do not have any computerized control over the supplies in their inventory, this task has so far been performed manually, following the purchase and consumption reports they issue on a monthly basis.
They told us that, in the year before this thesis was carried out, they had performed several cost-saving actions without sacrificing any functionality.
In this way they managed to reinvest their savings in new materials and equipment, which increased the functionality of the hospital without threatening the interests of the providers, because the saved money would eventually be spent on other articles, new machines, etc.
Therefore, what at first may look like a money-saving measure resulted exclusively in an improvement of the hospital’s performance and equipment.
However, this level of optimization can be extremely time-consuming when done manually, and other parameters, such as a safety stock for certain materials, must be considered to perform this kind of analysis without harming the laboratory’s response to unexpected events at the individual and population level.
A software solution may therefore be ideal to solve this problem: if we manage to take the data from the monthly reports, process it and output a visualization of the usage and consumption rates for all the products, it may help the personnel in charge of purchasing to place better orders and thus save more money to reinvest in the hospital in the short term.
In brief, we wanted to design a data visualization tool that allows the hospital’s laboratory to obtain simplified information about its medical materials and to make decisions accordingly.
This tool will let them apply different filters to obtain insightful information about supply management, such as which areas of the laboratory are more or less efficient, which providers are more efficient, and similar features.
Notably, the laboratory is organized into several sections, one for each type of machine used in the tests. The main machine is the COBAS 8000, a large machine manufactured by Roche, a pharmaceutical company that also manufactures the inputs or supplies needed by this machine. It is the largest and most resource-consuming machine, but there are other machines in the lab for more specific tests, which consume fewer resources than the main one.

We should also mention that some tests are not performed by machines but
are performed manually by laboratory specialists whose experience and
education allow them to draw clinical knowledge from the samples required for
such tests.
In this thesis, we were provided with a limited amount of data, which had been processed manually by the hospital personnel; therefore, this data was not entirely clean.
For instance, some rows had missing data, and a deeper exploration revealed some inconsistencies, or at least intriguing elements, such as test codes that did not correspond to any test, or tests whose required article had a price of 0.00€.
Thanks to their collaboration, we were able to overcome, clarify or work around most of these errors, and the information they provided was of great importance for understanding the data.
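The anomalies described above (missing values, test codes that match no test, articles priced at 0.00€) can be flagged automatically before any aggregation, so they can be reviewed with the staff instead of silently distorting the results. The thesis performs its processing in R; the following is only a minimal Python sketch under stated assumptions: each row is a dict, and the function `find_data_issues` and the column names `article`, `test_code` and `price` are hypothetical, not taken from the hospital's files.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    row_index: int
    reason: str


def find_data_issues(rows, known_test_codes, required=("article", "test_code", "price")):
    """Return one Issue per anomaly found in a list of row dicts.

    Checks, in order: missing values, test codes that match no known
    test, and articles priced at 0.00 euros.
    """
    issues = []
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) in (None, "")]
        if missing:
            issues.append(Issue(i, "missing " + ", ".join(missing)))
            continue  # the remaining checks need these fields
        if row["test_code"] not in known_test_codes:
            issues.append(Issue(i, f"unknown test code {row['test_code']}"))
        # Accept a decimal comma; assumes no thousands separator is used.
        if float(str(row["price"]).replace(",", ".")) == 0.0:
            issues.append(Issue(i, "price is 0.00"))
    return issues
```

Flagging rather than deleting keeps the decision about each anomaly with the people who know the data.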
With all this in mind, we had to build software that accounts for missing values and errors, but that also works without failures on a new set of completely valid data in the foreseeable future.
One example is the case in which two elements in the data have no correspondence: say an article x has an entry in Purchases but no corresponding entry in Consumptions. Is it safe to assume that it has an equivalent of 0€ consumed?
Or, on the contrary, should this article be discarded because it has no correspondence and was perhaps misreported or accounted for in the next month?
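Both answers to this question can be kept behind a single switch, so the choice is explicit and easy to revisit once the hospital confirms which one is correct. A minimal Python sketch, assuming per-article totals in euros have already been computed; the function `merge_purchases_consumption` and its `missing` parameter are hypothetical:

```python
def merge_purchases_consumption(purchases, consumption, missing="zero"):
    """Combine per-article purchase and consumption totals (euros).

    purchases, consumption: dicts mapping article code -> amount.
    missing: policy for an article present in only one source:
      "zero"    -> treat the absent side as 0.0 (first option in the text)
      "discard" -> drop the article (second option in the text)
    Returns a dict article -> (purchased, consumed).
    """
    if missing not in ("zero", "discard"):
        raise ValueError("missing must be 'zero' or 'discard'")
    merged = {}
    for article in sorted(set(purchases) | set(consumption)):
        in_both = article in purchases and article in consumption
        if not in_both and missing == "discard":
            continue
        merged[article] = (purchases.get(article, 0.0), consumption.get(article, 0.0))
    return merged
```

With dplyr, the "zero" policy would roughly correspond to `full_join` followed by replacing missing values with 0, and "discard" to `inner_join`.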
Errors and mistakes aside, we also encountered redundant or unnecessary information, which was discarded for the sake of simplicity.
Similarly, several inference and data completion tasks were performed so that this test data could be treated as valid data without ending up with many missing values. All inferences and data completion were performed under reasonable, deliberate assumptions.
There were also cases in which articles were consumed in larger amounts than they were purchased. This makes no sense a priori, but considering that the data is not entirely correct, and assuming that the inventory held a surplus of the article from previous years, we can accept that an article is consumed more than purchased in a given period as long as the balance over the whole period remains positive.
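This rule tolerates a shortfall in an individual month but not over the whole period. A minimal Python sketch of the check for one article, assuming monthly purchased and consumed quantities are available as two aligned lists; the function `balance_report` is hypothetical:

```python
def balance_report(monthly_purchased, monthly_consumed):
    """Compare month-by-month and whole-period balances for one article.

    Both arguments are lists of quantities, one entry per month.
    Returns (overconsumed_months, whole_period_ok): the indices of the
    months in which consumption exceeded purchases, and whether total
    purchases cover total consumption over the whole period.
    """
    if len(monthly_purchased) != len(monthly_consumed):
        raise ValueError("both series must cover the same months")
    overconsumed = [
        month
        for month, (bought, used) in enumerate(zip(monthly_purchased, monthly_consumed))
        if used > bought
    ]
    whole_period_ok = sum(monthly_purchased) - sum(monthly_consumed) >= 0
    return overconsumed, whole_period_ok
```

An article with overconsumed months but `whole_period_ok` true is accepted under the assumption above; one failing the whole-period check points to a genuine data problem.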
The staff informed us that the data came from a hospital application that provides them with 3 csv tables contained in one Excel file, which were then manually preprocessed. In the foreseeable future, these 3 csv files would either be modified by the application before being delivered, or post-processed by our program or an intermediate one into a format that our main program can process.
These 3 csv files contain the information on Purchases (called Compras.csv), Tests performed (called Actividades.csv) and Mappings (called Mapeo.csv), the latter relating medical goods to tests. Therefore, our task is not only to develop a data visualization tool, but also to manipulate the data from these 3 csv files to obtain simplified data that is easy to work with.

The main objectives are:
- Understand all the implications and relationships between the data provided in the csv files. This includes how primary keys and foreign keys relate to each other and how to aggregate by unique values.
- Process the data in a coherent manner, so that csv files from other years are processed in the same way without errors or inconsistencies.
- Design a visual dashboard that the Hospital 12 de Octubre may find useful for extracting valuable information, allowing them to take strategic decisions to save costs.
- Provide extra functionality in this dashboard, such as data export and report generation.
- Plan future developments and applications that may become possible once the features of our software solution are in place.
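The first two objectives (aggregating by unique key values, and processing files from other years without errors) suggest validating the expected columns before aggregating, so that a change in the export layout fails loudly rather than silently. A minimal Python sketch, assuming a semicolon-delimited export with decimal commas (common in Spanish-locale Excel exports, but not confirmed for these files); the function `load_and_aggregate` and the column names are hypothetical:

```python
import csv
import io
from collections import defaultdict


def load_and_aggregate(text, key, value, required=(), delimiter=";"):
    """Parse one csv export and sum `value` per unique `key`.

    Raises ValueError if any needed column is absent, so a file from
    another year with a different layout is rejected explicitly.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    needed = set(required) | {key, value}
    missing = needed - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    totals = defaultdict(float)
    for row in reader:
        # Decimal comma assumed; no thousands separator expected.
        totals[row[key]] += float(row[value].replace(",", "."))
    return dict(totals)
```

For example, the text `"article;month;amount\nA1;1;10,5\nA1;2;4,5\nA2;1;3\n"` aggregated by `article` over `amount` yields `{"A1": 15.0, "A2": 3.0}`.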

The final project proposed was to develop a data visualization tool using the R language.
Data processing and management would be performed with a package called tidyr, and the UI would be built with a package called Shiny, which allows developing web-like dashboards in which the plots, charts and data are displayed.

We should also note that, due to the hospital’s heavy workload, we had no specific directions on how the “safety margins” of article supplies should be considered.

Therefore, we assumed a margin of 0 for all articles, meaning that in this first version no surplus of any supply is compulsory. This assumption, although valid for zero-priority articles, is wrong for articles used in essential tests that can save human lives.

However, this margin will be taken into account in the future: for products that require a margin, efficiency will be calculated with the formula that considers it, as follows.

Without safety margin:

$$\text{Efficiency}(\%) = 100 \cdot \frac{\text{Equivalent funds consumed}}{\text{Equivalent funds purchased}}$$

Considering safety margin:

$$\text{Efficiency}(\%) = 100 \cdot \frac{\text{Equivalent funds consumed}}{\text{Equivalent funds purchased} - \text{Equivalent safety margin}}$$

It is important to note that desirable efficiency values approach 100% (maximum efficiency) from below. If efficiency exceeds 100%, the product is being overconsumed and is therefore trending towards depletion, which is generally not a good sign.
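Both formulas, and the reading of values above 100%, translate directly into code. A minimal Python sketch, assuming all quantities are already expressed as equivalent funds in euros; the functions `efficiency` and `classify` and their labels are hypothetical:

```python
def efficiency(consumed, purchased, safety_margin=0.0):
    """Efficiency (%) = 100 * consumed / (purchased - safety_margin).

    With safety_margin=0 this reduces to the formula without margin,
    the one used in the first version of the tool.
    """
    available = purchased - safety_margin
    if available <= 0:
        raise ValueError("purchases must exceed the safety margin")
    return 100.0 * consumed / available


def classify(eff):
    """Label an efficiency value following the interpretation above."""
    if eff > 100.0:
        return "overconsumed"  # trending towards depletion
    if eff == 100.0:
        return "optimal"
    return "surplus"  # part of the purchase remains unused
```

With 1000€ purchased and 800€ consumed, efficiency is 80%; reserving a 200€ margin raises it to 100%, because the margin is stock that is meant to stay unused.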

The final dashboard would be divided into what are called “Tabs”, the main options for exploring the data on the articles.

The main Tab would give a yearly overview of the articles with the worst efficiencies and the largest margins for improvement. In this way, the laboratory’s staff will be able to understand the main flaws of the system at a glance.
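The overview could rank articles in two ways that do not necessarily agree: by lowest efficiency and by largest unused amount in euros. The thesis does not define "margin of improvement"; reading it as unused euros is an assumption here. A minimal Python sketch, with the function `top_improvement` hypothetical:

```python
import heapq


def top_improvement(totals, n=10):
    """Rank articles for the overview tab.

    totals: dict article -> (purchased_eur, consumed_eur).
    Returns two lists of (article, value): the n lowest efficiencies (%)
    and the n largest unused amounts in euros (purchased minus consumed).
    """
    eff = {a: 100.0 * c / p for a, (p, c) in totals.items() if p > 0}
    unused = {a: p - c for a, (p, c) in totals.items()}
    worst_eff = heapq.nsmallest(n, eff.items(), key=lambda kv: kv[1])
    most_unused = heapq.nlargest(n, unused.items(), key=lambda kv: kv[1])
    return worst_eff, most_unused
```

For instance, an article at 25% efficiency on 200€ leaves 150€ unused, while one at 80% on 5000€ leaves 1000€ unused: the first ranks worse by efficiency, the second by money, which is why showing both may be informative.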

In the secondary Tabs, more precise and detailed information will be provided. For instance, one will be able to look at the evolution of expenses for certain articles over the months of the year, in specific sections of the laboratory, for specific tests, etc.
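Views like these essentially filter the records and then aggregate them by month. A minimal Python sketch of that step, assuming each record carries a month, a laboratory section, an article code and an amount in euros; the function `monthly_expenses` and its field names are hypothetical:

```python
from collections import defaultdict


def monthly_expenses(records, section=None, article=None):
    """Sum expenses per month, optionally filtered by section and article.

    records: iterable of dicts with keys 'month' (1-12), 'section',
    'article' and 'eur'. A filter left as None matches everything.
    Returns a list of 12 totals, January first.
    """
    totals = defaultdict(float)
    for r in records:
        if section is not None and r["section"] != section:
            continue
        if article is not None and r["article"] != article:
            continue
        totals[r["month"]] += r["eur"]
    return [totals[m] for m in range(1, 13)]
```

Returning all twelve months, including empty ones, keeps a monthly plot's x axis stable when filters are changed.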

There will also be a button to switch units between money (€) and article uses (determinations), only in cases where such a conversion is possible.

A determination, or article use, is defined as follows: each article comes in packs, like printer ink cartridges, and each pack bought can be used only a predefined number of times for the tests.

For instance, looking at the value of the articles consumed in a laboratory section makes sense, since the user can see the overall cost of that section, but displaying the number of determinations it consumes makes little sense: some articles are cheap and used hundreds of thousands of times, whereas others are expensive, and adding them up would mix them incoherently.
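The unit switch and the restriction just described can be combined: euros can always be added up, while determinations are only summed for a single article. A minimal Python sketch, assuming each article has a fixed pack price and a fixed number of determinations per pack; the class `Article` and the functions `to_units` and `section_total` are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    code: str
    pack_price_eur: float
    determinations_per_pack: int  # uses allowed by one pack


def to_units(article, packs, unit):
    """Express a number of packs of one article in euros or determinations."""
    if unit == "eur":
        return packs * article.pack_price_eur
    if unit == "determinations":
        return packs * article.determinations_per_pack
    raise ValueError("unit must be 'eur' or 'determinations'")


def section_total(items, unit):
    """Total for a laboratory section; items is a list of (Article, packs).

    Determinations are only summed when every item is the same article,
    since uses of different articles are not comparable.
    """
    if unit == "determinations" and len({a.code for a, _ in items}) > 1:
        raise ValueError("determinations of different articles cannot be summed")
    return sum(to_units(a, packs, unit) for a, packs in items)
```

In the dashboard, this restriction would translate into disabling the determinations option whenever a view mixes several articles.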

It would also have been useful to have more detailed information about the expiry dates of purchased packs. Paired with more detailed consumption reports, this would allow estimating the optimal number of determinations per pack, giving providers helpful information about the products their clients really need.

This could also be a great cost-saving tool, since one could estimate how much money is lost to expired articles and how changing purchase dates, while keeping the amount bought, could help to stop wasting resources.
Unfortunately, due to the limited information stored by the laboratory, this objective is beyond the scope of the current thesis.

In any case, the main concern here is to design a valid data visualization dashboard that allows the specialists to obtain valuable information about the articles purchased and consumed in the laboratory.

This point is worth stressing because what a lay person may find insignificant, a specialist or a manager with technical experience at the Hospital 12 de Octubre may find useful and important.

It is also important to mention that the project was developed during the COVID-19 pandemic, specifically during the vaccination period; therefore, visits to the hospital were restricted and the workload of its staff increased significantly.

In this context, the project proceeded normally, but most face-to-face meetings had to be held online, and some were postponed.

With these data and the objective settled, we can move on to setting the specific requirements of the project, but first we carried out a brief review of the tools available in the industry for designing our data visualization tool.
2 State of the art

2.1 Data Visualization Tools

There is a wide array of data visualization tools in the medical industry [2], so we provide a brief overview of the most popular ones available for this final project.
One of the most used data visualization tools is Tableau. Tableau is a visual analytics tool for businesses and organizations. It comes with a free trial mode, but since it is conceived for organizations and businesses, it is subscription-based.
It is not only a data visualization tool: it also allows user management and permissions, as well as data manipulation alongside chart generation.

Figure 1: Example of possible dashboard generated with Tableau, extracted from the official
website.

Tableau also has a big community and provides technical support and even a
program for developers.

On the other hand, Microsoft Power BI is a useful tool for adding business intelligence to projects. Power BI also integrates with many other applications such as Azure and Office, so it is easy to work with Office formats such as Excel.

However, it has the downside that a monthly fee of 10 dollars must be paid to access this Microsoft service. In any case, Power BI is designed to provide secure storage of and access to the data.

Figure 2: Power BI dashboard provided as an example of the possible graphs to develop.

In the R language there is also a very useful package for the development of data visualization dashboards called Shiny. Shiny allows the deployment of web-like applications for data visualization purposes and builds on existing plotting packages. There are many good examples of Shiny applications on the internet, and Shiny's features are well documented. Shiny also has the advantage of being completely free of charge.

Shiny apps have also been developed for the medical domain [3], for instance
the pancreatic cancer genome viewer which is shown below.

Figure 3: Pancreatic cancer genome viewer.

The previously mentioned application is specialized for the technical/clinical field and not for supply management, but that does not undermine the validity of Shiny as a versatile and functional tool.
Another tool used for business intelligence is Google Analytics. However, Google Analytics is not aimed at conventional businesses that keep a local Excel file and draw information from it. Instead, it focuses on reports coming from digital devices rather than on a specific database or file.
For example, when a user signs up, logs in or sends a message, the event is reported, which can help to determine the number of logged-in users per week and other business intelligence insights. Services similar to Google Analytics include Firebase and Kinvey.
One interesting tool is Flourish [4]. Flourish is not a dashboard creation tool per se, but an interactive plot creator. First, the user designs a plot based on certain data, and then embeds the newly created chart in a website-like application.
In the UI where it is embedded, it is also possible to download both the chart and its underlying data, an option that can be disabled if desired. It is therefore a more modular tool, which turns the otherwise complex creation of interactive plots with conventional web design tools into references to the Flourish server where the plot is hosted.
There are many other integrated data visualization tools, like WhataGraph, Infogram, Zoho Analytics, Siscience and Visme. Given such a wide array of options, the hospital had to tell us which tools were most compatible with their digital services.
D3.js is a JavaScript library that can produce even the most complex kinds of chart, from world maps to heatmaps and network charts. Its drawbacks are that it requires ample knowledge of JavaScript and that developing charts takes somewhat longer than with other libraries. There is an R package for using D3 called r2d3.
NVD3 is a chart plotting tool based on D3 that focuses on providing reusable chart components built on D3.
Plotly is an interactive chart plotting tool and one of the most used plotting libraries in R environments, since it provides a wide array of charts, from simple bar plots to more complex ones such as heatmaps, contour plots and network graphs.
ggplot2 is an R package for creating graphics declaratively. It is part of the tidyverse, which is why it is one of the plotting tools most frequently used alongside R data manipulation.
Google Charts allows the user to generate website-oriented plots: the charts are generated with JavaScript but rendered in HTML/SVG formats.
In addition, it is worth knowing which programming languages offer bio/medical packages. Our goal is closer to business intelligence, but in the future the tool may be connected to certain clinical aspects.

A brief survey [5] of experts’ opinions identifies R as the most used and convenient language, followed by SAS and SPSS, which are often preferred by pharmaceutical companies but are in decline in favor of R.
R benefits from multiple bioscience packages, such as those in Bioconductor, although most of these packages are aimed at genetics and genomics.
2.2 Medical DataViz Examples

Let us now look at medical and healthcare applications of data visualization tools and techniques. Broadly, data visualization appears in two main formats: written reports that use interactive and non-interactive plots, and dashboards, which often show a map or a series of interactive plots from which users can extract detailed information.
Good examples are the infographics and data plots provided by the AHRQ, the Agency for Healthcare Research and Quality [6], in its reports (the AHRQ is one of the 12 agencies of the US Department of Health and Human Services).
One of its reports covered emergency department visits for influenza in the US, providing graphs by age group, US state, distribution of cases over the year, ethnicity, estimated wealth, etc. [7]
Another good example from AHRQ is a tool for visualizing the health insurance expenditure of US workers, which has been increasing linearly since 2006.

Figure 4: US Emergency Department visits related to suicide, per 100,000 inhabitants.

The website vizhealth.org [8] is a project started by the University of Michigan and the Center for Health Communications Research to provide insights into medical data. On this platform, there are multiple filters, such as the type of medical condition and the type of chart.
One good example is a report on sequelae after prostate cancer, in which you can visualize the risk of each sequela depending on the intervention performed to remove the prostate cancer.

Also, the Institute for Health Metrics and Evaluation (IHME), a health research center at the University of Washington, provides global health data to evaluate the best strategies for addressing health problems.
They provide a tab with all their data visualization reports, for instance an interactive map of HIV mortality in Latin America [9], which has many filters for a deep exploration of the data.
Population health is one of the main application domains of data visualization tools. These tools make it possible to find distributions and outliers across the regions of a country or continent and to apply the measures necessary to address the problem.
One particular case is the COVID-19 situation, in which multiple media outlets and government websites have published interactive plots, maps and statistical data, updated daily, to keep the population informed about new cases, deaths and vaccination progress.
A good example of comprehensive data visualization covering most of the relevant data is the one provided by epdata.es [10]. It allows exploring almost every aspect of the virus: hospital occupancy, deaths, new cases and the distribution of cases by region. This tool can help make the population aware of the severity of the current situation.

Figure 5

In this case the plots were developed with Flourish, the tool mentioned above. They also made all these charts and their corresponding data downloadable.
This tool was developed to report on the evolution of COVID-19 in Spain in particular (although it also provides some global insights).
A global COVID-19 information tool can be found at informationisbeautiful.net [11]. This website has a section dedicated to COVID-19 infographics and statistics. Besides the conventionally displayed data, there are some less obvious views, such as the riskiest activities in terms of contagion, the distribution of mild, severe, and critical cases, a country comparator, and one showing the mortality rates in people with preexisting conditions.

Figure 6: Coronavirus Dashboard from Informationisbeautiful.

The website pandemicinternationalsos.com has a dashboard with pandemic data, although these data are somewhat removed from the medical domain. Nevertheless, it provides time-varying maps of the different social measures taken during the COVID-19 pandemic, such as national COVID testing policies and the state of public institutions during the pandemic (school status, public transport regulations, cancellations of public events, etc.).
In the US, a very useful and well-designed tool is the website thecovidatlas.org [12]. On this website, users can explore the whole map of the United States in depth, tuning the parameters to obtain useful localized insights for each state and even for each region within a state.

Figure 7: US covid interactive map


This solution leverages a tool called GeoDa, a spatially oriented data science tool focused on map plotting.
Among population-oriented medical data visualization tools, one interesting case is the Washington State Health Department [13], whose website offers a set of dashboards on the main health issues of the state: deaths, births, HIV cases, immunizations, marijuana and tobacco use, etc.
They also publish their data guidelines, which protect data privacy and ensure ethical use. Similarly, they state the source of all the data they hold, which comes from surveys, data collection systems and public agencies.
The Washington State Health Department website uses Tableau for its plots and charts, which makes them interactive and downloadable.

Figure 8: Washington State Department of Health dashboard: HIV information statewide.

In the domain of population data visualization, special attention should be paid to the largest health organization in the world, the WHO (World Health Organization). Its web page [14] hosts perhaps the most reliable data visualization dashboard for worldwide health information. Like the previously mentioned case (Washington State Health Department), it has a menu with the different dashboards that users can consult to gain insight into worldwide health status, ranging from tobacco use and traffic accidents to hygiene parameters and mental health.

Figure 9: WHO dashboard concerning air pollution.

The WHO dashboards use Microsoft Power BI to plot all their charts.

A similar example from a renowned institution is the Data Visualization Gallery of the CDC (Centers for Disease Control and Prevention) [15].

As in the previous examples, the gallery contains several data visualization dashboards displaying US health data, generally as interactive maps. Each dashboard also includes a comprehensive explanation of how the data were treated, which can be key to understanding the technical concepts described, together with basic instructions for exploring the plots. Often, the CDC also adds links to the data sources and to the articles and bibliography explaining these concepts, a brief glossary of the terms used in the dashboard, and the mathematical equations and statistical operations applied to the data.

Figure 10: CDC's US map of drug overdose cases

The National Center for Biotechnology Information (NCBI) offers an example of one of the data analysis tools already mentioned (Tableau) [16], used to visualize metrics on the length of hospital stay of patients suffering from colon cancer.
healthitanalytics.com is neither a data visualization tool nor any kind of enabling software, but a very interesting website that publishes all kinds of news on the state of the art in health IT analytics, thus providing a good overview of new applications of technology in the medical domain.

Another tool for developing medical data visualizations is VisuExplore [17], a tool funded by the Austrian Research Promotion Agency and designed specifically for the exploration and visualization of medical data.
In this case, the tool is not a business intelligence data visualization tool but a clinical one. Even though its user interface looks somewhat outdated, it is a solid earlier reference showing how useful data visualization can be in medicine.

Figure 11: A clinical dashboard of DataViz. In this tool, the x axis always represents time, while each graph represents a variable observed over that period.

In the business sphere, there are also many companies and startups offering good solutions for health analytics, for instance Triscribe, Qode Health Solutions, Nicolette, Genpro Research and Helpilepsy, which use data visualization tools to present reports and medical data to their customers.

Infographics also deserve a brief mention, as they are a way of conveying information to the general population [1]. Whereas business data visualization is aimed at healthcare managers and clinical data visualization at technical staff such as doctors, nurses and specialists, infographics are generally intended to give lay people basic data in an appealing way.
During vaccination campaigns, seasonal events, pandemics and many other situations, infographics can inform the population about how to react.

Another interesting example from the business intelligence domain is the service provided by Sisense [18]. It offers guidelines on how to process medical data to create an insightful dashboard, so that designers understand in broad terms the inputs and the key features they may need to extract.

Figure 12: Dashboard designed as example for a donation healthcare campaign.

Guidelines of this kind are very helpful and may contribute to defining standards and protocols for healthcare management.
Often, such standards only define how medical data are structured within each country, hospital or healthcare group, which is one reason for the great disparity in medical data structures.
Another interesting approach is taken by Cambridge Intelligence [19]. It uses a knowledge graph tool called customer360 to build a tool with which managers can query their data for insights, for example to explore how patients are distributed among clinicians and facilities, or to get a summary of the complaints filed by patients.
This approach is interesting because it is a query-oriented visualization tool, similar in spirit to the Neo4j graph database.

Figure 13: An example of the Cambridge Intelligence tool based on customer360.

The nodes are Patient, Provider and Facility, and the arcs are their relations.

A very close example of a Shiny application for medical purposes is the AccuGest dashboard [20]. It displays information on non-invasive prenatal tests (TPNI, its Spanish acronym) for several hospitals. These tests are often performed to prevent pregnancy problems such as miscarriage, prenatal diseases, malformations of the embryo, etc.

Figure 14: AccuGest dashboard.

Applications of this type are very useful for healthcare managers, since they provide a good way to visualize the distribution of cases and, therefore, to optimize the allocation of material, equipment and medical staff among hospitals.

From all the information collected, we can conclude that many data visualization tools have been developed for the medical domain with satisfactory results.
The absence of published examples for the management of medical supplies is logical: no hospital would allow the publication of a data visualization tool that compromises the privacy of its data and material purchases, since this could lead to speculation and harmful actions by third parties.
Moreover, since every hospital or healthcare group manages its supplies in its own way, there is little interest in publishing the structure of their supply management.

3 Requirements and Design
The proposed dashboard solution involves two main sets of requirements: those concerning data processing and those concerning the design of the dashboard itself.

3.1 Data Requirements

Regarding data processing, the data must first be cleaned, as stated before, and then processed coherently.
After cleaning, the processing should contain only the essential paths leading to the final data displayed in the dashboard.
The code should run in a reasonable time so that the dashboard has a low loading time.
Clean, self-explanatory code is also desirable. Unfortunately, in our case, even though the code was simplified and cleaned on a couple of occasions, the complexity of the data and of the required transformations, together with the limitations of tidyr, worked against the simplicity of the code. In any case, comments were added to delimit each processing stage for clarity.
In addition, some missing values were replaced with zeros where this made sense; in the remaining cases, rows containing missing values were removed from the data frame.
The main goal is to derive two main data frames from the original csv files. The first one contains all the data about articles (one row per month and article ID) and the second one all the data about tests (one row per month and Statistical Concept Code, which is roughly equivalent to a test ID). Both should carry over the original information, provided it is relevant for further analysis.
These two data frames are the two fundamental pillars of all the plotting in the dashboard: once they are well formatted and structured, essentially all the desired information can be extracted with basic operations such as filtering and grouping.
Note also that, although most of the processing was performed in the server function of the Shiny app, some basic processing such as ordering rows by column value or filtering rows by value was performed beforehand to keep the server side tidy. Creating every data frame in the server function would clutter it with similarly named data frames that could cause confusion later on, so for the sake of simplicity only the main data frames are located in the server function.

3.2 Dashboard Requirements

The dashboard design involves several steps.
In order of use, the dashboard should first prompt a login menu in which the user provides a username and a password to access the data. If the dashboard is published on a web page in the future, the data inside it must have some form of protection.
With this method, the username and password are predefined, unique and constant, although the possibility of creating custom users to grant access to different profiles remains open. For now there is no need for custom logins, since all users access the same data.
The next step is carried out automatically by Shiny, which executes the code of the server and UI functions.
To inform the user about these processes, we decided to use an additional Shiny package called shinybusy, which displays an animated spinner in the top right corner of the dashboard while a process is running in the background and the dashboard is unresponsive.
Fortunately, in most cases the data are processed and displayed almost instantly.
Then, the logo of Hospital 12 de Octubre and a header bar are placed at the top of the dashboard for aesthetic and informative purposes.
After this, we implement a Home tab showing the overall data about the articles, as described in the Introduction section.
This tab contains three main plots about the yearly activity: the first shows the top 10 most efficient articles with their efficiency, the second the top 10 least efficient articles with their efficiency, and the third the top 10 articles with the highest stagnation margin.
Funds are stagnant when articles are bought but never used. Efficiency is therefore defined as the ratio between the consumed and the purchased value, expressed as a percentage.
Ideally, efficiency should approach 100% from below. Values above 100% mean that the article is being consumed faster than it is bought, indicating a tendency towards depletion of the stock, so it may be advisable to purchase more of that article.
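The two Home-tab metrics and the top-10 rankings can be sketched as follows. This is a minimal Python sketch, not the dashboard's R code; it assumes per-article yearly totals of purchased and consumed value in euros, and the function names and example values are hypothetical. Clipping the stagnation margin at zero is our own choice, so that over-consumed articles do not appear as negative waste.

```python
import math


def efficiency(purchased: float, consumed: float) -> float:
    """Consumed value as a percentage of purchased value (100 = balanced)."""
    if purchased == 0:
        # Nothing was bought: infinite if something was consumed, else 0.
        return math.inf if consumed > 0 else 0.0
    return 100.0 * consumed / purchased


def stagnated_funds(purchased: float, consumed: float) -> float:
    """Money spent on purchases that was not consumed (never negative)."""
    return max(purchased - consumed, 0.0)


def top_n(totals, metric, n=10, highest=True):
    """Rank article codes by metric(purchased, consumed) and keep the first n.

    totals maps an article code to a (purchased, consumed) tuple in euros.
    """
    ranked = sorted(totals, key=lambda code: metric(*totals[code]), reverse=highest)
    return ranked[:n]


# Hypothetical yearly totals: article code -> (purchased €, consumed €)
totals = {"A": (100.0, 50.0), "B": (100.0, 120.0), "C": (200.0, 10.0)}
print(top_n(totals, efficiency, n=2))                 # ['B', 'A']
print(top_n(totals, efficiency, n=2, highest=False))  # ['C', 'A']
print(top_n(totals, stagnated_funds, n=1))            # ['C']
```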

There are also other tabs that let the user explore the data grouped by different features: provider, lab section, automation unit, article and test. Like the Home tab, each of them gives insight into how the corresponding feature affects overall efficiency.
For example, one may see that one provider is more efficient than another, but the experts must keep in mind that the less efficient provider may consume fewer funds globally than the others, in which case renegotiating its contract may be less valuable than adjusting the terms with other providers.

The same concept applies to the other features: even if a given lab section or unit seems insufficiently efficient, the laboratory experts always have the last word, since they understand the purpose and meaning of these results better and can interpret them better than a lay person.
Articles and tests are shown in an interactive table, since they are the most complex features and have the most unique values, with a graph below describing their monthly evolution.
For the less diverse features, such as providers, sections and units, a bar plot with one bar per category suffices, since their number cannot grow as fast as that of articles and tests (for example, the current data contain only 5 providers, 7 sections and 8 units).
There is also an explicit requirement for all the plots: they must be interactive. Interactivity is achieved using the two main plotting libraries of the R language, ggplot2 and plotly.
Interactivity can be defined in several ways, but in almost all our plots we want the user to see different plots depending on some input in the dashboard (a button, a checkbox, an input box, etc.). It is also useful to display the exact values in a tooltip when hovering over each data point.
The main objective is to allow the user to change the main feature of each tab; for instance, in the Providers tab the user should be able to select each provider individually, together with its articles.
Secondly, it would be useful to export the data so that the user obtains a report of the plots with the desired parameters, which can be sent to the providers as justification.
This feature relies on the knitr package, which allows one template to be created for each type of report.

4 Developments
Now, the development of the thesis will be explained in three main phases:
1 Data Processing (3.1-3.3)
2 Layout Design (3.4-3.6)
3 Chart plotting (3.7)

4.1 Original Datasets


The data come from the three csv files provided by the hospital. A brief description of the data in each column is given below for clarity, followed by an explanation of how the files are merged for further analysis and plotting.
Most of the processing was carried out with the tidyverse packages, which offer a standardized syntax and the pipe operator ‘%>%’ to chain multiple operations over a dataset.
It should also be mentioned that most of the development was performed using sample data: the csv files had some missing rows, so the results were partially incomplete.

4.1.1 Mappings
This csv contains the complete information about the individual tests, one test per row: the reagent used, its price, references, providers, hospital units and more:

LOINC: Universal vocabulary to describe a medical measurement or observation in electronic health records.
Muestra (sample): Sample type on which the observation was carried out, e.g. blood plasma, urine or cerebrospinal fluid.
Área/Sección (area/section): Laboratory section in which the measurement is taken. The same type of measurement cannot be carried out in different sections, since the machinery cannot be moved.
GFH: Hospital floor on which the section is located. The current dataset only considers the sections on the second floor, but sections on the third floor may be added in the future.
Técnica (technique): Only two values are possible, manual or automatic. In the former case, some human analysis is required; in the latter, the machine extracts the results on its own.
Unidad de automatización (automation unit): Name of the main machine or non-fungible tool used to carry out the observation.
Código de proveedor & Proveedor (provider code and provider): Code and name of the company supplying the reagent involved in the test. Each test uses only one reagent, but a reagent can be used by several tests.
Código artículo (article code): Code of each article bought from the providers to carry out one or more tests. It usually has 5 digits.

Código estadístico (statistical code): Hospital internal numerical code linked to a GENERAL TEST. It may carry a C indicating that the test comprises several smaller tests that share a common pack.
Example: the test with “Código de concepto estadístico: 5001 - Iones (Líquidos)” is a global test that uses the article “ISE REFERENCIA SOLUCION ELECTRODO” and comprises the following tests:
1. Iones (Líquidos) -> COD: 5001
a) Sodio (líquido)
b) Potasio (líquido)
c) Cloro (líquido)
d) Cloro (LCR)
Código de prueba (test code): Hospital internal numerical code linked to each PARTICULAR TEST. It has no C indicator.
Example: each of the smaller tests above has its own code:
1. Iones (Líquidos) -> C.E.: 5001
a) Sodio (líquido) -> Código de Prueba: 435
b) Potasio (líquido) -> Código de Prueba: 436
c) Cloro (líquido) -> Código de Prueba: 437
d) Cloro (LCR) -> Código de Prueba: 500
Precio de determinante con IVA (price per determination, VAT included): Cost of each individual usage of an article in each individual test. Note that articles come in packs.
For example, one pack of Calcium costs 123.75€ (the smallest package commercially available to the hospital) and contains 2250 usages.
This column is therefore the pack cost divided by the number of usages: 123.75€ / 2250 = 0.055€ per usage of Calcium.
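The column can be reproduced with a single division. The Python sketch below uses the Calcium figures quoted above; the function name is hypothetical.

```python
def price_per_determination(pack_price_eur: float, usages_per_pack: int) -> float:
    """Cost of one usage of an article, given the VAT-inclusive pack price."""
    if usages_per_pack <= 0:
        raise ValueError("a pack must contain at least one usage")
    return pack_price_eur / usages_per_pack


# Calcium example from the text: 123.75 € per pack of 2250 usages
print(f"{price_per_determination(123.75, 2250):.3f} € per usage")  # 0.055 € per usage
```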

4.1.2 Activities

This csv contains information about the global tests performed each month over one year. The first column merges “Código de concepto estadístico” and “Descripción de concepto estadístico”, which are separated in the processing code.
The file also has one column per month, from December to January, and each row holds the number of tests performed in the given month (column) for the given test (row). A column called “Total” contains the number of tests performed for each test over the entire year.
This layout is rather inconvenient, since most csv files and datasets have a single “date” column with the date in a specific format, such as DD/MM/YYYY.
Therefore, in the processing stage the month columns are converted into a single date column. This multiplies the number of rows but gives the data a better structure, since most plotting tools expect the x values (dates) in a single column.
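The wide-to-long conversion can be sketched in plain Python as below. In the tidyverse the same reshape is typically done with tidyr's pivot_longer(); the column names used here ("code", the month labels and "Total") are illustrative assumptions, not the exact headers of Actividades.csv.

```python
def wide_to_long(rows, id_column, month_columns):
    """Convert one-column-per-month records into one record per (id, month).

    Columns not listed in month_columns (for example a yearly "Total")
    are left out of the long records.
    """
    long_rows = []
    for row in rows:
        for month in month_columns:
            long_rows.append(
                {id_column: row[id_column], "date": month, "determinations": row[month]}
            )
    return long_rows


# Hypothetical wide row: one test with two months and a yearly total
wide = [{"code": "C137", "Jan": 10, "Feb": 12, "Total": 22}]
for record in wide_to_long(wide, "code", ["Jan", "Feb"]):
    print(record)
```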

4.1.3 Purchases

This csv contains all the information about the purchases of each article over the entire year. The only columns of interest are the article code, the article description and the total budget. The last column, tipmat, indicates with 0/1 whether the article is fungible (0) or not (1). The total budget is split over 12 months, mimicking the monthly distribution of “Actividades.csv”, except that all months contain the same amount of purchases.
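The even monthly split of the yearly budget is a simple division; the Python sketch below (function name hypothetical) uses the 1452 € purchase of article 21236 discussed in the Results chapter.

```python
def split_yearly_budget(total_eur: float, months: int = 12) -> list:
    """Spread a yearly purchase total evenly over the given number of months."""
    return [total_eur / months] * months


# Article 21236: 1452 € bought over the year
monthly = split_yearly_budget(1452.0)
print(len(monthly), monthly[0])  # 12 121.0
```

Splitting evenly keeps the yearly total unchanged while giving purchases the same one-row-per-month shape as the consumptions, so both can later be joined by month.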

4.2 Data cleaning


After understanding the datasets, the next step was cleaning the data.
A detailed inspection of the csv files revealed the following problems:
1) Some strings had trailing whitespace, which was removed with the str_trim() function from the stringr package.
2) Numbers were imported as strings, so decimal commas had to be replaced with decimal points before casting to numbers (as.numeric).
3) NA and missing values were found where numbers were expected. Where it made sense, the missing value was replaced with 0.
4) Non-fungible elements in “Compras.csv” were removed, since they are not linked to any test or measurement. If they are to be included in the cost balance in the future, a corresponding entry in “Actividades.csv” must be provided.
5) The complexity indicator in “Actividades.csv” was inverted. In “Mapeos.csv”, a capital C is prepended to the Código de concepto estadístico when the global test is formed by smaller ones and is therefore not a single individual test. However, due to the hospital's data processing, this indicator is inverted in “Actividades.csv” (i.e., the C is added to simple global tests). To merge and match the two datasets correctly, a ‘C’ had to be added to the codes without it, and removed from the codes containing it, in “Actividades.csv”.
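The cleaning rules above translate into a few small helpers. This Python sketch mirrors steps 1 to 5 (the thesis used stringr and base R); it assumes numbers never contain thousands separators and that "NA" or an empty string marks a missing value, and the helper names are hypothetical.

```python
import math
from typing import Optional


def parse_decimal(value: Optional[str], missing_as_zero: bool = True) -> float:
    """Trim, swap the decimal comma for a point, and cast to float (steps 1-3).

    Assumes no thousands separators, e.g. "13286,38" rather than "13.286,38".
    """
    if value is None or value.strip() in ("", "NA"):
        return 0.0 if missing_as_zero else math.nan
    return float(value.strip().replace(",", "."))


def keep_fungible(purchases):
    """Drop non-fungible articles (tipmat == 1) from the purchase rows (step 4)."""
    return [row for row in purchases if int(row["tipmat"]) == 0]


def toggle_c_indicator(code: str) -> str:
    """Invert the 'C' prefix of an Actividades code so it matches Mapeos (step 5)."""
    code = code.strip()
    return code[1:] if code.startswith("C") else "C" + code


print(parse_decimal(" 13286,38 "))  # 13286.38
print(toggle_c_indicator("C137"))   # 137
print(toggle_c_indicator("3569"))   # C3569
```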

4.3 Data merging

Merging consists of producing new datasets that summarize the cost information of articles and tests without losing information. The “Purchases” and “Activities” datasets hold the instances of the data, that is, the actual inflow and outflow of resources, whereas “Mappings” contains detailed information about the articles and tests that were purchased and consumed.
The main idea is to produce two final datasets comprising all the information: one for the tests and another for the articles.
To this end, two intermediate datasets are produced first: one with the purchases and another with the consumptions. Joining both by date and Código de concepto estadístico yields the dataset containing the information about the tests.
However, some cost information is repeated across these tests, so the data must be grouped by código de artículo to obtain a consistent, non-repeated dataset with all the per-article information on purchases and consumptions.
The following operations were performed during merging:
1) Rows with missing primary keys, which made merging impossible, were removed.
2) Mismatches between merged datasets were handled. A typical, understandable case is an article that was bought but has no consumption record (due to lack of control, lost data, or simply because it was not used during the whole year) when merging the intermediate datasets “Purchases” and “Consumptions”. In those cases, the merged result was coalesced: all the information from the matched side was kept, and 0's were placed in the fields coming from the unmatched side. In the other cases, the merge acted as a left join.
3) Columns were often renamed to make data inspection clearer.
4) The date format was changed. Since most plotting libraries expect the date in a single column, we went from 12 month columns with the budget amount in each row to two columns, Date and Budget, where each row contains the month and the corresponding budget. In the future, the year and day may also be included: the year when csv files from other years are added, and the day when the expiry dates of articles are considered.
5) Columns without valuable information were removed, while all columns that might hold useful information were kept for future use, even if not yet used.
6) The tests from “Actividades.csv” were transformed into costs.
7) An “efficiency” column was created with the ratio of consumption to purchase.
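Steps 2, 6 and 7 can be sketched together: consumption costs are obtained by multiplying determinations by the price per determination, the purchase and consumption tables are combined with a full outer join on (article code, month), unmatched sides are filled with 0, and the efficiency column is added. This Python illustration covers only the coalescing case, uses hypothetical keys and values, and is not the R code used in the thesis; leaving efficiency undefined when nothing was purchased is our own choice.

```python
def consumption_cost(determinations: int, price_per_determination: float) -> float:
    """Turn a number of tests into euros (step 6)."""
    return determinations * price_per_determination


def merge_purchases_consumptions(purchases, consumptions):
    """Full outer join on (article code, month); a missing side counts as 0 € (step 2).

    Both arguments map (article_code, month) -> euros. Efficiency (step 7) is
    consumed / purchased in percent, or None when nothing was purchased.
    """
    merged = {}
    for key in sorted(set(purchases) | set(consumptions)):
        bought = purchases.get(key, 0.0)
        spent = consumptions.get(key, 0.0)
        merged[key] = {
            "purchased": bought,
            "consumed": spent,
            "efficiency": 100.0 * spent / bought if bought else None,
        }
    return merged


# Hypothetical data: 21236 was bought but never consumed, 99999 consumed but not bought
purchases = {("22728", "Jan"): 100.0, ("21236", "Jan"): 50.0}
consumptions = {("22728", "Jan"): consumption_cost(480, 0.25), ("99999", "Jan"): 5.0}
for key, row in merge_purchases_consumptions(purchases, consumptions).items():
    print(key, row)
```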

The following diagram shows a simplified view of the merging process, from the original datasets through the intermediate ones to the final usable datasets.

Figure 15: Diagram of data manipulation.

Naturally, many small changes such as renaming, regrouping and restructuring are omitted from the diagram for the sake of simplicity.
To visualize two of our main tabs, which group articles by provider and by laboratory sector, we created two additional datasets in which articles are grouped by provider and by sector, respectively.

Performing all these steps was somewhat burdensome, since the tidyverse offers no single operation for grouping and summarizing without dropping the remaining columns. Nevertheless, all steps were feasible by chaining operators, and the results were clear and satisfactory.
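The provider and sector datasets amount to a group-and-summarize step, which in R is usually written with dplyr's group_by() and summarise(). The Python sketch below is an illustration with hypothetical field names and values; recomputing efficiency from the group sums, rather than averaging per-article efficiencies, is our own design choice so that articles with larger budgets weigh more.

```python
from collections import defaultdict


def summarise_by(articles, field):
    """Sum purchased/consumed euros per group and recompute efficiency from the sums."""
    groups = defaultdict(lambda: {"purchased": 0.0, "consumed": 0.0})
    for article in articles:
        group = groups[article[field]]
        group["purchased"] += article["purchased"]
        group["consumed"] += article["consumed"]
    for group in groups.values():
        bought = group["purchased"]
        group["efficiency"] = 100.0 * group["consumed"] / bought if bought else None
    return dict(groups)


# Hypothetical per-article totals
articles = [
    {"provider": "Provider A", "sector": "S1", "purchased": 100.0, "consumed": 50.0},
    {"provider": "Provider A", "sector": "S2", "purchased": 100.0, "consumed": 150.0},
    {"provider": "Provider B", "sector": "S1", "purchased": 10.0, "consumed": 0.0},
]
print(summarise_by(articles, "provider"))
print(summarise_by(articles, "sector"))
```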

4.4 Shiny Layout

The layout was designed as a dashboard with 6 main tabs. Each tab corresponds to the main filter that the user wants to apply to the data and to the core question to be answered. The 6 tabs are:

1) Home. In this tab the user can visualize three plots with high-level information about the data, giving a broad view of the main problems affecting the supplies. The three plots are:
1.1) Top 10 Least Efficient Articles. Recall that efficiency is computed as the ratio between the consumed value and the purchased value.
1.2) Top 10 Most Efficient Articles.
1.3) Top 10 Articles with the Highest Loss Margin. The loss margin is the amount of money spent on buying an article that was left unused, implying a waste of resources.
2) Providers. How efficient is your contractual relationship with each provider? How large is its contribution to your total supply? Which article in particular harms your contract with a given supplier? All these questions are answered here with 4 plots:
2.1) Provider purchases. How much in total is bought from each provider?
2.2) Provider expenses. How much is spent on tests per provider?
2.3) Provider efficiency. What is the expenses/purchases ratio per provider?
2.4) Provider per article. What are the purchases and expenses of each article that a provider supplies to the hospital?

3) Articles. Search for data on specific articles. The user can select one or more articles to visualize in the plot, showing their evolution from January to December, and can change the display options to show the purchase cost, the consumption cost, the difference between both, or the efficiency. Units can also be switched from euros to the equivalent number of tests performed.

4) Tests. Visualization of specific individual tests. As with articles, the user can select one or more tests. Note that some tests are linked, meaning they are counted under the same Concepto estadístico; they are therefore linked to the same article, and their expenses are considered the same. Again, the user can select the data displayed and switch units as in the Articles tab.

5) Lab Sector. The user can plot the totals for each laboratory section. A total of seven sections or areas were found in our dataset, although no problems should arise if more appear in the future: Seminogramas, Drogas (Abuso), Bioquímica (Suero/Plasma), Bioquímica (orina), Inmunoquímica, Osmolaridad and Urianálisis. This makes it possible to identify, by their technical aspects, the least efficient kinds of tests and, from there, to take action (renewing machinery, increasing maintenance, asking the provider to change the number of determinations per pack, etc.).

6) Automation Unit. This is the main type of workstation or tool with which the test was performed, so in these graphs the data are grouped and filtered by the type of unit on which each resource was spent. Some curious facts emerge, for example that the Platform COBAS 8000 machine had exactly the same consumptions and purchases as the provider “Roche Diagnostics S.L.”, so it is safe to assume that this provider also manufactured the automation unit and probably all the modular material supplies involved in the tests run on this machine.
The plots also indicate whether each machine is “manual” or “automatic”, manual meaning that human judgement and intervention are needed to draw a conclusion from the result.

There is also a “No Aplica” (not applicable) category, meaning that no machine was used for that test. For the same reason, this category, which is not a machine at all, is considered to use a manual technique.

A basic report generation button was also implemented. This Shiny feature allows the user to download a pdf or html file containing specific information.
The report template is a parameterized Rmd file, which is then rendered into the report itself. Since pdf output requires pdfLaTeX to be installed on the server, we chose the html format.
It should be mentioned that there are few examples and little documentation on this kind of report generation; for instance, we found no examples with complex graphs or showing how to pass data frames with several columns as parameters to the report template.
Therefore, this feature is only available in the tabs with single-dimension plots (those with a single y value per x category), that is, all tabs except Articles and Tests.

4.5 Remarks about the tests

One data effect also makes it impossible to make the test-level plots useful.
It can be explained as follows: purchases are recorded by Article Code, whereas consumptions are recorded by Statistical Concept Code.
All the tests performed that use a common reagent can be aggregated to determine how much of that article was consumed versus how much was purchased.
However, there is no way to know how much of the purchased article is allocated to each Statistical Concept Code.
The following diagrams illustrate this:

Figure 16: Relation between articles and statistical concept code

Example for article: 21236 Cámara Neubauer

Figure 17: Example of relation between article and statistical concept code.

Something similar happens if we want to know how much money was actually consumed by each individual test, each with its unique SIL code.

A similar diagram helps to understand this case:

Figure 18: Relation between statistical concept code and LOINC code

As these inconsistencies show, the Purchases and Consumptions data structures were not designed to be symmetric or to be merged.
As discussed in the Conclusions section, some restructuring of the hospital's csv generator could be extremely useful to eliminate many of the problems that appeared during data manipulation.

5 Results

5.1 Data manipulation results


The results were largely satisfactory, since the dashboard was developed according to the previously set requirements and with the feedback provided by the hospital.
In broad terms, the objectives were met: we obtained the two main desired datasets through coherent data processing, and they served for most of the chart plotting.
For clarification, let us follow the aggregation of one article to check that the data were processed properly. We manually trace the data through to the final all-time purchase and consumption costs, and then check whether the plots show the same values. This process was performed manually for a number of articles, and in all trials the values in the plots and those obtained manually were equal.
Let us start with article number 22728, ALKALINE PHOSPHATASE, which appears in row 65 of compras.csv.
This article had purchases amounting to 13286.38 €. How much of it was spent on laboratory tests? We must look for all matches in mapeo.csv with the number 22728 in the “article code” column, since an article can be used in multiple tests. Fortunately, this article has a single entry in mapeo.csv, in row 50, matching a test with Statistical Concept Code 137 and the same concept: Alkaline Phosphatase.
In actividades.csv we then look for code C137 (recall that codes must be converted when matching mapeos.csv and actividades.csv: a code with a C in one csv corresponds to the same code without the C in the other, and vice versa). It appears in row 16, with a total of 338911 determinations performed.
To convert this number into euros, we go back to row 50 of mapeo.csv and read the column “precio determinante con iva”, which gives the cost of each usage of the article, here 0.0462€ per use. Multiplying 0.0462 by 338911 yields 15657.69€.
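The arithmetic of this check can be reproduced in a few lines of Python. The purchased amount uses the unrounded value that the dashboard reports; the variable names are illustrative.

```python
price_per_determination = 0.0462  # € per use, mapeo.csv row 50
determinations = 338911           # actividades.csv, code C137
purchased = 13286.3808            # € bought, as shown by the dashboard

consumed = price_per_determination * determinations
efficiency = 100.0 * consumed / purchased

print(f"Consumed: {consumed:.2f} €")       # Consumed: 15657.69 €
print(f"Efficiency: {efficiency:.3f} %")   # Efficiency: 117.848 %
```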
Let us now see what the dashboard reports for this product by navigating to the Articles tab and searching for alkaline phosphatase in the table's search box. It shows the following values:

Source      Purchased      Consumed       Efficiency
Manual      13286.38€      15657.69€      117.84767%
Dashboard   13286.3808€    15657.6882€    117.84765%

The results are essentially identical; the only difference is that the manual calculations were rounded to cents, since euro amounts only go down to cents and not to thousandths of a euro.

Now, let us take a slightly more complex case, the article number 21236,
named; “CAMARA NEUBAUER DESECHABLE METRAQUILATO
(P/RECUENTO CONCENTRACION CELULAS, SEMEN, BACTERIAS O
ESPORAS)”, in compras.csv 17th row, with a purchase import of 1452€.

Matching the article code against mapeo.csv yields several matches. Of these, we should only keep those with a unique Statistical Concept Code (SCC), since repeated ones provide redundant information.
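Keeping one row per SCC can be done in base R with `duplicated`. This is a minimal sketch. The data frame is a hypothetical excerpt of mapeo.csv, and the column names `article_code`, `scc` and `cost_per_det` are illustrative, not the file's real headers.

```r
# Hypothetical excerpt of mapeo.csv (column names are illustrative only)
mapeo <- data.frame(
  article_code = c(21236, 21236, 21236, 21236),
  scc          = c("C3569", "C3910", "C3910", "2630"),
  cost_per_det = c(0.01815, 0.01815, 0.01815, 0.01815),
  stringsAsFactors = FALSE
)

# Rows for the article of interest, keeping only the first row of each SCC
matches <- mapeo[mapeo$article_code == 21236, ]
unique_matches <- matches[!duplicated(matches$scc), ]
unique_matches$scc
# returns "C3569" "C3910" "2630"
```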

The matches are shown in the following table:

Matched SCC   Description                                                 Cost per determination (€)
C3569         Estudio Post-vasectomía / post-vasectomy study              0.01815
C3910         Seminograma (Est. Fertilidad) / semen analysis (fertility)  0.01815
2630          Volumen / volume                                            0.01815
2632          Apariencia / appearance                                     0.01815
2634          Viscosidad / viscosity                                      0.01815
2640          Licuefacción / liquefaction                                 0.01815

As can be seen, all the matches have the same cost per determination. This makes sense, since they all use the same article and only differ in the test in which it is used.

What remains is to find out how many times each of these tests was performed. To do so, we match each test with its counterpart in actividades.csv. Then, as before, we convert the counts to money and add up the total.

Matched SCC   Description                     Cost per determination (€)   Determinations   Funds consumed
C3569         Estudio Post-vasectomía         0.01815                      68               1.2342 €
C3910         Seminograma (Est. Fertilidad)   0.01815                      201              3.64815 €
2630          Volumen                         0.01815                      no match         no match
2632          Apariencia                      0.01815                      no match         no match
2634          Viscosidad                      0.01815                      no match         no match
2640          Licuefacción                    0.01815                      no match         no match
Total                                                                                       4.88 €

Unfortunately, there was no match in actividades.csv for SCC 2630, 2632, 2634 and 2640, so they were discarded. A closer inspection of the mapping csv shows that their SCC and their LOINC code are the same. They may therefore be considered parts of the C3910 and C3569 tests, which would explain why they lack their own SCC. In any case, since they have no counterpart in actividades.csv, they are not considered.
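The matching step behaves like an inner join: SCCs with no counterpart in actividades.csv drop out automatically. The sketch below reproduces the table's total using base R `merge`. The determination counts come from the table above, while the data frame and column names are hypothetical.

```r
# SCCs matched for article 21236 (cost per determination from the table)
matches <- data.frame(
  scc = c("C3569", "C3910", "2630", "2632", "2634", "2640"),
  cost_per_det = 0.01815,
  stringsAsFactors = FALSE
)
# Hypothetical excerpt of actividades.csv: only two SCCs are present
actividades <- data.frame(
  scc = c("C3569", "C3910"),
  determinations = c(68, 201),
  stringsAsFactors = FALSE
)

# Inner join: unmatched SCCs (2630, 2632, 2634, 2640) are dropped
joined <- merge(matches, actividades, by = "scc")
joined$funds <- joined$cost_per_det * joined$determinations
round(sum(joined$funds), 2)
# 4.88
```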

Once again, the result shown in the dashboard matches the manual calculation. This indicates that these multiple-match cases are handled correctly when processing the data.

Let us now briefly describe how the second data frame, the test data frame, is processed. It is similar to the article data frame, but here things get somewhat more complicated.

The starting point is now the consumption csv, from which we try to match all possible entries in mapeo.csv. Once this is done, we have to look at all the possible articles, and here lies the main problem. Linking these articles directly to the purchased articles is a serious mistake, since each linked article does not refer to a single SCC but to several.

In this case, the only way forward is to assume that each article's purchases are split evenly among the tests that use it. This implies that all those tests are expected to be performed the same number of times, which is not true in almost any real case.
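The equal-split assumption can be sketched in base R as shown below. It counts how many tests each article is linked to and divides the article's purchase by that number. All values and column names are hypothetical and chosen only to make the split easy to check.

```r
# Hypothetical yearly purchase of one article, linked to three SCC tests
purchase <- data.frame(article_code = 1001, purchased = 900)
links <- data.frame(
  article_code = c(1001, 1001, 1001),
  scc = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Number of tests linked to each article, repeated on every row
links$n_tests <- ave(links$article_code, links$article_code, FUN = length)

# Assumption from the text: each linked test gets an equal share
per_test <- merge(links, purchase, by = "article_code")
per_test$purchased_share <- per_test$purchased / per_test$n_tests
per_test[, c("scc", "purchased_share")]
```

Each test receives 300 here. Real usage is rarely this even, which is why the test-level figures should be read with care.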

We do this despite the errors it may introduce in order to provide a foundation in case the data structure changes in the future. For example, if the data recorded how much of each article is allocated to each test, it would then be possible to show accurate information about the tests.

Until then, conclusions drawn from the test data should be treated with care, since they are not based on valid data and may be misleading.

However, the article dataset contains precise information and is enough by itself to provide most of the information appearing in the dataset.

Moreover, organizing the data around the LOINC test as a unique id is almost impossible and would introduce several errors. We would have to assume not only that every article is distributed equally among its linked SCC tests, but also that each SCC test's funds (whether consumed or purchased) are distributed equally among all its LOINC tests.

This would entirely distort the meaning of the data and yield incoherent information; therefore, grouping the data by LOINC code was discarded.

5.2 UI Dashboard Results

Let us now move on to the results of the UI.

The first thing that appears in the dashboard is the login pop-up.

Security is provided by the shinymanager package [21]. Setting it up is as simple as defining some credentials and then wrapping the UI function like this: “ui <- secure_app(ui)”.
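The UI wrapper needs a matching server-side check. The sketch below follows the usage pattern from shinymanager's README, with `secure_app` on the UI and `secure_server` with `check_credentials` on the server. The credentials, UI contents and output name `who` are placeholders, not the dashboard's actual code.

```r
library(shiny)
library(shinymanager)

# Placeholder credentials for illustration only; do not hard-code real ones
credentials <- data.frame(
  user = c("lab_user"),
  password = c("change_me"),
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  titlePanel("Medical supply dashboard"),
  verbatimTextOutput("who")
)
# Wrap the UI so the login pop-up is shown before anything else
ui <- secure_app(ui)

server <- function(input, output, session) {
  # Validate the submitted username and password against the table above
  res_auth <- secure_server(
    check_credentials = check_credentials(credentials)
  )
  # Show the name of the logged-in user
  output$who <- renderPrint(reactiveValuesToList(res_auth)$user)
}

app <- shinyApp(ui, server)
# runApp(app) would launch it in a browser
```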

The result is this menu.

Through this pop-up, the user enters a username and password to log in and access the rest of the dashboard, where the data is shown.

Figure 19: Dashboard's login menu

After logging in, we see the home tab, which contains the general bar plots.

Figure 20: Dashboard's Home Tab with top 10 efficiency bar plots

Scrolling down, there are two more bar plots. They show the 10 least efficient articles and the 10 articles with the largest absolute difference between purchased and consumed funds. They are not shown here, since they look almost identical to the first one.
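The three rankings behind these bar plots can be sketched with base R `order` and `head`. The data is synthetic and the column names are hypothetical. Which end of the efficiency scale the dashboard treats as "least efficient" is also an assumption: here it means the lowest percentage.

```r
set.seed(42)
# Synthetic yearly article summary (column names are illustrative)
articles <- data.frame(
  article   = paste("Article", 1:25),
  purchased = round(runif(25, 500, 20000), 2),
  stringsAsFactors = FALSE
)
articles$consumed   <- round(articles$purchased * runif(25, 0.6, 1.4), 2)
articles$efficiency <- 100 * articles$consumed / articles$purchased
articles$abs_margin <- abs(articles$purchased - articles$consumed)

# Top 10 and bottom 10 by efficiency, and top 10 by absolute difference
top10_eff    <- head(articles[order(-articles$efficiency), ], 10)
bottom10_eff <- head(articles[order(articles$efficiency), ], 10)
top10_margin <- head(articles[order(-articles$abs_margin), ], 10)
```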

A bar plot was a fitting choice to represent a continuous value (percentage efficiency) across a discrete variable (articles).

These first plots provide an overall summary of the articles with the greatest impact on the hospital's efficiency. They alert the user to the articles that are most harmful to the laboratory's finances.

However, users who want to find deeper relationships and more precise information should move to the other tabs. For example, these harmful articles may be the worst ones, but not necessarily the easiest to adjust by modifying contracts or orders. As an example, let us take a brief look at the “Providers” tab.

Note that “Providers”, “Sections” and “Automation Units” all follow the same UI structure, so there is no need to show the other two tabs.

Figure 21: Dashboard's providers tab with consumption bar plots

At the top of this tab, a bar plot shows the costs associated with consumed articles, broken down by provider. Below it, a second plot shows the monthly distribution of those costs for the provider selected in the drop-down list on the left.

These options let the user change the information displayed on the y axis.
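The two aggregations behind this tab can be sketched with base R `aggregate`. One gives the total consumed cost per provider for the upper plot; the other gives the monthly costs of one selected provider for the lower plot. The records and column names are hypothetical. In the Shiny app, `selected` would come from the drop-down input rather than being a fixed string.

```r
# Hypothetical monthly consumption records (column names are illustrative)
consumption <- data.frame(
  provider = c("Provider A", "Provider A", "Provider B", "Provider B", "Provider A"),
  month    = c(1, 2, 1, 2, 2),
  consumed = c(100, 150, 80, 120, 50),
  stringsAsFactors = FALSE
)

# Upper plot: total consumed cost per provider
by_provider <- aggregate(consumed ~ provider, data = consumption, FUN = sum)

# Lower plot: monthly costs for the provider chosen in the drop-down
selected <- "Provider A"
by_month <- aggregate(consumed ~ month,
                      data = consumption[consumption$provider == selected, ],
                      FUN = sum)
```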

Figure 22: Dashboard's Article tab.

In the article tab, the upper part contains an explorable table with yearly information about all the articles.

The lower part contains an input bar where the user can select multiple articles and plot the monthly evolution of their costs. The user can also select the type of cost and the unit: money or determinations.

The test tab is similar to the article tab, so again there is no need to show it.

Finally, the remaining UI element to be shown is the generated report.

An example of such a report is shown in Figure 23. It simply consists of the plots of the corresponding tab, titled with the parameters selected at the time the report was generated.

Figure 23: Report generated from the providers tab.

The report may not seem very informative on its own, but it can be useful for sharing information.

One shortcoming of the report generation is that the report cannot easily be exported to PDF, arguably the most universal report format.

As a side note, purchases appear as a flat line in all cases because the purchase report is yearly and has no monthly breakdown. Therefore, to mimic a monthly distribution, purchased costs were assumed to be the same every month.
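The flat monthly purchase line follows directly from this assumption. The minimal sketch below spreads a yearly purchase evenly over 12 months, using the alkaline phosphatase figure from the earlier example; `monthly_purchase` is an illustrative name.

```r
# Yearly purchased amount for one article (alkaline phosphatase example)
yearly_purchase <- 13286.38

# Assumption stated in the text: the same amount is purchased every month
monthly_purchase <- data.frame(
  month = month.abb,
  purchased = rep(yearly_purchase / 12, 12),
  stringsAsFactors = FALSE
)
```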

Overall, the results obtained were satisfactory and, in the end, they deliver
useful and precise information to the user.

6 Conclusions and future lines
In summary, the final version of the dashboard is satisfactory, yet some final remarks are in order:
The main goal of this dashboard was to provide conclusive information from the lab's real data. However, the hospital never provided real validated data, so it remains to be seen whether the data processing would fit real data and what problems could arise.
Report generation was also troublesome. Only a few examples using knitr can be found, and even fewer that embed interactive plots in reports. In addition, the parameterized report does not easily accommodate plots with a dynamic number of y values.
This remains to be improved and tuned, but due to the lack of documentation on complex reports, we set this feature somewhat aside in favor of the tabs. One possible improvement would be to include basic hints, remarks or advice within the reports to guide the laboratory staff's strategic decision-making.
Overall, though, the results were satisfactory: all the main goals were met, and the data processing, although complex, linked all the relationships within the data. The dashboard also proved clear and insightful, summarizing the state of the articles and offering a way to explore the data.
Still, much can be done to make the supply diagnosis more precise. On the hospital's side, it would help to restructure the original csvs (Purchases and Consumptions) symmetrically. This would make the mapping easier and avoid inconsistencies in the data, so the test data frame would contain genuinely meaningful information.
The structure of the current csvs led to some initial misunderstandings of the data. Unfortunately, some of the questions initially posed remained unsolved, such as the one regarding article expiration dates.
Also, a brief document describing all the concepts mentioned in the csvs would be very helpful for users who handle these data and want to understand them. Kaggle does something similar, giving a brief description of each column on each dataset's page.
Nevertheless, the relations between the different csvs were clarified overall, for example those between LOINC Code, Statistical Concept Code, Article Code, and their corresponding consumptions and purchases.
More precisely, the structure of Consumptions should be applied to Purchases. The Statistical Concept Code could then be used as a primary key, and the article used for each SCC could be inferred from the Mapping csv file.
Going further, to make full use of the data, both csvs should be restructured to use the LOINC code as their primary key. This way, costs could be broken down to even the smallest test performed in the laboratory.
Still, it is very difficult for a supply purchasing specialist to order a specific medical article based on how many tests associated with LOINC codes the hospital will need. It is therefore understandable to add up all the articles consumed and order a slightly larger amount. This problem, however, could be addressed by the future work proposed next.
It would be of great interest to collect and process all available data from previous years and build a forecast model of the supplies needed for each LOINC code. The training data would be the supply consumption data, since that is what we want to fit. Once a satisfactory model is built, the hospital could make automated predictions of the supplies needed and match real demand even more precisely. An ARIMA/SARIMA model could be one solution for this task, but many other time-series machine learning methods could be considered.
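A seasonal ARIMA forecast can be sketched with base R's `stats::arima` and `predict`. This is a minimal sketch on synthetic monthly data, not hospital data. The model orders are chosen for illustration rather than tuned, and the 95% band uses the usual normal approximation (±1.96 standard errors).

```r
set.seed(1)
# Synthetic 3 years of monthly consumption for one article (illustration only)
n <- 36
consumption <- ts(
  200 + 30 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 8),
  start = c(2018, 1), frequency = 12
)

# Seasonal ARIMA(1,0,0)(0,1,0)[12]; orders are illustrative, not tuned
fit <- arima(consumption, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Forecast the next 12 months with an approximate 95% prediction interval
fc <- predict(fit, n.ahead = 12)
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
```

With real data, the model orders would normally be selected by comparing candidates, for example by AIC, on several years of history.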

Figure 24: Example of ARIMA model with estimated area in grey

Note, in any case, that this possible future development does not replace the specialist's judgment and should always be approached with care. A lack of sufficient data could limit this kind of machine learning approach, and seasonal and exceptional effects could distort our already scarce data (the COVID-19 pandemic being a good example).
It would also be very interesting to keep an indirect inventory count. Of course, there is no practical way to physically count all the supplies actually present in the hospital. That would probably require a robotic, self-managed inventory in which access to the different reagents and articles requires identification. However, if the data is properly coordinated, a single count (say, when the inventory is at its emptiest) would set an initial inventory value. From then on, every purchase would add the purchased amount of the article to the inventory, and every consumption would subtract the consumed amount.
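The indirect count amounts to a running balance: the initial count plus the cumulative sum of purchases minus consumption. The sketch below uses hypothetical monthly movements for a single article, expressed in units.

```r
# Hypothetical monthly movements for one article (units)
initial_count <- 40                     # one-off physical count
purchased <- c(100, 0, 120, 0, 90, 60)
consumed  <- c(70, 60, 65, 55, 80, 60)

# Each month: add purchases and subtract consumption
inventory <- initial_count + cumsum(purchased - consumed)
inventory
# 70 10 65 10 20 20
```

A negative value in such a series would be a sign that the data and the real movements have drifted apart and the count needs recalibrating.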
Of course, this would require periodic calibration to make sure that the transactions in the data correspond to the real movements of the articles. Combined with the forecast model, however, this proposal could support decision-making. If there is a slight excess in the inventory, the specialist could place an order slightly smaller than the prediction. If there is a slight deficit, the specialist could order a little more of that product to compensate.

Also note that ARIMA models usually give a prediction interval rather than just a single number, and this range of possibilities could be narrowed thanks to the inventory accounting.
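Combining the two ideas, the order quantity can be picked inside the forecast interval depending on the current inventory. The rule below is hypothetical and is not part of the dashboard. The target level and the 10% tolerance are illustrative parameters, not values from the thesis.

```r
# Hypothetical decision rule: pick the order quantity inside the forecast
# interval according to the inventory level relative to a target.
suggest_order <- function(pred, lower, upper, inventory, target, tolerance = 0.1) {
  if (inventory > target * (1 + tolerance)) {
    lower  # slight excess: order towards the low end of the interval
  } else if (inventory < target * (1 - tolerance)) {
    upper  # slight deficit: order towards the high end of the interval
  } else {
    pred   # inventory close to target: follow the point forecast
  }
}

suggest_order(pred = 100, lower = 85, upper = 115, inventory = 60, target = 40)
# 85
```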
Nevertheless, this last possibility may be somewhat problematic, since it is an estimate. It does not account for factors such as damaged packs or problems that may arise when using them in the machines, which may force the laboratory staff to use another unit without the test having been carried out.
Of course, such problems could also be reported so that the data includes them, but this would be burdensome and tedious for the laboratory staff. On the other hand, it would resolve the inconsistencies arising from the fact that, in some months, more of certain articles is consumed than purchased.
Finally, it would be interesting to include safety margins, as mentioned in the introduction. Safety margins can make forecasts more realistic and could be provided in the mapping csv. They could be included as an intrinsic feature of each article. One of the tests using an article is probably essential (limiting), so a fixed amount of that article is needed regardless of which test it ends up being used for, since the limiting element is the most critical test (remember that multiple tests may be associated with one article).

The data visualization tool revealed one interesting result: the funds spent and consumed by “Automation Unit: COBAS 8000” are exactly the same as those of the provider “Roche Diagnostics S.L.”. A brief search showed that this automation unit is a Roche Diagnostics S.L. product. All the reagents and articles needed for this machine are therefore also supplied by that company, in a modular way. Roche is, incidentally, the hospital's main provider.
From the UX/UI perspective, it would be valuable to get feedback from real usage. For instance, some plots may turn out to be useless or uninteresting, while others may prove shallow or imprecise in the long term. For example, the efficiency distribution across sections might remain constant over months and years. This would probably suggest that no section is inherently more prone to efficiency than any other, making that information useless.

It is also worth mentioning that the UI itself could be improved. Users may dislike scrollable tabs and prefer tab content that fits on a single fixed screen.
Perhaps they find the tab structure unintuitive and would rather have a single scrolling page showing the contents of the different tabs one after another, separated by dividing lines.
We received positive initial feedback from the hospital, but what looks great now may prove incomplete with use, and flaws may emerge.

Also, some x-axis labels are longer than others and are therefore automatically tilted and pushed off screen. It would be interesting to define a function that takes the length of the x label as input and outputs an appropriate angle and font size, since plotly does not seem to adjust the font size or break lines when the text is too long. This would be a tedious task, however. For now, users who want to read a full x label can simply hover over the plot with the mouse and read it in the tooltip.
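Such a function could start as simply as the sketch below. The length thresholds, angles and font sizes are hypothetical and would need tuning against the real labels. The second helper breaks long labels with `<br>`, which plotly generally renders as a line break in label text. The resulting values could then be passed to a plotly axis layout (for example `tickangle` and `tickfont`).

```r
# Hypothetical thresholds: map label length to tick angle and font size
label_style <- function(label) {
  n <- nchar(label)
  if (n <= 10) {
    list(angle = 0, size = 12)
  } else if (n <= 25) {
    list(angle = -30, size = 11)
  } else {
    list(angle = -45, size = 9)
  }
}

# Wrap long labels onto several lines joined with "<br>"
wrap_label <- function(label, width = 20) {
  paste(strwrap(label, width = width), collapse = "<br>")
}

label_style("Roche Diagnostics S.L.")
wrap_label("CAMARA NEUBAUER DESECHABLE METRAQUILATO")
```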
Together with the existing data, the tool provides solid evidence that savings are possible. It can therefore be used to justify redefining the distribution of purchases agreed with the providers.
A possible future solution could also be based on Flourish [4], splitting the problem into two steps. First, if the hospital outputs the data properly, a website can process it without major trouble and decide the layout of the elements it displays.
The second step would be to design each plot to be displayed on that website, using the available Flourish studio tools.
This methodology has proven effective in industry, since several data visualization websites work this way. Two roles could even be created for this purpose: a web designer and a UI/UX plot designer working with Flourish.
To sum up, the conclusion is very positive, since most of the initial goals have been fulfilled satisfactorily. These are understanding the data, which was not evident at first glance; processing it, which went through several revisions; and designing the dashboard, with all the additions it underwent.
The only thing missing is some feedback from the hospital that would allow me to fine-tune the dashboard, but that is beyond the reach of my tutor and supervisor.
In conclusion, data visualization tools are among the most widely used for business intelligence purposes, and this case is no exception. Even though some aspects are still unpolished, this dashboard provides a solid foundation for a more robust business analysis tool for the hospital's laboratory.

7 Innovation – Entrepreneurship

This proposal defines an integrated solution for the development of business intelligence tools. A company could be built around this idea to help other businesses grow through the analysis of their supplies.

Many small companies keep tight control of their supplies, often through a handful of employees who have a close relationship with the provider(s) and a good overview of the company. Their limitations with providers usually concern minimum order quantities or similar issues.

It is not until the company grows that controlling supplies becomes more problematic. Multiple storehouses are acquired, different departments may have different providers, and the relationships with them become more flexible. In this laxer state, funds can easily leak, since the company has also grown. Like a plant outgrowing its pot, the control system used by a small business may become obsolete for a medium-sized one.

The solution may be a second company (our proposal) that produces business intelligence reports on the first one and delivers them periodically, so that the first company's management has the key information for its strategic business decisions. This symbiotic relationship can significantly help both companies grow. The BI company can take over the storage of large amounts of information, the hiring of an IT employee, maintenance, and security. The client company, in turn, gains a great advantage by reducing the amount of money leaking from its purchases.

Such a company could thus offer data processing and specific data visualization tools customized for each client. The final product would contain critical information on the features requested by the client, but the strategic decisions would ultimately remain the client's.

One of the most important values of this company would be guaranteeing trust and confidence. This kind of business resembles traditional banks. A bank keeps its clients' money, the clients trusting that it is safe and sound, and makes transactions with it, keeping part of the revenue and giving another part to certain clients.

Just as traditional banks are not responsible for their clients' investment decisions, we would not be held responsible for our clients' strategic decisions; we would simply provide information about the status of their business.

Similarly, our clients would trust us enough to give us their data for business analysis. In exchange, we would commit to keeping those data secure and deleting them at the end of the contract. How this business model would work is also an interesting and complex question, especially when it comes to resolving conflicts such as managing the data of two market competitors. Leaking data from one to the other would certainly destroy the trust in and reputation of the company, so our trustworthiness and procedures would undoubtedly have to be certified by a third party.

Naturally, we would sign an NDA with each client company to guarantee its security and privacy, certified by the aforementioned third party.

In terms of customer relationship, we would be a subscription-based company. We would either require a minimum number of months for the first subscription or charge an extra sign-up fee. This first phase is where the core data processing is set up and the data structure is understood together with the client. After it, the following months would be cheaper, since our company would only perform the stipulated routines (data curation and processing) and deliver the monthly report to our clients. If clients want to unsubscribe, they would simply stop receiving the reports and we would delete their data. However, the pipeline that processes their data and generates their reports would be kept in case they want to rejoin our services.

Our main channels would be online advertising, business and entrepreneurship channels, conferences, webinars, and similar venues.

Overall, this initiative could draw on the track records of existing BI companies while offering more competitive prices. One example of such a company is HealthDataViz [22]. We could even consider specializing in hospitals, thereby creating global BI standards and practices for the medical domain and collaborating with public and private institutions.

Similar business intelligence tools exist on the market, such as Microsoft Power BI or Tableau. However, the main added value of our company would be performing all the programming tasks required to provide business reports or data tools tailored to each client.

Regarding financing, our backers and collaborators could be large companies that have applied business intelligence to their own supply chains and discovered its cost-saving potential. Such companies might now want to exploit it through a spin-off company to explore new market opportunities.

This way, we could partner with a bigger company able to support us and provide consulting advice, combining its business experience with the flexibility of a small company.

In summary, this could be a reasonable idea for a startup. As companies become increasingly digital and store and digitize their data more consistently, a great opportunity opens up for business intelligence in the market.

8 Bibliography
[1]. Jennifer Abayowa. (2021, Apr. 6). Visualizing Healthcare Data With
Infographics to Save Lives [Online]. Available:
https://venngage.com/blog/healthcare-data-visualization/
[2]. Erin McCoy. (2019, Dec. 16). How Data Visualization Is Transforming
the Health Care Industry [Online]. Available:
https://modus.medium.com/how-data-visualization-is-transforming-the-healthcare-industry-6761d7293dd2
[3]. Australian Pancreatic Cancer Genome Initiative icgc pancreatic cancer
(ductal adenocarcinoma) - genome viewer [Online]. Available:
https://gallery.shinyapps.io/genome_browser/
[4]. Duncan Clark and Robin Houston. (2016). [Online]. Available:
https://flourish.studio/
[5]. Ahmed Mohammed Morsy. (2016, Mar. 29). What is the best statistical
software package for biomedical sciences? [Online]. Available:
https://www.researchgate.net/post/What_is_the_best_statistical_software_package_for_biomedical_sciences
[6]. Agency for Healthcare Research and Quality. (2019, Nov).
Data Visualizations [Online]. Available:
https://www.ahrq.gov/data/visualizations/index.html
[7]. Lawrence D. Reid, Kathryn R. Fingar. (2020, Dec. 15). Emergency
Department Visits Involving Influenza and Influenza-Like Illnesses,
2016–2018 [Online]. Available:
https://hcup-us.ahrq.gov/reports/statbriefs/sb269-Influenza-ED-Visits-2016-2018.pdf
[8]. University of Michigan & Robert Wood Johnson Foundation. (2014).
Visualizing Health [Online]. Available:
http://www.vizhealth.org/gallery/
[9]. Institute for Health Metrics and Evaluation. (2021). HIV Mortality –
Latin America | Viz Hub [Online]. Available:
https://vizhub.healthdata.org/lbd/hiv-mort-la
[10]. Agencia de Datos, Europa Press. (2021). La evolución del coronavirus
en España y en el mundo, en gráficos [Online]. Available:
https://www.epdata.es/datos/coronavirus-china-datos-graficos/498
[11]. Information is beautiful. (2021). Covid-19 Coronavirus Data Dashboard
[Online]. Available:
https://informationisbeautiful.net/visualizations/covid-19-coronavirus-infographic-datapack/
[12]. U.S. COVID-19 Atlas. (2021). US Covid Atlas [Online]. Available:
https://theuscovidatlas.org/map
[13]. Washington State Department of Health. Health Data Visualization:
Washington Tracking Network Dashboards [Online]. Available:
https://www.doh.wa.gov/DataandStatisticalReports/HealthDataVisualization
[14]. U.S. Department of Health & Human Services. (2021, May 13).
[Online]. Available: https://www.cdc.gov/nchs/data-visualization/index.htm
[15]. World Health Organization. (2016). World Health Statistics data
visualizations dashboard [Online]. Available:
https://apps.who.int/gho/data/node.sdg

[16]. Inseok Ko, Hyejung Chang. (2017, Oct. 31). Interactive Visualization of
Healthcare Data Using Tableau [Online]. Available:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5688037/
[17]. Alexander Rind. (2021, Oct. 5). VisuExplore – Gaining New Medical
Insights from Visual Exploration [Online]. Available:
https://www.cvast.tuwien.ac.at/projects/visuexplore
[18]. Dana Liberty. (2018, Sep. 13). Healthcare Dashboards: Examples of
Visualizing Key Metrics & KPIs [Online]. Available:
https://www.sisense.com/blog/healthcare-dashboards-examples-visualizing-key-metrics/
[19]. Cambridge Intelligence. (2021). Healthcare data visualization [Online].
Available: https://cambridge-intelligence.com/use-cases/healthcare/
[20]. Fernández Martínez, Fco. Javier (2021). Cuadro De Mandos Para La
Visualización De Datos Genómicos: Accugest. Máster En
Bioinformática Aplicada A Medicina Personalizada Y Salud
[21]. Benoit Thieurmel, Victor Perrier. (2021, Jun. 16). shinymanager:
Authentication Management for 'Shiny' Applications [Online]. Available:
https://cran.r-project.org/web/packages/shinymanager/
[22]. HealthDataViz [Online]. Available: https://healthdataviz.com/

