Judith Williams
INTERNATIONAL (JPTS)
INSTITUTION OF SCIENCE MANAGEMENT AND
TECHNOLOGY TERM PAPER PRESENTATION ON A TOPIC:
NAME: GLORIA WILLIAMS
REG NUMBER: 47497
SEMESTER: 400 LEVEL, SECOND SEMESTER
FACULTY: MANAGEMENT
DEPARTMENT: BUSINESS ADMINISTRATION
CENTER: KADUNA FULL TIME CENTRE
Introduction
In a thesis, dissertation, academic journal article or other formal piece of
research, there are often details of how the researcher approached the study and the
methods and techniques they used. If you’re designing a research study, then it’s
helpful to understand what research methodology is and the selection of techniques
and tools available to you. In this article, we explore what research methodology
is, the types of research methodologies and the techniques and tools commonly
used to collect and analyze data.
What Is Research Methodology?
Research methodology is a way of explaining how a researcher intends to carry out
their research. It’s a logical, systematic plan to resolve a research problem. A
methodology details a researcher’s approach to the research to ensure reliable,
valid results that address their aims and objectives. It encompasses what data
they're going to collect and where from, as well as how it will be collected and
analyzed.
Why Is a Research Methodology Important?
A research methodology gives research legitimacy and provides scientifically
sound findings. It also provides a detailed plan that helps to keep researchers on
track, making the process smooth, effective and manageable. A researcher’s
methodology allows the reader to understand the approach and methods used to
reach conclusions.
Types of Research Methodology
When designing a research methodology, a researcher has several decisions to
make. One of the most important is which data methodology to use: qualitative,
quantitative or a combination of the two. No matter the type of research, the data
gathered will take the form of numbers or descriptions, and researchers can choose
to focus on collecting words, numbers or both.
Quantitative
Researchers usually use a quantitative methodology when the objective of the
research is to confirm something. It focuses on collecting, testing and measuring
numerical data, usually from a large sample of participants. They then analyze the
data using statistical analysis and comparisons. Popular methods used to gather
quantitative data are:
Surveys
Questionnaires
Tests
Databases
Organizational records
This research methodology is objective and often quicker, as researchers use
software programs to analyze the data. For example, researchers could use a
quantitative methodology to measure the relationship between two variables or to
test a set of hypotheses.
Statistical analysis is a scientific tool, widely used in AI and ML, that helps collect
and analyze large amounts of data to identify common patterns and trends and
convert them into meaningful information. In simple words, statistical analysis is a
data analysis tool that helps draw meaningful conclusions from raw and unstructured data.
Descriptive Analysis
Descriptive statistical analysis involves collecting, interpreting, analyzing, and
summarizing data to present them in the form of charts, graphs, and tables. Rather
than drawing conclusions, it simply makes the complex data easy to read and
understand.
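A minimal sketch of descriptive analysis, using only Python's standard library; the exam scores are invented for illustration:

```python
import statistics
from collections import Counter

# Hypothetical exam scores for a class of 10 students.
scores = [62, 75, 75, 80, 68, 90, 75, 55, 80, 70]

# Descriptive statistics summarize the data without drawing conclusions.
print("mean:  ", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:  ", statistics.mode(scores))
print("range: ", max(scores) - min(scores))

# A frequency table like this is the raw material for a bar chart.
for score, count in sorted(Counter(scores).items()):
    print(score, "*" * count)
```

Nothing here infers anything about students outside the class; it only makes the collected data easier to read, which is exactly the role of descriptive analysis.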
Inferential Analysis
Inferential statistical analysis focuses on drawing meaningful conclusions from
the data analyzed. It studies the relationships between different variables or makes
predictions for the whole population.
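A small illustration of the inferential idea: estimating a range for a population mean from a sample. The ratings are invented, and a simplified z-interval stands in for the more usual t-interval (a reasonable approximation at this sample size):

```python
import statistics

# Hypothetical sample of 50 customer-satisfaction ratings (1-10 scale).
sample = [7, 8, 6, 9, 7, 8, 8, 6, 7, 9] * 5  # repeated purely for illustration

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

# 95% z-interval: from the sample alone, we infer a plausible range
# for the mean of the *whole population* of customers.
z = statistics.NormalDist().inv_cdf(0.975)  # ≈ 1.96
low, high = mean - z * sem, mean + z * sem
print(f"95% CI for population mean: ({low:.2f}, {high:.2f})")
```

This is the defining move of inferential analysis: the conclusion is about the population, not just the 50 sampled customers.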
Predictive Analysis
Predictive statistical analysis analyzes data to identify past trends and predict
future events based on them. It uses machine learning algorithms, data mining,
data modelling, and artificial intelligence to conduct the statistical analysis of data.
Prescriptive Analysis
Prescriptive analysis analyzes data and prescribes the best course of action based
on the results. It is a type of statistical analysis that helps you make informed
decisions.
Causal Analysis
Causal statistical analysis focuses on determining the cause-and-effect
relationships between different variables within the raw data. In simple words, it
determines why something happens and its effect on other variables. Businesses
can use this methodology to determine the reasons for failure.
Importance of Statistical Analysis
Statistical analysis eliminates unnecessary information and catalogs important data
in an uncomplicated manner, making the otherwise monumental work of organizing
inputs far more manageable. Once the data has been collected, statistical analysis
may be utilized for a variety of purposes. Although there are various methods of
data analysis, given below are the five most popular methods of statistical analysis:
Mean
The mean, or average, is one of the most popular methods of statistical analysis. It
indicates the overall trend of the data and is very simple to calculate: sum the
numbers in the data set, then divide by the number of data points. Despite the ease
of calculation and its benefits, it is not advisable to rely on the mean as the only
statistical indicator, as doing so can result in inaccurate decision making.
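A minimal sketch of the calculation, with invented data points, including an example of why the mean alone can mislead:

```python
data = [4, 8, 15, 16, 23, 42]          # hypothetical data points

# Mean = sum of the values divided by the number of data points.
mean = sum(data) / len(data)
print(mean)                            # 18.0

# One extreme value pulls the mean far from the typical point,
# which is why relying on the mean alone can mislead.
with_outlier = data + [500]
print(sum(with_outlier) / len(with_outlier))  # ≈ 86.86
```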
Standard Deviation
Standard deviation measures how far individual data points typically lie from the
mean, indicating how spread out the data is. A low standard deviation means the
values cluster around the mean; a high one means they are widely dispersed.
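A small sketch with invented data, using Python's standard library:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]  # hypothetical data points

# Sample standard deviation: how far values typically lie from the mean.
sd = statistics.stdev(data)
print(round(sd, 2))  # 13.49
```

Read together with the mean (18.0 for this data), a standard deviation of about 13.5 signals that these values are quite spread out rather than clustered.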
Regression
Regression is a statistical tool that models the relationship between a dependent
variable and an independent variable. It is generally used to predict future trends
and events.
Hypothesis Testing
Hypothesis testing checks whether an assumption made about a population is
supported by the sample data, helping researchers accept or reject a stated
hypothesis.
Sample Size Determination
Sample size determination is used when the size of the population is very large.
You can choose from among the various data sampling techniques such as
snowball sampling, convenience sampling, and random sampling.
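A minimal sketch of simple random sampling with Python's standard library; the population of customer IDs is invented for illustration:

```python
import random

# Hypothetical population of 10,000 customer IDs; surveying all of them
# is impractical, so we draw a simple random sample instead.
population = range(1, 10_001)

random.seed(42)  # fixed seed so the illustration is reproducible
sample = random.sample(population, k=100)

print(len(sample))       # 100
print(len(set(sample)))  # 100 (sampling without replacement, no duplicates)
```

Each ID has an equal chance of selection, which is what distinguishes random sampling from convenience or snowball sampling.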
Next on our list of data analytics tools comes a more technical area related to
statistical analysis. Referring to computational techniques that often combine a
variety of statistical methods to manipulate, explore, and generate insights, there
exist multiple programming languages to make (data) scientists' work easier and
more effective. With the number of languages on the market today, science has its
own set of rules and scenarios that need special attention when it comes to
statistical data analysis and modeling. Here we will present one of the most
popular tools for a data analyst: Posit (previously known as RStudio), the leading
environment for the R programming language. Although there are other languages
that focus on (scientific) data analysis, R is particularly popular in the community.
Posit, formerly known as RStudio, is one of the top data analyst tools for R and
Python. Its development dates back to 2009, and it is one of the most widely used
environments for statistical analysis and data science, keeping an open-source
policy and running on a variety of platforms, including Windows, macOS and
Linux. As a result of the
latest rebranding process, some of the famous products on the platform will change
their names, while others will stay the same. For example, RStudio Workbench and
RStudio Connect will now be known as Posit Workbench and Posit Connect
respectively. On the other side, products like RStudio Desktop and RStudio Server
will remain the same. As stated on the software’s website, the rebranding happened
because the name RStudio no longer reflected the variety of products and
languages that the platform currently supports.
Posit is by far the most popular integrated development environment (IDE) out
there, with 4.7 stars on Capterra and 4.5 stars on G2Crowd. Its capabilities for data
cleaning, data reduction, and data analysis report output with R Markdown make
this tool an invaluable analytical assistant that covers both general and academic
data analysis. It sits on an ecosystem of more than 10,000 packages and extensions
that you can explore by category, and it can perform any kind of statistical analysis,
such as regression, conjoint analysis, and factor and cluster analysis. Easy to
understand for those who don't have a high level of programming skill, Posit can
perform complex mathematical operations using a single command. A number of
graphical libraries, such as ggplot2 and plotly, set this language apart in the
statistical community, since it has efficient capabilities for creating quality
visualizations.
While Posit was mostly used in academia in the past, today it has applications
across industries and large companies such as Google, Facebook, Twitter, and
Airbnb, among others. Due to an enormous number of researchers, scientists, and
statisticians using it, the tool has an extensive and active community where
innovative technologies and ideas are presented and communicated regularly.
Another tool worth mentioning is MAXQDA, a software package for qualitative
and mixed-methods data analysis. Amongst its most valuable functions, MAXQDA
offers users the capability of setting different codes to mark their most important
data and organize it in an efficient way. Codes can be easily generated via
drag & drop and labeled using
colors, symbols, or emojis. Your findings can later be transformed, automatically
or manually, into professional visualizations and exported in various readable
formats such as PDF, Excel, or Word, among others.
5. SQL CONSOLES
Our data analyst tools list wouldn’t be complete without SQL consoles.
Essentially, SQL is a programming language that is used to manage/query data held
in relational databases, particularly effective in handling structured data as a
database tool for analysts. It’s highly popular in the data science community and
one of the analyst tools used in various business cases and data scenarios. The
reason is simple: as most of the data is stored in relational databases and you need
to access and unlock its value, SQL is a highly critical component of succeeding in
business, and by learning it, analysts can add a competitive advantage to their
skillset. There are different relational (SQL-based) database management systems,
such as MySQL, PostgreSQL, MS SQL, and Oracle, and learning these data
analysts' tools would prove extremely beneficial to any serious analyst. Here we
will focus on MySQL Workbench as the most popular one.
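To illustrate the kind of query an analyst writes against a relational database, the sketch below uses Python's built-in sqlite3 module as a stand-in for a MySQL server; the table and figures are invented:

```python
import sqlite3

# An in-memory SQLite database stands in for a MySQL server here;
# the orders table and its values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 60.0)],
)

# A typical analyst query: total spend per customer, highest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY SUM(amount) DESC"
).fetchall()
print(rows)  # [('alice', 180.0), ('bob', 75.5)]
conn.close()
```

The same `SELECT ... GROUP BY ... ORDER BY` statement would run essentially unchanged against MySQL, PostgreSQL, or MS SQL, which is the portability that makes SQL so valuable to analysts.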
MySQL Workbench is used by analysts to visually design, model, and manage
databases, optimize SQL queries, administer MySQL environments, and utilize a
suite of tools to improve the performance of MySQL applications. It will allow you
to perform tasks such as creating and viewing databases and objects (triggers or
stored procedures, e.g.), configuring servers, and much more. You can easily
perform backup and recovery as well as inspect audit data. MySQL Workbench
will also help in database migration and is a complete solution for analysts working
in relational database management and companies that need to keep their databases
clean and effective. The tool, which is very popular amongst analysts and
developers, is rated 4.6 stars in Capterra and 4.5 in G2Crowd.
Our list of data analysis tools wouldn't be complete without data modeling. Data
modeling tools create models that structure a database and design business systems
using diagrams, symbols, and text, ultimately representing how data flows and how
datasets are connected. Businesses use data modeling tools to determine the exact
nature of the information they control and the relationships between datasets, and
analysts are critical in this process. If you need to discover, analyze, and specify
changes in information that is stored in a software system, database or other
application, chances are your skills are critical for the overall business.
8. ETL TOOLS
ETL is a process used by companies, no matter the size, across the world, and if a
business grows, chances are you will need to extract, load, and transform data into
another database to be able to analyze it and build queries. There are some core
types of ETL tools for data analysts, such as batch ETL, real-time ETL, and
cloud-based ETL, each with its own specifications and features that adjust to different
business needs. These are the tools used by analysts that take part in more technical
processes of data management within a company, and one of the best examples is
Talend.
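The extract-transform-load flow described above can be sketched in a few lines of plain Python; the CSV data and table are invented, and an in-memory SQLite database stands in for the target warehouse:

```python
import csv
import io
import sqlite3

# --- Extract: read raw records (an in-memory CSV stands in for a source file).
raw = "name,revenue\nacme,1200\nglobex, 980\nacme,300\n"
records = list(csv.DictReader(io.StringIO(raw)))

# --- Transform: clean whitespace and cast revenue strings to numbers.
for r in records:
    r["name"] = r["name"].strip()
    r["revenue"] = float(r["revenue"])

# --- Load: write the cleaned rows into the target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revenue (name TEXT, revenue REAL)")
db.executemany("INSERT INTO revenue VALUES (:name, :revenue)", records)

total = db.execute("SELECT SUM(revenue) FROM revenue").fetchone()[0]
print(total)  # 2480.0
```

Tools like Talend automate exactly this pattern at scale, with connectors, scheduling, and error handling replacing the hand-written steps.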
Talend is a data integration platform used by experts across the globe for data
management processes, cloud storage, enterprise application integration, and data
quality. It’s a Java-based ETL tool that is used by analysts in order to easily process
millions of data records and offers comprehensive solutions for any data project
you might have. Talend’s features include (big) data integration, data preparation,
cloud pipeline designer, and stitch data loader to cover multiple data management
requirements of an organization. Users of the tool rated it 4.2 stars on Capterra
and 4.3 on G2Crowd. This software is extremely important for analysts who need
to work on ETL processes in their analytical department.
Apart from collecting and transforming data, Talend also offers a data
governance solution to build a data hub and deliver it through self-service access
on a unified cloud platform. You can utilize their data catalog and inventory, and
produce clean data through their data quality feature. Sharing is also part of their
data portfolio; Talend’s data fabric solution will enable you to deliver your
information to every stakeholder through a comprehensive API delivery platform.
If you need a data analyst tool to cover ETL processes, Talend might be worth
considering.
9. AUTOMATION TOOLS
As mentioned, the goal of all the solutions on this list is to make data analysts'
lives easier and more efficient. Taking that into account, automation tools could
not be left out. In simple words, data analytics automation is the practice of using
systems and processes to perform analytical tasks with almost no human
interaction. In past years, automation solutions have changed the way analysts
perform their jobs, as these tools assist them in a variety of tasks such as data
discovery, preparation, and data replication, as well as simpler ones like report
automation or writing scripts. Automating analytical processes significantly
increases productivity, leaving more time for more important tasks. We will see
this in more detail through Jenkins, one of the leaders in open-source automation
software.
Developed in 2004 under the name Hudson, Jenkins is an open-source CI
automation server that can be integrated with several DevOps tools via plugins. By
default, Jenkins helps developers automate parts of their software development
process, like building, testing, and deploying. However, it is also widely used by
data analysts as a solution to automate jobs such as running code and scripts daily
or when a specific event happens, for example running a specific command when
new data becomes available.
There are several Jenkins plugins to generate jobs automatically. For example, the
Jenkins Job Builder plugin takes simple descriptions of jobs in YAML or JSON
format and turns them into runnable jobs in Jenkins's format. The Job DSL plugin,
in turn, provides users with the capability to generate jobs from other jobs and edit
the XML configuration to supplement or fix any existing elements in the DSL.
Lastly, the Pipeline plugin is mostly used to define complex automated processes.
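As an illustration of the Job Builder format mentioned above, the YAML below describes one hypothetical job; the job name, schedule, and script are invented for the sketch:

```yaml
# Hypothetical Jenkins Job Builder description: a nightly job that
# re-runs a data-refresh script. Names and paths are invented.
- job:
    name: nightly-data-refresh
    description: Re-run the data pipeline every night.
    triggers:
      - timed: "H 2 * * *"        # cron-style schedule, around 2 AM
    builders:
      - shell: python refresh_data.py
```

Job Builder turns short descriptions like this into full Jenkins job configurations, which is what makes it attractive for analysts who want scheduled scripts without hand-editing XML.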
For Jenkins, automation is not useful if it's not tied to integration. For this reason,
they provide hundreds of plugins and extensions to integrate Jenkins with your
existing tools. This way, the entire process of code generation and execution can be
automated at every stage and on different platforms, leaving you enough time to
perform other relevant tasks. All the plugins and extensions for Jenkins are
developed in Java, meaning the tool can also be installed on any operating system
that runs Java. Users rated Jenkins 4.5 stars on Capterra and 4.4 stars on
G2Crowd.
11. UNIFIED DATA ANALYTICS ENGINES
If you work for a company that produces massive datasets and needs a big data
management solution, then unified data analytics engines might be the best option
for your analytical processes. To be able to make quality decisions in a big data
environment, analysts need tools that let them take full control of their company's
robust data environment, and that's where machine learning and AI play a
significant role. That said, Apache Spark is one of the data analysis tools on
our list that supports big-scale data processing with the help of an extensive
ecosystem.
Apache Spark was originally developed at UC Berkeley in 2009 and has since
expanded across industries; companies such as Netflix, Yahoo, and eBay have
deployed Spark and processed petabytes of data, proving that it is a go-to solution
for big data management and earning it a 4.2-star rating on both Capterra and
G2Crowd. Its ecosystem consists of Spark SQL, streaming, machine learning,
graph computation, and core Java, Scala, and Python APIs that ease development.
As early as 2014, Spark officially set a record in large-scale sorting; in fact, the
engine can be up to 100x faster than Hadoop MapReduce, a feature that is crucial
for processing massive volumes of data.
You can easily run applications in Java, Python, Scala, R, and SQL while more
than 80 high-level operators that Spark offers will make your data transformation
easy and effective. As a unified engine, Spark comes with support for SQL queries,
MLlib for machine learning, GraphX for graph processing, and structured
streaming, which can be combined to create additional, complex analytical
workflows. Additionally, it runs on
Hadoop, Kubernetes, Apache Mesos, standalone or in the cloud and can access
diverse data sources. Spark is truly a powerful engine for analysts that need
support in their big data environment.
12. SPREADSHEET APPLICATIONS
Spreadsheets are one of the most traditional forms of data analysis. Quite popular
in any industry, business or organization, there is a slim chance that you haven’t
created at least one spreadsheet to analyze your data. Often used by people who
don't have the technical ability to code, spreadsheets suit fairly simple analysis
that doesn't require considerable training or the management of complex, large
volumes of data and databases. To look at spreadsheets in more detail,
we have chosen Excel as one of the most popular in business.
With a 4.8-star rating on Capterra and 4.7 on G2Crowd, Excel needs a category of
its own, since this powerful tool has been in the hands of analysts for a very long
time. Often considered a traditional form of analysis, Excel is still widely used
across the globe. The reasons are fairly simple: there aren't many people who have
never used it or come across it at least once in their career. It's a fairly versatile
data analyst tool where you simply manipulate rows and columns to create your
analysis. Once this part is finished, you can export your data and send it to the
desired recipients, so you can use Excel as a reporting tool as well. You do need to
update the data on your own, though; Excel doesn't have an automation feature
like other tools on our list. From creating pivot tables to managing smaller
amounts of data and tinkering with the tabular form of analysis, Excel has
developed from an electronic version of the accounting worksheet into one of the
most widespread tools for data analysts.
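To make the pivot-table idea concrete, the sketch below reproduces in plain Python what a spreadsheet pivot table computes; the sales rows are invented:

```python
from collections import defaultdict

# Hypothetical sales rows, as they might sit in a spreadsheet:
# (region, quarter, amount).
rows = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80),  ("South", "Q2", 120),
    ("North", "Q1", 50),
]

# What a pivot table computes: sum of amounts grouped by region and quarter.
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, amount in rows:
    pivot[region][quarter] += amount

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

In Excel the same grouping and summing happens behind the drag-and-drop pivot-table interface, which is why spreadsheets remain so approachable for non-programmers.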
RapidMiner offers a range of data science products that help in the design and
deployment of analytics processes.
Their data exploration features such as visualizations and descriptive statistics will
enable you to get the information you need while predictive analytics will help you
in cases such as churn prevention, risk modeling, text mining, and customer
segmentation.
With more than 1500 algorithms and data functions, support for 3rd party machine
learning libraries, integration with Python or R, and advanced analytics,
RapidMiner has developed into a data science platform for deep analytical
purposes. Additionally, comprehensive tutorials and full automation, where
needed, will ensure simplified processes if your company requires them, so you
don't need to perform manual analysis. All these positive traits have earned the tool
a 4.4-star rating on Capterra and 4.6 stars on G2Crowd. If you're looking
for analyst tools and software focused on deep data science management and
machine learning, then RapidMiner should be high on your list.
Orange is an open-source toolkit for data visualization and machine learning that
also provides data cleaning features that will let you spot anything from extra
spaces to duplicated fields.
What makes this software so popular amongst others in the same category is the
fact that it provides beginners and expert users with a pleasant usage experience,
especially when it comes to generating swift data visualizations in a quick and
uncomplicated way. Orange, which has a 4.2-star rating on both Capterra and
G2Crowd, offers users multiple online tutorials to get them acquainted with the
platform. Additionally, the software learns from the user's preferences and reacts
accordingly, which is one of its most praised functionalities.
Highcharts is a JavaScript charting library that supports line, spline, area, column,
bar, pie, scatter charts and many others that help developers in their online-based
projects. Additionally, its WebGL-powered boost module enables you to render
millions of datapoints in the
browser. As far as the source code is concerned, they allow you to download and
make your own edits, whether you use the free or a commercial license. In essence,
Highcharts is designed mostly for a technical target group, so you should
familiarize yourself with developers' workflow and its JavaScript charting engine.
If you're looking for an easier-to-use but still powerful
solution, you might want to consider an online data visualization tool like datapine.