Early Warning of Traffic Accident in Shanghai Based On Large Data Set Mining
Yang Yanbin, Zhou Lijuan, Leng Mengjun, Sun Ling
Shanghai Maritime University, College of Transport & Communications, Shanghai, 201306, China
Abstract—Through classification and regression analysis of traffic accident statistics in Shanghai from July 2014 to April 2015, this paper puts forward a forecasting model of traffic accident incidence and provides an index system for traffic accidents that includes month, day of the week, weather and wind speed. The model is also used to calculate the range of the traffic accident rate. Finally, based on an analysis of safety levels, the paper makes decisions and recommendations for controlling traffic accidents and for the related rescue work, which has important guiding significance for traffic accident prevention and traffic safety management in China.
Keywords- data mining; traffic accident; regression analysis; incidence; safety levels
I. INTRODUCTION
According to statistics from traffic and police departments worldwide, traffic accidents killed about 500 thousand people last year. About 104 thousand of these deaths occurred in China, roughly 1/5 of the worldwide total and the highest of any country. Many traffic accidents happen because the road itself is laid out unreasonably, so there is an urgent need to change the status quo and reduce the incidence of accidents.
At present, road traffic accident analysis and decision making are basically still at the manual processing stage, and manual processing is the main cause of the low efficiency and poor accuracy of decision analysis on large volumes of traffic accident data. Therefore, it is imperative to carry out scientific research on, and effectively improve, the analysis and decision making for road traffic accidents. Existing navigation systems, however, only give voice prompts for speeding, for monitoring, and for road sections with a high incidence of accidents, which is a shortcoming. Warning drivers of the accident risk on the road ahead would raise their vigilance and thus reduce the probability of road accidents.
This paper analyzes whether various factors influence traffic accidents in Shanghai. A large set of initial accident records is collated, and the influence factors are screened by significance analysis to form a new accident record. The accident rate model is fitted with Lingo, and the influence of each factor on the traffic accident rate is derived.
II. ACCIDENT DATA MINING
Data mining is the process of extracting knowledge from specific forms of data. For specific data and specific problems, one or more algorithms are chosen to find the hidden rules of the data, that is, implicit and meaningful knowledge, to provide scientific support for decision making. The basic process of data mining is as follows:
A. Data preparation
Select the data applicable to the data mining application, examine the quality of the data in preparation for further analysis, and determine the analytical methods to be used. We analyze the main data sources on traffic accidents in Shanghai in recent years. To make the data mining more effective, we also include a number of related data, such as time, temperature and weather information for Shanghai.
B. Data reorganization and conversion
We start from the open data of the Shanghai municipal government and use SODA data, public data and private data. The accident data are government statistics that were sorted manually and are mainly intended for statistical analysis of accidents. They are incomplete, redundant and ambiguous, so they cannot be fed to a data mining algorithm directly and must first be processed and classified.
C. Data mining
After cleaning and conversion, the original accident data become a data set suitable for mining. Data mining is performed on this data set to extract knowledge and to find an appropriate knowledge model for decision analysis. For specific data and specific problems, one or more data mining algorithms are chosen to find the hidden rules and patterns and to provide a solution to the problem.
D. Result analysis
Interpret and evaluate the results of data mining, remove the meaningless parts, analyze the meaningful rules or patterns again, and ultimately present them to decision makers in a form that is easy to understand and recognize.
III. ROAD TRAFFIC ACCIDENTS DATA MINING IN SHANGHAI
The goal of data mining is to discover hidden and meaningful knowledge from databases. There are many data mining algorithms, and they apply to broad functional areas, including classification, estimation and prediction, clustering, association, sequence discovery and characterization. Regression analysis, time series analysis and cluster analysis are among the general methods.
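The cleaning and conversion steps of Section II can be illustrated with a minimal sketch. The paper does not give its record schema, so the field names `date`, `location` and `weather` and the sample records below are hypothetical. Only the weather reference values are taken from the paper (Table 3). The sketch drops incomplete or unrecognized records, removes duplicates, and encodes the weather as a category value.

```python
# Reference values copied from Table 3 (weather categories).
WEATHER_CODES = {
    "heavy rain": 1, "thundershower": 2, "moderate rain": 3, "rainstorm": 4,
    "clear": 5, "shower": 6, "overcast": 7, "sleet": 8, "light rain": 9,
    "cloudy": 10,
}


def clean_records(records):
    """Drop incomplete, ambiguous and duplicate records; encode the weather."""
    seen = set()
    cleaned = []
    for rec in records:
        date = rec.get("date")
        weather = (rec.get("weather") or "").strip().lower()
        if not date or weather not in WEATHER_CODES:
            continue  # incomplete record or weather label not in Table 3
        key = (date, rec.get("location"), weather)
        if key in seen:
            continue  # redundant copy of a record already kept
        seen.add(key)
        cleaned.append({
            "date": date,
            "location": rec.get("location"),
            "weather_code": WEATHER_CODES[weather],
        })
    return cleaned


# Hypothetical sample records, for illustration only.
raw = [
    {"date": "2014-07-01", "location": "A", "weather": "Clear"},
    {"date": "2014-07-01", "location": "A", "weather": "clear"},   # duplicate
    {"date": "2014-07-02", "location": "B", "weather": None},      # incomplete
    {"date": "2014-07-03", "location": "C", "weather": "light rain"},
]
cleaned = clean_records(raw)
```

Of the four sample records, two survive: the duplicate and the record with missing weather are discarded.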
Table 3 Weather categories
Weather          Reference values
heavy rain       1
thundershower    2
moderate rain    3
rainstorm        4
clear            5
shower           6
overcast         7
sleet            8
light rain       9
cloudy           10
Table 4 Wind direction categories
Wind direction   Reference values
east wind        1
Figure 1. Relation fitting on month and traffic accident frequency
From the chart above, the number of accidents in Shanghai is lowest in February and higher in September, October and November. In February most people go home for the New Year, so traffic volume in Shanghai is at its lowest and the number of accidents is at its minimum. In September, October and November the number of vehicles increases, partly because the school term begins and partly because of the National Day holiday, so the number of accidents also increases, which is in line with reality.
Based on the analysis of the other influencing factors, we can draw the following conclusions:
1) Week
Accidents are most frequent on Monday, Thursday and Friday, that is, on the first and the last two working days of the week, when people generally become undisciplined and are prone to traffic accidents.
2) Time
As we know, more traffic accidents occur in the morning and evening peak hours than at other times, that is, in 6:00-8:00 and 16:00-19:00.
3) Temperature
The number of traffic accidents in each temperature range is relatively even, but as the temperature rises the number of traffic accidents increases slowly.
of the regression equation is very good, the regression equation is significant, and the regression model is established.
C. Model of accident occurrence rate
According to the relationship between the number of accidents and the various influence factors, we first assume that the relationship between the incidence rate and the influence factors is as follows:
$$
\begin{aligned}
Y ={} & k_1 x_1^3 + k_2 x_1^2 + k_3 x_1 + k_4 x_2^4 + k_5 x_2^3 + k_6 x_2^2 + k_7 x_2 \\
& + k_8 x_3^5 + k_9 x_3^4 + k_{10} x_3^3 + k_{11} x_3^2 + k_{12} x_3 + k_{14} x_4^2 + k_{15} x_4 \\
& + k_{16} \ln(x_5) + k_{17} x_6^2 + k_{18} x_6 + k_{19} x_7^3 + k_{20} x_7^2 + k_{21} x_7
\end{aligned}
\tag{2-8}
$$
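To make the structure of equation (2-8) concrete, the following minimal sketch evaluates the assumed model for a given set of coefficients. It treats the model as a plain sum of terms, with any sign carried by the coefficient. The coefficient and variable values are hypothetical placeholders, since the fitted values are not listed here, and the paper fits the model with Lingo rather than with this code. The term list follows the coefficient indices as they appear in (2-8), where k13 does not occur.

```python
import math

# One entry per term of (2-8): (coefficient index, variable index, power).
# A power of None stands for the logarithmic term k16 * ln(x5).
TERMS = [
    (1, 1, 3), (2, 1, 2), (3, 1, 1),
    (4, 2, 4), (5, 2, 3), (6, 2, 2), (7, 2, 1),
    (8, 3, 5), (9, 3, 4), (10, 3, 3), (11, 3, 2), (12, 3, 1),
    (14, 4, 2), (15, 4, 1),
    (16, 5, None),
    (17, 6, 2), (18, 6, 1),
    (19, 7, 3), (20, 7, 2), (21, 7, 1),
]


def accident_rate(k, x):
    """Evaluate Y as the sum of the terms in (2-8).

    k maps coefficient index -> value, x maps variable index -> value.
    x[5] must be positive because of the ln(x5) term.
    """
    total = 0.0
    for k_index, x_index, power in TERMS:
        if power is None:
            value = math.log(x[x_index])
        else:
            value = x[x_index] ** power
        total += k[k_index] * value
    return total


# Hypothetical coefficients: all zero except k3 and k16.
k = {k_index: 0.0 for k_index, _, _ in TERMS}
k[3] = 0.5
k[16] = 2.0
# Hypothetical variable values.
x = {1: 2.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: math.e, 6: 1.0, 7: 1.0}
y = accident_rate(k, x)  # 0.5 * 2.0 + 2.0 * ln(e)
```

Keeping the terms in a table rather than hard-coding the expression makes it easy to check each term against the equation and to add or drop terms after significance screening.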
Figure 3. Index system of accident occurrence rate
In this way, we can calculate the probability of a traffic accident from the month, the day of the week, the weather and the wind speed.
IV. MODEL APPLICATION
From the function obtained above and the value range of each variable, we find that the maximum value of the traffic accident rate is 1.2397 and the minimum value is 0.9303.
That is, on a Tuesday in January with cloudy weather and wind speed at level 4-6, the probability of a traffic accident reaches its maximum, and we should watch closely. On a Monday in August with rainy weather and wind speed at level 3, the probability of a traffic accident instead reaches its minimum. A possible reason is that people are more careful on a rainy day and so less prone to traffic accidents, but we still need to remind people to be careful.
According to the range of the traffic accident rate, we define the safety levels shown in the following table:
Table 7 Safety level classification
Safety level     Range of accident rate
7                0.9303-0.9746
6                0.9747-1.0189
5                1.0190-1.0632
4                1.0633-1.1075
3                1.1076-1.1518
2                1.1519-1.1961
1                1.1962-1.2400
(3) According to the range of the traffic accident incidence, we put forward safety levels and provide the corresponding measures, together with the concept of volunteer aid stations, for the different safety levels.
(4) To develop the traffic accident rate model further, the classification of the current traffic data needs to be more reasonable. In addition to the current traffic accident data, other data that could influence traffic accidents, such as vehicle mileage, road information and number of lanes, need to be collected so that the model can be improved as soon as possible.
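The classification in Table 7 amounts to a lookup from an accident rate to a safety level. A minimal sketch of that lookup is given below. The bounds are copied from Table 7, while the function name `safety_level` and the choice to reject rates outside the tabulated range are assumptions made for illustration. A rate that falls in the small gap between two tabulated ranges (for example between 0.9746 and 0.9747) is assigned to the higher range.

```python
from bisect import bisect_left

# Upper bound of each range in Table 7, in increasing order of accident rate,
# and the safety level of the corresponding range.
UPPER_BOUNDS = [0.9746, 1.0189, 1.0632, 1.1075, 1.1518, 1.1961, 1.2400]
LEVELS = [7, 6, 5, 4, 3, 2, 1]
LOWEST_RATE = 0.9303
HIGHEST_RATE = 1.2400


def safety_level(rate):
    """Return the Table 7 safety level (1 = least safe, 7 = safest)."""
    if not LOWEST_RATE <= rate <= HIGHEST_RATE:
        raise ValueError(f"accident rate {rate} is outside the range of Table 7")
    # Index of the first range whose upper bound is >= rate.
    return LEVELS[bisect_left(UPPER_BOUNDS, rate)]


level_at_maximum = safety_level(1.2397)  # maximum rate reported in Section IV
level_at_minimum = safety_level(0.9303)  # minimum rate reported in Section IV
```

With the extreme rates reported in Section IV, the maximum rate 1.2397 maps to level 1 and the minimum rate 0.9303 maps to level 7.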