Transmission lines are a key component of the power system, and detecting transmission line fittings is essential to the safe and stable operation of the power grid. In transmission line inspection, images are mainly captured by unmanned aerial vehicles (UAVs), and deep learning is used to detect fittings automatically. Because inspection backgrounds are complex, occlusion is frequent, and fittings come in many categories with widely varying shapes and sizes, common detection methods perform poorly. This paper proposes DC-YOLOv5, an improved YOLOv5 method based on deformable convolution and coordinate attention. First, we adopt the YOLOv5 network as the basic framework of the detection model. To extract more effective features from images with complex background interference, we use deformable convolution to improve the original convolution module, which strengthens the feature extraction ability of the backbone network. Then, we apply a coordinate attention module to the output of the backbone network to improve the model's attention to fitting targets. These simple, lightweight modifications are intended to improve detection performance while keeping model complexity low for subsequent UAV deployment. Finally, to verify the effectiveness of DC-YOLOv5, we built a fitting detection dataset and conducted experiments. The results indicate that DC-YOLOv5 achieves higher detection accuracy than the other models compared and can accurately detect a variety of fitting targets in complex environments.
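
To make the idea behind deformable convolution concrete, the following is a minimal sketch of how a single 3x3 deformable convolution response can be computed: each kernel tap is shifted by a fractional offset and the feature map is read at that location by bilinear interpolation. It illustrates the general technique only and is not the DC-YOLOv5 implementation; the function names `bilinear_sample` and `deformable_conv_at`, the single-channel list-of-lists feature map, and the fixed offsets (which a real network would predict with an extra convolution) are all assumptions made for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2D feature map at a fractional location (y, x).

    Locations outside the map contribute zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their closeness.
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] is the (dy, dx) displacement of kernel tap (i, j).
    Here the offsets are given by hand (hypothetical values); in a
    trained network they would be predicted from the input features.
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            # Regular grid position (cy + i - 1, cx + j - 1) plus the offset.
            out += kernel[i][j] * bilinear_sample(
                feature, cy + i - 1 + dy, cx + j - 1 + dx
            )
    return out


# Toy 4x4 feature map with value 4*y + x at position (y, x).
feature = [[4 * y + x for x in range(4)] for y in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]

zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_offsets = [[(0.5, 0.5)] * 3 for _ in range(3)]

regular = deformable_conv_at(feature, kernel, zero_offsets, 1, 1)
shifted = deformable_conv_at(feature, kernel, half_offsets, 1, 1)
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution, while non-zero offsets let the sampling grid follow the shape of the target, which is the property that motivates using deformable convolution for fittings of varying shapes and sizes.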