This study proposes a data fusion approach for urban remote sensing that combines SAR (Synthetic Aperture Radar) and optical satellite data. By integrating datasets from different sensors and spatio-temporal scales, the technique aims to extract more accurate information. Two fusion methods are used: feature-based fusion, where relevant features are extracted and then fused, and simple layer stacking (SLS), where the original datasets are stacked directly as multiple layers. Features were derived from SAR textures (Sentinel-1) and modified indices (Landsat-8) and then classified with an XGBoost algorithm implemented in Python and Google Earth Engine (GEE). Five cities were examined, each representing a distinct climatic zone and urban dynamic: Cape Town, Guangzhou, Los Angeles, Mumbai, and Osaka. An accuracy assessment based on random validation points gave an overall accuracy of 89.5% for the proposed MSFI method. The results were also compared with three well-known global products. The proposed approach outperformed all three, achieving 89% accuracy compared with ESA (84%), ESRI (81%), and Dynamic World (82%). Additionally, a land surface temperature (LST) analysis was carried out to investigate the relationship between the extracted UIS and LST across the selected cities, demonstrating the practical use of the proposed MSFI method. Los Angeles, a warm temperate city, showed the highest LST among all five cities. The datasets, along with the GEE and Python code, are available at https://github.com/mnasarahmad/sls.
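As an illustration only, simple layer stacking and the overall-accuracy measure can be sketched in a few lines of standard-library Python. This is not the authors' code: the band names (`sar_texture`, `optical_index`), the 2x2 toy rasters, and the helper functions `stack_layers` and `overall_accuracy` are all hypothetical. In practice the stacked feature vectors would come from the Sentinel-1 and Landsat-8 layers and be passed to a classifier such as XGBoost.

```python
from typing import Dict, List, Sequence

Band = List[List[float]]  # a 2-D raster band stored as rows of pixel values


def stack_layers(bands: Dict[str, Band]) -> List[List[float]]:
    """Simple layer stacking: build one feature vector per pixel,
    holding one value from each input band (row-major pixel order)."""
    shapes = {(len(b), len(b[0])) for b in bands.values()}
    if len(shapes) != 1:
        raise ValueError("all bands must share the same grid")
    rows, cols = shapes.pop()
    names = list(bands)  # dict insertion order fixes the feature order
    return [
        [bands[name][r][c] for name in names]
        for r in range(rows)
        for c in range(cols)
    ]


def overall_accuracy(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of validation points whose predicted class matches the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


# Toy 2x2 rasters with made-up values (hypothetical band names).
toy_bands = {
    "sar_texture": [[0.1, 0.2], [0.3, 0.4]],
    "optical_index": [[0.5, 0.6], [0.7, 0.8]],
}
features = stack_layers(toy_bands)  # 4 pixels, 2 features each

# Made-up validation labels: 1 = impervious, 0 = other.
accuracy = overall_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Here each pixel ends up as a two-element feature vector, and the accuracy is the share of validation points labelled correctly, which is the same quantity the reported overall accuracy percentages express.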