1. Introduction
Sugarcane (Saccharum officinarum) is used in the production of 68% of global sugar and is also used for bioethanol [1]. It is a high-impact crop because of its long growing period and the large land, labour, water, and nutrient requirements for successful cultivation [2]. Global and national demands for sugarcane as food and bioenergy continue to increase, but the capacity to increase the area of sugarcane plantations is limited. Increased sugarcane output therefore depends on increased production per unit area, with sustainable production being key to achieving this.
India is the second largest producer of sugarcane in the world [3], and the cultivation of sugarcane forms an important part of the Indian economy. National requirements for sugarcane are predicted to reach 600 million Mt by 2030, and global exports have risen from 2.6% in 2010 to 6.3% in 2020 [4]. Sugarcane production also employs millions of farmers alongside transport and tertiary roles associated with the industry [5,6]. Sugarcane production in India is heterogeneous: over 550 sugar mills operate across the country, differing in ownership, management style, and cane varieties, and their crops are exposed to a range of environmental conditions. This is problematic, as the many independent and commercial farming communities do not have access to the knowledge and physical tools, such as machinery, irrigation systems, and improved cultivars, needed to enable sustainable sugarcane production [7]. As a result, the production of vast quantities of sugarcane may be unsustainable through widespread use of monocultures and inappropriate pesticide and fertilizer use [8], which negatively impact productivity and degrade soil and water resources to the point where short-term restoration becomes difficult [9]. Many businesses seek to be economically, environmentally, and socially sustainable, where profitable sugar production is combined with desirable employment and minimal negative environmental effects [3,6]. Hence, sugar mill managers are interested in improving their understanding of the source and management of their sugar.
Optimising agricultural practices such as fertilisation, irrigation, and pest and disease management requires information about the seasonal development of sugarcane [10], as resource requirements vary with sugarcane growth stage [5] and spatially between climatic zones. Accurate monitoring can identify biotic and abiotic stress in order to guide improved management aimed at reducing post-harvest losses and the cost of cultivation and increasing sucrose content at harvest [8]. However, sugarcane monitoring from field surveys at sample sites is costly, labour intensive, and time-consuming, especially as multiple surveys are required during the growing season [11]. These sample data will often not fully represent the heterogeneous nature of the sugarcane industry, and the resulting information quickly becomes outdated [12].
An alternative approach to field surveys is to combine Earth observation (EO) data from satellites with biophysical crop models [13]. Satellite sensors allow for large areas to be covered quickly and changes in biomass to be studied in near real-time [14,15]. They also capture information in non-visible wavelengths at a temporal frequency that would be too time-consuming to collect through field visits, and they can image areas that are difficult to access on the ground [14,16].
Previous studies have used time series of readily available Landsat and Sentinel-2 optical imagery to derive information on sugarcane growth stages in Ethiopia, China, and India [12,17,18,19]. A major challenge when deriving sugarcane information from optical EO data is the low frequency of observations and gaps in the time series caused by clouds [19,20]. Choosing appropriate cloud filtering methods and temporal smoothing parameters is important for optimising data retention, but parameterisation is subjective and left to the experience of the data analyst [21]. This is problematic for the development of algorithms, as these parameters vary spatially and temporally, affecting the interpretation of EO time series [22].
A second challenge is that existing methodologies require field-scale knowledge of the planting and harvest dates to isolate individual growing seasons from within a time series dataset before growth stages can be derived. Shape-fitting models have been used to monitor the growth and phenology of maize, soybean, wheat, barley, sorghum, rice, and cotton crops in the USA and China using MODIS vegetation indices time series data [23,24,25,26]. Shape-fitting model approaches work by averaging multiple-year growth cycles from EO time series data to estimate crop phenology. However, these methods require planting and harvesting dates derived from either field data or crop calendar information [23,25], or, where this information is not available, state-level [24] and national-level statistics [26] to isolate individual sugarcane growing seasons from long time series data and provide the field-scale growth information. These methods are not practical for Indian sugarcane, as sugarcane management is heterogeneous, with phenology varying between fields. This makes ground-based phenology estimates difficult to collect, and it means that state-level statistics, national-level statistics, and crop calendars cannot be used [12,18]. The averaging of multiple-year growth cycles to generate shape models is also inappropriate for monitoring the growth of sugarcane crops, as the growth cycles and length of phenological stages for virgin and ratoon sugarcane differ. Differences in management can also affect the shape of the growth profiles. Studying these differences is important for the sustainability of sugarcane production, and averaging multiple growth profiles would remove this information.
A third challenge is a lack of transferability between EO measurements of land surface phenology, based on temporal variation in plant vigour, and field survey-derived growth stages where field-scale phenological information is absent [27]. There are advantages for crop management if information derived from EO monitoring can be compared with growth stages familiar to producers using standard management practices in the field without needing to use survey-derived information.
This study seeks to overcome these challenges by developing an automated method to derive sugarcane growth stages from EO data by first splitting the temporal signal into individual seasons and then identifying features within the annual profiles that are representative of sugarcane phenology. The advantage of this approach is continuous field-level monitoring of sugarcane growth that can be used alongside biophysical models and in combination with additional datasets, such as climate and yield measurements, as a tool for improving sustainable production. For example, mills could optimise the organisation of cane crushing operations at the field scale based on crop development before harvesting takes place.
4. Discussion
Growth stage information is important for sustainable sugarcane management as it allows for the optimisation of resource use at different stages of the sugarcane growth cycle. The advantage of using EO data to explore biomass change over time across large areas is well documented, and there has been a notable increase in the number of studies using EO to explore sugarcane growth. However, the applications of previous studies to assess sustainable sugarcane management at the field scale are limited by three main factors: low frequency of observations after cloud removal, reliance on prior knowledge of planting and harvest times before the intermediate growth stages can be defined [12,17,18,19], and ambiguity and lack of transferability between EO-derived growth stages and field-derived growth stages important for management where field survey data are unavailable.
To overcome the above challenges, in this study, we investigated methods and optimised pre-processing parameters for cloud and noise removal. We also automated the splitting of long time series data into growing seasons to then automate the identification of sugarcane growth stages within two Indian study sites. The FAO model of sugarcane growth was chosen as a simplified model to relate the physical principles of remote sensing to field-level development stages important for management.
4.1. Increasing Observation Frequency in Cloudy Locations
In contrast to previous studies, this study investigated methods for the removal of cloud coverage and optimised pre-processing parameters specifically for the time series analysis of sugarcane fields in India. This was important to effectively remove cloud coverage whilst maximising the number of cloud-free observations per field per growing season.
We found that pixel-based cloud filtering appeared to be more effective at removing erroneously low values of NDVI recorded by S2 TOA and L8 TOA than the use of scene-wide filtering through a >20% cloud mask. The pixel-based cloud filtering method was able to remove more noise and resulted in fewer data gaps within the time series. Finding this balance between the amount of noise removed and the number of observations retained was important for the automated, algorithmic identification of growth stages, and after pixel-based cloud filtering, the trapezoid shape of sugarcane growth could be visualised from changes in NDVI over time (Figure 4).
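The difference between the two filtering strategies can be illustrated with a minimal sketch. The `Observation` record, its field names, and the sample values below are hypothetical and are not taken from the study; the only detail drawn from the text is the 20% scene-wide cloud threshold.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    day: int                 # days since the start of the series
    ndvi: float              # field-mean NDVI (hypothetical values)
    scene_cloud_pct: float   # cloud cover over the whole scene, in percent
    pixel_cloudy: bool       # per-pixel cloud flag over the field itself

def scene_filter(observations, max_cloud_pct=20.0):
    """Drop every observation from a scene with more than max_cloud_pct cloud."""
    return [o for o in observations if o.scene_cloud_pct <= max_cloud_pct]

def pixel_filter(observations):
    """Drop only observations whose field pixels are flagged as cloudy."""
    return [o for o in observations if not o.pixel_cloudy]

observations = [
    Observation(0, 0.62, 5.0, False),
    Observation(5, 0.15, 15.0, True),    # cloud over the field, scene mostly clear
    Observation(10, 0.66, 45.0, False),  # field clear, scene cloudy elsewhere
    Observation(15, 0.70, 60.0, False),  # field clear, scene cloudy elsewhere
]

scene_days = [o.day for o in scene_filter(observations)]
pixel_days = [o.day for o in pixel_filter(observations)]
```

In this toy series the scene-wide filter keeps the cloud-contaminated low NDVI value on day 5 and discards two usable observations, whereas the pixel-based filter removes the contaminated value and retains more of the series.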
The monsoon season in our study areas in India typically occurs from June to September. During this period, there is a high probability of cloud coverage, and hence, the frequency of good-quality observations to monitor changes in NDVI is low. To address this, we combined L8 TOA and S2 TOA NDVI values by developing a local regression calibration, which was similar to the regression reported by others [35]. The harmonisation worked well, increasing the frequency of cloud-free observations from every 18–36 days for the L8 TOA time series and every 4–9 days for the S2 TOA time series to every 3–6 days for the combined dataset. However, gaps in the data still remained, with cloud cover persisting for long periods of time during the monsoon.
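As an illustration of the harmonisation step, the sketch below fits an ordinary least-squares line to paired L8 and S2 NDVI values and uses it to map L8 observations onto the S2 scale before merging. It assumes a simple linear calibration; the paired values, the function names, and the resulting coefficients are hypothetical and do not reproduce the regression derived in the study.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def harmonise(l8_series, s2_series, slope, intercept):
    """Map L8 NDVI onto the S2 scale and merge both series sorted by day."""
    adjusted = [(day, slope * ndvi + intercept) for day, ndvi in l8_series]
    return sorted(adjusted + list(s2_series))

# Hypothetical near-coincident NDVI pairs used to calibrate L8 against S2.
l8_pairs = [0.20, 0.40, 0.60, 0.80]
s2_pairs = [0.23, 0.44, 0.65, 0.86]
slope, intercept = fit_linear(l8_pairs, s2_pairs)

# Hypothetical (day, NDVI) series from each sensor for one field.
l8_series = [(0, 0.30), (16, 0.50)]
s2_series = [(5, 0.36), (10, 0.42)]
merged = harmonise(l8_series, s2_series, slope, intercept)
```

The merged series has more observations per unit time than either sensor alone, which is the effect the harmonisation was intended to achieve.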
We therefore investigated the optimum BISE sliding window for the removal of additional noise during the monsoon season in Unit 1 and a resampling interval for the splitting of the long time series data into seasons. This was important, because the lack of definitive rules for pre-processing, i.e., determining an acceptable threshold for the BISE filter, makes methods difficult to replicate. The results showed that a sliding window period of 30 days and a resampling interval of 14 days gave the best balance of minimising erroneous troughs and the root mean square error whilst retaining sufficient information. The BISE filter with a sliding period of 30 days and a 14-day resampling interval was applied to the time series data for fields in both Unit 1 and Unit 2 for the removal of additional noise and, thereby, automation of growth stage identification.
The above method worked well for removing cloud coverage in Unit 1, with clouds affecting the identification of growth stages in only a small number of fields. In contrast to the Unit 1 dataset, the major cause of outliers in the evaluation dataset (Unit 2) was cloud coverage. Resampling and smoothing time series that still contained some cloud-affected NDVI data during the mid-season resulted in significant over- and underestimation of the start and end of the mid-season for Unit 2 (Figure 14). This suggests that there is still room for further improvement in the removal of NDVI data on cloudy days. With the removal of such outliers, it was possible to develop an algorithm that could explain 78% of the variation in a manually determined estimate of the SOS for an evaluation dataset of 338 fields. As with the calibration dataset, the errors from the prediction of the SOS and EOS cascaded and affected the prediction of the SMS and EMS of the mid-season. Compared to the calculation of the SOS and EOS, the algorithm was less able to predict the manually determined values of SMS and EMS. A reason for this could be the tailored nature of the determined BISE sliding period, as resampling was conducted on Unit 1 while the results were simply applied to Unit 2. To improve the identification of sugarcane phenology from Unit 2, it may be necessary to tailor pre-processing parameters to specific study areas and growth seasons in future studies. The 2018–2019 growth season in Unit 1 may have contained fewer cloudy observations than the 2019–2020 growth season in Unit 2. Further investigation is also required to assess the impact of the SR and HLS datasets on the algorithm sensitivity.
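The pre-processing chain discussed above can be sketched as follows. This is a simplified variant of a best index slope extraction (BISE) style filter followed by linear resampling, not the implementation used in the study: the 20% regrowth threshold, the choice to jump to the first recovering point, and the sample NDVI values are all assumptions made for illustration. Only the 30-day sliding period and 14-day resampling interval come from the text.

```python
def bise_filter(days, ndvi, window=30, regrowth=0.2):
    """Simplified BISE-style filter: a drop in NDVI is rejected as noise if the
    signal recovers above a regrowth threshold within `window` days."""
    kept = [0]
    i = 0
    n = len(days)
    while i < n - 1:
        nxt = i + 1
        if ndvi[nxt] < ndvi[i]:
            # Recovery level: the low value plus a fraction of the drop.
            threshold = ndvi[nxt] + regrowth * (ndvi[i] - ndvi[nxt])
            for j in range(nxt + 1, n):
                if days[j] - days[i] > window:
                    break  # no recovery inside the sliding period: accept the drop
                if ndvi[j] > threshold:
                    nxt = j  # recovery found: skip the suspected noisy points
                    break
        kept.append(nxt)
        i = nxt
    return [days[k] for k in kept], [ndvi[k] for k in kept]

def resample_linear(days, values, step=14):
    """Linearly interpolate a series onto a regular grid with `step`-day spacing.
    Assumes at least two points and strictly increasing days."""
    grid = list(range(days[0], days[-1] + 1, step))
    out = []
    k = 0
    for d in grid:
        while k < len(days) - 2 and days[k + 1] < d:
            k += 1
        d0, d1 = days[k], days[k + 1]
        v0, v1 = values[k], values[k + 1]
        out.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    return grid, out

# Hypothetical field series with two cloud-contaminated dips (days 10 and 25).
days = [0, 5, 10, 15, 20, 25, 30, 40, 50, 60]
ndvi = [0.30, 0.35, 0.10, 0.40, 0.45, 0.12, 0.50, 0.55, 0.58, 0.60]

clean_days, clean_ndvi = bise_filter(days, ndvi, window=30)
grid, resampled = resample_linear(clean_days, clean_ndvi, step=14)
```

A wider sliding period rejects longer dips but risks removing genuine declines such as harvest, which is one reason the window length may need to be tuned per study area, as noted above.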
It is also important to note that the impact of cloud coverage was more severe in small fields (less than 0.9 ha); therefore, increasing the spatial resolution of the satellite data used or supplementing missing NDVI observations with synthetic aperture radar (SAR) backscatter data from Sentinel-1 may help to overcome this challenge. Sentinel-1 data come from active instruments that have been used in the past to monitor vegetation growth over time without being affected by clouds [11,42,43,44]. However, it is important to note that Sentinel-1 time series data reflect structural changes in vegetation over time, while NDVI time series data are related to biomass. Future studies should consider the differences in the data captured by passive and active sensors, which could be harmonised to improve optically derived sugarcane growing stages.
4.2. Time Series for Management
Previous studies on determining the duration of sugarcane growth stages have been based on the pre-specification of planting and harvesting dates [12,17,19,45]. By contrast, this study demonstrates the capacity to determine the growth stages of sugarcane without knowing the planting and harvesting dates, and this is especially important for India, where planting and harvest dates are variable and cannot simply be assumed. Because planting and harvesting dates do not need to be specified, the approach described here can be used over substantially wider areas.
Another difference between this study and previous studies is the use of the simplified FAO trapezoid model for deriving field-level growth stages. The novel method developed in this study, based on the FAO model, is advantageous, as it was able to provide more information about growth stages important for management than previously used simplified models like double logistic curves [45], as key stages are not removed through over-smoothing. The use of the FAO trapezoidal model of sugarcane growth stages was also advantageous because information regarding differences in the timing and length of growth stages between virgin cane, ratoons, and different management practices was not masked by averaging multiple-year growth cycles. This paper presents a novel method for deriving the stages of the FAO trapezoid model for sugarcane growth by applying a knee function originally designed for computer programmers to a biological system using the physical principles of remote sensing. For computer programmers, the knee analysis is based on the relationship between the cost of increasing tunable parameters and the corresponding performance benefit, with the knee being defined at the point of levelling off. This paper found that levelling-off effects (knees) are also present in EO data that capture information about biological systems, with NDVI stabilising at the SMS as the number of new tillers (new leaves) stabilises and the focus of growth moves to stem elongation (an effect which is not captured by NDVI). This is important for sustainable sugarcane production because the mid-season stage is particularly sensitive to drought stress, whereas drought stress is less critical during the senescence stage [31]. It is important to note that the methodology developed in this study used a complete growth cycle of observations to derive the sugarcane growth stages, as our focus was on splitting long time series data into multiple seasons. The use of historic time series data to monitor phenology over large areas and time scales is useful for establishing baselines for management where yields and intervention practices are variable. Within-season monitoring is useful for precision agriculture, allowing for intervention strategies to be put in place whilst the sugarcane crop is still growing, for example, to reduce post-harvest losses. Future studies should therefore investigate whether the methodology developed in this study could be used for within-season crop phenology detection, i.e., whether an entire season of data is needed for deriving sugarcane phenology.
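The levelling-off idea can be illustrated with a simple geometric knee detector applied to the rising limb of an NDVI profile. The sketch below picks the point that lies farthest above the straight line joining the first and last observations; this is a simplified stand-in chosen for illustration and is not necessarily the knee function used in the study. The NDVI values and the function name are hypothetical.

```python
def find_knee(x, y):
    """Return the index of the point farthest above the chord that joins the
    first and last points of a rising, concave curve."""
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    best_idx, best_gap = 0, 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        chord_y = y0 + (y1 - y0) * (xi - x0) / (x1 - x0)
        gap = yi - chord_y
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx

# Hypothetical 14-day NDVI samples from emergence to the NDVI plateau.
days = [0, 14, 28, 42, 56, 70, 84, 98, 112, 126]
ndvi = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.78, 0.80, 0.80, 0.80]

sms_day = days[find_knee(days, ndvi)]  # candidate start of mid-season (SMS)
```

Applying the same function to the reversed, falling limb of the profile would give a candidate for the end of the mid-season, although that extension is likewise an assumption rather than a description of the published method.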
Many sugarcane factory owners want more clarity on when sugarcane is grown and harvested; hence, the described trapezoidal approach for describing sugarcane growth phases using EO data could have benefits for field-level crop management and regional predictions of sugarcane yields. For example, it is important for sugarcane to be transported to the mills for crushing as soon as it is harvested in order to minimise post-harvest losses [46]. In addition to the timing of growth stages, monitoring the length of growth stages is also important for management. For example, the longer length of the mid-season in Unit 1 sugarcane fields than the standards stated by [5,31] could indicate that the fields were harvested after the sugarcane had started to senesce. This information is important for sustainable sugarcane production because sugarcane that is harvested after senescence or before the mid-season has lower yields resulting from reduced sucrose concentrations in the stalks [47], and is therefore not a sustainable use of resources.
The Indian state of Telangana also faced extreme drought between 2016 and 2018. The shorter mid-season in Unit 2 could indicate earlier harvests due to the onset of drought. Prematurely harvesting the cane during the mid-season could have been a way for farmers to provide the mills with some cane before the full effects of the drought had completely caused the crop to fail. Such observations highlight the benefits of combining EO measurements with agrometeorological data. There is also a difference between the proportion of drip and flood irrigated fields in Unit 1 and Unit 2; therefore, future work should also investigate whether the irrigation type has an impact on the derived length of growing seasons.
4.3. Improving the Automation for Deriving Growth Stages
Although ground surveys can provide detailed data, their collection can be costly and labour intensive, and hence, this type of data collection is rare for large areas [11]. The validity of ground surveys is also difficult to assess. The method used in this paper takes the mean NDVI within the field boundary; therefore, correct GPS field boundaries were important, as averaging the NDVI values of two opposing growth cycles will cause them to cancel each other out.
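The cancelling effect of averaging opposing growth cycles can be shown with a small worked example. The two profiles below are hypothetical: one represents a part of a mis-drawn field that is still growing, the other a part that is senescing.

```python
def field_mean(profiles):
    """Mean NDVI per date across all sub-areas inside one field boundary."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]

def amplitude(series):
    """Range of NDVI over the series, a rough measure of seasonal signal."""
    return max(series) - min(series)

growing = [0.2, 0.4, 0.6, 0.8]    # hypothetical rising NDVI profile
senescing = [0.8, 0.6, 0.4, 0.2]  # hypothetical falling NDVI profile

mixed = field_mean([growing, senescing])
```

Each sub-area has a clear seasonal amplitude, but the field mean is flat, so no growth stage can be recovered from it. This is why boundary errors and mixed fields degrade the derived phenology.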
Mixed cropping before the planting of the sugarcane crop resulted in the underestimation of the SMS. The early identification of the end of the season due to cloud coverage resulted in the early prediction of the SMS for two fields. The late identification of the end of season resulting from the point of harvest coinciding with a cloud event in turn resulted in the overestimation of the end of mid-season for two fields.
Changing the boundary after harvest resulted in erroneous overestimates of the EOS for five sugarcane fields located in Unit 1. With the removal of such outliers, it was possible to develop an algorithm that could explain 73% and 86% of the variation in a manually determined estimate of the start and the end of a sugarcane growth cycle, respectively, for a calibration dataset of 337 individual sugarcane fields. To address these problems, there is a need to develop an approach which can separate the mixed fields, as errors cascade and affect how the other phenological stages are derived. This can be seen in the prediction of the start and end of the mid-season for Unit 1, which was less accurate for fields with boundary faults. To overcome this problem, it will be necessary for future studies to determine the growth stages for individual pixels within a field or group similar pixels before extracting time series data.
Time series-based approaches for classifying sugarcane have been found to be more effective than single-date mapping approaches, as they allow for land cover to be assessed continually over many different spatial and temporal resolutions [48]. Sugarcane is a dynamic crop, and features used for sugarcane detection, e.g., spectral reflectance, texture, shape/size, and height, change with development stage and management practices such as irrigation. The inclusion of temporal information should also enable the clearer discrimination of sugarcane from other land cover types such as paddy rice and maize, which may have a similar signal on a single date. Sugarcane, for example, typically has a single annual cropping cycle, whereas paddy rice and maize can be sown and harvested two to three times within a single year [12,49,50]. Mapping the spatial and temporal distribution of sugarcane fields over large areas in order to identify the source of sugarcane supplied to mills is therefore the first step to using EO for sustainable sugarcane production. Future work should investigate whether the growth stages derived in this study could also be used to aid classification in addition to management.