A machine learning constitutive model for plasticity and strain hardening of polycrystalline metals based on data from micromechanical simulations

Ronak Shoghi; Alexander Hartmaier

doi:10.1088/2632-2153/ad379e

1. Introduction

When materials are subjected to external loads, they respond by deforming and changing their shape and volume. Understanding this behavior is critical for various applications, with safety and reliability considerations. The constitutive equations relate the stress and strain exhibited by a material under mechanical loads. This relationship can be categorized as either elastic or plastic, depending on whether the deformation is reversible. Typically, a yield function is introduced in constitutive modeling, which turns negative if the applied stress is smaller than the yield strength, indicating elastic material behavior, and positive when the yield limit is exceeded. Such yield functions form the backbone of the constitutive equations, specifying the conditions under which a material undergoes permanent plastic deformation. Complementing this, flow rules specify the direction and magnitude of the plastic deformation of a material as it undergoes plastic deformation. In the theory of plasticity, it is rather common to equate the plastic potential function with the yield function, leading to an associated flow rule. As a result, the yield function becomes pivotal not only in identifying the onset of plastic flow but also in elucidating the evolution of stress and the corresponding strain during plastic deformation [1–4]. Von Mises [5] introduced a yield criterion, which can be considered the basis of all contemporary yield functions. His criterion states that plastic deformation in materials begins when the second invariant of the deviatoric stress tensor reaches a critical value. Uninfluenced by hydrostatic stresses, the von Mises criterion has become particularly valuable for predicting the yielding of ductile materials. Later, Hill [6] proposed his criterion as an improvement to von Mises' theory with anisotropic or direction-dependent considerations. Hill's criterion is expressed as a quadratic form of the stress tensor, utilizing six independent material constants. This offers a rather realistic description of anisotropic yield behavior. Despite this significant improvement, Hill's model still fell short of fully capturing the complex anisotropic behavior, particularly in sheet metals. Barlat et al [7] later proposed a yield criterion known as Yld2000–2d, which was validated through experimental data specifically for an aluminum alloy sheet under plane stress conditions. This criterion was later expanded to accommodate three-dimensional stress states using linear transformation functions, leading to the emergence of the Yld2004-13p and Yld2004-18p [8] models. These subsequent models, characterized by 13 and 18 anisotropy parameters, respectively, provide a more comprehensive description of the anisotropic properties. A major issue associated with these parameters is their non-uniqueness, which was observed through a sensitivity analysis conducted by Van den Boogard [9]. A significant outcome of his study was the reduction in anisotropic parameters from 18 to 16 in the Yld2004-18p model. The isotropic coefficients in these models require experimental determination, which poses a considerable challenge. In recent approaches, it has been attempted to relate the anisotropy parameters to various crystallographic textures by machine learning (ML) methods [10].

Given these challenges, data-driven approaches have been introduced that involve developing surrogate models directly from experimental or simulation data, bypassing the need for explicit equations or assumptions about material behavior. Kirchdoerfer and Ortiz [11] first formulated a revolutionary approach called data-driven mechanics. In this paradigm, a data-driven solver was designed to match each point in a material to a pre-determined data set, with the goal of closely satisfying conservation laws. The resulting task for these solvers is to minimize the difference between the material state and the data set while adhering to these laws. They found that as the data set came closer to representing a classical material law in what is called the phase space, the solutions from data-driven solvers moved closer to traditional solutions. Eggersmann et al [12] extended the scope of the data-driven approach from just elastic to inelastic material behavior. Their investigations revealed that many classical plasticity models can be approximated by using material datasets based on history variables. However, a key challenge in their research was determining how to refine these material dataset representations to capture the inelastic behavior more precisely. Lopez et al [13] further advanced the data-driven approach and broadened its applicability to scenarios involving internal variables. These include the plastic strain rate, rate of accumulated plastic deformation, and kinematic variables associated with kinematic hardening rates. To manage this complexity, they proposed an approach that involves the construction of a rich behavior manifold. This manifold essentially functions as a detailed map, outlining the possible behaviors of materials under various conditions. However, such approaches can behave like black boxes, making their predictions less interpretable. Additionally, these models can run the risk of overfitting to their training scenarios, limiting their generalizability. In addition, their reliance on extensive and often purpose-specific datasets can pose practical challenges.

While data-driven mechanics aims to formulate entire mechanical problems without relying on pre-existing constitutive models or physical laws, data-oriented constitutive models focus on learning the material behavior directly from the data but within the established framework of mechanics that adheres to fundamental physical principles. In these models, foundational physical laws and principles, such as the conservation of energy and momentum, are still adhered to. However, rather than employing empirically derived or physics-based constitutive models to understand the relationship between stress and strain, data-oriented constitutive models learn this relationship directly from data. The key challenge with these methods stems from the need to provide sufficient and representative data to input into an ML algorithm, enabling the resultant model to predict complex material behavior under previously unseen scenarios. Such data sources can be diverse, including experiments, micromechanical simulations, or a hybrid of both. In this direction, for hyperelasticity considerations—where strain energy is a function of the current deformation state, and materials can undergo large deformations and still return to their original shape upon load removal—Fernandez [14] developed ML-based constitutive models based on finite deformations. The anisotropic hyperelastic models were trained employing data from cubic lattice metamaterials, which served as the input for convex neural networks. Using neural networks, Fuhg and Bouklas [15] introduced a mixed deep energy method for handling finite-strain hyperelasticity. Building on this, He et al [16] proposed a novel loss function to update the internal plastic state variables, demonstrating the potential of using neural networks to address elastoplasticity problems. In this direction, Mozaffar et al [17] utilized the capacity of recurrent neural networks (RNNs) to implicitly determine internal variables via the network's internal feedback connections. These internal variables could represent hidden or latent features in the stress–strain data. In their work, they constructed a database comprising average stresses and plastic energy for 15000 distinct deformation paths, sourced from simulations of a representative volume lement (RVE). Once trained, the RNN model could predict the average stress and plastic energy based on the deformation paths provided. Linka et al [18] proposed a constitutive neural network to describe hyperelastic material behavior. This approach constitutes a hybrid method in which a data-based description is achieved with ML methods where some constitutive assumptions are part of the structure of the neural network. Huang et al [19] employed the proper orthogonal decomposition feed forward neural network(PODFNN) method to capture both the von Mises yield surface and the hardening law. Using the accumulated absolute strain as the history variable, they found that data collected solely from multi-axial loading tests were sufficient for this characterization. Yang et al [20] developed a data-driven method to model elastoplastic constituitive laws for isotropic materials. They employed neural networks to learn from homogenized stress–strain data derived from direct numerical simulations on an RVE. To impose various loading conditions, they numerically computed the average stress for a given average strain under 22 different loading directions and 500 loading steps, ensuring an even distribution across the principal stress space. Vlassis and Sun [21] introduced a deep learning framework that uses a set of deep neural network predictions to evolve elements, such as the stored elastic energy function, yield surface, and plastic flow. In their approach, the yield function was trained from the elastic response using the level set method. This method reduces the stress representation to only principal stresses, thereby decreasing the dimensionality of the input space. Furthermore, their methodology used a polar coordinate system for yield surface interpolation and applied 140 different Lode angles to partition the $\pi$ -plane. Fuhg et al [22] introduced a texture-dependent, smooth, and convex macroscopic yield function for polycrystals based on data collected from 3240 crystal-plasticity simulations. They used a partially input convex neural network in combination with a level set approach to ensure convexity of the yield function. As suggested by Hartmaier [23] the yield function can be characterized in a data-oriented approach, using an ML algorithm known as support vector classification (SVC), applied to a reference material with Hill-like anisotropy. The training data comprised stress values in the form of principal deviatoric stresses, generated to comprehensively span the space of polar angles. The trained machine learning yield function was then capable of predicting whether a given stress lies inside or beyond the elastic regime of the materials. Later, Shoghi and Hartmaier [24] expanded this method by employing an optimal data generation strategy for uniform sampling of the yield locus in the 6-dimensional stress space. Utilizing a strategy based on Monte Carlo simulations and Fibonacci sequences, they concluded that 300 loading directions were sufficient to train an ML yield function in the full stress space, even for severely anisotropic cases.

The integration of ML in the constitutive modeling of polycrystalline materials has predominantly focused on the onset of yielding. However, the crucial aspect of strain hardening, which is essential for accurately predicting the complex behavior of polycrystalline materials, has not been extensively considered. The present paper aims to bridge this gap by developing an ML-based constitutive model that explicitly considers strain-hardening effects. Using SVC, we intend to craft an ML yield function that is informed by training data generated from crystal plasticity finite element method (CPFEM) simulations with RVE, incorporating considerations of texture and detailed aspects of plastic behavior of the material, including hardening. By analyzing stress–strain data derived from 300 simulations, our goal is to establish a data-oriented flow rule that is not only driven by data but also highly adaptable to the complexities of different polycrystalline materials.

The structure of this paper is as follows. Section 2 offers a comprehensive explanation of the hardening model and CPFEM set up with RVE and texture considerations. The resulting micromechanical model serves as the foundational ground truth for training and testing of the ML yield function. Section 3 introduces the principles of SVC, detailing the workflow consisting of training data generation, performing the actual training, and defining the validation setup. Section 4 explores the performance and application of the developed ML yield function, providing detailed insights into its effectiveness and use cases. Finally, section 5 concludes the paper.

2. Micromechanical modeling

This chapter presents a thorough overview of the crystal plasticity model employed in this study to describe the strain-hardening behavior of polycrystalline materials in a realistic manner. Subsequently, the specifics of the micromechanical simulation setup are explained. The results obtained from these simulations serve as the ground truth for the training of the ML yield function.

2.1. Crystal plasticity constitutive description

The deformation gradient is a fundamental concept in continuum mechanics, which describes how a material deforms or changes its shape under external forces or displacements. It quantifies the local deformation of a material at each point in space. The deformation gradient ${\boldsymbol{F}}$ is a tensor that relates the initial configuration of a material (reference state) to its deformed configuration (current state). As shown in figure 1, when considering an intermediate or relaxed state where all the material points are unloaded and only plastically deformed, the deformation gradient can be decomposed as:

$\begin{align}{\boldsymbol{F}} = {\text{ }}{{\boldsymbol{F}}^{\,\text{e}}}{{\boldsymbol{F}}^p}\end{align} \tag{ 1 }$

Figure 1. Refer to the following caption and surrounding text. — **Figure 1.** The multiplicative decomposition of the deformation gradient ${\boldsymbol{F}}$ , into its elastic ( ${{\boldsymbol{F}}^{\,\text{e}}}$ ) and plastic ( ${{\boldsymbol{F}}^{\,p}}$ ) components. The elastic deformation gradient ( ${{\boldsymbol{F}}^{\,\text{e}}}$ ) reflects the combined effects of stretching and rigid rotation within the crystal lattice. This component represents the reversible transformations experienced by the material, highlighting its ability to return to its original state once the applied forces are removed. On the other hand, the plastic deformation gradient ( ${{\boldsymbol{F}}^{\,p}}$ ) relates the initial configuration to an intermediate (relaxed) state, based on the plastic shearing that occurs along specific crystallographic slip systems. This part of the deformation gradient effectively reflects the irreversible deformation in the material.
Download figure:
Standard image High-resolution image

where ${{\boldsymbol{F}}^{\,\text{e}}}$ arises from the stretching and rigid rotation of the crystal lattice and ${{\boldsymbol{F}}^{\,p}}$ describes the plastic shearing on crystallographic slip systems [25, 26]. At low temperatures, it is assumed that the dislocation slip on well-defined crystallographic slip systems is the dominant source of plastic deformation; thus, according to Rice [27], the plastic flow can be described using the velocity gradient:

$\begin{equation}{\boldsymbol{L}_p} = \left( {\boldsymbol{F}^p} \right)^{-1} \dot{\boldsymbol{F}^p} = \sum_{\alpha = 1}^N \dot{\gamma}^\alpha \left( {\boldsymbol{m}_0^\alpha} \otimes {\boldsymbol{n}_{0}^{\alpha}} \right)\end{equation} \tag{ 2 }$

where ${\dot \gamma ^\alpha }$ is the shear rate of slip system $\alpha$ , $N$ is the number of active slip systems, ${\boldsymbol{m}}_0^\alpha$ and ${\boldsymbol{n}}_0^\alpha$ are the slip direction and normal to the slip plane, respectively. As the crystal deforms, vectors connecting lattice sites are stretched and rotated according to ${{\boldsymbol{F}}^{\,\text{e}}}$ , the slipping direction and normal to slip plane in the current deformed configuration can be characterized as:

$\begin{align} \begin{array}{*{20}{l}} {\boldsymbol{m}^\alpha = \boldsymbol{F}^{\,\text{e}} \boldsymbol{m}_0^\alpha} \\ {\boldsymbol{n}^\alpha = \boldsymbol{n}_{0}^{\alpha} (\boldsymbol{F}^{\,\text{e}})^{-1}} \end{array} \end{align} \tag{ 3 }$

The resolved shear stress on slip system $\alpha$ with respect to the reference configuration can be written as:

$\begin{align}{\tau ^\alpha } = {\boldsymbol{S}} \cdot \left( {{\boldsymbol{m}}_0^\alpha \otimes {\boldsymbol{n}}_0^\alpha } \right)\end{align} \tag{ 4 }$

${\boldsymbol{S}}$ is the second Piola–Kirchhoff stress and is defined in the intermediate configuration. For face-centered cubic metallic crystals, the kinetic law of a slip system can be described as:

$\begin{align}{\dot \gamma ^\alpha } = {\text{ }}{\dot \gamma _0}{\left| {\frac{{{\tau ^\alpha }}}{{\tau _c^\alpha }}} \right|^{\frac{1}{m}}}{\text{sgn}}\left( {{\tau ^\alpha }} \right)\end{align} \tag{ 5 }$

where ${\dot \gamma ^\alpha }$ is the shear rate on the slip system $\alpha ,$ which is subjected to resolved shear stress ${\tau ^\alpha }$ with a resistance of $\tau _c^\alpha$ . ${\dot \gamma _0}$ is the reference shear rate, and $m$ is the slip rate sensitivity. Both are materials parameters [25, 26, 28].

The hardening behavior of slip system $\alpha$ is influenced by any slip system $\beta$ as follows:

$\begin{align}\dot \tau _c^\alpha = {h_0}{q_{\alpha \beta }}{\left( {1 - \frac{{\tau _c^\beta }}{{{\tau _s}}}} \right)^p}\left| {{{\dot \gamma }^\beta }} \right|\end{align} \tag{ 6 }$

where ${h_0}$ and $p$ are material parameters, ${\tau _s}$ is the saturated slip resistance due to dislocation density accumulation and ${q_{\alpha \beta }}$ is the cross-hardening matrix.

2.2. Crystal plasticity finite element analysis

In this work, we employed the CPFEM implemented in the commercial software package ABAQUS [29] to run simulations capturing the stress–strain responses of polycrystalline materials under various monotonic loading conditions. The outcomes of these simulations provided the data necessary to construct the training dataset, which subsequently informed our ML model. The CPFEM implementation requires steps such as the creation of RVEs, the assignment of texture, and the careful definition of boundary conditions and application of the loads. These steps are outlined in detail in the following sections.

2.2.1. RVE generation

Classical continuum mechanics deals with the mechanical behavior of materials modeled as a continuum rather than discrete particles, hence neglecting the atomic structure of the materials. Additionally, it does not consider the fact that the mechanical properties can be size-dependent, altering significantly when considering extremely small scales. On the other hand, micromechanics explores the behavior of materials on the microscale, considering the influences of individual grains, phases, texture, or other microstructural components on the mechanical response. A fundamental concept in micromechanics is the RVE or the repeating unit cell. This element provides a micro-to-macro bridge, enabling us to represent the intricate details of the microstructure and to to operate within the simpler framework of continuum mechanics [30, 31].

In this study, we utilized the RVE concept to capture the complex behavior of a polycrystalline material. The RVE generated in ABAQUS consists of 343 distinct cells, with each cell representing an individual grain within the polycrystalline structure. This number of cells was specifically chosen to ensure that the RVE is large enough to accurately represent the complex behavior of polycrystalline materials, providing a realistic and comprehensive portrayal of their microstructural heterogeneity. Each individual cell has dimensions of 0.095 mm. Each of these cells was subdivided into 8 smaller elements, resulting in a total of 2744 elements in the RVE. The choice to subdivide each grain in the RVE into 8 smaller elements was made to capture detailed microstructural characteristics while maintaining computational efficiency. Each cell was assigned unique material properties, including crystallographic texture, to create a detailed microstructural representation of the material. The generated RVE is shown in figure 2.

Figure 2. Refer to the following caption and surrounding text. — **Figure 2.** A three-dimensional RVE depicting a polycrystalline metal. The RVE is composed of 343 grains, each divided into 8 elements, resulting in a total of 2744 elements. Each grain is assigned a unique crystallographic orientation contributing to the overall texture of the model. However, the colors representing the grains are chosen arbitrarily and do not represent a specific grain orientation.
Download figure:
Standard image High-resolution image

2.2.2. Texture

The orientation or texture of a rigid body in three-dimensional space can be effectively described using three Euler angles. This description facilitates the transition from a reference frame to a desired frame. Based on Euler's rotation theorem, any given orientation or rotation can be realized by combining these three fundamental rotations, each performed around a distinct axis [32, 33]. The MTEX [34] toolbox in MATLAB was used to generate a list of Euler angles representing a desired texture. This procedure involves specifying the crystal symmetry to align with the general symmetry criteria. Based on this, a set of discrete orientations is systematically selected to ensure conformity with the established symmetry. The Euler angles derived from these orientations are then assigned to the grains in the RVE outlined in the previous section.

2.2.3. Periodic boundary conditions

Periodic boundary conditions (PBCs) are often employed in the simulations of physical phenomena within systems that are conceptually infinite or substantially larger than the scale of internal interactions. These boundary conditions play a crucial role in determining the realistic deformation behavior of RVEs. In PBCs, the displacements of opposing nodes on the RVE are coupled in such a way that the deformation can still represent an average strain over the RVE if demanded by the imposed deformation [35, 36]. To implement PBC, the approach introduced by Smit et al [37] was adopted, which involves constraining the representative pair of boundary nodes located on opposite faces. To achieve this, the equations proposed by Boeff [36] were utilized. As illustrated in figure 3, the RVE is characterized by eight nodes labeled as F1, F2, F3, F4, B1, B2, B3, and B4. Only F1, F2, B1, and F4 are independent nodes to which global boundary conditions are applied, as detailed in table 1. The remaining nodes F3, B2, B3, and B4, are dependent on the displacements of the independent nodes through coupling equations:

$\begin{align}\begin{array}{*{20}{l}} {{u^{B3}} - {u^{F3}} = {u^{B1}} - {u^{F1}}} \\ {{u^{B2}} - {u^{F2}} = {u^{B1}} - {u^{F1}}} \\ {{u^{B4}} - {u^{F4}} = {u^{B1}} - {u^{F1}}} \\ {{u^{F3}} - {u^{F4}} = {u^{F2}} - {u^{F1}}} \end{array}\end{align} \tag{ 7 }$

Figure 3. Refer to the following caption and surrounding text. — **Figure 3.** Graphical representation of the RVE with eight reference nodes (F1, F2, F3, F4, B1, B2, B3, B4). F1, F2, B1, and F4 are the independent nodes, while F3, B2, B3, and B4 are dependent. Global boundary conditions are applied exclusively to the independent nodes. The configuration adheres to PBCs, ensuring that the deformation behavior of the RVE accurately represents that of a larger or even infinite system. See also table 1.
Download figure:
Standard image High-resolution image

Table 1. The independent nodes within the RVE to which global boundary conditions are applied.

Nodes	X-direction	Y-direction	Z-direction
F1	Fixed	Fixed	Fixed
F2	Concentrated load	Fixed	Concentrated load
B1	Fixed	Concentrated load	Concentrated load
F4	Concentrated load	Concentrated load	Fixed

2.2.4. Loading condition

The loading directions utilized in this study were based on the approach proposed by Shoghi and Hartmaier [24]. They suggested using proportional loading along 300 directions to ensure comprehensive coverage of the 6-dimensional stress space. These loading directions are formatted as unit stress tensors uniformly distributed on the surface of a unit sphere in the 6-dimensional stress space. By distributing these loading directions uniformly across this surface and subsequently scaling them to represent the actual stresses, the aim is to capture the full range of stress states that the material might encounter. This systematic approach facilitates the collection of diverse and informative data points, enabling the ML model to learn the complex relationship between the stress inputs and yield responses.

3. Machine learning yield function

This section introduces an innovative methodology that integrates the strengths of ML, specifically through SVC and the capabilities of CPFEM in dealing with complex polycrystalline structures to develop an ML yield function with strain hardening considerations. Our technique extends the feature space beyond stress to include strain information. Consequently, our training data comprises flow stresses and corresponding plastic strain components, constituting a 12-dimensional feature vector with six dimensions for stress and strain, respectively. A key advantage of the presented approach is its capacity to handle input data characterized by critical stresses indicative of plastic flow.

This section begins with an explanation of the SVC principles that form the backbone of our ML method. Following this, a strategy for generating comprehensive training data from CPFEM simulation results in the preprocessing step and feature transformation is outlined, ensuring a wide coverage of stress and strain spaces. The training process of the model is then described, underscoring the importance of incorporating strain-hardening data. Finally, our robust testing principles to verify the generalizability of the trained ML model are explained.

3.1. SVC principles

Elastic deformation is the reversible change in the shape and volume of a material under stress, which disappears once the stress is removed. In contrast, plastic deformation is a permanent shape change that remains even after stress is alleviated. The yield function serves as a quantitative threshold marking the transition from elastic to plastic deformation, which occurs when the applied stress reaches the yield strength of the material. The mathematical representation of this framework can be described as:

$\begin{align}f\left( {\boldsymbol{\sigma}} \right) = {\sigma _{{\text{eq}}}}\left( {\boldsymbol{\sigma}} \right) - {\sigma _y} \end{align} \tag{ 8 }$

In this formulation ${\sigma _{{\text{eq}}}}\left( {\boldsymbol{\sigma}} \right)$ is the equivalent stress function of the current stress tensor ${\boldsymbol{\sigma}}$ and ${\sigma _y}$ represents the yield stress of the material. When the yield function responds with a value equal to or larger than zero, it signifies that the material is undergoing plastic deformation. On the other hand, if the yield function returns a negative value, it indicates elastic deformation. Upon yielding, the plastic deformation in many metals is governed by the associated flow rule. This rule postulates that plastic strain develops in the direction of the gradient of the yield function, which is described as:

$\begin{align}{\dot\varepsilon _p} = \dot \lambda \frac{{\partial f\left( {\boldsymbol{\sigma}} \right)}}{{\partial {\boldsymbol{\sigma}} }}\end{align} \tag{ 9 }$

In this equation ${\dot \varepsilon_p}$ represents the plastic strain rate and $\dot \lambda$ is the plastic multiplier. The associated flow rule implies that the material will deform plastically in the direction where the yield function increases most rapidly. To maintain consistency with the yield condition, the plastic multiplier is derived such that the consistency condition $\frac{{{\text{d}}f}}{{{\text{d}}t}} = 0$ is satisfied during plastic loading. This ensures that, as the material deforms plastically, its stress state always remains on the yield surface, even if the yield surface itself evolves (e.g. due to hardening) [4, 38]. Oftentimes, a return mapping algorithm is applied to calculate the proper magnitude of the plastic strain increment in an iterative way.

As suggested by Hartmaier [23], the yield function can be perceived as a decision boundary within a classification problem. In machine learning, classifiers are designed to discriminate between two or more classes of data. Like a classifier, the purpose of the yield function is to separate two distinct regions in the stress–strain space. Drawing parallels between these two concepts, SVC can be employed as a powerful tool to represent the yield function. The primary objective is to precisely identify an optimal hypersurface that serves as the boundary between the elastic and plastic regions, analogous to the role of a yield function. SVC is known for handling high-dimensional datasets well. Its main strength lies in the convex nature of the optimization problem it solves, ensuring a consistent and unique solution and avoiding issues like local minima [39–41]. It is important to note that this optimization convexity does not automatically ensure the convexity of the decision boundary, which is paramount when modeling yield functions. A convex yield surface guarantees a unique and stable material response under any given loading condition. This convexity ensures physically consistent behavior, preventing any unpredictable transitions or non-physical manifestations [42]. The requirement for convexity in yield surfaces has been emphasized and derived in various works. Specifically, the work assumption presented by Naghdi and Trapp [43] and the assumption of maximum rate of dissipation by Rajagopal and Srinivasa [44] both underscore the importance and necessity of convexity in yield surfaces. SVC has a remarkable ability to maintain and reflect the convexity inherent in the training data when determining the decision boundary, particularly when trained on data with inherent physical convexity. Hence, SVC naturally captures and mirrors this characteristic without the need for additional convexity enforcements which makes it an ideal ML algorithm to train a model that can effectively serve as the yield function.

Shoghi and Hartmaier [24] demonstrated the functionality of an SVC-based yield function using 6-dimensional stress tensors as training data in the case of ideal plasticity. In the present work, as a significant advancement, we extended the feature space of the ML model to include both stress and strain components. The training data we employ now comprise the flow stresses and their corresponding plastic strain components. This results in a 12-dimensional feature vector, with six of these dimensions dedicated to stress and the remaining six to strain. In mechanics, six independent variables—comprising three normal and three shear components—are sufficient for describing the full stress or strain tensor. This enhancement broadens the capability of the model to capture strain hardening and deepen its understanding of material behavior. This ML yield function denoted as ${f_{{\text{ML}}}}\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} } \right)$ is designed and trained such that, when presented with a stress and corresponding strain tensors, it can provide a response of either ${f_{{\text{ML}}}}\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} } \right) = - 1$ or ${f_{{\text{ML}}}}\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} } \right) = + 1$ . These numerical responses indicate the elastic or plastic behavior of the material respectively. Beyond merely acting as a classifier, the ML yield function can also be employed as a basis for a data-oriented flow rule. Once yielding happens for every specific plastic strain, the function aims to identify a stress state where the yield function is zero, ensuring that the calculated flow stress aligns with the anticipated plastic deformation. This methodology is grounded in the consistency condition inherent in classical plasticity theory. The decision function, handling a 12-dimensional feature vector, is expressed as follows:

$\begin{align}{f_{{\text{ML}}}}\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} {\text{ }}} \right) = \mathop \sum \limits_{k = 1}^{{N_{{\text{SV}}}}} {\alpha _k}{y_k}\psi \left( {\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} } \right),\left( {{{\boldsymbol{\sigma}} _k},{{\boldsymbol{\varepsilon}} _k}} \right)} \right) + b\end{align} \tag{ 10 }$

${N_{{\text{SV}}}}{\text{ }}$ is the number of support vectors, which are a subset of the training data points that are crucial for defining the decision boundary. ${\alpha _k}{\text{ }}$ and ${y_k}{\text{ }}$ are coefficients associated with each support vector. ${\alpha _k}$ is the Lagrange multiplier or weight associated with the $k{\text{th}}$ support vector and ${y_k}{\text{ }}$ is the corresponding class label (+1 or −1). $\psi \left( {\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} } \right),\left( {{{\boldsymbol{\sigma}} _{\boldsymbol{k}}},{{\boldsymbol{\varepsilon}} _{\boldsymbol{k}}}} \right)} \right)$ is the kernel function that measures the similarity or proximity between the input vector $\left( {{\boldsymbol{\sigma}} ,{\boldsymbol{\varepsilon}} {\text{ }}} \right)$ and each support vector $\left( {{{\boldsymbol{\sigma}} _{\boldsymbol{k}}},{{\boldsymbol{\varepsilon}} _{\boldsymbol{k}}}} \right)$ . The kernel function is responsible for transforming the input space into a higher-dimensional feature space where the data are linearly separable. For the nonlinear problem at hand, the radial basis function (RBF) kernel $\psi \left( {\left( {{\boldsymbol{\sigma }},{\boldsymbol{\varepsilon }}} \right),\left( {{{\boldsymbol{\sigma }}_{\boldsymbol{k}}},{{\boldsymbol{\varepsilon }}_{\boldsymbol{k}}}} \right)} \right) = {\text{ exp}}( - \gamma \| ( {{\boldsymbol{\sigma }},{\boldsymbol{\varepsilon }}} )-$ ${\text{ }}( {{{\boldsymbol{\sigma }}_{\boldsymbol{k}}},{{\boldsymbol{\varepsilon }}_{\boldsymbol{k}}}} ) \|^2 )$ is chosen, and $b$ is the bias term, which is a constant offset that helps shift the decision boundary. The RBF kernel was selected due to its capability to effectively handle complex and non-linear patterns, especially in the context of our 12-dimensional fracture vector. In contrast to the polynomial kernel, which necessitates a predetermined degree of relationship, the RBF kernel offers flexibility and is often more proficient in such high-dimensional scenarios. The SVC model is trained through a dual optimization problem, which focus on finding the optimal values for the Lagrange multipliers ${\alpha _k}$ . The dual optimization problem can be formulated as a maximum search of the expression

$\begin{align}\mathop {{\text{max}}}\limits_\alpha \left( {\mathop \sum \limits_{k = 1}^{{N_{{\text{SV}}}}} {\alpha _k} - \frac{1}{2}\mathop \sum \limits_{k = 1}^{{N_{{\text{SV}}}}} \mathop \sum \limits_{l = 1}^{{N_{{\text{SV}}}}} {\alpha _k}{\alpha _l}{y_k}{y_l}\psi \left( {\left( {{{\boldsymbol{\sigma}} _k},{{\boldsymbol{\varepsilon}} _k}} \right),\left( {{{\boldsymbol{\sigma}} _l},{{\boldsymbol{\varepsilon}} _l}} \right)} \right)} \right)\end{align} \tag{ 11 }$

with respect to the Lagragian multipliers under the constraints:

$\begin{align}\begin{array}{*{20}{l}} {\mathop \sum \limits_{k = 1}^{{N_{{\text{SV}}}}} {\alpha _k}{y_k} = 0{\text{ }}} \\ {0 \le {\alpha _k} \le C} \end{array} \end{align} \tag{ 12 }$

Here, $C$ is the regularization parameter limitting the magnitude of the Lagrange multipliers and, thus, controlling the trade-off between achieving a low training error and ensuring that the model generalizes well to unseen data. Once the dual optimization problem is solved, i.e. optimal ${\alpha _k}$ values are found, the decision function (10) can be used to make predictions [45, 46].

As the ML yield function is defined as a convolution sum over the support vectors, the gradient to the SVC decision function can be calculated as

$\begin{align}\frac{{\partial {f_{{\,\,\text{ML}}}}\left( {{\boldsymbol{\sigma }},{\boldsymbol{\varepsilon }}{\text{ }}} \right)}}{{\partial {\boldsymbol{\sigma }}}} = \sum\limits_{k = 1}^{{N_{{\text{SV}}}}} - 2\gamma {\alpha _k}{y_k}\exp \left( { - \gamma {{\left\| {\left( {{\boldsymbol{\sigma }},{\boldsymbol{\varepsilon }}} \right) - \left( {{{\boldsymbol{\sigma }}_{\boldsymbol{k}}},{{\boldsymbol{\varepsilon }}_{\boldsymbol{k}}}} \right)} \right\|}^2}} \right)\left( {\left( {{\boldsymbol{\sigma }},{\boldsymbol{\varepsilon }}} \right) - \left( {{{\boldsymbol{\sigma }}_{\boldsymbol{k}}},{{\boldsymbol{\varepsilon }}_{\boldsymbol{k}}}} \right)} \right)\end{align} \tag{ 13 }$

from which the plastic strain increments can be derived directly using a standard finite element formulation.

$C$ and $\gamma$ are two important hyperparameters in SVC with the RBF kernel. Parameter $C$ is the regularization parameter that controls the trade-off between maximizing the margin and minimizing the misclassification of the training examples. A lower $C{\text{ }}$ value allows for more misclassifications, leading to a broader margin, which can enhance generalization. However, a higher $C$ value imposes a stiffer penalty on misclassifications, resulting in a more rigid decision boundary characterized by a narrower margin. The parameter $\gamma$ defines the spread of the RBF kernel. It controls the extent of influence for each training instance, affecting the smoothness and flexibility of the decision boundary. A smaller $\gamma$ value results in a more localized influence, where each data point has a narrower impact range. This often leads to simpler and smoother decision boundaries. Conversely, a larger $\gamma$ value increases the range of influence for each training instance, resulting in a more complex decision boundary that conforms closely to the training data [40, 47]. The effects of different combinations of $C$ and $\gamma$ are listed in table 2. Each case in the table represents a combination of high or low $\gamma$ and $C$ values, showcasing the resulting characteristics of the decision boundary.

Table 2. The effects of different combinations of $\gamma$ and $C$ values on the decision boundary in SVC with the RBF kernel.

${\boldsymbol{\gamma}}$	${\boldsymbol{C}}$	Effect on Decision Boundary
Low	Low	A wider margin and a smoother decision boundary. It allows more misclassifications and tends to generalize better but might sacrifice accuracy.
Low	High	The margin becomes narrower, and the decision boundary becomes stricter. It penalizes misclassifications more severely, leading to a decision boundary that fits the training data more tightly.
High	Low	With a higher gamma value, the influence of each training example increases, resulting in a more complex and wiggly decision boundary. However, a low $C$ value still allows some misclassifications.
High	High	Results in a decision boundary that is both complex and strict. The decision boundary tightly fits the training data and minimizes misclassifications but might be wiggly.

It is important to note that the optimal choice of $C$ and $\gamma$ depends on the dataset and specific problem. Achieving the best performance often requires optimization techniques, such as grid search or Bayesian optimization, to determine the optimal hyperparameter values.

3.2. Training data generation

The resulting data from the CPFEM simulation serve as the basis for the generation of training data necessary to inform the ML model. However, before utilizing these data for training, a preprocessing step is necessary to classify the data into distinct classes representing the elastic and plastic regions of the stress–strain space. This feature extraction process can be carried out by utilizing flow stresses and their corresponding plastic strain components. For each equivalent plastic strain level, the flow stresses are upscaled towards the plastic region and downscaled to the elastic area. Following this methodology, the upscaled and downscaled flow stresses are then labeled, indicating whether they fall within the elastic or plastic class. These systematically labeled stresses, along with their corresponding plastic strain tensor, serve as the ground truth for our ML model.

During the preprocessing step, the critical task of upscaling and downscaling $N$ data points is performed. The size of the training dataset is directly determined by the value of $N$ . As shown in figure 4 this operation occurs within the two ranges of [ ${F_e}$ , ${C_e}$ ] and [ ${C_p}$ , ${F_p}$ ]. In these ranges, ${C_e}$ and ${C_p}$ are the boundaries proximate to the actual flow stress in the elastic and plastic domains respectively. Meanwhile, ${F_e}{\text{ }}$ and ${F_p}$ define the outermost limits of these ranges. The real flow stress, marked in yellow, is set to a value of 1 in this schematic representation. The primary function of this real flow stress is to establish a convenient reference point, facilitating the subsequent generation of labeled training data in the elastic and plastic regions of the stress space, respectively. The correct configuration of the preprocessing parameters is essential for the accuracy and predictive capability of the ML model. Figure 5 illustrates the effects of varying preprocessing parameters on data transformation through three plots. Figure 5(a), set with a constant transformation range ( ${C_e}$ , ${F_e}$ , ${C_p}$ , and ${F_p}$ ), illustrates the influence of varying $N$ , the number of upscaled and downscaled points. This variability in $N$ directly influences the training size, which in turn can affect the overall quality of model training. Figure 5(b) shows the effect of varying ${F_e}$ , and ${F_p}$ while $N$ , ${C_e}$ , and ${C_p}$ held constant. This visualization offers insights into the varied possibilities when the far boundaries in the elastic and plastic regions are adjusted, consequently altering the transformation range. Figure 5(c) depicts the potential transformations resulting from variations in ${C_e}$ and ${C_p}$ with $N$ , ${F_e}$ , and ${F_p}$ held constant. This illustration highlights the potential variations when adjustments are made to the outer boundaries of both the elastic and plastic regions, which in turn shifts the transformation range. It must be noted that for illustrative clarity, this visualization is simplified. In practice, the process is executed on 6-dimensional arrays of flow stresses.

Figure 4. Refer to the following caption and surrounding text. — **Figure 4.** Visualization of the preprocessing step with flow stresses from the CPFEM simulation that need to be upscaled and downscaled. The specific range for these transformations is defined by the two ranges of [ ${F_e}$ , ${C_e}$ ] and [ ${C_p}$ , ${F_p}$ ]. Here, $N$ represents the number of points being transformed, with two points in each region being upscaled and downscaled in this instance.
Download figure:
Standard image High-resolution image

**Figure 4.** Visualization of the preprocessing step with flow stresses from the CPFEM simulation that need to be upscaled and downscaled. The specific range for these transformations is defined by the two ranges of [ ${F_e}$ , ${C_e}$ ] and [ ${C_p}$ , ${F_p}$ ]. Here, $N$ represents the number of points being transformed, with two points in each region being upscaled and downscaled in this instance.
Download figure:
Standard image High-resolution image

Figure 5. Refer to the following caption and surrounding text. — **Figure 5.** The impact of varying preprocessing parameters on data transformation. (a) illustrates the influence of varying $N$ within a constant transformation range ( ${C_e}$ , ${F_e}$ , ${C_p}$ , and ${F_p}$ ) on the size of training dataset. (b) demonstrates the effect of altering far boundaries ${F_e}$ and ${F_p}$ while holding $N$ , ${C_e}$ , and ${C_p}$ constant, and (c) shows how changing boundaries closer to the yield point ( ${C_e}{\text{ }}$ and ${C_p}$ ) can affect the transformation process, with $N$ , ${F_e}$ , and ${F_p}$ kept constant.
Download figure:
Standard image High-resolution image

**Figure 5.** The impact of varying preprocessing parameters on data transformation. (a) illustrates the influence of varying $N$ within a constant transformation range ( ${C_e}$ , ${F_e}$ , ${C_p}$ , and ${F_p}$ ) on the size of training dataset. (b) demonstrates the effect of altering far boundaries ${F_e}$ and ${F_p}$ while holding $N$ , ${C_e}$ , and ${C_p}$ constant, and (c) shows how changing boundaries closer to the yield point ( ${C_e}{\text{ }}$ and ${C_p}$ ) can affect the transformation process, with $N$ , ${F_e}$ , and ${F_p}$ kept constant.
Download figure:
Standard image High-resolution image

This visualization underscores the wide array of parameter combinations that can be explored during the preprocessing stage. By leveraging symmetry, a constant data generation range can be mirrored across both the elastic and plastic regions, thereby reducing the number of parameters that need determination. Utilizing techniques such as Design of Experiments (DoE) [48] facilitates systematic parameter exploration, aiming for optimal model performance. It is important to note that the preprocessing parameters also affect the hyperparameters $C$ and $\gamma$ of the SVC model. To identify this optimal set, it is essential to thoroughly explore the space of $C$ and $\gamma$ using grid search techniques. The grid search employs a cross-validation technique that helps to detect and mitigate overfitting by validating the model's performance on unseen data. Each $C$ and $\gamma$ combination is evaluated using a $k$ -fold cross-validation process, where the dataset is divided into $k$ subsets. For each iteration, one subset serves as the test set, while the remaining $k - 1$ subsets are combined to form the training set [40, 49].

3.3. Feature design

The feature vector powering the ML model is 12-dimensional. This vector comprises six stress components and six strain components, where the strain components represent only the plastic portion of the total strain. The emphasis on plastic strain is important because it represents how the material undergoes permanent deformation when subjected to stress. Such a focus is essential for accurately capturing and describing the strain-hardening behavior of the material.

In micromechanical simulations, the plastic strain starts accumulating right from the onset of deformation. However, it is a widely accepted convention to consider the transition from elastic to plastic behavior when the equivalent plastic strain reaches a small but non-zero threshold of 0.2%. This convention helps distinguish between the initial elastic deformation and the onset of more significant, irreversible plastic deformation. On the other hand, continuum mechanics adopts a different perspective. Within this framework, a distinct transition from elastic to plastic behavior is envisioned at a critical stress, below which the plastic strain is identically zero. This idealized yield onset, typically considered in continuum plasticity, enforces a clear transition between purely elastic behavior and the beginning of plastic deformation. In designing the ML model, alignment with the principles of continuum mechanics is intended, with zero plastic strain being assumed in the elastic regime up to the onset of plastic yielding. To reconcile this difference, the plastic strain data from the micromechanical simulations is adjusted to align with this assumption. As illustrated in figure 6, plastic strain data is adjusted by shifting values that originally corresponded to a plastic strain of 0.2% to zero, maintaining the value of the yield stress. Adopting the convention of zero plastic strain at the onset of plastic deformation, in line with the principles of continuum mechanics, offers several advantages for ML training. Through this adjustment, model interpretability is improved, and predictions become more intuitive.

Figure 6. Refer to the following caption and surrounding text. — **Figure 6.** The schematic illustration of the transformation process applied to the plastic strain data from micromechanical simulations. Data points originally aligned with a plastic strain of 0.2% are adjusted to correspond to a zero value.
Download figure:
Standard image High-resolution image

This shifting process is achieved by applying the following formula:

$\begin{align}{\boldsymbol{\varepsilon}_p}&= {\boldsymbol{\varepsilon}_{p}}^{MM} - \frac{0.2\% \cdot \boldsymbol{\varepsilon}_{p}^{MM}}{\left\| \boldsymbol{\varepsilon}_{p}^{MM} \right\|} = \left( 1 - \frac{0.002}{\left\| \boldsymbol{\varepsilon}_{p}^{MM} \right\|} \right) \cdot {\boldsymbol{\varepsilon}_{p}}^{MM}\nonumber \\ \left\| \boldsymbol{\varepsilon}_{p}^{MM} \right\| &= \sqrt{\frac{2}{3} \left( \left( {\varepsilon _{p1}}^2 + {\varepsilon _{p2}}^2 + {\varepsilon _{p3}}^2 \right) + 2\left( {\varepsilon _{p4}}^2 + {\varepsilon _{p5}}^2 + {\varepsilon _{p6}}^2 \right) \right)}\end{align} \tag{ 14 }$

In this equation, ${\boldsymbol{\varepsilon} _{p}}^{{\text{MM}}}$ symbolizes the original plastic strain data from the micromechanical simulations, $\left\| {{\boldsymbol{\varepsilon} _{p}}^{{\text{MM}}}} \right\|$ is the equivalent plastic strain and ${\boldsymbol{\varepsilon} _p}$ represents the adjusted plastic strain data, which aligns with the ML model's assumption of zero plastic strain at the onset of yielding.

Following the preprocessing step and shifting operation, the feature vector takes the form:

$\begin{align}\left\{ \begin{array}{*{20}{c}} {{x_t} = \left[ {{\sigma _1},{\sigma _2},{\sigma _3},{\sigma _4},{\sigma _5},{\sigma _6},{\text{ }}{\varepsilon _{p1}},{\varepsilon _{p2}},{\varepsilon _{p3}},{\varepsilon _{p4}},{\varepsilon _{p5}},{\varepsilon _{p6}}} \right]} \\ {{y_t} = + 1{\text{ }}} \\ {or} \\ {{y_t} = - 1{\text{ }}} \end{array}\right..\end{align} \tag{ 15 }$

The corresponding target variable ${y_t}{\text{ }}$ can have two possible values: [−1] or [+1].

The next crucial step in preprocessing is feature normalization. This step is important for many ML algorithms and is even mandatory for SVC to ensure no feature dominates the decision function due to its range. By transforming the features to fall within the range of −1 to 1 it is ensured that all features exert equal influence on the SVC's decision function, which leads to a more balanced and accurate model.

3.4. ML training

Following the preprocessing step, the prepared training data and corresponding labels can be utilized for model training. Figure 7(a) offers a schematic representation of the processed training data. Here, each yellow point denotes the equivalent stress at yielding under different loading directions. For ease of visualization, only the initial yield surface is shown in a 2-dimensional stress space, as obtained under plane stress conditions. The brown points represent the upscaled plastic data with label +1, while the blue points represent the downscaled elastic data with label −1. It is crucial to highlight that the actual training data are characterized by 12-dimensional feature vectors resulting from the CPFEM simulations in 300 loading directions. The visualization provides only a simplified view of the full dataset. Subsequently, the training process is proceeded with, where the model parameters in the form of support vectors and dual coefficients of the SVC scheme are iteratively adjusted to minimize the discrepancy between the model's predictions and the actual target labels. The process continues until there is either no significant improvement in the model's performance or it reaches an acceptable level of performance. The SVC model employs a kernel function (RBF in this case) to transform the input space, allowing for linear separation of the classes in a higher dimensional space. The visualization in figure 7(b) illustrates how the hyperplane essentially serves as the ML yield function, effectively distinguishing between the elastic and plastic regions within the stress–strain space.

Figure 7. Refer to the following caption and surrounding text. — **Figure 7.** (a) a schematic visualization of the processed training data. Each yellow point represents the equivalent flow stress in various loading directions. For simplicity, only the initial yield surface is depicted in a 2-dimensional stress space. The brown points symbolize the upscaled plastic data (labeled as +1), and the blue points denote the downscaled elastic data (labeled as −1). (b) illustrates the upscaled (brown) and downscaled (blue) stresses used as the training data to identify the optimal hyperplane in the SVC model.
Download figure:
Standard image High-resolution image

3.5. Performance evaluation

Evaluating the performance of a machine learning model is a critical step in assessing its effectiveness and reliability. One commonly used approach is to analyze a confusion matrix, which provides valuable insights into the model's predictive capabilities. The confusion matrix consists of four components: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP and TN represent the cases correctly identified as positive and negative, respectively. Conversely, FP corresponds to negative cases incorrectly predicted as positive, whereas FN indicates positive cases mistakenly identified as negative. High counts of TP and TN reflect the good predictive power of the model, whereas a high number of FP and FN suggest difficulties in making accurate predictions. From these components, several evaluation metrics can be calculated, including accuracy, precision, recall, and the F1 score. One particularly valuable metric is the Matthews correlation coefficient (MCC), which is well-suited for binary classifications and imbalanced datasets. MCC values range from −1, which signifies a completely inverse prediction, to +1, denoting an ideal prediction. Unlike other metrics, MCC considers every element of the confusion matrix rather than focusing solely on specific components. By penalizing both over-predictions (FP) and under-predictions (FN), it offers a balanced evaluation metric [50]. The MCC score is calculated as follows:

$\begin{align}{\text{MCC}} = \frac{{{\text{(TP)}}{\text{(TN)}} - {\text{(FP)}}{\text{(FN)}}}}{{\sqrt {\left( {{\text{TP}} + {\text{FP}}} \right)\left( {{\text{TP}} + {\text{FN}}} \right)\left( {{\text{TN}} + {\text{FP}}} \right)\left( {{\text{TN}} + {\text{FN}}} \right)} }} \end{align} \tag{ 16 }$

It is noted here that measures such as the mean average error are not meaningful to assess the quality of a classification model. Evaluating the performance of an ML model requires the use of well-defined test cases, particularly when employing tools such as the confusion matrix. Ensuring a balanced distribution of positive and negative cases is vital for unbiased performance evaluation. The risk of poorly defined or skewed test cases is that they may result in an unbalanced confusion matrix, thus compromising the integrity and reliability of the derived performance metrics.

To evaluate the robustness and generalization capability of the ML model, it is crucial to utilize unseen data sourced from CPFEM simulations with random loading directions that are not included in the training set. As illustrated in figure 8, these additional test cases are subjected to upscaling and downscaling along their respective hardening paths relative to their associated plastic strain increments. While figure 8 depicts the illustration only for the equivalent flow stress, it is important to emphasize that, as described in section 3.2, this process is applied to the flow stress tensors at each value of plastic strain along the loading path. Through this process, additional test data points are generated that lie within both the elastic and plastic regions of the stress–strain space. These test points follow a Gaussian distribution, clustering densely around the initial flow stress values and gradually decreasing in frequency as the distance from these values increases. The reason for choosing a Gaussian distribution is the fact that points near the boundary are generally perceived as harder to classify, while those farther away are often seen as easier. By this distribution being adopted, a comprehensive and unbiased assessment is ensured, highlighting the adaptability and generalizability of the model when trained across varied scenarios.

Figure 8. Refer to the following caption and surrounding text. — **Figure 8.** Visualization of the distribution of test data for evaluating the ML model. The yellow marker represents the flow stress sourced from the CPFEM simulation. The red markers illustrate the generated test data that have been subjected to upscaling and downscaling procedures, dispersed within both the elastic and plastic deformation regions. Governed by a Gaussian distribution, these test points are most densely populated around the initial yield stress (yellow marker) and decrease in density as they deviate farther, spanning a specific range to ensure comprehensive testing. It must be noted that while this visualization showcases the equivalent flow stress, the actual test cases are represented by a 12-dimensional feature vector generated using flow stresses along the loading path, as outlined in the methodology, as outlined in 3.2.
Download figure:
Standard image High-resolution image

4. Numerical application of the method

In this section, the practical application of the proposed method is detailed. The CPFEM simulations, fundamental to this approach, were executed according to the procedures outlined in section 2.2. In this example, a material with a random texture is assumed. To accurately reflect the microstructure of the material, 343 distinct orientations were created, mirroring the 343 grains in the RVE. These orientations were generated to be randomly distributed across the orientation space defined by the cubic crystal symmetry. A visualization of this texture can be seen in the pole figure presented in figure 9, which displays the orientation distribution function (ODF) of the grain orientations along the [0 0 1]-axis.

Figure 9. Refer to the following caption and surrounding text. — **Figure 9.** Pole figure of the orientation distribution function (ODF) of the grain orientations in the RVE along the [0 0 1]-axis.
Download figure:
Standard image High-resolution image

The specific material parameters of the hardening model are provided in table 3. It is essential to note that while this random texture material is used for demonstration, the application of the method is not confined to this specific material or texture.

Table 3. Material parameters used for CPFEM simulation.

${\dot {\boldsymbol{\gamma}} _0}$ $\left( - \right)$	${{\boldsymbol{\tau}} _0}$ $\left( {{\text{MPa}}} \right)$	${{\boldsymbol{\tau}} _s}$ $\left( {{\text{MPa}}} \right)$	${{\boldsymbol{h}}_0}$ $\left( {{\text{MPa}}} \right)$	${\boldsymbol{m}}$ $\left( - \right)$	${\boldsymbol{p}}$ $\left( - \right)$	${\boldsymbol{C11}}$ $\left( {{\text{GPa}}} \right)$	${\boldsymbol{C12}}$ $\left( {{\text{GPa}}} \right)$	${\boldsymbol{C44}}$ $\left( {{\text{GPa}}} \right)$
0.001	16	148	250	83	2.5	170	124	75

This study involved 300 unique CPFEM simulations, each corresponding to a distinct loading direction represented by a unit stress tensor that is applied as boundary condition to the simulation box. During the simulation, the stress boundary conditions are scaled up proportionally, resulting in a linear increase of the global equivalent stress on the simulation box, which reaches a value of 80 MPa at the end of the simulation. In this way, a minimum of 3% global equivalent plastic strain was achieved in each simulation. Since the total simulation time is 250 seconds, the resulting strain rate is approximately 10⁻⁴ s⁻¹. Upon completion of each simulation, the obtained data was stored in a JavaScript Object Notation (JSON) container, which is a lightweight format ideal for storing structured data. The JSON storage system adopts a hierarchical design, in which each load case receives a unique hashed identifier, enhancing data accessibility and efficiency. After the data generation step, data management, preprocessing, training data formation, and ML training were executed using the pylabFEA [51] library, which is a Python-based tool tailored for simplified finite element analysis.

4.1. Training data generation

In this section, the critical preprocessing step necessary for the generation of training data is detailed. Central to this process is the utilization of flow stresses derived from the CPFEM simulations. To create the 12-dimensional feature vectors with corresponding labels, these stresses undergo upscaling and downscaling within a pre-defined data generation range. The discussion explores the determination of the best range for this data generation, ensuring effective coverage of both elastic and plastic regions. In addition, the balance between computational feasibility and predictive performance in deciding the optimal training data size is highlighted. It is important to emphasize that these factors—the data generation range and training data size—are inherently tied to the unique characteristics of the dataset used. Hence, with the introduction of a new dataset, these parameters require a thorough reassessment to ensure optimal performance.

4.1.1. Optimal training data generation range

To find the optimal range for generation of labeled training data, as discussed in section 3.2, our focus was on three critical parameters: ${C_e}$ , ${F_e}$ , and $N$ . Here, $\left[ {{F_e},{\text{ }}{C_e}{\text{ }}} \right]$ is the data generation range, which due to symmetry, is mirrored into both, the elastic and plastic regions of space. The parameter $N$ determines the number of upscaled and downscaled data points in the respective regions. For effective model training, it is important that the training data generation range is optimally selected. Guided by this principle, a low value of 0.65 for ${F_e}$ and a high value of 0.99 for ${C_e}$ were chosen. While this investigation was conducted using $N = 2$ , the derived range is applicable for higher $N$ values. A Partial Factorial DoE method was employed to systematically explore specific combinations of values for the parameters ${C_e}$ , ${F_e}$ . This was done to generate training data within each of these different ranges and to assess their respective impacts of the training result. The selected values for ${C_e}$ , ${F_e}$ are summarized in table 4. As explained earlier, the values of ${C_e}{\text{ }}$ and ${F_e}$ represent the distance difference from the actual flow stress considered at position 1 in a 6-dimensional space with suited units for the stress. For example, when ${C_e}{\text{ }}$ is set to 0.85, it implies a spatial distance of 0.15 to the actual flow stress.

Table 4. Range of ${C_e}$ , ${F_e}{\text{ }}$ used in the Design of Experiment (DoE) for $N = 2$ .

Factor	Values
${C_e}$	0.85, 0.9, 0.95, 0.99
${F_e}$	0.65, 0.7, 0.75, 0.8

To simplify the experiment, the data selection was confined to simulations with up to 0.5% plastic strain. ML training was carried out utilizing the training data generated within the range resulting from each combination of the values represented in table 4. The MCC score was computed using designated test cases as detailed in section 3.5, providing a quantitative measure of the trained model's performance under varying conditions, which can be seen in figure 10.

Figure 10. Refer to the following caption and surrounding text. — **Figure 10.** Visualization of the relationship between data generation range [ ${C_e}$ , ${F_e}$ ] and the resultant MCC scores for $N = 2$ . Each point represents a unique combination of parameters ${C_e}$ and ${F_e}$ , defining a specific range for the generation of labeled training data. This training data within each of these ranges were used for ML training. The colors denote the MCC score of the trained models obtained for an unseen test data set, illustrating the model's performance under realistic conditions.
Download figure:
Standard image High-resolution image

**Figure 10.** Visualization of the relationship between data generation range [ ${C_e}$ , ${F_e}$ ] and the resultant MCC scores for $N = 2$ . Each point represents a unique combination of parameters ${C_e}$ and ${F_e}$ , defining a specific range for the generation of labeled training data. This training data within each of these ranges were used for ML training. The colors denote the MCC score of the trained models obtained for an unseen test data set, illustrating the model's performance under realistic conditions.
Download figure:
Standard image High-resolution image

Figure 10 shows a heatmap representation of the MCC scores resulting from ML training, where distinct training data sets were generated within different ranges, each originating from a unique combination of ${C_e}{\text{ }}$ and ${F_e}$ . Notably, ${C_e}{\text{ }}$ emerges as a more influential parameter. This is evident as, for certain ${C_e}$ values, the MCC scores remain constant or exhibit only minor fluctuations across the entire range of ${F_e}$ levels. Particularly, ${C_e}$ value of 0.9 stands out as optimal, resulting in higher scores than both its immediate neighboring levels. The combination of ${F_e} = 0.7$ and ${C_e} = 0.9$ results in the highest MCC score of 0.8418 and can be chosen as the optimal data generation range.

4.1.2. Optimal training data generation size

As mentioned in section 3.2, during the preprocessing step, for each level of plastic strain, $N$ labeled data points are generated through upscaling the flow stress tensors into the plastic region and another $N$ values through downscaling the flow stress tensors into the elastic region. This process results in a total of $2N$ training data points contributing to constructing the 12-dimensional feature vectors across all plastic strain levels for all the loading directions. The value of $N$ directly dictates the training size, making its optimal selection important. If $N$ is set too high, it can lead to an excessive amount of training data. This not only increases computational costs but can also introduce noise, potentially leading to overfitting and poor performance on unseen data. To determine the optimal value for $N$ , ML training was carried out with $N$ values of 2, 4 and 6, using the raw data in the micromechanical stress–strain curves up to 3% plastic strain in the optimal range found for ${F_e} = 0.7$ and ${C_e} = 0.9$ . Following training, the MCC scores were assessed using the test cases defined earlier. Along with the MCC scores, the corresponding number of training data and actual support vectors are summarized in table 5.

Table 5. Number of support vectors with respect to that training size in various $N$ and the corresponding ratio.

$N$	MCC Score	Number of Training Data	Number of Support Vectors	Ratio
2	0.8418	157 416	7456	0.047
4	0.8706	314 832	7475	0.024
6	0.8458	472 248	7591	0.016

While there is an improvement in the MCC score when moving from $N = 2$ to $N = 4$ , the gain is relatively small. Compared to the significant increase in computational time required for the larger training size, this minor boost in MCC may not be worth the extra computational effort. Additionally, when $N$ is increased to 6, the MCC score drops slightly, which may indicate the start of overfitting as also the number of support vectors is increasing significantly. Support vectors are specific data points within the training set that, during the training procedure, are identified to lie closest to the hypersurface separating elastic and plastic regions in the stress space. As the training size increases, the algorithm has more potential data points available for selection as support vectors. In an ideal scenario, the algorithm must pick the most representative data points as support vectors to define a precise boundary. However, this is not always the case. If the selected support vectors are not representative of the broader dataset's patterns, it typically leads to a decrease in the performance, which can be observed in $N = 6$ case. Given these observations, $N = 2$ emerges as the recommended choice for ML training.

4.2. Training with optimal training data: a machine learning yield function

This section sheds light on the results of the ML model trained using the optimal training dataset. Using the optimal training range and setting $N = 2$ , resulting in a training size of 157416, the model was trained on data up to 3% plastic strain. A comprehensive hyperparameter optimization process was undertaken with the aim of fine-tuning the model and identifying the optimal combination of hyperparameters. During the optimization phase, a grid search strategy was utilized, facilitating a systematic exploration of a comprehensive range of values for the hyperparameters $C$ and $\gamma$ . The values for $C$ were allowed to vary from 1 to 12 while $\gamma$ was explored within the range of 0.1 to 1. To enhance the robustness of this process, 5-fold cross-validation was incorporated, ensuring that diverse subsets of the data were utilized for both training and validation of the model. A comprehensive evaluation was conducted through exhaustive testing of various hyperparameter combinations, allowing for the precise identification of the most effective setting. The optimal combination of hyperparameters, once identified, was employed to train the final model. To assess the model's generalizability and check for overfitting, its performance was evaluated on both the training set and a separate unseen test set within the same range of up to 3% plastic strain, as described in section 3.5. The model's predictive performance is shown in the confusion matrix in figure 11, while the associated performance metrics for both the training and test sets, are given in table 6.

Figure 11. Refer to the following caption and surrounding text. — **Figure 11.** Confusion matrix for evaluating the ML yield function on the test dataset, after training with optimal hyperparameters.
Download figure:
Standard image High-resolution image

Table 6. Performance metrics of the ML model using the optimal hyperparameters on training and test sets.

Set	${\boldsymbol{C}}$	${\boldsymbol{\gamma}}$	Precision	Accuracy	Recall	F1-score	MCC
Train	4	0.5	0.9410	0.9448	0.9492	0.9450	0.8897
Test	4	0.5	0.9216	0.9053	0.8858	0.9034	0.8111

By incorporating plastic strains as additional degrees of freedom, the ML yield function was trained to capture the hardening behavior of the materials. To visualize this behavior, the full 6-dimensional stress space was simplified to the plane of deviatoric stresses in the 3-dimensional space of principal stresses, the so-called $\pi$ -plane, which can be conveniently described in polar coordinates comprised of the polar angle $\theta$ and the equivalent stress. Within this plane, the ML yield function is evaluated for different stresses, i.e. points on that plane, and plastic strain tensors, given as additional labels, to assess the yielding behavior under a wide range of conditions. To calculate the proper strain tensor associated to a given level of equivalent plastic strain, the gradient of the yield function is evaluated for each point in the stress space and used to indicate the plastic flow direction. By multiplying this normalized direction with the equivalent plastic strain, any desired strain tensor can be constructed in a realistic manner and provided as argument to the ML yield function. Through the evaluation of the yield function's response for the points on the π-plane, the flow stresses were determined, as shown in figure 12.

Figure 12. Refer to the following caption and surrounding text. — **Figure 12.** Evolution of strain hardening behavior on the π-plane for various equivalent plastic strain levels (0%–2.5%). The figure features five distinct contours, each signifying the critical equivalent stress where yielding occurs at the specified plastic strain level under the loading direction indicated by the polar angle. The progression from one contour to the next illustrates the adaptive changes in the yield boundary as the plastic strain level increases.
Download figure:
Standard image High-resolution image

Figure 12 delineates five distinct contours, each marking the boundary in the stress space where yielding occurs at a specific level of equivalent plastic strain. With each successive contour, the evolution of the yield boundary and the increasing resistance to deformation can be observed. Also, the contours exhibit a smooth, rounded nature, confirming the convexity of the yield surface. Specifically, for any two points selected on a contour, all points on the straight-line segment connecting those two points also lie within the domain bounded by the contour, which is a defining characteristic of convex shapes. This visual representation affirms our expectation of a convex yield function, showcasing a form that is consistent with the physical principles of material behavior when subjected to stress. This means that the ML model captures the essence and primary characteristics of the data without overcomplicating or overfitting. When the training data inherently exhibits a convex yielding behavior, a characteristic expected from the material's physical properties, the SVC model is naturally configured to mirror this convexity. It is seen that the model does not try to introduce convoluted boundaries but aims to replicate the general trend of the data.

In figure 13, the predicted yielding behavior from the ML model is compared with actual scattered data points from the CPFEM simulations, representing two distinct levels of plastic strain: the initial yielding and 2.5% of equivalent plastic strain. For each specified plastic strain level, the flow stresses across all loading directions were collected and then converted into polar coordinates on the π-plane of deviatoric stresses. As shown in figure 13, a remarkable alignment between the predictions of the ML model and the actual empirical data can be observed. Most of the scattered data points are found to nestle close to the contours predicted by the ML function. This strong alignment not only underscores the model's ability to capture the material's intricate hardening behavior but also reaffirms the reliability and robustness of the ML model trained in this study, even in regions where data is relatively scarce.

Figure 13. Refer to the following caption and surrounding text. — **Figure 13.** Visualization of the ML model's predicted yielding behavior set alongside actual data points for two specific plastic strain levels: initial and 2.5%. The close alignment between the contours and scattered points highlights the model's accuracy in representing material hardening behavior across different loading directions.
Download figure:
Standard image High-resolution image

The trained ML model can determine both the initial yielding stress and subsequent flow stresses as plastic yielding progresses. Using this capability, a stress–strain curve for any given loading direction can be reconstructed directly from the trained ML yield function. For demonstration purposes, the micromechanical simulation results from an arbitrarily chosen loading direction used during training were taken. This unit stress tensor, $\sigma = {\text{ }}$ (0.054, −0.091, −0.127, 0.139, 0.55, 0.05), is applied as the boundary condition to the CPFEM model and increased proportionally during the simulation. For each equivalent plastic strain in the dataset, the unique stress multiplier is identified and provided as an argument to the ML yield function. As described above, the corresponding plastic strain tensor has been obtained from the normalized gradient of the ML yield function multiplied by the equivalent plastic strain. The resulting stress–strain curve is depicted in figure 14 and compared to the actual data gathered from the CPFEM simulation. This comparison reveals a significant level of agreement, which underscores the proficiency of the ML yield function in characterizing hardening behavior. It should be noted that the ML model was trained specifically within the range of plastics starting from 0.2%, as discussed above. Therefore, stress–strain data from the elastic branch below this 0.2% threshold were not considered in the plot and would result in a linear-elastic response of the ML yield function.

Figure 14. Refer to the following caption and surrounding text. — **Figure 14.** An example for a stress–strain curve reconstructed from the trained ML yield function (red curve) and the corresponding data obtained from a CPFEM simulation in the same loading direction (black curve). This comparison demonstrates a remarkable level of alignment, emphasizing the effectiveness of the ML yield function in capturing the hardening behavior.
Download figure:
Standard image High-resolution image

5. Conclusion

Constitutive equations serve as the fundamental framework for describing and predicting the mechanical behavior of materials under complex mechanical loads. In constitutive modeling, the yield function plays a pivotal role in delineating when a material transitions from elastic to plastic deformation. Recently, ML introduced novel approaches for generating constitutive models directly from data. These methods offer an adaptive way to capture complex material behavior while still adhering to fundamental mechanical principles such as the conservation of energy and momentum. Many ML approaches tend to primarily emphasize initial yielding, neglecting the impact of strain hardening when dealing with the complex behavior of polycrystalline materials. Recognizing this gap, the present work addresses the need to incorporate strain hardening comprehensively into a data-driven yield function, ensuring a complete description of polycrystalline material behavior.

In the proposed workflow, the ML yield function, including strain hardening, was constructed using SVC and trained with a 12-dimensional feature vector, encompassing both the stress and plastic strain components. As a data-driven method, the most challenging part was providing not only sufficient but also the most informative data for training. For capturing the complex hardening behavior of polycrystalline materials, a RVE with random crystallographic texture and 343 grains was generated. CPFEM simulations under monotonic proportional loading were executed along 300 distinct loading directions. These loading directions were strategically chosen to provide optimal and comprehensive coverage of the 6-dimensional stress space. This optimal coverage of the stress space ensures that the trained yield function gains the capability to predict the yielding behavior for any given point within the entire 6-dimensional stress domain. The resulting flow stresses and corresponding strain values along all those directions formed the training data. From this raw data, proper features needed to be extracted for training and testing. Furthermore, the hyperparameters of the training procedure had to be optimized. It was shown that the MCC score provides a reliable basis to judge the quality of the training. Once the trained ML yield function is available, its gradient enables the direct calculation of plastic strain increments in standard finite element formulations following the return mapping algorithm. In this work, however, merely an exemplary stress–strain curve is reconstructed based on the plastic strain increments, while applications of this method within an FEM scheme are subject to forthcoming work.

The trained ML yield function was tested for stresses on the π-plane, which describes the subspace of deviatoric stresses of the principal stress space. To accomplish this, the response of the ML yield function was evaluated on the π-plane at various levels of plastic strain and the results exhibited its remarkable proficiency in predicting the evolution of flow stresses with the plastic strain. A notable alignment between the data points and the model predictions serves as evidence of the model's accuracy and reliability. In consequence, the ML model has been demonstrated to precisely determine the initial yield stress as well as the evolving flow stresses during plastic deformation, thus enabling the generation of stress–strain curves for any loading direction. The suggested method translates the complexity of hardening behavior in polycrystalline materials into a data-driven framework, providing a powerful predictive tool. In future work, this approach will be extended to effectively incorporate aspects of anisotropic yield behavior, obtained for specific crystallographic textures. Furthermore, the possibilities to discriminate between isotropic and kinematic hardening in the data-oriented constitutive model will be explored.

Acknowledgments

AH gratefully acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project-ID 190 389 738—TRR 103.

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://zenodo.org/records/10522337. For the Python scripts and required libraries, please refer to [51].

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates