IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, VOL. 13, NO. 1, JANUARY 2010
Enhancing Privacy and Authorization Control Scalability in the Grid Through Ontologies
I. Blanquer, V. Hernández, D. Segrelles, and E. Torres
Abstract: The use of data Grids for sharing relevant data has proven successful in many research disciplines. However, the adoption of these environments when personal data are involved (as in health) is limited by a lack of trust. Many approaches provide encrypted storage and key sharing to prevent access by unauthorized users. However, these approaches are additional layers that must be managed alongside the authorization policies. In this paper, we present a privacy-enhancing technique that uses encryption and relates it to the structure of the data and their organization, providing a natural way to propagate authorization and a framework that fits many use cases. The paper describes the architecture and processes, and also shows results obtained in a medical imaging platform.
Index Terms: Grid, ontologies, open grid services architecture (OGSA), security, Web service resource framework (WSRF).
I. INTRODUCTION
DATA SECURITY is a key requirement for biomedical Grid applications. Dealing with the different national legal regulations and procedures accepted by the medical community [1] requires a careful approach. One of the challenges for biomedical applications is to provide efficient, application-dependent high-level interfaces that enable nonexperts to access Grids, ensuring transparent access to medical resources through services compatible with medical practice. As part of these interfaces, a flexible architecture for managing data privacy is needed, compatible with medical practice and with preexisting medical information systems. Moreover, the talks delivered by the authors of "Grids: The Top Ten Questions" offer a concluding remark that describes many of today's Grid production platforms: "Until security is made easier to use, it will not be used" [2]. Grid security systems are complex enough to be considered an obstacle to successful Grid adoption. The proposed architecture introduces new concepts and methods that need to be expressed in the natural terms of the application community; otherwise, they will be perceived as a new barrier.

Manuscript received November 19, 2007; revised July 27, 2008. First published August 4, 2008; current version published January 4, 2009. This work was supported in part by the Spanish Ministry of Education and Science to develop the project ngGrid: New Generation Components for the Efficient Exploitation of eScience Infrastructures, under Grant TIN2006-12860, and in part by the Structural Funds of the European Regional Development Fund (ERDF). I. Blanquer is with the Institute for the Applications of Advanced Information and Communication Technologies (ITACA) and the Network Centre for Biomedical Engineering (CRIB), Polytechnic University of Valencia (UPV), Valencia 46022, Spain. V. Hernández and D. Segrelles are with the Grid and High Performance Computing Group (GRyCAP), Valencia University of Technology, Valencia 46980, Spain. E. Torres is with the Department of Information Systems and Computation, Polytechnic University of Valencia (UPV), Valencia 46022, Spain. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.org. Digital Object Identifier 10.1109/TITB.2008.2003369

A virtual organization (VO) [3] is formed from different real entities (e.g., medical centers, hospitals, governmental centres), and probably also from different communities (e.g., physicians and researchers working in specific projects). Access to data is normally organized around VO membership. Medical imaging Grid middlewares that use virtual communities for sharing, transferring, and processing Digital Imaging and Communications in Medicine (DICOM) medical images in a distributed environment [4] are starting to be adopted by the medical community. DICOM [5] is the most common standard for medical images. A single DICOM file contains both a header (which stores information about the patient's name, the type of scan, image dimensions, structured report, etc.) and all of the image data (which can contain information in three dimensions) or structured reports in DICOM structured reporting objects (DICOM-SR) [6]. TRENCADIS is a middleware for managing DICOM-SR [6] that has been used as part of the deployment of the Valencian Cyberinfrastructure of Oncological Medical Images (CVIMO) [7], in which five hospitals in the Valencian region collaborate to share DICOM studies and DICOM structured reports. Three ontologies have been created in CVIMO, which define the three oncological target areas involved (i.e., lung, liver, and central nervous system). Users can only access the parts of DICOM studies defined in the ontology of the area they belong to. The main objective of this paper is to provide Grid middlewares such as TRENCADIS with efficient and reliable privacy protection for sensitive data. This paper presents a model for long-term storage and management of encrypted data in distributed environments.
Furthermore, the paper outlines how this model is implemented to preserve the privacy of patient information in Grid-based collaborative computational infrastructures for biomedical applications. This paper delineates a dependable security framework for extended organizations that span several institutions. Within this framework, organizations may encounter different degrees of data integrity and confidentiality. The specific objectives of the paper are: 1) to propose an on-the-fly cryptographic infrastructure that protects privacy even from users with administrative privileges; 2) to provide a flexible architecture for organizing key management for long-term storage of encrypted data; 3) to propose a model applicable in different environments and compatible with current Grid middlewares;
4) to provide an access control mechanism for encryption keys based on ontological groups and roles. The paper is organized as follows. Section II reviews related work. Section III describes the security model and provides insight into the security issues presented in previous papers [34], [35]. Section IV shows a real deployment of the security model that has been applied in the CVIMO project. After that, results of the model deployed in a controlled environment are described. Finally, conclusions are presented.
II. RELATED WORKS
Computational Grids offer a number of benefits and opportunities to biomedicine, healthcare, and other biomedical domains [8]. Several recent systems focused on new health-related applications are analyzed here. The Medical Data Manager (MDM) [9] is a data management service designed to handle medical images on Grids, strongly based on the gLite middleware. The MDM aims at guaranteeing patients' privacy by keeping private data in acquisition centers. However, this approach comes with higher complexity in the specification and maintenance of the access policies. Granting full access rights to information objects (both image data and header attributes from a DICOM file) requires acquiring a number of capabilities kept by different services in the form of access control lists (ACLs). This approach has deficiencies in systems where the potential users are not known beforehand. The higher flexibility of attribute-based approaches enables the model presented in this paper to deal efficiently with these requirements. EncFile [10] is an encrypted file management system for biomedical applications in the Enabling Grids for E-science (EGEE) [11] project. Although EncFile is not linked to the EGEE Grid components, the system has been implemented over LCG2 [12]. A Grid-based architecture for computer-aided diagnosis was presented in [13]. To protect information against unauthorized disclosure, the authors propose an encrypted storage component described in [14]. Although the prototype was validated on a large experimental platform, the architecture has not been tested in real environments. The Secure Storage Service provides a set of tools to manage confidential information in an encrypted format in a Grid computing environment [15]. This service has been developed for the gLite [16] middleware.
The Secure Storage Service aims to solve the insider abuse problem, also preventing the administrators of the storage elements from accessing the confidential data in clear format; however, it does not specify a means to protect the decryption keys from being accessed by administrators. Moreover, the Secure Storage Service associates an ACL with each decryption key. This ACL contains all users authorized to access the encrypted file, an approach that does not scale well as the number of users increases. Identifying data resources is a fundamental problem within large-scale Grid environments. While traditional solutions enable users from one organization to access data belonging to
other organizations by sharing metadata, this may not be acceptable for certain organizations due to privacy concerns. The MDM client library provides application programming interfaces (APIs) for requesting files based on the metadata attached to the DICOM image. The metadata are internally extracted from the DICOM headers and placed into specialized catalogues. The role of ontologies [17] in Grid computing for obtaining, comparing, and analyzing data is increasing. Ontologies can be used to locate datasets within collaborative environments and to build on-the-fly collections of data files based on attributes of the ontology. Our proposal uses ontologies that define the information of interest for a given area or group [4]. In CVIMO, ontology attributes match DICOM fields (headers or DICOM-SR tags) and can be used for filtering, indexing, and searching DICOM objects in virtual collections. There are a number of efforts to produce access control languages and standards based on XML (e.g., the extensible access control markup language (XACML) [18]) and authorization assertion protocols (e.g., the security assertion markup language (SAML) [19]). While SAML provides a mechanism for making authentication and authorization assertions and a mechanism for conveying them, XACML provides the language that defines the rules needed to make the necessary authorization decisions. XACML has been applied with great success [20] to implementations of the attribute-based access control (ABAC) model. In ABAC, access decisions are based on attributes of the requestor and the resource, and users need not be known by the resource before sending a request. ABAC is scalable and flexible, and is thus more suitable for distributed, open systems than identity-based access control models [21]. Finally, there are promising results on applying Semantic Web standards for protecting Grid [22] and Web services [23].
III. SECURITY MODEL
A. Grid Architecture
Most of the current Grid middlewares are based on Web services protocols. The Open Grid Services Architecture (OGSA) [24] is a specification in progress that aims at defining a standard and open architecture for Grid-based applications. The Globus Toolkit is a realization of OGSA that can be used to develop Grid applications. Globus Toolkit Version 4 (GT4) provides services implemented on top of the Web Service Resource Framework (WSRF) [25], a specification that extends Web services with stateful services and other features. The services of the architecture presented in this paper are all based on OGSA/WSRF.
B. Grid Security Infrastructure
The security services of Grids are not altogether different from those of other distributed system paradigms. Specifically, an effective security model must ensure a set of security primitives: identity verification, authorization, access control, data integrity, data confidentiality, and availability.
Modern Grid middlewares provide the security infrastructure, usually by means of the Grid Security Infrastructure (GSI) [26], which is a set of tools, libraries, and protocols used in Globus to securely access resources. Almost all Grid components and Grid middlewares use the GSI for authentication. GSI also provides mechanisms for secure connections and message protection. However, GSI does not guarantee the reliability of stored information in terms of authenticity and confidentiality. On the other hand, in computational Grids, authorization has an importance beyond its common security meaning. Proper Grid authorization eases the administration of the shared resources and provides coherence to the system by consistently preserving the relationships among the participants. Grid authorization is closely related to the VO concept. The VO administrators define hierarchical relationships (e.g., groups, subgroups) in the VO, different privileges (e.g., roles, capabilities) on resources, and membership in the groups. They are also in charge of controlling access to the Resource Providers (RPs) (e.g., services in an OGSA approach) on the basis of users' credentials (e.g., groups, roles, capabilities) and the agreements established among the VO participants. In addition, the RPs have their own local security policies that may override the VO policies. Final decisions on access to resources must rest with the owner of the resource, but global policies enable the management of large-scale infrastructures. In conclusion, access control to the RPs in the collaborative Grid infrastructure is based on group membership. The actions that users in a given group are allowed to perform (from the point of view of the VO) on a specific resource instance are determined by two policies: the rules that describe the group and the rules controlling access to resources.
All this is managed in the RPs by a component named Gatekeeper, which takes into account resource-specific policies, normally ACLs. The classic ACL approach requires permissions to be explicitly given to individuals or groups. If a piece of information should be made available to different VOs or VO groups (but not all of them), the data owner must explicitly indicate it when sharing the data. This becomes cumbersome when large amounts of data are created regularly. However, the metadata associated with a piece of information can contain enough information to decide which groups should access it. There are several attribute-based access control systems for Grid environments in the literature (e.g., Akenti [27], PERMIS [28], Shibboleth [29], and the Virtual Organization Membership Service (VOMS) [30]). Group-based authorization tools (such as VOMS) enable granting different roles and permissions to a single user. VOMS manages authorization information about the members of VOs and supplies this information as an X.509 attribute certificate. In the context of the EGEE [11] Grid infrastructure, roles are assigned to users through VOMS. As VOMS uses X.509 attribute certificates to assert users' group memberships, roles, and capabilities, users must create an X.509 proxy certificate [31] before accessing the resources. A VOMS server generates the attribute extensions.
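VOMS conveys group memberships, roles, and capabilities as fully qualified attribute names (FQANs) of the form /vo/group[/subgroup]/Role=.../Capability=... inside the proxy's attribute certificate. The following minimal Python sketch assumes the attribute certificate has already been validated and its FQAN strings extracted (not shown), and splits one FQAN into the VO group and role that the later authorization steps consume. The function name and the VO and group names in the example are hypothetical.

```python
def parse_fqan(fqan: str) -> dict:
    """Split a VOMS FQAN such as '/vo/group/Role=r/Capability=c' into its parts."""
    groups, role, capability = [], None, None
    for part in fqan.strip("/").split("/"):
        if not part:
            continue
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    if not groups:
        raise ValueError(f"FQAN without a VO name: {fqan!r}")
    # VOMS uses the literal NULL for an unset role or capability.
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        "role": None if role in (None, "NULL") else role,
        "capability": None if capability in (None, "NULL") else capability,
    }


# Hypothetical FQAN for a radiologist in a lung-oncology group of a CVIMO-like VO.
print(parse_fqan("/cvimo/lung/Role=radiologist/Capability=NULL"))
```

In this sketch only the group path is used for the ontology checks; roles are kept so that finer-grained policies could be layered on top.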
C. VO Management and Ontologies
The concept of ontology, as the branch of metaphysics that deals with the nature of being, has been used in many areas of science and literature. In information technologies, an ontology is a vocabulary and a set of terms, rules, and relations that define, with the needed accuracy, a set of entities, enabling the definition of classes, hierarchies, and other relations among them. Ontologies define the terms used to describe and represent a knowledge domain. In this sense, ontologies organize knowledge in a reusable way. The Ontology Server is a service provided by the model that defines the ontologies (in any language: XML, Resource Description Framework (RDF), Web Ontology Language (OWL), etc.) and specifies the relations between VO groups and ontologies. The Ontology Server stores a unique identifier for each ontology in the context of the VO (namely, the Ontology Id). In summary, the VOMS server organizes users into groups, and the Ontology Server organizes the access of groups to ontologically classified resources and data. Each group can manage multiple ontologies, and each ontology can be managed by different groups.
D. Information Object Storage
The Information Object Storage (IOS) is a repository service provided by the model. This repository stores all the encrypted information objects required by the VO, regardless of the ontological classifications these objects can have. Furthermore, the IOS keeps the relationships between the objects and the ontologies through the Encrypted Object Unique Identifier (EOUID), which uniquely identifies the object in the Grid. In parallel, the ontologies are used for filtering, indexing, and searching encrypted objects in virtual collections. These virtual collections are also kept in the IOS.
When a user tries to retrieve an information object, the Gatekeeper at the IOS verifies: 1) first, that the user's credentials identify the user as a member of a VO group; 2) second, that this group is authorized on the object's ontologies (combining the ontology information stored in the Ontology Server); and 3) finally, that the local rules allow the user to access the resource. Fig. 1 shows three ontologies that classify the objects into three subsets (Onto1, Onto2, and Onto3). Users are organized into two groups (Group 1 and Group 2), and one user belongs to both groups. Group 1 is authorized to access data from ontologies 1 and 2 (that is, objects s2, s3, s4, and s5). Group 2 can access data from ontologies 2 and 3 (that is, objects s1, s2, s4, and s6). User 2 will be authorized to access data from Onto1 and Onto2, or from Onto2 and Onto3, depending on the credentials presented. Moreover, an IOS might allow or deny access to users of a specific group (IOS 1 and Group 1 in Fig. 1). For this reason, User 1, although able to manage objects from ontology 1, cannot access the s2 and s3 data in IOS 1. The key to the authorization mechanism is that ontologies and VO groups can only be created by the system administrator, who needs the agreement of the communities' deputies to include an ontology from one group in a different group.
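The three checks performed by the IOS Gatekeeper can be sketched in Python against the Fig. 1 scenario. This is a minimal illustration, not the authors' implementation: the ontology-to-object assignment and the placement of objects in IOS 1 and IOS 2 are hypothetical, chosen only so that the object sets match those quoted in the text.

```python
from dataclasses import dataclass, field

# Hypothetical ontology-to-object assignment, chosen only so that the object
# sets match the ones quoted for Fig. 1 (Group 1 -> s2..s5, Group 2 -> s1, s2, s4, s6).
ONTOLOGY_OBJECTS = {
    "Onto1": {"s3", "s5"},
    "Onto2": {"s2", "s4"},
    "Onto3": {"s1", "s6"},
}
# Ontology Server statements: VO group -> ontologies it is authorized on.
GROUP_ONTOLOGIES = {
    "Group1": {"Onto1", "Onto2"},
    "Group2": {"Onto2", "Onto3"},
}


@dataclass
class IOS:
    name: str
    objects: set
    denied_groups: set = field(default_factory=set)  # local rules of this IOS

    def can_access(self, user_groups: set, presented_group: str, obj: str) -> bool:
        # 1) The presented credential must identify the user as a group member.
        if presented_group not in user_groups:
            return False
        # 2) The group must be authorized on at least one ontology of the object.
        obj_ontologies = {o for o, objs in ONTOLOGY_OBJECTS.items() if obj in objs}
        if not obj_ontologies & GROUP_ONTOLOGIES.get(presented_group, set()):
            return False
        # 3) Local rules: the object must be here and the group not denied.
        return obj in self.objects and presented_group not in self.denied_groups


# Hypothetical placement: IOS 1 holds s1-s3 and denies Group 1, as in Fig. 1.
ios1 = IOS("IOS1", {"s1", "s2", "s3"}, denied_groups={"Group1"})
ios2 = IOS("IOS2", {"s4", "s5", "s6"})
user1 = {"Group1"}
user2 = {"Group1", "Group2"}

print(ios1.can_access(user1, "Group1", "s3"))  # False: IOS 1 denies Group 1
print(ios2.can_access(user2, "Group1", "s5"))  # True: s5 is in Onto1
print(ios2.can_access(user2, "Group2", "s5"))  # False: Group 2 is not on Onto1
```

Passing the presented group explicitly mirrors the text: User 2 obtains different rights depending on which VOMS credential is exposed in the proxy.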
Finally, individual users or VO groups can be banned through a configuration file of the Gatekeeper, even if they present the right ontologies. This is a critical operation and should be performed at the resource administration level.
E. Encryption and Decryption of Data
The model requires a symmetric cryptographic key to encrypt and decrypt each information object; 256-bit Advanced Encryption Standard (AES) [32] keys are used. Submitters of new or updated datasets use a separate key for each object. Encryption and decryption operations take place on the client, preventing the overload of the application servers running the IOS. Given that the risk of attacks is higher in servers that share multiple services (including public ones), and that the impact could be higher since servers keep far more data than clients, keeping unencrypted information out of the IOS not only improves performance but also helps to protect the information from unauthorized disclosure.
F. Data Integrity and Confidentiality
Dependable data storage and sharing among multiple organizations are important features of the proposed model. The security framework guards data integrity and confidentiality, while ensuring that information objects are easily accessible to authorized users. An integrity code protects both the object's integrity and its authenticity by allowing users to detect any changes to the object content. We implement this functionality through the 160-bit RIPEMD-160 message digest algorithm. The AES-encrypted blocks of data are used as input for the digest function, joining encryption/decryption and validation in a single step. The encrypted objects are stored in the IOS, while redundant copies of the integrity code are kept in secure storages, ensuring that authorized users can compare the integrity code with the digest of the encrypted object. A message integrity code provides integrity.
Additional measures for authenticity are explained in the next section, as well as the reason for not encrypting the integrity code.
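The following minimal Python sketch illustrates computing the integrity code over the encrypted blocks as they are produced, and cross-checking it against the copy in the object footer and the copies held by the key servers. It assumes the ciphertext blocks come from the AES step (not shown; the cipher mode is not specified here), and it falls back to SHA-256 as a stand-in when the local OpenSSL build behind Python's hashlib does not expose RIPEMD-160. The function names are hypothetical.

```python
import hashlib
import hmac


def new_digest():
    """RIPEMD-160 when hashlib provides it; otherwise SHA-256 as a stand-in."""
    try:
        return hashlib.new("ripemd160")
    except ValueError:  # digest not supported by this OpenSSL build
        return hashlib.sha256()


def integrity_code(encrypted_blocks) -> str:
    """Digest fed block by block with the ciphertext, so it can be computed in
    the same pass that encrypts (or decrypts) the object."""
    h = new_digest()
    for block in encrypted_blocks:
        h.update(block)
    return h.hexdigest()


def verify_object(encrypted_blocks, footer_code: str, key_server_codes) -> bool:
    """Recompute the code and require it to match the footer copy and every
    copy retrieved from the key servers; any mismatch returns False."""
    computed = integrity_code(encrypted_blocks)
    return all(hmac.compare_digest(computed, c) for c in [footer_code, *key_server_codes])


# Hypothetical ciphertext blocks standing in for AES output.
blocks = [b"\x13\x37" * 8, b"\xca\xfe" * 8]
code = integrity_code(blocks)
print(verify_object(blocks, code, [code, code]))                     # True
print(verify_object([b"\x00" * 16, blocks[1]], code, [code, code]))  # False
```

Because the digest covers ciphertext rather than plaintext, the check can run before decryption, and an attacker who alters the object in the IOS would also have to alter the copies held in independent administrative domains.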
On the other hand, guaranteeing the confidentiality of sensitive data outside the organization's borders additionally requires a decryption key management scheme. In our model, decryption keys are managed through a secret sharing scheme. Key distribution is performed by a client that divides the key into N different shares using Shamir's secret sharing scheme [33]. Key shares are distributed among different administrative domains that share the responsibility of protecting data from unauthorized disclosure. Only k shares (k < N) are needed to reconstruct a key. Key shares are pairs of values that relate the input and the output of a random polynomial of degree k − 1 whose constant term is the key. A sharing pair is represented as (IDKeyPart, Key), where IDKeyPart and Key are the input and the output of the polynomial, respectively. Key shares for the same decryption key must be placed at different key servers. The key server is a repository service provided by the model. Two key servers are different if and only if they are located in different administrative domains. This means that they are managed by different administrators, even if they share the same VO. It also means that any user who has been granted access to a share in a given administrative domain cannot reconstruct the decryption key without obtaining permission in k − 1 other administrative domains.
G. Distribution of the Key Shares and the Integrity Code
Distribution of key shares is one of the novel contributions of this paper. By placing the participants of the secret sharing scheme in different administrative domains, the information is protected from being exposed by users granted physical or administrator access, ensuring the confidentiality of the encrypted objects in the Grid. Each administrative domain needs to be enclosed within the boundaries of one organization.
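To make the (IDKeyPart, Key) sharing pairs concrete, the following minimal Python sketch implements textbook Shamir secret sharing over a prime field, using only the standard library. The choice of the Mersenne prime 2^521 − 1 is an assumption for illustration (the text only states that the prime is stored in the object header), and the function names are hypothetical.

```python
import secrets

# Illustrative field prime: the Mersenne prime 2**521 - 1, large enough for a
# 256-bit AES key. (Assumption: any prime larger than the key would do.)
PRIME = 2**521 - 1


def split_key(key: bytes, k: int, n: int, prime: int = PRIME):
    """Return n sharing pairs (IDKeyPart, Key); any k of them rebuild the key."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    secret = int.from_bytes(key, "big")
    if secret >= prime:
        raise ValueError("prime must be larger than the key")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):  # IDKeyPart; x = 0 would reveal the key itself
        y = 0
        for c in reversed(coeffs):  # Horner's rule modulo the prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def combine_shares(shares, key_len: int, prime: int = PRIME) -> bytes:
    """Lagrange interpolation at x = 0 over GF(prime) using k sharing pairs."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret.to_bytes(key_len, "big")


# k = 2, N = 2k - 1 = 3: any two administrative domains suffice to rebuild.
aes_key = secrets.token_bytes(32)
pairs = split_key(aes_key, k=2, n=3)
assert combine_shares([pairs[0], pairs[2]], key_len=32) == aes_key
```

Each pair would be stored on a key server in a different administrative domain, so a single administrator holds one point of a degree k − 1 polynomial, which reveals nothing about the constant term.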
The organization registered as a private data holder or as a private data processor must fulfill a set of legal responsibilities concerning keeping private data secure from unauthorized access or disclosure. Key servers not only store key sharing pairs, but also keep a copy of the integrity code of the encrypted object. Distributing the integrity code among real administrative domains ensures the integrity of the objects in the Grid, and the integrity code does not need to be encrypted to provide a reliable level of assurance. In this way, it becomes possible to validate the object's integrity by comparing its integrity code with the copies held in the key servers. Unauthorized attempts to modify any encrypted object on the Grid would require compromising the security of a group of services deployed by different administrative domains. Storing the integrity codes in the key servers also provides the model with a reliable permission revocation mechanism. When a user is revoked from a given VO group, he or she will no longer be able to access the objects using VO credentials. The problem arises when the user kept a decryption key after permission revocation and could use local administrative privileges to access the data in the storage elements. In these scenarios, the authenticity and the integrity of the objects are ensured by cross-validating the copy
of the integrity code within the encrypted object with the copies stored in the key servers available at different administrative domains. Compromising the integrity of an object therefore requires compromising at least k key servers located in k different administrative domains. Useful insight into the permission revocation issue was presented in a previous paper [34].
H. Administrative Domains in the VO
The different administrative domains that keep shares of the same decryption key may be part of the same VO. The VO context is the ideal scope for integrating independent organizations into data protection schemes. VOs are usually associated with a project related to a community where information objects are shared. The VO should agree on the values of k and N, as well as the number of key server replicas that each party should contribute to guarantee operation. Previous work [33] has shown that a very robust key management scheme can be achieved by using N = 2k − 1. As the participants of the secret sharing scheme are in different administrative domains, even the minimum value of k (k = 2) enhances the privacy of the data, since any user (even a local administrator of a storage element) needs to obtain access in k different administrative domains to reconstruct a decryption key. On the other hand, with k = 2, only N = 3 different administrative domains are needed. In the context of the VO, the administrative domains could be defined as the individual organizations that control private information and contribute their private data to the VO. In our model, each administrative domain is identified by an X.509 organizational unit attribute, along with the common name attribute of the certificate authority.
I. Publication in Monitoring and Information System (MIS)
The monitoring and information system (MIS) is a significant piece of Grid technology.
This component could be implemented in many different ways (e.g., GMA, MDS2, and MDS4), depending on the middleware used (gLite, GT2, and GT4), while the objective is the same: to collect and deliver information about Grid resources where and when needed. The MIS simplifies the distribution of key shares among the parties involved in the secret sharing scheme. Administrative domains integrated in VOs publish information about their key servers in the MIS, and clients query the MIS for available key servers in trusted and different administrative domains. Identifying a key server and its administrative domain requires the key server's URI, the local key server identifier (IDKS) relative to the administrative domain, and the identifier of the administrative domain. This information is published in the MIS and queried by the VO clients. Fig. 2 shows a schematic representation of the storage and management of encrypted data in the Grid. The top of the figure shows a user interacting with the MIS of the VO. The user queries the MIS for three key servers in three different administrative domains. At the base of Fig. 3, independent organizations affiliated with the VO are represented. Each organization contributes its own key servers to the decryption key sharing scheme. At the top of the figure, a CA issues security credentials to the members of the VO. The key servers register the organizations in the MIS index. Through the structure of the DNs, the administrative domains of the key servers are revealed. These different administrative domains are used by the encryption mechanism to ensure that different key parts are stored in different domains (as shown at the top of Fig. 2). The encrypted object is generated using a new encryption key and the information about the administrative domains. Once the decryption key shares are distributed among the parties (additional attributes published for the key servers are explained in the following sections), the encrypted object is submitted to the IOS. Since the confidentiality and integrity of the information object are protected by the framework through encryption and the integrity code, the IOS could be deployed anywhere in the VO computing environment.

Fig. 2. Schematic representation of the storage and management of encrypted data. It shows the encryption of an information object and the distribution of the key shares among different administrative domains.

Fig. 3. Schematic representation of the reconstruction of the decryption key and the decryption and validation of the encrypted object.
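The client-side step of choosing key servers from the MIS can be sketched as follows: one server per administrative domain, where a domain is the pair (organizational unit, CA common name) described in Section H. This is a minimal Python illustration assuming the MIS listing has already been retrieved and parsed; the record fields, the function name, and the example URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyServerEntry:
    """One key server as published in the MIS (field names are hypothetical)."""
    uri: str       # key server URI
    id_ks: str     # local key server identifier (IDKS)
    org_unit: str  # X.509 organizational unit of its administrative domain
    ca_cn: str     # common name of the certificate authority

    @property
    def domain(self) -> tuple:
        # Administrative domain = (organizational unit, CA common name).
        return (self.org_unit, self.ca_cn)


def pick_key_servers(entries, n, trusted_domains=None):
    """Choose n key servers, each in a different (trusted) administrative domain."""
    chosen, seen = [], set()
    for entry in entries:
        if trusted_domains is not None and entry.domain not in trusted_domains:
            continue
        if entry.domain in seen:
            continue  # two shares in one domain would weaken the scheme
        seen.add(entry.domain)
        chosen.append(entry)
        if len(chosen) == n:
            return chosen
    raise LookupError(f"only {len(chosen)} distinct domains available, {n} needed")


# Hypothetical MIS listing: Hospital A publishes two key servers.
mis = [
    KeyServerEntry("https://ks1.hospital-a.example/keys", "ks1", "HospitalA", "ExampleCA"),
    KeyServerEntry("https://ks2.hospital-a.example/keys", "ks2", "HospitalA", "ExampleCA"),
    KeyServerEntry("https://ks1.hospital-b.example/keys", "ks1", "HospitalB", "ExampleCA"),
    KeyServerEntry("https://ks1.hospital-c.example/keys", "ks1", "HospitalC", "ExampleCA"),
]
servers = pick_key_servers(mis, n=3)
header_domain_ids = [s.domain for s in servers]  # would go into the object header
```

With N = 3 this skips the second Hospital A server; the resulting domain identifiers are the ones a client would record in the encrypted object's header so that readers know where to ask for shares.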
J. Uniqueness of Information Objects
Whether an encrypted information object could be moved to a different IOS (or simply, whether it could be replicated) depends on how complex it is to modify the information linking the object with the IOS and with the key servers. The EOUID is a globally unique identifier that guarantees that encrypted information objects can be unambiguously identified. The EOUID is assigned the first time the object is encrypted, and is based on the Universally Unique Identifier (UUID) standard to guarantee uniqueness in time and space. By referring to an encrypted object through its EOUID, Grid repositories (i.e., the IOS and key servers) guarantee that the information derived from the object is detached from the physical location of the object in the Grid.
K. Encrypted Object's Data Format
Besides the encrypted bits, the encrypted object carries additional information. Along with the already mentioned integrity code, the encrypted object contains header fields, a body of encrypted bits, and a footer field. The prime number used to split the key is attached to the object in a header field. The rest of the header contains the N identifiers of the administrative domains that keep shares of the decryption key. The footer field is reserved for the integrity code.
L. Access Control With Ontology Attributes
The basic idea of access control with ontology attributes is not to define permissions directly between users and resources, but instead to use the resources' ontology attributes as the basis for authorization. Access control policies grant groups of users different privileges on ontologically classified resources. All services in the framework must enforce these policies on users; therefore, they must know which services in the Grid store the authorization statements that policy decision points (PDPs) use, together with the available attributes of the requester and the resource, to evaluate authorization.
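A PDP of this kind, as it might run in a key server's Gatekeeper, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an XACML engine: decisions reuse the XACML names Permit, Deny, and NotApplicable; the group-to-ontology statements stand in for what the Ontology Server would return; the per-EOUID ontology Ids are those the key server keeps locally; and all class names, group names, and the EOUID value are hypothetical.

```python
from dataclasses import dataclass, field

PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "NotApplicable"  # XACML decision names


@dataclass(frozen=True)
class AccessRequest:
    """Built by the PEP: subject attributes, resource id, and action."""
    subject_groups: frozenset  # VO groups asserted in the user's proxy
    eouid: str                 # encrypted object whose key share is requested
    action: str = "retrieve-key-share"


@dataclass
class KeyServerPDP:
    group_ontologies: dict   # Ontology Server statements: group -> ontology Ids
    object_ontologies: dict  # ontology Ids stored in this key server per EOUID
    banned_groups: set = field(default_factory=set)  # local Gatekeeper policy

    def decide(self, req: AccessRequest) -> str:
        if req.action != "retrieve-key-share":
            return NOT_APPLICABLE
        if req.subject_groups & self.banned_groups:
            return DENY  # local policy overrides the VO policy
        ontologies = self.object_ontologies.get(req.eouid)
        if ontologies is None:
            return NOT_APPLICABLE  # this key server holds no share for the object
        for group in req.subject_groups:
            if self.group_ontologies.get(group, set()) & ontologies:
                return PERMIT
        return DENY


# Hypothetical CVIMO-like policies and object.
EOUID = "6f1c2c1e-0b5a-4c3e-9d0e-2a7b1f4e8c90"
pdp = KeyServerPDP(
    group_ontologies={"/cvimo/lung": {"onto-lung"}, "/cvimo/liver": {"onto-liver"}},
    object_ontologies={EOUID: {"onto-lung"}},
)
print(pdp.decide(AccessRequest(frozenset({"/cvimo/lung"}), EOUID)))   # Permit
print(pdp.decide(AccessRequest(frozenset({"/cvimo/liver"}), EOUID)))  # Deny
```

Reading the object's ontology Ids from the key server's own records, rather than from a remote IOS, is what keeps a reclassification in an untrusted IOS from widening access to the key share.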
The previous sections discussed where policies and other authorization attributes are stored in the framework: the VOMS server is the repository where VO groups and roles are created and maintained, the ontology server stores the authorization statements that define the relations between VO groups and ontologies, and the IOS defines the ontological classification of the information objects. As discussed before, the IOS can be deployed anywhere in the Grid; therefore, an IOS outside the administrative domain is not a trusted source of ontology attributes for key servers. Conversely, when the key server itself is the source of ontology attributes for its administrative domain, changing an encrypted object's ontology in the IOS does not affect the security of the object. By keeping its own list of ontologies for the object, the key server guarantees the security of the key, and thus the security of the encrypted object. Besides the decryption key share, the IDKeyPart, the integrity code, and the EOUID, the object owner stores a list of ontology identifiers (Ids. Ontologies) for the encrypted object in the key server. Hence, authorization to key shares is granted on the basis of predefined ontologies related to the encrypted object. As a consequence, updates to ontology identifiers must be synchronized among key servers, and this model works better for applications in which the ontological classification of encrypted objects varies little over time.
M. Rebuilding Keys and Decrypting Information
When a Grid user wants to retrieve an encrypted object identified by its EOUID, the user is first authenticated, and then the IOS collects the attributes from the user's proxy (Fig. 3, step 1). The IOS then consults the ontology server to find out whether the user belongs to any of the VO groups allowed to access the ontologies related to the object. If authorized by the IOS, the user retrieves the encrypted object (Fig. 3, step 2). Once the user has retrieved the encrypted object, he or she extracts the administrative domain identifiers from its header (Fig. 3, step 3). Then, the user consults the MIS for the URIs of the key servers (see Fig. 2) and contacts k key servers to retrieve the key shares (Fig. 3, step 4).
The role of the different components involved in the security scheme can be explained through an example in the terminology of the XACML standard. When a user asks the key server for a key share, the code responsible for executing the request contains a policy enforcement point (PEP) that creates an access request. The access request contains the attributes that identify the user, the attributes of the encrypted object associated with the key share (its ontologies), and the action being performed on the resource (retrieving a decryption key). The PEP sends this description of the attempted access to the Gatekeeper. The Gatekeeper implements a PDP that consults the ontology server for policies matching the requester's group membership to the ontologies (Fig. 3, steps 5 and 6), and also consults local security policies (e.g., resource-specific ACLs).
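Step 4 relies on k-out-of-N sharing of the decryption key, with a prime carried in the object header. The sketch below assumes this is Shamir's threshold scheme [33] over GF(p). The fixed prime, the function names, and the use of 1..N as share x-coordinates are illustrative assumptions, not the framework's API.

```python
import secrets

# Assumption: a fixed public prime larger than any 256-bit key.
# 2**521 - 1 is a Mersenne prime; in the framework described above,
# the prime travels in the encrypted object's header instead.
PRIME = 2**521 - 1


def split_key(secret: int, k: int, n: int, prime: int = PRIME) -> list[tuple[int, int]]:
    """Split `secret` into n shares (x, y) so that any k of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):       # one x-coordinate per key server
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo the prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def rebuild_key(shares: list[tuple[int, int]], prime: int = PRIME) -> int:
    """Recover the secret by Lagrange interpolation at x = 0 over GF(prime)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % prime
                den = den * (xi - xj) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# Usage: a 256-bit key split across N = 5 domains, rebuilt from any k = 3.
key = int.from_bytes(secrets.token_bytes(32), "big")
shares = split_key(key, k=3, n=5)
assert rebuild_key([shares[0], shares[2], shares[4]]) == key
```

With fewer than k shares, every candidate secret is equally consistent with what has been collected. A single compromised key server therefore learns nothing about the key, which is why authorization is checked independently at each of the k servers.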
The PDP then evaluates the access request and issues an authorization decision, which it returns to the PEP. Finally, the PEP either executes the code that retrieves the key share or throws a denial exception. If authorized, the user retrieves k different shares and k copies of the integrity code (Fig. 3, step 7). With the k share pairs, the user reconstructs the decryption key, decrypts the object, and computes the integrity code (Fig. 3, steps 9 and 10). The user then verifies the computed integrity code against the code stored within the encrypted object and against the codes retrieved from the key servers.
IV. REAL IMPLEMENTATION AND DEPLOYMENT
Storage and distribution of radiological images and reports in clinical practice within a single organization is a well-solved issue, with many successful industrial deployments. However, sharing data for research and training raises additional problems, such as knowledge organization, privacy, and processing. A representative use case targeted by this paper is executing a perfusion analysis on all the images from patients suffering from hepatocarcinoma and retrieving the flow rate coefficient images. This cannot be done with current image management systems used in clinical practice, even when only one institution or administrative domain is involved. Integrating multiple sources will increase the representativeness of the study, and integrating computing resources will enable complex postprocessing.
Fig. 4. Grid.
The model presented in this paper has been implemented in the framework of the CVIMO [7] project. All the implemented services are based on OGSA/WSRF, which constitutes the Grid architecture and infrastructure of the project. The implementation uses Globus Toolkit 4, with MDS4 acting as the MIS [35]. CVIMO is a project funded by the Ministry of Enterprises, University and Science of the Valencia Region. In this case, the ontologies are built from an anonymized set of attributes from DICOM headers or DICOM-SR fields. This set is controlled centrally by the VO, under the approval of the system management, so no privacy leaks occur. Relevant cases are organized into three oncology communities (lung, liver, and central nervous system). CVIMO does not compete with intrahospital systems such as picture archiving and communication systems (PACS) or RIS/HIS systems, which are oriented to daily clinical practice; rather, it complements them with a collaborative tool to store and share cases relevant for research or training. A VO named CVIMO and three VO groups, one for each oncology community, have been created using a VOMS server. The studies relevant to each group are defined through the ontology using part of their information. These ontologies are defined in XML. When a user performs a query operation, he or she can access only the information of the DICOM studies or DICOM structured reports specified by the ontologies associated with his or her groups. When a medical user selects a relevant DICOM study for sharing (point 1, Fig. 4), he or she first creates the DICOM-SR structured report with a given diagnosis (point 2, Fig. 4).
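The query restriction described above (a user sees only the studies matched by the ontologies associated with his or her VO groups) can be sketched as follows. CVIMO's ontologies are said to be XML over anonymized DICOM attributes, but their schema is not given here. The `<ontology>`/`<condition>` layout, the equality-only matching, and the attribute values below are hypothetical assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: an <ontology> lists attribute conditions that an
# anonymized DICOM study must satisfy to be classified under it.
ONTOLOGIES_XML = """
<ontologies>
  <ontology id="onto-liver">
    <condition attribute="Modality" equals="CT"/>
    <condition attribute="BodyPartExamined" equals="LIVER"/>
  </ontology>
  <ontology id="onto-lung">
    <condition attribute="BodyPartExamined" equals="LUNG"/>
  </ontology>
</ontologies>
"""


def load_ontologies(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Parse ontology id -> list of (DICOM attribute, required value)."""
    root = ET.fromstring(xml_text.strip())
    return {
        onto.get("id"): [(c.get("attribute"), c.get("equals"))
                         for c in onto.findall("condition")]
        for onto in root.findall("ontology")
    }


def classify(study: dict[str, str], ontologies: dict[str, list[tuple[str, str]]]) -> set[str]:
    """Ontology ids whose conditions the study satisfies (all must match)."""
    return {oid for oid, conds in ontologies.items()
            if all(study.get(attr) == value for attr, value in conds)}


def visible_studies(studies, user_ontologies: set[str], ontologies):
    """Keep only studies classified under an ontology the user's groups may query."""
    return [s for s in studies if classify(s, ontologies) & user_ontologies]


onts = load_ontologies(ONTOLOGIES_XML)
studies = [
    {"Modality": "CT", "BodyPartExamined": "LIVER"},
    {"Modality": "CT", "BodyPartExamined": "LUNG"},
]
print(visible_studies(studies, {"onto-liver"}, onts))
# [{'Modality': 'CT', 'BodyPartExamined': 'LIVER'}]
```

Only anonymized attributes appear in the conditions, so evaluating an ontology never requires access to identifying patient data.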
Next, the DICOM study and the DICOM-SR are sent to the IOS so that the information is shared only with the users of the community related to the study. If so, the object is encrypted (point 4, Fig. 4) and then inserted into the IOS. The encryption operation involves consulting the ontology identifiers that the user can manage (points 4.1 and 4.2, Fig. 4), generating an EOUID for the encrypted object (points 4.3 and 4.4, Fig. 4), and creating and distributing the encryption key (point 4.3, Fig. 4).
Fig. 5. Execution time for a set of studies when registering and retrieving objects with and without encryption (left), and difference between both (right).
The services implemented in this system are the following.
1) Ontology Server: Keeps the ontologies and the relations between VO groups and ontologies.
2) Key Server: Keeps, for each key part, the associated information (message integrity code (MIC), EOUID, IDs of ontologies, IDKeyPart, and the key part).
3) EOUID Server: Generates the EOUIDs required to identify the encrypted objects.
4) DICOM Storage: Stores the encrypted DICOM studies and DICOM-SRs.
V. RESULTS AND DISCUSSION
A sample dataset of radiology studies has been created. Each file in the dataset consists of a radiology image accompanied by relevant (anonymized) clinical data. Four studies with different file sizes (see Table I) were used to create the sample set. Images were first encrypted and stored in an IOS. Unencrypted images were also stored in the IOS to measure the differences between encrypted and unencrypted objects. The time taken to register and retrieve an object in the IOS was measured at a client of the infrastructure. Fig. 5 shows the execution time for the set of studies. In Fig. 5 (left), four series of experimental values are represented. On the one hand, crypt-up and crypt-down show the time used for registering and retrieving objects, including encrypting and decrypting the object and key sharing. On the other hand, up and down show the time used for registering and retrieving objects without encryption. Each point in the graph represents the average time measured by the client clock, and the error bars show the standard error calculated for each data point.
Fig. 5 (right) shows the difference D, calculated as D = (time with encryption) − (time without encryption). This graph shows that, even though the difference tends to grow for larger objects, the overhead can be estimated within a given size interval. For example, the results of this study show that, for objects between 0.5 and 7.7 MB, retrieving an object with encryption can be expected to take up to 4 s longer than the same process without encryption. This is consistent with the initial studies, which demonstrated that the overhead of the security model is acceptable even for interactive use. Several key shares for different objects can also be grouped in a single request, so the overhead of retrieving the decryption keys could be reduced when several objects are used in the same study. This is a common need in medical research and training, the two main objectives of the system.
VI. CONCLUSION
Healthgrids must support the flow of information across hospital network boundaries. Encrypted storage is needed to ensure data privacy across different administrative domains. Sharing encrypted objects requires an infrastructure to manage, protect, and control access to the encryption keys. Furthermore, decryption keys have a lifecycle, whose management this paper addresses through ontology-organized key management for long-term storage. The novelty of the approach is to bind the authorization of users to the actual data automatically, through ontologies that specify the accessible data and the relations between VO groups and those ontologies, instead of using the classical ACL approach.
Another novelty is the definition of a distributed security enforcement scheme that takes advantage of the ontologies to distribute and manage the encryption keys securely. The DICOM fields (headers or DICOM-SR tags) used to build the ontologies are anonymized beforehand, which ensures that almost all fields can be used and results in a comprehensive set of ontologies. The information-centric approach presented in this paper, which combines securing the data with protecting and controlling access to the decryption keys, has proven effective in preventing incidents of exposed data due to inconsistent encryption and key management policies, in preventing incidents of inaccessible data due to mismanagement of decryption keys, and in helping communities increase the consistency of encryption and key management policies across organizational boundaries. In addition, this paper contributes to clarifying responsibilities and to the creation of encryption and key management policies and practices.
The overhead due to encryption and decryption is not significant compared with the data transfer overhead, and both processes are performed on the client side to improve scalability. The ontologies are connected to objects through both the IOS and the key server. Duplicating this layer of access control could penalize performance when changes in the ontologies are propagated, but it delivers higher scalability when the ontology associations rarely change over time. Ontology updates follow a lazy revocation scheme: when the ontologies change, a new object with a new EOUID is created, reducing the need for massive re-encryption. This update management could be inefficient for objects that frequently change their ontological classification; however, medical imaging Grids normally deal with read-only, persistent data, which minimizes this issue.
REFERENCES
[1] I. E. Magnin and J. Montagnat, "The grid and the biomedical community: Achievements and open issues," presented at the EGEE User Forum, CERN, Geneva, Switzerland, Mar. 1–3, 2006.
[2] J. M. Schopf, "Grids: The top ten questions," Sci. Program., vol. 10, no. 2, pp. 103–111, 2002.
[3] Ł. Skitał, R. Słota, D. Nikolow, and J. Kitowski, "Methodology for virtual organisation design and management," presented at the EGEE User Forum, CERN, Geneva, Switzerland, Mar. 1–3, 2006.
[4] I. Blanquer, V. Hernandez, and J. D. Segrelles, "An OGSA middleware for managing medical images using ontologies," J. Clin. Monit. Comput., vol. 19, pp. 295–305, Oct. 2005.
[5] Digital Imaging and Communications in Medicine (DICOM), Part 10: Media Storage and File Format for Media Interchange, National Electrical Manufacturers Association, Rosslyn, VA, 2008.
[6] I. Blanquer, V. Hernandez, and D. Segrelles, "TRENCADIS: A WSRF grid middleware for managing DICOM structured reporting objects," Stud. Health Technol. Inf., vol. 120, pp. 381–391, 2006.
[7] Ciberinfraestructura Valenciana de Imagen Médica Oncológica (CVIMO). (2008). [Online]. Available: http://www.grycap.upv.es/cvimo
[8] V. Breton, K. Dean, and T. Solomonides, Eds., "The Healthgrid white paper," in From Grid to Healthgrid: Proc. Healthgrid 2005, Stud. Health Technol. Inf., vol. 112. Amsterdam, The Netherlands: IOS Press, pp. 249–321.
[9] J. Montagnat, A. Frohner, D. Jouvenot, C. Pera, P. Kunszt, B. Koblitz, N. Santos, C. Loomis, R. Texier, D. Lingrand, P. Guio, R. Brito Da Rocha, A. Sobreira de Almeida, and Z. Farkas, "A secure grid medical data manager interfaced to the gLite middleware," J. Grid Comput. (JGC), vol. 6, no. 1, pp. 45–59, Mar. 2008.
[10] C. Blanchet, R. Mollon, and G. Deléage, "Building an encrypted file system on the EGEE grid: Application to protein sequence analysis," in Proc. 1st IEEE Int. Conf. Availability, Rel. Security (ARES 2006), pp. 965–973.
[11] EGEE: Enabling Grids for E-sciencE (Phase I and II). FP6 European IST Project, Contract Number INFSO-RI-508833. (2008). [Online]. Available: http://www.eu-egee.org
[12] World Wide Web Computing Grid. Distributed Production Environment of Physics Data Processing. (2008). [Online]. Available: http://lcg.web.cern.ch/LCG
[13] S. Varrette, J. L. Roch, J. Montagnat, J. M. Pierson, L. Seitz, and F. Leprevost, "Safe distributed architecture for image-based computer assisted diagnosis," presented at the IEEE Int. Conf. Pervasive Serv. (ICPS 2006), Workshop Health Pervasive Syst., Lyon, France, Jun. 2006.
[14] L. Seitz, J. M. Pierson, and L. Brunie, "Encrypted storage of medical data on a grid," Methods Inf. Med., vol. 44, no. 2, pp. 198–201, Feb. 2005.
[15] D. Scardaci and G. Scuderi, "A secure storage service for the gLite middleware," in Proc. IAS 2007: 3rd Int. Symp. Inf. Assurance Security, Manchester, U.K., pp. 261–266.
[16] gLite middleware. (2008). [Online]. Available: http://www.glite.org
[17] M. Hadzic and E. Chang, "Role of the ontologies in the context of grid computing and application for the human disease studies," in Proc. Int. Conf. Semantics Networked World: Semantics Grid Databases (ICSNW 2004), pp. 316–318.
[18] T. Moses, Ed., eXtensible Access Control Markup Language (XACML) Version 2.0, OASIS Standard, 2005.
[19] OASIS Security Services TC. (2005). Security Assertion Markup Language (SAML), v2.0 [Online]. Available: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=security (10.02.2007).
[20] C. Schläger, T. Priebe, M. Liewald, and G. Pernul, "Enabling attribute based access control in authentication and authorisation infrastructures," presented at the 20th Bled eConference eMergence (Bled 2007), Bled, Slovenia.
[21] B. Lang, I. Foster, F. Siebenlist, R. Ananthakrishnan, and T. Freeman, "Attribute based access control for grid computing," Math. Comput. Sci. (MCS) Div., Argonne Nat. Lab., Argonne, IL, Preprint ANL/MCS-P1367-0806, Aug. 2006.
[22] W. Xiaopeng, L. Junzhou, S. Aibo, and M. Teng, "Semantic access control in grid computing," in Proc. 11th Int. Conf. Parallel Distrib. Syst. (ICPADS 2005), vol. 1, pp. 661–667.
[23] B. Shields, O. Molloy, G. Lyons, and J. Duggan, "Securing web services using semantic web technologies," in Proc. 1st Int. IFIP/WG12.5 Working Conf. Ind. Appl. Semantic Web (IASW 2005), pp. 213–222.
[24] Towards Open Grid Services Architecture. (2008). [Online]. Available: http://www.globus.org/ogsa
[25] The Web Services Resource Framework. (2008). [Online]. Available: http://www.globus.org/wsrf
[26] Grid Security Infrastructure. (2008). [Online]. Available: http://www.globus.org/security/overview.html
[27] M. Thompson, A. Essiari, and S. Mudumbai, "Certificate-based authorization policy in a PKI environment," ACM Trans. Inf. Syst. Security (TISSEC), vol. 6, no. 4, pp. 566–588, Nov. 2003.
[28] D. Chadwick and A. Otenko, "The PERMIS X.509 role based privilege management infrastructure," Future Generation Comput. Syst., vol. 19, no. 2, pp. 277–289, Feb. 2003.
[29] V. Welch, T. Barton, K. Keahey, and F. Siebenlist, "Attributes, anonymity and access: Shibboleth and globus integration to facilitate grid collaboration," presented at the 4th Annual PKI R&D Workshop, Gaithersburg, MD, Apr. 2005.
[30] R. Alfieri, R. Cecchini, V. Ciaschini, L. dell'Agnello, A. Frohner, A. Gianoli, K. Lőrentey, and F. Spataro, "VOMS, an authorization system for virtual organizations," presented at the 1st Eur. Across Grids Conf., Santiago de Compostela, Spain, 2003.
[31] S. Tuecke, D. Engert, I. Foster, V. Welch, M. Thompson, L. Pearlman, and C. Kesselman, Internet X.509 Public Key Infrastructure Proxy Certificate Profile, The Internet Society, Reston, VA, draft-ggf-gsi-proxy-04, 2002.
[32] FIPS 197. (2001, Nov. 26). Announcing the Advanced Encryption Standard [Online]. Available: http://www.csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[33] A. Shamir, "How to share a secret," Commun. ACM, vol. 22, pp. 612–613, 1979.
[34] E. Torres, C. de Alfonso, I. Blanquer, and V. Hernandez, "Privacy protection in HealthGrid: Distributing encryption management over the VO," in Proc. HealthGrid 2006, Stud. Health Technol. Inf., vol. 120, pp. 131–141.
[35] I. Blanquer, V. Hernandez, D. Segrelles, and E. Torres, "Long-term storage and management of encrypted biomedical data in real scenarios," in Proc. Int. Conf. Emerging Security Inf., Syst., Technol. (SECURWARE 2007), Valencia, Spain, pp. 77–82.
I. Blanquer received the Ph.D. degree in computer sciences. He has been an Assistant Professor at the Computer System Department (DSIC), Polytechnic University of Valencia (UPV), Valencia, Spain, since 1999. He has been involved in parallel computation and medical image processing for 12 years and has participated in 17 national and European research projects. He is currently a Research Fellow at the Institute for the Applications of Advanced Information and Communication Technologies (ITACA) and the Network Centre for Biomedical Engineering (CRIB), UPV, and a Member of the Board of Directors of the HealthGrid Association, Pont du Chateau, France.
V. Hernández received the Ph.D. degree in applied mathematics. He is a Full Professor of computer science and artificial intelligence and the Leader of the Grid and High Performance Computing Group (GRyCAP), Valencia University of Technology, Valencia, Spain. He is also associated with the Spanish e-Science Network (NGI), Polytechnic University of Valencia (UPV), Valencia, as a Scientific Coordinator. He was the Vice-Chancellor of Research, Development, and Innovation of the UPV during 2000–2005. He has extensive experience in parallel and grid computing and numerical methods. He has managed and participated in more than 25 European projects, from the III to the VI Framework Program, along with many national and regional projects.
D. Segrelles received the Ph.D. degree in computer sciences. He has been a Researcher at the Institute for the Applications of Advanced Information and Communication Technologies (ITACA), Polytechnic University of Valencia (UPV), Valencia, Spain, since 2001. He has been involved in Grid technologies and medical image processing for 7 years and has participated in seven national and European research projects. He is currently a Research Fellow at the Grid and High Performance Computing Group (GRyCAP), Valencia University of Technology, Valencia, and an Associate Professor at this university.
E. Torres is currently working toward the Ph.D. degree at the Department of Information Systems and Computation, Polytechnic University of Valencia (UPV), Valencia, Spain. He has been involved in Grid technologies and security for 3 years and has participated in three national and European research projects. He is currently a Research Fellow at the Institute for the Applications of Advanced Information and Communication Technologies (ITACA), UPV.