International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(4): 416-427
Published online December 25, 2024
https://doi.org/10.5391/IJFIS.2024.24.4.416
© The Korean Institute of Intelligent Systems
Narendra Shyam Joshi1, Kuldeep P. Sambrekar1, Abhijit J. Patankar2, Archana Jadhav3, and Prajakta Ajay Khadkikar4
1Department of Computer Science and Engineering, KLS Gogte Institute of Technology, Visvesvaraya Technological University, Belagavi Karnataka, India
2Department of Information Technology, D Y Patil College of Engineering, Savitribai Phule Pune University, Akurdi Pune, India
3Department of Artificial Intelligence and Data Science, D Y Patil Institute of Engineering Management and Research, Savitribai Phule Pune University, Akurdi Pune, India
4Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Savitribai Phule Pune University, Dhanakavadi Pune, India
Correspondence to: Narendra Shyam Joshi (nsjsandip100@gmail.com)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The growing use of cloud services has been accompanied by growing concern over the security of the data they handle. Many users worry that sensitive information may be eavesdropped on while it is transmitted via cloud services, and unauthorized access to a firm's cloud tools can expose its databases to damage or misuse. Encryption is therefore crucial for securing data sent to the cloud, but searching large volumes of encrypted user data poses challenges not encountered in other scenarios. We reviewed previous studies to examine how other researchers have addressed these difficulties. Our main objective is to deliver the fastest achievable search speed while maintaining the highest levels of data protection and computing efficiency. We developed a data encryption method designed specifically for cloud computing, and we show that a greedy depth-first search (GDFS) ranked search technique significantly simplifies the retrieval of encrypted content while making efficient use of cloud resources. We are also exploring methods for integrating multiple encryption algorithms into a unified scheme that is compatible with various techniques.
Keywords: Efficient keyword search, Privacy preservation, Binary tree search, Ranked DFS search, Cloud-data encryption
Cloud computing has revolutionized various fields in recent years by transforming data storage, processing, and access. It is rapidly gaining popularity among enterprises and individuals owing to its scalability, agility, and affordability. However, the potential lack of security of data stored on cloud servers is a significant problem: organizations must protect sensitive data from being lost, stolen, or compromised, and cloud environments present new risks and challenges. Although traditional security measures, such as access control and encryption, are widely used, they may be insufficient, so novel strategies are required to improve cloud data security. In this study, we propose a hybrid strategy that combines ranked searching and greedy depth-first search (GDFS) [1] to address the issue of cloud data security. Our objective is to maximize retrieval efficiency while minimizing waiting times by utilizing the GDFS algorithm [2], which is especially well suited to searching and navigating huge datasets. The proposed method can increase the effectiveness of data retrieval from the cloud by using this technique. To further enhance users' privacy and safety, the proposed hybrid technique incorporates ranked searching: encrypting and indexing sensitive data with ranked search reduces the risk of the data falling into the wrong hands, as depicted in Figure 1.
The proposed solution uses a methodology that guarantees cloud data secrecy. The key security problems in cloud computing can be addressed by employing a hybrid strategy that combines enhanced security measures with improved data retrieval speeds. Enhancing data access and implementing encryption measures can reduce the chances of data breaches, unauthorized access, and data leakage. The proposed technique strikes a balance between data protection and system performance to ensure that security measures do not hinder cloud-computing scalability and adaptability. This study introduces an innovative method to safeguard data stored in the cloud. It combines the advantages of ranked search techniques with a GDFS. By addressing serious security issues, utilizing effective data retrieval, and boosting data protection, this approach provides an effective solution for safeguarding sensitive data stored in the cloud. The findings of this study provide important guidance for improving cloud security, which will drive advancement in the cloud-computing industry.
The objectives of this study are as follows:
1. Understand how data enter and are stored in a cloud system.
2. Create a system that can enhance both the performance and privacy by integrating a cloud-based multi-keyword ranked search with GDFS encryption.
3. Experimentally demonstrate that the proposed methodology outperforms the existing multi-key encryption method.
The remainder of this paper is structured as follows: in Section 2, we present the groundwork for our study, including a formal definition and security model, and address specific queries. Section 3 provides a detailed explanation of the creation of the search strategy for semantic terms, including relevant examples. Section 4 presents the security analysis. Section 5 presents the results of both theoretical and experimental investigations. The conclusion is presented in Section 6.
Extensive research has been conducted on techniques for searching encrypted cloud data. While some researchers have focused on methods for recovering lost information, others have endeavored to improve the outcomes of search engine queries. Below, we summarize the most noteworthy studies on this topic. One method was devised to expedite the identification of subgraphs pertinent to a specified query; it reduced the search area by exploiting the graph's topology and priority-queue processing. Experimental results demonstrate that this strategy outperforms other current approaches in terms of effectiveness, precision, and efficiency when handling large graphs. The most recent approach for performing ranked searches on large graphs uses a fast GDFS algorithm with pruning techniques to optimize the search process and enhance overall efficiency. Specifically, each subgraph is assigned a score according to its level of correspondence with the query, which allows subgraphs to be extracted proficiently from large graphs. Another method employs geometric mean fusion, reported in a journal on ambient intelligence and humanized computing [3], to enhance the speed and accuracy of subgraph retrieval; its objective is to identify subgraphs that closely match a given query by computing similarity scores for all subgraphs and combining the similarity metrics through geometric mean fusion. An improved version of the GDFS ranked search method that considers the entire node distance was introduced in a 2020 publication in a cluster computing journal [4]. The results of the literature analysis are listed in Table 1 [3–7].
We developed a GDFS ranked search algorithm as an improved method for implementing search algorithms [8]. This method was developed expressly for this study, and is based on the cumulative distance between nodes contained within subgraphs. Because the cloud server is responsible for managing both the encrypted searchable tree index I and encrypted document collection D, the owner of the data is assured control over both. By using the term-document combination that the user has requested in the index tree, the cloud server can obtain the top-k encrypted documents that have the highest ratings. Every change made by the data owner must be reflected in document collections D and I, which are stored on the server. An efficient method that uses a priority queue is proposed to simultaneously retrieve relevant subgraphs for a large number of requests. Additionally, it narrows the search field by employing a pruning technique based on dynamic programming, making the search process more straightforward. Table 1 presents the results of this comparison.
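To make the priority-queue-based, pruned ranked retrieval concrete, the following Python sketch shows one plausible realization of a greedy best-first traversal that returns the top-k leaves of an index tree. The Node class, the precomputed score bounds, and all names are illustrative assumptions, not the paper's actual implementation.

import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    score: float                     # precomputed relevance upper bound for this subtree
    doc_id: str = ""                 # set on leaf nodes only
    children: list = field(default_factory=list)

def ranked_dfs_top_k(root: Node, k: int):
    """Greedy best-first traversal with pruning.

    Nodes are expanded in decreasing order of their score bound, and any
    subtree whose bound cannot beat the current k-th best leaf is pruned,
    so large parts of the tree are never visited.
    """
    counter = 0                      # tie-breaker so the heap never compares Nodes
    frontier = [(-root.score, counter, root)]
    top_k = []                       # min-heap of (score, counter, leaf)

    while frontier:
        neg_bound, _, node = heapq.heappop(frontier)
        bound = -neg_bound
        if len(top_k) == k and bound <= top_k[0][0]:
            continue                 # prune: this subtree cannot improve the top-k
        if not node.children:        # leaf = candidate document
            counter += 1
            heapq.heappush(top_k, (node.score, counter, node))
            if len(top_k) > k:
                heapq.heappop(top_k)
        else:
            for child in node.children:
                counter += 1
                heapq.heappush(frontier, (-child.score, counter, child))
    return [n for _, _, n in sorted(top_k, reverse=True)]

The design choice mirrors the text: a priority queue orders expansion by relevance, and the running k-th best score acts as the dynamic-programming-style pruning bound that narrows the search field.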
Efficient, secure, and privacy-preserving multi-keyword ranked search encryption for data stored in the cloud has become a critical area in cloud security. Currently, traditional encryption schemes, hierarchical indexing, and simple searching are the most widely used techniques for ranking search functionalities while preserving privacy at the expense of performance. These methods employ structures such as the secure k-nearest neighbor (k-NN) and term frequency-inverse document frequency (TF-IDF) scoring mechanisms to rank the search results. However, they suffer from high computational overhead, poor scalability, and vulnerability to inference attacks. To address these issues, we propose a new approach in the form of GDFS-based encryption. This method implements a depth-first traversal technique along with dynamic optimization of the search process. GDFS [9] is a hierarchical encryption-based approach to data retrieval in which data are encrypted and the traversal mechanism greedily evaluates query terms and ranking scores, achieving a faster search with less computational overhead. In contrast to breadth-first or other exhaustive approaches, GDFS follows a depth-first paradigm and prunes irrelevant nodes early in the search, thereby avoiding unnecessary examination of irrelevant data. This speeds up searches and limits the exposure of metadata, strengthening the privacy guarantees. Furthermore, the encryption is continuously updated according to the relevance of the keywords and how frequently they appear, thereby optimizing the ranking while preserving data confidentiality. The performance evaluation of the proposed GDFS approach demonstrated notable improvements in query latency, encryption complexity, and data security compared with existing schemes. By integrating obfuscation techniques and relevance-preserving transformations [9], the resulting protocols are privacy-preserving, rendering adversaries unable to infer sensitive information from access patterns. However, current techniques depend heavily on static encryption and ranking algorithms, which do not evolve gracefully as datasets become larger or more complicated. GDFS adapts natively to provide a flexible and robust solution under realistic dynamic query frequencies in real-time cloud environments. The analysis also uses greedy optimization to provision computational resources dynamically, thereby improving energy efficiency, which is a key requirement for large-scale cloud implementations. This approach fills the gap by balancing privacy enhancement with usability, enabling end users and organizations to use secure cloud storage solutions for processing sensitive data. The GDFS encryption paradigm proposed in this study integrates advanced cryptographic techniques and efficient search mechanisms to achieve a promising tradeoff between the security, privacy, and efficiency of cloud-based multi-keyword ranked searches.
The GDFS search procedure has the following key properties:
· It traverses the index tree depth-first, starting at the root.
· Higher-weight nodes are expanded first, because they contain higher-relevance keywords.
Document preprocessing encompasses all text-processing steps applied to the datasets to extract relevant keywords [13], including tokenization, stopword removal, lemmatization, and TF-IDF weighting; a sketch of this stage is given below.
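The following minimal Python sketch illustrates this preprocessing stage. The stopword list is a small illustrative subset, and lemmatization is approximated by lowercasing (a real system would use a lemmatizer such as WordNet's), so this is a simplified stand-in for the paper's pipeline rather than its exact implementation.

import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are"}

def preprocess(documents, tf_threshold=1):
    """Tokenize, drop stopwords, and compute TF-IDF weights per document."""
    tokenized = [
        [t for t in re.findall(r"[a-z]+", d.lower()) if t not in STOPWORDS]
        for d in documents
    ]
    n_docs = len(documents)
    # Document frequency of each term across the collection (for IDF).
    df = Counter(t for doc in tokenized for t in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({
            t: c * math.log(n_docs / df[t])      # TF x IDF
            for t, c in tf.items()
            if c >= tf_threshold                 # TF threshold filter from the text
        })
    return weights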
The proposed system comprises three entities: an information provider, an information consumer, and a cloud storage server, as shown in Figure 2.
The performance was evaluated using the REUTERS-21578 dataset, with the parameter settings detailed in Table 2. We expected precision, recall, accuracy, and F1-score of 90% and above for the listed encryption algorithms, including the proposed algorithm. The k-means technique, implemented in Python, was used to cluster the texts into 5 or 10 groups; the text count per cluster is shown in Table 2, and the number of texts was varied from 1,000 to 10,000 for comparative analysis. By concatenating the outputs of the various hash functions, binary indices of 2,688 bits were produced and then shortened to 448 bits using a reduction factor of 6. The proposed scheme supports efficient conjunctive searching through keyword-field-free indexes, as initially suggested by Li et al. [17]. It differs from existing schemes by reducing the number of texts examined to retrieve relevant results: it examines only the texts within the matching cluster, thereby reducing comparison counts and search time. The proposed scheme was compared with an existing scheme [18], implemented using the same parameters [19] as those described in Table 2.
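The index construction can be illustrated with a hedged Python sketch. We assume, since the paper does not state the mechanism exactly, that per-keyword digests from several keyed hash functions are concatenated and that the reduction by a factor of 6 is an XOR-fold of the bit string into equal slices; the keys, digest choices, and function names are hypothetical.

import hashlib
import hmac

HASHES = (hashlib.sha256, hashlib.sha384, hashlib.sha512)

def keyword_index_bits(keyword, keys, reduction_factor=6):
    """Build a compact binary index entry for one keyword.

    Concatenates HMAC digests under each key and hash function, then
    XOR-folds the resulting bit string into `reduction_factor` equal
    slices (assumes the total bit length is divisible by the factor).
    """
    digest = b"".join(
        hmac.new(k, keyword.encode(), h).digest()
        for k in keys for h in HASHES
    )
    bits = int.from_bytes(digest, "big")
    total_bits = len(digest) * 8
    slice_bits = total_bits // reduction_factor
    folded = 0
    for i in range(reduction_factor):            # XOR-fold the slices together
        folded ^= (bits >> (i * slice_bits)) & ((1 << slice_bits) - 1)
    return folded, slice_bits

The exact sizes (2,688 bits folded to 448) depend on how many keyed digests are concatenated, which the paper does not fully specify, so this sketch only reproduces the shape of the construction.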
To be considered suitable for real-world applications, a search technique must possess high accuracy and efficiency. The proposed technique enhances search efficiency by decreasing the average search time required to locate pertinent texts, in contrast to previous strategies [17] that scan the complete collection of texts. Search accuracy was evaluated by computing metrics such as recall, precision, F1-score, and false accept rate (FAR). We conducted a comprehensive evaluation of the compiled text using a set of 100 queries, each consisting of five genuine terms and 30 noise terms.
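These per-query metrics can be computed with a short Python helper; the reading of FAR below (fraction of retrieved documents that are not relevant) is one common interpretation, since the paper does not give its formula explicitly.

def search_metrics(retrieved, relevant):
    """Precision, recall, F1, and false accept rate for one query.

    `retrieved` and `relevant` are sets of document IDs.
    """
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    far = (len(retrieved - relevant) / len(retrieved)) if retrieved else 0.0
    return precision, recall, f1, far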
A comparison of the search accuracies is displayed in Figure 3, and the corresponding data are summarized in Table 3.
Tables 3 and 4 present the search accuracy comparison between the newly introduced and existing schemes, along with the tools and technology used, indicating improvements in precision and F1-score and a reduction in FAR. Recall remains 100% for both schemes. The gain column gives the percentage increase or decrease in each performance index of the newly introduced scheme relative to the existing scheme.
· Dataset size: The overall quantity of data that can be processed by the search system.
· Query complexity: This study examines the relationship between the performance of the system and the complexity of the query, specifically in terms of the number of keywords and the use of synonym mapping.
· Response time: The amount of time that passes between starting a search query and the user seeing the search results afterward.
· Accuracy: The accuracy of the searching outcomes in relation to the retrieval of pertinent texts.
· System load: The amount of computational resources required by the system during query processing.
· Network latency: The duration required for data transmission over the network during the execution of a query.
· Indexing time: The time necessary to index the data for search operations.
· Encryption/decryption time: The duration required for the encryption of data before its storage and the subsequent decryption process during retrieval.
· Scalability: The system’s capacity to sustain performance levels while expanding in terms of data volume and user count.
· Fault tolerance: The evaluation of the system’s capacity to sustain functioning in the face of component failures.
· Throughput: The system’s query processing capacity within a specified time interval.
As the number of files approached 500, a slight difference between the lines became apparent. An unsuccessful search occurs when a query yields no results. It is crucial to identify unsuccessful searches promptly to conserve cloud resources; however, existing research does not examine the time taken to declare a search unsuccessful. Quick recognition of an unsuccessful search can reduce users' financial costs, making it a key performance metric.
Under current schemes [17], a failed search is determined only after reviewing all text indices, requiring N comparisons to confirm that no relevant text exists. Conversely, the proposed scheme uses a cluster head that embodies all keywords within its cluster, allowing the presence or absence of text containing the search terms to be determined by reviewing only the cluster indices. Therefore, an unsuccessful search can be concluded after just K comparisons. Figure 4 illustrates the reduction in both the average number of comparisons and the average time required to determine an unsuccessful search. Compared with previous schemes, the proposed method improves the efficiency of declaring unsuccessful searches by 99.29%.
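The cluster-head idea can be sketched in a few lines of Python; the data layout here (each cluster as a pair of its head keyword set and its member texts) is an illustrative assumption.

def search_with_cluster_heads(clusters, query_terms):
    """Reject an unsuccessful search after at most K cluster comparisons.

    `clusters` is a list of (head_keywords, texts) pairs, where
    head_keywords is the union of all keywords in the cluster and texts
    is a list of (text_id, keyword_set) pairs. A query whose terms match
    no cluster head is declared unsuccessful after K comparisons instead
    of the N comparisons needed to scan every text index.
    """
    terms = set(query_terms)
    matches = []
    for head_keywords, texts in clusters:          # at most K iterations
        if terms & head_keywords:                  # cluster may contain hits
            matches.extend(tid for tid, kw in texts if terms & kw)
    return matches                                 # [] means unsuccessful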
In the newly introduced search scheme, the time required to construct a searchable index encompasses the time needed to create both the text and cluster indices. The index build time of the proposed scheme tends to be greater than that of the existing scheme [15, 16], which is attributed to the additional step of generating indexes for multiple clusters. The number of clusters, denoted by K, varies with the application's needs, the size of the text collection, and the clustering algorithm employed. Incorporating clustering into the index-building process therefore causes a slight increase in the time required to create an index for a large text collection; this is a one-time overhead that is minor relative to the search-time savings it enables. The average time required to construct a query, which encompasses the HMAC computation, reduction, and bitwise-AND operations for each term, remains identical for the proposed and existing schemes [15, 16] because the proposed scheme introduces no additional delays owing to clustering. Figure 5 shows the average query build time for queries containing 1–5 genuine terms, in scenarios with and without the inclusion of noise terms; each point is the mean time required to generate 200 queries with the given number of genuine terms.
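Query construction, as described above (HMAC computation, reduction, and bitwise AND per term), might look as follows in Python; this reuses the hypothetical keyword_index_bits() sketch given earlier and is an assumption about the combining step, not the paper's verified procedure.

def build_query(terms, keys, reduction_factor=6):
    """Combine folded per-term digests into one conjunctive query mask."""
    query = None
    for term in terms:
        folded, _width = keyword_index_bits(term, keys, reduction_factor)
        # Conjunctive (AND) combination across the query terms.
        query = folded if query is None else (query & folded)
    return query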
The efficiency of result ranking is evaluated by comparing the time required to generate p indexes at different relevance levels within the text collection. The increase in index build time at higher relevance levels is a one-time overhead handled during the offline stage by the data owner, and the use of cloud resources and parallel processing can further reduce its impact. Thus, the extra time for creating multiple indexes is outweighed by the benefit of delivering better-ranked search results to users. To illustrate this, Figure 6 plots the alignment of the top-ranked texts from the proposed scheme with the top results from plain-text searches, using hypothetical data.
In cloud-based multi-keyword ranked search encryption, indexing time and search time are the main performance factors of the GDFS algorithm. In GDFS, indexing builds a hierarchical structure over the encrypted data, so the cost is governed by the depth and breadth of the tree-shaped structure: a tree of branching factor b and depth d holds on the order of b^d nodes, which bounds both index construction and worst-case search, whereas greedy pruning typically reduces the number of visited nodes to a small fraction of the tree.
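As a hedged summary using standard tree-search bounds (the paper does not state these formulas explicitly), for an index tree with branching factor b and depth d:

T_index = O(b^d), since the index is built over all n = O(b^d) nodes;
T_search = O(b^d) in the worst case, and approximately O(d · b) when greedy pruning expands only the best-scoring branch at each of the d levels while inspecting its b children.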
Using a greedy search algorithm that emphasizes depth, encrypted cloud-based multi-keyword ranked searches can be conducted in a confidential and efficient manner. Such a system can adhere to strict privacy restrictions while enabling prompt data retrieval. The technique builds on modern computational performance and prioritizes relevant results through intelligent navigation of the encrypted space. Encryption safeguards sensitive information from unauthorized access by hackers and other intruders, and ranked searching enhances the user experience by organizing search results by relevance; by categorizing the findings, clients save time and computing resources. In the realm of big data, handling complex queries requires multi-keyword capabilities. Owing to ever-evolving cyber threats, perfect security is unachievable for any system, so continuous research and development are essential for adapting and strengthening these systems against emerging vulnerabilities. This integration establishes a foundation for delivering safe cloud services and showcases the interplay between speed and privacy. Cloud computing can thus enable user-centric, efficient, and secure search services over the data it controls and stores, made feasible by the advanced methods now available.
No potential conflict of interest relevant to this article was reported.
Table 1. Comparison of multi-keyword ranked search methodologies.
Study | Methodology | Key features | Performance metrics | Advantages | Limitations | Time complexity |
---|---|---|---|---|---|---|
Das and Kalra [3] | Public key encryption with ranked keyword search | Utilizes secure kNN to achieve multi-keyword search | Precision, recall, search efficiency | High security, efficient search | High computational overhead | |
Guo et al. [4] | Multi-keyword ranked search over encrypted cloud data | Adopts inner product similarity measure for ranking | Search accuracy, response time | Enhanced search accuracy, privacy-preserving | Limited scalability | |
Liu et al. [5] | Privacy-preserving multi-keyword ranked search | Coordinates matching and ranking with privacy-preserving operations | Precision, recall, search efficiency | High accuracy, strong privacy guarantees | Computationally intensive | |
Gawade and Kadu [6] | Privacy-preserving multi-keyword text search | Leverages homomorphic encryption for secure search | Accuracy, search time, encryption/decryption time | Strong security, efficient search | High storage and computation overhead | |
Xu et al. [7] | Efficient multi-keyword ranked search | Combines secure index with advanced ranking | Search precision, query response time | High precision, reduced query time | Complexity in implementation | |
Proposed work | Greedy depth-first encryption (GDFE) | Cloud-based, multi-keyword search, ranked results, greedy depth-first approach | Search efficiency, encryption/decryption time, scalability, privacy | Enhanced performance, strong privacy, scalable | To be determined through experimental evaluation |
Table 2. Simulation environment and parameters.
Dataset | Cluster count | Number of texts | Hash function for indexing | HMAC functions for query construction | Reduction factor (d) | Final query length (r) | Server configuration | Programming language |
---|---|---|---|---|---|---|---|---|
REUTERS-21578 | Uniform: 5, Non-uniform: 10 | 1,000 to 10,000 | MD5 | HMAC with SHA-256, SHA-384, and SHA-512 | 6 | 448 bits | Multi-core processor, 4 TB HDD, 64 GB RAM | Python |
Table 3. Comparative analysis of proposed scheme and existing scheme (unit: %).
Parameter | Proposed scheme | Existing scheme | Gain |
---|---|---|---|
Recall | 100% | 100% | Same |
Precision | 82.4 | 76.27 | +6.13 |
F1-score | 89.07 | 84.89 | +4.18 |
FAR | 0.128 | 0.286 | −55.24 |
Table 4. Tools and technology.
Tool/technology | Purpose | Description |
---|---|---|
Encryption software | Data security | Software used to encrypt the cloud data. Examples include AES, RSA, or custom encryption algorithms. |
Cloud platform | Data hosting | Cloud service provider used to host the encrypted data. This could be AWS, Azure, Google Cloud, etc. |
Indexing engine | Data retrieval | Tool used to create searchable indexes for the encrypted data. Examples might include Apache Lucene or Elasticsearch. |
Synonym database | Search enhancement | A database or API service, such as WordNet, that provides synonyms for extending search capabilities. |
Greedy DFS algorithm implementation | Search algorithm | Custom or pre-built greedy DFS algorithm used to perform the ranked searching. |
Programming language | Development | Language used for implementing the search algorithm and handling the encryption/decryption. Likely candidates are Python, Java, or C++. |
Simulation software | Testing & analysis | Software used to simulate the cloud environment and measure the performance of the search algorithm. Could be MATLAB, Simulink, or a custom simulator. |
Table 5. Result analysis of different test cases.
Metric | Test case 1 | Test case 2 | Test case 3 |
---|---|---|---|
Dataset size | 500 GB | 1 TB | 5 TB |
Query complexity | 3 keywords | 5 keywords | 7 keywords |
Response time | 1.2 s | 1.8 s | 2.5 s |
Accuracy | 92% | 89% | 85% |
System load | 15% CPU | 30% CPU | 50% CPU |
Network latency | 50 ms | 70 ms | 90 ms |
Indexing time | 10 min | 30 min | 1 hr |
Encryption time | 5 min | 15 min | 45 min |
Decryption time | 2 min | 6 min | 18 min |
Scalability | High | Moderate | Low |
Fault tolerance | 99.99% | 99.95% | 99.9% |
Algorithm 1. The process of encrypting data and constructing an index.
function encryptData(data):
    encryptedData = AES_Encrypt(data, encryptionKey)   # symmetric encryption under the owner's key
    return encryptedData

function buildEncryptedIndex(data):
    index = createIndexStructure(data)                 # searchable index over extracted keywords
    encryptedIndex = encryptData(index)
    return encryptedIndex
The output is a set of encrypted documents D′ = {D1′, D2′, ..., Dn′}, whose identifiers are assigned at the moment of encryption.
– For every document d in D:
– Extract the terms k from d
– Remove the stopwords from k
– Apply lemmatization to k
– Compute the TF and IDF of k
– Filter k using the TF values and a threshold
– Encrypt k and produce the index I
– Obtain the key K′ from the KDS
– Encrypt D to create D′
– End for
– Upload the document set D′ to the cloud
– Upload the index set I to the cloud server
– Update the index tree T on the cloud
– Return
function GDFS(encryptedIndex, encryptedQuery):
    stack = initializeStack()
    stack.push(encryptedIndex.root)
    results = []
    while not stack.isEmpty():
        node = stack.pop()                         # take the most recently pushed node
        if secureKNN(node, encryptedQuery):        # encrypted relevance test
            if node.isLeaf():
                results.add(node)
            else:
                for child in node.children:
                    stack.push(child)
    return results
data = "Sensitive data here"
encryptedData = encryptData(data)
encryptedIndex = buildEncryptedIndex(data)
query = "Search query"
encryptedQuery = encryptData(query)
searchResults = GDFS(encryptedIndex, encryptedQuery)
print("Search Results:", searchResults)
Notation:
I′ = {I1′, I2′, ..., In′}: cloud set of indices
T: cloud index tree
K: set of keys from the KDS
W: set of search terms
Result: D = {D1, D2, ..., Dn}, the set of m pages that the filter F matched with the term w
– For every w in W:
– Remove the stop terms
– Apply lemmatization
– If the filter includes a synonym search:
– Use WordNet to find the synonyms of w
– Update the word list
– End if
– Apply SHA-1 hashing to w
– End for
– Upload the search index I to the cloud using filter F
– Apply DFS with the filters to T
– Obtain the matching documents
– Rank the results according to the matches
– Forward the appropriate D′ to the end user
– Receive the keys from the KDS
– Decrypt and display the result
– Return
Figure 1. Basic model.
Figure 2. Architecture.
Figure 3. Search accuracy comparison.
Figure 4. Unsuccessful search: gain in average search time.
Figure 5. Average query build time by number of genuine terms.
Figure 6. Rank efficiency of the proposed search scheme.