
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(4): 416-427

Published online December 25, 2024

https://doi.org/10.5391/IJFIS.2024.24.4.416

© The Korean Institute of Intelligent Systems

Enhancing Performance and Privacy on Cloud-Based Multi-Keyword Ranked Search Encryption Using Greedy Depth-First Encryption

Narendra Shyam Joshi1, Kuldeep P. Sambrekar1, Abhijit J. Patankar2, Archana Jadhav3, and Prajakta Ajay Khadkikar4

1Department of Computer Science and Engineering, KLS Gogte Institute of Technology, Visvesvaraya Technological University, Belagavi Karnataka, India
2Department of Information Technology, D Y Patil College of Engineering, Savitribai Phule Pune University, Akurdi Pune, India
3Department of Artificial Intelligence and Data Science, D Y Patil Institute of Engineering Management and Research, Savitribai Phule Pune University, Akurdi Pune, India
4Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Savitribai Phule Pune University, Dhanakavadi Pune, India

Correspondence to :
Narendra Shyam Joshi (nsjsandip100@gmail.com)

Received: July 17, 2024; Accepted: December 11, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The increasing use of cloud services has made the data security issues associated with them correspondingly more important. Many users are concerned about the risk of eavesdropping on sensitive information while it is transmitted via cloud services; moreover, if unauthorized individuals gain access to a firm's cloud tools, they may damage or misuse the company's databases. Encryption is therefore crucial for ensuring data security when transmitting data to the cloud. However, searching a large amount of encrypted user data poses unique challenges that are not encountered in other scenarios. We reviewed previous studies to examine how other researchers have addressed similar difficulties. Our main objective is to deliver the fastest achievable search speed while maintaining high levels of data protection and computational efficiency. We developed a data encryption method specifically designed for cloud computing and show that implementing the Greedy depth-first search (GDFS) ranked search technique significantly simplifies the task of finding encrypted content. We aim to optimize the utilization of cloud computing to enhance time efficiency, and we are currently exploring methods for integrating multiple encryption algorithms into a unified method that is compatible with various techniques.

Keywords: Efficient keyword search, Privacy preservation, Binary tree search, Ranked DFS search, Cloud-data encryption

Cloud computing has revolutionized various fields in recent years by transforming data storage, processing, and access. Cloud computing is rapidly gaining popularity among enterprises and individuals owing to its scalability, agility, and affordability. The potential lack of security of the data stored on cloud servers is a significant problem. One of the key features of cloud computing is the organization’s ability to secure sensitive data from being lost, stolen, or compromised. Cloud environments present new risks and challenges. Although traditional security measures, such as access control and encryption, are widely used, they may be insufficient. Therefore, novel strategies are required to improve cloud data security. In this study, we propose a hybrid strategy that combines ranked searching and Greedy depth-first search (GDFS) [1] to address the issue of cloud data security. Our objective is to maximize retrieval efficiency while minimizing waiting times by utilizing the GDFS algorithm [2], which is especially well suited for searching and navigating huge datasets. The proposed method can increase the effectiveness of data retrieval from the cloud by using this technology. To further enhance users’ privacy and safety, the proposed hybrid technique incorporates ranked searching. Encrypting and indexing sensitive data with ranked search reduces the risk of that data falling into the wrong hands. The basic model is depicted in Figure 1.

The proposed solution uses a methodology that guarantees cloud data secrecy. The key security problems in cloud computing can be addressed by employing a hybrid strategy that combines enhanced security measures with improved data retrieval speeds. Enhancing data access and implementing encryption measures can reduce the chances of data breaches, unauthorized access, and data leakage. The proposed technique strikes a balance between data protection and system performance to ensure that security measures do not hinder cloud-computing scalability and adaptability. This study introduces an innovative method to safeguard data stored in the cloud. It combines the advantages of ranked search techniques with a GDFS. By addressing serious security issues, utilizing effective data retrieval, and boosting data protection, this approach provides an effective solution for safeguarding sensitive data stored in the cloud. The findings of this study provide important guidance for improving cloud security, which will drive advancement in the cloud-computing industry.

The objectives of this study are as follows:

  • 1. Understand the concept of data entry into a cloud system.

  • 2. Create a system that can enhance both the performance and privacy by integrating a cloud-based multi-keyword ranked search with GDFS encryption.

  • 3. Experimentally demonstrate that the proposed methodology outperforms existing multi-keyword encryption methods.

The remainder of this paper is structured as follows: in Section 2, we present the groundwork for our study, including a formal definition and security model, and address specific queries. Section 3 provides a detailed explanation of the creation of the search strategy for semantic terms, including relevant examples. Section 4 presents the security analysis. Section 5 presents the results of both theoretical and experimental investigations. The conclusion is presented in Section 6.

Extensive research has been conducted on techniques for searching encrypted cloud data. While some researchers have focused on methods for recovering lost information, others have endeavored to improve the outcomes of search engine queries. Below, we summarize the most noteworthy studies on this topic. One method was devised to expedite the identification of subgraphs pertinent to a specified query; it reduced the search area by exploiting the graph’s topology and priority-queue processing. Experimental results demonstrate that this strategy outperforms other current approaches in terms of effectiveness, precision, and efficiency when handling large graphs. The most recent approach for performing ranked searches on large graphs utilizes a fast GDFS algorithm with pruning techniques to optimize the search process and enhance overall efficiency. Each subgraph is assigned a score based on its level of correspondence with the query, which allows subgraphs to be extracted proficiently from large graphs. Another method utilizes geometric mean fusion in ambient intelligence and humanized computing [3] to enhance the speed and accuracy of subgraph retrieval; the objective is to identify subgraphs that closely match a given query by combining similarity metrics through geometric mean fusion to calculate scores. An improved version of the GDFS ranked search method that considers the entire node distance was introduced in a 2020 publication in a cluster computing journal [4]. The analysis of these works is summarized in Table 1 [3–7].

We developed a GDFS ranked search algorithm as an improved method for implementing search algorithms [8]. This method was developed expressly for this study and is based on the cumulative distance between nodes contained within subgraphs. The cloud server manages both the encrypted searchable tree index I and the encrypted document collection D, while the data owner retains control over both. Using the term-document combination requested by the user in the index tree, the cloud server retrieves the top-k encrypted documents with the highest ratings. Every change made by the data owner must be reflected in the document collection D and index I stored on the server. An efficient method that uses a priority queue is proposed to simultaneously retrieve relevant subgraphs for a large number of requests; it also narrows the search field by employing a pruning technique based on dynamic programming, making the search process more straightforward. Table 1 presents the results of this comparison.

Efficient, secure, and privacy-preserving multi-keyword ranked search encryption for data stored in the cloud has become a critical area in cloud security. Currently, traditional encryption schemes, hierarchical indexing, and simple searching are the most widely used techniques for ranking search functionalities while preserving privacy at the expense of performance. These methods employ structures such as the secure k-nearest neighbor (k-NN) and term frequency-inverse document frequency (TF-IDF) scoring mechanisms to rank the search results. However, they experience issues such as high computational overhead, poor scalability, and vulnerability to inference attacks. To address these issues, we propose a new approach in the form of GDFS-based encryption. This method implements a depth-first traversal technique along with dynamic optimization in the search process. GDFS [9] is a hierarchical encryption-based approach for data retrieval in which data are encrypted, and the traversal mechanism greedily evaluates query terms and ranking scores, achieving a faster search with less computational overhead. In contrast to breadth-first or other exhaustive approaches, GDFS follows a depth-first paradigm and prunes irrelevant nodes early in the search, thereby avoiding unnecessary examination of irrelevant data. This speeds up searches and limits the exposure of metadata, increasing privacy guarantees. Furthermore, the encryption is continuously updated according to the relevance of the keywords and how frequently they appear, thereby optimizing the ranking while preserving data confidentiality. The performance evaluation of the proposed GDFS approach demonstrated notable improvements in query latency, encryption complexity, and data security compared with existing schemes.
By integrating obfuscation techniques and relevance-preserving transformations [9], the resulting protocols enjoy a privacy-preserving nature, rendering adversaries unable to infer sensitive information from access patterns. However, current techniques heavily depend on static encryption and ranking algorithms, which do not evolve gracefully as datasets become larger or more complicated. Natively, GDFS adapts to provide a flexible and robust solution with realistic dynamic query frequencies for real-time cloud environments. This analysis also uses greedy optimization to ensure that the system can dynamically provision computational resources, thereby improving energy efficiency, which is a key requirement for large-scale cloud implementations. This approach fills this gap by balancing the enhancement of privacy and ensuring that end users and organizations can use secure cloud storage solutions for processing sensitive data. The GDFS encryption paradigm proposed in this study integrates advanced cryptographic techniques and efficient search mechanisms to achieve a promising tradeoff between security, privacy, and efficiency of cloud-based multi-keyword ranked searches.

3.1 Step-by-Step Methodology

  • · Data preparation and indexing

    • Input dataset: Keywords are extracted from a collection of plaintext documents.

    • Keyword tokenization: Techniques for natural language processing are used to extract keywords from a text.

    • Indexing: A structured representation of each document is created by indexing the document on the keywords. Each document is indexed, that is, linked to a unique identifier.

  • · Keyword encryption

    • Term frequency calculation: To measure keyword importance within a document, we calculate the term frequency of extracted keywords for each document.

    • Keyword weighting: Weights for keywords are then calculated using schemes, such as TF-IDF. It is used for ranking by relevance.

    • Encryption of weights: A secure cryptographic scheme is used to encrypt the calculated keyword weights such that data confidentiality is achieved without sacrificing the ability to rank.

  • · Query construction

    • Multi-keyword query formation: We analyze the pattern [10] in which users of the service form a search query from multiple keywords. For each keyword in the query, weights are assigned based on user preferences or defaults.

    • Query encryption: The document index is encrypted using the same cryptographic algorithm and parameters for the query.

  • · Greedy depth-first encryption (GDFE)

    • Tree construction: The query keywords and the document index are organized hierarchically as a tree. A node is a keyword with a node weight signifying significance.

    • Greedy algorithm for relevance estimation:

      • * It traverses the tree depth-first starting at the root.

      • * Greedily, higher weight nodes are expanded first as they have higher relevance keywords.

    • Pruning irrelevant nodes: To cut down the search time, subtrees with low relevance (defined by a predefined threshold) are pruned. This step also reduces the computational overhead by ignoring documents that are less relevant.

    • Encrypted matching: Intra-region (or intra-node) matching is performed between encrypted query keywords and encrypted document keywords [11].

  • · Ranking computation

    • Relevance score calculation: The matched keyword weights of the documents are aggregated through a relevance score for each document [12]. The score is calculated in an encrypted domain to protect privacy.

    • Normalized ranking: The normalization of scores allows scores to be compared across documents. This enables ranked results effectively while maintaining data privacy during encryption.

  • · Search result generation

    • Result compilation: Relevant scores for documents are computed and ranked in descending order.

    • Top-K retrieval: To reduce bandwidth consumption and improve efficiency, only the top-K most relevant documents are retrieved.
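The traversal, greedy expansion, pruning, and top-K steps above can be sketched as follows. This is a minimal illustrative sketch: the `Node` structure, the weight values, and the relevance threshold are assumptions for illustration, not the paper's exact data structures.

```python
import heapq

class Node:
    """Index-tree node: interior nodes carry a keyword weight;
    leaves additionally carry a document identifier."""
    def __init__(self, weight, doc_id=None, children=None):
        self.weight = weight
        self.doc_id = doc_id
        self.children = children or []

def gdfs_rank(root, threshold, k):
    """Greedy depth-first traversal: expand heavier children first,
    prune subtrees below the relevance threshold, return top-k docs."""
    results = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.weight < threshold:      # prune irrelevant subtree
            continue
        if node.doc_id is not None:      # leaf: candidate document
            results.append((node.weight, node.doc_id))
        else:
            # push lighter children first so the heaviest is popped next
            for child in sorted(node.children, key=lambda c: c.weight):
                stack.append(child)
    return heapq.nlargest(k, results)    # (score, doc_id), best first
```

Because low-weight subtrees are skipped entirely, documents under them are never scored, which is what reduces both search time and the number of index nodes exposed per query.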

3.2 Data Preprocessing Strategy

All text-related actions in the datasets used to extract relevant keywords were included in document preprocessing [13]. The function responsible for preprocessing the data is represented as f_dp = {f_l, f_s, f_st}, which consists of several sub-functions: lexical analysis (f_l), stop-word removal (f_s), and stemming (f_st). The importance of each statement can be ascertained by summing the TF and IDF values of each sentence, and these weights are used to select the most effective terms for inclusion in the keyword dictionary.

W_ik = TF × IDF = f_ik × log(n / n_k).

Here, f_ik is the occurrence rate of term i in document k, n is the total number of documents in the collection, and n_k is the number of records or files that contain term i. To generate a keyword collection [14], it is crucial to exclude unnecessary phrases from each page. The first step in constructing an index is generating a tree node for every page, which acts as a terminal (leaf) node of the tree; the interior nodes are then formed sequentially from these terminal nodes. The following algorithms provide a comprehensive explanation of the process of encrypting the data and constructing the index.
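The weighting step can be sketched in a few lines of Python. This is a minimal sketch assuming whitespace tokenization and a tiny in-memory corpus; a real pipeline would also apply the lexical analysis, stop-word removal, and stemming sub-functions described above.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """W_ik = f_ik * log(n / n_k): term frequency times inverse
    document frequency, computed over the whole collection."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # n_k: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = ["cloud data encryption", "cloud search", "ranked search index"]
w = tfidf_weights(docs)
```

Terms that appear in every document get weight log(n/n) = 0 and would be filtered out of the keyword dictionary, matching the thresholding step in the index construction.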

3.3 Architecture Model

The proposed architecture, shown in Figure 2, comprises an information provider (data owner), information consumers (data users), a cloud storage server, and a key distribution server.

  • · Data owner: Wants to outsource a collection of documents [15] to a cloud server securely while still enabling effective searches. From the document collection, the owner generates an encrypted version of the document set and a secure searchable tree index.

  • · Data users: Only authorized users can access the files owned by the data owner. With a secret key and a set of query phrases, users may create a search-based trapdoor to retrieve encrypted documents from the cloud server.

  • · Cloud server: Storing the encrypted document collection and maintaining the searchable tree index [16] are the responsibility of the cloud server. It serves queries by searching the index tree for the user-requested term-document combination and then retrieving the encrypted documents with the highest scores.

  • · Key distribution server: To ensure security and privacy for cloud-based data and search queries, a key distribution server (KDS) is required for key sharing.

The performance was evaluated using the REUTERS 21578 dataset, with the parameter settings for the analysis detailed in Table 2. We predicted 90% and above precision, recall, accuracy, and F1-score for the listed encryption algorithms, including the proposed algorithm. The k-means technique, implemented in Python, was used to cluster the texts into 5 or 10 groups (Table 2); the text count per cluster is also shown in Table 2. The text number varied from 1,000 to 10,000 for comparative analysis. By concatenating the outputs of the various hash functions, binary indices of 2,688 bits were produced, which a reduction factor of 6 shortened to 448 bits. The proposed scheme supports efficient conjunctive searching through keyword-field-free indexes, as initially suggested by Li et al. [17]. It differs from existing schemes by reducing the number of texts examined to retrieve relevant results: it only examines texts within the matching cluster, thereby reducing comparison counts and search time. The proposed scheme was compared with an existing scheme [18], which was implemented using the same parameters [19] as those described in Table 2.
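The index construction described above (concatenated hash digests folded from 2,688 bits down to 448 bits by a reduction factor of 6) can be sketched as follows. The paper does not specify the folding operation, so the bitwise-OR folding of equal-width slices below is an assumption, as is the particular set of hash functions applied per keyword.

```python
import hashlib

def build_index(keywords, total_bits=2688, d=6):
    """Concatenate several hash digests into one long binary index,
    then fold it by reduction factor d (2688 -> 448 bits here).
    The folding method (bitwise OR of d equal slices) is an assumption."""
    digest = b"".join(
        h(k.encode()).digest()
        for k in keywords
        for h in (hashlib.sha256, hashlib.sha384, hashlib.sha512)
    )
    bits = int.from_bytes(digest, "big") & ((1 << total_bits) - 1)
    short, width = 0, total_bits // d
    for i in range(d):                      # OR together the d slices
        short |= (bits >> (i * width)) & ((1 << width) - 1)
    return short                            # 448-bit reduced index
```

The reduced index is deterministic for a given keyword set, so the same fold applied to a query's terms can be matched against it without decrypting anything.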

4.1 Search Efficiency

  • · Uniform text distribution: Each cluster, from 1 to 5, contains an identical number of texts, totaling 1,200 in each. Consequently, the sum of texts across all clusters amounts to 6,000.

  • · Non-uniform text distribution: The text count varies across Clusters 1 to 10, with the highest concentration in Cluster 1 (3,966 texts) and the lowest in Cluster 10 (239 texts), cumulating in a total of 10,000 texts. The approach for identifying relevant texts depends on the clustering method employed; two clustering methods were used.

  • · Hard clustering: In hard clustering, texts are exclusively assigned to a single cluster. Algorithms in this category do not recognize multiple themes within a text. To find pertinent texts, one needs to identify the one relevant cluster and then search within it.

  • · Soft clustering: Soft clustering allows for a text’s presence in several clusters. These algorithms can discern multiple themes within a text, which aids in exploring various relationships among the data. Finding relevant texts in this context entails pinpointing all pertinent clusters and searching within them.

To be considered suitable for real-world applications, a search technique must possess high accuracy and efficiency. The proposed search technique enhances search efficiency by decreasing the average search time required to locate pertinent texts, in contrast to previous strategies [17] that involved scanning the complete collection of texts. Search accuracy was evaluated by computing metrics such as recall, precision, F1-score, and false accept rate (FAR). We conducted a comprehensive evaluation of the compiled text collection using a set of 100 queries, each consisting of five relevant terms and 30 irrelevant (noise) terms.
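The accuracy metrics named above can be computed per query as follows. This is a sketch; the exact FAR definition used in the paper is not stated, so the "falsely accepted texts divided by all irrelevant texts" form below is an assumption.

```python
def search_metrics(retrieved, relevant, collection_size):
    """Precision, recall, F1, and false-accept rate (FAR) for one query.
    retrieved/relevant are sets of document ids.
    FAR here = falsely retrieved docs / all irrelevant docs (assumed)."""
    tp = len(retrieved & relevant)           # true positives
    fp = len(retrieved - relevant)           # false accepts
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    far = fp / (collection_size - len(relevant))
    return precision, recall, f1, far
```

Averaging these values over the 100 evaluation queries yields the aggregate figures reported in Tables 3 and 4.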

A comparison of the search accuracies is displayed in Figure 3, and the underlying search accuracy data are tabulated below.

Tables 3 and 4 show the search accuracy comparison between the newly introduced and existing schemes, as well as the tools and technology, indicating improvements in precision, F1-score, and reduction in the FAR. The recall remains the same for both schemes at 100%. The gain column represents the percentage increase or decrease in the performance index of the newly introduced scheme compared with existing schemes.

Explanation of metrics: Table 5 shows the result analysis of different test cases.

  • · Dataset size: The term “data capacity” pertains to the overall quantity of data that can be processed by the search system.

  • · Query complexity: This study examines the relationship between the performance of the system and the complexity of the query, specifically in terms of the number of keywords and the use of synonym mapping.

  • · Response time: The amount of time that passes between starting a search query and the user seeing the search results afterward.

  • · Accuracy: The accuracy of the searching outcomes in relation to the retrieval of pertinent texts.

  • · System load: The term “computational load” refers to the amount of computational resources required by the system during query processing.

  • · Network latency: The duration required for data transmission over the network during the execution of a query.

  • · Indexing time: The time necessary to index the data for search operations.

  • · Encryption/decryption time: The duration required for the encryption of data before its storage and the subsequent decryption process during retrieval.

  • · Scalability: The system’s capacity to sustain performance levels while expanding in terms of data volume and user count.

  • · Fault tolerance: The evaluation of the system’s capacity to sustain functioning in the face of component failures.

  • · Throughput: The system’s query processing capacity within a specified time interval.

As the number of files approached 500, a slight difference between the lines became apparent. An unsuccessful search occurred when a query yielded no results. It is crucial to promptly identify unsuccessful searches to conserve cloud resources. The current research does not examine the time taken to declare that a search is unsuccessful. The quick recognition of an unsuccessful search can reduce users’ financial costs, making it a key performance metric.

Under current schemes [17], a failed search is determined only after reviewing all text indices, requiring N comparisons to confirm that no relevant text exists. Conversely, the proposed scheme utilizes a cluster head that embodies all keywords within its cluster, allowing the presence or absence of texts containing the search terms to be determined by reviewing only the cluster indices. Therefore, an unsuccessful search can be concluded after only K comparisons. Figure 4 illustrates the reduction in both the average number of comparisons and the average time required to determine an unsuccessful search. Compared to previous schemes, the proposed method improves the efficiency of declaring unsuccessful searches by 99.29%.
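The cluster-head shortcut can be sketched as below: an empty result is declared after at most K cluster-head checks instead of N per-text comparisons. The dictionary-of-sets layout for clusters and keywords is an illustrative assumption.

```python
def search_with_cluster_heads(clusters, query_terms):
    """clusters: {cluster_id: (head_keywords, {doc_id: doc_keywords})}.
    The head keyword set is the union of its cluster's keywords, so a
    failed search needs only K head checks, never a full N-text scan."""
    query = set(query_terms)
    results = []
    for cid, (head, docs) in clusters.items():
        if query & head:                 # cluster may contain matches
            results.extend(d for d, kw in docs.items() if query & kw)
    return results                       # [] after K checks if no match
```

With hard clustering each text sits in exactly one cluster, so at most one cluster's texts are scanned per matching term; with soft clustering several heads may match and their clusters are all searched.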

4.2 Computation Cost

In the newly introduced search scheme, the time required to construct a searchable index encompasses the duration needed to create both the text and cluster indices. The index build time of the proposed scheme tends to be greater than that of the existing schemes [15, 16], which is attributed to the additional step of generating indexes for multiple clusters. The number of clusters, denoted by K, varies based on the application’s needs, the size of the text collection, and the clustering algorithm employed. Incorporating clustering into the index-building process results in a slight increase in the time required to create an index for a large text collection. The average time required to construct a query, which encompasses the HMAC computation, reduction, and bitwise-AND operations for each term, remains identical for the proposed and existing schemes [15, 16] because the proposed scheme introduces no additional delays owing to clustering. The average query build time was measured for queries containing 1 to 5 genuine terms, both with and without the inclusion of noise terms, as shown in Figure 5; each point is the mean time required to generate 200 queries with the corresponding number of genuine terms.
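The per-term query construction named here (HMAC computation, reduction, bitwise AND) can be sketched as follows. The paper does not fix the HMAC digest, the digest expansion, or the folding details, so those choices below are assumptions.

```python
import hmac
import hashlib

def term_code(term, key, total_bits=2688, d=6):
    """HMAC a term, expand the digest to the full index width, then
    fold it by reduction factor d (2688 -> 448 bits). The expansion
    and OR-folding details are assumptions, not the paper's spec."""
    mac = hmac.new(key, term.encode(), hashlib.sha512).digest()
    reps = total_bits // (len(mac) * 8) + 1
    bits = int.from_bytes(mac * reps, "big") & ((1 << total_bits) - 1)
    width = total_bits // d
    folded = 0
    for i in range(d):
        folded |= (bits >> (i * width)) & ((1 << width) - 1)
    return folded

def build_query(terms, key):
    """Bitwise-AND the per-term codes into one trapdoor query word."""
    q = (1 << 448) - 1                   # start from the all-ones mask
    for t in terms:
        q &= term_code(t, key)
    return q
```

Because the cost is a fixed amount of work per term, query build time grows linearly with the term count and is unaffected by how the text collection is clustered, which is why both schemes show identical query construction times.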

4.3 Rank Efficiency

The efficiency of result ranking is evaluated by comparing the time required to generate ‘p’ indexes at different relevance levels within the text collection. The increase in index-build time owing to higher relevance levels is a one-time overhead managed during the offline stage by the data owner. The utilization of cloud resources and parallel processing can further reduce this impact. Thus, the extra time for creating multiple indexes is outweighed by the benefit of delivering superior-ranked search results to users. To illustrate this, in Figure 6, we plotted a graph showing the alignment of top-ranked texts from the proposed scheme with the top results from plain-text searches using hypothetical data.

4.4 Computational Complexity Analysis

Considering cloud-based multi-keyword ranked search encryption, indexing and search times are the main performance factors for the GDFS algorithm. In GDFS, indexing builds a hierarchical structure over the encrypted data, so the time complexity is controlled by the depth and breadth of the tree-shaped structure, namely O(n log n) in the worst case. This yields approximately O(k log n) search time, where k is the number of keywords and n is the number of documents. The algorithm is efficient in terms of search capability while simultaneously providing secure encryption, although its performance decreases as the dataset size or query complexity increases.
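A back-of-envelope reading of the O(k log n) bound: each query keyword follows roughly one root-to-leaf path of length about log2(n) in a balanced index tree. The helper below is only this arithmetic, not a measurement of the actual implementation.

```python
import math

def gdfs_search_cost(n_docs, k_terms):
    """Estimated comparison count for the O(k log n) search bound:
    ~one root-to-leaf path of length log2(n) per query keyword."""
    return k_terms * max(1, math.ceil(math.log2(n_docs)))
```

For example, a 3-keyword query over 1,024 documents costs on the order of 3 × 10 = 30 node comparisons, versus 3 × 1,024 for a linear scan.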

Using a greedy search algorithm that emphasizes depth, encrypted cloud-based multi-keyword ranked searches can be conducted in a confidential and efficient manner. Such systems can adhere to strict privacy restrictions while enabling prompt data retrieval. This technique builds on computational performance and prioritizes relevant results through intelligent navigation of the encryption space. Encryption safeguards sensitive information from unauthorized access by hackers and other intruders. Ranked searching enhances the user experience by organizing the search results based on their relevance; by categorizing the findings, clients save time and computing resources. In the realm of big data, handling complex queries requires multi-keyword capabilities. Owing to ever-evolving cyber dangers, achieving perfect security for any system is unachievable, so continuous research and development are essential for adapting and enhancing these systems to withstand emerging vulnerabilities. This integration establishes a foundation for delivering safe cloud services and showcases the interplay between speed and privacy. With the advanced methods now available, cloud computing can deliver user-centric, efficient, and secure search services by controlling and storing the data.

Fig. 1.

Basic model.


Fig. 2.

Architecture.


Fig. 3.

Search accuracy comparison.


Fig. 4.

Unsuccessful search: gain in average search time.


Fig. 5.

Average query time by number of genuine terms.


Fig. 6.

Rank efficiency of proposed search scheme.


Table. 1.

Table 1. Comparison of multi-keyword ranked search methodologies.

| Study | Methodology | Key features | Performance metrics | Advantages | Limitations | Time complexity |
| Das and Kalra [3] | Public-key encryption with ranked keyword search | Utilizes secure kNN to achieve multi-keyword search | Precision, recall, search efficiency | High security, efficient search | High computational overhead | O(n^2) |
| Guo et al. [4] | Multi-keyword ranked search over encrypted cloud data | Adopts inner-product similarity measure for ranking | Search accuracy, response time | Enhanced search accuracy, privacy-preserving | Limited scalability | O(n log n) |
| Liu et al. [5] | Privacy-preserving multi-keyword ranked search | Coordinate matching and ranking with privacy-preserving operations | Precision, recall, search efficiency | High accuracy, strong privacy guarantees | Computationally intensive | O(n^2) |
| Gawade and Kadu [6] | Privacy-preserving multi-keyword text search | Leverages homomorphic encryption for secure search | Accuracy, search time, encryption/decryption time | Strong security, efficient search | High storage and computation overhead | O(n^3) |
| Xu et al. [7] | Efficient multi-keyword ranked search | Combines secure index with advanced ranking | Search precision, query response time | High precision, reduced query time | Complexity in implementation | O(n log n) |
| Proposed work | Greedy depth-first encryption (GDFE) | Cloud-based, multi-keyword search, ranked results, greedy depth-first approach | Search efficiency, encryption/decryption time, scalability, privacy | Enhanced performance, strong privacy, scalable | To be determined through experimental evaluation | O(n log n) |

Table. 2.

Table 2. Simulation environments with parameter.

Dataset: REUTERS-21578 [320]
Cluster count: Uniform: 5; Non-uniform: 10
Number of texts: 1,000 to 10,000
Hash function for indexing: MD5
HMAC functions for query construction: SHA-256, SHA-384, SHA-512
Reduction factor (d): 6
Final query length (r): 448 bits
Server configuration: Multi-core processor, 4 TB HDD, 64 GB RAM
Programming language: Python

Table. 3.

Table 3. Comparative analysis of proposed scheme and existing scheme (unit: %).

| Parameter | Proposed scheme | Existing scheme | Gain |
| Recall | 100 | 100 | Same |
| Precision | 82.4 | 76.27 | +6.13 |
| F1-score | 89.07 | 84.89 | +4.18 |
| FAR | 0.128 | 0.286 | −55.24 |

Table. 4.

Table 4. Tools and technology.

| Tool/technology | Purpose | Description |
| Encryption software | Data security | Software used to encrypt the cloud data. Examples include AES, RSA, or custom encryption algorithms. |
| Cloud platform | Data hosting | Cloud service provider used to host the encrypted data. This could be AWS, Azure, Google Cloud, etc. |
| Indexing engine | Data retrieval | Tool used to create searchable indexes for the encrypted data. Examples might include Apache Lucene or Elasticsearch. |
| Synonym database | Search enhancement | A database or API service, such as WordNet, that provides synonyms for extending search capabilities. |
| Greedy DFS algorithm implementation | Search algorithm | Custom or pre-built greedy DFS algorithm used to perform the ranked searching. |
| Programming language | Development | Language used for implementing the search algorithm and handling the encryption/decryption. Likely candidates are Python, Java, or C++. |
| Simulation software | Testing & analysis | Software used to simulate the cloud environment and measure the performance of the search algorithm. Could be MATLAB, Simulink, or a custom simulator. |

Table 5. Result analysis of different test cases.

Metric | Test case 1 | Test case 2 | Test case 3
Dataset size | 500 GB | 1 TB | 5 TB
Query complexity | 3 keywords | 5 keywords | 7 keywords
Response time | 1.2 s | 1.8 s | 2.5 s
Accuracy | 92% | 89% | 85%
System load | 15% CPU | 30% CPU | 50% CPU
Network latency | 50 ms | 70 ms | 90 ms
Indexing time | 10 min | 30 min | 1 hr
Encryption time | 5 min | 15 min | 45 min
Decryption time | 2 min | 6 min | 18 min
Scalability | High | Moderate | Low
Fault tolerance | 99.99% | 99.95% | 99.9%

Algorithm 1. The process of encrypting data and constructing an index.

Step 1: Encrypt data
function encryptData(data):
    encryptedData = AES_Encrypt(data, encryptionKey)
    return encryptedData
Step 2: Build encrypted index
function buildEncryptedIndex(data):
    index = createIndexStructure(data)
    encryptedIndex = encryptData(index)
    return encryptedIndex
Procedure: set of input documents and threshold.
Data input: D = D1, D2, ..., Dn
Output: encrypted document set D′ = D1′, D2′, ..., Dn′ and index set I′ = I1′, I2′, ..., In′, assigned at the moment of encryption.
T: index tree
Method:
– For every document d in D:
– Extract terms k = k1, k2, ..., kn from d.
– Remove the stop words from k.
– Apply lemmatization to k.
– Determine the TF and IDF of k.
– Filter k using TF and the threshold.
– Encrypt k and produce the index I.
– Obtain key K′ from the KDS.
– Encrypt d to create d′.
– End for.
– Upload the document set D′ to the cloud.
– Upload the index set I′ to the cloud server.
– Update the index tree T in the cloud.
– Return.
Step 3: Greedy depth-first search (GDFS)
function GDFS(encryptedIndex, encryptedQuery):
    stack = initializeStack()
    stack.push(encryptedIndex.root)
    results = []
    while not stack.isEmpty():
        node = stack.pop()
        if secureKNN(node, encryptedQuery):
            if node.isLeaf():
                results.add(node)
            else:
                for child in node.children:
                    stack.push(child)
    return results
Example usage:
data = "Sensitive data here"
encryptedData = encryptData(data)
encryptedIndex = buildEncryptedIndex(data)
query = "Search query"
searchResults = GDFS(encryptedIndex, query)
print("Search Results:", searchResults)
Step 4: Retrieving documents from the cloud using a search query
Procedure input:
D′ = D1′, D2′, ..., Dn′: collection of cloud-based documents
I′ = I1′, I2′, ..., In′: cloud set of indices
T: cloud index tree
K: set of keys from the KDS
W: set of search terms
Output: D = D1, D2, ..., Dn, the set of m pages that the filter F matched with the term “w”.
Method:
– For every w in W:
– Remove the stop words.
– Apply lemmatization.
– If the filter includes a synonym search:
– Use WordNet to find the synonyms of “w”.
– Update the word list.
– End if.
– Apply SHA-1 hashing to w.
– End for.
– Upload the search index I with filter F to the cloud.
– Apply DFS with the filters to T.
– Obtain the matching documents.
– Rank the results according to the matches.
– Forward the matching D′ to the end user.
– Receive the keys from the KDS.
– Decrypt and display the results.
– Return.
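The pseudocode above can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not the paper's implementation: HMAC-SHA256 stands in for both the keyword encryption and the secure k-NN match, the key and the toy index tree are hypothetical, and internal nodes simply hold the union of their children's encrypted keywords.

```python
import hashlib
import hmac

KEY = b"demo-secret-key"  # stand-in for a key obtained from the KDS

def enc_kw(word):
    # Deterministic keyword "encryption" (HMAC token as a stand-in for secure kNN)
    return hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()

class Node:
    def __init__(self, keywords, doc_id=None, children=None):
        self.keywords = {enc_kw(w) for w in keywords}
        self.doc_id = doc_id               # set only on leaf nodes
        self.children = children or []
    def is_leaf(self):
        return self.doc_id is not None

def gdfs(root, enc_query):
    """Greedy depth-first search: descend only into nodes matching the query."""
    stack, results = [root], []
    while stack:
        node = stack.pop()
        if enc_query & node.keywords:      # encrypted match (stand-in for secureKNN)
            if node.is_leaf():
                results.append(node.doc_id)
            else:
                stack.extend(node.children)
    return results

# Toy index: the internal node holds the union of its children's keywords
d1 = Node(["cloud", "encryption"], doc_id="D1")
d2 = Node(["fuzzy", "logic"], doc_id="D2")
root = Node(["cloud", "encryption", "fuzzy", "logic"], children=[d1, d2])

print(gdfs(root, {enc_kw("cloud")}))
```

Because only matching subtrees are pushed onto the stack, branches whose keyword sets are disjoint from the query are never visited, which is the pruning behavior the algorithm relies on.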

1. Dai, X, Dai, H, Rong, C, Yang, G, Xiao, F, and Xiao, B (2022). Enhanced semantic-aware multi-keyword ranked search scheme over encrypted cloud data. IEEE Transactions on Cloud Computing. 10, 2595-2612. https://doi.org/10.1109/TCC.2020.3047921
2. Liu, L, and Chen, Q (2020). A novel feature matching ranked search mechanism over encrypted cloud data. IEEE Access. 8, 114057-114065. https://doi.org/10.1109/ACCESS.2020.3002236
3. Das, D, and Kalra, S (2020). An efficient LSI based multi-keyword ranked search algorithm on encrypted data in cloud environment. Proceedings of 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, pp. 1777-1782. https://doi.org/10.1109/IWCMC48107.2020.9148123
4. Guo, C, Zhuang, R, Chang, CC, and Yuan, Q (2019). Multi-keyword ranked search based on bloom filter over encrypted cloud data. IEEE Access. 7, 35826-35837. https://doi.org/10.1109/ACCESS.2019.2904763
5. Liu, G, Yang, G, Bai, S, Wang, H, and Xiang, Y (2022). FASE: a fast and accurate privacy-preserving multi-keyword top-k retrieval scheme over encrypted cloud data. IEEE Transactions on Services Computing. 15, 1855-1867. https://doi.org/10.1109/TSC.2020.3023393
6. Gawade, S, and Kadu, S (2018). Secure data storage and efficient data retrieval over cloud using sensitive hashing. Proceedings of 2018 2nd International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, pp. 87-92. https://doi.org/10.1109/ICCONS.2018.8662973
7. Xu, J, Huang, X, Yang, G, and Wu, Y (2018). An efficient multi-keyword top-k search scheme over encrypted cloud data. Proceedings of 2018 15th International Symposium on Pervasive Systems, Algorithms and Networks (I-SPAN), Yichang, China, pp. 305-310. https://doi.org/10.1109/I-SPAN.2018.00059
8. Mlgheit, JR, Houssein, EH, and Zayed, HH (2018). Efficient privacy preserving of multi-keyword ranked search model over encrypted cloud computing. Proceedings of 2018 1st International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, pp. 1-6. https://doi.org/10.1109/CAIS.2018.8441944
9. Peng, T, Lin, Y, Yao, X, and Zhang, W (2018). An efficient ranked multi-keyword search for multiple data owners over encrypted cloud data. IEEE Access. 6, 21924-21933. https://doi.org/10.1109/ACCESS.2018.2828404
10. Deepa, N, Vijayakumar, P, Rawal, BS, and Balamurugan, B (2017). An extensive review and possible attack on the privacy preserving ranked multi-keyword search for multiple data owners in cloud computing. Proceedings of 2017 IEEE International Conference on Smart Cloud (SmartCloud), New York, NY, USA, pp. 149-154. https://doi.org/10.1109/SmartCloud.2017.30
11. Seema, S, Harshitha, Y, and Apoorva, P (2017). Centralized multi-user and dynamic multi-keywords search scheme over encrypted cloud data. Proceedings of 2017 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, pp. 0913-0917. https://doi.org/10.1109/ICCSP.2017.8286502
12. Jivane, AB (2017). Time efficient privacy-preserving multi-keyword ranked search over encrypted cloud data. Proceedings of 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), Chennai, India, pp. 497-503. https://doi.org/10.1109/ICPCSI.2017.8392345
13. Ponnusamy, PP, Vidhyapriya, R, and Maheswari, SU (2017). A survey on multi-keyword ranked search manipulations over encrypted cloud data. Proceedings of 2017 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, pp. 1-5. https://doi.org/10.1109/ICCCI.2017.8117731
14. Saiharitha, V, and Saritha, SJ (2016). A privacy and dynamic multi-keyword ranked search scheme over cloud data encrypted. Proceedings of 2016 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, pp. 1-5. https://doi.org/10.1109/CESYS.2016.7890001
15. Yao, X, Lin, Y, Liu, Q, and Zhang, J (2018). Privacy-preserving search over encrypted personal health record in multi-source cloud. IEEE Access. 6, 3809-3823. https://doi.org/10.1109/ACCESS.2018.2793304
16. Dai, H, Ji, Y, Yang, G, Huang, H, and Yi, X (2019). A privacy-preserving multi-keyword ranked search over encrypted data in hybrid clouds. IEEE Access. 8, 4895-4907. https://doi.org/10.1109/ACCESS.2019.2963096
17. Li, H, Liu, D, Dai, Y, Luan, TH, and Shen, XS (2015). Enabling efficient multi-keyword ranked search over encrypted mobile cloud data through blind storage. IEEE Transactions on Emerging Topics in Computing. 3, 127-138. https://doi.org/10.1109/TETC.2014.2371239
18. Ajai, AK, and Rajesh, RS (2014). Hierarchical multi-keyword ranked search for secured document retrieval in public clouds. Proceedings of 2014 International Conference on Communication and Network Technologies, Sivakasi, India, pp. 33-37. https://doi.org/10.1109/CNT.2014.7062720
19. Cao, N, Wang, C, Li, M, Ren, K, and Lou, W (2014). Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems. 25, 222-233. https://doi.org/10.1109/TPDS.2013.45

Narendra Shyam Joshi is a Ph.D. candidate in the Department of Computer Science & Engineering at KLS Gogte Institute of Technology (Visvesvaraya Technological University), Belagavi, Karnataka. He has 15+ years of teaching experience, has published 10+ papers in national and international journals, and has three books for diploma students to his credit. He has delivered technical talks on current trends and technologies in various Engineering/BCA/12th Science colleges.

Kuldeep P. Sambrekar is a professor in the Department of Computer Science & Engineering at KLS Gogte Institute of Technology, Belagavi, India. He received his Ph.D in Computer Science from Visvesvaraya Technological University in 2020. He has more than 17 years of teaching experience. He has filed two patents for credit. He has published more than 25 papers in national and international journals, and has two book chapters on his credit. He has been invited as a session chair for many IEEE conferences.

Abhijit J. Patankar works as an associate professor in the Information Technology Department at the D. Y. Patil College of Engineering, Akurdi Pune, India. He has 23 years of teaching and research experience. He is a member of the Board of Studies SPPU Pune. He completed Ph.D. in Computer Science and Engineering from VTU Belagavi. His research areas include cloud computing, data science, technology, and AI.

Archana Jaganath Jadhav works as an assistant professor with Dr. D. Y. Patil College of Engineering, Management and Research, India. She previously worked as an assistant professor at Alard College of Engineering and Management, India.

Prajakta Ajay Khadkikar is an accomplished assistant professor with a distinguished career in innovative technical teaching spanning over 15 years. She possesses a wealth of expertise in conducting impactful computer science research, as evidenced by numerous publications in reputable journals.


Keywords: Efficient keyword search, Privacy preservation, Binary tree search, Ranked DFS search, Cloud-data encryption

1. Introduction

Cloud computing has revolutionized various fields in recent years by transforming data storage, processing, and access. It is rapidly gaining popularity among enterprises and individuals owing to its scalability, agility, and affordability. However, the potential lack of security of data stored on cloud servers is a significant problem. A key requirement of cloud computing is the organization’s ability to protect sensitive data from being lost, stolen, or compromised. Cloud environments present new risks and challenges. Although traditional security measures, such as access control and encryption, are widely used, they may be insufficient. Therefore, novel strategies are required to improve cloud data security. In this study, we propose a hybrid strategy that combines ranked searching and Greedy depth-first search (GDFS) [1] to address the issue of cloud data security. Our objective is to maximize retrieval efficiency while minimizing waiting times by utilizing the GDFS algorithm [2], which is especially well suited for searching and navigating huge datasets. The proposed method can increase the effectiveness of data retrieval from the cloud by using this technique. To further enhance users’ privacy and safety, the proposed hybrid technique incorporates ranked searching. Encrypting and indexing sensitive data with ranked search reduces the risk of the data falling into the wrong hands, as depicted in Figure 1.

The proposed solution uses a methodology that guarantees cloud data secrecy. The key security problems in cloud computing can be addressed by employing a hybrid strategy that combines enhanced security measures with improved data retrieval speeds. Enhancing data access and implementing encryption measures can reduce the chances of data breaches, unauthorized access, and data leakage. The proposed technique strikes a balance between data protection and system performance to ensure that security measures do not hinder cloud-computing scalability and adaptability. This study introduces an innovative method to safeguard data stored in the cloud. It combines the advantages of ranked search techniques with a GDFS. By addressing serious security issues, utilizing effective data retrieval, and boosting data protection, this approach provides an effective solution for safeguarding sensitive data stored in the cloud. The findings of this study provide important guidance for improving cloud security, which will drive advancement in the cloud-computing industry.

The objectives of this study are as follows:

  • 1. Understand how data are entered into and stored in a cloud system.

  • 2. Create a system that can enhance both the performance and privacy by integrating a cloud-based multi-keyword ranked search with GDFS encryption.

  • 3. Experimentally demonstrate that the proposed methodology outperforms existing multi-keyword encryption methods.

The remainder of this paper is structured as follows: in Section 2, we present the groundwork for our study, including a formal definition and security model, and address specific queries. Section 3 provides a detailed explanation of the creation of the search strategy for semantic terms, including relevant examples. Section 4 presents the security analysis. Section 5 presents the results of both theoretical and experimental investigations. The conclusion is presented in Section 6.

2. Related Work

Extensive research has been conducted on techniques for searching encrypted cloud data. While some researchers have focused on methods for recovering lost information, others have endeavored to improve the outcomes of search engine queries. Below, we summarize the most noteworthy studies on this topic. One earlier method was devised to expedite the identification of subgraphs pertinent to a specified query. This technique reduced the search area by utilizing the graph’s topology and priority-queue processing. Experimental results demonstrate that this strategy outperforms other current approaches in terms of effectiveness, precision, and efficiency when handling large graphs. The most recent approach for performing ranked searches on large graphs involves a fast, greedy depth-first search algorithm. This approach utilizes pruning techniques to optimize the search process and enhance overall efficiency. Each subgraph is assigned a score depending on its level of correspondence with the query, which allows subgraphs to be extracted proficiently from large graphs. Another method utilizes geometric mean fusion in ambient intelligence and humanized computing [3] to enhance the speed and accuracy of subgraph retrieval. The objective is to identify subgraphs that closely match a given query by calculating similarity scores for all subgraphs; the similarity metrics are combined using geometric mean fusion to calculate the scores. An improved version of the GDFS ranked search method that considers the entire node distance was introduced in a 2020 publication in a cluster computing journal [4]. The work analysis results are listed in Table 1 [3–7].

We developed a GDFS ranked search algorithm as an improved method for implementing search algorithms [8]. This method was developed expressly for this study and is based on the cumulative distance between the nodes contained within subgraphs. Because the cloud server manages both the encrypted searchable tree index I and the encrypted document collection D, the data owner retains control over both. Using the term-document combination requested by the user in the index tree, the cloud server can obtain the top-k encrypted documents with the highest ratings. Every change made by the data owner must be reflected in the document collection D and index I stored on the server. An efficient method that uses a priority queue is proposed to simultaneously retrieve relevant subgraphs for a large number of requests. Additionally, it narrows the search field by employing a pruning technique based on dynamic programming, making the search process more straightforward. Table 1 presents the results of this comparison.

3. Methodology and Algorithms

Efficient, secure, and privacy-preserving multi-keyword ranked search encryption for data stored in the cloud has become a critical area in cloud security. Currently, traditional encryption schemes, hierarchical indexing, and simple searching are the most widely used techniques for providing ranked search functionality while preserving privacy at the expense of performance. These methods employ structures such as the secure k-nearest neighbor (k-NN) scheme and term frequency-inverse document frequency (TF-IDF) scoring mechanisms to rank the search results. However, they suffer from high computational overhead, poor scalability, and vulnerability to inference attacks. To address these issues, we propose a new approach in the form of GDFS-based encryption. This method implements a depth-first traversal technique along with dynamic optimization of the search process. GDFS [9] is a hierarchical encryption-based approach to data retrieval in which the data are encrypted and the traversal mechanism greedily evaluates query terms and relevance scores, achieving faster searches with less computational overhead. In contrast to breadth-first or other exhaustive approaches, GDFS follows a depth-first paradigm and prunes irrelevant nodes early in the search, thereby avoiding unnecessary traversal of irrelevant data. This speeds up searches and limits the exposure of metadata, increasing the privacy guarantees. Furthermore, the encryption is continuously updated according to the relevance of the keywords and how frequently they appear, thereby optimizing the ranking while preserving data confidentiality. The performance evaluation of the proposed GDFS approach demonstrated notable improvements in query latency, encryption complexity, and data security compared with existing schemes.
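The greedy ordering and early pruning described above can be illustrated with a short sketch. The tree, weights, and threshold below are hypothetical, chosen only to show the two mechanisms: children are visited in descending weight order, and subtrees below the threshold are never entered.

```python
def greedy_dfs(node, threshold, results):
    """Depth-first traversal that expands higher-weight children first
    and prunes subtrees whose weight falls below the threshold."""
    if node["weight"] < threshold:
        return                          # prune: subtree cannot be relevant enough
    if "doc" in node:                   # leaf nodes hold a document id
        results.append((node["doc"], node["weight"]))
        return
    # greedy step: visit the most relevant child first
    for child in sorted(node["children"], key=lambda c: c["weight"], reverse=True):
        greedy_dfs(child, threshold, results)

# Hypothetical index tree with relevance weights
tree = {"weight": 1.0, "children": [
    {"weight": 0.9, "children": [{"weight": 0.8, "doc": "D3"},
                                 {"weight": 0.2, "doc": "D4"}]},  # D4 is pruned
    {"weight": 0.1, "children": [{"weight": 0.7, "doc": "D5"}]},  # subtree pruned
]}

ranked = []
greedy_dfs(tree, 0.5, ranked)
print(ranked)
```

Note that the subtree rooted at weight 0.1 is skipped entirely even though it contains a leaf of weight 0.7; this is the tradeoff a greedy, threshold-pruned traversal makes in exchange for fewer node visits.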
By integrating obfuscation techniques and relevance-preserving transformations [9], the resulting protocols are privacy-preserving, rendering adversaries unable to infer sensitive information from access patterns. However, current techniques depend heavily on static encryption and ranking algorithms, which do not scale gracefully as datasets become larger or more complicated. In contrast, GDFS adapts to dynamic query frequencies, providing a flexible and robust solution for real-time cloud environments. The approach also uses greedy optimization to provision computational resources dynamically, thereby improving energy efficiency, which is a key requirement for large-scale cloud deployments. It fills this gap by balancing privacy enhancement with usability, ensuring that end users and organizations can rely on secure cloud storage for processing sensitive data. The GDFS encryption paradigm proposed in this paper integrates advanced cryptographic techniques and efficient search mechanisms to achieve a promising tradeoff among the security, privacy, and efficiency of cloud-based multi-keyword ranked searches.

3.1 Step-by-Step Methodology

  • · Data preparation and indexing

    • Input dataset: Keywords are extracted from a collection of plaintext documents.

    • Keyword tokenization: Techniques for natural language processing are used to extract keywords from a text.

    • Indexing: A structured representation of each document is created by indexing the document on the keywords. Each document is indexed, that is, linked to a unique identifier.

  • · Keyword encryption

    • Term frequency calculation: To measure keyword importance within a document, we calculate the term frequency of extracted keywords for each document.

    • Keyword weighting: Weights for keywords are then calculated using schemes, such as TF-IDF. It is used for ranking by relevance.

    • Encryption of weights: A secure cryptographic scheme is used to encrypt the calculated keyword weights such that data confidentiality is achieved without sacrificing the ability to rank.

  • · Query construction

    • Multi-keyword query formation: We analyze the pattern [10] in which users of the service form a search query from multiple keywords. For each keyword in the query, weights are assigned based on user preferences or defaults.

    • Query encryption: The query is encrypted using the same cryptographic algorithm and parameters as the document index.

  • · Greedy depth-first encryption (GDFE)

    • Tree construction: The query keywords and the document index are organized hierarchically as a tree. Each node represents a keyword, with a weight signifying its significance.

    • Greedy algorithm for relevance estimation:

      • * It traverses the tree depth-first starting at the root.

      • * Higher-weight nodes are expanded first because they correspond to more relevant keywords.

    • Pruning irrelevant nodes: To cut down the search time, subtrees with low relevance (defined by a predefined threshold) are pruned. This step also reduces the computational overhead by ignoring documents that are less relevant.

    • Encrypted matching: Intra-region (or intra-node) matching is performed between encrypted query keywords and encrypted document keywords [11].

  • · Ranking computation

    • Relevance score calculation: The matched keyword weights of the documents are aggregated through a relevance score for each document [12]. The score is calculated in an encrypted domain to protect privacy.

    • Normalized ranking: The scores are normalized so that they can be compared across documents. This enables effective ranking of the results while maintaining data privacy.

  • · Search result generation

    • Result compilation: Relevant scores for documents are computed and ranked in descending order.

    • Top-K retrieval: To reduce bandwidth consumption and improve efficiency, only the top-K most relevant documents are retrieved.
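The ranking steps above (aggregate matched keyword weights, normalize, return the top-K) can be sketched as follows. The documents and weights are hypothetical, and the matching is done on plaintext keywords for readability; in the actual scheme this computation would operate on encrypted values.

```python
def rank_documents(doc_weights, query_keywords, k=2):
    """Aggregate matched keyword weights per document, normalize the scores,
    and return only the top-K most relevant documents."""
    scores = {doc: sum(w for kw, w in weights.items() if kw in query_keywords)
              for doc, weights in doc_weights.items()}
    top = max(scores.values()) or 1.0            # avoid division by zero
    normalized = {doc: s / top for doc, s in scores.items()}
    return sorted(normalized.items(), key=lambda x: x[1], reverse=True)[:k]

# Hypothetical per-document keyword weights (e.g., TF-IDF values)
docs = {
    "D1": {"cloud": 0.6, "encryption": 0.4},
    "D2": {"cloud": 0.3, "search": 0.5},
    "D3": {"fuzzy": 0.9},
}
print(rank_documents(docs, {"cloud", "encryption"}))
```

Returning only the top-K entries, as in the final step above, is what keeps the bandwidth cost of a query independent of the collection size.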

3.2 Data Preprocessing Strategy

All text-processing actions applied to the datasets to extract relevant keywords are included in document preprocessing [13]. The function responsible for preprocessing the data is represented as fdp = {fl, fs, fst}, which consists of several sub-functions: lexical analysis (fl), stop-word removal (fs), and stemming (fst). The importance of each statement can be ascertained by summing the TF and IDF values of each sentence. The weights are used to select the most effective terms for inclusion in the keyword dictionary.

Wik = TF × IDF = fik × log(n/nk).

Here, fik is the frequency of occurrence of the word “i” in the given text, n is the total number of records or files, and nk is the number of records or files that contain the term “i”; the factor log(n/nk) is the inverse of the document frequency. To generate a keyword collection [14], it is crucial to exclude unnecessary phrases from each page. The first step in constructing an index is generating a tree node for every page that acts as a terminal node of the tree. The interior nodes are then formed sequentially from these terminal nodes. The following algorithms provide a comprehensive explanation of the process of encrypting data and constructing an index.
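The weight computation can be written directly from the formula above. The three-document corpus is illustrative only, and the logarithm base is left as the natural log since the paper does not specify one.

```python
import math

def keyword_weights(docs):
    """Compute Wik = fik * log(n / nk) for every term of every document,
    where fik is the term count in document i, n is the number of documents,
    and nk is the number of documents containing term k."""
    n = len(docs)
    df = {}                                   # nk: document frequency per term
    for terms in docs:
        for t in set(terms):
            df[t] = df.get(t, 0) + 1
    return [{t: terms.count(t) * math.log(n / df[t]) for t in set(terms)}
            for terms in docs]

docs = [["cloud", "cloud", "encryption"], ["cloud", "search"], ["fuzzy", "logic"]]
w = keyword_weights(docs)
print(w[0]["cloud"])   # "cloud": fik = 2, nk = 2, n = 3 -> 2 * log(3/2)
```

Terms that appear in every document get weight log(n/n) = 0, which is exactly the filtering effect the threshold step relies on to drop uninformative keywords.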

3.3 Architecture Model

A data owner (information provider), data users (information consumers), a cloud storage server, and a key distribution server comprise the architecture shown in Figure 2.

  • · Data owner: Wants to outsource a collection of documents [15] to a cloud server securely while still enabling effective searches. From the document collection, the owner generates an encrypted document set and a secure search-tree index.

  • · Data users: Only authorized users can access the files shared by the data owner. With a secret key and a set of query phrases, users may create a search trapdoor to retrieve encrypted documents from the cloud server.

  • · Cloud server: The cloud server is responsible for storing the encrypted document collection and maintaining the searchable tree index [16]. It searches the index tree for the user-requested term-document combination and then retrieves the encrypted documents with the highest scores.

  • · Key distribution server: To ensure security and privacy for cloud-based data and search queries, a key distribution server (KDS) is required for key sharing.

4. Results and Discussion

The performance was evaluated using the REUTERS-21578 dataset, with the parameter settings detailed in Table 2. We expected 90% or higher precision, recall, accuracy, and F1-score for the listed encryption algorithms, including the proposed algorithm. The k-means technique, implemented in Python with the parameters in Table 2, was used to cluster the texts into 5 or 10 groups; the text count per cluster is shown in Table 2. The number of texts was varied from 1,000 to 10,000 for the comparative analysis. By concatenating the outputs of the various hash functions, binary indices of 2,688 bits were produced, which were shortened to 448 bits using a reduction factor of 6. The proposed scheme supports efficient conjunctive searching through keyword-field-free indexes, as initially suggested by Li et al. [17]. It differs from existing schemes by reducing the number of texts examined to retrieve relevant results: it only examines texts within the matching cluster, thereby reducing the comparison counts and search time. The proposed scheme was compared with an existing scheme [18], which was implemented using the same parameters [19] as those described in Table 2.

4.1 Search Efficiency

  • · Uniform text distribution: Each cluster, from 1 to 5, contains an identical number of texts, totaling 1,200 in each. Consequently, the sum of texts across all clusters amounts to 6,000.

  • · Non-uniform text distribution: The text count varies across Clusters 1 to 10, with the highest concentration in Cluster 1 (3,966 texts) and the lowest in Cluster 10 (239 texts), summing to a total of 10,000 texts. The approach for identifying relevant texts varies based on the clustering method employed; the two clustering methods used are as follows.

  • · Hard clustering: In hard clustering, texts are exclusively assigned to a single cluster. Algorithms in this category do not recognize multiple themes within a text. To find pertinent texts, one needs to identify the one relevant cluster and then search within it.

  • · Soft clustering: Soft clustering allows for a text’s presence in several clusters. These algorithms can discern multiple themes within a text, which aids in exploring various relationships among the data. Finding relevant texts in this context entails pinpointing all pertinent clusters and searching within them.

To be considered suitable for real-world applications, a search technique must possess high accuracy and efficiency. The proposed search technique improves search efficiency by decreasing the average search time required to locate pertinent texts, in contrast to previous strategies [17] that involved scanning the complete collection of texts. Search accuracy was evaluated by computing metrics such as recall, precision, F1-score, and false accept rate (FAR). We conducted a comprehensive evaluation of the compiled text collection using a set of 100 queries, each consisting of five relevant terms and 30 irrelevant terms that did not appear in the user’s texts.
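The four reported metrics follow directly from the true/false positive and negative counts of a query. A minimal sketch, using illustrative counts rather than the paper's measurements:

```python
def search_metrics(tp, fp, fn, tn):
    """Recall, precision, F1-score, and false accept rate (FAR), in percent."""
    recall = 100.0 * tp / (tp + fn)
    precision = 100.0 * tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    far = 100.0 * fp / (fp + tn)       # irrelevant texts wrongly returned
    return recall, precision, f1, far

# Illustrative counts only (not the paper's measurements)
recall, precision, f1, far = search_metrics(tp=80, fp=20, fn=0, tn=900)
print(recall, precision, round(f1, 2), round(far, 3))
```

With fn = 0 the recall is 100% regardless of the other counts, which matches the pattern in Table 3 where both schemes achieve full recall and differ only in precision, F1-score, and FAR.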

A comparison of the search accuracies is displayed in Figure 3. Based on the available information, we constructed a revised table that displays the search accuracy data.

Tables 3 and 4 show the search accuracy comparison between the newly introduced and existing schemes, as well as the tools and technology, indicating improvements in precision, F1-score, and reduction in the FAR. The recall remains the same for both schemes at 100%. The gain column represents the percentage increase or decrease in the performance index of the newly introduced scheme compared with existing schemes.

Table 5 shows the result analysis of the different test cases. The metrics are explained below:

  • · Dataset size: The overall quantity of data that the search system can process.

  • · Query complexity: This study examines the relationship between the performance of the system and the complexity of the query, specifically in terms of the number of keywords and the use of synonym mapping.

  • · Response time: The amount of time that passes between starting a search query and the user seeing the search results afterward.

  • · Accuracy: The accuracy of the searching outcomes in relation to the retrieval of pertinent texts.

  • · System load: The term “computational load” refers to the number of computational resources required by a system during the processing of queries.

  • · Network latency: The duration required for data transmission over the network during the execution of a query.

  • · Indexing time: The time necessary to index the data for search operations.

  • · Encryption/decryption time: The duration required for the encryption of data before its storage and the subsequent decryption process during retrieval.

  • · Scalability: The system’s capacity to sustain performance levels while expanding in terms of data volume and user count.

  • · Fault tolerance: The evaluation of the system’s capacity to sustain functioning in the face of component failures.

  • · Throughput: The system’s query processing capacity within a specified time interval.

As the number of files approached 500, a slight difference between the lines became apparent. An unsuccessful search occurs when a query yields no results. It is crucial to identify unsuccessful searches promptly to conserve cloud resources; however, existing research does not examine the time taken to declare that a search is unsuccessful. Quick recognition of an unsuccessful search can reduce users' financial costs, making it a key performance metric.

Under current schemes [17], a failed search is determined only after reviewing all text indices, requiring N comparisons to confirm that no relevant text exists. Conversely, the proposed scheme utilizes a cluster head that embodies all keywords within its cluster, allowing the presence or absence of a text containing the search terms to be determined by reviewing only the cluster indices. Therefore, an unsuccessful search can be concluded after only K comparisons. Figure 4 illustrates the reduction in both the average number of comparisons and the average time required to determine an unsuccessful search. Compared to previous schemes, the proposed method improves the efficiency of declaring unsuccessful searches by 99.29%.
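The K-comparison early exit can be sketched as follows. The cluster contents are hypothetical; the point is only that a no-match query is rejected after one comparison per cluster head rather than one per document:

```python
# Sketch of the early-exit check for unsuccessful searches: each cluster
# head stores the union of its members' keywords, so a query with no
# possible match is rejected after K cluster comparisons instead of N
# document comparisons. Data below is illustrative.

cluster_heads = {                 # cluster id -> union of member keywords
    "c0": {"cloud", "encryption"},
    "c1": {"ranking", "index"},
}

def is_unsuccessful(query_terms):
    comparisons = 0
    for kws in cluster_heads.values():     # at most K comparisons
        comparisons += 1
        if query_terms & kws:
            return False, comparisons      # some cluster may contain a hit
    return True, comparisons               # declared unsuccessful after K

print(is_unsuccessful({"blockchain"}))     # (True, 2): rejected after K=2
```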

4.2 Computation Cost

In the newly introduced search scheme, the time required to construct a searchable index encompasses the duration needed to create both the text and cluster indices. The index build time of the proposed scheme tends to be greater than that of the existing schemes [15, 16], which is attributed to the additional step of generating indexes for multiple clusters. The number of clusters, denoted by K, varies with the application's needs, the size of the text collection, and the clustering algorithm employed. Incorporating clustering into the index-building process therefore results in a slight increase in the time required to create an index for a large text collection, and the comparison of index-build times between the proposed and existing schemes illustrates this minor increase. The average time required to construct a query, which encompasses the HMAC computation, reduction, and bitwise-AND operations for each term, remains identical for the proposed and existing schemes [15, 16] because the proposed scheme introduces no additional delays owing to clustering. The average query-build time was measured for queries containing 1 to 5 genuine terms; as shown in Figure 5, scenarios with and without the inclusion of noise terms were considered. Each timing is the mean over 200 generated queries with varying counts of 1-5 genuine terms.
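The per-term query construction named above (HMAC, reduction, bitwise-AND) can be sketched as follows. The key, the XOR-folding reduction rule, and the helper names are our assumptions for illustration; Table 2 only fixes the HMAC family (SHA-256/384/512), the reduction factor d = 64, and the 48-bit final query length:

```python
import hmac
import hashlib

KEY = b"demo-key"   # illustrative key; in practice obtained from the KDS
R_BITS = 48         # final query length (r) from Table 2

def term_code(term: str) -> int:
    # HMAC of the term, then reduction of the 256-bit digest to R_BITS
    # by XOR-folding 48-bit chunks (one plausible reduction rule).
    digest = hmac.new(KEY, term.encode(), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big")
    reduced = 0
    while value:
        reduced ^= value & ((1 << R_BITS) - 1)
        value >>= R_BITS
    return reduced

def build_query(terms):
    # Combine the per-term codes with bitwise AND.
    q = (1 << R_BITS) - 1
    for t in terms:
        q &= term_code(t)
    return q

print(f"{build_query(['cloud', 'encryption']):012x}")   # 48-bit query
```

Note that the per-query cost is one HMAC, one reduction, and one AND per term, independent of any clustering, which is why clustering adds no query-construction delay.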

4.3 Rank Efficiency

The efficiency of result ranking is evaluated by comparing the time required to generate p indexes at different relevance levels within the text collection. The increase in index-build time owing to higher relevance levels is a one-time overhead managed during the offline stage by the data owner, and the utilization of cloud resources and parallel processing can further reduce this impact. Thus, the extra time for creating multiple indexes is outweighed by the benefit of delivering superior-ranked search results to users. To illustrate this, Figure 6 plots the alignment of the top-ranked texts from the proposed scheme with the top results from plain-text searches, using hypothetical data.

4.4 Computational Complexity Analysis

Considering cloud-based multi-keyword ranked search encryption, the GDFS algorithm's indexing and search times are the main performance factors. In GDFS, indexing builds a hierarchical structure over the encrypted data, so its time complexity is governed by the depth and breadth of the tree-shaped structure, namely O(n log n) in the worst case. Search then takes approximately O(k log n) time, where k is the number of query keywords and n is the number of indexed documents. The algorithm is efficient in terms of search capability while providing secure encryption, although its performance degrades as the dataset size or query complexity increases.
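The O(k log n) search bound can be checked empirically on a toy sorted index: each keyword lookup over n entries takes at most about log2(n) probes, so k lookups stay within k·log2(n). The index contents are synthetic:

```python
import math

# Toy illustration of the O(k log n) search bound: a binary search over
# a sorted index of n entries uses at most ceil(log2 n) probes, so k
# keyword lookups need O(k log n) comparisons in total.

def count_probes(index, keyword):
    probes, lo, hi = 0, 0, len(index)
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if index[mid] < keyword:
            lo = mid + 1
        else:
            hi = mid
    return probes

n, k = 10_000, 5
index = sorted(f"kw{i:05d}" for i in range(n))
total = sum(count_probes(index, f"kw{i:05d}") for i in range(k))
bound = k * (math.ceil(math.log2(n)) + 1)
print(total, "<=", bound)   # total probes stay within the k*log2(n) bound
```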

5. Conclusion and Future Work

Using a greedy search algorithm that emphasizes depth, encrypted cloud-based multi-keyword ranked searches can be conducted in a confidential and efficient manner. Such systems can adhere to strict privacy restrictions while enabling prompt data retrieval. The technique builds on the performance of modern computing platforms and prioritizes relevant results through intelligent navigation of the encrypted search space. Encryption safeguards sensitive information from unauthorized access by hackers and other intruders, and ranked searching enhances the user experience by organizing the search results according to their relevance. By ranking the findings, clients save time and computing resources. In the realm of big data, handling complex queries requires multi-keyword capabilities. Owing to ever-evolving cyber threats, perfect security for any system is unachievable; continuous research and development are essential for adapting and enhancing these systems to withstand emerging vulnerabilities. The proposed integration establishes a foundation for delivering safe cloud services and showcases the interplay between speed and privacy. With the advanced methods now available, cloud computing can enable user-centric, efficient, and secure search services by controlling and storing the data.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Figure 1. Basic model.

The International Journal of Fuzzy Logic and Intelligent Systems 2024; 24: 416-427https://doi.org/10.5391/IJFIS.2024.24.4.416

Figure 2. Architecture.

Figure 3. Search accuracy comparison.

Figure 4. Unsuccessful search: gain in average search time.

Figure 5. Average query time by number of genuine terms.

Figure 6. Rank efficiency of the proposed search scheme.

Table 1. Comparison of multi-keyword ranked search methodologies.

Study | Methodology | Key features | Performance metrics | Advantages | Limitations | Time complexity
Das and Kalra [3] | Public-key encryption with ranked keyword search | Utilizes secure kNN to achieve multi-keyword search | Precision, recall, search efficiency | High security, efficient search | High computational overhead | O(n^2)
Guo et al. [4] | Multi-keyword ranked search over encrypted cloud data | Adopts inner-product similarity measure for ranking | Search accuracy, response time | Enhanced search accuracy, privacy-preserving | Limited scalability | O(n log n)
Liu et al. [5] | Privacy-preserving multi-keyword ranked search | Coordinate matching and ranking with privacy-preserving operations | Precision, recall, search efficiency | High accuracy, strong privacy guarantees | Computationally intensive | O(n^2)
Gawade and Kadu [6] | Privacy-preserving multi-keyword text search | Leverages homomorphic encryption for secure search | Accuracy, search time, encryption/decryption time | Strong security, efficient search | High storage and computation overhead | O(n^3)
Xu et al. [7] | Efficient multi-keyword ranked search | Combines secure index with advanced ranking | Search precision, query response time | High precision, reduced query time | Complexity in implementation | O(n log n)
Proposed work | Greedy depth-first encryption (GDFE) | Cloud-based, multi-keyword search, ranked results, greedy depth-first approach | Search efficiency, encryption/decryption time, scalability, privacy | Enhanced performance, strong privacy, scalable | To be determined through experimental evaluation | O(n log n)

Table 2. Simulation environment with parameters.

Parameter | Value
Dataset | REUTERS-21578 [320]
Cluster count | Uniform: 5; Non-uniform: 10
Number of texts | 1,000 to 10,000
Hash function for indexing | MD5
HMAC functions for query construction | SHA-256, SHA-384, SHA-512
Reduction factor (d) | 64
Final query length (r) | 48 bits
Server configuration | Multi-core processor, 4 TB HDD, 64 GB RAM or more
Programming language | Python

Table 3. Comparative analysis of proposed scheme and existing scheme (unit: %).

Parameter | Proposed scheme | Existing scheme | Gain
Recall | 100 | 100 | Same
Precision | 82.40 | 76.27 | +6.13
F1-score | 89.07 | 84.89 | +4.18
FAR | 0.128 | 0.286 | −55.24

Table 4. Tools and technology.

Tool/technology | Purpose | Description
Encryption software | Data security | Software used to encrypt the cloud data; examples include AES, RSA, or custom encryption algorithms.
Cloud platform | Data hosting | Cloud service provider used to host the encrypted data (e.g., AWS, Azure, Google Cloud).
Indexing engine | Data retrieval | Tool used to create searchable indexes for the encrypted data (e.g., Apache Lucene, Elasticsearch).
Synonym database | Search enhancement | A database or API service, such as WordNet, that provides synonyms for extending search capabilities.
Greedy DFS algorithm implementation | Search algorithm | Custom or pre-built greedy DFS algorithm used to perform the ranked searching.
Programming language | Development | Language used to implement the search algorithm and handle encryption/decryption; likely candidates are Python, Java, or C++.
Simulation software | Testing & analysis | Software used to simulate the cloud environment and measure the performance of the search algorithm (e.g., MATLAB, Simulink, or a custom simulator).

Table 5. Result analysis of different test cases.

Metric | Test case 1 | Test case 2 | Test case 3
Dataset size | 500 GB | 1 TB | 5 TB
Query complexity | 3 keywords | 5 keywords | 7 keywords
Response time | 1.2 s | 1.8 s | 2.5 s
Accuracy | 92% | 89% | 85%
System load | 15% CPU | 30% CPU | 50% CPU
Network latency | 50 ms | 70 ms | 90 ms
Indexing time | 10 min | 30 min | 1 hr
Encryption time | 5 min | 15 min | 45 min
Decryption time | 2 min | 6 min | 18 min
Scalability | High | Moderate | Low
Fault tolerance | 99.99% | 99.95% | 99.9%

Algorithm 1. The process of encrypting data and constructing an index.

Step 1: Encrypt data
function encryptData(data):
    encryptedData = AES_Encrypt(data, encryptionKey)
    return encryptedData

Step 2: Build encrypted index
function buildEncryptedIndex(data):
    index = createIndexStructure(data)
    encryptedIndex = encryptData(index)
    return encryptedIndex

Procedure: Index construction for a set of input documents with threshold T.
Input: D = D1, D2, ..., Dn
Output: encrypted document set D′ = D1′, D2′, ..., Dn′ and index set I′ = I1′, I2′, ..., In′, assigned at the moment of encryption; T: index tree
Method:
– For every d in D:
  – Extract terms k = k1, k2, ..., kn from the document
  – Remove the stopwords from k
  – Lemmatize k
  – Compute the TF and IDF of k
  – Filter k using the threshold and TF
  – Encrypt k and produce I
  – Obtain key K′ from the KDS
  – Encrypt d to create d′
– End for
– Upload the document set D′ to the cloud
– Transfer the index set I′ to the cloud server
– Update the index tree T in the cloud
– Return

Step 3: Greedy depth-first search (GDFS)
function GDFS(encryptedIndex, encryptedQuery):
    stack = initializeStack()
    stack.push(encryptedIndex.root)
    results = []
    while not stack.isEmpty():
        node = stack.pop()
        if secureKNN(node, encryptedQuery):
            if node.isLeaf():
                results.add(node)
            else:
                for child in node.children:
                    stack.push(child)
    return results

Example usage:
data = "Sensitive data here"
encryptedData = encryptData(data)
encryptedIndex = buildEncryptedIndex(data)
query = "Search query"
searchResults = GDFS(encryptedIndex, query)
print("Search Results: ", searchResults)
Step 4: Retrieving documents from the cloud using a search query
Procedure:
Input: D′ = D1′, D2′, ..., Dn′, collection of cloud-based documents; I′ = I1′, I2′, ..., In′, cloud set of indices; T, cloud index tree; K, a group of keys from the KDS; W, a group of search terms
Output: D = D1, D2, ..., Dm, the set of m documents matched by filter F with the terms in W
Method:
– For every w in W:
  – Remove stop terms
  – Apply lemmatization
  – If the filter includes a synonym search:
    – Use WordNet to find the synonyms of w
    – Extend the word list
  – End if
  – Hash w with SHA-1
– End for
– Upload the search index I with filter F to the cloud
– Apply DFS with the filters to T
– Obtain the matching documents
– Rank the results according to matches
– Forward the appropriate D′ to the end user
– Receive keys from the KDS
– Decrypt and display the result
– Return
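The GDFS step of Algorithm 1 can be sketched as runnable Python. In this sketch a hashed-keyword match stands in for the secure kNN test, and the tiny two-leaf tree is hypothetical; it only illustrates the stack-based depth-first walk:

```python
import hashlib

def h(term):
    # Hashed keywords stand in for the encrypted index entries.
    return hashlib.sha256(term.encode()).hexdigest()

class Node:
    def __init__(self, keywords, doc=None, children=()):
        self.keywords = {h(k) for k in keywords}
        self.doc = doc                       # leaf payload
        self.children = list(children)

    def is_leaf(self):
        return not self.children

def gdfs(root, query):
    q = {h(t) for t in query}
    stack, results = [root], []
    while stack:                             # loop until the stack is empty
        node = stack.pop()
        if q & node.keywords:                # stand-in for secureKNN(node, query)
            if node.is_leaf():
                results.append(node.doc)
            else:
                stack.extend(node.children)  # descend depth-first
    return results

root = Node({"cloud", "ranking"}, children=[
    Node({"cloud"}, doc="D1"),
    Node({"ranking"}, doc="D2"),
])
print(gdfs(root, {"ranking"}))   # ['D2']
```

Internal nodes carry the union of their subtree's keywords, so entire branches that cannot match are pruned without visiting their leaves.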

References

  1. Dai, X, Dai, H, Rong, C, Yang, G, Xiao, F, and Xiao, B (2022). Enhanced semantic-aware multi-keyword ranked search scheme over encrypted cloud data. IEEE Transactions on Cloud Computing. 10, 2595-2612. https://doi.org/10.1109/TCC.2020.3047921
  2. Liu, L, and Chen, Q (2020). A novel feature matching ranked search mechanism over encrypted cloud data. IEEE Access. 8, 114057-114065. https://doi.org/10.1109/ACCESS.2020.3002236
  3. Das, D, and Kalra, S (2020). An efficient LSI based multi-keyword ranked search algorithm on encrypted data in cloud environment. Proceedings of 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, pp. 1777-1782. https://doi.org/10.1109/IWCMC48107.2020.9148123
  4. Guo, C, Zhuang, R, Chang, CC, and Yuan, Q (2019). Multi-keyword ranked search based on Bloom filter over encrypted cloud data. IEEE Access. 7, 35826-35837. https://doi.org/10.1109/ACCESS.2019.2904763
  5. Liu, G, Yang, G, Bai, S, Wang, H, and Xiang, Y (2022). FASE: a fast and accurate privacy-preserving multi-keyword top-k retrieval scheme over encrypted cloud data. IEEE Transactions on Services Computing. 15, 1855-1867. https://doi.org/10.1109/TSC.2020.3023393
  6. Gawade, S, and Kadu, S (2018). Secure data storage and efficient data retrieval over cloud using sensitive hashing. Proceedings of 2018 2nd International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, pp. 87-92. https://doi.org/10.1109/ICCONS.2018.8662973
  7. Xu, J, Huang, X, Yang, G, and Wu, Y (2018). An efficient multi-keyword top-k search scheme over encrypted cloud data. Proceedings of 2018 15th International Symposium on Pervasive Systems, Algorithms and Networks (I-SPAN), Yichang, China, pp. 305-310. https://doi.org/10.1109/I-SPAN.2018.00059
  8. Mlgheit, JR, Houssein, EH, and Zayed, HH (2018). Efficient privacy preserving of multi-keyword ranked search model over encrypted cloud computing. Proceedings of 2018 1st International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia, pp. 1-6. https://doi.org/10.1109/CAIS.2018.8441944
  9. Peng, T, Lin, Y, Yao, X, and Zhang, W (2018). An efficient ranked multi-keyword search for multiple data owners over encrypted cloud data. IEEE Access. 6, 21924-21933. https://doi.org/10.1109/ACCESS.2018.2828404
  10. Deepa, N, Vijayakumar, P, Rawal, BS, and Balamurugan, B (2017). An extensive review and possible attack on the privacy preserving ranked multi-keyword search for multiple data owners in cloud computing. Proceedings of 2017 IEEE International Conference on Smart Cloud (SmartCloud), New York, NY, USA, pp. 149-154. https://doi.org/10.1109/SmartCloud.2017.30
  11. Seema, S, Harshitha, Y, and Apoorva, P (2017). Centralized multi-user and dynamic multi-keywords search scheme over encrypted cloud data. Proceedings of 2017 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, pp. 0913-0917. https://doi.org/10.1109/ICCSP.2017.8286502
  12. Jivane, AB (2017). Time efficient privacy-preserving multi-keyword ranked search over encrypted cloud data. Proceedings of 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), Chennai, India, pp. 497-503. https://doi.org/10.1109/ICPCSI.2017.8392345
  13. Ponnusamy, PP, Vidhyapriya, R, and Maheswari, SU (2017). A survey on multi-keyword ranked search manipulations over encrypted cloud data. Proceedings of 2017 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, pp. 1-5. https://doi.org/10.1109/ICCCI.2017.8117731
  14. Saiharitha, V, and Saritha, SJ (2016). A privacy and dynamic multi-keyword ranked search scheme over cloud data encrypted. Proceedings of 2016 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, pp. 1-5. https://doi.org/10.1109/CESYS.2016.7890001
  15. Yao, X, Lin, Y, Liu, Q, and Zhang, J (2018). Privacy-preserving search over encrypted personal health record in multi-source cloud. IEEE Access. 6, 3809-3823. https://doi.org/10.1109/ACCESS.2018.2793304
  16. Dai, H, Ji, Y, Yang, G, Huang, H, and Yi, X (2019). A privacy-preserving multi-keyword ranked search over encrypted data in hybrid clouds. IEEE Access. 8, 4895-4907. https://doi.org/10.1109/ACCESS.2019.2963096
  17. Li, H, Liu, D, Dai, Y, Luan, TH, and Shen, XS (2015). Enabling efficient multi-keyword ranked search over encrypted mobile cloud data through blind storage. IEEE Transactions on Emerging Topics in Computing. 3, 127-138. https://doi.org/10.1109/TETC.2014.2371239
  18. Ajai, AK, and Rajesh, RS (2014). Hierarchical multi-keyword ranked search for secured document retrieval in public clouds. Proceedings of 2014 International Conference on Communication and Network Technologies, Sivakasi, India, pp. 33-37. https://doi.org/10.1109/CNT.2014.7062720
  19. Cao, N, Wang, C, Li, M, Ren, K, and Lou, W (2014). Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems. 25, 222-233. https://doi.org/10.1109/TPDS.2013.45
