How to Use PHP and Regex to Scan Large Files for Keyword Matches?

Here's a very simple PHP script to scan a text file for occurrences of a given keyword using regular expressions.

<?php
    $keyword = "your_keyword"; // Replace with your keyword
    $file_path = "your_file.txt"; // Replace with your file path

    // Check if the file exists
    if (!file_exists($file_path)) {
        die("File not found.");
    }

    // Load the whole file into memory
    $content = file_get_contents($file_path);
    if ($content === false) {
        die("Unable to read file.");
    }

    // Escape regex special characters; $keyword stays unchanged for the output
    $pattern = '/' . preg_quote($keyword, '/') . '/i';

    // Find all matches
    preg_match_all($pattern, $content, $matches);

    // Print the number of matches
    $occurrences = count($matches[0]);
    echo "The keyword '{$keyword}' occurs {$occurrences} time(s) in the file.";

?>

The script above uses the `preg_match_all` function, which performs a global regular expression match. The keyword is passed through `preg_quote` first so that characters such as `.` or `$` are matched literally, and the `i` modifier at the end of the pattern makes the match case-insensitive. The script finds all occurrences of the keyword in the file and prints the total number of matches.
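To see the effect of escaping and of the `i` modifier in isolation, here is a minimal sketch that counts matches in a plain string instead of a file. The function name `count_keyword_matches` is a hypothetical helper introduced only for this illustration.

```php
<?php
// Hypothetical helper: counts case-insensitive occurrences of a literal keyword.
function count_keyword_matches(string $text, string $keyword): int
{
    // preg_quote escapes regex metacharacters and the '/' delimiter.
    $pattern = '/' . preg_quote($keyword, '/') . '/i';

    // preg_match_all returns the number of full matches, or false on error.
    $count = preg_match_all($pattern, $text);

    return $count === false ? 0 : $count;
}

// "a.b" and "A.B" match; "axb" does not, because the dot is escaped.
echo count_keyword_matches('a.b axb A.B', 'a.b'), "\n"; // prints 2
```

Without `preg_quote`, the dot would match any character and the same call would also count "axb".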

Please note, you will need to replace "your_keyword" with the keyword you want to search for and "your_file.txt" with the path to the text file you wish to scan.

Also, this script is not well suited to very large files, because `file_get_contents` reads the entire file into memory before scanning. Memory use grows with the file size and can exceed PHP's `memory_limit`. For very large files, you may want to read and scan the file in chunks instead, for example with the `fread` function in a loop, although this makes the script more complex.
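For text files with reasonably short lines, one simple way to keep memory use low is to read the file one line at a time with `fgets`. The following is a sketch only, not a replacement for the chunked approach: it assumes the keyword never contains a newline, and the function name `count_keyword_by_line` is a hypothetical helper. A file with no line breaks would still be read as one huge line.

```php
<?php
// Hypothetical helper: counts keyword occurrences by reading one line at a time.
// Assumes the keyword contains no newline character.
function count_keyword_by_line(string $file_path, string $keyword): int
{
    $handle = fopen($file_path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Unable to open file: $file_path");
    }

    $pattern = '/' . preg_quote($keyword, '/') . '/i';
    $occurrences = 0;

    // fgets returns false at end of file, which ends the loop.
    while (($line = fgets($handle)) !== false) {
        // preg_match_all returns the match count (false on error casts to 0).
        $occurrences += (int) preg_match_all($pattern, $line);
    }

    fclose($handle);
    return $occurrences;
}
```

Only one line is held in memory at a time, so memory use depends on the longest line rather than on the file size.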

Using fread to search large files in chunks

Below is an example of using `fread` in a loop to scan the contents of a large file in chunks. This script reads a chunk of the file, scans it for the keyword, then moves on to the next chunk. This allows it to handle very large files that wouldn't fit into memory.

<?php
    $keyword = "your_keyword"; // Replace with your keyword
    $file_path = "your_file.txt"; // Replace with your file path
    $chunk_size = 8192; // Size of chunks in bytes

    // Check if the file exists
    if (!file_exists($file_path)) {
        die("File not found.");
    }

    // Open the file
    $file_handle = fopen($file_path, "r");
    if (!$file_handle) {
        die("Unable to open file.");
    }

    // Escape regex special characters; $keyword stays unchanged for the output
    $pattern = '/' . preg_quote($keyword, '/') . '/i';

    $occurrences = 0;

    // Loop over the file
    while (!feof($file_handle)) {
        // Read a chunk of the file
        $chunk = fread($file_handle, $chunk_size);
        if ($chunk === false) {
            break; // Stop on a read error
        }

        // Check matches
        preg_match_all($pattern, $chunk, $matches);

        // Add to total occurrences
        $occurrences += count($matches[0]);
    }

    // Close the file
    fclose($file_handle);

    // Print matches
    echo "The keyword '{$keyword}' occurs {$occurrences} time(s) in the file.";

?>

The chunk size of `8192` bytes (8KB) is just an example. You may adjust the chunk size based on the available memory and the file size you are dealing with. Please replace "your_keyword" with the keyword you want to search for and "your_file.txt" with the path to the text file you want to scan.

Be aware that if an occurrence of the keyword is split across two chunks, this method will not count it. For files where this is a potential issue, you would need a more complex solution, such as carrying the last few bytes of each chunk over to the next read.
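One possible way to handle the boundary case is to keep a short tail of each buffer and prepend it to the next chunk. The sketch below is an illustration under stated assumptions, not a definitive implementation: the function name `count_keyword_chunked` is hypothetical, and the keyword is assumed to be matched byte by byte (no `u` modifier), so every match has the same byte length as the keyword.

```php
<?php
// Hypothetical helper: counts keyword occurrences chunk by chunk, carrying a
// short tail between reads so matches that straddle a boundary are still found.
function count_keyword_chunked(string $file_path, string $keyword, int $chunk_size = 8192): int
{
    if ($keyword === '' || $chunk_size < 1) {
        throw new InvalidArgumentException('Keyword must be non-empty and chunk size positive.');
    }

    $handle = fopen($file_path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open file: $file_path");
    }

    $pattern = '/' . preg_quote($keyword, '/') . '/i';
    $max_tail = strlen($keyword) - 1; // longest possible partial match
    $tail = '';
    $occurrences = 0;

    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false) {
            break; // stop on a read error
        }

        // Scan the carried-over tail together with the new chunk.
        $buffer = $tail . $chunk;
        $count = (int) preg_match_all($pattern, $buffer, $matches, PREG_OFFSET_CAPTURE);
        $occurrences += $count;

        // Find where the last counted match ends, so it is never counted twice.
        $last_end = 0;
        if ($count > 0) {
            [$text, $offset] = $matches[0][$count - 1];
            $last_end = $offset + strlen($text);
        }

        // Keep at most strlen($keyword) - 1 bytes, and nothing already matched.
        $tail_start = max(strlen($buffer) - $max_tail, $last_end);
        $tail = substr($buffer, $tail_start);
    }

    fclose($handle);
    return $occurrences;
}
```

Because the tail is always shorter than the keyword and never includes bytes from a counted match, no occurrence is counted twice. The total should be the same as scanning the whole file with a single `preg_match_all` call.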

Posted on Tue, Aug 1, 2023.
Online URL: https://www.articlediary.com/article/how-to-use-php-and-regex-to-scan-large-files-for-keyword-matches-1081.html
