Searching

Search string and find the number of occurrences

Pinterest LinkedIn Tumblr

grep command is used to search text or searches the given file for lines containing a match to the given strings or words.

The grep search the string from given file and returns the number of occurrences and index for each string in the file. Write program to take file, search string and find the number of occurrences and list of index values from the given string.

source Code

package com.dsacode.Algorithm.search;
 
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
 
public class FileGrep {
 
    private static final int MAPSIZE = 4 * 1024 ;
 
    private static String searchFor(String grepfor, Path path) throws IOException {
        final byte[] tosearch = grepfor.getBytes(StandardCharsets.UTF_8);
        StringBuilder report = new StringBuilder();
        int padding = 1;
        int linecount = 0;
        int matches = 0;
        boolean inword = false;
        boolean scantolineend = false;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            final long length = channel.size();
            int pos = 0;
            while (pos < length) {
                long remaining = length - pos;
                int trymap = MAPSIZE + tosearch.length + padding;
                int tomap = (int)Math.min(trymap, remaining);
                int limit = trymap == tomap ? MAPSIZE : (tomap - tosearch.length);
                MappedByteBuffer buffer = channel.map(MapMode.READ_ONLY, pos, tomap);
                System.out.println("Mapped from " + pos + " for " + tomap);
                pos += (trymap == tomap) ? MAPSIZE : tomap;
                for (int i = 0; i < limit; i++) {
                    final byte b = buffer.get(i);
                    if (scantolineend) {
                        if (b == '\n') {
                            scantolineend = false;
                            inword = false;
                            linecount ++;
                        }
                    } else if (b == '\n') {
                        linecount++;
                        inword = false;
                    } else if (b == '\r' || b == ' ') {
                        inword = false;
                    } else if (!inword) {
                        if (wordMatch(buffer, i, tomap, tosearch)) {
                            matches++;
                            i += tosearch.length - 1;
                            if (report.length() > 0) {
                                report.append(", ");
                            }
                            report.append(linecount);
                            scantolineend = true;
                        } else {
                            inword = true;
                        }
                    }
                }
            }
        }
        return "Times found at : " + matches + "\nWord found at : " + report;
    }
 
    private static boolean wordMatch(MappedByteBuffer buffer, int pos, int tomap, byte[] tosearch) {
        for (int i = 0; i < tosearch.length; i++) {
            if (tosearch[i] != buffer.get(pos + i)) {
                return false;
            }
        }
        byte nxt = (pos + tosearch.length) == tomap ? (byte)' ' : buffer.get(pos + tosearch.length);
        return nxt == ' ' || nxt == '\n' || nxt == '\r';
    }
     
    public static void main(String[] args) throws IOException {
         
        String grepfor = "static";
        String BASE_DIR ="XXXXXXX";
        String FILE_PATH="src/com/dsacode/Algorithm/search/FileGrep.java";
        Path path = Paths.get(BASE_DIR+FILE_PATH);
 
        System.out.println("Search file Name: '"+ FILE_PATH+ "'");
        System.out.println("Search keyword  : '"+ grepfor+"'");
        String report = searchFor(grepfor, path);
 
        System.out.println(report);
    }
}

Output

Search file Name: 'src/com/dsacode/Algorithm/search/FileGrep.java'
Search keyword: 'static'
Mapped from 0 for 3709
Times found at: 4
Word found at: 13, 15, 66, 76

Algorithm Explanation

Take search string and filename as input.
Open the file and define the buffer to read the content from the file.
If the keyword matches, increments the count and add a position to the list. If the search finds the newline, increase line count.
Print the number of search string match and number of indexes.

Time Complexity

Best CaseAverage CaseWorst Case
O(n)O(n)O(n)

Reference

http://unixhelp.ed.ac.uk/CGI/man-cgi?grep

Write A Comment