This weekend I revisited an old project, Using grep On A Large File. TL;DR: I downloaded an 80+ GB text file dump of emails and passwords. It probably includes some of your past passwords. You can find out at Have I Been Pwned.

I really like this project because it contains real email addresses and passwords, which is useful for ethical hacking. The data is especially valuable because so many people reuse old passwords. In the future I plan to write scripts that create modified versions of the passwords in this list. For example, if your password was ‘password’, I want the script to produce ‘Password’, ‘password1’, ‘password!’, ‘Password1!’ and many other similar variations that meet typical security requirements.
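Here is a rough sketch of what such a script might look like in Python. The function name password_variations and the short suffix list are my own placeholders, not a finished design; a real version would cover many more rules.

```python
def password_variations(password):
    """Return simple variations of a password (hypothetical helper).

    Assumption: only two transformations are tried here, capitalizing
    the first letter and appending a few common suffixes.
    """
    # Capitalize only the first character and leave the rest untouched.
    capitalized = password[:1].upper() + password[1:]
    bases = [password, capitalized]
    suffixes = ["", "1", "!", "1!"]  # illustrative list, not exhaustive

    seen = {password}  # never return the unmodified password itself
    variations = []
    for base in bases:
        for suffix in suffixes:
            candidate = base + suffix
            if candidate not in seen:
                seen.add(candidate)
                variations.append(candidate)
    return variations


variations = password_variations("password")
print(variations)
```

For ‘password’ this produces seven candidates, including the four from the example above.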

Previously, searches on this file took an hour or more using grep. After some optimizations I got that down to 20 minutes. I was proud of this, but it was still too slow to be usable, especially if I wanted to run many searches. After more googling I found bgrep. This ended up being extremely confusing because I think there are two different programs named bgrep. Fortunately, bgrep did lead me to the look command, which ultimately solved my problem.

Originally I thought that bgrep used the binary search algorithm. I thought a binary search would solve my issue, since grep scans the file from beginning to end, while a binary search on sorted data only needs to examine a small number of lines. To sort the data I used sort -S 95% -o sorted_passwords.txt unsorted_passwords.txt. The -S 95% helped speed up the process by allowing the command to use 95% of system memory. This command still took over 2 hours to complete on my 2017 MacBook Pro. After the data was sorted I attempted to get whatever version of bgrep I had installed on my system working.

While I was researching bgrep I stumbled across the look command. If you run man look you will see “The look utility displays any lines in file which contain string as a prefix. As look performs a binary search, the lines in file must be sorted.” This is perfect since I already have the data sorted. I can now type look joshsisto@gmail.com sorted_passwords.txt and it finds my email and password almost instantly! Using the time command I can see that a search of the entire file takes 0.7 seconds on average.
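To show why a binary search is so fast on a sorted file, here is a minimal Python sketch of the idea. It is not how look itself is implemented. The function names are my own, the sample lines are made up, and it assumes the file is sorted in plain byte order (for example with LC_ALL=C set when running sort).

```python
import os
import tempfile


def _line_at_or_after(f, offset):
    """Return the first full line that starts at or after byte `offset`."""
    if offset == 0:
        f.seek(0)
    else:
        f.seek(offset - 1)
        f.readline()  # discard the line that contains byte offset - 1
    return f.readline()


def look(path, prefix):
    """Return all lines in a byte-sorted file that begin with `prefix`."""
    key = prefix.encode("utf-8")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line is >= key.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at_or_after(f, mid)
            if line == b"" or line.rstrip(b"\n") >= key:
                hi = mid
            else:
                lo = mid + 1
        # Matching lines are contiguous, so read forward until one fails.
        matches = []
        line = _line_at_or_after(f, lo)
        while line and line.startswith(key):
            matches.append(line.rstrip(b"\n").decode("utf-8", errors="replace"))
            line = f.readline()
        return matches


# Tiny demo with made-up data (already in byte order).
sample = [
    "alice@example.com:hunter2",
    "bob@example.com:letmein",
    "bob@example.com:password1",
    "carol@example.com:qwerty",
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(sample) + "\n")
demo_matches = look(tmp.name, "bob@example.com")
demo_missing = look(tmp.name, "zed@example.com")
os.remove(tmp.name)
print(demo_matches)
```

Each pass through the loop halves the range of bytes left to check, so even an 80+ GB file needs only a few dozen seeks instead of a full scan.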

This has been a fun project that I will continue to build upon. Remember to change your passwords!