Using grep on a Large File
I've been getting more interested in hacking lately and have been playing around with brute-force attacks in Python. While thinking about better ways to brute force, the first idea that came to mind was a password list. I went to Have I Been Pwned to see if I could get the database it uses to check whether a password has been exposed. Of course I couldn't find it on the website; it would be very irresponsible for someone to publish that. So I made it my mission to go find my own password lists. I found a few useful ones, but the one I was mainly after was the Collection #1 breach, and I finally found it after a few hours of searching.

After downloading it, I searched for my own email address with:

grep "joshsisto@gmail.com" collection1.txt

It found my email address and a previous password! But searching the 80+ GB file took over an hour, so I needed a quicker way. After a little googling I found a Stack Overflow post that uses GNU parallel to run the search in a fraction of the time. The command I ended up using was:

parallel --pipe --block 100M grep "joshsisto@gmail.com" < collection1.txt

Here --pipe splits the input into blocks of about 100 MB (--block 100M) and hands each block to its own grep process, so several searches run at once. Using parallel I was able to bring most searches down to 20 minutes or less, about a 5x improvement. I'll eventually use this data for a project; I just haven't figured out the right one yet.
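
For intuition about what the parallel command is doing, here is a rough Python sketch of the same idea: cut the file into blocks that end on line boundaries, search each block independently, and stitch the matches back together in file order. This is not how GNU parallel is implemented, and the names chunk_ranges, search_range, and chunked_grep are made up for illustration. It also uses threads to keep the sketch short, which is an assumption of convenience; for a real CPU-bound speedup in Python you would likely want separate processes instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(path, block_size):
    """Yield (start, end) byte ranges of roughly block_size that end on a newline."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        start = 0
        while start < size:
            # Jump ahead one block, then extend to the end of the current line
            # so that no line is ever split across two blocks.
            f.seek(min(start + block_size, size))
            f.readline()
            end = f.tell()
            yield start, end
            start = end


def search_range(path, needle, start, end):
    """Return the lines in bytes [start, end) that contain needle (a fixed string)."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    return [line for line in data.splitlines() if needle in line]


def chunked_grep(path, needle, block_size=100 * 1024 * 1024, workers=4):
    """Search every block in a worker pool and return matches in file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_range, path, needle, start, end)
            for start, end in chunk_ranges(path, block_size)
        ]
        # Collect results in submission order so output matches the file order.
        return [line for fut in futures for line in fut.result()]
```

Assuming a file laid out like the one above, a call might look like chunked_grep("collection1.txt", b"joshsisto@gmail.com"). The needle is passed as bytes because the file is read in binary mode, which avoids decoding errors on messy dump data.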