Bear: Over 4000 downloads

Sat Oct 19 08:37:40 CEST 2019

According to PePy, the package has reached over 4000 (4063 and counting) total downloads since it was published to PyPI on June 24th, 2019.

This is how it looked in the beginning, when there was just simple sys.argv handling and pretty-printing:

import sys
from hashlib import md5
from pprint import pprint

# mapping of MD5 hash -> list of file names with identical content
hashfiles = {}

for fname in sys.argv[1:]:
    try:
        with open(fname, 'rb') as fcontent:
            fhash = md5(fcontent.read()).hexdigest()
    except FileNotFoundError:
        continue

    # group file names by their content hash
    if fhash not in hashfiles:
        hashfiles[fhash] = [fname]
    else:
        hashfiles[fhash].extend([fname])

pprint(hashfiles)

Once the script had proven itself useful to me on multiple occasions, I needed to move from a plain function to something safer, automated and scalable, and to ditch the manual file removal (rm <file>). Cleaning up my storage had been pinned on my TODO list for a while.
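For illustration only, here is a minimal sketch (not Bear's actual code) of what automating that rm <file> step could look like, building on the hashfiles mapping from the script above; the remove_duplicates name is hypothetical:

import os

def remove_duplicates(hashfiles):
    # hypothetical helper: keep the first file in each duplicate group
    # and delete the remaining copies
    for fnames in hashfiles.values():
        for fname in fnames[1:]:
            os.remove(fname)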

commit 272ec8073d10a9564a511a3540a06ff798f6a89f
Author: Peter Badida <KeyWeeUsr@users.noreply.github.com>
Date:   Sat Jun 22 21:59:16 2019 +0200

    [Add] Add getting MD5 hash for a string input

diff --git a/bear/__init__.py b/bear/__init__.py
new file mode 100644
index 0000000..a0b247a
--- /dev/null
+++ b/bear/__init__.py
@@ -0,0 +1,9 @@
+"""
+Main file for the Bear package.
+"""
+
+from hashlib import md5
+
+
+def hasher(inp):
+    return md5(inp).hexdigest()
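Note that md5() works on bytes, so calling this first version of hasher() on a string requires encoding it first, for example:

from bear import hasher

# encode the string before hashing; md5() does not accept str objects
print(hasher('duplicate me'.encode('utf-8')))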

Since the creation of that first package file, Bear has become more than just a simple hashing function with arguments later provided through argparse.
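As a rough illustration of that step, here is a minimal sketch (not Bear's actual CLI) of how the original logic could be wired up to argparse; the option names are assumptions:

import argparse
from hashlib import md5


def main():
    parser = argparse.ArgumentParser(description='Find duplicate files by hash.')
    parser.add_argument('files', nargs='+', help='files to check')
    args = parser.parse_args()

    hashfiles = {}
    for fname in args.files:
        try:
            with open(fname, 'rb') as fcontent:
                fhash = md5(fcontent.read()).hexdigest()
        except FileNotFoundError:
            continue
        # group file names by their content hash
        hashfiles.setdefault(fhash, []).append(fname)
    print(hashfiles)


if __name__ == '__main__':
    main()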

Currently Bear supports multiple hashing algorithms to satisfy various tastes or collision concerns, implements new techniques for excluding files from checking, and offers a recovery option in case the checking crashes due to an unexpected external factor (e.g. kill -9, OOM, hard shutdown) or gets stuck on the machine when there is no free CPU available.
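For example, a hashing algorithm can be selected by name through hashlib.new(); below is a minimal sketch (not Bear's implementation) of hashing a file with a configurable algorithm and a simple glob-based exclusion filter, where the parameter names are assumptions:

import hashlib
from fnmatch import fnmatch


def hash_file(path, algorithm='md5', exclude=('*.tmp',)):
    # skip files matching any exclusion pattern (hypothetical option)
    if any(fnmatch(path, pattern) for pattern in exclude):
        return None
    with open(path, 'rb') as handle:
        return hashlib.new(algorithm, handle.read()).hexdigest()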

Bear has become a robust duplicate checker.

Check out the project website and the showcase video on YouTube, and give it a try.