Bear: Over 4000 downloads

Sat Oct 19 08:37:40 CEST 2019

According to PePy, the package has reached over 4000 (4063 and counting) total downloads since it was published to PyPI on June 24th, 2019.

This is how it looked in the beginning, when there was just simple sys.argv handling and pretty-printing:

import sys
from hashlib import md5
from pprint import pprint

# mapping of MD5 hash -> list of file names sharing that content
hashfiles = {}

for fname in sys.argv[1:]:
    try:
        with open(fname, 'rb') as fcontent:
            fhash = md5(fcontent.read()).hexdigest()
    except FileNotFoundError:
        # silently skip paths that do not exist
        continue

    # group file names by their content hash
    if fhash not in hashfiles:
        hashfiles[fhash] = [fname]
    else:
        hashfiles[fhash].extend([fname])

pprint(hashfiles)

Once the script had proven itself useful to me on multiple occasions, I needed to switch from a simple function to something safer, automated and scalable, and ditch the manual file removal (rm <file>). Cleaning my storage was pinned on a TODO list.

commit 272ec8073d10a9564a511a3540a06ff798f6a89f
Author: Peter Badida <KeyWeeUsr@users.noreply.github.com>
Date:   Sat Jun 22 21:59:16 2019 +0200

    [Add] Add getting MD5 hash for a string input

diff --git a/bear/__init__.py b/bear/__init__.py
new file mode 100644
index 0000000..a0b247a
--- /dev/null
+++ b/bear/__init__.py
@@ -0,0 +1,9 @@
+"""
+Main file for the Bear package.
+"""
+
+from hashlib import md5
+
+
+def hasher(inp):
+    return md5(inp).hexdigest()

Since the creation of that first package file, Bear has become more than just a simple hashing function with arguments later provided through argparse.
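For illustration, a minimal argparse wrapper around the original hasher might look roughly like this (the option names and layout are just an assumption for this post, not Bear's actual CLI):

# a minimal sketch, not Bear's actual CLI: argparse replacing raw sys.argv
import argparse
from hashlib import md5


def hasher(inp):
    return md5(inp).hexdigest()


def main():
    parser = argparse.ArgumentParser(description='Hash files and group duplicates.')
    parser.add_argument('files', nargs='+', help='files to hash')
    args = parser.parse_args()

    for fname in args.files:
        try:
            with open(fname, 'rb') as fcontent:
                print(fname, hasher(fcontent.read()))
        except FileNotFoundError:
            continue


if __name__ == '__main__':
    main()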

Currently Bear supports multiple hashing algorithms to satisfy various tastes or collision concerns, implements new techniques for excluding files from the check, and provides a recovery option in case the checking crashes due to an unexpected external factor (e.g. kill -9, OOM, hard shutdown) or gets stuck on a machine with no free CPU available.
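To show how algorithm selection and exclusion can work in principle (the names below are illustrative assumptions, not necessarily Bear's internals), hashlib can construct any available digest by name, and exclusion is simply a matter of filtering paths before hashing:

# illustrative sketch only; Bear's real internals may differ
import fnmatch
import hashlib


def hash_file(path, algorithm='md5'):
    # hashlib.new() accepts any name from hashlib.algorithms_available
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as handle:
        digest.update(handle.read())
    return digest.hexdigest()


def is_excluded(path, patterns):
    # skip files matching any exclusion pattern, e.g. ['*.log', '*.tmp']
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)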

Bear became a robust duplicate checker.

Check the project website and the showcase video on YouTube, and give it a try.

Project: Bear

Mon Aug 19 23:34:09 CEST 2019

Problem

Working daily with a tremendous amount of files can give one a headache. Once those files get repetitive, especially code, pictures or recordings, they will take a huge chunk out of your storage. There are multiple ways to handle such a problem, one of which is having good categorization for the content you work with. However, that approach is seriously broken for designers, photographers, programmers and many others, because even with good categorization there will still be this one quick fix, minor customization or that one special-snowflake customer. Even a plain office job can clutter your computer with multiple versions of documents, spreadsheets or presentations.

How many times has it happened that you found similar contents of a folder:

  • report.docx

  • report2.docx

  • final_report.docx

  • final_report_backup.docx

  • report2_new.docx

  • final_report_backup (2).docx

Looking familiar? Perhaps final_report.docx is just a copy of report2.docx that did not need any edits after all? Maybe you have hundreds or thousands of such files? Certainly you won't filter those files by hand, and most likely you don't want to just throw them away. Thinking about cheap backups in free cloud storage? Will you buy additional space just to back up your whole drive because no one can figure out what the hell is even on that machine?

Solution

No need to do that. Once you can identify the messy folder(s), Bear, the decluttering deduplicator, can save you. Remove the clutter of duplicated content on your computer with confidence, find all the trash you've been collecting for a while and decide what you want to purge out of the list of duplicated files, or just let the program remove everything except the oldest or the newest version of the content.
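The “keep only the oldest or the newest copy” behaviour can be sketched with a plain modification-time comparison; this is only an illustration of the idea, not necessarily how Bear decides it internally:

# sketch: pick files to remove from groups of duplicates;
# `duplicates` is assumed to map a content hash to a list of identical paths
import os


def files_to_remove(duplicates, keep='oldest'):
    removable = []
    for paths in duplicates.values():
        ordered = sorted(paths, key=os.path.getmtime)
        keeper = ordered[0] if keep == 'oldest' else ordered[-1]
        removable.extend(path for path in ordered if path != keeper)
    return removable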

Now, what actually is “the content”? Simply put, imagine having notes in a file:

+----------------------+             +----------------------+
| Desktop/notes.docx   |             | Documents/notes.docx |
+----------------------+   =======   +----------------------+
| * buy eggs           |             | * buy eggs           |
| * clean kitchen      |   =======   | * clean kitchen      |
| * fix computer       |             | * fix computer       |
+----------------------+             +----------------------+

As the illustration shows, there is a file on your Desktop and one in your Documents folder, and both are the same, which means twice the space necessary to store the file; you can safely delete one of them if there was no intention of keeping two copies of the same file. A small file like this might not seem like much, but imagine the same situation applied to large presentations, documents or sound files that take up tens of megabytes.
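In other words, whether two files count as “the same content” boils down to comparing their hashes, as in this tiny check (the paths are just the placeholders from the picture above):

# tiny check of the example above; the paths are placeholders
from hashlib import md5


def file_hash(path):
    with open(path, 'rb') as handle:
        return md5(handle.read()).hexdigest()


# the two files are duplicates if the digests match
same = file_hash('Desktop/notes.docx') == file_hash('Documents/notes.docx')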

Once you start removing such duplicates, you soon realize how much free space you've gained, and looking through the various files from a high-level point of view will gently nudge you to put the files in order, to categorize them, and eventually to turn your machine into a clean workspace where you are not a slave to your content.

How to start

  1. Go to the project website

  2. Check basic documentation in README.md

  3. Navigate to releases section

  4. Choose your platform and download

  5. In case of GNU/Linux or macOS, make the file executable with:

    chmod +x <file>
    
  6. Enjoy

FROM scratch

Mon Aug 19 23:25:31 CEST 2019

A minor reference to Docker's reserved empty image, if you will.

This website has needed a major rewrite since the beginning, when I decided to have some kind of landing page with links to the multiple websites/platforms I interact with. That lost its point, though, as did the website's old design. This time I'll just leave it in a blog-like shape: separate RST pages as “articles”, with perhaps some interesting ones pulled into a separate navigation section.