- reliq (C)
-
My very own query language for HTML. The reason for its existence is that I didn't realize XPath existed, and since I needed a CLI tool I made one. I also didn't know about pup's existence.
It has a more concise syntax than XPath and can easily return JSON; it's also faster, more powerful and more extensible than XPath, although compared to XPath 3 it's not really Turing complete (yet).
It was first named hgrep, but a couple of months after its release another, more popular project with the same name was created and people started to get confused.
Here's an example of its syntax, xenforo1.reliq, and an image of it with highlighting, xenforo1.reliq.png (adding highlighting to GitHub is a lot of corporate bs).
- csas (C)
-
A file manager heavily inspired by ranger and lf, but a hundred times faster and more memory efficient (not hyperbole), slightly slower at loading than nnn but slightly more memory efficient.
It was designed to handle directories containing millions of files. It has its own config file format with a lot of options, a calculator, a file opener, sxiv integration, workspaces, multiple selections, filters, an event system, bulkrename and a scout command similar to unix find.
It was developed around 2019 and has been my only file manager since then. Unfortunately it never got much attention, so it's quite unpolished and suited mainly to me.
- forumscraper (python)
-
Universal, automatic and extensive forum scraper based on reliq. It supports the Invision Power Board, phpBB, Simple Machines Forum, XenForo, XMB, Hacker News, StackExchange and vBulletin engines.
It replaced my other bash scripts xenforo-scraper, invision-scraper, phpbb-scraper, smf-scraper, stackexchange-scraper and xmbforum-scraper, but extended them tremendously by adapting them to more websites and adding support for guessing and identifying engines, getting board and forum pages, and finding the root of a site.
It can be used as a CLI tool and as a python library.
- torge (shell)
-
My tool for quickly searching for torrents/books. It supports thepiratebay, limetorrents, 1337x, rarbg, nyaa and libgen. It integrates well with fzf and shell scripts, and even supports CSV and JSON output. Someone even made a web app on top of this script.
It's a merged and upgraded version of my previous projects tpb, tlt, t1337x, tnyaa, trarbg and libgen.
- lyryx (python)
-
A CLI tool made for downloading lyrics for music. It supports azlyrics, genius, mojim, tekstowo and lyricsjonk, and for some of them it can download pretty much the whole site.
It was a catalyst for the creation of reliq; ironically, a couple of years later I rewrote it to learn XPath and compare it with reliq.
- sunt (bash)
-
My sophisticated script for transferring and storing files between my remotes; it has a syntax similar to git.
- reliq-python (python)
-
Python wrapper for the reliq library, written with ctypes. Python has a tremendously convoluted packaging system that is basically undocumented, so the only way of learning about it was from the GitHub Actions of other projects like raylib that are sane enough to use cffi instead of ctypes; what couldn't be learned had to be guessed. Also, pip on Windows is a nightmare, and so is PyPI.
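For illustration, here's a minimal sketch of the ctypes pattern such a wrapper is built on, not reliq's actual C API: the library name, function name and signature below are hypothetical placeholders.
    import ctypes
    from ctypes import c_char_p, c_size_t

    # Load the shared library; "libreliq.so" is an assumed name used only for illustration.
    lib = ctypes.CDLL("libreliq.so")

    # Declaring argument and return types up front makes ctypes convert values correctly;
    # the function name and signature here are hypothetical placeholders.
    lib.example_exec.argtypes = [c_char_p, c_size_t, c_char_p]
    lib.example_exec.restype = c_char_p

    def run_query(html: bytes, query: bytes) -> bytes:
        """Run a (hypothetical) query over an HTML buffer and return its raw output."""
        return lib.example_exec(html, len(html), query)
With cffi the equivalent declarations would instead come from a C header snippet passed to ffi.cdef(), which is why bindings written that way are easier to learn from.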
- wordpress-madara-scraper (bash)
-
A scraper made for sites using the madara wordpress extension, although focused mainly on the ones with images. It can download whole sites into a structure describing the comics and listing links to chapters. It can also immediately download the images, so I use it as a manhwa downloader. This tool is very useful if you want to fill your disk with a lifetime supply of manhwa.
- mangabuddy-scraper (bash)
-
Similar to wordpress-madara-scraper but made for a specific site, and thus more tailored to it. It can download all metadata about lists, comics and chapters (image links, comments, reviews, basic info) as well as the images. It has 3 modes of scraping: only images; images and basic data; images and everything that can be found. You can also get the data without the images. Overall you could get more than 100GB of metadata out of the whole site.
The only reason for making it was that the site had a manhwa I could not find anywhere else and a very silly anti-downloading system; seeing the well defined HTML structure, I made my scraper collect all the metadata on the site.
- elektroda-scraper (bash)
-
A scraper for a famously not-so-helpful Polish electronics forum. I did not add it to forumscraper since it's not based on a common forum engine.
- disqus-scraper (bash)
-
Well, everyone knows disqus, an external commenting system that websites embed with just a link. Looking into it, I discovered that they use only one barely obfuscated API auth key that grants you unlimited transfer for everything visible. With it you can very quickly and effortlessly download all comments from any site; in a month I downloaded over 100TB of data from it. The best part is that the script has 107 lines of code and uses only curl and jq as dependencies. There is no scraping, it's just downloading all the available data, and it's very clean data at that, 10/10.
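As a rough illustration of the paging involved (not the actual 107-line curl and jq script), here is a small python sketch assuming Disqus's public posts/list endpoint and its cursor pagination; the key and forum shortname are placeholders.
    import requests

    API = "https://disqus.com/api/3.0/posts/list.json"
    API_KEY = "KEY_TAKEN_FROM_THE_EMBED"   # placeholder for the embed's auth key
    FORUM = "example-forum-shortname"      # placeholder

    def dump_posts(forum):
        """Yield every comment of a forum by following the pagination cursor."""
        cursor = ""
        while True:
            data = requests.get(API, params={
                "api_key": API_KEY,
                "forum": forum,
                "limit": 100,
                "cursor": cursor,
            }).json()
            yield from data["response"]
            if not data["cursor"]["hasNext"]:
                break
            cursor = data["cursor"]["next"]

    for post in dump_posts(FORUM):
        print(post["id"])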
- pornhub-scraper, xhamster-scraper, xvideos-scraper, xnxx-scraper, redtube-scraper, eporner-scraper, youporn-scraper, tube8-scraper, hclips-scraper (bash)
-
These types of sites have a lot of data, data about humans. It's very easy to scrape and there's a lot of it, so why not.
I've tried my best to make them as extensive as possible, so they gather all the data that can be accessed on the site, like views over time for a video, comments, tags, etc. On top of that, most of them cover all the categories other than videos, like playlists, channels, galleries, pornstars and users.
For the lesser known sites you can expect around 3GB of data each; for xnxx and xvideos about 9GB each, but only for videos (comments included).
More social and popular platforms like pornhub and xhamster give around 30GB without counting users; with users you could get hundreds of GB.
Analyzing the data from them leads to funny conclusions.
- sexstories-scraper (shell)
-
It downloads all stories into files named by their titles. There is no special file format; the first 8 lines are headers with basic information about the story.
This one is an actual goldmine of funny and bad stories. It takes up about 1.3GB uncompressed and downloads quickly, so have a laugh.
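If you want to poke at the downloaded files, here's a minimal python sketch of splitting one into header lines and story body under the layout described above; the 8-line header count comes from that description, and the headers are plain lines rather than a keyed format.
    from pathlib import Path

    def read_story(path):
        """Split a downloaded story file into its 8 header lines and the body."""
        lines = Path(path).read_text(errors="replace").splitlines()
        headers, body = lines[:8], lines[8:]   # first 8 lines are headers
        return headers, "\n".join(body)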
- Others
-
I have a couple of other projects on my GitHub, although they are not as grand. I also have a lot of useful projects that I did not publish on GitHub, since making something usable for others is a pain, so they stay useful only to me.
If you have any cool ideas about my projects, or for new ones, you can contact me.