Scraping the (RSS) Barrel

I consume a lot of content via Miniflux, an RSS/Atom “feed” reader. Many news sites and services publish some of their content this way, but the feed is often incomplete: it carries only a snippet, because news sites want you to visit the site itself and generate advertising dollars.

No thanks.

Miniflux can fetch the original article (in case the feed provides just a snippet), “scrape” the source for content, and display that inside the reader instead. This gets you a nice, mostly uncluttered view of the content you want. The main problem is that most websites have odd layouts and weird ways of inserting content.

For example, NPR marks up quotes in its articles like this:

<div class="bucketwrap pullquote ...">
  <aside aria-label="pullquote" ...>
    <div class="bucket" ...>
      <p>Nathan is such a froody dude.</p>
    </div>
  </aside>
</div>

You know, instead of:

<blockquote>
  <p>Nathan is such a froody dude</p>
  <p><cite>Nathan</cite></p>
</blockquote>

OR, if you’re feeling fancy:

<figure>
  <blockquote>
    <p>Nathan is such a froody dude</p>
    <p><cite>Nathan</cite></p>
  </blockquote>
</figure>

Miniflux does an okay job parsing this mess. In fact, the only reason it does as well as it does is that it already ships with some predefined rules. We can, however, do much better than that.

While I know that this can be a cat-and-mouse game between a scraper and the site being scraped, I can at least do my best to clean up the junk that makes up an average article. To that end, I’ve begun cataloging the various scraper rules that I use to clean up articles. It’s by no means a perfect list of cleanup rules, but it makes articles much better to read.
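To make that kind of cleanup concrete, here is a minimal Python sketch of the idea. It is not how Miniflux does it internally, and it is not in Miniflux's rule syntax. It assumes the article body is well-formed enough to parse as XML (real pages often aren't), and the function name `pullquotes_to_blockquotes` and the sample markup are made up for illustration. It swaps an NPR-style pullquote wrapper for a plain blockquote:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified article body modeled on the NPR markup above.
SAMPLE = """<article>
  <p>Intro paragraph.</p>
  <div class="bucketwrap pullquote">
    <aside aria-label="pullquote">
      <div class="bucket">
        <p>Nathan is such a froody dude.</p>
      </div>
    </aside>
  </div>
</article>"""


def pullquotes_to_blockquotes(markup: str) -> str:
    """Replace every <div class="... pullquote ..."> with a <blockquote>
    that holds only the paragraphs found inside the wrapper."""
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and inspect its direct children. The list() calls
    # take snapshots so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            classes = child.get("class", "").split()
            if child.tag == "div" and "pullquote" in classes:
                quote = ET.Element("blockquote")
                quote.extend(list(child.iter("p")))
                quote.tail = child.tail  # keep surrounding whitespace
                parent[index] = quote
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(pullquotes_to_blockquotes(SAMPLE))
```

A real scraper would want a forgiving HTML parser rather than an XML one, but the shape of the job is the same: find the site's odd wrapper, keep the content, throw the rest away.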

If you’re reading this in some far-flung future and have a rule you’d like to share with me and, by extension, the world, send me an email with the scraping rule and the site and I’ll add it.

A World Without CSS or JavaScript

Would the world be better without these tools? Not too long ago, I might have agreed with that. The problem is people, not (entirely) the technology.

Kev recently posted about a designer’s thoughts on a style-less and JavaScript-less world, a world in which every page looks like something straight out of the 1980s. The idea behind these thoughts is sound: the current system allows for abuse and is often used harmfully. The problem with this thinking is that, regardless of the medium, humans will abuse it to the maximum extent to get what they want. HTML is easy to abuse, and formats that try to slim it down further, like Gemini and Markdown, are still very abusable.

Ultimately, I think that Bradley is too close to the problem. I can imagine that, for a designer, seeing such abuses daily, in and around your own work and the work of others in your peer group, can really drain the love of the craft out of your life.

Mastodon Icons!

It took some doing, but I finally have a Mastodon icon for my social media menu. It may not seem like it’s important, but I’m really nit-picky about this crap.

A picture of the social media icons on my site (menu and footer).
Yay!

Here’s how it went. I installed the Add Fediverse Icons to Jetpack plugin, then (thanks to that plugin’s developer) added this snippet he wrote for me:

// Run late (priority 100) on each menu item's HTML so the icon can be swapped.
add_filter( 'walker_nav_menu_start_el', 'my_apply_icon', 100, 4 );
function my_apply_icon( $item_output, $item, $depth, $args ) {
	// Text to look for in the menu item => icon slug to use.
	$social_icons = array(
		'Diaspora'   => 'diaspora',
		'Friendica'  => 'friendica',
		'GNU Social' => 'gnu-social',
		'Mastodon'   => 'mastodon',
		'PeerTube'   => 'peertube',
		'Pixelfed'   => 'pixelfed',
	);

	// Only touch items in the social menu.
	if ( 'social' === $args->theme_location ) {
		foreach ( $social_icons as $attr => $value ) {
			if ( false !== stripos( $item_output, $attr ) ) {
				// Drop any icon that is already there...
				$item_output = preg_replace( '@<svg(.*?)</svg>@', '', $item_output );
				// ...then put the matching icon where the link_after markup was.
				$item_output = str_ireplace(
					$args->link_after,
					'</span>' . jetpack_social_menu_get_svg( array( 'icon' => esc_attr( $value ) ) ),
					$item_output
				);
			}
		}
	}

	return $item_output;
}

This successfully added the icon to my menu. However, it was giant. I noticed in the Firefox web inspector that the bounding box for the SVG was huge, so I set it to be 24 pixels tall like so:

.icon.icon-mastodon {
  height: 24px;
}

Finally, things are good!

Edit: I keep all my CSS edits in Git.

I’ve been bouncing back and forth over theme choices because, as a former webmaster, I feel compelled to make my site look good. The problem with this is that I’m not a designer. I can make things work, but I often need an existing framework to work from.

To that end, I ended up with the Scratchpad WordPress.com theme. This theme is neat, and full of whimsy. It has a lot of different options already styled. One thing I did notice is that the Jetpack-powered Portfolio extensions aren’t well styled. I’ve done some quick styling to patch up the visual issues with the Portfolio Types page:

.jetpack-portfolio {
  background-color: #FFF;
  padding: 1em;
}

.jetpack-portfolio a svg {
  transform: rotate(-30deg);
  position: absolute;
  top: -40px;
}

This just brings the posts on that category page in line with the theme’s look and feel. There are more tweaks to do to make it more mobile-friendly, but it already works pretty well as-is.

Hope this helps!