The missing “Value” of “AI”-generated Content

The image shows colourful dots randomly spread on a black background. Source: https://www.publicdomainpictures.net/en/view-image.php?image=234852&picture=ruido-de-color

This foxhole has been quite silent. There are lots of reasons for this. The main reason is that I write lots of other stuff for other blogs, technical documentation, or code. I recently came across a text a colleague sent me. He asked for feedback on a manual-like document. The text had been reworked multiple times, and this was another iteration with additional text and rearrangement. ChatGPT created some parts of it. After spending more than an hour with the document, my feedback contained criticism for vague terms, missing definitions, too many general statements, and a complete lack of structures that could act like a manual. It looked like a collection of thoughts and references along with some instructions. I could not pinpoint what part was created by ChatGPT, but the document had no good flow when reading it. So this is the reason for this article.

The use of Large Language Models (LLMs) for text generation has spread to widely available tools and platforms. These models have gone through some iterations, but they only grew bigger. The underlying technology is still the same. Start-ups (i.e. companies with no solid business model) hope that making these models more complex will yield better results. The generated texts look good and are easy to read, but they completely lack any thoughtful structure. If you write a manual that instructs people how to do something sensibly, then the text requires some didactics. You need to divide the topics into sensible groups. Instructions must be easy to understand and concise. Technical documents require a clear definition of terms and concepts. All topics should be in the right order. This usually means in ascending order of complexity. LLMs cannot do this because they have no cognitive skills. This is not meant as arrogance; machines cannot think, at least for now.

People use LLMs for summarising content. If you use common search engines, then you will find lots of tutorials on how to use “AI” tools to create summaries. The problem is that the LLM engines cannot summarise content. They just remove most parts and reorder the rest. The result may look like a summary, but it is not. There is a reason why writing summaries is a tool for teaching. You can check if someone understood the key points of a document, book, or novel. A good summary requires a lot of thought, because you need to use the key ideas and remove everything else. I have read some LLM-generated summaries. When reading, I noticed repetitions and a lack of structure. Someone gave me a three-paragraph text to help me write documentation. I had to rewrite the piece, because it wouldn’t fit. This is another noticeable difference. If you ask a person to produce a text fragment which can be inserted into a different document, then this person needs to have the skill to write a fragment. A fragment is not a randomly shortened text. You need to know how the beginning and the end can be connected to other texts, which you do not know yet.

And there is the question of energy consumption. LLMs are destroying Iceland right as you read this sentence. Think twice before using lots of energy and maybe money for mediocre results.


Producing actual Content is hard – Information Flow on Social Media Sites

This blog is the second incarnation of a private “news” outlet. Blogs and information channels dealing with personal stuff have a different topic selection. There is no need to stay ahead of current events. Your hobby and your interests may or may not be interesting for others. If you don’t want to deep dive into long articles, fancy content management systems, or blog software, then there is microblogging. Just write a few sentences, add a picture, and you are done. People used Twitter this way before it turned into a cesspool of questionable accounts publishing hate speech. So what about alternatives? The Fediverse looks ok, right? Right, but there is a catch.

Centralised platforms have an agenda. They usually need to earn money, so there is a general purpose and direction for the distributed content. News is a fine product to put on your platform, but creating it is expensive. If you ask real journalists, then news is something no one else knows before it is published. Plus, it is true and checked for errors in advance. Ideally, it is neutral, but information takes sides. Wouldn’t it be nice to have a decentralised platform where making money is not the principal goal? The Fediverse looks like the place where you should publish your microblogging texts. I have spent some time on Twitter, deleted my account, and moved my microblogging activity to the Fediverse. The atmosphere is different. There are fewer news channels available. This may change, but it depends on the culture of the local Fediverse server. There are discussions about what content to federate, what the toots should look like, how you should mark the content (sensitive or not), and lots of other nuances in opinion. Moderating content is difficult. Given the people who will abuse your platform, you probably need to check the toots sporadically. And so you are back to the problems of centralised microblogging platforms.

My main argument in favour of the Fediverse is the missing agenda. While no billionaire with mental illness can buy the platform and destroy it, the volunteers can run out of resources. Keep this in mind and think about donating to your local instance or to Free Software in general.


Strange microblogging habits – “from social media (can be anything, sorry)”

I use a few social media network platforms. I never used Facebook, but I started using Twitter a long time ago. A few years later, I added a Mastodon account. The “communities” are wildly different. Twitter is known for powerful shit storms triggered by a few words. There is an endless discussion about which system is better. Centralised, decentralised, moderated, unmoderated, free flow, free spirit, more social, less community, stupidity filters, and more are discussed. It all boils down to the fact that small groups have a higher IQ and less evil intent on average than large groups of humans. Manipulation happens on all levels. No surprise there. This is where my observation comes into play. I frequently read the sentence “From birdsite (can be anything, sorry)” on Mastodon when someone quotes a tweet from Twitter. Why?

  • “from birdsite” makes no sense. You cannot make something disappear just by refusing to call it by name. It’s the same as saying “because of the sun” instead of referring to climate change.
  • “can be anything” makes even less sense. If you squeeze data into the maximum length of a tweet or a toot, then you will have to leave out some information. Furthermore, if I can write anything by using a microblogging account, then the published text will be anything. So anything you quote, be it tweet, toot or fart, will most likely be anything.
  • “sorry” is at the top of making no sense at all. If you don’t want to mention Twitter, and have realised that microblogging platforms can be used by anyone or anything, then why the hell would you be sorry? If you really were sorry, then you wouldn’t cross-post the information.

From my blog (can be anything, sorry). 😂


Modern Way of attending online Meetings

COVID-19 has transformed the digital world. A lot of events where people can meet one another have gone virtual. Digital teleconferencing platforms are the places to go. The problem is the design of many of these platforms. For someone like me, who uses the NoScript add-on and a sophisticated email filtering system, the registration process can be a challenge. There is a recent case I want to discuss. It is not meant as a rant, but instead as a hint for developers of these platforms.

The registration process uses multiple sites. There is the portal from where I got the tickets (by way of a partner). This site connects to the conference registration page. From there, you get redirected to the ticket shop. After entering the voucher code, you get to select the talks in order to compile a personalised schedule. The last step of the registration process is the confirmation click. The web site then tells me to expect a follow-up with the confirmation information from a specific domain. You have to know that the registration process only gives you access to another registration procedure on a different site.

The email never arrives. So the account is registered, but there is no link to complete the registration at the conference platform. After some emails back and forth, the missing email with the instructions and the link arrives. The sender is completely different from the one in the announcement. Following the link leads to a questionnaire about interests, personal data, business data, and more. You also get to set the password. After yet another confirmation, everything is ready to connect to the conference platform. The problem is that the personalised schedule is nowhere to be found. The platform shows the live events instead. There is no trace of my preferences. Well, at least watching presentations is possible.

So what should a conference platform look like? Here are some hints for developers:

  • Have one registration portal.
  • Don’t use trackers or advertising platforms on any of the sites necessary for administration.
  • Handle registration details and handover or activation of subsequent systems in the background.
  • Document which domains participants need to white-list. This is useful for preventing phishing.
  • If registrations are missing before the event starts, send out reminders.
  • When sending messages, please refrain from using fancy layout, bloated HTML content (better use no HTML at all), and any tracking features in the messages.
  • Create an emergency contact for questions regarding the registration process.
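
Two of these hints, the documented sender domains and the plain-text messages, can be sketched in a few lines of Python. This is only an illustration under assumptions: the domain names, the sender address, and the function names are hypothetical and do not come from any real conference platform.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical list: the domains a platform would announce to its participants.
DOCUMENTED_DOMAINS = {"conference.example", "tickets.example"}


def sender_is_documented(from_header: str) -> bool:
    """Return True if the sender's domain is exactly one of the documented domains."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    return domain in DOCUMENTED_DOMAINS


def build_confirmation(recipient: str, link: str) -> EmailMessage:
    """Build a plain-text confirmation message without an HTML part or tracking pixel."""
    message = EmailMessage()
    # Hypothetical sender address; it uses one of the documented domains.
    message["From"] = "Registration <registration@conference.example>"
    message["To"] = recipient
    message["Subject"] = "Please confirm your registration"
    # set_content() with a plain string produces a single text/plain body.
    message.set_content(f"Complete your registration here:\n{link}\n")
    return message
```

A participant's mail filter could use the same documented list to let the confirmation pass, which is the point of publishing the domains in the first place.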

A City Guide to the World of News

My earliest memories of reading newspapers include the assassination attempt on Pope John Paul II. A bit of arithmetic tells me that I have been reading newspapers for more than 40 years. I think I started at the age of 7, 8, or 9. The first newspapers were the Bild Zeitung, a local paper from central Hesse, and weekly magazines (I believe it was the Stern). Der Spiegel was rare, as were other newspapers. Other publications were added in the course of my schooling. When I recall how I consumed news over the past decades, a lot has changed in the way of reading and in the articles themselves. The texts of news articles contain less information. The first paragraph, which is supposed to sketch the core of the news, has turned into a movie trailer. You only learn fragments, which are called into question or left open. This makes reading easier, because the titles or these teasers already announce what to expect. In general, I use these rules when reading:

  • You can skip every article with a question mark “?” in the title. It is not news.
  • You can skip every article with the “Top n” things in the title. These are mere lists that use a reduced number of items (mostly odd or 10) only to steal attention. Serious articles do without the number and explain in the title what they are about.
  • If the teaser announces something strongly and then qualifies it, you can skip the article as well. Usually the wording qualifies the first part of the teaser in order to create uncertainty and thus curiosity. This kind of bait is called clickbait. After all, it is about good statistics in the newsroom backend.
  • Some media outlets pack blog articles or comments into article boxes and call it discourse, community, or opinion. 99.9% of these contributions contain no news. Nothing against a change of perspective, but importing an entire section based on blog texts without editorial oversight is rather thin for interested readers looking for facts.
  • You can skip every article whose backbone is a picture or a video. Who searches for news on video portals? Yes, there are news programmes, but those are television (or, digitally archived, streams in the media library). The more editorial effort, the more news. A picture or a shaky video from a smartphone is not information.
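
The first two rules are mechanical enough to write down as a filter. The following Python sketch is only an illustration: the function name and the regular expression are my assumptions, and the other three rules need human judgement that no pattern can replace.

```python
import re

# Matches "Top 10", "top 5" and similar counted lists anywhere in a title.
TOP_N = re.compile(r"\btop\s+\d+\b", re.IGNORECASE)


def worth_reading(title: str) -> bool:
    """Apply the first two rules: no question mark and no "Top n" in the title."""
    if "?" in title:
        return False
    if TOP_N.search(title):
        return False
    return True
```

Running a list of headlines through `worth_reading` and keeping only the ones that return `True` already removes a good part of the front page.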

Equipped with this checklist, reading newspapers digitally is limited to a few minutes per day per outlet. With some newspapers you are done in less than 20 seconds. This makes you wonder what a subscription is good for when the free samples are rubbish and the supposedly good articles are hidden behind the paywall. Marketing works differently.

What remains? I have switched to weekly magazines. I like to read longer and well-prepared articles. Certain blogs can deliver that too, provided the information presented is well researched and cast into text. The disadvantage of blogs is searching for them and the time this takes.


Excavation is the title of the new Blog

Texts age. Some do not age well. As with many other aspects of human existence, mindsets and the relationship to one’s own writing change. This has happened to the old blog (which has been turned into an archive and will not be hyperlinked from here). The Excavation blog carries the name foxhole. The term is ambiguous, because it is the den of an animal, can be your only shelter in times of crisis, or can be a site of freshly dug up revelation (as opposed to the rabbit hole – often a place of mental disorders; don’t follow white rabbits, eat them instead). Genesis out. Your turn.
