GitHub - NatLibFi/Annif: Annif is a statistical automated indexing tool for libraries, archives and museums. This repository is used for developing a production version of the system, based on ideas from the initial prototype.
Annif is an automated subject indexing toolkit. It was originally created as a statistical automated indexing tool that used metadata from the discovery interface as a training corpus. This repo contains a rewritten production version of Annif based on the prototype.
Scale, whose army of humans annotate raw data to train self-driving and other AI systems, nabs $18M – TechCrunch
The artificial intelligence revolution is underway in the world of technology, but as it turns out, some of the most faithful foot soldiers are still humans.
Necsus | How machines see the world: Understanding image annotation
big companies like Amazon (Amazon Mechanical Turk) can hire a large number of digital workers, who manually annotate images presented to them. Working from home at their computers, these digital annotators describe, pigeonhole, mark, segment, and frame images. For example, when a strawberry is shown on the screen, they will label it ‘strawberry’ (object classification). All tagged images are then organised into semantic areas based on their labelling, and later collected in databases used to train machines and algorithms. But what does ‘annotation’ mean? To annotate means to define areas in an image and assign them a value. The information, or metadata, can be for instance a series of keywords that attribute a semantic value to the chosen portion of the image. To create a machine vision system able to automatically find a cat and define its location in a picture, for example, a large collection of manually annotated images is required. The tasks digital workers are assigned reflect ones that will subsequently be performed by machines and algorithms. These tasks include:

Object classification (Fig. 1): determining whether an object is present or absent in the image (Is there a cat in the image? Are human beings present in the image?).

Object detection (Fig. 2): identifying a particular object and its arrangement in space (Where is the dog located?). In this case, the worker is asked to draw a bounding box around a single object.

Scene classification (Fig. 3): classifying a given environment. Questions such as Is the building a museum or a hospital? are presented to the annotator, who has to assign the corresponding label.

Image segmentation or pixel-level image segmentation (Fig. 4): determining which object a pixel in the image belongs to. The worker is asked to outline single objects’ profiles and annotate every area separately.

Attribute recognition (Fig. 5): defining the visual properties or qualities of objects – how an object looks and not just where it is located. The worker is asked to choose adjectives that describe the object (Is the scene ‘cold’ or ‘hot’?)....
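As a rough illustration of what one annotated image could look like once all five task types above have been applied, here is a minimal Python sketch. The class and field names (`BoundingBox`, `Segment`, `ImageAnnotation`, `image_id`, and so on) and the sample values are hypothetical, chosen only for this example; they are not taken from the article or from any particular annotation platform.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (object detection)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Width times height, clamped at zero for degenerate boxes.
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height


@dataclass
class Segment:
    """Polygon outline of a single object (image segmentation)."""
    label: str
    outline: List[Tuple[int, int]]


@dataclass
class ImageAnnotation:
    """Hypothetical record holding every kind of annotation for one image."""
    image_id: str
    object_labels: List[str] = field(default_factory=list)  # object classification
    boxes: List[BoundingBox] = field(default_factory=list)  # object detection
    scene: str = ""                                          # scene classification
    segments: List[Segment] = field(default_factory=list)   # image segmentation
    attributes: List[str] = field(default_factory=list)     # attribute recognition


# Made-up sample data, for illustration only.
record = ImageAnnotation(
    image_id="img-0001",
    object_labels=["dog"],
    boxes=[BoundingBox("dog", 40, 60, 200, 220)],
    scene="museum",
    segments=[Segment("dog", [(40, 60), (200, 60), (200, 220), (40, 220)])],
    attributes=["cold"],
)

print(record.scene, record.boxes[0].area())
```

Keeping each task's output in its own field mirrors the way the excerpt treats the tasks as separate questions put to the worker.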

is it possible to reduce an image, a visual experience, to a mere group of words? Is it possible to translate visual information into language?...

In some cases, the images presented to the annotator do not match her knowledge, and therefore create an obstacle and force the worker to find a solution. The use of synonyms can also be problematic....

some crowdsourcing platforms establish a list of terms for which models will be trained, called attribute vocabulary...
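One way to picture how an attribute vocabulary could deal with the synonym problem mentioned above is a small normalisation step that maps free-text answers onto a fixed list of terms. This is only a sketch under assumptions: the vocabulary, the synonym table, and the function name `normalize_attribute` are invented for illustration and do not describe how any specific platform works.

```python
from typing import Dict, Optional, Set

# Hypothetical closed list of terms that models will be trained on.
ATTRIBUTE_VOCABULARY: Set[str] = {"cold", "hot"}

# Hypothetical synonym table mapping free-text answers to vocabulary terms.
SYNONYMS: Dict[str, str] = {
    "chilly": "cold",
    "freezing": "cold",
    "scorching": "hot",
}


def normalize_attribute(raw: str) -> Optional[str]:
    """Map a worker's free-text answer to a vocabulary term, or None if unknown."""
    term = raw.strip().lower()
    term = SYNONYMS.get(term, term)
    return term if term in ATTRIBUTE_VOCABULARY else None


for answer in [" Chilly ", "HOT", "purple"]:
    print(repr(answer), "->", normalize_attribute(answer))
```

Answers that fall outside the vocabulary come back as `None` rather than being guessed at, so they can be reviewed separately.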

Two additional cases are particularly problematic for annotators: describing an object that is partially hidden by other elements in the image, and describing objects that are reflected by surfaces such as mirrors or held in transparent containers.
Image classification - Prodigy Support
I think I’m not understanding something basic about the API. If I need to categorize text into 20 classes, do I need to make 20 different datasets? Or do I need to pretrain a spaCy model to randomly output those classes …
NLP-progress - Tracking Progress in Natural Language Processing
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Chaitanya Chemudugunta
Papers on topic modeling, text analysis, and NLP
Unidentifiable fossils: palaeontological problematica
There is a detailed vocabulary used to describe organisms which defy classification and a system of nomenclature to denote confidence limits on probable or speculative affinities, but they are generally grouped together as “problematica”. A handy grab-bag of misfits that have exasperated or eluded scientists, ready for future generations to have a go at. In museums, problematica specimens reside in drawers and cabinets equivalent to the ubiquitous drawer of odds and sods that most people have in the kitchen.
