classification   10179


GitHub - NatLibFi/Annif: Annif is a statistical automated indexing tool for libraries, archives and museums. This repository is used for developing a production version of the system, based on ideas from the initial prototype.
Annif is an automated subject indexing toolkit. It was originally created as a statistical automated indexing tool that used metadata from the discovery interface as a training corpus. This repo contains a rewritten production version of Annif based on the prototype. via Pocket
catalog  classification  mach  text  tools  diglib 
3 days ago by kintopp
Scale, whose army of humans annotate raw data to train self-driving and other AI systems, nabs $18M – TechCrunch
The artificial intelligence revolution is underway in the world of technology, but as it turns out, some of the most faithful foot soldiers are still humans.
ai  marketplaces  classification 
6 days ago by marshallk
Necsus | How machines see the world: Understanding image annotation
big companies like Amazon (Amazon Mechanical Turk) can hire a large number of digital workers, who manually annotate images presented to them. Working from home at their computers, these digital annotators describe, pigeonhole, mark, segment, and frame images. For example, when a strawberry is shown on the screen, they will label it ‘strawberry’ (object classification). All tagged images are then organised into semantic areas based on their labelling, and later collected in databases used to train machines and algorithms. But what does ‘annotation’ mean? To annotate means to define areas in an image and assign them a value. The information, or metadata, can be for instance a series of keywords that attribute a semantic value to the chosen portion of the image. To create a machine vision system able to automatically find a cat and define its location in a picture, for example, a large collection of manually annotated images is required. The tasks digital workers are assigned reflect ones that will subsequently be performed by machines and algorithms. These tasks include:

Object classification (Fig. 1): determining whether an object is present or absent in the image (Is there a cat in the image? Are human beings present in the image?).

Object detection (Fig. 2): identifying a particular object and its arrangement in space (Where is the dog located?). In this case, the worker is asked to draw a bounding box around a single object.

Scene classification (Fig. 3): classifying a given environment. Questions such as ‘Is the building a museum or a hospital?’ are presented to the annotator, who has to assign the corresponding label.

Image segmentation or pixel-level image segmentation (Fig. 4): determining which object a pixel in the image belongs to. The worker is asked to outline single objects’ profiles and annotate every area separately.

Attribute recognition (Fig. 5): defining the visual properties or qualities of objects – how an object looks and not just where it is located. The worker is asked to choose adjectives that describe the object (Is the scene ‘cold’ or ‘hot’?)....
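The five annotation tasks listed above produce different kinds of records. As a minimal sketch, assuming a hypothetical in-memory schema (the names `BoundingBox`, `ImageAnnotation`, `contains` and `pixel_counts` are illustrative, not any platform's actual format), one image's annotations might be held like this, with presence/absence classification derived from the detection boxes:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BoundingBox:
    """Object detection: a labelled rectangle drawn around one object."""
    label: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


@dataclass
class ImageAnnotation:
    """Hypothetical container for every annotation made on one image."""
    image_id: str
    scene: Optional[str] = None                               # scene classification
    boxes: List[BoundingBox] = field(default_factory=list)    # object detection
    mask: List[List[str]] = field(default_factory=list)       # pixel-level segmentation
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute recognition

    def contains(self, label: str) -> bool:
        """Object classification (present/absent), derived from the boxes."""
        return any(box.label == label for box in self.boxes)

    def pixel_counts(self) -> Dict[str, int]:
        """Count how many pixels the segmentation mask assigns to each label."""
        counts: Dict[str, int] = {}
        for row in self.mask:
            for label in row:
                counts[label] = counts.get(label, 0) + 1
        return counts


ann = ImageAnnotation(
    image_id="img_001",
    scene="museum",
    boxes=[BoundingBox("cat", x=10, y=20, width=50, height=40)],
    mask=[["cat", "background"], ["background", "background"]],
    attributes={"temperature": "cold"},
)
print(ann.contains("cat"), ann.contains("dog"))  # True False
print(ann.pixel_counts())  # {'cat': 1, 'background': 3}
```

Deriving presence/absence from the boxes is only one design choice; a platform could just as well collect it as a separate yes/no question, which is how the excerpt frames object classification.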

is it possible to reduce an image, a visual experience, to a mere group of words? Is it possible to translate visual information into language?...

In some cases, the images presented to the annotator do not match her knowledge, and therefore create an obstacle and force the worker to find a solution. The use of synonyms can also be problematic....

some crowdsourcing platforms establish a list of terms for which models will be trained, called attribute vocabulary...

Two additional cases are particularly problematic for annotators: describing an object that is partially hidden by other elements in the image, and objects reflected by surfaces such as mirrors or present in transparent containers.
machine_vision  classification  metadata  annotation 
7 days ago by shannon_mattern
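The Necsus excerpt above mentions two related problems: synonyms, and platforms fixing an attribute vocabulary in advance. A minimal sketch of how such a vocabulary could constrain free-text answers, assuming a hypothetical term list and synonym table (`ATTRIBUTE_VOCABULARY`, `SYNONYMS` and `normalize_attribute` are illustrative names, not taken from any crowdsourcing platform):

```python
# Hypothetical attribute vocabulary: the only terms a model would be trained on.
ATTRIBUTE_VOCABULARY = frozenset({"cold", "hot", "wet", "dry"})

# Hypothetical synonym table mapping annotator wording onto vocabulary terms.
SYNONYMS = {"chilly": "cold", "freezing": "cold", "warm": "hot", "damp": "wet"}


def normalize_attribute(term: str) -> str:
    """Map an annotator's term onto the vocabulary, or reject it."""
    cleaned = term.strip().lower()
    cleaned = SYNONYMS.get(cleaned, cleaned)  # collapse known synonyms
    if cleaned not in ATTRIBUTE_VOCABULARY:
        raise ValueError(f"{term!r} is not in the attribute vocabulary")
    return cleaned


print(normalize_attribute(" Chilly "))  # cold
print(normalize_attribute("hot"))       # hot
```

Note that mapping "warm" onto "hot" is lossy: a closed vocabulary settles the synonym problem only by discarding distinctions, which is the reduction of a visual experience to a few words that the excerpt questions.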
Image classification - Prodigy Support
I think I’m not understanding something basic about the API. If I need to categorize text into 20 classes, do I need to make 20 different datasets? Or do I need to pretrain a spacy model to randomly output those classes …
prodigy  image  images  classification  labeling  multiclass  multi-class  annotation  annotations 
8 days ago by nharbour
NLP-progress - Tracking Progress in Natural Language Processing
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
nlp  deeplearning  research  AI  machinelearning  ner  translation  speechrecognition  sentimentanalysis  textanalysis  classification  entityextraction 
9 days ago by sachaa
Chaitanya Chemudugunta
papers on topic modeling, text analysis, nlp
nlp  classification  topic  text  textmining 
18 days ago by tswaterman
Unidentifiable fossils: palaeontological problematica
There is a detailed vocabulary used to describe organisms which defy classification and a system of nomenclature to denote confidence limits on probable or speculative affinities, but they are generally grouped together as “problematica”. A handy grab-bag of misfits that have exasperated or eluded scientists, ready for future generations to have a go at. In museums, problematica specimens reside in drawers and cabinets equivalent to the ubiquitous drawer of odds and sods that most people have in the kitchen.
Science  palaeontology  classification  problematica  species  evolution 
18 days ago by zzkt


related tags

2018  551  ai  aip  airbnb  algorithm  algorithms  amidonnier  analysis  analytical-holistic  andrewgroover  annotation  annotations  anthropology  anttiaarne  api  approximation  arcgis  archives  archiving  art  artificial_intelligence  audio  automation  axioms  being-becoming  bestpractices  bias  big-peeps  big  bio  biology  blé  blés  botany  boulangerie  card-catalogues  cart  cartography  catalog  cataloging  categorization  causation  census  characteristics  chinese  civilization  classifier  classify  cloudcomputing  clustering  coalitions  coarse-fine  comparison  competition  complexity  compliance  concept  conceptual-vocab  controlledvocab  convolution  convolutional  cooperate-defect  crypto  cryptocurrencies  curiosity  da  darwinian  data-science  data  data_science  datascience  dataviz  davidneale  dcm  decisiontree  decisiontrees  deep-learning  deep-materialism  deep-time  deeplearning  density  depthwise-convolutions  devops  dewey  digital_archives  digital_art  diglib  direct-indirect  discrete  distribution  dl  documentation  dumpling  duplication  ebooks  email  ends-means  engrain  entityextraction  epistemic  epistemology  equilibrium  estimator  europe  evolution  examples  facebook  factor_analysis  fair  fairness  fashun  fastai  favorites  fewshot  fine-grained  five  flask  folklore  food  forms-instances  funny  ga360  gdpr  gender  genetics  genomes  genre  geology  gis  github  gnon  goods  google  group-level  guide  handbook  heat  homo-hetero  hypothesis-testing  ideology  ifttt  image  images  index-cards  index  indexes  industry  information  infrastructure  installation  intelligence  interchange  international  intricacy  is-ought  jsic  keras  label  labeling  law  leadership  learn  learning  libraries  literature  logistic  longevity  mach  machine-learning  machine  machine_learning  machine_vision  machinelearning  marketplaces  metabuch  metadata  ml  mm-17/18  models  motif  multi-class  multi  multiclass  
multispecies  naivebayes  naturallanguageprocessing  nature  ner  neural  neuralnets  new-religion  nihil  nlp  oclc  optimization  organization  palaeontology  peace-violence  personality  philosophy  physics  plants  policy  problematica  prodigy  product  programming  python  pytorch  r  race  rachelehrenberg  racism  randomforest  ratio  realness  recognition  reference  regression  reputation  research  resource  ronaldlanner  room  s-factor  saliency  schelling  science  scoring  security  selection  semantic  sentimentanalysis  server  services  sexism  shopping  skeleton  smoothness  socialjustice  species  speechrecognition  standards  statistics  status  stiththompson  story  stratification  subjective-objective  syllabus  symbology  sysadmin  system  systems  tagging  taxonomy  teaching  techdebt  telos-atelos  tensorflow  text-classification  text  text_analysis  textanalysis  textmining  the-classics  thinking  time-series  time  todo  token  tools  topic  toread  tradcultures  trademark  traits  transfer-learning  transfer  transferlearning  translation  trees  tricks  tutorial  twitter  valuation  values  verbs  visualization  war  web  wiki  wikipedia  within-group  within-without  wood  word2vec  writing  épautre   
