mapreduce   11513


What is a good way to create an item-item similarity matrix for a recommendation engine where items aren't actually rated by users, but rather 'used', 'clicked', 'bought' or 'played' by users? - Quora
"I've been using the Jaccard Coefficient, and specifically, the Tanimoto Coefficient, both described at to calculate item-item similarity.  They are both measures of overlap.

The formula is

AB / ( A + B - AB)"
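The formula quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: A and B are read as the number of users who interacted with each item, AB as the number of users who interacted with both, and the `users_by_item` data and the `tanimoto` helper are hypothetical names that do not come from the quoted answer.

```python
from itertools import combinations


def tanimoto(a, b):
    """Return AB / (A + B - AB) for two sets of user ids."""
    overlap = len(a & b)                 # AB: users who touched both items
    union = len(a) + len(b) - overlap    # A + B - AB
    return overlap / union if union else 0.0


# Hypothetical implicit-feedback data: item -> set of users who played it.
users_by_item = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u2", "u3", "u4"},
    "song_c": {"u5"},
}

# Item-item similarity for every unordered pair of items.
similarity = {
    (i, j): tanimoto(users_by_item[i], users_by_item[j])
    for i, j in combinations(sorted(users_by_item), 2)
}
```

Here `song_a` and `song_b` share two of four distinct users, so their score is 0.5, while `song_c` shares no users with either and scores 0.0. Comparing every pair grows quadratically with the number of items, which is presumably why the bookmark is also tagged mapreduce.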
recommendation  webdev  python  algorithm  mapreduce 
9 weeks ago by jake101
Docker & Kubernetes on Apache Hadoop YARN
leverage Apache Hadoop YARN to further enhance the meaningfulness of Hadoop for our users by bringing together the worlds of Data and PaaS by leveraging Docker, Google Kubernetes & Red Hat OpenShift on YARN. As you can see, the goal is to enable common resource management across data and PaaS workloads in a seamless fashion. Furthermore, the exciting developments in the Apache Hadoop HDFS community to develop Ozone, an object store on HDFS, are another giant step in this direction.
scalability  analytics  mapreduce 
10 weeks ago by janpeuker
Apache Apex - Unified Stream & Batch Processing Engine
Enterprise-grade unified stream and batch processing engine, now with event-time windowing and a high-level API.

Apex is a Hadoop YARN native platform that unifies stream and batch processing. It processes big data in-motion in a way that is highly scalable, highly performant, fault tolerant, stateful, secure, distributed, and easily operable.
streaming  Hadoop  opensource  mapreduce  big-data  Java  framework 
november 2017 by liqweed
Apache Apex - Unified platform for big data stream & batch processing (on Hadoop & YARN)
A unified platform for big data stream and batch processing. Use cases include ingestion, ETL, real-time analytics, alerts and real-time actions. Apex is a Hadoop-native YARN implementation and uses HDFS by default. It simplifies development and productization of Hadoop applications by reducing time to market. Key features include Enterprise Grade Operability with Fault Tolerance, State Management, Event Processing Guarantees, No Data Loss, In-memory Performance & Scalability and Native Window Support.
Java  Hadoop  mapreduce  streaming  opensource  big-data 
october 2017 by liqweed


