Bedrock by Expensify
Bedrock is a simple, modular, WAN-replicated data foundation for global-scale applications.
sql  distributed  sqlite 
november 2016
bitsy
Bitsy is a small, fast, embeddable, durable in-memory graph database that implements the Blueprints API.
database  graphdb  java 
october 2016
Home - Zulu.org
Now there’s an easy way to embrace OpenJDK. Zulu is a tested and certified build of OpenJDK. Ready for Linux, Mac, Windows, Docker, hypervisors & cloud. Download it. Use it. Talk about it. Here at Zulu.org.
devops  java 
august 2016
GitHub - Yelp/dumb-init: A minimal init system for Linux containers
dumb-init is a simple process supervisor and init system designed to run as PID 1 inside minimal container environments (such as Docker). It is deployed as a small, statically linked binary written in C.
init  docker 
july 2016
Open Network Insight
For organizations with dynamic data centers and networks, Open Network Insight is an advanced threat detection solution that uses big data analytics at cloud scale to provide actionable insights into operational and security threats. Running on Cloudera Enterprise Data Hub (EDH), ONI can analyze billions of events to detect unknown threats and insider threats and to gain a new level of visibility into the network.
security  hadoop 
june 2016
Quarks
Apache Quarks is a programming model and micro-kernel style runtime that can be embedded in gateways and small-footprint edge devices, enabling local, real-time analytics on the continuous streams of data coming from equipment, vehicles, systems, appliances, devices and sensors of all kinds (for example, Raspberry Pis or smartphones). Working in conjunction with centralized analytic systems, Apache Quarks provides efficient and timely analytics across the whole IoT ecosystem: from the center to the edge.
kafka  iot  networks  analytics 
may 2016
Riak-TS-Python-Demo: Sample Python code that demonstrates how to use the Python client to work with Riak TS (Time Series)
Sample Python 2 and 3 code that demonstrates how to use the Python client to work with Riak TS (Time Series). Riak TS is an extension of Riak KV that is optimized for storing semistructured data by timestamp.
riak  timeseries  python  jupyter  pynb 
may 2016
Contiv
Contiv provides a higher level of networking abstraction for microservices. Contiv secures your application using a rich policy framework. It provides built-in service discovery and service routing for scale out services.
containers  docker  networking  microservices 
may 2016
Introducing and Open Sourcing Ambry - LinkedIn’s New Distributed Object Store | LinkedIn Engineering
Today, we are announcing that Ambry is now available as an open source project under the Apache 2.0 license. Ambry is optimized to store and serve media. Media content has become critical for any website to increase user engagement, virality and monetization. Media pipelines will need to be supported by more companies, especially with the advancement of video and virtual reality. Ambry can play a critical role in this future, and in the future of any company that is interested in serving diverse kinds of media to a global audience.
distributed  blobstore 
may 2016
Træfɪk
Træfɪk is a modern HTTP reverse proxy and load balancer made to deploy microservices with ease. It supports several backends (Docker, Swarm, Mesos/Marathon, Kubernetes, Consul, Etcd, Zookeeper, BoltDB, Rest API, file…) to manage its configuration automatically and dynamically.
docker  golang  http  microservices 
may 2016
Concord: concord.io
Concord is a distributed stream processing framework built in C++ on top of Apache Mesos, designed for high performance data processing jobs that require flexibility & control.
distributed  streaming  c++  mesos 
may 2016
Distributed systems theory for the distributed systems engineer : Paper Trail
A little theory is, in this case, not such a dangerous thing. So I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer; what I consider ‘table stakes’ for distributed systems engineers competent enough to design a new system. Let me know what you think I missed!
distributed 
may 2016
Building real-time dashboard applications with Apache Flink, Elasticsearch, and Kibana | Elastic
In this blog post, we demonstrate how to build a real-time dashboard solution for stream data analytics using Apache Flink, Elasticsearch, and Kibana.
flink  kibana  elasticsearch  streaming  infoviz 
may 2016
Conda + Spark — quasiben.github.io
Here I am going to demonstrate how we can ship a Python environment, complete with desired dependencies, as part of a Spark job without installing Python on every node.
spark  conda  deployment  python 
may 2016
GitHub - gunthercox/ChatterBot: ChatterBot is a machine learning, conversational dialog engine.
ChatterBot is a machine-learning based conversational dialog engine built in Python which makes it possible to generate responses based on collections of known conversations. The language-independent design of ChatterBot allows it to be trained to speak any language.
python  chatbot 
may 2016
Errbot — Err 4.0.3 documentation
Errbot is a chatbot, a daemon that connects to your favorite chat service and brings your tools into the conversation.

The goal of the project is to make it easy for you to write your own plugins so you can make it do whatever you want: run a deployment, retrieve some information online, trigger a tool via an API, troll a co-worker, and so on.

Errbot is being used in a lot of different contexts: chatops (tools for devops), online gaming chatrooms like EVE, video streaming chatrooms like livecoding.tv, home security, etc.
python  chatbot 
may 2016
Linux network metrics: why you should use nstat instead of netstat
This article is about the differences between netstat and nstat regarding Linux system network metrics, and why nstat is superior to netstat (at least for this purpose.)
linux  networking 
april 2016
Apache Kafka Producer Benchmarks - Java vs. Jython vs. Python · Uncanny Recursions
In my previous post, I wrote about how we can interface Jython with Kafka 0.8.x and use Java consumer clients directly with Python code. As a followup to that, I got curious about what would be the performance difference between Java, Jython and Python clients. In this post, I am publishing some of the benchmarks that I have been doing with Java, Jython and Python producers.
kafka  jython  messaging  python 
april 2016
Spark and Kafka Integration Patterns, Part 2 - Passionate Developer
In this blog post you will learn how to publish stream processing results to Apache Kafka in a reliable way. First you will learn how the Kafka producer works, how to configure it, and how to set up a Kafka cluster to achieve the desired reliability. In the second part of the blog post, I will present how to implement a convenient library for sending a continuous sequence of RDDs (DStream) to an Apache Kafka topic.
kafka  scala  spark 
march 2016
Maintain Separate GitHub accounts
Someone recently commented that with Github it is "a pain if you want to have a work and personal identity."

It is? I've had separate work and personal Github accounts for years. I thought everyone knew this trick.
git  github 
march 2016
NFQL
Understanding intricate traffic patterns requires sophisticated flow analysis tools that can mine flow records for complex use cases. Unfortunately, current tools fail to deliver owing to their language design and simplistic filtering methods. We have designed a network flow query language (NFQL) that aims to cater to such needs.
json  networking 
march 2016
Spotify’s Event Delivery – The Road to the Cloud (Part I) | Labs
In this first post, we’ll explain how our current event delivery system works and talk about some of the lessons we’ve learned from operating it. In the next post, we will cover the design of the new event delivery system, and why we chose Cloud Pub/Sub as the transport mechanism for all the events. In the third and final post, we will explain how we consume all the published events with DataFlow, and what we have discovered about the performance of this approach so far.
kafka  messaging 
march 2016
Introducing Vega-Lite — Medium
Today we are excited to announce the official 1.0 release of Vega-Lite, a high-level format for rapidly creating visualizations for analysis and presentation. With Vega-Lite, one can concisely describe a visualization as a set of encodings that map from data fields to the properties of graphical marks, using a JSON format. Vega-Lite also supports data transformations such as aggregation, binning, filtering, and sorting, along with visual transformations including stacked layouts and faceting into small multiples.
infoviz  javascript 
march 2016
Connecting Docker Containers, Part Two
This post is part two of a miniseries looking at how to connect Docker containers.

In part one, we looked at the bridge network driver that allows us to connect containers that all live on the same Docker host. Specifically, we looked at three basic, older uses of this network driver: port exposure, port binding, and linking.

In this post, we’ll look at a more advanced, and up-to-date use of the bridge network driver.

We’ll also look at using the overlay network driver for connecting Docker containers across multiple hosts.
networking  consul  docker 
march 2016
Automatic Docker Service Announcement with Registrator :: Jeff Lindsay
No matter which service discovery system you use, it will not likely know how to register your services for you. Service discovery requires your services to somehow announce themselves to the service directory. This is not as trivial as it sounds. There are many approaches to do this, each with their own pros and cons.

In an ideal world, you wouldn't have to do anything special. With Docker, we can actually arrange this with a component I've made called Registrator.

Before I get to Registrator, let's understand what it means to register a service and see what kind of approaches are out there for registering or announcing services. It might also be a good idea to see my last posts on Consul and on service discovery in general.
consul  docker  servicediscovery 
march 2016
Airflow Documentation — Airflow Documentation
Airflow is a platform to programmatically author, schedule and monitor workflows.

Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
python  workflow 
march 2016
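The idea of running tasks "while following the specified dependencies" can be sketched with the standard library alone. This is a minimal illustration, not Airflow's API: the task names and the `run_order` helper are hypothetical, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

"clean" and "enrich" do not depend on each other, so either may come first; a real scheduler could run them in parallel on separate workers.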
Spring Cloud Data Flow
A cloud native programming and operating model for composable data microservices on a structured platform. With Spring Cloud Data Flow, developers can create, orchestrate and refactor data pipelines through a single programming model for common use cases such as data ingest, real-time analytics, and data import/export.
java  microservices  spring 
march 2016
Spark | Hue
Blog posts related to Hue interconnecting with Apache Spark
spark  hue 
march 2016
Building a REST Job Server for interactive Spark as a service
Building a REST Job Server for interactive Spark as a service by Romain Rigaux and Erick Tryzelaar
livy  spark  rest 
march 2016
Livy Spark REST Server
Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.
spark  rest  livy 
march 2016
GitHub - apache/incubator-toree: Mirror of Apache Toree (Incubating)
The main goal of Toree is to provide the foundation for interactive applications to connect to and use Apache Spark.
python  spark  jupyter  ipynb  scala 
march 2016
Making Python on Apache Hadoop Easier with Anaconda and CDH
Enabling Python development on CDH clusters (for PySpark, for example) is now much easier thanks to new integration with Continuum Analytics’ Python platform (Anaconda). Python has become an increasingly popular tool for data analysis, including data processing, feature engineering, machine learning, and visualization. Data scientists and data engineers enjoy Python’s rich numerical and analytical libraries (such as NumPy, pandas, and scikit-learn) and have long wanted to apply them to large datasets stored in Apache Hadoop clusters. Originally posted on the Cloudera Engineering Blog.
python  spark  anaconda  cloudera 
february 2016
Unofficial Storm and Kafka Best Practices Guide - Hortonworks
A collection of best practices from teams implementing Storm and Kafka in production.
kafka  storm  messaging 
february 2016
Introducing FiloDB
I am excited to announce FiloDB, a new open-source distributed columnar database from TupleJump. FiloDB is designed to ingest streaming data of various types, including machine, event, and time-series data, and run very fast analytical queries over them. In four-letter acronyms, it is an OLAP solution, not OLTP.
distributed  database  cassandra  spark 
february 2016
Storm on Mesos!
Storm integration with the Mesos cluster resource manager.
mesos  storm 
february 2016
The Best of RICON 2015 - Home on Rails
RICON is all about distributed systems. There are a lot of academic (PhD) talks and a few practical ones. I’ve chosen the 3 videos out of 37 that I liked the most. I hope you’ll enjoy them too.
distributed  messaging  nsq  kafka 
february 2016
about | Alpine Linux
Alpine Linux is an independent, non-commercial, general purpose Linux distribution designed for power users who appreciate security, simplicity and resource efficiency.
docker  linux 
january 2016
consulate: Python client for the Consul HTTP API
Consulate is a Python client library and set of applications for the Consul service discovery and configuration system.
consul  python 
january 2016
PyExPool: Python Multi-Process Execution Pool
Lightweight multi-process execution pool for scheduling job execution with a per-job timeout, optionally grouping jobs into tasks and specifying execution parameters.
python  multiprocessing 
january 2016
Upsert Records with PostgreSQL 9.5 - The Hashrocket Blog
For those not familiar with the term upsert, it is also sometimes referred to as a merge. The idea is this: some data needs to be put in the database. If the data has an existing associated record, that record should be updated; otherwise the data should be freshly inserted as a new record. Depending on the state of the database, a record is either updated or inserted, hence the name upsert. This type of functionality has been available in other relational databases (e.g. MySQL) for a while, so it's exciting to see it make its way into PostgreSQL.
postgres  sql 
january 2016
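The update-or-insert behaviour described above can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that the bundled SQLite is 3.24 or newer, since SQLite adopted the `INSERT ... ON CONFLICT ... DO UPDATE` form that PostgreSQL 9.5 introduced. The `users` table and `upsert_user` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")


def upsert_user(conn, user_id, email):
    """Insert the row, or update the email if the id already exists."""
    conn.execute(
        "INSERT INTO users (id, email) VALUES (?, ?) "
        # 'excluded' refers to the row that failed to insert.
        "ON CONFLICT (id) DO UPDATE SET email = excluded.email",
        (user_id, email),
    )


upsert_user(conn, 1, "old@example.com")   # no row yet: inserted
upsert_user(conn, 1, "new@example.com")   # id 1 exists: updated
upsert_user(conn, 2, "other@example.com")

users = dict(conn.execute("SELECT id, email FROM users"))
print(users)
```

The conflict target `(id)` names the unique column whose violation triggers the update branch; without a unique or primary-key constraint there is nothing to conflict on.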
Introducing the Kafka Consumer: Getting Started with the New Apache Kafka 0.9 Consumer Client
When Kafka was originally created, it shipped with a Scala producer and consumer client. Over time we came to realize many of the limitations of these APIs. For example, we had a “high-level” consumer API which supported consumer groups and handled failover, but didn’t support many of the more complex usage scenarios. We also had a “simple” consumer client which provided full control, but required users to manage failover and error handling themselves. So we set about redesigning these clients in order to open up many use cases that were hard or impossible with the old clients and establish a set of APIs we could support over the long haul.
kafka  java  messaging 
january 2016
PureSolTechnologies/DuctileDB · GitHub
Ductile DB is a graph database based on Hadoop/HBase which provides a vast set of features.
graph  graphdb  hbase 
january 2016
How to open a port in the firewall on CentOS or RHEL - Ask Xmodulo
Out of the box, enterprise Linux distributions such as CentOS or RHEL come with a powerful firewall built-in, and their default firewall rules are pretty restrictive. Thus if you install any custom services (e.g., web server, NFS, Samba), chances are their traffic will be blocked by the firewall rules. You need to open up necessary ports on the firewall to allow their traffic.
linux  centos  rhel  firewall  networking 
january 2016
Kafka-Pixy (HTTP Proxy)
Kafka-Pixy is a local aggregating HTTP proxy to Kafka with automatic consumer group control. It is designed to hide the complexity of the Kafka client protocol and provide a stupid simple HTTP API that is trivial to implement in any language.

Kafka-Pixy works with Kafka 0.8.2.x and 0.9.0.x. It uses the Kafka Offset Commit/Fetch API to keep track of consumer offsets and ZooKeeper to manage distribution of partitions among consumer group members.
kafka  rest  http 
january 2016
The Elements of Python Style
This document goes beyond PEP8 to cover the core of what I think of as great Python style. It is opinionated, but not too opinionated. It goes beyond mere issues of syntax and module layout, and into areas of paradigm, organization, and architecture. I hope it can be a kind of condensed "Strunk & White" for Python code.
python  codestyle 
january 2016
SQLite Table-Valued Functions with Python
One of the benefits of running an embedded database like SQLite is that you can configure SQLite to call into your application's code. SQLite provides APIs that allow you to create your own scalar functions, aggregate functions, collations, and even your own virtual tables. In this post I'll describe how I used the virtual table APIs to expose a nice API for creating table-valued (or, multi-value) functions in Python. The project is called sqlite-vtfunc and is hosted on GitHub.
python  sqlite 
january 2016
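The simpler callbacks mentioned above, scalar and aggregate functions, are available directly from Python's standard `sqlite3` module. The sketch below shows only those two; it does not use sqlite-vtfunc or the virtual table API, and the function names `reverse_text` and `longest` are made up for illustration.

```python
import sqlite3


def reverse_text(value):
    """Scalar function: called once per row, returns one value."""
    return None if value is None else value[::-1]


class Longest:
    """Aggregate: step() sees each row, finalize() returns the result."""

    def __init__(self):
        self.best = ""

    def step(self, value):
        if value is not None and len(value) > len(self.best):
            self.best = value

    def finalize(self):
        return self.best


conn = sqlite3.connect(":memory:")
conn.create_function("reverse_text", 1, reverse_text)  # name, arg count, callable
conn.create_aggregate("longest", 1, Longest)           # name, arg count, class

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)", [("kafka",), ("spark",), ("zookeeper",)]
)

reversed_word = conn.execute("SELECT reverse_text('sqlite')").fetchone()[0]
longest_word = conn.execute("SELECT longest(w) FROM words").fetchone()[0]
print(reversed_word, longest_word)
```

A table-valued function differs in that it returns a set of rows that can appear in a `FROM` clause, which is why the post reaches for the virtual table APIs.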
Updated instructions for compiling BerkeleyDB with SQLite for use with Python
About three years ago I posted some instructions for building the Python SQLite driver for use with BerkeleyDB. While those instructions still work, they have the unfortunate consequence of stomping on any other SQLite builds you've installed in /usr/local. I haven't been able to build pysqlite with BerkeleyDB compiled in, because the source amalgamation generated by BerkeleyDB doesn't compile. So that leaves us with dynamically linking, and that requires that we use the BerkeleyDB libsqlite, which is exactly what the previous post described.

In this post I'll describe a better approach. Instead of building a modified version of libsqlite3, we'll modify pysqlite to use the BerkeleyDB libdb_sql library.

Why use BerkeleyDB at all?
...
To sum up, BerkeleyDB might be a good option if you have many concurrent writers.
berkeleydb  keyvalue  python  sqlite 
january 2016
Announcing sophy: fast Python bindings for Sophia Database
Sophia is a powerful key/value database with loads of features packed into a simple C API. In order to use this database in some upcoming projects I've got planned, I decided to write some Python bindings and the result is sophy. In this post, I'll describe the features of Sophia database, and then show example code using sophy, the Python wrapper.
sophiadb  keyvalue  python 
january 2016
Using SQLite4's LSM Storage Engine as a Stand-alone NoSQL Database with Python
As I was reading about SQLite4, I saw that one of the design goals was to provide an interface for pluggable storage engines. At the time I'm writing this, SQLite4 has two built-in storage backends, one of which is an LSM key/value store. Over the past month or two I've been having fun with Cython, writing Python wrappers for the embedded key/value stores UnQLite and Vedis. I figured it would be cool to use Cython to write a Python interface for SQLite4's LSM storage engine.

After pulling down the SQLite4 source code and reading through the LSM header file (it's very small!), I started coding and the result is python-lsm-db (docs).
python  sqlite  lsm  nosql 
january 2016
Python Bindings for the SQLite4 LSM Key/Value Store
Two of my favorite topics to write about are SQLite and Key/Value databases. Imagine my joy when I stumbled across the SQLite4 documentation describing the new key/value database that will serve as SQLite4's default storage layer.

After reading a bit about the new storage layer, I wanted to try it out as a standalone database. I'd recently finished two projects using Cython to wrap embedded C databases (unqlite-python and vedis-python), so implementing the wrapper with Cython was a no-brainer. After pulling down the SQLite4 source code and reading through the LSM header file (it's very small!), I started coding and the result is now on GitHub: python-lsm-db and readthedocs: lsm-db docs.
sqlite  lsm  python 
january 2016
Introduction to the fast new UnQLite Python Bindings
About a year ago, I blogged about some Python bindings I wrote for the embedded NoSQL document store UnQLite. One year later I'm happy to announce that I've rewritten the library using Cython and operations are, in most cases, an order of magnitude faster.

This was my first real attempt at using Cython and the experience was just the right mix of challenging and rewarding. I bought the O'Reilly Cython Book which came in super handy, so if you're interested in getting started with Cython I recommend picking up a copy.

In this post I'll quickly touch on the features of UnQLite, then show you how to use the Python bindings. When you're done reading you should hopefully be ready to use UnQLite in your next Python project.
unqlite  python  cython 
january 2016
Alternative Redis-Like Databases with Python
Recently I've learned about a few new Redis-like databases: Rlite, Vedis and LedisDB. Each of these projects offers a slightly different take on the data-structure server you find in Redis, so I thought that I'd take some time and see how they worked. In this post I'll share what I've learned, and also show you how to use these databases with Walrus, as I've added support for them in the latest 0.3.0 release.
redis  python  nosql  vedis  rlite  ledisdb 
january 2016
Welcome to TinyDB! — TinyDB 3.0.0 documentation
Welcome to TinyDB, your tiny, document oriented database optimized for your happiness :)
python  json  database 
december 2015
F-Secure/see · GitHub
Sandboxed Execution Environment (SEE) is a framework for building test automation in secured Environments.
security  testing  lxc  containers  virtualization 
december 2015
Astro by HuaweiBigData
Astro is a fully distributed SQL engine on HBase that leverages the Spark ecosystem. It enables systematic and powerful handling of data pruning, intelligent scans, and pushdowns such as custom filters and coprocessors, and makes more traditional RDBMS capabilities possible.
spark  hbase  hadoop  sql 
december 2015
SqlPad - A web app for running SQL queries and visualizing the results
SqlPad is a self-hosted web app for writing and running SQL queries and visualizing the results. Its goal is to be a simple tool for exploratory data work and visualizations, ideal for data analysts who would prefer to work in SQL.
sql  web 
december 2015
Hackers do the Haka – Part 1 | This is Security :: by Stormshield
Haka is an open source network security oriented language that allows writing security rules and protocol dissectors. In this first part of a two-part series, we will focus on writing security rules.
wireshark  security  networking  pcap  lua 
november 2015
Pineapple - Python Notebooks for Mac OS X
Pineapple is a self-contained application that requires no other components to work. Installation is a simple drag-n-drop into your Applications folder. It will not interfere with any other installations of languages, libraries, or environments.

The classic IPython notebook interface uses a web browser for the user interface. Pineapple controls are native to the operating system for a consistent, integrated experience.
ipynb  jupyter  python  macosx 
november 2015
Prelude
Prelude is an enhanced Emacs 24 distribution that should make your experience with Emacs both more pleasant and more powerful.
emacs  elisp 
november 2015
init.el for Noah Hoffman
In a fit of literate programming yak-shaving, I implemented my Emacs configuration as an org-mode file. I have also tried to provide a complete-ish description of my environment for anyone interested in starting more or less from scratch.
emacs  elisp 
november 2015
The Benefits of the Gremlin Graph Traversal Machine | DataStax
This blog post will review the benefits of Apache TinkerPop’s Gremlin graph traversal machine for both graph language designers and graph system vendors. A graph language designer develops a language specification (e.g. SPARQL, GraphQL, Cypher, Gremlin) and respective compiler for its evaluation over some graph system. A graph system vendor develops an OLTP graph database (e.g. Titan, Neo4j, OrientDB) and/or an OLAP graph processor (e.g. Titan/Hadoop, Giraph, Hama) for storing and processing graphs. The benefits of the Gremlin traversal machine to these stakeholders are enumerated below and discussed in depth in their respective sections following this prolegomenon.
graph  database  gremlin 
november 2015
Multiagent Systems
Multiagent systems consist of multiple autonomous entities having different information and/or diverging interests. This comprehensive introduction to the field offers a computer science perspective, but also draws on ideas from game theory, economics, operations research, logic, philosophy and linguistics. It will serve as a reference for researchers in each of these fields, and be used as a text for advanced undergraduate and graduate courses.
distributed  systems 
november 2015
Ligra: A Lightweight Graph Processing Framework for Shared Memory
Ligra is a lightweight graph processing framework for shared memory. It is particularly suited for implementing parallel graph traversal algorithms where only a subset of the vertices are processed in an iteration. The project was motivated by the fact that the largest publicly available real-world graphs all fit in shared memory. When graphs fit in shared-memory, processing them using Ligra can give performance improvements of up to orders of magnitude compared to distributed-memory graph processing systems.
graph 
november 2015
RFC 1925 - The Twelve Networking Truths
This memo documents the fundamental truths of networking for the Internet community. This memo does not specify a standard, except in the sense that all standards must implicitly follow the fundamental truths.
networking  ietf  rfc 
november 2015
Axibase Time-Series Database - Axibase
ATSD is designed from the ground up to store and analyze time-series data at scale. Unlike traditional databases, it comes with a pre-integrated Visualization and Rule Engine. It’s a forward-looking technology to support your IoT, system monitoring and other Big Data use cases.
timeseries  database  metrics  performance 
november 2015
Axibase | Google cAdvisor
Axibase Time Series Database collects Docker container performance metrics through Google cAdvisor (Container Advisor) for long-term retention, analytics and visualization. A single ATSD instance can collect metrics from multiple Docker hosts and cAdvisor instances.

In a basic configuration cAdvisor monitors all running containers on the Docker host. Container statistics are sent over TCP protocol to the ATSD container installed on the same host. When a new container is launched it will be automatically discovered by cAdvisor and its statistics will be continuously sent into ATSD while the container is running.
timeseries  database  docker  performance 
november 2015
Axibase | What is Docker and how do you monitor it?
Modern companies are becoming overwhelmed with machine data, yet they can rarely leverage this information to predict the operational behavior of their systems, applications, and users, because advanced analytics and forecasts require detailed data for 3 to 5 years to be accurate. ATSD solves this problem by storing granular data and layering forecasting on top of it in order to support fully automated, predictive operations at enterprise scale.
timeseries  performance  docker  database 
november 2015
Advanced Jupyter Notebook Tricks — Part I
Jupyter is so great for interactive exploratory analysis that it's easy to overlook some of its other powerful features and use cases. I wanted to write a blog post on some of the lesser known ways of using Jupyter — but there are so many that I broke the post into two parts.

In Part 1, today, I describe how to use Jupyter to create pipelines and reports. In the next post, I will describe how to use Jupyter to create interactive dashboards.
ipynb  jupyter  python 
november 2015
curio - concurrent I/O
Curio is a modern library for performing reliable concurrent I/O using Python coroutines and the explicit async/await syntax introduced in Python 3.5. Its programming model is based on cooperative multitasking and common system programming abstractions such as threads, sockets, files, subprocesses, locks, and queues. Under the covers, it is based on a task queuing system that is small, fast, and powerful.
concurrency  python 
november 2015
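The cooperative multitasking model described above can be tried with the `async`/`await` syntax alone. The sketch below uses the standard library's `asyncio` as a stand-in rather than Curio itself, so none of these calls are Curio's API; the `worker` and `main` names are hypothetical.

```python
import asyncio


async def worker(name, queue, results):
    """Pull items until a None sentinel arrives, yielding control at each await."""
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0)  # explicit yield point: lets other tasks run
        results.append((name, item * 2))


async def main():
    queue = asyncio.Queue()
    results = []
    tasks = [asyncio.create_task(worker(f"w{i}", queue, results)) for i in range(2)]
    for n in range(5):
        await queue.put(n)
    for _ in tasks:
        await queue.put(None)  # one sentinel per worker
    await asyncio.gather(*tasks)
    return sorted(value for _, value in results)


doubled = asyncio.run(main())
print(doubled)
```

Tasks only switch at `await` expressions, which is what makes the scheduling cooperative: a coroutine that never awaits never gives up control.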
Introducing agate: a Better Data Analysis Library for Journalists - Features - Source: An OpenNews project
In greater depth, agate is a Python data analysis library in the vein of numpy or pandas, but with one crucial difference. Whereas those libraries optimize for the needs of scientists—namely, being incredibly fast when working with vast numerical datasets—agate instead optimizes for the performance of the human who is using it. That means stripping out those technical optimizations and instead focusing on designing code that is easy to learn, readable, and flexible enough to handle any weird data you throw at it.
analysis  python 
november 2015
agate 1.1.0 — agate 1.1.0 documentation
agate is a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that helps you solve real-world problems with readable code.
python  analytics 
november 2015
How to use the Livy Spark REST Job Server API for submitting batch jar, Python and Streaming Jobs | Hue - Hadoop User Experience - The Apache Hadoop UI
Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark Context that runs locally or in YARN.
spark  rest  python 
november 2015
Greenplum Database
The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer delivering high analytical query performance on large data volumes.
postgresql  bigdata 
november 2015
Qubes OS Project
Qubes is a security-oriented, open-source operating system for personal computers. It uses virtualization to implement security by compartmentalization and supports both Linux and Windows virtual environments. Qubes 3.0 introduces the Hypervisor Abstraction Layer (HAL), which renders Qubes independent of its underlying virtualization system.
linux  security  virtualization 
november 2015
Docker Monitoring Continued: Prometheus and Sysdig | Rancher Labs
I recently compared several Docker monitoring tools and services. Since the article went live we have gotten feedback about additional tools that should be included in our survey. I would like to highlight two such tools: Prometheus and Sysdig Cloud. Prometheus is a capable self-hosted solution which is easier to manage than Sensu. Sysdig Cloud, on the other hand, provides us with another hosted service much like Scout and Datadog. Collectively they help us add more choices to their respective classes. As before I will be using the following six criteria to evaluate Prometheus and Sysdig Cloud: 1) ease of deployment, 2) level of detail of information presented, 3) level of aggregation of information from the entire deployment, 4) ability to raise alerts from the data, 5) ability to monitor non-Docker resources, and 6) cost.
docker  performance  metrics 
november 2015
Comparing Five Monitoring Options for Docker | Rancher Labs
Today’s article has covered several options for monitoring Docker containers, ranging from free options such as docker stats, cAdvisor, and Sensu to paid services such as Scout and Datadog.
docker  performance  metrics 
november 2015