tobym + analysis   39

Datasette — Datasette documentation
Simple CSV dataset inspection tool from Simon Willison.
data  publishing  sqlite  analysis  csv 
8 weeks ago by tobym
OmniSci | The Extreme Analytics™ Platform | Omnisci
Realtime big data visualization and analytics. Used to be called MapD.
bigdata  analysis  visualization  dataviz  realtime 
12 weeks ago by tobym
DBToaster - Welcome to
Engine for continuous analytical queries. Sounds similar to PipelineDB, Noria, maybe KSQL.

DBToaster is an SQL-to-native-code compiler. It generates lightweight, specialized, embeddable query engines for applications that require real-time, low-latency data processing and monitoring capabilities. The DBToaster compiler generates code that can be easily incorporated into any C++ or JVM-based (Java, Scala, ...) project.

Since 2009, DBToaster has spearheaded the currently ongoing database compilers revolution. If you are looking for the fastest possible execution of continuous analytical queries, DBToaster is the answer. DBToaster code is 3-6 orders of magnitude faster than all other systems known to us.
database  olap  analysis  analytics 
october 2018 by tobym
angr, a binary analysis framework
angr is a Python framework for analyzing binaries. It combines both static and dynamic symbolic ("concolic") analysis, making it applicable to a variety of tasks.
binary  analysis 
may 2018 by tobym
igraph – Network analysis software
igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of use. igraph is open source and free. igraph can be programmed in R, Python and C/C++.
graph  analysis  python 
january 2018 by tobym
csvkit documentation
csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
csv  data  analysis  datamining  cli  python  documentation  reference 
april 2017 by tobym
tcptrace - Official Homepage
tcptrace is a tool written by Shawn Ostermann at Ohio University, for analysis of TCP dump files. It can take as input the files produced by several popular packet-capture programs, including tcpdump, snoop, etherpeek, HP Net Metrix, and WinDump. tcptrace can produce several different types of output containing information on each connection seen, such as elapsed time, bytes and segments sent and received, retransmissions, round trip times, window advertisements, throughput, and more. It can also produce a number of graphs for further analysis.
tcp  networking  analysis  tools 
february 2017 by tobym
ELKI Data Mining Framework
Data mining software in Java with algorithms focused on unsupervised cluster analysis and outlier detection. AGPLv3. Claims to use smart data structures or "index structures", perhaps meaning the data is laid out linearly but there are indexes to access it (e.g. an R*-tree).
ml  machinelearning  datascience  clustering  algorithms  analysis  datamining 
february 2017 by tobym
airbnb/superset: Superset is a data exploration platform designed to be visual, intuitive, and interactive
Superset is a data exploration platform designed to be visual, intuitive and interactive.

[this project used to be named Caravel, and Panoramix in the past]

Compare with Pivot, maybe Grafana. Also compare to a lightweight Tableau!

Originally designed as a UI for Druid, now also supports a SQL backend via SQLAlchemy.
dashboard  data  visualization  analysis 
january 2017 by tobym
Moose is a platform for software and data analysis.
It helps programmers craft custom analyses cheaply.
It's based on Pharo and it's open source under BSD/MIT.
data  analysis  code  visualization 
november 2016 by tobym
CodeCity is an integrated environment for software analysis, in which software systems are visualized as interactive, navigable 3D cities. The classes are represented as buildings in the city, while the packages are depicted as the districts in which the buildings reside. The visible properties of the city artifacts depict a set of chosen software metrics, as in the polymetric views of CodeCrawler.
code  visualization  analysis  awesome  research 
november 2016 by tobym
USE Method: Rosetta Stone of Performance Checklists
USE (utilization, saturation, errors) checklists for Linux, Solaris, Mac OS X, FreeBSD
linux  sysadmin  performance  systems  analysis 
august 2015 by tobym
The USE Method
The Utilization Saturation and Errors (USE) Method is a methodology for analyzing the performance of any system. It directs the construction of a checklist, which for server analysis can be used for quickly identifying resource bottlenecks or errors. It begins by posing questions, and then seeks answers, instead of beginning with given metrics (partial answers) and trying to work backwards.
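A rough sketch of how such a checklist could be built: for every resource, ask the three USE questions. The resource names, the question wording, and the `build_use_checklist` helper are illustrative assumptions, not taken from the linked page.

```python
from dataclasses import dataclass
from itertools import product

# The three USE metric types. The question wording is an illustrative assumption.
USE_QUESTIONS = {
    "utilization": "How busy is the {resource}?",
    "saturation": "How much work is queued waiting on the {resource}?",
    "errors": "Has the {resource} reported any errors?",
}


@dataclass(frozen=True)
class ChecklistItem:
    resource: str
    metric_type: str
    question: str


def build_use_checklist(resources):
    """Return one checklist item per (resource, USE metric type) pair."""
    return [
        ChecklistItem(r, m, USE_QUESTIONS[m].format(resource=r))
        for r, m in product(resources, USE_QUESTIONS)
    ]


# Hypothetical resource list for a single server.
checklist = build_use_checklist(["CPU", "memory", "disk", "network"])
for item in checklist:
    print(f"[{item.resource}/{item.metric_type}] {item.question}")
```

The point of the sketch is the ordering: the questions come first, and the specific metrics that answer them are filled in afterwards per operating system.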
sysadmin  systems  analysis  performance 
august 2015 by tobym
UA string analysis ::
This tool was developed for user agent string analysis. Our analysis gives you information on client SW type (browser, webcrawler, anonymizer etc.), which OS is used by the client and moreover we display detailed analysis of UA fragments together with the URL of client's SW producer and our commentary.
user-agent  browser  analysis  tool 
september 2012 by tobym
MOA Massive Online Analysis
Real Time Analytics for Data Streams
Related to the WEKA project, but for learning from a stream.
datamining  java  streaming  stream  analysis  machinelearning  ml  weka 
august 2012 by tobym
SPEAR Algorithm @ Michael G. Noll
The SPEAR algorithm is a tool for ranking users in social networks by their expertise and influence within the community
social  analysis  algorithms  from delicious
june 2011 by tobym
CRM114 is a system to examine incoming e-mail, system log streams, data files or other data streams, and to sort, filter, or alter the incoming files or data streams according to the user's wildest desires. Criteria for categorization of data can be via a host of methods, including regexes, approximate regexes, a Hidden Markov Model, Bayesian Chain Rule Orthogonal Sparse Bigrams, Winnow, Correlation, KNN/Hyperspace, Bit Entropy, CLUMP, SVM, Neural Networks (or by other means).
opensource  bayes  analysis  classification  compsci  from delicious
may 2011 by tobym
new data cleaning/transformation tool. Looks basically identical to Freebase GridWorks aka Google Refine
data  analysis  tools  from delicious
february 2011 by tobym
google-refine - Project Hosting on Google Code
pretty awesome downloadable tool to interact with data, like an uber-spreadsheet. Great for cleaning up/normalizing dirty datasets.

This is the same thing as Freebase's Gridworks, just re-released under Google's name since they just bought Freebase.
google  data  analysis  datamining  opensource  tools  spreadsheet  freebase  gridworks 
november 2010 by tobym
jmotif - Project Hosting on Google Code
Implements algorithms for time-series analysis and datamining in Java and R.
R  java  opensource  datamining  timeseries  analysis 
november 2010 by tobym
Carrot2 Clustering Engine
Open-source "search results clustering" engine. Has integration with Solr, Lucene, Nutch, Yahoo, Google, Bing, and others. Written in Java, but has a REST interface.

Designed for in-memory clustering of up to 1,000 documents of a few paragraphs each. (For bigger scale, use Mahout.)
search  lucene  semantic  visualization  cluster  analysis  opensource  java 
november 2010 by tobym
Developer Portal - Evri
Evri has opened a text analysis API, meant to extract relevant information and insights from unstructured text.
api  semanticweb  analysis  nlp 
july 2010 by tobym
SNAP: Stanford Network Analysis Platform
SNAP is a general purpose network analysis and graph mining library. It is written in C++ and easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges.
programming  library  graph  analysis  software  stanford  c++  network  snap  computing  compsci 
may 2010 by tobym
