csv   9650


The Journal of Brief Ideas
File extensions for data sharing sometimes lie about their contents.

Here is an algorithm to infer the actual delimiter of a CSV, TSV or any related format:

Assume that alphanumeric characters (A-Z, a-z, 0-9) and the period/full stop (.) cannot be delimiters.
Begin with input text a.
Store a short sample of a as b, by copying n lines from the input text a.
Rank every character that appears in b by frequency, creating the list of candidate delimiters c.
For every candidate delimiter d in c, split each line in b. If every line has the same number of splits, d is the delimiter.
An implementation of this idea is available on Zenodo and within the Python Package Index.
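The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Zenodo or PyPI implementation mentioned in the entry. The function name `infer_delimiter`, the default sample size `n=10`, skipping empty lines, and returning `None` when no candidate fits are all assumptions made here.

```python
from collections import Counter
import string

# Characters assumed never to be delimiters: A-Z, a-z, 0-9 and the period.
NON_DELIMITERS = set(string.ascii_letters + string.digits + ".")


def infer_delimiter(text, n=10):
    """Return the inferred delimiter of `text`, or None if no candidate fits.

    Hypothetical helper: the name, the default n and the None fallback
    are illustrative choices, not part of the original description.
    """
    # b: a short sample of the input a (assumption: empty lines are skipped).
    sample = [line for line in text.splitlines() if line][:n]
    if not sample:
        return None

    # c: characters of the sample ranked by frequency, minus the excluded ones.
    counts = Counter("".join(sample))
    candidates = [ch for ch, _ in counts.most_common() if ch not in NON_DELIMITERS]

    # Accept the first candidate d that splits every sampled line
    # into the same number of fields.
    for d in candidates:
        field_counts = {len(line.split(d)) for line in sample}
        if len(field_counts) == 1:
            return d
    return None
```

A call such as `infer_delimiter(open("data.csv").read())` would then report the delimiter regardless of the file extension (`data.csv` is a placeholder name). The sketch follows the steps literally, so it does not account for delimiters that appear inside quoted fields, and a one-line sample will accept the most frequent candidate unchecked.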
1 hour ago by euler
World Bank data
Page for downloading a wide variety of data (CSV, Excel, ...).
datos  csv 
3 days ago by franzz2000
JSON Lines
This page describes the JSON Lines text format, also called newline-delimited JSON. JSON Lines is a convenient format for storing structured data that may be processed one record at a time. It works well with unix-style text processing tools and shell pipelines. It's a great format for log files. It's also a flexible format for passing messages between cooperating processes.
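As a rough illustration of "one record at a time", the following Python sketch writes and reads JSON Lines using only the standard library. The helper names `read_json_lines` and `write_json_lines`, the choice to skip blank lines, and the sample log records are assumptions made for this example, not part of the format description.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed record per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # assumption: blank lines are skipped, not treated as errors
            yield json.loads(line)


def write_json_lines(records, stream):
    """Write each record as single-line JSON followed by a newline."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


# Round-trip two made-up log records through an in-memory buffer.
buffer = io.StringIO()
write_json_lines(
    [{"level": "info", "msg": "started"}, {"level": "error", "msg": "failed"}],
    buffer,
)
buffer.seek(0)
errors = [r for r in read_json_lines(buffer) if r["level"] == "error"]
```

Because the reader is a generator, only one record is held in memory at a time, which is what makes the format convenient for log files and pipelines.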
json  csv  standards 
5 days ago by dserodio


