pandas  pd  programGeneration  automation  tm351 
6 weeks ago by psychemedia
Apache Flink: Stateful Computations over Data Streams
I so don't have a handle on how streaming data processing works...
tm351  streamingdata 
10 weeks ago by psychemedia
QB4ST: RDF Data Cube extensions for spatio-temporal components
Hmm... interesting.. QB4ST: RDF Data Cube extensions for spatio-temporal components A bit m…
data  datatype  TM351  StevensNOIR 
10 weeks ago by psychemedia
How can we describe different types of dataset? Ten dataset archetypes – Lost Boy
Ever an interesting read, @ldodds on "How can we describe different types of dataset? Ten dataset archetypes"
data  archetypes  persona  TM351 
10 weeks ago by psychemedia
Why a data scientist is not a data engineer - O'Reilly Media
Interesting... "Why a data scientist is not a data engineer" I think they are both distinct…
tm351  tm358  dataScience  dataEngineering  data  jobs  dataJobs 
april 2019 by psychemedia
Python Record Linkage Toolkit Documentation — Python Record Linkage Toolkit 0.12 documentation
This could be handy for ETL / reconciling data files / partial matching datasets: #ddj
TM351  dataPipeline  fuzzymatch  ETL  ddj 
march 2019 by psychemedia
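A minimal sketch of the partial-matching idea behind that note. It uses only the standard library's `difflib`, not the Record Linkage Toolkit's own API; the helper name `fuzzy_link`, the sample names, and the 0.8 cutoff are all made up for illustration.

```python
import difflib


def fuzzy_link(left, right, cutoff=0.8):
    """Map each string in `left` to its closest match in `right`, or None.

    `cutoff` is the minimum similarity ratio (0..1) for a candidate to count.
    """
    links = {}
    for name in left:
        # get_close_matches returns the best candidates first; keep at most one.
        candidates = difflib.get_close_matches(name, right, n=1, cutoff=cutoff)
        links[name] = candidates[0] if candidates else None
    return links


if __name__ == "__main__":
    # Hypothetical columns from two files that should refer to the same places.
    file_a = ["Milton Keynes", "Qqq"]
    file_b = ["Milton Keyns", "Leeds"]
    print(fuzzy_link(file_a, file_b))
```

For real reconciliation work, a dedicated toolkit adds blocking and per-field comparison, which a plain string ratio like this does not attempt.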
Garbage Collection in Python - GeeksforGeeks
That's why I sometimes run out of running time or memory in Jupyter Notebook, locally or on Kaggle :D
tm351  bestpractice  py  resourceUsage  memory  garbageCollection  from twitter_favs
february 2019 by psychemedia
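A small sketch of why memory can linger, using only the standard `gc` module: objects caught in a reference cycle are not freed by reference counting alone and wait for the cyclic collector. The `Node` class and the helper names are hypothetical, purely for illustration.

```python
import gc


class Node:
    """Tiny object that can point at another Node."""

    def __init__(self):
        self.ref = None


def make_cycle():
    a, b = Node(), Node()
    # a and b refer to each other, so their reference counts never reach
    # zero on their own once the function returns.
    a.ref, b.ref = b, a


def collect_cycles(n=100):
    """Create n two-node cycles, then return how many unreachable
    objects an explicit collection finds."""
    gc.collect()  # clear any garbage left over from earlier work
    was_enabled = gc.isenabled()
    gc.disable()  # stop automatic collection from running mid-loop
    try:
        for _ in range(n):
            make_cycle()
        return gc.collect()
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(collect_cycles(100))
```

In a notebook, large objects also stay alive for as long as any name or output-cache entry still refers to them, so deleting the name before calling `gc.collect()` is usually the first thing to try.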
Convert VDI (VirtualBox) to raw, qcow2, qed, vmdk, vhd in Windows
Generate a raw disk image from a VirtualBox box for upload to OpenStack, e.g.:

VBoxManage clonemedium ~/VirtualBox\ VMs/tm351_18J-student/box-disk001.vmdk tm351_18J-student.raw --format RAW
virtualbox  openstack  tm351  VM 
november 2018 by psychemedia
About — Deon
"command line tool that allows you to easily add an ethics checklist to your data science projects"
ethics  dataEthics  TM351  checklist  guide  data 
october 2018 by psychemedia
Closing issues using keywords - User Documentation
Close issues automatically when a PR branch that addresses the issue is merged into the default branch.
github  issues  workflow  TM351 
october 2018 by psychemedia
deathbeds/importnb: notebook files as source
@psychemedia imports notebooks as modules, and it has a ton of notebook tests.…
ipynb  workflow  tm351 
may 2018 by psychemedia
testing jupyterhub unicode errors in logging
"I just set up a docker container that runs something in the background! (it was jupyterhub).
It was this gist.
Starting something in my background wasn't my goal, I was aiming to reproduce a bug in a testable environment, and that meant using supervisor.
But you can do the same, either launch postgres as a service, or launch it with supervisor, or even put it in the background with nohup postgres &.
I think you'll need to set the ENTRYPOINT of the image to spawn postgres prior to launching the command passed by binder."
jupyter  startup  postgres  binderhub  tm351 
february 2018 by psychemedia
Linked Data Templates
@fantasticlife interesting take on structuring triple soup as an API via an ontology… Or something…
linkedData  LD  tm351  publishing 
january 2018 by psychemedia
betatim/openrefineder: 💠 + 📚 OpenRefine on Binder!
Thinking this recipe from @betatim to run OpenRefine via BinderHub could be tweaked to run datasette? [@simonw]
openrefine  binder  binderhub  mybinder  tm351 
january 2018 by psychemedia
Simpler alternative to pandas
tm351  pandas  data  csv  py  package 
january 2018 by psychemedia
Death by Pokémon GO by Mara Faccio, John McConnell :: SSRN
Faccio, Mara and McConnell, John J., Death by Pokémon GO (November 18, 2017). Available at SSRN:
tm351  accidentData  geodata 
november 2017 by psychemedia
» As a researcher…I’m a bit bloody fed up with Data Management
“As a researcher…I’m a bit bloody fed up with Data Management”, via @cameronneylon Yep…
researchData  data  management  dataManagement  RDM  researchDataManagement  TM351  library 
june 2017 by psychemedia
