jm + pinterest   10

Pinterest blocks vaccine-related searches in bid to fight anti-vaxx propaganda | Technology | The Guardian
The phenomenon on display in the Facebook search result screenshots is known in technology circles as a “data void”, after a paper by the Data & Society founder and researcher danah boyd. For certain search terms, boyd explains, “the available relevant data is limited, non-existent, or deeply problematic”.

In the case of vaccines, the fact that scientists and doctors are not producing a steady stream of new digital content about settled science has left a void for conspiracy theorists and fraudsters to fill with fear-mongering propaganda and misinformation. [...]

Pinterest has responded by building a “blacklist” of “polluted” search terms.

“We are doing our best to remove bad content, but we know that there is bad content that we haven’t gotten to yet,” explained Ifeoma Ozoma, a public policy and social impact manager at Pinterest. “We don’t want to surface that with search terms like ‘cancer cure’ or ‘suicide’. We’re hoping that we can move from breaking the site to surfacing only good content. Until then, this is preferable.”
data-voids  danah-boyd  pinterest  antivax  vaccination  misinformation  disinfo  vaccines  truth  blacklisting 
28 days ago by jm
Evolving MySQL Compression - Part 2 | Pinterest Engineering
generating a near-optimal external dictionary for Zlib deflate compression
compression  deflate  zlib  pinterest  hacks  mysql 
january 2017 by jm
Auto scaling Pinterest
notes on a second-system take on autoscaling -- Pinterest tried it once, it didn't take, and this is the rerun. I like the tandem ASG approach (spots and nonspots)
spot-instances  scaling  aws  scalability  ops  architecture  pinterest  via:highscalability 
december 2016 by jm
Pinball
Pinterest's Hadoop workflow manager; 'scalable, reliable, simple, extensible' apparently. Hopefully it allows upgrades of a workflow component without breaking an existing run in progress, like LinkedIn's Azkaban does :(
python  pinterest  hadoop  workflows  ops  pinball  big-data  scheduling 
april 2015 by jm
Making Pinterest — Learn to stop using shiny new things and love MySQL
'The third reason people go for shiny is because older tech isn’t advertised as aggressively as newer tech. The younger companies needs to differentiate from the old guard and be bolder, more passionate and promise to fulfill your wildest dreams. But most new tech sales pitches aren’t generally forthright about their many failure modes. In our early days, we fell into this third trap. We had a lot of growing pains as we scaled the architecture. The most vocal and excited database companies kept coming to us saying they’d solve all of our scalability problems. But nobody told us of the virtues of MySQL, probably because MySQL just works, and people know about it.'

It's true! -- I'm still a happy MySQL user for some use cases, particularly read-mostly relational configuration data...
mysql  storage  databases  reliability  pinterest  architecture 
april 2015 by jm
Pinterest's highly-available configuration service
Stored on S3, update notifications pushed to clients via Zookeeper
s3  zookeeper  ha  pinterest  config  storage 
march 2015 by jm
Zookeeper: not so great as a highly-available service registry
Turns out ZK isn't a good choice as a service discovery system, if you want to be able to use that service discovery system while partitioned from the rest of the ZK cluster:
I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances.  This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones.  What I saw was that the two other instances noticed the first server “going away”, but they continued to function as they still saw a majority (66%).  More interestingly the first instance noticed the other two servers “going away”, dropping the ensemble availability to 33%.  This caused the first server to stop serving requests to clients (not only writes, but also reads).


So: within that offline AZ, service discovery *reads* (as well as writes) stopped working due to a lack of ZK quorum. This is quite a feasible outage scenario for EC2, by the way, since (at least when I was working there) the network links between AZs, and the links with the external internet, were not 100% overlapping.

In other words, if you want a highly-available service discovery system in the fact of network partitions, you want an AP service discovery system, rather than a CP one -- and ZK is a CP system.

Another risk, noted on the Netflix Eureka mailing list at https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/tA9UnerrBHUJ :

ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.


I guess this means that a long partition can trigger SESSION_EXPIRED state, resulting in ZK client libraries requiring a restart/reconnect to fix. I'm not entirely clear what happens to the ZK cluster itself in this scenario though.

Finally, Pinterest ran into other issues relying on ZK for service discovery and registration, described at http://engineering.pinterest.com/post/77933733851/zookeeper-resilience-at-pinterest ; sounds like this was mainly around load and the "thundering herd" overload problem. Their workaround was to decouple ZK availability from their services' availability, by building a Smartstack-style sidecar daemon on each host which tracked/cached ZK data.
zookeeper  service-discovery  ops  ha  cap  ap  cp  service-registry  availability  ec2  aws  network  partitions  eureka  smartstack  pinterest 
november 2014 by jm
Pinterest Secor
Today we’re open sourcing Secor, a zero data loss log persistence service whose initial use case was to save logs produced by our monetization pipeline. Secor persists Kafka logs to long-term storage such as Amazon S3. It’s not affected by S3’s weak eventual consistency model, incurs no data loss, scales horizontally, and optionally partitions data based on date.
pinterest  hadoop  secor  storm  kafka  architecture  s3  logs  archival 
may 2014 by jm
ZooKeeper Resilience at Pinterest
essentially decoupling the client services from ZK using a local daemon on each client host; very similar to Airbnb's Smartstack. This is a bit of an indictment of ZK's usability though
ops  architecture  clustering  network  partitions  cap  reliability  smartstack  airbnb  pinterest  zookeeper 
march 2014 by jm
High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.


yeah, so, eek ;)
clustering  sharding  architecture  aws  scalability  scaling  pinterest  via:matt-sergeant  redis  mysql  memcached 
april 2013 by jm

Copy this bookmark:



description:


tags: