“Nalin was a fantastic summer student. What I thought, before getting to know him, should take most of the summer, Nalin sorted out in a couple of days! I was most impressed. His insight and efficiency allowed us to go much further than I had ever hoped.”
Nalin Chhibber
Toronto, Ontario, Canada
2K followers
500+ connections
Websites
- Personal Website: http://nalinc.github.com
- Google Scholar: https://scholar.google.ca/citations?user=sA300w0AAAAJ&hl=en
- Stack Overflow: https://stackoverflow.com/users/1379667/nalinc
About
Nalin is a Computer Science Researcher, Fullstack Software Engineer, and an NLP…
Experience
Education
Recommendations received
3 people have recommended Nalin
Explore more posts
Ilyes.T. M.
GM CAPITAL HOLDING • 10K followers
Microsoft Just Made 100B Parameter LLMs Run on Your Laptop. Who Governs Them?

BitNet breaks the GPU barrier: 1.58-bit quantization, 100 billion parameters on a single CPU. No cloud. No API costs. Local inference at human reading speed.

This is a genuine breakthrough for accessibility. But accessibility without governance is liability. Running locally solves the infrastructure problem. It does not solve the governance problem. A 100B parameter model on your laptop can still hallucinate with confidence. It can still produce biased outputs. It can still make decisions with no audit trail. It can still violate constitutional bounds you never defined.

Privacy from cloud providers is not the same as privacy from the model itself. Local execution means your data stays on your device. It does not mean the model operates within governance constraints. The question is not where the model runs. The question is whether the model can prove it operated within bounds.

YIN-GATEWAY solves this. YIN-GATEWAY is a cryptographic control plane that sits between any LLM and its users. Cloud model, local model, quantized model, full-precision model: the deployment changes, the governance requirement does not.

- Layer 2 Admissibility Gates reject non-compliant queries before execution. The model never sees requests that violate policy.
- Layer 3 Constitutional Bounds verify every output before it reaches users. If the output would violate governance constraints, it does not proceed.
- Layer 14 SENTINEL Verification runs independent binary checkpoint validation. Pass or fail. No partial compliance.
- Layer 15 Human Override requires cryptographic signatures before high-risk outputs proceed. Not a checkbox: a mathematical proof that a human authorized this specific action.

BitNet gives you a 100B parameter model on your CPU. YIN-GATEWAY gives you cryptographic proof that the model operated within constitutional bounds, regardless of where it runs. The hardware revolution is here. The governance layer was already built.

https://lnkd.in/eb2EdEwR

32 USPTO applications. 3,057 claims. Any model. One governance layer. Enforced behavior.

#BitNet #YINMazariArchitecture #YINGATEWAY #AIGovernance #CryptographicCompliance #Microsoft #LocalAI #LLM #PrivacyPreserving #ResponsibleAI #EdgeAI #CPUInference #OpenSource
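(Side note on the "1.58 bit" figure: BitNet b1.58 weights are ternary, each taking one of the three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits per weight. Below is a minimal NumPy sketch of the absmean quantization step the BitNet b1.58 paper describes; it is a toy illustration, not Microsoft's bitnet.cpp, and real BitNet models learn these ternary weights during training rather than quantizing a trained matrix after the fact.)

```python
# Toy sketch of absmean ternary quantization (BitNet b1.58 style):
# scale the matrix by its mean absolute value, round, clamp to {-1, 0, +1}.
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} plus one per-matrix scale."""
    gamma = np.abs(w).mean() + eps             # absmean scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # round, then clamp to ternary
    return w_q.astype(np.int8), gamma

def dequantize(w_q: np.ndarray, gamma: float) -> np.ndarray:
    return w_q.astype(np.float32) * gamma

w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
print(w_q)                     # every entry is -1, 0, or +1
print(dequantize(w_q, gamma))  # coarse reconstruction of w
```

Storing int8-coded ternary values plus one scale is what lets CPU kernels replace most multiplications with additions and sign flips, which is where the speed claim comes from.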
4 · 1 Comment
Shane Deen
Salt XC • 4K followers
Google released an official CLI for Google Workspace — meaning AI agents can now directly interact with Gmail, Drive, Calendar, Sheets, and Docs through simple terminal commands. No custom integrations, no new protocols. An agent just runs a command and gets back clean, structured data.

But Shane, what is a CLI? A CLI (command-line interface) lets you drive software by typing commands in a terminal — it's basically how software talks to other software. Developers have used them forever. What's changed is that AI agents — the kind that work autonomously in the background — naturally live in that same environment. So when Google ships a CLI, they're not just making a developer tool. They're making their entire Workspace ecosystem accessible to agents.

What I found most interesting is the framing from one of the engineers who built it: he said he designed it for AI agents first, humans second. That's a meaningful shift in how tools are being built.

There's also a genuine debate happening right now about whether MCP servers — which have been the dominant way to connect agents to external tools — are actually the right approach. A recent poll of agent builders had MCP finishing last, behind traditional APIs, CLIs, and even plain markdown files. The main critique: MCP loads everything into the agent's context window upfront, eating tokens before any real work starts. A CLI approach sidesteps that entirely.

None of this is settled. But Google now has a stake in the ground at exactly the moment agent infrastructure is being figured out — and their real advantage isn't the CLI itself, it's everything already living in Workspace. Your emails, your Drive, your documents. That's context Anthropic and OpenAI don't own.
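(The pattern the post describes is easy to picture: the agent shells out to a command and parses structured output, with no SDK or MCP server in the loop. A hedged Python sketch follows; the `gws` command name and flags are placeholders invented for illustration, not the real Google Workspace CLI surface, so check Google's docs for the actual commands.)

```python
# Sketch of the "agent drives a CLI" pattern: run a subprocess, parse JSON.
import json
import subprocess

def run_cli(args: list[str]) -> dict:
    """Run a CLI command and parse its JSON stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Hypothetical invocation -- placeholder command and flags, not the real
# Google Workspace CLI; an agent would emit a line like this as a tool call.
inbox = run_cli(["gws", "gmail", "messages", "list", "--format", "json"])
for message in inbox.get("messages", []):
    print(message.get("subject", "(no subject)"))
```

Nothing loads into the model's context until the command actually runs, which is exactly the token-economy argument made against MCP above.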
7
Navneet Anand
30K followers
I do not really post a lot about tech stuff, but maybe I can start. Here is a paper I read recently: ROSE: Robust Caches for Amazon Product Search [https://lnkd.in/gPhbgHHQ]. Thanks to Arpit Bhayani's paper shelf recommendations.

The insights from this are really fascinating for the search world, and for how Amazon always knows how to find what you are looking for even if you have clumsy fingers like mine. For example, if you are looking for shoes, you might type something like "nike shoe," "nike shoes," or "nikes shooes." A normal cache treats these as different queries, so it gets big and slow, and you miss the cache a lot. Amazon's ROSE fixes this by caching the intent, not the exact spelling.

ROSE is essentially a typo/variant-tolerant cache for product search that groups similar queries into the same "bucket," so most lookups hit fast without growing memory. It's deployed in Amazon Search and improved both latency (single-digit ms for most traffic) and business metrics.

How it works is even more fascinating:

1. Locality-sensitive hashing: similar queries collide in the same bucket.
• Lexical-preserving hashing (character n-grams + MinHash) for typos/variants (see the toy sketch after this post).
• Product-type-preserving hashing (weighted MinHash) to keep the category intent (e.g., "dishwasher" vs. "dishwasher parts").
2. Reservoir sampling: caps each bucket so memory stays constant even as queries grow. (Super convenient, and one of the many engineering gotchas people might miss.)
3. Count-based k-selection: avoids pairwise similarity inside buckets; just count collisions across hash tables → near constant-time retrieval.

Where we can apply some of these concepts with Gen-AI (which is in many flavors essentially a search problem with extra steps):

- Semantic prompt cache: group paraphrased prompts to reuse an answer/logits/KV cache even when the text isn't identical. Can be helpful in saving your tokens and $.
- RAG request cache: map similar user questions to one retrieval plan to cut vector searches + reranking cost. [I remember Azure AI Search having a RAG cache, but I am not sure how that worked.]
- Tool/agent call cache: dedupe near-duplicate function/tool invocations (e.g., the same API call phrased differently).
- Eval & feedback loops: bucket similar generations or errors to reuse critiques/patches.
- Support/search front-ends to LLMs: typo-tolerant, intent-stable pre-rewrites before hitting the model, so we only spend tokens when absolutely needed.
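(To make the bucketing concrete, here is a toy Python sketch of two of the ingredients above: MinHash over character n-grams as the locality-sensitive hash, and counting collisions across several hash tables instead of pairwise similarity. This is my illustration of the idea, not code from the paper; it omits the weighted-MinHash category hashing and the reservoir-sampling bucket caps.)

```python
# Toy typo-tolerant query cache: MinHash of character 3-grams, with
# collision counting across hash tables standing in for k-selection.
import hashlib

def char_ngrams(query: str, n: int = 3) -> set[str]:
    q = f" {query.lower().strip()} "          # pad so word edges form n-grams
    return {q[i:i + n] for i in range(len(q) - n + 1)}

def minhash(grams: set[str], seed: int) -> int:
    # One MinHash value: the smallest hash of any n-gram under this seed.
    return min(int(hashlib.md5(f"{seed}:{g}".encode()).hexdigest(), 16)
               for g in grams)

NUM_TABLES = 8
tables = [dict() for _ in range(NUM_TABLES)]  # one bucket map per hash table

def lookup_or_insert(query: str, min_collisions: int = 3):
    """Return a cached query colliding with this one in enough tables,
    else insert the query as a new cache entry and return None."""
    grams = char_ngrams(query)
    collisions: dict[str, int] = {}
    keys = []
    for seed, table in enumerate(tables):
        key = minhash(grams, seed)
        keys.append(key)
        for cached in table.get(key, []):
            collisions[cached] = collisions.get(cached, 0) + 1
    best = max(collisions, key=collisions.get, default=None)
    if best is not None and collisions[best] >= min_collisions:
        return best                           # near-duplicate cache hit
    for table, key in zip(tables, keys):
        table.setdefault(key, []).append(query)
    return None                               # miss; query inserted

print(lookup_or_insert("nike shoes"))    # None (miss, inserted)
print(lookup_or_insert("nikes shooes"))  # usually "nike shoes": the two
                                         # share about half their 3-grams
print(lookup_or_insert("dishwasher"))    # None (different intent)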
174 · 6 Comments
Shivam Pradhan
PhonePe • 6K followers
Microsoft Layoffs
Layoffs at the Top – A Sobering Industry Reminder

The recent round of layoffs at Microsoft has sent ripples through the tech industry — not just for the scale, but for who was impacted. Among them was Ron Buckton, a respected engineer with 18 years at Microsoft, including a decade dedicated to the evolution of TypeScript — a foundational language for modern web development.

His departure underscores a reality many professionals and hiring leaders must now confront: tenure and impact no longer equate to job security.

This is not about performance. It's about the new structure of global tech:
- #AI is shifting priorities and #skill demands.
- Efficiency and margin pressure are driving reorganizations.
- Even high performers with deep institutional knowledge are being affected.

What this means for us as professionals and hiring leaders:
- Job security is no longer linear. Career stability now depends on agility, visibility, and continuous evolution.
- Networks matter more than ever. Strong personal brands, public contributions, and community presence are no longer optional.
- Companies must evolve how they recognize and retain technical excellence — before it's lost in reorgs.

As hiring leaders, we must look beyond resumes and tenure — and as professionals, we must prepare for a world where stability is self-created.

To those impacted: your contributions do not go unnoticed. The market still values technical depth, integrity, and leadership. The best companies are watching, and many of us are hiring.

Let's stay connected. Let's support our peers. And let's build a future that values long-term impact, not just short-term alignment.

#TechLayoffs #Microsoft #TypeScript #Hiring #Leadership #AI #EngineeringExcellence #CareerStrategy #OpenToWork #Future
43
Shalini Goyal
JPMorganChase • 119K followers
System Design interviews don't reward memorization. They reward clarity.

Because today, system design is no longer "for seniors only." Even SDE1/SDE2 interviews test whether you can think at scale: users, latency, failures, cost, and reliability.

That's exactly what this guide covers. It breaks down 10 must-know system design architectures that show up again and again in interviews, along with the core components, scaling tricks, and the exact "interview gold points" for each system. Instead of random theory, it trains you to explain systems like an engineer: how requests flow, where bottlenecks happen, which components matter most, and what tradeoffs interviewers actually care about.

The 10 designs covered inside start with fundamentals, then expand into real-world architectures:

1. A URL Shortener, the classic warm-up on the fundamentals.
2. A Messaging System (WhatsApp) that handles delivery, retries, ordering, read receipts, and offline users.
3. An Instagram Feed that teaches fanout-on-write vs fanout-on-read and ranking tradeoffs (see the sketch below).
4. A Video Streaming system (YouTube/Netflix) that shows the encoding pipeline, chunking, and why CDNs dominate scale.
5. A Ride Sharing system (Uber) built on real-time GPS streams, ETA accuracy, and geospatial indexing.
6. A Web Crawler (Google) designed as a pipeline: discover → fetch → parse → index, with deduplication.
7. An E-commerce system (Amazon) showing inventory consistency, idempotent orders, and event-driven flows.
8. A Food Delivery system powered by workflow engines and state machines for reliable tracking.
9. A Spotify architecture where streaming + personalization becomes the retention engine.
10. A Dropbox file sharing system where metadata + versioning is the real product.

If you can explain these 10 clearly, you're already ahead of most candidates, because these designs cover almost every interview pattern: caching, queues, sharding, consistency, fanout, CDN, state machines, and observability.
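(The fanout tradeoff in item 3 is worth internalizing, so here is a toy Python sketch, my illustration rather than anything from the guide: fanout-on-write pushes a post into every follower's precomputed feed at publish time, while fanout-on-read assembles the feed from followees at request time.)

```python
# Toy fanout-on-write vs fanout-on-read: same data, opposite cost profile.
from collections import defaultdict

follows = {"alice": ["bob", "carol"], "dave": ["bob"]}  # reader -> followees
posts = defaultdict(list)                               # author -> posts
feeds = defaultdict(list)                               # reader -> prebuilt feed

def publish_fanout_on_write(author: str, text: str):
    posts[author].append(text)
    # O(followers) work per write; a real system uses a follower index,
    # not a scan over all users like this toy does.
    for reader, followees in follows.items():
        if author in followees:
            feeds[reader].append(text)

def read_feed_on_read(reader: str) -> list[str]:
    # O(followees) work per read; writes stay cheap.
    return [p for author in follows[reader] for p in posts[author]]

publish_fanout_on_write("bob", "hello")
print(feeds["alice"])             # ['hello'] -- cheap read, expensive write
print(read_feed_on_read("dave"))  # ['hello'] -- cheap write, expensive read
```

The interview gold point: celebrity accounts break fanout-on-write (millions of feed inserts per post), so real feeds hybridize, writing fanout for ordinary users and reading on demand for high-follower accounts.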
410 · 73 Comments
James Webb
CoinDesk • 977 followers
🎩 The Case for LLM Gallantry

Posts have gone viral recently claiming that manners create "noisy context" and that your "please" is costing millions in compute. I respectfully disagree.

1. Prime for 'Expert Clusters' — sophisticated reasoning usually inhabits the same domain as polite, professional discourse.
🔷 Bluntness or rudeness can direct a model toward "internet argument" weighting, encouraging it to cite equally toned sources. Gallantry primes the model to navigate toward 'Expert Clusters' such as academic journals and expert forums.

2. Reduce Entropy — bluntness does not equal directness. Abrasive or chaotic syntax introduces irrelevant emotional tokens that compete for attention.
🔷 Clear, gallant framing acts as a high-fidelity signal. It minimises entropy, keeping the model's K/V matrices focused on your logic problem, not your emotional tone.

3. Use the Reward Model — LLMs are trained using RLHF, so harsh prompting risks triggering safety guardrails or defensive hedging.
🔷 A collaborative, chivalrous tone aligns with the model's reward model, creating a fast lane to complex reasoning capabilities.

4. Tonal Drift & The Preservation of Excellence — the period (.) is 1,500 years old, but modern texting culture has morphed its meaning into a symbol of passive-aggression.
🔷 Cultural preservation of respectful communication is not a burden but a necessary exercise of selflessness, lest your uncouth LLM-speak drift into your real-world communication.

Manners maketh man.

#AI #PromptEngineering #GenerativeAI #TechTips #LLM #Etiquette

I used em dashes before they were cool.
4