Tag Archives: analytics

[repost ]IBM 2013 Highlights: Big Data and Analytics


Happy Holidays to the information management community from the IBM Data magazine staff

By  and  | Published December 13, 2013 

IBM Data magazine continued to successfully execute its mission of bringing high-quality content to the information management community in 2013. The thought leadership, best practices, strategies, tools, and technologies that apply to big data, analytics, databases, governance, information strategy, and other areas are having a tremendous impact on today’s enterprises and organizations in the era of big data.

IBM Data magazine was even recently recognized by HadoopSphere.com as a top 2013 big data influencer, along with several other IBM channels delivering content focused specifically on big data. This industry recognition would not have been possible without the outstanding contributed content from our extensive team of subject-matter experts.

As 2013 draws to a close, maybe you want to revisit that article on optimizing a data warehouse for analytics, or maybe you want to catch up on an installment from one of the regular columnists. During the final two weeks of December 2013, IBM Data magazine offers compilations of selected highlights for a look back at some of this year’s coverage. This installment provides the following highlights of postings focused on big data, data warehousing, and analytics. Look for more highlights featuring database technologies and information governance in another December posting. And as always, please share any thoughts, suggestions, or questions in the comments.


Feature highlights



Big Data and the Evolving Role of Governance


January 2013

By Sunil Soares
How data governance leads and data stewards can embrace and
extend the role of big data in their organizations


Is the Data Warehouse Dead?


May 2013

By Andrew Foo
The role of the data warehouse in the big data era


5 Steps for Migrating Data to IBM DB2 with BLU Acceleration


June 2013

By Mohankumar Saraswatipura – IBM Champion
Easily optimize data from an existing data warehouse for big data analytics


Big Data and the Internet of Everything


June 2013

By Alex Philip
A brave new world of possibilities


Three Data Categories Likely Missing in
Your Data Warehouse


June 2013

By David Mould
Both data warehouse managers and data scientists should jointly
evaluate these potential sources of useful predictors


Data Warehouse Testing – Part 1

July 2013
By Wayne Yaddow
Conducting end-to-end testing and quality assurance for data warehouses


Data Warehouse Testing – Part 2


July 2013

By Wayne Yaddow
Enriching data warehouse testing with checklists


Easing the Heavy Lifting of Bulk Data Processing


July 2013

By David Birmingham – IBM Champion
Generate SQL from a template, and voilà! Portable cross-database


Cooking Up a Hadoop Implementation


September 2013

By Sachin Ghai
A recipe for deploying a Hadoop technology stack in four to six months


Column highlights


Big Data

Why Big Data Doesn’t Require a Big Idea


January 2013

By Tom Deutsch
Bigger isn’t better when choosing your first big data project


Why Is Schema on Read So Useful?


May 2013

By Tom Deutsch
A primer on why flexibility—not scale—often drives big data adoption


Ten Years Later, Does IT Matter?


June 2013

By Tom Deutsch
Taking a look back on Nicholas Carr’s seminal article from 2003



Big Data Architects

Mining Data in a High-Performance Sandbox


November 2013

By Tommy Eunice
Fulfill data analysts’ dreams with data warehouse appliances for
in-database analytics and data mining



Data Warehousing

A Smarter Approach to Tackling Big Data


February 2013

By Nancy Hensley
The new IBM PureData System for Analytics: Focusing on analytic
performance and data center efficiency


If Hadoop Was Easy, Everyone Would Be Doing It


April 2013

By Nancy Hensley
IBM PureData System for Hadoop aims to simplify big data for the enterprise


Relishing the Big Data Burger


August 2013

By Nancy Hensley
How Hadoop wraps the data warehouse in a savory big data sandwich



Rocket Ship to Planet Petabyte

The Role of Stream Computing in Big Data


January 2013

By James Kobielus
Analyzing data in motion is a critical capability in a balanced
big data infrastructure


The Next Big “H” in Big Data: Hybrid Architectures


May 2013

By James Kobielus
Fit-for-purpose big data platforms will play together under virtualization



Social Business: The Power of Big Data in
Agile Engagement


June 2013

By James Kobielus
Why you should humanize everything your business says—to all
stakeholders, on all social networks

[repost ]AMP Camp Three – Analytics and Machine Learning at Scale was held in Berkeley, California and live streamed online


AMP Camp Three – Analytics and Machine Learning at Scale was held in Berkeley, California, and live streamed online, August 29-30, 2013. AMP Camp 3 attendees and online viewers learned to solve big data problems using components of the Berkeley Data Analytics Stack (BDAS), including Spark, Shark, Mesos, Tachyon, and MLbase, as well as Hadoop.

The event was held in the Chevron Auditorium in the International House at UC Berkeley (details below).

The event was also live streamed and video archived on YouTube for free (find links in the agenda below).

At the event, we published the AMP Camp 3 Hands-on Exercises, which anybody can use to get experience analyzing real big data on a real computer cluster of 6 Amazon EC2 instances (an AWS account is required). At the actual AMP Camp 3 event, we provided attendees with pre-launched clusters.
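The exercises revolve around Spark's model of chaining transformations over a data set and then applying an action that triggers the computation. As a rough single-machine sketch of that flavor in plain Python (this is not the actual PySpark API, just an illustration of the lazy map/filter/reduce pipeline):

```python
from functools import reduce

# A toy stand-in for Spark's RDD chaining: the transformations are lazy
# generators; only the final reduce actually pulls data through the pipeline.
data = range(1, 6)                          # "parallelize" a small collection
squared = (x * x for x in data)             # map: x -> x^2
evens = (x for x in squared if x % 2 == 0)  # filter: keep even squares
total = reduce(lambda a, b: a + b, evens)   # reduce (action): sum survivors
print(total)  # 4 + 16 = 20
```

In Spark the same chain would run partitioned across a cluster, which is what the hands-on clusters of EC2 instances were for.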

AMP Camp 3 consisted of the following agenda. Links to talk slides and videos can be found next to talk titles.

Day 1 – Thursday, Aug 29

8:00-9:00am Continental Breakfast

9:00-9:35am Intro to Big Data, the AMPLab, and the Berkeley Data Analytics Stack (Ion Stoica) pptx video

9:35-9:45am Overview of the AMP Camp Curriculum (Andy Konwinski)

9:45-10:30am Datacenter Management with Mesos (Benjamin Hindman) pptx pdf video

10:30-10:45am Break

10:45-11:45am Parallel programming with Spark (Matei Zaharia) pptx pdf video

11:45am-12:15pm Introduction to using Shark (Reynold Xin) pptx video

12:15-12:45pm Introduction to using Spark Streaming (Tathagata Das) pptx video

12:45-2:00pm Lunch

2:00-4:30pm Hands-on Exercises covering Spark, Shark, and Spark Streaming

4:30-5:00pm Break

5:00-6:30pm User Case Study Talks:

6:30-8:30pm Reception

Day 2 – Friday, Aug 30

8:00-9:00am Continental Breakfast

9:00-9:30am Introduction to using BlinkDB (Sameer Agarwal) pptx video

9:30-10:15am Introduction to using MLbase (Ameet Talwalkar, Evan Sparks) pdf video

10:15-10:45am Break

10:45am-12:45pm Hands-on Exercises covering Mesos, BlinkDB, and MLbase

12:45-2:15pm Lunch

2:15-2:45pm Introduction to Tachyon (Haoyuan Li) pptx pdf video

2:45-3:15pm Introduction to GraphX (Joseph Gonzalez, Reynold Xin) pptx video

3:15-3:30pm Wrap-up and concluding words (Andy Konwinski, Michael Franklin) pdf video

[repost ]Book: Twitter Data Analytics – free download


New book, Twitter Data Analytics, explains Twitter data collection, management, and analysis – download a free preprint (PDF) and code examples.

A retweet network containing nodes from the topics #zuccotti and #nypd

Twitter Data Analytics
by Shamanth Kumar, Fred Morstatter, and Huan Liu
Data Mining and Machine Learning Lab
Arizona State University

Springer, 2013.

Social media has become a major platform for information sharing. Due to its openness in sharing data, Twitter is a prime example of social media in which researchers can verify their hypotheses and practitioners can mine interesting patterns and build real-world applications. This book takes a reader through the process of harnessing Twitter data to find answers to intriguing questions.

We begin with an introduction to the process of collecting data through Twitter’s APIs and proceed to discuss strategies for curating large datasets. We then guide the reader through the process of visualizing Twitter data with real-world examples, present challenges and complexities of building visual analytic tools, and provide strategies to address these issues. We show by example how some powerful measures can be computed using various Twitter data sources. This book is designed to provide researchers, practitioners, project managers, and graduate students new to the field with an entry point to jump start their endeavors. It also serves as a convenient reference for readers seasoned in Twitter data analysis.
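For a flavor of the kind of analysis the book covers, here is a hypothetical Python sketch that extracts retweet-network edges (retweeter, original author) from tweets in the JSON format returned by Twitter's v1.1 APIs. The sample tweets and the function name are illustrative, not taken from the book's code:

```python
import json

# Two made-up tweets in the v1.1 JSON shape; only the fields used here
# are shown. Real tweets carry many more fields.
raw_tweets = [
    '{"user": {"screen_name": "alice"}, "retweeted_status": {"user": {"screen_name": "bob"}}}',
    '{"user": {"screen_name": "carol"}, "retweeted_status": {"user": {"screen_name": "bob"}}}',
]

def retweet_edges(lines):
    """Yield (retweeter, original_author) edges for a retweet network."""
    for line in lines:
        tweet = json.loads(line)
        original = tweet.get("retweeted_status")
        if original:  # only retweets contribute edges
            yield (tweet["user"]["screen_name"],
                   original["user"]["screen_name"])

edges = list(retweet_edges(raw_tweets))
print(edges)  # [('alice', 'bob'), ('carol', 'bob')]
```

Feeding such edges into a graph library is how figures like the #zuccotti/#nypd retweet network above are built.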

Download the full book (preprint, PDF) here.

Download Code Examples (ZIP).

For more information, visit


[repost ]big data analytics


Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue.


The primary goal of big data analytics is to help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data as well as other data sources that may be left untapped by conventional business intelligence (BI) programs. These other data sources may include Web server logs and Internet clickstream data, social media activity reports, mobile-phone call detail records and information captured by sensors. Some people exclusively associate big data and big data analytics with unstructured data of that sort, but consulting firms like Gartner Inc. and Forrester Research Inc. also consider transactions and other structured data to be valid forms of big data.

Big data analytics can be done with the software tools commonly used as part of advanced analytics disciplines such as predictive analytics and data mining. But the unstructured data sources used for big data analytics may not fit in traditional data warehouses, and traditional data warehouses may not be able to handle the processing demands posed by big data. As a result, a new class of big data technology has emerged and is being used in many big data analytics environments. The technologies associated with big data analytics include NoSQL databases, Hadoop and MapReduce; the latter two form the core of an open source software framework that supports the processing of large data sets across clustered systems.
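As a rough illustration of the MapReduce model mentioned above, here is a hypothetical in-memory sketch in Python; a real Hadoop job would distribute the map and reduce phases across a cluster, but the two-phase structure is the same:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the counts."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

# Toy clickstream-style log lines standing in for a large unstructured source.
logs = ["user clicked ad", "user viewed page", "user clicked link"]
counts = reduce_phase(map_phase(logs))
print(counts["user"], counts["clicked"])  # 3 2
```

The same pattern scales because each map call and each per-key reduction is independent, so the work can be spread across many machines.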

Potential pitfalls that can trip up organizations on big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced analytics professionals, plus challenges in integrating Hadoop systems and data warehouses, although vendors are starting to offer software connectors between those technologies.


[repost ]teradata Big Data Analytics


Big Data Analytics

Between now and 2020, the sheer volume of digital information is predicted to increase to 35 trillion gigabytes – much of it coming from new sources including blogs, social media, internet search, and sensor networks.

Teradata can help you manage this onslaught with big data analytics for structured big data within an integrated relational database – and now Teradata’s Aster Data Analytic Platform can help you deal with emerging big data that typically has unknown relationships and includes non-relational data types. Together, these two powerful technologies provide greater insight than ever for smarter, faster decisions.

Teradata’s Aster Data Analytic Platform powers next-generation big data analytic applications with a massively parallel processing (MPP) analytics engine that stores big data and processes analytics together with the data. As a result, it delivers breakthrough performance and scalability to give you a competitive edge in these critical areas:

  • Enable new analytics: A big data analytics framework with pattern and graph analysis (analyses that are hard to define and execute in SQL) enables valuable new applications – including digital marketing optimization, fraud detection and prevention, and social network and relationship analysis.
  • Accelerate analytics development: A unique analytics architecture combined with a pre-built library of analytic modules, a visual development environment, and local testing capability simplifies and streamlines analytic development. A variety of languages are supported – including C, C++, C#, Java, Python, Perl, and R – to simplify development and embedding of rich analytics in the MPP data store.
  • High-performance and elastic scalability: Patented high parallelism and massive scalability for complex analytics enable iterative, on-the-fly data exploration and analysis to rapidly uncover new and changing patterns in data.
  • Cost-effective big data analytics: Uses commodity hardware to provide lower cost to scale than alternative approaches.

Products and Services

Teradata can help you design and implement BI across the enterprise based on a solid, proven platform backed by experienced support.


Teradata Database

The most scalable and easily managed relational database on the market.

Teradata Platform Family

From entry level to enterprise, Teradata serves different data warehousing and BI paradigms.

Teradata Value Analyzer

Delivers the necessary profitability insight to make better-informed marketing, distribution, product, and risk-management decisions.

Teradata Warehouse Miner

Provides an array of data profiling and mining functions, ranging from data exploration and transformation to analytic model development and deployment, performed directly in the Teradata database.



Teradata Industry Consulting Services

Tailored to specific industries including architecture and design modeling, data warehouse migration, and analytics.

Teradata Project Management Services

Leverage the Teradata difference to expedite implementation and training, and align strategic and operational intelligence.

Teradata Availability Management Services

Minimize risk and maintain the most uptime possible for your data warehouse.


Big Data Analytics In Action


-: The Next Big Bang

Combining online and offline customer data sparks explosion of new business insights.


Retail: Barnes & Noble – Cross-channel Analytics for Deeper Customer Insights

Barnes & Noble combines data from multiple channels to better understand its customers.


-: It’s your next opportunity, Big data

Big data is driving innovative approaches that change as frequently as the problems do. It’s going mainstream, and it’s your next opportunity.


Living in the Age of Wow: The Socialization of Data

Break through the boundaries of the “same old information” rut. Create new, dramatic opportunities to drive innovation.


[repost ]Data Science vs. Data Analytics


As this topic came up a few times this week for discussion at various places, I thought of composing a post on “Data Scientist vs. Data Analytics Engineer,” even though this is not in my list of TODO blog posts.

Data Science:

My personal understanding of “Data Science” (DS):

Someone who understands the data and the business logic and provides predictions by sampling current business data (also known as “data insights / business insights / data discovery / business discovery”) about the direction in which the business is heading (both good and bad), or where it should head by spotting trends, so that the business can make the right decision on its next steps, such as:

  • improving the product/feature based on user interest levels
  • driving more users
  • driving more clicks/ impressions / conversion / revenue / leads
  • user experience
  • recommendation
  • user retention

In general, “Data Science” is driven by “Data Scientists”: typically PhDs in math, physics, statistics, machine learning or even computer science. Without a PhD in one of these areas, it is unlikely that one can be hired. In one of the recent ACM conferences, a data science hiring manager from a leading online bidding company said in the open Q&A that she can’t hire anyone without a PhD (plus experience).

Data Scientist Qualifications:

  • Familiarity with database systems (SQL interface, ad hoc queries), especially MySQL and Hive (at least), to begin with
  • Java/Python development and simple map-reduce jobs, if needed
  • Exposure to various analytics functions (over, median, rank, etc.) and how to use them on various data sets
  • Mathematics, statistics, correlation, data mining and predictive analytics (forecasting based on probability and correlation)
  • “R” and/or “RStudio” (optionally Excel, SAS, IBM SPSS, MATLAB)
  • Deep insight into (statistical) data model development, ideally in an agile fashion; in today’s dynamic environment a self-learning model is best, so that it can learn and tune itself from its own output combined with its performance over time
  • Ability to work with (very) large data sets, joining various data sets together and visualizing them
  • Familiarity with machine learning and/or data mining algorithms (Mahout, Bayesian methods, clustering, etc.)
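As a hypothetical illustration of the analytics functions listed above (over, rank, and the like), SQL window functions can be tried with Python's built-in sqlite3 module (SQLite 3.25 or later is required for window functions); the table, columns, and data here are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])

# RANK() OVER ranks rows within each region; AVG() OVER computes a
# per-region average without collapsing the rows as GROUP BY would.
rows = conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           AVG(amount) OVER (PARTITION BY region) AS region_avg
    FROM sales
    ORDER BY region, rnk
""").fetchall()

for row in rows:
    print(row)
```

The same OVER/PARTITION BY pattern carries over to Hive and most analytical stores, which is why it shows up in data scientist screening.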

As there are different qualifications and areas of expertise within data science, one needs to pick the right candidate for the type of role they are going to play. For example, a natural language processing (NLP) role may need a different set of skills. It also depends on the team size; one person can be a jack of all trades, or the roles can be split among multiple teams.

At present, there is a lot of demand for “Data Scientists” in the market; it is probably one of the leading job roles after “Data Analytics.” Here is the trend for “Data Science” from Indeed:


Data Analytics:

Data Analytics (DA) in general is a logical extension of (or just a buzzword for) Data Warehousing (DW) and Business Intelligence (BI); it provides complete insight into business data in the most usable form. The major difference between warehousing and analytics is that analytics can be real-time and dynamic in most cases, whereas a warehouse is ETL-driven in an offline fashion.

Every business that deals with data must have data analytics; without analytics in place, the business is a dead man walking, without a heart, a soul or a mind.

Data Analytics (Engineer) Qualifications:

  • Familiarity with data warehousing and business intelligence concepts
  • Strong, in-depth exposure to SQL and analytic solutions
  • Exposure to Hadoop-based analytics solutions (HBase, Hive, map-reduce jobs, Impala, Cascading, etc.)
  • Exposure to various commercial enterprise analytical data stores (Vertica, Greenplum, Aster Data, Teradata, Netezza, etc.), especially how to store and retrieve data from them in the most efficient manner
  • Familiarity with various ETL tools (especially for transforming different sources of data into analytics data stores) and, if needed, the ability to make everything (or some critical business features) real-time
  • Schema design for storing and retrieving data efficiently
  • Familiarity with the various tools and components in the data architecture
  • Decision-making skills (real-time vs. ETL, using component X instead of Y to implement Z, etc.)
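The ETL pattern behind several of these points can be sketched minimally: extract raw records from a source, transform them into the shape the schema expects, and load them into the analytics store. Here sqlite3 stands in for the store, and all names and sample data are illustrative:

```python
import sqlite3
import csv
import io

# Extract: a raw clickstream export (CSV text standing in for a source file).
raw = io.StringIO("user,page,ms\nalice,/home,1200\nbob,/buy,340\n")
records = list(csv.DictReader(raw))

# Transform: normalize types and derive a field the analysts will query.
for r in records:
    r["ms"] = int(r["ms"])
    r["slow"] = r["ms"] > 1000  # flag page views slower than 1 second

# Load: write into a schema designed for the expected queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (user TEXT, page TEXT, ms INTEGER, slow INTEGER)")
conn.executemany("INSERT INTO pageviews VALUES (:user, :page, :ms, :slow)", records)

slow_count = conn.execute(
    "SELECT COUNT(*) FROM pageviews WHERE slow = 1").fetchone()[0]
print(slow_count)  # 1
```

Production ETL tools add scheduling, incremental loads and error handling on top of this same extract-transform-load skeleton.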

Sometimes a data analytics engineer also takes on data mining on demand, as he has a better understanding of the data than anyone else; in general, the two roles have to work closely together to get the best results.

Data Analytics can also be divided or shared among four different teams or people (as it is hard to hire one person with the complete skill set, and moreover administration is different from development):

  • data architect
  • database administrator
  • analytics engineer and
  • operations

At present, “Data Analytics” is probably one of the hot jobs (maybe “Hadoop/Big Data Engineer” has taken over by now). Here is the trend for “Data Analytics” from Indeed; it may continue to be hot for a while, as most businesses need to have data analytics in place.


Even though “Data Science” and “Data Analytics” look similar in terms of technology domain, data science is a data consumer within the business unit and depends solely on the data provided by the data analytics team. Moreover, most model predictions and algorithms work really well on large data sets, because bigger data sets give better probability estimates; the bigger the data, the better your chance of predicting it right and driving the business further, which means the two depend directly on each other. An engineer with both sets of qualifications can play either role.

Academy: How to Become Data Scientist or Data Analytics Engineer


Related posts:

  1. Typical “Big” Data Architecture
  2. Data Store, Software and Hardware – What is best
  3. SQL Server Data Services – Invitation to attend the event
  4. Yahoo analytics, indextools

[repost ]IBM LanguageWare Resource Workbench



An Eclipse application for building custom language analysis into IBM LanguageWare resources and their associated UIMA annotators.


IBM LanguageWare Resource Workbench

Update: July 20, 2012:
Studio 3.0 is out and it is officially bundled with ICA 3.0. If you are a Studio 3.0 user, please use the ICA forum instead of the LRW forum. LRW is a fix pack that resolves issues in various areas, including the Parsing Rules editor, PEAR file export and Japanese/Chinese language support. LRW is still available for download on the Downloads link for IBM OmniFind Enterprise Edition V9.1 Fix Pack users.

What is IBM LanguageWare?

IBM LanguageWare is a technology which provides a full range of text analysis functions. It is used extensively throughout the IBM product suite and is successfully deployed in solutions which focus on mining facts from large repositories of text. LanguageWare is the ideal solution for extracting the value locked up in unstructured text information and exposing it to business applications. With the emerging importance of Business Intelligence and the explosion in text-based information, the need to exploit this “hidden” information has never been so great. LanguageWare technology not only provides the functionality to address this need, it also makes it easier than ever to create, manage and deploy analysis engines and their resources.

It comprises Java libraries with a large set of features and the linguistic resources that supplement them. It also comprises an easy-to-use Eclipse-based development environment for building custom text analysis applications. In a few clicks, it is possible to create and deploy UIMA (Unstructured Information Management Architecture) annotators that perform everything from simple dictionary lookups to more sophisticated syntactic and semantic analysis of texts using dictionaries, rules and ontologies.

The LanguageWare libraries provide the following non-exhaustive list of features: dictionary look-up and fuzzy look-up, lexical analysis, language identification, spelling correction, hyphenation, normalization, part-of-speech disambiguation, syntactic parsing, semantic analysis, facts/entities extraction and relationship extraction. For more details see the documentation.
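The simplest of those features, dictionary look-up, amounts to matching a gazetteer of known terms against text and emitting typed annotations over character spans. A toy Python sketch of the idea (this is not the LanguageWare API; the dictionary and names are illustrative):

```python
import re

# A toy "dictionary" mapping surface forms to a semantic type.
gazetteer = {"ibm": "Organization", "dublin": "Location"}

def annotate(text):
    """Return (begin, end, covered_text, type) annotations, UIMA-style."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        entry = gazetteer.get(match.group().lower())
        if entry:
            annotations.append((match.start(), match.end(),
                                match.group(), entry))
    return annotations

result = annotate("IBM opened a lab in Dublin.")
print(result)  # [(0, 3, 'IBM', 'Organization'), (20, 26, 'Dublin', 'Location')]
```

Fuzzy look-up, lemmatization and the rule engine build on this same span-annotation structure, which is what UIMA's CAS carries between pipeline stages.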

The LanguageWare Resource Workbench provides a complete development environment for the building and customization of dictionaries, rules, ontologies and associated UIMA annotators. This environment removes the need for specialist knowledge of the underlying technologies of natural language processing or UIMA. In doing so, it allows the user to focus on the concepts and relationships of interest, and to develop analyzers which extract them from text without having to write any code. The resulting application code is wrapped as UIMA annotators, which can be seamlessly plugged into any application that is UIMA-compliant. Further information about UIMA is available on the Apache UIMA site.

LanguageWare is used in such various products as Lotus Notes and Domino, Information Integrator OmniFind Edition (IBM’s search technology), and more.

The LanguageWare Resource Workbench technology runs on Microsoft Windows and Linux. The core LanguageWare libraries support a much broader list of platforms. For more details on platform support please see the product documentation.

How does it work?

The LanguageWare Resource Workbench allows users to easily:

  • Develop rules to spot facts, entities and relationships using a simple drag and drop paradigm
  • Build language and domain resources into a LanguageWare dictionary or ontology
  • Import and export dictionary data to/from a database
  • Browse the dictionaries to assess their content and quality
  • Test rules and dictionaries in real-time on documents
  • Create UIMA annotators for annotating text with the contents of dictionaries and rules
  • Annotate text and browse the contents of each annotation.

The Workbench contains the following tools:

  • A dictionary viewer/editor
  • An XML-based dictionary builder
  • A Database-based dictionary builder (IBM DB2 and Apache Derby support are provided)
  • A dictionary comparison tool
  • A rule viewer/editor/builder
  • A UIMA annotator generator, which allows text documents to be annotated and the results displayed.
  • A UIMA CAS (common annotation structure) comparator, which allows you to compare the results of two different analyses through comparing the CASes generated by each run.

The LanguageWare Resource Workbench documentation is available online and is also installed using the Microsoft Windows or Linux installers or using the respective .zip files.

What type of application is LanguageWare suitable for?

LanguageWare technology can be used in any application that makes use of text analytics. Good examples are:

  • Business Intelligence
  • Information Search and Retrieval
  • The Semantic Web (in particular LanguageWare supports semantic analysis of documents based on ontologies)
  • Analysis of Social Networks
  • Semantic tagging applications
  • Semantic search applications
  • Any application wishing to extract useful data from unstructured text

For Web-based semantic query of the LanguageWare text analytics, you might be interested in checking out IBM Data Discovery and Query Builder. When used together, these two technologies can provide a full range of data access services including UI presentation, security and auditing of users, structured and unstructured data access through semantic concepts and deep text analytics of unstructured data elements.

More information

About the technology author(s)

LanguageWare is a worldwide organization comprising a highly qualified team of specialists with a diverse combination of backgrounds: linguists, computer scientists, mathematicians, cognitive scientists, physicists, and computational linguists. This team is responsible for developing innovative Natural Language Processing technology for IBM Software Group.

LanguageWare, along with LanguageWare Resource Workbench, is a collaborative project combining skills, technologies, and ideas gathered from various IBM product teams and IBM Research division.

Platform requirements

Operating systems: Microsoft Windows XP, Microsoft Windows Vista, or SUSE Linux Enterprise Desktop 11

Hardware: Intel 32-bit platforms (tested)


Software:

  • Java 5 SR5 or above (not compatible with Java 5 SR4 or below)
  • Apache UIMA SDK 2.3 (required by LanguageWare annotators in order to run outside the Workbench)

Notes: Other platforms and JDK implementations may work but have not been significantly tested.

Installation instructions

Installing LanguageWare Resource Workbench and LanguageWare Demonstrator

On each platform, there are two methods of installation.
On Microsoft Windows:

  • Download the lrw.win32.install.exe or Demonstrator.exe and launch the installation.

On Linux:

  • Download the lrw.linux.install.bin or Demonstrator.bin and launch the installation (e.g., by running sh ./lrw.linux.install.bin).

LanguageWare Training and Enablement

The LanguageWare training material is a set of presentations that walks you through the use of the LanguageWare Resource Workbench. It uses a step-by-step approach with examples and practice exercises to build up your knowledge of the application and of data modeling. Please follow the logical succession of the material, and make sure you finish the sample exercise.
Please use this link to access the presentation decks.


  • Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
  • Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
  • Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
  • Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.


Tab navigation

1. What is the LanguageWare Resource Workbench? Why should I use it?

The LanguageWare Resource Workbench is a comprehensive Eclipse-based environment for developing UIMA analyzers. It allows you to build Domain Extraction Models (lexical resources and parsing rules; see FAQ 7) which describe the entities and relationships you wish to extract, creating custom annotators tailored to your specific needs. These annotators can be easily exported as PEAR files (see FAQ 9) and installed into any UIMA pipeline or deployed onto an ICA server. If you satisfy the following criteria, then you will want to use the LanguageWare Resource Workbench:

  • You need a robust open standards (UIMA) text analyzer that can be easily customized to your specific domain and analysis challenges.
  • You need a technology that will enable you to exploit your existing structured data repositories in the analysis of the unstructured sources.
  • You need a technology that allows you to build custom domain models that become your intellectual property and differentiation in the marketplace.
  • You need a technology that is multi-lingual, multi-platform, multi-domain, and high performance.

Back to top

2. What’s new in this version of LanguageWare Resource Workbench?

All new features are outlined in the LanguageWare Resource Workbench Release Notes, ReleaseNotes.htm, which is located in the Workbench installation directory.

Back to top

3. Where should I start with LanguageWare?

The best way to get started with LanguageWare is to install the LanguageWare Resource Workbench. Check the training videos provided above or the training material (to be posted soon); it will introduce you to the Workbench and show you how it works.

Back to top

4. What documentation is available to help me use LanguageWare?

Context-sensitive help is provided as part of the LanguageWare Resource Workbench. There is an online help system shipped with the LanguageWare Resource Workbench (under Help / Help Contents). Check the training videos provided above or the training material (to be posted soon). More detailed information about the underlying APIs will be provided for fully-licensed users of the technology.

Back to top

5. What are the known limitations with this release of the LanguageWare Resource Workbench?

Any problems or limitations are outlined in the LanguageWare Resource Workbench Release Notes, ReleaseNotes.htm, which is located in the LanguageWare Resource Workbench installation directory and is part of the LanguageWare Resource Workbench Help System.

Back to top

6. What version of UIMA do I need to use the LanguageWare annotators?

LanguageWare Resource Workbench ships with, and has been tested against, Apache UIMA Version 2.3. The annotators should work with newer versions of Apache UIMA; however, they have not been extensively tested for compatibility, so we recommend Apache UIMA v2.3. The LanguageWare annotators are not compatible with versions of UIMA prior to 2.1, which were released by IBM and have a namespace conflict with Apache UIMA.

Back to top

7. What is a Domain Extraction Model (or “model,” “annotator,” “analyzer”)? How do I build a good Domain Extraction Model?

A “model” is the set of resources you build to describe what you want to extract from the data. The models are a combination of:

  • The morphological resources, which describe the basic language characteristics
  • The lexical resources, which describe the entities and concepts that you want to recognize
  • The part-of-speech (POS) tagger resource
  • The parsing rules, which describe how concepts combine to generate new entities and relationships

Building these models is an iterative process within the LanguageWare Resource Workbench.

Back to top

8. How do I change the default editor for new file types in the LanguageWare Resource Workbench?

Go to Window / Preferences / General / Editors / File Associations. If the content type is already listed, add a new editor and pick the LanguageWare Text Editor. You can set it as the default, or leave it as an option you can choose on right-click whenever you open a file of that type. You will need to restart the LanguageWare Resource Workbench for this to take effect. Note: Eclipse remembers the last editor you used for a file type, so if you previously opened a document with a different editor, you may need to right-click the file and explicitly choose the LanguageWare Text Editor the first time after restarting.

Back to top

9. How do I integrate the UIMA Analyzers that I develop in the LanguageWare Resource Workbench?

Once you have completed building your Domain Extraction Models (dictionaries and rules), the LanguageWare Resource Workbench provides an “Export as UIMA Pear” function under File / Export. This generates a PEAR file that contains all the code and resources required to run your annotator in any UIMA-enabled application.

Back to top

10. How is my data stored?

The LanguageWare Resource Workbench is designed primarily to help you build your domain extraction models, and it includes database support for storing your data. It ships with an embedded open-source database (Apache Derby), but it can also connect to an enterprise database, such as DB2.

Back to top

11. What licensing conditions apply for LanguageWare, for academic purposes, or for commercial use?

LanguageWare is licensed through the IBM Content Analytics License at http://www-01.ibm.com/software/data/cognos/products/cognos-content-analytics/.

Back to top

12. Is Language Identification identifying the wrong language?

Sometimes the default amount of text (1024 characters) used by Language Identification is not enough to disambiguate the correct language. This happens especially when languages are closely related or when the analyzed text includes more than one language. In this case, it may help to increase the MaxCharsToExamine parameter:

  • From the LRW menu, select Window > Preferences > LanguageWare > UIMA Annotation Display.
  • Enable the checkbox for “Show edit advanced configuration option on pipeline stages,” then select Apply and OK.
  • The next time you open a UIMA Pipeline Configuration file, an Advanced Configuration link appears at the Document Language stage. Click it to expand its contents; the MaxCharsToExamine parameter can now be edited.
  • Change the default value to a larger one, save your changes, and try again to see if Language Identification has improved.

Back to top

13. Why is the LanguageWare Resource Workbench shipped as an Eclipse-based application?

We built the LanguageWare Resource Workbench on Eclipse because it provides a collaborative framework through which we can share components with other product teams across IBM, with our partners, and with our customers. This version of the Workbench is a complete, stand-alone application. However, users can still get the benefits of the Eclipse IDE by installing Eclipse features into the Workbench. Popular features include the Eclipse CVS feature for managing shared projects and the Eclipse XML feature for full XML editing support. See the Eclipse online help for more information about finding and installing new features.

It is important to understand that while the LanguageWare Resource Workbench is Eclipse-based, the annotators exported from it (under File / Export) can be installed into any UIMA pipeline and deployed in a variety of ways. As part of the commercial LanguageWare Resource Workbench license, the team provides integration source code to simplify the overall deployment and integration effort. This includes UIMA serializers, CAS consumers, and APIs for integrating through C/JNI, Eclipse, Web Services (REST), and others.

Back to top

14. What languages are supported by the LanguageWare Resource Workbench?

The following languages are fully supported, that is, with Language Identification, Lexical Analysis, and Part-of-Speech Disambiguation:

  • Chinese (Simplified)
  • Chinese (Traditional)
  • [the remaining entries of the supported-languages table could not be recovered]
For the following languages, a lexical dictionary without part-of-speech disambiguation can be made available upon request: Afrikaans, Catalan, Greek, Norwegian (Bokmål), Norwegian (Nynorsk), Russian, and Swedish. These dictionaries are provided “as is”: they have not been maintained and will not be supported. While feedback on them is much appreciated, requests for changes, fixes, or queries will only be addressed if adequately planned and sufficiently funded.
Back to top

15. Does LanguageWare support GB18030?

LanguageWare annotators support UTF-16, and this qualifies as GB18030 support. It does mean that you need to convert the text from GB18030 to UTF-16 at the document ingestion stage. Java does this automatically (in the collection reader stage) as long as the correct encoding is specified when reading files.

Please note that text in GB18030 extension B may contain characters outside the Unicode Basic Multilingual Plane. Currently the default LanguageWare break rules would incorrectly split such characters into two tokens. If support for these rare characters is required, the attached break rules file can be used to ensure the proper handling of 4-byte characters. (Note that the file zh-surrogates.dic for Chinese is wrapped in zh-surrogates.zip.)
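As a minimal sketch of this ingestion-time conversion (the sample text is illustrative, not from a real document), the following standard Java code decodes GB18030 bytes into Java's internal UTF-16 representation, including a character outside the Basic Multilingual Plane that becomes a surrogate pair:

```java
import java.nio.charset.Charset;

public class Gb18030Demo {
    public static void main(String[] args) {
        Charset gb18030 = Charset.forName("GB18030");

        // "中文" plus U+20000, a CJK Extension B character outside the BMP
        // (represented in UTF-16 as the surrogate pair D840 DC00)
        String original = "中文\uD840\uDC00";

        // Simulate a GB18030-encoded document on disk...
        byte[] gbBytes = original.getBytes(gb18030);

        // ...and the ingestion step: decode back into a UTF-16 String
        String decoded = new String(gbBytes, gb18030);

        System.out.println(decoded.equals(original));                    // true
        System.out.println(decoded.codePointCount(0, decoded.length())); // 3 code points
        System.out.println(decoded.length());                            // 4 UTF-16 code units
    }
}
```

The gap between the code-point count (3) and the UTF-16 length (4) is exactly the surrogate-pair situation the break-rules note above is about.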

Back to top

16. Are you experiencing problems with the LanguageWare Resource Workbench UI on Linux platforms?

There is a known issue on Ubuntu with Eclipse and the version of the GTK+ toolkit that prevents toolbars from being drawn properly and buttons from responding to mouse clicks. To work around it, set the environment variable GDK_NATIVE_WINDOWS=1 before starting the LanguageWare Resource Workbench.

Another issue was reported on Ubuntu 9.10 (Karmic), with the LanguageWare Resource Workbench showing an empty dialog window on startup. To work around it, add the line “-Dorg.eclipse.swt.browser.XULRunnerPath=/dev/null” to the lrw.ini file in the LanguageWare Resource Workbench installation folder.
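As a sketch, both workarounds can be applied from a shell before launching the Workbench (the lrw.ini path below is a placeholder; substitute your actual installation folder):

```shell
# Workaround 1: Ubuntu GTK+ toolbar/button issue.
# Export this in the shell session that launches the Workbench.
export GDK_NATIVE_WINDOWS=1

# Workaround 2: Ubuntu 9.10 empty dialog on startup.
# Append the SWT/XULRunner flag to lrw.ini (placeholder path shown).
LRW_INI="./lrw.ini"
echo "-Dorg.eclipse.swt.browser.XULRunnerPath=/dev/null" >> "$LRW_INI"
```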

Back to top

17. Any other questions not covered here?

Please use the “Forum” to post your questions and we will get back to you.

Back to top

18. How do I upgrade my version of the LRW?

On Windows and Linux, each version of the LRW is a separate application. You should not install a new version over a previous one; either uninstall the previous version first or install each version in a separate location. If in doubt, the default settings of the LRW installers ensure this behaviour.

The projects you created with older versions of the LRW will never be removed by the uninstall process. You can point your new version of the LRW at the same data workspace during startup.

Back to top

